Abstract

Sharing experiences with others is an important part of everyday life. Immersive virtual reality (IVR) promises to simulate these experiences. However, whether IVR elicits a similar level of social presence as measured in the real world is unclear. It is also uncertain whether AI-driven virtual humans (agents) can elicit a similar level of meaningful social copresence as human-driven virtual humans (avatars). The current study demonstrates that both virtual human types can elicit a cognitive impact on a social partner. The current experiment tested changes in participants’ cognitive performance in the presence of virtual social partners by measuring the social facilitation effect (SFE). The SFE-related performance change can occur through either vigilance-based mechanisms related to other people’s copresence (known as the mere presence effect (MPE)) or reputation management mechanisms related to other people’s monitoring (the audience effect (AE)). In this study, we hypothesised the AE and MPE to be distinct mechanisms eliciting the SFE. Firstly, we predicted that, if head-mounted IVR can simulate sufficient copresence, any social companion’s visual presence would elicit the SFE through the MPE. The results demonstrated that companion presence decreased participants’ performance irrespective of whether it was AI- or human-driven. Secondly, we predicted that monitoring by a human-driven, but not an AI-driven, companion would elicit the SFE through the AE. The results demonstrated that monitoring by a human-driven companion affected participant performance more than monitoring by an AI-driven one, worsening performance marginally in accuracy and significantly in reaction times. We discuss how the current results explain the findings in the prior virtual-world SFE literature and map out future considerations for social-IVR testing, such as participants’ virtual self-presence and the affordances of physical and IVR testing environments.

1. Introduction

Humans are affected by other people in their shared environment [1]. The lengthy restrictions on in-person interaction during the years of the COVID-19 pandemic have emphasised the importance of in-person experience sharing. A growing requirement for a genuine sense of social copresence with others urged the entertainment and social media industries to seek solutions for simulating these experiences remotely, reigniting the curiosity in immersive virtual reality (IVR) technologies.

The significance of the IVR experience lies in its ability to simulate a sense of agency and self-presence in an interactive virtual world that surrounds its user with a 360-degree digital environment. The IVR experience often perceptually removes the participant from their real-world surroundings. Through immersive presence, the participants can act upon the virtual environment similarly to the real world and exceed the rules of physical reality. Since IVR became commercially available, there has been a growing interest in how immersive reality affordances can be used in psychological research [2]. The findings are currently promising. For example, prior research has shown that embodied interaction with the immersive physical environment can positively influence cognitive performance and foster more creative problem-solving [3], facilitating intuitive cognition [4]. In agreement with an original proposal by Blascovich et al. [5], researchers are now finding that testing in IVR can indeed offer a more ecologically valid solution to testing classic paradigms, which until recently were limited to artificial screen-based stimuli [6].

The effects explored in IVR experimentally are often nonsocial. However, with growing interest in immersive social interaction, it is now crucial to establish and test the baseline mechanism of immersive virtual social reality. This is the aim of the current paper.

The social IVR encloses its users within a computer-generated interactive world, developed around the user and their virtual companions. In contrast to desktop-based social interaction, during which the user is located within their home environment, viewing the companion on-screen, in IVR, the user virtually coinhabits the immersive environment alongside their social companion. Within the immersive space, the companion and participants can directly approach and virtually engage with one another’s virtual representation in the simulated environment, leading to higher levels of copresence in the same environment than desktop-based communication [7–9]. However, whether this coimmersion in IVR elicits a meaningful sense of copresence, equivalent to real-world interaction, is yet to be determined.

The current study tests whether a classic social phenomenon, reported during real-world interaction, can also be elicited within IVR. In particular, the current study focuses on the social facilitation effect (SFE), which manifests as a change in individuals’ performance when they perform within a social context, in contrast to when performing alone [10–13]. As per the canonical SFE observed in real-world social contexts, we expect the canonical SFE within IVR to follow a similar pattern of influence on performance: cognitive performance will improve (facilitation) on an easy task and decline (inhibition) on more difficult tasks in the presence of an immersive virtual companion, in contrast to performing alone. Given the importance of cognitive performance during remote interaction in education and work settings, including IVR, it is important to understand whether cognitive performance changes driven by SFEs are also observed due to the presence of social agents within virtual scenarios. However, previous studies attempting to measure cognitive SFE in an immersive setting have resulted in mixed findings, with most studies not showing a canonical pattern of the SFE, but only showing performance inhibition without facilitation [14–17] and one resulting in null findings [18].

The most straightforward explanation of the discrepancy in prior virtual SFE findings could be that different studies used different types of companions and that some types of virtual companions accompanying the participant might not be as socially influential as others. It is often assumed that the human-driven companion’s (i.e., an avatar’s) ability to mentalise from another person’s perspective renders it more socially meaningful to the participant, in contrast to an AI-driven companion (i.e., agent; [19]), which would not possess a similar capacity to mentalise. However, the findings on whether the SFE within an immersive context is affected by the companion’s agency (human or AI-driven) are also inconclusive. Some reports demonstrate performance inhibition in the presence of avatars but not of agents [15], whilst others report performance inhibition for both [14, 16].

If the social influence on performance cannot be explained just through the companion’s agency (the mind behind the companion), it is possible that the performance change could also be driven by the companion’s interactive and visual factors. Prior work suggests that a virtual companion’s human likeness, i.e., whether the companion looks and acts more humanlike versus nonhumanoid, is a significant predictor of companions being treated as other people [20–22]. However, the SFE, which manifested as inhibition without facilitation, has also been reported for robotlike virtual companions [14]. This finding could be explained by emerging theoretical work in human-agent interaction that proposes the significance of multimodality in interaction, emphasising the importance of the virtual communication context rather than just the companion’s visual attributes. In their virtual interaction model, Klowait and Erofeeva [23] propose that once the participant learns the contingencies within the social virtual interaction context, an agent companion’s sufficiently humanlike interactive properties can overwrite its insufficient visual humanlike attributes. Therefore, even if the agent companion is not sufficiently humanlike in appearance but represents a sufficiently humanlike function to the participant, it will be interacted with similarly to a real person due to pragmatic heuristics.

A systematic experimental contrast of humanoid versus nonhumanoid presence has not been tested in the framework of immersive virtual SFE. Therefore, it is still not clear whether the agency of the virtual companion, its physical attributes (visual human-likeness, human-like function), or the combination of these two, influences a meaningful sense of immersive copresence and elicits the SFE. And if virtual SFE is elicited, which cognitive mechanisms are responsible for these effects?

We propose that investigating SFE through its hypothesised mechanisms can provide an original perspective on how different companion types might elicit SFE within the IVR and elucidate the driving forces behind the SFE itself. To the best of the authors’ knowledge, the prior published studies investigating SFE within a virtual context measured the SFE as a general effect, without considering the SFE’s underlying mechanisms. Considering that virtual companionship already transcends the entertainment industries, heading towards well-being, work, and education [24–26], it is important to investigate which virtual companions are truly socially meaningful, and what the consequences of interactions with them are.

The real-world (not virtual) SFE has been theorised as being due to two potential cognitive mechanisms [27], either our awareness of the mere presence of another person in the environment (the mere presence effect (MPE)) or our sense of being observed by another person (the audience effect (AE)). In the SFE literature, the MPE and AE are often researched separately and are, therefore, referred to as “effects” on their own, even if both result in SFE under different social conditions. To resolve the confusion, in the current paper, the terms MPE and AE are being used as distinct mechanisms which elicit SFE as the outcome.

It is important to understand the distinction between the MPE and AE as cognitive mechanisms because there are fundamental differences in cognitive processes related to how social contexts elicit the SFE. The MPE is based on the registration of physical uncertainty and vigilance over another person’s presence in the same environment. Critically, the MPE is agnostic of whether the other person is explicitly monitoring/observing the participant [28]. In contrast, AE is driven by the belief of being watched and potentially judged whilst performing a task [29], for example through a camera, irrespective of whether the observer is visually copresent in the shared environment [30, 31].

Considering the differences between the MPE and the AE, it is hypothesised that, through the MPE, the companion’s physically embodied (bodily) presence alone can elicit the SFE by arousing participants’ vigilance state, irrespective of the companion’s mind property (agency: human- or AI-driven). Through the AE, in contrast, being watched by a mentalising agent (such as a human-controlled avatar, in contrast to an AI-controlled agent) is hypothesised to be sufficient to elicit the SFE, irrespective of the companion’s visual bodily presence in the same environment. In a real-world interaction scenario, it is challenging to separate these two properties of a companion, especially the copresence of an interactive partner who lacks the ability to mentalise. However, IVR provides a unique opportunity to experimentally dissociate these two properties, isolating the companion’s virtual body from their mind.

To disentangle how virtual companions are perceived in IVR, considering the categorical differences between the two cognitive mechanisms underlying the SFE (MPE and AE) could be crucial, revealing participants’ underlying social cognition when exposed to different types of virtual companions. Until now, IVR studies have used SFE as an umbrella term, merging its two mechanisms, the MPE and the AE, and observing the SFE as an outcome. These experimental choices potentially enabled participants’ subjective beliefs about companions’ mentalising properties and the agency behind their actions to vary freely. If this is indeed the case, it is still not clear whether the inhibitory effect reported in the prior immersive SFE literature is driven by beliefs about companion agency (related to the AE) or by visual human-likeness during immersive copresence (not related to the AE but possibly related to the MPE, as we discuss below).

Based on the MPE, it is hypothesised that if the companion’s embodied copresence in an immersive space is socially meaningful irrespective of their agency, the social virtual companion copresence should be sufficient to elicit SFE. Indeed, a review of virtual interaction [8] shows that communicative social AI agents can be engaging even if they are not in visually humanoid form. However, more visually humanlike companions are believed to be most socially impactful [21, 22, 32]. Interestingly, based on the MPE, it is the sense of social embodied presence of a companion in the shared space that drives the effect, irrespective of whether the companion is attentive to the participant [33]. Therefore, whether the companion has a human mind and is able to appraise the human partner is irrelevant for the MPE to elicit the SFE. If so, both the avatar and agent copresence should be sufficiently and equally impactful, as per MPE. Whether the companion’s humanoid form, irrespective of agency, is important for virtual MPE to emerge is directly tested in the current study.

In contrast to the MPE, the AE is hypothesised to depend on the companion having the mental capacity to monitor the participant. If a participant’s anticipation of social judgement indeed drives the effect [27], an avatar’s mentalising capacity renders it more socially meaningful than an agent, which lacks mentalising capacity, when the participant believes their performance is monitored. Indeed, the threshold model of social influence (TMSI: [19]) predicts that when evaluation is central to interaction, the avatar companion is always more impactful than the agent companion. Additionally, participants do not expect to be judged by an AI companion, unlike by another person [34]. Considering that the real-world AE was demonstrated even when the companion was not visually available to the participants and monitoring was instead denoted by a light in the environment [30], the companion’s visual presence in the environment might not be as important for the AE as it is for the MPE. If the AE requires a companion to have a subjective opinion, rather than just a visual presence, the agency of the companion (i.e., whether it is driven by a human or an AI) should be crucial for the AE to elicit the SFE.

As argued in our previous study [35] and as per the TMSI [5], when measuring a virtual companion’s social impact, both the social interaction context and what the participants know about the virtual companion should be considered. To the authors’ knowledge, there has not yet been a systematic study testing the SFE within IVR that disentangles the contexts in which either the MPE or the AE elicits the SFE. The current experiment utilised the experimental control provided by IVR to dissociate the social companion’s visual presence levels (not visible versus nonhumanoid versus humanoid), within an immersive virtual space, from participants’ interpretation of whether the companion was human-minded (avatar) versus nonhuman-minded (AI agent). We contrast companion attributes with the level of their attentiveness to the participant performing a cognitive task in an immersive space. By controlling whether the companion is monitoring or not monitoring the participant’s performance, we were able to directly test whether the MPE or the AE explains the SFE within a virtual environment in a way not possible in a real-world scenario. The cognitive task utilised is the relational reasoning paradigm (RRP), a reaction-time-based, rapid-response visual logic task that engages participants’ executive functioning and was previously shown to be susceptible to the AE in real-world testing [30].

During the current experiment, participants were immersed within a digitally generated 360-degree virtual environment whilst wearing a head-mounted display. Each participant performed the RRP within the immersive environment, viewing it from a first-person perspective. During the RRP, participants are timed as they match rows of shapes and patterns at easy and difficult levels. Participants performed the RRP on a large virtual screen within the IVR, either in the presence of a companion or alone. The companion either was or was not attentive to participant performance, with or without a bodily presence in the scene (relevant to the AE), and was either merely bodily present or absent, with or without attending to the participant or their performance (relevant to the MPE).

The companion’s visual presence in the IVR was either visually absent (none), nonhumanoid, or humanoid. To control for possible confounds within the environment, such as distraction by the mere presence of any interactive nonsocial object, participants in the visually absent companion group (none) performed the RRP task in the active presence of a nonsocial placeholder (an office fan), which mimicked the same movements, i.e., head turns towards the participant became rotations of the fan head. To test the impact of visually humanoid versus visually nonhumanoid social presence, one group of participants performed in the presence of an autonomous video camera, the others in the presence of a humanoid virtual human. The companion’s visually absent (none), nonhumanoid, and humanoid presence were entered into the analysis as companion visual presence (CVP). All participants performed under the belief that they were within an IVR social context with either an AI (agent) or another person (avatar). Effects of the companion’s capacity to mentalise (agency) were analysed under the factor of companion agency. Performance monitoring and nonmonitoring (entered into the analysis as monitoring) occurred orthogonally to the CVP and the companion agency. Within each level of the CVP, monitoring occurred through either the camera (nonhumanoid companion) or the humanoid companion’s eyes. In the presence of the nonsocial object, the monitoring condition was indicated only through an on-screen instruction. To reduce participants’ anonymity in IVR, participants were greeted in person, prior to the study, by the researcher who would operate the avatar monitoring their performance. This manipulation aimed to establish that participants knew they were identifiable whilst being monitored.
To reduce any self-avatar effects, such as identity occlusion and projection [36, 37], participants were not given an avatar. Instead, all participants were notified that, whilst performing, their behavioural markers (gaze, head rotation) and performance accuracy would be seen by the observer, either AI or human (depending on the group assigned). This decision ensured that participants were aware that their actions were mapped onto their self-presence in the IVE. Several predictions were made based on the MPE and the AE; the hypotheses and corresponding planned analyses are listed below.

2. Hypotheses and Planned Analyses

The canonical SFE is seen as an improvement in the easy task and a decrease in the difficult task under the social compared to the nonsocial condition. Note that the SFE may or may not show a canonical pattern; e.g., some prior studies only showed inhibition without facilitation [14–16]. Figure 1 illustrates a summary of the mixed design levels used in the current study. The predictions and corresponding MPE and AE contrasts within the omnibus mixed ANOVA design are listed below. For schematics of planned contrasts relating to the hypotheses, see Figure 2 for the MPE and Figure 3 for the AE.

2.1. Hypotheses Based on the MPE

The first analysis tested the MPE within an immersive context, combining companion visual presence (CVP) and task difficulty (easy, difficult). There are two hypotheses based on the MPE; for the analysis plan related to the hypotheses of the MPE, see Figure 2.

The first hypothesis (H1.a) predicted that any companion presence irrespective of its visual type (CVP: nonhumanoid or CVP: humanoid) would elicit the SFE, in contrast to when the participant is alone, i.e., companion visual presence is none (CVP: none); see Figure 2, H1.a.

The second hypothesis (H1.b) predicted that the SFE would be affected more by the humanlike companion presence than by the nonhumanoid presence or no visual social presence (CVP: none), as per Blascovich et al. [5]. H1.b would be supported if performance on easy conditions improved and performance on difficult conditions decreased linearly as the CVP level of social influence increased from no companion presence (CVP: none) to CVP: nonhumanoid, with CVP: humanoid presence being of highest influence. The hypothesis was tested by contrasting the three levels of CVP (none, nonhumanoid, and humanoid) with one another. Similarly to the first analysis, the effects are analysed at each level of task difficulty (see Figure 2, H1.b).
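The H1.b linear-trend prediction can be illustrated with standard polynomial contrast weights over the three ordered CVP levels. This is an illustrative sketch, not the study's analysis code; the weights and example means are assumptions, not reported values:

```python
def linear_trend(cvp_means):
    """Linear contrast estimate over the ordered CVP levels
    (none, nonhumanoid, humanoid), using standard (-1, 0, +1)
    weights: positive when the means rise monotonically from
    CVP: none to CVP: humanoid."""
    weights = (-1.0, 0.0, 1.0)
    return sum(w * m for w, m in zip(weights, cvp_means))

# Hypothetical easy-condition accuracy means per CVP level; a positive
# trend would be consistent with H1.b's predicted facilitation.
trend = linear_trend([80.0, 83.0, 86.0])
```

Under H1.b, the sign of such a trend would be expected to differ between easy conditions (facilitation, positive) and difficult conditions (inhibition, negative).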

2.2. Hypotheses Based on the AE

The second part of the analysis focused on the AE in an immersive environment, investigating whether the SFE in a coimmersive social environment can only be elicited through monitoring by a human-minded companion (companion agency: human) but not an AI-minded companion (companion agency: AI). To test this, the monitoring versus not monitoring condition was contrasted within each level of companion agency (AI, human) separately. There are two hypotheses based on the AE, and for the analysis plan and related hypotheses of AE, see Figure 3.

The first hypothesis, H2.a, predicts that participants’ cognitive performance would change following the canonical pattern of the SFE when monitored versus when not monitored, and that the effect would be present for participants allocated to the companion agency: human condition, but not the companion agency: AI condition.

Alongside the main predictions based on the AE focusing on companion agency, an additional analysis was conducted to test the effect of companion visual presence (CVP) on the AE in eliciting the SFE. Virtual interaction theories suggest that a higher human-likeness heuristic can “trick” the brain into processing a humanlike companion as more socially impactful [20, 21] and that higher human-likeness might heighten the overall social impact even under human-minded companion agency [5]. Therefore, the second hypothesis, H2.b, predicts a positive linear effect of the CVP, such that higher levels of companion visual humanness, from no companion to nonhumanoid to humanoid, would increase the SFE. The effects were explored for difficult and easy conditions separately.

3. Methods

3.1. Design

The experimental design included four factors, with two between-subject factors, companion agency (AI and human) and companion visual presence (none, nonhumanoid, and humanoid), and two within-subject factors, the level of monitoring (monitored and not monitored) under which participants performed and task difficulty (easy and difficult) (see Figure 1). The relational reasoning paradigm (RRP), which was also used in our recent study measuring the SFE within a desktop-based online videoconferencing context [35], measured task performance as percentage accuracy and as reaction times (RT) for accurate responses only. Participants who scored less than fifty per cent accuracy for easy and difficult conditions combined, or whose performance fell more than three (3) standard deviations (SDs) from the mean, were removed from the final analysis.
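The two exclusion criteria above can be sketched as a simple filter. This is an illustrative reconstruction, not the authors' analysis pipeline; the argument names, and the reading of the 3-SD rule as applying to each participant's mean RT, are assumptions:

```python
def keep_participant(accuracy_pct, mean_rt, group_rt_mean, group_rt_sd):
    """Apply the exclusion rule from the Design section: overall accuracy
    (easy and difficult conditions combined) must reach 50 per cent, and
    the participant's performance (here, mean RT) must lie within 3 SDs
    of the group mean."""
    accurate_enough = accuracy_pct >= 50.0
    within_three_sd = abs(mean_rt - group_rt_mean) <= 3.0 * group_rt_sd
    return accurate_enough and within_three_sd
```

For example, a participant with 40 per cent combined accuracy is excluded regardless of their reaction times.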

3.2. Participants

The data used for the final analysis consisted of data from 103 participants, 73 females and 30 males, with an age range of 18–55. A total of 138 participants were recruited to take part in the study, aiming to enter 18 participants per group for the ANOVA analysis. The group sample size was powered (G*Power) as per Dumontheil et al.’s [30] study, which reported a real-world SFE, in an interaction, with the current RRP paradigm. A total of 35 participants were removed from the final data, either for not following the study instructions or because their accuracy on easy and difficult tasks combined fell under 50 per cent. Out of the 35, 19 participants were removed for not believing the experimental manipulation of monitoring or of companion agency (companion mind: AI or human); see Table 1 (Section 4) for a breakdown of the remaining participants.

No prior experience with immersive virtual reality headsets was required to take part in the study. The simulation sickness risk was considered low, due to the stationary nature of the experiment and as a result of piloting the study on participants with a higher simulation sickness quotient (as measured by the motion sickness susceptibility questionnaire, MSSQ-short [38]), who reported no discomfort. Participants were reimbursed with either course credits or £10 vouchers and were fully debriefed on the social manipulation script after completion of the experiment. The study was approved by the Birkbeck, University of London, Ethics Committee: 181933.

3.3. Virtual Environment

The immersive virtual environment (IVE) comprised a brightly lit room with textured white walls, a desk with a chair, and a large flat TV screen on which the stimuli were presented (aerial virtual room perspective in Figure 4(a)). The participants sat behind the virtual desk inside the environment, facing the TV screen positioned on the wall in front of them. In the real-world lab cubicle, the participant sat behind the computer wearing an Oculus HMD (Figure 4(b)), pressing keys on the keyboard to respond yes (right arrow) or no (left arrow) on the RRP task.

Depending on the experimental group assigned, the virtual companion was present either in a nonhumanoid social presence form (an observing interactive camera, Figure 5(a)), in a humanoid form (an interactive humanoid character, Figure 5(b)), or as a visually nonpresent companion (the companion’s animated presence replaced by an animated office fan, Figure 5(c)). All objects were positioned to ensure the participants could fully observe the task on the virtual TV screen but also notice any movement by the companion or the nonsocial animated object (e.g., camera or fan); see Figure 4(a) for object placement within the IVE (example with a humanoid companion present). During the training session, no companion was implied; therefore, the companion spot was empty; see Figure 5(d).

3.4. Social Context Instructions

Participants were led to believe that they were testing new virtual monitoring and tracking software. Depending on the level of companion agency assigned, each participant was led to believe that they would be monitored in real time, at some point during the task, by either an AI algorithm (AI condition) or a real human observer (human condition). The participants believed that their gazing behaviour (as projected through the VR headset) and performance were monitored by either another person (human mind) or an automated algorithm (AI processing). The companion monitoring blocks were marked by an IVE on-screen instruction, “You are now being watched,” which was then followed by the companion’s monitoring behaviour as the participants performed the task. After the study’s completion, all participants were asked whether their beliefs about their companion were congruent with the manipulation. Contrary to the participants’ beliefs, the social companion presence and monitoring behaviours within the IVE were all automated and matched across all companion agency levels. No virtual real-time monitoring occurred at any time throughout the study by either companion. After the debriefing procedure revealed the social manipulation, participants had the choice of withdrawing or committing their data to the analysis. Only participants who consented to data inclusion and who believed the social context script were included in the final analysis; nineteen participants were removed for not believing the social context manipulation (see Section 3.2).

3.5. Companion’s Social Presence and Monitoring

Depending on the level of CVP assigned to each participant, the companion’s visual presence within the IVE was either socially meaningful, i.e., a humanoid character (humanoid condition) or a nonhumanoid camera (nonhumanoid condition), or lacked socially meaningful presence, i.e., an animated office fan with no visible social presence value (none condition). All participants experienced both the monitored and not monitored conditions, which were allocated to different blocks and were initiated by instructions on the IVE TV task screen (“You are now being watched”; “You are now not being watched”). During the training session, participants performed alongside a placeholder for the companion’s visual presence: for all groups, the companion location was substituted with a stationary chair (Figure 5(d)).

Out of three levels in the CVP, two were expected to be socially meaningful, the nonhumanoid and humanoid conditions. In the nonhumanoid condition, the social companion presence was expressed in the form of an interactive camera on a tripod (Figure 5(a)), which participants believed to be operated in real-time by either a monitoring real person (companion agency, human) or by an autonomous AI algorithm (companion agency, AI). Participants believed that the companion, AI or human, could monitor them inside the IVE using the camera (virtual CCTV). For the humanoid condition, the companion was a humanoid virtual companion (Figure 5(b)), believed to be controlled by the companion (human or AI) in the real world. For both nonhumanoid and humanoid conditions, the participants believed that a real-world companion (AI or human) could see into the IVE from the perspective of the virtual companion (either lens: nonhumanoid camera or gaze: humanoid).

During the monitoring blocks, both the AI and human companion representations were animated to simulate monitoring motion, alternating between looking towards the participant and towards the main task screen as the participants performed. The nonhumanoid and humanoid companions’ gazing motions were synchronised across the between-subjects presence conditions: the head turn towards and away from the participant in the humanoid character condition was animated to occur at the same time, and for the same duration, as the camera turn on the tripod in the nonhumanoid presence condition. In the not monitored blocks, the nonhumanoid camera or the humanoid character’s head tilted down and remained facing down throughout the condition block, with no observation motion directed towards the participant. To make the not monitored condition appear more natural, the humanoid character was depicted looking down at a laptop.

By contrast, in the third level of the CVP (none condition), participants performed seemingly alone, with the only moving object in the immersive environment being an office fan. Participants believed that the office fan (Figure 5(c)), experimentally used as a replacement for a social agent, was simply part of the environment and operated independently of the monitoring conditions. There was no visually meaningful companion presence within the immersive space from the participants’ perspective. Without the participants’ knowledge, in the companion visual presence: none condition, the office fan’s motion was matched to the motion of the visually present companions (camera and character). The main difference between the nonsocial presence of the fan and the social presence of the nonhumanoid and humanoid companions was that the fan did not signal any social disposition towards the participant and was, therefore, socially meaningless.

Similarly to the nonhumanoid and humanoid conditions, participants in the none condition were also made to believe that their performance was evaluated by either an AI algorithm or a human-minded companion; however, there was no visual representation of an interactive companion in the scene.

3.6. Task Stimuli

The current study used a timed visual logic task, the relational reasoning paradigm (RRP), at two difficulty levels, easy and difficult. The task consisted of two rows of two grayscale images (Figure 6).

For the easy condition, participants were asked to decide whether the top two images matched the bottom two images in shape or pattern, clicking the “yes” arrow if the bottom and top images matched and the “no” arrow if they did not. For the difficult condition, participants had to decide whether the images on the top changed in the same way (dimension: either shape or pattern) as the bottom images. By clicking the “yes” or “no” arrow, participants matched the “change” type of the top and bottom rows. For example, if the top images change in shape, the bottom ones should also change in shape (not pattern), even if the shapes or patterns of the top and bottom rows are not the same. If both the top and bottom rows changed in shape, participants clicked “yes”; if the top row changed in shape and the bottom row changed in pattern, the correct response was “no.”
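To make the difficult-condition decision rule concrete, it can be sketched as a small function; the function and variable names are illustrative and not part of the original task implementation:

```python
def difficult_trial_answer(top_change: str, bottom_change: str) -> str:
    """Return the correct response for a difficult RRP trial.

    Each row changes along one dimension, 'shape' or 'pattern'.
    The answer is "yes" when both rows change along the same
    dimension, "no" otherwise (even if the actual shapes or
    patterns differ between the rows).
    """
    return "yes" if top_change == bottom_change else "no"

# Top row changes in shape, bottom row changes in pattern: change types mismatch.
assert difficult_trial_answer("shape", "pattern") == "no"
# Both rows change in shape: change types match.
assert difficult_trial_answer("shape", "shape") == "yes"
```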

Both easy and difficult trials were presented for a fixed duration of 3.5 seconds, preceded by 0.5 seconds of blank screen, so participants had a total of 4 seconds to answer before the transition to the next trial. If no answer was given within the 4-second interval, the response was recorded as incorrect. Each monitoring block (monitored and not monitored) consisted of two difficult and two easy subblocks of five trials each, for a total of 20 trials per monitoring condition, beginning with either a difficult or an easy subblock, followed by a subblock of the complementary condition. Block sequences were counterbalanced between groups and conditions. In total, there were eight monitoring blocks, four monitored and four not monitored, counterbalanced. Every new monitoring block was preceded by a 10-second on-screen message on the IVE TV screen: “You are now being watched” or “You are now not being watched.” This interval gave participants time to look around and notice whether the companion was monitoring them (visually noticeable in the visually present companion groups). When monitoring began, the virtual companion turned towards the participant and then towards the screen; in the not monitored conditions, the interactive virtual objects turned downwards, disengaging from the environment.
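The block structure above can be sketched as follows; the names, and the strict alternation of subblocks shown, are illustrative assumptions rather than the original Unity implementation:

```python
BLANK_S = 0.5       # blank screen before each stimulus
STIMULUS_S = 3.5    # fixed stimulus duration
TRIAL_WINDOW_S = BLANK_S + STIMULUS_S  # 4 s total; no response counts as incorrect
TRIALS_PER_SUBBLOCK = 5

def build_monitoring_block(start_difficulty: str) -> list:
    """One monitoring block: two difficult and two easy five-trial
    subblocks, beginning with `start_difficulty` and alternating with
    the complementary difficulty (an illustrative ordering)."""
    other = "easy" if start_difficulty == "difficult" else "difficult"
    subblocks = [start_difficulty, other, start_difficulty, other]
    return [(difficulty, trial)
            for difficulty in subblocks
            for trial in range(TRIALS_PER_SUBBLOCK)]

block = build_monitoring_block("difficult")
assert len(block) == 20   # 20 trials per monitoring block
```

With eight such blocks (four monitored, four not monitored), a full session comprises 160 trials.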

3.7. Procedure

Each participant was visually introduced to the RRP task outside the IVE and helped with adjusting the head-mounted display as they were introduced to the 360-degree environment within the IVE. During the introduction phase, participants were asked to complete a practice task. After the practice session was completed and the participant reached a 70 per cent passing threshold for both easy and difficult trials, the researcher approached the participant to make sure they were comfortable wearing the HMD and were not experiencing any simulation sickness.

Participants were assured that the researcher would be nearby in case of emergency but, unless the researcher needed to virtually monitor the participant's performance, would otherwise be occupied. Participants were asked to inform the researcher when they completed the experiment, reiterating that the researcher would not otherwise be able to see their performance. After the study was completed, the researcher helped remove the headset. Participants then filled out a form on whether they believed they were monitored by their assigned companion agency and, if so, whether they felt judged. The researcher made sure participants understood the questionnaire, guiding them through it when needed. Participants were fully debriefed after the study and made aware of the social scripting used in the experiment.

3.8. Apparatus

The virtual environment was developed in the open-source 3D platform Blender and imported into the Unity game engine, through which stimulus presentation was created and the experiment was run. The RRP image textures (originally used in Dumontheil et al. [30]) were edited in Adobe Photoshop to a 20% larger texture grain to reduce the moiré pattern effect inside the IVE. The humanoid character was a rigged 3D model mesh created in the free software MakeHuman (http://www.makehumancommunity.org) and later imported into Blender for additional texturing. Both the nonhumanoid (tripod camera) and humanoid (digital researcher) companions were imported into and animated in Unity. Stimuli were presented on a virtual-reality-capable Dell Alienware laptop and displayed through the Oculus Rift DK2 (Development Kit 2) headset (OLED display, resolution: pixels per eye, refresh rate: 75 Hz). The visible field of view is 93 degrees horizontal and 99 degrees vertical; the optics are aspherical binocular lenses with a fixed IPD of 63.3 mm. An in-depth description of the companion animation can be found in the appendix.

4. Results

After excluding participants who did not meet inclusion criteria (see Section 3.2 for details), there was an overall similar distribution of participant numbers in each of the six groups (allocated for each of three levels in the CVP factor for each of two levels in the companion agency factor).

See Table 1 for the number of participants allocated to each group. Data normality checks were conducted within each difficulty level, for accuracy and RT separately. The Kolmogorov-Smirnov test revealed that for accuracy, but not RT, the overall data distribution was significantly skewed, for difficult , and easy , . Therefore, the accuracy results are reported with corrections when the data do not satisfy homogeneity or sphericity assumptions.

Summaries of means (M) and standard errors per factor and its corresponding levels are located in Table 2 (accuracy percentage) and Table 3 (reaction times).

4.1. Difficulty

For accuracy, there was a significant main effect of difficulty, , , , , with difficult trials (, ) performed significantly worse than easy (, ) trials. For RT, there was also a significant main effect of difficulty, , , , , with difficult trials (, ) performed significantly slower than easy (, ) trials.

4.1.1. H1: Mere Presence Effect (MPE)

As per the MPE prediction, the companion visual presence interaction was significant in accuracy , , (Greenhouse-Geisser, power 77%) and marginal in RT, , , (Greenhouse-Geisser). The planned Bonferroni-corrected simple effects analyses investigated the interaction at the two levels of difficulty separately.

(1) H1.a: Companion Presence versus Absence. The MPE hypothesis H1.a predicted that any companion visual presence (CVP: humanoid and nonhumanoid combined) would elicit the SFE when contrasted against the condition where a companion is absent (CVP: none).

(I) Accuracy

As per the planned pairwise comparison, the interaction was broken down by difficulty, contrasting the impact of visually absent companion (CVP: none) versus visually present companion condition (CVP: present; the nonhumanoid and humanoid conditions combined), at easy and difficult trials separately. The results showed no significant differences within the difficult or easy conditions. Difficult condition, , (absent , , versus present, , ), easy condition, , (absent , , versus present, , ).

(I) Reaction Times

Similarly to accuracy, the marginal interaction in RT was broken down by difficulty, as planned. The results showed no significant difference in the planned contrast between CVP presence and absence within the difficult or easy conditions. Difficult tasks, , (absent , , versus present, , ), easy tasks, , (absent , , versus present, , ).

(2) H1.b: Presence of Companion Type. The MPE hypothesis, H1.b, predicted that the visual presence of a humanoid companion (CVP: humanoid) would be most impactful in contrast to the presence of CVP: nonhumanoid and CVP: none. The impact was expected to increase linearly as social influence increases through companion visual characteristics, from none to nonhumanoid to humanoid.

(I) Accuracy

The planned (Bonferroni corrected) pairwise comparisons tested the interaction at each level of CVP, broken down by levels of difficulty. For difficult conditions, the participants allocated to the humanoid companion condition performed significantly worse than both those in nonhumanoid and none conditions: humanoid (, ) versus nonhumanoid conditions (, ), ; humanoid versus none conditions (CVP: none) (, ), . Participants allocated to nonhumanoid and none conditions did not statistically differ from one another, . For the easy conditions, the participants allocated to the humanoid condition performed marginally worse than those in nonhumanoid and none conditions: humanoid condition (, ) versus nonhumanoid (, ), , and none condition (, ), . The participants allocated to nonhumanoid and none conditions did not statistically differ from each other (). There was no SFE facilitation of easy tasks; see Figure 7(a).

(II) Reaction Times

The Bonferroni corrected pairwise comparisons explored the interaction at each level of CVP, broken down by levels of difficulty. For difficult conditions, the participants allocated to the humanoid condition performed significantly slower than those in the nonhumanoid condition but only numerically slower than those in the none condition: humanoid (, ) versus nonhumanoid (, ), , and humanoid versus none condition (, ), . The participants allocated to nonhumanoid and none conditions did not statistically differ from one another, . For easy conditions, the participants allocated to humanoid condition performed marginally slower than those in nonhumanoid condition but only numerically slower than those in none condition: humanoid group (, ) versus nonhumanoid (, ), , and humanoid versus none conditions (, ), . There were no significant differences between participants allocated to none and nonhumanoid conditions, . There was no significant facilitation; see Figure 7(b).

(I) MPE Summary

These results indicated that both RT and accuracy were negatively affected (inhibited) only by the presence of the humanoid companion (CVP: humanoid), on both easy and difficult tasks (as per H1.b), but not by the presence of a nonhumanoid companion, not supporting MPE H1.a. There was no significant linear effect as the companion's visual humanness increased, with no significant difference between the nonhumanoid (CVP: nonhumanoid) and absent companion (CVP: none) conditions, although humanoid presence did elicit a social response in contrast to the other presence types, as per H1.b. The effects were inhibitory, without facilitation, which is not in line with the canonical pattern of the SFE as hypothesised, but is in line with the majority of IVR findings reported to date.

4.1.2. H2: Audience Effect (AE)

The AE hypothesis (H2.a) predicted a significant interaction: monitoring should influence participants' performance in the human-minded group (companion agency: human) but not in the companion agency: AI group, irrespective of CVP. The interaction between companion agency, monitoring, and difficulty was nonsignificant for accuracy , , and for RT , , , not supporting our hypothesis. However, as planned, the interaction was broken down by contrasting monitoring levels within each level of companion agency and difficulty (see Section (1) H2.a: AE Irrespective of CVP).

In H2.b, we predicted that the companion's visual presence and humanoid form could contribute additionally to the impact of the monitoring companion, predicting a interaction. The interaction was not significant for accuracy , , , or for RT , , . However, as planned, the interaction was broken down by contrasting performance outcomes under the different CVP levels during monitoring. The effects were contrasted within the human-minded group (companion agency: human), for each difficulty level separately. There was no significant effect for the AI companion group; therefore, the CVP × monitoring interaction was not followed up for that group (see Section (2) H2.b: AE Accounting for CVP).

(1) H2.a: AE Irrespective of CVP. (I) Accuracy

The follow-up analysis for the main effect of monitoring explored whether the performance changes within each level of the companion agency (human or AI) were different from each other. The results showed that although the monitoring decreased participants’ performance accuracy numerically, both in the AI group (not monitored , , versus monitored , ), and in the human group (not monitored , , versus monitored , ), the effect was only significant in the participants allocated to human companion agency condition (, ) but not those allocated to AI condition (, ). The planned follow-up analysis of interaction at each level of companion agency revealed that monitoring by a human companion marginally decreased the accuracy of the performance on the difficult trials (, ), with no significant changes on easy trials (, ); see Figure 8(a). By contrast, monitoring by AI companion did not significantly change performance either in easy trials , or in difficult trials , .

(II) Reaction Times

For RT, the results showed that participants allocated to both companion agency conditions performed slower when monitored (human: , ; AI: , ) versus not monitored (human: , ; AI: ). However, the effect was larger in those allocated to the human condition, , , than in those allocated to the AI condition, , . In the planned follow-up analysis, breaking the effect down further by difficulty level (see Figure 8(b)), for participants allocated to the AI condition there was a marginal performance decrease when monitored for easy trials (, ) but none for difficult trials (, ). For those allocated to the human condition, by contrast, monitoring significantly affected RTs at both difficulty levels, with performance significantly slower for both easy, , , and difficult trials, , , when monitored.

(2) H2.b: AE Accounting for CVP. The planned Bonferroni-corrected contrasts between the levels of monitoring for each level of companion agency, CVP, and difficulty revealed that, for participants allocated to the human companion agency condition, monitoring by the humanoid CVP decreased performance on difficult tasks relative to the nonhumanoid companion. The effect was significant for accuracy, humanoid (, ) versus nonhumanoid (, ), , , and marginal for RT, humanoid (, ) versus nonhumanoid (, ), , . There were no other significant differences between CVP levels when monitored.

4.2. AE Summary

Based on the results of breaking down each companion agency level of the interaction, monitoring decreased performance overall for both human (significantly) and AI (marginally) companions. However, when broken down by difficulty (see Figure 8), the results support H2.a and TMSI theory, suggesting that only a human-minded companion affects performance during monitoring. The canonical SFE-related facilitation was not observed; instead, we observed overall performance inhibition when monitored.

H2.b, stating that a companion's humanlike visual form amplifies the AE, was not supported overall; however, the humanoid visual form of the monitoring companion did significantly decrease participants' performance relative to the nonhumanoid monitoring companion's presence.

4.3. Companion Visual Presence (CVP)

There was an overall significant main effect of CVP in accuracy, , , , and RT, , , . These effects were explored post hoc, independently of the SFE-related factors such as task difficulty. Bonferroni-corrected pairwise contrasts tested the performance differences between the three CVP levels.

4.3.1. CVP Type Impact

(1) Accuracy. Contrasts revealed that the humanoid companion (, ) led to significantly worse performance than both the nonhumanoid companion (, ), , and the none condition (, ), . Accuracies did not differ between the nonhumanoid and none conditions, .

(2) Reaction Times. Performance was slowest in the humanoid condition (, ), differing significantly from the nonhumanoid condition (, ), . The difference between the humanoid and none conditions was only numerical and not significant (, , ). The nonhumanoid condition showed the numerically fastest RTs overall; however, it was not statistically different from the next fastest, the none condition ().

Overall, for both accuracy and RT, these results are in line with the MPE findings for presence of companion type above, suggesting that performance under a humanoid companion was significantly less accurate and slower, irrespective of task difficulty.

4.4. Monitoring

Turning to data-driven effects, there was also a significant main effect of monitoring, in both accuracy , , and RT , , . Post hoc analyses revealed that for accuracy, the not monitored condition (, ) showed overall more accurate performance than the monitored condition (, ). For RT, the monitored condition (, ) was significantly slower (worse) than the not monitored condition (, ). Overall, being monitored, irrespective of companion agency, was significantly detrimental to performance. However, when the effects were broken down by companion agency, the monitoring effect was only significant in the human-minded companion conditions.

5. Discussion

The current study investigated the social facilitation effect (SFE) within an immersive virtual environment, focusing on comparing the two hypothesised mechanisms of eliciting SFE: the audience effect (AE) and the mere presence effect (MPE). The experiment measured changes in participants' cognitive performance (on a relational reasoning paradigm, RRP) under mere copresence with a companion at three levels of companion visual presence (CVP: none, nonhumanoid, and humanoid) (MPE) and when the companion was monitoring the participant's performance (AE).

Firstly, the MPE hypothesis, H1.a, predicted that any social virtual companion presence (CVP: nonhumanoid and CVP: humanoid combined) in an immersive environment would elicit SFE, in contrast to when a companion is not visually copresent with the participant (CVP: none). This hypothesis was not supported by the data. However, H1.b, that the companion's humanoid immersive presence would be most impactful, was supported by the results. Tasks performed under the humanoid companion showed significantly worse outcomes than the other CVP conditions, with no significant differences between the nonhumanoid and no-companion (CVP: none) groups. Although the effect of humanoid versus other CVP conditions was significant for accuracy and reaction times (RT), the results showed overall performance inhibition rather than facilitation, not replicating the canonical SFE (facilitation in easy tasks and inhibition in difficult tasks) observed in real-world interaction, but in line with the majority of recent immersive SFE literature [14–16]. Based on the current virtual MPE findings, the humanoid presence was possibly socially distracting to the participants, with no evidence of positive social facilitation in easy tasks. This conclusion is supported by a significant overall main effect of CVP for accuracy and RT, with performance decreasing when a humanoid companion was present irrespective of difficulty. Future studies recording eye tracking during this task could shed more light on whether humanoid companions are more distracting than nonhumanoid companions. For example, eye tracking could be used to investigate whether participants' gaze disengages from the task towards the companion, measuring the frequency of saccades towards the companion, leading to worse task performance.
However, as noted by Guerin [39], people often restrict their overt embodied behaviours, such as body movement, during copresence, and may inhibit gaze towards the companion due to civil attentiveness. Therefore, a more multimodal approach could be taken, transcribing participants' behaviours in the context of different companions and virtual interaction conditions. Similarly, following Klowait [40] and Klowait and Erofeeva [41], the analysis could focus on more general nonverbal behaviours, such as sequential analysis of body motion and action-based decision-making. A systematic analysis of such behaviours could reveal more about the inner processes related to perceived others' presence, which might not be revealed by focusing on eye tracking alone.

For the AE, there were two hypotheses, with planned comparisons of the companion interaction. H2.a predicted that monitoring would impact only the human-operated companion groups, not the AI-operated ones. The results showed that performance was indeed significantly affected overall only during monitoring by another person (a human mind). For accuracy, the effect was detrimental to performance overall, with no facilitation, marginally so in difficult tasks; for RT, it was significantly detrimental in both easy and difficult tasks. There was also an overall significant main effect of monitoring, demonstrating that being monitored in an IVE, irrespective of companion agency, can be significantly detrimental to participants' cognitive performance, in both RT and accuracy. However, as found for H2.a, these results were driven by significant effects in the human-minded companion groups but not the AI groups. The results overall support the AE hypothesis, H2.a, that monitoring by a human companion, but not an AI, significantly impacts participants' performance. As with the MPE, there was no SFE-related facilitation.

Additionally, the analysis investigated the contribution of increasing social impact to the AE through increasing companion visual presence (CVP), from none to nonhumanoid to humanoid. H2.b predicted that companion visual presence might amplify the effect of monitoring, suggesting a positive linear relationship between performance impact and CVP as it increases from none to nonhumanoid to humanoid. The test was conducted only in the human-minded companion group, in which monitoring produced the predicted significant effect irrespective of CVP (H2.a). When testing differences between the CVP levels whilst being monitored, the humanoid companion group performed significantly less accurately and marginally slower than the nonhumanoid companion group. However, there was no linear increase in impact across CVP levels. Therefore, the results did not support the hypothesis (H2.b) that higher levels of virtual companions' social presence contribute additionally to the AE impact. Importantly, however, the presence of a monitoring humanoid companion did decrease performance beyond the nonhumanoid companion's presence. This could suggest that humanoid features of companions in IVR are important even for the AE.

In summary, we found that sharing the same immersive environment with a humanoid companion (MPE), irrespective of the companion's agency (human or AI), influenced participants' cognitive performance outcomes (SFE). We also found that the belief of being monitored (AE) by another person, but not by an AI, significantly influenced participants' performance, irrespective of whether the virtual observer was visible. Therefore, socially motivated cognitive performance change (SFE) during immersive virtual interaction can be influenced both by the visual aspects of the virtual companion in the shared space and, when social evaluation might take place, by participants' beliefs about whether the companion has a human mind. Additionally, as observed in the H2.b analyses, there might be an accumulative effect of the companion's humanlike visual attributes and the belief that monitoring is performed by a human-minded companion who is able to mentalise. Similar accumulative effects were found in our videoconference-based equivalent of the current study [35]. Future research needs to test this notion further, both in IVR and in other virtual settings.

The current immersive experiment's findings were overall inhibitory, both when testing humanoid companions' mere copresence and human-minded companions' monitoring. We did not find the canonical interaction of facilitation in easy tasks and inhibition in difficult tasks often attributed to the SFE [10, 27, 42]. Interestingly, the pattern of social inhibition without facilitation is the most commonly reported effect in current immersive SFE studies. It seems, then, that immersive companions are significantly socially influential, yet this impact does not improve participants' performance on cognitive tasks; it only impairs it.

One possible explanation for the lack of facilitation is that, as noted in the SFE meta-analysis, even in real-world face-to-face SFE research, the results for easy conditions are often harder to replicate [10]. Considering that easy-task performance was often at ceiling during the study, participants may have had too little room for improvement for significant facilitation to be detected. To investigate these findings further, future research should apply additional physiological methods alongside the current behavioural IVR paradigm, and possibly use a more taxing cognitive task.

There are several possible explanations for the lack of facilitation in our findings. Firstly, the immersive environment is considered to be more cognitively taxing overall, even in contrast to videoconference-based interaction in the real world [7]. This could mean that the higher overall cognitive load imposed by the IVR increased the overall difficulty of performing the cognitive tasks. As per the SFE, difficult task performance is inhibited rather than facilitated in a social context, in contrast to performing alone. If both the easy and difficult tasks were effectively challenging under the higher cognitive load within the IVR environment, then additional social influence could have led to an inhibitory effect due to cognitive overload. However, the overall mean accuracy on easy tasks was close to ceiling, suggesting that cognitive load alone does not fully explain the inhibitory trend.

Alternatively, it is possible that no facilitation was found in the immersive space because a sufficient driving factor for facilitation was absent. As discussed in the introduction, realistic self-representation and a sense of self-presence within the immersive environment might be two of the crucial factors leading to prosocial motivation when monitored (AE) or to vigilance when merely copresent (MPE). Participants' realistic self-presence in IVR in the current study was limited, mainly due to the hardware constraints of the head-mounted display at the time of testing. Although we attempted to elicit a higher sense of self-presence by having participants believe that their gaze and performance could be seen and that they were identifiable as they performed, this level of functional presence is likely not sufficient for accountability and therefore prosocial action. Indeed, in our virtual desktop study, where the participant was remotely video-present with companions, the SFE was mostly facilitatory [35]. Without self-present facilitation, the inhibitory presence of coimmersive humanoid companions could have been more socially distracting than the other CVP presence types. The effect, however, is not necessarily unique to second-person cognition [43], as required for the SFE.

Additionally, considering that the sense of self-presence in the environment is important, it is possible that while participants were wearing the headset, the presence of the researcher in the real-world testing cubicle had already raised participants' social arousal. Even though participants were shown that during testing the researcher sat with their back to the participant, working on their own project, participants could have been more vigilant in the real-world presence of another person, as their own view of the real-world environment was restricted by the headset. This initial real-world copresence could have added to the virtual social influence of monitoring or humanoid presence, raising arousal levels high enough to impair performance. This could explain the overall high level of initial performance, even in the alone condition, which then dropped when additional virtual social influence was introduced. If this is indeed the case, then immersive copresence and monitoring can be impactful as per the SFE; however, the additional real-world environment can accumulatively contribute to overall arousal, rendering it detrimental. The additive social effects of real-world and immersive environments are an interesting concept and should be tested. If confirmed, future interventions using IVR should also be mindful of the physical surroundings.

In the present study, the only sensory modality used to induce copresence was vision. It is important to note, however, the role that multimodal cues, including proprioception, haptic (touch), olfactory (smell), and auditory cues [23, 40], can play in invoking a sense of immersion and social presence; these were beyond the scope of the present study. Any sensory input in addition to vision, or an expectation of such, could additionally contribute to the congruency or incongruency of the immersive experience, as well as to vigilance over the occluded physical environment in which immersion occurs. Future studies should therefore focus on the multisensory experience of coimmersion. Indeed, in their review article, Martin et al. [44] found that several industries using immersive technologies already rely on other sensory modalities in their IVEs to elicit higher levels of realism and immersion. It is possible that additional social sensory inputs, such as voice or the sound of breathing, as well as the affordance of touch in the IVE, could elicit higher levels of immersive copresence, while reducing the same sensory affordances from the physical surroundings could lessen focus on the world outside the IVE. Additionally, individual differences in social processing and personality traits can also influence the SFE [42]; future work should therefore consider such differences when designing experimental controls and when utilising the SFE in practical IVE implementations.

As social media companies and technology developers shape the future of augmented social interaction, new opportunities are already emerging for testing levels of self- and companion presence, mixing virtual and physical environments. Virtual CAVEs and augmented reality can bring the virtual into the real-world environment without compromising either. Realistic real-time face scanning and digital twins can recreate and translate participants' realistic replicas into immersive and augmented spaces. All of these developments create a mixed reality in which the self-presence of the protagonist can be realistic to different degrees alongside projected or immersive companions. SFE research in augmented reality has already shown promising results by virtually projecting companions into participants' physical environments [45].

All of these new platforms and methods will undoubtedly enable cognitive and social perception testing at levels previously impossible in real-world communication, broadening the understanding of the human brain and what it means to be social. It is, however, important to understand the constraints of each emerging technology.

Although the current experiment’s findings are interesting, further questions arise from these immersive results and their generalisation to real-world social scenarios. Immersive environments are, no doubt, robust platforms for testing the cognitive impacts of social interaction with others. Paraphrasing Blascovich et al. [5], immersive virtual reality is a unique tool that helps us pick apart and reverse engineer the most complex behaviours in a controlled, systematic, yet largely ecologically valid way.

It is important to note, however, that, as with every technological tool, there are limitations to consider, and immersive virtual interaction is currently far from replicating real-world communication with high validity. Establishing congruency and balance between all the sensory modalities in a realistic way is still a work in progress. Regardless, immersive reality seems to be the right tool to explore some of the emerging trends in virtual and mixed social interaction. It is important to continue immersive research, especially as interest in social immersive experiences increases and these technologies improve rapidly, reaching levels that were considered science fiction just a few decades ago.

Appendix

Companion Animation Specifications

All the companion objects in Figure 9 (present humanoid companion (a), present nonhumanoid camera (b), and visually absent (none) nonsocial fan (c)) were animated in the Unity game engine, replicating the movement of each object identically in both time and motion. To do so, the x-, y-, and z-axis rotation parameters of the humanoid companion’s neck and head motion were replicated in the corresponding axis parameters of the camera on the tripod and of the fan on its stand. Through this approach, the companions’ dynamic motion towards the participants and their performance screen, as well as the motion for not engaging with participants and their performance, were identical across the different virtual objects. When the monitoring condition was “not monitoring,” all objects performed identical motion, lowering their main component (humanoid head, nonhumanoid camera on tripod, and nonsocial fan head on its stand) by 45 degrees to face the floor. In the case of the humanoid companion, the 45-degree angle showed them looking at their virtual laptop, disengaging from the participant. All dynamic behaviours of the companion objects were piloted with colleagues to ensure that no object’s movement appeared out of place or unrealistic.
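
The axis-replication logic described above can be sketched as follows. This is a hypothetical illustration only: the study’s animation was authored in the Unity game engine, and all names here are illustrative assumptions, not the study’s actual code.

```python
def replicate_rotation(head_rotation):
    """Copy the humanoid head's x-, y-, and z-axis rotation parameters
    verbatim onto the other companion objects, so that all three move
    identically in both time and motion."""
    camera_rotation = dict(head_rotation)  # nonhumanoid camera on tripod
    fan_rotation = dict(head_rotation)     # nonsocial fan head on stand
    return camera_rotation, fan_rotation

# Example: the "not monitoring" pose, main component lowered 45 degrees
# towards the floor (sign convention is an assumption).
head_pose = {"x": -45.0, "y": 0.0, "z": 0.0}
camera_pose, fan_pose = replicate_rotation(head_pose)
```

Because each object receives an identical copy of the same rotation parameters, any difference in perceived behaviour between conditions can be attributed to the object’s appearance rather than its motion.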

The animation of the companions’ behaviours (including the nonsocial companion, the fan) was set to loop after each monitoring trial. Each looping animation lasted 80 seconds, the maximum time a monitoring/not monitoring condition would last, based on the 20 trials of 4 seconds each assigned by the experimental design. There were two animated looping behaviours: the watching-down behaviour in not monitoring blocks and the observing behaviour in monitoring blocks. The transitions between blocks were animated as follows.

When a monitoring block ended and a not monitoring block began, the companion object would turn to the participant and then turn downwards, while the onscreen instruction announced the next block condition. When a not monitoring block ended and a monitoring block began, the virtual companion lifted its main component (head, camera, or fan head) to an 85-degree angle, turning towards the participant. The animation was programmed to be smooth so as not to startle the participant. Overall, the behaviours were animated to be smooth, as natural as possible, and not purposefully distracting. Considering that participants had to focus on the task rather than the object, the repetitive looping of the animation seemed sufficient for animating social presence. The recurrence of the looping behaviours was not picked up during piloting with peer researchers. When asked whether they noticed any repetition, very few participants said they did. The only participant who reported some repetitive motion also suggested they did not pay attention to the task but rather watched the companion; they were removed from the analysis.
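
The block timing and transition angles described above can be summarised in a short sketch. This is a minimal illustration, assuming simple linear interpolation for the smooth transitions; the constant names and the sign convention for pitch are assumptions, not taken from the study’s Unity project.

```python
TRIALS_PER_BLOCK = 20   # trials per monitoring/not monitoring block
TRIAL_DURATION_S = 4    # seconds per trial
LOOP_DURATION_S = TRIALS_PER_BLOCK * TRIAL_DURATION_S  # 80 s animation loop

# Target pitch of each object's main component (head, camera, or fan head):
# positive = lifted towards the participant, negative = facing the floor.
TARGET_PITCH = {"monitoring": 85.0, "not_monitoring": -45.0}

def lerp(start, end, t):
    """Linear interpolation (t in [0, 1]), one simple way to keep block
    transitions smooth rather than abrupt, so as not to startle the
    participant."""
    return start + (end - start) * t
```

For example, `lerp(TARGET_PITCH["not_monitoring"], TARGET_PITCH["monitoring"], t)` sweeps the component smoothly from the 45-degree downward pose to the 85-degree observing pose as `t` runs from 0 to 1.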

Data Availability

The means and standard errors of accuracy (%) and reaction times (ms) for all levels of the independent factors presented in the current study are available in Tables 2 and 3.

Disclosure

The experiment presented in the current manuscript was part of a series of experiments published in a doctoral dissertation thesis [45].

Conflicts of Interest

The authors of the current manuscript declare that the current research was conducted in the absence of financial influences that might constitute a perceived or potential conflict of interest.

Acknowledgments

We would like to thank Irene Valori, our cognitive psychology exchange student (2019) at the Centre of Brain and Cognitive Development (CBCD), for her contribution to helping with testing the participants in the current study. The writing of the current manuscript is funded by the Institutional Strategic Support Fund (ISSF) postdoctoral fellowship, granted by Birkbeck University of London and the Wellcome Trust. The experimental work was designed, conducted, and analysed as part of a PhD thesis, supported by UCL, Bloomsbury, and East London Doctoral Training Partnership (UBEL-DTP) studentship funded by the UK Research and Innovation (UKRI) funding body (194207).