Abstract

With advances in vehicle automation, people are expected to have more opportunities to spend free time inside the vehicle in the near future. This will enable people to turn their attention to desirable activities other than driving and to have varied in-vehicle interactions through multimodal ways of conveying and receiving information. Previous studies on in-vehicle multimodal interactions have primarily asked users to evaluate the impact of particular multimodal integrations, an approach that does not provide an overall understanding of user expectations of the multimodal experience in autonomous vehicles. This research was designed to fill that gap by posing the key question “What are the critical aspects that differentiate and characterise in-vehicle multimodal experiences?” To answer this question, five design fiction workshop sessions were conducted with a total of 17 people to understand users’ expectations of the multimodal experience in autonomous vehicles. Twenty-two subthemes of users’ expected tasks of multimodal experience were extracted through thematic analysis. The research found that two dimensions, attention and duration, are critical aspects that impact in-vehicle multimodal interactions. With this knowledge, a conceptual model of users’ in-vehicle multimodal experience was proposed as a two-dimensional spectrum that yields four different layers: sustained, distinct, concurrent, and coherent. The proposed conceptual model could help designers understand and approach users’ expectations more clearly, allowing them to make more informed decisions from the initial stages of the design process.

1. Introduction

As vehicle autonomy increases, people have more free time inside the vehicle [1, 2]. In a fully autonomous vehicle (AV), occupants are not expected to need to keep their eyes on the road at all times [3]. Occupants can therefore turn their full attention to other activities, meaning information no longer needs to be displayed on a single device or in one fixed spot. This gives vehicle occupants the opportunity to experience new forms of in-vehicle spatial orientation and to have varied interactions. It also opens up new possibilities for designers and researchers to enhance the user experience of forthcoming in-vehicle interactions.

Designing for in-vehicle interaction is multifaceted, as occupants divide their focus between the driving task and interactions with built-in infotainment systems, brought-in devices, external services, and AI assistants, as well as social interactions [4]. In-vehicle interactions have evolved with advanced technology that adapts to users’ changing contexts, psychological states, preferences, and habits. These interactions can be enabled through different inputs and outputs, including visual, audio, voice, and haptic channels, via knobs, buttons, touch screens, and more [5]. Providing occupants with optimal modes requires closer attention to what they expect from multimodal interaction within AVs and to which characteristics of the multimodal experience should be considered when designing overall interactions. This study, therefore, is aimed at investigating users’ expectations of the in-vehicle multimodal experience in future AVs.

2. Background and Relevant Work

2.1. In-Vehicle Multimodality

As a notion, “multimodality” is linked with “mode,” which means “the particular way of accomplishing something” [6], and “modality,” which means “those aspects of a thing which relate to its mode, or manner or state of being, as distinct from its substance or identity” [7]. Multimodality means having multiple modalities. In a system, multimodality is an exchange between device and human where multiple inputs or outputs may be used simultaneously or sequentially depending upon context and preference [8]. Such systems process two or more combined user input modes (such as speech, pen, touch, manual gestures, gaze, and head and body movements) in a coordinated manner with multimedia system output [9]. In the context of a vehicle, it may cover various interactions such as haptic, kinetic, speech, gesture, touch, virtual touch, gaze, eye gaze switch, and facial input, plus devices such as touch screen, the pedals, buttons, face cam, microphone, and multitouch displays [1, 10–12].

Recent research shows multimodal interactions improve the in-vehicle user experience [13]. Indeed, it has been found that multimodality inside a vehicle can play a crucial role in enhancing the user experience:
(i) As an effective means of interaction and conveying information [14, 15] by giving users ways to interact with the system through the modality that is most appropriate for the present context of use [12]
(ii) As an enabler for emotional design and particularly positive emotions through in-vehicle systems [16]
(iii) As a means to increase learnability, as well as accommodate a broader range of users, tasks, and contexts [17]
(iv) As a means to offer better flexibility and reliability and to offer interaction alternatives to better meet the needs of diverse users with a range of usage patterns and preferences [18]
(v) As a way to achieve a seamless interaction closer to human-human communication [19, 20]
(vi) As a way to increase the trust between the machine and the passengers [21]

With a growing number of multimodal technologies diversifying the role of vehicles, understanding what people desire from a multimodal self-driving experience has become crucial to meeting people’s needs.

2.2. Understanding Users’ Expectations of In-Vehicle Multimodal Experience

Vehicle users’ expectations inform the early design process by helping designers rethink the occupant-vehicle relationship, identify the design requirements, and design future artefacts [22–24].

Previous studies have emphasised the expected activities, use cases, and scenarios in future AVs [25–29]. Pettersson [30] proposed a model of users’ expectations of AVs. Lee et al. [23] identified a comprehensive design taxonomy and design requirements for full AVs. Cha [25] developed AV scenarios by focusing on human emotions. In these studies, user-centred values were elicited by investigating what people would want and expect from future vehicles and breaking down the user experience and expectations.

Moreover, users’ expectations have also been investigated, focusing on in-vehicle multimodal interactions. Politis et al. [31] evaluated multimodal displays with users under varying degrees of situational urgency. Väänänen-Vainio-Mattila et al. [12] explored the user experience and the expectations of haptic feedback in a vehicle. Multiple studies have investigated the usefulness of the combined feedback modalities such as visualisation, light, audio, text, haptic, or vibration for conveying information to the driver that supports a feeling of trust and safety [32, 33]. These studies mainly focused on evaluating or testing a particular multimodal prototype and its real-time impact on users [13, 15, 34, 35].

Additionally, there have been several attempts to map, model, or create a framework for multimodal interaction. For example, Nigay and Coutaz [36] classified multimodal interfaces in a table (combined or independent, sequential or parallel). Obrenovic and Starčević [37] modelled the multimodal human-computer interaction. Lalanne et al. [38] proposed a model for multimodality integration. Serrano et al. [39] developed a framework as a tool for creating multimodal input interfaces. Similarly, some studies offer a methodological perspective. Di Mitri et al. [40] provided a conceptual model for multimodal learning analytics with its objectives. Chang and Bourguet [41] proposed a usability framework for designing and evaluating multimodal interactions. Likewise, Zaidi and Kirazci [42] suggested a single-axis multimodality spectrum for Google Assistant, and Platz [8] mapped the spectrum of multimodal interaction. Most studies focus on a technical perspective that categorises multimodal devices or services.

Given that multimodal interactions depend on contextual preferences [8, 43, 44] and entail the ability to adapt to different environments and users [45], understanding user contexts is critical to better shape multimodal experiences. A “user context” refers to a broad range of contextual elements in a user’s experience [46, 47], the context of use or activities [26–28], or situations [48]. The most commonly mentioned elements of the in-vehicle context fall under actors (occupants), systems (vehicles), and environment; they include the driver, vehicle, environment, performance, system, passengers, applications, inner worlds of people, ambience, surroundings, the position of people and objects, and time of day. Therefore, a working definition of user context for this study is “the particular moments caused by the vehicle, environment or occupants that would trigger occupants to request intuitive interactions in a future AV.”

With this understanding of user contexts, the study is aimed at investigating users’ overall expectations of the multimodal experience in an AV. Next, a conceptual model that categorises the user’s in-vehicle multimodal experience, with two dimensions and four quadrants, is proposed and discussed thoroughly. The paper goes beyond evaluating specific modalities and instead suggests a model that provides a holistic perspective on user interactions. This broader focus contributes to a more comprehensive understanding of user expectations concerning the multimodal experience in upcoming AVs. The model is expected to help designers make more informed decisions when designing future in-vehicle multimodal experiences by considering user insights, and it can be used as a basis for design guidelines for multimodal interactions.

3. Research Design

A qualitative approach was adopted to extract users’ expectations about the in-vehicle multimodal experience. It was the most appropriate approach to gain detailed knowledge about participants’ future desires and expectations [49, 50]. The research consists of two linked studies, study 1 and study 2. Study 1, the preliminary study [22], was aimed at exploring in-vehicle user contexts in which users might desire effortless interactions in an AV. Study 2, the focus of this paper, was designed based on the study 1 findings.

Study 1: exploring in-vehicle user contexts for effortless interactions in an AV

An open-ended online survey with 150 people was conducted, and the qualitative responses were thematically analysed, which resulted in a taxonomy of in-vehicle user contexts [22]. This taxonomy, shown in Table 1, comprises six contexts, their related subcontexts, and the frequency with which each appeared across all responses. It created a baseline and provided guidance for study 2.

Study 2: understanding users’ expectations of the in-vehicle multimodal experience

Study 2 was designed to understand what users expect of their future vehicle multimodal experience in each of the six in-vehicle user contexts (Table 1). Five separate design fiction workshops with a total of 17 participants were conducted.

3.1. Design Fiction Workshop

The design fiction sessions support effective engagement and creative expression [51, 52] and promote critical engagement by encouraging people to question current beliefs. Creating an environment that allows participants to engage, collaborate, and play gives them a more dynamic role than being an informant [30]. Hence, these collaborative workshops seemed to be a helpful approach in gathering people to interact with each other to explore their expectations about a particular experience.

However, involving potential users in idea generation may also have disadvantages. While these approaches could enable us to reveal opportunities for users [53], users may not know what they want, and it is hard for the general public to engage with future artefacts and events [54, 55]. Since creative, playful, and artistic ideation methods and approaches can offer a solution for future imagining and engagement [56, 57], this study implemented future-oriented methods to mitigate these user engagement and future imagining issues. The workshops included the following:
(i) Brainstorming and group discussion, a well-known technique for collaborative idea generation, were implemented in the workshops. The workshops included individual and group brainstorming sessions, which helped achieve both creative quality and quantity [58]
(ii) Science-fiction (sci-fi) scenario writing, as part of a design fiction practice, was implemented as a significant part of the workshop. Fiction as an artistic form of expression helped participants to imagine the future, envision and share their thoughts on future technologies, and creatively express themselves [56, 59]
(iii) Roleplaying, as part of the drama technique [60], was also implemented in the workshops as a future envisioning, engaging, and prompting method, which helped extract the future expectations of the participants

3.1.1. Design Fiction Workshop: Participant Selection

Recruiting participants was a crucial part of the study because the workshop activity required creative skills for getting into the role, engaging with the future, and writing. Conducting such a creative workshop activity with the public can be challenging due to a lack of these skills among participants. Thus, a purposeful sampling strategy [61] was used to overcome the potential challenges. Considering that speculating with fiction writers has been identified as an effective option [56], and considering practicalities, the following inclusion criteria were set for recruitment: (1) being a (sci-fi) fiction writer, whether a professional or a young or amateur writer, (2) being willing to participate and engage with other participants in a session, (3) being a vehicle user, (4) being aged over 18, and (5) being able to access online tools. Screening questions including demographic information were used to ensure a gender balance and to recruit participants who met the inclusion criteria. Online channels such as social media platforms, research sites, and online discussion forums were used to distribute the research adverts. Research posters were also posted in a variety of “research,” “writing,” “fiction,” and “sci-fi” relevant social media communities such as research-related Facebook groups and subreddits.

Table 2 presents participants’ information about the workshop sessions. Seventeen fiction and science-fiction writers (8 F, 9 M) in total participated in the workshops. The participants’ ages ranged between 18 and 65. The average age was 27. Eleven were amateur or hobbyist writers, including people who published magazines, were involved in creative writing workshops or writer clubs, wrote sci-fi stories or poetry, or studied literature with a strong interest in the sci-fi genre. Six were professional writers, including content writers, academic writers, and professional young adult fiction writers, who were already writing a sci-fi book or writing fiction for TV shows.

3.1.2. Design Fiction Workshop: Materials

As multimodal experiences should be intuitive, knowing when intuitive interaction may be critical can help to design multimodal interaction. Six contexts from the preliminary study findings [22] were employed to investigate users’ expectations of the in-vehicle multimodal experience. It was intended to frame the in-vehicle interactions in a specific context to reveal rich and in-depth context-specific thoughts about their multimodality expectations.

Each context was introduced to participants with the following details: (1) the key definition, (2) examples, and (3) quotes. An example of how the context was presented to participants is shown in Figure 1.

3.1.3. Design Fiction Workshop: Procedure and Questions

Since this research is aimed at extracting perspectives from different contexts, the participants were assigned a context they would explore throughout the workshop. The in-car contexts were allocated randomly to the participants. The study was conducted through an online video conference platform with an online collaboration whiteboard application. Everyone’s video camera was on, and participants could see and interact with each other as they went through the workshop activity. Each session lasted approximately 120 minutes, and five different workshops were conducted. Nine questions in total were used to extract participants’ expectations of the in-vehicle multimodal experience.

In each session, the participants were first welcomed and informed of the goal and procedure of the workshops. Then, the whiteboard link was shared in the chat section, so participants could meet there. The workshop material (context) was preuploaded, and each task for individual participants was prewritten on the whiteboard application, which was separated by columns for each participant. When participants met on the whiteboard, everyone’s name was located at the top of their column, and then, the context with explanation, examples, and quotes (see Figure 1) was shown just under it, which signified the context they were assigned.

Before the session started, participants were given three minutes to familiarise themselves with the platform and work out how to navigate it. Throughout the workshop, participants’ contexts and the future setting were emphasised to prevent them from drifting out of the context without realising it. They were provided with the necessary time and digital space to work freely while conducting the activities. Breakout rooms were used in group discussions to encourage participants to speak up and engage within smaller groups. The following steps and questions were employed in each workshop session:

(i) Step 1: familiarisation and individual brainstorming

Each participant was assigned to a different future in-vehicle context with which to familiarise themselves. The initial session was intentionally an individual exploration because personal time was required for participants to get familiarised. They were encouraged to engage themselves with the contexts. A question was used to help participants to relate to and interpret the contexts both personally and in a broader sense: “What could this multimodal in-vehicle context mean in a broader sense? (e.g., inner, external, social, communal, political, environmental, economic, spiritual).”

(ii) Step 2: group discussion and brainstorming

After the participants had familiarised themselves with the contexts, they were encouraged to explore and extract ideas about future vehicles’ multimodal usage experiences. First, they were asked to imagine a future scene in which they had a multimodal AV based on their assigned context. Then, they were asked to illustrate the details of the capabilities and tasks they expect from the vehicle in the context, which included their preferred ways of interacting with the multimodality effortlessly (e.g., showing intention and receiving actions). Interactive group cohesion was used in step 2.

(iii) Step 3: sci-fi scenario writing

Participants were prompted to imagine a future and write a fiction scenario within a future multimodal vehicle by considering “before”–“during”–“after” the journey. During their scenario writing, the following questions were used to prompt elaboration of their conceptions and expectations about the in-vehicle multimodal experience: “In which ways does your multimodal vehicle penetrate your (or others around you) life and experience? Why and How?” or “How do you (or others around you) interact, collaborate and help each other with your vehicle for the optimal enhancement of your experience?”

(iv) Step 4: roleplaying

Participants were asked to empathise with the future character in their fiction writing and to roleplay their scenario. Also, they were asked to interact with other participants, exchange ideas, give recommendations to each other, explore the possibilities when two contexts are merged, and discuss the remarkable parts of their interaction and the reasons behind the interactions.

The research activities were performed in accordance with the code of practice of the university. The ethics were approved before any research activity. All of the participants were provided with a participant information form and then agreed to an ethics consent form before the study.

3.1.4. Design Fiction Workshop: Data Analysis

To analyse the results, it was necessary to inquire beyond what was immediately visible and instead search for a semantically higher meaning consistent with understanding the deeper meaning and aspects that impact users’ experiences in each context. The aspects and dimensions that greatly impact what users expect from a multimodal experience were therefore investigated in the analysis.

The general outline of the analysis was drawn from thematic analysis [62]. It was conducted following the step-by-step thematic analysis guidelines. An online collaborative workspace application was used to organise the codes, which was helpful in organising and working with the large amount of data while increasing the flexibility of the process. The following steps were taken:
(i) The audio data was transcribed into a written form and combined with the data from post-its and fiction scenarios that participants generated (see Table 3)
(ii) Transcriptions were read and cross-checked by the researchers (authors) to familiarise them with the data and ensure they were represented well
(iii) The data was thematically coded according to the research aim of understanding users’ expectations through their views, the tasks and activities they desire to perform, their preferred input-output interactions with the car, and the narrations they created within a fictional multimodal car
(iv) Patterns of expectations in the forms of tasks started to emerge for each context. The subthemes were extracted based on the repeating patterns of tasks. They were all colour coded separately
(v) This part of the study was conducted separately for each context, which helped us to reveal more details
(vi) Finally, the data was analysed as a whole to determine whether there were recurring aspects

Three reviewers (authors) completed an inter-rater reliability check independently to ensure the reliability of the results. In total, 11 themes were revealed from the analysis. Each reviewer checked two different measures of intercoder reliability: the per cent agreement and Fleiss’s kappa [63]. The checks suggested that two of the 11 themes could be considered highly reliable. The reliable themes—attention and duration—had a reliability rate of 100% (excellent agreement), while the other nine themes—proactivity, inclusivity, control, individuality, information density, physicality, proximity, space usability, and temporality—achieved the reliability rate of 85.185%, 82.716%, 90.123%, 87.654%, 75.309%, 55.556%, 80.247%, 85.185%, and 70.37%, respectively. Only the themes that achieved 100% were selected.
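The two measures named above can be illustrated with a minimal sketch. It is not the authors' analysis script: the function names, the `example` ratings, and the convention of returning 1.0 when chance agreement equals 1 are all assumptions made for illustration. It assumes every item is rated by the same number of raters and that per cent agreement is computed pairwise. The reported percentages are all multiples of 1/81, which would be consistent with 27 rated items and three rater pairs, but that reading is an inference rather than something the text states.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(ratings):
    """Share of rater pairs that agree, pooled over all items."""
    agree = 0
    total = 0
    for item in ratings:
        for a, b in combinations(item, 2):
            agree += int(a == b)
            total += 1
    return agree / total


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that are each rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][c] = number of raters who assigned category c to item i
    counts = [Counter(item) for item in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)
    if p_e == 1.0:
        # Every rating is identical, so kappa is formally undefined (0/0).
        # Returning 1.0 is a convention chosen for this sketch only.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical ratings: four items, three raters, labels invented for illustration.
example = [
    ("low", "low", "low"),
    ("high", "high", "high"),
    ("low", "low", "high"),
    ("high", "high", "high"),
]

if __name__ == "__main__":
    print(f"per cent agreement: {percent_agreement(example):.3%}")
    print(f"Fleiss' kappa: {fleiss_kappa(example):.3f}")
```

Per cent agreement is easy to read but does not correct for agreement expected by chance, which is why it is typically reported alongside a chance-corrected statistic such as Fleiss' kappa.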

4. Research Findings

As a result of the thematic analysis, 22 subthemes were derived with 27 subconditions as the categories of users’ expected tasks of multimodal experience (see Table 4). Each task (from T1 to T22) indicates the details of preferred ways to interact effortlessly with the multimodality in an AV.

At a higher level of categorisation, the data analysis identified two dimensions, “attention” and “duration,” as the most prominent aspects that crucially impact participants’ expectations of in-vehicle multimodality (see Table 5). Each task could be categorised along these dimensions. Table 5 shows the relationship between each dimension and the tasks. This section explains the research results in detail by showing how the two dimensions are drawn from users’ insights. In the next section, these dimensions, their interaction with each other, and their implications are discussed comprehensively.

4.1. Dimension 1: Attention

From the participants’ narratives, “attention” appeared as one of the most important dimensions of their expectations of the multimodal experience. This could be interpreted as users expecting that the level of their attention will be a significant determinant in how they experience future multimodal implementations inside an AV. Here, “attention” refers to the attention between the occupant and the potential multimodal experience involved in the vehicle.

In previous studies, users’ attention and cognitive workload have been regarded as factors to be considered in multimodal interactions [20, 64, 65]. Particularly, context-aware interactions, such as interactions with a visually attentive interface, rely on a person’s attention as the primary input [66]. Obrenovic and Starčević [37] also mentioned “attention: focus and context” under the category of cognitive effects when defining the effects of multimodal interaction. Moreover, interaction types are often categorised based on the level of attention [67].

When users mentioned their desires for multimodal interactions in a vehicle, they typically described, directly or indirectly, the level of attention between the occupants and the experience as something that affects that experience. A dimensional spectrum from low to high attention appeared through the different expectations. The suggested spectrum for attention is detailed in Table 6.

4.1.1. Low Attention

Some expectations were naturally related to easy and quick desires linked with low and distracted attention, categorised as “low attention.” For example, participants expressed their expectations about being able to request information easily and quickly (T3); to skip and switch the media (T2); and to start, pause, and stop actions (T1). These subthemes of tasks were observable through participants’ comments:

I might just go by the name of station 1 most you know, to say like, “BBC One extra” or “Kiss.” (T3)

If I don’t like the music, I can just wave it with my hands. And we can go to the next genre. (T2)

I snap my fingers to get the music to turn off. (T1)

Similarly, participants mentioned their expectation of avoiding intrusiveness when getting feedback from the vehicle (T22)—which is about creating a space where they will not be distracted. Hence, the interaction when the vehicle sends feedback to the occupant should require low attention, for example,

If it just popped up on the screen - So, I might kind of ignore it. (T22)

4.1.2. High Attention

Conversely, some expectations were linked to high attention in an experience. These were more focused, concentration-requiring tasks categorised under “high attention.” For example, participants expressed their expectation of attentive interactions when focusing on a task, showing urgency to the vehicle and getting information in hazardous situations. These subthemes of tasks were observable through participants’ comments, such as the following:

Getting information in a severely hazardous situation—T14b

If the car needs to interrupt me for a safety reason, even if it seems minor, I’m interested in what it has to tell me.

I would like to have the options for it to grab my attention to the fullest as multi-channel.

Showing the urgency—T16b

If I’m going to have an intelligent car, I want it sort of synced and linked with all my devices... It can take my time and read it if it understands the urgency of the situation.

Focusing on a task—T19

I prefer to give attention to one thing.

I tell her that “I want to focus on rehearsing a presentation.” I want it to automatically do things like dim my windows, turn the music down or off.

4.2. Dimension 2: Duration

Another dimension of the user’s expectation of a multimodal experience is the “duration.” This refers to the time that passes while the occupant experiences the multimodal interaction and its effects. It appeared in the data as differences between expectations of tasks that take a short time and a long time.

In previous research, “duration” has been considered an impactful factor in users’ experiences. With respect to time, user experiences have been categorised as (1) temporal, meaning dynamic, moment-to-moment experiences, and (2) long term, meaning experiences that remain in users’ memory and influence their overall evaluation [68]. There are studies on the impact of time and duration on users’ perceived experience [69] and on developing positive long-term user experiences with users [68, 70]. These are equally relevant to in-vehicle experiences, as the level of focus and the time users take can play a crucial role in shaping users’ in-vehicle multimodal expectations.

There was a noticeable difference in the length of the time of the envisioned expectations. A short- to long-dimensional spectrum was constructed by considering the differences mentioned in the participants’ narratives. The suggested spectrum for the duration is detailed in Table 7.

4.2.1. Short Duration

Some expectations referred to quick, intermittent, and short-time-required tasks and activities. These activities were categorised under short-term activities. For example, participants mentioned their expectations about receiving unrequested recommendations (T12b), getting information about mildly hazardous situations (T14a), arranging the interior physical space (T5), reporting an error (T15), and starting, pausing, and stopping actions (T1)—all of which were characterised as short-term experiences. These subthemes of tasks were observable through participants’ comments.

If it was kind of unrequested (recommendation), I would prefer it to just be a pop-up on the screen as an option to view. (T12b)

If it’s snowing as well, I quickly think I’d want the car to be able to tell me on the dashboard whether or not I’m sliding or if the temperature is quite cold. I think visual icons would be perfect for that. (T14a)

If it’s not understanding what I’m trying to say. And it needs a very quick activation. (T15)

It has to be something short and sweet; I’d say something like “Alex, put radio one” on whilst I’m driving” and “I like to set up like straight when I when I start driving or if I suddenly remember I want it to be quick and easy.” (T1)

4.2.2. Long Duration

On the other hand, some of the user’s expectations were more constant, long-term desires. These activities required an ongoing or long-term experience to be satisfied, for example, getting reassurance from the vehicle, synchronising, creating the aura bubble in the vehicle, or getting control and confidence from the vehicle. These subthemes of tasks were observable through participants’ comments.

Getting reassurance—T18

If I see something visually, I would be more prone to listen or to do what I’m saying. Whereas if I listen, I’m just going to ignore it.

I want the car to be giving me updates on the blockage. And I would expect this in the form of a new kind of (display)segment. It would just give that kind of constant visual update.

Synchronising—T4

I don’t want to connect my Bluetooth. I just want it to be connected already.

Creating the aura bubble—T10

It can sense wherever you’re annoyed or angry, or wherever you’re in a good mood or whatever. So it automatically does that for you.

The car should understand that something is wrong with me.

Gaining control and confidence—T9

So you maybe don’t want the car to talk back to you. If it’s just a message on the screen, it’s gonna feel less invasive, and you can choose to not even read it. Whereas if it’s talking to you, will be like, “Okay, I get it, just shut up now.”

It will be more control and wanting less feedback from the car. So I can just be stressed.

This means that the car will be constantly aware of this and will not interrupt and will give the user more control and confidence.

The two dimensions observed can impact how the multimodal interaction is designed and how input and output modalities are selected, designed, located, combined, and intended.

5. Discussion

5.1. A Map of Expectations of In-Vehicle Multimodal Experiences

The two dimensions, each covering a spectrum (attention from low to high and duration from short to long), can be used to cluster the different tasks by placing “duration” on the x-axis and “attention” on the y-axis. One way of showing the relationship between the themes, subthemes, and tasks is summarised in Figure 2. Although the exact location of each task cannot be determined from the data we collected, the quadrant to which each task belongs can be inferred. For example, T1 (quick starting/pausing/stopping actions) and T2 (skipping and switching) are likely to require both a very short duration and low attention; that is, these interactions do not require much attention and last only for a short time. Similarly, T18 and T4 are more likely to be in effect over a longer term and to require more attention because the former is about getting reassurance from the car, and the latter is about synchronising and constantly staying connected with the car.

This two-dimensional spectrum allows for the creation of four different quadrants, on which users’ expectations of in-car multimodality can be placed, as shown in Figure 3. The spectrum enables designers and researchers to map the user expectations of multimodal experiences before implementing them into a vehicle design space. It is important to emphasise that this mapping is not intended to present any superior or inferior quadrants; each may have advantages or disadvantages.
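
As a minimal sketch of how this mapping could be operationalised, the Python snippet below assigns an expectation to one of the four quadrants from two scores. The numeric 0 to 1 scales, the 0.5 midpoint, the names `Expectation` and `quadrant_of`, and the example scores are all illustrative assumptions rather than values derived from the workshop data; as noted above, the exact location of each task could not be determined. The quadrant labels follow those described in Sections 5.1.1 to 5.1.4.

```python
from dataclasses import dataclass
from enum import Enum


class Quadrant(Enum):
    SUSTAINED = "sustained"    # Quadrant I: long duration, high attention
    DISTINCT = "distinct"      # Quadrant II: short duration, high attention
    CONCURRENT = "concurrent"  # Quadrant III: short duration, low attention
    COHERENT = "coherent"      # Quadrant IV: long duration, low attention


@dataclass(frozen=True)
class Expectation:
    label: str
    duration: float   # hypothetical score: 0.0 = very short, 1.0 = very long
    attention: float  # hypothetical score: 0.0 = very low, 1.0 = very high


def quadrant_of(e: Expectation, midpoint: float = 0.5) -> Quadrant:
    """Place an expectation on the map (duration on x, attention on y)."""
    long_term = e.duration >= midpoint
    high_attention = e.attention >= midpoint
    if high_attention:
        return Quadrant.SUSTAINED if long_term else Quadrant.DISTINCT
    return Quadrant.COHERENT if long_term else Quadrant.CONCURRENT


if __name__ == "__main__":
    # Scores below are made up purely for illustration.
    examples = [
        Expectation("T1 quick starting/pausing/stopping", 0.1, 0.1),
        Expectation("T18 getting reassurance", 0.9, 0.8),
    ]
    for e in examples:
        print(e.label, "->", quadrant_of(e).value)
```

With these made-up scores, T1 falls in the concurrent quadrant and T18 in the sustained quadrant, in line with the reading given earlier in this section. Continuous scores are used instead of binary flags so that tasks could later be positioned within a quadrant if quantitative data become available.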

5.1.1. Quadrant I: Sustained

The first quadrant represents continuous expectations that have a long-term impact on user experience and require high attention and presence. The interactions in this quadrant are influenced by a sustainable, focused, and mindful relationship between the user and the vehicle. Some exemplary sustained interactions could be related to a vehicle’s self-learning capabilities. The vehicle constantly learns and adapts to the user, creates and maintains a mindful relationship with the user and their circle, and becomes more familiar with them through a growing experience.

This quadrant may be associated with the experiences that offer constant connection and connectivity in the vehicle, attentive engagement with the vehicle’s connected system, and building and maintaining a relationship with the vehicle. Relevant studies include those that emphasise sustaining a long-term and trustworthy relationship between the human and the computer [71–73], ensuring personalisation and customisation [74], and creating active engagement and immersive experiences [75, 76] with the user.

This quadrant highlights the importance of “establishing and maintaining a continuous, familiar and engaging relationship” as a crucial part of users’ expectations of the in-vehicle multimodal experience. Considerations to take into account to address this include implementing robust software and technical capabilities with efficient machine learning, safety, personalisation, and recognition to get to know the user; creating familiarity and long-term interactive relationships; and implementing VR, AR, and XR applications to devise engaging content and an immersive user interface. Designers may employ these solutions to integrate multimodal interactions into future AV that establish and maintain long-term, trustworthy and engaging, immersive relationships with their users.

5.1.2. Quadrant II: Distinct

The second quadrant represents expectations that are short term or intermittent but in which the experience is concentrated. It includes focused and short-term experiences inside the vehicle, such as users enjoying focused, productive time when gaming inside the vehicle or when the vehicle offers an exclusive multimodal space with a range of focused modes. These expectations relate to the possibility of transitioning from one focused event to another, so the vehicle may also require the ability to adapt to changes.

Since the user in this quadrant is engaged with the vehicle in a focused but intermittent and short-term way, this quadrant is associated with multimodal concepts such as temporality, transitioning, adaptability, productivity, and time management. It links with experiences that are focused, attentive, and reserved in their world but short term, which will require switching from one focused mode to another. Studies on ensuring adaptability [77], creating focused activities [28], and time management [29] within a vehicle correlate with this quadrant.

This quadrant highlights “creating short-term but engaged experiences and enabling efficient ways of transitioning between them” as a crucial part of users’ expectations of the in-vehicle multimodal experience. It may necessitate tools and techniques that adapt in a short time. This quadrant implies the need for multimodality in “adapting experiences,” raising the following question: how can designers apply multimodal interactions in a future AV to create efficient, engaging time and adaptable transitions between focused modes?

5.1.3. Quadrant III: Concurrent

The third quadrant embodies the expectations that involve a short-term duration and low attention. These are momentary, distracted experiences that do not necessarily require particularly focused attention. Such experiences could involve switching between different platforms, changing the music, getting quick movie recommendations, and urgently informing the vehicle. For example, when the user switches from listening to music from Spotify to listening from YouTube while reading an e-book, the vehicle may offer a range of multimodal synchronisation options.

Due to its short-term and low-attention nature, this quadrant links with multimodal concepts such as momentariness, transitivity, seamlessness, and intuitiveness. Since the experiences are quick and do not have to include particularly focused attention, this quadrant highlights “creating rapid, practical, and absorbed experiences” as a crucial part of users’ expectations of the in-vehicle multimodal experience. Relevant studies such as Detjen et al. [78] and D’Eusanio et al. [79] have explored intuitive interactions that may link with concurrent activities. Implementing technical capabilities that enable rapid, smooth, and intuitive transitions from one modality to another, one device to another, and one activity to another could be considered. The implication of this quadrant could invite designers and researchers to address how multimodal interactions can be integrated into a future AV as methods to establish and maintain concurrent transitioning that would allow intuitive experiences for users.

5.1.4. Quadrant IV: Coherent

The final quadrant refers to the expectations that are long term and continuous and that require only low attention. It has an emphasis on the continuous but not discernible experiences that may create a particular undertone or mood within the vehicle. For example, the vehicle may constantly filter out the information that the user finds undesirable before exposing the remainder to them on the display. Similarly, the vehicle may constantly arrange lighting, colour, sound, and tone based on the users’ moods without them noticing it.

This quadrant correlates with multimodal concepts such as intangibleness, intuitiveness, and sensitivity, as the impact of the experiences on this quadrant may not be immediately perceivable. However, their impacts are complementary to the in-vehicle experience. Studies such as Löcken et al. [80] and Mahmood et al. [81] are related to this quadrant due to their exploration of user experiences that do not occupy users’ attention. Since the experiences are effective in the long term but not necessarily noticeable, this quadrant highlights “creating a coherent undertone, foundation, and ambiences for the other experiences to occur.” This therefore becomes an integral part of users’ expectations of the in-vehicle multimodal experience. Understanding this element should inform how multimodal interactions are integrated into a future AV to establish and maintain a coherent undertone and intuitive user experience.

Various user studies have investigated in-vehicle multimodal experiences or interactions [2, 13, 15, 34, 35, 82]. These studies focused on testing and evaluating a particular in-vehicle multimodal interaction (e.g., speech, gesture, dials, and haptic) on users but did not consider the experience of an overall multimodal journey in a vehicle. Other studies have presented a classification, model, or mapping of multimodality [8, 36–38, 40, 41]. However, these are based either on general interactions with devices or on technical aspects and do not focus on AVs and occupants’ experiences in them. Hence, the proposed conceptual model could facilitate a discussion in the early design process regarding AVs’ multimodal applications by holistically understanding users’ expectations with more clarity and influencing design decisions accordingly.

6. Conclusion and Future Work

This paper presented two studies, focusing on the latter study, to understand users’ expectations of a multimodal experience in an AV. The attention and the duration of the multimodal interaction and their effects were determined as the two critical dimensions to map out users’ expectations. The outcome was a structured map of expectations of in-vehicle multimodal experiences, which contained four quadrants: sustained (long term-high attention), distinct (short term-high attention), concurrent (short term-low attention), and coherent (long term-low attention). This research presents a novel conceptual model for understanding users’ expectations in autonomous in-vehicle multimodal experiences. Distinct from conventional evaluations of specific multimodalities, it takes a significant stride in this field by presenting a model that categorises users’ in-vehicle multimodal experiences thoroughly and offers a holistic perspective on users’ in-vehicle expectations of a future AV. The map is poised to empower designers to make well-informed decisions for future in-vehicle multimodal experiences by leveraging user insights. Further, it can serve as a foundational resource for crafting design guidelines for in-vehicle multimodal interactions.

The dynamics of technological development today require us to constantly understand users’ altering expectations and desires to provide them with a more advanced user experience. These research findings provide a theoretical basis for researchers to develop further guidelines for in-vehicle multimodal interaction. Using the map of expectations, in which in-vehicle multimodal experiences have been broken down into layers, as a starting point, the following questions can be raised to develop this model further:

(i) Which technical capabilities, skills, implications, and conditions may each quadrant bring or require?

(ii) What kind of devices and design experiences can assist users in maximising their lives?

(iii) What about experiences that move between quadrants in a single scenario, and which interactions may help navigate the transitions between quadrants?

While this study has made original contributions, it is also crucial to mention its limitations. Both the survey and the workshops were conducted remotely. Although this allowed the researchers to reach more people, it would have been preferable to conduct them face to face, so that participants could collaborate and roleplay on-site. Further, although participants were strongly encouraged to consider a future mobility context through contexts, fiction, scenario writing, and roleplaying, it was inevitable for them to be distracted by the present context. This is a recurring problem with user studies that explore the future. In future studies, researchers could enable immersion when conducting workshop activities with the help of technologies such as virtual or augmented reality.

It would also be interesting to evaluate whether particular traits and skills of users, such as gender, age, or level of driving experience, impact the attention levels. For instance, future studies could examine if attention levels increase or decrease with increasing user age and explore how user expectations may differ based on that. Moreover, in-depth qualitative research could be conducted to detail and specify the step-by-step tasks for each quadrant. Similarly, an in-depth quantitative study could determine where the tasks would be located on the map in each quadrant by exploring attention and duration levels, to further develop knowledge of the in-vehicle multimodal user experience. Neuropsychological assessments and cognitive tests could be implemented to preliminarily evaluate users’ attention levels.

The outcome of this paper provides designers with a better understanding of user experiences to design multimodal interactions, as well as a theoretical basis for future researchers to develop more applicable design guidelines for in-vehicle multimodal interactions. This will contribute to the adaptation of multimodal interactions inside the vehicle, facilitate the transition from unidirectional tasks to experiences, and enable future users to enjoy improved multimodal experiences instead of fixed and nonreciprocal ones.

Data Availability

Due to the commercial nature of this research, it was agreed that the data from this research would not be shared publicly. Hence, supporting data is not available.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was fully funded by the Holistic UX Group, Automotive R&D Division, HMG, Seoul, South Korea. The grant ID is 12597. Open Access funding was enabled and organised by JISC.