Abstract

The increasing use of virtual teams, driven by advances in information and communication technology and by globalization, has spurred significant growth in both theoretical and empirical research. Based on the smart literature review framework, this study applies artificial intelligence techniques, specifically natural language processing and topic modeling, to analyze trends in virtual team research over the last four decades. Analysis of a dataset of 2,184 articles from Scopus-indexed journals identifies 16 distinct topics, including critical areas such as communication, leadership, and trust. Research topics in this field have become increasingly diverse over time. Key subjects such as learning, communication, trust, and leadership have consistently remained among the ten most frequently explored topics. In contrast, emerging areas such as agile development and patient care have recently become some of the most prominent themes. Employing the state-of-the-art topic modeling technique BERTopic, this study provides a comprehensive and dynamic panorama of the evolving landscape of virtual team research.

1. Introduction

Spurred by technological advancements and the impact of globalization, the use of virtual teams (VTs) has emerged as a significant trend in organizations. While previous studies have defined VTs in various ways, recent literature reviews characterize VTs by their geographic dispersion and use of information and communication technologies (ICTs) [1–4]. Thus, a VT can be defined as a team composed of geographically dispersed members who collaborate through ICTs.

The rise of VTs can be attributed primarily to the development and widespread availability of ICTs since the 1980s [5, 6]. VTs are expected to enhance project effectiveness and foster innovation by enabling collaboration among experts regardless of their geographical location [7, 8]. Additionally, the increased need for remote collaboration during the COVID-19 pandemic has further contributed to the growing popularity of VTs [9, 10].

In response to these emerging trends, academic research on VTs has grown significantly since the late 1990s, producing a substantial body of knowledge. As both the use of VTs and research on them have grown, the challenges of collaborating in VTs have become apparent. In VTs, geographic dispersion and reliance on digital communication create various collaboration problems, including reduced team cohesion [3], stagnant knowledge sharing [11, 12], and increased conflict [2]. In recognition of these issues, previous studies have explored effective leadership in VTs [13] and factors that promote effective communication, trust building, and knowledge sharing [11, 14].

In addition, recent technological advances and the globalization of business activities present new opportunities and challenges for collaboration in VTs [15]. For example, ICT developments such as video conferencing enable communication with higher synchronicity and visibility and can therefore overcome some of the challenges that limited synchronicity and visibility pose in digital communication [16]. On the other hand, the growing need for global collaboration is increasing difficulties such as time and cultural differences [17, 18]. These changes may be reflected in trends in VT research. Accordingly, surveying long-term trends is valuable for elucidating the “big picture” and future directions in this research field.

The rapid development and accumulation of academic research have increased both the need for literature reviews that systematize and integrate previous studies and the difficulty of conducting them. To address this challenge, new literature review methods have been proposed and used to complement traditional systematic and integrative reviews. With the advent of bibliometric and systematic literature review methodologies, researchers have adopted more sophisticated approaches to organizing prior studies [19, 20]. These methodologies offer a more transparent, reliable, and reproducible means of reviewing existing literature. However, bibliometric analysis relies on citation counts, which limits its ability to reveal hidden thematic structures and evolving research trends across numerous articles. Furthermore, systematic literature reviews demand considerable manual effort and time, which makes it difficult to cover the extensive volume of VT research comprehensively.

Therefore, this study employs two artificial intelligence techniques—natural language processing (NLP) and topic modeling—to explore trends in VT research.

NLP is a field that combines computer science, artificial intelligence, and linguistics to enable computers to understand and generate human language [21]. Using efficient data preprocessing techniques and machine learning algorithms, NLP systems can quickly analyze thousands of documents, a task that would be impractical for human researchers due to time and resource constraints [22]. Therefore, text mining using NLP is effective for analyzing large volumes of research articles because it extracts insights and meaning from large textual data.

Topic modeling is a machine learning technique that uncovers underlying themes and topics within large text data by identifying patterns and relationships among data points [23]. The statistical and machine learning algorithms used in topic modeling analyze word frequency and cooccurrence patterns in extensive textual data and cluster words into topics based on their distribution across documents [24]. This approach enables the summarization of extensive textual data, the identification of underlying thematic structures, and the tracking of changes over time.

Traditional NLP and topic modeling methods offer two benefits: they reduce the biases associated with human analysis, and they process large amounts of data efficiently. As a recent advancement, the integration of deep learning into topic modeling has enabled the extraction of more meaningful and insightful topics by reflecting the contextual information within text data [25]. These advanced text mining techniques are useful for obtaining an overall picture of the trends in a research area, such as VT research, that is developing and changing rapidly. Topic modeling using deep learning techniques can serve as a novel method for organizing a large, growing body of research in a reproducible manner, complementing traditional literature review methods.

Therefore, based on the issues in VT research described above, this study addresses the following research questions using topic modeling:
(i) RQ1: Has diversification occurred in VT research topics, and to what extent?
(ii) RQ2: What are the various topics within VT research, and which are central?
(iii) RQ3: What trends have emerged in these topics?

Methodologically, this study follows the smart literature review framework [26] and employs BERTopic, a state-of-the-art topic modeling technique [27]. Furthermore, this study utilizes the number of topics and the Gini-Simpson Index [28], also known as Blau’s index, to gauge changes in topic diversity over time. Selective paper reviews were conducted within each topic to interpret the topics. Moreover, time-series analysis of topic share captures research trends over the years. By combining quantitative analysis through topic modeling with qualitative insights from manual reviews, this study endeavors to elucidate the evolving trends in the extensive body of VT research throughout the years.

2. Literature Review

2.1. Prior Reviews of VT Research

In numerous recent studies, systematic reviews have been undertaken to explore the themes and trends in VT research. For example, Abarca et al. [29] and Chang et al. [18] delved into the primary constructs and concepts prevalent in contemporary VT research. Their examinations reveal that extant VT research has explored team dynamics and leadership aspects, including communication, trust, and cultural diversity.

There are also comprehensive reviews addressing the challenges associated with collaboration in VTs and the tactics employed to surmount these challenges. For instance, Ali and Lai [12] and Swart et al. [11] organized empirical studies that shed light on communication difficulties and trust-building within VTs. Morrison-Smith and Ruiz [15] examined challenges rooted in the three-dimensional conceptualization of distances—geographical, temporal, and perceived—that characterize VT collaboration. O’Leary et al. [30] reviewed factors contributing to VT effectiveness, encompassing individual attributes, interpersonal connections, and technological elements. Moreover, some reviews have adopted a more specific focus, narrowing their inquiries to targeted concepts and research questions. For example, Caputo et al. [2] scrutinized two decades of research on conflict and conflict management within VTs. Han and Hazard [13] organized predictive variables and outcomes associated with shared leadership in VTs.

Furthermore, meta-analyses have been undertaken to explore predictive and moderating variables pertaining to VT performance. Chaudhary et al. [3] demonstrated that cohesion significantly and positively influences VT performance, with this impact moderated by team tenure and occupation. Brown et al. [31] revealed the direct effects of two leadership styles—relationship-focused and task-focused—on VT performance, highlighting the moderating influence of task interdependence.

Although these review articles have enriched our understanding of key themes and trends in VT research, the sheer volume of VT research forces a review article to focus on a limited subset of research. Moreover, some reviews employ a selective narrative approach, providing limited insights into the overarching themes and trends in this field. These narrative reviews also face challenges in terms of transparency and reproducibility. Therefore, this study utilizes topic modeling to offer a comprehensive and objective assessment of the current state of the art and trends in VT research, leveraging a vast dataset to enhance transparency and reproducibility in its findings.

2.2. Literature Reviews by Topic Modeling

Topic modeling, a machine learning technique using NLP, is pivotal in discovering the hidden semantic architecture, often called “topics,” embedded within a corpus of documents [32]. These topics encapsulate the latent semantic essence of the documents in the collection. In mathematical terms, each topic is represented as a probability distribution across the words within the vocabulary, with higher probabilities assigned to words more closely associated with the topic [33]. In simple terms, topic modeling algorithms analyze word cooccurrence patterns within a document set to unveil the underlying themes, thereby enabling the identification of primary issues within each document and clustering documents according to the topics.

The methodological characteristics of topic modeling render it a powerful tool for exploratory literature reviews [26], and it has gained prominence in recent literature review studies. For instance, Mora et al. [34] used topic modeling to survey autonomous vehicle research comprehensively. Analyzing data from research papers published between 1970 and 2019, they identified 13 core thematic areas in the field. Karami et al. [35] harnessed topic modeling to uncover semantic patterns and yearly trends of Twitter-based studies, extracting 38 topics from over 18,000 papers published from 2006 to 2019. Benita [36] utilized topic modeling to map research on human mobility behavior during the COVID-19 pandemic, identifying four overarching themes and 14 subthemes through keyword cooccurrence and evidence map analysis. As another example, Barravecchia et al. [37] applied topic modeling to the abstracts of 1,708 research papers on product-service systems and identified eight distinct topics.

Topic modeling boasts two primary strengths as an exploratory literature review method. First, it employs efficient computational algorithms capable of handling vast amounts of textual data [38]. This computational efficiency enables researchers to conduct exploratory reviews encompassing a wide scope of articles. In contrast, systematic literature reviews often entail labor-intensive manual tasks [39, 40], constraining the scope and temporal coverage of the reviewed literature. Second, topic modeling can discover hidden thematic structures within documents with a certain degree of objectivity and transparency. In systematic literature reviews, the identification of themes depends on the researcher’s interpretation and is thus inherently subjective. Conversely, topic modeling relies on statistical modeling and machine learning, leading to more objective, transparent, and reproducible analyses and findings [41, 42]. Bibliometric analysis is also a statistical method, but it centers primarily on citation analysis [19] and is therefore less suited to uncovering thematic structures and emerging trends that may not be discernible solely through citation patterns.

2.3. Latent Dirichlet Allocation (LDA)

Since the 1990s, when topic modeling was first proposed [43], various topic modeling algorithms have been developed and utilized. According to some review papers focusing on topic modeling research [23, 44], the most widely adopted algorithm is LDA, which was introduced by Blei et al. [45]. Recent literature review papers on topic modeling have also used LDA [35, 37].

The most notable strength distinguishing LDA from previously proposed methods is its ability to automatically uncover interpretable latent topics within extensive collections of documents [46]. LDA is a generative model employing probabilistic techniques to assign words in each document to a particular topic [45]. Specifically, LDA estimates the probability of a word appearing in a document belonging to a particular topic and generates a list of words with their associated probabilities for each topic. While these topics lack explicit labels, the words within each set often exhibit semantic relationships, rendering the topics interpretable [46].
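To make LDA’s generative story concrete, the following minimal Python sketch samples one document from it. The two topic-word distributions, the vocabulary, and the Dirichlet parameter alpha are hypothetical. In full LDA, the topic-word distributions are themselves drawn from a Dirichlet prior, and model fitting runs this process in reverse, inferring topic proportions and topic-word distributions from observed documents.

```python
import random

def sample_dirichlet(alpha, rng):
    """Sample a probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, n_words, rng):
    """Generate one document following LDA's generative story."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    topic_ids = list(range(len(topic_word)))
    words = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]  # topic for this word slot
        vocab = list(topic_word[z])
        probs = [topic_word[z][v] for v in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])  # word drawn from topic z
    return theta, words

# Hypothetical topic-word distributions; each sums to 1.
topic_word = [
    {"trust": 0.4, "leader": 0.3, "member": 0.3},
    {"video": 0.5, "email": 0.3, "tool": 0.2},
]
rng = random.Random(0)
theta, doc = generate_document(topic_word, alpha=[0.5, 0.5], n_words=8, rng=rng)
print(theta, doc)
```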

However, despite its popularity, LDA is limited in that it may miss contextual information present in the text. This limitation arises from LDA’s reliance on the bag-of-words representation for documents [45]. Each document is represented as a vector of word counts, with each element corresponding to the frequency of a particular word in the document. Notably, this representation ignores word order, potentially causing LDA to overlook critical contextual information dependent on word order or relationships.
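A two-sentence illustration of this limitation, using hypothetical sentences: under a bag-of-words representation, sentences with opposite meanings can map to identical count vectors.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a document as word counts, discarding word order."""
    return Counter(text.lower().split())

a = bag_of_words("The leader trusts the team")
b = bag_of_words("The team trusts the leader")
print(a == b)  # True: different meanings, identical bag-of-words vectors
```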

2.4. Bidirectional Encoder Representations from Transformers (BERT)

Because academic articles often contain complex and domain-specific terms and phrases [47], the context and relationships between words are crucial for identifying the topics of these articles. Thus, an alternative method that is adept at capturing contextual information in the text may be preferred over LDA for extracting topics from academic articles. This study adopts BERTopic [27], a cutting-edge topic modeling method that harnesses the BERT model, which fundamentally differs from earlier models such as LDA and recurrent neural networks (RNNs).

LDA is a generative probabilistic model tailored for analyzing text corpora, and RNNs process text in a sequential manner. BERT fundamentally differs from these predecessors in that it is a pretrained language model adaptable to various NLP tasks [48]. BERT relies on a deep neural network architecture based on the transformer to capture contextual relationships between words [49].

RNNs struggle to handle long-range dependencies in text because of their fundamental mechanism. RNNs are designed to process sequential data, such as sentences, by maintaining a form of memory of previous inputs. They do this by using an internal (hidden) state to process a sequence of inputs one element at a time. This sequential processing allows RNNs to carry information about what has been processed so far, which in principle enables them to capture context within sequences.

However, this approach suffers from a drawback known as the vanishing gradient problem, in which the gradients of the loss function become extremely small [50]. In neural networks, learning occurs through a process called backpropagation, in which the network adjusts its parameters to minimize a loss function that quantifies the difference between the predicted and actual outputs. In RNNs, the gradients (i.e., the partial derivatives of the loss function with respect to the network’s parameters) are propagated backward through time. In this process, the gradients tend to vanish because, at each time step, they are multiplied by the same recurrent weight matrix together with the derivative of the activation function; when these factors are smaller than one in magnitude, the product shrinks exponentially with the number of steps.
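The following sketch, a simplified scalar RNN rather than any model used in this study, computes how the gradient of the final hidden state with respect to the initial one shrinks as the sequence grows. The recurrence h_t = tanh(w·h_{t-1} + u·x_t), the weights w = 0.9 and u = 1.0, and the constant input 0.5 are assumptions chosen for illustration. Each time step multiplies the gradient by w·(1 − h_t²), whose magnitude here is at most 0.9.

```python
import math

def gradient_through_time(w, u, inputs, h0=0.0):
    """Return dh_T/dh_0 for the scalar RNN h_t = tanh(w * h_{t-1} + u * x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # chain rule: local derivative dh_t/dh_{t-1}
    return grad

for length in (5, 20, 50):
    print(length, gradient_through_time(w=0.9, u=1.0, inputs=[0.5] * length))
```

The printed magnitudes fall rapidly with sequence length, which is why early words exert little influence on what the network learns from later ones.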

The vanishing gradient problem is particularly problematic for long sequences. As RNNs process each word in a sentence, the influence of words processed earlier gradually diminishes. This diminishing influence causes the network to fail to grasp the context of words or phrases that appeared much earlier in the sequence, making it challenging for RNNs to learn dependencies between words far apart [51]. For instance, in a long sentence, the RNN might struggle to associate a subject at the beginning with a verb appearing much later.

Vaswani et al. [49] introduced the transformer model to remedy these limitations. This model incorporates inventive techniques, including positional encoding and attention mechanisms. Positional encoding adds extra information to each word that indicates its position in the sentence. It is like tagging each word with a unique identifier. This operation enables the transformer model to understand the sequence of words, which is crucial for grasping the meaning of sentences where word order matters.
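Positional encoding can be illustrated with the sinusoidal scheme proposed by Vaswani et al. [49], sketched below. BERT itself learns its position embeddings during pretraining rather than using this fixed formula, so the sketch only shows how position information can be added to word vectors. The zero-valued word vectors in the usage line are hypothetical placeholders.

```python
import math

def sinusoidal_positional_encoding(position, d_model):
    """Fixed encoding of Vaswani et al.: sin on even indices, cos on odd indices."""
    encoding = []
    for i in range(d_model):
        # Indices 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
        angle = position / (10000 ** ((i - i % 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

def add_positions(embeddings):
    """Add the positional encoding of each position to the word vector at that position."""
    return [
        [e + p for e, p in zip(vec, sinusoidal_positional_encoding(pos, len(vec)))]
        for pos, vec in enumerate(embeddings)
    ]

# Hypothetical 4-dimensional word vectors (all zeros, so the output shows the encodings).
print(add_positions([[0.0] * 4, [0.0] * 4]))
```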

The attention mechanism evaluates the relevance of each word in a sentence to a target word by computing a score for each word. The score is the dot product of two vectors, a query (representing the target word) and a key (representing each word in the sentence), scaled by the square root of the vectors’ dimension [49]. A higher score indicates that the word is more relevant to the target word. The scores are then normalized into probabilities with the softmax function so that they sum to one. Finally, these probabilities are used as weights to compute a weighted sum of value vectors, which hold detailed information about each word. This process allows the model to represent each word in context, giving a richer interpretation of the sentence than the words alone provide. This method marks an advancement over earlier models because it captures the full sentence context rather than relying only on neighboring words.
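The computation described above can be sketched for a single query as scaled dot-product attention. The key and value vectors are hypothetical two-dimensional toy values; real transformer layers use learned projections, much higher dimensions, and multiple attention heads.

```python
import math

def softmax(scores):
    """Normalize scores into probabilities that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)  # relevance of each word to the target word
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output

# Hypothetical key/value vectors for a three-word sentence.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights, output)
```

Words whose keys align with the query (the first and third) receive equal, larger weights, so the output vector is pulled toward their values.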

2.5. BERTopic

Utilizing BERT, BERTopic effectively identifies a topic for each document by capturing contextual information and, therefore, is a suitable tool for literature review purposes [52]. As shown in Figure 1, BERTopic operates through four key steps: (1) document embedding, (2) dimensionality reduction, (3) clustering, and (4) topic retrieval [27].

In the first step, document embedding, BERTopic employs the Sentence-BERT (SBERT) framework [53] to convert documents into dense vector representations. SBERT, an extension of the original BERT model, uses a Siamese network structure to create embeddings that capture the semantic content of sentences and short paragraphs. Each document is mapped to a vector that represents not only its content but also its contextual and semantic nuances. These vectors are high-dimensional, and semantically similar documents are positioned closer together in the vector space, preserving semantic relationships. This approach marks a significant improvement over LDA, which relies primarily on word frequency and often misses subtler aspects of document semantics. By leveraging SBERT, BERTopic captures contextual nuances within and across documents, offering a more refined understanding of their content.
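Closeness in the embedding space is typically measured with cosine similarity. The sketch below uses made-up four-dimensional vectors as stand-ins for SBERT output; the all-MiniLM-L6-v2 model used in this study produces 384-dimensional vectors, which could be obtained with the sentence-transformers package, e.g., SentenceTransformer("all-MiniLM-L6-v2").encode(documents).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional stand-ins for SBERT document embeddings.
trust_doc = [0.9, 0.1, 0.0, 0.2]
trust_doc_2 = [0.8, 0.2, 0.1, 0.3]
agile_doc = [0.1, 0.9, 0.7, 0.0]

sim_related = cosine_similarity(trust_doc, trust_doc_2)
sim_unrelated = cosine_similarity(trust_doc, agile_doc)
print(round(sim_related, 3), round(sim_unrelated, 3))
```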

In the second step, dimensionality reduction, BERTopic uses Uniform Manifold Approximation and Projection (UMAP), a nonlinear dimensionality reduction technique [54]. UMAP maps high-dimensional data into a lower-dimensional space. It relies on manifold learning, which assumes that high-dimensional data actually lie on or near a simpler, lower-dimensional structure. This allows UMAP to largely preserve the neighborhood relationships between documents in the high-dimensional space after reducing dimensions. While LDA’s linear approach to dimensionality reduction often overlooks nuanced semantic relationships between documents, UMAP’s nonlinear approach can keep the essential semantic links intact, offering a more nuanced view of document relationships.
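What it means for a reduction to preserve relationships can be checked with a simple diagnostic: the fraction of each point’s k nearest neighbors that remain its neighbors after projection. The sketch below does not implement UMAP. It scores two hand-made projections of six hypothetical points, one that keeps the informative coordinates and one that does not.

```python
import math

def knn(points, idx, k):
    """Indices of the k nearest neighbors of points[idx] (Euclidean), excluding itself."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    return {j for _, j in sorted(dists)[:k]}

def neighborhood_preservation(high, low, k):
    """Average fraction of each point's k nearest neighbors kept after reduction."""
    n = len(high)
    return sum(len(knn(high, i, k) & knn(low, i, k)) / k for i in range(n)) / n

# Hypothetical 3-D points forming two tight groups.
high = [
    [0.0, 0.0, 0.1], [0.1, 0.0, 0.0], [0.0, 0.1, 0.05],
    [5.0, 5.0, 0.0], [5.1, 5.0, 0.1], [5.0, 5.1, 0.0],
]
low_good = [p[:2] for p in high]  # keeps the coordinates that separate the groups
low_bad = [[p[2]] for p in high]  # keeps only the small third coordinate
print(neighborhood_preservation(high, low_good, k=2), neighborhood_preservation(high, low_bad, k=2))
```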

The third step, clustering, applies the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm [55], an extension of DBSCAN, to the dimensionality-reduced data. HDBSCAN forms clusters by analyzing the density of data points in the reduced embedding space created in the previous step. It builds a hierarchy of clusters that offers a detailed view of topic relationships and nesting. This hierarchy is instrumental in distinguishing well-defined topics (high-density areas) from outliers or irrelevant documents. By isolating these outliers, BERTopic avoids improper topic assignments, helping keep each cluster distinct and meaningful. This approach is particularly effective compared with LDA, which tends to assign every document to a topic, potentially diluting the relevance and clarity of topic clusters.

In the fourth step, topic retrieval, BERTopic uses a modified version of the term frequency-inverse document frequency (TF-IDF) algorithm, known as class-based TF-IDF (c-TF-IDF) [27]. This modified approach fundamentally alters the standard TF-IDF framework to enhance topic identification and representation. The c-TF-IDF method recalibrates the traditional TF-IDF focus from individual documents to topics within the dataset. While conventional TF-IDF evaluates term importance within each document relative to the entire corpus, c-TF-IDF assesses term frequency within each identified topic.

Mathematically, conventional TF-IDF is computed by the following formula:

$$W_{t,d} = tf_{t,d} \cdot \log\left(\frac{N}{df_t}\right)$$

Here, $tf_{t,d}$ is the term frequency (TF), which represents the frequency of word (term) $t$ in document $d$. TF is multiplied by the inverse document frequency (IDF), the logarithm of the number of documents $N$ divided by the number of documents containing word $t$, $df_t$. While TF measures how frequently a word occurs in a document, IDF represents how rare a word is in a collection of documents. Combining TF and IDF, TF-IDF ($W_{t,d}$) quantifies how important a word is to a document in a collection of documents.
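A minimal pure-Python sketch of the TF-IDF formula W(t,d) = tf(t,d) · log(N / df(t)), using raw counts and the natural logarithm. The tokenized documents are hypothetical. Library implementations (e.g., scikit-learn’s TfidfVectorizer) apply smoothing and normalization by default, so their scores differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute W[t,d] = tf(t,d) * log(N / df(t)) for tokenized documents."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

# Hypothetical tokenized abstracts.
docs = [
    ["trust", "leader", "trust"],
    ["trust", "video", "tool"],
    ["agile", "sprint", "tool"],
]
w = tf_idf(docs)
print(w[0])
```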

On the other hand, c-TF-IDF uses the following formula to compute a score for each word that represents its importance within a given cluster:

$$W_{t,c} = tf_{t,c} \cdot \log\left(1 + \frac{A}{tf_t}\right)$$

In this formula, $tf_{t,c}$ signifies the term frequency of word $t$ within cluster (class) $c$, multiplied by the inverse class frequency, which quantifies how much information a term provides to a class. The inverse class frequency is calculated by taking the logarithm of 1 plus the average number of words per class, $A$, divided by the frequency of word $t$ across all classes, $tf_t$. Adding 1 ensures that the argument of the logarithm exceeds one, so the value is always positive.
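The c-TF-IDF formula W(t,c) = tf(t,c) · log(1 + A / tf(t)) can be sketched in the same style: all documents in a cluster are pooled into one class, and A is the average number of words per class. The two clusters of tokenized documents are hypothetical, and the BERTopic library’s own implementation may differ in details such as frequency normalization.

```python
import math
from collections import Counter

def c_tf_idf(docs_per_class):
    """W[t,c] = tf(t,c) * log(1 + A / tf(t)), with each class's documents pooled."""
    class_tf = {c: Counter(t for doc in docs for t in doc) for c, docs in docs_per_class.items()}
    total_tf = Counter()  # frequency of each term across all classes
    for counts in class_tf.values():
        total_tf.update(counts)
    avg_words = sum(total_tf.values()) / len(class_tf)  # A: average words per class
    return {
        c: {t: n * math.log(1 + avg_words / total_tf[t]) for t, n in counts.items()}
        for c, counts in class_tf.items()
    }

# Hypothetical clusters of tokenized abstracts.
clusters = {
    0: [["trust", "leader", "tool"], ["trust", "member"]],
    1: [["agile", "sprint", "tool"], ["agile", "tool"]],
}
scores = c_tf_idf(clusters)
print({c: max(s, key=s.get) for c, s in scores.items()})
```

A term concentrated in one cluster (such as “trust”) outscores a term spread across clusters (such as “tool”), which is the behavior the text describes.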

c-TF-IDF helps BERTopic identify and highlight words that are both common within a specific topic and unique to it, compared to the whole corpus. This method enables BERTopic to find terms that best represent each topic, making the topics’ meanings clearer and more distinct. Unlike LDA, which often misses unique terms due to its reliance on word distributions across topics, BERTopic focuses on extracting terms that are not only prevalent within a topic but also uniquely characteristic, leading to a more precise and contextually nuanced topic representation.

3. Method

This section outlines the methods used for data collection, data preprocessing, and topic extraction by BERTopic. It also details the analytical methods applied to each research question. All data analyses were conducted using Python as the primary programming language. The process from data acquisition to analysis is summarized in Figure 2.

3.1. Data Acquisition

Building on previous studies that employed topic modeling for literature reviews, this study uses a dataset of abstracts of the focal research papers. The dataset comprises the abstracts and accompanying bibliographic details of papers on VT research. Only peer-reviewed articles published in Scopus-indexed journals were used to ensure that the studies included in the review were of high quality. In addition, since this study also analyzes the number of citations of each article, limiting the scope to studies registered in Scopus unifies the criteria for counting citations. Furthermore, Scopus covers a wide variety of research areas with consistent quality control, which supports the comprehensiveness of the review. However, searching multiple article databases is preferable to avoid unintentional omissions and search bias. Therefore, following the criteria for systematic literature reviews used in previous studies [56], three databases were searched in addition to Scopus: IEEE Xplore, ProQuest, and ScienceDirect.

The paper search did not stipulate a starting publication year and encompassed all publications indexed in those databases through December 2023. To assemble this dataset, the four databases were queried for papers containing the terms “virtual team” or “distributed team” in the abstract, title, or keywords. The extracted bibliographic information includes author names, publication year, article title, journal title, volume number, page numbers, and abstracts. The abstracts provide the textual content for the analysis, while the publication year is used to examine topic evolution over time. The other details serve as unique identifiers for each paper. The paper search was confined to documents categorized as either “Article” or “Review” by document type.

In the initial paper search, a total of 3,118 papers were identified from the four databases. After removing duplicate papers from these, 2,440 remained. Of these, 2,269 were extracted from the Scopus database, and the remaining 171 were not extracted from Scopus, but only from the other three databases. A search of these 171 titles in the Scopus database revealed that 16 were peer-reviewed articles published in Scopus-indexed journals. Therefore, these 16 titles were added to the 2,269 titles extracted from the Scopus search, resulting in 2,285 titles. From this pool, 79 items (such as editor’s introductions and commentaries) lacking abstracts were excluded. Furthermore, 22 papers devoid of author names were also removed. Consequently, the final dataset consisted of 2,184 papers.
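The screening counts reported above can be re-derived with simple arithmetic. The variable names are illustrative, and the only derived number not stated in the text is the count of duplicates removed (3,118 − 2,440).

```python
# Screening counts reported in Section 3.1 (variable names are illustrative).
identified = 3_118   # records found in the four databases
after_dedup = 2_440  # records left after removing duplicates
from_scopus = 2_269  # of those, records returned by the Scopus search
added_back = 16      # non-Scopus hits found to be in Scopus-indexed journals
no_abstract = 79     # items without abstracts
no_author = 22       # papers without author names

duplicates = identified - after_dedup
non_scopus = after_dedup - from_scopus
pool = from_scopus + added_back
final = pool - no_abstract - no_author
print(duplicates, non_scopus, pool, final)  # 678 171 2285 2184
```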

3.2. Data Preprocessing

A CSV file was created from the extracted data, containing 2,184 cases and six columns, namely, “author names,” “publication year,” “abstracts,” “paper titles,” “journal titles,” and “citation.” Before data preprocessing, the author visually inspected each paper’s abstract to determine the necessary preprocessing tasks. Subsequently, a word count analysis was performed on the abstract data to identify abnormally frequent words. The analysis revealed that some abstracts contained copyright notices, journal names, and abbreviations that were irrelevant to topic extraction; these were removed from the abstract data.
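A word-count check of this kind can be sketched with Python’s collections.Counter. The two abstracts are hypothetical; frequent tokens such as “elsevier” or “rights” would flag copyright boilerplate for removal.

```python
import re
from collections import Counter

def most_frequent_words(abstracts, n=10):
    """Count lowercase alphabetic tokens across all abstracts and return the n most common."""
    counts = Counter()
    for text in abstracts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)

# Hypothetical abstracts ending in publisher boilerplate.
abstracts = [
    "Trust shapes performance. © 2020 Elsevier Ltd. All rights reserved.",
    "Leadership matters for remote work. © 2019 Elsevier Ltd. All rights reserved.",
]
print(most_frequent_words(abstracts, 5))
```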

Most of the papers include the search terms and related words, such as “virtual,” “team(s),” “distributed,” and “VT(s),” in their abstracts. When topic modeling was conducted with these terms left in their original form, many papers appeared to be closely associated only with “VTs,” obscuring more specific themes. Therefore, these words were removed in the preprocessing stage.

The subsequent steps involved punctuation removal, lemmatization, and stop-word removal, following Grootendorst’s [27] methodology. Punctuation removal eliminated all punctuation marks from the text, including commas, periods, exclamation marks, question marks, and other symbols. For machine learning models, these characters can act as noise in text analysis because they add complexity and variability to the data without providing meaningful information. Removing punctuation reduces this noise, facilitates processing by the NLP algorithm, and can ultimately enhance the performance of machine learning models.

Lemmatization transforms words in the textual data into their base (dictionary) form, called a lemma. It reduces noise and the number of unique words in the text while retaining its meaning, thereby improving machine learning model performance. Stop-word removal deletes stop words from a text. Stop words, such as “a,” “and,” “the,” and “in,” occur at high frequency in a language and are not considered to add significant meaning to the text analysis. Removing them improves machine learning model performance by reducing data dimensions and streamlining data processing. These processes were executed using the NLTK library in Python.
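The preprocessing steps described above can be sketched as a single function. This is not the study’s code: the stop-word list and the lemma lookup table are tiny hypothetical stand-ins for NLTK resources (such as stopwords.words("english") and WordNetLemmatizer), lowercasing is an added assumption, and the copyright pattern is illustrative.

```python
import re
import string

# Hypothetical stand-ins for NLTK resources, small enough to run without downloads.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "is", "are", "for"}
LEMMAS = {"members": "member", "leaders": "leader", "studies": "study", "shaped": "shape"}
# Search terms removed so that they do not dominate every topic.
SEARCH_TERMS = {"virtual", "team", "teams", "distributed", "vt", "vts"}

def preprocess(abstract):
    text = re.sub(r"©.*$", "", abstract)  # drop a trailing copyright notice (illustrative pattern)
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    tokens = text.lower().split()  # lowercasing is an assumption of this sketch
    tokens = [LEMMAS.get(t, t) for t in tokens]  # lemmatization via lookup table
    return [t for t in tokens if t not in STOP_WORDS and t not in SEARCH_TERMS]

print(preprocess("Trust shaped the members of virtual teams. © 2021 Elsevier"))  # ['trust', 'shape', 'member']
```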

The BERTopic class was imported from Python’s BERTopic package to perform topic modeling. Following Grootendorst’s [27] method, the sentence-transformer model “all-MiniLM-L6-v2” was used for embedding. For HDBSCAN clustering, two hyperparameters, min_cluster_size and min_samples, were set to their default values, which are both 15. min_cluster_size is the smallest number of data points a group must contain to be considered a valid cluster: with min_cluster_size set at 15, groups with 14 or fewer data points were treated as noise or outliers. min_samples controls how dense a region of space must be to qualify as a cluster. It is the number of neighbors used to compute each point’s core distance (the distance to its min_samples-th nearest neighbor); points in sparse regions have large core distances and are therefore more likely to be labeled as noise. Higher values for these hyperparameters result in more data points being categorized as noise, leading to more conservative clustering. The n-gram range was set from 1 to 3.
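To illustrate how min_samples enters HDBSCAN, the sketch below computes core distances and the mutual reachability distance, max(core(a), core(b), d(a, b)), on five hypothetical 2-D points. It is not the hdbscan library’s implementation; in particular, whether a point counts as its own neighbor when computing the core distance is an assumption here, and implementations may differ.

```python
import math

def core_distance(points, idx, min_samples):
    """Distance from points[idx] to its min_samples-th nearest other point.

    Assumption: the point is not counted as its own neighbor here;
    library implementations may count it.
    """
    dists = sorted(math.dist(points[idx], p) for j, p in enumerate(points) if j != idx)
    return dists[min_samples - 1]

def mutual_reachability(points, a, b, min_samples):
    """Mutual reachability distance used by HDBSCAN: sparse points are pushed farther away."""
    return max(
        core_distance(points, a, min_samples),
        core_distance(points, b, min_samples),
        math.dist(points[a], points[b]),
    )

# Hypothetical 2-D points: a dense square and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(core_distance(points, 0, 2), core_distance(points, 4, 2))
```

The isolated point’s large core distance inflates its mutual reachability to every other point, which is what lets HDBSCAN set it aside as noise.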

With these configurations, a BERTopic model was built and applied to the preprocessed abstract data using its fit_transform method, which performs embedding, UMAP dimensionality reduction, HDBSCAN clustering, and c-TF-IDF calculation.

3.3. Analysis

For the first research question (RQ1) regarding the diversification of VT research topics, two analytical methods were employed. The first method quantifies the degree of topic diversification by counting the number of topics and observing how this number changes over time. A larger number of topics suggests a greater diversity of subjects under investigation. However, tallying the number of topics alone overlooks the distribution of documents among these topics. If a few topics gain a larger share as the number of topics increases, the growth in topic count may not represent true diversification. To account for both the number of topics and the distribution of documents across them, the second method uses the Gini-Simpson Index as a measure of diversity [57]. The Gini-Simpson Index takes values between 0 and 1, with higher values indicating greater diversity. It is computed as

$$D = 1 - \sum_{i=1}^{K} \left(\frac{n_i}{N}\right)^2$$

where $n_i$ is the number of papers in the $i$-th topic, $K$ is the number of topics, and $N$ is the total number of papers in the dataset.
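The index D = 1 − Σ (n_i / N)² can be computed in a few lines. The topic counts below are hypothetical and show why the index complements a simple topic count: two distributions with four topics each have the same topic count, but an even split scores far higher than one dominated by a single topic.

```python
def gini_simpson(topic_counts):
    """Gini-Simpson (Blau) index: 1 - sum_i (n_i / N)^2."""
    total = sum(topic_counts)
    return 1.0 - sum((n / total) ** 2 for n in topic_counts)

print(gini_simpson([50, 50]))          # 0.5
print(gini_simpson([25, 25, 25, 25]))  # 0.75
print(gini_simpson([97, 1, 1, 1]))     # about 0.059: same topic count, much less diverse
```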

For the second research question (RQ2), which seeks to identify the various topics within VT research, the following approach was employed. The author interpreted and named each identified topic, primarily on the basis of the list of words output by the model. These named topics were listed and reviewed to show which research topics VT research has addressed. To identify the central topics that have garnered substantial attention in prior research, the topics were ranked by the number of papers assigned to each. Next, using Scopus citation counts at the time of data acquisition, the five most cited papers (top-cited papers) for each topic were identified. The author manually reviewed these top-cited papers, reading the main body of each, to confirm that the topic interpretations based on the word lists were accurate, and revised the interpretations and names when necessary.

To explore the third research question (RQ3) about emerging trends within these topics, the analysis examined temporal changes in each topic's share of papers, the total number of papers attributed to each topic, and the ranking of these topics. During the trend analysis, the author again manually reviewed the top-cited papers for each topic. This round-trip process clarified the theoretical underpinnings and prior research streams of each topic, which provided background for interpreting its research trends.

4. Results

Before addressing the research questions, we first summarize some basic information about the data. The papers were collected from 913 different journals; Table 1 lists the journals that contributed more than 20 papers. The earliest publication year among the papers in the dataset is 1984.

Figure 3 depicts, as a line graph, the longitudinal trend in the number of VT research papers published from 1984 to 2023. After the first paper appeared in 1984, the number of papers remained low for about ten years, increased sharply in the late 1990s, and exceeded 100 per year in the late 2000s. Throughout the 2010s, the annual volume fluctuated and plateaued. However, there has been a noteworthy surge in papers since 2020.

Next, topic modeling with BERTopic was applied to the dataset. BERTopic identified 16 distinct topics. Out of the 2,184 documents in the dataset, 1,485 were successfully categorized into one of these identified topics. However, the remaining 699 documents did not align with any specific topic and were consequently classified as outliers. The subsequent sections provide a detailed breakdown of the analysis conducted to address each of the research questions.

4.1. RQ1: Has Diversification Occurred in VT Research Topics, and to What Extent?

Figure 4 visualizes the number of distinct topics found in papers published each year, offering insight into the trend of topic diversification. The left chart depicts annual variations in the number of topics. Although small short-term fluctuations are discernible, an overall pattern of increasing topic diversity emerges, and it is particularly pronounced from the late 1990s onward, coinciding with a notable surge in the volume of papers. The right chart presents the number of topics as a five-year moving average, which underscores the substantial upswing in topics from the late 1990s through the mid-2000s, followed by a gradual rise in subsequent years.
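The two series plotted in Figure 4 can be sketched as follows: count the distinct topics among each year's papers, then smooth the counts with a five-year moving average. The function names and data are hypothetical, and three details are assumptions rather than the study's documented choices: outliers (label -1) are not counted as a topic, the window is trailing (each year averaged with the four preceding years) rather than centered, and years with no papers count as zero topics.

```python
from collections import defaultdict


def topics_per_year(records):
    """Count distinct topics per year from (year, topic_label) pairs.

    Assumption: BERTopic outliers (label -1) are not counted as a topic.
    """
    topics = defaultdict(set)
    for year, label in records:
        if label != -1:
            topics[year].add(label)
    return {year: len(labels) for year, labels in topics.items()}


def trailing_moving_average(series, first_year, last_year, window=5):
    """Mean over each year and the window - 1 preceding years.

    Assumption: years missing from `series` (no papers) count as 0.
    """
    values = [series.get(year, 0) for year in range(first_year, last_year + 1)]
    return {first_year + i: sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))}


records = [(2000, 0), (2000, 1), (2001, 0), (2002, 2), (2002, -1),
           (2003, 0), (2003, 1), (2003, 2), (2004, 3)]
counts = topics_per_year(records)
print(counts)                                       # {2000: 2, 2001: 1, 2002: 1, 2003: 3, 2004: 1}
print(trailing_moving_average(counts, 2000, 2005))  # {2004: 1.6, 2005: 1.2}
```

The same smoothing function could be applied unchanged to a series of yearly Gini-Simpson values.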

To further gauge diversification, the subsequent analysis employed the Gini-Simpson Index, treating topics as species and papers as individuals, so that the index reflects both the number of topics and the distribution of each year's papers among them. The line graphs in Figure 5 represent year-by-year changes in the Gini-Simpson Index. The left chart shows the annual values: the index increased substantially from the late 1990s to the mid-2000s and continued to rise gradually thereafter. The right chart portrays a five-year moving average of the index, which depicts more clearly the pronounced rise from the late 1990s to the mid-2000s and the subsequent gradual increase.

These findings indicate that diversification within VT research is not confined to an expanding number of topics; it also extends to the distribution of documents among those topics. Specifically, topics diversified rapidly from the late 1990s, the pace of diversification stabilized in the mid-2000s, and a gradual expansion has continued since.

4.2. RQ2: What Are the Various Topics within VT Research, and Which Are Central?

The next step is to interpret the meaning of each of the 16 identified topics. The topic model outputs the words that occur in each topic together with their relative weights within that topic (Table 2). Because this study uses n-grams ranging from 1 to 3, these occurrences and weights pertain both to individual words and to phrases of two or three words. In essence, the model output shows the words or phrases that carry the most weight within each topic. The interpretation of each topic was guided by the information in Table 2, enriched by domain knowledge, and informed by a review of several articles within each topic. The bold terms in Table 2 are the topic names formulated from this interpretation. Topics with lower numbers contain more papers: topic 0 includes the largest number of papers, and topic 15 the fewest.
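The study's n-gram range of 1 to 3 means that single words, two-word phrases, and three-word phrases all become candidate topic terms. A minimal sketch of that expansion follows; the function name is hypothetical, and it skips the tokenization, lowercasing, and stop-word handling that a real pipeline applies. In practice this is the behavior scikit-learn's CountVectorizer provides through its ngram_range parameter, which BERTopic also exposes as n_gram_range.

```python
def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous word sequences of length n_min..n_max."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


print(ngrams("global virtual team".split()))
# ['global', 'virtual', 'team', 'global virtual', 'virtual team', 'global virtual team']
```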

Table 3 presents a selection of seminal papers for each topic. These papers were chosen based on their citation counts per year, calculated from the year of publication up to 2024. Five papers were selected for topics with more than 100 papers (topics 0 to 5), two for topics with fewer than 30 papers, and three for all other topics.

Citations per year were calculated by dividing a paper's total number of citations by the number of years between its publication year and 2024. For each paper, this value is shown in curly brackets after the publication year, within the parentheses.
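The normalization and the selection rule described above can be sketched as follows. All function names and the paper tuples are hypothetical. One detail is an assumption: the text does not say how papers published in 2024 itself (a zero-year interval) were handled, so the sketch simply skips them.

```python
REFERENCE_YEAR = 2024  # citation counts are normalized up to 2024


def citations_per_year(citations, pub_year, ref_year=REFERENCE_YEAR):
    """Total citations divided by the years between publication and ref_year."""
    years = ref_year - pub_year
    if years <= 0:
        raise ValueError("publication year must precede the reference year")
    return citations / years


def n_papers_to_select(topic_size):
    """Five papers for topics with >100 papers, two for <30, otherwise three."""
    if topic_size > 100:
        return 5
    if topic_size < 30:
        return 2
    return 3


def seminal_papers(papers, topic_size, ref_year=REFERENCE_YEAR):
    """Pick the papers with the most citations per year within one topic.

    papers: list of (title, total_citations, publication_year) tuples.
    Assumption: papers published in ref_year itself are skipped.
    """
    eligible = [p for p in papers if p[2] < ref_year]
    ranked = sorted(eligible, key=lambda p: citations_per_year(p[1], p[2], ref_year),
                    reverse=True)
    return ranked[:n_papers_to_select(topic_size)]


papers = [("A", 300, 2014), ("B", 100, 2019), ("C", 50, 2022), ("D", 10, 2024)]
print([title for title, _, _ in seminal_papers(papers, topic_size=25)])  # ['A', 'C']
```

Normalizing by age lets a recent paper such as "C" (25 citations per year) outrank an older paper with more total citations such as "B" (20 per year).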

In the following, the interpretation of each topic will be elucidated.

(i) Topic 0: student learning and education

Topic 0 mainly focuses on the educational effects of individual competence and group processes on online collaboration in higher education [58–62]. These studies include experimental work from experiential and social learning perspectives. For example, Erez et al. [58] used collaborative, experiential learning to enhance global managers' trust-building skills, resulting in increased cultural intelligence and global identity that persisted six months after the project. Similarly, Taras et al. [59] evaluated the effectiveness of global virtual student collaboration projects in international management education, involving over 6,000 students. Their experiential approach produced positive outcomes in terms of reactions, learning, attitudes, behaviors, and performance.

(ii) Topic 1: communication

Topic 1 research explores issues related to geographical dispersion and computer-mediated communication in VTs, emphasizing conflicts, reduced satisfaction, and strategies for enhancing communication and relationships [63–67]. For example, Montoya-Weiss et al. [64] examined the effect of temporal coordination on VTs. They found that temporal coordination supported by an asynchronous communication technology alleviates the negative impact of avoidance conflict management behavior on performance. Marlow et al. [65] developed a conceptual framework of communication effects in VTs, proposing that team and task characteristics (i.e., virtuality, interdependence, and task complexity) moderate the impacts of communication (quality, frequency, and content) on team processes and outcomes. Although Opdenakker [63] has the most citations per year, its primary purpose is to review and discuss interview techniques as qualitative research methods rather than to address communication in VTs directly.

(iii) Topic 2: leadership

Topic 2 focuses on effective leadership behaviors and styles in VTs, considering the unique nature of VT work [68–72]. Cortellazzo et al. [68] conducted a systematic review of leadership in digitization, highlighting the significance of leaders' cultivation of relationships among dispersed stakeholders in digitalized work settings, including e-leadership and ethical considerations. Hoch and Kozlowski [69] investigated the impact of leadership styles on VT performance. They discovered that team virtuality alters the relationship between hierarchical leadership and VT performance, underscoring the need for structural support. Shared leadership, in contrast, positively influenced VT performance, regardless of virtuality.

(iv) Topic 3: global and cultural diversity

Topic 3 research delves into the factors and challenges of communication in global virtual teams (GVTs), including cultural diversity, geographic distance, language barriers, communication media, trust, motivation, and conflict. Some papers offer insights into how culturally diverse VTs can effectively address these issues [73–77]. Maznevski and Chudoba [73] introduced a contingency model for effective GVTs based on a case study. Their model considers factors such as task complexity, communication media, team interdependence, diversity, and member preferences, providing a framework for adapting communication strategies and media choices for effective team communication. Hinds and Mortensen [74] examined the effect of spontaneous communication on interpersonal and task conflicts in geographically distributed teams. They found positive links between geographical dispersion and both interpersonal and task conflict; these relationships were weakened by shared identity and shared context, both of which were strengthened by spontaneous communication.

(v) Topic 4: performance

Topic 4 research includes studies that explore the functioning of VTs and its predictors, as well as review articles written from a broader perspective [78–80]. For instance, Powell et al. [78] provided a comprehensive overview of existing literature on VTs and presented a set of questions for future research, organized around inputs, socioemotional processes, task processes, and outputs. Raghuram et al. [79] provided an integrative review that clustered research on virtual work into three areas: telecommuting, VTs, and computer-mediated work. They also developed a conceptual model that helps researchers compare these approaches to investigate virtuality-related issues across research clusters.

This topic also includes studies that discuss the challenges work teams faced during the COVID-19 pandemic. Chamakiotis et al. [9] pointed out that the pandemic increased the responsibility placed on VT leaders to maintain team engagement and trust. Whillans et al. [10] argued that COVID-19 forced teams to adjust how they collaborated when members could not meet in person, and they highlighted the importance of carefully planning and structuring work in VTs.

(vi) Topic 5: trust

Topic 5 articles delve into trust within VT contexts, exploring its initial assumptions, influencing factors, and dynamic roles, emphasizing technology-mediated interactions and contextual nuances [81–85]. Jarvenpaa and Leidner [81] examined trust development in GVTs, highlighting the phenomenon of "swift trust," where trust is initially assumed and then adjusted over time [113]. However, this trust tends to be fragile and short-lived. Jarvenpaa et al. [82] found that trust-building activities impacted GVT members' perceptions of their peers' ability, integrity, and benevolence. Initially, integrity perception primarily influenced trust, later giving way to a greater influence of benevolence perception. The impact of perceived ability diminished over time, while individual members' trust tendencies remained stable. This process underscores the presence of "swift" trust in team formation.

(vii) Topic 6: product design

Topic 6 focuses on enhancing knowledge management and collaboration in product design processes [86–88]. Brandt et al. [86] proposed process data warehousing for improved knowledge integration in engineering design teams using flexible ontology-based schemas. El-Diraby et al. [87] explored intelligent knowledge management systems in construction, highlighting human-based knowledge exchange through construction domain ontology development.

(viii) Topic 7: patient care

Topic 7 research emphasizes the vital role of effective teamwork, exploring technology-driven interventions and the complexities of interprofessional collaboration within healthcare teams [89–91]. Weller et al. [89] highlighted the significance of teamwork and communication in healthcare, identifying barriers such as professional silos and geographical dispersion. They proposed a seven-step solution to overcome these obstacles and enhance team communication. Block et al. [90] introduced Alive-PD, an automated behavioral intervention for diabetes, demonstrating its effectiveness in improving health indicators related to diabetes risk. This intervention shows potential for scalability and benefits to at-risk individuals, including those with prediabetes, in the United States.

(ix) Topic 8: global software development

Topic 8 research delves into challenges and solutions in global software development [92–94]. Sarker et al. [92] used border theory to investigate work-life conflict in distributed software development, linking it to turnover intentions and reduced performance. They suggested that supervisory support and agile methodologies can alleviate this conflict. Ramasubbu et al. [93] explored offshore software project productivity and quality, finding that structured software processes can mitigate challenges in offshore development, with process-based learning activities as a mediating factor.

(x) Topic 9: collaboration

Topic 9 research focuses on collaboration in VTs and examines the conditions for effective collaboration [95–97]. For example, Patel et al. [95] identified seven key factors influencing collaboration, establishing a foundational framework for a collaborative working model. Sarker and Sahay [96], employing ethnographic research on VT members in the US and Norway, observed a tendency to overestimate the effects of technology on virtual collaboration and found that separation in space and time can lead to communication breakdowns and misunderstandings.

(xi) Topic 10: knowledge sharing and transfer

Topic 10 research explores knowledge sharing and transfer in VTs and their influencing factors. Trust, task dependency, diversity, communication richness, and support tools are found to promote knowledge sharing and transfer [98–100]. For instance, Griffith et al. [98] proposed a theoretical model considering virtuality levels, communication richness, and support tools' impact on knowledge transformation, access, transfer, and tacit knowledge acquisition in VTs. Reviewing prior research, Alavi and Tiwana [99] identified challenges to knowledge integration in VTs (e.g., insufficient mutual understanding and constraints in transactive memory). They also proposed a knowledge management system approach to tackle these challenges.

(xii) Topic 11: agile development

Topic 11 articles delve into managing distributed teams and quality requirements in agile projects [101–103], driven by the growing importance of agility in global information systems development (ISD) [114]. Paasivaara et al. [101] examined Ericsson's large-scale R&D program, highlighting lessons such as the need for experimentation, step-wise implementation, specialization, and a common agile framework. Sarker and Sarker [102] explored agility in globally distributed ISD teams, identifying resource, process, and linkage agility dimensions and providing strategies and contingencies for fostering agility in ISD.

(xiii) Topic 12: construction project management

Papers belonging to topic 12 focus on improving project effectiveness in VTs [104–106]. Many of them investigated construction projects. Oraee et al. [104] reviewed studies on construction networks based on building information modeling (BIM). They analyzed 73 articles to identify collaboration barriers in BIM-based project teams and offered practical guidelines. Daim et al. [105] conducted a study to reveal factors that cause communication breakdown in project teams. Through interviews with project team members, they identified five distinct areas: trust, interpersonal relations, cultural differences, leadership, and technology.

(xiv) Topic 13: agent and robot

Topic 13 papers emphasized the increasing role of decision support systems and robots in VTs and the need to address human-robot interaction challenges [107, 108]. Reviewing prior studies, Shim et al. [107] illustrated that decision support systems had evolved to support not just individuals but also groups, including VTs. They also pointed out that VTs communicate differently than face-to-face groups due to reduced communication modalities. Murphy [108] discussed robot use in urban search and rescue, focusing on reducing human control, maintaining performance in dispersed teams, and improving acceptance.

(xv) Topic 14: creativity

Topic 14 papers explored communication dynamics and the creative process in VTs [109, 110]. For instance, Leenders et al. [109] proposed a three-factor model, emphasizing balanced communication frequency and low centralization for team creativity in new product development. Nemiro [110] identified four creative process stages and their communication methods, shedding light on creativity in VTs.

(xvi) Topic 15: social capital and knowledge sharing

Topic 15 papers relate to knowledge sharing and relationship formation within VTs, focusing on social capital [111, 112]. For instance, Robert et al. [111] showed that relational and cognitive capital had a stronger effect on knowledge integration in VTs than in teams collaborating face-to-face. Cummings and Dennis [112] examined the impact of enterprise social networking sites (ESNS) on how team members form impressions of each other in VTs. They found that information on ESNS profiles, such as education and previous project work, can create initial perceptions of social capital before team members interact directly.

4.3. RQ3: What Trends Have Emerged in These Topics?

Figure 6 is a heat map showing the yearly distribution of topics, with years on the horizontal axis and topics on the vertical axis. The color of each cell represents the topic's share of papers in that year, with darker green indicating a higher share; gray cells indicate that no papers on that topic were published in that year. Changes in color intensity along a row therefore trace how a topic's share fluctuates over time.
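The cell values behind such a heat map can be sketched as yearly topic shares. The function name and records are hypothetical, and the denominator is an assumption: the text does not say whether outlier papers count toward a year's total, so this sketch divides by topic-assigned papers only.

```python
from collections import Counter, defaultdict


def yearly_shares(records):
    """Return {year: {topic: share}} from (year, topic_label) pairs.

    Assumption: the denominator is the number of topic-assigned papers in that
    year (outliers, label -1, excluded). Topics absent from a year get no entry,
    which corresponds to a gray cell in the heat map.
    """
    per_year = defaultdict(Counter)
    for year, label in records:
        if label != -1:
            per_year[year][label] += 1
    return {year: {topic: n / sum(counts.values()) for topic, n in counts.items()}
            for year, counts in per_year.items()}


shares = yearly_shares([(2000, 0), (2000, 0), (2000, 1), (2000, -1), (2001, 1)])
print(shares[2001])  # {1: 1.0}
```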

The figure shows that topics 0 to 10 gained prominence in the late 1990s, with topic 0 (student learning and education) and topic 1 (communication) consistently having the most papers since then. Topic 2 (leadership) has recently grown in its share of total papers. Topic 3 (global and cultural diversity) emerged later, with its first paper in 2000. Topic 4 (performance) showed a significant increase in the number of papers in the 2020s, following the outbreak of COVID-19, because papers related to the pandemic are clustered into this topic. Topic 5 (trust) was heavily published in the 2000s but has declined recently. Except for topic 13 (agent and robot), topics 11 to 15 emerged in the 2000s or later. Topic 11 (agile development) had its first paper in 2009, making it the most recent addition. The emergence of topic 15 (social capital and knowledge sharing) might be attributable to influential theory papers published from the late 1990s to the early 2000s [115, 116].

Figure 7 provides a comprehensive view of topic evolution from 1995 to 2023. It divides this period into six five-year blocks (the last block, 2020–2023, spans four years) and presents the top 10 topics in each interval, ranked by the number of papers in that period. The number to the left of each topic is the topic number, and the number to the right is the paper count. Cell colors indicate changes in ranking: blue cells signify an increase in rank compared with the previous five years, with darker shades indicating a larger increase, whereas red cells indicate a decline in rank, with darker shades signifying a greater decrease.
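The block ranking and the rank comparison behind Figure 7 can be sketched as follows. The block boundaries (1995–1999 through 2020–2023) follow the text; the function names and records are hypothetical. Two details are assumptions: ties in paper counts are broken by first occurrence (the figure's tie rule is not stated), and a topic that newly enters the top 10 is marked None because the text does not say how such entries are colored.

```python
from collections import Counter


def block_start(year, first_year=1995, width=5):
    """First year of the five-year block containing `year` (1995, 2000, ..., 2020)."""
    return first_year + (year - first_year) // width * width


def top_topics(records, start, n=10):
    """Topics ranked by paper count within one block; ties keep first-seen order."""
    counts = Counter(label for year, label in records
                     if label != -1 and block_start(year) == start)
    return [topic for topic, _ in counts.most_common(n)]


def rank_changes(prev_top, curr_top):
    """Positive: moved up (blue); negative: moved down (red); None: newly entered."""
    prev_rank = {topic: rank for rank, topic in enumerate(prev_top, start=1)}
    return {topic: (prev_rank[topic] - rank if topic in prev_rank else None)
            for rank, topic in enumerate(curr_top, start=1)}


records = [(2016, 1), (2017, 1), (2018, 2), (2019, 1), (2018, 2),
           (2021, 2), (2022, 2), (2023, 1), (2021, 7), (2022, 7), (2023, 7)]
prev, curr = top_topics(records, 2015), top_topics(records, 2020)
print(prev, curr)                # [1, 2] [7, 2, 1]
print(rank_changes(prev, curr))  # {7: None, 2: 0, 1: -2}
```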

Figure 7 reveals noteworthy trends in specific topics. Topic 2 (leadership) has shown a significant upward trajectory since 2010, reflecting the growing interest in e-leadership [117] and virtual leaders [118] since the 2000s. Although topic 1 (communication) remains a prominent area of study, its number of papers declined in the late 2010s. This dip may be attributed to advances in ICTs that enable more effective communication in virtual environments; as media richness theory suggests [119], such improvements may have alleviated communication challenges in VTs. However, temporal dispersion remains a concern in GVTs, so communication continues to be a focal research topic. Topic 5 (trust) showed a substantial increase in paper numbers in the late 2000s, followed by a decline. This surge might be attributed to influential theoretical studies on trust published from the late 1990s to the mid-2000s [120–122], which invigorated trust research.

In the 2020s, topics 4 (performance), 7 (patient care), and 11 (agile development) showed significant upward shifts in their rankings. This surge in topics 4 and 7 can be attributed to the widespread adoption of remote work across various professions and the increased use of VTs in the healthcare sector, a trend accelerated by the COVID-19 pandemic [123]. The remarkable increase in research on topic 11 (agile development) may be attributable to the rapid and extensive adoption of agile methodologies in system development [124, 125].

5. Discussion

5.1. Summary of Findings

Through topic modeling, this study identified 16 distinct VT research topics using a dataset comprising abstracts from 2,184 articles published in Scopus-indexed journals, spanning approximately four decades. Notably, the top five topics, with the highest shares across the entire study period, are as follows: “student learning and education” (topic 0), “communication” (topic 1), “leadership” (topic 2), “global and cultural diversity” (topic 3), and “performance” (topic 4).

Regarding the time-series trend, we observe growing topic diversity, with an increasing number of unique topics. Among the 16 topics, roughly half emerged in the 1990s, while the others first appeared in 2000 or later. Notably, "agile development" (topic 11) is a relatively recent addition, emerging around 2010. In addition, the presence of each topic has fluctuated over time. When the data were aggregated into six five-year blocks covering the study period, certain topics consistently ranked among the top 10 throughout. These enduring topics include "student learning and education" (topic 0), "communication" (topic 1), "leadership" (topic 2), "performance" (topic 4), "trust" (topic 5), and "collaboration" (topic 9).

In the most recent block (2020–2023), there was a significant increase in the number of papers on three topics: “performance” (topic 4), “patient care” (topic 7), and “agile development” (topic 11). However, research related to the COVID-19 pandemic—belonging to “performance” (topic 4)—is expected to decline as we transition into the postpandemic era. In contrast, “agile development” will likely remain a prominent study area. The versatility of agile methodology extends beyond software development, finding applications across various industries and domains of expertise. Consequently, exploring the application of agile methods within VTs across different sectors may become a key theme in future research.

While traditional research topics like "student learning and education" (topic 0), "communication" (topic 1), "leadership" (topic 2), "trust" (topic 5), and "collaboration" (topic 9) may not experience rapid growth, they will remain pivotal areas of study. These topics, grounded in universal concepts, can be revitalized by introducing new theoretical frameworks. Infusing emerging technologies, such as artificial intelligence and the metaverse, into VT environments can spawn fresh research avenues while remaining integrated with traditional subjects, offering novel research directions [126, 127]. Moreover, while prior studies have shed light on the short-term impacts of virtual teamwork on individual attitudes and interpersonal dynamics [15], the long-term effects of VT experiences on developing interpersonal skills and social relationships remain under-researched. As we transition into the postpandemic era, research could examine the positive and negative effects of VT experiences on team members' interpersonal interactions as they return from virtual to face-to-face teams.

5.2. Strengths and Limitations

This study represents the first and most current review of VT research that employs topic modeling, affording it several notable strengths. Topic modeling enables the extraction of patterns and evolving trends from large-scale textual data. Consequently, this method reveals temporal changes and thematic developments within a voluminous corpus of literature. As an unsupervised machine learning technique, topic modeling can unearth latent patterns within substantial datasets, often not explicitly mentioned in the text [128]. Furthermore, the computational foundation of topic modeling makes it more reproducible than manual analysis. By extracting coherent topic representations grounded in the semantic similarity of words and phrases within the textual data, topic modeling mitigates the introduction of human biases in topic classification, thereby enhancing the objectivity and reliability of findings.

Another strength of this study is its adoption of the state-of-the-art topic modeling technique, BERTopic. In contrast to many prior research reviews that utilized LDA, this study leverages the superior capabilities of BERTopic. A notable limitation of LDA is its inability to grasp contextual information embedded within textual data. Conversely, BERTopic harnesses the power of a pretrained model founded on transformer architecture. This architecture empowers BERTopic to learn long-range dependencies between words while considering the context in which a word or phrase appears. By integrating contextual information, BERTopic can extract topics that are more precise and highly interpretable.

Despite its notable strengths, this study is not without limitations. First, BERTopic, while proficient at presenting each topic as a collection of weighted words, requires human intervention for topic interpretation [27, 46]. Consequently, this process relies on domain expertise and, to a certain extent, requires the analyst to review specific papers related to each topic. Thus, despite its computational foundation, BERTopic is not entirely free from manual review and human biases. Second, in this study, out of 2,184 documents, 1,485 were classified into topics, while 699 were considered outliers. Although some outliers might belong to identified topics, manual classification was not employed, in order to uphold the objectivity of topic modeling and mitigate human biases.

This study retained the default hyperparameter values in BERTopic (i.e., min_cluster_size and min_samples set to 15) to ensure both within-topic similarity and between-topic distinctiveness. Lowering these hyperparameters can, in principle, reduce the number of outliers; in this study, however, lower values did not reduce outliers significantly, even after hyperparameter tuning. In addition, given the size of the dataset (2,184 abstracts), setting these hyperparameters below 15 may lead to fragmented clustering. Therefore, although this study fine-tuned the model, it ultimately adopted the model with the default hyperparameter values.

Furthermore, this study does not provide in-depth reviews of individual topics or address controversial viewpoints, because its primary goal is to extract patterns and identify topics from extensive text data. Future research can conduct systematic or integrative reviews for the topics identified in this study. Meta-analyses can provide valuable insights for topics with competing causal interpretations or potential moderators in causal models.

The development of large language models (LLMs) is a recent revolution in natural language processing that has transformed the use of AI in various fields, including medicine, education, and business [129–131]. GPT-4, a representative LLM, performs remarkably well in creative content generation [132]. Although BERTopic typically uses SBERT for embedding, it can also use OpenAI models for embedding through OpenAI's API. However, GPT-4, although a powerful language model, is not specifically optimized for semantic similarity tasks. SBERT, in contrast, encodes sentences into fixed-length vectors, enabling semantic similarity comparisons [53].
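Because SBERT maps every text to a vector of the same length, two abstracts can be compared with a simple vector measure; cosine similarity is a common choice for sentence-transformer embeddings. The sketch below uses hypothetical three-dimensional toy vectors in place of real embeddings (all-MiniLM-L6-v2 produces 384-dimensional vectors), and the variable names are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy "embeddings"; real sentence embeddings are much longer.
trust_paper = [0.9, 0.1, 0.0]
trust_paper_2 = [0.8, 0.2, 0.1]
robot_paper = [0.0, 0.1, 0.9]
print(cosine_similarity(trust_paper, trust_paper_2) > cosine_similarity(trust_paper, robot_paper))  # True
```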

Moreover, using GPT-4 for embedding is computationally costly, and OpenAI's API imposes per-minute usage limits. Considering these strengths and weaknesses, this study employed BERTopic with an SBERT embedding model for its smart literature review. Nevertheless, integrating LLMs into topic modeling is a promising avenue: recent research by Wang et al. [133] showed that PromptTopic, a topic modeling approach that leverages LLMs, achieved performance comparable to BERTopic. Future smart literature reviews could employ LLM-based methods alongside BERTopic.

6. Conclusion

This study applied BERTopic, a state-of-the-art topic modeling technique, to a dataset of 2,184 papers published over the past 40 years in Scopus-indexed journals. It identified 16 topics, analyzed their shares and temporal trends, and conducted selective reviews of each topic. Although this study offers a comprehensive overview of VT research trends, it has limitations. Because BERTopic relies on pretrained language models, it could benefit from exposure to additional data and from fine-tuning, which may improve accuracy and yield more meaningful topics. As the VT literature grows, future research can leverage an enhanced BERTopic for more precise and insightful analyses.

Data Availability

The data that support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the JSPS KAKENHI (Grant Number 23K01544).