Abstract
Both inter- and intraorganizational networks draw the attention of researchers and practitioners from various disciplines who view them as the fabric of the socioeconomic world. The network perspective is believed to successfully model most of the socioeconomic phenomena, which, in combination with the prospects of continuously advancing tools for automated data mining and machine learning, gives a tempting promise to effectively forecast socioeconomic events occurring in our societies and businesses. Despite their significance, the topic of event forecasting in the context of organizational networks appears unexplored. Therefore, the objective of this study was (1) to fill the theoretical gap by proposing a mathematical model for organizational network event forecasting, rooted in social science to remain consistent with the theory, and (2) to experimentally evaluate how the model performs on real data and validate whether the results support its use in practical applications. An implementation of the proposed model, based on a decision tree classifier, achieved a prediction accuracy of 87% on a longitudinal data sample and thus demonstrated the practical usability of the model.
1. Introduction
Both inter- and intraorganizational networks draw the attention of researchers from various disciplines who view them as the fabric of the socioeconomic world. According to Moliterno and Mahony [1], the interorganizational network perspective translates a fragment of the economy to a graph comprising a group of organizations (nodes) interconnected with each other by numerous relationships (ties). It tends to omit the internals of the organizations and treat them as black boxes, with only the relationships and a selection of variables visible to an observer. The internals—lower-level organizational units (e.g., departments, groups, divisions)—become visible in the subsequent intraorganizational level of analysis, which is further divisible to the individual, and potentially other following levels. Some studies focus solely on one level of analysis—interorganizational [2] or intraorganizational [3]—while a significantly smaller niche of a multilevel perspective that aims at projecting a full, holistic picture of a focal network is also discernible [1].
Regardless of the level of analysis, researchers have successfully embraced the network approach to identify and describe countless phenomena, e.g., competition in interorganizational networks [4], social capital and value creation in intraorganizational networks [5], and knowledge dissemination in inter- and intraorganizational networks [6], to name a few. Interestingly, even with a long-lasting research focus dating back to 1970 and popularity manifested by 3,200 results in the Scopus database, event forecasting remains an uncovered aspect of the organizational network theme. Existing quantitative research on organizational networks focuses mainly on exploring causal relationships between selected variables during a bounded period [2, 7, 8].
Event forecasting is a popular topic in the fields of knowledge discovery and data mining, including the subarea of social network analysis. Several researchers constructed and successfully applied predictive models that identified patterns in social networks and forecasted subsequent discrete events with satisfactory performance [9–11]. The results found use in various interesting and valuable applications, e.g., crime event prediction [11], social unrest prediction [9], and stock event prediction [10], to name a few. The conceptual models used in the research were based on the clear definition of a social network, in which nodes represent individuals and edges represent social relations between them. However, organizational networks, which are inherently more complex systems, are still missing a similar conceptual model for event prediction that could support managerial processes in organizational network contexts, e.g., governmental [12], innovative [7], or knowledge-oriented [3]. The applications for event forecasting in organizational network contexts seem to be as abundant as their social network counterparts.
Therefore, the objective of this study is twofold: First, to fill the theoretical gap in the research on organizational networks by proposing a mathematical model for organizational network event forecasting, rooted in social science to remain consistent with the theory, and second, to experimentally evaluate how the model performs on real data and validate if the results support its use in practical applications.
The proposed model was built on top of a holistic definition of organizational networks (i.e., multilevel multimodal organizational networks) and interactive events [13]. An important factor considered during the model’s development was to enable the use of advanced pattern recognition techniques, especially machine learning algorithms, which are known in many domains for their performance on sophisticated data [14]. To achieve this, the mathematical model was described in the form of a composite function that translates an organizational network to a discrete dynamical system, whose components, performing consecutive prediction steps, can easily be substituted with more advanced ones in future iterations.
The experiment used a longitudinal data sample collected from Twitter, which comprised interactive events occurring in a real organizational network in the course of 11 years. The data was split into two samples—training and testing—at an elected point in time, to simulate a real situation in which a user has observed the past (training set) and will experience the future (test set). The event forecasting model was implemented as a Python script, which used the scikit-learn library for machine learning tasks, and a custom implementation of feature selection (i.e., an event windowing algorithm). The model was trained and validated with the split data to measure the accuracy of predictions and test the H1 hypothesis.
H1. There is a positive correlation between sequences of past and future interactive events occurring in organizational networks. Therefore, a predictor function F can be found that accepts past interactive events and produces future interactive events (predictions). Correlation C of the function, expressed as the ratio of correctly predicted future events to all predicted events, is significantly higher than 0.5 (whereas C = 0.5 means there is an equilibrium of correct and incorrect predictions).
The article is structured as follows. Section 2 reviews the literature used for the development of the conceptual model of organizational network event forecasting. Section 3 presents the model itself. Section 4 discusses in detail the methodology of the experiment, and Section 5 presents the results. Finally, Section 6 discusses the results and summarizes the research.
2. Literature Review
2.1. Dynamic Interaction Graphs
Dynamic interaction graphs are a relatively new concept in social studies, although static interaction graphs have been used to model social networks (among others) in plentiful studies [15]. As opposed to static graphs which capture aggregated and/or interpreted relations between interacting network nodes, edges in dynamic interaction graphs represent individual interactive events, rendering the resulting network a discrete dynamical system that evolves over time [16]. The literature discussing applications of dynamic interaction graphs in the context of organizational networks was found to be scarce to nonexistent.
Formally, the static interaction graph can be understood as $G = (V, E)$, where $V$ represents nodes and $E$ represents edges. On the other hand, the dynamic interaction graph contains a time variable $t$ and thus can be described by discrete snapshots $G_t = (V_t, E_t)$ [17].
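The distinction can be illustrated with a minimal Python sketch (node names and timestamps are made up, not taken from the study): a dynamic interaction graph keeps every timestamped interaction, from which both discrete snapshots and a static aggregate can be derived.

```python
# Dynamic interaction graph: a list of timestamped edges (source, target, time).
# Node names and timestamps are illustrative only.
dynamic_graph = [
    ("A", "B", 1),
    ("A", "B", 2),
    ("B", "C", 2),
    ("A", "C", 3),
]

def snapshot(edges, t):
    """Discrete snapshot G_t = (V_t, E_t): the interactions active at time t."""
    return [e for e in edges if e[2] == t]

def static_projection(edges):
    """Static interaction graph G = (V, E): aggregated, deduplicated relations."""
    return sorted({(src, dst) for src, dst, _ in edges})

print(snapshot(dynamic_graph, 2))        # two interactions at t = 2
print(static_projection(dynamic_graph))  # three aggregated ties
```

Note how the static projection collapses the two "A"–"B" interactions into one tie, losing the temporal information that the dynamic representation preserves.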
2.2. Multimodal (Complex) Networks
The literature regarding social network analysis defines the concept of multimodal networks (or complex networks) as graph structures comprising interconnected nodes of multiple types [18, 19]. As opposed to one-mode networks (e.g., social networks which encompass only humans) where nodes are homogeneous, they allow capturing any kind of representational nodes in one network projection. For example, Krackhardt and Carley discussed an organizational network structure of individuals, tasks, and resources—a three-mode network projection of an organization that improved its observability [20]. The conceptualization of multimodal networks does appear in the context of organizational networks but is typically limited to two-mode networks (i.e., individuals and organizations), especially in the management literature [21].
2.3. Event Forecasting
The topic of event detection and forecasting using digital data sources has been covered with a variety of interesting studies in the data mining and knowledge discovery areas, which discussed approaches to predicting cases of social unrest [9, 22, 23], stock market movements [24], and election results [25], among others. Notably, most of the studies found during the literature review correlated determinant variables with patterns identified in processed social media posts using both general-use and dedicated models. For instance, Ning et al. [22] analyzed sequences of news articles as precursors leading to categorized events—protests. Comparably, Zhao et al. [23] analyzed tweets as precursors leading to events of social unrest. In both works, the predicted variables (events) could be characterized as exogenous to the precursor variables. An opposite approach, in which the predicted event variables were endogenous to the precursor event variables, was presented by Laxman et al. [26]. Their generative model, based on hidden Markov models, operated on a finite alphabet of possible event types and predicted a target event type from the provided windows of event sequences (event streams). Notably, in this case, both the precursor (input) and predicted (output) variables were event variables, which renders it an endogenous model. The comparison between exogenous and endogenous forecast variables is summarized in Figure 1.

[Figure 1: panels (a) and (b), comparing exogenous and endogenous forecast variables.]
2.4. Organizational Network Mapping
No articles discussing event forecasting using digital data sources in the context of organizational networks were found. Reviewed papers did not attempt to structure input data (i.e., social media posts or news articles) in any kind of organizational representation. However, a broader scope of the search for studies mapping organizational structures using digital data sources revealed a few recent, interesting articles. Dong and Rim [27] used social network analysis in their exploratory research to map the communication of nonprofit businesses that led to the identification of partnerships between them. Their methodology was based on Shumate and Contractor’s concepts of a representational network and a flow network [28]. The former—representational network—infers a relationship between two organizational network nodes from messages they broadcast to other network nodes, informing them about the relationship. In other words, a link appears between the nodes when they announce it to the public (so a family tree could fall into this category). Conversely, the concept of a flow network pinpointed by Shumate and Contractor infers a relationship between the two nodes from exchanges and transmissions of information, messages, and resources between them. In this case, the link appears when there is an identified flow; e.g., they proceed with a transaction, chat with each other, follow each other on social media, reshare posts, etc., without a need for public acknowledgment. Notably, both approaches lead to qualitatively different projections of the organizational network, and none of them seems to predominate over the other. To illustrate, the flow network maps direct interactions between nodes, which makes it arguably more precise than the representational network.
On the other hand, the need for an explicit message from involved participants indicating a relationship, which the representational network approach imposes, can reduce noise but can also leave out some relationships.
The flow network approach was also exploited by Wang and Guan [29] who, similarly to Dong and Rim, analyzed Twitter posts to extract following relationships among focal organizations represented by their social media profiles. As a result, the authors were able to present a projection of the analyzed organizational network and draw conclusions about the cross-sector structure of intergovernmental and international nongovernmental organizations.
3. Mathematical Model
To fill the gap found in the literature review, the proposed conceptual framework of organizational network event forecasting is built on the foundation of synthesized theories of (1) dynamic interaction graphs, (2) multimodal organizational networks, (3) event forecasting, and (4) organizational network mapping.
Following the multilevel and multimodal network theory [21], an organizational network is defined as a directed graph of taxonomically unconstrained nodes and ties. This notion assumes that any identified phenomenon along with relations to other phenomena can be translated into a labeled node linked to other labeled nodes with directed, labeled ties [30]. Notably, the flexible structure allows capturing the phenomenon of an organization itself in the form of an additional node connected to other nodes comprising the real organization. For instance, an organization consisting of various people, resources, and other intangible assets could be translated into a graph, in which all these elements tie to a node representing the organization and to each other, as in Figure 2. A similar network projection (although comprising a network of individuals, tasks, and resources) was presented and discussed by Krackhardt and Carley in their endeavor to reason about complex organizations with a series of hypotheses that can be empirically tested [20].

Such a flat network model is different from “discrete-level” approaches to multilevel organizational networks, which view an interorganizational network as one graph, and intraorganizational networks as hidden inside the former’s nodes [31]. The flat structure along with the unconstrained typology of nodes and ties provides a simple, yet rich vocabulary for expressing socioeconomic phenomena as nouns (nodes) and verbs (edges). Notably, since the typology is not dimensionally restricted, the model can capture both a representational network (in which ties are communicated by nodes themselves) and a flow network (in which ties reflect real flows between nodes) by having two different sets (dimensions) of ties [28]. The cross-analysis between tie dimensions is beyond the scope of the current research, but it opens an interesting research agenda.
Formally, the multimodal organizational network is described as a graph $G = (V, E)$ where nodes are multimodal, or more specifically, each node $v \in V$ is assigned a type $c(v) \in C$, where $C$ is the set of node types (modes).
Events, following Provan et al.’s definition of interactive events occurring in organizational networks [13], are defined as discrete, labeled interactions between organizational network nodes with defined timestamps that determine their temporal location. Analogously to models of individual (social) networks, but with a node being any modeled phenomenon and an edge being an interaction between nodes occurring at a specific point in time $t$, the organizational network can be viewed as a discrete dynamical system represented by the dynamic interaction graph $G_t = (V, E_t)$ in (2) [17].
Note that interactions (edges) are volatile—they exist only at time $t$. Since each edge refers to a single interaction, the graph can be expressed as a sequence of the interactive events $E = (e_1, e_2, \ldots, e_n)$, where $e_n = (o_n, d_n, i_n, t_n)$ is the $n$th four-element vector containing the origin node (the interaction’s initiator) $o_n \in V$, the target node $d_n \in V$, the interactive event type $i_n \in I$ (with $I$ being a set of interactive event types), and a timestamp $t_n \in T$ (with $T$ being a set of all timestamps).
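A minimal Python sketch of such an event sequence, with hypothetical node names, event types, and timestamps:

```python
from collections import namedtuple

# Four-element interactive event vector: origin node, target node, event type,
# and timestamp. All values below are hypothetical illustrations.
Event = namedtuple("Event", ["origin", "target", "etype", "time"])

E = [
    Event("orgA", "orgB", "mention", 10),
    Event("orgB", "orgA", "retweet", 12),
    Event("orgA", "orgC", "mention", 15),
]

# The sequence is ordered by timestamp, as the model requires.
assert all(E[k].time <= E[k + 1].time for k in range(len(E) - 1))

# The sets of interactive event types (I) and timestamps (T) are recoverable
# from the sequence itself.
event_types = {e.etype for e in E}
timestamps = {e.time for e in E}
print(event_types)
print(timestamps)
```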
Event mining is defined as a process of data mining oriented at mapping the real organizational network into its digital representation (i.e., the interactive event sequence) using data from diverse data sources (e.g., web pages, social media content, accounting books, correspondence, transcriptions of conversations, etc.).
Event forecasting is defined as the task of predicting consecutive (future) events from a sequence of preceding (past) events, or more formally as the predictor function $F(e_n, e_{n-1}, \ldots, e_1) = e_{n+1}$ in (5) [32], where $e_{n+1}$ is a future interactive event, $e_n$ is the most recent interactive event, and $e_1$ is the oldest interactive event in observable history. Its objective is to predict a label or multiple labels defining a linking event, an origin node, and a destination node, in a defined time range or at a time point, depending on the implementation. In other words, event forecasting is defined as a function that accepts a sequence of events and produces a subsequent event. Additionally, the results can be extended with probability estimates defining how likely each of the predicted events is to materialize and how reliable the estimate is, according to internal metrics. The event forecasting concept is presented visually in Figure 3.

4. Methodology
The experiment was designed to test the H1 hypothesis formulated in the introduction and, if supported with results, present a successful application of the proposed framework for organizational network event forecasting, experimentally evaluated on real data.
4.1. Event Mining
Drawing upon the approach used by Dong and Rim [27] as well as by Wang and Guan [29], the Twitter API was used to extract evidence for building a graph rendition of an organizational network using the representational network or flow network framework. Tweets collected with the API were processed to extract a tweet’s author (the interaction’s initiator node), all other users and hashtags mentioned by the author in the tweet (the interaction’s targets), and an absolute timestamp determining the time point when the tweet was published (Figure 4).

[Figure 4: panels (a) and (b), interactive events extracted from tweets.]
Notably, the multimodality of the resulting organizational network representation was manifested by the fact that nodes included animate actors (individual user profiles), organizations (business user profiles), resources (hashtags relating to, e.g., gaming consoles), and other socioeconomic phenomena (hashtags relating to, e.g., game brands, emotions, or general concepts). A relation implied from a tweet published by a user mentioning another user or a hashtag was deemed representational [28] since it had been announced by the publishing user. On the other hand, a relation implied from retweeted content or from a reply was regarded as a flow, resulting in the two-dimensionality of the graph’s relations. The interactions themselves were binary—their detailed classification was outside of this study’s scope, but similar tasks have been exercised by other researchers.
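The extraction step can be sketched as follows; the tweet structure and field names below are simplified assumptions for illustration, not the exact Twitter API payload used in the study:

```python
def mine_events(tweet):
    """Sketch of the event-mining step: one tweet yields one interactive event
    per mentioned user or hashtag (representational ties) and one per retweet
    (flow tie). The tweet dict is a hypothetical, simplified structure."""
    origin = tweet["author"]
    events = []
    for user in tweet["mentioned_users"]:
        # Mentions announced by the author: representational ties.
        events.append((origin, user, "representational", tweet["timestamp"]))
    for tag in tweet["hashtags"]:
        events.append((origin, tag, "representational", tweet["timestamp"]))
    if tweet.get("retweet_of"):
        # Retweets reflect an actual flow of content: flow ties.
        events.append((origin, tweet["retweet_of"], "flow", tweet["timestamp"]))
    return events

tweet = {
    "author": "@teamX",
    "mentioned_users": ["@playerY"],
    "hashtags": ["#esports"],
    "retweet_of": "@orgZ",
    "timestamp": 1600000000,
}
print(mine_events(tweet))
```

The sketch keeps the two tie dimensions (representational vs. flow) as event labels, matching the two-dimensionality of relations described above.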
4.2. Event Forecasting
Given the data points resulting from the event mining, which contained binary interactive events between the organizational network’s nodes in time, the objective of the event forecasting was to predict future interactions in a time horizon. In other words, the implementation of the organizational network event forecasting aimed to answer the question: “will Node X interact with Node Y in the next T weeks?”.
The implementation was a Python application based on the scikit-learn package, which executed a machine learning pipeline consisting of several steps described in the following subsections. The codebase was published on GitHub and was accessible via https://github.com/PiotrSliwa/preludium17 on November 29, 2021.
4.3. Data Input
After the data points were extracted, transformed, and inserted into a database by the event mining task, the application grouped them by origin nodes to form a set of event sequences local to individual origin nodes (they could be referred to as ego-networks of the origin nodes), as presented in Figure 5. As a result, the input data could be quickly queried to find all events of a particular origin node. Such a transformation was needed by the designed predictive model (discussed in the next section), which accepted samples of individual origin nodes’ event sequences, labeled with binary information on whether they preceded target events of the same origin nodes.
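The grouping step can be sketched as follows, assuming events are stored as (origin, target, type, timestamp) tuples with hypothetical values:

```python
from collections import defaultdict

def group_by_origin(events):
    """Group a global event stream into per-origin-node event sequences
    (the 'ego-networks' of the origin nodes)."""
    grouped = defaultdict(list)
    for origin, target, etype, time in events:
        grouped[origin].append((origin, target, etype, time))
    return dict(grouped)

# Hypothetical events: (origin, target, type, timestamp).
events = [
    ("A", "B", "mention", 1),
    ("B", "C", "mention", 2),
    ("A", "C", "mention", 3),
]
grouped = group_by_origin(events)
print(sorted(grouped))    # the origin nodes
print(len(grouped["A"]))  # events local to node A
```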

4.4. Training and Test Data Split
The development of a supervised machine learning predictive model usually consists of two stages: (1) training, during which a predictor (a regressor or a classifier) is fed with a training data sample (input and expected output), and (2) testing, during which the trained predictor’s performance is verified against the test data sample by feeding it with the input data, and then comparing the calculated output with the real value from the test dataset. By this, one can measure how many correct and incorrect predictions the predictor made and how biased the mistakes were (in terms of false positives, false negatives, etc.).
The data split into the training and test data samples was done by dividing the event sequence of an individual origin node into two parts at the highest event distribution point (to balance the two samples): $E_{\mathrm{train}} = (e_1, \ldots, e_k)$ and $E_{\mathrm{test}} = (e_{k+1}, \ldots, e_n)$, where $n$ is the number of interactive events collected in the set $E$; therefore, $E_{\mathrm{train}} \cup E_{\mathrm{test}} = E$. A distribution point (or a “bin”) at a given moment $t$ is defined as the number of origin nodes whose individual event sequences contain at least one event with timestamp $t_m = t$ (mind the sequence $E$ being ordered, so $t_1 \le t_2 \le \cdots \le t_n$).
Events with timestamps lower than the defined time point became the training dataset, while all events that happened after the timestamp became the test dataset (see Figure 6). This approach was chosen in favor of the traditional 20:80 split to simulate a real scenario, in which a user of the event forecasting instrument wants to forecast future events at a specific point of time (his or her “presence” at the time). The defined “splitting point” served as the hypothetical user’s “presence”. The predictor (discussed in the next section), trained on the training event sequences (the user’s “past”), was then validated on the test event sequences (the user’s “future”) by comparing its guesses with the actual ones.
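A minimal sketch of this time-based split, assuming events are (origin, target, type, timestamp) tuples with hypothetical values:

```python
def time_split(events, split_time):
    """Split an ordered event sequence into training (past) and test (future)
    samples at the chosen splitting point (the hypothetical user's 'presence').
    Events before the splitting point are training data; the rest are test data."""
    train = [e for e in events if e[3] < split_time]
    test = [e for e in events if e[3] >= split_time]
    return train, test

# Hypothetical events: (origin, target, type, timestamp).
events = [
    ("A", "B", "mention", 1),
    ("A", "C", "mention", 5),
    ("A", "B", "mention", 9),
]
train, test = time_split(events, split_time=5)
print(len(train), len(test))
```

Unlike a random split, this preserves temporal order: the model never trains on events that happen after the simulated "presence".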

4.5. Predictor: The Machine Learning Model
The predictor for the event forecasting task used a decision tree classifier from the scikit-learn package (refer to the attached source code for details) due to its known best average performance [33]. The classifier’s implementation can be trained with pairs of input–output $(x, y)$, where $x = (f_1, f_2, \ldots, f_m)$ is the input vector of features $f_k$, and $y$ is the output (an element from the set of interactive event types $I$). The classifier can be defined with a function $M(x) = y$.
The input feature vectors, according to the requirements of many machine learning algorithms, are fixed-length vectors in which a specific position represents an individual feature, and the value represents the feature’s intensity reflected in a floating-point number [34]. In the experiment, the target value $y$ was an integer representing the target class, which, in this case, was a binary value referring to the existence (1) or nonexistence (0) of a certain organizational network interaction, initiated by an origin node $o$ at time $t$, between it and a destination node $d$. Once trained, the classifier is expected to predict the existence/nonexistence of the relationship given unseen feature vectors.
However, it was first necessary to transform the input data—the event stream $E$ resulting from the event mining—into the feature vectors acceptable by the classifier, as described by the transformation $\Phi(E) = \{(x, y)\}$. Therefore, a procedure herein called event stream windowing, described by the transformation $w(E) = \{E^{+}, E^{-}\}$, was used to split the dataset into “positive” ($E^{+}$) and “negative” ($E^{-}$) event sequences, whereas “positive” means the ones preceding an interaction, initiated by an origin node $o$, between it and a target node $d$, given a timespan window of $\Delta t$. The procedure drew from the event forecasting algorithm designed by Laxman et al. [26].
Like Laxman et al.’s model, the proposed model accepts an event sequence, a target event type, and a window size as input, and labels subsequences of events as “positive” or “negative”, depending on whether a subsequence leads to the target event type provided as the input. Different from Laxman et al.’s algorithm, though, predictions in the proposed algorithm are made by an interchangeable machine learning classifier instead of the standard frequent episode discovery algorithm, to satisfy the requirement formulated in the introduction. Furthermore, event stream windowing slices event streams grouped by origin nodes, instead of a single, global event sequence (a grouped event stream includes only events of a certain origin node), to mitigate the impact of potentially independent events on the predictions. It iterates over the list of grouped event streams and for each of them (a) finds all occurrences of the target event type, (b) cuts out event sequences of the declared window size which end with the target event type and labels them “positive”, and (c) cuts out event sequences of the declared window size from the remaining event sequence and labels them “negative”. The algorithm (summarized in Figure 7) could be characterized as a member of the sliding window discrete methods (Figure 8), which are known to effectively extract information from unbounded, continuously generated sequences of data, thanks to their adaptive features [35].
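The windowing steps (a)–(c) can be sketched as a minimal Python function; the window is measured in a number of events here for simplicity (the experiment used a timespan in weeks), and all names and events are illustrative:

```python
def window_events(sequence, target_node, window):
    """Event stream windowing (simplified sketch): label subsequences of one
    origin node's event sequence as 'positive' (they precede an interaction
    with target_node) or 'negative' (cut from the remaining events)."""
    positives, negatives = [], []
    consumed = set()
    # (a) find occurrences of the target interaction; (b) cut preceding windows.
    for idx, (_, target, _, _) in enumerate(sequence):
        if target == target_node:
            start = max(0, idx - window)
            positives.append(sequence[start:idx])
            consumed.update(range(start, idx + 1))
    # (c) cut negative windows from the remaining event sequence.
    remaining = [e for k, e in enumerate(sequence) if k not in consumed]
    for start in range(0, len(remaining) - window + 1, window):
        negatives.append(remaining[start:start + window])
    return positives, negatives

# Hypothetical event sequence of a single origin node "A".
events = [
    ("A", "u1", "i", 1),
    ("A", "u2", "i", 2),
    ("A", "T", "i", 3),
    ("A", "u1", "i", 4),
    ("A", "u2", "i", 5),
    ("A", "u1", "i", 6),
]
positives, negatives = window_events(events, "T", window=2)
print(len(positives), len(negatives))
```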


The positive and negative event sequences carved out of the event stream $E$ were then transformed into feature vectors using a vectorizing function $v$. The strategy used in this research was a counting vectorizer—a bag-of-words model of object categorization [36]. It transforms an event sequence $E_j$ into a vector $x_j$, in which each position $k$ is related to an interactive event, and the value of $x_j[k]$ represents the number of times the event occurs in the event sequence $E_j$.
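A counting vectorizer along these lines can be sketched in a few lines of Python; since event types are binary in the experiment, the sketch counts event targets instead, and the vocabulary and events are illustrative:

```python
def count_vectorize(sequence, vocabulary):
    """Counting vectorizer: position k of the output counts occurrences of
    vocabulary[k] among the event targets of the sequence (a simplified
    bag-of-words stand-in for the strategy described in the text)."""
    index = {term: k for k, term in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for _, target, _, _ in sequence:
        if target in index:
            counts[index[target]] += 1
    return counts

# Hypothetical event sequence: (origin, target, type, timestamp).
sequence = [("A", "@B", "i", 1), ("A", "#tag", "i", 2), ("A", "@B", "i", 3)]
print(count_vectorize(sequence, ["@B", "#tag", "@Z"]))
```

The fixed vocabulary guarantees fixed-length vectors, which is the classifier's input requirement noted above.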
As a result, the decision tree classifier was trained with pairs of (1) feature vectors $x$ and (2) the corresponding expected output $y$, as presented in Figure 9. At this point, the transformation function had become a composite of the event sequence windowing algorithm and the counting vectorizer, $\Phi = v \circ w$, and the predictor function (defined in hypothesis H1) a composite of the transformation function and the classifier, $F = M \circ \Phi$. Therefore, the predictor function could be defined as $F(E) = M(v(w(E)))$.
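The composition can be illustrated with a dependency-free sketch, in which a trivial threshold rule stands in for the scikit-learn decision tree and all names and events are hypothetical:

```python
# Sketch of the composite predictor F = M(v(w(E))). A trivial threshold rule
# replaces the decision tree so the sketch stays dependency-free; per the
# model's design, M is interchangeable with any trained classifier.
def w(sequence, window=2):
    """Event windowing: keep the last `window` events (simplified)."""
    return sequence[-window:]

def v(sequence, vocabulary):
    """Counting vectorizer over event targets."""
    return [sum(1 for _, target, _, _ in sequence if target == term)
            for term in vocabulary]

def M(x):
    """Stub classifier: predict interaction (1) if any feature is nonzero."""
    return 1 if any(x) else 0

def F(sequence, vocabulary):
    """Composite predictor function F = M . v . w."""
    return M(v(w(sequence), vocabulary))

events = [("A", "@B", "i", 1), ("A", "#x", "i", 2), ("A", "@B", "i", 3)]
print(F(events, ["@B"]))  # the recent window contains @B
print(F(events, ["@Z"]))  # @Z never appears
```

Because each stage is a plain function, any of them can be swapped for a more advanced implementation, which is exactly the substitutability requirement formulated in the introduction.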

Since the interactive event type was assumed binary in the research, the event stream windowing algorithm always looked for the “existence” of the interaction between nodes $o$ and $d$. Thus, the interactive event type was constant, $i = \mathrm{const}$.
5. Results
5.1. Input Data
A total of 2,666,281 tweets from 200 Twitter profiles of eSports stakeholders (teams, players, and influencers) were collected. The tweets covered a period of over 11 years—the first tweet was published on 15 February 2008 (01:46:46 CEST) and the last on 7 November 2020 (21:57:55 CEST). They were processed to extract a tweet’s author (origin node), all other users and hashtags mentioned by the author in the tweet (destination nodes), and an absolute timestamp determining the time point when the tweet was published. This step resulted in 3,702,773 data points describing interactive events (as depicted in Figure 4) used as the input to the predictor (see Figure 9).
5.2. Training and Test Datasets
The splitting point dividing the input data set into training and test samples (see Figure 6) was set at the moment of the highest distribution of origin nodes, to cover the most collected profiles in the research (a profile needed both a training and a test set to be included in the research). The moment of the highest distribution of origin nodes in the collected data was found to be 25 April 2019 (23:32:17 CEST), which covered data points from tweets published by 181 profiles. At this point, 2,978,351 data points belonged to the training set (they were extracted from tweets published before the splitting time point), and 724,422 belonged to the test set. Consequently, the experiment simulated a scenario in which a user is forecasting network events on 25 April 2019 (23:32:17 CEST). Predicted events were then evaluated using the test data set, which contained the events that factually occurred.
5.3. Forecasting Performance
The prediction pipeline was fed with the input data, first to train, and then to evaluate the predictor for each permutation of its parameters—the interaction’s origin node, destination node, and window size. The list of target nodes selected for the research included the 1000 most popular destination nodes (the most frequent destination nodes in the data points), while the window size was arbitrarily chosen to be 8, 16, or 24 weeks (2, 4, or 6 months), which produced 3000 permutations; therefore, the train and test phases were executed that many times. Each of the iterations resulted in a performance summary that included:
(1) Destination node (e.g., @FaZeClan)—a unique identifier of a destination node provided to the predictive model as a parameter. Once the model is trained, it should be able to answer the question: “Given the event sequence of an origin node, will it interact with the destination node?”. Note that each of the iterations performed tests for all origin nodes (i.e., all 181 Twitter profiles).
(2) Window size (e.g., 16 weeks)—a period in which the forecasted interaction is expected to occur, provided to the predictive model as a parameter. It narrows down the above question to: “Given the event sequence of an origin node, will it interact with the destination node in the next N weeks?”.
(3) Prediction accuracy (e.g., 0.87)—the ratio of correctly predicted future interactions to all predictions. The model was first trained with the training data set and then validated with the test data set by feeding it with the input data and comparing predictions with the actual ones from the test data set. In other words, after training the model, it was asked the above question, and if the answer was congruent with reality (the actual existence/nonexistence of the interaction corresponded with the guess), it was deemed correct.
(4) Test datasets (e.g., 1234)—the number of test data sets in the split. The greater the number, the more reliable the accuracy was deemed, since it had been validated in more test scenarios.
(5) Dataset’s output class ratio (e.g., 0.47)—the number of positive output values divided by the total number of output values in the dataset.
In the perfect case, the ratio was expected to be 0.5 because it meant there was the same amount of positive and negative target values in the test data set. Conversely, the imaginary worst case would be with the ratio equal to 0.0 or 1.0, as it would mean there would be only positive or negative target values, and the model could reach the perfect accuracy simply by giving a fixed answer. The issue of the ratio reaching one of the extremes is known in the literature as the class imbalance problem [37].
All 3000 iterations were aggregated to determine the average prediction accuracy of the developed predictive model in the given data set. Additionally, the average (mean) accuracy and its standard deviation were extended with the calculated averages (means) and standard deviations of the number of training/test data sets along with their target value ratio for reference (see Table 1).
The overall prediction accuracy of 0.93 was excellent given the simple machine learning classifier (decision tree) and vectorizing strategy (counting vectorizer). However, it is important to note the high level of class imbalance in the test data sets, which can hinder the reliability of the accuracy metric. Indeed, there were several iterations (see Table 2) that reached nearly perfect 1.0 accuracy. A closer look revealed that, in those iterations, there were no positive or no negative target values in the test data set (and a comparably small number in the training data sets). It meant that the predictive model could have “cheated” by always responding with the only output class (positive or negative) present in the sample.
Such unreliable metrics seemed to artificially inflate the average accuracy of the predictive model. To mitigate this, the results were filtered by the number of test data sets and their output value (class) ratio. The former threshold was set to a minimum of 200 (iterations without a satisfactory number of test data sets were rejected), whereas the latter was set to the range 0.4–0.6 (iterations with imbalanced classes in the test data were rejected). This quality threshold let 237 of the original 3000 iterations pass; they are aggregated in Table 3.
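The filtering step described above amounts to a simple predicate over the per-iteration summaries. A minimal sketch with made-up records (only the thresholds match the ones in the text):

```python
# Illustrative iteration summaries; the real run produced 3000 of them.
summaries = [
    {"accuracy": 1.00, "test_datasets": 12,  "class_ratio": 0.02},  # unreliable
    {"accuracy": 0.87, "test_datasets": 350, "class_ratio": 0.47},
    {"accuracy": 0.91, "test_datasets": 420, "class_ratio": 0.55},
]

# Quality threshold: at least 200 test data sets and a class ratio in 0.4-0.6.
reliable = [s for s in summaries
            if s["test_datasets"] >= 200 and 0.4 <= s["class_ratio"] <= 0.6]

mean_accuracy = sum(s["accuracy"] for s in reliable) / len(reliable)
print(len(reliable), round(mean_accuracy, 2))  # 2 0.89
```

Note that the first record, despite its perfect accuracy, is discarded: it fails both the sample-size and the class-balance criterion.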
The average accuracy of the predictive model based on the 237 most reliable iterations was found to be approximately 0.87, with a standard deviation of 0.08 and rather insignificant differences across window sizes. The prediction accuracy achieved by the developed predictive model was thus much higher than random guessing. The previously defined correlation metric, computed from these results, supports the H1 hypothesis.
6. Discussion and Conclusions
6.1. Summary and Findings
The article introduced a mathematical model of organizational network event forecasting that treats organizational networks as discrete dynamical systems and synthesizes the theories of (1) dynamic interaction graphs, (2) multimodal organizational networks, (3) organizational network mapping, and (4) event forecasting. It then presented an experimental evaluation of the proposed model, which resulted in approximately 87% correct guesses on a real longitudinal (covering 11 years) data sample. The results supported the H1 hypothesis stated in the introduction: the ratio of correctly predicted future events to all predictions made with an implementation of the model was significantly higher than expected from random guessing, indicating a positive correlation between sequences of past and future interactive events occurring in organizational networks. The proposed organizational network event forecasting model therefore demonstrated practical usability for event forecasting tasks in network contexts.
6.2. Limitations
Notably, the components of the model used in the experiment—the event sequence windowing and counting vectorizer algorithms as well as the decision-tree-based predictor—were chosen based on theoretical reasoning. A comparative study of different algorithms was outside the scope of this research but presents an interesting case for future investigation. Apart from the simplified algorithms used in the predictive model, a limitation of the study was the reduced organizational structure of the nodes. Namely, the research did not scrutinize the impact of various clusterings—grouping nodes together into arbitrary or otherwise determined organizations, which the flat structure of the model allows—on the model's performance. It is hypothesized that such a procedure should improve the performance if the clusters accurately reflect real organizations and decrease it otherwise. This feature, if proven correct, could be used by researchers, for example, to (1) determine the boundaries of real organizations or (2) forecast network events of arbitrary organizations (segments, sectors, industries, countries, etc.).
Another limitation of this study was the single dimensionality of nodes and edges used in the experiment. As mentioned before, the proposed mathematical model gives freedom in defining multiple dimensions of both nodes and edges, and an analysis of dependencies between selected dimensions (e.g., dimensions of flow network and representational network) makes an interesting case for future work.
Moreover, a prospective continuation of the research, which was not in the scope of this one, is evaluating the model on different, more diverse data sets aggregated from multiple data sources. It would be particularly interesting to see how the model handles nonbinary interactions, or how data from diverse sources can be aggregated and translated into event streams. Even though the possibilities seem countless at this point, the impacts of the increasing diversity and cardinality of the data sources are also unknown and should be analyzed in detail. Presumably, at some point, such a complex organizational network event forecasting instrument will require an informed process of noise reduction and identification of relevant features in the vast ocean of data.
6.3. Practical Applications
The proposed organizational network event forecasting model can positively impact the effectiveness of researchers and practitioners in the organization and management fields, who nowadays, more than ever, operate in highly interdependent and complicated network contexts. For example, businesses could use it to observe trends in the market and simulate their actions before implementing them, public health institutions to monitor the risk of dangerous events in society, marketing teams to predict interactions with their products and target customers, and so forth.
Arguably, the network model can describe a wide range of contemporary socioeconomic phenomena, which, in combination with the prospects of continuously improving tools for automated data mining and machine learning, gives a tempting promise to someday enable us to predict an abundance of socioeconomic events in our societies and businesses. It is hoped that this article initiates the pursuit of this goal and paves the first steps of this long and enthralling endeavor.
Data Availability
The Python code used to support the findings of this study has been deposited in the GitHub repository (https://github.com/PiotrSliwa/preludium17). The data points used to feed the Python application and support the findings of this study are available from the author upon request due to their volume (an almost 2 GB file).
Conflicts of Interest
The author declares that there are no conflicts of interest.
Acknowledgments
This research was financed by the Polish National Science Centre as a part of research project no. 2019/33/N/HS4/03086.