Abstract
A coarse-graining method is introduced to construct a directed weighted complex network that models the transformation of the trading data of an individual stock. The degree (strength) distribution of the derived network follows a power law. A moderated regression equation with interaction effects of average return and out-degree (in-degree) on out-strength (in-strength) is established. Moreover, we find that the heterogeneity of nodes affects the network’s structure and that the average return level significantly impacts nodes’ eigenvector centrality and pagerank.
1. Introduction
The financial market inevitably becomes a complex system [1] once the hypothesis of the rational man is discarded and factors such as the differences among investors, their learning ability, and the external economic environment are taken into account. The statistical laws of financial market variables, one of the most important objects of study in econophysics [2], have been revealed by various methods, including statistical physics, complex systems theory, and stochastic processes. A complex network is a graph composed of numerous vertices and intricate connections that are neither completely regular nor completely random; it typically exhibits many nontrivial topological features and is an effective tool for extracting the hidden relations of a complex system. ER random graphs [3], small world networks [4], and scale-free networks [5], which are the most common models, have been widely used to interpret the complexity of the real world. Applications of complex network analysis have also recently attracted considerable interest in the financial market as well as in other fields. For example, based on the generalized Indian Buffet process, Boldi et al. introduce a network model that reproduces complex phenomena well and thereby explains some local and global properties of real networks [6]. Mei et al. redefine a “Complex Agent Network” that is able to capture the multiscale spatiotemporal features of complex systems [7]. In [8], time series are transformed into complex networks, and Xu et al. show that the distribution of subgraphs characterizes different types of continuous dynamics. Donner et al. provide a thorough reinterpretation of some statistical measures computed for recurrence networks, which are obtained from nonlinear time series, in terms of the phase space properties of dynamical systems [9].
Complex networks have many applications in the financial market. In [10], the evolution of the topology of global financial networks is used to evaluate systemic risk. Based on payments data, De La Torre et al. reconstruct the economic structure of Estonia and use attack simulations to analyze the vulnerability of the national economy [11]. Fan models a dynamically evolving complex bank network system and designs a method to evaluate the system risk by means of a lending-borrowing algorithm and a multiterm clearing algorithm [12]. In [13], the market graph constructed from cross-correlations, which has a quite stable power-law structure, is considered a representation of the “self-organized” stock market. Stefan and Atman found in [14] that the morphology of the trust network can drastically alter the behavioral model as well as the fluctuations of the stock market index. Tse et al. construct several scale-free complex networks for US stocks over two periods by calculating cross-correlations and suggest that the stock market is in fact heavily influenced by the stocks of the financial sector [15]. To analyze the Shanghai stock index, Zhang et al. build several scale-free (or small world) networks, suggesting the existence of hubs and that the segments correlated with a given one appear in a Poisson process [16]. Chen et al. suggest that the centrality and modularity of a correlation-based complex network can be used to detect the effect of interconnection on stock returns and industries [17]. See also [18–22].
Most of the above studies focus on the variation of the stock market index or on various relations between stocks and external circumstances. Briefly, there are two types of network construction: one based on the similarity of segments and the other on the correlation coefficients of the return curves of different derivatives. Neither method is suitable for exploring the laws by which financial time series transform. For example, the classical GARCH model [23], which describes the time-varying volatility clustering of financial time series, cannot be represented by either type of network, because both discard the diachronic connection between two adjacent nodes. In this paper, a coarse-graining method is adopted to construct a directed weighted complex network that exhibits the time series of a stock’s trading data as well as its transformation. We concentrate on the following two questions: (i) Are there any rules governing the transformation of stock trading? (ii) How does the stock return affect the structure of the corresponding complex network?
The rest of this paper is organized as follows. In Section 2, a directed weighted complex network is derived from an individual stock (sh600519) by means of a coarse-graining method. In Section 3, the parameters of the network’s degree (strength) distribution are estimated and tested, and a regression equation of strength on degree, return, and their interaction is established; this section also includes an analysis of edge weights and other statistics. Finally, we summarize the paper and discuss future work in Section 4.
2. Methodology
The stock data of “sh600519” from Jan. 4, 2015, to Jan. 4, 2016, was captured from Sina Buz&Tech. Taking it as an example, we illustrate how to construct a directed weighted network that displays stock trading. In particular, the data between 9:25:05 and 9:32:18 on Jan. 4, 2016, is shown in Figure 1.

Let ( is a transpose operator) on behalf of trading data of sh600519 at time be a 2-dimensional random variable, where is the stock price and is the trading volume. Denote , the subscript set of is ascending. For an arbitrary , assume that there is a time-varying system which is drastically affected by multiple factors, such as investors’ expectation, which are difficult to quantify, satisfying the fact that there were two positive number and , such that is regarded as the input of and is the output; then where error is composed of two parts: (i)random error comes from the influences of the external economic environment;(ii)systematic error originates from both the changes of the investment strategies of the shareholders and the stock’s coefficient. On the one hand, it is the frequent changes in investors’ strategies that lead to large systemic error; on the other, as a measure of the fluctuation of a security or a portfolio in comparison with the whole market, is uncertain since the volatility of the financial market leads to different effects on different stocks in different periods.
If one asks for a rather small , there is little prospect of finding reasonable , , and of relatively simple form that are consistent with (1) for arbitrary . In order to analyze the laws of stock price transformation, we construct a directed weighted network to exhibit the changes of the stock indicators. Firstly, using a sliding window specified by some time division points, the stock data is cut into many segments. Secondly, these segments are mapped to nodes by a coarse-graining method. Finally, edges are directed and weighted according to the chronological order. Let , where = 09:25:00, Jan. 4, 2015, = 15:00:00, Jan. 4, 2016. The way of representing as well as its changes via the directed weighted complex network is illustrated in detail below.
2.1. Cutting the Stock Data
Generally, two ways of segmentation are popular: one uses a fixed time window [9, 13, 17], and the other is based on local extrema [16, 27]. Unfortunately, both disregard changes in investors’ strategies, which affect the movements of stocks dramatically. Assume that investors adjust their strategies if no trading happens within a certain amount of time; the time span between two adjacent trades is then regarded as the sign of a strategy change. For a specified threshold value , rewrite the subscript set as follows: where for any and , for any . That is, a division point is added between two adjacent moments if the time difference between them is greater than . Let and ; then , where is the th segment of the separated stock data, and we hypothesize that strategies remain substantially unchanged during and .
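The division rule can be illustrated with a minimal sketch. It assumes the trade times are available as an ascending list of Python datetimes; the function name `split_by_gap`, the parameter `tau_seconds`, and the sample times are hypothetical and are not taken from the data of this paper.

```python
from datetime import datetime, timedelta

def split_by_gap(timestamps, tau_seconds):
    """Split an ascending list of datetimes into segments.

    A division point is placed between two adjacent trades whenever
    the time difference between them is greater than tau_seconds.
    """
    segments = []
    current = []
    for t in timestamps:
        if current and (t - current[-1]).total_seconds() > tau_seconds:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments

# Hypothetical trade times, given as offsets in seconds from 9:30:00.
base = datetime(2016, 1, 4, 9, 30, 0)
offsets = [0, 5, 10, 40, 45, 120]
trades = [base + timedelta(seconds=s) for s in offsets]
segments = split_by_gap(trades, tau_seconds=15)
print([len(s) for s in segments])
```

With a threshold of 15 s, the gaps of 30 s and 75 s in the sample produce two division points, so the six trades fall into three segments.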
For instance, the results of the division ( s) of the stock sh600519 data during 9:25:05–9:32:18 on Jan. 4, 2016, are listed in Table 1.
2.2. Coarse-Graining Process
Typically, coarse-graining means symbolizing the original data by ignoring some information that is subjectively judged to be relatively unimportant, with the aim of analyzing the major characteristics or tendency of a time series by symbolic dynamics. Because of this subjectivity, it is almost impossible to put forward a standard coarse-graining process that fits all kinds of data. In practice, the coarse-graining methods used to construct networks from financial time series, such as those mentioned in [21, 28], are similar: they symbolize segments by specified thresholds. How to find proper thresholds is still unclear, because intervals between thresholds that are too large lead to severe information loss, whereas intervals that are too small preserve excessive details that may conceal the major trend. In order to retain some quantifiable features and represent the evolution trend during coarse-graining, some statistics of the segments are used to carry out this process. The specific procedure is as follows.
Let for any ; in particular, if contains a call auction, the trading data of the call auction must be removed.
(i) Firstly, we calculate some numeric characteristics: duration , volume in average , the returns ; thus we have the average return , the standard deviation of the returns , and the range , where , .
(ii) Secondly, let and , where is a row matrix for . Fix as the number of categories for every numeric feature; let be the th -quantile for where .
(iii) Finally, assume that for any ; we define the coarse-graining process by as follows: if and only if , where and , where ; if and only if , where and , where ; and so on for the other features, we get , where .
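A minimal sketch of the quantile-based symbolization may help. It assumes that each segment has already been reduced to a tuple of numeric features and that a value equal to a cut point falls into the upper bin; the boundary convention, the function name `coarse_grain`, and the sample values are assumptions, and only two features are used here for brevity instead of the five described above.

```python
from bisect import bisect_right
from statistics import quantiles

def coarse_grain(feature_rows, k):
    """Map each row of numeric features to a string of symbols 0..k-1.

    feature_rows: list of equal-length tuples, one tuple per segment.
    For every feature (column), the k-1 cut points are the k-quantiles
    of that column; a value gets symbol j when it lies in the j-th bin.
    Symbols are joined as single digits, so k should not exceed 10.
    """
    columns = list(zip(*feature_rows))
    cuts = [quantiles(col, n=k) for col in columns]
    symbols = []
    for row in feature_rows:
        digits = [bisect_right(c, v) for c, v in zip(cuts, row)]
        symbols.append("".join(str(d) for d in digits))
    return symbols

# Hypothetical (duration, average volume) pairs for four segments, k = 2.
rows = [(1, 10.0), (2, 40.0), (3, 20.0), (4, 30.0)]
symbols = coarse_grain(rows, k=2)
print(symbols)
```

Because the cut points are computed per feature from the whole data set, every symbol keeps a quantitative meaning (for example, "duration below the first quantile"), which is the property the procedure above relies on.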
To some extent, the above coarse-graining process preserves quantitative information, which brings more accuracy to the follow-up analysis.
Let us use in Table 1 as an example and fix . It is easy to compute s, lot·s−1, ; thus , , , and . Besides, let whose th column is the 4-quantile of . is called a 4-quantile matrix of . Thus, it is easy to see , and so on for , where is called the th trading status. It is obvious that there may exist some and satisfying although . is called an initial node or a final node if it is the first node or the last node of that day, respectively; otherwise it is called a transitional node. Let be an ordinal set of with dictionary order. For each , there exists a unique equal to it and we denote .
2.3. Construction of Network
View the nodes in as the vertices of a complex network. The remaining work is to determine their connections. In order to represent transformations between adjacent dealings on the same day, we link nodes in chronological order. The edges and their weights are assigned as follows.
For arbitrary , if and occurred on the same day, there are two cases: (A) If there is no directed edge from to , add such a directed edge and set its weight to 1. (B) If there is already such a directed edge, add 1 to its weight. Otherwise, if and occurred on different days, do nothing.
Let traverse all the values; a directed weighted complex network is thus constructed.
For any two adjacent and , where is the source node and is the target node, is called the successor of and the predecessor of . Let the out-degree of a vertex be the number of its out-edges and the out-strength be the sum of the weights of its out-edges, and define the in-degree and in-strength similarly. We call the result the network of the transformation of the price-volume of the stock sh600519.
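The linking rule and the four node measures can be sketched as follows. The sketch assumes that the trading statuses are already available as one list of symbols per trading day; the names `build_network` and `degree_and_strength` and the sample sequences are hypothetical.

```python
from collections import defaultdict

def build_network(daily_sequences):
    """Return edge weights {(source, target): count}.

    Only consecutive statuses within the same day are linked, so no edge
    joins the last node of one day to the first node of the next day.
    """
    weights = defaultdict(int)
    for day in daily_sequences:
        for src, dst in zip(day, day[1:]):
            weights[(src, dst)] += 1  # case (A) creates the edge, case (B) adds 1
    return dict(weights)

def degree_and_strength(weights):
    """Compute out/in-degree and out/in-strength for every node."""
    stats = defaultdict(lambda: {"out_deg": 0, "in_deg": 0, "out_str": 0, "in_str": 0})
    for (src, dst), w in weights.items():
        stats[src]["out_deg"] += 1
        stats[src]["out_str"] += w
        stats[dst]["in_deg"] += 1
        stats[dst]["in_str"] += w
    return dict(stats)

# Two hypothetical trading days of coarse-grained statuses.
days = [["00", "01", "00", "01"], ["01", "11"]]
weights = build_network(days)
stats = degree_and_strength(weights)
print(weights)
print(stats["01"])
```

In this toy example the node "11" only ever closes a day, so it has positive in-strength and zero out-strength.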
The weight of a directed edge is the number of linkages from its source node to its target node and corresponds to the frequency of this transition. We deem the edge weight to be positively correlated with the probability of the transition from the source node to the specific target. Clearly, the in-degree of a node corresponds to the node’s ability to be a successor of different nodes, the out-degree indicates the diversity of the node’s successors, and the in-strength (out-strength) is relevant to the probability of being a successor (predecessor). In-degree, out-degree, in-strength, and out-strength of are denoted as , , , and , respectively. For a certain , it is easy to deduce that if both and are small then corresponds to transition between some two different trading statuses, is significantly greater than means the higher probability of being a final node, and is significantly smaller than means the higher probability of being an initial node. This paper concentrates on the relations between in-degree and in-strength as well as out-degree and out-strength.
Besides, since it is obvious that the above network is completely and uniquely determined by and for any given stock data, this way of constructing a complex network is called a -method and the network is denoted as (the transformation of the price-volume network). For instance, fix s, and is from 09:25:00, Jan. 4, 2015, to 15:00:00, Jan. 4, 2016; the is shown in Figure 2.

3. Data Analysis
Since changes when is altered, some tests are needed first. The parameters are listed below.
Parameters
Source: Sina Buz&Tech
Stock code: sh600519
Range: from 9:00 Jan. 4, 2011, to 15:00 Jan. 4, 2016, 1215 market days in total
Data type: details of transaction (every 5 seconds)
Time interval threshold (s): 15, 30, 60, 120, and 1200
Number of categories: 3, 5, 7, and 9
3.1. Tests of Distributions of Degree and Strength
For different combinations of , Figure 3(a) shows the distributions of degree and cumulative degree; Figure 3(b) shows that of strength. In addition, the scaling exponents are listed in Table 2 and Figure 4.
[Figure 3: distributions of (a) degree and (b) strength.]
[Figure 4: scaling exponents of (a) degree and (b) strength.]
Almost all curves (point groups) look like straight lines in double logarithmic axes when the strength (degree) is relatively large, which indicates the scale-free property of all the . A test of the power law is therefore necessary, and it is done in two steps.
Step 1. Estimate parameters based on the actual data.
Step 2. Test the result by some methodology.
In general, the degree (strength) distribution of the nodes whose degree (strength) is not less than a certain threshold value may obey a power law. In [29], Clauset et al. proposed a principled statistical framework that combines the maximum-likelihood fitting method with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios to discern and quantify power-law behavior. The results of the parameter estimation are shown in Table 2 and Figure 4. The results of the Kolmogorov-Smirnov test are listed in Table 3.
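The two steps can be illustrated with a simplified sketch. It uses the continuous maximum-likelihood estimator of the exponent for a fixed threshold and the Kolmogorov-Smirnov distance between the empirical and fitted tail distributions; it leaves out the selection of the threshold and the computation of a p-value, and it treats degree as a continuous quantity, so it is an approximation rather than the procedure used to produce Tables 2 and 3. The function name `fit_power_law` and the synthetic sample are hypothetical.

```python
import math

def fit_power_law(values, xmin):
    """Fit p(x) ~ x^(-alpha) to the values >= xmin.

    Returns (alpha, ks), where alpha is the continuous maximum-likelihood
    estimate and ks is the largest gap between the empirical CDF of the
    tail and the fitted CDF 1 - (x / xmin)^(1 - alpha).
    """
    tail = sorted(v for v in values if v >= xmin)
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(v / xmin) for v in tail)
    ks = 0.0
    for i, v in enumerate(tail):
        model_cdf = 1.0 - (v / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, ks

# Synthetic tail with exponent 2.5, built from evenly spaced quantiles.
n, true_alpha, xmin = 1000, 2.5, 1.0
sample = [xmin * (1.0 - (i + 0.5) / n) ** (-1.0 / (true_alpha - 1.0)) for i in range(n)]
alpha_hat, ks_distance = fit_power_law(sample, xmin)
print(round(alpha_hat, 3), ks_distance)
```

A small distance indicates that the fitted power law is close to the data above the threshold; deciding whether it is small enough requires a reference distribution for the distance, which this sketch does not compute.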
3.2. Network Statistical Characters
Only and which passed the K-S testing satisfy that of both distributions are between 2 and 3. We take for further discussion since and . The corresponding 5-quantile matrix of is By the construction of we know that it could consist of nodes. In fact, there are 481 nodes and 2738 edges; the sum of edges’ weight is 5536. Figure 5 displays nodes’ degree and strength.

3.2.1. The Relationship between Degree and Strength
The avg. degree and strength of are 6.035 and 3.606; that is, over the period from 9:00, Jan. 4, 2011, to 15:00, Jan. 4, 2016, each trading status happens 3.606 times and has disparate successors or predecessors on average. The number of initial (final) nodes is 1215 since there are 1215 trading days in total. Besides, the number of transitional nodes is . Analysis of degree and strength and their relationship is useful for understanding the evolution process of networks as well as for link prediction [30]. Table 4 lists the top 10 nodes by out-degree, out-strength, in-degree, and in-strength. This information is also shown in Figure 6.

From Table 4 and Figure 6 we found the following.
(1) 00200, 00210, 00300, 11101, and 11111 are in all four sequences; they correspond to the highest-frequency transitional nodes between different trading statuses because their out-degree and in-degree are large and their out-strength and in-strength, which are roughly equal, are large too. The transitional nodes of this kind have the following features: the duration is less than 4531 s, the avg. trade volume (per second) is less than 1.406, the avg. return lies between −0.003465 and 0.0004174, and the standard deviation and the range of the returns are less than 0.001739 and 0.007528, respectively.
(2) The difference between and (or and ) indicates that degree is not the only factor affecting strength.
(3) As expected, a strong positive correlation exists between out-degree and out-strength and between in-degree and in-strength. There are 510 nodes with positive out-degree and 562 nodes with positive in-degree. The correlation coefficient is 0.9223 between out-degree and out-strength and 0.9311 between in-degree and in-strength. A scatter plot exhibits their linear relations; see Figure 7.
The significantly positive correlation between out-degree and out-strength can be seen from Figure 7; the relationship between them was therefore modeled through polynomial regressions. The results are listed in Table 5.

The relationship between in-degree and in-strength is similar and the results of polynomial regressions are listed in Table 6.
The result of the higher-order polynomial regression is significantly better than that of the linear one, which shows that the increase of the out-strength (in-strength) is inhomogeneous. We deem that this increase is due to at least two different factors: (A) The increase of out-degree, which represents the relation between the node and its neighbours, is regarded as an external cause. (B) Some features of the node, such as and which hide the ability of being a successor, are regarded as an internal cause.
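The comparison between linear and higher-order fits can be sketched with ordinary least squares on powers of the degree. The sketch below solves the normal equations directly, which is adequate for low orders and small values but is not numerically robust in general; the helper names and the sample pairs are hypothetical and do not reproduce Tables 5 and 6.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def polyfit_r2(xs, ys, order):
    """Least-squares polynomial fit; returns (coefficients, R^2).

    Coefficients are ordered from the constant term upwards.
    """
    design = [[x ** p for p in range(order + 1)] for x in xs]
    k = order + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coef, 1.0 - ss_res / ss_tot

# Hypothetical (out-degree, out-strength) pairs following y = 1 + 0.5 x^2.
degrees = [1, 2, 3, 4, 5, 6]
strengths = [1 + 0.5 * d ** 2 for d in degrees]
_, r2_linear = polyfit_r2(degrees, strengths, order=1)
coef_quad, r2_quad = polyfit_r2(degrees, strengths, order=2)
print(r2_linear, r2_quad)
```

A clear gain in R² when moving from order 1 to order 2 is the kind of evidence the paragraph above refers to when it calls the increase of strength inhomogeneous.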
Nodes’ attributes describe the corresponding transaction status, including the average return and the range of returns; network statistical characters, such as degree, closeness, and centrality, represent the relations between nodes and their neighbours, how the nodes are embedded in the network, and the importance of each node. As indicated in Table 6, the relationship between degree and strength is significantly influenced by nodes’ attributes. We utilize hierarchical multiple regression (HMR) to model how the nodes’ attributes affect the relationship between degree and strength. The following work concentrates on quantifying this effect, that is, the internal cause. To simplify our work, we only consider the impact of return on the nodes whose degree is not less than 1.
Let with five levels (0–4) be the moderating variable, the reference level, and out-strength (OS) the dependent variable. Besides out-degree (OD), there are eight predictors, which include four dummy variables corresponding to , respectively. Table 7 shows the details.
The result of HMR is listed in Table 8. The influences of the interactions of all the dummy variables and DOD on DV are significant, and so are .
Finally, we use backward regression on ; the results are shown in Tables 9 and 10.
We found no significant difference in explanatory power between models 2-1 and 2-2, even though is removed in model 2-2. Thus, the moderated regression equation on out-strength with interaction effects should be where is the deflated out-degree and , , , and are the dummy variables corresponding to , , , and , respectively. Similarly, the moderated regression equation with interaction effects on in-strength is where is in-strength and are deflated in-degree or interactions of nodes’ deflated in-degree and average return level, like that of out-degree. To verify (6) and (7), let the stock data from 9:00, Jan. 5, 2016, to 15:00, Apr. 4, 2016, together with the former data be the testing set. See Figure 8 for details.
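The predictors of such a moderated regression can be assembled as in the sketch below. It assumes that "deflated" means mean-centered and that the reference return level is passed in by the caller; both are assumptions, as are the function name `moderated_design` and the sample values. The sketch only builds the design rows (intercept, deflated out-degree, four dummies, four interactions) and does not reproduce the coefficients of (6) or (7).

```python
def moderated_design(out_degree, return_level, reference_level, levels=(0, 1, 2, 3, 4)):
    """Build rows [1, DOD, D_a, ..., D_a * DOD, ...] for a moderated regression.

    DOD is the out-degree minus its mean (an assumed reading of "deflated").
    One dummy variable is created for every return level except the
    reference level, followed by one dummy-by-DOD interaction per dummy.
    """
    mean_od = sum(out_degree) / len(out_degree)
    dummy_levels = [lv for lv in levels if lv != reference_level]
    rows = []
    for od, lv in zip(out_degree, return_level):
        dod = od - mean_od
        dummies = [1.0 if lv == d else 0.0 for d in dummy_levels]
        interactions = [d * dod for d in dummies]
        rows.append([1.0, dod] + dummies + interactions)
    return rows

# Three hypothetical nodes with return levels 0, 2, and 4; level 2 as reference.
rows = moderated_design(out_degree=[2, 4, 6], return_level=[0, 2, 4], reference_level=2)
for row in rows:
    print(row)
```

With five return levels, each row holds eight predictors besides the intercept and the deflated out-degree, which matches the count of predictors described above; a node at the reference level has all dummies and interactions equal to zero.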
[Figure 8, panels (a) and (b).]
3.2.2. Edges Weight
Let the vertex set of follow dictionary order, regarding each node as a 5-dimensional vector. The adjacency matrix of is shown in Figure 9.

For an edge , that is, a directed edge from to , let , where is the Euclidean distance, be called the length of edge and which is an ascending set. Let , where is the weight of . In order to investigate the relationship between the length of edges and the linking possibilities, we take as coordinate and or as (blue) or (red) coordinate, respectively, in Figure 10(a); the cumulative proportions of soe and the number of edges according to the edges transformation are shown in Figure 10(b).
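The edge length and the weight totals per length can be sketched as follows, reading each node label as a vector of digits. The function names and the sample weights are hypothetical; the sketch groups edges by length and reports, for each length, the number of edges and the sum of their weights.

```python
import math
from collections import defaultdict

def edge_length(source, target):
    """Euclidean distance between two nodes read as digit vectors, e.g. "00200"."""
    return math.sqrt(sum((int(a) - int(b)) ** 2 for a, b in zip(source, target)))

def weight_by_length(weights):
    """Group edges by length; return {length: (number_of_edges, total_weight)}.

    Lengths are rounded to 9 decimals so that equal distances share one key,
    and the result is ordered by ascending length.
    """
    grouped = defaultdict(lambda: [0, 0])
    for (src, dst), w in weights.items():
        key = round(edge_length(src, dst), 9)
        grouped[key][0] += 1
        grouped[key][1] += w
    return {length: tuple(v) for length, v in sorted(grouped.items())}

# Hypothetical edge weights between three trading statuses.
weights = {("00200", "00210"): 7, ("00210", "00200"): 5, ("00200", "11101"): 1}
summary = weight_by_length(weights)
print(summary)
```

In this toy example the two short edges carry most of the weight, which is the pattern examined in Figure 10.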
[Figure 10, panels (a) and (b).]
The above analysis illustrates that the length of a transformation between a source node and a target node is basically negatively related to its frequency; thus the heterogeneity of nodes cannot be ignored when we consider the characters of .
3.2.3. Other Statistical Features
Besides degree and strength, there are many other statistical characters, such as clustering coefficient and betweenness centrality, that are used to describe the topological features of a complex network; see [31]. Unfortunately, few of them can be used directly on since it is a directed weighted network. Considering only the transformation of stock trading status, edge weights can be omitted; thus degenerates to a directed unweighted network denoted by . Some statistical features of are calculated; see Table 11.
From Table 11 we can see that it takes 4.05 alternations on average, and at most 11, to move from any trading status to another. Links exist between only 0.4% of pairs of nodes. is divided into 18 communities; the alternations inside these communities account for more than 42.8% of the total number in the whole network. The average clustering coefficient is significantly larger than that of a random graph, which is less than 0.01, indicating the small world property of . The distributions of clustering coefficients, betweenness centralities, eigenvector centralities, and pageranks are shown in Figure 11. To analyze the influence of return on the importance of nodes, the Kruskal-Wallis test is adopted to distinguish the effects of different levels of return on some node measurements that are non-Gaussian: closeness centrality (CC), harmonic closeness centrality (HCC), betweenness centrality (BC), clustering coefficient (CLC), eigenvector centrality (EVC), and pagerank (PR). We found that only eigenvector centrality and pagerank are affected by return at the 0.05 significance level; see Table 12.
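For readers unfamiliar with the Kruskal-Wallis test, the statistic can be sketched as below. The sketch ranks the pooled values, uses average ranks for ties, and omits the tie correction factor, so it is a simplified illustration and not the computation behind Table 12; the function name `kruskal_wallis_h` and the sample groups are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks for ties (no tie correction).

    groups: list of lists, one list of measurements per return level.
    """
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1, ..., j
        for _, g in pooled[i:j]:
            rank_sums[g] += average_rank
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    return h - 3.0 * (n + 1)

# Hypothetical centrality values for three return levels.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(groups)
print(h_stat)
```

Under the null hypothesis, H is approximately chi-square distributed with the number of groups minus one degrees of freedom; for three groups the 0.05 critical value is about 5.991, so fully separated groups such as these would be judged different.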
[Figure 11, panels (a) to (d).]
4. Conclusion and Discussion
We introduce a -method to construct a directed weighted complex network that models the transformation of the trading data of an individual stock, sh600519, over 5 years: the data is divided by a time interval threshold value , coarse-grained according to some statistics and the number of classifications , and the nodes are linked and weighted in chronological order. The derived satisfies the power law for certain combinations of , such as . We found that the increase of strength is inhomogeneous and then built moderated regression equations on out-strength (see (6)) and in-strength (see (7)) with interaction effects. Besides, the heterogeneity of nodes affects the structure of the network dramatically; the level of avg. return of trading statuses impacts nodes’ eigenvector centrality as well as pagerank.
It is easy to see that a network constructed by the -method loses a great deal of the information in the time series, so revealing all the rules of the transformation is still difficult. Another weakness is the lack of related theoretical support, which is an obstacle to building evolution dynamics models. Besides, if the distributions of can be utilized appropriately, the results may be greatly improved.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The authors are grateful for the support provided by the National Natural Science Foundation of China (Grant no. 11301361) and the Research Foundation of Sichuan Technology and Business University (CGSKY-34).