Abstract
Key nodes have a significant impact, both structural and functional, on complex networks. Commonly used methods for measuring the importance of nodes in complex networks rely on degree centrality, the clustering coefficient, and similar indicators. Although these methods are widely applied because of their simplicity, their limitations cannot be ignored. Methods based on degree centrality use only the first-order relations of nodes, and methods based on the clustering coefficient use the closeness of a node's neighbors while ignoring the number of neighbors. Local structural entropy, which replaces the influence of a node on the network with its local structural influence, improves identification but has low accuracy on highly clustered networks. To identify key nodes in complex networks, this paper proposes a novel method that is based on local structural entropy and the clustering coefficient and considers both the influence and the closeness of neighbors. The proposed method considers not only the information of the node itself but also that of its neighbors. Its simplicity and accuracy make it well suited to characterizing the reliability and destructiveness of large-scale networks. Demonstrations on constructed networks and real networks show that the proposed method outperforms other related approaches.
1. Introduction
Complex network theory, as an effective means of studying the complexity of systems, provides theoretical tools for researchers to explore their objects from a new perspective. The study of complex networks has penetrated many disciplines, such as computer science, control theory, medicine, biology, and economics. Berahmand et al. [1] adopted complex network methods to detect protein-protein interactions by constructing protein networks. Yao et al. [2] employed complex networks to characterize EEG signals and extract EEG signal information for emotion recognition. The mining of important nodes has long been a focus of research in the field of complex networks. In general, the function of a network depends mainly on a few important nodes, which have a huge impact on the structure and function of the network even though their number is relatively small. For example, in a criminal relationship network, identifying key nodes allows the leader of a criminal gang to be located quickly. In a power network, important circuit breakers and power generation units are protected to prevent large-scale outages that the failure of these units could cause. In search engines, the pages presented to users are sorted by their importance. In the prevention of virus spreading, treatment is prioritized for key patients. These cases imply that the mining of key nodes is of great significance [3–5].
Existing methods for identifying important nodes can be roughly grouped into three categories.
(1) Methods based on social network analysis. The main idea of these methods is to measure the importance of nodes according to statistical indicators from graph theory, such as degree centrality [6], betweenness centrality [7], and closeness centrality [8]. PageRank [9] and HITS [10] improved on this kind of method and have been adopted in the design of search engines.
(2) Methods based on systems science. These methods hold that node importance is equivalent to network destructiveness, so that it can be measured by the degree of damage to the network after the key nodes are removed. Typical methods include the neighborhood coreness proposed by Bae et al. [11] and the K-Shell proposed by Kitsak et al. [12].
(3) Methods based on information theory. As a basic concept of information theory, entropy has been widely used in complex networks in recent years. Fei et al. [13] applied information entropy to the identification of important nodes in complex systems. Zhang et al. [14] adopted local structural entropy to measure the importance of nodes.
Single-indicator methods show advantages in some respects but have many limitations. Therefore, scholars have combined some of these methods into fusion models. Berahmand et al. [15] proposed an important node identification method, DCL, which comprehensively considers a variety of node properties: the degree of the node itself, the degrees of the neighbor nodes, the clustering coefficient, and the common edges between the neighbor nodes. Liu et al. [16] proposed an entropy-based node importance evaluation approach that treats the entire network as a unit and takes into account not only the importance of the node itself but also the relative importance of the node to its neighbors. Mester et al. [17] combined traditional centrality measures with clustering-based community detection to analyze node importance from multiple perspectives. Qiu et al. [18] proposed a node importance measure that combines the local and global positions of nodes. Berahmand et al. [19] proposed a method that integrates the degree of nodes, the negative effects of clustering coefficients, and the positive effects of second-order clustering coefficients to define the importance of nodes.
Although recent works have focused on multiple perspectives to improve the effect of key node identification, these approaches are not always applicable to some specific networks. Therefore, it is desirable to improve the accuracy of the identifying model and decrease the time complexity [20].
In this paper, we propose a novel method (EC) based on local structural entropy and the clustering coefficient to measure the importance of a node. The method integrates the degree of the node and of its first-order neighbors with the closeness between the node and its neighbors. We have conducted experiments and evaluated the accuracy with accepted criteria [21]. The results of the experiments on three constructed networks and eight real networks of different sizes demonstrate that our approach outperforms the others, especially on real datasets.
2. Preliminary
The notation used in this paper is summarized in Table 1.
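Complex networks can be described as an undirected graph $G = (V, E)$, with a node set $V$ and an edge set $E$, in which $V = \{v_1, v_2, \ldots, v_n\}$, $E = \{e_1, e_2, \ldots, e_m\}$, and the edge between two nodes $v_i$ and $v_j$ can also be denoted as $e_{ij}$.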
2.1. Local Structural Entropy
Entropy [22–24] is widely used in thermodynamics to describe the process of heat conduction, and the essence of entropy is the internal chaos of the system. With the development of statistics and informatics, the meaning of entropy has been expanded. In information theory, Shannon entropy [25], the basis of information theory, is used to measure the unpredictability of information or the uncertainty of information systems. Zhang et al. proposed the concept of local structural entropy [14], whose main idea is to use the local structural property of nodes in the whole network rather than the property of the node itself. The local structural entropy is defined as

$$LE_i = -\sum_{j=1}^{n} p_j \log p_j,$$

where $LE_i$ represents the local structural entropy of node $i$, node $j$ is a node of the local network of node $i$ (node $i$ and its neighbors), $n$ is the number of nodes in that local network, and $p_j$ is the ratio of the degree of node $j$ to the total degree of all nodes in the local network corresponding to node $i$. Hence, $p_j$ is defined as follows:

$$p_j = \frac{k_j}{\sum_{l=1}^{n} k_l},$$

where $k_j$ denotes the degree of node $j$.
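The definition above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the edge-list input, and the small example graph are hypothetical, the natural logarithm is assumed because the text does not state a base, and the local network is assumed to consist of the node together with its direct neighbors.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_structural_entropy(adj, node):
    """Entropy of the degree shares within the local network of `node`.

    Assumption: the local network is `node` plus its direct neighbors,
    and degrees are taken in the whole graph.
    """
    local_nodes = [node] + sorted(adj[node])
    degrees = [len(adj[v]) for v in local_nodes]
    total = sum(degrees)
    if total == 0:  # isolated node: no degree distribution to measure
        return 0.0
    return -sum((k / total) * math.log(k / total) for k in degrees if k > 0)


# Hypothetical example: node "a" has degree 3 and neighbors of degree 2, 3, 3,
# so the total degree of its local network is 11.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
le_a = local_structural_entropy(adj, "a")
```

Keeping the graph as a plain dictionary of neighbor sets makes the degree lookup `len(adj[v])` constant time, so the cost per node is proportional to its degree.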
2.2. Clustering Coefficient
In graph theory, the clustering coefficient is a value adopted to describe the level to which nodes tend to cluster together in a graph. To be more specific, it reveals the degree of connection between the neighbors of a node. The clustering coefficient, which describes the proportion of pairs of a node's neighbors that are themselves connected [26], can be defined as

$$C_i = \frac{2 T_i}{k_i (k_i - 1)},$$

where $C_i$ is the clustering coefficient of node $i$, $T_i$ is the number of triangles between node $i$ and its neighbors, and $k_i$, which is the number of neighbors of node $i$, is represented as

$$k_i = \sum_{j=1}^{n} a_{ij},$$

where $a_{ij}$ denotes the connection between node $i$ and node $j$ ($a_{ij} = 1$ if the two nodes are connected and $a_{ij} = 0$ otherwise).
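As an illustration of this definition, the following Python sketch counts the triangles through a node and divides by the number of neighbor pairs. The function name and the example graph are hypothetical, and returning 0 for nodes with fewer than two neighbors is a common convention assumed here, not something the text specifies.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient: 2 * triangles / (k * (k - 1))."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:  # assumed convention: undefined ratio is reported as 0
        return 0.0
    # Each connected pair of neighbors closes one triangle through `node`.
    triangles = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * triangles / (k * (k - 1))


# Hypothetical undirected graph as {node: set(neighbors)}.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
coefficients = {v: clustering_coefficient(adj, v) for v in adj}
```

In this example node "a" has three neighbors, two of whose pairs are connected, which gives a coefficient of 2/3, while node "b" sits in a single closed triangle and scores 1.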
Generally, the degree of a node is used to measure its importance. A node has greater influence if it has more neighbors. That is, the importance of a node is directly related to its degree. However, methods based on degree centrality take into account only the first-order relation between the node and its neighbors, ignoring the second-order relation between the node and its neighbors' neighbors, although the second-order relation is an important property in reflecting the node's local information.
Local structural entropy, by taking into consideration the impact of neighbors when measuring the importance of nodes, identifies key nodes more effectively. However, it has low accuracy on highly clustered networks.
As shown in Figure 1, it is obvious that deleting node is more destructive to the whole network than deleting node , that is, node has a more important influence than node . In Figure 2, the total degree of the local network corresponding to node is 11, the degree of nodes is 2, 3, 3, and 3, respectively. According to the definition of , we can obtain the ratio of the degree of these nodes to the total degree of the local network, here , , , and . We can also get the local structural entropy of these nodes according to the definition of , here , , , and . Similarly, we can obtain the values in the local network corresponding to node in Figure 3. It is notable from the calculation that, , here . The result illustrates that it is not accurate to identify the influential node when adopting value only.
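The arithmetic behind the degree ratios quoted above can be checked directly. The following worked computation uses only the stated degrees (2, 3, 3, 3) and total degree (11); the use of the natural logarithm is an assumption, since the text does not state the base, and the numerical value changes by a constant factor under another base.

```latex
% Degree ratios in the local network with degrees 2, 3, 3, 3 (total 11)
\[
  p = \left(\tfrac{2}{11},\ \tfrac{3}{11},\ \tfrac{3}{11},\ \tfrac{3}{11}\right),
  \qquad \sum_j p_j = 1 .
\]
% Entropy of these ratios (natural logarithm assumed)
\[
  -\sum_j p_j \ln p_j
  = -\tfrac{2}{11}\ln\tfrac{2}{11} - \tfrac{9}{11}\ln\tfrac{3}{11}
  = \ln 11 - \tfrac{2}{11}\ln 2 - \tfrac{9}{11}\ln 3
  \approx 1.373 .
\]
```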



The above example shows that local structural entropy performs better than degree centrality in measuring some network characteristics [14]. However, local structural entropy lacks accuracy on highly clustered networks.
3. Method
According to the analysis of the above section, an accurate measurement of the importance of a node cannot depend on local structural entropy only, although it has better effects than degree centrality.
3.1. EC Model
The proposed method, EC, is based on both local structural entropy and the clustering coefficient and takes into account not only the node itself but also the structure of its neighbors. The process of the EC approach is described in Figure 4:

And it can be represented as follows:where is used to measure the importance of node , is the local structural entropy of node , is a normalization of , and is a matching factor for integrating clustering coefficient with local structural entropy of the node . These items are defined as follows:where , which is the clustering coefficient of node , reflects the closeness of its neighbors. While indicates the structural property of the node and its neighbors. is a min-max normalization of . To construct by parameters ( and ) with different properties, we adopt function to standardize these two parameters. The definition of is
We adopt the method proposed in the previous section to calculate the importance of all the nodes in the network illustrated in Figure 1. From (7), we get Table 2, in which and . It is obvious that , in other words, the importance of node is greater than that of node in our proposed method, which shows that the proposed method is more appropriate in the identification of key nodes with the local structure and neighbors’ closeness of a node in complex networks being taken into consideration.
3.2. Analysis of Time Complexity
The computational complexity of degree centrality, betweenness centrality, closeness centrality, and K-Shell are , , , and , respectively. The proposed method mainly uses local structural entropy and clustering coefficient. Local structural entropy has a computational complexity of , in which refers to the average degree of a network. If a network is fully connected, it has the worst performance with a complexity of . Clustering coefficient has a computational complexity of . Thus, the computational complexity of is .
4. Experiments
Three single-indicator methods (degree centrality, local structural entropy, and K-Shell) and two multi-indicator methods were chosen for comparison with the proposed approach on typical datasets. The experimental data consisted of three constructed datasets, namely a random network (ER), a small-world network (WS), and a scale-free network (BA), and eight real datasets [27] of different sizes.
4.1. Evaluation Criteria
Robustness can be used for measuring the functional change of networks when some nodes are removed. Generally speaking, the function of a network is affected by its maximum connected component. In the experimental simulation of network robustness, we first removed the edges between the node and its neighbors one after another, according to the descending order of node importance. The removal of edges between nodes can decrease the connectivity of a network. After deleting the edges corresponding to the node, the lower the connectivity of the graph is, the greater the influence of the node becomes [28].
As evaluation criteria, two values were adopted. The first is the ratio of the number of subgraphs to the number of nodes after deleting the edges related to the node, and the second is the ratio of the maximum subgraph size to the number of nodes. The node deletion order is consistent with the descending order of node importance. Here the number of nodes is the total number of nodes in the network, the number of subgraphs is the size of the set of subgraphs (connected components) of the network, and the maximum subgraph size is the size of the largest subgraph in that set.
A large ratio of subgraphs to nodes together with a small ratio of maximum subgraph size to nodes means that deleting the node reduces the connectivity of the network; hence, the node has a great influence.
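The evaluation procedure described in this section can be sketched as follows. This is an illustrative simulation under stated assumptions, not the authors' code: the function names, the `fraction` parameter, and the path-graph example are hypothetical, subgraphs are taken to be connected components, and an attacked node stays in the network as an isolated node because only its edges are removed.

```python
def component_sizes(adj):
    """Sizes of all connected components, found by depth-first search."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes


def attack_curve(adj, ranking, fraction):
    """Cut the edges of the top-ranked nodes one node at a time.

    Returns one pair per attacked node:
    (number of subgraphs / n, maximum subgraph size / n).
    """
    work = {v: set(nbrs) for v, nbrs in adj.items()}  # leave the input intact
    n = len(work)
    steps = int(round(fraction * n))
    curve = []
    for node in ranking[:steps]:
        for nbr in list(work[node]):
            work[nbr].discard(node)
        work[node] = set()  # node remains, but isolated
        sizes = component_sizes(work)
        curve.append((len(sizes) / n, max(sizes) / n))
    return curve


# Hypothetical example: a path 0-1-2-3-4 attacked in descending degree order.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
ranking = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
curve = attack_curve(adj, ranking, fraction=0.4)
```

Any importance score can be plugged in by changing how `ranking` is built; degree is used here only because it needs no further definitions. Recomputing the components after every removal keeps the sketch short, at the cost of one full traversal per attacked node.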
4.2. Experiments on ER, WS, and BA Networks
We constructed an ER network, a WS network, and a BA network for the experiments and compared our approach with the five baseline methods, by each of which the nodes are sorted in descending order of their importance. To describe the effect of the different methods, we selected only a specific deletion range for illustration.
The network has nodes with a link probability of . As shown in Figures 5 and 6, the evaluation of each index is almost the same, which is due to the stable structure of the random network. The network in Figures 7 and 8 has 5,800 nodes while each node has five neighbors, and the random reconnection probability is 0.5. The BA network in Figures 9 and 10 has 8,800 nodes, which add 18 edges in each construction iteration. As can be seen from Figures 7 and 8, the method is slightly better than the method, but the difference is not obvious, same as in Figures 9 and 10. It is shown that the and the can reach the results more accurately than , , , and . The BA network is constructed by 8,800 nodes with 18 edges for each node. There are little gaps among the methods except when 70% nodes are removed. So, these methods almost have the same performance on constructed network, which is due to their randomness of the construction process.






4.3. Experiments on Real Networks
In order to generalize the method to real networks, eight real datasets of different sizes were selected for deliberate attack simulations, in which the top-ranked nodes in the ordered sequence are deleted. The statistical characteristics of each network are listed in Table 3, which reports the number of nodes, the number of edges, the maximum degree of nodes in the network, the average degree, the average local clustering coefficient, the global clustering coefficient, and the number of triangles.
Figures 11 and 12 show the change of and when the first 25%–35% nodes of the network email-enron-only are removed. As can be seen from Figure 11, is significantly superior to , , , and . When the first 34% nodes are removed, the values corresponding to , , , , , and are 0.6014, 0.4545, 0.6294, 0.4196, 0.4825, and 0.2867, respectively. In Figure 11, the of is significantly lower than that of others, and the of is higher than that of others except . It means that after deleting certain ratio of key nodes identified by , the connected network is separated into several parts which have a large number of subgraphs with each subgraph contains a small number of nodes. Compared with , the of does not exceed that of , though with slight deference. However, has remarkable performance than on . So generally, we believe EC outperforms others.


Figures 13 and 14 show the changes of and on the power-662-bus network. It can be seen that after deleting top 20% nodes, the result of is better than that of others. It also indicates that the methods of using multi-indicator are more accurate than those of using single-indicator. We find has the worst result and the performance of other methods are between and . Figures 15 and 16 show the result of the experiment on power-bcspwr09 network. When we delete top 20% nodes, the curves of produced by these methods have a small fluctuation, but the trends of are almost consistent. It is hard to distinguish which method is the best one on this network. On inf-openflights network as shown in Figures 17 and 18, we can obviously find that has the best result both on and .






The results on the other networks, namely the power-US-Grid network (Figures 19 and 20), the power-bcspwr10 network (Figures 21 and 22), the high-energy theory network (Figures 23 and 24), and the inf-roadNet-CA network (Figures 25 and 26), show that although the relative performance of the other methods alternates frequently, EC always maintains high accuracy. Therefore, our method is applicable to many kinds of large-scale networks.








The results of the deliberate attack simulation on eight real networks of different sizes show that, compared with the five baseline methods, the curve of the number of subgraphs for the EC method grows fastest and its curve of the maximum subgraph size declines fastest in most cases. This indicates that deleting the key nodes identified by the EC approach causes serious damage to the network, that is, EC can accurately measure the importance of nodes.
5. Conclusion
This paper proposed a novel method, EC, which focuses on the degree of the node and its neighbors and on the closeness of the neighbors. It can measure the importance of nodes more efficiently and can be used to analyze the reliability and invulnerability of large-scale networks. The results of experiments on three constructed networks and eight real networks indicate that attacks on the top-ranked nodes sorted by EC are more likely to increase the extent of the damage to the entire network. The experiments show that the rise in the number of subgraphs and the decline in the size of the giant component produced by the EC method are more pronounced than those of the other methods in most cases. Therefore, EC is superior to and more effective than the five baseline methods. However, we found that its accuracy is reduced in networks with high clustering. In future research, we will address this deficiency to increase the accuracy in highly clustered networks and take into account large scale, high dimensionality, and dynamics, which remain challenges in research on identifying key nodes of complex networks.
Data Availability
The data that support the findings of this study are openly available at https://networkrepository.com/networks.php.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this manuscript.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 62062049 and 62141303), the Humanities and Social Sciences Fund of the Ministry of Education of China (No. 20YJCZH212), the Natural Science Foundation of Gansu Province, China (No. 20JR5RA390), and the Technological Innovation Guidance Plan of Gansu Province, China (No. 2020-61-14).