Abstract

Node importance estimation is a fundamental task in graph analysis, which can be applied to various downstream applications such as recommendation and resource allocation. However, existing studies merely work under a single view, which neglects the rich information hidden in other aspects of the graph. Hence, in this work, we propose a Multiview Contrastive Representation Learning (MCRL) model to obtain representations of nodes from multiple perspectives and then infer the node importance. Specifically, we are the first to apply the contrastive learning technique to the node importance analysis task, which enhances the expressiveness of graph representations and lays the foundation for importance estimation. Moreover, based on the improved representations, we generate the entity importance score by attentively aggregating the scores from two different views, i.e., node view and node-edge interaction view. We conduct extensive experiments on real-world datasets, and the experimental results show that MCRL outperforms existing methods on all evaluation metrics.

1. Introduction

Knowledge graphs (KGs) are graph-based data structures consisting of nodes and edges [1, 2]. Each node represents an “entity” and each edge represents a “relation” between two connected entities. In recent years, a great deal of research has been devoted to solving problems related to graphs [3–6]. Estimating the importance of each node in a graph, network, or KG is a fundamental and crucial task, which benefits many downstream applications, such as question answering, recommendation, web search, and resource allocation [7–10].

For instance, Figure 1 shows an example of a movie knowledge graph, where “Suicide Squad” is a movie node, with the author node “David Ayer” and the actor node “Jared Leto” connected to it via the edges “wrote” and “starred-in.” Each node is also associated with text such as a biography or movie plot. Before the two movies “Training Day” and “Suicide Squad” come to the screen, we may use the information in the KG to estimate their potential popularity. As can be observed from the figure, the movie node “Training Day” might become more popular, since it stars more popular actors and is directed by a director with higher recognition in terms of professional reviews.

Thus, a number of works have attempted to estimate the importance of nodes in a graph, and they can be divided into two main categories. The first category includes classical methods such as PageRank [7] and Personalized PageRank [11]. PageRank was originally designed for estimating the importance of websites. It assumes that more important nodes receive more links from other nodes; by counting the number and quality of edges linked to a node, its importance can be roughly estimated. However, this algorithm considers only the graph structure. Personalized PageRank improves on PageRank by taking into account the user’s estimation of the importance of the nodes in the graph, but it neglects the types of edges. In summary, this category of approaches cannot perform well when estimating node importance in large-scale complicated graphs, since they merely take into account the graph’s topology while overlooking the substantial amount of semantic and latent structural information encoded in the graph.

Another category is machine learning-based strategies, such as GENI [12] and RGTN [13]. GENI acquires the node features through node2vec and then converts them to importance scores, which are flexibly aggregated via a predicate-aware attention mechanism; a centrality adjustment module is applied at the end for fine-tuning. RGTN considers both structural and semantic information based on representation learning and uses an attention fusion mechanism to let structural features interact with semantic features; these features are then projected into importance values separately and aggregated with attention weights to produce the final node importance scores. These trainable methods outperform the traditional solutions owing to the supervised learning framework and the flexible graph attention mechanism.

Nevertheless, there are still notable issues with current works:

(1) To learn the graph representations, existing efforts usually adopt a single graph learning model to capture the structural information and generate the embeddings, which can be insufficient for accurately estimating the node importance.

(2) To aggregate the embeddings and produce the scores, existing efforts directly combine the node scores and edge embeddings, which might fail to make full use of the interaction between node and edge representations.

To tackle the aforementioned issues, we propose a Multiview Contrastive Representation Learning (MCRL) model to generate entity representations from multiple perspectives, which helps make entity importance estimation more accurate. Specifically, we adopt two graph encoders to characterize the entity representations in different views and perform cross-view contrasting to extract more useful signals using the contrastive learning strategy. Then, given the learned graph embeddings, we estimate the entity scores from two views, one based purely on the entity embeddings and one based on the interaction between entity and relation embeddings. Finally, the multiview scores are integrated using the attention mechanism. Comprehensive experiments on real-world knowledge graph datasets validate that MCRL can outperform existing methods in terms of all metrics.

1.1. Contribution

The main contributions are summarized as follows:

(1) We devise a multiview contrastive learning strategy to estimate entity importance, where the graph representations are first learned in the two views separately and then forwarded to the cross-view contrasting module to further enhance their expressiveness.

(2) Based on the graph embeddings, we generate the entity importance score by attentively aggregating the scores in two views, one merely considering the entity embeddings and one modeling the interactions between entity and relation embeddings.

(3) We conduct extensive experiments on real-world public datasets, and the results demonstrate that MCRL outperforms the baselines in all aspects.

1.2. Organization

The rest of this paper is organized as follows. Section 2 gives an overview of the literature that is relevant to this work. Section 3 provides the definitions of key concepts and the problem formulation. Section 4 elaborates the node importance estimation model. Section 5 reports and discusses the evaluation results on the mainstream node importance estimation experimental settings. Section 6 concludes the paper and provides future directions.

2. Related Work

This section provides an overview of the literature related to this work, including node importance estimation methods, graph neural network methods, and contrastive learning methods.

2.1. Node Importance Estimation

There are numerous ways to estimate node importance. PageRank (PR) [7] is a random walk model that propagates the importance of each node by traversing the graph structure or transmitting it to random nodes with a fixed probability. Personalized PageRank (PPR) [11] adjusts node weights or edge weights to bias the random walk by considering specific topics. Recently, with the development of deep learning on graph data, several works have begun to explore supervised machine learning algorithms. In addition to employing the random walk model, HAR [14] distinguishes between different types of predicates in KGs while being aware of importance scores, making better use of the rich information contained in KGs. GENI [12] is the earliest work to apply GNNs to node importance estimation; it classifies the neighbors of each node based on the type of edges and aggregates the importance values of neighboring nodes. Additionally, it adjusts the node importance in accordance with the nodes’ degree centrality. RGTN [13] proposes a representation learning-based framework that utilizes both graph topology and node semantic information and aggregates them via an attention mechanism, which in turn infers node importance scores.

Notably, unlike previous works that use a single graph learning model to extract the structural information and generate the embeddings, which may not be adequate for estimating node importance precisely, in this work the graph representations are first learned in two views independently and then fed to the cross-view contrasting module to further improve their expressiveness. Besides, existing studies directly combine the node scores and edge embeddings, which may not fully exploit the interaction between node and edge representations; in this work, we calculate the attention scores between nodes using the representations of both nodes and edges, and we then combine the scores to predict the node importance while fully accounting for the information contained in nodes and edges.

There are also some research works on heterogeneous graphs. MultiImport [15] is an end-to-end framework that integrates information from both the KG and external signals while dealing with the challenges arising from the simultaneous use of multiple input signals, such as inferring node importance from sparse signals and resolving potential conflicts among them. HIVEN [16] traces the local information of each node, employs the meta schema to alleviate the problem of node type dominance, and exploits the node similarity within each node type to overcome the limitation of GNN models in capturing global information. Taking the movie dataset as an example, our model requires only one type of label, the popularity of the movie, and ranks only the importance of the movie nodes; works on heterogeneous graphs, in contrast, also take as input information about other types of nodes, such as a director’s box office record.

2.2. Graph Neural Networks

Graph neural networks (GNNs) apply deep learning ideas to graph data, and these methods have attracted great research attention in recent years [17–19]. The pioneering work is the graph convolution model GCN [20], which performs convolution in the Fourier domain by aggregating neighbor node features and has performed well in many applications. However, GCN training needs the adjacency matrix of the whole graph and thus depends on the specific graph structure, so GraphSAGE [21] was proposed to solve this problem. GraphSAGE uses multilayer aggregation functions, where each layer aggregates the information of nodes and their neighbors to obtain the feature vectors of the next layer; it thus uses the neighborhood information of nodes without depending on the global graph structure. In addition, GCN treats all neighboring nodes equally in convolution and cannot assign different weights to nodes according to their importance; graph attention networks (GATs) [22] were proposed to solve this problem. GATs adaptively aggregate neighboring information based on the attention mechanism and can assign different weights to different nodes, providing an efficient framework for integrating deep learning into graph mining. These GNN works have been widely used in recommender systems [23], knowledge graph inference [24], and graph classification [25].

2.3. Contrastive Learning on Graphs

Contrastive learning (CL) has recently become recognized as an effective method for learning self-supervised graph representations [26–29]. CL produces data representations by learning to encode the similarities or dissimilarities between a set of unlabeled samples, so that rich unlabeled data serve as a supervision signal for model training. Since there are typically few labeled entities in knowledge graphs, in this work we employ contrastive learning and make use of the large number of unlabeled entities to better obtain feature representations of nodes and more precise node importance scores.

3. Problem Formulation

In this section, we provide the definitions of key concepts and introduce the formalization of the problem studied in our work.

3.1. Knowledge Graph

A knowledge graph is a graph $G = (V, E, P)$ that represents a network of real-world entities and illustrates the relationships between them, where $V$, $E$, and $P$ represent the entities, relationships, and predicates, respectively. In the knowledge graph, it is plausible that there might be several different types of predicates between two entities, and hence each edge is linked to a particular predicate through a mapping function $\phi: E \rightarrow P$.

3.2. Node Importance

An entity’s importance or popularity in a knowledge graph is indicated by its node importance $s \in \mathbb{R}$, which is a nonnegative real value.

3.3. Semantic Information of Nodes

The semantic information of a node is the natural language text that provides a comprehensive description of the entity or the concept represented by the node.

3.4. Problem Definition

Given a knowledge graph $G = (V, E, P)$, a set of semantic information $T$ of the nodes, and importance scores $S$ for a subset of nodes $V_s \subseteq V$, entity importance estimation aims to learn a function $f: V \rightarrow \mathbb{R}$ that generates the importance score for each node in the knowledge graph.

4. Approach

In this section, we first describe the outline of the proposed model. Then, we introduce the details of the two components and training. Table 1 provides the definition of symbols used in this paper.

4.1. Outline

As shown in Figure 2, the features of entities are first forwarded to the self-supervised contrastive learning module to generate node embeddings. By adjusting the encoders’ hidden-size and output-size parameters, we produce a high-dimensional embedding and a low-dimensional embedding for each node. Then, the high-dimensional embeddings are mapped directly to an importance score (i.e., score 1), and the low-dimensional embeddings are concatenated with the edge features and mapped to another importance score (i.e., score 2) using the attention mechanism. The reason is that a high-dimensional node embedding retains more information and is better suited for direct mapping to the importance score, while a low-dimensional node embedding can be better combined with the edge embedding to obtain the attention weight between two nodes, after which the scores are aggregated to make the prediction. The scores from the two perspectives are combined to obtain the final predicted node importance score. Finally, we train the entire model by aggregating the self-supervised contrastive loss, the supervised root mean square error (RMSE) loss, and the learning-to-rank (LTR) loss.

4.2. Multiview Contrastive Learning

In this work, we choose two popular GNN models as the encoders to produce the graph representations in different views.

4.2.1. GCN

The first one is the graph convolutional network (GCN) [20]. The GCN model utilizes multiple convolutional layers to conduct message passing by aggregating the features of nodes and their first-order neighborhoods. Given two message passing layers, the GCN can be expressed as follows:

$$H = \sigma\big(\hat{A}\,\sigma\big(\hat{A}XW^{(0)}\big)W^{(1)}\big),$$

where $X$ is the input feature vector matrix of the nodes, $\hat{A}$ is the scaled adjacency matrix of the graph with added self-loops, $W^{(0)}$ and $W^{(1)}$ are trainable weight matrices, $\sigma$ is the activation function, and $H$ is the output node embedding matrix.
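To make the propagation rule concrete, the following is a minimal dense PyTorch sketch of the two-layer GCN above; the normalization step and all names (TwoLayerGCN, A_hat) are conventional choices rather than the paper’s exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.W0 = nn.Linear(in_dim, hidden_dim, bias=False)
        self.W1 = nn.Linear(hidden_dim, out_dim, bias=False)

    def forward(self, X, A):
        # A_hat = D^{-1/2} (A + I) D^{-1/2}: adjacency with self-loops,
        # symmetrically scaled by the degree matrix D
        A_tilde = A + torch.eye(A.size(0), device=A.device)
        d = A_tilde.sum(dim=1)
        A_hat = A_tilde * d.rsqrt().unsqueeze(1) * d.rsqrt().unsqueeze(0)
        H = F.relu(A_hat @ self.W0(X))     # first propagation layer
        return F.relu(A_hat @ self.W1(H))  # second propagation layer
```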

4.2.2. GAT

The other model is the graph attention network (GAT) [22], which assigns attention weight coefficients to the neighboring nodes of the target node and uses a local aggregation function to generate node embeddings. Given the features of the nodes, the influence of node $j$ on node $i$ can be calculated by the following equation:

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}\big(a^{\top}\big[Wh_i \,\|\, Wh_j\big]\big)\big)}{\sum_{k\in\mathcal{N}_i}\exp\big(\mathrm{LeakyReLU}\big(a^{\top}\big[Wh_i \,\|\, Wh_k\big]\big)\big)},$$

where $h_i$ represents the feature of node $i$, $\|$ is a concatenation operator, $\mathrm{LeakyReLU}$ is the activation function, $\mathcal{N}_i$ represents the neighboring nodes of node $i$, $a$ is a learnable weight vector, and $W$ is a trainable weight matrix.

Following the acquisition of the attention weights, GAT aggregates the feature representations of the nodes and their neighbors. The $l$-th layer GAT aggregation with $K$-head attention can be expressed as follows:

$$h_i^{(l+1)} = \big\Vert_{k=1}^{K}\,\sigma\Big(\sum_{j\in\mathcal{N}_i}\alpha_{ij}^{k}W^{k}h_j^{(l)}\Big),$$

where $h^{(0)} = X$ is the input feature vector matrix of the nodes in the graph.
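The sketch below implements a single attention head of this layer in dense form; a multihead GAT concatenates K such heads. The decomposition of the attention vector into a_src and a_dst is a standard implementation trick, not notation from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single attention head; multihead GAT concatenates K such heads."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        # a^T [W h_i || W h_j] = a_src . W h_i + a_dst . W h_j
        self.a_src = nn.Parameter(torch.randn(out_dim) * 0.1)
        self.a_dst = nn.Parameter(torch.randn(out_dim) * 0.1)

    def forward(self, X, A):
        # Assumes A contains self-loops so each node also attends to itself.
        H = self.W(X)                                      # (N, out_dim)
        e = F.leaky_relu((H @ self.a_src).unsqueeze(1)
                         + (H @ self.a_dst).unsqueeze(0))  # (N, N) raw scores
        e = e.masked_fill(A == 0, float("-inf"))  # restrict to real neighbors
        alpha = torch.softmax(e, dim=1)           # attention weights alpha_ij
        return F.elu(alpha @ H)                   # aggregate neighbor features
```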

4.2.3. Cross-View Contrastive Learning

After obtaining the representations in the two views, we use the cross-view contrastive learning strategy to help learn more expressive graph representations.

Given a node $i$, we denote its embedding generated by the first view as $u_i$ and the embedding generated by the second view as $v_i$. These two embeddings form a positive sample. The pairs of embeddings including $u_i$ (or $v_i$) and another node’s embedding are the negative samples. The contrastive objective of node $i$ is defined as follows:

$$\ell(u_i, v_i) = -\log\frac{e^{\theta(u_i,v_i)}}{e^{\theta(u_i,v_i)} + \sum_{k=1}^{N}\mathbb{1}_{[k\neq i]}e^{\theta(u_i,v_k)} + \sum_{k=1}^{N}\mathbb{1}_{[k\neq i]}e^{\theta(u_i,u_k)}},$$

where $\theta(\cdot,\cdot)$ is a score function that measures the similarity between two embeddings. Specifically, the two embeddings are first transformed using a multilayer perceptron (MLP) with nonlinear activation functions, and then the similarity between the transformed embeddings is evaluated with a similarity metric. $N$ is the number of nodes in the graph, and $\mathbb{1}_{[k\neq i]}$ is an indicator function that returns 1 if the condition in the bracket holds true and 0 otherwise. The first term in the denominator is the positive sample, the term $\sum_{k=1}^{N}\mathbb{1}_{[k\neq i]}e^{\theta(u_i,v_k)}$ refers to the cross-view negative samples, and $\sum_{k=1}^{N}\mathbb{1}_{[k\neq i]}e^{\theta(u_i,u_k)}$ represents the intraview negative samples.

Finally, the overall self-supervised loss is defined by averaging the objective over the two views and all nodes:

$$\mathcal{L}_{cl} = \frac{1}{2N}\sum_{i=1}^{N}\big[\ell(u_i,v_i) + \ell(v_i,u_i)\big].$$
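The following is a compact sketch of this cross-view objective, assuming cosine similarity after an MLP projector and a temperature tau; these are common choices for this family of losses, and the paper’s exact settings may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_objective(u, v, projector, tau=0.5):
    """u, v: (N, d) embeddings of the same N nodes from the two views."""
    zu = F.normalize(projector(u), dim=1)   # MLP transform, then unit norm
    zv = F.normalize(projector(v), dim=1)
    sim_uv = zu @ zv.t() / tau              # theta(u_i, v_k): cross-view
    sim_uu = zu @ zu.t() / tau              # theta(u_i, u_k): intra-view
    pos = sim_uv.diag()                     # positive pairs (u_i, v_i)
    off_diag = ~torch.eye(u.size(0), dtype=torch.bool, device=u.device)
    # Denominator: positive + cross-view negatives + intra-view negatives.
    denom = sim_uv.exp().sum(dim=1) + (sim_uu.exp() * off_diag).sum(dim=1)
    return (denom.log() - pos).mean()       # mean of -log(exp(pos) / denom)

# Overall loss, symmetrized over the two views as in the equation above:
# l_cl = 0.5 * (contrastive_objective(u, v, g) + contrastive_objective(v, u, g))
```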

4.3. Multiview Score Aggregation

We devise a multiview strategy to produce and aggregate the entity importance scores.

4.3.1. Node View

As the two encoders produce the high-dimensional embeddings of each node in the graph, i.e., $h_i^{(1)}$ and $h_i^{(2)}$, we add the two embeddings and generate the node importance score of the node view:

$$s_i^{node} = \mathrm{FC}\big(h_i^{(1)} + h_i^{(2)}\big),$$

where $\mathrm{FC}$ represents a fully connected neural network in our experiments.

4.3.2. Node-Edge Interaction View

The node view merely focuses on the features of nodes. However, edge features also contain a wealth of information and play an essential role in estimating node importance. Thus, we concatenate the node and edge vectors to better model their interactions. Specifically, we use the attention mechanism, and the attention weight of node $j$ to node $i$ can be calculated by the following equation:

$$\alpha_{ij} = \frac{\exp\big(\sigma\big(a^{\top}\big[h_i \,\|\, p_{ij} \,\|\, h_j\big]\big)\big)}{\sum_{k\in\mathcal{N}_i}\exp\big(\sigma\big(a^{\top}\big[h_i \,\|\, p_{ik} \,\|\, h_k\big]\big)\big)},$$

where $h_i$ represents the feature of node $i$, $p_{ij}$ denotes the feature of the predicate between nodes $i$ and $j$, $\|$ is a concatenation operator, $\sigma$ is the activation function, $\mathcal{N}_i$ represents the neighboring nodes of node $i$, and $a$ is a learnable weight vector.
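A minimal dense sketch of this predicate-aware attention is given below; the pairwise predicate-feature tensor P and the function name edge_attention are illustrative assumptions, not the paper’s code.

```python
import torch
import torch.nn.functional as F

def edge_attention(H, P, a, A):
    """H: (N, d) low-dimensional node features; P: (N, N, d_p) predicate
    features per node pair; a: learnable vector of size 2*d + d_p;
    A: (N, N) adjacency used to restrict attention to real neighbors."""
    N, d = H.shape
    hi = H.unsqueeze(1).expand(N, N, d)    # h_i broadcast over targets j
    hj = H.unsqueeze(0).expand(N, N, d)    # h_j broadcast over sources i
    e = F.leaky_relu(torch.cat([hi, P, hj], dim=-1) @ a)  # raw scores (N, N)
    e = e.masked_fill(A == 0, float("-inf"))
    return torch.softmax(e, dim=1)         # alpha_ij over the neighbors of i
```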

We first convert the low-dimensional embeddings $z_i$ to scores, and the scores can then be aggregated using the attention weights obtained above:

$$s_i^{edge} = \sum_{j\in\mathcal{N}_i}\alpha_{ij}\,\mathrm{FC}\big(z_j\big).$$

4.3.3. Score Aggregation

The final predicted scores are formed by integrating the scores from the node view and the node-edge interaction view:

$$s_i = \lambda\,s_i^{node} + \big(1-\lambda\big)\,s_i^{edge},$$

where $\lambda$ is the aggregation hyperparameter.
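Putting the two views together, the following sketch scores and aggregates under the above equations; the module and parameter names (MultiviewScorer, fc_node, fc_edge, lam) are our own, and the attention weights alpha are assumed to come from the node-edge interaction view.

```python
import torch
import torch.nn as nn

class MultiviewScorer(nn.Module):
    def __init__(self, high_dim, low_dim, lam=0.5):
        super().__init__()
        self.fc_node = nn.Linear(high_dim, 1)  # node-view scoring head
        self.fc_edge = nn.Linear(low_dim, 1)   # node-edge-view scoring head
        self.lam = lam                         # aggregation hyperparameter

    def forward(self, h1, h2, z, alpha):
        # Score 1: fully connected layer over the summed high-dim embeddings.
        s_node = self.fc_node(h1 + h2).squeeze(-1)
        # Score 2: neighbor scores weighted by the predicate-aware attention.
        s_edge = (alpha @ self.fc_edge(z)).squeeze(-1)
        return self.lam * s_node + (1 - self.lam) * s_edge
```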

4.4. Training

We select the root mean square error (RMSE) and the learning-to-rank loss as supervised loss functions. First, we establish the node set $V_s$ using the nodes with known importance scores. The following equation illustrates how RMSE measures the error between the predicted and labeled importance scores:

$$\mathcal{L}_{rmse} = \sqrt{\frac{1}{|V_s|}\sum_{i\in V_s}\big(s_i - s_i^{*}\big)^2},$$

where $s_i^{*}$ is the valid ground truth importance value of node $i$ and $s_i$ represents the predicted score.

In order to take the entire graph into account while ranking the nodes’ importance, we also use the learning-to-rank (LTR) loss in the training process. We sample $M$ nodes for each node $i$ to form a node set $\mathcal{S}_i$ and compute a pairwise ranking loss over the sampled pairs:

$$\mathcal{L}_{ltr} = -\sum_{i\in V_s}\sum_{j\in\mathcal{S}_i}\big[y_{ij}\log\hat{y}_{ij} + (1-y_{ij})\log(1-\hat{y}_{ij})\big],$$

where $\hat{y}_{ij} = \mathrm{sigmoid}(s_i - s_j)$ and $y_{ij} = 1$ if $s_i^{*} > s_j^{*}$ and 0 otherwise.

By combining the supervised loss functions with the self-supervised contrastive loss function, we obtain the total loss function for model training:

$$\mathcal{L} = \mathcal{L}_{rmse} + \beta\,\mathcal{L}_{ltr} + \gamma\,\mathcal{L}_{cl},$$

where $\beta$ and $\gamma$ are balancing coefficients.
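A sketch of the supervised terms and their combination follows; the pairwise RankNet-style LTR term and the weighting coefficients beta and gamma reflect our reading of the formulation above (the original equation images are unavailable), so treat this as an assumption rather than the paper’s verbatim objective.

```python
import torch
import torch.nn.functional as F

def rmse_loss(pred, target):
    return torch.sqrt(F.mse_loss(pred, target))

def ltr_loss(pred, target, pairs):
    """pairs: tuple (i, j) of index tensors for sampled node pairs in V_s."""
    i, j = pairs
    logits = pred[i] - pred[j]                # sigmoid(s_i - s_j) ranks the pair
    labels = (target[i] > target[j]).float()  # y_ij from ground truth ordering
    return F.binary_cross_entropy_with_logits(logits, labels)

def total_loss(pred, target, pairs, l_cl, beta=1.0, gamma=1.0):
    return rmse_loss(pred, target) + beta * ltr_loss(pred, target, pairs) + gamma * l_cl
```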

5. Experiments

In this section, we conduct extensive experiments on real-world datasets to answer the following questions:

(1) Does MCRL work better than the existing baselines and previous models? Are the contrastive learning and score aggregation modules useful?

(2) Is MCRL generally valid for different encoders? Is it sensitive to hyperparameters?

We describe detailed information about the dataset and baseline in Section 5.1, answer the above questions in Sections 5.2 to 5.4, and perform a case study in Section 5.5.

5.1. Experimental Setting
5.1.1. Datasets

Following previous works, we conduct comprehensive experiments on three public knowledge graphs with different features. More details can be found in Table 2.

FB15K [30] is a subset of the Freebase [31] database which contains knowledge base relation triples and textual mentions of entity pairs. The 30-day view count of the corresponding Wikipedia page is utilized as the node importance score for each entity in the graph, and the description of the entity in the Wikidata is used as the node semantic information. Compared to the rest of the datasets, FB15K has more predicates and also a higher density.

TMDB5K is a movie knowledge graph generated from TMDB (https://www.kaggle.com/tmdb/tmdb-movie-metadata), and it contains information about movies as well as other closely related entities including actors, casts, crews, and countries. The popularity of each movie is used as the entity’s importance score, while the movie summaries provide the nodes’ semantic information.

IMDB is a movie knowledge graph created from the IMDB dataset (https://www.imdb.com/interfaces/), which includes entities for movies, casts, crews, genres, publishing companies, and countries. The importance scores are determined by the number of votes for each movie. As the semantic information for the nodes, the movie plot summaries and personal biographies are used.

5.1.2. Competing Methods

We compare MCRL with two primary kinds of methods that are readily available for ranking the importance of nodes in a graph. The first kind comprises the unsupervised approaches:

(1) PR [7]: a random walk-based algorithm for measuring the importance of web pages, which can also be used to rank the importance of nodes in a graph.

(2) PPR [11]: a variant of PageRank that considers the node’s own feature information.

The second kind includes the supervised methods:

(1) LR: linear regression, a simple machine learning technique based on minimizing the mean squared error via least squares.

(2) RF: random forest, a basic machine learning algorithm that uses ensemble learning based on decision trees.

(3) GCN [20]: a GNN model that aggregates neighbor node embeddings to conduct graph convolutions in the Fourier domain.

(4) GAT [22]: a GNN model that uses the multihead attention mechanism to aggregate the features of neighboring nodes.

(5) GENI [12]: a model that aggregates scores using a predicate-aware attention mechanism and flexible centrality adjustment to perform node importance estimation.

(6) RGTN [13]: a model that provides a representation learning-based framework for node importance estimation, propagating node embeddings in a relational graph transformer.

5.1.3. Detailed Settings

For a fair comparison, we maintain consistency with previous works [12, 13] by concatenating semantic and structural features as node input features, except for GENI, which uses the structural features only, following the setting in the original paper [12]. The structural features of the nodes are obtained by node2vec [32], and the semantic features are obtained from Transformer-XL [33]. To help the GNN models acquire node representations more accurately, we employ two widely used graph data augmentation techniques during training: given a graph, edge dropout [34] randomly drops edges with a given probability, while node dropout [35] randomly discards nodes and their connected edges with a given ratio. The nodes with known importance scores in the datasets are divided into training, validation, and testing parts with a ratio of 7 : 1 : 2. To obtain reliable and stable experimental results, we conduct five-fold cross-validation on each dataset to evaluate all the models. To avoid overfitting, we apply early stopping if the performance on the validation set does not improve for 1000 consecutive epochs. For testing, the parameters that perform best during validation are used. The experiments run on a Linux operating system with an NVIDIA GeForce RTX 3090 graphics card with 24 GB of memory, CUDA 11.3, and Python 3.8, and the model is built with PyTorch 1.11.0.
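For concreteness, the two augmentations can be sketched as follows for a dense adjacency matrix; the function names and default probabilities are illustrative.

```python
import torch

def edge_dropout(A, p=0.1):
    """Randomly drop each edge of the adjacency matrix with probability p
    (for an undirected graph, symmetrize the mask before applying it)."""
    mask = (torch.rand_like(A) > p).float()
    return A * mask

def node_dropout(A, X, ratio=0.1):
    """Randomly discard nodes by zeroing their features and incident edges."""
    keep = (torch.rand(A.size(0), device=A.device) > ratio).float()
    A = A * keep.unsqueeze(0) * keep.unsqueeze(1)  # cut all incident edges
    X = X * keep.unsqueeze(1)                      # zero dropped node features
    return A, X
```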

5.1.4. Evaluation Metrics

Following previous works [12, 13], we use three evaluation metrics to give a thorough evaluation of the ranking quality and importance relevance: normalized discounted cumulative gain (NDCG) [36], Spearman’s rank correlation coefficient (SPEARMAN) [37], and Top-K Hit Ratio (HR). For all metrics, higher values are preferable. The formal definitions are provided below, followed by illustrative code sketches.

(1) NDCG is a popular metric for evaluating the ranking quality of the top $k$ nodes. Given a list of $k$ nodes ranked by their predicted importance scores, with ground truth importance scores $y_1, \dots, y_k$, the discounted cumulative gain at position $k$ is $\mathrm{DCG@}k = \sum_{i=1}^{k} y_i / \log_2(i+1)$. The ideal DCG at rank position $k$ ($\mathrm{IDCG@}k$) is obtained by an ideal ordering of the nodes based on their ground truth scores. The normalized DCG at position $k$ is then $\mathrm{NDCG@}k = \mathrm{DCG@}k / \mathrm{IDCG@}k$.

(2) SPEARMAN measures the strength and direction of the correlation between the two node rankings rated in accordance with the predicted scores and the ground truth scores: $\rho = 1 - 6\sum_i d_i^2 / \big(n(n^2-1)\big)$, where $d_i$ is the rank difference of node $i$ and $n$ is the number of ranked nodes.

(3) HR measures the ratio of the predicted top-$k$ nodes that are contained in the ground truth top-$k$ important nodes: $\mathrm{HR@}k = |\mathrm{Top}_k^{pred} \cap \mathrm{Top}_k^{true}| / k$.
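The sketches below compute the three metrics as defined above; SPEARMAN can be obtained directly from scipy.stats.spearmanr. The log2-discounted gain in NDCG is the common convention and may differ in detail from the paper’s implementation.

```python
import numpy as np
from scipy.stats import spearmanr

def ndcg_at_k(pred, truth, k):
    """NDCG@k over nodes ranked by predicted scores, with log2 discounts."""
    discounts = np.log2(np.arange(2, k + 2))             # log2(i + 1), i = 1..k
    dcg = (truth[np.argsort(-pred)[:k]] / discounts).sum()
    idcg = (np.sort(truth)[::-1][:k] / discounts).sum()  # ideal ordering
    return dcg / idcg

def hr_at_k(pred, truth, k):
    """Overlap ratio between predicted and ground-truth top-k node sets."""
    top_pred = set(np.argsort(-pred)[:k])
    top_true = set(np.argsort(-truth)[:k])
    return len(top_pred & top_true) / k

# rho, _ = spearmanr(pred, truth)   # SPEARMAN rank correlation
```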

5.2. Analysis of Experimental Results

The performance results are shown in Table 3. Numbers after the ± symbol refer to the standard deviation from the cross-validation. The approach denoted by an asterisk (∗) employs only structural features.

It can be observed from the table that our proposed MCRL outperforms all the compared models on all metrics. Besides, the results also reveal the following:

(1) Supervised methods typically perform better than unsupervised approaches and are more accurate in predicting node importance scores.

(2) GENI prematurely maps node features into scores and calculates attention weights by simply concatenating node scores with edge embeddings, which cannot fully utilize the interaction between nodes and edges.

(3) RGTN learns graph representations from a single perspective, which is not flexible and accurate enough for node features.

(4) The model proposed in this paper also has some shortcomings: it uses data augmentation methods that randomly discard some nodes or edges in the graph, which increases uncertainty and instability, so its standard deviations from the cross-validation are slightly higher.

5.3. Ablation Study

In this section, we perform ablation studies to prove the validity of each module in our proposed framework.

5.3.1. On Multiview Contrastive Learning

To verify the effectiveness of contrastive learning, we conduct an ablation study on two datasets. Specifically, one variant merely uses GCN as the encoder and the other merely uses GAT, both with the contrastive loss component removed from training. It can be observed from Figure 3 that using contrastive learning effectively increases the performance by improving the graph representations.

5.3.2. On Multiview Score Aggregation

To validate the performance of the score aggregation module, and to demonstrate that concatenating node features with edge features to calculate attention weights is more effective than simply concatenating node scores with edge features, we conduct an experiment on multiview score aggregation. Table 4 shows that employing score aggregation enhances the effectiveness and stability of model prediction. Besides, concatenating the node and edge representations better captures their interactions and yields superior performance.

5.4. Further Experiments
5.4.1. Choices of Encoders

In this study, we select the GCN and GAT models as the encoders and employ the contrastive learning approach to produce the node representations. In fact, the encoders in the model can be replaced with any graph representation learning model, and we choose GraphSAGE as an alternative encoder for verification.

From Figure 4, we can see that the outcomes of the experiments on the two datasets are not sensitive to the choices of encoders, and employing contrastive learning can consistently enhance the performance. Therefore, MCRL can be applied to a variety of encoders, and in this paper, we choose two popular encoders.

5.4.2. Parameter Sensitivity

As mentioned in Section 4.3, we obtain the final prediction scores by assigning the hyperparameter $\lambda$ as the weight to the scores from the two perspectives. To show that the model is stable under hyperparameter perturbations, we conduct a sensitivity analysis on this important hyperparameter on FB15K. Figure 5 demonstrates that changing the weights of the two scores has no appreciable impact on the experimental outcomes. Therefore, MCRL is robust to perturbations of $\lambda$.

5.4.3. Training Time

To compare the efficiency of different methods, we report the overall training time on the FB15K dataset in Table 5. For each model, we ran the experiment five times and averaged the run times to obtain the time cost. As shown in Table 5, the contrastive learning and dropout methods used in our proposed model increase the training time, but the slight increase is acceptable considering the improvement in prediction results shown in Table 3.

5.4.4. Comparison with Methods Proposed for Heterogeneous Graphs

Recent years have also witnessed the emergence of importance estimation methods for heterogeneous graphs. Thus, for the comprehensiveness of the experiments, we compare our proposal with a state-of-the-art importance estimation method for heterogeneous graphs, i.e., HIVEN [16], and report the performance in Table 3. In order to compare the two models without taking different node types into account, we apply HIVEN to the homogeneous graph with the same input data as our proposed model. Taking the movie dataset as an example, we provide the models with labels for only some of the movie nodes and estimate the importance of the movie nodes only. The experimental results demonstrate that our work outperforms the method proposed for heterogeneous graphs when evaluated on a homogeneous graph. This indicates that methods proposed for heterogeneous graphs cannot work well on homogeneous graphs where the nodes are of the same type.

5.5. Case Study Analysis

To demonstrate the effectiveness of MCRL on the prediction task, we conduct a case study using the movie dataset IMDB as an example. Table 6 shows the top-10 movies with the highest importance scores predicted by MCRL, GENI, and RGTN, along with the difference between their ground truth ranks and estimated ranks. The ground truth rank is calculated from the known importance scores of the movies. From the table, we can see that the top-10 movies predicted by MCRL are qualitatively better than those of the other two models, demonstrating our model’s effectiveness.

6. Conclusion

Estimating the importance of nodes in KGs is a fundamental and crucial task in graph analysis, which benefits many downstream applications. In this paper, we propose a multiview contrastive learning strategy to obtain representations of nodes from multiple perspectives and use a cross-view contrasting module to enhance their expressiveness. Additionally, we generate the entity importance score by attentively aggregating the scores from two views, one merely considering the entity embeddings and one modeling the interactions between entity and relation embeddings. Comprehensive experiments on real-world knowledge graphs show that our model outperforms existing methods on all metrics. There are also some works on node importance estimation for heterogeneous graphs [15, 16], so for future work, we intend to apply cutting-edge representation learning techniques to estimate node importance on heterogeneous knowledge graphs.

Data Availability

This study used the movie datasets from TMDB and IMDB. The TMDB dataset can be downloaded from https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata, and the IMDB dataset can be downloaded from the official IMDB website https://www.imdb.com/interfaces/ or https://datasets.imdbws.com/. The datasets are available for personal and noncommercial use.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the National Key R&D Program of China under Grant No. 2022YFB3102600 and by NSFC under Grant Nos. 62302513 and 62272469.