Abstract

Recommender systems are known to suffer from the popularity bias problem: popular items are recommended frequently, and nonpopular ones rarely, if at all. Prior studies focused on tackling this issue by increasing the number of recommended nonpopular (long-tail) items. However, these methods ignore the users’ personal popularity preferences and increase the exposure rate of the nonpopular items indiscriminately, which may hurt the user experience because different users have diverse interests in popularity. In this work, we propose a novel debiasing framework with a knowledge graph (AWING), which adaptively alleviates popularity bias from the users’ perspective. Concretely, we explore the fine-grained preferences (including popularity preference) behind a user-item interaction by applying a heterogeneous graph transformer over a knowledge graph embedded with popularity nodes, and we endow the preferences with explicit semantics. Based on this idea, we can control how much popularity preference affects the recommendation results and improve the exposure rate of nonpopular items while considering the popularity preferences of different users. Experiments on public datasets show that the proposed method AWING can effectively alleviate popularity bias while preserving the user experience. A case study further demonstrates the feasibility of AWING on the explainable recommendation task.

1. Introduction

In the age of the Internet, users can enjoy a variety of services on various electronic platforms. However, as the number of users continues to increase, the problem of information overload becomes more serious, which prevents users from effectively searching for the content they want. Recommender systems are an effective way to solve such problems [1]. Among them, collaborative filtering, one of the most successful recommendation methods, can predict the rating of a certain user for an item and generate a recommendation list by using the preferences of a certain user group [2]. Accurately characterizing users’ interests therefore lies at the heart of an effective recommender system [3], yet it is challenging. There are even some studies that aim to hinder the system’s efforts to accurately profile users, in order to protect their privacy [4]. One barrier to capturing users’ representations effectively is the problem of popularity bias: collaborative filtering recommenders typically emphasize popular items much more than nonpopular ones [5], which causes popular items to be rated higher than their ideal values, so that they may be recommended to users who are actually not interested in them. Figure 1 illustrates the long-tail phenomenon in the well-known LastFM dataset [6]. The vertical line separates the top 20% of items by popularity, and these items cumulatively have many more ratings than the 80% long-tail items to the right. Similar distributions can be found in other systems as well. After being trained on such long-tailed data, the models inherit this bias and, in many cases, amplify it by over-recommending the popular items. As a result, popular items are rated by even more users, and the cycle repeats: the rich get richer and the poor get poorer.

Although recommending popular items often yields good accuracy, doing so is sometimes not meaningful because users are likely to know these items already. In other words, recommending serendipitous items from among the nonpopular items is ordinarily considered valuable to users [7], as these are items that users are less likely to know about. Therefore, recommender systems should strike a balance between popular and nonpopular items. Previous studies have mainly focused on increasing the number of recommended nonpopular (long-tail) items. Jones introduces a reweighting method to improve performance on small community detection [8]. Some studies [9, 10] introduce a regularization term to correct popularity bias. In addition, a few methods [11] utilize propensity scores to decrease the ratio of popular items. However, these methods increase the exposure rate of the nonpopular items indiscriminately, largely ignoring the user’s interest in popularity, which may hurt the user experience [12].

In this paper, we propose an adaptive framework to alleviate popularity bias from the users’ perspective, named AWING (Adaptive Alleviation for Popularity Bias with Knowledge Graph). The key idea is that a user typically has multiple preferences (or reasons) that drive them to consume different items. Based on this idea, we can capture users’ popularity preferences with a knowledge graph and, for those users who are not interested in popular items, remove a percentage of the popularity preference (i.e., lower the weight of the popularity preference) according to their profiles. The knowledge graph is a practical approach to representing large-scale information from multiple domains [13]. To describe a knowledge graph, we can use nodes in the graph as entities and edges as relations between entities, which follows the resource description framework (RDF) standard [14]. Of course, to capture users’ multiple preferences on items, content feature information is important and helpful; some studies also utilize it to optimize the learning model [15]. However, disentangling these preferences is challenging and has not been well explored. Specifically, we face two key challenges: (1) although a knowledge graph can provide rich information for learning these preferences, it lacks knowledge of popularity bias; (2) different users have diverse interests in popular items, so adaptively eliminating popularity bias according to personal taste is another challenge. To cope with these two challenges, we design AWING with two stages: identifying the fine-grained preferences behind a user-item interaction and generating recommendation results that match the user’s interest in popular items.
AWING mainly includes the following four components: (1) a component that constructs a knowledge graph embedded with popularity nodes; (2) a component that models fine-grained user preferences; (3) a component that learns the representations of the users, items, and fine-grained preferences based on a heterogeneous graph transformer model; (4) a component that generates a personalized recommendation list by removing popularity preference on a per-user basis. Among these components, components (1–3) deal with challenge (1), while component (4) deals with challenge (2).

The contributions of this work are summarized as follows:
(i) To the best of our knowledge, we are the first to introduce a knowledge graph embedded with popularity nodes into a heterogeneous graph to alleviate popularity bias.
(ii) We propose a flexible framework, AWING, to alleviate popularity bias from the users’ perspective. It uses fine-grained preferences to profile user-item relationships over the knowledge graph and then removes a percentage of popularity preference for different users.
(iii) We conduct extensive experiments on two public datasets to demonstrate the effectiveness of the proposed model in alleviating popularity bias, and a case study shows the feasibility of our model on the explainable recommendation task.

2. The Proposed Model

In this section, we first introduce the notation and definitions used throughout this paper, and then we show how popularity nodes are embedded in the knowledge graph and how the fine-grained preferences of users are modeled. After that, we present how the heterogeneous graph transformer [16] module of the proposed AWING is built on the combined graph. Finally, we use the trained AWING to estimate whether a user will adopt an item, considering the fine-grained preferences.

2.1. Preliminary
Interaction data: given a list of user-item interactions, we use implicit feedback as the protocol, so that each (user, item) pair implies that the user has consumed the item. An additional relation is introduced to explicitly represent the user-item relationship and to convert each pair into a triplet. As such, the user-item interactions can be seamlessly combined with the KG.

Knowledge graph (KG): a KG is a directed graph composed of triple facts. Each triplet denotes a relationship from a head entity to a tail entity and is formally defined by (h, r, t), where h and t are entities and r is a relation. With the mappings between items and KG entities (the entity set also includes items), the KG can profile items and offer complementary information to the interaction data.

Heterogeneous graph (HG): formally, a heterogeneous graph is defined as a directed graph in which each node and each edge is associated with a type through a node-type mapping function and an edge-type mapping function, respectively. A and R denote the sets of node types and edge types, respectively.

Task description: given the interaction data and the KG, our task is to learn a function that predicts how likely a user is to select an item while further alleviating popularity bias.

Definition 1. Popularity: we define the popularity of an item as the number of times it is rated across all users.

Definition 2. Popular item: an item is a popular item if its popularity is in the top 20% of all items.

Definition 3. Niche user (N) [12]: a user is a niche user if she/he is in the bottom 20% regarding the ratio of popular items in her/his profile. For these users, more than half of their profile consists of nonpopular (long-tail) items.

Definition 4. Blockbuster-focused user (B) [12]: a user is a blockbuster-focused user if she/he is in the top 20% regarding the ratio of popular items in her/his profile. On average, the profiles of these users consist mostly of popular items.

Definition 5. Diverse user (D) [12]: a user is a diverse user if she/he is neither a niche user nor a blockbuster-focused user.
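
Definitions 1 to 5 can be illustrated with a minimal sketch. The function names, the `top_frac` parameter, the tie-breaking rule, and the rounding of the 20% cutoffs are assumptions made for this example; the text above fixes only the 20% thresholds.

```python
from collections import Counter

def item_popularity(interactions):
    """Definition 1: number of interactions each item received."""
    return Counter(item for _, item in interactions)

def popular_items(popularity, top_frac=0.2):
    """Definition 2: items whose popularity is in the top `top_frac`."""
    # Ties are broken by item id (an assumption) to keep the result deterministic.
    ranked = sorted(popularity, key=lambda i: (-popularity[i], str(i)))
    cutoff = max(1, round(top_frac * len(ranked)))
    return set(ranked[:cutoff])

def user_groups(interactions, top_frac=0.2):
    """Definitions 3-5: label each user as 'N', 'D', or 'B'."""
    popularity = item_popularity(interactions)
    popular = popular_items(popularity, top_frac)
    profiles = {}
    for user, item in interactions:
        profiles.setdefault(user, set()).add(item)
    # Ratio of popular items in each user's profile.
    ratio = {u: len(items & popular) / len(items) for u, items in profiles.items()}
    ranked = sorted(ratio, key=lambda u: (ratio[u], str(u)))  # ascending ratio
    cutoff = max(1, round(top_frac * len(ranked)))
    groups = {u: "D" for u in ranked}
    for u in ranked[:cutoff]:   # bottom 20%: niche users
        groups[u] = "N"
    for u in ranked[-cutoff:]:  # top 20%: blockbuster-focused users
        groups[u] = "B"         # (overrides 'N' only when there are very few users)
    return groups, ratio
```

The returned `ratio` is the per-user share of popular items, which is the quantity the later popularity weight depends on.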

2.2. The Architecture of AWING

We now present the proposed AWING. As illustrated in Figure 2, it consists of four key components: (1) KG embedded with popularity nodes, which inserts popularity nodes into the KG to enrich its relations; (2) fine-grained user preference modeling, which uses multiple preferences to profile user-item relationships and aligns each preference with a relation in the knowledge graph embedded with popularity nodes; (3) heterogeneous graph transformer, which fully models heterogeneity to maintain dedicated representations for different types of nodes and edges in the heterogeneous graph; (4) model prediction, which uses the mutual attention to predict how likely the user would adopt the item under each preference.

2.2.1. KG Embedded with Popularity Nodes

We first divide the items into K groups according to their popularity. Next, we create K popularity nodes that represent these groups and connect each item to its corresponding popularity node. In this way, a new relation between items and popularity nodes is introduced to the KG, which integrates popularity information into the knowledge graph. There also exist other relations and entities in the KG, which come from the attributes of items. As shown in Figure 2, we denote the new graph as KGEPN (KG embedded with popularity nodes).
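
As a rough illustration of this construction, the sketch below appends one popularity triple per item to an existing list of KG triples. The relation name `has_popularity`, the node names `pop_0 … pop_{K-1}`, and the equal-sized bucketing by popularity rank are all assumptions; the text does not specify how the K groups are formed.

```python
from collections import Counter

def add_popularity_nodes(kg_triples, interactions, num_groups):
    """Return KG triples extended with (item, 'has_popularity', 'pop_k') facts.

    Assumption: items are split into `num_groups` roughly equal-sized buckets
    by popularity rank (bucket 0 holds the least popular items).
    """
    popularity = Counter(item for _, item in interactions)
    # Least popular first; ties broken by item id for determinism.
    ranked = sorted(popularity, key=lambda i: (popularity[i], str(i)))
    extended = list(kg_triples)
    for rank, item in enumerate(ranked):
        group = rank * num_groups // len(ranked)  # bucket index in [0, num_groups)
        extended.append((item, "has_popularity", f"pop_{group}"))
    return extended
```

Because the popularity facts are ordinary triples, any model that consumes the KG sees popularity as just another relation type.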

2.2.2. Fine-Grained User Preferences Graph

We aim to capture the intuition that multiple preferences influence the behaviors of users. Here, we frame a preference as the reason for a user’s choice of items, reflecting the commonality of all users’ behaviors. Taking music recommendation as an example, possible preferences are diverse considerations on music attributes, such as artist, genre, or the popularity mentioned above. Such intuition motivates us to model user-item relations at the granularity of preferences. Assuming a set of preferences shared by all users, with n as the number of preference types in the set, we can slice a uniform user-item relation into the n preferences and decompose each user-item triple into n preference-specific triples, as illustrated in Figure 2; we term the result the preference graph (PG for short). Since the preferences are expressed as latent vectors that are hard to interpret, we set the number of preferences equal to the number of relations in KGEPN and transfer the information on the relations in KGEPN to the preferences. Concretely, we utilize the Euclidean norm to align the preference embeddings p (between users and items) with the relation embeddings r (between items and entities) in KGEPN.
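
One plausible way to write this alignment is given below as a sketch. The symbols p, r, and n come from the text; the index k, the one-to-one pairing of the k-th preference with the k-th relation, and the plain (unsquared) sum are assumptions.

```latex
\mathcal{L}_{\mathrm{align}} = \sum_{k=1}^{n} \left\lVert \mathbf{p}_k - \mathbf{r}_k \right\rVert_2
```

Minimizing this term pulls each preference embedding toward the embedding of its paired KGEPN relation, which is what gives a preference an explicit meaning such as "artist" or "popularity".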

2.2.3. Heterogeneous Graph Transformer

After obtaining the PG and KGEPN described above, we combine them into a new heterogeneous graph (HG) and apply the heterogeneous graph transformer (HGT for short) to it. HGT aims to aggregate information from the neighbors of a target node t. Such a process can be decomposed into two parts: message computation and message aggregation. HGT layers are stacked, and the output of each layer serves as the input of the next.

The message computation part incorporates an edge-type-specific message matrix to alleviate the distribution differences of nodes and edges of different types. Based on a source node s and an edge e from s to t, HGT calculates the message passed by s on e by combining the message matrix of the type of e with a linear projection that is unique to the node type of s.

In the message aggregation part, HGT first calculates the heterogeneous mutual attention between the source node s and the target node t to control the influence of s on t. The attention mechanism was first proposed by the Google team to classify images [17]. Now, it is widely used in graph neural networks for recommender systems [18]. HGT utilizes a unique linear projection for each type of node (one for source nodes and one for target nodes) and a distinct edge-based matrix for each edge type to model the distribution differences maximally; d is the dimension of each attention head. The model can thus capture different semantic relations even between the same pair of node types, e.g., the multiple preferences between the same user-item pair mentioned above. Specifically, we calculate the heterogeneous mutual attention for each edge e from s to t and normalize it over N(t), the one-hop neighbors of node t.

Next, HGT uses the attention vector as the weights to average the corresponding messages from the source nodes and obtain the updated vector of the target node.
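
The message computation and aggregation steps described above can be sketched for a single attention head in plain Python. This is an illustration in the spirit of HGT [16], not the authors’ implementation: the parameter names (`K`, `Q`, `M`, `W_att`, `W_msg`), the single head, the softmax normalization, and the omission of residual connections and output projections are all assumptions.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hgt_update(target, in_edges, hidden, node_type, params):
    """One single-head HGT-style update of `target`.

    in_edges: non-empty list of (source, edge_type) pairs pointing at `target`.
    params:   per-node-type projections 'K', 'Q', 'M' and per-edge-type
              matrices 'W_att', 'W_msg' (hypothetical names).
    """
    d = len(hidden[target])
    query = matvec(params["Q"][node_type[target]], hidden[target])
    scores, messages = [], []
    for source, edge_type in in_edges:
        key = matvec(params["K"][node_type[source]], hidden[source])
        # Mutual attention: key transformed by the edge-type matrix, matched
        # against the query and scaled by sqrt(d).
        scores.append(dot(matvec(params["W_att"][edge_type], key), query) / math.sqrt(d))
        # Message: node-type projection followed by the edge-type message matrix.
        projected = matvec(params["M"][node_type[source]], hidden[source])
        messages.append(matvec(params["W_msg"][edge_type], projected))
    weights = softmax(scores)  # normalize over the one-hop neighbors of target
    updated = [sum(w * m[j] for w, m in zip(weights, messages)) for j in range(d)]
    return updated, weights
```

Because `W_att` and `W_msg` are indexed by edge type, two edges between the same user and item (for example, an "artist" preference and a "popularity" preference) receive different attention scores and messages.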

We opt for the pairwise BPR loss to train HGT. Specifically, we encourage the score between a node and one of its observed neighbors to be higher than the score between the same node and a randomly sampled node.
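
A minimal sketch of the pairwise BPR loss follows, using the standard form -log σ(positive score - negative score). Averaging over the sampled pairs (rather than summing) and the function name are assumptions made for this example.

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Mean pairwise BPR loss: -log(sigmoid(pos - neg)) over sampled pairs."""
    losses = []
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        # -log(sigmoid(diff)) == log(1 + exp(-diff)), written in a
        # numerically stable form.
        losses.append(max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
    return sum(losses) / len(losses)
```

The loss shrinks toward zero as the observed pair is scored increasingly higher than the sampled negative pair, and equals log 2 when the two scores are tied.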

By combining the aligning loss and the BPR loss with a regularization term over the set of model parameters, we obtain the objective function that is minimized to learn the model parameters. Two hyperparameters control the weights of the aligning loss and the regularization term, respectively.
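
A common way to write such a joint objective is sketched below. The symbols λ₁, λ₂, and Θ and the squared L2 form of the regularizer are assumptions; the text states only that two hyperparameters weight the aligning loss and the regularization term.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{BPR}} + \lambda_1 \, \mathcal{L}_{\mathrm{align}} + \lambda_2 \, \lVert \Theta \rVert_2^2
```

Here \mathcal{L}_{\mathrm{BPR}} is the pairwise ranking loss, \mathcal{L}_{\mathrm{align}} is the preference-relation aligning loss, and Θ is the set of model parameters.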

2.2.4. Model Prediction

Benefiting from the individual edge-based matrix for each edge type, we can quantify the user’s preference at a finer granularity. Specifically, as shown in Figure 2, given the final representations of a user u and an item i, we calculate, for each preference, the corresponding score between u and i.

Then, we sum these preference-specific scores to obtain the probability of user u adopting item i.

In addition, we can combine multiple preferences as needed. To alleviate popularity bias, we remove a percentage of the popularity preference for every user. Concretely, we design a per-user weight that controls how much the popularity preference affects the recommendation results; the weight depends on the ratio of popular items in the profile of the user. If a user is very interested in popular items, we hardly remove her/his popularity preference, which preserves the user experience.
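
The idea can be sketched as a weighted sum in which only the popularity preference is scaled. The exact weight formula is not recoverable from the text above, so the linear form in `user_popularity_weight`, its `strength` parameter, and the preference names are hypothetical placeholders rather than the authors’ definition.

```python
def user_popularity_weight(popular_ratio, strength=1.0):
    """Hypothetical weight in [0, 1] for the popularity preference.

    popular_ratio: ratio of popular items in the user's profile.
    strength:      0 keeps the full popularity preference for everyone;
                   1 makes the weight equal to `popular_ratio`.
    """
    return 1.0 - strength * (1.0 - popular_ratio)

def adoption_score(preference_scores, popularity_key="popularity", popularity_weight=1.0):
    """Sum per-preference scores, scaling only the popularity preference."""
    total = 0.0
    for name, score in preference_scores.items():
        total += popularity_weight * score if name == popularity_key else score
    return total
```

With this placeholder, a blockbuster-focused user (ratio close to 1) keeps almost all of the popularity score, whereas a niche user (ratio close to 0) has most of it removed; for example, `adoption_score(scores, popularity_weight=user_popularity_weight(0.2))` keeps only a fifth of the popularity contribution.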

3. Experiments

We provide empirical results to demonstrate the effectiveness of our proposed AWING. The experiments are designed to answer the following research questions (RQ):
(i) RQ1: how does AWING perform compared to the state-of-the-art recommender models? In particular, can AWING effectively alleviate popularity bias?
(ii) RQ2: how does the key hyperparameter affect the recommendation performance?
(iii) RQ3: can AWING provide insights on user preferences and give an intuitive impression of explainability?

3.1. Experimental Settings
3.1.1. Datasets

To evaluate the effectiveness of AWING, we utilize two benchmark datasets, LastFM and DBbook-2014, which are publicly accessible and vary in terms of domain, size, and sparsity.
(i) LastFM: the dataset is a collection of listening records. Songs that interacted with the current user only once are treated as negative feedback, because these songs may have been misclicked by the user and are not helpful for improving the recommendation performance. To ensure the quality of the dataset, we use the 5-core setting, i.e., retaining users and items with at least five interactions.
(ii) DBbook-2014 [19]: the dataset consists of users and their binary feedback (1 for likes and 0 otherwise). Similarly, we use the 5-core setting to ensure that each user and item has at least five interactions.

Besides the user-item interactions, we need to construct item knowledge for each dataset. For LastFM and DBbook-2014, we follow the approach in [20] to map items to Freebase entities. We summarize the statistics of the two datasets in Table 1. We randomly select 80% of the items associated with each user to constitute the training set and use all the remaining items as the test set. The experiments are conducted with five-fold cross-validation repeated ten times, and the average results are reported.
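
The preprocessing described above can be sketched as follows. The iterative form of the 5-core filter, the fixed random seed, and the rounding of the 80% cutoff are assumptions for illustration; the text states only the thresholds.

```python
import random
from collections import Counter

def k_core(interactions, k=5):
    """Iteratively drop users and items with fewer than k interactions."""
    pairs = set(interactions)
    while True:
        user_count = Counter(u for u, _ in pairs)
        item_count = Counter(i for _, i in pairs)
        kept = {(u, i) for u, i in pairs if user_count[u] >= k and item_count[i] >= k}
        if kept == pairs:  # nothing removed in this pass: the core is stable
            return kept
        pairs = kept

def split_per_user(interactions, train_frac=0.8, seed=0):
    """Randomly assign `train_frac` of each user's items to the training set."""
    rng = random.Random(seed)
    by_user = {}
    for user, item in sorted(interactions):
        by_user.setdefault(user, []).append(item)
    train, test = [], []
    for user, items in by_user.items():
        rng.shuffle(items)
        cut = int(round(train_frac * len(items)))
        train += [(user, i) for i in items[:cut]]
        test += [(user, i) for i in items[cut:]]
    return train, test
```

The filter repeats until no user or item falls below the threshold, since removing a sparse user can push an item below five interactions and vice versa.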

3.1.2. Evaluation Metrics

Apart from a relevance-based metric (Recall@N) and a ranking-based metric (NDCG@N), we choose three metrics to measure the popularity bias.
(i) ΔGAP [12]: the group average popularity metric, GAP(g), measures the average popularity of the items in the profiles of the users in a particular group g (in our case, N, D, or B) or in their recommendation lists. The change in GAP (ΔGAP) is the amount of unwanted popularity that the algorithm imposes on the recommendations of each group. It is computed from the popularity of each item, the list of items in the profile of each user, and the list of items in the recommendation result of each user.
(ii) APT@N [9]: the average percentage of tail items (APT) quantifies the ratio of nonpopular items in the recommendation lists.
(iii) AD@N [21]: aggregate diversity (AD) counts the total number of different items that have been recommended to at least one user.
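
The three popularity-bias metrics can be sketched as below. The formulas follow the way these metrics are commonly defined (ΔGAP as the relative change of GAP from profiles to recommendations, APT as a per-user average); since they are written here from the verbal descriptions above, treat the exact normalizations and all function names as assumptions.

```python
def gap(users, item_lists, popularity):
    """Group average popularity: mean over users of the mean item popularity."""
    per_user = [
        sum(popularity[i] for i in item_lists[u]) / len(item_lists[u]) for u in users
    ]
    return sum(per_user) / len(per_user)

def delta_gap(users, profiles, recommendations, popularity):
    """Relative change in GAP from user profiles to recommendation lists."""
    gap_profile = gap(users, profiles, popularity)
    gap_recommended = gap(users, recommendations, popularity)
    return (gap_recommended - gap_profile) / gap_profile

def apt(recommendations, long_tail_items):
    """Average percentage of tail items across users' recommendation lists."""
    shares = [
        sum(1 for i in items if i in long_tail_items) / len(items)
        for items in recommendations.values()
    ]
    return sum(shares) / len(shares)

def aggregate_diversity(recommendations):
    """Number of distinct items recommended to at least one user."""
    return len({i for items in recommendations.values() for i in items})
```

A ΔGAP close to zero means that the recommendations for a group are about as popular as the items the group already consumed; positive values indicate that the recommender pushes the group toward more popular items.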

3.1.3. Comparison Method

We compare our proposed AWING with the following baselines:
(i) BPRMF [22] is a classical CF method that only uses the user-item ratings for recommendation, assuming that users tend to assign higher ranks to observed items.
(ii) LightGCN [23] is a GCN-based general recommendation model that leverages the user-item proximity to learn node representations and generate recommendations, and it is reported as the state-of-the-art method.
(iii) KTUP [24] employs TransH on user-item interactions and KG triplets simultaneously to learn user preferences and perform KG completion.
(iv) IPS-CN [25] adds normalization to IPS, which achieves lower variance than plain IPS at the expense of introducing a small amount of bias.
(v) ESAM [10] regards popular and nonpopular items as the source and target domains, respectively, and introduces three regularization terms for transferring knowledge from the well-trained popular items to the long-tail items.

3.1.4. Implementation Details

We implement our AWING model in PyTorch. We use AdamW [26] to train the model, with an initial learning rate of 0.001. In addition, we use 256 as the hidden dimension throughout the neural networks, and the batch size is fixed at 1024. For the other hyperparameters, we conduct a grid search to find the optimal settings. More specifically, the coefficients of the additional constraints (i.e., the aligning loss and the L2 regularization) and the number of HGT layers are tuned over candidate sets, and the best-performing values are used in our experiments.

3.2. Overall Performance Comparison (RQ1)

Table 2 shows the best recommendation performance of all models on the two datasets. In particular, AWING-APS is the variant of AWING that removes a percentage of popularity preference at recommendation time. In addition, the bold numbers indicate the best result in each row, and the underlined values indicate the second best. We can draw the following conclusions from the table.

Firstly, BPRMF, the most basic model, has the worst performance on both datasets and exhibits serious popularity bias. Although LightGCN performs better than BPRMF, its APT value is also low, showing that popularity bias is ubiquitous in recommender systems.

Next, the KG-aware baseline, KTUP, performs better than LightGCN and BPRMF on all metrics, which demonstrates that introducing the KG is beneficial not only for recommendation but also for alleviating popularity bias. Nevertheless, its improvement in APT and ΔGAP remains limited.

Besides, compared with the three models above, IPS-CN and ESAM improve considerably on APT, but they do not keep the ratio of popular to nonpopular items in line with the users’ profiles, which may hurt the user experience.

Finally, our proposed method AWING outperforms all the compared baselines on both datasets in terms of Recall and NDCG, which indicates that identifying fine-grained preferences is helpful for recommendation. Moreover, AWING-APS performs best among all models in terms of ΔGAP, AD, and APT, which verifies the significance of removing popularity preference to alleviate popularity bias. In particular, AWING-APS has a low ΔGAP value. In other words, AWING-APS keeps a ratio of popular to nonpopular items similar to that in the users’ profiles, which preserves the user experience. It should be noted that although AWING-APS sacrifices a small amount of Recall and NDCG, this loss is negligible compared with that of IPS-CN and ESAM.

We can also observe that LastFM is the sparser of the two datasets. On LastFM, the proposed model performs worse than on DBbook-2014 in the ranking metrics (Recall@N and NDCG@N). However, for most metrics that measure popularity bias, e.g., AD and APT, the improvement on LastFM is greater than that on DBbook-2014.

3.3. Parameter Sensitivity Analysis (RQ2)

In this section, we investigate the impact of the number of popularity nodes, K, on popularity bias. In this experiment, we tune K in the range of [2, 13] with a step of 1 and report the corresponding performance. From Figure 3, we observe that AWING-APS achieves the best performance when K = 6 and K = 7 on the two datasets, respectively. This is because a too-small K does not have enough capacity to distinguish the different degrees of popularity, while a too-large K leaves sparse data in each group and suffers from overfitting.

3.4. Case Study (RQ3)

An important benefit of attention-based recommender systems is the explainability of the results [27]. In the same way, benefiting from the HGT, we can infer the fine-grained user preferences on the target item. To this end, we present an example from LastFM to give an intuitive impression of the explainability of our model. We randomly selected one user, u306, and two relevant music items, m749 and m1364 (taken from the test set and unseen in the training phase). Figure 4 shows the visualization of the example. AWING searches for the most influential preference based on the attention scores (cf. (7)). Thus, it explains this behavior as follows: user u306 selects music m749 because it matches her interest in the featured artist. Similarly, we can infer that u306 chooses m1364 just because of its popularity.
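
Selecting the explanation amounts to taking the preference with the largest attention score for a given user-item pair, as in the small sketch below. The function name, the preference labels, and the score values in the usage note are made up for illustration and are not taken from Figure 4.

```python
def most_influential_preference(attention_scores):
    """Return the preference name with the largest attention score."""
    return max(attention_scores, key=attention_scores.get)
```

For instance, with hypothetical scores `{"featured_artist": 0.61, "genre": 0.22, "popularity": 0.17}`, the function returns `"featured_artist"`, which would be reported as the reason for the interaction.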

4. Related Work

In recent years, with the development of recommender systems, more and more attention has been paid to their fairness. Popularity bias is one of the critical factors affecting fairness. The problem of popularity bias and the challenges it creates for recommender systems have been well studied by other researchers [5, 28]. The authors of these works have mainly explored the overall accuracy of the recommendations in the presence of a long-tail distribution in the rating data. Moreover, some other researchers have proposed algorithms that can control this bias and give nonpopular items more chances to be recommended [9, 29, 30]. Gruson et al. [25] and Joachims et al. [11] use IPS (Inverse Propensity Score) to eliminate popularity bias by reweighting each instance according to item popularity. Besides, Abdollahpouri et al. [9] provide a regularization term to control popularity bias. Recently, Chen et al. [10] addressed this problem from the perspective of domain adaptation, regarding popular and nonpopular items as the source and target domains. In this work, however, we focus on alleviating popularity bias from the perspective of users. That is, we want to take into account each user’s personal interest in popular and nonpopular items.

Recommender systems based on knowledge graphs have attracted extensive attention from researchers. This approach can not only improve the accuracy of the recommender system but also provide explanations for the recommendation results. Zhang et al. [31] proposed CKE with three information sources (structural, textual, and visual information), which shows that the structural information of the knowledge graph can enhance the semantic information of the item embeddings. Wang et al. [32] proposed KGAT, which obtains the item representations and propagates them on the user-item interaction graph with a graph attention mechanism. Huang et al. [33] constructed multiple metapaths from users to entities on the interaction and knowledge graphs to obtain user representations. Tu et al. [34] proposed KCAN, which automatically distills the knowledge graph into a target-specific subgraph and obtains refined node representations. None of these works considers a knowledge graph embedded with popularity nodes.

5. Conclusion

This paper aims to utilize a KG embedded with popularity nodes to alleviate popularity bias by identifying the fine-grained preferences of users. Our method first constructs a heterogeneous graph that combines the KG, the preference graph, and the popularity nodes. Secondly, we apply a heterogeneous graph transformer over the heterogeneous graph while aligning the fine-grained preferences with the relations in the KG, in order to learn the user, item, and preference embeddings and the parameters of the mutual attention. Finally, according to the user’s interest in popular items, we adaptively remove popularity preference to alleviate popularity bias. In the future, a valuable direction is to consider the dynamic change of popularity instead of keeping the popularity constant.

Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of the manuscript.

Acknowledgments

This work was supported in part by the National Key Research and Development Project under grant 2019YFB1706101, National Natural Science Foundation under grant 72161005, and research program of Chongqing Technology Innovation and Application Development under grant CSTC2019jscx-zdztzxX0031.