Abstract
Recommender systems are an efficient way to deal with the problem of information overload for online users. In recent years, network-based recommendation algorithms have demonstrated much better performance than standard collaborative filtering methods. However, most network-based algorithms do not give enough weight to the influence of the target user’s nearest neighbors in the resource diffusion process, whereas a user or an object with high degree has a large influence in the standard mass diffusion algorithm. In this paper, we propose a novel preferential diffusion recommendation algorithm that takes the significance of the target user’s nearest neighbors into account, and we evaluate it on three real-world data sets: MovieLens 100k, MovieLens 1M, and Epinions. Experimental results demonstrate that the proposed algorithm can significantly improve recommendation accuracy and diversity.
1. Introduction
With the rapid development of the Internet in recent years, the amount of online information has grown exponentially, which leads to the information overload problem. Faced with a vast amount of information, users can hardly find valuable information accurately and quickly. The personalized recommender system is one of the most effective tools for resolving this problem, and it can also help enterprises turn users’ potential demand into actual demand [1, 2].
To date, various recommendation methods have been proposed and developed. One of the most successful is based on the collaborative filtering technique [3–5]. Recently, some physics-based methods, such as mass diffusion [6–9] and heat conduction [10, 11], have found applications in personalized recommendation. The standard mass diffusion algorithm applies a three-step mass diffusion starting from the target user on a user-object bipartite network and outperforms the standard collaborative filtering methods in accuracy [1]. Many different bipartite network based methods [12] have been proposed to achieve even better recommendation performance. In [6], Zhou et al. proposed a hybrid method that combines mass diffusion and heat conduction to solve the apparent diversity-accuracy dilemma of recommender systems. To enhance the diffusion algorithm’s ability to find unpopular and niche objects, preferential diffusion was designed in [9]. Moreover, Zhang and Zeng proposed a strategy of adding virtual connections to the network, which is useful for dealing with the cold start problem in recommender systems [13].
However, none of these methods gives enough weight to the influence of the target user’s nearest neighbors in the resource diffusion process. As the saying goes, birds of a feather flock together: a user’s nearest neighbors are those whose tastes are similar to the given user’s. Therefore we introduce a novel preferential diffusion recommendation algorithm that considers the significance of the target user’s nearest neighbors in the diffusion process.
2. Methods
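A recommender system can be represented by a bipartite network $G(U, O, E)$, where $U = \{u_1, u_2, \ldots, u_m\}$, $O = \{o_1, o_2, \ldots, o_n\}$, and $E$ are the sets of users, objects, and links, respectively [7]. Denote by $A = \{a_{i\alpha}\}_{m \times n}$ the adjacency matrix, where the element $a_{i\alpha} = 1$ if the user $u_i$ has selected the object $o_\alpha$ and $a_{i\alpha} = 0$ otherwise.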
2.1. Standard Mass Diffusion Recommendation Algorithm
As is shown in Figure 1, the standard mass diffusion (SMD) algorithm is equivalent to a three-step random walk process. First, the objects in the bipartite network are assigned an initial resource vector $\mathbf{f}$, with component $f_\alpha$ for object $o_\alpha$, for the target user $u_i$. For simplicity, if an object is collected by the user $u_i$, its initial resource is set to 1; otherwise it is set to 0. That is to say, the initial resource vector can be written as
$$f_\alpha = a_{i\alpha}, \tag{1}$$
where $a_{i\alpha}$ is the element of the adjacency matrix for user $u_i$ and object $o_\alpha$.
Then, each object’s resource is redistributed evenly among the users who have collected the object, and each user’s resource is the sum of the resources received from objects. Finally, each user’s resource is reallocated evenly among the objects which he has collected. The final resource of the objects can be calculated via the transformation $\mathbf{f}' = W\mathbf{f}$, where $W$ is the resource transfer matrix with elements
$$w_{\alpha\beta} = \frac{1}{k_{o_\beta}} \sum_{l=1}^{m} \frac{a_{l\alpha} a_{l\beta}}{k_{u_l}}, \tag{2}$$
where the sum runs over all $m$ users, $k_{o_\beta}$ is the degree of the object $o_\beta$, and $k_{u_l}$ is the degree of the user $u_l$.
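The two redistribution steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `smd_scores`, the dense NumPy representation of the adjacency matrix, and the handling of zero-degree nodes (they simply pass on no resource) are all choices made here for illustration.

```python
import numpy as np

def smd_scores(A, i):
    """Score all objects for target user i with standard mass diffusion.

    A is an (m users x n objects) 0/1 adjacency matrix (hypothetical input format).
    """
    A = np.asarray(A, dtype=float)
    k_user = A.sum(axis=1)   # user degrees
    k_obj = A.sum(axis=0)    # object degrees
    f = A[i].copy()          # initial resource: 1 for objects collected by user i

    # Objects -> users: each object splits its resource evenly among its users.
    obj_share = np.divide(f, k_obj, out=np.zeros_like(f), where=k_obj > 0)
    user_res = A @ obj_share

    # Users -> objects: each user splits its resource evenly among its objects.
    user_share = np.divide(user_res, k_user,
                           out=np.zeros_like(user_res), where=k_user > 0)
    return A.T @ user_share

if __name__ == "__main__":
    A = [[1, 1, 0],
         [0, 1, 1]]
    print(smd_scores(A, 0))
```

Computing the two steps as matrix-vector products avoids building the full object-by-object transfer matrix, which matters for data sets with many objects. The total resource is conserved as long as every object collected by the target user has at least one user, which holds by construction.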
2.2. The Novel Preferential Diffusion Algorithm Based on User’s Nearest Neighbors
Following on from previous research [14], the diffusion process of the novel preferential diffusion recommendation algorithm based on the user’s nearest neighbors (NNMD) is shown in Figure 2. First, we calculate the Jaccard similarities between the target user and the other users to obtain the most similar neighbors. The Jaccard similarity reads
$$s_{ij} = \frac{|\Gamma_i \cap \Gamma_j|}{|\Gamma_i \cup \Gamma_j|},$$
where $s_{ij}$ is the Jaccard similarity between user $u_i$ and user $u_j$, and $\Gamma_i$ and $\Gamma_j$ are the neighbor sets of user $u_i$ and user $u_j$ in the bipartite network, respectively.
Then we obtain a second initial resource vector for the objects, which is defined for the target user in terms of the target user’s nearest-neighbor set. Only the objects which the target user has selected can distribute this resource to users, who then redistribute it through the same resource transfer matrix as in (2). Finally, the final object resource vector is a linear combination of the two diffused resource vectors, weighted by a variable parameter that ranges from 0 to 1.
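The neighbor-selection step can be illustrated with the short Python sketch below. It assumes that a user's neighbor set is the set of objects the user has selected, and that ties in similarity are broken by user index; the function name `top_jaccard_neighbors` and the parameter `top_n` are hypothetical names introduced here. Only the similarity ranking is shown, not the subsequent resource assignment.

```python
import numpy as np

def top_jaccard_neighbors(A, i, top_n):
    """Return the indices of the top_n users most similar to user i,
    together with the full vector of Jaccard similarities to user i."""
    A = np.asarray(A, dtype=bool)
    inter = (A & A[i]).sum(axis=1)   # |intersection| with user i, per user
    union = (A | A[i]).sum(axis=1)   # |union| with user i, per user
    sim = np.divide(inter, union, out=np.zeros(len(A)), where=union > 0)

    # Sort by descending similarity (stable, so ties keep index order),
    # then drop the target user itself.
    order = np.argsort(-sim, kind="stable")
    order = order[order != i]
    return order[:top_n], sim

if __name__ == "__main__":
    A = [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 1, 1],
         [1, 0, 0, 0]]
    neighbors, sim = top_jaccard_neighbors(A, 0, top_n=2)
    print(neighbors, sim)
```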
3. Data and Metrics
3.1. Data
To test the algorithmic performance, we use three benchmark data sets, as shown in Table 1. The sparsity of these data sets is given in the last column of Table 1; all of them are very sparse, especially the Epinions data set. The MovieLens 100k and MovieLens 1M data sets [15] were collected by the GroupLens research group. They consist of 100000 ratings from 943 users on 1682 different movies and 1000209 ratings from 6040 users on 3952 different movies, respectively. The ratings are integers from 1 to 5. The Epinions data set [16] consists of 22166 users, 296277 objects, and 922267 ratings. Because users rate only a small number of items in this highly sparse data set, we delete those users and objects with degree less than 7 in order to get better results. The resulting data set consists of 4066 users, 7649 objects, and 154122 ratings. We randomly divide each data set into two parts: the training set contains 80% of the data and the remaining 20% constitutes the probe set.
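A random 80/20 division of the user-object links could look like the following Python sketch. The function name `split_links`, the representation of links as `(user, object)` tuples, and the fixed random seed are assumptions made for illustration; the text does not specify how the random split was implemented.

```python
import random

def split_links(links, train_fraction=0.8, seed=0):
    """Randomly split a list of (user, object) links into training and probe sets."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    shuffled = list(links)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

if __name__ == "__main__":
    links = [(u, o) for u in range(10) for o in range(10)]
    train, probe = split_links(links)
    print(len(train), len(probe))
```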
3.2. Metrics
There has been considerable research in the area of recommender system evaluation. Accuracy is the most important aspect in evaluating the performance of a recommendation algorithm. In this paper, we use the ranking score [8] to measure the ability of a recommendation algorithm to generate a ranking list of the target user’s uncollected objects that matches the user’s preferences. For the target user $u_i$, the recommendation algorithm returns a ranking list of all his unselected objects. If, according to the probe set $E^P$, $u_i$ has selected the object $o_\alpha$ and $o_\alpha$ is at the $l_{i\alpha}$th place in the ranking list, we say the position of $o_\alpha$ is
$$RS_{i\alpha} = \frac{l_{i\alpha}}{M_i},$$
where $M_i$ is the number of his unselected objects. We obtain the mean value of all the user-object ranking scores in $E^P$; namely,
$$RS = \frac{1}{|E^P|} \sum_{(u_i, o_\alpha) \in E^P} RS_{i\alpha}.$$
Clearly, the larger the ranking score, the lower the algorithm’s accuracy and vice versa.
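As a rough illustration of the ranking score, the Python sketch below ranks one user's unselected objects by descending score and averages the relative positions of the probe-set objects. The function names, the input format (a score per object, a set of training-set objects, and a set of probe-set objects), and the tie-breaking by object index are assumptions introduced here.

```python
def user_ranking_scores(scores, collected, probe):
    """Relative rank positions of one user's probe objects.

    scores: sequence of recommendation scores, one per object
    collected: set of objects the user selected in the training set
    probe: set of objects the user selected in the probe set
    """
    candidates = [o for o in range(len(scores)) if o not in collected]
    # sorted() is stable, so objects with equal scores keep index order.
    ranked = sorted(candidates, key=lambda o: -scores[o])
    position = {o: rank for rank, o in enumerate(ranked, start=1)}
    return [position[o] / len(ranked) for o in probe if o in position]

def mean_ranking_score(per_user):
    """per_user: iterable of (scores, collected, probe) triples, one per user."""
    values = [r for s, c, p in per_user for r in user_ranking_scores(s, c, p)]
    return sum(values) / len(values)

if __name__ == "__main__":
    print(mean_ranking_score([([0.9, 0.1, 0.5, 0.3], {0}, {3})]))
```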
In a practical recommender system, we may also consider the number of objects in the recommendation list that users like. Therefore, we use another accuracy metric called precision. For a target object and a user, there are four cases. In the first, the recommender system recommended the object and the user likes it. In the second, the recommender system recommended the object but the user does not like it. In the third, the user likes the object but the recommender system did not recommend it. In the last, the user does not like the object and the recommender system did not recommend it. As is shown in Table 2, $N_{tp}$, $N_{fp}$, $N_{fn}$, and $N_{tn}$ denote the number of objects in the four cases.
For a target user $u_i$, the precision of recommendation is defined as
$$P_i = \frac{N_{tp}}{N_{tp} + N_{fp}}.$$
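The precision of a single recommendation list can be computed as in the sketch below. It assumes that the objects a user "likes" are approximated by the user's probe-set objects, which is a common convention but is not stated explicitly in the text; the function name `precision_at_L` is hypothetical.

```python
def precision_at_L(recommended, liked):
    """Fraction of recommended objects that the user likes.

    recommended: the user's recommendation list
    liked: set of objects the user likes (assumed here: probe-set objects)
    """
    recommended = list(recommended)
    if not recommended:
        return 0.0
    n_tp = sum(1 for o in recommended if o in liked)   # recommended and liked
    n_fp = len(recommended) - n_tp                     # recommended, not liked
    return n_tp / (n_tp + n_fp)

if __name__ == "__main__":
    print(precision_at_L([1, 2, 3, 4], {2, 4, 9}))
```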
We obtain the mean precision over all the users in the recommender system.
Besides accuracy, diversity is another important aspect in evaluating a recommendation algorithm. There are two kinds of diversity: intrauser-diversity [17] and interuser-diversity [18]. In this paper, we consider the interuser-diversity, which measures how different the recommendation lists of different users are. For two users $u_i$ and $u_j$, the difference can be measured by the Hamming distance [18]:
$$H_{ij}(L) = 1 - \frac{Q_{ij}(L)}{L},$$
where $Q_{ij}(L)$ is the number of common objects in the recommendation lists of $u_i$ and $u_j$ and $L$ is the length of the recommendation list. Clearly, if $u_i$ and $u_j$ have the same recommendation list, $H_{ij}(L) = 0$, while if the recommendation lists are completely different, $H_{ij}(L) = 1$.
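The Hamming distance between two recommendation lists is straightforward to compute, as the sketch below shows. Averaging the distance over all user pairs to obtain a single diversity value is an assumption made here; the text defines only the pairwise quantity. Both function names are hypothetical.

```python
from itertools import combinations

def hamming_distance(list_i, list_j):
    """1 minus the fraction of objects shared by two equal-length lists."""
    L = len(list_i)
    if L == 0 or L != len(list_j):
        raise ValueError("lists must be non-empty and of equal length")
    common = len(set(list_i) & set(list_j))
    return 1 - common / L

def interuser_diversity(rec_lists):
    """Mean Hamming distance over all pairs of users (assumed aggregation)."""
    pairs = list(combinations(rec_lists, 2))
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    print(hamming_distance([1, 2, 3, 4], [3, 4, 5, 6]))
    print(interuser_diversity([[1, 2], [1, 2], [3, 4]]))
```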
In reality, it has been found that a recommender system with high accuracy might not satisfy its users [19]. For example, on a film website, recommending popular films may not always be the best choice, because users might already have seen those films elsewhere. A good recommender system should find objects that match the users’ preferences and are unlikely to be already known. As a result, novelty is also often used in evaluating the performance of recommendation algorithms.
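The average degree of the objects in the recommendation list is widely used to quantify the novelty of a recommender system [20]. It is defined by
$$N(L) = \frac{1}{mL} \sum_{i=1}^{m} \sum_{o_\alpha \in O_i^L} k_{o_\alpha},$$
where $m$ is the number of users, $O_i^L$ is the recommendation list of length $L$ for user $u_i$, and $k_{o_\alpha}$ is the degree of the object $o_\alpha$. A lower average degree means that less popular objects are recommended, that is, higher novelty.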
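The novelty measure is the mean degree of all recommended objects across users, which the following sketch computes. It assumes that all recommendation lists have the same length and that object degrees are supplied as a mapping; the function name `novelty` and the argument names are hypothetical.

```python
def novelty(rec_lists, object_degree):
    """Average degree of recommended objects over all users.

    rec_lists: one equal-length recommendation list per user
    object_degree: mapping from object to its degree in the training network
    """
    m = len(rec_lists)
    L = len(rec_lists[0])
    total = sum(object_degree[o] for rec in rec_lists for o in rec)
    return total / (m * L)

if __name__ == "__main__":
    print(novelty([[0, 1], [1, 2]], {0: 10, 1: 4, 2: 2}))
```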
4. Results and Discussion
In our first set of experiments, we compare the ranking score of the NNMD algorithm under different values of the linear-combination parameter and of the number of the target user’s nearest neighbors with that of the SMD algorithm. The results on the MovieLens 100k, MovieLens 1M, and Epinions data are reported in Figure 3. In MovieLens 100k and MovieLens 1M, the ranking score decreases as the number of nearest neighbors increases; that is to say, the recommendation accuracy improves. However, once the number of nearest neighbors exceeds 30, the change in the ranking score is very small. Moreover, as long as the linear-combination parameter is not equal to 0 or 1, the ranking score of our method is better than that of the SMD algorithm. It is interesting to note that the optimal parameters of our method are the same in MovieLens 100k and MovieLens 1M, which are and , while, in Epinions, the improvement of the ranking score is not significant. When is greater than 0 or is greater than 20 the ranking score of the NNMD algorithm is a little worse than that of the SMD algorithm, and, with the change of and , the ranking scores of the two algorithms are almost the same. But when is less than 20 and , the ranking score of our method becomes better than that of the SMD algorithm. Clearly, we can get the optimal parameters and in Epinions.
Then we examined the precision, interuser-diversity, and novelty of our algorithm at the optimal parameter values. The results for all algorithms and metrics on the MovieLens 100k, MovieLens 1M, and Epinions data sets are summarized in Table 3. The optimal parameters are those that yield the lowest ranking score, and the other three metrics, namely, precision, interuser-diversity, and novelty, are obtained at these parameters. The NNMD algorithm outperforms the SMD algorithm on all four evaluation metrics.
The comparison of precision between NNMD and SMD on the three data sets for different lengths of the recommendation list is shown in Figure 4. It indicates that the precision of the NNMD algorithm is better than that of the SMD algorithm on all three data sets, with a very significant improvement on MovieLens 100k and MovieLens 1M. That is to say, our method can recommend objects to users more accurately.
Figure 5 shows the comparison of interuser-diversity between our method NNMD and SMD on the three data sets for different lengths of the recommendation list. The interuser-diversity of the NNMD algorithm is better than that of the SMD algorithm on all three data sets, especially on MovieLens 100k and MovieLens 1M. In other words, the recommendation lists produced by our method differ more from user to user.
Figure 6 shows the comparison of novelty between our method NNMD and SMD on the three data sets for different lengths of the recommendation list. The novelty of our method is much better than that of SMD on MovieLens 100k and MovieLens 1M, while on Epinions the results of the two algorithms are very similar, with our method still showing a slight improvement over the SMD algorithm.
In summary, the recommendation performance of our method is better than that of the standard mass diffusion. In particular, compared with SMD, the precision of our method increases by an average of 13.27% on MovieLens 100k, 35.9% on MovieLens 1M, and 4.47% on Epinions. The improvement in some aspects is not significant on the Epinions data set; the reason may be that the data are so sparse that the algorithm cannot identify the proper nearest neighbors of a user, which affects its performance.
5. Conclusion and Future Work
Most network-based recommendation algorithms have a tendency to recommend popular objects to users [1] because an object with high degree has a significant influence in the resource diffusion process. In this paper we propose a novel preferential diffusion recommendation algorithm based on the user’s nearest neighbors, which gives a high weight to the influence of the target user’s nearest neighbors in the resource diffusion process. Experimental results on the MovieLens 100k, MovieLens 1M, and Epinions data sets show that suitably adjusting the linear-combination parameter or the size of the user’s nearest-neighbor set helps the recommendation algorithm achieve better performance. It not only provides more accurate recommendations but also generates more diverse and novel ones.
For future work, we intend to consider the rating levels given by a user and his nearest neighbors. Moreover, we will use trust data [21, 22] in the network, because it can be used to find the nearest neighbors more accurately in highly sparse data sets and may therefore lead to better recommendation performance.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work is partially supported by National Natural Science Foundation of China (Grant nos. 71361012 and 71363022), by National Science Foundation of Jiangxi, China (no. 20161BAB201029), and by the Foundation of Jiangxi Provincial Department of Education (no. GJJ. 150446).