Abstract

Many trust inference algorithms based on Q-learning and learning automata suffer from insufficient storage space, and some algorithms that use only the shortest trust path do not fully utilize the available trust information. To solve these problems, a trust inference algorithm based on deep reinforcement learning (DuelingDQNTrust) is proposed. It uses Dueling DQN to compute Q values, which reduces the required storage space, and it does not limit the traversal depth. Moreover, the way a user’s trusted neighbors are found is crucial because it affects the reliability of the trust path, so three combination strategies are proposed to find users’ trusted neighbors more accurately. One strategy defines a priority metric that combines node importance and user similarity and takes the top-$k$ neighbors of the user under this metric as the trusted neighbor set. The other two strategies consider node importance and user similarity in different orders of priority: each first uses one metric to find the top-$k$ neighbors of the user and then filters these neighbors on the other metric to determine the trusted neighbor set. The evaluation on the FilmTrust dataset indicates that the algorithm outperforms several existing trust inference algorithms.

1. Introduction

Users often encounter unfamiliar users with whom they have never interacted in large-scale social networks [1, 2]. In this situation, trust becomes essential for people to make decisions [3, 4]. Users can infer the trustworthiness of a user directly or indirectly by using their previous interaction experiences. Suppose that $A$ and $B$ are two users who do not know each other, and user $A$ wants to know whether the product that $B$ recommends to him is worth buying. Since $A$ and $B$ have no interaction experience, $A$ cannot intuitively determine whether $B$ is trustworthy. If $A$ and $B$ have a common friend $C$, then $A$ can learn information about $B$ from $C$. Therefore, $A$ assesses the trustworthiness of $B$ based on $C$’s previous interaction experience with $B$.

If a user wants to determine whether another user is trusted, he must first discover the trust path between the two users (e.g., $A \rightarrow C \rightarrow B$ in the example above). Every pair of adjacent users in the trust path is considered to trust each other. Then, the trust value is calculated by an aggregation function. Ultimately, the user decides whether to trust the interaction object by using the calculated trust value. This process is referred to as trust inference [5, 6].

A large number of research results have emerged in trust inference [7–9], but many challenges remain, such as how to find users’ trusted neighbors more accurately and how to find more reliable trust paths. Prediction accuracy is also greatly affected when no effective aggregation strategy is applied after the trust paths are found, so finding an effective aggregation strategy is another important challenge. Some existing works, such as the literature [10], use only the shortest trust path to predict trust, which may lose valuable information and thus reduce prediction accuracy. In addition, many trust inference algorithms suffer from insufficient storage space. For example, the literature [11] uses the Q-learning method to find trust paths, which requires a matrix to store Q values, so the storage space becomes insufficient when there are too many users.

A trust inference algorithm based on deep reinforcement learning is proposed to solve the above problems. It uses Dueling DQN to compute Q values, which addresses the problem of insufficient storage space. To address the problem that shortest-path methods do not fully utilize trust information, it does not limit the traversal depth. Because users are more willing to trust users who are themselves trusted more, a new metric, node importance, is proposed to find users’ trusted neighbors more accurately.

The main contributions of this paper are as follows:

(1) The trust inference algorithm uses the deep reinforcement learning method Dueling DQN and proposes a node importance metric to find the user’s trusted neighbors. Since users prefer to trust users who are trusted more, the sum of trust values of the nodes that trust the current node is given greater weight in calculating the node importance.

(2) Three strategies are proposed to calculate the trust value among users by considering node importance and user similarity. The values of the parameters for the three strategies are determined by experiments.

(3) The algorithm is evaluated on the FilmTrust dataset. The evaluation results validate the effectiveness of the proposed algorithm and show that it outperforms several existing trust inference algorithms.

The rest of the paper is organized as follows. Section 2 focuses on the background knowledge and the related work. Section 3 presents the problem description and basic definitions. The proposed algorithm is described in detail in Section 4. Section 5 focuses on the analysis of the experiments. Section 6 summarizes the work of this paper.

2. Background and Related Work

This section introduces the concept of trust, the theoretical knowledge related to reinforcement learning, and the work related to trust inference.

2.1. Trust

Trust is currently studied in many disciplines, such as computer science, economics, and sociology. Each discipline considers trust from a different perspective and has its own definition [12–14]. Literature [10] proposes a definition of trust in social networks as “trust in a person, that is, a promise to behave in a certain way, based on the belief that the person’s future behavior will bring a good outcome.” This paper adopts the definition in literature [10].

Trust can be represented in different ways. The logical representation uses 0 for distrust and 1 for trust; an alternative logical form uses -1 for distrust, 0 for unknown, and 1 for trust. The probabilistic representation expresses the level of trust between users as a probability value: a probability of 1 indicates that a user trusts another user totally, and a probability of 0 indicates that a user distrusts another user completely. The level representation expresses trust as a discrete value within an interval, either [-1,1] or [0,1]. The first case treats -1 as absolute distrust, 0 as half-trust, and 1 as absolute trust; the second case treats 0 as distrust and 1 as trust.

2.2. Reinforcement Learning Background

Reinforcement learning [15] is a form of learning in which an agent learns by trial and error, with the goal of maximizing its reward. When an agent carries out a task, it chooses an action from the action set according to a specific strategy and interacts with the surrounding environment. In response to the action, the environment returns a reward to the agent, and the agent moves to a new state.

Reinforcement learning solves sequential decision-making problems, in which a goal is reached through a multistep decision process. Its learning process is therefore similar to a path search process and can be naturally applied to trust inference.

Dueling DQN [16] evolved from the DQN [17] algorithm, a combination of deep learning and reinforcement learning. Dueling DQN optimizes the neural network structure. It decomposes the action-value function into a state-value function and an advantage function. This structural improvement improves the stability of the algorithm.
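The decomposition described above can be illustrated with a short Python sketch. It uses the mean-subtracted aggregation from the Dueling DQN paper [16], Q(s, a) = V(s) + A(s, a) - mean(A). The function name `dueling_q_values` and the sample numbers are illustrative assumptions, and the text does not state which aggregation variant DuelingDQNTrust uses.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-subtracted form."""
    # Subtracting the mean advantage makes the split between V and A identifiable.
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


# Illustrative numbers only: one state value and three action advantages.
q_values = dueling_q_values(0.5, [0.2, -0.1, 0.5])
best_action = q_values.index(max(q_values))
```

With this form, the average of the Q values equals the state value, and the ranking of actions is determined by the advantages alone.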

2.3. Related Work

Several scholars have proposed trust inference algorithms. The model proposed by Golbeck [10] uses the shortest trust path to evaluate the trust value between users and uses direct trust as the weight when calculating the trust value. However, predicting trust from the shortest trust path alone may lose a lot of valuable information and thus reduce prediction accuracy. Literature [18] constructs a new metric, locally weighted centrality, based on degree centrality to find trust paths; however, when calculating locally weighted centrality, the algorithm does not take into account that users are more likely to trust users who are trusted more.

The trust inference algorithms proposed in the literature [11, 19] improve the accuracy of prediction to some extent. Nevertheless, the Q-learning algorithm used in literature [11] tends to be less stable and requires a matrix to store Q values. Large social networks contain more state data, so finding and storing Q values consumes a lot of time and space. The distributed learning automaton algorithm used in literature [19] assigns a learning automaton to each node and can effectively identify credible trust paths. Unfortunately, this algorithm has a problem similar to that of the Q-learning algorithm, because it requires vectors to store action probability values. Thus, the storage space will be insufficient when there are too many users.

Jiang et al. [20] proposed the SWTrust algorithm. This algorithm uses the “small world” theory to limit the traversal depth to 6. It mainly considers the semantic similarity of users in social networks and ignores the trust propagation ability of users. The authors in literature [21] took into account the dynamic nature of trust and argued that trust changes with the experience of interactions between users. However, since the algorithm takes into account all trust paths in the inference process, it will take more time to run than an algorithm that uses only the shortest path. The two algorithms improve the accuracy of prediction to some extent.

In literature [22], the user’s topic of interest and the trust propagation ability of the nodes are considered in the process of finding trust paths. For each intermediate node, only its most relevant neighbors are selected as the next step of the path. Although this method reduces the time complexity of path search, it does not assign weights to the two metrics when calculating node priority, so the influence of different weights on the experimental results is not examined. Barzegar Nozari and Koohi [23] proposed a new way to calculate similarity, confidence, and identical opinion. An implicit trust network is constructed from these three metrics and is then used for score prediction. The algorithm improves the accuracy of recommendations.

Chen et al. [24] proposed a trust assessment method considering the influence of context on trust inference. The method improves the accuracy of trust assessment by identifying the themes behind trust. Jiang et al. [25] proposed a dynamic trust model. The model focuses on the frequency of interaction between users, the similarity of user attributes, and the number of public friends and introduces a time decay factor. If two users do not interact with each other for some time, the trust value between them will decrease. The problem with this model is that it takes into account the decay of trust but does not take into account the increase in the trust value between users as the frequency of their interactions increases.

The authors in literature [26] proposed a weighted heuristic search trust prediction model that considers the path length and the distribution of trust values over the entire path. The model combines a weighted heuristic algorithm with trust network features in pathfinding, incorporates trust decay in trust calculation, and improves the accuracy of trust prediction. In literature [27], the authors argue that an essential problem in inferring trust values is how to select the optimal trust path from the set of adoptable trust paths. A dynamic weighted heuristic trust path search algorithm is proposed to solve the optimal trust path problem. It integrates path length and heuristic factors and restricts the depth of the search path to 6. Both of the above algorithms use only one optimal trust path to predict trust, which may lose some valuable information. Adding a few suboptimal trust paths to the prediction may improve its accuracy.

3. Problem Description and Basic Definition

This section mainly includes problem description, node importance definition, and Markov decision process. The descriptions of the notations used in the paper are shown in Table 1.

3.1. Problem Description

$G = (V, E)$ is a trust network. $V$ denotes all nodes in the trust network (each node represents a user), and $E$ denotes all edges. Each edge in the trust network represents a trust relationship between users, and the weight on an edge represents the degree of trust between them. Suppose the source node is $s$ and the target node is $d$. If there is an edge $e_{ij}$ between node $v_i$ and node $v_j$, where $v_i, v_j \in V$, then there is direct trust between $v_i$ and $v_j$, and the two nodes are said to be neighbor nodes. The weight on edge $e_{ij}$ denotes the trust value from node $v_i$ to node $v_j$, denoted by $t_{ij}$, where $t_{ij} \in [0, 1]$. A value of 0 indicates that a node has no trust in another node, a value of 1 indicates that it trusts the other node, and other cases are denoted by a value between 0 and 1. The purpose is to evaluate, by the trust inference algorithm, the trust value between the source node $s$ and the target node $d$ when there is no direct trust between them; the inferred trust value is denoted by $t'_{sd}$.
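As a minimal sketch of this problem setting, a trust network can be held as a nested dictionary of weighted, directed edges. The node names, trust values, and helper names (`direct_trust`, `needs_inference`) below are hypothetical and only illustrate the description above.

```python
# Hypothetical toy trust network: trust[u][v] is the trust value from u to v in [0, 1].
trust = {
    "a": {"b": 0.9, "c": 0.6},
    "b": {"d": 0.8},
    "c": {"d": 0.4},
    "d": {},
}


def direct_trust(trust, u, v):
    """Return the direct trust value from u to v, or None when no edge exists."""
    return trust.get(u, {}).get(v)


def needs_inference(trust, source, target):
    """Trust has to be inferred when the source has no direct edge to the target."""
    return direct_trust(trust, source, target) is None
```

In this toy network, "a" trusts "b" directly, while the trust of "a" in "d" would have to be inferred through "b" or "c".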

3.2. Definition of Node Importance

The node importance index is proposed to find the trusted neighbors of a node more accurately. It refers to the importance of a node in the social network and helps the current user to filter out trusted neighbors. The node importance of node $v_i$ is denoted as $NI(v_i)$. Only nodes with a trust value greater than or equal to 0.5 are selected when calculating node importance. The importance of node $v_i$ is calculated from the following quantities: $T_{in}(v_i)$ is the sum of trust values of nodes trusting node $v_i$, and $T_{out}(v_i)$ is the sum of trust values of nodes trusted by node $v_i$; $I(v_i)$ denotes the set of nodes trusting node $v_i$, $O(v_i)$ is the set of nodes trusted by node $v_i$, and $v_j$ is a node in the respective set; $t_{ij}$ is the trust value of node $v_i$ to node $v_j$; and $\alpha$ is a parameter that takes values in the range [0.1,0.9] and weights the two sums. The sum of trust values of nodes that trust the current node is given greater weight when calculating the node importance because people prefer to trust users who are trusted more than users who trust others more.
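The following Python sketch shows one way such a node importance could be computed. It assumes a plain weighted sum of incoming and outgoing trust, with a weight parameter (`alpha` here) on the incoming sum; the exact formula, any normalization, and the names `node_importance`, `alpha`, and `threshold` are assumptions, not the authors' definition.

```python
# Hypothetical toy trust network: trust[u][v] is the trust value from u to v.
trust = {
    "a": {"b": 0.9, "c": 0.6},
    "b": {"d": 0.8},
    "c": {"d": 0.4},
    "d": {},
}


def node_importance(trust, node, alpha=0.6, threshold=0.5):
    """Weighted sum of incoming and outgoing trust for one node (assumed form)."""
    # Sum of trust values from nodes that trust `node`, keeping values >= threshold.
    trust_in = sum(
        edges[node]
        for edges in trust.values()
        if node in edges and edges[node] >= threshold
    )
    # Sum of trust values that `node` gives to others, keeping values >= threshold.
    trust_out = sum(t for t in trust.get(node, {}).values() if t >= threshold)
    # ASSUMPTION: a convex combination; alpha > 0.5 favours incoming trust.
    return alpha * trust_in + (1 - alpha) * trust_out


importance_b = node_importance(trust, "b")
importance_d = node_importance(trust, "d")
```

Here the edge from "c" to "d" (0.4) falls below the 0.5 threshold and is ignored, so only the edge from "b" contributes to the importance of "d".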

3.3. Markov Decision Process

The task of finding trust paths can be transformed into a Markov decision process and implemented using the deep reinforcement learning Dueling DQN algorithm. The Markov decision process can be described as a tuple $(S, A, P, R, \gamma)$.

$S$ denotes the set of states. All users in the trust network are taken as the set of states, $S = \{s_1, s_2, \ldots, s_n\}$, as shown in Equation (2), where $n$ indicates the total number of users. Each node in the trust network is considered a user; each user is defined as a state, which is represented by $s_i$. “State” and “user” can be used interchangeably.

$A$ denotes the set of actions that can be selected by an agent. An action is defined as the selection of the next trusted user by the agent from $C(s_i)$, where $C(s_i)$ denotes the set of candidate trusted users of user $s_i$. Because social networks contain a relatively large amount of data and complex relationships, the candidate users of the current user are filtered. For example, the hybrid strategy described later combines node importance and user similarity into a single metric, and the top-$k$ neighbors of the current user are selected using this metric. These users form the candidate action set of the current user $s_i$.

$P$ denotes the state transition probability, i.e., the probability that an agent performing action $a$ jumps from state $s$ to the next state $s'$, which can be described as $P(s' \mid s, a)$.

$R$ is the reward function, shown in Equation (4). If the user selected by the agent has already been accessed, or there is no trusted user to select for that user, $-1$ is returned as a penalty; otherwise, the value specified by Equation (4) is returned as an immediate reward.

$\gamma$ denotes the discount factor, where $\gamma \in [0, 1]$. It determines the trade-off between current and future rewards.
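The reward rule described above can be sketched in Python as follows. The penalty of -1 comes from the text; the value returned in the non-penalty case is not recoverable here, so the sketch assumes the trust value of the chosen edge. The function name `reward` and the toy network are hypothetical.

```python
# Hypothetical toy trust network: trust[u][v] is the trust value from u to v.
trust = {
    "a": {"b": 0.9, "c": 0.6},
    "b": {"d": 0.8},
    "c": {"d": 0.4},
    "d": {},
}


def reward(trust, current, chosen, visited):
    """Return -1 as a penalty, otherwise an assumed immediate reward."""
    if chosen in visited:
        return -1.0  # the selected user has already been accessed
    if not trust.get(chosen):
        return -1.0  # the selected user has no trusted user to move on to
    # ASSUMPTION: the edge trust value stands in for the reward in Equation (4).
    return trust[current][chosen]


r_ok = reward(trust, "a", "b", visited={"a"})
r_revisit = reward(trust, "a", "b", visited={"a", "b"})
r_dead_end = reward(trust, "b", "d", visited={"a", "b"})
```

Penalizing revisits and dead ends discourages the agent from forming cycles or walking into users from whom no further step is possible.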

4. Trust Inference Algorithm

This section details how to select the trusted neighbors of users and how to calculate the path strength and the aggregation policy. The detailed algorithm is shown in Algorithm 1.

In this algorithm, the inputs are the trust network, the source user, the target user, and the set of neighbors of the target user, and the output is the predicted trust value from the source user to the target user. The algorithm is trained using data from the memory pool, and the parameters are updated using the gradient descent method. The path strength is calculated after each trust path is found, and the predicted trust value is obtained by aggregating multiple trust paths after all trust paths are found.

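Algorithm 1: DuelingDQNTrust.
Input: Trust network $G$, source user $s$, target user $d$, target user’s neighbor set $N(d)$
Output: Predicted trust value $t'_{sd}$
1: Initialize the capacity of the memory pool $D$ as $N$
2: Initialize the weights $\theta$ of the Q network
3: Initialize the target-Q network with weights $\theta^{-} = \theta$
4: repeat
5:   repeat
6:     Select the next user $a$ from the action set using the $\varepsilon$-greedy policy
7:     Get the reward value $r$ for action $a$ and the next state $s'$
8:     Store $(s, a, r, s')$ into $D$
9:     Get samples $(s_j, a_j, r_j, s'_j)$ from $D$
10:    Set $y_j = r_j$ if $s'_j$ is terminal; otherwise set $y_j = r_j + \gamma \max_{a'} \hat{Q}(s'_j, a'; \theta^{-})$
11:    Gradient descent update for $\theta$ on $(y_j - Q(s_j, a_j; \theta))^2$
12:    Reset $\theta^{-} = \theta$ every $C$ steps
13:  until $s' \in N(d)$
14:  Path strength $\leftarrow$ minimum trust value on the current path
15:  if path strength $\geq$ current maximum path strength then
16:    Update the maximum path strength to the path strength of the current path
17:  end if
18: until all neighbors of the target user are accessed
19: Calculate the trust prediction value using Equation (6)

4.1. Trusted Neighbor Selection Strategy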

An essential step in the trust inference algorithm is to identify the trusted neighbors of the user. The user’s trusted neighbors are filtered by node importance and user similarity. There are two types of filtering approaches: the layering approach and the hybrid approach.

4.1.1. Layering Strategy

The node importance and user similarity for the current user need to be considered in turn when selecting the next user. First, the neighbors of the current user are filtered using one metric, and then, the set of candidate trusted neighbors of the current user is determined by another metric.

Im-Si (importance-similarity): this strategy first considers node importance and then user similarity. The top-$k$ neighbor nodes of the current user by node importance are selected as the candidate set. User similarity is then used to keep the nodes that meet the following condition: the similarity value with the current user is greater than the mean of the similarity values of the current node with all trusting neighboring nodes. The nodes satisfying the condition form the set of trusted neighbors of the current user. The user only needs to choose from this set of candidate trusted neighbors when selecting the next user.

Si-Im (similarity-importance): this strategy first considers user similarity and then node importance. First, the top-$k$ neighbor nodes by user similarity are selected as the candidate set of the current user. Then, node importance is used for a second screening of the candidate set; the screening condition is that the node importance value is greater than the mean of the node importance values of all trusting neighboring nodes of the current user.
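The two-stage filtering of the layering strategies can be sketched as below. The function name `layered_candidates`, the argument names, and the sample scores are hypothetical; the mean is taken over all neighbors passed in, which is one reading of "all trusting neighboring nodes".

```python
def layered_candidates(neighbors, first_score, second_score, k):
    """Top-k by the first metric, then keep nodes above the mean of the second."""
    if not neighbors:
        return []
    # Stage 1: keep the k neighbors ranked highest by the first metric.
    top_k = sorted(neighbors, key=lambda v: first_score[v], reverse=True)[:k]
    # Stage 2: the threshold is the mean of the second metric over all neighbors.
    mean_second = sum(second_score[v] for v in neighbors) / len(neighbors)
    return [v for v in top_k if second_score[v] > mean_second]


# Illustrative scores for three neighbors of the current user.
neighbors = ["b", "c", "d"]
importance = {"b": 0.9, "c": 0.5, "d": 0.7}
similarity = {"b": 0.2, "c": 0.9, "d": 0.7}

# Im-Si: importance first, then similarity.
im_si = layered_candidates(neighbors, importance, similarity, k=2)
```

Passing the similarity scores as `first_score` and the importance scores as `second_score` gives the Si-Im ordering with the same function.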

4.1.2. Hybrid Strategy

HS (hybrid strategy) integrates node importance and user similarity. A priority metric for nodes is defined from these two metrics, and its calculation is shown in Equation (5). The priority of every neighbor node of the current node $v_i$ is calculated, and the top-$k$ neighbor nodes are selected as the candidate trusted neighbor set of the current user. In Equation (5), $\beta$ is a parameter that adjusts the weights of node importance and user similarity, with a value range of [0.1,0.9], and $v_j$ is a neighbor node of the current node. The similarity between user $v_i$ and user $v_j$ is represented by $sim(v_i, v_j)$ and is calculated using the Pearson correlation coefficient. A larger priority value means a higher priority for the node.
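A hedged Python sketch of the hybrid strategy follows. The Pearson correlation is the standard one, computed over the items both users rated; the priority is assumed to be a convex combination of node importance and similarity, which is one plausible reading of the description and not necessarily the exact form of Equation (5). All function names, the `beta` argument, and the sample data are hypothetical.

```python
from math import sqrt


def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items (0.0 when it is undefined)."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    xs = [ratings_u[i] for i in common]
    ys = [ratings_v[i] for i in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mean_x) ** 2 for x in xs)
                * sum((y - mean_y) ** 2 for y in ys))
    return cov / norm if norm else 0.0


def priority(importance, similarity, beta=0.5):
    """ASSUMED priority: beta weights importance, (1 - beta) weights similarity."""
    return beta * importance + (1 - beta) * similarity


def top_k_by_priority(neighbors, importance, similarity, k, beta=0.5):
    """Rank neighbors by the assumed priority and keep the k highest."""
    scored = {v: priority(importance[v], similarity[v], beta) for v in neighbors}
    return sorted(neighbors, key=scored.get, reverse=True)[:k]


sim_uv = pearson({"m1": 1, "m2": 2, "m3": 3}, {"m1": 2, "m2": 4, "m3": 6})
chosen = top_k_by_priority(["b", "c"], {"b": 0.9, "c": 0.1},
                           {"b": 0.1, "c": 0.5}, k=1)
```

Sweeping `beta` over [0.1, 0.9] in such a sketch mirrors the parameter study reported in the experiments.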

4.2. Aggregation of Multiple Trust Paths

The trust value is calculated using the aggregation function proposed in literature [19]. The predicted trust value $t'_{sd}$ is computed by aggregating the direct trust values of multiple neighbor nodes of the target user $d$. The formula, Equation (6), combines the following quantities: the average trust value that a user offers, the maximum trust value that the direct neighbor nodes of the target user $d$ offer, the number of direct neighbors that the target user has, the trust value of one user to another, and the path strength.

4.3. Calculate the Path Strength

First, the current user $s$ randomly chooses the next user $a$ from the action set. The reward value $r$ is returned according to the reward function defined by Equation (4), and the agent then transfers to the next state $s'$. The tuple $(s, a, r, s')$ is saved into the memory pool $D$ to be used as the input to the Q network.

The Dueling DQN algorithm uses the data in the memory pool for training. The current Q value is generated in the Q network for path selection, and the target Q value is generated in the target-Q network to measure the value of the selected path. The target-Q network updates its parameters more slowly than the Q network, typically once per fixed number of steps. This method reduces the correlation between data and makes the network training more stable. Trust paths are discovered by iterative learning until a direct neighbor of the target node is discovered.

The path strength of the discovered trust path is obtained during this phase according to the propagation function, which takes the smallest of the trust values in the trust path. When a new trust path is found whose path strength is greater than or equal to the current largest path strength, the largest path strength is updated to the path strength of the new path. All trust paths have been found when all direct neighbors of the target node have been accessed.
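The path strength rule can be sketched in a few lines of Python: the strength of a path is the smallest trust value along it, and the running maximum is replaced when a new path is at least as strong. The toy network and the function names are hypothetical, and the paths are listed by hand here instead of being discovered by Dueling DQN.

```python
# Hypothetical toy trust network: trust[u][v] is the trust value from u to v.
trust = {
    "a": {"b": 0.9, "c": 0.6},
    "b": {"d": 0.8},
    "c": {"d": 0.4},
    "d": {},
}


def path_strength(trust, path):
    """Smallest trust value over the consecutive hops of a path."""
    return min(trust[u][v] for u, v in zip(path, path[1:]))


def strongest_path(trust, paths):
    """Keep the path with the greatest strength; a later tie replaces an earlier one."""
    best_path, best_strength = None, float("-inf")
    for path in paths:
        strength = path_strength(trust, path)
        if strength >= best_strength:  # "greater than or equal", as in the text
            best_path, best_strength = path, strength
    return best_path, best_strength


best_path, best_strength = strongest_path(
    trust, [["a", "c", "d"], ["a", "b", "d"]]
)
```

Taking the minimum reflects the idea that a trust path is only as reliable as its weakest link.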

5. Experiment

This section evaluates the performance of the proposed trust inference algorithm, mainly including the experimental design and experimental results.

5.1. Experimental Design

Dataset: experiments are conducted on the FilmTrust dataset to assess the performance of DuelingDQNTrust. The dataset includes 571 users, 35,497 movie ratings, and 1,853 trust relationships. The trust relationship between users in the initial dataset is expressed on a scale from 1 to 10, where 1 denotes absolute distrust and 10 denotes absolute trust. The data are normalized in the experiment so that the trust values range from 0 to 1, where 0 means absolute distrust and 1 means absolute trust.

Evaluation methodology: the leave-one-out method is used for the experiments. The method first hides the true trust value from the source node to the target node and then searches for trust paths according to the DuelingDQNTrust algorithm to predict the trust value. Lastly, the performance of the algorithm is measured by comparing the real trust value with the predicted trust value.
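The leave-one-out procedure can be sketched as a loop that hides one known edge at a time and asks a predictor for its value. The `predict` callable stands in for the trust inference algorithm and is a placeholder; `constant_predictor`, the toy network, and the convention of returning `None` when no path exists are assumptions for illustration.

```python
import copy

# Hypothetical toy trust network: trust[u][v] is the trust value from u to v.
trust = {
    "a": {"b": 0.9, "c": 0.6},
    "b": {"d": 0.8},
    "c": {"d": 0.4},
    "d": {},
}


def leave_one_out(trust, predict):
    """Hide each known edge in turn and collect (true, predicted) trust pairs."""
    pairs = []
    for u, edges in trust.items():
        for v, true_value in edges.items():
            masked = copy.deepcopy(trust)
            del masked[u][v]  # neglect the true trust value from u to v
            predicted = predict(masked, u, v)
            if predicted is not None:  # skip pairs for which no path was found
                pairs.append((true_value, predicted))
    return pairs


def constant_predictor(masked_trust, source, target):
    """Placeholder predictor; a real run would call the inference algorithm."""
    return 0.5


pairs = leave_one_out(trust, constant_predictor)
```

Working on a deep copy keeps the original network intact, so every edge is evaluated against the same full network minus that single edge.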

Evaluation metrics: four metrics are used to evaluate the performance of the proposed method: MAE, precision, recall, and F-score.
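As a hedged illustration, the four metrics can be computed with their standard definitions as below. Treating a trust value of at least 0.5 as "trusted" when binarizing for precision and recall is an assumption borrowed from the threshold used for node importance; the function name `evaluate` and the sample pairs are hypothetical.

```python
def evaluate(pairs, threshold=0.5):
    """Return MAE, precision, recall, and F-score for (true, predicted) pairs."""
    # Mean absolute error between true and predicted trust values.
    mae = sum(abs(t - p) for t, p in pairs) / len(pairs)
    # ASSUMPTION: values >= threshold count as "trusted" for the binary metrics.
    tp = sum(1 for t, p in pairs if t >= threshold and p >= threshold)
    fp = sum(1 for t, p in pairs if t < threshold and p >= threshold)
    fn = sum(1 for t, p in pairs if t >= threshold and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return mae, precision, recall, f_score


# Illustrative (true, predicted) pairs only.
sample_pairs = [(1.0, 0.9), (0.8, 0.4), (0.2, 0.6), (0.1, 0.1)]
mae, precision, recall, f_score = evaluate(sample_pairs)
```

A lower MAE is better, while higher precision, recall, and F-score are better, which matches how the parameter studies below read the figures.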

5.2. Determination of the Parameters $k$, $\alpha$, and $\beta$

Three strategies are proposed to find the trusted neighbors of users, and the values of the parameters for each strategy need to be determined. In this section, experiments are carried out on the HS, Si-Im, and Im-Si strategies, and the effect of the parameters on the performance of the algorithm is discussed.

5.2.1. Determination of Parameters in HS Strategy

The value of the parameter $k$, the number of top-ranked neighbors retained, is increased from 5 to 30. Figure 1 shows that when $k$ is equal to 10, the MAE is the smallest, and the precision and F-score are the largest. So the value of $k$ is taken as 10.

The parameter $\alpha$ is used to adjust the weights of the two trust sums when calculating the node importance. The range of $\alpha$ is set from 0.1 to 0.9. From Figure 2, the MAE is the smallest and the other three indicators are the largest when $\alpha = 0.6$. According to this analysis, the value of $\alpha$ is taken as 0.6. This experiment also supports the idea that users tend to trust the users who are trusted more.

The parameter $\beta$ is used to regulate the ratio of node importance to user similarity. From Figure 3, the MAE is the smallest, with a value of 0.0778, and the F-score and recall are the largest, with 0.9595 and 0.9899, when $\beta = 0.5$. The algorithm has the highest precision, 0.9352, when $\beta$ is 0.1, but it does not perform best on the other metrics at that value. Based on this analysis, the value of $\beta$ is taken as 0.5.

5.2.2. Determination of Parameters in Si-Im Strategy

Figure 4 shows that the MAE is the smallest and recall and F-score are the highest when $\alpha = 0.6$. The algorithm attains its highest precision at a different value of $\alpha$. Because the F-score considers precision and recall together, it assesses the performance of the algorithm more accurately, so $\alpha$ is set to 0.6.

The algorithm performs best when the value of parameter $k$ is 15, as shown in Figure 5: the MAE is the smallest, and the F-score and precision are the highest. Therefore, $k$ is taken to be 15.

5.2.3. Determination of Parameters in Im-Si Strategy

Figure 6 shows the value of $\alpha$ at which the algorithm performs best: the MAE is the smallest, and precision and F-score are the highest. From Figure 7, the algorithm has the smallest MAE and the highest precision and F-score when the parameter $k$ is taken as 10. Based on this analysis, the algorithm performs best at this value of $\alpha$ with $k = 10$.

5.3. Comparison of the Three Strategies

The proposed trust inference algorithm considers node importance and user similarity, and these two measures can be combined in three ways to select the user’s trusted neighbors. The experimental results of the three strategies are shown in Table 2.

Table 2 shows that the Si-Im strategy has the smallest MAE and the Im-Si strategy has the largest MAE. The Si-Im strategy outperforms the other strategies in terms of precision and F-score. The HS strategy has a higher recall but the lowest precision. In summary, the Si-Im strategy performs best among the three strategies, with the smallest MAE and the highest precision and F-score. Therefore, the Si-Im strategy is used for trust inference.

5.4. Comparison with Other Algorithms

DuelingDQNTrust is compared with several other algorithms to further validate its performance, and the comparison results are shown in Figure 8. It is clear from the figure that the MAE of DuelingDQNTrust is much smaller than that of TidalTrust and MoleTrust, whose MAE is about three times that of DuelingDQNTrust. The precision of MoleTrust, TidalTrust, and DDQNTrust is 0.9284, 0.9281, and 0.9210, respectively; DuelingDQNTrust improves on these three algorithms by 3.5%, 3.6%, and 4.3% in precision. The F-score of the comparison algorithms is 0.9472, 0.9451, and 0.9528 (in the order in which they appear in the figure), and DuelingDQNTrust improves on them by 2.2%, 2.4%, and 1.6% in F-score.

The above analysis shows that DuelingDQNTrust outperforms the other three algorithms. The algorithm takes full advantage of the trust information in the trust network, uses effective metrics to find trusted users, and uses the Dueling DQN method to search for trust paths, thus improving the accuracy of prediction.

6. Conclusion

Many trust inference algorithms suffer from insufficient storage space, and some algorithms that use only the shortest trust path do not fully utilize the available trust information. In addition, some previous work has considered the trust values of the incoming and outgoing edges of nodes when finding users’ trusted neighbors but has not considered that users prefer to trust users who are trusted more. A trust inference algorithm based on the deep reinforcement learning method Dueling DQN is proposed to solve these problems. The algorithm introduces a node importance metric and three strategies that combine it with user similarity. The corresponding parameters for each strategy are determined by experiments. The value of $\alpha$ in all three strategies is greater than 0.5, which supports the idea that users are more willing to trust users who are trusted more. The experimental results indicate that the proposed trust inference algorithm is more accurate than several other trust inference algorithms.

In future work, we will investigate the aggregation function, since it is a key factor affecting the performance of trust inference algorithms. We will also consider the context of trust, since someone who is highly trusted in one domain may not be as trusted in another.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 62072392 and No. 61972360).