CGPP-POI: A Recommendation Model Based on Privacy Protection

Li, Gesu; Yin, Guisheng; Xiong, Zuobin; Chen, Fukun

doi:https://doi.org/10.1155/2021/4873574

Wireless Communications and Mobile Computing

On this page

Abstract Introduction Related Work Preliminaries Conclusions Data Availability Conflicts of Interest Acknowledgments References Copyright Related Articles

Special Issue

Secure Computational Solutions for Sparse Data Challenges in the Internet of Things

View this Special Issue

Research Article | Open Access

Volume 2021 | Article ID 4873574 | https://doi.org/10.1155/2021/4873574

CGPP-POI: A Recommendation Model Based on Privacy Protection

Gesu Li,¹Guisheng Yin,¹Zuobin Xiong,²and Fukun Chen¹

Academic Editor: Yan Huo

Received17 Jun 2021

Accepted18 Aug 2021

Published31 Aug 2021

Abstract

At present, with the popularization of intelligent equipment. Almost every smart device has a GPS. Users can use it to obtain convenient services, and third parties can use the data to provide recommendations for users and promote relevant business development. However, due to the large number of location data, there are serious data sparsity problems in the data uploaded by users. At the same time, with great value comes great danger. Once the user’s location information is obtained by the attacker, severe security issues will be caused. In recent years, a lot of researchers have studied the recommendation of point of interests (POIs) and the privacy protection of location. Yet, few of them have explored both together, which induces some drawbacks on the combination of them. This paper combines POI recommendation with a privacy protection mechanism. Besides providing user with POI recommendation service, it also protects the privacy of user’s location. We proposed a POI recommendation model with privacy protection mechanism, termed POI recommendation model for community groups based on privacy protection (CGPP-POI). This model can ensure the recommendation accuracy and reduce the leakage of user location information via taking advantages of the characteristics of location. At the same time, it deals with the problem of poor recommendation performance caused by sparse data. In addition, through the expansion of location, random and other methods are used to protect the user’s real check-in information. First, the data processed at the terminal satisfied local differential privacy. At the same time, we use the data to build a recommendation model. Then, we use a community of user in the model to improve the availability of these disturbed data, explore the relationship between users, and expand check-ins within the community. Finally, we provide the POI recommendations to users. Based on the traditional evaluation criteria, we adopted four metrics, i.e., accuracy, recall rate, coverage rate, and popularity in evaluation part, where intensive experiments conducted on real datasets Gowalla and Brightkite demonstrate that our approach outperforms the baseline methods significantly.

1. Introduction

With the advent of 5G techniques, people’s lifestyles have been changed thoroughly. Smartphones have become a ubiquitous part in our daily life, e.g., using smartphone to seek information, shopping online, and performing navigation. Hou et al. [1] pointed out that with the help of smart devices, sharing information has become a normal part of people’s life. Especially during the epidemic, intelligent equipment has a great impact on China, for example, scanning QR codes to report trips and taking online classes. In short, it has accelerated the rapid development of e-commerce industry and internet technology and driven the overall economic development. But the convenient life requires users to provide more information, and the most useful information is GPS location. Location information is a very special data, because it has bonded with people’s routine, e.g., navigation, shopping, and social activity, and thus has strong private attributes. Almost all software and applications require users to provide GPS information if they want to get a better experience. Currently, in order to get recommendation service, users still need to upload their exact GPS data to third parties. According to this location, the third party can provide nearby researched results. The exploration of POI recommendations is still in early stages, while, with the rapid development, people need to get more fresh information in a shorter time. For example, a person’s daily pattern is two dots in a line, which means he/she is only active at home and work place. Suppose that one day he/she needs to go somewhere unfamiliar, searching on the internet for information is necessary. However, it is hard to figure out useful information among such a huge information flow. To solve this problem, recommendation system emerged, aiming at providing users with POI that they might be interested and saving searching time for them.

In recent years, recommendation research based on POI has focused on machine learning. Generally, the research on POI falls into the following categories: (1) tensor decomposition; (2) Markov chain model; and (3) neural network model. These POI recommendation algorithms have achieved good results in certain scenarios, but they lack generality. At the same time, they ignore the privacy issues caused by user trajectories. In the work of Xue et al. [2], the authors pointed that the friend relationship has an influence on users’ future behavior, which shows that social information plays a role in the process of recommendation. Concurrently, the paper proposed that when excluding the influence of work location and family information, the influence of social relationship is more apparent. In addition, an attacker can infer a specific user by using friends and some background knowledge. Undoubtedly, social relationships indeed increase the risk of users’ privacy leakage.

The development of GPS positioning and network technology not only brings convenience to users’ lives but also increases the risk of users’ location information leakage. Liang et al. [3] demonstrate that the prediction accuracy of tap position inference can be at least 90 percent by utilizing convolutional neural networks. As special information, location can be regarded as both public information and user’s private information. For example, on a map, where every location is public, this information is available, and third parties can use this information and certain statistics to build user profiles. But there is no doubt that a specific user’s trajectory is private. Once leaked, it will cause security and privacy risks to the user. If the leakage happens in the data collection stage, it will cause a huge disaster for both the company and the user, such as the AOL [4] and Netflix [5] data breaches, which cause immeasurable social and economic consequences.

For the preceding reasons, we should take into account the particularity of location information. We propose a POI recommendation model based on privacy protection, which explores the relationship between users and the range of check-in. For example, in Figure 1, there are three users, represented by red, yellow, and blue dots. Each user has a certain number of check-ins in a certain period of time. Intuitively, blue users and yellow users are more closely related because they are more likely to be in the same area. As we can see from the figure, each user’s check-in is their exact coordinates. If the same check-in map is obtained by an attacker, the attacker can easily predict where a user will go in the future. But if we expand the scope of a user’s check-in location, the amount of information it contains will be ambiguous. Take the large blue check-in in Figure 1 as an example. Within radius , the check-in scope contains only one check-in. When it was extended to , it had four check-ins. Therefore, if the range is uploaded for service, the probability of an attacker inferring a true check-in will be decreased. Therefore, the user’s real location can be protected. Again, all the check-ins in Figure 1 and where they are located can be considered public information without identifying the user who visited them. This public information is useless to the attacker. But for users, these check-ins can provide them with more options for future utility. In addition, on the establishment of the recommendation model, we hope to enhance the relevance among users by expanding the scope of real check-in, as the blue and yellow users in Figure 1. However, when the scope is extended, as shown in the figure, many check-ins of yellow user and blue user are partitioned into the same range, which increases the connection between blue user and yellow user and adds stronger correlated options to the prediction. It also shows that in complex network, community is a suitable way to solve the relationship between users [6]. In this paper, we build a satisfactory mechanism to provide users with POI recommendations through the user check-in information, while preserving user privacy protection for location information. Experiments show that this model can provide users with good recommendation choices under the privacy protection mechanism. Meanwhile, this model is constructed in a hierarchical way which reduces the amount of computation and effectively reduces the time cost. The innovation points of this paper are as follows: (i)Integrating community detection mechanisms with location network. The community detection model is improved according to the needs of this paper and is adapted to suit the location network. At the same time, it can solve the problem of data sparsity in location recommendation(ii)Instead of the traditional knowledge graph built by all users, the knowledge graph is processed hierarchically. We explore the relationships between users, locations, and check-ins within the community(iii)In the process of recommendation, privacy protection mechanism is considered. To deal with real location coordinates, we apply local differential privacy on it. At the same time, the privacy protection mechanism is also added at the end of the recommendation, protecting the user’s real location and future travel safety.

The content of this paper is organized as follows: Section 2 is related work, discussing the current research status of POI from two aspects of recommendation model and privacy protection. Section 3 is the preliminary knowledge introduction of the model. The basic knowledge of community detection and LDP involved in the model is introduced. Section 4 is the problem definition of the model. Section 5 is the main part of the model, which is elaborated from two parts: privacy protection mechanism and recommendation. Section 6 is the experiment of the paper, including the introduction of data set, setting, evaluation standard, and the analysis of experimental results. Finally, the conclusion and future work of this paper is summarized in Section 7.

In this section, we introduce the research on recommendation and privacy protection based on POI in recent years. We will discuss the current research status of these two aspects and point out the existing problems and the direction for improvement in this paper.

2.1. The Recommendation Model on POI

Recommendation based on POI has attracted much attention in recent years. Many scholars focus on solving the problem of data sparsity. In [7, 8], the authors mentioned that the density of check-in data is usually around 0.1%, while the data density of Netflix movie recommendations is around 1.2%. Data sparsity has a greater impact on POI recommendations. Zhao et al. [9] use context to predict the next POI through associative learning, in which authors construct an STGN to construct personalized sequences for users’ long-term and short-term preferences. Li et al. [10] proposed a dynamic network recommendation algorithm based on HMM, which also includes privacy protection mechanism. Collaborative filtering, as a widely used recommendation algorithm, is also widely used in POI. Koren et al. [11–13] recommend algorithms based on matrix decomposition and its variants. Huang et al. [14] proposed a retrieval algorithm based on convolutional neural network, which is a recommendation technology for 3D and other Internet of Things devices, wherein Mnih and Salakhutdinov [13] proposed a PMF model containing Gaussian noise based on Bayesian framework. This model can maximize the posterior probability of potential features under the Gaussian hypothesis. Yin et al. [15] proposed an implicit probability generation model for the phenomenon of interest drift across geographic regions. This model can learn individual interests according to the check-in of individual interests in each region. It used social and spatial information to enhance regional dependence. At the same time, the author designed an effective pruning algorithm to overcome the dimension problem. Wang et al. [16] proposed a trust-based probabilistic recommendation model for social networks. In order to solve the problem of users’ cold start, the authors consider their latent factor to find potentially similar users. The work of Li et al. [17] proposed a recommendation algorithm based on community division, which can also protect the privacy of users. At present, researchers have proposed a variety of solutions to the problem of data sparsity. With further understanding, most methods improve accuracy by adding features and increasing the complexity of the model. The result is an increased cost of time, which also requires users to upload more personal information when using third-party software to access the corresponding services. This is easy to cause the leakage of privacy.

2.2. The Privacy Protection on POI

Location-based services typically allow users to upload their current location first. The third party obtains the information and provides recommendation service for the user according to the surrounding situation. Usually, the information uploaded by the user is exact location coordinates. In this paper, we denote this exact coordinate as . But when a user provides location to a third party, there is no way to determine whether the third party can be trusted. At the same time, there is a risk of leakage during the process of information collection. Therefore, researchers have proposed some solutions to protect users’ location privacy. -anonymity is the most widely used privacy protection method. Kido et al. and Shankar et al. [18, 19] proposed a method to protect the user’s identity information, so that the attacker cannot deduce which specific user is querying. Xue et al. and Bamba et al. [2, 20] focus on protecting the user’s real location, where authors hide the user’s real location by using virtual location. The preceding privacy protection methods can be boiled down to hiding. By hiding real data in other data, it prevents the attacker from obtaining the user’s real location. There are other privacy protective mechanisms. Cover [21] introduced a stratagem-based protection mechanism. Papadopoulos et al. and Ghinita et al. [22, 23] proposed methods based on private information retrieval. Cai et al. [24] proposed a data sanitization method that collectively manipulates user profile and friendship relations. Besides sanitizing friendship relations, the proposed method can take advantages of various data-manipulating methods. Memon et al. [25] proposed a multimixed region privacy protection method which is dedicated to the mapping service of vehicles on the road network. Hashem and Kulik [26] proposed the use of mobile devices to form personal self-organizing networks. The authors proposed a decentralized method to protect the privacy of visitors’ location. Chen et al. [27] proposed a method to protect user location privacy based on unobservability, in which the author designed, implemented, and evaluated a protection system named LISA, which is a location information scrambler. The system can adjust the location noise, protect the user’s location privacy, and save the resources (especially the power) on mobile devices. Although these privacy protection mechanisms can protect the user’s location security to a certain extent, they are lying under the assumption that the background knowledge of the attacker is limited, thus with some limitations. Some scholars have proposed spatial transformation, namely encryption, to protect the user’s location information, yet these approaches are difficult to implement on mobile devices because of the huge computation cost even though the bandwidth has been increased with the help of 5G technology.

Moreover, more researches [28–32] have investigated the privacy protection problem in the background of 5G, IoT, and Big Data. As a representative of them, differential privacy [33], a privacy concept in the field of statistical databases, has been widely adopted. Its goal is to publish statistics about the database while protecting users’ personal data. Differential privacy requires that modifying the data of a single user has negligible impact on the query results. Zhao et al. [34] proposed a top-down partitioning algorithm based on local differential privacy (LDP). It can generate undifferentiated location record data. At the same time, a new adaptive user assignment scheme and a series of optimization techniques are used to improve the accuracy of published data. It can be seen from the above studies that some researchers usually adopt mechanisms such as anonymity and disturbance to protect data privacy, and others also adopt encryption or differential privacy. This paper integrates multiple privacy protection mechanisms. We build protection mechanisms for data in the aspect of query, publication, and prediction, which can fully protect the location privacy of users.

3. Preliminaries

In this section, we introduce two main technologies covered in this paper, i.e., community detection and local differential privacy.

3.1. Community Detection

Community detection is typically applied to complex networks such as social networks [35], where each user is represented as a node. It focuses on the relationship between users. Tight-knit users are often divided into a community [36]. At present, communities are divided into overlapping communities and nonoverlapping communities. Overlapping community is more consistent with the division of communities in real life. Therefore, it is widely used. This paper makes an improvement on the traditional community detection by making it blend with the location characteristics. We choose the fast unfolding [37] algorithm as the basic algorithm of community detection and make improvement upon it. The algorithm has the advantages of fast computation and suitable for overlapping community division.

Fast unfolding algorithm is an iterative algorithm, which is mainly divided into two stages. The first stage is the module optimization stage. It mainly divides each node into the community where the adjacent nodes are located, thus increasing the value of modularity. The second stage is the community aggregation stage. It is to aggregate the communities divided in the first step into a single point. In other words, aggregation stage restructures the network. The algorithm is performed by repeating the above two steps until the network structure is stable. The formula involved is as follows: where is all the weights in the network, is the weight between and , and is the weight of the edge linked to the vertex. is the community to which the vertex is assigned. is used to determine whether and are divided into the same community. If it is divided into the same community, it returns 1. Otherwise, it returns 0. Whether it is divided into a community depends on whether is positive or not. In other words, if a node is divided into the existing community, the change of modularity is positive.

3.2. Local Differential Privacy

Dwork et al. [38] introduced the concept of differential privacy in 2006. The initial work is concerned with the processing of central difference privacy [33, 38, 39]. Differential privacy is a privacy preserving statistical query method based on statistics. The purpose is that even if all record in the database is changed, the statistical result will not change too much. One problem with centralized differential privacy, however, is that the third party must be trusted. In fact, the third party is not necessarily trustworthy in reality. To solve this problem, local difference privacy (LDP) is proposed. In LDP, the data is processed by the user locally and then uploaded to the central server; therefore, it provides better privacy protection. The definition of LDP is as follows.

A randomized algorithm is if for any two check-in records and , , and for any possible output , the following equation is satisfied:

The perturbation mechanism commonly used for LDP is random response technique. In LDP, each user perturbs their data and uploads it to the central processor. As a result, the data of any two users cannot be obtained from each other, so the concept of global sensitivity does not exist.

4. Problem Definition

The purpose of this paper is to predict POI for users in a recommendation model. Particularly, we use user location to build a recommendation mechanism. In addition to that, this model also needs to protect users’ location privacy and to prevent an attacker from using all of the user’s data to infer the real location. To facilitate better understanding of the model for readers, we define the concepts and formulate the problems involved in the model as follows.

Definition 1. Location network . The location network graph based on user check-in is defined as . is made up of nodes and edges, where and represent nodes and edges, respectively, wherein is formed by subgraphs. The subgraphs are obtained by community detection. That is, each community forms a subgraph. . The node in the graph is user .
. Each user has its own check-in sequence. , , where L is the check-in set of all user. However, in order to ensure the privacy of user location data, we mix up the order of check-ins before uploading. L = (LOC, LOT), where LOC is the set of the check-in label in L, and LOT is the set of precise coordinates corresponding to the check-in label. The following formula is a summary of the relationships in the graph.

Definition 2. . In order to ensure the privacy of user location information, we mix up the order of check-ins before uploading. Here, we focus on the following three check-in definitions. (1) is the real check-in set of users; (2) is the disturbed check-in set uploaded; and (3) is the expanded check-in set of true check-in. The relationship between the three check-in sets is as follows. wherein the extended check-in set is all the check-ins within this range obtained by taking the real check-ins as the center and as the radius. The perturbation check-in is the replacement check-in randomly selected within the perturbation range of the real check-in. So, even though the set of true check-ins and the set of perturbed check-ins have the same content, each perturbed check-ins are a random substitution of its corresponding true check-ins. Take as an example. and are both subsets of the check-in set, but these two sets are not the same. The elements in the perturbation set are randomly selected and replaced from all the check-ins within the radius of centering on the real check-in . is randomly selected from the extended set, used to replace the original check-in . The other real check-ins of are replaced by the above method. Finally, the perturbation set of is obtained.

Definition 3. Similarity. Community detection is based on close relationships between users. We use the similarity between users as the weight, which is used to indicate the closeness of the relationship between users. The higher similarity implies the closer relationship, and the more likely it is to be divided into the same community. Specially, the Jaccard similarity index is used to measure the similarity between two users.

In Equation (6), represents the similarity between and . are the check-ins of extended scopes and , respectively. The numerator is the check-in that both and have visited, and the denominator is the check-in set of two users.

Definition 4. Community. The first step of community detection is to establish an adjacency matrix between users. Different from existing works, this paper takes users and their check-in as the research object. There is no directionality between users. At the same time, the similarity of check-in between users replaces the weight of edge in traditional social network. That is, the higher the similarity between and , the easier it is to be divided into the same community. The node of the community is the user. is the set of all communities. . . According to the definition of community detection in this paper, the formula of modularity is improved, and the improved formula of modularity is as follows:

Definition 5. Incidence matrix. is the relationship matrix between the user and the extended check-in . The weight in is the number of paths that the user reaches after passing hops. Since each community is a subgraph, the relationship can be formulated as follows.

Definition 6. Privacy. For any user, there is a privacy algorithm . Its domain is , and range is . If the algorithm gets the same output on any two records and , it is said that function satisfies .

Then, the relationship amid perturbation check-in, radius, and target check-in is given as follows. wherein is the set of check-ins that gets according to . The intensity of privacy protection is determined by and the number of check-ins contained within its scope, where the range of is set by the user. In other words, the bigger range, the more check-ins in , and the stronger the perturbation. is a random selection of perturbation data from to replace .

Since it is a random replacement of the original check-in, the randomness satisfies the random disturbance mechanism of LDP. This approach takes advantage of the scalability of the location. The statistical results within a certain range are hardly affected by the expansion of the range. That is, data characteristics are preserved. In other words, the data has good utility. According to this algorithm, can be obtained. The data uploaded by each user is , which is the perturbed check-in. In the third-party statistical process, it satisfies the LDP, and the statistical results obtained after perturbation are basically the same as the real ones.

The specific derivation formula is given by the following example.

Suppose there are users using the software, where the true percentage of users check-in at is . The third party hopes to get the true number of users who check in at through calculating the data uploaded by users for answers. If the probability of being checked in the true answer of the data uploaded by is , the probability of getting the other answer is . According to the answers of users, the number of people who have checked in can be derived. If the statistics show that the number of people who have been to is , the number of people who get the other result is . According to this statistic, the likelihood function is constructed. where the maximum likelihood estimate of is obtained after the logarithmic derivative:

By further solving the mathematical expectation of , we can verify that the above estimation is an unbiased estimation of . Therefore, the number of people who have been to is calculated as follows:

Accordingly, based on the total number of people, we can get the number of people who went to through perturbation probability. If we get that the number of people who have been to is , and the perturbation probability , we can get the real number of users who have been to . By definition, the privacy budget of this inference result is calculated by Equation (14): where is the number of elements in .

5. CGPP-POI

As a special information, location data is of great significance to users. Using location properly can provide great convenience for users’ daily life and can also bring more benefits to the third party. But location is also important privacy information because it records the user’s trajectory. Once the information is obtained by the attacker, the user’s private property and even personal safety will be threatened. Considering the particularity of location information, when we build the recommendation model, we should add privacy protection mechanism to the user location. In this work, the privacy of users is protected from three aspects: publication, inquiry, and recommendation. We propose a recommendation model based on privacy protection, namely the CGPP-POI model. This model builds the privacy protection mechanism based on LDP, which can effectively prevent the leakage of real location. At the same time, by extending the original location range, the coincidence degree of activity range among users is improved. Moreover, the user community is divided based on this, leveraging communities to extend user data to address sparse data and provide users with POI recommendation service. Figure 2 shows the overall framework of CGPP-POI, where the operation of LDP is in the blue dotted box. This part processes the original data of the user at the user side and uploads the data after adding perturbation. The perturbed data can be used for third-party statistical queries and further recommendation model use because of the satisfaction of . The middle part is the POI recommendation model. At the bottom part, the specific recommendation of the POI check-ins is also random. So even if an attacker has a prediction result, they cannot accurately infer the user’s future location.

The following section we will introduce the CGPP-POI model from two aspects, privacy protection mechanism and recommendation model.

5.1. Privacy Protection Mechanism

Figure 1 shows the check-in information of three users; all of them have checked in at different ranges. From these check-ins, we can see the range of activities that each user often does. As can be seen from the activity range, the activity range of blue user and yellow user is highly overlapped. And as we can see from the lower left corner of the range area, the larger the scope, the more check-ins it covers, centered on one of the blue users’ check-ins. These collections of check-ins can be considered public information. However, in the user data, it is regarded as the user’s private information. For example, when we open Google Map, we can see many places marked, such as schools, hotels, and parks. These are considered public data. However, in the user’s check-in records, they are the user’s private information. We will use this public information to predict the user’s POI. is the set of these check-ins. Take the blue user in Figure 1 as an example. We expand the blue check-in at the lower left. When contains two real check-ins, . When selects , we randomly choose from to replace . The other check-ins for the are replaced as described above then uploaded to the third-party unified collection.

Algorithm 1 satisfies the stochastic perturbation mechanism of LDP, and the derivation process is shown in Definition 6. Except LDP processing before uploading, the specific check-in mechanism is also added to the random in the recommended stage. The purpose of this is to prevent attackers from using the recommendation results to attack users. The pseudo code of related algorithm privacy protection of CGPP-POI is as follows.

Input:


Output:
1: selects the extension radius
2: fordo
3:
4:
5: replace
6: end for
7: Get

5.2. CGPP-POI Recommendation Model

The data of each user is processed by Algorithm 1 and uploaded to the third party. The third party collects the data of all users, analyzes the data information, and constructs the recommendation model. The recommendation model are built based on location scalability characteristics. First of all, the data are extended uniformly. There are several reasons for this: (1) generalize the check-in coordinates to increase data information; (2) with extended check-in, the overlapping information among users increases; and (3) the extended scope may contain the real data of the user, which reduce the interference of disturbed data. In other words, and may be the same extended check-in . So, there is more real information in the data, and it is easier to predict ’s real preferences. Equation (6) is used to obtain the similarity between users. It acts as the weight value between users in the adjacency matrix. Through community detection, users with high similarity are divided into the same community, which can reduce data sparsity. According to the users and the check-in within the community, the bipartite graph is constructed to obtain the relationship matrix . The recommendation is the recommendation list for , which includes extended check-ins , and each includes -specific check-ins. To obtain the recommended list of extended check-ins by sorting algorithm, we are going to pick a random number of these check-ins and collect them into the recommendation list for . The recommendation algorithm of CGPP-POI is described as follows:

Input:;
;
;
;
Output: Recommendation list ;
1: To choose the extend range
2:
3: fordo
4:
5: end for
6:
7:
8: Initialize each node to a separated community
9: repeat
10: fordo
11: fordo
12: Remove i from its community, place to ’s community
13: Compute the composite modularity gain
14: end for
15: Choose with maximum positive gain (if exists) and move to ’s community
16: Otherwise, stays in its community
17: end for
18: until No further improvement in modularity
19: Get , which are also independent subgraphs
20: if the number of users in then
21:
22: Sort the recommended label for each user, and get recommendation list
23: else
24: Sort by number of check-ins based on history. And take the first and build the recommendation list .
25: end if
26: fordo
27: fordo
28: , to choose n , and is obtained.
29: end for
30: end for

6. Experiments

6.1. Datasets

The datasets we use in this paper are the Gowalla dataset and Brightkite dataset. They are real-world location-based social network, and both of them contain friend relationship information. However, in the preprocessing stage, we only retain the user’s check-in information. The check-in information includes the check-in label , the corresponding coordinates , and the time information. The time information is kept for experimental purposes only and used to divide the train set and test set. The time context is not involved in data publishing and POI recommendation. Table 1 shows the collated information for the two datasets.

The Gowalla dataset is a location-based social network sourced from Stanford University and collected using public APIs. It is an undirected network containing users’ location information by check-ins. It consists 196,581 nodes and 950,327 edges. The data were collected from February 2009 to October 2010. A total of 6,442,890 users were collected. The raw data consists two texts, one is the user’s friendship and the other is the user’s check-in information. The check-in information includes the user ID, check-in time, check-in label, and the corresponding exact coordinates.

The Brightkite dataset is also a location-based social network. Similar to Gowalla, users can check in to share their location. The data is collected based on the site’s public API. It contains 58,228 nodes and 214,087 edges. The network was originally a directed graph, but the collector made it an undirected network. In this paper, friend links are not used and will be deleted during preprocessing. The data was collected from April 2008 to October 2010, and a total of 4,491,143 users were collected. The original data set contains two files, one is a user’s check-in file and the other is a friend relationship file. We only keep the first file for experiments. The check-in file contains the user ID, check-in time, specific latitude and longitude information, and check-in label.

6.2. Evaluation Methods

The main research objective of this paper is POI recommendation. Meanwhile, privacy protection is added to the original data to ensure the privacy of user’s location. In terms of recommendation, the evaluation criteria for the quality of traditional models are usually taken as the evaluation index from four aspects: accuracy, recall rate, coverage rate, and popularity. POI recommendation, in principle, is roughly the same as item recommendation. The accuracy measures whether the prediction of the user’s future behavior at the next moment is accurate. The recall rate judges whether the same check-in is visits multiple times. Popularity can be thought of as areas where users like to go. These measurements have significant guidance for the construction of other project models by third parties. At the same time, for users, people generally like going to places with many people, e.g., business districts and tourist areas. Therefore, adding these recommendations to the list provides a convenient service for users. The specific formulas are as follows:

Accuracy is as follows:

Recall rate is as follows:

Coverage rate is as follows:

Popularity is as follows:

is a list of check-ins of length recommended by the recommendation model to each user. is the actual list of users. is the collection of users, and is the set of check-in. is how many people have visited , which represents the popularity of the place.

In addition to evaluating the recommended results, we also evaluate the impact of different extension ranges on recommendations.

6.3. Experimental Setup

In this section, the specific experimental settings and some operations in the experiment are addressed. In the preprocessing stage, we deleted the friend relationship and time information in the original data. The training set and test set are divided according to the number of users, and the partition ratio is 7 : 3. Then, we retain the most recent three check-in records of users to compare with the predicted results. We choose three benchmark algorithms: item-based collaborative filtering, user-based collaborative filtering, and MF as the baseline algorithms. The reasons for choosing these three are as follows. (1) These three algorithms are recommended classical algorithms; (2) the location-based recommendation in this paper is not path prediction, so many algorithms are not applicable; and (3) these three comparison algorithms have good results in various indicators, and the prediction results are stable and widely applicable in the recommendation algorithm. The number of neighbors in is set to 3. The recommended number of is 3. The recommended number of is set to . is the number of decimal points retained in the experiment. We round to the nearest hundredth for latitude and longitude in recommendation experiment. While in the experiment on the impact of the extended range on the recommended results, the number of decimal points retained for latitude and longitude is .

6.4. Experimental Results

We perform two types of experiments. The first one is the recommendation experiment. The experiment sets up two specific scenarios in which the evaluation results were obtained. One scenario is preprocessing recommendations without privacy protection. Another scenario is a recommendation experiment to perturb the data during processing. The second experiment is to explore the impact of the intensity of privacy protection on recommendations. The results are given in the following two sections.

6.4.1. Recommendation Results

Figures 3 and 4 both contain eight subfigures. Figures 3(a)–3(d) and 4(a)–4(d) are the results of the Gowalla dataset. Figures 3(e)–3(h) and 4(e)–4(h) are the results of the Brightkite dataset. We evaluated our proposed method and three baselines from four aspects of accuracy, recall rate, coverage rate, and popularity. Figure 3 is the recommendation result without noise, and Figure 4 is the recommendation result with noise. We can see from these figures that, except for popularity, the results of CGPP-POI algorithm are all superior to the other three evaluation indexes. The reason that CGPP-POI’s result is less popular than other algorithms is that CGPP-POI divides users into communities, where recommendation results are obtained from. Some communities have a low level of popularity, which can lead to a decrease in popularity. Other algorithms do not have the concept of community, so it is a global selection of check-in recommendation and can derive a higher popularity. We can see the accuracy from Figures 3 and 4. The accuracy after adding noise is affected and is much lower than without noise. That is because we choose to round to the nearest hundredth. That is, the larger extended range and more perturbation will lead to the lower accuracy. Accuracy also decreases as the number of recommendations increases. This is because we are setting the number of comparison check-ins to be reserved at 3. So, as the number of recommendations increases, the accuracy decreases. As can be seen from the recall rate, users are more likely to visit places they have been before.

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

6.4.2. The Impact of on Recommendation

Figure 5 shows the changes of recommendation accuracy under different privacy protection levels. Figure 5(a) shows the predicted results of the Gowalla dataset, and Figure 5(b) shows the predicted results of the Brightkite dataset. is the number of digits reserved after the decimal point. The smaller the number, the wider it extends in the coordinates. The more information it contains, the more perturbations it has. As can be seen from the results of the two figures in Figure 5, the bigger extension range results in a lower the accuracy. The prediction accuracy of is lower than when , because the community division does not work well at . It can be seen from the two figures that when the number of recommendation is 3, the accuracy is higher than other numbers. This illustrates that (1) both the number of recommendation and the reserved decimal points will impact the result and (2) the higher the accuracy, the closer the recommended quantity is to the reserved comparison quantity. Therefore, it is not necessary to recommend a lot of information to the user, which will increase the time cost of the user.

(a)

(b)

7. Conclusions

In this paper, we proposed CGPP-POI which is a recommendation model with privacy protection mechanism. Through this model, users can set their own privacy protection scope and then upload data to the third party after LDP. The location information of the user can be protected through LDP processing, while the data remains acceptable utility. The data uploaded after perturbation can satisfy the LDP in the statistical query and maintain its statistical results. But when we try to use this data to build recommendation models, the results are terrible because every user’s preferences is different. Therefore, we generalize the disturbed data and build the community by taking advantage of the overlap between users after location expansion. We enhance the coupling between users with high similarity and search for the relationship between users and check-in within the community. Finally, the obtained recommendation results satisfy local differential privacy. We compare the baseline recommendation algorithms on two real data and found that it was superior to the baseline recommendation algorithm in terms of accuracy, recall rate, coverage, and popularity. We also explore the impact of different privacy intensities on recommendations. As can be seen from the experimental results, the bigger the extension range, the lower the accuracy. Our future goal is to improve the recommendation model of POI based on privacy protection by extending model parameters. We aim to improve the accuracy of prediction and protect user location information simultaneously.

Data Availability

The data used to support the research is generally unavailable due to public releasability constraints. Please contact the corresponding author for special release consideration.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Our research is funded by the Fundamental Research Funds for the Central Universities (No. 3072021CF0609).

References

Q. Hou, M. Han, and Z. Cai, “Survey on data analysis in social media: a practical application aspect,” Big Data Mining and Analytics, vol. 3, no. 4, pp. 27–47, 2020.
View at: Google Scholar
M. Xue, P. Kalnis, and H. K. Pung, “Location diversity: enhanced privacy protection in location based services,” in International Symposium on Location-and Context-Awareness, pp. 70–87, Berlin, Heidelberg, 2009.
View at: Google Scholar
Y. Liang, Z. Cai, J. Yu, Q. Han, and Y. Li, “Deep learning based inference of private information using embedded sensors in smart devices,” IEEE Network, vol. 32, no. 4, pp. 8–14, 2018.
View at: Publisher Site | Google Scholar
https://en.wikipedia.org/wiki/aol_search_data_leak.
https://www.wired.com/2009/12/netflix-privacy-lawsuit/.
Q. Luo, X. Cheng, J. Yu, Z. Cai, D. Yu, and L. Zhang, “Fast skyline community search in multi-valued networks,” Big Data Mining and Analytics, vol. 3, no. 3, p. 171, 2020.
View at: Google Scholar
R. M. Bell and Y. Koren, “Lessons from the netflix prize challenge,” Acm Sigkdd Explorations Newsletter, vol. 9, no. 2, pp. 75–79, 2007.
View at: Publisher Site | Google Scholar
Y. Liu, T.-A. N. Pham, G. Cong, and Q. Yuan, “An experimental evaluation of pointof-interest recommendation in location-based social networks,” Proceedings of the VLDB Endowment, vol. 10, no. 10, pp. 1010–1021, 2017.
View at: Publisher Site | Google Scholar
P. Zhao, A. Luo, Y. Liu et al., “Where to go next: a spatio-temporal gated network for next poi recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, pp. 5877–5884, 2019.
View at: Google Scholar
G. Li, G. Yin, J. Yang, and F. Chen, “Sdrm-ldp: a recommendation model based on local differential privacy,” Wireless Communications and Mobile Computing, vol. 2021, Article ID 6640667, 15 pages, 2021.
View at: Publisher Site | Google Scholar
Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
View at: Publisher Site | Google Scholar
Y. Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 426–434, Las Vegas, Nevada, USA, 2008.
View at: Publisher Site | Google Scholar
A. Mnih and R. R. Salakhutdinov, “Probabilistic matrix factorization,” Advances in neural information processing systems, vol. 20, pp. 1257–1264, 2007.
View at: Google Scholar
C. Huang, Y. Cheng, W. Zhou, and J. Jia, “Web3d learning framework for 3d shape retrieval based on hybrid convolutional neural networks,” Tsinghua Science and Technology, vol. 25, no. 1, p. 93, 2020.
View at: Google Scholar
H. Yin, X. Zhou, B. Cui, H. Wang, K. Zheng, and Q. V. H. Nguyen, “Adapting to user interest drift for poi recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 10, pp. 2566–2581, 2016.
View at: Publisher Site | Google Scholar
Y. Wang, G. Yin, Z. Cai, Y. Dong, and H. Dong, “A trust-based probabilistic recommendation model for social networks,” Journal of Network and Computer Applications, vol. 55, pp. 59–67, 2015.
View at: Publisher Site | Google Scholar
G. Li, Z. Cai, G. Yin, Z. He, and M. Siddula, “Differentially private recommendation system based on community detection in social network applications,” Security and Communication Networks, vol. 2018, Article ID 3530123, 18 pages, 2018.
View at: Publisher Site | Google Scholar
H. Kido, Y. Yanagisawa, and T. Satoh, “Protection of location privacy using dummies for location-based services,” in 21st International Conference on Data Engineering Workshops (ICDEW’05), pp. 1248–1248, Tokyo, Japan, 2005.
View at: Publisher Site | Google Scholar
P. Shankar, V. Ganapathy, and L. Iftode, “Privately querying location-based services with sybilquery,” in Proceedings of the 11th international conference on Ubiquitous computing, pp. 31–40, Orlando, Florida, USA, 2009.
View at: Google Scholar
B. Bamba, L. Liu, P. Pesti, and T. Wang, “Supporting anonymous location queries in mobile environments with privacygrid,” in Proceedings of the 17th International Conference on World Wide Web, pp. 237–246, Beijing, China, 2008.
View at: Google Scholar
J. Reagle and L. F. Cranor, “The platform for privacy preferences,” Communications of the ACM, vol. 42, no. 2, pp. 48–55, 1999.
View at: Publisher Site | Google Scholar
S. Papadopoulos, S. Bakiras, and D. Papadias, “Nearest neighbor search with strong location privacy,” Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 619–629, 2010.
View at: Publisher Site | Google Scholar
G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.-L. Tan, “Private queries in location based services: anonymizers are not necessary,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 121–132, Vancouver, Canada, 2008.
View at: Google Scholar
Z. Cai, Z. He, X. Guan, and Y. Li, “Collective data-sanitization for preventing sensitive information inference attacks in social networks,” IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 4, pp. 577–590, 2016.
View at: Publisher Site | Google Scholar
I. Memon, Q. A. Arain, M. H. Memon, F. A. Mangi, and R. Akhtar, “Search me if you can: multiple mix zones with location privacy protection for mapping services,” International Journal of Communication Systems, vol. 30, no. 16, article e3312, 2017.
View at: Publisher Site | Google Scholar
T. Hashem and L. Kulik, “"Don’t trust anyone": privacy protection for location-based services,” Pervasive and Mobile Computing, vol. 7, no. 1, pp. 44–59, 2011.
View at: Publisher Site | Google Scholar
Z. Chen, X. Hu, J. Xiaoen, and K. G. Shin, “LISA: location information scrambler for privacy protection on smartphones,” in 2013 IEEE Conference on Communications and Network Security (CNS), pp. 296–304, National Harbor, MD, USA, 2013.
View at: Publisher Site | Google Scholar
Z. Cai, Z. Duan, and W. Li, “Exploiting multi-dimensional task diversity in distributed auctions for mobile crowdsensing,” IEEE Transactions on Mobile Computing, vol. 20, no. 8, pp. 2576–2591, 2021.
View at: Google Scholar
Z. Cai and Z. He, “Trading private range counting over big iot data,” in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 144–153, Dallas, TX, USA, 2019.
View at: Publisher Site | Google Scholar
Z. Xu and Z. Cai, “Privacy-preserved data sharing towards multiple parties in industrial iots,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 968–979, 2020.
View at: Publisher Site | Google Scholar
Z. Cai and Z. Xu, “A private and efficient mechanism for data uploading in smart cyber-physical systems,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 2, pp. 766–775, 2018.
View at: Publisher Site | Google Scholar
Z. Cai, Z. Xiong, H. Xu, P. Wang, W. Li, and Y. Pan, “Generative adversarial networks,” ACM Computing Surveys, vol. 54, no. 6, pp. 1–38, 2021.
View at: Publisher Site | Google Scholar
C. Dwork, “Differential privacy: a survey of results,” in International Conference on Theory and Applications of Models of Computation: Theory and Applications of Models of Computation, Springer, 2008.
View at: Google Scholar
X. Zhao, Y. Li, Y. Yuan, X. Bi, and G. Wang, “Ldpart: effective location-record data publication via local differential privacy,” IEEE Access, vol. 7, pp. 31435–31445, 2019.
View at: Publisher Site | Google Scholar
Y. S. R. Xin, J. Zhang, and Y. Shao, “Complex network classification with convolutional neural network,” Tsinghua Science and Technology, vol. 25, no. 4, pp. 447–457, 2020.
View at: Publisher Site | Google Scholar
E. J. Newman Mark, “Networks: an introduction,” Astronomische Nachrichten, vol. 327, no. 8, pp. 741–743, 2010.
View at: Google Scholar
M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E, vol. 69, no. 6, article 066133, 2004.
View at: Publisher Site | Google Scholar
C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of cryptography conference, Springer, 2006.
View at: Google Scholar
F. McSherry and K. Talwar, “Mechanism design via differential privacy,” in 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pp. 94–103, Providence, RI, USA, 2007.
View at: Publisher Site | Google Scholar

Copyright

Copyright © 2021 Gesu Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Wireless Communications and Mobile Computing

Secure Computational Solutions for Sparse Data Challenges in the Internet of Things

CGPP-POI: A Recommendation Model Based on Privacy Protection

Abstract

1. Introduction

2. Related Work

2.1. The Recommendation Model on POI

2.2. The Privacy Protection on POI

3. Preliminaries

3.1. Community Detection

3.2. Local Differential Privacy

4. Problem Definition

5. CGPP-POI

5.1. Privacy Protection Mechanism

5.2. CGPP-POI Recommendation Model

6. Experiments

6.1. Datasets

6.2. Evaluation Methods

6.3. Experimental Setup

6.4. Experimental Results

6.4.1. Recommendation Results

6.4.2. The Impact of on Recommendation

7. Conclusions

Data Availability

Conflicts of Interest

Acknowledgments

References

Copyright