Abstract
Cross-city point of interest (POI) recommendation for tourists in an unfamiliar city has high application value but is challenging due to data sparsity. Most existing models attempt to alleviate the sparsity problem by learning user preference transfer and drift. However, they either fail to model preference transfer and drift simultaneously in both long- and short-term user preferences or cannot accomplish next POI recommendation, which is crucial for a wide spectrum of applications ranging from transportation and urban planning to advertising. To address this limitation, we propose a user preference transfer and drift network (UPTDNet) for cross-city next POI recommendation. UPTDNet learns the transfer and drift of both long- and short-term preferences. For short-term preference, dual recurrent neural network-based (RNN-based) branches are designed to model preference transfer from the tourist’s current city and drift among different user roles. For long-term preference, a mapping function and user similarity calculation are employed for preference transfer from the tourist’s home city and drift among individual users. Experiments are conducted on the Gowalla and Foursquare datasets, and the results show that UPTDNet consistently and significantly outperforms state-of-the-art models by an average of 10.22% to 22.63% in the next POI recommendation task. An ablation study and further analysis validate the effectiveness and plausibility of considering both user preference transfer and drift in cross-city recommendation.
1. Introduction
The widespread application of location-based social networks (LBSNs), such as Gowalla (https://www.gowalla.com/), Foursquare (https://foursquare.com/), and Yelp (https://www.yelp.com/), has provided a great opportunity for POI recommendation systems. With the huge amount of user check-in data, LBSN service providers are able to mine various user preferences and sequence patterns and generate better POI recommendations for users. As a particular scenario of POI recommendation, cross-city POI recommendation aims to recommend POIs for users in an unfamiliar city based on their limited historical check-in records [1]. The task is extremely difficult but useful. When users are in their home city, they are familiar with the local conditions and can easily decide where to go next without any help. In an unfamiliar city, however, users lack knowledge of the city and usually do not have enough time to explore it, so they need POI recommendation systems to help them find suitable POIs.
One of the most challenging problems of cross-city recommendation is data sparsity. Scellato et al. [2] reported that for an average user, cross-city check-ins account for only 0.47% of home city check-ins. Ding et al. [1] also showed that the check-in records generated in the current unfamiliar city are on average about one order of magnitude fewer than those in the home city. Considering the personalized and location-aware characteristics of POI recommendation, previous studies handled data sparsity mainly by supplementing POI information or modeling user preferences.
Some existing methods [3–5] directly transferred knowledge from the home city to the current city to alleviate data sparsity but ignored the user preference information. In cross-city POI recommendation, the interactions of a user with POIs in the current city are limited. Therefore, user preference is a powerful medium for transferring information between cities.
Specifically, user preference is composed of long-term preference and short-term preference. Long-term preference describes a user’s inherent interest in different POI categories (or latent topics), which can be inferred from check-in behaviors in the home city and is maintained in the current city. Short-term preference, on the other hand, is the dynamic part of user check-in behavior, which changes continuously according to the current location, time, and other context information in the current city. For example, the popularity of local POIs greatly affects users’ check-in decisions: even people who never gamble at home are likely to visit casinos in Las Vegas. Such word-of-mouth opinions of POIs can be inferred from the crowd preferences of local people. Therefore, we need to take both long- and short-term user preferences into consideration in cross-city POI recommendation.
Recent studies have focused on modeling user preferences. Some literature [6–10] utilized probabilistic generative models to infer user preferences. Generally, the long-term user preference is represented as a multinomial distribution over latent topics and inferred from historical check-in records in the home city. On the other side, the short-term user preference is built from local people’s check-in frequencies over different POIs in the current city. These methods successfully transferred the user invariant knowledge from both the home city and the current city but neglected the phenomenon of user preference drift. In fact, the long-term preference of the same user will change between the home city and the current city. Users with travel intentions are more likely to visit hotels and tourist attractions in the current city than in their home city. On the other hand, the short-term preferences of different user roles, such as tourists and locals in the current city, vary differently as well. To solve this problem, Yin et al. [11] built long-term user preference in each specific region and distinguished POI popularity for different user roles. However, limited by the probabilistic generative model, the obtained user preference becomes fully region-dependent and cannot capture the invariant information.
In order to model user preference transfer and drift simultaneously, Ding et al. [1] designed a comprehensive framework and assigned feature vectors to users and POIs. They separated user features into intrinsic and drifted parts and controlled the similarity of POI features for tourists and local people. Similarly, Xin et al. [12] constructed user feature vectors to integrate hometown preferences and travel intentions. These methods followed the idea of matrix factorization (MF) and calculated the interaction scores as the inner product of the user’s and POI’s feature vectors. Because the calculated scores are time-invariant, these methods model user preference transfer and drift but cannot accomplish next POI recommendation.
In this paper, we propose a user preference transfer and drift network (UPTDNet), a cross-city next POI recommendation model for tourists, which learns long- and short-term user preferences from both home and current cities, with specific consideration of user preference transfer and drift. Specifically, we use an RNN-based model to generate short-term user preference and design dual branches for preference transfer and drift, respectively. The transfer branch, as well as the POI embedding vectors, is shared between tourists and local people in the current city, while the drift branch captures the preference variation among different user roles, including tourists in their home city, tourists in the current city, and locals in the current city. For long-term user preference modeling, we assign each user an embedding vector and use a mapping function to transfer the user’s home city preference to their current city check-in behavior. A user similarity calculation module is designed to capture the long-term user preference transfer and drift, based on the users’ check-in frequency of different POI categories. Finally, the output hidden states of the dual RNN-based branches and the user embedding are concatenated to generate a personalized recommendation of the next suitable POI. The main contributions of this paper are summarized as follows:
(i) We propose a next POI recommendation model (UPTDNet) for the cross-city scenario by learning long- and short-term user preference transfer and drift. Our model takes advantage of the knowledge from the user’s home city and current unfamiliar city and utilizes an RNN-based model to capture the sequential effects of check-in records, incorporating user preference transfer and drift into a unified next POI recommendation model.
(ii) We study user preference at a more granular level by constructing the long- and short-term preferences separately and considering the preference transfer and drift phenomena. Specifically, we design a user similarity calculation module to model long-term user preference transfer and drift for individuals and construct dual RNN-based branches to model short-term preference transfer and drift for different user roles.
(iii) We conduct extensive experiments to evaluate the performance of our UPTDNet model on real-life cross-city datasets. The experiment results demonstrate the superiority of our model over the state-of-the-art methods. We also conduct further analysis to explore the impact and effectiveness of the long- and short-term preference modeling in our model.
The remainder of the paper is organized as follows: Section 2 summarizes the related work on cross-city POI recommendation and next POI recommendation. Section 3 presents our proposed UPTDNet in detail. The experiment results on real-world datasets are reported in Section 4, and further discussions are made in Section 5. Finally, Section 6 concludes this paper.
2. Related Work
2.1. Next POI Recommendation
Next POI recommendation is a key problem in human mobility modeling, which needs more attention to sequence pattern and user preference modeling than general POI recommendation. Many previous studies used Markov chains to capture the correlations between consecutive check-in points. Rendle et al. [13] proposed factorizing personalized Markov chains (FPMC) that learned a personalized transition matrix for each user and alleviated data sparsity with matrix factorization. Cheng et al. [14] further extended FPMC with location constraints in building the transition matrices. Feng et al. [15] designed two Euclidean latent spaces to model user-POI and POI-POI correlations, respectively, and ranked POI candidates with the pairwise ranking Metric Embedding algorithm.
Recently, deep learning has been applied to the next POI recommendation and has shown remarkable performance. Tang and Wang [16] used a convolutional neural network (CNN) to model the union-level sequence patterns and skip-behaviors in check-in records. Zhang et al. [17] utilized a self-attention mechanism to explicitly capture POI-POI interactions. As one of the most promising tools of deep learning, RNN is able to model sequence patterns [18, 19] and therefore prevails in the field of the next POI recommendation. Compared to general trajectories such as taxi driving trajectories, the time interval and spatial interval between two consecutive locations are much larger in check-in sequences. Therefore, considering spatiotemporal intervals is vital in the next POI recommendation task. For example, Liu et al. [20] proposed ST-RNN and defined spatiotemporal-specific transition matrices. A popular scheme is to add additional gates or refine the original gates within RNN architecture, taking the spatiotemporal intervals as input [21–24]. Since the hand-crafted refinement within RNN is complicated and challenging to implement, Yang et al. [25] designed spatiotemporal weights with explicit functions according to the temporal periodicity and spatial regularity of user mobility and multiplied the weights with the output hidden states of RNN.
In addition to the sequence pattern of check-in data, user attributes also greatly benefit POI recommendation results. Plenty of work has focused on long- and short-term user preference modeling. As a typical technique of refining the RNN gating mechanism, STGN [21] designed spatiotemporal gates to capture long- and short-term influences, respectively. Some work also interpreted user preferences in different ways. Huang et al. [26] formulated the output hidden states of RNN as the dynamic short-term user preference according to current spatial and temporal contexts, and designed user vectors as stationary long-term user profiles. DeepMove [27] and LSTPM [28] truncated a check-in sequence into subsequences, using the current subsequence and historical subsequences to model short-term and long-term user preferences, respectively.
The above methods work well in the general next POI recommendation task, but they do not consider the cross-city scenario, common in POI recommendation, where a user visits an unfamiliar city and has an extremely sparse check-in history. Their user preference modeling also neglects the preference relationships among different individuals and crowds. In this paper, we model user preference transfer and drift between the home city and the current city to alleviate data sparsity and recommend the next POI for cross-city users. The RNN-based transfer and drift branches are designed for different user roles, while the transformation function and user similarity calculation are employed for different individuals.
2.2. Cross-City POI Recommendation
Data sparsity is the most challenging problem in cross-city POI recommendations. In traditional collaborative filtering (CF) and MF methods, the representations of users and POIs are learned from user-POI interactions. However, the extreme sparsity of the user-POI interaction matrix in cross-city scenarios makes it difficult to infer the actual properties of users and POIs. Therefore, CF-based and MF-based methods cannot be used directly for the cross-city recommendation. To alleviate the data sparsity problem, recent studies have been considering transferring user preferences from the home city and local preferences from the current city. A general way to model long-term user preference is by generating probability distributions over latent topics based on historical check-in records. Liu and Xiong [7] exploited an aggregated Latent Dirichlet Allocation (LDA) model to gain topic-related user features and POI features and then made a recommendation according to how well the user preference matches POIs and the popularity of local POIs. Yin et al. [6] further took the long-term preferences of local people into consideration and formulated the users’ check-in decisions as a weighted sum of tourists’ and locals’ preferences. In more recent work [8, 10], travel locality and word-of-mouth opinions of POIs were captured in the probabilistic generative model as well. The spatial extent of user activities was expressed as a multinomial distribution over a set of regions, and the POI popularity was represented as a multinomial distribution over POIs for each region. To adapt to the user preference drift among different regions and the POI popularity variation between tourists and locals, Yin et al. [11] proposed to distinguish different user roles when establishing the distribution of users over topics and the distribution of regions over POIs.
Although the previous probabilistic generative models can generate quite good recommendations given users’ current location and time, the user preference construction is either completely region-independent or completely region-dependent. Such a modeling process cannot capture both the invariant and the variant parts of user preference at the same time. Therefore, recent work started to formulate the long-term user preference and POIs as feature vectors learned by backpropagation. Ding et al. [1] separated the user vectors into region-independent and region-dependent parts to denote the user preference transfer and drift, respectively, and constrained the POI vectors’ similarity between cities. Xin et al. [12] aggregated historical check-in information as user vectors and applied a mapping function to transfer user preferences from the home city. Following the core idea of MF methods, these methods made recommendations by the inner product between user vectors and POI vectors. However, these vectors do not vary with time and therefore cannot accomplish the next POI recommendation task.
The previous cross-city recommenders fail to model preference transfer and drift and generate the next POI recommendation simultaneously, and the transfer and drift in deep learning methods only focus on long-term preference. Different from previous work, we unify the user preference transfer and drift and the next POI recommendation with an RNN-based model. We also explore user embeddings and the hidden states of RNN to model the user preference from perspectives of long- and short-term preference explicitly, which makes the transfer and drift process more accurate and reasonable.
3. Proposed Model
3.1. Problem Formulation
Our proposed model aims to recommend the next suitable location for cross-city tourists who have only limited records in the current unfamiliar city. Although this is not strictly a cold-start problem, data sparsity is still aggravated for cross-city users. When we recommend POIs for a user, the principle is to assess how much the user prefers each POI. Therefore, we need to generate proper representations to depict the inherent properties of users and POIs. For this purpose, information on users’ historical behaviors is used. However, most of the users’ check-ins occurred in their home cities. The lack of information in the current city makes it extremely difficult to learn good representations of users and POIs and hinders model performance. In this paper, we target a higher recommendation quality by transferring user knowledge from both the home city and the current city, as well as POI information from the current city. Let $U^{h}$ denote the set of target tourists from the home city and $P^{h}$ the set of POIs in the home city. Correspondingly, $U^{c}$ is the set of users and $P^{c}$ is the set of POIs in the current city. For each user $u$, we can obtain a historical check-in sequence generated in the target current city, represented as $S_{u} = \{p_{1}, p_{2}, \ldots, p_{t}\}$, where $p_{t}$ is the most recent POI visited by user $u$. The goal of next POI recommendation is to recommend the preferable POI $p_{t+1}$ to the user at the next timestamp $t+1$. The main notations used in this paper are presented in Table 1.
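To make the task definition concrete, the following minimal sketch shows one common way to turn a single user's time-ordered check-in sequence into (history, next POI) training pairs. The function name `next_poi_samples`, the `min_history` parameter, and the toy POI names are illustrative assumptions, not part of the paper.

```python
def next_poi_samples(checkins, min_history=1):
    """Split one time-ordered check-in sequence into (history, target) pairs.

    Each prefix of the sequence serves as the observed history and the
    check-in that immediately follows it is the prediction target.
    """
    samples = []
    for t in range(min_history, len(checkins)):
        samples.append((checkins[:t], checkins[t]))
    return samples


# Hypothetical check-ins of one tourist in the current city.
sequence = ["airport", "hotel", "museum", "restaurant"]
samples = next_poi_samples(sequence)
```

With four check-ins and `min_history=1`, this yields three pairs; the last one uses the first three POIs as history and the fourth as the target.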
3.2. Overall Framework
The overall architecture of our proposed UPTDNet is depicted in Figure 1. To accomplish next POI recommendation, the model is RNN-based so that it integrates sequential effects and user preferences as a whole. For POI representations in different cities, we assign learnable embedding vectors $\mathbf{e}^{h}$ and $\tilde{\mathbf{e}}^{h}$ to each POI in the home city, as well as $\mathbf{e}^{c}$ and $\tilde{\mathbf{e}}^{c}$ to each POI in the current city. As for user representations, we assign embeddings $\mathbf{u}^{h}$ to each tourist in their home city and $\mathbf{u}^{c}$ in the current city, as well as $\mathbf{u}^{l}$ to each local. The reason why we learn two embeddings for each POI in different RNN branches and for each tourist in different cities will be explained in Sections 3.3 and 3.5. Given historical check-in sequences from different datasets, i.e., check-ins of tourists in their home city $S^{h}$ and in the current city $S^{c}$, and check-ins of locals in the current city $S^{l}$, we can obtain the POI embedding of the check-in at each timestamp and input it to the corresponding RNN branch. The output hidden states of the RNN reflect the dynamic contents of user preferences and can be regarded as short-term preferences. The user embeddings $\mathbf{u}^{h}$, $\mathbf{u}^{c}$, and $\mathbf{u}^{l}$ reflect the stationary components of user preferences and therefore represent long-term preferences. To combine the long- and short-term preferences, the hidden states and user embeddings are concatenated and fed into fully connected layers to generate the probabilities of the next POI in the home city and in the current city.

To capture the knowledge transfer and drift from the home city and current city, our model mainly consists of three components: POI information transfer, short-term user preference modeling, and long-term user preference modeling. For POI information transfer, the two POI embeddings of each POI in the current city (one per RNN branch) are shared between target tourists and locals. Our main contributions lie in user preference transfer and drift modeling, using tourists’ check-ins in the home city and locals’ check-ins in the current city as auxiliary datasets. For short-term user preference modeling, we build dual RNN branches, namely the transfer branch (the blue branch in Figure 1) and the drift branch (the green branch in Figure 1), to model the preference transfer and drift. The transfer branch transfers locals’ preferences from the current city, while the drift branch captures the unique sequence patterns of different user roles with a user role prediction module. To this end, for the historical check-in sequences of tourists and locals in the current city, the first set of POI embeddings is input to the transfer branch and the second set to the drift branch. The drift-branch POI embeddings of tourists’ historical check-ins in the home city are also fed into the drift branch to further distinguish the short-term preferences of different user roles. We also build an independent home city branch (the yellow branch in Figure 1) that takes the home city POI embeddings of tourists’ check-ins as input, because these check-ins are not the main source of short-term preference transfer. For long-term user preference modeling, a user similarity calculation module is designed to constrain the similarity of the three kinds of user embeddings (tourists in the home city, tourists in the current city, and locals in the current city), transferring tourists’ invariant preferences from the home city and capturing the variant parts. The module uses individuals’ visit frequencies of POI categories as constraints to ensure that the long-term preference represented in this model is consistent with the users’ real-life behaviors.
3.3. POI Information Transfer from the Current City
The characteristics of POIs directly affect users’ check-in decision-making. Users will visit POIs with specific functions that match their intentions, such as restaurants for lunch or scenic spots for travel. With the development of deep learning, embedding has become a popular scheme for representing POI features. POI embeddings are randomly initialized and iteratively updated through the model learning process, so that task-oriented representations are learned automatically rather than hand-crafted. Specifically, look-up tables are built in this model for the POI sets $P^{h}$ of the home city and $P^{c}$ of the current city. The look-up tables are denoted as matrices $\mathbf{E}^{h} \in \mathbb{R}^{|P^{h}| \times d}$ and $\mathbf{E}^{c} \in \mathbb{R}^{|P^{c}| \times d}$. Each row $\mathbf{e}_{p}$ of a matrix is the latent embedding of POI $p$, where $d$ is the number of hidden dimensions and $|\cdot|$ stands for the number of elements in a set. It should be noted that we extend the vanilla RNN model into dual branches to handle the transfer and drift processes. Since the dual branches deal with different tasks, we also assign another embedding vector to each POI to better decouple the POI features, as previous studies [12, 15] did. Accordingly, two further matrices $\tilde{\mathbf{E}}^{h}$ and $\tilde{\mathbf{E}}^{c}$ are built, and $\tilde{\mathbf{e}}_{p}$ stands for the second latent feature of POI $p$.
Learning the weights of POI embeddings merely on the target tourist dataset is unlikely to achieve satisfactory performance due to its data sparsity. Considering that target tourists and locals visit the same POI set in the current city, we can transfer the POI information from the existing rich local dataset. In recent years, weight sharing has been proven to be an effective method for transfer learning [29, 30]. Models can be pretrained on a large-scale dataset within the source domain and then fine-tuned on the sparse dataset of the target domain while keeping some or all of the parameters frozen. Enlightened by this technique, we assume that the target tourist dataset shares the same POI embeddings as the local dataset in model learning. In this way, the POI information can be enriched from the local dataset by weight sharing embeddings.
3.4. Short-Term User Preference Modeling
Taking POI embeddings as input, we use an RNN-based architecture to capture the sequence pattern over historical check-in records and generate the short-term user preference. Given a user check-in sequence, the vanilla RNN structure is described as follows:
$$\mathbf{h}_{t} = f\left(\mathbf{W}_{x}\mathbf{x}_{t} + \mathbf{W}_{h}\mathbf{h}_{t-1}\right), \tag{1}$$
where $\mathbf{W}_{x}$ and $\mathbf{W}_{h}$ are learnable parameters, $\mathbf{x}_{t}$ is the latent embedding of the POI at timestamp $t$, and $\mathbf{h}_{t}$ is the output hidden state. The activation function $f$ is chosen as the tanh function. The hidden state reflects the short-term user preference under dynamically changing context.
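The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a bias-free tanh cell with two weight matrices (here called `W_X` and `W_H`) and a zero initial state; the toy weights and embeddings are hypothetical and not taken from the paper.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(w_x, w_h, x_t, h_prev):
    """One vanilla RNN step: tanh(W_x x_t + W_h h_{t-1})."""
    a = matvec(w_x, x_t)
    b = matvec(w_h, h_prev)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def run_rnn(w_x, w_h, embeddings):
    """Return the hidden state after every check-in in the sequence."""
    h = [0.0] * len(w_h)  # assumed zero initial hidden state
    states = []
    for x_t in embeddings:
        h = rnn_step(w_x, w_h, x_t, h)
        states.append(h)
    return states


# Toy example with d = 2 (hypothetical weights and POI embeddings).
W_X = [[0.5, 0.0], [0.0, 0.5]]
W_H = [[0.1, 0.0], [0.0, 0.1]]
sequence = [[1.0, 0.0], [0.0, 1.0]]
hidden_states = run_rnn(W_X, W_H, sequence)
```

Each entry of `hidden_states` plays the role of the short-term preference after the corresponding check-in.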
The limited historical check-ins in the cross-city dataset are not enough for learning a good representation of short-term preference, and the learned model cannot generalize well to new data. Considering that the target tourists and locals share the same POI set in the current city, their check-in sequence patterns will be similar but also vary differently according to the user roles. Therefore, we design the RNN-based transfer branch and drift branch for different user roles, taking tourists’ and locals’ check-ins in the current city as input. Tourists’ check-ins in the home city are also input to the drift branch to better distinguish the characteristics of different user roles. Since the home city check-ins are not used for preference transfer, we build an independent home city branch as well.
It is worth noticing that the RNN-based branches can be changed to other state-of-the-art RNN methods. For the home city branch and transfer branch, we employ and modify the Flashback model [25] to consider the impacts of past hidden states, which will be illustrated in Section 3.4.1. For the drift branch, we use vanilla RNN to capture the general sequence patterns of different user roles, which will be elaborated in Section 3.4.2.
3.4.1. User Preference Transfer from the Current City
In order to transfer the short-term preferences from locals, the tourist check-in dataset and the local check-in dataset share the weights of the transfer branch. In this way, the general short-term preferences in the current city can be utilized for the target dataset. Following the idea in previous work [1], since we need to utilize the tourists’ check-ins in the home city, we also build a home city branch for the home city POI recommendation with independent weights. Maintaining the same model structure for all datasets guarantees the model robustness, and more information can be used.
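We modify the Flashback model for the transfer branch and home city branch to generate more competitive recommendation results. Instead of changing the internal memory units of the RNN, Flashback provides a simpler design and directly calculates a weighted sum of output hidden states. For the weight at timestamp $j$, it calculates the time interval $\Delta T_{t,j}$ and the spatial interval $\Delta D_{t,j}$ between the most recent check-in point $t$ and point $j$. However, it cannot accurately capture the historical influence for current check-in point prediction. In our model, we further modify the spatiotemporal intervals as $\Delta t_{j}$ and $\Delta s_{j}$ between the current point and the previous point. The obtained weight is formulated as follows:
$$w(\Delta t_{j}, \Delta s_{j}) = \mathrm{hvc}(2\pi \Delta t_{j})\, e^{-\alpha \Delta t_{j}}\, e^{-\beta \Delta s_{j}},$$
where $\mathrm{hvc}(x) = (1 + \cos x)/2$ is the havercosine function that models the periodicity of user check-ins, and the exponential terms, with decay rates $\alpha$ and $\beta$, model the spatiotemporal decay of the impact of historical check-ins. The hidden states $\hat{\mathbf{h}}^{c}_{t}$ and $\hat{\mathbf{h}}^{h}_{t}$ of the transfer branch and the home city branch at timestamp $t$ are generated as follows:
$$\hat{\mathbf{h}}^{c}_{t} = \frac{\sum_{j=t-k+1}^{t} w(\Delta t^{c}_{j}, \Delta s^{c}_{j})\, \mathbf{h}^{c}_{j}}{\sum_{j=t-k+1}^{t} w(\Delta t^{c}_{j}, \Delta s^{c}_{j})}, \qquad \hat{\mathbf{h}}^{h}_{t} = \frac{\sum_{j=t-k+1}^{t} w(\Delta t^{h}_{j}, \Delta s^{h}_{j})\, \mathbf{h}^{h}_{j}}{\sum_{j=t-k+1}^{t} w(\Delta t^{h}_{j}, \Delta s^{h}_{j})},$$
where $\mathbf{h}^{c}_{j}$ and $\mathbf{h}^{h}_{j}$ are the hidden states generated by the vanilla RNN at timestamp $j$ as in equation (1). $\Delta t^{c}_{j}$, $\Delta s^{c}_{j}$, $\Delta t^{h}_{j}$, and $\Delta s^{h}_{j}$ are the spatiotemporal intervals in the current city and home city. $k$ is the number of past hidden states that are considered.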
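The weighting scheme can be illustrated with a small sketch. It follows the published Flashback formulation (havercosine periodicity times exponential temporal and spatial decay, followed by a normalized weighted sum); whether UPTDNet normalizes in exactly the same way is an assumption here, as are the units (days, kilometres), the default decay rates, and the small `eps` stabilizer.

```python
import math


def havercosine(x):
    """hvc(x) = (1 + cos x) / 2, which lies in [0, 1]."""
    return (1.0 + math.cos(x)) / 2.0


def st_weight(delta_t_days, delta_s_km, alpha=0.1, beta=1.0):
    """Periodic term times temporal and spatial exponential decay."""
    periodic = havercosine(2.0 * math.pi * delta_t_days)
    return periodic * math.exp(-alpha * delta_t_days) * math.exp(-beta * delta_s_km)


def aggregate(hidden_states, intervals, k, alpha=0.1, beta=1.0, eps=1e-10):
    """Normalized weighted sum of the last k hidden states.

    intervals[j] is the (delta_t, delta_s) pair associated with state j.
    """
    recent_h = hidden_states[-k:]
    recent_iv = intervals[-k:]
    weights = [st_weight(dt, ds, alpha, beta) + eps for dt, ds in recent_iv]
    total = sum(weights)
    dim = len(recent_h[0])
    return [sum(w * h[i] for w, h in zip(weights, recent_h)) / total
            for i in range(dim)]


# Two past states: one from a day ago at the same place, one from just now.
states = [[1.0, 0.0], [0.0, 1.0]]
gaps = [(1.0, 0.0), (0.0, 0.0)]
blended = aggregate(states, gaps, k=2)
```

Because the havercosine term peaks at whole-day intervals, a check-in made exactly one day earlier still receives a high weight, while one made half a day earlier is almost ignored.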
3.4.2. User Preference Drift with User Role Prediction
Differentiating user roles and modeling preference drift are important in cross-city recommendation [9, 11]. Different user roles have different preferences over the same POI set according to their characteristics [1]. For tourists and locals in the current city, even given similar historical sequences, the next POI visit will still differ. For instance, locals may go home at the end of a day’s check-ins, while tourists will go to hotels to rest. Although preference drift also exists at the individual level, the lack of tourist data makes it difficult to learn. On the other hand, since traveling is the main purpose of a cross-city visit for most tourists, the check-in patterns of tourist users have more in common with each other than with those of local users. Empirically, tourists tend to visit more scenic spots, hotels, and transportation hubs such as airports and train stations. This distinct property helps alleviate data sparsity when considering preference drift over user roles.
Building dual RNN branches has been proven to be effective to decouple and better utilize the integrated information embedded in sequence [31, 32]. Therefore, in addition to the transfer branch, we add an RNN-based drift branch with a user role prediction module. In our model, the user roles include tourists in the home city, tourists in the current city, and locals in the current city. Although the first two user roles are the same user set, their sequence patterns in the home city and current city are quite different. We distinguish their user roles according to where they are.
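We use a vanilla RNN as the drift branch to capture the sequence pattern and dynamic preference of different user roles. We choose the vanilla model so that any performance improvement can be attributed to the preference drift mechanism rather than to the intricacy of the model. To capture more task-specific POI features, another d-dimensional embedding is assigned to each POI. The latent feature of the POI at timestamp $t$ is represented as $\tilde{\mathbf{e}}_{t}$ and input to the RNN. The home city check-ins of tourists are taken as input as well, enlarging the feature distance among all user roles and therefore yielding more distinct user role characteristics. The hidden states obtained from the current city check-ins and home city check-ins are generated by the drift branch as follows:
$$\tilde{\mathbf{h}}^{c}_{t} = f\left(\tilde{\mathbf{W}}_{x}\tilde{\mathbf{e}}^{c}_{t} + \tilde{\mathbf{W}}_{h}\tilde{\mathbf{h}}^{c}_{t-1}\right), \qquad \tilde{\mathbf{h}}^{h}_{t} = f\left(\tilde{\mathbf{W}}_{x}\tilde{\mathbf{e}}^{h}_{t} + \tilde{\mathbf{W}}_{h}\tilde{\mathbf{h}}^{h}_{t-1}\right),$$
where $\tilde{\mathbf{W}}_{x}$ and $\tilde{\mathbf{W}}_{h}$ are the trainable weights of the drift branch. The output hidden states $\tilde{\mathbf{h}}^{c}_{t}$ and $\tilde{\mathbf{h}}^{h}_{t}$ are input to the user role prediction module, which is depicted in Figure 2(a). We compute the probability distribution r over N user roles via the following:
$$\mathbf{r} = \mathrm{softmax}\left(\mathbf{W}_{r}\tilde{\mathbf{h}}_{t}\right),$$
where $\mathbf{W}_{r} \in \mathbb{R}^{N \times d}$ is a trainable projection matrix and $N = 3$.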
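As an illustration of the user role prediction step, the sketch below projects a drift-branch hidden state onto three role scores and normalizes them with a softmax. The role labels, the matrix `W_R`, and the hidden state are made-up toy values; only the three-role setup and the projection-plus-softmax structure come from the text.

```python
import math

# The three user roles distinguished by the drift branch.
ROLES = ["tourist_home", "tourist_current", "local_current"]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_role(w_r, h_drift):
    """Project the hidden state to one score per role, then normalize."""
    logits = [sum(w * h for w, h in zip(row, h_drift)) for row in w_r]
    return softmax(logits)


# Hypothetical projection matrix (3 roles x d = 2) and hidden state.
W_R = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
h_drift = [1.0, -1.0]
role_probs = predict_role(W_R, h_drift)
predicted_role = ROLES[role_probs.index(max(role_probs))]
```

Here the logits are 1, -1, and 0, so the first role receives the largest probability.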

Figure 2: (a) User role prediction module. (b) User similarity calculation module.
3.5. Long-Term User Preference Modeling
People tend to visit POIs that match their interests. Therefore, the inherent profile of a user, i.e., the long-term preference, plays an important part in recommendation. However, the sparse cross-city dataset cannot provide enough information for generating a proper user representation. In this case, long-term preference would be learned from only a few check-ins within a short period of time, which cannot reflect the user’s actual long-term interest. To compensate for the missing user profile, we need more of the user’s historical check-ins to infer their behavior, so we utilize tourists’ check-ins in their home city for long-term preference modeling. In recent years, user embedding has become a common scheme to represent stationary user properties in both traditional models [1, 15] and deep learning models [12, 26, 33]. These models tend to focus on the correlation between users and POIs but neglect the relationships between users. In our model, we assign d-dimensional vectors $\mathbf{u}^{h}$ and $\mathbf{u}^{c}$ to target tourists in the home city and current city and $\mathbf{u}^{l}$ to locals in the current city to depict the long-term preference. To fully utilize the user information for cross-city recommendation, we further explore their relationships by building a transfer mechanism and designing a user similarity calculation module.
3.5.1. User Preference Transfer from the Home City
For the target users who travel to the current city, their long-term preferences will be partially similar to their preferences in the home city and will also drift due to their tourist role. Besides the user’s long-term preference $\mathbf{u}^{h}$ in the home city, we assign another d-dimensional embedding vector $\mathbf{u}^{c}$ to the target user when he/she is visiting the current city, which is consistent with the idea of user role discrimination elaborated in Section 3.4.2. Inspired by previous studies of transfer learning [34–36], we adopt a transformation matrix to transfer the user knowledge from the home city. The mapping function between $\mathbf{u}^{h}$ and $\mathbf{u}^{c}$ for the same user is formulated as follows:
$$\mathbf{u}^{c} = \mathbf{M}\,\mathbf{u}^{h},$$
where $\mathbf{M} \in \mathbb{R}^{d \times d}$ is the transformation matrix that models the relationship of long-term user preference between the home city and the current city.
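A linear mapping of this kind amounts to a single matrix-vector product. The sketch below is only an illustration under that assumption; the function name `map_home_to_current`, the matrix `M`, and the embedding values are hypothetical.

```python
def map_home_to_current(m, u_home):
    """Apply a d x d transformation matrix to a home city user embedding."""
    return [sum(a * b for a, b in zip(row, u_home)) for row in m]


# Toy transformation matrix and home city embedding with d = 2.
M = [[0.9, 0.1], [0.0, 1.2]]
u_home = [1.0, 2.0]
u_current_hat = map_home_to_current(M, u_home)
```

Off-diagonal entries let one preference dimension in the home city influence a different dimension in the current city, which is one way to picture a tourist's interests shifting while remaining tied to the home city profile.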
3.5.2. User Similarity Calculation
Although the knowledge transferred from the home city can be consulted when making recommendations to the target tourist, the long-term preference of the same user can still differ according to the visiting city. On the other hand, the long-term preference varies greatly at the individual level. It is very likely that a tourist and a local in the current city have similar tastes in local POIs. Intuitively, if the long-term preference is well represented, users with similar long-term preferences will have similar embeddings in the model and vice versa. Therefore, we design a user similarity calculation module to further utilize user relationships and constrain user embedding learning. The detailed structure is depicted in Figure 2(b). Taking the user embeddings as input, we calculate the cosine similarity between two users as follows:

$$\mathrm{sim}(u_i, u_j) = \frac{u_i \cdot u_j}{\lVert u_i \rVert \, \lVert u_j \rVert}, \quad (7)$$

where $u_i$ and $u_j$ are the user embedding vectors and $\lVert u_i \rVert$ and $\lVert u_j \rVert$ are the Euclidean norms of $u_i$ and $u_j$, respectively. It should be noted that the similarity is calculated on all possible user pairs within the input batch, meaning that $u_i$ and $u_j$ can be any combination of the tourists' home-city embeddings, the tourists' current-city embeddings, and the locals' embeddings.
With the calculated similarity, we also need to construct the label similarity as a reference. We use the user's check-in frequencies over all POI categories to describe the real-life long-term preference. Specifically, we obtain a vector for each user whose dimension equals the number of POI categories and in which each entry represents the visit frequency of a specific POI category by this user. Then, the label similarity of the corresponding user pair is calculated as follows:

$$\mathrm{sim}(f_i, f_j) = \frac{f_i \cdot f_j}{\lVert f_i \rVert \, \lVert f_j \rVert}, \quad (8)$$

where $f_i$ and $f_j$ are the visit frequency vectors of the two users and $\lVert f_i \rVert$ and $\lVert f_j \rVert$ are their Euclidean norms.
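The two similarity computations described in this section can be sketched in plain Python. This is a minimal illustration, not the model's implementation: the function names, the toy categories, and the choice to return 0.0 for an all-zero vector are assumptions made here.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for users without any check-ins
    return dot / (norm_a * norm_b)


def category_frequency_vector(visited_categories, all_categories):
    """Count how often a user visited each POI category, in a fixed category order."""
    counts = Counter(visited_categories)
    return [counts[c] for c in all_categories]


def pairwise_similarities(vectors):
    """Cosine similarity for every unordered pair of users in a batch."""
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(sorted(vectors), 2)
    }


# Hypothetical toy data: three categories and two users.
categories = ["food", "shopping", "travel"]
freq = {
    "tourist": category_frequency_vector(["food", "travel", "travel"], categories),
    "local": category_frequency_vector(["food", "shopping", "travel"], categories),
}
label_sim = pairwise_similarities(freq)
```

The same `pairwise_similarities` helper can be applied to a dictionary of user embeddings to obtain the embedding similarities that the label similarities are compared against.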
3.6. Recommendation and Network Training
After the long- and short-term preference representations are obtained, they are concatenated as input to calculate the probability distribution over all POI candidates. For the target tourist users and local users in the current city, since these two datasets share the same short-term preference transfer and drift branch, the output probability is generated from the concatenated representation with a trainable projection matrix for all POIs in the current city. Similarly, for the target users in their home city, the probability is calculated with a trainable matrix for all POIs in the home city. Consequently, the recommended POI at the next time step is the one with the largest probability. For each training sample, we take the predicted probability of the ground truth POI, in the current city or in the home city. The loss function of preference modeling is then formulated as the log likelihood of these probabilities, computed over all training samples in the current city, including those of target tourists and locals, and over all training samples of target users in the home city.
For the user role prediction module, given the predicted probability distribution over user roles, we adopt the log likelihood as the user role loss. It is computed over all training samples for user role prediction from the predicted probability of the ground truth user role for each sample.
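The prediction step and the likelihood-based loss can be illustrated with a small sketch. Several details are assumptions made here rather than statements about the model: the scores are normalized with a softmax, the projection matrix is stored as one weight row per candidate POI, and the loss is the summed (not averaged) negative log likelihood. All names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def poi_probabilities(short_term, long_term, projection):
    """Concatenate both preference vectors and project them onto POI scores.

    `projection` holds one weight row per candidate POI (an assumed layout).
    """
    features = list(short_term) + list(long_term)
    scores = [sum(w * x for w, x in zip(row, features)) for row in projection]
    return softmax(scores)


def negative_log_likelihood(true_probs):
    """Sum of -log p over the ground-truth probabilities of all samples."""
    return -sum(math.log(p) for p in true_probs)


# Hypothetical toy example: 2-dimensional preferences and 3 candidate POIs.
probs = poi_probabilities(
    [1.0, 0.0],
    [0.0, 1.0],
    [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 2.0],
     [0.0, 0.0, 0.0, 0.0]],
)
recommended = max(range(len(probs)), key=probs.__getitem__)  # index of the top POI
loss = negative_log_likelihood([probs[1]])  # assuming POI 1 is the ground truth
```

Minimizing the negative log likelihood is equivalent to maximizing the log likelihood of the ground truth POIs, which is why the sign is flipped in the sketch.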
For the user similarity calculation module, we use the mean absolute error (MAE) as the loss function for user embedding similarity learning as follows:

$$\mathcal{L}_{sim} = \frac{1}{N_s} \sum_{i=1}^{N_s} \lvert s_i - \hat{s}_i \rvert, \quad (14)$$

where $N_s$ is the total number of training samples for user similarity calculation, $s_i$ is the user embedding similarity in equation (7), and $\hat{s}_i$ is the user's actual check-in frequency similarity in equation (8), regarding the $i$-th training sample.
By combining the preference loss in equations (11) and (12), the user role loss in equation (13), and the user similarity loss in equation (14), we minimize a composite loss function over these terms to jointly train our model in an end-to-end fashion.
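The MAE loss over user pairs can be sketched as follows. The function name and the toy similarity values are hypothetical, and the input lists are assumed to be aligned so that position i in both lists refers to the same user pair.

```python
def mae_similarity_loss(embedding_sims, label_sims):
    """Mean absolute error between paired embedding and label similarities."""
    if len(embedding_sims) != len(label_sims):
        raise ValueError("both lists must describe the same user pairs")
    if not embedding_sims:
        raise ValueError("at least one user pair is required")
    total = sum(abs(s - t) for s, t in zip(embedding_sims, label_sims))
    return total / len(embedding_sims)


# Hypothetical values for three user pairs in one batch.
loss = mae_similarity_loss([0.9, 0.2, 0.5], [0.7, 0.4, 0.5])
```

A loss of zero would mean that the embedding similarities reproduce the check-in frequency similarities exactly for every pair in the batch.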
4. Experiments
4.1. Datasets
We conduct experiments on Gowalla [37] and Foursquare [38], two real-world LBSN datasets. The Gowalla check-in data were collected from November 2010 to June 2011, while Foursquare contains worldwide check-ins from April 2012 to January 2014. Each check-in record contains user ID, POI ID, longitude, latitude, POI category, and timestamp. Since the datasets lack city information, we map each check-in to its corresponding city based on the U.S. metropolitan statistical area map [39]. For privacy reasons, users' home cities are not available in these datasets. A common scheme is to define the spatial cell containing most of a user's check-ins as his or her home region [1, 2, 8, 10, 11]. Since we mainly focus on the city level in this study, we follow the definition developed by Ding et al. [1], which defines the city that a user visits most as the home city. As the check-in data are sparse, we choose the city pairs Dallas-Austin and Los Angeles-San Francisco for Gowalla and Washington-Baltimore for Foursquare, which include more check-in records. For each cross-city pair, we first select the users who live in either of the two cities and have traveled to the other at least once. Then, we extract their check-in records in the corresponding home city and current city. We also merge records of the same user and POI within a short time period because they are probably duplicates. Next, we filter out users with fewer than 25 check-in records and POIs with fewer than 10 check-in records. Each user sequence is chronologically divided into 80% for training and 20% for testing.
The major statistics of the preprocessed cross-city datasets are shown in Table 2. The number of tourists' check-ins in the current city (e.g., H⟶C) is about one order of magnitude smaller than both the number of their check-ins in the home city (e.g., H⟶H) and that of locals' check-ins in the current city (e.g., C⟶C), indicating the extreme sparsity of cross-city check-ins. Experiments are conducted in both directions on each cross-city pair to investigate the model robustness. For example, when we make recommendations to target users from Dallas who are traveling in Austin (H⟶C), the locals' records in Austin (C⟶C) are used for short-term preference modeling, while the target users' records in Dallas (H⟶H) are considered in long-term preference modeling. Similarly, the data selection process is reversed when making recommendations to Austin users in Dallas (C⟶H).
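The home-city definition, the sparsity filter, and the chronological split can be sketched as below. The record layout `(user, poi, city, timestamp)`, the function names, and the single-pass filtering are assumptions made for this illustration; duplicate merging is omitted because the length of the merging window is not stated.

```python
from collections import Counter, defaultdict


def infer_home_city(checkins):
    """Map each user to the city where they have the most check-ins.

    `checkins` is a list of (user, poi, city, timestamp) tuples (assumed layout).
    """
    per_user = defaultdict(Counter)
    for user, _poi, city, _ts in checkins:
        per_user[user][city] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


def filter_sparse(checkins, min_user=25, min_poi=10):
    """Drop users and POIs with fewer check-ins than the given thresholds.

    Single pass: counts are taken once on the input and not recomputed.
    """
    user_counts = Counter(c[0] for c in checkins)
    poi_counts = Counter(c[1] for c in checkins)
    return [c for c in checkins
            if user_counts[c[0]] >= min_user and poi_counts[c[1]] >= min_poi]


def chronological_split(sequence, train_ratio=0.8):
    """Split one user's check-ins by time: earliest part for training, rest for testing."""
    ordered = sorted(sequence, key=lambda c: c[3])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy records; thresholds are lowered so that something survives.
toy = [
    ("u1", "p1", "Dallas", 3),
    ("u1", "p2", "Dallas", 1),
    ("u1", "p1", "Austin", 2),
    ("u1", "p3", "Dallas", 4),
    ("u2", "p1", "Austin", 5),
]
home = infer_home_city(toy)
kept = filter_sparse(toy, min_user=2, min_poi=2)
train, test = chronological_split([c for c in toy if c[0] == "u1"])
```

Sorting by timestamp before cutting keeps every test check-in later than every training check-in of the same user, which matches the next-POI setting where the model predicts future visits from past ones.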
The POIs in the Gowalla dataset are grouped into 7 main categories, i.e., community, entertainment, food, nightlife, outdoors, shopping, and travel, and there are 128 subcategories in total (https://www.yongliu.org/datasets/). POIs in Foursquare are grouped into 10 parent categories, and there are 389 subcategories in total (https://location.foursquare.com/places/docs/categories). To represent the long-term preference in a more precise and detailed way, we adopt the subcategories when calculating the user similarity.
4.2. Baselines and Configurations
We compare the proposed model (UPTDNet) with the following state-of-the-art methods:
(i) PRME [15]: a personalized ranking metric embedding method that jointly models the sequential information and individual preference. It embeds POIs and users into latent spaces to calculate the distances between POIs and the distances between POIs and users.
(ii) RNN [19]: a traditional recurrent architecture for time series processing that has been widely applied in POI recommendation.
(iii) Time-LSTM [24]: the model equips the Long Short-Term Memory (LSTM) network with time gates that take the time intervals as input so as to better capture both the user's long- and short-term preferences.
(iv) Caser [16]: the model considers both sequential patterns and the user's long-term preference. It regards the embedding matrix of the historical sequence as an image and uses convolutional filters to capture the sequential pattern on both the union level and the point level.
(v) AttRec [17]: the model takes both long- and short-term preferences into consideration with a metric learning framework. A self-attention mechanism is utilized to learn item-item relationships from the user's historical sequence.
(vi) DeepMove [27]: the model adopts an attention mechanism for the user's long-term preference learning from the history sequence and uses the RNN module for short-term preference learning from the current subsequence.
(vii) LSTPM [28]: the model learns the user's long-term preference with a nonlocal network and the short-term preference with a geo-dilated RNN.
(viii) GETNext [40]: the model incorporates the global transition patterns, the user's general preference, spatiotemporal context, and time-aware category embeddings into a transformer model to make next POI recommendations.
(ix) PLSPL [32]: the model leverages the attention mechanism to capture long-term preference and learns short-term preference by LSTM from location- and category-based sequences. A user-based linear combination unit is designed to combine the long- and short-term preferences.
(x) Flashback [25]: the model is a general RNN architecture that uses spatiotemporal intervals to search past hidden states with high predictive power.
(xi) Flashback-R: the refined version of Flashback. The spatiotemporal intervals are redefined as the intervals between the current check-in point and historical points, following [20, 22].
We train our proposed model by backpropagation through time using the Adam stochastic optimizer [41]. The dimension of the hidden states and all (POI and user) embeddings is set to 10. The temporal decay factor and the spatial decay factor in Flashback are set to 0.1 and 1000, respectively, according to the best performance in the original work [25]. The learning rate is set to 0.01 and decays by 0.2 every 10 epochs. The model is implemented in TensorFlow 2.3.0 [42] with Python 3.6. The experiments are conducted on an Intel Core i5-12600KF CPU and a single NVIDIA GeForce RTX 3060 GPU.
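The step decay of the learning rate can be sketched as a small function. The sketch assumes that "decays by 0.2" means the rate is multiplied by 0.2 every 10 epochs; if it instead means a 20% reduction, the factor would be 0.8. The function and parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.2, step=10):
    """Step schedule: multiply the rate by `decay` once every `step` epochs.

    Assumes a multiplicative factor; pass decay=0.8 for a 20% reduction instead.
    """
    return base_lr * decay ** (epoch // step)


# Learning rates at the start of a few selected (0-based) epochs.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 9, 10, 25)}
```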
4.3. Evaluation Metrics
We adopt two widely used metrics for POI recommendation, recall@K (Rec@K) and mean reciprocal rank (MRR), to evaluate the model performances. Recall@K measures the presence of the ground truth POI among the top K recommended POIs, defined as follows:

$$\mathrm{Rec@}K = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(\mathrm{rank}(p_i) \le K\right),$$

where $p_i$ is the ground truth POI that the user interacts with at the next timestamp in the $i$-th test sample and $\mathrm{rank}(p_i)$ is its rank generated by the model. $\mathbb{1}(\cdot)$ is the indicator function that returns 1 if the rank of the ground truth POI is no larger than K and 0 otherwise. $N$ stands for the total number of test samples. In this paper, we choose the popular $K \in \{1, 5, 10\}$ for evaluation.
Mean reciprocal rank measures the quality of the ranking list, defined as follows:

$$\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{rank}(p_i)},$$

where $\mathrm{rank}(p_i)$ is the rank of the ground truth item $p_i$ in the $i$-th test sample and $N$ is the total number of test samples. We choose these two metrics to indicate whether the ground truth POIs are present in the top K ranking list and how well they are ranked.
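Both metrics depend only on the rank of the ground truth POI in each test sample, which the following sketch makes explicit. The function names and the toy ranks are hypothetical, and ties in the scores are resolved optimistically here (tied candidates do not push the target down), which is an assumption of this sketch.

```python
def rank_of(scores, target):
    """1-based rank of `target` when candidates are sorted by descending score."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores.values() if s > target_score)


def recall_at_k(ranks, k):
    """Fraction of test samples whose ground-truth POI is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test samples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the ground-truth POI for four test samples.
ranks = [1, 3, 7, 12]
results = {f"Rec@{k}": recall_at_k(ranks, k) for k in (1, 5, 10)}
results["MRR"] = mean_reciprocal_rank(ranks)
```

Unlike Rec@K, MRR still rewards a ground truth POI ranked just outside the top K, so the two metrics together describe both the hit rate and the position of the hits.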
4.4. Performance Analysis
Tables 3–5 report the experimental results of our proposed model and baselines on the Dallas-Austin, Los Angeles-San Francisco, and Washington-Baltimore test sets, respectively. The best results are highlighted in bold, and the second best (except Flashback-R) is underlined. From the results, we can make the following observations:
(i) Our proposed UPTDNet consistently and significantly outperforms all baseline methods on both the Gowalla and Foursquare datasets. On the Gowalla dataset, compared to the second best results of all baselines, UPTDNet improves the MRR by 22.63% and 18.40% on Dallas⟶Austin and Austin⟶Dallas, respectively. On the sparser dataset, our method leads the others by a clear margin, with an overall improvement over the second best results of 4.11%–22.20% and 9.66%–29.25% on Los Angeles⟶San Francisco and San Francisco⟶Los Angeles, respectively. On the Foursquare dataset, our method outperforms the second best results by 10.22% and 13.53% on average on Washington⟶Baltimore and Baltimore⟶Washington, respectively. This indicates that the user preference transfer and drift learning mechanism is effective in alleviating the data sparsity problem.
(ii) Compared to the backbone model Flashback-R, UPTDNet also shows a great improvement on all datasets in all metrics. This reveals the potential and superiority of UPTDNet as a general user preference transfer and drift learning framework in which the backbone model can be replaced by alternatives.
(iii) Models that consider long- and short-term preferences show competitive performances in general. Both AttRec and Caser get relatively good results on Rec@5 and Rec@10 but perform worse on Rec@1. This may be caused by the lack of sequential effect modeling, which is important for the next POI recommendation. Although both DeepMove and LSTPM capture sequential information and build the long- and short-term preferences in a similar way, LSTPM gains better results because it integrates the distance information. PLSPL better captures the specific preferences of each user by learning their weights on long- and short-term preferences and therefore makes further improvements.
(iv) The refined model Flashback-R outperforms the other baseline methods on MRR in four out of six datasets. It shows a great improvement over the original Flashback. The refinement of the spatiotemporal interval calculation better captures the importance of historical points to the current recommendation.
(v) The performances of PRME, Time-LSTM, and GETNext are unstable and even worse than RNN on several datasets. The check-in dataset is much sparser under the cross-city scenario, so the time intervals between two consecutive check-ins vary greatly. In this case, the assumption of PRME that recommendation is only affected by the last check-in may not hold on every dataset. It is also difficult for Time-LSTM to build the time gates to capture information from time intervals of various lengths, so the model cannot reach its full performance. The results of GETNext are correlated with the spatial sparsity of POIs. With fewer POIs in relatively large areas such as Los Angeles, GETNext shows worse performance because of the higher spatial sparsity of POIs, which is consistent with the results presented in the original work [40].
5. Discussions
In order to explore the respective contributions of different components in our proposed model, we further conduct experiments on the Dallas-Austin dataset with several model variants.
5.1. The Impact of Long- and Short-Term Preference Transfer and Drift
To demonstrate the impact of long- and short-term preference transfer and drift in our model, we come up with several variant models as follows. For simplicity, the target tourists' check-ins in the home city and the locals' check-ins in the current city are denoted as the home dataset and the local dataset, respectively.
(i) ST-C: variant model that only transfers the short-term preference from the local dataset. The POI information is also transferred.
(ii) SD-C: variant model that only considers the short-term preference drift between target tourists and locals in the current city.
(iii) ST-SD-C: variant model that learns both short-term preference transfer and drift from the local dataset.
(iv) ST-SD-C-H: variant model that learns short-term preference from both the local dataset and the home dataset. For the target tourists, the short-term preferences are transferred from the current city and drifted among three different user roles.
(v) LT-H: variant model that only transfers the long-term preference from the home dataset.
(vi) US-H: variant model that only calculates the user similarity with the home dataset.
(vii) LT-US-H: variant model that learns both long-term preference transfer and drift from the home dataset. As a result, the long-term preference is transferred from the home city and drifted according to users' real-life check-in behaviors.
(viii) LT-US-H-C: variant model that models long-term preference with both the home dataset and the local dataset. The long-term preference is transferred from the home city and drifted among more users.
The performances of the different variant models are shown in Table 6. Short-term preference transfer and drift yield a greater improvement than long-term preference modeling. We infer that the generated short-term preference can capture the dynamic context in changing historical sequences and better track the change of user preference for the next POI recommendation. In addition, the POI information transfer also alleviates the data sparsity of the cross-city dataset.
As for the impact of components in short-term preference modeling, we can see that, relative to the baseline Flashback-R, short-term preference drift brings a larger gain than preference transfer. This indicates the effectiveness of capturing user role features for a sparse tourist dataset. Similar results are observed in long-term preference modeling: calculating user similarity to capture preference drift performs better than merely transferring information from the home dataset. All integrated models for both long- and short-term preference modeling yield better results. It can be inferred that it is important to consider preference transfer and drift simultaneously when building long- and short-term preferences.
The results also indicate that even with only one supplemental dataset, UPTDNet can still achieve competitive performances compared to the baseline Flashback-R. This reveals the generalizability of our model under practical scenarios. In addition, although the home dataset is not designed as the main source of short-term preference learning, it can also boost the model performance. We think that the additional user role in the drift branch helps further distinguish user role characteristics so that the tourists' features in the current city become more distinct. Adding the local dataset also shows competitive performance in long-term preference modeling, especially on Rec@1. We infer that both the home dataset and the local dataset provide valuable knowledge that is utilized in UPTDNet. As a result, UPTDNet performs significantly better than all variant models. This shows its capability of integrating information from all datasets and the advantage of considering long- and short-term preference transfer and drift.
5.2. Visualization of User Long-Term Preference
The user embeddings learned during model training represent users' inherent long-term preferences. To further investigate the relationships among users in terms of long-term preferences, we use the t-SNE algorithm [43], which can maintain the relationships between high-dimensional points after projection, to project the obtained user embeddings of the Dallas-Austin dataset onto a 2D plane for visualization in Figure 3. In Figure 3, the point group colored in purple represents the target tourist users in the current city, the green point group is their representations in the home city, and the point group for local users in the current city is colored in yellow. We can observe that user embeddings belonging to the same user group exhibit an evident clustering phenomenon. Specifically, tourists' embeddings in the home city, which are also representations of home city locals, are mostly mixed with locals' embeddings in the current city. It can be inferred that locals, regardless of city, share similar check-in behaviors and that the check-in patterns of tourists are distinct from those of locals.

We also explore the relationships of individual users in the figure. We randomly selected two users, #166 and #149, from the target user set. The embeddings of the same user are represented in the same color in both the home city and the current city, which are blue for user #166 and red for user #149. The cosine similarity of POI category visiting frequencies in the two cities is 0.7094 for user #166 and 0.2484 for user #149, which is negatively correlated with the distance between the corresponding user embeddings. In addition, we highlight user #30 in orange and user #23 in olive. We can observe that the embedding point of user #30 is closer to user #166 than his tourist representation is, and the cosine similarity of visiting frequencies rises to 0.7907. Similarly, user #23 and user #149 are closer in the embedding space, and the cosine similarity of visiting frequencies also rises to 0.6239. This analysis shows that the model can learn user long-term preferences that conform with user semantics at both the group level and the individual level.
5.3. User Case Study
In order to reveal the plausibility of UPTDNet in modeling the user preference transfer and drift, we sample user #121 from the target user set with a ground truth POI in test data and observe how the components in our model work differently. Figure 4 depicts the historical check-in sequence used for recommendation and short-term preference modeling. Figure 5 demonstrates the most visited POI categories by user #121 in the home city and current city, which also reveals the inherent long-term preference. In UPTDNet, we represent the short-term preference transfer and drift as the hidden states of the transfer branch and drift branch and assign user embeddings as long-term preference. Therefore, the top 5 POIs recommended by different preference modeling are obtained and shown in Table 7, whose cosine similarities to the ground truth POI are also presented.


As shown in Figure 4, most historical check-in POIs belong to the nightlife category for rest and leisure. The modern hotel check-in also demonstrates an obvious travel intention of the user. We can observe from Table 7 that the short-term preference transfer tends to recommend POIs belonging to the shopping category, which is more proper for locals’ activities. In contrast, the short-term preference drift can capture tourist properties and recommend more entertainment places to the target user. As for the long-term preference shown in Figure 5, the check-ins of user #121 in the home city mainly lie in the shopping and food categories, while the travel and entertainment categories account for a larger proportion in the current city. Such a phenomenon demonstrates the necessity to discriminate different user roles even for the same user and consider preference drift in long-term preference building. A similar pattern can be seen in the recommended lists in Table 7. The long-term preference in the home city tends to recommend shopping places and Mexican canteens, while the long-term preference in the current city recommends accommodations and outdoor sites for tourism. Both lists are consistent with the preference distributions depicted in Figure 5, which further shows the effectiveness of long-term preference modeling in our model.
6. Conclusion
In this paper, we propose UPTDNet to study the problem of cross-city next POI recommendation for the tourists via user long- and short-term preference modeling. To alleviate the extreme data sparsity of cross-city check-ins, we utilize the tourists’ check-ins in their home city and locals’ check-ins in the current city to transfer useful information, including POI information and user preference. The POI information is transferred via weight-sharing embeddings. The user preference is further divided into long-term preference and short-term preference. The long-term preference is transferred from tourists’ behaviors in the home city and drifted among individual users by user embedding transformation and user similarity calculation. The short-term preference is transferred from locals in the current city and drifted among different user roles through dual RNN-based branches. Experiments on real-world datasets show significant improvements compared to other state-of-the-art models, proving the superiority of UPTDNet in tackling data sparsity problems. We also conduct an ablation study to investigate the impacts of the long- and short-term preference modeling and the importance of combining them in our model. The results indicate that even if we only transfer short-term preference or long-term preference, we can still achieve competitive performances and alleviate data sparsity. Moreover, visualization of long-term preference and user case study further demonstrate the validity of the user preference learning in UPTDNet.
For further study, we aim to explore different loss weights for multiple tasks and observe how they affect the target recommendation task. Another possible improvement lies in applying an attention mechanism to extract the most relevant information in our model and reach better performance. We also hope to adapt our model to strict cold-start scenarios.
Data Availability
The datasets that support the findings of this study are openly available at https://github.com/s3pku/UPTDNet_dataset.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (grant no. 41971331).