Abstract
While a large number of works have concentrated on forecasting trajectories in outdoor environments, predicting the movement of users in indoor settings has attracted growing attention recently because of the development of smartphones and the maturity of Wi-Fi services, e.g., in office buildings. Predicting a user’s movement in indoor spaces can not only help better understand his/her intentions but also improve his/her living experience. Most prediction approaches to date tackle the problem by constructing mathematical models to learn the mobility of objects, but they cannot efficiently model the movement of indoor users in settings that are constrained yet rich in spatial, temporal, and semantic information. To solve this issue, we propose a frequent subtrajectory-based Markov model that incorporates the spatial location, the temporal aspect, and the shop category context into a unified framework. We first present the frequent subtrajectory algorithm to model and predict adjacent moving points from the physical movement perspective; then, by taking the duration of stay at a specific location into account, we further improve the prediction precision. Finally, by taking location context in the indoor environment (e.g., shop categories) into consideration, we model and predict the user’s future visiting points from the semantic perspective. To validate the effectiveness of our model, we conduct a complete evaluation on a large-scale real-world dataset with more than 261,269 trajectories collected from over 120,000 customers in a shopping mall. The experimental results demonstrate that our method achieves significantly better prediction performance than the state-of-the-art models.
1. Introduction
During the past decades, a large number of studies have concentrated on predicting the future whereabouts of human beings [1–4]. With the prevalence of the global positioning system (GPS), the majority of works focus on outdoor trajectory prediction, yet research shows that people tend to spend over 87% of their lifetime in indoor settings, for instance, conference rooms, shopping centers, transit terminals, and private homes [5, 6]. What is more, indoor localization products such as Bluetooth, Infrared, RFID, Wi-Fi [7], and iBeacon [8] have become popular in commercial markets [9, 10], and more and more people use Wi-Fi-enabled smartphones. As a direct and promising application of indoor positioning systems [10], indoor trajectory prediction is gradually attracting more attention, and it is a crucial part of location-based services.
Two main reasons explain the scarcity of studies on indoor trajectory prediction. The first is that, without accurate indoor positioning technology, it is hard to collect usable and sufficient experimental data [11]: the techniques used in outdoor localization, such as GPS or GLONASS (Global Navigation Satellite System), perform badly in indoor settings because building materials significantly attenuate the signal [12]. The other reason is that the indoor environment is much more complicated than outdoor settings. Outdoors, people just need to follow the road map to their destinations, but indoors there are multiple choices for the customers even in a small region, owing to complex elements such as gates and elevators, which makes it hard to accurately predict the future whereabouts of indoor users. Note that, due to the unique feature of our dataset, we do not predict the specific shop the customer will visit next; what we predict is the next Wi-Fi access point (AP) the customer will connect to.
Nowadays, many indoor establishments provide free Wi-Fi services to improve user experiences in their facilities. For instance, hundreds of shops of the British supermarket Tesco provide Wi-Fi connections. Wi-Fi is almost an indispensable and fundamental service for large indoor buildings. By taking advantage of this service, building managers can efficiently capture the physical movements of workers or customers [13–16] to provide better services for customers or to manage workers efficiently. Research on accurate Wi-Fi-based positioning has also been carried out; for example, in [17], the authors tackle the problem of insufficient RSS measurements for Wi-Fi-based localization.
In this article, a large-scale real-world dataset containing over 261,369 indoor trajectories was collected from a large shopping mall with seven floors covering 90,000 square meters. The mall operator has deployed 67 Wi-Fi access points in the shopping mall; other information, such as the floor plans, the deployment of the Wi-Fi access points, and the shop categories, was provided by the proprietor of the building. The data offers us a special chance to study human movement in indoor space and to investigate the correlation between human physical movement and contextual movement in indoor settings. To enhance the quality of trajectory prediction in indoor scenarios, we present a subtrajectory-based Markov model that incorporates spatial, temporal, and semantic information into a unified model. The major contributions of this work can be summarized as follows:
(1) From the spatial perspective, a frequent subtrajectory-based forecasting approach is proposed, which not only improves the prediction precision but also restrains the exponential growth of the transition probability matrix and alleviates the data sparsity problem; what is more, the proposed method needs only the most recent movements to make a prediction, which saves much storage space
(2) From the temporal perspective, we investigate the influence of users’ retention time at a specific location and of the starting time of a trajectory on the prediction performance
(3) From the semantic perspective, we investigate the semantic information in the indoor environment, e.g., shop categories in the shopping mall; we verify to what extent we can accurately predict which semantic section the users will visit next; we also propose a unified model that integrates semantic section prediction with location prediction, which further improves the prediction performance
(4) We conduct comprehensive experiments to test the prediction accuracy of our algorithm using a real-world dataset obtained from a shopping mall. Experimental outcomes show that the newly proposed method performs better than the baseline techniques, including the Markov model and the hidden Markov model
Shoppers and store managers can benefit a lot from indoor trajectory prediction. For instance, when managers can predict the customers’ movement in advance, they can compare the predicted location with users’ historical locations; if the managers find abnormal movements of a specific user, they can make contingency plans to avoid dangerous events. Furthermore, retailers can push related promotion information to a target customer through an online advertising system if they know that the shopper will physically approach their store; this will not only boost sales but also improve the customers’ shopping experience.
The rest of this study is structured as follows. First, we present a literature review in Section 2. In Section 3, we introduce the preliminaries on indoor trajectory prediction. In Section 4, we present the details of the proposed prediction model, which integrates spatial, temporal, and semantic information into a unified framework. In Section 5, experiments are carried out to verify the effectiveness of our algorithm. Finally, we conclude the paper in Section 6.
2. Related Work
In this part, we highlight related studies on indoor trajectory prediction and discuss the differences between these works and our study.
2.1. Trajectory Prediction
During the past decade, extensive research has been carried out in the field of trajectory prediction; for example, the book [18] presents recent research on neural networks and machine learning methods for data prediction problems. Based on the type of experimental data, the methodology of trajectory prediction can be divided into two groups: continuous and discrete. In the continuous group, continuous coordinate values are generated through the global positioning system, whereas in the discrete group, the dataset is generated through wireless sensors and the coordinate values denote the identifiers of the sensors [19]. The deviation between the predicted locations and the corresponding real locations can be used to evaluate the accuracy of continuous forecasting, and the performance of discrete prediction can be evaluated as the percentage of the actual IDs that appear among the returned predicted IDs. Since our dataset is collected from a shopping mall and each trajectory is represented as a sequence of time-ordered Wi-Fi access points, our work belongs to the category of discrete trajectory prediction.
Based on the prediction methods, existing work can be classified into (1) frequent pattern mining and (2) model building.
2.2. Frequent Pattern Mining
A large number of studies extract frequent patterns to predict users’ future locations [20]. In the representative work [21], an indexing scheme that exploits periodic pattern information to organize historical spatiotemporal data was proposed. In [22], the authors combined the prefix tree and the FP tree to predict the movement rules of objects.
While most of the studies are carried out on the basis of geographical characteristics, temporal and contextual information has also been investigated recently [23]. In [24], an Indoor-WhereNext approach for trajectory prediction in the shopping mall is proposed; the authors first compute the similarities of location sequences from the spatial and semantic aspects, and then obtain groups of similar users by employing the AP algorithm; these groups are used to train the prediction models. In [25], a spatial-temporal model to forecast trajectories in indoor settings has been studied to help blind people avoid obstacles. While the aforementioned works adopt the trajectory patterns for prediction directly, they did not consider the reoccurrence of a pattern behind human movements, and they did not provide a solution for continuously updating frequent patterns.
2.3. Model Construction
Among the model construction approaches, researchers have put forward machine learning algorithms such as the dynamic Bayesian network [26, 27] and the recurrent neural network (RNN) to forecast future human whereabouts by constructing prediction models from historical trajectory data.
In Jin et al.’s study [28], an augmented-intention recurrent neural network (AI-RNN) model to forecast locations in diverse trajectories is proposed. However, in our background study, the most commonly utilized models are Markov model, hidden Markov model, and improvements based on these models.
In [29], the Markov transition matrix was calculated according to the cell organization of a target region, and a Markov chain model was presented. To predict the future positions of a student on campus, the recent work [3] considered the concept of time in the prediction algorithm.
HMM- (hidden Markov model-) based trajectory prediction algorithms can be categorized into two groups: (1) parameter learning and (2) structure modeling [30]. For parameter learning, a mixed hidden Markov model (MHMM) was proposed in [31], and an HMM-based trajectory prediction model whose primary parameters can be adjusted autonomously was proposed in [32]. For structure modeling, the temporal structure of moving trajectories was modeled and recognized on the basis of the HMM in [33], and an algorithm to analyze pedestrian behaviors by constructing probabilistic models was presented in [34].
As for variations of the Markov model, the authors of [19] proposed an intermediate model named the mixed Markov model (MMM), which takes advantage of both the individual movement history and other people’s movements. The algorithm first clusters the users into different groups based on their historical movements and then constructs a specific Markov model for each group; experiments show an improvement in prediction accuracy compared to the Markov model and the HMM [19]. Though the algorithm can accurately predict future locations, it is time-consuming to construct a model for each group. In [35], the authors proposed a prediction model based on a user’s historical locations over some period and his/her recent locations. Another work extended a prediction model named the mobility Markov chain (MMC) to record the previous locations and named it n-MMC. It is a variant of the Markov model in which the future whereabouts rely not only on the present location but also on the previous locations.
Existing work on predicting indoor movements employed pattern mining and variants of the Markov chain [1, 2, 27, 35–37], but the temporal information and semantic section information in the trajectory were not used. What is more, the aforementioned methods were only validated on small or synthetic datasets (7 POIs in Lam et al. [37], only 72 hours’ data in [1], only 48 hours’ data in [38]). In [35], the data were collected in a controlled environment in which the participants were aware of the experiment. Unlike the existing studies, our dataset was obtained from the general public in a large inner-city shopping center, and our proposed method considers both the spatial movement captured by frequent subtrajectories and the shop categories, a combination that has not been studied before.
3. Preliminaries
In this section, we introduce some basic concepts; the notations used in this work are listed in Notations.
Definition 1 (indoor space). In general, the indoor space can be depicted as $IS = (Cell, AP, Deployment)$.
Based on the previous work [39], the indoor settings can be divided into multiple cells denoting different function areas. AP is a set of Wi-Fi access points deployed in the indoor space. Deployment stores the location details of the access points in the building.
Definition 2 (semantic section). In the indoor environment, the floor plan can be partitioned into various sections according to their functions: for example, lobby, resting, guest room, and restaurant sections in hotels; men’s footwear, women’s fashion, food court, and entertainment sections in the shopping mall; consulting area, registration area, and outpatient area in the hospital. In the following parts of the study, we will use SS to denote semantic sections for short.
Definition 3 (AP point). In general, AP point is the abbreviation of Wi-Fi access point in the indoor building; each AP point has a unique identifier separating it from the others, and it covers several semantic sections. We denote an access point as $ap = (id, SS_{ap})$, where $SS_{ap}$ is a subset of all the semantic sections ($SS_{ap} \subseteq SS$).
An instance of an AP point in our dataset is shown in Table 1; the AP point covers 4 semantic sections, namely, Cosmetics, Jewelry, Cafe, and Beverage; thus, we denote it as $ap = (id, \{\text{Cosmetics}, \text{Jewelry}, \text{Cafe}, \text{Beverage}\})$.
Definition 4 (trajectory point). An indoor trajectory point is denoted as $p = (ap, t_{in}, t_{out})$.
Here, $ap$ refers to the specific Wi-Fi access point, $t_{in}$ is the time when the user walks into the Wi-Fi access point’s activation range, and $t_{out}$ is the time when the user walks out of the Wi-Fi access point’s activation range.
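Definition 5 (indoor trajectory). The indoor trajectory is made up of a sequence of time-ordered Wi-Fi access points, $T = \langle p_1, p_2, \ldots, p_n \rangle$, where each $p_i$ is a trajectory point and the points are ordered by the time at which the user enters the access point’s range.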
Definition 6 (indoor trajectory database $D$). The indoor trajectory database $D$ is used to store the customers’ movement information in the indoor space, $D = \{T_1, T_2, \ldots, T_N\}$, where $N$ is the count of trajectories stored in the database.
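Definition 7 (indoor trajectory prediction). Given an indoor trajectory $T = \langle p_1, p_2, \ldots, p_n \rangle$, the objective is to predict the location of the user at the instant $n+1$, i.e., the access point of $p_{n+1}$, according to the past $n$ timestamps.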
In the rest of the paper, the terms indoor trajectory and trajectory are used interchangeably unless otherwise specified.
4. Trajectory Prediction Model
In this part, our unified spatial-temporal-semantic indoor trajectory prediction model is discussed in detail. Figure 1 outlines our approach. At its most basic level, the model first processes the raw trajectories into frequent subtrajectory-based trajectories, and then the prediction is carried out by combining the semantic information of the indoor scenario and the temporal information, namely, the duration of stay, into a unified paradigm.

4.1. Spatial Perspective
In this section, we will introduce our prediction approach from the spatial, namely, physical movement perspective. Intuitions and the detailed frequent subtrajectory-based method are illustrated here. To begin with, we will briefly describe the basic concept of Markov process and its related features.
4.1.1. Markov Process
The underlying technique used in our work is the Markov process, a random process governed by the Markov property: the future state depends only on the current state, regardless of the states in the past. This feature is also known as the no-aftereffect property. The Markov process is extensively utilized in discrete-time state forecasting problems. In the Markov process, the chance of shifting to a new state depends only on the present state and the transition matrix. So the probability of a customer going to a new position $l_{t+1}$ at timestamp $t+1$, given his/her history location sequence $l_1, \ldots, l_t$, can be written as
$$P(l_{t+1} \mid l_t, l_{t-1}, \ldots, l_1) = P(l_{t+1} \mid l_t).$$
It is also called first-order Markov process, which supposes that the probability of exploring a new location depends on the last attended position only.
4.1.2. Higher-Order Markov Process
In reality, the probability of visiting a new location may depend on all the visited historical locations; hence, we are inspired to exploit the higher-order Markov process. Note that the higher-order Markov process is an extension of the Markov process such that the next latent state is stochastically dependent on the latent states multiple time steps in the past. This extension yields a richer model that can better capture the dynamics of the trajectories.
However, the higher-order Markov process is impractical for even a moderate number of look-back steps, for two reasons: the size of the transition matrix grows exponentially, and the data become sparse, i.e., if the sequence of history points visited by a user in the previous steps does not match that of any other user, there will be no candidate locations to recommend to him/her. In this study, we adopt a second-order Markov process, in which the future whereabouts are stochastically dependent on both the current and the previous locations.
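To make the growth concrete, the following worked count assumes one state per access point and uses the 67 access points reported for this dataset; an order-$k$ process then has $67^k$ possible histories, each with 67 possible next locations. This is only an illustrative upper bound on the table size, not a figure taken from the experiments.

```latex
\begin{align*}
\text{first order:}  &\quad 67^{1} = 67 \text{ histories},        & 67 \times 67       &= 4{,}489 \text{ entries},\\
\text{second order:} &\quad 67^{2} = 4{,}489 \text{ histories},   & 4{,}489 \times 67  &= 300{,}763 \text{ entries},\\
\text{third order:}  &\quad 67^{3} = 300{,}763 \text{ histories}, & 300{,}763 \times 67 &= 20{,}151{,}121 \text{ entries}.
\end{align*}
```

With a few hundred thousand trajectories, most entries of a third-order table would never be observed, which is the sparsity problem described above.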
4.1.3. Subtrajectory-Based Markov Model
We assume that the indoor trajectory data collected from users contains meaningful movement patterns that depict their common behaviors. To be specific, trajectories repeatedly followed by users can be extracted to reveal their movement patterns. Investigating the movement patterns of a large number of people can disclose the most efficient commuter route; on the other hand, analyzing the movement patterns of an individual user can reveal more about his/her lifestyle choices and associated changes. Thus, movement patterns that represent the most frequently traveled routes are employed in this study to depict and forecast the whereabouts of customers.
A novel method called subtrajectory-based Markov model is proposed in our study to address the data sparsity and the exponential increase in the size of the transition matrix problems caused by employing higher-order Markov model. The trend of spatial granularity (the notion of spatial granularity in this study refers to the number of states in the Markov process) and the density of transition matrix when we employ new subtrajectory patterns into our model is shown in Figure 2. The granularity gradually decreases as more patterns are added into the model while the density of the transition matrix increases.

Figure 2: (a) Trend of granularity with respect to number of patterns. (b) Trend of density with respect to number of patterns.
What is more, compared with the outdoor environment, the indoor setting, which is composed of many indoor elements such as apartments, gates, stairways, walls, corridors, and floors, provides much richer constraint information. The geometric restrictions among the indoor elements are much more complex than those of the outdoor environment. Moreover, topological constraints, such as adjacency and inclusion correlations, are much more common in the indoor scenario. So there is much meaningful information held in indoor trajectories, and taking advantage of this potentially useful information can effectively enhance the accuracy of indoor trajectory prediction. Next, we describe our subtrajectory-based Markov model in detail.
To begin with, we decompose all the trajectories into subtrajectories comprising two to five neighboring trajectory points. Then, we rank the extracted subtrajectory patterns in descending order of frequency. We argue that the top-ranked subtrajectory patterns are important and meaningful, and we merge the trajectory points in each frequent subtrajectory pattern into a supernode. For example, given an indoor trajectory sequence, we first enumerate its subtrajectories of size two; then, as shown in Table 2, we count the occurrences of each subtrajectory pattern and rank the patterns in descending order; the top-ranked pattern is the most frequent one in the example.
After computing the top frequent subtrajectory patterns, we will deal with the raw indoor trajectories and transform them into “frequent subtrajectory”-based trajectories. First, we determine frequent subtrajectory patterns according to their frequency, and then, we combine the trajectory points into a supernode. The process is illustrated in Figure 3.
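The extraction and merging steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the representation of a trajectory as a list of access point identifiers, and the greedy first-match merging rule are all assumptions made for the example.

```python
from collections import Counter


def extract_subtrajectories(trajectory, min_size=2, max_size=5):
    """Yield every contiguous run of min_size..max_size access points."""
    for size in range(min_size, max_size + 1):
        for start in range(len(trajectory) - size + 1):
            yield tuple(trajectory[start:start + size])


def top_patterns(trajectories, n_patterns, min_size=2, max_size=5):
    """Rank subtrajectory patterns by frequency and keep the top n_patterns."""
    counts = Counter()
    for traj in trajectories:
        counts.update(extract_subtrajectories(traj, min_size, max_size))
    return [pattern for pattern, _ in counts.most_common(n_patterns)]


def merge_supernodes(trajectory, patterns):
    """Replace occurrences of frequent patterns with a single supernode.

    Assumption: at each position the patterns are tried in the given
    order and the first match wins; the text does not fix this rule.
    """
    merged, i = [], 0
    while i < len(trajectory):
        for pattern in patterns:
            if tuple(trajectory[i:i + len(pattern)]) == pattern:
                merged.append(pattern)    # supernode kept as a tuple of APs
                i += len(pattern)
                break
        else:
            merged.append(trajectory[i])  # no pattern starts here; keep raw AP
            i += 1
    return merged


# Hypothetical access point identifiers, for illustration only.
trajs = [["a", "b", "c", "d"], ["a", "b", "e"], ["c", "a", "b"]]
patterns = top_patterns(trajs, n_patterns=1, min_size=2, max_size=2)
merged = merge_supernodes(["c", "a", "b", "d"], patterns)
```

Keeping supernodes as tuples means the transformed trajectory can be fed to the same Markov counting code as a raw trajectory, since both raw identifiers and tuples are hashable states.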

4.2. Temporal Perspective
4.2.1. Duration of Stay
Duration of stay in a wireless LAN environment is an important parameter that can be used to estimate how long a mobile user spends within the range of a particular access point. We assume that the time a customer spends at a specific position influences the place he/she is going to visit in the future. Consider a customer in a shopping mall: if he has already spent a long time in a specific food-oriented store, it is highly likely that he will not visit a food store in the near future; conversely, if he has only spent a few minutes in a series of food stores, we can assume that he is trying to find his favorite restaurant, and we can predict with high confidence that the next place he is going to visit is also a food store.
Based on the aforementioned assumption, we further enhance our model with the duration of stay at a specific access point. We analyzed the duration of stay in all users’ login data; the result, shown in Figure 4, indicates that most users tend to spend 5 minutes in a specific access point’s range. Thus, for each access point, we generate several candidate lists for the next location, one per duration-of-stay range. Our methodology is as follows: when performing the prediction, we first check which access point the user currently stays at, and then we look into how long he has stayed in the range of that access point; finally, based on the candidate list generated from the training data, we return the access points with the top-$k$ highest probabilities as prediction outcomes. The procedure is illustrated in Figure 5.
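The lookup procedure above can be sketched as a table keyed by the current access point and a duration-of-stay bucket. The bucket boundaries (5 and 15 minutes), the function names, and the input layout are hypothetical choices for this example; the text only reports that most stays last about 5 minutes.

```python
from collections import Counter, defaultdict

# Hypothetical bucket edges in minutes (an assumption, not from the paper).
BUCKET_EDGES = (5, 15)


def stay_bucket(minutes, edges=BUCKET_EDGES):
    """Map a duration of stay to a bucket index (0 = shortest stays)."""
    for index, edge in enumerate(edges):
        if minutes <= edge:
            return index
    return len(edges)


def build_candidates(training_points):
    """training_points: iterable of (current_ap, stay_minutes, next_ap)."""
    table = defaultdict(Counter)
    for current_ap, stay_minutes, next_ap in training_points:
        table[(current_ap, stay_bucket(stay_minutes))][next_ap] += 1
    return table


def predict_top_k(table, current_ap, stay_minutes, k=5):
    """Return the k most frequent next APs for this AP and stay bucket."""
    counts = table.get((current_ap, stay_bucket(stay_minutes)), Counter())
    return [ap for ap, _ in counts.most_common(k)]


# Hypothetical training triples, for illustration only.
data = [("ap1", 3, "ap2"), ("ap1", 4, "ap2"), ("ap1", 2, "ap3"), ("ap1", 30, "ap4")]
table = build_candidates(data)
short_stay = predict_top_k(table, "ap1", 1, k=1)
long_stay = predict_top_k(table, "ap1", 40, k=5)
```

An unseen (access point, bucket) pair returns an empty list here; a fuller implementation would presumably fall back to the duration-independent transition probabilities.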


4.2.2. Data Processing
Before constructing the transition probability matrix, a data cleaning procedure is performed. We remove the trajectories that have fewer than four transitions, because we assume that transitions in very short trajectories carry no specific meaning and would introduce avoidable errors.
4.2.3. Transition Matrix Construction
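Based on Equation (2), the chance of a customer going to a new position under the first-order Markov chain can be represented as follows:
$$P(ap_j \mid ap_i) = \frac{N(ap_i, ap_j)}{\sum_{m=1}^{M} N(ap_i, ap_m)}.$$
The numerator $N(ap_i, ap_j)$ denotes the number of transitions from $ap_i$ to $ap_j$ in the training dataset, and the denominator is the total number of transitions that start from $ap_i$; $M$ is the total number of Wi-Fi access points.
As for the second-order Markov model, we build a larger transition matrix because there are many more combinations of the last two history locations, and the probability of moving to a new location $ap_k$ given the last two locations $ap_i$ and $ap_j$ is as follows:
$$P(ap_k \mid ap_i, ap_j) = \frac{N(ap_i, ap_j, ap_k)}{\sum_{m=1}^{M} N(ap_i, ap_j, ap_m)},$$
where $N(ap_i, ap_j, ap_k)$ counts the occurrences of the consecutive visits $ap_i$, $ap_j$, $ap_k$ in the training dataset and $M$ is the total number of Wi-Fi access points.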
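The counting scheme behind both transition matrices can be sketched in a few lines. The function name, the dictionary-of-dictionaries layout, and the representation of a trajectory as a list of access point identifiers are assumptions for illustration; the filter on short trajectories mirrors the cleaning step of Section 4.2.2.

```python
from collections import Counter, defaultdict


def build_transitions(trajectories, order=1, min_transitions=4):
    """Estimate P(next AP | last `order` APs) by relative frequency."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        # A trajectory with n points has n - 1 transitions; drop short ones.
        if len(traj) - 1 < min_transitions:
            continue
        for i in range(order, len(traj)):
            context = tuple(traj[i - order:i])
            counts[context][traj[i]] += 1
    probs = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        probs[context] = {ap: n / total for ap, n in nexts.items()}
    return probs


# Hypothetical trajectories; the second one is removed by the cleaning step.
trajs = [["a", "b", "a", "b", "c"], ["a", "b"]]
first_order = build_transitions(trajs, order=1)
second_order = build_transitions(trajs, order=2)
```

Storing only the observed contexts, rather than a dense matrix, sidesteps allocating rows for histories that never occur in the training data.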
4.3. Semantic Augmented Trajectory Prediction
To further enhance the quality of our prediction model, we incorporate the semantic section aspect of the indoor environment into our framework.
4.3.1. Semantic Section
In real-life settings, an indoor building can be separated into various areas based on their functions. For example, a hotel can be partitioned into check-in, resting, lobby, restaurant, entertainment, and sports sections, and a supermarket can be divided into cooked food, fruits, meat, beverage, and daily necessities sections. We assume that a user’s physical movements are related to his/her current location. In a train station, for instance, the majority of passengers follow the route of buying tickets, security checking, waiting, and boarding; we assume that customers in the shopping mall also follow some specific patterns. For instance, as shown in Figure 6, after strolling for a pair of shoes, the shopper may spend some time eating and then buy a ticket for a hit movie. Such behavior is in line with the observation that people’s movement shows sequential patterns [40–42]. We therefore propose a semantic section-based Markov model to examine how accurately we can predict the semantic section a user visits next.
Definition 8 (semantic section prediction based on Markov model). Consider a shopper who has visited $n$ semantic sections in an indoor building, namely, $ss_1, ss_2, \ldots, ss_n$; our goal is to predict the next semantic section $ss_{n+1}$ the customer will visit.

Given a trajectory sequence $\langle p_1, p_2, \ldots, p_n \rangle$, the number of semantic section sequences is $\prod_{i=1}^{n} c_i$, where $c_i$ denotes the count of semantic sections that the access point of $p_i$ covers. For instance, consider a trajectory of three points in the shopping mall, where each trajectory point covers several shop categories, as shown in Table 3: the activation range of the first access point overlaps four shop categories, and those of the second and third access points overlap three shop categories each. When predicting the next semantic section (shop category) the user is going to visit, all the possible category chains are first generated according to the previous access points; in the example, a total of $4 \times 3 \times 3 = 36$ possible shop category sequences are formed. Then, we use the newly generated data to train the Markov model and predict the sections likely to be visited next.
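Generating the category chains amounts to a Cartesian product over the sections covered by each visited access point. In the sketch below, the first category list reuses the example from Table 1, while the other two lists and the function name are hypothetical placeholders chosen only to reproduce the 4, 3, 3 sizes of the example.

```python
from itertools import product


def section_sequences(covered_sections):
    """covered_sections: one list of semantic sections per visited AP, in order."""
    return list(product(*covered_sections))


covered = [
    ["Cosmetics", "Jewelry", "Cafe", "Beverage"],  # first AP: four categories
    ["Shoes", "Sports", "Cafe"],                   # hypothetical second AP
    ["Cinema", "Food", "Toys"],                    # hypothetical third AP
]
chains = section_sequences(covered)
n_chains = len(chains)
```

Each resulting tuple can then be treated as one training sequence for the semantic section Markov model.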
To enhance the accuracy of our prediction model, we incorporate the semantic section prediction into the trajectory prediction. Each candidate’s semantic section weight is calculated based on Algorithm 1: a candidate list is used to store the semantic sections likely to be visited next, and for each candidate Wi-Fi access point, if one of the semantic sections it overlaps is in the candidate list, we allocate a corresponding value to its semantic section weight; lastly, we use Formula (6) to compute the unified score of each Wi-Fi access point that may be visited by the customer, in which a weight coefficient is assigned to the semantic section.
With this, we can predict the whereabouts of a user by returning the top-$k$ positions with the highest merged scores.
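One way such a merged score could be computed is sketched below. Both the weighting rule (the best predicted probability among the sections an access point covers) and the linear combination with a coefficient `alpha` are assumptions made for illustration; they stand in for Algorithm 1 and Formula (6), whose exact forms are not reproduced here.

```python
def semantic_weight(ap_sections, predicted_sections):
    """Assumed rule: best predicted probability among the sections this AP covers."""
    return max((predicted_sections.get(s, 0.0) for s in ap_sections), default=0.0)


def unified_scores(location_probs, ap_sections, predicted_sections, alpha=0.5):
    """Assumed combination: location probability + alpha * semantic weight."""
    return {
        ap: p + alpha * semantic_weight(ap_sections.get(ap, ()), predicted_sections)
        for ap, p in location_probs.items()
    }


def top_k(scores, k):
    """Return the k access points with the highest merged score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical inputs, for illustration only.
location_probs = {"ap1": 0.40, "ap2": 0.35, "ap3": 0.25}
ap_sections = {"ap1": ["Cafe"], "ap2": ["Jewelry", "Cosmetics"], "ap3": []}
predicted_sections = {"Jewelry": 0.6, "Cafe": 0.1}
scores = unified_scores(location_probs, ap_sections, predicted_sections, alpha=0.5)
ranking = top_k(scores, 2)
```

In this toy input, the semantic evidence moves `ap2` ahead of `ap1` even though `ap1` has the higher location probability, which is the kind of reordering the unified model is meant to allow.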
5. Experiment Evaluation
In this section, we will evaluate and compare our model with baseline algorithms. The algorithms were implemented with Python 3.5. Fivefold cross-validations were carried out, and the indoor trajectory dataset was split into two sets: training dataset and testing dataset. The training dataset was utilized to construct the predicting models, and the testing set was used to evaluate the accuracy of the prediction model.
5.1. Setup
5.1.1. Dataset
The dataset we used was collected from a large shopping center with seven floors and 67 Wi-Fi access points across 90,000 square meters. The shopping center contains more than 200 shops which belong to over 34 shop categories. Table 4 shows the details of the dataset.
According to the layout of the mall, the regions near the APs can be partitioned into three major categories: food court, retail, and navigational, which include 11 APs, 46 APs, and 10 APs, respectively. The distribution of retention time with respect to each category is given in Table 5. People spend seven percent of their time in navigational areas and the remainder eating or in a retail context.
5.1.2. Parameters
Table 6 lists all the used parameters in our experiments. We set a parameter to the default value when we investigate other parameters.
5.1.3. Measurement Metrics and Baselines
To be specific, for a target customer, our indoor trajectory prediction model calculates a score for each possible location (i.e., AP point in this study) and returns the access points with the top-$k$ highest scores as forecasting outcomes. To evaluate the accuracy of the proposed algorithm, we use the following three metrics:
(1) Top-1 accuracy, which denotes the percentage of times that we predict the correct next position
(2) Top-5 accuracy, which denotes the percentage of times that the correct next location is among the five most probable locations
(3) Mean reciprocal rank (MRR), which denotes the average of the reciprocal ranking positions of the correct next location
Note that we did not employ the mean absolute error (MAE) as our performance metric in the experiment. The reason is that MAE is a metric for the average magnitude of the errors in a set of predictions. However, in our problem, users have no ratings about the next location and the metric is not suitable for our problem.
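The three ranking metrics used in this evaluation can be computed as in the following sketch. The function name and input layout are assumptions; in particular, a test case whose correct location is missing from the returned ranking is assumed to contribute zero to every metric.

```python
def evaluate(ranked_predictions, truths):
    """ranked_predictions[i] is a best-first list of APs; truths[i] is the actual next AP."""
    n = len(truths)
    hit1 = hit5 = reciprocal = 0.0
    for ranking, truth in zip(ranked_predictions, truths):
        if truth in ranking:
            rank = ranking.index(truth) + 1   # 1-based position of the correct AP
            hit1 += rank == 1
            hit5 += rank <= 5
            reciprocal += 1.0 / rank
    return {"top1": hit1 / n, "top5": hit5 / n, "mrr": reciprocal / n}


# Hypothetical predictions for three test cases.
predictions = [["a", "b"], ["b", "a", "c"], ["x"]]
truths = ["a", "a", "z"]
metrics = evaluate(predictions, truths)
```

Because a rank of 1 contributes 1 to all three sums and lower ranks contribute less to MRR than to top-5 accuracy, the values always satisfy top-1 ≤ MRR ≤ top-5 when rankings hold at most five entries.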
To validate the effectiveness of our proposed algorithm, we compare our method (STS) with the following baselines:
(1) The method of [43], which has been proven to be effective in predicting future locations
(2) The method of [38], which performs well in modeling the movement of users in spatial-temporal space
(3) The method of [44], which can represent the mobility behavior of an individual in a compact yet precise way
(4) The method of [3], which takes the time of day into consideration and performs well in forecasting students’ trajectories on campus
(5) The method of [45], which considers the similarity between trajectories in terms of spatial and contextual aspects
5.2. Experiment Results
In this section, we analyze the experimental results. The influence of trajectory length is investigated first; then, inspired by Wang et al. [3], we investigate the effect of the starting time of trajectories on the prediction accuracy of our model. Next, we check the effect of the size and number of frequent subtrajectories on the performance of our framework. After that, we study the effect of the duration of stay at a specific location on the prediction performance. Then, we check the prediction accuracy for semantic sections using the Markov model, after which we investigate the location prediction accuracy of our unified framework and compare it with the baseline methods. Finally, we discuss some important findings on the trajectory prediction problem.
5.2.1. Influence of Trajectory Length
In this section, we check the influence of trajectory length on the prediction precision by employing the first-order Markov model. According to Figure 7, we varied the length of trajectories from four to ten, which covers the majority of the dataset. As can be seen from Table 7, our evaluation metrics did not vary much with the length of trajectories; the results remain stable around 0.14, 0.39, and 0.27 for top-1 accuracy, top-5 accuracy, and MRR, respectively. We can conclude that the performance of the Markov model is stable across different trajectory lengths, so in the following sections, we do not vary the trajectory length again.

5.2.2. Different Starting Time
Wang et al. [3] take the starting time into consideration in their prediction algorithm: they divide the day into multiple parts and construct a prediction model for each part, and their experiments show that prediction accuracy improves by nearly 100% compared with the basic Markov model. Inspired by that work, we test the same idea in our study by splitting the dataset according to the starting time of each trajectory. Figure 8 depicts the distribution of association times of all trajectories in the shopping mall; as can be seen from the figure, people start to pour into the shopping mall in the middle of the day, and the number of customers entering peaks at 14:00. Table 8 shows the prediction results when the starting time of a trajectory is considered. We find that the time-aware method does not substantially outperform the basic Markov model in the shopping mall. The reason may be that a student's daily location transitions on campus are strongly affected by the time at which the movement occurs (e.g., in the morning, students will probably go to the classroom), whereas users' movements in the shopping mall are flexible: people have no constraints on where to go next at a specific time. Therefore, in the rest of the study, our approach does not consider the starting time.
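The time-aware variant tested here can be sketched as one transition table per time slot. The slot boundaries (12:00 and 18:00), the record layout `(start_hour, trajectory)`, and the function names below are hypothetical choices for illustration; the section does not state how the day was partitioned.

```python
from collections import Counter, defaultdict


def time_slot(start_hour, boundaries=(12, 18)):
    """Map a starting hour to a slot index; the boundaries are hypothetical."""
    slot = 0
    for b in boundaries:
        if start_hour >= b:
            slot += 1
    return slot


def fit_per_slot(records, boundaries=(12, 18)):
    """Fit one first-order transition table per time slot.

    records: iterable of (start_hour, trajectory) pairs.
    """
    models = defaultdict(lambda: defaultdict(Counter))
    for start_hour, traj in records:
        counts = models[time_slot(start_hour, boundaries)]
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    return models


def predict(models, start_hour, current, boundaries=(12, 18)):
    """Most frequent next location in the matching slot, or None if unseen."""
    counts = models.get(time_slot(start_hour, boundaries))
    if not counts or current not in counts:
        return None
    # Highest count first; ties broken alphabetically for determinism.
    return min(counts[current].items(), key=lambda kv: (-kv[1], kv[0]))[0]
```

Because each slot sees only a fraction of the data, a per-slot model helps only when transitions really differ by time of day, which is consistent with the small gain observed in the shopping mall.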

5.2.3. Frequent Subtrajectory
In this section, we study how the size and number of frequent subtrajectories influence prediction performance. In the experiment, we vary the size of the frequent subtrajectories from two to five. Figure 9 shows the results: as the number of frequent subtrajectories increases, the prediction results improve. An interesting finding is that a smaller frequent subtrajectory size achieves better results. The explanation is that, in general, as the pattern size increases, fewer patterns appear in the last two supernodes that can be referenced for the prediction.
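The two parameters varied in this experiment, pattern size and pattern count, can be illustrated with a simple miner that counts contiguous subsequences of a fixed size and keeps the most frequent ones. This is a sketch under the assumption that frequency means the raw number of occurrences across all trajectories; the paper's own mining algorithm and support definition may differ, and the function name is hypothetical.

```python
from collections import Counter


def frequent_subtrajectories(trajectories, size, top_n):
    """Return the top_n most frequent contiguous subtrajectories of a given size.

    Each result is a (subtrajectory_tuple, count) pair, ordered by
    descending count and then lexicographically for determinism.
    """
    counts = Counter()
    for traj in trajectories:
        for i in range(len(traj) - size + 1):
            counts[tuple(traj[i:i + size])] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

On any fixed dataset, longer patterns occur less often than their shorter prefixes, which matches the explanation above for why smaller sizes give more usable patterns.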

5.2.4. Duration of Stay
Next, we investigate the influence of stay time on trajectory prediction. Figure 10 shows the results: considering stay time yields better results than ignoring it, which indicates that the temporal aspect affects prediction precision.
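This section does not restate how stay time enters the model. One plausible reading, shown here purely as an assumption, is that the duration between association and disassociation at an access point is compared against a threshold and the resulting label is attached to the location state. The labels `"short"` and `"long"`, the use of seconds, and the function name are hypothetical.

```python
def annotate_stay(points, threshold_s):
    """Attach a coarse stay-duration label to each trajectory point.

    points: list of (access_point, assoc_ts, disassoc_ts), timestamps in seconds.
    threshold_s: hypothetical duration threshold in seconds.
    """
    states = []
    for ap, t_in, t_out in points:
        duration = t_out - t_in
        label = "long" if duration >= threshold_s else "short"
        states.append((ap, label))
    return states
```

The annotated pairs can then be used as Markov states in place of bare access points, so that a long stay and a brief pass-through at the same location lead to separate transition statistics.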

5.2.5. Semantic Section Prediction
In this part, we conduct experiments to validate whether the semantic sections (shop categories) visited by customers in the shopping mall are predictable. As described earlier, we use the generated data to train the prediction model; the results are shown in Table 9. In our experiment, we use trajectories with lengths ranging from four to ten, which generate 17,399 to 162,008 store category sequences, respectively, and the forecasting accuracy of the model is around 50%. We therefore conclude that customers' movements in the shopping mall do follow certain patterns, and we surmise that integrating the shop category into our prediction model would improve the accuracy of location prediction.
5.2.6. Semantic Augmented Prediction and Comparisons with Baseline Method
In this part, we carry out experiments to examine the relationship between the AP's original weight and the semantic weight. Figure 11 depicts the results when the weight parameter ranges from 0 to 1. As can be seen from the figure, the prediction accuracy initially increases for each of the three evaluation metrics; the best prediction results are achieved when the weight parameter is around 0.8, after which the accuracy of our model slowly declines. These results indicate that employing semantic information in the indoor environment helps improve the prediction accuracy of our model.

Figure 11: panels (a), (b), and (c), prediction accuracy as the weight parameter ranges from 0 to 1.
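The weight sweep can be sketched as a linear blend of two scores for each candidate access point. Everything concrete here is an assumption for illustration: the section does not give the combination formula, so the sketch uses `weight * location score + (1 - weight) * semantic score`, and it does not claim that this is the direction in which the paper's weight parameter is defined.

```python
def combine_scores(location_probs, category_probs, ap_category, weight):
    """Rank access points by a weighted blend of location and semantic scores.

    location_probs: {ap: probability from the location model}
    category_probs: {category: probability from the semantic model}
    ap_category:    {ap: category of the shop the AP covers}
    weight:         value in [0, 1] placed on the location score (assumed form)
    """
    scores = {}
    for ap, p_loc in location_probs.items():
        p_sem = category_probs.get(ap_category.get(ap), 0.0)
        scores[ap] = weight * p_loc + (1.0 - weight) * p_sem
    # Highest blended score first; ties broken by AP name.
    return sorted(scores, key=lambda ap: (-scores[ap], ap))
```

Calling `combine_scores` for weights from 0 to 1 in small steps and scoring each ranking against held-out next locations would yield accuracy curves of the kind shown in Figure 11.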
Table 10 shows the overall prediction results of HMM, LHMM, MM, TPA, STM, and our method. As can be seen from the table, the HMM- and layered HMM-based approaches perform the worst; this may result from inappropriate hidden states, which can be investigated in future work. The Markov-based approaches perform better, achieving prediction accuracies of 0.405 and 0.455 for the Markov chain model and the TPA model, respectively. The similarity-based method STM achieves a considerably higher prediction accuracy than the aforementioned methods, which suggests that grouping the trajectories and constructing a model for each subset helps improve prediction accuracy. Our proposed approach performs the best, improving on STM by 84.6%, 28%, and 55.6% on the three evaluation metrics, respectively.
6. Conclusion and Further Work
Predicting the trajectories of users in indoor settings has drawn increasing attention with the development of Wi-Fi-enabled smartphones and free Wi-Fi services in shopping malls. Because indoor environments are complicated by doors, elevators, and rooms, the prediction task is hard even in a small area. To better understand the mobility of indoor users, we propose a unified framework that takes spatial, temporal, and contextual information into consideration. In this study, we first adopted the idea of the "supernode": we grouped frequent subtrajectories into new nodes, which enhances the ability of our model to predict trajectories in indoor environments and addresses the data sparsity problem. Next, we studied the influence of a temporal factor, the duration of stay, which further improved performance. Then, we combined the results of location prediction and semantic section prediction into a unified framework, which improved the overall prediction accuracy and shows that human movements follow certain patterns. Finally, we carried out extensive experiments on a large real-world dataset collected from an inner-city shopping center and compared our approach with baseline algorithms; the outcomes showed that our method considerably outperformed the baseline methods in forecasting human movement in indoor settings.
To this end, we summarize the strengths of our proposed method as follows: (1) by employing the subtrajectory-based second-order Markov model, we avoid the exponential growth of the transition matrix; (2) the data sparsity problem is addressed by introducing the concept of the "supernode"; (3) compared with existing methods, our model provides a solution for continuously updating the frequent patterns; (4) semantic information, which is crucial in the indoor environment, is utilized for the prediction problem. Despite these strengths, this work has limitations: (1) although the semantic aspect is employed here, the Markov-based algorithm cannot provide similarity information between trajectories, which may indicate the closeness of customers and preserve more information; (2) we used a dataset from a shopping mall, so the effectiveness of our method in other indoor environments is unclear. Further investigations will be carried out on various indoor datasets, such as those from airport terminals and hospitals, to validate the robustness of our approach. Moreover, the semantic aspect of the indoor environment can be modeled with a tree-like model or graph structure to preserve more information.
Notations
: | Wi-Fi access point in indoor settings |
: | Semantic section in the indoor environment |
: | Time stamp association with |
: | Time stamp disassociation with |
: | Indoor trajectory point which combines access point and its association and disassociation time |
: | Indoor trajectory which is a series of time-ordered Wi-Fi access points |
: | Indoor trajectory database |
: | Frequency of movement from to |
: | The transition probability of visiting a new location |
: | Transition matrix which contains all the transition probabilities |
: | Subtrajectory which is composed of adjacent trajectory points |
: | Frequent subtrajectory |
: | Duration of stay in a location |
: | Threshold for |
: | The weight of a semantic section in the indoor building. |
Data Availability
The trajectory data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Natural Science Foundation of Heilongjiang Province of China under Grant No. F2015030.