Abstract
When a Wi-Fi indoor positioning system is deployed in a real-world environment, cybersecurity threats can cause the signal strength of some access points to fluctuate widely or the signals to be lost entirely, which makes the positioning system unreliable in practice. To solve this problem, we propose a new integrated model based on a signal anomaly detector and a signal distance corrector that provides reliable position estimation when access point signals are lost under cybersecurity threats. The signal anomaly detector improves the recognition of uncertain signals and noise, while the signal distance corrector improves robustness and fault tolerance for highly variable Wi-Fi signals. To evaluate the proposed method, experiments have been carried out in a real indoor parking lot. The results show that the proposed integrated model successfully provides reliable position estimation when access points are lost under cybersecurity threats.
1. Introduction
In recent years, with the rapid development of computer science and mobile communication and the increasing market share of smartphones, tablet PCs, and other devices, the need for location services has been growing at an unprecedented pace. Many potential applications are emerging, such as real-time vehicle information services [1], traffic guidance information services, and parking guidance information services [2]. After years of development, the Global Positioning System is mature enough to meet people's needs for outdoor positioning. However, wireless signals from satellites and network base stations are inevitably obstructed by indoor structures, resulting in large signal deviations and a failure to obtain accurate positions indoors [3].
At present, a variety of technologies are used for indoor positioning, including Ultrasonic Positioning [4, 5], Geomagnetic Positioning [6], Bluetooth Positioning [7, 8], UWB Positioning [9], and Wi-Fi Positioning [10, 11]. Among these technologies, Wi-Fi Positioning is the most prominent, as it offers a wide communication range, low cost, and convenient deployment [12, 13]. Moreover, almost all mobile terminals have built-in wireless network cards that can measure Wi-Fi signal strength, which can be used for indoor positioning [14, 15]. Wi-Fi indoor positioning technology therefore has strong prospects, and research on it remains meaningful and valuable.
The Wi-Fi RSSI fingerprinting method is recognized as the main technique, in preference to the geometric location method, because it is flexible, is easy to apply, and does not require the physical locations of the access points [16, 17]. It consists of two phases: an offline phase and an online phase. In the offline phase, a fingerprint database with position labels is constructed from the received signal strength indicator (RSSI) measured at various reference points and the spatial coordinates of those reference points. In the online phase, a location is estimated by a fingerprinting algorithm that compares the RSSI collected at the target point with the fingerprint database constructed in the offline phase.
The Wi-Fi RSSI fingerprinting method is practical and effective in most cases. However, when developing a Wi-Fi indoor positioning system in a real-world environment, there are several problems we have to face. With the large-scale deployment of Wi-Fi infrastructure, cybersecurity threats to indoor positioning environments are becoming more serious, so that the signal strength of some access points fluctuates widely or the signals are lost entirely. In other words, the target point cannot obtain the RSSI that it should have received from some access points. This directly invalidates the fingerprinting algorithm, and the indoor positioning system cannot be applied reliably in a real-world environment.
To address this problem, researchers have explored various approaches based on rules, hybrid schemes, game theory, and graphs [18, 19], and have discussed many anomaly detection models, e.g., models based on rules, clustering, support vector machines, nearest neighbors, and spectral decomposition [20, 21]. Zhang et al. [22] proposed an ellipse-type support vector machine to model the behavior attributes of sensor data in wireless networks. However, the ellipse-type support vector machine method involves a secondary optimization problem, which makes it impossible to implement on networks deployed in remote and harsh environments and unsuitable for general WLAN environments. Paola et al. [23] proposed an Adaptive Distributed Bayesian Approach for identifying outliers in data collected through wireless sensor networks, but the Bayesian approach is not adaptive when offline, and it generalizes poorly in practice. Mohammad Wazid [24] proposed k-means clustering to detect outliers, together with a mixed outlier detection method that obtains parameter thresholds. Yenke et al. [25] proposed a distributed anomaly detection scheme based on Mahalanobis and Euclidean distance, which uses nearest neighbor search to improve the effectiveness of the algorithm. However, this method does not take into account the characteristics of access points, resulting in poor anomaly detection. Moreover, none of these methods provides a reliable way to estimate location after an abnormality has been detected.
The emphasis of this work is to provide reliable position estimation when access point signals are lost under cybersecurity threats. Without knowing the physical locations of the access points in the environment, we propose a new integrated model based on a signal anomaly detector and a signal distance corrector. In the offline phase, considering that the Wi-Fi fingerprint database can be fitted to an n-dimensional surface in signal space, the signal anomaly detector is constructed based on signal distortion theory and is trained through repeated comparison and analysis. In the online phase, for an unlabeled RSSI sample from the mobile terminal, the signal anomaly detector is used for online anomaly estimation, and the signal distance corrector is used for online distance correction. To evaluate the proposed method, experiments are carried out in a real indoor parking lot. The results show that the proposed integrated model yields higher anomaly detection accuracy and lower mean positioning error, which makes it applicable in cases where access points are lost under cybersecurity threats.
The rest of the paper is organized as follows. Section 2 introduces the related work of this paper, including Wi-Fi fingerprint data acquisition system, Wi-Fi real-time positioning system, and software architecture. Section 3 describes a new integrated model based on signal anomaly detector and signal distance corrector in detail. Section 4 presents the experimental design and results analysis. Section 5 summarizes the full paper and proposes suggestions for further research.
2. Related Work
2.1. Wi-Fi Fingerprint Data Acquisition System
The Wi-Fi fingerprint data acquisition system mainly completes the data acquisition of indoor Wi-Fi fingerprint signals in the offline phase and saves the collected Wi-Fi fingerprint signal data in the database as a data set for model training in subsequent work. The structure of Wi-Fi fingerprint data acquisition system is shown in Figure 1.

As shown in Figure 1, the fingerprint data acquisition system includes a client and a server. The client is an App based on the Android system, which mainly includes functions such as collecting Wi-Fi fingerprint data, uploading Wi-Fi fingerprint data, and displaying data upload results. The server includes a Web Server and a database. The Web Server responds to data collection requests, processes the data structure, and stores the processed data in the database. The database uses SQL Server to store massive Wi-Fi fingerprint data.
2.2. Wi-Fi Real-Time Positioning System
The Wi-Fi real-time positioning system mainly completes the data matching of indoor Wi-Fi fingerprint signals in the online phase. The entire system relies on the location fingerprint database collected in the offline phase and the online positioning model obtained in the offline training phase. The structure of Wi-Fi real-time positioning system is shown in Figure 2.

As shown in Figure 2, the real-time positioning system also includes a client and a server. The client is another App based on the Android system, which scans the signals of the Wi-Fi access points around the user in real time, sends the scanned Wi-Fi information to the server, and receives the result returned by the server. The real-time location is then displayed on the map. The server mainly uses the Web Server to control the data flow and Python for data preprocessing. The trained signal anomaly detector and signal distance corrector are applied to obtain the current position coordinates, and the positioning result is returned to the client.
2.3. System Software
In this article, the system software is composed of a client, a server, and a database. The client is based on the Android system, the server is deployed on Tomcat 8.5, and the database is a Microsoft SQL Server relational database. In addition, the proposed integrated model is built in the Anaconda development environment with Python 3.6.
3. Proposed Methods
When the signal strength of some access points fluctuates widely or the signals are lost entirely due to cybersecurity threats, the indoor positioning system cannot provide satisfactory positioning accuracy in most situations. To address this problem, we propose a new integrated model based on a signal anomaly detector and a signal distance corrector to provide reliable position estimation when access point signals are lost under cybersecurity threats. Figure 3 shows the process of data stream analysis. The process mainly contains two steps: the signal anomaly detector for online anomaly estimation and the signal distance corrector for online distance correction.

3.1. Signal Anomaly Detector
In this section, we outline the proposed signal anomaly detector. Firstly, we assume that the fingerprint of each reference point is composed of the received signal strength vector measured at that reference point, with one component for each of the n APs, and the spatial coordinates of the reference point. If the fingerprint data set of the reference points can be fitted to an n-dimensional surface, each received signal strength vector can be expressed as a point in n-dimensional signal space. For the received signal strength vector measured at the point to be located, the minimum distance from that point to the n-dimensional surface can be obtained by calculating the distance between the vector and the surface. By this geometric argument, in the normal situation the minimum distance must converge to zero or stay within a certain range, while in the abnormal situation it grows beyond that range. Therefore, if the minimum distance does not stay within the range, we can determine that the situation is abnormal. Figure 4 shows a sketch of signal distortion.

The above signal anomaly detection method uses a Euclidean distance threshold to judge whether there is an abnormality, which is applicable to anomaly detection in most cases [26, 27]. However, for a Wi-Fi positioning system, the problem we must face is that AP signal strength and signal stability differ at each point to be located, because the positioning area is large and the environment is extremely complex. If the above method were applied directly to a Wi-Fi positioning system, it would estimate the Euclidean distance over all APs, which may exaggerate small signal changes, so that many normal signal changes would be judged as abnormal signal loss and accurate signal anomaly detection would be difficult.
To solve this problem, we propose an improved signal anomaly detection algorithm that distinguishes trusted APs from untrusted APs. When estimating Euclidean distance in signal space, the proposed algorithm considers only the signal space distortion of the trusted APs and ignores that of the untrusted APs, which suppresses the effect of small signal changes and makes it easier to judge accurately whether there is an anomaly. The construction of the improved signal anomaly detection algorithm is described in detail as follows.
Firstly, consider the set of all n APs and the signal strength received from each AP at the point to be located. The distance from each AP to the point to be located is different, so the signal strength of each AP at that point is different. Moreover, because some APs are far from the point to be located, their signals may not be detected there at all. These undetected signals may exaggerate small signal changes, so that many normal signal changes are judged as abnormal signal loss. Therefore, we define the APs whose signals cannot be detected at the point to be located as the untrusted APs, and the APs whose signals can be detected there as the trusted APs.
Secondly, we calculate the minimum Euclidean distance separately over each of the two AP sets, using formulas (1) and (2).
Finally, according to the rules proposed above, when estimating Euclidean distance in signal space, we consider only the minimum Euclidean distance over the trusted APs and ignore the one over the untrusted APs. When the minimum Euclidean distance over the trusted APs is greater than the distance threshold, it is judged that there is AP loss among the trusted APs, which is the abnormal situation; when it is less than the distance threshold, it is judged that there is no AP loss among the trusted APs, which is the normal situation. The anomaly criterion that defines the abnormal state is expressed by formula (3).
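The detection rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy database are hypothetical, and it assumes that the APs detectable at a location are the ones with a recorded value (anything other than the −90 placeholder used in Section 4.1) in the reference fingerprint for that location, which is only one reading of the definition above. The 14 dBm threshold is the value reported in Section 4.3.

```python
import numpy as np

MISSING_RSSI = -90.0  # placeholder recorded for an AP that cannot be detected


def min_trusted_distance(fingerprints, sample, missing=MISSING_RSSI):
    """Minimum Euclidean distance from `sample` to the reference fingerprints.

    Assumption: for each reference point, only APs with a recorded signal
    (value != `missing`) count as trusted; all other APs are skipped.
    """
    fingerprints = np.asarray(fingerprints, dtype=float)  # shape (k, n)
    sample = np.asarray(sample, dtype=float)              # shape (n,)
    trusted = fingerprints != missing
    diff = np.where(trusted, fingerprints - sample, 0.0)
    return float(np.sqrt((diff ** 2).sum(axis=1)).min())


def is_abnormal(fingerprints, sample, threshold):
    """Abnormal when the trusted distance exceeds the distance threshold."""
    return min_trusted_distance(fingerprints, sample) > threshold


# Hypothetical database: two reference points, three APs.
database = [[-50.0, -60.0, -90.0],
            [-70.0, -55.0, -65.0]]
normal = [-52.0, -61.0, -88.0]  # close to the first reference point
lost = [-90.0, -61.0, -88.0]    # the first AP is no longer heard

print(is_abnormal(database, normal, threshold=14.0))  # False
print(is_abnormal(database, lost, threshold=14.0))    # True
```

In the normal sample, the weak −88 reading from the third AP is ignored for the first reference point, so it does not inflate the distance; in the degraded sample, the missing first AP pushes the distance to every reference point above the threshold.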
3.2. Signal Distance Corrector
As mentioned above, the Wi-Fi fingerprinting method combines an offline phase and an online phase [28]. In the offline phase, the goal is to acquire the fingerprint data. The fingerprint data set contains one fingerprint for each of the k reference points. The fingerprint of the i-th reference point is composed of the spatial coordinates of that reference point and the received signal strength vector measured there, whose j-th component is the RSSI of the j-th AP collected at the i-th reference point. In the online phase, the goal is to estimate the location of the mobile terminal using the model. The position estimate is obtained by calculating the similarity between the received signal strength vector at the point to be located and the received signal strength vectors in the fingerprint data set. Figure 5 shows the overall framework of the Wi-Fi fingerprinting method.

In addition, K-Nearest Neighbor (KNN) is usually used in the online phase [29]. KNN usually uses Euclidean distance to calculate similarity in signal space, and Euclidean distance can be calculated according to formula (4). The smaller the distance between them, the higher the similarity between them. Usually, the coordinates of the first K reference points are selected. The distance weighted K-Nearest Neighbor (DW-KNN) is based on the KNN [30], where the K nearest neighbors can be obtained by sorting according to formulae (5) and (6).
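As an illustration of the matching step, the following Python sketch uses the common inverse-distance form of DW-KNN. Formulae (4) to (6) are not reproduced in this text, so the exact weighting is an assumption, and the function name `dw_knn`, the small constant `eps`, and the toy data are hypothetical. The default `k=7` follows the value selected in Section 4.2.

```python
import numpy as np


def dw_knn(fingerprints, coords, sample, k=7, eps=1e-6):
    """Distance-weighted KNN position estimate (inverse-distance weights)."""
    fingerprints = np.asarray(fingerprints, dtype=float)  # (m, n) RSSI vectors
    coords = np.asarray(coords, dtype=float)              # (m, 2) positions
    sample = np.asarray(sample, dtype=float)              # (n,) online RSSI vector
    dist = np.linalg.norm(fingerprints - sample, axis=1)  # Euclidean distance
    nearest = np.argsort(dist)[:k]                        # K most similar points
    weights = 1.0 / (dist[nearest] + eps)                 # closer points weigh more
    weights /= weights.sum()
    return weights @ coords[nearest]                      # weighted mean position


# Hypothetical database: three reference points, two APs.
fingerprints = [[-50.0, -60.0], [-60.0, -50.0], [-80.0, -80.0]]
coords = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
estimate = dw_knn(fingerprints, coords, [-55.0, -55.0], k=2)
print(estimate)  # the two nearest points are equally distant, so this is their midpoint (1, 0)
```

The constant `eps` only guards against division by zero when the online sample coincides with a stored fingerprint.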
DW-KNN can be applied to Wi-Fi positioning estimation in most cases. However, for cybersecurity threats, the problem we must face is that the target point cannot obtain the received signal strength indicator that should have been received from some access points. When there is AP loss, if Euclidean distance is still calculated according to formula (4), the lost AP may make the K nearest neighbors obtained by sorting according to formulae (5) and (6) no longer reliable, leading to the invalidity of fingerprinting algorithm.
To address this problem, we propose a new fingerprint matching method based on DW-KNN. When the signal anomaly detector determines that there is AP loss, the proposed algorithm considers only the signal space distortion of the trusted APs and ignores that of the untrusted APs, which greatly reduces the matching error caused by signal loss. Therefore, when the signal anomaly detector determines that there is AP loss, the corrected Euclidean distance is calculated according to formula (7), while when the signal anomaly detector determines that there is no AP loss, the Euclidean distance is still calculated according to formula (8). The resulting value is taken as the new Euclidean distance, and the fingerprint matching criterion is expressed by formula (9).
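The switch between the corrected and the ordinary distance can be sketched in Python as follows. This is an illustrative reading rather than the authors' formulae (7) to (9): the name `matching_distance` and the toy data are hypothetical, and the sketch assumes that an AP is untrusted when the online sample carries the −90 placeholder for it.

```python
import numpy as np

MISSING_RSSI = -90.0  # placeholder recorded for an AP that cannot be detected


def matching_distance(fingerprints, sample, abnormal, missing=MISSING_RSSI):
    """Distance from `sample` to every reference fingerprint.

    abnormal=True  -> corrected distance over trusted APs only
    abnormal=False -> ordinary Euclidean distance over all APs
    """
    fingerprints = np.asarray(fingerprints, dtype=float)  # shape (k, n)
    sample = np.asarray(sample, dtype=float)              # shape (n,)
    diff = fingerprints - sample
    if abnormal:
        # Assumption: APs not heard in the online sample are untrusted.
        diff = diff[:, sample != missing]
    return np.sqrt((diff ** 2).sum(axis=1))


# Hypothetical database: the sample was taken at reference point 0,
# but the first AP has been lost.
database = [[-50.0, -60.0, -70.0],
            [-80.0, -70.0, -80.0]]
sample = [-90.0, -61.0, -71.0]

plain = matching_distance(database, sample, abnormal=False)
corrected = matching_distance(database, sample, abnormal=True)
print(int(plain.argmin()), int(corrected.argmin()))  # 1 0
```

With the ordinary distance the lost AP dominates and the wrong reference point is nearest; with the corrected distance the remaining two APs identify reference point 0.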
4. Experiments and Discussion
4.1. Experiment Environment
To evaluate the proposed strategy, we conducted an experiment in the indoor parking lot of a shopping mall in China. The parking lot consists of 150 parking spaces and covers an area of 2,000 square meters. Figure 6 shows a view of the indoor parking lot. Each rectangular grid cell represents a parking space of 2.0 m × 5.0 m. The signal icons mark the Wi-Fi access points; 16 access points were installed in the indoor parking lot. The AP device used in the experiments is shown in Figure 7.


To enhance the robustness of the positioning system, three different types of mobile phones (Huawei Honor 3C, Xiaomi Mix 2, and 360 N5) are used in this paper. Data were collected facing east, south, west, and north in turn to avoid the influence of the collector's body on data collection. In this way, we collected 100 training samples for the four orientations at every parking space. To account for scene changes in real applications, the training data and test data were collected in different periods: one week later, we collected 10 test samples for the four orientations at every parking space in the same way. Because some APs are far from the point to be located, some signals may not be detected there, and we record each such missing value as −90 dBm. The data storage structure is shown in Figure 8.
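The handling of missing values can be sketched as below. The scan layout (a dictionary from AP identifier to RSSI) and the identifiers are hypothetical, since the storage structure of Figure 8 is not reproduced here; only the −90 placeholder comes from the text.

```python
MISSING_RSSI = -90.0  # placeholder recorded for an AP that is not detected


def to_rssi_vector(scan, ap_ids, missing=MISSING_RSSI):
    """Turn one scan ({AP identifier: RSSI}) into a fixed-order RSSI vector."""
    return [float(scan.get(ap, missing)) for ap in ap_ids]


ap_ids = ["ap01", "ap02", "ap03"]   # hypothetical identifiers, fixed order
scan = {"ap01": -48, "ap03": -71}   # "ap02" was not detected in this scan
vector = to_rssi_vector(scan, ap_ids)
print(vector)  # [-48.0, -90.0, -71.0]
```

Keeping the AP order fixed means every sample has the same length, so training and test vectors can be compared component by component.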

To compare the performance of the various methods, three measures are used: error distance, precision, and accuracy. The error distance is the Euclidean distance between the estimated coordinates and the true coordinates. The precision is another indicator of positioning performance, commonly described by the cumulative distribution function (CDF) of the error distance. The accuracy is the average error distance over all points to be located. The smaller the average error distance, the higher the accuracy, and vice versa.
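A minimal Python sketch of the three measures, with hypothetical function names and toy coordinates, might look like this:

```python
import numpy as np


def error_distances(estimated, actual):
    """Euclidean distance between each estimated and true position."""
    estimated = np.asarray(estimated, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.linalg.norm(estimated - actual, axis=1)


def empirical_cdf(errors, radius):
    """Fraction of test points whose error distance is at most `radius`."""
    return float(np.mean(np.asarray(errors) <= radius))


# Hypothetical estimates and ground truth for three test points (meters).
estimated = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
actual = [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0]]

errors = error_distances(estimated, actual)
print(errors)                      # [0. 5. 1.]
print(float(errors.mean()))        # 2.0 (average error distance)
print(empirical_cdf(errors, 1.0))  # two of three points are within 1 m
```

Evaluating `empirical_cdf` over a range of radii gives the CDF curve used to describe precision.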
4.2. The Necessity of the Proposed Integrated Model
In this section, to verify the necessity of the proposed signal anomaly detector and signal distance corrector, we adopt DW-KNN as the positioning classifier, as it has the advantage of magnifying slight changes. Before constructing DW-KNN, we need to obtain the optimal number K for our experiments. Figure 9 shows the variation of the average positioning error with K, where the error bars represent the standard deviation. As K increases, the average positioning error first gradually decreases and then slowly increases. The average positioning error reaches its minimum when K equals seven, where it is 2.59 m with a standard deviation of 1.91 m. In other words, the DW-KNN positioning classifier performs best for Wi-Fi signal positioning when K equals seven.

On the basis of the DW-KNN positioning classifier, we need to obtain the variation of the average positioning error in the case of access point loss. To show the variation of positioning accuracy more intuitively, we introduce the access points loss ratio α% in each observation. Figure 10 shows the variation of the average positioning error with the access points loss ratio. The cumulative distribution function (CDF) of the error distance for different access points loss ratios is shown in Figure 11. The average positioning error keeps growing as the access points loss ratio increases. When the access points loss ratio α = 10%, the average positioning error increases to 4.16 m, which is an increase of 31.7% compared with normal conditions, even though a loss ratio of 10% corresponds to only one or two access points out of at most sixteen in each observation. When the access points loss ratio α = 25%, the average positioning error increases to 5.13 m, which is an increase of 98.1% compared with normal conditions. It is worth mentioning that when the access points loss ratio α = 55%, the average positioning error is as high as 14.06 m. These results show that the proposed signal anomaly detector and signal distance corrector are indispensable.
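The text does not state how access point loss is produced for a given loss ratio. One plausible way to simulate it, shown here purely as an assumption, is to overwrite a random fraction of the APs in each observation with the −90 placeholder. The function name `drop_access_points` and the synthetic sample are hypothetical.

```python
import numpy as np

MISSING_RSSI = -90.0  # placeholder recorded for an AP that cannot be detected


def drop_access_points(sample, loss_ratio, rng, missing=MISSING_RSSI):
    """Return a copy of `sample` with a fraction `loss_ratio` of APs lost."""
    sample = np.array(sample, dtype=float)          # work on a copy
    n_lost = int(round(loss_ratio * sample.size))   # e.g. 25% of 16 APs -> 4
    lost = rng.choice(sample.size, size=n_lost, replace=False)
    sample[lost] = missing
    return sample


rng = np.random.default_rng(0)
clean = np.linspace(-40.0, -85.0, 16)   # synthetic 16-AP observation
degraded = drop_access_points(clean, 0.25, rng)
print(int((degraded == MISSING_RSSI).sum()))  # 4
```

Applying this to every test sample at several loss ratios and re-running the classifier would yield an error curve of the kind plotted against the loss ratio.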


4.3. Construction of Signal Anomaly Detector
As mentioned in Section 3, constructing the signal anomaly detector requires determining the distance threshold. The distance threshold can be obtained from the cumulative distribution function of the minimum Euclidean distance. Both the positive detection rate and the false detection rate decrease as the distance threshold increases, and both increase as it decreases. The goal of distance threshold selection is therefore to balance the positive detection rate against the false detection rate. To verify that the proposed signal anomaly detector achieves better anomaly detection performance, we adopt for comparison a traditional anomaly detection method that does not distinguish between trusted APs and untrusted APs.
Figure 12 shows the cumulative distribution function of the minimum Euclidean distance for different access points loss ratios in the traditional anomaly detection method. The corresponding cumulative distribution function (CDF) for the improved signal anomaly detector is shown in Figure 13. When access points are lost, the curves of both methods move to the right, and the greater the access points loss ratio, the more obvious the right shift. But whatever ratio of APs is lost, the curve of the improved signal anomaly detector moves further to the right than that of the traditional anomaly detection method. This indicates that the improved signal anomaly detector is more sensitive.
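The trade-off can be made concrete with a short Python sketch. The selection rule used here (the smallest candidate threshold whose false detection rate stays under a cap) is only one possible way to strike the balance and is not taken from the paper; the function names and the toy distances are hypothetical.

```python
import numpy as np


def detection_rates(normal_dist, abnormal_dist, threshold):
    """Positive and false detection rates for one distance threshold."""
    positive = float(np.mean(np.asarray(abnormal_dist) > threshold))
    false = float(np.mean(np.asarray(normal_dist) > threshold))
    return positive, false


def pick_threshold(normal_dist, abnormal_dist, candidates, max_false_rate):
    """Smallest candidate threshold whose false detection rate is acceptable."""
    for t in sorted(candidates):
        positive, false = detection_rates(normal_dist, abnormal_dist, t)
        if false <= max_false_rate:
            return t, positive, false
    return None


# Hypothetical minimum distances for normal and AP-loss samples.
normal_dist = [2.0, 5.0, 8.0, 15.0]
abnormal_dist = [10.0, 16.0, 20.0, 30.0]

print(detection_rates(normal_dist, abnormal_dist, 14.0))             # (0.75, 0.25)
print(pick_threshold(normal_dist, abnormal_dist, [5, 10, 14, 20], 0.25))  # (10, 0.75, 0.25)
```

Because both rates fall as the threshold rises, the smallest threshold that meets the false-rate cap also gives the highest positive detection rate among the acceptable candidates.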


As shown in Figure 12, analysis of the curves gives a best distance threshold of 20 dBm for the traditional anomaly detection method. With this threshold, when 10%, 20%, 30%, 40%, 50%, and 60% of the APs are lost, the positive detection rate is 40%, 42%, 69%, 76%, 83%, and 88%, respectively, and the corresponding false detection rate is 4%. When the AP loss ratio is less than 30%, the traditional signal anomaly detector is of limited practical use, so the traditional anomaly detection method needs to be improved.
As shown in Figure 13, analysis of the curves gives a best distance threshold of 14 dBm for the improved signal anomaly detector. With this threshold, when 10%, 20%, 30%, 40%, 50%, and 60% of the access points are lost, the positive detection rate is 60%, 64%, 82%, 85%, 93%, and 95%, respectively, and the corresponding false detection rate is 1%. For AP loss ratios of 10%, 20%, and 30%, the positive detection rate increases by 47%, 49%, and 19%, respectively, compared with the traditional anomaly detection method, and the false detection rate of the system is also significantly reduced. This verifies that the proposed signal anomaly detector achieves reliable anomaly detection performance.
4.4. Construction of Signal Distance Corrector
On the basis of the signal anomaly detector in the previous step, we propose a new fingerprint matching method based on DW-KNN for the abnormal situation. When the signal anomaly detector determines that there is AP loss, the proposed algorithm considers only the signal space distortion of the trusted APs and ignores that of the untrusted APs, which greatly reduces the matching error caused by signal loss.
Figure 14 shows the variation of the average positioning error with the access points loss ratio. The cumulative distribution function (CDF) of the error distance for different access points loss ratios is shown in Figure 15. When the access points loss ratio α = 10%, the average positioning error increases to 3.12 m, which is an increase of 20.4% compared with the normal situation. When the access points loss ratio α = 25%, the average positioning error increases to 3.62 m, which is an increase of 39.7% compared with the normal situation. It is worth mentioning that even when the access points loss ratio α = 55%, the average positioning error grows only gradually.


To verify that the proposed signal distance corrector achieves better positioning performance in the abnormal situation, the traditional DW-KNN algorithm is used for comparison. The mean positioning error of the traditional DW-KNN algorithm and of the improved signal distance corrector for different access points loss ratios is shown in Figure 16. The average error of the improved DW-KNN increases significantly more gently than that of the traditional DW-KNN. If the acceptable average positioning error is within 5.00 m, the traditional DW-KNN algorithm can tolerate 25% AP loss in the observations, while the improved signal distance corrector can tolerate approximately 60%. It thus shows strong robustness to access point loss, which indicates that the improved signal distance corrector is more reliable under abnormal conditions. It is worth mentioning that when the access points loss ratio α = 80%, the improved DW-KNN reduces the average error by up to about 20 m compared with the traditional DW-KNN. This verifies that the proposed signal distance corrector achieves reliable distance correction performance.

5. Conclusion and Future Research
In this paper, we propose a new integrated model based on a signal anomaly detector and a signal distance corrector. The signal anomaly detector improves the recognition of uncertain signals and noise, while the signal distance corrector improves robustness and fault tolerance for highly variable Wi-Fi signals. In the offline phase, considering that the Wi-Fi fingerprint database can be fitted to an n-dimensional surface in signal space, the signal anomaly detector is constructed based on signal distortion and is trained through repeated comparison and analysis. In the online phase, for an unlabeled RSSI sample from the mobile terminal, the signal anomaly detector is used for online anomaly estimation, and the signal distance corrector is used for online distance correction. To evaluate the proposed method, experiments have been carried out in a real indoor parking lot. The results show that the proposed integrated model successfully provides reliable position estimation when access point signals are lost under cybersecurity threats.
In the future, we are planning to solve the problem of best-discriminating AP optimization in large-scale complex environments with partition walls. In addition, we plan to integrate other mobile phone sensors (such as Bluetooth and Geomagnetism) to obtain better positioning precision.
Data Availability
The data are true and reliable, and the original data have been saved in the attachment.
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
H. W. conceptualized the study. Z. Y. performed data curation. Z. Y. performed investigation. Z. Y. wrote the original draft. Y. C., Z. C., and X. Z. reviewed & edited the manuscript. All authors have read and agreed to the published version of the manuscript.
Acknowledgments
This work was supported by Natural Science Foundation of China (NSFC) under Grant nos. 62073250 and 62003249, in part by Key Research and Development Program of China under Grant 2017YFC0806503-05, in part by Key Research and Development Program of Hubei Province under Grant 2020BAB021, and in part by Science and Technology Research Project of Hubei Provincial Department of Education under Grant D20201105.