Abstract

Cross-technology communication (CTC) enables direct communication among heterogeneous wireless devices (e.g., WiFi, ZigBee, and Bluetooth in the 2.4 GHz ISM band) without gateway equipment for forwarding, which makes heterogeneous wireless communication more convenient and greatly reduces communication costs. However, compared with the traditional homogeneous network model, CTC also makes spoofing attacks easier to implement in heterogeneous networks. WiFi devices with long communication range and sufficient energy supply can directly launch spoofing attacks against ZigBee devices, which raises severe security concerns for heterogeneous wireless communications. In this paper, we focus on the CTC spoofing attack, especially spoofing attacks from WiFi to ZigBee, and propose a machine learning-based method that uses physical-layer information to detect spoofing attacks in heterogeneous wireless networks. First, we model the received signal strength (RSS) data of legitimate ZigBee devices and use the resulting training samples to construct a one-class support vector machine (OSVM) classifier for detecting CTC spoofing attacks. Then, we simulate CTC spoofing attacks in a live testbed and evaluate the performance of our detection method. Results show that our approach is highly effective in spoofing detection. Even when the WiFi attacker is close to the legitimate ZigBee device (i.e., less than 2 m away) and only a small number of samples is available, the detection rate and precision of our method are both over 90%. Finally, we employ the OSVM classifier to obtain samples of spoofing attacks and then explore using SVM to further improve the performance of the classifier.

1. Introduction

In recent decades, the growing demand for wireless communications has raised communication security issues. With the rapid development of Internet-of-Things (IoT) technology, the unprecedented proliferation of wireless devices has brought great convenience to our lives. According to a recent report [1], the number of IoT devices is expected to reach 55 billion by 2025, which will lead to intense coexistence of wireless technologies [2]. Many of today’s wireless technologies, such as WiFi, ZigBee, and Bluetooth, coexist and share the unlicensed spectrum (e.g., the 2.4 GHz ISM band), which inevitably forces wireless devices to compete for the channel and interfere with each other [3]. Besides, because the wireless transmission medium is open, the CTC spoofing attack, a new type of attack, is easy to implement and can impair network performance significantly.

As an emerging research area, CTC provides a promising direction for direct communication between heterogeneous wireless devices [4]. According to the layer at which modulation is performed, existing CTC works can be divided into two categories: packet-level modulation [5] and physical-level modulation [6]. Specifically, compared with coarse-grained packet-level modulation, physical-level modulation can achieve high throughput by directly emulating the heterogeneous signals in the physical layer [7]. Unfortunately, security has not always been considered in the design of CTC, and thus, applications of CTC could suffer from severe security problems [8]. In traditional wireless networks, spoofing attacks usually occur in homogeneous networks, that is, a ZigBee device attacks another ZigBee device, or a WiFi device attacks another WiFi device. However, CTC allows spoofing attacks to occur in heterogeneous networks, where WiFi devices can be used to directly attack ZigBee devices. For example, suppose that a WiFi transmitter is malicious or has been compromised by an attacker. It could send spoofed packets in the same frequency band to control a Bluetooth or ZigBee receiver via CTC.

The large-scale deployment of wireless devices has attracted a large number of malicious attacks, and in particular, the security issues of identity-based spoofing attacks have become extremely challenging. For example, in IEEE 802.11 wireless LANs (WLANs), it is effortless for an adversary to change its MAC address and then masquerade as an authorized wireless access point (AP) by simply issuing the ifconfig command [9]. Besides, a spoofing attack is considered the first step of several other types of attacks, such as traffic injection attacks, session hijacking, man-in-the-middle attacks, and various types of denial-of-service (DoS) attacks [9–11]. A variety of cryptographic authentication methods are commonly employed to prevent spoofing attacks in homogeneous networks. However, cryptographic authentication requires extra infrastructural overhead as well as key distribution, management, and maintenance mechanisms [9, 11, 12]. Because of the limited power and resources of wireless sensors, these cryptographic schemes are not always desirable [13]. In light of these circumstances, some advances [14] in noncryptographic mechanisms provide promising opportunities for securing CTC in heterogeneous wireless networks.

In this paper, we focus on CTC-based spoofing attacks, especially from WiFi to ZigBee, and propose to use RSS as the basis for detecting spoofing. RSS is a physical property that is correlated with both the environmental conditions and the distance between the sender and the receiver, and it does not depend on cryptography. Specifically, widely deployed WiFi devices with a longer transmission range can easily launch CTC spoofing attacks while short-range ZigBee devices communicate with each other. The WiFi device masquerades as a ZigBee device and sends spoofed data packets to other ZigBee devices in the same frequency band, and the ZigBee devices cannot distinguish whether the data come from the WiFi device or from other ZigBee devices.

Compared with spoofing attacks in traditional homogeneous networks, spoofing attacks in cross-technology communication are more difficult to monitor, mainly because (1) WiFi devices have a longer transmission distance, so a WiFi signal covers a wider range and can spoof more ZigBee devices, and (2) WiFi devices are usually AC-powered, so their energy supply is sufficient to broadcast spoofing signals to ZigBee devices continuously. To counteract such spoofing attacks over CTC links, we propose to detect them with machine learning algorithms based on the spatial correlation of RSS. Furthermore, our method does not require additional overhead, and wireless devices and sensors do not need to be modified.

The contributions are summarized as follows:
(1) We study the CTC spoofing attack from a WiFi device to a ZigBee device using machine learning methods grounded on the physical property of RSS.
(2) We propose two classifiers based on the OSVM and SVM models. In the first one, we model the RSS data of legitimate ZigBee devices and use the resulting training samples to construct an OSVM classifier for detecting CTC spoofing attacks. In the second one, we use SVM, trained on the classification results of OSVM, to further improve the performance of the classifier when large-scale spoofing attacks break out in the network.
(3) We simulate CTC spoofing attacks in a live testbed and evaluate the performance of our detection method. Results show that our approach is highly effective in spoofing detection. Even when the WiFi attacker is close to the legitimate ZigBee device (i.e., less than 2 m away) and only a small number of samples is available, the detection rate and precision of our method are both over 90%.

The remainder of this paper is organized as follows. Section 2 reviews the related work, and Section 3 introduces the preliminary background. Section 4 presents the system design in detail. Section 5 contains the experimental results, and Section 6 concludes the paper.

2. Related Work

The traditional security approach to preventing spoofing attacks is cryptographic authentication [15–18]. Wu et al. [15] proposed a framework based on secure and efficient key management (SEKM). The work in [16] introduced a key management mechanism based on periodic key refresh and host revocation to avoid the leakage of authentication keys. Bohge and Trappe [17] proposed an authentication framework for hierarchical, ad hoc sensor networks. In addition, in [18], the authors implemented binding approaches based on cryptographically generated addresses (CGA) to defend against spoofing attacks. However, because of the limited power and resources of wireless devices and sensor nodes, it is not always desirable to deploy these cryptographic schemes.

Some advances based on physical properties associated with wireless transmission provide promising opportunities for detecting spoofing attacks. Faria and Cheriton [19] proposed to detect identity-based attacks in wireless networks using signalprints. A signalprint is defined as the vector of median RSS values for a MAC address across multiple air monitors. The work in [10] observed that, as a result of antenna diversity, RSS readings tend to follow a mixture of multiple Gaussian distributions, and further proposed to build legitimate RSS profiles with the Gaussian mixture model (GMM) clustering algorithm. The research in [13] proposed a method based on the K-means clustering algorithm to detect and localize MAC address spoofing in both 802.11 WLANs and 802.15.4 ZigBee networks. The strategy proposed in [9] utilized the K-medoids algorithm to detect spoofing attacks and then determined the number of attackers and localized multiple adversaries. K-medoids is preferred over K-means there because it is more robust to noise and outliers in the data.

Early work focused on how to avoid, mitigate, or tolerate cross-technology interference (CTI) [20–24]. Recent advances in CTC are expected to settle the issue of CTI and establish direct communication across technologies. According to the layer at which modulation is performed, existing CTC works can be divided into two categories: packet-level modulation and physical-level modulation. In Esense [25], GSense [22], and FreeBee [26], RSS is used to measure WiFi signals to enable communication between WiFi and ZigBee devices. In contrast to existing CTC designs that deploy packet-level modulation using the packet length [25], timing [26], energy level [27], and sequence patterns [28, 29], WEBee is the first physical-layer CTC design; it carefully fills the payload of a high-speed WiFi frame to directly emulate a low-speed ZigBee frame. In addition, similar to WEBee, TwinBee [30] and LongBee [31] enable CTC via physical signal emulation, where the WiFi radio generates the signal desired by a ZigBee radio by manipulating the WiFi payload.

The strategy proposed in [32] adopted a collaborative mechanism to detect spoofing attacks on CTC in heterogeneous wireless networks by measuring the corresponding RSS on WiFi devices. The work in [33] implemented a reactive jamming system, JamCloak, that can attack most existing CTC protocols, and also proposed a practical detection and mitigation approach against reactive jamming attacks over CTC links such as JamCloak. In [8], the authors identified a new attack, the CTC waveform emulation attack, in which the WiFi attacker captures a previously intercepted ZigBee control message and hides it in the WiFi signal so as to manipulate the ZigBee device by transmitting the emulated signal. To detect this attack, they utilized higher-order statistics at the ZigBee receiver to analyze the constellation.

As a relatively new technology, the security of CTC has not always been considered in design and currently relevant research work is scarce. Therefore, similar to traditional homogeneous wireless spoofing attacks, CTC spoofing attacks can also be easily implemented and cause significant damage to network performance. In this paper, we focus on the problem of CTC spoofing attack detection. We propose to use the spatial correlation of RSS inherited from wireless nodes and combine two machine learning methods for detecting CTC spoofing attacks.

3. Preliminary

3.1. Cross-Technology Communication

Compared with packet-level modulation, physical-level modulation is more fine-grained and thus achieves high throughput by directly emulating the heterogeneous signals in the physical layer. As a pioneering work, WEBee [6] employs a WiFi signal to emulate a ZigBee signal without changing hardware or firmware. As shown in Figure 1, WEBee meticulously fills the payload of a transmitted WiFi frame so that the RF waveform of the payload resembles that of a ZigBee signal. When a ZigBee device receives such a WiFi frame, it ignores the WiFi preamble, header, and trailer as noise, while the payload successfully passes ZigBee preamble detection, and the ZigBee receiver then demodulates the emulated ZigBee frame.

3.2. Theoretical Analysis of the Spatial Correlation of RSS

The received signal strength (RSS) involved in our research is closely related to its physical space position and is easily available in existing wireless networks. Although affected by random noise, environmental deviation, and multipath effects, the RSS values measured at the same physical location are similar, and the RSS values measured at different physical spatial locations are distinct. Therefore, the RSS values present strong physical spatial correlation characteristics.

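We define the RSS value vector as $\mathbf{s} = \{s_1, s_2, \ldots, s_n\}$, where $n$ represents the number of reference points (landmarks), and the reference points determine positions by acquiring the RSS values of the wireless nodes. Generally, the RSS value of wireless node $j$ obtained at the $i$th reference point follows the log-distance path-loss model [34]:

$$s_i(d_j)\,[\mathrm{dBm}] = P(d_0)\,[\mathrm{dBm}] - 10\gamma \log_{10}\!\left(\frac{d_j}{d_0}\right) + X_i, \tag{1}$$

where $P(d_0)$ indicates the power of the sensor at the reference distance $d_0$, $d_j$ means the distance from sensor $j$ to the $i$th landmark, $\gamma$ represents the path loss exponent, and $X_i$ indicates the shadow fading, which follows a zero-mean Gaussian distribution with standard deviation $\sigma$ [34, 35]. For simplicity, we suppose that the wireless devices have the same transmission power. The RSS distance from one device to another in the signal space of the $i$th landmark can then be formulated as

$$\Delta s_i = 10\gamma \log_{10}\!\left(\frac{d_i^{(2)}}{d_i^{(1)}}\right) + \Delta X, \tag{2}$$

where $d_i^{(1)}$ and $d_i^{(2)}$ are the distances from the first and second device to the $i$th landmark, and $\Delta X$, the difference of two independent shadow fading terms, follows a zero-mean Gaussian distribution with standard deviation $\sqrt{2}\sigma$. The squared RSS distance in the $n$-dimensional signal space (i.e., $n$ reference points) can be given as

$$\Delta D^2 = \sum_{i=1}^{n} \Delta s_i^2, \tag{3}$$

where $\Delta s_i$ with $i = 1, 2, \ldots, n$ denotes the RSS distance at the $i$th landmark, represented by equation (2).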
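As a rough illustration of the spatial correlation described above, the following Python sketch draws RSS readings from a log-distance path-loss model and compares the squared RSS distance for two readings from the same location with that for readings from two different locations. The reference power, path loss exponent, shadowing deviation, coordinates, and all function names are assumptions chosen for the example; they are not parameters of our testbed.

```python
import math
import random


def expected_rss(distance_m, p_d0_dbm=-40.0, d0_m=1.0, gamma=3.0):
    """Mean RSS in dBm at distance_m under the log-distance path-loss model."""
    return p_d0_dbm - 10.0 * gamma * math.log10(distance_m / d0_m)


def sample_rss(distance_m, sigma_db=2.0, rng=random, **model):
    """One RSS reading: mean path loss plus zero-mean Gaussian shadow fading."""
    return expected_rss(distance_m, **model) + rng.gauss(0.0, sigma_db)


def rss_vector(node_xy, landmarks, **kwargs):
    """RSS of one node as observed by every landmark (one value per landmark)."""
    return [sample_rss(math.dist(node_xy, lm), **kwargs) for lm in landmarks]


def squared_rss_distance(rss_a, rss_b):
    """Squared distance between two RSS vectors in n-dimensional signal space."""
    return sum((a - b) ** 2 for a, b in zip(rss_a, rss_b))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical layout in meters: four landmarks, one legitimate node, one attacker.
    landmarks = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]
    legit, attacker = (1.0, 1.0), (4.0, 5.0)
    same = squared_rss_distance(rss_vector(legit, landmarks, rng=rng),
                                rss_vector(legit, landmarks, rng=rng))
    diff = squared_rss_distance(rss_vector(legit, landmarks, rng=rng),
                                rss_vector(attacker, landmarks, rng=rng))
    print(f"same location: {same:.1f}, different locations: {diff:.1f}")
```

Under these assumed parameters, the squared distance between readings from the same location is driven only by shadow fading, whereas readings from different locations also differ by the path-loss term.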

4. System Design

4.1. Network Architecture

The network architecture is shown in Figure 2 and consists of wireless devices (i.e., ZigBee devices and WiFi devices), a server, and a console. Suppose that ZigBee devices are communicating normally. Since a WiFi device can communicate directly with a ZigBee device, an adversary can leverage the WiFi device to masquerade as a ZigBee device and launch a spoofing attack. Monitoring sensors deployed at fixed locations receive the WiFi attacker’s frames in real time for spoofing detection. The server receives the packets captured by the monitoring sensors for global detection. The console receives the data packets, uses timestamps or sequence numbers to normalize the RSS samples so that readings of the same frame line up, combines the data packets, and constructs the samples.
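The sample construction performed by the console can be sketched as follows. This is a minimal illustration that assumes each monitoring sensor reports a tuple of sensor identifier, MAC address, sequence number, and RSS value; the tuple layout and the name `build_rss_samples` are hypothetical and only serve the example.

```python
from collections import defaultdict


def build_rss_samples(packets, sensor_ids):
    """Group per-sensor readings of the same frame into one RSS vector.

    packets: iterable of (sensor_id, mac, seq_no, rss_dbm) tuples.
    Returns a list of (mac, seq_no, [rss per sensor]) entries, with None
    where a sensor did not hear the frame.
    """
    grouped = defaultdict(dict)
    for sensor_id, mac, seq_no, rss in packets:
        grouped[(mac, seq_no)][sensor_id] = rss

    samples = []
    for (mac, seq_no), readings in sorted(grouped.items()):
        samples.append((mac, seq_no, [readings.get(s) for s in sensor_ids]))
    return samples


if __name__ == "__main__":
    reports = [
        ("s1", "00:11", 7, -61.0), ("s2", "00:11", 7, -70.0),
        ("s1", "00:11", 8, -60.0),  # sensor s2 missed frame 8
    ]
    for sample in build_rss_samples(reports, ["s1", "s2"]):
        print(sample)
```

Readings that a sensor missed stay as `None` here, so that a later preprocessing step can fill them in.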

Figure 3 shows the method of training the OSVM model. Before the OSVM model is used to classify RSS samples in the real environment (i.e., when there are spoofing attacks), a set of legitimate RSS samples needs to be used to train it. First, all incoming RSS data samples are preprocessed. The preprocessing includes the following steps. (1) Eliminating outliers: to filter outliers from the collected sample set, we use the 3σ criterion to eliminate noisy data and leave the corresponding entries blank. (2) Filling in missing values: since there may be missing values in the collected sample set, we use the KNN data filling algorithm to fill in continuous or intermittent missing values and the entries vacated by outlier removal. The preprocessed data are then divided into two parts, the training set and the test set, which are used to obtain a well-fitted OSVM model.
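The two preprocessing steps can be sketched in plain Python as below. This is only an illustrative sketch: the function names, the choice of k, and the distance used to find neighbors (root mean squared difference over the features that both rows have) are assumptions, and a library routine such as scikit-learn's `KNNImputer` could take the place of `knn_fill`.

```python
import math
import statistics


def mask_outliers(column, k=3.0):
    """Replace values farther than k standard deviations from the mean with None."""
    present = [v for v in column if v is not None]
    mu, sd = statistics.fmean(present), statistics.pstdev(present)
    return [None if v is None or abs(v - mu) > k * sd else v for v in column]


def knn_fill(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other in rows:
                if other is row or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(
                        sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[j]))
            nearest = sorted(candidates)[:k]
            if nearest:
                filled[i][j] = statistics.fmean([v for _, v in nearest])
    return filled


def preprocess(rows, k_sigma=3.0, k_neighbors=3):
    """Apply the 3-sigma rule per sensor (column), then fill the gaps with KNN."""
    columns = [mask_outliers(list(col), k_sigma) for col in zip(*rows)]
    return knn_fill([list(r) for r in zip(*columns)], k_neighbors)
```

Note that the 3σ rule only flags a value when enough samples are available; with very few readings, no single value can lie more than three standard deviations from the mean.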

Figure 4 describes the process of spoofing attack detection, which is applied once a well-fitted model has been obtained. The OSVM classifier classifies the incoming RSS data into spoofing data and legitimate data.

4.2. Attack Detection Using OSVM Analysis

As shown in Figure 5, we take an example to illustrate how a WiFi device launches a spoofing attack against ZigBee devices. Because of the broadcast nature of the wireless transmission medium, WiFi devices located nearby can easily receive data frames from ZigBee transmitters. The WiFi attacker therefore launches a spoofing attack in two steps. First, the attacker listens on the channel and obtains the frames sent by the ZigBee device. Second, it forges the identity information to masquerade as a legitimate ZigBee device and launches the malicious attack. We detect the CTC spoofing attack by measuring the RSS values of the WiFi data packets at the monitoring sensors. We analyze the RSS values collected by monitoring sensors scattered in different locations to monitor attackers. The RSS value of a data packet sent by wireless device $j$ is expressed as the following vector:

$$\mathbf{s}_j = (s_{j,1}, s_{j,2}, \ldots, s_{j,n}), \tag{4}$$

where $s_{j,i}$ is the RSS value of node $j$ measured at monitoring sensor $i$. Then, we use the signal strength vectors from all deployed monitoring sensors as signal fingerprints of legitimate ZigBee devices and use these signal fingerprint vectors to construct the training set in real time. Finally, we train the OSVM classifier to detect whether a vector sample belongs to a legitimate ZigBee device.

(1) Training Set. We utilize RSS vector samples from legitimate ZigBee devices to fill the training set. Each sample includes the time at which the monitoring sensor received the frame from the transmitter, the transmitter’s MAC address, and the RSS value. The training dataset is given by

$$D = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m\}, \tag{5}$$

where each $\mathbf{x}_k$ is a signal fingerprint vector of the form (4) and $m$ is the number of samples of the signal fingerprint vector in the training dataset $D$.

(2) One-Class SVM Classification. The main goal of OSVM is to generate a decision function based on the feature vectors in the training dataset. In this system, OSVM detects malicious devices by finding a suitable hyperplane in a nonlinear feature space. The target is therefore expressed as the following quadratic optimization problem:

$$\min_{\mathbf{w},\,\boldsymbol{\xi},\,\rho}\; \frac{1}{2}\|\mathbf{w}\|^2 + \frac{1}{\nu m}\sum_{i=1}^{m}\xi_i - \rho \quad \text{s.t.}\quad \mathbf{w}\cdot\Phi(\mathbf{x}_i) \ge \rho - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, \ldots, m, \tag{6}$$

where $m$ is the number of training samples, $\Phi$ is the nonlinear mapping function for feature vectors in the training set, $\mathbf{w}$ is the weight vector of the model, $\rho$ is the offset of the hyperplane, $\xi_i$ is the nonnegative slack variable that gives the model a certain tolerance, and the regularization parameter $\nu$ is set to 0.01 to control the tolerance. Among all training data, the samples that satisfy the constraint with equality, $\mathbf{w}\cdot\Phi(\mathbf{x}_i) = \rho - \xi_i$, are the support vectors, which lie at the edge of the decision function. Therefore, the classifier can be written as

$$f(\mathbf{x}) = \operatorname{sgn}\big(\mathbf{w}\cdot\Phi(\mathbf{x}) - \rho\big). \tag{7}$$

The case of $f(\mathbf{x}) = +1$ indicates that the signal strength fingerprint vector comes from a legitimate ZigBee device, while $f(\mathbf{x}) = -1$ indicates that it comes from a WiFi attacker.

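To determine whether a test sample vector falls inside the decision boundary, a nonlinear kernel function is used in the decision function, which is given by [36]

$$f(\mathbf{x}) = \operatorname{sgn}\Big(\sum_{i=1}^{m}\alpha_i K(\mathbf{x}_i, \mathbf{x}) - \rho\Big), \tag{8}$$

where $\alpha_i$ is the Lagrange multiplier obtained when maximizing the margin, $\rho$ is the offset of the hyperplane, and $K(\mathbf{x}_i, \mathbf{x}) = \Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x})$ is the kernel function. We consider three commonly used kernel functions $K(\mathbf{x}_i, \mathbf{x}_j)$, namely, the linear function, the polynomial function, and the radial basis function (RBF), given by [37]

$$K(\mathbf{x}_i, \mathbf{x}_j) = \begin{cases} \mathbf{x}_i\cdot\mathbf{x}_j, & \text{linear}, \\ (g\,\mathbf{x}_i\cdot\mathbf{x}_j + r)^d, & \text{polynomial}, \\ \exp\!\big(-g\,\|\mathbf{x}_i - \mathbf{x}_j\|^2\big), & \text{RBF}, \end{cases} \tag{9}$$

where $g > 0$, $r$, and the degree $d$ are kernel parameters.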

The linear kernel is mainly used when the data are linearly separable. Compared with the polynomial and RBF kernels, it has fewer parameters and is therefore faster to compute, and it classifies linearly separable data very well. The polynomial kernel can map a low-dimensional input space to a high-dimensional feature space, but it has many parameters, and when the order of the polynomial is high, the computational cost becomes prohibitive. The RBF kernel can also map a sample to a higher-dimensional space. It performs well with both large and small training sets, and it has fewer parameters than the polynomial kernel. After comparing the results of these three kernel functions, we use the RBF kernel in this study because it yields the highest detection rate.
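The kernel comparison described above can be sketched with scikit-learn's `OneClassSVM`. This is a minimal sketch rather than our implementation: the synthetic fingerprints, the `gamma="scale"` setting, and the helper names `train_osvm` and `detect` are assumptions, and only the value ν = 0.01 is taken from the text.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def train_osvm(legit_rss, nu=0.01, kernel="rbf", gamma="scale"):
    """Fit a one-class SVM on RSS fingerprint vectors of legitimate devices."""
    model = OneClassSVM(kernel=kernel, nu=nu, gamma=gamma)
    model.fit(np.asarray(legit_rss, dtype=float))
    return model


def detect(model, rss_vectors):
    """Return +1 for vectors classified as legitimate, -1 for suspected spoofing."""
    return model.predict(np.asarray(rss_vectors, dtype=float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic fingerprints in dBm at four monitoring sensors (made-up values).
    legit_mean = np.array([-55.0, -62.0, -70.0, -66.0])
    attacker_mean = np.array([-68.0, -50.0, -58.0, -75.0])
    train = legit_mean + rng.normal(0.0, 2.0, size=(200, 4))
    test_legit = legit_mean + rng.normal(0.0, 2.0, size=(50, 4))
    test_attack = attacker_mean + rng.normal(0.0, 2.0, size=(50, 4))

    for kernel in ("linear", "poly", "rbf"):
        model = train_osvm(train, kernel=kernel)
        detection_rate = np.mean(detect(model, test_attack) == -1)
        false_alarm = np.mean(detect(model, test_legit) == -1)
        print(f"{kernel}: detection rate {detection_rate:.2f}, "
              f"false alarm rate {false_alarm:.2f}")
```

The loop simply repeats training with each kernel on the same data, which mirrors how a kernel would be selected by its detection rate.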

4.3. Attack Detection Using SVM Analysis

In this section, we explore the use of the support vector machine (SVM) algorithm to further improve the performance of the classifier, based on the classification results of OSVM, when training data are available in the offline phase. SVM is a set of kernel-based learning methods for data classification that comprises a training phase and a testing phase [38]. Every sample instance in the training set contains a class label and attributes (i.e., features). For CTC spoofing attacks from WiFi to ZigBee, we use the label value “+1” to mark a sample when there is no spoofing attack and the label value “−1” when there is a spoofing attack.

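Training samples can be obtained by monitoring network activities regularly. The labeled training set with $m$ feature vectors is given by equation (10), where $y_i \in \{+1, -1\}$ is the label of $\mathbf{x}_i$:

$$T = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)\}. \tag{10}$$

Each feature vector $\mathbf{x}_i$ is an $n$-dimensional real vector:

$$\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,n}) \in \mathbb{R}^n. \tag{11}$$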

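We aim to find the maximum-margin hyperplane that separates the feature vectors with $y_i = +1$ from those with $y_i = -1$. The hyperplane can be formulated as follows:

$$\mathbf{w}\cdot\mathbf{x} + b = 0, \tag{12}$$

where $\mathbf{w}$ represents the normal vector of the hyperplane, $b$ indicates the bias variable, and $\mathbf{x}$ means the feature vector of a sample that lies on the hyperplane, as seen in Figure 6. We select the hyperplane that maximizes the margin between positive and negative samples. The following constraint needs to be satisfied:

$$y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \quad i = 1, \ldots, m. \tag{13}$$

As seen in Figure 7, if the training samples in the transformed space are not linearly separable, the constraint can be relaxed by introducing slack variables $\xi_i$:

$$y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, m. \tag{14}$$

The hyperplane is computed by solving the following optimization problem in the primal form [39]:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\; \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0. \tag{15}$$

Its dual is

$$\max_{\boldsymbol{\alpha}}\; \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j) \quad \text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i = 0,\;\; 0 \le \alpha_i \le C, \tag{16}$$

where $\alpha_i$ are the Lagrange multipliers and $C$ is a trade-off parameter between error and margin. We can use the “kernel trick” to solve nonlinear SVM problems. Suppose that the kernel function is represented by $K(\mathbf{x}_i, \mathbf{x}_j)$. Then, equation (16) can be formulated as

$$\max_{\boldsymbol{\alpha}}\; \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i = 0,\;\; 0 \le \alpha_i \le C. \tag{17}$$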

We use the following kernel function for testing:

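In addition, given a signal strength vector sample $\mathbf{x}$, the decision function is expressed as follows:

$$f(\mathbf{x}) = \operatorname{sgn}\Big(\sum_{i=1}^{m}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\Big), \tag{19}$$

where $\alpha_i$ are the Lagrange multipliers of the dual problem, $y_i$ is the label of training sample $\mathbf{x}_i$, $b$ is the bias, and $f(\mathbf{x})$ is used to classify the test sample $\mathbf{x}$. If $f(\mathbf{x}) = +1$, the test sample belongs to a legitimate ZigBee device, that is, there is no spoofing attack. Otherwise, $f(\mathbf{x}) = -1$, the test sample comes from a WiFi attacker, and there is a spoofing attack on the network.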
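To make the decision rule concrete, the following sketch evaluates the sign of a kernel expansion for a given set of support vectors. The RBF kernel, its parameter, and the toy support vectors, multipliers, and bias are assumptions for illustration only; in practice they come out of the training phase.

```python
import math


def rbf_kernel(u, v, gamma=0.1):
    """RBF kernel exp(-gamma * ||u - v||^2); gamma here is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Return +1 (legitimate) or -1 (spoofing) from the kernel expansion.

    A score of exactly zero is mapped to +1 by convention in this sketch.
    """
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy model: one support vector per class (hypothetical values in dBm).
    svs = [(-55.0, -62.0), (-68.0, -50.0)]
    ys = [+1, -1]
    alphas = [1.0, 1.0]
    print(svm_decision((-56.0, -61.0), svs, ys, alphas, bias=0.0))
    print(svm_decision((-67.0, -51.0), svs, ys, alphas, bias=0.0))
```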

In daily wireless communications, spoofing attacks are relatively rare compared with legitimate communications. This means that a large number of legitimate communication samples can be obtained easily, while spoofing attack samples are scarce and more difficult to obtain [40, 41]. However, traditional two-class classification algorithms for wireless communication security require roughly the same number of samples from both classes [42, 43]. In this article, because spoofing attacks can be implemented in diverse ways and training samples of spoofing attacks are lacking, we train the OSVM algorithm on legitimate data only and construct a classifier that detects spoofing attacks and thereby yields spoofing attack samples. As the number of spoofing attack samples detected by the OSVM classifier increases, we explore using the SVM algorithm to further improve the performance of the classifier.
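The two-stage idea can be sketched as follows, again with scikit-learn. This is an assumed workflow rather than our exact procedure: the function name `bootstrap_svm`, the threshold `min_attack_samples`, the SVM parameter `C=1.0`, and the `gamma="scale"` setting are all hypothetical choices for the example.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM


def bootstrap_svm(legit_rss, unlabeled_rss, nu=0.01, min_attack_samples=20):
    """Train an OSVM on legitimate data, then an SVM on the samples it labels.

    Returns (osvm, svm); svm is None while too few spoofing samples exist.
    """
    legit_rss = np.asarray(legit_rss, dtype=float)
    unlabeled_rss = np.asarray(unlabeled_rss, dtype=float)

    # Stage 1: the one-class model flags samples that look unlike legitimate data.
    osvm = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(legit_rss)
    flagged = unlabeled_rss[osvm.predict(unlabeled_rss) == -1]
    if len(flagged) < min_attack_samples:
        return osvm, None  # keep using the OSVM until more samples accumulate

    # Stage 2: flagged samples act as the "-1" class for a two-class SVM.
    features = np.vstack([legit_rss, flagged])
    labels = np.concatenate([np.ones(len(legit_rss)), -np.ones(len(flagged))])
    svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(features, labels)
    return osvm, svm
```

Because the spoofing labels come from the OSVM rather than from ground truth, any false alarms of the first stage become label noise in the second stage, which is one reason to wait until enough flagged samples have accumulated.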

5. Experimental Section

5.1. Experimental Setting

In this section, a USRP N210 and ZigBee devices (i.e., MICAz nodes) are used to test the spoofing detection performance of the proposed model in heterogeneous networks. The scene setting is shown in Figure 8. Two MICAz nodes conduct normal data communication. The USRP N210 acts as a WiFi signal transmitter and can directly transmit data to the ZigBee device [44, 45]. In this process, the USRP N210 is disguised as a legitimate ZigBee device to perform spoofing attacks on other ZigBee devices.

The spoofing detection experiments were performed in an indoor environment. Figure 8 shows a WiFi device located near (about 3 m from) a legitimate ZigBee device and launching a spoofing attack. The ZigBee transmitter and receiver are communicating while a malicious WiFi device is nearby. Since the WiFi device can receive ZigBee packets, it can masquerade as the ZigBee transmitter and launch a CTC spoofing attack on the ZigBee receiver. This is exactly the CTC spoofing attack we want to detect. As shown in Figure 9, we used 20 testing locations marked with dots to cover an area of . To evaluate our proposed method, we considered two spoofing attack scenarios. In the first scenario, the WiFi attacker is inside the room. We chose ten locations for the legitimate ZigBee device (i.e., locations 1–10) and used the remaining locations (i.e., locations 11–20) for the attacker. Specifically, the ZigBee device is moved among locations 1–10 (marked with red dots in Figure 9), the USRP N210 is moved among locations 11–20 (marked with purple dots in Figure 9), and the USRP N210 conducts spoofing attacks on the ZigBee devices from these different locations. To detect the attack, four sensors, represented by triangles, were placed to measure the RSS of the audible frames, and we collected 200 packet-level RSS samples at each location. To detect spoofing attacks from locations 11–20, we used the normal RSS samples at locations 1–10 to train the OSVM classifier. In the second scenario, widely deployed WiFi devices with a longer transmission range are outside the room, and we used data from all test locations (i.e., locations 1–20) to train the OSVM classifier.

5.2. Signal Strength Analysis

Figure 10 shows the data distribution of 10 locations as observed by each of the four monitoring sensors. We collected 200 RSS samples from each location, for a total of 2000 samples. We found that the RSS of a stationary device oscillates, while the RSS values at the same position remain close to each other. Several factors may cause RSS oscillation, for example, multipath effects and obstacles, particularly when the distance from the sender to the receiving device is large. To alleviate this influence, we can collect more samples at each location and then apply data cleaning techniques.

The corresponding probability histogram of RSS from 10 locations is given in Figure 11. Some researchers point out that the RSS samples of a given transmitter/sensor pair fit a Gaussian distribution [12, 13], while other researchers report that it is not rare to see non-Gaussian distributions of RSS samples, suggesting that those distributions are a mixture of multiple Gaussian distributions [10]. As shown in Figure 11, we also discovered this phenomenon, that is, a mixture of multiple Gaussian distributions. For example, in Figure 11(a), it can be seen that there are four Gaussian distributions. In Figure 11(b), two Gaussian distributions can be seen and so on. The four subplots show that the signal strength range of the four sampled locations is [−90, −30] dBm, and the mean and standard deviation of the signal at each location are calculated.

5.3. Performance Comparison

To evaluate the performance of the proposed model in actual scenarios, we compare it experimentally with K-means, KNN, logistic regression (LR), and random forest (RF).

To evaluate the proposed scheme’s performance in real scenarios, we simulated ten attack locations (i.e., locations 11–20) in Figure 9. We ran the spoofing monitoring program with the distance between the legitimate ZigBee device and the WiFi attacker being less than 2 m, 2–3 m, and 3–5 m, respectively. We found that the performance of our method improves with increasing attack distance. When the attack distance is greater than 5 m, our spoofing detection method achieves extremely high detection performance; therefore, CTC spoofing attacks are easily detected.

In this section, we compare the accuracy of the five spoofing attack detection methods. Table 1 reports the minimum, average, and maximum accuracy and the standard deviation of the accuracy for the five algorithms under three different attack distances. In the following, the values for each method are listed in the order average, maximum, minimum, and standard deviation. When the distance between the legitimate ZigBee device and the WiFi attacker is less than 2 m, the values are K-Means: 78.95%, 80.72%, 63.66%, and 9.441%; KNN: 81.34%, 90.26%, 63.53%, and 12.235%; LR: 68.13%, 88.67%, 65.21%, and 11.127%; RF: 80.40%, 92.23%, 77.53%, and 7.215%; and OSVM: 92.17%, 94.45%, 85.51%, and 4.835%. When the distance is 2–3 m, the values are K-Means: 83.38%, 89.05%, 62.37%, and 10.518%; KNN: 82.27%, 91.57%, 62.45%, and 13.56%; LR: 71.08%, 91.23%, 58.34%, and 15.233%; RF: 85.43%, 94.66%, 82.75%, and 5.755%; and OSVM: 95.38%, 97.89%, 91.60%, and 3.236%. When the distance is 3–5 m, the values are K-Means: 88.75%, 90.31%, 68.50%, and 8.293%; KNN: 93.76%, 95.35%, 80.53%, and 7.411%; LR: 93.24%, 95.74%, 86.13%, and 4.712%; RF: 96.58%, 97.65%, 92.82%, and 2.154%; and OSVM: 97.76%, 98.77%, 96.31%, and 1.624%. As the distance increases, the average accuracy of every method improves.

When the distance is less than 2 m, the logistic regression algorithm performs poorly, with an average accuracy of only 68.13%. The average accuracies of K-Means, KNN, and random forest differ only slightly, and the method used in this article reaches 92.17%, which is higher than that of the other four algorithms. When the distance is 2–3 m, the accuracy of the logistic regression algorithm is still the lowest. The average accuracies of KNN, K-Means, and random forest are 82.27%, 83.38%, and 85.43%, respectively, and the OSVM algorithm used in this paper reaches 95.38%. As the distance increases to 3–5 m, the accuracy of all five algorithms improves, and the lowest average accuracy is 88.75%. Random forest and the method used in this paper both exceed 95%, indicating high accuracy in detecting spoofing attacks.

Across the three attack scenarios of the experiment, when the distance between the WiFi attacker and the ZigBee device is small (that is, less than 2 m), the average accuracies of K-means, KNN, logistic regression, and random forest are all below 90%. The accuracy of logistic regression is about 24 percentage points lower than that of the OSVM algorithm, and the accuracy of each of the other three algorithms is more than 10 percentage points lower. This shows that the OSVM algorithm has a clear advantage in detecting short-distance spoofing attacks. When the distance is 2–3 m and 3–5 m, the accuracy of the OSVM algorithm is also the highest. Comparing the standard deviations of the accuracy of the five algorithms, we find that the standard deviation of the OSVM algorithm is always smaller than that of the other four algorithms, which indicates that the detection performance of OSVM is more stable. In summary, compared with K-means, KNN, logistic regression, and random forest, the OSVM algorithm has the best detection performance and the most stable model, so it is suitable for scenarios with different test distances.

To evaluate the computational cost of these five methods, we conducted tests on a laptop equipped with a 2.3 GHz CPU and 4 GB of memory. Table 2 compares the average, maximum, and minimum test time, together with the standard deviation of the test time, for 3000 spoofing attack test samples under the five methods. For K-Means, these four values are 0.12666 s, 0.38098 s, 0.025972 s, and 0.03786 s, respectively. In the same order, the values are 0.22578 s, 0.23999 s, 0.21863 s, and 0.00213 s for KNN; 0.31306 s, 0.32029 s, 0.30893 s, and 0.01135 s for LR; 2.01943 s, 2.4402 s, 1.97665 s, and 0.04533 s for RF; and 0.021941 s, 0.04687 s, 0.018965 s, and 0.003695 s for OSVM.

Sorted from shortest to longest average test time, the five methods rank as follows: OSVM, K-Means, KNN, LR, and RF. OSVM is the fastest, followed by K-Means, with average times of 0.021941 s and 0.12666 s, respectively. Random forest is the slowest, with an average test time of 2.01943 s. The OSVM-based solution is therefore about 100 ms faster than the second-ranked K-Means detection method. This shows that the method in this paper outperforms the other four algorithms in computational speed, and that, in our experiment, the OSVM-based method can detect an ongoing spoofing attack with very low latency.
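The timing statistics reported above (average, maximum, minimum, and standard deviation of the test time) can be collected with a small harness such as the following. This is only a minimal sketch: the paper does not describe how the timings were taken, so the number of repeated runs, the helper name `time_detector`, the stand-in `threshold_detector`, and the synthetic RSS values are all assumptions made for illustration.

```python
import statistics
import time


def time_detector(predict, samples, runs=10):
    """Time `predict(samples)` over several runs and summarize the durations.

    `runs` is an assumed repetition count, not a value from the paper.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(samples)
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        "max": max(durations),
        "min": min(durations),
        "std": statistics.stdev(durations),
    }


def threshold_detector(samples, low=-70.0, high=-50.0):
    """Hypothetical stand-in detector: flag RSS values outside [low, high] dBm."""
    return [not (low <= rss <= high) for rss in samples]


# 3000 synthetic RSS readings (dBm), matching only the test-set size in the text.
samples = [-60.0 + (i % 40) * 0.5 for i in range(3000)]
stats = time_detector(threshold_detector, samples)
print(stats)
```

Any trained classifier's prediction function can be passed in place of `threshold_detector`, so the same harness would apply to each of the five methods.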

5.4. Experimental Evaluation and Result

In this section, we present the detailed evaluation results of the OSVM algorithm proposed in this paper. Table 3 lists the detection rate, precision, F-measure, and AUC value obtained when the WiFi attacker is at different distances from the legitimate ZigBee device. The corresponding receiver operating characteristic (ROC) curve is plotted in Figure 12. The results are encouraging. When the distance from the WiFi attacker to the legitimate ZigBee device is less than 2 m, the detection rate is higher than 90% at a false alarm rate of less than 3%; when the distance is 2–3 m, the detection rate reaches 95.38% even though the false alarm rate is zero; and when the distance is 3–5 m, the detection rate exceeds 97%.
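As an illustration of how the metrics named above relate to one another, the sketch below computes the detection rate, false alarm rate, precision, F-measure, and AUC from labels and scores. It assumes the standard definitions (detection rate as true positive rate, false alarm rate as false positive rate, AUC as the probability that an attack sample scores higher than a legitimate one); the function names and the toy data are hypothetical and are not taken from the experiments.

```python
def detection_metrics(y_true, y_pred):
    """Labels: 1 = spoofing attack, 0 = legitimate ZigBee transmission."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + detection_rate
    f_measure = 2 * precision * detection_rate / denom if denom else 0.0
    return {
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "precision": precision,
        "f_measure": f_measure,
    }


def auc(y_true, scores):
    """Rank-based AUC: fraction of (attack, legitimate) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 4 attack samples and 4 legitimate samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.4, 0.6]
metrics = detection_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Sweeping a decision threshold over the scores and plotting detection rate against false alarm rate at each threshold yields an ROC curve of the kind shown in Figure 12.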

When large-scale spoofing attacks break out in the network, we use SVM to further improve the performance of the classifier, building on the classification results of OSVM. Specifically, we use the OSVM classifier to detect the RSS data of abnormal network traffic. As the number of spoofing attack samples increases, we train the SVM classifier with roughly equal numbers of samples from the two classes. To evaluate the generalization error of the model reasonably, we use the grid search method with 10-fold cross-validation to select the best penalty coefficient and kernel function. The best kernel function is “rbf,” and the resulting detection rate is 98.67%. Other metrics are shown in Table 3. The corresponding ROC curve is shown in Figure 13.
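The grid search with 10-fold cross-validation described above can be sketched with scikit-learn as follows. This is an illustrative sketch rather than the configuration used in the experiments: the synthetic RSS values, the candidate grid for the penalty coefficient `C`, the candidate kernels, and the use of recall on the attack class as the detection rate are all assumptions. In the actual pipeline, the attack-class samples would be those flagged by the OSVM classifier rather than generated data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic, made-up RSS readings (dBm); roughly balanced two-class data.
legit = rng.normal(-60.0, 2.0, size=(200, 1))  # class 0: legitimate ZigBee
spoof = rng.normal(-45.0, 3.0, size=(200, 1))  # class 1: spoofing attack
X = np.vstack([legit, spoof])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

# Assumed search space; the paper does not list its candidate values.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale"],
}

# 10-fold stratified cross-validation; recall on class 1 is the detection rate.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="recall")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 4))
```

Stratified folds keep the class ratio the same in every fold, which is consistent with training on roughly equal numbers of samples from the two classes.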

6. Conclusions

In this article, we proposed a machine learning-based method that uses physical-layer information to detect spoofing attacks in heterogeneous wireless networks. Specifically, widely deployed WiFi devices with a longer transmission range can easily launch CTC spoofing attacks while short-range ZigBee devices communicate with each other. Because CTC spoofing attack samples are scarce, we constructed an OSVM classifier from the RSS data of legitimate ZigBee devices. We simulated CTC spoofing attacks in a live testbed and evaluated the performance of our detection method. The results show that our approach is highly effective in spoofing detection. Even when the WiFi attacker is close to the legitimate ZigBee device (i.e., less than 2 m away), the detection rate and precision of our method both exceed 90%, and the method does not require a large number of samples. Finally, we employed the OSVM classifier to obtain samples of spoofing attacks and explored using SVM to further improve the performance of the classifier.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the project fund of Future Network of Jiangsu (China), the project funded by China Postdoctoral Science Foundation (Grant nos. 2018T110505 and 2017M611828), and the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.