Abstract
As a cross-protocol endogenous security mechanism, physical-layer radio frequency (RF) fingerprinting can effectively complement existing password-based application-layer authentication by exploiting the hardware differences of wireless devices, which are unique and cannot be counterfeited by a third party. However, the recognition performance of deep learning-based physical-layer fingerprint recognition algorithms drops sharply when only a small number of signal samples is available. This paper analyzes the feasibility of few-shot recognition and proposes a few-shot wireless signal classification network based on deep metric learning (FSig-Net). FSig-Net reduces the model's dependence on big data by adaptively learning the feature distance metric. We use 8 mobile phones and 18 Internet of Things (IoT) modules as identification targets. When only 10 samples per class are available, the recognition accuracy reaches 98.28% for the mobile phones and 98.20% for the IoT devices.
1. Introduction
The popularity of wireless networks and the increasing number of wireless devices have increased the risk of malicious attacks and unauthorized user access. It is critical to use appropriate methods to authenticate the identity of wireless devices accurately. Traditional key-based wireless device authentication methods carry the risk of being breached and may be hard to deploy on many Internet of Things (IoT) devices, which are usually power-constrained and low-cost [1, 2]. In addition, different types of networks use different encryption protocols, so the key must be switched according to the device's network protocol. Compared with keys, physical-layer features are difficult to break or forge and offer low latency and low computational overhead [3]. Radio frequency fingerprint (RFF) technology is the key technology for classifying and identifying devices at the physical layer. The RFF of a wireless device derives from the randomness of its physical components [4, 5]. RFF identification does not need to consider the impact of different network protocols, and it can be effectively combined with upper-layer authentication schemes to establish a heterogeneous network with increased security.
Currently, traditional physical-layer authentication methods based on feature extraction need to extract transient or steady-state features of the signal in advance and perform feature screening to improve the classifier's performance [6, 7]. Fang et al. [8] used the cross-power spectrum to extract RFF. The proposed method can extract RFF from all signal segments and achieves good results on a dataset of 50 CC2540 ZigBee modules. Zhang et al. [9] proposed an artificial radio frequency fingerprint (ARFF) embedding scheme for device identification. The scheme includes an adaptive filter-based RFF extraction method, a principal component analysis (PCA)-based ARFF generation algorithm, and an ARFF embedding algorithm. Experimental results show that the method greatly increases the distinguishability of fingerprints. However, in practical applications, feature extraction largely relies on researchers' experience and domain knowledge. Without such prior information, the design of feature extraction algorithms is relatively difficult and has limitations [10].
Deep learning networks do not require a well-defined mathematical model or a specific feature extraction algorithm; through training, they can automatically extract features from the input data and recognize it. Wu et al. [11] proposed the dynamic shrinkage learning network (DSLN) for IoT identification; the method includes a dynamic shrinkage threshold, which improves recognition performance at low signal-to-noise ratio (SNR), and an identity shortcut, which increases running speed. Zong et al. [12] used an improved version of the VGG-16 model to classify signals and trained it with 1,000 signals from five transmitters. The method's recognition accuracy for the five devices reaches 99.7%.
Although current deep learning-based wireless device identification methods achieve good accuracy, they often rely on a large training sample set, and it is often difficult to obtain many wireless device signals in real environments. Few-shot learning (FSL) [13] mimics the rapid learning ability of human beings, using supervised information to learn from a limited number of examples so that deep learning can shed its dependence on big data. As one of the FSL approaches, metric learning measures the similarity of objects by comparing the distances between samples, so that samples with the same label yield a small distance while heterogeneous samples yield a large distance. Metric learning provides more flexible constraints on the data in the transformed mapping space, improving the model's learning performance. Therefore, we propose FSig-Net, based on deep metric learning. The network extracts multidimensional signal features through a feature extraction module, automatically learns the metric, and then realizes signal identification through a feature comparison module. In addition, we introduce the meta-learning training method, which mirrors the testing process during training and further improves recognition results under the few-shot condition.
The contributions of this work can be summarized as follows:
(i) We propose the few-shot wireless signal classification network based on deep metric learning (FSig-Net). FSig-Net uses the prior knowledge acquired by its feature extraction network to constrain the complexity of the hypothesis space. By adaptively learning the feature distance measurement method, it reduces the model's dependence on big data and improves the recognition accuracy of wireless signals in the few-shot case.
(ii) From the perspective of empirical error and generalization error, we derive two conditions for learning a model with solid generalization ability and verify the feasibility of few-shot recognition based on deep metric learning.
(iii) We identify the RFF of mobile phones and IoT modules collected in a real environment. When only 10 samples per class are available, the classification accuracy reaches 98.28% for mobile phones and 98.20% for IoT devices.
2. Related Work
2.1. Current Status of Few-Shot Classification
In the early days, few-shot classification usually adopted nondeep learning methods, which use the given supervised information to estimate the joint distribution of samples and take the resulting posterior probability distribution as a prediction model to determine the category of new samples. However, such methods are customized for a specific data form and are difficult to extend to general samples. The boom in deep learning opened up new directions for FSL. The Siamese network proposed in [14] trains the model by comparing the similarity of paired images, which became a watershed between deep learning and nondeep learning-based methods for few-shot recognition. Currently, FSL methods fall into three directions: data augmentation based, algorithm based, and model based.
2.1.1. Data Augmentation-Based Few-Shot Learning Methods
The FSL methods based on data augmentation usually use translation, inversion, shearing, scaling, and rotation to expand the training set [15]. Zhou et al. [16] proposed a data augmentation method that uses a classifier and a generative model for label-flipped data generation. Experiments show that the method improves many tasks’ performance while not negatively affecting others. The network proposed in [17] uses a continuous attribute subspace to add attribute variables to samples. This method can better capture attributes such as object color and texture, and successfully map source features to target features while maintaining class identity.
2.1.2. Algorithm-Based Few-Shot Learning Methods
Algorithm-based FSL methods refer to strategies that search for optimal hypothesis parameters in the hypothesis space [18]. When supervision information is abundant, the hypothesis function obtained through model training is more likely to approach the optimal hypothesis. However, when the amount of training data is small, the learned decision function is likely to be unreliable. From an algorithmic point of view, the core way to solve this problem is to provide suitable initialization parameters. Transfer learning [19] uses data from related tasks to pretrain the model and takes the learned parameters as the starting point for the target task, so that the model can quickly learn the feature structure of new samples even when the amount of target data is small. Zheng et al. [20] designed a multiscale meta-relational network that adopts Meta-SGD. The method uses a model-agnostic meta-learning algorithm to find the model's optimal parameters during meta-training and eliminates inner gradient iteration during meta-validation and meta-testing. Experiments show that the learned metric has stronger generalization ability. Baik et al. [21] proposed a new meta-learning method with a loss function that adapts to each task. The method shows better flexibility and generalization performance in both few-shot classification and few-shot regression.
2.1.3. Model-Based Few-Shot Learning Methods
Model-based FSL methods mainly include multitask learning, external memory, and metric learning. Multitask learning [22] integrates the objectives of multiple learning tasks for joint modeling and training, sharing some relevant parameters and exchanging learning results to improve generalization. However, it is less data efficient and is difficult to transfer to less relevant tasks. Learning with external memory [23] uses neural networks with memory ability and expresses new samples in terms of the memory contents, thereby reducing the size of the hypothesis space and achieving the purpose of FSL. However, the complexity of the memory network itself brings a series of problems, such as a slow training process and low efficiency. The core work of metric learning is to study how to measure the similarity between samples more effectively [24], so that samples of the same class cluster together in the projected feature space while samples of different classes are pushed far apart. Traditional linear distance measures include the Euclidean distance, Manhattan distance, cosine similarity, and Pearson correlation coefficient [25], while nonlinear methods apply kernel tricks or deep neural networks to capture higher-order correlations [26, 27]. Jiang et al. [26] proposed a few-shot multiscale metric learning method that extracts multiscale features and learns the multiscale relations between samples for classification. Ustinova et al. [27] proposed a histogram loss for deep embedding learning by estimating the similarity distributions of positive and negative pairs.
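As a concrete reference for these linear measures, the following minimal sketch (our illustration, not code from the cited works) computes the four classic distances/similarities for a pair of feature vectors:

```python
import numpy as np

def linear_measures(a: np.ndarray, b: np.ndarray) -> dict:
    """Classic linear distance/similarity measures between two vectors."""
    return {
        "euclidean": float(np.linalg.norm(a - b)),          # L2 distance
        "manhattan": float(np.abs(a - b).sum()),            # L1 distance
        "cosine": float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))),
        "pearson": float(np.corrcoef(a, b)[0, 1]),          # centered cosine
    }

# Same-class embeddings should score closer than cross-class ones:
rng = np.random.default_rng(0)
anchor = rng.normal(size=64)
print(linear_measures(anchor, anchor + 0.1 * rng.normal(size=64)))
```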
2.2. Few-Shot RF Fingerprint Recognition
In recent years, research on RF fingerprinting based on FSL has begun to appear. Wang et al. [28] proposed a few-shot specific emitter identification (SEI) method based on deep metric ensemble learning (DMEL). The method extracts discriminative features with compact intracategory and separable intercategory distances using metric learning with a complex-valued convolutional neural network (CVCNN); classification is then realized through an ensemble classifier. Simulation results demonstrate the advantages of the method in both discriminability and generalization. Liang et al. [29] proposed two FSL unmanned aerial vehicle (UAV) recognition methods based on a tri-residual semantic network. The methods extract different levels of feature information by combining the tri-residual Siamese network with a support vector machine (SVM) and achieve higher recognition accuracy than traditional few-shot learning methods. Xie et al. [30] developed a meta-learning algorithm involving latent embedding optimization, which extracts low-dimensional key features from the input data and evaluates the distance and degree of feature dispersion. Experiments show that the algorithm achieves few-shot SEI and reaches efficient recognition accuracy after training with as few as 40 samples. However, previous works mainly focus on automatic dependent surveillance-broadcast (ADS-B) or UAV signal sets; no existing work studies the performance of few-shot algorithms on mobile phone or IoT datasets, and the feasibility of few-shot identification has not been analyzed in detail.
3. Feasibility Analysis
In the supervised learning field of deep learning, the ability of a decision function $h$ to predict over the entire sample distribution can be evaluated by the expected risk $R(h)$. Assuming that a total of $m$ samples are used, the empirical risk is $R_m(h)=\frac{1}{m}\sum_{i=1}^{m}\ell\left(h(x_i),y_i\right)$. More specifically, there are the following definitions: $h^{*}=\arg\min_{h}R(h)$ represents the function minimizing the expected risk. $h_{\mathcal{H}}^{*}=\arg\min_{h\in\mathcal{H}}R(h)$ represents the function minimizing the expected risk under the hypothesis space $\mathcal{H}$. Since $h^{*}$ is unknown, $h_{\mathcal{H}}^{*}$ can be regarded as the best approximation of $h^{*}$ within $\mathcal{H}$. $h_m=\arg\min_{h\in\mathcal{H}}R_m(h)$ represents the function minimizing the empirical risk under $\mathcal{H}$.
In summary, the total error can be expressed as Formula (1) [18, 31]:

$$\mathbb{E}\left[R(h_m)-R(h^{*})\right]=\underbrace{\mathbb{E}\left[R\left(h_{\mathcal{H}}^{*}\right)-R(h^{*})\right]}_{\varepsilon_{\mathrm{app}}}+\underbrace{\mathbb{E}\left[R(h_m)-R\left(h_{\mathcal{H}}^{*}\right)\right]}_{\varepsilon_{\mathrm{est}}}.\qquad(1)$$

Among them, $\varepsilon_{\mathrm{app}}$ measures the gap between the best decision function available in $\mathcal{H}$ and the ideal optimal hypothesis and evaluates the predictive ability of the model for unknown data, that is, the generalization error; $\varepsilon_{\mathrm{est}}$ measures the effect of empirical risk minimization, that is, the empirical error. For any fixed decision function $h$ and any $\varepsilon>0$, the relationship between empirical error and generalization error can be obtained from Hoeffding's inequality [32]:

$$P\left(\left|R(h)-R_m(h)\right|\geq\varepsilon\right)\leq 2\exp\left(-2m\varepsilon^{2}\right).\qquad(2)$$

It can be seen from the above formula that as the number of training samples $m$ increases, the probability that the difference between $R(h)$ and $R_m(h)$ exceeds $\varepsilon$ approaches zero, and the two become approximately equal. However, Formula (2) holds only for a fixed decision function. For an arbitrary decision function in the hypothesis space, the following holds instead:

$$P\left(\sup_{h\in\mathcal{H}}\left|R(h)-R_m(h)\right|\geq\varepsilon\right)\leq 2\left|\mathcal{H}\right|\exp\left(-2m\varepsilon^{2}\right),\qquad(3)$$

where $\left|\mathcal{H}\right|$ is the number of hypotheses in the hypothesis space. It can be seen that there are two necessary conditions for learning a model with solid generalization ability. First, the algorithm must be able to select a suitable decision function from the hypothesis space so that the empirical error approaches zero. Second, the number of training samples must be large enough, and the number of decision functions in the hypothesis space must be bounded above. For an infinite hypothesis space, the VC dimension $d$ can be introduced, which refers to the size of the largest dataset that the hypothesis space can shatter. The corresponding generalization bound is transformed into Formula (4):

$$P\left(\sup_{h\in\mathcal{H}}\left|R(h)-R_m(h)\right|\geq\varepsilon\right)\leq 4\,\Pi_{\mathcal{H}}(2m)\exp\left(-\frac{m\varepsilon^{2}}{8}\right),\qquad(4)$$

where $\Pi_{\mathcal{H}}(m)$ is the growth function of $\mathcal{H}$. The second necessary condition thus becomes that the number of training samples is large enough and the VC dimension of the hypothesis space is bounded above. The VC dimension is independent of the algorithm used, the distribution of the dataset, and the amount of data; the maximum value of the growth function is $2^{m}$, representing the hypothesis space's expressive ability. Let $\delta=4\,\Pi_{\mathcal{H}}(2m)\exp\left(-m\varepsilon^{2}/8\right)$ in Formula (4); we can find:

$$\varepsilon=\sqrt{\frac{8}{m}\ln\frac{4\,\Pi_{\mathcal{H}}(2m)}{\delta}}.\qquad(5)$$

It can be concluded that, with probability at least $1-\delta$, $\left|R(h)-R_m(h)\right|\leq\varepsilon$; substituting into Formula (4) gives:

$$R(h)\leq R_m(h)+\sqrt{\frac{8}{m}\ln\frac{4\,\Pi_{\mathcal{H}}(2m)}{\delta}}.\qquad(6)$$

The term $\sqrt{\frac{8}{m}\ln\frac{4\,\Pi_{\mathcal{H}}(2m)}{\delta}}$ can be called the complexity of the model. The more complex the model, the larger this term, and the greater the cost of obtaining strong generalization ability during optimization. Conversely, the smaller this term, the more easily the generalization error approaches the empirical error. Among the three existing schemes, data-based schemes improve classification accuracy through data augmentation, model-based methods reduce the required optimization scope by shrinking the model's hypothesis space, and algorithm-based methods optimize classification performance by providing better initialization or search strategies. The three methods are shown in Figure 1.
[Figure 1: the three few-shot learning strategies, shown in panels (a)–(c).]
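To make the role of hypothesis-space size tangible, the following numerical sketch (ours, not part of the original analysis) solves Formula (3) for the sample count m that guarantees a uniform deviation of at most ε with probability 1 − δ; richer hypothesis spaces demand more samples, which is exactly what few-shot methods must counteract:

```python
import math

def samples_needed(h_size: float, eps: float, delta: float) -> int:
    """Smallest m with 2*|H|*exp(-2*m*eps^2) <= delta, from Formula (3)."""
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))

# Constraining the hypothesis space shrinks the required sample count:
for h in (1e2, 1e6, 1e12):
    print(f"|H| = {h:.0e}  ->  m >= {samples_needed(h, eps=0.1, delta=0.05)}")
```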
In both the time and frequency domains, adjacent parts of a wireless signal are strongly correlated and carry unique physical significance; data augmentation can hardly enumerate the possible signal variations at different time points. Algorithm-based FSL strategies tend to be more complex and place higher requirements on the data types that provide prior knowledge. The model-based FSL strategy takes the complexity of the model's hypothesis space as its starting point, reduces the optimization scope of the model, and is therefore highly feasible.
4. RF Signal Processing
4.1. Radio Frequency Feature Analysis
Identification at the physical layer based on RFF needs to extract the corresponding fingerprint information from a specific part of the RF signal. Figure 2 shows the time-domain waveform of IoT device signals collected by an oscilloscope. The RF signal can be divided into a transient part and a steady-state part. The transient part is generated by the state change of the transmitter and is the transition stage of data transmission; the steady-state part is the signal transmitted when the transmitter is in a stable state [33]. According to the part used for feature extraction, RFF recognition is mainly divided into transient feature-based and steady-state feature-based methods. Transient feature-based recognition identifies the signal from its switching information, while steady-state feature-based recognition extracts feature information from the modulated part of the signal, including the frequency and phase offset, I/Q origin offset, etc.
[Figure 2: time-domain waveform of a collected IoT device signal, showing the transient and steady-state parts.]
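For reference, the two steady-state features named above can be estimated from complex baseband samples roughly as follows (a minimal sketch of ours; the paper does not specify these estimators, and `iq`/`fs` are hypothetical inputs):

```python
import numpy as np

def iq_origin_offset(iq: np.ndarray) -> complex:
    """I/Q origin offset: the DC offset of the complex baseband samples."""
    return complex(iq.mean())

def carrier_freq_offset(iq: np.ndarray, fs: float) -> float:
    """Coarse frequency offset (Hz) from the slope of the unwrapped phase."""
    t = np.arange(iq.size) / fs
    phase = np.unwrap(np.angle(iq))
    slope = np.polyfit(t, phase, 1)[0]   # rad/s
    return slope / (2 * np.pi)
```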
Compared with the steady-state part, the transient part of the signal contains abundant identification information and exhibits larger device-to-device differences, making it valuable for distinguishing IoT devices. Therefore, we mainly use the transient characteristics of RF signals to classify wireless transmitters.
4.2. Preprocessing
It is challenging to extract transient signal information. Therefore, we need to preprocess the signal, including starting point detection, normalization, and dimensionality reduction.
Starting point detection: Starting point detection preserves the transient part of RF signals and avoids identification errors caused by differing signal lengths and starting point positions.
Normalization: Normalization limits the signal's amplitude to a certain range, eliminating singular values in the data and reducing the impact of data redundancy.
Dimensionality reduction: The normalized signal still contains much redundant information, from which many irrelevant features are easily extracted, reducing the efficiency of subsequent recognition. In addition, processing data with many features increases the classification system's computational complexity. Therefore, it is necessary to compress the signal's features through dimensionality reduction, which removes irrelevant components by projecting the data from a higher-dimensional space into a lower-dimensional one; this reduces the number of irrelevant features and improves classification accuracy. We adopt the widely used linear discriminant analysis (LDA) algorithm [34]. After projecting the data into the low-dimensional space, LDA makes samples of the same class as compact as possible while keeping samples of different classes as scattered as possible.
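The three steps might look as follows in code (an illustrative sketch under our own assumptions: an energy-ratio starting-point detector with hypothetical window/threshold values, min-max normalization, and scikit-learn's LDA; the paper's exact detector and parameters may differ):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def detect_start(sig: np.ndarray, win: int = 256, thresh: float = 5.0) -> int:
    """First index where short-time energy exceeds `thresh` times the
    noise-floor energy (assumed to fill the first window): transient onset."""
    energy = np.convolve(sig ** 2, np.ones(win) / win, mode="valid")
    noise_floor = energy[:win].mean() + 1e-12
    above = np.nonzero(energy > thresh * noise_floor)[0]
    return int(above[0]) if above.size else 0

def normalize(sig: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1] to remove amplitude singularity."""
    return (sig - sig.min()) / (sig.max() - sig.min() + 1e-12)

def preprocess(signals, labels, keep: int = 14400, n_dims: int = 25):
    """Align each signal at its detected start, normalize, then project to
    `n_dims` with supervised LDA (n_dims must be < number of classes).
    Assumes enough samples remain after each detected starting point."""
    aligned = np.stack([normalize(s[detect_start(s):][:keep]) for s in signals])
    lda = LinearDiscriminantAnalysis(n_components=n_dims)
    return lda.fit_transform(aligned, labels), lda
```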
5. Method
The structure of FSig-Net is shown in Figure 3. The network consists of two parts: the feature extraction network f_φ and the feature comparison network g_θ, where φ and θ denote the parameter sets of the two modules, respectively. The main task of the feature extraction network is to learn the similarity of embedded features through nonlinear subspaces so that features of the same category cluster together and features of different categories become easier to distinguish. The main task of the feature comparison network is to analyze and compare the high-level semantics of the processed samples of different categories to realize the final recognition.
[Figure 3: overall structure of FSig-Net.]
During training, support set samples of different categories and validation set samples of a certain category are fed into f_φ for feature extraction. The features of the support set and the validation set are then combined and input into g_θ, and the category of the validation sample is determined by comparing similarities.
5.1. Feature Extraction Network and Feature Comparison Network
The structure of the feature extraction and comparison networks is shown in Figure 4. The feature extraction module consists of a convolutional layer, a max pooling layer, and four residual blocks Res1, Res2, Res3, and Res4. Both the convolutional layer and the max pooling layer use a 3 × 3 kernel. After the convolutional layer and the pooling layer, the input data passes through the four residual blocks. The residual blocks use skip connections to enhance the feature extraction ability and reduce the possibility of overfitting. The convolution parameters of Res1 and Res3 are the same, as are the convolution and pooling parameters of Res2 and Res4. Res1 consists of two convolutional layers with 3 × 3 kernels. Res2 is a two-branch convolutional network: its upper branch is the same as Res1, and its lower branch includes an average pooling layer and a convolutional layer with a 1 × 1 kernel. Through the upper and lower branches, Res2 perceives characteristics at different scales, increasing the network's scale adaptability. The structure of the feature comparison network is relatively simple, comprising two convolutional layers with 3 × 3 kernels and one fully connected layer.
[Figure 4: structure of the feature extraction and feature comparison networks.]
To prevent the data from shifting and to speed up the network's convergence, the output is normalized after each convolution operation. FSig-Net adopts instance normalization as the normalization method and leaky ReLU as the activation function.
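A minimal PyTorch sketch consistent with this description is given below; the channel widths, strides, the fusion of the two Res2 branches, and reshaping the preprocessed signal into a single-channel 2-D tensor are our assumptions, since the paper does not specify them:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k=3, stride=1):
    """Convolution -> instance normalization -> leaky ReLU, as described."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2),
        nn.InstanceNorm2d(c_out, affine=True),
        nn.LeakyReLU(0.1, inplace=True))

class ResA(nn.Module):
    """Res1/Res3-style block: two 3x3 convolutions with a skip connection."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(conv_block(c, c), conv_block(c, c))

    def forward(self, x):
        return self.body(x) + x

class ResB(nn.Module):
    """Res2/Res4-style two-branch block: upper branch as in ResA (strided
    here so shapes match), lower branch = average pooling + 1x1 conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.upper = nn.Sequential(conv_block(c_in, c_out, stride=2),
                                   conv_block(c_out, c_out))
        self.lower = nn.Sequential(nn.AvgPool2d(2), nn.Conv2d(c_in, c_out, 1))

    def forward(self, x):            # assumes even spatial sizes
        return self.upper(x) + self.lower(x)

class FeatureExtractor(nn.Module):
    """Stem (3x3 conv + 3x3 max pool) followed by Res1..Res4."""
    def __init__(self, c_in=1, w=64):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(c_in, w), nn.MaxPool2d(3, stride=2, padding=1),
            ResA(w), ResB(w, 2 * w), ResA(2 * w), ResB(2 * w, 2 * w))

    def forward(self, x):
        return self.net(x)
```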
The FSig-Net model does not use a predefined distance metric function to judge the proximity between samples; instead, the feature comparison network learns a suitable metric through training, which better helps the feature extraction module minimize intraclass distances and maximize interclass distances. The category judgment is finally produced in the form of scores, which avoids the instability of applying the same fixed distance metric function to different datasets.
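The feature comparison network can then be sketched as follows, reusing `conv_block` from the sketch above (the pooling layers and hidden width are our additions so the fully connected layer sees a fixed input size):

```python
class FeatureComparator(nn.Module):
    """Learned metric: two 3x3 conv blocks and one fully connected layer
    map a concatenated (support, query) feature pair to a similarity score."""
    def __init__(self, c_feat=128, hidden=64):
        super().__init__()
        self.convs = nn.Sequential(
            conv_block(2 * c_feat, hidden), nn.MaxPool2d(2),
            conv_block(hidden, hidden), nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(hidden, 1)

    def forward(self, support_feat, query_feat):
        pair = torch.cat([support_feat, query_feat], dim=1)  # channel concat
        return self.fc(self.convs(pair).flatten(1))          # raw score
```

A query sample is then assigned to the class whose support features yield the highest score.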
5.2. Training Method
Model training adopts the N-way K-shot method, and the process is shown in Figure 5. N refers to the number of classes, and K refers to the number of support set samples per class in each task. For each training task, N categories are randomly selected from the database, and Z samples are randomly selected from each category, giving a total of m = N × Z training samples. K samples of each class are randomly selected from these as the support set, for a total of N × K support samples. Similarly, Q samples per class are randomly selected as the validation set, without overlapping the support set, for a total of N × Q validation samples. Both sets are input to the model for training and discrimination; this constitutes one task. After training is completed, the relevant parameters are saved for testing, and the testing process is similar to the training process.
[Figure 5: the N-way K-shot training process.]
In the N-way K-shot training method, the testing procedure is rehearsed synchronously during training. By dividing the complete dataset into many small classification tasks, the model continuously adapts to unknown tasks, improving its learning capacity and final classification accuracy.
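For concreteness, one task can be drawn as in the sketch below (our illustration; it assumes the dataset is held as a feature array with an integer label per sample):

```python
import numpy as np

def sample_task(labels, n_way=8, k_shot=5, q_query=5, rng=None):
    """Indices for one N-way K-shot task: K support and Q query samples
    per class, with the query set disjoint from the support set."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.nonzero(labels == c)[0])
        support.append(idx[:k_shot])                 # N*K support samples
        query.append(idx[k_shot:k_shot + q_query])   # N*Q query samples
    return np.concatenate(support), np.concatenate(query), classes
```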
5.3. Loss Function
Choosing the loss is crucial in the learning process. The cross-entropy loss function pursues only the separability of categories while allowing the model to autonomously learn the optimal feature projection, making it simple and efficient.
Figure 6 compares the performance of the large margin cosine loss (LMCL) [35], the additive angular margin loss (AAML) [36], and the cross-entropy loss commonly used in deep learning, all based on FSig-Net. Figure 6(a) shows the oscillation of the different loss functions as the number of training iterations increases. Overall, the cross-entropy loss converges faster and more smoothly, beginning to converge at about 200 iterations, while LMCL and AAML both fluctuate in the later stages. Figure 6(b) compares the recognition rates of the different loss functions. Since the support and test set data are randomly selected, LMCL and AAML, which impose stronger restrictions on the distance measurement, are more likely to perform badly when the selected samples are outliers in the feature space; for example, the performance of AAML drops significantly at around 2,000 iterations. Overall, FSig-Net with the cross-entropy loss is more stable, so we use the cross-entropy loss as the error criterion.
[Figure 6: (a) training loss curves and (b) recognition accuracy for LMCL, AAML, and cross-entropy loss.]
6. Experiment
6.1. Experimental Equipment and Data
In this paper, the signals of 8 mobile phones and 18 IoT modules are collected in a real environment. The mobile phones work under the IEEE 802.11n protocol, and the carrier frequency and bandwidth of the IoT modules are 2.4 GHz and 20 MHz, respectively. The number and models of the collected devices are shown in Table 1.
The signal acquisition equipment is an Agilent DSO9404A oscilloscope; the received carrier frequency is 2.4 GHz, and the sampling rate is set to 20 GSa/s. The gain of the omnidirectional antennas used for collecting signals is 20 dBi. Figure 7 shows the mobile phone and IoT module collection scenarios. The collection scene is an empty indoor environment, and the distance between the receiving device and the transmitting device is 1 m. The acquisition scene is the same for all devices to eliminate, as much as possible, interference caused by differing channel characteristics.
[Figure 7: signal collection scenarios for (a) the mobile phones and (b) the IoT modules.]
The number of signal samples collected for each device is 800, and the number of sampling points per sample is 30,000. Figure 8 shows the preamble waveforms of different devices. Since the preamble waveforms of the 18 IoT signals are basically the same, only IoT-1 and IoT-14 are shown in Figure 8.
[Figure 8: preamble waveforms of different devices.]
6.2. Experimental Parameters
We divide the collected data into a training set and a test set. The number of test samples for each category is 320, and the sample length reserved for each signal is 14,400 points. The FSig-Net model is implemented in the PyTorch framework; the optimizer is Adam with a learning rate of 0.0005.
To demonstrate the ability of the FSig-Net model in the few-shot case, we carry out the following experiments, randomly selecting 10 samples per device from the dataset and taking 8-way 5-shot as the example training configuration.
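Combining the pieces sketched earlier, one training step under these settings could look as follows (our condensed sketch: the mean-feature "prototype" for each class and the batch layout are assumptions, not details given in the paper; task labels are assumed remapped to 0..n_way-1):

```python
import torch
import torch.nn.functional as F

extractor, comparator = FeatureExtractor(), FeatureComparator()
optimizer = torch.optim.Adam(
    list(extractor.parameters()) + list(comparator.parameters()),
    lr=5e-4)  # learning rate 0.0005, as stated above

def train_step(support_x, support_y, query_x, query_y, n_way=8):
    """One 8-way K-shot task: average the support features of each class,
    score every (class, query) pair with the comparator, and apply
    cross-entropy over the n_way scores of each query sample."""
    optimizer.zero_grad()
    s_feat, q_feat = extractor(support_x), extractor(query_x)
    protos = torch.stack([s_feat[support_y == c].mean(0) for c in range(n_way)])
    scores = torch.stack([
        comparator(protos, q.expand_as(protos)).squeeze(-1)  # n_way scores
        for q in q_feat])                                    # (n_query, n_way)
    loss = F.cross_entropy(scores, query_y)
    loss.backward()
    optimizer.step()
    return loss.item()
```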
6.3. Experimental Results
6.3.1. Visualize Features
As shown in Figure 9, we visualize the feature distributions of the preprocessed signals and of the signals processed by FSig-Net's feature extraction module. Figure 9(a) shows the feature distribution of the preprocessed signals. Signals from the same mobile phone aggregate well, but a small number still deviate from their feature cluster, such as the features shown in black and dark blue. Different mobile phones have different feature distributions: phones of different models differ markedly, while features of the same model, such as Huawei P9_1 and Huawei P9_2, overlap considerably. Figure 9(b) shows the feature distribution after processing by FSig-Net's feature extraction module. Compared with the case without feature extraction, signals from the same phone aggregate more tightly, outlier sample points disappear, and the samples are compressed more closely in the feature space, while the distributions of different phones become more distinct. FSig-Net's feature extraction module thus improves the separability of the different mobile phone signals in the dataset, which benefits the subsequent feature comparator.
[Figure 9: feature distributions of the mobile phone signals (a) after preprocessing only and (b) after FSig-Net feature extraction.]
6.3.2. The Effect of K Value on Model Performance
Taking the mobile phone signals as an example, the numbers of training and test samples for each device are 10 and 320, respectively, the SNR is 35 dB, and experiments are carried out with different K values. Figure 10 shows the model's performance on the 8-way 1-shot, 8-way 3-shot, and 8-way 5-shot classification tasks. Figure 10(a) shows the oscillation of the loss function for different K values. Small changes in K have little effect on the convergence speed during training; by 600 iterations, the loss gradually tends to 0. Figure 10(b) shows the average recognition accuracy as a function of the number of iterations. As K increases, the overall recognition accuracy of FSig-Net improves slightly, indicating that increasing the number of support samples can improve recognition to a certain extent in the few-shot case.
[Figure 10: (a) training loss and (b) average recognition accuracy for the 8-way 1-shot, 3-shot, and 5-shot tasks.]
6.3.3. Recognition Rate under Different Sample Numbers
To further verify the performance of FSig-Net and analyze the influence of the training samples on recognition accuracy, experiments are carried out with training sets of different sizes; the results are shown in Figure 11. At an SNR of 35 dB, the average recognition accuracy over the 26 devices gradually increases with the number of training samples. Once the number of training samples per class reaches 10, the average recognition rate no longer increases significantly; at this point, the average recognition rate over the 18 IoT devices is 98.20%. The mobile phone recognition rates are shown in Table 2: the average over the 8 mobile phones is 98.28%. When the training set per class reaches 50 samples, the average recognition rate per device is 99.18%, little different from the 10-sample result, which indicates that FSig-Net performs well in the few-shot case.
[Figure 11: average recognition accuracy under different numbers of training samples.]
6.3.4. Recognition Accuracy under Different Signal-to-Noise Ratios
When the number of training samples per device is 10, the average recognition rates of the 8 mobile phones and the 18 IoT devices under different SNRs are shown in Figure 12. At high SNR, the average recognition performance for mobile phone signals and IoT signals differs little. At low SNR, much of the device-specific information in the signal is disturbed by noise, and the differences between individuals are blurred, significantly reducing accuracy. Since the 8 mobile phones span 4 models, the features of different models differ considerably, whereas the 18 IoT devices are all of the same model with small feature differences; the average recognition rate for mobile phone signals is therefore generally higher than that for IoT devices. FSig-Net can effectively distinguish signals from devices of the same type at high SNR, but is less capable of distinguishing devices at low SNR. Therefore, FSig-Net is better suited to few-shot recognition in conventional indoor environments.
[Figure 12: average recognition rates of the mobile phones and IoT devices under different SNRs.]
The confusion matrix of the mobile phone signal is shown in Figure 13. When the SNR is 35 dB, the recognition effect is good, and the error rate is low. When the SNR is 15 dB, since the two Huawei P9 devices have high feature similarity and are greatly affected by noise, the confusion is severe, and the two devices cannot be distinguished.
[Figure 13: confusion matrices of the mobile phone signals at (a) 35 dB and (b) 15 dB SNR.]
Finally, we conduct comparative experiments on the 26 devices, comprising the 18 IoT modules and 8 mobile phones. FSig-Net is compared with a traditional machine learning algorithm, k-nearest neighbor [37]; with the long short-term memory (LSTM) network, a typical architecture with memory ability [38]; and with a representative deep learning algorithm, the Deep Speaker recognition algorithm [39]. When the total number of training samples per class is 10, the recognition results are shown in Figure 14.
[Figure 14: recognition accuracy of FSig-Net and the compared algorithms versus SNR.]
As shown in Figure 14, when the SNR is between 20 and 32 dB, the traditional machine learning algorithm yields lower recognition accuracy and is not satisfactory for the few-shot problem, while the LSTM algorithm, which is often used to process time series data, performs better. The Deep Speaker algorithm has strong feature expression ability, so its recognition is also good under the few-shot condition. The recognition rate of the proposed FSig-Net exceeds all of the above algorithms, effectively realizing few-shot RFF recognition and proving the method's effectiveness. When the SNR falls below 20 dB, the recognition rate of the FSig-Net model drops rapidly; this problem needs to be addressed in follow-up work.
7. Conclusions
Deep learning-based RFF identification methods struggle to obtain sufficient signal samples in actual scenes, which degrades their generalization ability and recognition accuracy. To address this, this paper proposes FSig-Net, a few-shot RFF identification model based on deep metric learning. The model adaptively learns a measure of similarity between features, avoiding the instability of fixed distance metric functions across different datasets. Experimentally, the paper analyzes the influence of the K value, the number of training samples, and the SNR on the model's recognition performance, and compares FSig-Net with other algorithms commonly used in RF fingerprinting. The results show that with 10 training samples per device, the average recognition accuracy of FSig-Net is above 90% at SNRs of 20–32 dB, outperforming LSTM, Deep Speaker, and other algorithms. In the future, we will further study few-shot RFF recognition methods at low SNR.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under grant numbers 61971368, U20A20162, and 61731012 and in part by the Natural Science Foundation of Fujian Province of China under grant number 2019J01003.