Abstract

The lack of labelled signal datasets in noncooperative scenarios limits the performance of specific emitter identification (SEI). To address this limitation, a method for SEI with limited labelled signals is proposed. The bispectrum of the received signal is estimated to enhance individual discriminability. An information-maximising generative adversarial network (InfoGAN) is then developed to perform SEI with limited labelled signals. To prevent nonconvergence and mode collapse due to the complexity of the radiofrequency signals, we improve the InfoGAN from both the generator and the discriminator perspectives. For the former, an encoder is combined with the InfoGAN generator to form a variational autoencoder that reduces the difficulty of convergence during training. For the latter, a gradient penalty algorithm is applied during the training of the InfoGAN discriminator, which enables its training loss function to obey the 1-Lipschitz constraint, thereby avoiding gradient disappearance. The design of the objective function for training each subnetwork and the overall training procedure are provided. The proposed network is trained with limited labelled and abundant unlabelled data, and an auxiliary classifier categorises the emitters after training. Numerical results indicate that our method outperforms state-of-the-art algorithms for SEI with limited labelled signal samples in terms of effectiveness, convergence, accuracy, and robustness against noise.

1. Introduction

With the advent of the 5G era, the demand for radio spectrum has increased significantly. As a result, technologies that enable spectrum usage to be monitored and regulated have become complex but important tools for its successful exploitation [1–3]. Specific emitter identification (SEI), one such key technology, enables the identification of individual sources of radiofrequency (RF) signals based on the RF fingerprints (RFFs) that result from the nonideal hardware tolerances of the emitters [4]. The RFFs extracted from the RF signals produced by a particular emitter contain unique characteristics that enable SEI to be implemented [5, 6].

In recent years, various methods for performing SEI have been proposed. For instance, Bihl et al. [7] developed a method to extract RFFs through a dimensional reduction analysis (DRA) based on feature selection and designed a multiple discriminant analysis classifier to recognize individual emitters based on DRA feature subsets. The method shows an excellent performance in terms of classification accuracy and robustness when applied for SEI of Zigbee devices. Padilla et al. [8] successfully identified 28 Wi-Fi devices with an accuracy of more than 95% by analysing the preamble information in the communication. However, this method is limited to signals with a communication preamble. A method for SEI based on the bispectrum-Radon transform was proposed in [9]. The method first estimates the bispectrum of the RF signal to preliminarily represent the RFFs and then compresses it via the Radon transform to obtain the bispectrum projection vector, which is used as the input of a hybrid network model to extract deep RFFs and conduct individual emitter identification. The method was able to classify six emitter individuals with an identification accuracy higher than 90%. However, the bispectrum analysis-based method can easily lose some important subtle features, resulting in a negative impact on the SEI performance. Yuan et al. [10] applied the Hilbert–Huang transform (HHT) to transient RF signals, extracted RFFs based on the estimated time–frequency energy distribution, and finally used a support vector machine (SVM) for emitter identification. The method was able to successfully identify eight mobile phones but is only applicable to transient RF signals. Satija et al. [11] proposed the use of variational mode decomposition (VMD) to derive the temporal and spectral modes of RF signals. Various spectral features were then selected as RFFs to identify the received signals. The VMD spectral feature-based method is highly adaptable to various scenarios, including single hop and relaying, and considerably robust against noise in both additive white Gaussian noise (AWGN) and flat-fading channels. However, the performance of the method was verified using simulated rather than real-world signal data; thus, its practicality requires further research.

The abovementioned traditional RFF extraction schemes have achieved certain advancements in individual emitter identification, but some drawbacks remain. Owing to their complexity, RFFs cannot be represented by a unified mathematical model, which makes it necessary to blindly attempt multiple RFF extraction schemes to find a relatively optimal method for a specific SEI task. As a result, significant subjectivity is involved, and successful identification depends on the cognitive level of the researcher. In other words, conventional methods mainly rely on measurements to extract the RFFs defined by specialists and cannot comprehensively capture the representative characteristics of each emitter's RFFs.

With the progress of artificial intelligence technology, researchers have applied deep learning (DL) [12–14] to many fields. Various DL-based methods have been developed and have demonstrated considerable potential in applications such as computer vision (CV) and natural language processing (NLP). Furthermore, these advanced technologies have already achieved great success in emerging fields, such as the Internet of Vehicles [15–17], wireless radio processing [18–20], radar waveform recognition [21, 22], and edge computing [23–25]. Motivated by these developments, research on SEI has found a new direction in DL. Wu et al. [26] proposed an RFF extraction scheme based on a long short-term memory (LSTM) network to learn the high-order correlation of the received signal and identify its emitter. Considering the advantages of convolutional neural networks (CNNs) in image processing, Wang et al. [27] used pulse waveform images and a CNN to identify specific radar emitters. Recently, considering the characteristics of SEI applications, scholars have proposed new network models and algorithms for SEI built on basic network models such as CNNs and LSTM networks. Qian et al. [28] proposed an automatic SEI system based on a CNN with multilevel sparse representation. The SEI system splices the shallow and deep RFF features extracted by the CNN and then performs SEI based on the sparse representation identification. With limited signal samples, this approach can identify nine emitter devices with an accuracy of more than 90%. Wang et al. [29] proposed a novel DL-based model comprising a complex-valued neural network (CVNN) for SEI. Just as CNNs were designed to process two-dimensional (2D) images, CVNNs were developed to process complex-valued data. Therefore, the CVNN model is suitable for directly processing complex baseband RF signals.

Overall, the existing SEI approaches based on DL mainly focus on supervised learning, where all training data are assumed to be labelled. However, in noncooperative scenarios, it is difficult to obtain labelled training data for SEI, which limits the performance of the DL methods used. To address this limitation, Yang et al. [30] applied a few-shot learning method based on model-agnostic meta-learning (MAML) to SEI with limited availability of labelled signal data. Similarly, to solve the problems of SEI in noncooperative scenarios, in this study, we propose a novel network model for SEI, referred to as VAE-InfoGAN-GP, which comprises a variational autoencoder (VAE) embedded in an information-maximising generative adversarial network (InfoGAN), together with a gradient penalty algorithm. Unlike the method in [30], our method can make full use of a large amount of easily accessible unlabelled signal data to perform SEI. Accordingly, the system can learn RFFs more comprehensively and deeply, thereby achieving better performance in the identification process. We conducted bispectrum analysis on the received signals as a preprocessing step. The InfoGAN model [32], which is a DL algorithm based on a generative adversarial network (GAN) [31], was used because of its excellent performance in unsupervised learning; it was improved by embedding a VAE into it and applying a gradient penalty algorithm during the discriminator training. Additionally, a new latent vector was used as the generator input.

The main contributions of this study can be summarised as follows:
(1) The RF signals are preprocessed using a bispectrum analysis to enhance the discriminability of different emitter individuals.
(2) Considering that the complexity of RFFs may pose certain difficulties during the training of the InfoGAN, especially mode collapse and nonconvergence, we introduce innovations in both the generator and the discriminator. An encoder is added before the generator to compress the RF signal bispectrum into a low-dimensional hidden variable that is then decoded by the generator to recover real RFF representation data. This provides prior RFF representation data to the generator, thereby facilitating the generator training. Additionally, we propose a gradient penalty algorithm to train the discriminator, enabling its loss function to obey the 1-Lipschitz constraint, which avoids gradient disappearance and further optimises the network.
(3) To improve the practicability of the proposed network, the loss function for training each subnetwork is analysed and designed; moreover, the training flow of the whole network is analysed and provided.
(4) Numerous experiments are carried out to evaluate the convergence and identification performance of our method. Simulation results show that our method can perform SEI tasks well in noncooperative scenarios.

The remainder of this paper is organised as follows. In Section 2, the proposed method for semisupervised SEI is provided. We first introduce the signal preprocessing method based on bispectrum analysis, then describe the design of the VAE-InfoGAN model and the details of the gradient penalty algorithm, and lastly explain a novel latent vector used as the generator input. The results of the experiments conducted on a real-world RF dataset generated through the software-defined radio (SDR) platform are presented in Section 3. Finally, Section 4 presents the conclusions of the paper.

2. Proposed Method for SEI with Limited Labelled Signals

2.1. Signal Preprocessing

A bispectrum analysis is an effective method for signal preprocessing, which can retain the amplitude and phase of the signal and measure its degree of nonlinearity and asymmetry [33, 34]. The estimated bispectrum of the RF signals forms the basis for RFF extraction. For a signal $x(t)$, the estimated bispectrum can be obtained as follows:
$$B_x\!\left(\omega_1, \omega_2\right) = \sum_{\tau_1 = -\infty}^{+\infty} \sum_{\tau_2 = -\infty}^{+\infty} c_{3x}\!\left(\tau_1, \tau_2\right) e^{-j\left(\omega_1 \tau_1 + \omega_2 \tau_2\right)}, \tag{1}$$
where $\omega_1$ and $\omega_2$ represent the 2D frequencies, and the third-order cumulant $c_{3x}(\tau_1, \tau_2)$ can be expressed as
$$c_{3x}\!\left(\tau_1, \tau_2\right) = E\!\left[x(t)\, x\!\left(t + \tau_1\right) x\!\left(t + \tau_2\right)\right]. \tag{2}$$
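To make the preprocessing step concrete, the following is a minimal NumPy sketch of a direct (FFT-based) bispectrum estimate of the kind described by Equations (1) and (2). The segment length, FFT size, and averaging scheme here are illustrative assumptions, not the exact settings used in the paper.

```python
import numpy as np

def estimate_bispectrum(x, nfft=128, seg_len=128):
    """Direct (FFT-based) bispectrum estimate, averaged over segments.

    A minimal sketch: B(f1, f2) ~ E[X(f1) X(f2) X*(f1 + f2)].
    Segment length and FFT size are illustrative choices.
    """
    x = np.asarray(x, dtype=complex)
    n_seg = len(x) // seg_len
    B = np.zeros((nfft, nfft), dtype=complex)
    f1, f2 = np.meshgrid(np.arange(nfft), np.arange(nfft), indexing="ij")
    for s in range(n_seg):
        seg = x[s * seg_len:(s + 1) * seg_len]
        seg = seg - seg.mean()                      # remove the mean before cumulant estimation
        X = np.fft.fft(seg, nfft)
        # Accumulate X(f1) * X(f2) * conj(X(f1 + f2)) over all (f1, f2) pairs.
        B += X[f1] * X[f2] * np.conj(X[(f1 + f2) % nfft])
    return np.abs(B) / max(n_seg, 1)

# Example: bispectrum map of a toy complex baseband signal (used as RFF representation data)
rng = np.random.default_rng(0)
sig = np.exp(1j * 2 * np.pi * 0.1 * np.arange(4096)) + 0.1 * rng.standard_normal(4096)
rff_map = estimate_bispectrum(sig)
print(rff_map.shape)
```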

To form some intuition, we perform a bispectrum analysis on two RF signals recorded from two emitter individuals of the same device type, obtaining the bispectral energy distributions shown in Figure 1. It can be observed that the energy side lobe distribution in Figure 1(a) is smoother than that in Figure 1(b), whereas the main lobe distribution in Figure 1(b) is more concentrated than that in Figure 1(a). To visualise the differences more intuitively, the corresponding bispectral contour maps, that is, the bispectra of the signals, are shown in Figure 2; these are the RFF representation data used as the input of the proposed network model.

As shown in Figure 2, there are visible differences between the two bispectra, demonstrating the effectiveness of the bispectrum analysis for signal preprocessing, which can enhance individual discriminability.

2.2. VAE-InfoGAN

The basic architecture of the InfoGAN consists of a generator (G), a discriminator (D), and an auxiliary classifier (Q). The generator outputs data $x_f$ after being fed with the latent vector $z$ and the latent code $c$, which is an interpretable constraint variable such as the real data category. The discriminator aims to distinguish the generated data from the real data, thereby pushing the generator to output data that are as close to the real data as possible. Lastly, the auxiliary classifier decodes the generated data to maximise the mutual information between the generated data $x_f$ and the latent code $c$, thereby enabling the generator output to have a higher correlation with said code. When the InfoGAN converges, the auxiliary classifier can effectively identify generated data $x_f$ that are similar to the real data. Therefore, the auxiliary classifier can also be used for the unsupervised classification of the real data.

In this study, we used the InfoGAN as the initial structure of the algorithm and extended it by including a VAE. Figure 3 shows the resulting architecture of the proposed VAE-InfoGAN model, where $x$ represents the bispectrum of the RF signals, that is, the RFF representation data, and includes both labelled data $x_l$ and unlabelled data $x_u$. The labels of $x_l$ can be denoted as $y_l$. Additionally, the latent code $c$ is sampled from a categorical distribution, $c \sim \mathrm{Cat}(K)$, where $K$ represents the number of emitter classes.

To further optimise the network, instead of an original Gaussian distribution, the latent vector $z$ is taken from a Gaussian mixture distribution [35], which can be expressed as follows:
$$p(z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(z \mid \mu_k, \Sigma_k\right), \tag{3}$$
where $K$ represents the number of emitter classes, $\mu_k$ and $\Sigma_k$ are, respectively, the mean and covariance of the $k$-th emitter, and $\pi_k$ represents a categorical random variable such that $\pi_k \ge 0$ and $\sum_{k=1}^{K} \pi_k = 1$.
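The following is a short NumPy sketch of drawing the latent vector $z$ from a $K$-component Gaussian mixture as in Equation (3) instead of a single Gaussian. The component means, covariances, and mixing weights here are placeholder values, not the ones fixed or learned in the paper.

```python
import numpy as np

def sample_gmm_latent(batch_size, means, covs, weights, rng=None):
    """Draw latent vectors z from a Gaussian mixture: z ~ sum_k pi_k N(mu_k, Sigma_k)."""
    rng = rng or np.random.default_rng()
    ks = rng.choice(len(weights), size=batch_size, p=weights)   # pick a mixture component per sample
    z = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in ks])
    return z, ks

# Placeholder mixture with K = 7 components (one per emitter class) and a 64-D latent space
K, latent_dim = 7, 64
means = np.random.randn(K, latent_dim)
covs = np.stack([np.eye(latent_dim) for _ in range(K)])
weights = np.full(K, 1.0 / K)
z, ks = sample_gmm_latent(32, means, covs, weights)
print(z.shape, ks[:5])
```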

Finally, we feed $z$ and $c$ into the generator to produce the fake representation data $x_f$.

The InfoGAN model has a number of inherent failure modes that can occur during training. Of particular interest in this case are the problems of mode collapse and nonconvergence, which can be further exacerbated by the complexity of the RF signals. To address this limitation, we introduce a VAE into the InfoGAN (the result is referred to as VAE-InfoGAN). The VAE consists of an encoder (E) and a decoder (corresponding to the generator G). The former encodes the RFF representation data $x$ to obtain bottleneck vectors, which include the mean vector $\mu$ and covariance vector $\sigma$. The bottleneck vector space is then constructed as follows [36]:
$$\hat{z} = \mu + \sigma \odot \epsilon, \tag{4}$$
where $\epsilon$ represents a random vector that follows a standard normal distribution and $\odot$ denotes element-wise multiplication.
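The following is a minimal TensorFlow sketch of the encoder bottleneck and the reparameterisation step of Equation (4). The convolutional backbone, layer sizes, and the log-variance parameterisation of $\sigma$ are illustrative assumptions rather than the architecture listed in Table 1.

```python
import tensorflow as tf

class Encoder(tf.keras.Model):
    """Encoder E: maps an RFF representation map x to a bottleneck (mu, log sigma^2)."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.backbone = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu"),
            tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu"),
            tf.keras.layers.Flatten(),
        ])
        self.fc_mu = tf.keras.layers.Dense(latent_dim)
        self.fc_logvar = tf.keras.layers.Dense(latent_dim)

    def call(self, x):
        h = self.backbone(x)
        return self.fc_mu(h), self.fc_logvar(h)

def reparameterize(mu, logvar):
    """z_hat = mu + sigma * eps, with eps ~ N(0, I), as in Equation (4)."""
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * logvar) * eps
```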

By sampling from the bottleneck vector space, we obtain the hidden variable $\hat{z}$, which is subsequently decoded by the generator to reconstruct the real RFF representation data $x_r$, aiming for it to be as close as possible to the input RFF representation data $x$. In the original InfoGAN, the training may become unstable because of the difficulty in balancing the capabilities of the generator and discriminator. This results from the fact that the generator has no prior knowledge of the complex-featured RFF representation data and simply generates fake sets from random noise. Therefore, during training, the generator is significantly less powerful than the discriminator, and an effective confrontation cannot be achieved. In this case, the network will have a poor training effect, and mode collapse or nonconvergence problems may appear. However, when the VAE is embedded in the original InfoGAN, we can provide the generator with a loss value between the reconstructed RFF representation data $x_r$ and the input RFF representation data $x$, so that it obtains prior knowledge about the RFF representation data. With the help of this information, the generator can converge faster, thereby simplifying the training of the whole network.

Based on [36], the encoder training process can be expressed as follows:
$$L_E = L_{\mathrm{rec}} + L_{\mathrm{KL}}, \tag{5}$$
where $L_{\mathrm{rec}}$ is the function used to make the reconstructed RFF representation data $x_r$ as close as possible to the RFF representation data $x$ and $L_{\mathrm{KL}}$ is used to guide the encoded hidden variable $\hat{z}$ into following the normal distribution. These can be expressed as follows:
$$L_{\mathrm{rec}} = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\left\| x - x_r \right\|_2^2\right], \tag{6}$$
$$L_{\mathrm{KL}} = D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(\mu, \sigma^2\right) \,\big\|\, \mathcal{N}(0, I)\right), \tag{7}$$
where $p_{\mathrm{data}}(x)$ represents the real data distribution.
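A sketch of the encoder objective of Equation (5) is given below: a reconstruction term plus a KL term toward a standard normal. The mean-squared reconstruction error and the unit weighting of the two terms are assumptions for illustration.

```python
import tensorflow as tf

def encoder_loss(x, x_rec, mu, logvar):
    """L_E = L_rec + L_KL (Equation (5)); a minimal sketch."""
    # L_rec: make the reconstructed RFF data x_rec close to the input x (Equation (6))
    l_rec = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_rec), axis=[1, 2, 3]))
    # L_KL: push the bottleneck distribution N(mu, sigma^2) toward N(0, I) (Equation (7))
    l_kl = -0.5 * tf.reduce_mean(
        tf.reduce_sum(1.0 + logvar - tf.square(mu) - tf.exp(logvar), axis=1))
    return l_rec + l_kl
```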

The discriminator is used to classify data as “real/fake,” and its training process can be expressed as follows:
$$L_D = -\left(L_{\mathrm{real}} + L_{\mathrm{fake}}\right), \tag{8}$$
where $L_{\mathrm{real}}$ is used to maximise the discriminator output for the real RFF representation data $x_r$ produced by the generator, which can be expressed as follows:
$$L_{\mathrm{real}} = \mathbb{E}_{x_r}\!\left[\log D\!\left(x_r\right)\right]. \tag{9}$$

In contrast, $L_{\mathrm{fake}}$ is used to minimise the discriminator output for the fake RFF representation data $x_f$ produced by the generator, which can be expressed as follows:
$$L_{\mathrm{fake}} = \mathbb{E}_{x_f}\!\left[\log\!\left(1 - D\!\left(x_f\right)\right)\right]. \tag{10}$$
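Below is a sketch of the adversarial terms of Equations (8)–(10), written with the standard binary cross-entropy form. Whether the "real" branch uses the reconstructed data $x_r$, the raw data $x$, or both is an assumption here; the function only shows the sign conventions.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(d_real_logits, d_fake_logits):
    """L_D = -(L_real + L_fake) (Equations (8)-(10)): push D's output up on real-side data
    and down on generated fake data. A sketch; the exact real-side inputs are an assumption."""
    l_real = bce(tf.ones_like(d_real_logits), d_real_logits)    # -E[log D(x_r)]
    l_fake = bce(tf.zeros_like(d_fake_logits), d_fake_logits)   # -E[log (1 - D(x_f))]
    return l_real + l_fake
```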

The auxiliary classifier is used to maximise the mutual information between the latent code $c$ and the generated data $x_f$; its training process can be expressed as follows:
$$L_Q = -\,\mathbb{E}_{c \sim p(c),\, x_f \sim G(z, c)}\!\left[\log Q\!\left(c \mid x_f\right)\right]. \tag{11}$$

When labelled training data are present, the training process can be extended as follows:
$$L_Q' = L_Q - L_I, \tag{12}$$
where $L_I$ denotes the mutual information between the labelled RFF representation data $x_l$ and the corresponding labels $y_l$, which can be expressed as follows:
$$L_I = I\!\left(x_l;\, y_l\right) \approx \mathbb{E}_{\left(x_l,\, y_l\right)}\!\left[\log Q\!\left(y_l \mid x_l\right)\right]. \tag{13}$$
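A sketch of the auxiliary-classifier objective of Equations (11)–(13) follows, using a cross-entropy surrogate for the mutual-information terms on generated data and on the few labelled real samples. The equal weighting of the two parts is an assumption.

```python
import tensorflow as tf

cce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def auxiliary_loss(q_logits_fake, c_fake, q_logits_labelled=None, y_labelled=None):
    """Cross-entropy surrogate for L_Q (Eq. (11)) plus the supervised term (Eqs. (12)-(13))."""
    # Unsupervised part: recover the latent code c from the generated data x_f
    loss = cce(c_fake, q_logits_fake)
    # Supervised part: classify the limited labelled RFF data x_l against the labels y_l
    if q_logits_labelled is not None:
        loss += cce(y_labelled, q_logits_labelled)
    return loss
```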

Acting as the link between the different subnetworks, the generator training process can be expressed as follows:
$$L_G = L_E - L_D - L_I, \tag{14}$$
where $L_E$, $L_D$, and $L_I$ are equivalent to those in Equations (5), (8), and (13), respectively.
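The following sketch combines the reconstruction, adversarial, and information terms of Equation (14) into a single generator loss. The unit weights of the terms are assumptions; `encoder_loss` and `auxiliary_loss` refer to the sketches above.

```python
import tensorflow as tf

def generator_loss(d_fake_out, enc_loss, aux_loss):
    """L_G combining reconstruction, adversarial, and information terms (Equation (14));
    a sketch with unit term weights, which are assumptions."""
    # Adversarial part: push the discriminator's output up on generated data x_f
    adv = -tf.reduce_mean(d_fake_out)
    return enc_loss + adv + aux_loss
```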

2.3. Gradient Penalty for Discriminator

In the previous section, we introduced a VAE to reduce the training difficulty of the generator. However, this measure only improves the network from the perspective of the generator, so that it can effectively play an adversarial game with the discriminator. The defects of discriminator training are also an important factor hindering the training effect of the network. If the discriminator is too weakly trained, it will not be indicative enough for the generator to obtain good results. On the contrary, if the discriminator is too well trained, the generator will not obtain enough gradient information for further optimisation. As a result, discriminator training is very difficult to control. The mathematical proof in [37] shows that the above problem is caused by the use of the JS divergence to measure the distance between the real and generated data distributions, which can cause gradient disappearance during training. In this regard, Arjovsky et al. [38] proposed a novel measurement method for calculating the distance between the real and generated data distributions, called the earth mover's distance (EMD), which can be expressed as
$$W\!\left(P_r, P_g\right) = \inf_{\gamma \in \Pi\left(P_r, P_g\right)} \mathbb{E}_{(x, y) \sim \gamma}\!\left[\left\| x - y \right\|\right], \tag{15}$$
where $P_r$ and $P_g$ represent the real and generated data distributions, respectively, and $\Pi(P_r, P_g)$ denotes the set of all joint distributions whose marginals are $P_r$ and $P_g$.

To apply the EMD to neural networks, Equation (15) needs to be transformed using the Kantorovich–Rubinstein duality theory [39]; the transformation result can be expressed as
$$W\!\left(P_r, P_g\right) = \sup_{\left\| f \right\|_L \le 1} \left\{ \mathbb{E}_{x \sim P_r}\!\left[f(x)\right] - \mathbb{E}_{x \sim P_g}\!\left[f(x)\right] \right\}, \tag{16}$$
where $f$ denotes a neural network function that obeys the 1-Lipschitz constraint, i.e., $\left\| f \right\|_L \le 1$.

Therefore, to apply the EMD to the GAN-based model, the mathematical model of the discriminator in (8) should be changed, which can be expressed as
$$L_D = \mathbb{E}_{x_f \sim P_g}\!\left[D\!\left(x_f\right)\right] - \mathbb{E}_{x \sim P_r}\!\left[D(x)\right]. \tag{17}$$

Directly implementing a discriminator function subject to the 1-Lipschitz constraint is difficult, and this problem can be solved through a mathematically equivalent method, which can be expressed as
$$\left\| \nabla_{x} D(x) \right\| \le 1, \quad \forall x. \tag{18}$$

This essentially requires that the gradient norm $\left\| \nabla_{x} D(x) \right\|$ corresponding to all inputs is less than 1. However, traversing the entire input data distribution is impossible. In this regard, we define a penalty data distribution, $P_p$, and only require that the data sampled from $P_p$ correspond to a gradient less than 1. Therefore, the mathematical model of the discriminator in (17) should be changed, which can be expressed as follows:
$$L_D = \mathbb{E}_{x_f \sim P_g}\!\left[D\!\left(x_f\right)\right] - \mathbb{E}_{x \sim P_r}\!\left[D(x)\right] + \lambda\, \mathbb{E}_{x_p \sim P_p}\!\left[\left(\left\| \nabla_{x_p} D\!\left(x_p\right) \right\| - 1\right)^2\right], \tag{19}$$
where the last term acts as a regular term that penalises gradients larger than 1 when the discriminator weight parameters are updated, and $\lambda$ is the penalty coefficient.

In terms of the definition of the penalty data distribution, we define the data distribution space between the real RFF representation data distribution space $P_r$ and the fake RFF representation data distribution space $P_g$ as $P_p$, as intuitively shown in Figure 4.

The implementation details can be denoted as follows:
$$x_p^{(1)} = \alpha_1\, x^{(1)} + \left(1 - \alpha_1\right) x_f^{(1)}, \qquad x_p^{(2)} = \alpha_2\, x^{(2)} + \left(1 - \alpha_2\right) x_f^{(2)}, \tag{20}$$
where $x^{(1)}$ and $x^{(2)}$ represent real RFF representation data samples sampled from $P_r$, $x_f^{(1)}$ and $x_f^{(2)}$ represent fake RFF representation data samples sampled from $P_g$, $x_p^{(1)}$ and $x_p^{(2)}$ denote penalty data samples sampled from $P_p$, and $\alpha_1$ and $\alpha_2$ denote random parameters drawn from $U(0, 1)$.
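A TensorFlow sketch of the gradient penalty of Equations (19)–(20) follows: interpolate between real-side and fake samples, measure the gradient norm of D at the interpolates, and penalise its deviation from 1. The penalty coefficient of 10 follows the common WGAN-GP choice and is an assumption here, as is the 4D (batch, height, width, channel) layout of the RFF representation maps.

```python
import tensorflow as tf

def gradient_penalty(discriminator, x_real, x_fake, gp_weight=10.0):
    """Gradient penalty term for the discriminator (Equations (19)-(20)); a sketch."""
    batch = tf.shape(x_real)[0]
    # Random per-sample interpolation between real and fake RFF data (Equation (20))
    alpha = tf.random.uniform([batch, 1, 1, 1], 0.0, 1.0)
    x_penalty = alpha * x_real + (1.0 - alpha) * x_fake
    with tf.GradientTape() as tape:
        tape.watch(x_penalty)
        d_out = discriminator(x_penalty, training=True)
    grads = tape.gradient(d_out, x_penalty)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return gp_weight * tf.reduce_mean(tf.square(grad_norm - 1.0))
```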

2.4. Method Procedure

The training process of the proposed method is described in Algorithm 1.

Input:
$s$: Raw RF signal data;
$c$: Latent code;
$z$: Latent vector;
Output:
$\theta_E, \theta_G, \theta_D, \theta_Q$: Parameters of subnetworks E, G, D, Q
1. Perform bispectrum analysis on the RF signal data $s$ and obtain the bispectrum as the RFF representation data $x$;
2. Initialise the network parameters $\theta_E$, $\theta_G$, $\theta_D$, and $\theta_Q$ of the encoder, generator, discriminator, and auxiliary classifier;
3. Fix the network parameters of the encoder and generator, and generate real and fake RFF representation data from the RFF representation data $x$, latent code $c$, and latent vector $z$: $x_r = G(E(x))$, $x_f = G(z, c)$;
4. Calculate the loss function of the discriminator $L_D$ according to Equation (19);
5. Update the network parameters of the discriminator according to the loss function: $\theta_D \leftarrow \theta_D - \eta\, \nabla_{\theta_D} L_D$;
6. Calculate the loss function of the auxiliary classifier $L_Q'$ according to Equations (11)–(13);
7. Update the network parameters of the auxiliary classifier according to the loss function: $\theta_Q \leftarrow \theta_Q - \eta\, \nabla_{\theta_Q} L_Q'$;
8. Fix the network parameters of the discriminator and auxiliary classifier, and train the encoder and generator;
9. Calculate the loss function of the generator $L_G$ according to Equation (14);
10. Update the network parameters of the generator according to the loss function: $\theta_G \leftarrow \theta_G - \eta\, \nabla_{\theta_G} L_G$;
11. Calculate the loss function of the encoder $L_E$ according to Equation (5);
12. Update the network parameters of the encoder according to the loss function: $\theta_E \leftarrow \theta_E - \eta\, \nabla_{\theta_E} L_E$;
13. Repeat steps (3)–(12) until the network converges; the auxiliary classifier is then used to conduct semisupervised SEI;
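To tie the subnetwork objectives together, the following is a condensed TensorFlow sketch of one training step of Algorithm 1. The optimiser settings, the single combined encoder/generator update, the assumption that the generator accepts a (latent, code) pair for both branches, and the helper functions (`reparameterize`, `encoder_loss`, `auxiliary_loss`, `generator_loss`, `gradient_penalty` from the sketches above) are illustrative assumptions rather than the exact implementation used in the paper.

```python
import tensorflow as tf

opt_d = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)   # optimiser settings are assumptions
opt_q = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
opt_eg = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

def train_step(x, c_onehot, z, encoder, generator, discriminator, aux_classifier):
    """One iteration of Algorithm 1 (steps 3-12), condensed into two update phases."""
    c_sparse = tf.argmax(c_onehot, axis=1)
    # Phase 1: update D and Q with E and G fixed (steps 3-7)
    with tf.GradientTape(persistent=True) as tape:
        mu, logvar = encoder(x)
        x_rec = generator(reparameterize(mu, logvar), c_onehot)   # reconstructed data x_r
        x_fake = generator(z, c_onehot)                            # fake data x_f
        d_loss = (tf.reduce_mean(discriminator(x_fake))
                  - tf.reduce_mean(discriminator(x_rec))
                  + gradient_penalty(discriminator, x_rec, x_fake))   # Equation (19)
        q_loss = auxiliary_loss(aux_classifier(x_fake), c_sparse)     # Equations (11)-(13)
    opt_d.apply_gradients(zip(tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    opt_q.apply_gradients(zip(tape.gradient(q_loss, aux_classifier.trainable_variables),
                              aux_classifier.trainable_variables))
    del tape
    # Phase 2: update E and G with D and Q fixed (steps 8-12)
    with tf.GradientTape() as tape:
        mu, logvar = encoder(x)
        x_rec = generator(reparameterize(mu, logvar), c_onehot)
        x_fake = generator(z, c_onehot)
        e_loss = encoder_loss(x, x_rec, mu, logvar)                   # Equation (5)
        g_loss = generator_loss(discriminator(x_fake), e_loss,
                                auxiliary_loss(aux_classifier(x_fake), c_sparse))  # Equation (14)
    eg_vars = encoder.trainable_variables + generator.trainable_variables
    opt_eg.apply_gradients(zip(tape.gradient(g_loss, eg_vars), eg_vars))
```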

3. Results and Discussion

3.1. RF Signal Data Collection and Experimental Setup

We collected real-world RF signal data to evaluate the performance of our method using an SDR system built on GNU Radio and comprising eight USRP (B210) devices. Seven USRP devices were used as transmitters at a carrier frequency of 2.4 GHz, with each device connected to a laptop running the Linux Ubuntu 18.04 operating system. The remaining USRP device was connected to the same laptop and served as a receiver to collect the seven classes of RF signals generated by the other devices. The SDR platform is shown in Figure 5.

For each of the seven classes of RF signals, we obtained 20,000 segments of RF signal data; the modulation method was QPSK. Next, using MATLAB, we added different levels of AWGN to adjust the SNR to 0, 2, 4, …, and 20 dB. Bispectrum analysis was then applied to the RF signal data to obtain the RFF representation data, all of which have a uniform size. At each SNR level, 20,000 RFF representation data samples were obtained per class. Of all the RFF representation data, 80% were used for training, 10% for validation, and the remaining 10% for testing. Within the training set, 20% of the samples were labelled with their categories, and 80% were unlabelled. Figure 6 shows the dataset structure.
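The paper performs the SNR adjustment in MATLAB; the following NumPy sketch shows an equivalent way of adding complex AWGN at a target SNR to a baseband segment, purely for illustration.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add complex AWGN so that the resulting SNR (in dB) matches snr_db."""
    rng = rng or np.random.default_rng()
    sig_power = np.mean(np.abs(signal) ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = np.sqrt(noise_power / 2.0) * (rng.standard_normal(signal.shape)
                                          + 1j * rng.standard_normal(signal.shape))
    return signal + noise

# Example: sweep the SNR levels used in the experiments (0, 2, ..., 20 dB)
segment = np.ones(1024, dtype=complex)           # placeholder baseband segment
noisy_segments = [add_awgn(segment, snr) for snr in range(0, 22, 2)]
```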

We evaluated the performance of our method using Python 3.6 and TensorFlow. The DL PC was equipped with an NVIDIA GeForce RTX 3090 Ti GPU, and the details of the network structure and training hyperparameters are presented in Table 1.

3.2. Convergence Performance

We compared the convergence performance of the VAE-InfoGAN trained with the gradient penalty algorithm (VAE-InfoGAN-GP) with that of two other network frameworks: VAE-InfoGAN and InfoGAN. Figure 7 shows the training loss curves over 400 epochs.

As shown in Figure 7, VAE-InfoGAN-GP achieves the fastest convergence speed and reaches the best convergence plane. In contrast, the loss value of VAE-InfoGAN is maintained at a higher level because the traditional discriminator adopts the JS divergence to calculate the distance between the real and fake data distributions, resulting in gradient disappearance. In this case, the further optimisation of the network parameters is affected, so the training loss cannot be decreased further. This comparison demonstrates the effectiveness of the gradient penalty algorithm in avoiding gradient disappearance and improving the network training effect. Additionally, the loss curve of InfoGAN shows significant fluctuations, which are a manifestation of network nonconvergence. In contrast, the loss curve of VAE-InfoGAN tends to flatten out as training progresses, showing superior convergence. The reason for this significant difference is that VAE-InfoGAN can provide the generator with prior information about the real data, thereby reducing the training difficulty of the generator and facilitating the convergence of the whole network model. This comparison demonstrates that embedding a VAE in the original InfoGAN can effectively prevent mode collapse and nonconvergence, which considerably benefits the convergence performance.

3.3. Identification Performance vs. SNRs

We tested the identification performance of the proposed method as a function of the SNR. Furthermore, we compared it with that of the original InfoGAN and existing methods, including MAML [30] and virtual adversarial training (VAT) [40].

Figure 8 shows the identification accuracy of the evaluated methods as a function of the SNR under AWGN and Rayleigh fading channels.

As shown in Figure 8(a), our method achieves an average identification accuracy 7–10% higher than that of the original InfoGAN, which further demonstrates that the embedded VAE and gradient penalty algorithm can improve the training effect of the InfoGAN, thereby enhancing the identification performance of the SEI system. Additionally, our proposed method achieves an accuracy of approximately 90% at 6 dB and 95% at 10 dB. Compared with the existing VAT and MAML methods, the identification performance of our method is improved to different degrees. Especially at low SNRs, our method shows superior noise robustness, and the maximum accuracy gap between our method and MAML and VAT reaches approximately 12% and 21%, respectively.

As shown in Figure 8(b), our proposed method achieves an accuracy of approximately 84% at 6 dB and 90% at 10 dB, whereas the MAML method only reaches the same accuracies at 10 dB and 14 dB, respectively. The VAT method deteriorates significantly in this scenario and is therefore not considered further. The results show that our method maintains its excellent identification performance and noise robustness in the Rayleigh propagation channel, which further proves its practicability and superiority.

3.4. Visualised Analysis

To evaluate the identification performance of our method in an intuitive way, the t-SNE algorithm [41] was used to visualise the high-dimensional feature vectors extracted by the MAML and VAE-InfoGAN-GP methods. Figure 9 displays the t-SNE dimensionality reduction distributions of the two methods at a fixed SNR.
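A short scikit-learn sketch of this visualisation step follows. The choice of feature extractor (for example, the penultimate layer of the auxiliary classifier), the perplexity, and the placeholder feature matrix are assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(features, labels, perplexity=30):
    """Project high-dimensional feature vectors to 2D with t-SNE and colour by emitter class."""
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca",
               random_state=0).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
    plt.xlabel("t-SNE dimension 1")
    plt.ylabel("t-SNE dimension 2")
    plt.show()

# Example with placeholder features: 700 samples, 128-D, 7 emitter classes
feats = np.random.randn(700, 128)
plot_tsne(feats, np.repeat(np.arange(7), 100))
```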

As shown in Figure 9, there are seven clusters in the 2D scatter diagram, representing the seven different emitter individuals. Figure 9(a) shows that the seven clusters exhibit stronger intraclass aggregation and larger interclass separation. In Figure 9(b), however, the seven clusters show higher intraclass dispersion. These results indicate that our method can extract more discriminative feature parameters, partly owing to the improvements in the network structure and training algorithm and partly because it makes full use of a large amount of easily accessible unlabelled signal data, which enhances the generalisation ability of the network to a certain extent. As a result, our method can identify different emitter individuals with higher accuracy.

3.5. Identification Performance vs. Ratios of Labelled to Total Signal Data Samples

In this section, we consider the identification performance for different ratios of labelled to total signal data samples, an important factor affecting the classification performance of the algorithm. For the training samples of each class of signal, the ratio is set to 0%, 2%, …, 10%, and the identification accuracy is then tested at each ratio. The experimental results are shown in Figure 10.

Figure 10 shows that the identification performance of our method remains relatively stable once the ratio reaches 4%, which indicates that our method is still effective for SEI with very limited labelled signal data. In particular, when the ratio decreases to 0%, that is, when the training samples are all unlabelled, the identification performance deteriorates only slightly, and the accuracy can still reach 90% at 12 dB. This result further proves the excellent performance of the proposed method in solving the few-shot problem in SEI.

3.6. Other Evaluation Metrics for Identification Performance

Accuracy, the most common evaluation metric for classification problems, was used to evaluate the identification performance in the above experiments. However, it only counts the proportion of correctly predicted samples and thus reflects only the overall identification performance. For multicategory identification, each class is a minority with respect to the rest of the RF signals, and it is difficult to evaluate whether a single class of RF signals is correctly identified using accuracy alone.

In this section, we use the receiver operating characteristic (ROC) curve to further evaluate the identification performance. To avoid class imbalance, we select one USRP as the positive class with a weight of six and the other six USRPs as the negative class with a weight of one. Figure 11 shows the ROC curves of our method and MAML with the SNR set to 20 dB.
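The following is a sketch of the one-vs-rest ROC/AUC evaluation with the class weighting described above (positive samples weighted six, negative samples weighted one). The class indices and classifier scores are placeholders, not the paper's outputs.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def one_vs_rest_roc(y_true, scores, positive_class):
    """ROC curve for one USRP treated as positive; positives weighted 6, negatives 1."""
    y_bin = (y_true == positive_class).astype(int)
    weights = np.where(y_bin == 1, 6.0, 1.0)        # rebalance the positive vs. negative classes
    fpr, tpr, _ = roc_curve(y_bin, scores, sample_weight=weights)
    return fpr, tpr, auc(fpr, tpr)

# Example with placeholder predictions for 7 classes (100 samples per class)
y_true = np.repeat(np.arange(7), 100)
scores = np.random.rand(700)                        # classifier score for the chosen positive class
fpr, tpr, roc_auc = one_vs_rest_roc(y_true, scores, positive_class=0)
print(f"AUC = {roc_auc:.3f}")
```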

As shown in Figure 11, the ROC curves of our method are distributed in the upper left of the figure, while those of MAML are relatively offset to the lower right. This indicates that our method can achieve a higher true-positive rate with a lower false-positive rate, thereby intuitively illustrating that our method can efficiently identify each USRP device. The area under the curve (AUC) is also calculated for each USRP device. Figure 11 shows that the AUC for each device with our method is higher than 0.97, whereas the AUC for each device with MAML is between 0.926 and 0.951. Additionally, Table 2 provides the mean ROC AUC of the two methods at SNRs of 0–20 dB, and our method achieves a higher AUC than MAML at all SNRs. As expected, the results further demonstrate that our proposed method is superior to state-of-the-art methods in terms of identification performance and noise robustness.

4. Conclusion

In this paper, we proposed a method for SEI with limited labelled signals. Bispectrum analysis was performed as a preprocessing step on labelled and unlabelled RF signal data to obtain RFF representation data, which were then fed to the network model for semisupervised training and emitter identification. Considering the mode collapse and nonconvergence problems of the InfoGAN, which the complexity of RFFs may aggravate, the InfoGAN model was improved: a VAE was introduced to compress the labelled and unlabelled RFF representation data into a hidden variable, which was then restored into real RFF representation data by the generator. In this way, the VAE provides prior information about the RFF representation data to the generator, thereby facilitating generator training. Additionally, we proposed a gradient penalty algorithm to train the discriminator, enabling its loss function to obey the 1-Lipschitz constraint, which prevents gradient disappearance and further optimises the network.

The experimental results demonstrate the effectiveness of our proposed VAE-InfoGAN-GP method in improving the convergence performance. Furthermore, the identification accuracy of our method is higher than that of the state-of-the-art MAML and VAT methods and the baseline InfoGAN network model. Additionally, our method can maintain a high identification accuracy when only a few labelled data are available. Moreover, we used the metric of ROC AUC to further evaluate the identification performance. The results further demonstrate that our proposed method outperforms the state-of-the-art MAML method in terms of identification performance and noise robustness.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 91538201, in part by the Taishan Scholar Project of Shandong Province under Grant ts201511020, and in part by the Chinese National Key Laboratory of Science and Technology on Information System Security under Grant 6142111190404.