Abstract
Traditionally, nonlinear data processing has been approached either with polynomial filters, which are straightforward extensions of many linear methods, or with neural network techniques. In contrast to linear approaches, which often yield algorithms that are simple to apply, nonlinear learning machines such as neural networks demand more computation and are prone to nonlinear optimization difficulties that are harder to solve. Kernel methods, a more recently developed technology, are powerful machine learning approaches with a less complicated architecture that offer a straightforward way of transforming nonlinear optimization problems into convex ones. Typical analytical tasks in kernel-based learning include classification, regression, and clustering. For image processing applications, semisupervised deep learning, which is driven by a small amount of labeled data and a large amount of unlabeled data, has shown excellent performance in recent years. Today's semisupervised learning methods, however, operate on the assumption that labeled and unlabeled data follow a similar distribution, and their performance depends heavily on that assumption holding. When the unlabeled data contains out-of-class samples, the system's performance is adversely affected. In real-world applications, verifying that unlabeled data contains no samples from other categories is difficult, and this is especially true in synthetic aperture radar (SAR) image identification. Using threshold filtering, this work addresses the problem that out-of-class samples in the unlabeled data have a detrimental influence on the performance of the model during semisupervised training. During training, the model filters out unlabeled data that does not belong to any target category by selecting two different subsets of data so as to optimize its performance. A series of experiments was carried out on the MSTAR data set, and the superiority of our method was shown in comparison with a large number of current state-of-the-art semisupervised classification algorithms, especially when the unlabeled data contained a significant proportion of out-of-class samples. The performance of each kernel function is evaluated independently using two metrics, the false alarm (FA) and the target miss (TM) rates, which measure the proportion of incorrect decisions made by each technique.
1. Introduction
Over the past few years, artificial intelligence technology has grown in prominence, with applications ranging from image recognition [1] to adversarial attacks [2] and privacy protection [3, 4]. The gains produced in computer vision by data-driven deep supervised learning, which has achieved exceptional results on a wide range of vision challenges [5–8], are especially notable. A significant amount of labeled data is, however, necessary to train deep learning models [9, 10]. Insufficient labeled data in the training set limits an algorithm's performance. In synthetic aperture radar (SAR) image processing, SAR images differ from ordinary optical images because of the imaging mechanism and speckle noise [11–13]. As a result of these characteristics, the classification and identification of SAR images have proved to be quite challenging [14].
It has been proposed [15, 16] that semisupervised learning technology can be employed to reduce the dependency of deep learning on immense quantities of labeled data. By utilizing unlabeled data, which is cheap to obtain and store, it is feasible to lessen the demand for labeled instances in deep learning while minimizing the manpower cost associated with labeling. Semisupervised learning boosts the model's performance by automatically categorizing unlabeled input during the training phase. This technique has attracted a great deal of interest among researchers in machine learning and computer vision [17].
For semisupervised learning, target identification technologies such as generative adversarial networks, teacher-student networks, and consistency regularization are now available [18]. Reference [19] reported the use of two discriminators to stabilize the training of the generator, which results in high-quality generated SAR images that can be applied to refine a model and obtain good recognition performance; the technique is based on a deep convolutional generative adversarial network and uses the two discriminators to stabilize generator training [20]. Throughout the training process, teacher-student networks are commonly leveraged to label unlabeled data and enhance the student model, so they are often used in semisupervised learning [21]. By partitioning the unlabeled data with a teacher-student network, the authors of Reference [22] were able to increase the pseudolabel quality of the unlabeled data, allowing them to optimize the recognition model even further [23]. A semisupervised learning framework based on self-consistent enhancement (SCA) was proposed by the authors of Reference [14]; it integrates data augmentation, mixed labeling, consistency regularization, and other techniques to achieve outstanding identification accuracy on SAR images [24].
SAR target recognition techniques based on semisupervised learning, such as those outlined above, can help improve the accuracy with which the model identifies targets in SAR images. However, these methods all assume that the unlabeled data used in training and the labeled data are distributed in the same way [25]. In practice, unlabeled data is often mixed with a considerable quantity of data that does not belong to any target category. An unlabeled data set contaminated with out-of-category data to a certain level will negatively impact the training of the model, causing it to perform below expectations. In SAR image identification in particular, data purification and division is a time-consuming and labor-intensive process that requires a significant investment of both energy and time [26].
A threshold filtering semisupervised learning method is proposed as a solution to the problem that out-of-class data mixed into the unlabeled data adversely affects the recognition accuracy of semisupervised SAR target recognition models [27]. The conceptual framework of the proposed approach is shown in Figure 1.

Pseudolabeling involves training the network on labeled and unlabeled data at the same time in each batch. For each batch of labeled and unlabeled data, the training loop performs the following operations (a code sketch follows the list):
To compute the supervised loss, a single forward pass on the labeled batch is performed. This is the labeled loss [28].
For the unlabeled batch, one forward pass is performed to predict the "pseudolabels".
This "pseudolabel" is then used to compute the unlabeled loss.
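As an illustrative sketch of such a step, not code from the paper: the snippet below assumes a PyTorch classifier `model` that returns logits, batches `(x, y)` and `(u, _)` drawn from hypothetical labeled and unlabeled loaders, and an assumed loss weight `lambda_u`.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, labeled_batch, unlabeled_batch, lambda_u=1.0):
    """One pseudolabeling step: supervised loss on the labeled batch,
    pseudolabel loss on the unlabeled batch."""
    x, y = labeled_batch          # labeled images and targets
    u, _ = unlabeled_batch        # unlabeled images (targets unused)

    # (1) Forward pass on the labeled batch -> supervised (labeled) loss.
    logits_x = model(x)
    loss_x = F.cross_entropy(logits_x, y)

    # (2) Forward pass on the unlabeled batch -> pseudolabels.
    with torch.no_grad():
        pseudo = model(u).softmax(dim=1).argmax(dim=1)

    # (3) Unlabeled loss computed against the pseudolabels.
    logits_u = model(u)
    loss_u = F.cross_entropy(logits_u, pseudo)

    loss = loss_x + lambda_u * loss_u
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```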
During model training, the softmax function labels the unlabeled data, and the probability of the labeling outcome varies with the training data used to develop the model [29]. To put our method into action, it is necessary to start with a solid foundation. At the start of each new epoch, we use the current model to pseudolabel all unlabeled data, sort the samples according to their pseudolabel prediction probability, and select some samples as unlabeled training samples based on threshold filtering in order to improve recognition accuracy [30]. The proposed methodology is simple to integrate into existing semisupervised learning methods, making it effective and adaptable to a wide range of settings and learning scenarios. Through a large number of trials on the widely known SAR target identification data set MSTAR in a semisupervised setting, we show the effectiveness of our technique [31].
The goal is to categorize objects in a SAR image into two categories, man-made objects and natural objects, and to determine the most suitable kernel function for SAR image classification in each category.
The suggested technique employs a variety of machine learning algorithms, such as perceptron learning and support vector machines (SVM), together with a number of kernel functions, to classify SAR images, with the performance of each kernel function being monitored [32]. The kernels used in this context, such as RBF, sigmoid, polynomial, and Hellinger, are among the most popular kernels since they are fast and give smoothness throughout the contours [33].
Compared with linear methods, kernel-based approaches provide a clear way of transforming nonlinear problems into convex optimization problems [34]. With SVM approaches, kernel functions are capable of dealing with pixel perturbations and distortions, making them useful for image classification tasks [35].
The rest of the article is structured as follows. Section 2 describes the proposed strategy in detail. Section 3 presents the experimental results on a publicly accessible data set together with a discussion of the findings. Finally, Section 4 summarizes the results.
2. The Proposed Method
In this section, we go over the details of the proposed method. The structure of the suggested technique is shown in Figure 1. Figure 2 illustrates how the training model can be used to assign pseudolabels to unlabeled data. The pseudolabels are sorted by their maximum probability value from large to small [36], and only the samples with scores greater than 50 percent of the maximum probability value are picked. Using the first half of the data, the remaining half of the samples is evaluated, the samples with a highest prediction probability greater than 80 percent are screened out (the basket selection portion of the figure), and semisupervised training is then performed with the remaining data in order to improve the model's prediction accuracy and precision [37].

2.1. Preselection
Given a labeled data set and an unlabeled data set, the presently available semisupervised learning approaches carry out model optimization on the premise that the two data sets share the same distribution, which is often not the case [38]. In real-world SAR target recognition, unlabeled data is frequently mixed with data from sources outside the target categories, as seen in Figure 2: the unlabeled data set includes samples that are not in the same classes as those found in the labeled data set (indicated with red bounding boxes). Such out-of-category data is difficult to detect, and manual selection is time-consuming and labor-intensive. For this reason, during the semisupervised training phase, we use preselection to choose the unlabeled samples that are more likely to belong to the target categories in order to maximize the accuracy and precision of the model.
During semisupervised training, our objective is to sort the unlabeled data set, delete unlabeled data that is regarded as outside the target categories, and utilize the remaining unlabeled data for model optimization.
When training with unlabeled data, methods such as SCA apply consistency regularization throughout the process, and the unlabeled loss is calculated as

$$\mathcal{L}_u = \frac{1}{N_u} \sum_{i=1}^{N_u} \left\| p(\hat{u}_i) - p(\tilde{u}_i) \right\|_2^2, \quad (1)$$

where $N_u$ is the number of unlabeled samples, $\hat{u}_i$ and $\tilde{u}_i$ are two different samples generated by random data augmentation of the unlabeled sample $u_i$, and $p(\hat{u}_i)$ and $p(\tilde{u}_i)$ represent the predicted probabilities of the model for $\hat{u}_i$ and $\tilde{u}_i$.
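A rough sketch of this consistency term, assuming a PyTorch classifier `model` that returns logits and a hypothetical random-augmentation function `augment` (both names are illustrative, not from the paper):

```python
import torch.nn.functional as F

def consistency_loss(model, u, augment):
    """Consistency regularization as in Equation (1): predictions for two
    random augmentations of the same unlabeled batch should agree."""
    p1 = model(augment(u)).softmax(dim=1)
    p2 = model(augment(u)).softmax(dim=1)
    # Mean squared distance between the two predicted class distributions.
    return F.mse_loss(p1, p2)
```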
Using SCA, we screen unlabeled samples throughout the training phase in order to improve the recognition performance of the model and reduce false positives. Drawing on the characteristics of the teacher-student semisupervised model, we use the teacher model during training to filter and divide the unlabeled data, which reduces calculation time and increases the efficiency of model training. A deep learning model's recognition performance steadily improves as training proceeds; consequently, a large number of unlabeled samples will be wrongly classified among the samples picked during the first training stage, which prevents the model from being optimized properly. To address this, at the start of each epoch we use the teacher model of the current training model to predict and label all unlabeled samples and then sort them according to the maximum value of the predicted labeling probability. Samples regarded as out-of-category data are sorted away, and the remaining data is used in conjunction with the labeled data to optimize the model for that epoch.
The method of screening unlabeled samples is as follows:

$$U_K = \left\{ u_i \in U : \operatorname{rank}\!\big(\max_c p(c \mid u_i)\big) \le K \right\}, \quad (2)$$

where $K$ is a hyperparameter, the number of samples selected for training from the unlabeled sample data set $U$, and the rank is taken in descending order of probability. Through Equation (2), we apply the current training model and keep the first $K$ samples with the largest predicted probability among the unlabeled samples. The proportion of out-of-class samples among these samples is then lower than their proportion in the whole unlabeled data set. Using the filtered unlabeled samples to train the model through Equation (1), the recognition accuracy of the model can be further improved.
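A minimal sketch of this preselection step, assuming a PyTorch `teacher` network that returns logits for a batched tensor `unlabeled`; the function name and the evaluate-all-at-once simplification are illustrative:

```python
import torch

@torch.no_grad()
def preselect_topk(teacher, unlabeled, k):
    """Preselection as in Equation (2): rank unlabeled samples by the teacher's
    maximum class probability and keep the k most confident ones."""
    probs = teacher(unlabeled).softmax(dim=1)
    confidence, _ = probs.max(dim=1)                  # max predicted probability per sample
    top_idx = confidence.argsort(descending=True)[:k]
    return unlabeled[top_idx]
```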
2.2. Threshold Filtering
However, in the model training process it is unknown how many out-of-class samples are mixed into the unlabeled data, so the choice of $K$ needs to be verified through experiments. To this end, we further adjust the method of selecting unlabeled samples:

$$U_\tau = \left\{ u_i \in U : \max_c p(c \mid u_i) > \tau \right\}, \quad (3)$$

where $\tau$ is a hyperparameter representing the sample prediction probability threshold. We use Equation (3) to select the samples whose maximum predicted probability under the model is greater than $\tau$ to form an unlabeled data set $U_\tau$, which is combined with $U_K$ to give the finally selected unlabeled sample set $U_s$.
In this way, a certain number of samples can be selected to train the model in the initial stage of training, and once the model reaches a certain performance, more in-class data can be selected to optimize its recognition performance.
Through the preselection rule, the unlabeled data can be filtered at the beginning of each epoch of semisupervised training, and the unlabeled sample set for training can be selected.
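Under the same assumptions as the preselection sketch above, the threshold filtering of Equation (3) might look as follows; how the two selections are combined follows the text above:

```python
import torch

@torch.no_grad()
def preselect_threshold(teacher, unlabeled, tau=0.8):
    """Threshold filtering as in Equation (3): keep unlabeled samples whose
    maximum predicted class probability exceeds tau."""
    probs = teacher(unlabeled).softmax(dim=1)
    confidence, _ = probs.max(dim=1)
    return unlabeled[confidence > tau]
```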
Despite the preselection rules, some out-of-class samples remain among the unlabeled samples. Especially in the initial training stage, when the sample set selected by Equation (2) is used for training, a large fraction of the pseudolabels assigned to these samples is wrong. This situation weakens the training effect and affects the final recognition accuracy of the model. Therefore, following the pseudolabeling method [14], we introduce a prediction threshold to decide whether to use a pseudolabeled sample:

$$m_i = \mathbb{1}\left( \max_c p(c \mid u_i) > \delta \right), \quad (4)$$

where $\delta$ is the threshold hyperparameter used to ensure the accuracy of pseudolabeled samples during the training process and $m_i$ is the resulting filter mask for sample $u_i$.
Through threshold filtering, pseudolabeled data can be automatically processed in real time during training, and only pseudolabeled data with a larger prediction probability is used to optimize the recognition accuracy of the model.
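A hedged sketch of how the mask of Equation (4) can gate the unlabeled loss during training, again assuming a PyTorch `model` returning logits; `delta` mirrors the threshold hyperparameter:

```python
import torch
import torch.nn.functional as F

def masked_pseudolabel_loss(model, u, delta=0.9):
    """Pseudolabel loss gated by the confidence mask of Equation (4): only
    samples whose maximum predicted probability exceeds delta contribute."""
    with torch.no_grad():
        probs = model(u).softmax(dim=1)
        confidence, pseudo = probs.max(dim=1)      # confidence and pseudolabel
        mask = (confidence > delta).float()
    per_sample = F.cross_entropy(model(u), pseudo, reduction="none")
    return (mask * per_sample).mean()
```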
2.3. Loss Function
After preselection and threshold filtering, our method has processed and screened the unlabeled data, reducing the ratio of out-of-class samples and falsely labeled samples among the pseudolabeled samples. Using the filtered pseudolabeled data $U_s$, combined with the labeled data set $X$, the semisupervised SAR target recognition model is trained on the basis of the SCA method through the operation of mixing samples.
The overall loss function is as follows:

$$\mathcal{L} = \mathcal{L}_x + \lambda_u \mathcal{L}_u, \quad (5)$$

where $\mathcal{L}_x$ is the supervised training loss, calculated by cross-entropy over the labeled samples:

$$\mathcal{L}_x = \frac{1}{N_x} \sum_{i=1}^{N_x} H\left(y_i, p(\hat{x}_i)\right), \quad (6)$$

where $N_x$ represents the number of labeled samples, $\hat{x}_i$ represents the sample obtained after random data augmentation, $p(\hat{x}_i)$ represents the probability vector of the augmented sample $\hat{x}_i$, and $y_i$ is the corresponding training target. The labeled samples and the unlabeled samples are mixed through the mixup [39] method, the mixed samples are used to calculate the loss according to Equation (5), and the model is optimized through backpropagation.
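A minimal sketch of the mixup step, assuming one-hot targets `y` for labeled samples and model-predicted distributions `q` for pseudolabeled samples; the Beta parameter `alpha` is an assumption, not a value reported in the paper:

```python
import torch

def mixup(x, y, u, q, alpha=0.75):
    """Mix labeled samples (x, y) with pseudolabeled samples (u, q) before
    computing the overall loss of Equation (5), following mixup [39]."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)            # bias the mix toward the first input
    x_mix = lam * x + (1.0 - lam) * u
    y_mix = lam * y + (1.0 - lam) * q
    return x_mix, y_mix
```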
3. Experiments and Results
This section demonstrates the effectiveness of our proposed method by way of experimental comparison. First, we introduce the experimental data set, including the division of the training set and test set, the division of labeled and unlabeled data within the training set, and the selection of in-class and out-of-class samples. Then, we introduce the implementation details and experimental environment of the proposed method. Finally, by comparison with existing methods, the effectiveness of our proposed method is confirmed.
3.1. Data Set Description
To evaluate the effectiveness of the method, the experiments use the MSTAR data set, a public data set created by the US Air Force Laboratory and widely used in SAR target recognition. The MSTAR data set is divided into two subsets: a training data set obtained at a 17° depression angle and a test data set captured at a 15° depression angle. Each subset contains SAR images of ten types of military ground-target vehicles: 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, and ZSU234.
In our experiment, we randomly select 10 samples from each of the six categories 2S1, BMP2, BRDM2, BTR60, BTR70, and D7; the resulting 60 samples at the training depression angle are used as labeled samples. Then, 1,260 samples are selected from all samples as unlabeled samples. We control the distribution of labeled and unlabeled samples by controlling the proportions of the first six categories in the unlabeled samples. For example, when the in-class proportion of the unlabeled samples is 50%, 630 of the 1,260 samples come from the first six categories, and the rest come from the remaining four categories, T62, T72, ZIL131, and ZSU234. When the in-class proportion is 100%, all 1,260 samples come from the first six categories.
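The construction of the unlabeled mix can be sketched as follows; `samples_by_class` is a hypothetical mapping from class name to sample list, and the helper is illustrative only:

```python
import random

IN_CLASSES = ("2S1", "BMP2", "BRDM2", "BTR60", "BTR70", "D7")
OUT_CLASSES = ("T62", "T72", "ZIL131", "ZSU234")

def build_unlabeled_set(samples_by_class, in_class_ratio, total=1260):
    """Assemble an unlabeled set with a controlled in-class ratio, e.g.
    in_class_ratio=0.5 yields 630 in-class and 630 out-of-class samples."""
    n_in = round(total * in_class_ratio)
    pool_in = [s for c in IN_CLASSES for s in samples_by_class[c]]
    pool_out = [s for c in OUT_CLASSES for s in samples_by_class[c]]
    return random.sample(pool_in, n_in) + random.sample(pool_out, total - n_in)
```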
Then, for each of the six categories 2S1, BMP2, BRDM2, BTR60, BTR70, and D7, we selected 50 SAR target images; these 300 images are used as the validation set. Finally, in each category, 145 images different from the previous 50 are selected; these 870 images are used as the test set.
3.2. Implementation Details
For our experiments, we use the "Wide-ResNet-28-2" architecture from wide residual networks [40] as our backbone network, with a batch size of 32 images and 200 batches per epoch. The model is trained for 120 epochs with the Adam solver at a learning rate of 0.002. Weight decay, which decays the weights by 0.02 at each update, is used as a regularization method. We set the number of training samples selected from the unlabeled data set to $K = 630$, the sample prediction probability threshold to $\tau = 0.8$, and the threshold hyperparameter to $\delta = 0.9$. The main configuration of the employed computer is as follows: GPU, GeForce RTX 2080Ti; operating system, Ubuntu 18.04; and running software, Python 3.7.
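Collected as a configuration sketch (the key names are illustrative; the values are those reported above):

```python
# Experimental configuration from Section 3.2 (key names are illustrative).
CONFIG = {
    "backbone": "Wide-ResNet-28-2",
    "batch_size": 32,
    "batches_per_epoch": 200,
    "epochs": 120,
    "optimizer": "Adam",
    "learning_rate": 0.002,
    "weight_decay": 0.02,
    "k_unlabeled": 630,   # K in Equation (2): preselected unlabeled samples
    "tau": 0.8,           # probability threshold in Equation (3)
    "delta": 0.9,         # pseudolabel confidence threshold in Equation (4)
}
```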
3.3. Metrics
In order to quantitatively evaluate the proposed method, we use accuracy as the performance indicator, defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP},$$

where $TP$, $FN$, $TN$, and $FP$ represent the number of true positives, false negatives, true negatives, and false positives.
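Expressed directly in code, a trivial helper included only to make the metric concrete; the example counts are hypothetical:

```python
def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Accuracy as defined above: correct predictions over all predictions."""
    return (tp + tn) / (tp + fn + tn + fp)

# Hypothetical example: 820 correct out of 870 test images -> ~0.943 accuracy.
print(accuracy(tp=410, fn=25, tn=410, fp=25))
```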
3.4. Experimental Results and Analysis
We compare the performance of our proposed approach with several state-of-the-art semisupervised learning techniques, including the Π model, mean teacher, MixMatch, and SCA; the goal of this part is to confirm the performance of the suggested method. For unlabeled data, the Π model employs consistency regularization, encouraging the distance between predictions made for an unlabeled instance and predictions made for its randomly augmented version to trend to zero. The mean teacher optimizes the model by computing an exponential moving average of the model parameters and using the averaged model's predictions as labels for the unlabeled data. MixMatch and SCA interpolate labeled data with pseudolabeled unlabeled data using mixup in order to improve the model.
We reimplement the techniques described above in PyTorch and apply them to the same model (Wide-ResNet-28-2) in order to ensure a fair comparison. At the same time, we randomly picked 5 sets of labeled data, unlabeled data, validation data, and test data. All methods are tested on these five sets; for each method, the model that achieves the best recognition effect on the validation set is chosen, and its average recognition accuracy on the test set is reported as the final result. Furthermore, we compare our results against the supervised learning approach, which trains a deep neural network on only the small labeled data set and serves as the baseline. Using the same model and data, supervised learning achieved an identification accuracy of 70.39 percent with just ten labeled samples per category.
We performed tests with in-class sample ratios of 0.5, 0.6, 0.7, 0.8, 0.9, and 1 in the unlabeled data. The findings are displayed in Table 1.
Table 1 shows that our technique produces the best results under all in-class sample ratios. Whether or not the unlabeled samples contain out-of-class data, the experimental findings demonstrate that the semisupervised approach described in this research provides the best results. The recognition accuracy of all approaches is greater than that obtained using only labeled data, which further demonstrates that semisupervised learning has clear benefits over purely supervised learning.
The identification accuracy, on the other hand, diminishes as the fraction of out-of-class samples mixed into the unlabeled data rises, as seen in Table 1. The recognition accuracy produced by the Π model technique is only 5.06 percent greater than that obtained by supervised learning when the mixing ratio of out-of-class data is 50 percent. The recognition accuracy of the other approaches is likewise less than 90 percent, demonstrating that out-of-class samples have a large influence on the recognition accuracy of semisupervised learning models.
Furthermore, when only a limited amount of out-of-class data is included, our technique offers a significant advantage over other approaches. When the fraction of in-class samples among the unlabeled samples exceeds 60%, the recognition accuracy of our approach is above 90 percent, while the other approaches show a significant shortfall. With an in-class sample ratio of 0.9, our technique obtains a recognition accuracy of 95.46 percent, much greater than the recognition accuracy of the other approaches under the same conditions. In actual SAR target recognition applications, it is difficult to ensure that unlabeled samples are not contaminated by out-of-class samples; in this scenario, the semisupervised learning strategy that we present yields greater recognition accuracy than traditional methods. Our technique is therefore effective.
At the same time, our technique achieves the maximum recognition accuracy when all of the unlabeled examples are in-class samples. Thus, the simple screening of samples during training in our suggested technique helps to increase the recognition accuracy of the semisupervised learning model.
To demonstrate the recognition of each category by our method more clearly, Figure 3 shows the confusion matrix of the supervised learning method using only 10 labeled samples per class, while Figure 4 shows the confusion matrix of the method proposed in this paper using 10 labeled samples per class and 1,260 unlabeled samples containing 80 percent in-class samples.


As shown in Figures 3 and 4, our strategy significantly increases the model's identification accuracy on SAR targets. The accuracy in categories where the supervised learning approach performs poorly, such as 2S1, BRDM2, BTR60, and BTR70, has been significantly improved: recognition accuracy rose from 65 percent to 91 percent for 2S1, from 68 percent to 97 percent for BRDM2, from 62 percent to 93 percent for BTR60, and from 66 percent to 93 percent for BTR70.
3.5. Time Analysis
In this part, we cover the time it takes to train and evaluate our approach compared with the alternative semisupervised learning methods and the supervised learning method. Table 2 records the training and testing times, in our experimental setting, of the supervised learning technique, the Π model, the mean teacher, MixMatch, SCA, and the proposed method.
It can be observed in Table 2 that the supervised learning approach takes the smallest amount of training time per epoch, around 7 seconds. There is little difference between the other semisupervised learning approaches: the Π model requires 16.68 seconds for a single epoch of training, whereas our method requires 24.69 seconds per epoch.
We attribute the time difference between supervised and semisupervised learning to consistency regularization and the pseudolabeling of unlabeled data, which take a certain amount of time. Although our technique takes the longest per epoch, all methods in this experiment were trained for a total of 120 epochs, so compared to the supervised learning approach, our method requires only about half an hour more to train the model in total. Compared to the large gain in recognition accuracy, this extra time is tolerable.
In terms of test time per image, since all techniques use the same model, the SAR image recognition procedure is identical across methods, so the test time per image is virtually the same for all methods and is within one second. In practical applications, greater emphasis is placed on the identification time for a single SAR image, and semisupervised learning does not increase the evaluation time. This further demonstrates that the semisupervised learning strategy is useful for enhancing the accuracy of SAR identification in practice.
4. Conclusion
In this study, SAR images of man-made and natural objects were classified, with the goal of finding the most appropriate kernel function for the application's requirements. Natural and artificial objects in SAR images are classified using the supervised machine learning method SVM, achieving the goal of categorizing objects into the two classes. The multiple-region graph cut image segmentation technique is used to segment the considered image, and the kernel function is used to map these regions into a higher dimensional feature space, where SVM separates man-made objects from natural objects. Various kernel functions, such as RBF, polynomial, sigmoid, and Hellinger, are employed in the simulation. The classification procedure concludes that the pseudolabeling kernel provides better performance measures for false alarm and target miss than the other kernels.
Data Availability
The data that support the findings of this study are available on request from the corresponding author.
Conflicts of Interest
The authors declare that they have no conflicts of interest.