Abstract

A synthetic aperture radar (SAR) target recognition method combining multiple features and multiple classifiers is proposed. Zernike moments, kernel principal component analysis (KPCA), and monogenic signals are used to describe SAR image features. The three types of features describe the target's geometric shape, projection, and image decomposition characteristics, respectively, and their combined use can effectively enhance the description of the target. In the classification stage, the support vector machine (SVM), sparse representation-based classification (SRC), and joint sparse representation (JSR) are used as the classifiers for the three types of features, respectively, and the corresponding decision variables are obtained. Multiple sets of weight vectors are then used to fuse these decision variables and determine the target label of the test sample. Experiments are performed on the MSTAR dataset under the standard operating condition (SOC) and extended operating conditions (EOCs). The experimental results verify the effectiveness, robustness, and adaptability of the proposed method.

1. Introduction

Synthetic aperture radar (SAR) obtains effective ground observation data through two-dimensional high-resolution imaging, supporting related applications in the military and civilian fields. SAR target recognition uses feature analysis and classification decisions to determine the target class [1]. Feature extraction obtains effective feature descriptions of the target in SAR images, including geometric shape, scattering center, and projection transformation features. The studies in [2–7] designed SAR target recognition methods based on geometric features such as the target region, contour, and shadow. In [2, 3], Zernike moments were used to describe the target region. In [4], a recognition method was proposed based on target region matching. In [6], the target contour distribution was modeled based on the elliptic Fourier descriptor. Papson proposed a SAR target recognition method based on shadow features [7]. The scattering center features describe the target's backscattering electromagnetic characteristics in the high-frequency region. In [8–10], SAR target recognition methods were developed using attributed scattering centers as the basic features. Projection transformation features can be further divided into projection features and image decomposition features. Projection features mainly use mathematical transformation algorithms, such as principal component analysis (PCA) or kernel PCA (KPCA) [11, 12] and nonnegative matrix factorization (NMF) [13]. Image decomposition methods include the wavelet analysis [14], the monogenic signal [15], and bidimensional empirical mode decomposition (BEMD) [16]. The methods mentioned above all carry out target recognition based on a single feature. In fact, combining a variety of different features can effectively improve the performance of SAR target recognition. In [17], multitask compressive sensing was employed to implement joint classification of multiple features of SAR images.
In [18], a multifeature hierarchical decision fusion method was proposed. In [19], a multifeature and multirepresentation fusion strategy was proposed for target recognition. According to the extracted feature categories, the classifier analyzes and makes decisions accordingly to obtain the target label of the unknown sample. In [20], a SAR target recognition method was developed based on the K-nearest neighbor (KNN) classifier. The support vector machine (SVM) was employed in [19, 20] as a base classifier to design SAR target recognition methods. Sparse representation-based classification (SRC) was employed for SAR target recognition in [21, 22]. With the development of deep learning in recent years, the convolutional neural network (CNN) has gradually become a popular tool in SAR target recognition, and a number of representative methods have emerged [23–30]. Similarly, classifier fusion has also been used and verified in SAR target recognition. In [21], SVM and SRC were used for fused classification. In [24], CNN and SVM were combined to further improve the classification performance.

This study proposes a SAR target recognition method based on multiple features and multiple classifiers. Three types of features, i.e., Zernike moments, KPCA features, and monogenic signals, are used for feature extraction. Zernike moments describe the geometric shape of the target and are invariant to translation and rotation; they have a clear physical meaning and reflect the details of the target [2, 3, 19]. KPCA extracts the projection features of the original image, which provide a concise feature vector and have a certain nonlinear description ability [11, 12]. The monogenic signal can effectively decompose the SAR image and obtain a multilevel, multifrequency description [15]. Therefore, the three types of features have good complementarity and can provide more sufficient discriminative information for decision-making. In the classification stage, SVM, SRC, and joint sparse representation (JSR) are used as the classifiers for the Zernike moments, KPCA feature vectors, and monogenic features, respectively, to obtain the corresponding decision variables. On this basis, multiple sets of linear weights are designed to perform weighted fusion on the decision variables of the three types of features [31–36], which finally determines the target label of the test sample. In the experiments, the proposed method is tested under the standard operating condition (SOC) and extended operating conditions (EOCs) [37–42] based on the MSTAR dataset. The recognition results and comparative analysis verify the effectiveness and robustness of the proposed method.

2. Extraction of Multiple Features

2.1. Zernike Moment

Zernike moments are widely used in the description of SAR target regions because of their translation and rotation invariance and noise robustness [2, 3, 19]. For an image $f(\rho, \theta)$ in polar coordinates, the Zernike moment of order $n$ with repetition $m$ is calculated as follows:

$$Z_{nm} = \frac{n+1}{\pi} \int_0^{2\pi} \int_0^1 f(\rho, \theta)\, V_{nm}^{*}(\rho, \theta)\, \rho\, d\rho\, d\theta, \tag{1}$$

where $n \ge 0$, $|m| \le n$, $n - |m|$ is even, and $\rho \le 1$.

Zernike polynomials $V_{nm}(\rho, \theta) = R_{nm}(\rho) e^{jm\theta}$ are a set of orthogonal, complete, complex-valued functions on the unit circle $x^2 + y^2 \le 1$, which satisfy the following condition:

$$\int_0^{2\pi} \int_0^1 V_{nm}^{*}(\rho, \theta)\, V_{pq}(\rho, \theta)\, \rho\, d\rho\, d\theta = \frac{\pi}{n+1}\, \delta_{np}\, \delta_{mq}. \tag{2}$$

On this basis, the rotation invariant feature is constructed as the magnitude of the Zernike moment:

$$F_{nm} = \left| Z_{nm} \right|. \tag{3}$$

Based on the above equations, the Zernike moments of the input image at any order can be calculated. Among them, the high-order moments can effectively reflect the detailed information in the image, which is beneficial to improving the recognition performance. In this study, the 3rd- to 8th-order Zernike moments are used to describe the SAR target region and are concatenated into a feature vector.
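The computation in equations (1)–(3) can be sketched numerically. The following is a minimal NumPy implementation that samples the image on the unit disk and evaluates one rotation-invariant feature $|Z_{nm}|$; the grid size and test image are illustrative, not taken from the paper.

```python
import numpy as np
from math import factorial

def radial_poly(rho, n, m):
    """Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s)
                * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def zernike_moment(img, n, m):
    """Z_nm of a square grayscale image mapped onto the unit disk (eq. (1))."""
    N = img.shape[0]
    y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                       # keep pixels inside the unit circle
    V = radial_poly(rho, n, m) * np.exp(-1j * m * theta)
    dA = (2.0 / (N - 1)) ** 2               # pixel area in unit-disk coordinates
    return (n + 1) / np.pi * np.sum(img[mask] * V[mask]) * dA

# rotation-invariant feature (eq. (3)): the magnitude |Z_nm|
img = np.random.rand(64, 64)
feature = abs(zernike_moment(img, 4, 2))
```

As a sanity check, a constant image $f \equiv 1$ gives $Z_{00} \approx 1$, since the integral in equation (1) then reduces to the disk area $\pi$ scaled by $1/\pi$.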

2.2. KPCA

PCA calculates the best projection direction by analyzing the data structure of a large number of samples to achieve data dimensionality reduction [11]. For the sample set $\{x_i\}_{i=1}^{N}$, the mean and covariance matrix are first calculated as follows:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \tag{4}$$

$$C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{T}. \tag{5}$$

The eigenvalues and eigenvectors of the covariance matrix are calculated as follows:

$$C U = U \Lambda. \tag{6}$$

In equation (6), the diagonal matrix $\Lambda$ stores the eigenvalues, and each eigenvalue corresponds to an eigenvector (column) in the matrix $U$. By selecting the eigenvectors corresponding to the several largest eigenvalues, a projection matrix can be constructed for feature extraction of the samples.

KPCA is an extension of PCA to nonlinear spaces, which can process datasets with nonlinear structure more efficiently [11, 12]. KPCA first maps the data through a kernel function (typically a polynomial or radial basis kernel) and then performs PCA in the resulting high-dimensional feature space. In this study, KPCA is used to process SAR images and obtain 80-dimensional feature vectors for subsequent classification.
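The 80-dimensional KPCA feature extraction can be sketched with scikit-learn's `KernelPCA`; the random stand-in data, RBF kernel, and `gamma` value below are illustrative assumptions, not parameters reported in the paper.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# hypothetical stand-in data: each row is a flattened SAR image chip
rng = np.random.default_rng(0)
X_train = rng.random((200, 32 * 32))
X_test = rng.random((5, 32 * 32))

# RBF kernel; 80 components as used in this study
kpca = KernelPCA(n_components=80, kernel="rbf", gamma=1e-3)
F_train = kpca.fit_transform(X_train)   # (200, 80) training features
F_test = kpca.transform(X_test)         # (5, 80) test features
```

New test samples are projected with `transform`, which evaluates the kernel between the test sample and the training set before applying the learned projection.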

2.3. Monogenic Signal

The monogenic signal is a two-dimensional signal decomposition algorithm, which can effectively analyze the multilevel spectral characteristics of the original image [15]. For the input image $f(z)$, its Riesz transform is calculated as $f_R(z) = (f_x(z), f_y(z))$, where $z = (x, y)$ represents the two-dimensional coordinates. The corresponding monogenic signal is calculated as follows:

$$f_M(z) = f(z) - i f_x(z) - j f_y(z), \tag{7}$$

where $i$ and $j$ are the imaginary units. $f(z)$ and its Riesz transform correspond to the real part and the imaginary parts of the monogenic signal, respectively. Accordingly, the characteristics of the monogenic signal are defined as follows:

$$A(z) = \sqrt{f^2(z) + f_x^2(z) + f_y^2(z)}, \quad \varphi(z) = \arctan\!\left(\frac{\sqrt{f_x^2(z) + f_y^2(z)}}{f(z)}\right), \quad \theta(z) = \arctan\!\left(\frac{f_y(z)}{f_x(z)}\right). \tag{8}$$

In the above equations, $f_x(z)$ and $f_y(z)$ correspond to the i-imaginary part and j-imaginary part of the monogenic signal, respectively; $A(z)$ represents the amplitude information; and $\varphi(z)$ and $\theta(z)$ correspond to the local phase and local orientation information, respectively.

The three types of features obtained from the monogenic signal decomposition have different characteristics. Among them, the amplitude $A(z)$ mainly reflects the gray-level distribution of the image, whereas the local phase $\varphi(z)$ and orientation $\theta(z)$ reflect the local detail information and shape characteristics of the image. Therefore, jointly using the characteristics of the monogenic signal is conducive to constructing a more informative characterization. In this study, according to [9], the SAR image is decomposed by the monogenic signal, and three corresponding feature vectors are obtained through downsampling and vector concatenation.
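Equations (7) and (8) can be sketched with a frequency-domain Riesz transform. The sketch below is an illustrative minimal version: in practice the monogenic signal is usually applied to band-pass (e.g., log-Gabor) filtered images at several scales, which is omitted here.

```python
import numpy as np

def monogenic_features(img):
    """Amplitude, local phase, and orientation of the monogenic signal,
    with the Riesz transform implemented in the frequency domain."""
    rows, cols = img.shape
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                      # avoid division by zero at DC
    F = np.fft.fft2(img)
    fx = np.real(np.fft.ifft2(F * (1j * u / radius)))   # i-imaginary part
    fy = np.real(np.fft.ifft2(F * (1j * v / radius)))   # j-imaginary part
    amplitude = np.sqrt(img ** 2 + fx ** 2 + fy ** 2)   # A(z) in eq. (8)
    phase = np.arctan2(np.sqrt(fx ** 2 + fy ** 2), img) # local phase
    orientation = np.arctan2(fy, fx)                    # local orientation
    return amplitude, phase, orientation

img = np.random.rand(64, 64)
A, P, O = monogenic_features(img)
```

Each of the three maps can then be downsampled and flattened into a feature vector, matching the three monogenic feature vectors used later by JSR.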

3. Multiple Classifiers and Decision Fusion

3.1. SVM for Zernike Moments

For a two-class classification problem, SVM obtains the optimal classification surface by minimizing the structural risk [24]. For a sample $x$ with unknown label, the decision function of SVM is as follows:

$$f(x) = \omega^{T} \Phi(x) + b = \sum_{i=1}^{N} \alpha_i y_i\, k(x_i, x) + b, \tag{9}$$

where $\omega$ is the weight coefficient vector, which describes the relevant parameters of the hyperplane, $k(\cdot, \cdot)$ represents the kernel function induced by the mapping $\Phi$, and $b$ represents the bias.

SVM was originally proposed for two-class recognition; that is, the hyperplane of equation (9) was used to distinguish between the two classes. Researchers later extended it to multiclass classification through strategies such as "one-versus-one" and "one-versus-rest." Through training on a large number of labeled samples, a suitable classification surface can be obtained. At the same time, choosing a suitable kernel function can effectively enhance the nonlinear classification ability of SVM. When SVM is used for multiclass classification, the (pseudo) posterior probability of each class is output to represent the possibility that the current sample belongs to that training class. The class of the test sample can then be determined by the principle of maximum posterior probability. In this study, SVM is used to classify the Zernike moments [43–46].
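The multiclass, probability-output usage described above can be sketched with scikit-learn's `SVC`; the Gaussian stand-in features, class count, and parameters are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVC

# hypothetical stand-in for Zernike-moment feature vectors of 3 classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, (50, 12)) for c in range(3)])
y = np.repeat([0, 1, 2], 50)

# RBF-kernel SVM; one-versus-one multiclass with posterior probabilities
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
x_test = rng.normal(1, 0.5, (1, 12))
posterior = clf.predict_proba(x_test)[0]   # pseudo posterior per class
label = int(np.argmax(posterior))          # maximum-posterior decision
```

The `posterior` vector is exactly the per-class decision variable that the fusion stage in Section 3.4 consumes from the SVM branch.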

3.2. SRC for KPCA Features

SRC uses sparse representation as the basic algorithm to characterize a test sample of unknown class and then determines its category based on the analysis of the reconstruction errors [14, 15, 23]. Dictionary construction is one of the key steps in SRC. Existing methods mostly use the samples of all training classes to construct a global dictionary $A = [A_1, A_2, \ldots, A_C]$, in which $A_i$ contains all training samples from the ith training class. Accordingly, the sparse reconstruction of the test sample $y$ is described as follows:

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|y - A\alpha\|_2^2 \le \varepsilon, \tag{10}$$

where $\alpha$ is the sparse representation coefficient vector to be solved, and $\varepsilon$ is the reconstruction error threshold.

Under the constraint of the $\ell_0$ norm, it is very difficult to solve the sparse representation coefficients in equation (10) directly. For this reason, researchers use $\ell_1$ norm minimization [23] to approximate the problem in equation (10), converting it into a convex optimization problem that is easy to solve. In addition, the orthogonal matching pursuit (OMP) algorithm [14], Bayesian compressive sensing (BCS) [15], and other algorithms can also be used to obtain an approximate solution of equation (10). Based on the solution, the decision process is performed as follows:

$$r_i(y) = \left\| y - A_i \hat{\alpha}_i \right\|_2, \quad \text{identity}(y) = \arg\min_i r_i(y), \tag{11}$$

where $\hat{\alpha}_i$ are the coefficients corresponding to the ith training class extracted from $\hat{\alpha}$, and $r_i(y)$ represent the reconstruction errors of the different classes.

Studies have shown that SRC has good robustness against noise interference and occlusion [23], so it can effectively complement SVM. This study uses SRC to classify the KPCA feature vectors.
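Equations (10) and (11) can be sketched with the OMP solver mentioned above; the toy dictionary, class count, and sparsity level below are illustrative assumptions for demonstration only.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(A, labels, y, n_nonzero=10):
    """SRC: sparse-code y over the global dictionary A (columns = training
    samples), then pick the class with the smallest reconstruction error."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False)
    omp.fit(A, y)
    alpha = omp.coef_
    errors = {}
    for c in np.unique(labels):
        alpha_c = np.where(labels == c, alpha, 0.0)   # keep class-c coefficients
        errors[c] = np.linalg.norm(y - A @ alpha_c)   # r_i(y) in eq. (11)
    return min(errors, key=errors.get), errors

# hypothetical toy dictionary: 80-dim KPCA features, 2 classes x 30 samples
rng = np.random.default_rng(2)
A = np.column_stack([rng.normal(c, 1.0, (80, 30)) for c in (0, 3)])
labels = np.repeat([0, 1], 30)
y = A[:, 40] + 0.01 * rng.normal(size=80)   # noisy copy of a class-1 atom
pred, errs = src_classify(A, labels, y)
```

The per-class errors `errs` are the reconstruction error vector that Section 3.4 later converts into probabilities for fusion.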

3.3. JSR for Monogenic Features

The different types of features extracted from the same SAR image have a certain inherent correlation. For this reason, this study uses JSR to represent them jointly, thereby improving the overall accuracy [5, 15, 26]. The three monogenic feature vectors obtained from the test sample are denoted as $y^{(1)}, y^{(2)}, y^{(3)}$. The basic representation process is as follows:

$$\min_{\alpha^{(k)}} \sum_{k=1}^{3} \left\| y^{(k)} - A^{(k)} \alpha^{(k)} \right\|_2^2, \tag{12}$$

where $A^{(k)}$ is the global dictionary corresponding to the kth feature, $\alpha^{(k)}$ is the corresponding coefficient vector, and $k = 1, 2, 3$.

The objective function in equation (12) does not take into account the inherent relationship among the three types of features. This goal can be achieved by constraining the coefficient matrix $\boldsymbol{\alpha} = [\alpha^{(1)}, \alpha^{(2)}, \alpha^{(3)}]$. The updated objective function is as follows:

$$\min_{\boldsymbol{\alpha}} \sum_{k=1}^{3} \left\| y^{(k)} - A^{(k)} \alpha^{(k)} \right\|_2^2 + \lambda \left\| \boldsymbol{\alpha} \right\|_{2,1}. \tag{13}$$

In equation (13), the $\ell_{2,1}$ norm is used to constrain $\boldsymbol{\alpha}$, which enforces a shared sparsity pattern across the columns and thus effectively uses the internal relationship of the three types of features.

According to the obtained coefficient matrix $\hat{\boldsymbol{\alpha}}$, the sum of the reconstruction errors of each class over the three types of features can be calculated, and the target label of the test sample can then be decided:

$$\text{identity}(y) = \arg\min_i \sum_{k=1}^{3} \left\| y^{(k)} - A_i^{(k)} \hat{\alpha}_i^{(k)} \right\|_2^2, \tag{14}$$

where $A_i^{(k)}$ and $\hat{\alpha}_i^{(k)}$ are the part of the dictionary and the corresponding coefficient vector of the kth feature in the ith class.
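The paper does not specify its $\ell_{2,1}$ solver; one common greedy approximation is simultaneous OMP (SOMP), which builds a single support shared by all feature channels. The sketch below, with a toy two-class problem, is an illustrative stand-in rather than the authors' implementation.

```python
import numpy as np

def somp(Y, Ds, n_atoms=5):
    """Simultaneous OMP: greedily builds ONE support shared by all K
    channels, approximating the l2,1-constrained objective of eq. (13)."""
    residuals = [y.astype(float).copy() for y in Y]
    coefs = [np.zeros(D.shape[1]) for D in Ds]
    support = []
    for _ in range(n_atoms):
        # score each atom by its summed correlation across channels
        score = sum(np.abs(D.T @ r) for D, r in zip(Ds, residuals))
        j = int(np.argmax(score))
        if j not in support:
            support.append(j)
        for k, (D, y) in enumerate(zip(Ds, Y)):
            sub = D[:, support]
            c, *_ = np.linalg.lstsq(sub, y, rcond=None)
            coefs[k][:] = 0.0
            coefs[k][support] = c
            residuals[k] = y - sub @ c
    return coefs

def jsr_classify(Y, Ds, labels, n_atoms=5):
    """Class decision by the summed per-class reconstruction error (eq. (14))."""
    coefs = somp(Y, Ds, n_atoms)
    errors = {}
    for c in np.unique(labels):
        errors[c] = sum(
            np.linalg.norm(y - D @ np.where(labels == c, a, 0.0)) ** 2
            for y, D, a in zip(Y, Ds, coefs))
    return min(errors, key=errors.get), errors

# hypothetical toy problem: K=3 channels, 2 classes, 20 atoms per class
rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 20)
Ds = [np.column_stack([rng.normal(0, 1, (50, 20)),
                       rng.normal(2, 1, (50, 20))]) for _ in range(3)]
Y = [D[:, 25] + 0.05 * rng.normal(size=50) for D in Ds]  # class-1 atoms
pred, errs = jsr_classify(Y, Ds, labels)
```

Because the three channels come from the same image, forcing a shared support is what couples them, which is exactly the effect the $\ell_{2,1}$ constraint is meant to achieve.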

3.4. Decision Fusion

For SRC (and likewise JSR), the output result is the reconstruction error vector $r = [r_1, r_2, \ldots, r_C]$. First, these reconstruction errors are converted into a probability vector according to the following:

$$p_i = \frac{\exp(-r_i)}{\sum_{j=1}^{C} \exp(-r_j)}. \tag{15}$$

This study uses multiple sets of weights for linear fusion so as to obtain a more robust result. Denote $p_i^{(k)}$ as the decision variable of the ith class under the kth feature. First, construct $N$ weight vectors:

$$W = \left[ w^{(1)}, w^{(2)}, \ldots, w^{(N)} \right] \in \mathbb{R}^{3 \times N}. \tag{16}$$

In the formula, each column $w^{(n)}$ of the matrix $W$ represents a weight vector, which satisfies

$$\sum_{k=1}^{3} w_k^{(n)} = 1, \quad w_k^{(n)} \ge 0. \tag{17}$$

The weighting process under the nth weight vector is as follows:

$$f_i^{(n)} = \sum_{k=1}^{3} w_k^{(n)} p_i^{(k)}. \tag{18}$$

Therefore, under the group of $N$ random weight vectors, the ith class obtains a weighted result $f_i = [f_i^{(1)}, \ldots, f_i^{(N)}]$, which is called the fusion decision vector. Finally, these decision variables are averaged as the final decision value of the ith class, and the target class of the test sample is determined by comparing these values across the classes. It can be seen that, under the action of multiple sets of weight vectors, the three types of features participating in the fusion can be fully analyzed.
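The conversion of equation (15) and the fusion of equations (16)–(18) can be sketched as follows; the decision variables and the number of weight vectors are illustrative stand-ins.

```python
import numpy as np

def errors_to_probs(r):
    """Map reconstruction errors to a probability vector (eq. (15));
    shifting by the minimum improves numerical stability."""
    e = np.exp(-(r - r.min()))
    return e / e.sum()

def fuse_decisions(P, n_vectors=20, seed=0):
    """Weighted fusion over N random weight vectors (eqs. (16)-(18)),
    then averaging the N fused results per class.
    P is (C, K): decision variable of class i under feature/classifier k."""
    rng = np.random.default_rng(seed)
    _, K = P.shape
    W = rng.random((K, n_vectors))
    W /= W.sum(axis=0, keepdims=True)   # each column sums to 1 (eq. (17))
    fused = P @ W                       # one fused decision vector per weight set
    scores = fused.mean(axis=1)         # average over the N weight sets
    return int(np.argmax(scores)), scores

# hypothetical decision variables for C=3 classes from the three classifiers
p_svm = np.array([0.70, 0.20, 0.10])                 # SVM posteriors
p_src = errors_to_probs(np.array([0.3, 1.0, 1.2]))   # SRC errors -> probs
p_jsr = errors_to_probs(np.array([0.2, 0.9, 1.1]))   # JSR errors -> probs
label, scores = fuse_decisions(np.column_stack([p_svm, p_src, p_jsr]))
```

Since each weight column is convex and each classifier's decision vector sums to one, every fused vector (and their average) remains a valid probability vector over the classes.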

4. Experiments

4.1. MSTAR Dataset

The MSTAR dataset is a representative dataset for testing and evaluation of current SAR target recognition methods. The dataset contains ten types of military vehicle targets, as shown in Figure 1. The SAR images of various targets are acquired by X-band airborne radar with a resolution of 0.3 m. The MSTAR dataset has abundant samples, and a number of representative operating conditions can be set accordingly. The azimuth angles of various targets cover 0°–360°, which can be used for comprehensive training and testing. Some targets include several different configurations, which can be used to investigate the performance of the recognition methods under configuration variance. Some targets have multiple different depression angles, which can be used to investigate the performance of the recognition methods under large depression angle differences.

During the experiments, several existing methods are compared, which are mainly divided into three categories. The first category employs the three types of features used in this study but with a single feature each; these methods are denoted as Zernike [2], KPCA [12], and monogenic [15], respectively. The second category comprises multifeature fusion methods, i.e., the methods in [17, 18], which are denoted as fusion 1 and fusion 2, respectively. The third category is the currently popular deep learning methods, represented by the A-ConvNet method in [23]. The subsequent experiments are first carried out under SOC and then under two EOCs, namely, configuration variance and depression angle variance.

4.2. Results and Discussion
4.2.1. SOC

In the SAR target recognition problem, SOC generally refers to a high overall similarity between the test and training samples, so the recognition difficulty is relatively low. Table 1 provides the training and test samples under SOC, which are from 17° and 15° depression angles, respectively. The test and training samples for the various targets are from the same target configurations. The proposed method is used to classify the 10 types of targets shown in Figure 1, and the confusion matrix shown in Figure 2 is obtained, in which the diagonal values mark the correct recognition rates of the corresponding targets. In the experiment, the average recognition rate is defined as the proportion of correctly recognized samples among all test samples. The average recognition rate of the proposed method for the 10 types of targets is 99.46%. Table 2 compares the average recognition rates of the various methods under the current experimental setup; all are higher than 98%, reflecting the low difficulty of recognition under SOC. Compared with the three single-feature methods, the proposed method significantly improves the final recognition performance through the combined use of the features. Compared with the other two multifeature fusion methods, the performance of the proposed method is better, which shows that the designed decision fusion algorithm is more effective. The CNN method can achieve high performance under SOC, but it is still lower than that of the proposed method. In summary, the proposed method achieves superior performance under SOC, which verifies its effectiveness.

4.2.2. EOCs

EOCs are defined with reference to SOC and mainly examine the differences between the test and training samples caused by factors such as target, background, and sensor variations. The typical EOCs that can be set up based on the MSTAR dataset mainly include configuration variance and depression angle variance, which are tested in the subsequent experiments.

Configuration Variance. Configuration variance mainly arises from changes of the target itself; it refers to the situation in which the test samples and the training samples come from different configurations of the same target. Table 3 provides the current experimental setup, in which the test samples and training samples of the BMP2 and T72 targets are from different configurations. The appearance similarity between BTR70 and these two targets is relatively high (as shown in Figure 1), and its introduction increases the overall recognition difficulty. The various methods are tested under the current conditions, and their average recognition rates are given in Table 4. Compared with the three single-feature methods, the performance advantage of the proposed method is very significant, indicating that the joint representation and weighted fusion of the features can effectively improve the robustness of recognition. Compared with the two multifeature fusion methods, the recognition rate of the proposed method is higher, reflecting its stronger robustness. The performance degradation of the CNN method under the current conditions is very obvious, mainly because of the weak coverage of the test samples by the training samples, which reduces the adaptability of the trained network.

Depression Angle Variance. As the relative viewing angle between the target and the sensor changes, the corresponding SAR images also show larger differences. In particular, when the test samples and the training samples come from significantly different depression angles, the difficulty of recognition greatly increases. Table 5 provides the training and test samples under the condition of depression angle variance. The training samples are all from a 17° depression angle, while the test samples are divided into two subsets corresponding to 30° and 45° depression angles, respectively. Independent testing is performed on the test samples at the two depression angles, and the average recognition rates of the various methods are shown in Figure 3. It is evident that the recognition results at the 45° depression angle are significantly lower than those at 30°. The proposed method obtains the highest average recognition rate at both depression angles, indicating that it has better robustness to depression angle variance. Through the effective fusion of the three types of features, the proposed method can more comprehensively account for the image changes caused by variances in the depression angle, so as to obtain more reliable recognition results.

5. Conclusion

In this study, a multifeature and multiclassifier SAR target recognition method is proposed. Zernike moments, KPCA, and the monogenic signal are used to describe the characteristics of the original SAR image, and the corresponding feature vectors are obtained. In the classification stage, SVM, SRC, and JSR are used to make decisions on the three types of features, and their decision vectors are then weighted and fused based on multiple weight vectors. Finally, the target label of the test sample is determined according to the fused decision variables. The three types of features and the three classifiers have good complementarity, so they can provide more effective information for target recognition. In the experiments, the proposed method is tested and verified under SOC, configuration variance, and depression angle variance based on the MSTAR dataset. The results show the performance advantages of the proposed method.

Data Availability

The dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Hubei Province of China in 2021 (project name: Research on Brain Tumor Diagnosis Based on Capsule Neural Network), Team Project Funding of Scientific Research Innovation for Outstanding Young and Middle-Aged Colleges and Universities in Hubei Province (project number: T201924), and New Generation Information Technology Innovation Project Ministry of Education (project number: 20202020ITA05022).