Abstract
For the problems of feature extraction and decision making in synthetic aperture radar (SAR) image target recognition, a method based on multimode clustering and decision fusion is proposed. The bidimensional variational mode decomposition (BVMD) is used to decompose the SAR image to obtain multiple modes, which provide multilevel descriptions of the target characteristics. Clustering is performed based on the intrinsic correlation of multiple modes, and several subsets with different modes are selected. Based on the joint sparse representation (JSR), each mode subset is classified, and the corresponding reconstruction error vector is obtained. The linear weighted fusion is employed to fuse the results from different mode subsets. Finally, a decision is made based on the fused results. Experiments are carried out based on the MSTAR dataset. The results show the effectiveness of the method under the standard operating condition (SOC) and robustness under extended operating conditions (EOCs).
1. Introduction
Synthetic aperture radar (SAR) can work in all weather conditions to obtain high-resolution images that can be used for interpretation [1]. Existing SAR target recognition methods mainly improve or innovate on the two key steps of feature extraction and classification in order to improve the final recognition performance. Feature extraction aims to obtain de-redundant, low-dimensional representations of the original SAR images. In [2–7], geometric features such as the target area, shadow, and contour were extracted to design target recognition methods. In [2], the Zernike moments were adopted as regional features for SAR target recognition. Ding et al. applied binary morphological operations in region matching and defined a robust similarity measure. In [6, 7], the target outlines were used as basic features to evaluate the similarities between different SAR images, with application to target recognition. The principal component analysis (PCA), kernel PCA (KPCA), monogenic signal, bidimensional empirical mode decomposition (BEMD), and multiresolution representations were employed to develop SAR target recognition algorithms [8–15]. In [11], the monogenic signal was introduced into SAR target recognition with good performance. BEMD was employed by Chang et al. to enhance the discrimination of extracted features. Taking into account the electromagnetic scattering characteristics of the target, the scattering center parameters of the target can be estimated by analyzing the pixel distribution of the SAR image. In [16–18], several matching schemes based on scattering centers were developed and applied.
In the classification stage, the corresponding decision-making mechanism is mainly designed by using mature classifiers or according to the characteristics of the features, including K-nearest neighbor (KNN) [8], support vector machine (SVM) [19–21], sparse representation-based classification (SRC) [21–26], and convolutional neural network (CNN) [27–40].
This paper proposes a SAR target recognition method based on multimode clustering and decision fusion. The bidimensional variational mode decomposition (BVMD) [41, 42] is used to obtain multimode representations of the SAR image, which can more effectively characterize the global, detailed, and bidimensional time-frequency characteristics of the target. In [13], such multilevel decomposition features were shown to effectively improve the performance of SAR target recognition. Therefore, BVMD can provide effective features for SAR target recognition. However, the method in [13] only selected several decomposition modes empirically and did not fully analyze the contribution of each mode to the final recognition result. As a result, some of the modes involved in the decision making may have adverse effects on the final recognition. Therefore, this paper first clusters the multiple modes obtained by BVMD by investigating their intrinsic correlation and obtains several mode subsets. In each subset, the modes have strong internal correlation. The joint sparse representation (JSR) is then used to classify each mode subset [11–13], and the reconstruction error vector over the categories is obtained. JSR is an extension of traditional SRC that can handle several sparse representation problems jointly, especially when they share some correlations. The BVMD modes in this paper are generated from the same image and are therefore inherently related, so JSR is a suitable classifier for them. The reconstruction error vectors of different mode subsets are combined by linear weighted fusion. Finally, the target label of the test sample is determined according to the fused error vector. Experiments are designed and performed based on the MSTAR dataset. According to the experimental results, the performance of the proposed method can be quantitatively validated.
2. Basics of BVMD
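First developed by Dragomiretskiy and Zosso [41], the variational mode decomposition (VMD) provides an adaptive signal processing tool that decomposes the input into components with specified frequencies. Compared with wavelet analysis, empirical mode decomposition (EMD), etc., VMD has been validated to be more effective and robust. As an extension of VMD, BVMD was developed to process 2D signals such as images [42]. The basic problem in BVMD is stated as follows:
$$\min_{\{u_k\},\{\boldsymbol{\omega}_k\}} \sum_{k} \alpha \left\| \nabla\left[ u_{AS,k}(\mathbf{x})\, e^{-j\langle \boldsymbol{\omega}_k, \mathbf{x}\rangle} \right] \right\|_2^2 \quad \text{s.t.} \quad \sum_{k} u_k(\mathbf{x}) = f(\mathbf{x}), \tag{1}$$
where $f(\mathbf{x})$ is the input; $u_{AS,k}(\mathbf{x})$ represents the 2D analytic signal corresponding to the kth decomposition; and $\boldsymbol{\omega}_k$ provides a reference direction in the frequency domain.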
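The problem in equation (1) can be reformulated by the Lagrangian multiplier as follows:
$$L(\{u_k\},\{\boldsymbol{\omega}_k\},\lambda) = \sum_{k} \alpha \left\| \nabla\left[ u_{AS,k}(\mathbf{x})\, e^{-j\langle \boldsymbol{\omega}_k, \mathbf{x}\rangle} \right] \right\|_2^2 + \left\| f(\mathbf{x}) - \sum_{k} u_k(\mathbf{x}) \right\|_2^2 + \left\langle \lambda(\mathbf{x}),\, f(\mathbf{x}) - \sum_{k} u_k(\mathbf{x}) \right\rangle. \tag{2}$$
Afterwards, an unconstrained optimization problem is obtained as follows:
$$\min_{\{u_k\},\{\boldsymbol{\omega}_k\}} \max_{\lambda} L(\{u_k\},\{\boldsymbol{\omega}_k\},\lambda), \tag{3}$$
where $\lambda$ and $\alpha$ correspond to the Lagrangian multiplier and balance parameter, respectively. $\{u_k\}$ contain the K decompositions, and $\{\boldsymbol{\omega}_k\}$ include the corresponding center frequencies.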
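According to (3), the alternating direction method of multipliers (ADMM) can be employed to solve the above problem. Each decomposition is updated in the Fourier domain as follows:
$$\hat{u}_k^{n+1}(\boldsymbol{\omega}) = \frac{\hat{f}(\boldsymbol{\omega}) - \sum_{i \neq k} \hat{u}_i(\boldsymbol{\omega}) + \hat{\lambda}(\boldsymbol{\omega})/2}{1 + 2\alpha \left| \boldsymbol{\omega} - \boldsymbol{\omega}_k \right|^2}, \quad \boldsymbol{\omega} \in \Omega_k, \tag{4}$$
where $\hat{f}$, $\hat{u}_i$, and $\hat{\lambda}$ correspond to the Fourier transforms of $f$, $u_i$, and $\lambda$, respectively.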
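The center frequency is updated using a similar idea:
$$\boldsymbol{\omega}_k^{n+1} = \frac{\int_{\Omega_k} \boldsymbol{\omega} \left| \hat{u}_k(\boldsymbol{\omega}) \right|^2 d\boldsymbol{\omega}}{\int_{\Omega_k} \left| \hat{u}_k(\boldsymbol{\omega}) \right|^2 d\boldsymbol{\omega}}, \tag{5}$$
where $\left| \hat{u}_k(\boldsymbol{\omega}) \right|^2$ represents the power spectrum on the half-plane $\Omega_k = \{\boldsymbol{\omega} : \langle \boldsymbol{\omega}, \boldsymbol{\omega}_k \rangle \geq 0\}$.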
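A standard gradient ascent can be used to update the Lagrangian multiplier with a fixed time step $\tau$:
$$\lambda^{n+1}(\mathbf{x}) = \lambda^{n}(\mathbf{x}) + \tau \left( f(\mathbf{x}) - \sum_{k} u_k^{n+1}(\mathbf{x}) \right). \tag{6}$$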
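The three update rules of this section (mode spectrum, center frequency, and Lagrangian multiplier) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the function names, the (row, column) ordering of frequency vectors, and the use of normalized FFT frequencies are assumptions made for illustration, and the outer iteration loop and the recovery of spatial-domain modes are omitted.

```python
import numpy as np

def frequency_grid(shape):
    """Normalized FFT frequencies (fy, fx), each with the shape of the image."""
    return np.meshgrid(np.fft.fftfreq(shape[0]), np.fft.fftfreq(shape[1]),
                       indexing="ij")

def update_mode(f_hat, others_hat, lam_hat, fy, fx, omega_k, alpha):
    """Wiener-filter style update of one mode spectrum on the half-plane of omega_k.

    others_hat is the sum of the spectra of all other modes (hypothetical argument).
    """
    half_plane = (fy * omega_k[0] + fx * omega_k[1]) >= 0
    dist2 = (fy - omega_k[0]) ** 2 + (fx - omega_k[1]) ** 2
    u_hat = (f_hat - others_hat + lam_hat / 2) / (1 + 2 * alpha * dist2)
    # Frequencies outside the half-plane are set to zero.
    return np.where(half_plane, u_hat, 0)

def update_center(u_hat, fy, fx):
    """Center frequency as the centroid of the mode's power spectrum."""
    power = np.abs(u_hat) ** 2
    total = power.sum()
    return np.array([(fy * power).sum() / total, (fx * power).sum() / total])

def update_multiplier(lam, f, modes, tau):
    """Gradient ascent step on the multiplier with fixed step tau."""
    return lam + tau * (f - np.sum(modes, axis=0))
```

With `alpha = 0` the mode update reduces to copying the residual spectrum on the selected half-plane, which is a convenient sanity check; larger values of `alpha` narrow the band kept around `omega_k`.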
This paper employs BVMD for feature extraction in SAR target recognition. The decomposed multimode representations can comprehensively describe the properties of the target, including the region, outline, etc. Different modes from BVMD can therefore complement each other in providing discriminative information about the original image, so the joint use of the multimode representations is expected to enhance the recognition performance.
3. Decision Fusion Based on Multimode Representations
3.1. Clustering of Multimode Representations
Based on the BVMD decompositions, the multimode representations of the same SAR image can be obtained, which share certain correlations. However, due to the impact of EOCs, these correlations may not be global. Therefore, there may be several subsets with strong correlations in multimode representations. This paper uses a correlation-based clustering algorithm to achieve multimode subset division. The correlation metric is defined as follows:
$$c = \max_{\Delta x, \Delta y} \frac{\sum_{x}\sum_{y} \left[ I_1(x,y) - m_1 \right]\left[ I_2(x - \Delta x, y - \Delta y) - m_2 \right]}{\sqrt{\sum_{x}\sum_{y} \left[ I_1(x,y) - m_1 \right]^2 \sum_{x}\sum_{y} \left[ I_2(x - \Delta x, y - \Delta y) - m_2 \right]^2}}, \tag{7}$$
where $I_1$ and $I_2$ are two different modes obtained by decompositions of the same SAR image; $m_1$ and $m_2$ represent the pixel mean; and $\Delta x$ and $\Delta y$ are the two-dimensional offsets of one mode image relative to the other. The correlation coefficient is the maximum image correlation over the different offsets.
Denote the modes obtained by the decomposition of the same SAR image as $\{u_1, u_2, \ldots, u_K\}$, and the correlation coefficient between any two modes is calculated according to the similarity measure in equation (7). The results are shown in Table 1. In the table, $c_{ij}$ denotes the similarity between the ith and jth decompositions from BVMD. On this basis, all the modes are clustered using the correlation threshold $T$. When the correlation between any two modes is higher than $T$, they are considered to belong to the same mode subset. After the clustering, several mode subsets are obtained, and the modes within each subset share high correlations.
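The clustering step can be sketched as follows. This is an illustrative reading of the description above, not the author's code: the correlation is maximized over circular integer shifts within a hypothetical `max_shift` window, and modes are grouped as connected components of the graph whose edges are mode pairs with correlation above the threshold (single-linkage grouping is an assumption; the text does not specify the linkage rule).

```python
import numpy as np

def max_shift_correlation(a, b, max_shift=2):
    """Peak normalized correlation of two images over circular integer shifts."""
    a0 = a - a.mean()
    best = -1.0  # returned unchanged if either image is constant
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(b, (dy, dx), axis=(0, 1))
            b0 = shifted - shifted.mean()
            denom = np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum())
            if denom > 0:
                best = max(best, float((a0 * b0).sum() / denom))
    return best

def cluster_modes(modes, threshold, max_shift=2):
    """Group mode indices whose pairwise correlation exceeds the threshold."""
    n = len(modes)
    parent = list(range(n))  # union-find forest over mode indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if max_shift_correlation(modes[i], modes[j], max_shift) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

A call such as `cluster_modes(modes, threshold=0.45)` returns a list of index lists, one per mode subset; a mode that correlates with no other mode forms a subset of its own.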
3.2. Joint Representation
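For the multiple mode subsets obtained by clustering, this paper uses the JSR to independently represent and classify them. For a certain subset $S$, which contains $N$ modes, i.e., $S = \{\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(N)}\}$, the basic form of JSR is as follows:
$$\min_{\mathbf{B}} \sum_{k=1}^{N} \left\| \mathbf{y}^{(k)} - \mathbf{D}^{(k)} \boldsymbol{\beta}^{(k)} \right\|_2^2, \tag{8}$$
where $\mathbf{D}^{(k)}$ is the dictionary corresponding to the kth mode in the subset; $\boldsymbol{\beta}^{(k)}$ is the corresponding sparse coefficient vector; and $\mathbf{B} = [\boldsymbol{\beta}^{(1)}, \ldots, \boldsymbol{\beta}^{(N)}]$ is the sparse coefficient matrix.
In order to make full use of the correlation of different modes in this subset, the JSR model further uses the $\ell_1/\ell_2$ norm to constrain the coefficient matrix $\mathbf{B}$. The updated objective function is as follows:
$$\min_{\mathbf{B}} \sum_{k=1}^{N} \left\| \mathbf{y}^{(k)} - \mathbf{D}^{(k)} \boldsymbol{\beta}^{(k)} \right\|_2^2 + \mu \left\| \mathbf{B} \right\|_{2,1}, \tag{9}$$
where $\left\| \mathbf{B} \right\|_{2,1}$ sums the $\ell_2$ norms of the rows of $\mathbf{B}$ and $\mu > 0$ is the regularization parameter.
According to the sparse coefficient matrix obtained by solving equation (9), the reconstruction error of each mode in the subset can be calculated separately to obtain the sum of reconstruction errors, as shown below:
$$r(j) = \sum_{i=1}^{N} r^{(i)}(j) = \sum_{i=1}^{N} \left\| \mathbf{y}^{(i)} - \mathbf{D}_j^{(i)} \boldsymbol{\beta}_j^{(i)} \right\|_2^2, \tag{10}$$
where $\mathbf{D}_j^{(i)}$ and $\boldsymbol{\beta}_j^{(i)}$ correspond to the dictionary and sparse coefficients of the ith mode and the jth class, respectively, and $r^{(i)}(j)$ is the reconstruction error of the ith mode to the jth class.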
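As a rough illustration of joint coding followed by class-wise reconstruction errors, the sketch below swaps the mixed-norm JSR problem for a simpler greedy stand-in in the spirit of simultaneous orthogonal matching pursuit: all modes are forced to select the same atom indices, and each mode is then fitted by least squares. This is not the solver used in the paper. The function names, the `n_atoms` sparsity level, and the assumption that every mode dictionary is built from the same training samples in the same order (so that one `atom_labels` array serves all modes) are hypothetical.

```python
import numpy as np

def joint_greedy_coding(ys, dicts, n_atoms):
    """Greedy joint coding: shared atom support, per-mode least-squares fit.

    Returns a coefficient matrix of shape (number of atoms, number of modes).
    """
    residuals = [y.copy() for y in ys]
    support = []
    coefs = [np.zeros(D.shape[1]) for D in dicts]
    for _ in range(n_atoms):
        # Score each atom by its summed absolute correlation over all modes.
        score = sum(np.abs(D.T @ r) for D, r in zip(dicts, residuals))
        if support:
            score[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(score)))
        for k, (D, y) in enumerate(zip(dicts, ys)):
            sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
            coefs[k] = np.zeros(D.shape[1])
            coefs[k][support] = sol
            residuals[k] = y - D[:, support] @ sol
    return np.stack(coefs, axis=1)

def class_errors(ys, dicts, coefs, atom_labels, n_classes):
    """Sum over modes of the squared reconstruction error for each class."""
    errors = np.zeros(n_classes)
    for j in range(n_classes):
        mask = atom_labels == j
        for k, (D, y) in enumerate(zip(dicts, ys)):
            errors[j] += np.sum((y - D[:, mask] @ coefs[mask, k]) ** 2)
    return errors
```

The class with the smallest entry of `class_errors(...)` is the decision of a single mode subset; in the proposed method these error vectors are fused across subsets before the final decision.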
The reconstruction error vectors for the M mode subsets can be calculated according to the same idea mentioned above, denoted as $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_M$. Afterwards, the linear weighting can be employed to fuse the reconstruction error vectors of different mode subsets as follows:
$$\mathbf{r}_f = \sum_{i=1}^{M} w_i \mathbf{r}_i, \tag{11}$$
where $w_1, w_2, \ldots, w_M$ are the weights corresponding to different mode subsets, which are determined according to the number of modes in each subset, i.e.,
$$w_i = \frac{N_i}{\sum_{m=1}^{M} N_m}, \tag{12}$$
where $N_i$ is the number of modes in the ith subset. For the reconstruction error vector after fusion, the target label of the test sample can be determined according to the principle of the minimum error. According to the above analysis, the SAR target recognition process designed in this paper is shown in Figure 1.
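The fusion and decision step is simple enough to show as a short sketch. It assumes that each subset weight is the subset's share of the total number of modes, which is one natural reading of "determined according to the number of modes in each subset"; the function name and argument layout are illustrative.

```python
import numpy as np

def fuse_errors(error_vectors, subset_sizes):
    """Linearly fuse per-subset error vectors and pick the minimum-error class.

    error_vectors: M rows, one per mode subset, each with one error per class.
    subset_sizes:  number of modes in each of the M subsets.
    """
    errors = np.asarray(error_vectors, dtype=float)
    sizes = np.asarray(subset_sizes, dtype=float)
    weights = sizes / sizes.sum()  # weights sum to one
    fused = weights @ errors
    return fused, int(np.argmin(fused))

fused, label = fuse_errors([[0.2, 0.5, 0.9], [0.6, 0.3, 0.8]], [3, 1])
```

In this toy case the subset with three modes dominates the fusion, so its preferred class (index 0) is selected even though the single-mode subset prefers class 1.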

4. Experiments
4.1. MSTAR Dataset
The SAR image dataset released by the US DARPA/AFRL in the MSTAR program is used as the experimental data source. This dataset is obtained by the X-band airborne SAR platform, containing multiview SAR images of 10 types of ground stationary vehicles, with a range and azimuth resolution of 0.3 m. Figure 2 shows the optical images of these 10 targets. Based on this dataset, multiple types of experimental scenarios can be set to conduct a more comprehensive performance analysis of the proposed method.

The proposed method is compared with several methods from the existing literature, including the SRC method [22], monogenic signal method [11], BEMD method [13], and A-ConvNet method [30]. All these reference methods are implemented by the author following the ideas in the original literature. They are tested and compared with the proposed method under the same conditions.
4.2. Results and Discussion
4.2.1. SOC
Table 2 shows a typical SOC based on the MSTAR dataset. The threshold value in the mode clustering algorithm is set as 0.45, and the proposed method is used to classify the 10-class test samples, which yields the confusion matrix shown in Figure 3. In the confusion matrix, the horizontal and vertical coordinates correspond to the real class and the predicted result, respectively, and each diagonal element is the correct recognition rate of the corresponding target. The average recognition rate of the proposed method over all 10 targets reaches 99.12%, showing its excellent performance under SOC. The four comparison methods are tested under the same condition, and their average recognition rates are compared with that of the proposed method in Table 3. It can be seen that all methods achieve good performance under SOC. Compared with the BEMD method, the proposed method performs multimode clustering and decision fusion based on the inner correlations, which further improves the recognition performance. As a method based on deep learning, the classification ability of A-ConvNet is closely related to the scale of the training samples. In the experimental settings of Table 2, there are differences in the configurations of some samples, which affect the overall classification performance of A-ConvNet to a certain extent.

The mode clustering threshold determines the final mode composition participating in the JSR classification, which has an important influence on the final classification performance. Therefore, we test the classification results of the proposed method under several typical threshold values for the 10 types of targets. The results are shown in Table 4. When the threshold is small, the clustering algorithm places weak constraints on the correlation between different modes, so most modes fall into a single subset and the algorithm degrades into the traditional JSR classification. When the threshold is large, the requirements on the correlation in the clustering algorithm are too strict. In this case, each mode tends to form a subset on its own, which leads to insufficient use of the internal relevance of different modes. The comparison shows that the proposed method has the best performance at a threshold of 0.45, and subsequent tests and comparative analysis are carried out under this threshold.
4.2.2. EOCs
Different from the SOC, the EOCs mainly refer to large differences between the acquisition conditions of the test and training samples, which result in a low overall similarity. Under most non-cooperative conditions, SAR target recognition takes place under EOCs, which typically include target configuration differences, depression angle differences, and noise interference. In this paper, the proposed method is tested under these three types of EOCs, which are denoted as EOC-1, EOC-2, and EOC-3, respectively. Tables 5 and 6 show the training and test sets under the configuration difference and the depression angle difference. The former uses target models in the test set that are completely different from those in the training set, and the latter sets the test set at 30° and 45° depression angles (the training set is from 17°). Tables 7 and 8 compare the average recognition rates of various methods under EOC-1 and EOC-2, respectively. For configuration differences, the proposed method effectively retains, through multimode clustering, the decisions that remain correct for samples with configuration differences, and its final average recognition rate is higher than those of the four comparison methods. When the depression angle is 30°, all methods maintain a correct recognition rate higher than 90%, because the difference between the test and training samples caused by the change of the depression angle is relatively small. However, when the depression angle is 45°, the recognition performance of all methods decreases drastically, because the test samples differ greatly from the training samples. The proposed method achieves the highest performance in both cases, showing its stronger robustness to depression angle differences.
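Following [17], noise is added to the test samples of the 10 targets in Table 2 to construct test sets at different signal-to-noise ratios (SNRs). This paper defines SNR as follows:
$$\mathrm{SNR}\,(\mathrm{dB}) = 10 \log_{10} \frac{\sum_{h=1}^{H} \sum_{w=1}^{W} \left| r(h,w) \right|^2}{H W \sigma^2}, \tag{13}$$
where $r(h,w)$ is the SAR pixel of an image of size $H \times W$; $\sigma^2$ is the variance of the added additive white Gaussian noise; and the denominator term is the total energy of the added noise.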
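A noisy test sample at a prescribed SNR can be generated by solving the SNR definition for the noise variance. The sketch below assumes a real-valued magnitude image and real-valued Gaussian noise; the function names and the optional `rng` argument are illustrative and are not taken from [17].

```python
import numpy as np

def add_noise_at_snr(image, snr_db, rng=None):
    """Add white Gaussian noise whose variance matches the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    mean_power = np.mean(np.abs(image) ** 2)  # total signal energy / (H * W)
    sigma2 = mean_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(sigma2), size=image.shape)
    return image + noise, sigma2

def snr_db(image, sigma2):
    """SNR in dB: total signal energy over total noise energy H * W * sigma2."""
    return 10 * np.log10(np.sum(np.abs(image) ** 2) / (image.size * sigma2))
```

Calling `snr_db(image, sigma2)` with the variance returned by `add_noise_at_snr` recovers the requested SNR, which is a quick way to check that the noise level was set as intended.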
As the noise intensifies, the significance of the target characteristics gradually weakens, and the difficulty of recognition also increases. The noise samples are used to test various methods, and the comparison results are obtained, as shown in Figure 4. It can be seen that as the SNR decreases, the recognition performance of various methods decreases to varying degrees. The proposed method obtains stronger noise robustness by combining the advantages of clustering multiple modes and decision fusion. A comprehensive comparison of the recognition results under the three types of EOCs shows that the proposed method has stronger adaptability to EOCs and is beneficial to obtain more reliable recognition results.

5. Conclusion
This paper uses BVMD to decompose SAR images, thus obtaining multimode representations. In order to adaptively classify each test sample, all modes are clustered based on the principle of correlation, and multiple mode subsets with intrinsic correlation are obtained. The JSR is used to make decisions on each mode subset, and finally the decision is obtained through linear weighted fusion. Based on the MSTAR dataset, the proposed method is tested and compared under SOC and three types of EOCs. The experimental results show that the proposed method is more effective and robust than several existing methods.
Data Availability
The MSTAR dataset can be accessed upon request.
Conflicts of Interest
The author declares that there are no conflicts of interest.