Abstract
This work proposes an effective high-resolution multisource direction-of-arrival (DOA) estimation method for impulsive noise scenarios based on convolutional neural networks (CNNs). First, the array observation matrix is preprocessed and fed into a denoising network to suppress outliers and filter out impulsive noise. Secondly, the denoising network output is fed into a model order selection network to estimate the model order. Next, according to this estimate, the denoising network output is fed into the DOA subnetwork corresponding to the model order, which estimates the DOA of each signal. Comprehensive simulations demonstrate that, in the presence of impulsive noise, the proposed method is effective and superior in both accuracy and computation speed for multisource DOA estimation. Therefore, it is concluded that CNNs generalize well to DOA estimation.
1. Introduction
In recent years, direction-of-arrival (DOA) estimation has been extensively applied in many fields, such as radar, sonar, electronic monitoring, and mobile communication [1–3]. It is a critical technology of array signal processing, whose purpose is to estimate the directions of the transmitters of the signals received by an array. The representative conventional DOA estimation algorithms are the estimation of signal parameters via rotational invariance techniques (ESPRIT) [4, 5] and multiple signal classification (MUSIC) [6, 7]. ESPRIT uses two subarrays with translational invariance to realize DOA estimation. Although its computational cost is lower than that of MUSIC, its accuracy is also lower. MUSIC obtains the spatial spectrum by constructing orthogonal signal and noise subspaces, and the DOA can be estimated via spectral peak search. The smaller the search step, the higher the estimation accuracy, but the heavier the computational burden. In addition, both ESPRIT and MUSIC were developed under a Gaussian noise assumption. However, many noises and signals in applications are impulsive, such as underwater acoustic signals, radar clutter, and artificial interference. In the presence of impulsive noise, the performance of algorithms represented by ESPRIT and MUSIC degrades significantly because the second-order moments of the array output do not converge. To improve the robustness, FLOM [8], PFLOM [9], CRCO [10], COBU [11], and other algorithms, which effectively achieve DOA estimation in impulsive noise, have been developed successively. However, their high computational complexity may make these algorithms difficult to apply in real time.
Many DOA estimation methods based on neural networks have been developed in recent years to reduce the computational burden. High-accuracy DOA estimation is achieved in references [12–15] using a convolutional neural network (CNN), references [16, 17] using support vector regression (SVR), reference [18] using a residual network, reference [19] using a fully connected neural network (FNN), reference [20] using a long short-term memory network, and references [21, 22] using a radial basis function (RBF) network. However, these methods can only be used in single-source scenarios, which severely limits their practical applications. Therefore, reference [23] using a CNN, references [24, 25] using SVR, and references [26, 27] using RBF networks achieve multisource DOA estimation. Although high estimation accuracy is obtained, these networks are only suitable for fixed model order scenarios and may not adapt to a variable source number. To improve the robustness to model orders, reference [28] divides the region of interest into several subregions, each corresponding to an RBF network. A multilayer perceptron judges the presence of a single source in each subregion, and the RBF networks estimate the positions within the subregions containing a source. Likewise, reference [29], using a circularly fully convolutional network, divides the region of interest into several subregions: a coarse DOA is estimated in the first stage, and a refined result is obtained in the second stage. Although references [28, 29] can adapt to a variable model order, multiple sources are not permitted to be present simultaneously in a subregion, which limits the resolution of DOA estimation. The above neural network-based methods effectively reduce computational complexity and improve DOA estimation accuracy. Yet, it may be difficult for them to meet both high-resolution and variable model order requirements.
Furthermore, these methods do not consider impulsive noise scenarios, in which the input features fed to the neural networks may be corrupted by extreme outliers, resulting in performance degradation.
Motivated by the above investigation, we propose a high-resolution DOA estimation method with a variable model order using CNNs in impulsive noise environments. A DOA estimation model is developed, which comprises four modules: the preprocessing, the denoising network, the model order selection (MOS) network, and the DOA network. The preprocessing turns array observation matrices into input features for the denoising network. The MOS network takes its input from the denoising network and outputs the model order estimate. The DOA network comprises several DOA subnetworks corresponding to different model orders; these subnetworks adopt a transfer learning strategy and are trained sequentially, starting from a model order of 1. The MOS network estimate determines which DOA subnetwork receives the denoising network output, and that subnetwork outputs the DOA estimates. A series of simulation experiments verifies the effectiveness and superiority of the DOA estimation model.
The main contributions of the study are as follows: (1) a high-resolution DOA estimation method with a variable model order using CNNs is achieved; (2) the robustness of DOA estimation in impulsive noise scenarios is improved; (3) it is shown that, for DOA estimation, employing transfer learning can reduce the amount of training data required.
The remainder of this paper is organized as follows: Section 2 briefly introduces the impulsive noise and CNN, and the problem of interest is defined; Section 3 develops the novel DOA estimation model; the experimental results are presented in Section 4, and the study is summarized in Section 5.
Table 1 lists the main notations used in this study. Other notations follow the conventional expression unless otherwise specified.
2. Preliminary and Problem Formulation
2.1. Impulsive Noise
The α-stable distribution, a highly flexible tool, can model impulsive noise. It is normally defined via its characteristic function, which contains four parameters: the location parameter a (−∞ < a < +∞), the symmetry parameter β (−1 ≤ β ≤ 1), the dispersion coefficient γ (γ > 0), and the characteristic exponent α (0 < α ≤ 2) [30]:

φ(t) = exp{jat − γ|t|^α [1 + jβ sgn(t) ω(t, α)]},

where ω(t, α) = tan(πα/2) for α ≠ 1 and ω(t, α) = (2/π)log|t| for α = 1. a reflects the deviation of the probability density function (PDF) along the x-axis (a = 0 in this study). β indicates whether the PDF leans to the left or right of a (β = 0 in this study). γ denotes the dispersion of the samples (γ = 1 in this study). α determines the thickness of the tails of the α-stable distribution. Figure 1 displays the PDF of impulsive noise for different α. The smaller the α, the heavier the tails and the stronger the impulsive outliers.
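For concreteness, symmetric α-stable samples (β = 0, a = 0, as used in this study) can be drawn with the Chambers–Mallows–Stuck method; the following sketch and its interface are illustrative, not from the paper:

```python
import numpy as np

def sas_noise(alpha, gamma=1.0, size=1, rng=None):
    """Generate symmetric alpha-stable (SaS) samples via the
    Chambers-Mallows-Stuck method (beta = 0, location a = 0)."""
    rng = np.random.default_rng(rng)
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform phase
    W = rng.exponential(1.0, size)                 # exponential weight
    if np.isclose(alpha, 1.0):
        X = np.tan(V)                              # alpha = 1 reduces to Cauchy
    else:
        X = (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
             * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))
    return gamma ** (1 / alpha) * X                # scale by dispersion
```

Setting α = 2 recovers Gaussian noise with variance 2γ, which is a convenient sanity check for the generator.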

2.2. Convolutional Neural Network
Compared with RBF networks, SVR, and FNNs, the weight sharing strategy of a CNN effectively alleviates the problem of enormous input feature dimensionality in neural networks. A CNN usually comprises multiple convolutional layers, pooling layers, and fully connected layers [31]. As an essential part of a CNN, the convolutional layer comprises convolution filters. The feature image, which serves as the input of the next layer, is obtained by convolving the input image with the convolution filters.
The convolution process is displayed in Figure 2. The input image size is 8×8. Convolving with a 3×3 convolution filter with a stride of 1 yields an 8×8 feature image. To keep the input and feature images the same size, the periphery of the input image is filled with zeros. The convolution filter slides over the input image, and the element ai,j in row i and column j of the feature image can be calculated by

ai,j = F(ΣI=i..i+2 ΣJ=j..j+2 ωI−i+1,J−j+1 · x̃I,J + υ),  (1)

where F indicates a nonlinear activation function, ωI−i+1,J−j+1 denotes the weight in row I−i+1 and column J−j+1 of the convolution filter, υ denotes the bias, and x̃I,J signifies the element in row I and column J of the zero-padded input image. Typically, a convolutional layer has multiple convolution filters. The number of feature images equals the number of convolution filters, and the channel number of each convolution filter equals that of the input image.
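The "same"-padded convolution described above can be sketched as follows; the single-channel restriction, function name, and activation choice are simplifying assumptions for illustration:

```python
import numpy as np

def conv2d_same(image, kernel, bias=0.0, activation=np.tanh):
    """Single-channel 'same' convolution with stride 1, as in Figure 2:
    the input is zero-padded so the feature map keeps the input size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))      # zero padding
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = padded[i:i + kh, j:j + kw]        # receptive field
            out[i, j] = np.sum(patch * kernel) + bias
    return activation(out)
```

With an 8×8 input and a 3×3 kernel this reproduces the 8×8 feature image described in the text.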

2.3. Problem Formulation
Figure 3 depicts p far-field stable electromagnetic signals impinging on a uniform linear array (ULA). The array element number is M, and the distance between two adjacent elements is d. The incident angle of the i-th signal si(n) is θi (0° ≤ θi ≤ 180°). The element denoted by 1 is selected as the phase reference point, and the M×p steering matrix can be expressed as [32]

A = [a(θ1), a(θ2), …, a(θp)],  a(θi) = [1, e^(−j2πfd cos θi/c), …, e^(−j2πf(M−1)d cos θi/c)]^T,  (2)

where c denotes the propagation speed and f indicates the known carrier frequency. The p×N signal matrix can be expressed as

s = [s1(n), s2(n), …, sp(n)]^T,  n = 1, 2, …, N,  (3)

where N signifies the snapshot number. The M×N observation matrix is expressed as

x = As + ei,  (4)

where ei signifies the M×N impulsive noise matrix and the subscript i indicates impulse.

Obviously, θ1 to θp can be obtained from x, while the impulsive noise may disturb x. The problem addressed in this study is establishing the mapping between x and θ1 to θp via the developed DOA estimation model.
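A minimal NumPy sketch of this signal model, assuming unit-power random-phase narrowband sources, the cos(θ) convention of (2), and the 0.48λ element spacing used later in the simulations (function names are illustrative):

```python
import numpy as np

def steering_matrix(thetas_deg, M=8, d_over_lambda=0.48):
    """ULA steering matrix A (M x p) for incident angles measured from
    the array axis (0..180 deg), element 1 as phase reference."""
    theta = np.deg2rad(np.asarray(thetas_deg))
    m = np.arange(M)[:, None]                       # element index 0..M-1
    return np.exp(-2j * np.pi * d_over_lambda * m * np.cos(theta)[None, :])

def observations(thetas_deg, N=2000, M=8, noise=None, rng=0):
    """x = A s + e for p unit-power random-phase narrowband sources."""
    rng = np.random.default_rng(rng)
    A = steering_matrix(thetas_deg, M)
    p = A.shape[1]
    S = np.exp(2j * np.pi * rng.random((p, N)))     # unit-modulus signals
    X = A @ S
    if noise is not None:
        X = X + noise                               # e.g. alpha-stable samples
    return X
```

At θ = 90° the steering vector is all ones, since the source lies broadside to the array.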
3. DOA Estimation Model
In this section, we propose the DOA estimation model, whose input is x and whose output is p DOA estimations, as shown in Figure 4. The model contains four modules: the preprocessing, the denoising network, the MOS network, and the DOA network. x is processed by the preprocessing module and fed into the denoising network, whose output is fed into the MOS and DOA networks. The DOA network comprises a set of DOAp (1 ≤ p ≤ P) subnetworks corresponding to different model orders. The MOS network estimate determines which DOAp subnetwork receives the denoising network output. For convenience of training, the denoising network, the MOS network, and each DOAp subnetwork are trained independently.

3.1. Preprocessing
The preprocessing input is x, and the output is the input feature of the denoising network. In the presence of impulsive noise, elements of x may be extreme outliers at some moments. First, x is normalized with the infinity norm to suppress impulsive outliers and facilitate the training of the denoising network. The infinity norm is defined as

‖x‖∞ = max(i,n) |xi(n)|,  (5)

where xi(n) indicates the n-th snapshot of the i-th array element. After normalizing x with the infinity norm, x expressed by (4) should be modified as

x∞ = x/‖x‖∞ = A∞s + e∞,  (6)

where A∞ = A/‖x‖∞ and e∞ = ei/‖x‖∞. Secondly, x∞ is used to construct the covariance matrix R∞ = (1/N) x∞x∞^H to reduce the dimension of the neural network input features. Considering that R∞ is a Hermitian matrix and that a CNN requires real-valued input features, we construct the following M×M input feature Fd for the denoising network:

Fd(j, k) = Re(R∞jk) for j ≤ k,  Fd(j, k) = Im(R∞jk) for j > k,  (7)

where R∞jk represents the element in row j and column k of R∞.
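A sketch of the preprocessing step, assuming one common real-valued layout for a Hermitian covariance (real parts in the upper triangle, imaginary parts strictly below); the paper's exact arrangement of Fd may differ:

```python
import numpy as np

def preprocess(X):
    """Infinity-norm normalization followed by a real-valued covariance
    feature Fd for the denoising CNN (layout is an assumption)."""
    X_inf = X / np.max(np.abs(X))                   # suppress outliers
    N = X.shape[1]
    R = X_inf @ X_inf.conj().T / N                  # sample covariance
    Fd = np.triu(R.real) + np.tril(R.imag, k=-1)    # Re above, Im below
    return Fd
```

Because R is Hermitian, no information is lost by keeping only one triangle of each part.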
3.2. Denoising Network
The function of the denoising network is to eliminate the effects of noise and of the signal powers in Fd on the MOS and DOA networks. Therefore, appropriate labels must be constructed for Fd to train the denoising network.
In the absence of impulsive noise, the observation matrix is expressed by xd, and (4) can be modified as

xd = As.  (8)

xd is used to construct the covariance matrix Rd = (1/N) xdxd^H, and the M×M label Ld constructed with Rd can be expressed as

Ld(j, k) = Re(Rdjk)/NRd for j < k,  Ld(j, k) = Im(Rdjk)/NRd for j > k,  Ld(j, j) = 0,  (9)

where NRd = (1/M) Σj Rdjj, namely, the estimation of the signal power sum, and Rdjk indicates the element in row j and column k of Rd. The principal diagonal elements of Rd are replaced by zeros because they fail to contain angle information. Obviously, Ld does not contain impulsive noise, and NRd normalizes the elements of Rd to eliminate the effect of signal powers.
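A sketch of the label construction under the same assumptions as before (unit-power sources, Re/Im triangular layout mirrored from the input feature, diagonal zeroed); names are illustrative:

```python
import numpy as np

def make_label(thetas_deg, M=8, d_over_lambda=0.48):
    """Noise-free training label Ld: off-diagonal covariance entries
    normalized by the estimated signal-power sum, diagonal zeroed."""
    theta = np.deg2rad(np.asarray(thetas_deg))
    m = np.arange(M)[:, None]
    A = np.exp(-2j * np.pi * d_over_lambda * m * np.cos(theta)[None, :])
    Rd = A @ A.conj().T                             # unit-power sources
    n_rd = np.mean(np.diag(Rd).real)                # signal-power-sum estimate
    Ld = (np.triu(Rd.real, k=1) + np.tril(Rd.imag, k=-1)) / n_rd
    return Ld
```

For unit-power sources each diagonal entry of Rd equals the number of sources p, so the normalizer n_rd removes the dependence on the total signal power.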
The cost function of the denoising network employs the mean squared error (MSE), expressed by MSEd as

MSEd(ω, υ) = (1/M²) Σj Σk (L̂d(j, k) − Ld(j, k))²,  (10)

where L̂d signifies the denoising network response. ω and υ denote the denoising network weights and biases, respectively.
3.3. MOS Network
The MOS network input is from the denoising network. The P×1 label Lmos using one-hot [33] represents the model order, and P denotes the preset maximum model order. When the model order is p, the p-th element of Lmos is 1, and other elements are zeros.
Because the MOS network is trained independently, its input feature is designed separately. Considering that the preceding denoising network may not completely filter out noise, Gaussian noise is added to the MOS network training set to improve the robustness of the DOA estimation model. In this case, the observation matrix is expressed by xmos, and (4) is modified as

xmos = As + eg,  (11)

where eg signifies the M×N Gaussian noise matrix and the subscript g indicates Gauss. The covariance matrix Rmos is constructed from xmos, and the noise power σ² can be estimated by eigenvalue decomposition. The input feature is expressed by Fmos, and (9) can be modified as

Fmos(j, k) = Re(Rmosjk)/NRmos for j < k,  Fmos(j, k) = Im(Rmosjk)/NRmos for j > k,  Fmos(j, j) = 0,  (12)

where NRmos = (1/M) Σj (Rmosjj − σ²), namely, the estimation of the signal power sum, and Rmosjk signifies the element in row j and column k of Rmos. Compared with Ld, the elements of Fmos are disturbed by Gaussian noise, and training the MOS network with Fmos improves the network robustness.
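The noise power estimate mentioned above is the standard subspace estimate: the average of the M − p smallest eigenvalues of the sample covariance. A minimal sketch:

```python
import numpy as np

def noise_power(R, p):
    """Estimate Gaussian noise power as the mean of the M - p smallest
    eigenvalues of the sample covariance (standard subspace estimate)."""
    eigvals = np.linalg.eigvalsh(R)                 # ascending order
    return float(np.mean(eigvals[:len(eigvals) - p]))
```

With p sources, the signal contributes rank p to the covariance, so the remaining M − p eigenvalues cluster around the noise power σ².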
The cost function of the MOS network employs the cross-entropy, which is expressed as

CE(ω, υ) = −Σi=1..P Lmosi log(ŷi),  (13)

where ŷi and Lmosi represent the i-th element of the response and the ground-truth label, respectively. ω and υ denote the MOS network weights and biases, respectively.
3.4. DOA Network
The DOA network comprises a set of DOAp subnetworks (1≤p ≤ P) corresponding to different model orders. The MOS network output activates the corresponding DOAp subnetwork. The input feature of each DOAp subnetwork is the denoising network output, and the response is the DOA estimation of p sources.
Theoretically, a single large-scale network could be trained to achieve multisource DOA estimation with a variable model order. However, this approach is challenging to implement because of limited computing power and the difficulty of obtaining the large amount of training data needed to ensure DOA estimation accuracy. Therefore, we adopt the strategy of training each DOAp subnetwork independently. Even so, to ensure the DOAp subnetwork estimation accuracy, the amount of training data required by conventional training methods increases significantly with p. Given that DOA estimation of p sources and of p + 1 sources are two different but related tasks, we adopt parameter-transfer learning [34] to train the DOAp subnetworks sequentially from p = 1 to P, with training sets generated separately according to (12). First, a single-source training set is constructed to train the DOA1 subnetwork. Secondly, starting from the DOAp subnetwork architecture, convolution filters, convolutional layers, or the lengths of fully connected layers are increased, and one output layer port is added, yielding the DOAp+1 subnetwork architecture. The parameters of the trained DOAp subnetwork are used to initialize the corresponding parts of the DOAp+1 subnetwork, and the newly added parameters of the DOAp+1 subnetwork adopt conventional initialization methods. Thus, the training data required by the DOAp+1 subnetwork can be significantly reduced.
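The parameter-transfer step can be sketched as follows, assuming the parameters are held in name-indexed dictionaries (the actual framework and layer naming are not specified in the paper):

```python
import numpy as np

def transfer_parameters(trained_p, arch_p1, init=np.zeros):
    """Parameter-transfer sketch: layers shared between the DOAp and
    DOA(p+1) subnetworks inherit trained weights; newly added or resized
    layers fall back to a fresh initializer."""
    params = {}
    for name, shape in arch_p1.items():
        if name in trained_p and trained_p[name].shape == tuple(shape):
            params[name] = trained_p[name]          # reuse trained weights
        else:
            params[name] = init(shape)              # new or resized layer
    return params
```

In a real framework the same idea is expressed by copying layer weights by name before resuming training on the larger subnetwork.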
The cost function of each DOAp subnetwork adopts the MSE with regularization, expressed by MSEp as

MSEp(ω, υ) = (1/p) Σi=1..p (θ̂i − θi)² + λ Σl=1..L ‖ωl‖²,  (14)

where θi and θ̂i denote the ground-truth label and the DOA estimation of the i-th signal, respectively. L signifies the layer number of the subnetwork, and λ is the regularization parameter. ω and υ denote the DOAp subnetwork weights and biases, respectively.
4. Simulation Results
Firstly, the denoising network performance is presented. Secondly, Section 4.2 shows the MOS network accuracy and compares it with a modified Akaike information criterion (AIC) method and a modified minimum description length (MDL) method. Finally, Section 4.3 demonstrates the DOA estimation model performance, and the MSE and computation speed are compared with those of CRCO and COBU.
Unless otherwise specified, the simulation conditions are as follows: (1) the ULA element number M is 8, and the maximum model order P is set to 3; (2) the signal carrier frequency is 600 MHz, and the ULA element spacing is set to 0.48 times the wavelength; (3) the signal amplitude is randomly sampled to improve the robustness of the DOA estimation model to the signal amplitude [35], and the snapshot number is set to 2,000. The network initialization settings are as follows: (1) the network weights are initialized with lecun_normal [36] and the biases with zeros; (2) the convolution filter is 3×3 with a stride of 1; (3) the padding mode is “same”; (4) the mini-batch size [37] is 512; (5) Adam [38] is adopted in backpropagation. Also, in Gaussian noise scenarios, the signal-to-noise ratio (SNR) is defined as SNR = 10log10(σs²/σ²), where σs² denotes the signal power. In non-Gaussian noise scenarios, the ratio of the signal power to the noise dispersion, taken as the generalized SNR, is defined as GSNR = 10log10(σs²/γ) [11].
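Under this GSNR definition, unit-dispersion α-stable noise can be scaled to a target GSNR by exploiting the fact that multiplying a stable sample by c scales its dispersion by c^α; a small sketch (function name assumed):

```python
import numpy as np

def scale_to_gsnr(noise, alpha, signal_power, gsnr_db, gamma=1.0):
    """Scale alpha-stable noise of dispersion gamma to a target GSNR,
    using GSNR = 10*log10(signal_power / dispersion); multiplying a
    stable sample by c scales its dispersion by c**alpha."""
    gamma_target = signal_power / 10 ** (gsnr_db / 10)
    c = (gamma_target / gamma) ** (1 / alpha)
    return c * noise
```

For α = 2 this reduces to the usual Gaussian SNR scaling up to the factor-of-two relation between dispersion and variance.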
4.1. Performance of the Denoising Network
GSNR is set to 0 dB and 20 dB, respectively, and 30,000 samples are generated for each GSNR in p∈[1 : 1: 3], α∈[0.1 : 0.1: 2.0], and θ∈[0°: 1°: 180°]. The resulting 60,000 samples constitute the data set, which is divided into a training set (90%) and a validation set (10%). GSNR is set to 0 dB to enhance the network generalization ability in low-GSNR scenarios, while 20 dB prevents the network from overfitting. The learning rate gradually decreases with the epoch. Table 2 illustrates the denoising network architecture.
The relative error (RE) is adopted as the evaluation criterion and can be defined as

RE = ‖L̂d − Ld‖F/‖Ld‖F × 100%,  (15)

where ‖·‖F denotes the Frobenius norm and L̂d is the denoising network response.
Figure 5(a) depicts the variation of the denoising network RE with GSNR for different characteristic exponents after the test data is preprocessed. GSNR∈[−5 dB: 1 dB: 20 dB], and 500 test samples are generated in p∈[1 : 1: 3] and θ∈[0°: 1°: 180°] for each GSNR. Figure 5(b) depicts the variation of RE with the snapshot number for different characteristic exponents with a GSNR of 10 dB. N∈[100 : 100: 2000], and 500 test samples are generated in p∈[1 : 1: 3] and θ∈[0°: 1°: 180°] for each N. Figure 5 illustrates that increasing the GSNR, snapshot number, or characteristic exponent can improve the denoising network performance.

4.2. Performance of the MOS Network
The MOS network and denoising network are trained independently. Considering that the denoising network RE can hardly reach 0%, Gaussian noise is added during MOS network training to enhance the model generalization ability. The SNR is set to −15 dB and 20 dB, respectively, and 20,000 samples are generated for each SNR in p∈[1 : 1: 3] and θ∈[0°: 1°: 180°]. The resulting 40,000 samples constitute the data set, which is divided into a training set (85%) and a validation set (15%). The learning rate gradually decreases with the epoch. Table 3 illustrates the MOS network architecture.
Accuracy, defined as NC/NT, is adopted as the evaluation criterion, where NT is the total amount of test data and NC is the amount of correctly estimated data. The test data is the same as that in Section 4.1, and the denoising network output is fed into the MOS network. Figure 6 demonstrates the relationship between the MOS network accuracy and the SNR or snapshot number. When α < 0.4, the MOS network performance degrades significantly. The main reasons are as follows: (1) the amplitude of the outliers is extreme, so the sources are strongly attenuated after normalization with the infinity norm; (2) the angle step of random sampling is set to 0.01°, which may place simultaneously incident sources at very close angular distances that the MOS network cannot distinguish. However, when α > 0.4, the MOS network accuracy approaches 1 and hardly changes with the SNR or snapshot number, indicating that the MOS network has high resolution and good robustness.

To the best of our knowledge, no MOS algorithm for impulsive noise scenarios has been published. To validate that the MOS network is superior to conventional algebraic algorithms, we replace the covariance matrix in AIC and MDL [39] with R∞ from Section 3.1 and obtain the modified AIC and MDL. Figure 7 shows that the MOS network outperforms the modified AIC and MDL; the test data is the same as that with α = 1.3 in Figure 5. In addition, the processing time of the modified AIC and MDL is about 3 times that of the MOS network.
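For reference, the classic eigenvalue-based AIC/MDL criteria of [39] can be sketched as follows; applying them to R∞ instead of the ordinary sample covariance yields the "modified" baselines used here:

```python
import numpy as np

def mdl_order(R, N, criterion="MDL"):
    """Model order by the classic Wax-Kailath AIC/MDL eigenvalue
    criteria [39]; R may be the robust covariance R_inf."""
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]       # descending eigenvalues
    lam = np.maximum(lam, 1e-12)                     # guard tiny negatives
    M = len(lam)
    best_k, best_val = 0, np.inf
    for k in range(M):                               # candidate orders 0..M-1
        tail = lam[k:]
        # N(M-k) * log(arithmetic mean / geometric mean) of tail eigenvalues
        ll = N * (M - k) * np.log(np.mean(tail) / np.exp(np.mean(np.log(tail))))
        pen = k * (2 * M - k)
        val = ll + (pen if criterion == "AIC" else 0.5 * pen * np.log(N))
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```

The criterion picks the order at which the tail eigenvalues become nearly equal (noise-like), trading the likelihood term against the penalty.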

4.3. Performance of the DOA Estimation Model
4.3.1. DOA Network Architecture
Each subnetwork of the DOA network corresponds to a different model order and is trained independently. Firstly, the data set is constructed for the DOA1 subnetwork. The SNR is set to −15 dB and 20 dB, respectively, and 4,500 samples are generated for each SNR in θ∈[0°: 1°: 180°] with p = 1. The resulting 9,000 samples constitute the data set, which is divided into a training set (80%) and a validation set (20%). The learning rate gradually decreases with the epoch. Table 4 illustrates the DOA1 subnetwork architecture.
Secondly, the SNR is set to −15 dB and 20 dB, respectively, and 20,000 samples are generated for each SNR in θ∈[0°: 1°: 180°] with p = 2. The resulting 40,000 samples constitute the data set, which is divided into a training set (90%) and a validation set (10%). The learning rate gradually decreases with the epoch, and early stopping is adopted to prevent overfitting. Table 5 illustrates the DOA2 subnetwork architecture, obtained by expanding the corresponding layers of the DOA1 subnetwork.
Then, the SNR is set to −15 dB and 20 dB, respectively, and 80,000 samples are generated for each SNR in θ∈[0°: 1°: 180°] with p = 3. The resulting 160,000 samples constitute the data set, which is divided into a training set (85%) and a validation set (15%). The learning rate gradually decreases with the epoch. Early stopping and regularization are adopted to avoid overfitting. Table 6 illustrates the DOA3 subnetwork architecture. L1 to L4, L7, and L8 of the DOA3 subnetwork are expanded from L1 to L4, L6, and L7 of the DOA2 subnetwork in turn, and L5 is a new layer.
4.3.2. Performance of the Model
The MSE is adopted as the evaluation criterion, which can be expressed as

MSE = (1/p) Σi=1..p (θ̂i − θi)²,  (16)

where p represents the variable model order, and θ̂i and θi denote the estimated and ground-truth DOA of the i-th signal, respectively. After the test data is preprocessed, the MOS network decides which subnetwork in the DOA network receives the denoising network output. Figure 8(a) shows the variation of the MSE of the DOA estimation model with GSNR for different characteristic exponents. GSNR∈[−5 dB: 1 dB: 20 dB], and 500 test samples are generated in p∈[1 : 1: 3] and θ∈[0°: 2°: 180°] for each GSNR. Figure 8(b) shows the variation of the MSE with the snapshot number for different characteristic exponents with a GSNR of 10 dB. N∈[100 : 100: 2000], and 500 test samples are generated in p∈[1 : 1: 3] and θ∈[0°: 2°: 180°] for each N. Figure 8 suggests that increasing the GSNR, snapshot number, or characteristic exponent can improve the model performance. Still, DOA estimation remains challenging in small characteristic exponent scenarios; therefore, Figure 8 does not display the DOA estimation model performance for α < 0.7.
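A minimal sketch of this criterion, with the (assumed) convention that estimates and ground truth are sorted before pairing:

```python
import numpy as np

def doa_mse(theta_hat, theta_true):
    """MSE over p estimated/true DOAs (degrees squared), after sorting
    both so each estimate is paired with a true angle in order."""
    a = np.sort(np.asarray(theta_hat, dtype=float))
    b = np.sort(np.asarray(theta_true, dtype=float))
    return float(np.mean((a - b) ** 2))
```

Sorting resolves the permutation ambiguity between network outputs and ground-truth angles before the per-source errors are averaged.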

To highlight the superiority of the DOA estimation model, the model is compared with COBU and CRCO, which were shown to be superior to the classic FLOM and PFLOM in references [10, 11]. The search steps of COBU and CRCO are set to 0.01°. Both the weight factor and the kernel size of COBU are set to 1. The scale factor and the parameter μ of CRCO are set to 1.4 and 0.5, respectively. Other simulation conditions are the same as those of the DOA estimation model. Assume that three signals impinge on the ULA with incident angles of 50.01°, 100.08°, and 120.05°, respectively. Based on 500 Monte Carlo runs, Figure 9 demonstrates the variation of the MSE of each method with GSNR or snapshot number for α = 1.3, with GSNR∈[−5 dB: 1 dB: 20 dB] and N∈[100 : 100: 2000]. Although COBU and CRCO are high-resolution algorithms based on spatial spectrum search, their estimates are still discrete values limited by the search step, and appropriate parameters must be set according to the characteristic exponent and GSNR. In contrast, the DOA estimation model needs no prior knowledge of the characteristic exponent or GSNR, and its output is a continuous value. Therefore, the DOA estimation model is superior to COBU and CRCO.

Furthermore, the processing time of the three methods is compared, and Table 7 displays the results from 500 Monte Carlo runs. The calculations are performed on a computer equipped with 16 GB RAM and an Intel Core i7-9700K CPU. The processing time of the proposed model is mainly spent on preprocessing. If the snapshot number is slightly reduced, Figure 8(b) reveals that the DOA estimation model performance will not degrade significantly, while the preprocessing time will drop dramatically. In contrast, the processing time of COBU and CRCO will not be reduced considerably.
5. Conclusions
This study presented a DOA estimation model for impulsive noise environments. The model consists of the preprocessing, the denoising network, the MOS network, and the DOA network. For convenience of training, each network is trained independently. The preprocessing turns array observation matrices into appropriate input features for the denoising network. The DOA network consists of several DOA subnetworks corresponding to different model orders. The MOS network estimate determines which DOA subnetwork receives the denoising network output and produces the DOA estimates.
The experiments reached the following conclusions: (1) in the presence of impulsive noise, the preprocessing and denoising network effectively suppress outliers and filter out impulsive noise; (2) the MOS network can estimate the model order with high accuracy; (3) the proposed DOA estimation model is effective and superior in accuracy and computation speed.
The problems that still need to be solved are as follows: (1) to develop DOA estimation in low characteristic exponent scenarios; (2) to extend the dimension of DOA estimation.
Data Availability
The data used to support the findings are available from the corresponding author.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) funded by the Korea Government (MSIT) (NRF-2016R1A6A1A03013567 and NRF-2021R1A2B5B01001484).