Abstract
In wireless networks, owing to the common in-phase and quadrature-phase (I/Q) imbalance in transmitters, the I/Q branch models of digital predistortion (DPD) need to be identified separately to improve the linearization effects. The existing order reduction methods for the predistorter are based on the contributions of the complex basis function terms and therefore cannot handle the different contributions of the I/Q components of the complex basis function terms that arise from the separate identification of the I/Q branch models, while pruning the I/Q branch models separately increases the complexity. Aiming at this issue, this paper proposes a general order reduction method based on the attention mechanism for the predistortion of power amplifiers (PAs). The method is suitable for pruning both traditional models and neural network-based models. In this method, the attention mechanism is used to evaluate the contributions of the real basis function terms to the I/Q components of the predistorted output through offline training, and the influence of the cross terms of the I/Q branch models is considered. The experimental results, based on a comparison with other typical methods under a 100 MHz Doherty PA and different I/Q imbalance levels, show that this method has superior pruning performance and good linearization ability.
1. Introduction
With the rapid iteration of fifth-generation (5G) wireless systems, wider signal bandwidths and more complex modulation modes are used to satisfy the rapidly growing demand for data services [1–3]. However, the wide signal bandwidth and efficient modulation make the transmitters, especially the power amplifiers (PAs), exhibit more complex nonlinear behavior [4], which makes high-efficiency transmission in the transmitting system difficult. To solve this problem, digital predistortion (DPD) is one of the most commonly used linearization techniques [5–7].
DPD techniques compensate for the nonlinear behaviors of the transmitter by constructing a nonlinear model that is opposite to the nonlinear characteristics of the transmitter [8]. At present, the most common and popular predistortion models are the full Volterra (FV) series models. Since these models' parameters are linear with respect to the output of the system, these models can be easily identified by classical regression theory [9]. However, the complex nonlinear behaviors (including nonlinearity and memory effects [6]) caused by the increase of signal bandwidth and complex modulation modes lead to the curse of dimensionality of the FV models [9]. Therefore, order reduction of the FV model has become an effective means to improve the availability of the model and reduce the cost [10, 11]. To this end, based on the general FV series model, various prior pruning models, such as the memory polynomial (MP) model [12] and the generalized MP (GMP) model [13], have been proposed. These models are easy to implement in a field-programmable gate array (FPGA), for example through a lookup table (LUT) [14–16], so they are commonly used engineering models at present. However, these models are pruned based on prior knowledge and are still general predistortion models [4]. For a specific PA, in order to meet the linearization requirements, these models still include many basis function terms with small contributions, which increases the complexity of the model.
For this reason, classical posterior pruning techniques have been proposed to select the necessary terms based on the nonlinear behavior of a specific PA, that is, to find the optimal structure for a given PA [14, 17, 18]. The most typical method is the predistortion model pruning technique based on orthogonal matching pursuit (OMP) [11]. This method selects the term with the greatest correlation with the residual output in each iteration [11] to determine the optimal predistortion structure. To solve the ill-conditioning of the equation system caused by the high correlation between the basis function terms, the doubly OMP (DOMP) algorithm uses Gram-Schmidt orthogonalization to eliminate the correlation between the selected and unselected basis function spaces after each iteration [17]. However, the pseudo-inverse calculation and the Kronecker product calculation in the orthogonalization process lead to high computational complexity [9]. To this end, the simplified sparse parameter identification DOMP (SSPI DOMP) algorithm implements the pseudo-inverse computation through a recursive process [19], which effectively reduces the computational complexity. Reference [9] also proposed realizing the pseudo-inverse calculation by processing the covariance matrix using orthogonality properties, to further reduce the computational cost. In addition, a predistortion model pruning algorithm based on adaptive principal component analysis (PCA) was proposed in reference [14]. Reference [20] also proposed a pruning algorithm based on the projection of the residual vector. All the above pruning methods regard each complex basis function term as a whole and then perform order reduction.
However, in real wireless communication systems, the nonideal behavior of the modulator leads to a mismatch between the gain and the phase of the transmitted signal and thus causes an imbalance of the in-phase and quadrature-phase (I/Q) components [21]. The modulator imperfections are interwoven with the nonlinear behavior of the PA, which further reduces the transmission quality of the system [22, 23]. In this situation, the two branches (namely, the I/Q components) of the transmitter can be compensated separately. In other words, the I/Q components of the compensator can be identified independently, to cope with the nonideal behavior of the modulator. For example, widely used artificial neural network (ANN) models, such as the neural network (NN) model [21] and the convolutional NN (CNN) model [5], are predistortion models with separate I/Q identification. The traditional models can also be used for independent modeling of the I/Q branches, which can be expressed as

x_I(n) = f_I(y(n)),  x_Q(n) = f_Q(y(n)),

where x_I(n) and x_Q(n) represent the I/Q components of the predistorter output and f(·) is the predistortion model. Table 1 shows the comparison of the normalized mean square error (NMSE) performance between independent identification and combined identification of the I/Q components of the predistorter under a 100 MHz Doherty PA, which verifies the above idea. References [8, 23] also proposed compensation models for the I/Q imbalance that are independent of the DPD model, resulting in increased design complexity.
In this case, the I/Q components of the basis function terms have independent contributions to the linearization effects. If the I/Q branch models of the predistorter are pruned separately, for example using DOMP, the basis function terms of the I/Q branch models of the predistorter need to be constructed independently, which leads to high design complexity in the FPGA. Finding the real basis function terms that are important to the I/Q components of the predistortion output has therefore become a key challenge.
To solve this issue, this paper proposes a general order reduction method of the predistortion model based on the attention mechanism. In reference [24], we verified that this method can effectively prune the input items of NN-based models. In this paper, we improve this method and apply it to the pruning of the traditional polynomial models, to prove its universality. The method first calculates the comprehensive contributions of the real basis function terms to the I/Q components of the predistorted output using the attention mechanism through offline training, which considers the influence of the cross basis function terms of the I/Q branch models. Since the contributions of the real basis function terms to the predistorted output's I/Q components are calculated simultaneously, that is, the cross terms are evaluated, the I/Q branch models are consistent, which further reduces the design complexity of the model. The experimental results, based on a comparison with other typical methods under a 100 MHz Doherty PA and different I/Q imbalance levels, show the effectiveness of the method.
The contributions of this paper are as follows:
(i) The traditional I/Q imbalance models are configured independently, which leads to high model complexity [8, 23]. In order to reduce the model complexity, the I/Q branch models of the predistorter are modeled separately, to compensate for the I/Q imbalance and the PA's nonlinearity simultaneously.
(ii) The existing order reduction methods are based on the contributions of the complex basis function terms and therefore cannot handle the different contributions of the I/Q components of the complex basis function terms [11, 17]. This paper distinguishes the different contributions of the I/Q components of the basis function terms to the I/Q branch models, to further reduce the complexity of the model.
(iii) As a result of the above contributions, this work achieves a good compromise between model complexity and linearization effects when driving the 100 MHz Doherty PA. In addition, compared with the existing order reduction models, the proposed model has the lowest model complexity.
The structure of the paper is organized as follows. In Section 2, the modeling and identification processes of the I/Q branch models of the predistorter are described, and the principle of the attention mechanism is analyzed. Section 3 describes in detail the proposed order reduction method of the predistortion model based on the attention mechanism and gives the specific training process. Section 4 introduces the test platform for validation of the proposed order reduction method. In Section 5, the measurement and validation results of the proposed method are described and analyzed. The conclusion is given in Section 6.
2. Digital Predistortion Based on I/Q Separate Identification
2.1. Predistortion Model of I/Q Separate Identification
Due to the nonideal behavior of the modulator, the nonlinear behaviors of the PA are interleaved with the I/Q imbalance, which leads to more complex nonlinear characteristics of the transmitter [21]. Therefore, to improve the linearization effects, the compensators of the I/Q branches should be identified separately, to deal with the asymmetry of the I/Q branches of the transmitter. The predistortion structure with separate I/Q identification is shown in Figure 1. The I/Q branch models of the predistorter are modeled using the real basis function terms composed of the I/Q components of the traditional complex basis function terms and are then identified separately. The indirect learning architecture (ILA) [7] is used to identify the predistorter. The I/Q branch models based on the GMP model can be expressed as follows [13]:

x_I(n) = Re{ Σ_{k=0}^{K_a−1} Σ_{l=0}^{L_a−1} a_{kl} y(n−l)|y(n−l)|^k + Σ_{k=1}^{K_b} Σ_{l=0}^{L_b−1} Σ_{m=1}^{M_b} b_{klm} y(n−l)|y(n−l−m)|^k + Σ_{k=1}^{K_c} Σ_{l=0}^{L_c−1} Σ_{m=1}^{M_c} c_{klm} y(n−l)|y(n−l+m)|^k },   (2)

where y(n) is the output signal of the PA and y_I(n) and y_Q(n) represent the I/Q components of y(n), respectively. (K_a, L_a, K_b, L_b, M_b, K_c, L_c, M_c) are the parameters of the GMP model. a_{kl}, b_{klm}, and c_{klm} are the model coefficients. x_I(n) is the I component of the PA input. The Q component x_Q(n) can be represented by the same model as Equation (2).
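As a concrete illustration of the three GMP term groups in Equation (2) (aligned, lagging, and leading envelope terms), the complex basis function terms can be generated as in the following sketch. This is toy NumPy code under illustrative assumptions: the parameter values, the function name `gmp_basis`, and the circular delays via `np.roll` are not the paper's implementation.

```python
import numpy as np

def gmp_basis(y, Ka=3, La=2, Kb=2, Lb=2, Mb=1, Kc=2, Lc=2, Mc=1):
    """Toy complex GMP basis matrix for input y (one column per term).

    Parameter names mirror the usual (Ka, La, Kb, Lb, Mb, Kc, Lc, Mc) notation.
    """
    d = lambda l: np.roll(y, l)                      # delayed copy (circular, toy)
    cols = []
    for l in range(La):                              # aligned envelope terms
        for k in range(Ka):
            cols.append(d(l) * np.abs(d(l)) ** k)
    for l in range(Lb):                              # lagging envelope cross terms
        for m in range(1, Mb + 1):
            for k in range(1, Kb + 1):
                cols.append(d(l) * np.abs(d(l + m)) ** k)
    for l in range(Lc):                              # leading envelope cross terms
        for m in range(1, Mc + 1):
            for k in range(1, Kc + 1):
                cols.append(d(l) * np.abs(d(l - m)) ** k)
    return np.column_stack(cols)                     # N x S complex matrix

y = (np.arange(8) + 1j) / 8.0                        # tiny complex test signal
U = gmp_basis(y)                                     # S = La*Ka + Lb*Mb*Kb + Lc*Mc*Kc
```

With the toy parameters above, the basis has S = 6 + 4 + 4 = 14 complex terms, i.e., 28 real basis function terms after the I/Q split.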

The input and output data of N groups of the predistortion model are collected; then, the I/Q branch models of the predistortion can be written in matrix form:

x_I = Φ c_I,  x_Q = Φ c_Q,   (3)

where Φ = [Re(U), Im(U)] with U = [u(1), u(2), …, u(N)]^T, x_I and x_Q are the I/Q component matrices of x, respectively, u(n) is a complex vector composed of the basis function terms corresponding to signal y(n), and S is the number of complex basis function terms.
The I/Q branch models of the predistortion have the same structure but are identified separately, to cope with the I/Q imbalance. Equation (3) is solved by the least-squares (LS) algorithm [25, 26]; then, the coefficients of the I/Q branch models can be estimated as

ĉ_I = (Φ^T Φ)^{−1} Φ^T x_I,  ĉ_Q = (Φ^T Φ)^{−1} Φ^T x_Q,

where ĉ_I and ĉ_Q are the estimations of c_I and c_Q, respectively, and Φ is the real basis matrix in Equation (3). In the calculation of the I/Q components of the predistortion in the FPGA, the coefficients of the predistortion model are multiplied by the model terms in the I/Q branches, respectively. Therefore, the different coefficients of the I/Q branch models do not complicate the predistortion process.
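The separate LS identification of the two branches can be sketched in NumPy as follows. This is a toy example: the simple MP-style basis builder `mp_basis`, the signal sizes, and the synthetic target are illustrative assumptions, not the paper's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mp_basis(y, K=3, L=2):
    """Toy complex memory-polynomial basis: columns y(n-l)|y(n-l)|^k."""
    cols = []
    for l in range(L):
        yl = np.roll(y, l)                # circular delay (toy)
        for k in range(K):
            cols.append(yl * np.abs(yl) ** k)
    return np.column_stack(cols)          # N x S complex

# stand-ins for the PA output y(n) and the PA input x(n) under the ILA
y = rng.normal(size=256) + 1j * rng.normal(size=256)
x = 0.9 * y - 0.05 * y * np.abs(y) ** 2   # mildly nonlinear toy target

U = mp_basis(y)
Phi = np.column_stack([U.real, U.imag])   # real basis terms, N x 2S

# the I and Q branches share the basis matrix but are solved independently
c_I, *_ = np.linalg.lstsq(Phi, x.real, rcond=None)
c_Q, *_ = np.linalg.lstsq(Phi, x.imag, rcond=None)
x_hat = Phi @ c_I + 1j * (Phi @ c_Q)
```

Because the toy target lies exactly in the span of the basis, both branch solves recover it to numerical precision; with measured PA data the residual would instead reflect the model mismatch.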
2.2. The Principle of the Attention Mechanism
The achievements of artificial intelligence in the field of communication provide us with ideas [27, 28]. The attention mechanism is an effective structure for focusing on important features and has been widely applied in the fields of speech recognition [29] and image processing [30]. Based on the importance of the input features to the generation of the output, the attention mechanism weights the input features, to strengthen the important features and weaken the unimportant ones, which improves the fitting ability. The principle of the attention mechanism is shown in Figure 2. Let the input of the module be z, which can be written as

z = [z_1, z_2, …, z_P]^T,

where P is the number of input nodes. The fitting output is d, which can be written as

d = [d_1, d_2, …, d_M]^T,

where M is the number of output nodes.

First, the correlation between each input and all outputs is calculated. The commonly used method to calculate the correlation is an NN [28]. The obtained correlations can be written as

e_i = F(z_i, d),  i = 1, 2, …, P,

where F(·) is the function that calculates the correlation.
Then, the correlations are normalized and converted to probability form through the Softmax function, to serve as the weights of the inputs. The weights of the inputs can be written as

α_i = exp(e_i) / Σ_{j=1}^{P} exp(e_j),

where exp(·) stands for the exponential function and Σ represents the sum function.
Finally, the weights are used to weight the corresponding inputs of the module. Using the weighted inputs, the model outputs can be fitted. By weighting the inputs, the valid input features are emphasized and the invalid ones are weakened, so as to improve the fitting performance. This paper does not use the attention mechanism to improve the modeling ability but embeds the attention mechanism into the predistortion model to obtain the weights of the basis function terms, so that the contributions of the basis function terms can be evaluated by the weights.
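The three steps above (correlation, Softmax normalization, and input weighting) can be sketched as a toy NumPy illustration. The single tanh-neuron correlation function, random weights, and dimensions are assumptions for the example, not a trained module.

```python
import numpy as np

rng = np.random.default_rng(1)
P, M = 6, 2                       # number of input and output nodes
z = rng.normal(size=P)            # module inputs
d = rng.normal(size=M)            # fitting outputs (labels)

# step 1: correlation of each input with all outputs via a tiny NN layer
W = rng.normal(size=(P, 1 + M))   # per-neuron weights for [z_i, d_1, ..., d_M]
b = rng.normal(size=P)
e = np.tanh(W[:, 0] * z + W[:, 1:] @ d + b)

# step 2: Softmax turns correlation scores into probability-form weights
alpha = np.exp(e) / np.exp(e).sum()

# step 3: weight the inputs to emphasize the important ones
z_weighted = alpha * z
```

The weights alpha sum to one, which is what lets them be read as relative contributions later in the paper.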
3. The Proposed Order Reduction Method of the Predistortion Model
In this section, the structure of the proposed order reduction method of the predistortion model is given first, and each module is described in detail. Then, the training method and process of the proposed order reduction method are analyzed.
3.1. The Structure of the Proposed Order Reduction Method
In order to improve the linearization performance in the case of I/Q imbalance, the I/Q branch models of the predistorter are identified separately. At this point, the I/Q components of all complex basis function terms have separate model coefficients and independent contributions to the predistortion output, as shown in the analysis in Section 2.1. To further simplify the structure of the predistortion model, this paper proposes an order reduction method of the predistortion model based on the attention mechanism, as shown in Figure 3. In this method, all real basis function terms are distinguished and their contributions to the predistortion output are given, so that the I/Q branch models of the predistorter can be pruned. Meanwhile, to ensure the consistency of the I/Q branch models of the predistorter, the contributions of the real basis function terms to the I/Q components of the predistortion output are calculated simultaneously, to reduce the predistortion model's design complexity in the FPGA. The specific model structure is described as follows.

The input signal is fed into the PA after passing through the upconversion module and the digital-to-analog converter (DAC). In the feedback loop, the coupled output of the coupler passes through the downconversion module and the analog-to-digital converter (ADC) to obtain the digital baseband signal of the PA output. We use the output signal and input signal of the PA to build the I/Q branch models of the predistorter based on the ILA. Then, the proposed order reduction method is used to select the important basis function terms in the I/Q branch models. Finally, the selected basis function terms are modeled in the main transmit path using the LUT in the FPGA, to achieve the PA's linearization.
The I/Q branch models can be constructed from the I/Q components of the traditional models or of the ANN-based models. Let us take the GMP model as an example. The I/Q components of the complex basis function terms have independent contributions to the predistortion output, so the input data of the order reduction structure should contain all the real basis function terms, as shown in Equation (8), which can be written as

z = [Re{u_1}, Im{u_1}, Re{u_2}, Im{u_2}, …, Re{u_S}, Im{u_S}]^T,

where Re{u_s} and Im{u_s} are the I/Q components of the complex basis function term u_s, respectively. z is a vector with a dimension of 2S, where 2S is the number of the real basis function terms. To facilitate numbering, z_1, z_2, …, z_{2S} are used to represent these elements.
The output data of the structure contains the I/Q components of the output of the predistorter, which can be expressed as

d = [x_I(n), x_Q(n)]^T,

where x_I(n) and x_Q(n) are the I/Q components of the predistortion model output x(n).
To improve the fitting performance, an NN layer is used to calculate the correlation between each input and the output data. Since the correlation between each input and all outputs needs to be calculated, the i-th neuron in the NN layer is connected to the i-th input z_i and all outputs x_I(n), x_Q(n). The number of neurons is 2S, corresponding to the 2S inputs of the module. The NN layer's output can be written as

e_i = f(w_{i,0} z_i + w_{i,1} x_I(n) + w_{i,2} x_Q(n) + b_i),

where w_{i,0}, w_{i,1}, and w_{i,2} are the weight coefficients of the i-th neuron and b_i is the bias coefficient of the i-th neuron. f(·) is the activation function, usually "tanh." The output e_i of the i-th neuron represents the correlation between the i-th input z_i and the output d.
Then, the obtained correlation values between the input and output data are converted numerically using the Softmax function, and the output of the Softmax function can be expressed as

α_i = exp(e_i) / Σ_{j=1}^{2S} exp(e_j),

where i = 1, 2, …, 2S. α_i is in the form of a probability, reflecting the spatial importance of the corresponding input to the output, which is considered in this paper as the contribution of the input (the basis function term) to the generation of the output.
The inputs are weighted by the contributions of the inputs (the basis function terms), to emphasize the important inputs. The weighted inputs can be written as

z̃_i = α_i z_i,  i = 1, 2, …, 2S.
By weighting the inputs (basis function terms) using the contributions, the important spatial details can be emphasized, and unimportant information can be weakened.
Finally, the weighted inputs are used to fit the output of the predistortion model. Since the I/Q branch models of the predistorter are identified separately, two coefficient vectors are used to fit the I/Q components of the predistorter. The predicted I/Q components of the predistorter can be expressed as

x̂_I(n) = w_I^T z̃,  x̂_Q(n) = w_Q^T z̃,

where x̂_I(n) and x̂_Q(n) are the predicted I/Q components of the predistorter, respectively. w_I and w_Q are the coefficient vectors of the model, w_I ∈ R^{2S}, w_Q ∈ R^{2S}.
The label data for the model training is the output data in Equation (15). By calculating the error between the predicted output and the label data, the order reduction structure can be trained. When the model converges, the contributions of the inputs (the basis function terms) are obtained. Then, the real basis function terms can be sorted according to their contributions. Considering the trade-off between the model complexity and the linearization effects, a contribution threshold is set. The real basis terms with contributions greater than the threshold are retained, and those with contributions less than the threshold are removed. According to the retained basis function terms, the I/Q branch models of the predistorter are modeled and identified.
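The threshold-based selection and the subsequent re-identification described above can be sketched as follows. This is toy NumPy code: the contribution values, the threshold, and the matrix sizes are illustrative assumptions rather than measured results.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy contribution weights from a converged attention module (sum to 1)
alpha = np.array([0.30, 0.22, 0.15, 0.10, 0.08, 0.06, 0.04, 0.03, 0.01, 0.01])
threshold = 0.05                           # contribution threshold (illustrative)

order = np.argsort(alpha)[::-1]            # terms ranked by contribution
keep = np.flatnonzero(alpha >= threshold)  # indices of retained real basis terms

# re-identify the pruned I-branch model on the retained columns only
N = 64
Phi = rng.normal(size=(N, alpha.size))     # full real basis matrix (toy)
x_I = Phi[:, keep] @ rng.normal(size=keep.size)   # toy I-branch target
c_keep, *_ = np.linalg.lstsq(Phi[:, keep], x_I, rcond=None)
```

The same retained index set would be reused for the Q branch, which is what keeps the two branch models structurally consistent.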
3.2. Training of the Proposed Order Reduction Method
The input signal and output signal of the predistortion model are captured first. Then, the input data of the order reduction structure is constructed according to Equation (14), and the output data is obtained according to Equation (15). This paper uses 16,000 sets of input and output data to model the proposed method. The data is divided into training data and test data in a ratio of 1 : 1, which are used to train and test the method, respectively. The cost function of the training is set as the mean square error (MSE) function, which is written as

MSE = (1/N_t) Σ_{n=1}^{N_t} [ (x_I(n) − x̂_I(n))^2 + (x_Q(n) − x̂_Q(n))^2 ],

where N_t is the number of data sets for training.
In this paper, the Adam optimization algorithm [31] is used to update the coefficients of the proposed structure. The updating process of the coefficients can be expressed as

m_t = β_1 m_{t−1} + (1 − β_1) g_t,
v_t = β_2 v_{t−1} + (1 − β_2) g_t^2,
m̂_t = m_t / (1 − β_1^t),  v̂_t = v_t / (1 − β_2^t),
θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε),

where g_t is the gradient of the cost function with respect to the coefficients θ and η is the learning rate. β_1, β_2, and ε are constants.
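As a self-contained illustration of the Adam recursion, the following sketch applies it to a single scalar coefficient on a toy quadratic cost. The learning rate, constants, and cost function are typical illustrative values, not the paper's settings.

```python
import numpy as np

# toy cost J(theta) = (theta - 2)^2, minimized at theta = 2
eta, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    g = 2.0 * (theta - 2.0)              # gradient g_t of the cost
    m = beta1 * m + (1 - beta1) * g      # first-moment estimate m_t
    v = beta2 * v + (1 - beta2) * g**2   # second-moment estimate v_t
    m_hat = m / (1 - beta1**t)           # bias-corrected moments
    v_hat = v / (1 - beta2**t)
    theta -= eta * m_hat / (np.sqrt(v_hat) + eps)
```

In the proposed structure, theta stands for the whole coefficient set (the NN-layer weights and biases plus w_I and w_Q), updated by the same rule per element.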
The training process of the proposed order reduction method is shown in Algorithm 1. During the training, the attention module's output and the model output are calculated successively, and then the cost function is calculated based on the model output and the label data. According to the obtained cost function, the coefficients of the proposed method are updated using the Adam algorithm. In the next iteration, the attention module's output and the model output are calculated based on the updated model coefficients; then, the cost function is calculated, and the coefficients are updated again. When the number of training iterations reaches the given iteration count, the training is finished.
When the model completes the training, the weight values of the attention module output represent the contributions of the corresponding inputs. According to the contribution values of the basis function terms, the real basis function terms are filtered, and the retained basis function terms are obtained as

z' = [z_{r_1}, z_{r_2}, …, z_{r_K}]^T,

where K is the number of the retained basis function terms, and the indices r_1, r_2, …, r_K satisfy 1 ≤ r_1 < r_2 < … < r_K ≤ 2S. Then, the predistortion model is constructed using the retained basis function terms, and the predistorter coefficients are calculated by the LS algorithm:

ĉ'_I = (Φ'^T Φ')^{−1} Φ'^T x_I,  ĉ'_Q = (Φ'^T Φ')^{−1} Φ'^T x_Q,

where Φ' is the basis matrix composed of the retained real basis function terms.
4. Experimental Setup
The experimental platform in Figure 4 is used to test the pruning effect of the order reduction method. The test signal is an orthogonal frequency division multiplexing (OFDM) signal with a bandwidth of 100 MHz and a peak-to-average power ratio (PAPR) of 9.46 dB. In this OFDM signal, the symbol vector is modulated by 16-state quadrature amplitude modulation (16-QAM). The OFDM signal is first transmitted to the arbitrary waveform generator (AWG, 81180A), which is connected to the performance signal generator (PSG, E8267D) to transmit the generated baseband signal. The PSG realizes the digital-to-analog conversion and upconversion functions and then transmits the signal to the Doherty PA. The PA has a center frequency of 2.14 GHz and a saturated power of 43 dBm, and the output backoff (OBO) is 6 dB. The output signal of the PA is fed into the coupler.

In the feedback loop, the coupler's coupled output is connected to an oscilloscope (MSO9404A) to sample the feedback signal. The MSO9404A realizes the downconversion and ADC functions. The sampling bandwidth is set to 500 MHz. Finally, the sampled digital baseband signal is downloaded to a personal computer (PC) for the predistortion design. The order reduction method of the predistortion model is constructed using the TensorFlow module of Python on the PC. In order to verify the performance of the proposed method under different conditions, we evaluate two cases of transmitter nonlinearity. Case A contains only the PA's nonlinear distortion in the link, and case B contains the nonlinear distortion of the PA and the I/Q imbalance in the link. In the I/Q imbalance, the amplitude imbalance is set to 1 dB, and the phase imbalance is set to 3 degrees.
5. Experimental Results
Figure 5 shows the contribution values of the real basis function terms to the generation of the predistortion output, where the predistortion model is the GMP model of Equation (2) with parameters K_a, L_a, K_b, L_b, M_b, K_c, L_c, and M_c. In Figure 5, the horizontal axis shows the numbering of the real basis function terms, in the order shown in Equation (10), and the vertical axis represents the corresponding contribution values of the real basis function terms. It can be seen from the figure that different basis function terms show very different contribution values, which is the basis of the effectiveness of the proposed method. Meanwhile, for some complex basis function terms, both the I and Q components display large contribution values. However, there are also some complex basis function terms for which only one component shows a larger contribution, which suggests that distinguishing the contributions of the I/Q components of the complex basis function terms can further reduce the model complexity.

Figure 6 compares the linearization performance of case A and case B at different order reduction levels. It can be found that in case A, the NMSE decreases rapidly with the increase of the number of selected real basis function terms when this number is less than 40. This is because the basis function terms with larger contributions are selected first and can generate most of the predistortion output. When the number of selected real basis function terms is between 40 and 55, the NMSE decreases slowly. When the number exceeds 56, the NMSE performance is barely improved. We select 56 real basis function terms, at which point the NMSE is maintained at -37.28 dB. Compared with the 262 real basis terms of the GMP model, the number of real basis function terms of the pruned model is reduced by 79%. The NMSE curve of case B shows the same trend as that of case A; however, it declines faster as the number of real basis function terms increases. According to the linearization performance, 54 real basis function terms with large contributions were selected to mitigate the PA's nonlinearity and the I/Q imbalance, and the NMSE at this point is maintained at -34.07 dB. In case B, the number of real basis function terms of the pruned model is also reduced by 79%.

Figure 7 compares the NMSE of the proposed method with that of other methods at different order reduction levels. In OMP (real model) and DOMP (real model), the important real basis function terms for the I/Q branch models of the predistorter are calculated separately. It can be found that the linearization performance of the real model (the I/Q branch models of the predistorter) is obviously better than that of the complex model. In the real model, to achieve the same NMSE (-36.27 dB, which is 0.4 dB higher than the NMSE of the GMP model), the proposed method requires only 56 basis function terms, a reduction of 79%. However, OMP and DOMP require 166 and 84 basis function terms, respectively, which are 3 times and 1.5 times the number selected by the proposed method. The reduced number of basis function terms reflects the influence of the cross terms of the I/Q branch models.

Figure 8 compares the linearization effects of the typical pruning models and the proposed pruning model. The number of real basis function terms selected for the proposed model is the abovementioned 56. It can be seen from the figure that the proposed pruning model reduces the adjacent channel power ratio (ACPR) of the PA output signal from -32 dBc to -46 dBc, which proves the superior pruning performance of this model. Compared with the full GMP model, the linearization effects of the proposed pruning model are almost no worse, which can be seen from the almost overlapping spectra. Meanwhile, the linearization effects of the proposed model are almost the same as those of the OMP model and the DOMP model, but the complexity of the predistortion model is significantly reduced: the number of real basis function terms of the proposed model is only 34% of that of the OMP model and 67% of that of the DOMP model.

Table 2 comprehensively compares the performance of the typical methods and the proposed pruning method in case A and case B. It can be found that in case A, the proposed pruning method achieves an NMSE of -36.28 dB (0.4 dB higher than the NMSE of the GMP model) with only 21% of the real basis function terms. The ACPR performance of the proposed pruning method is almost equal to that of the full GMP model. Meanwhile, the ACPR and NMSE of the proposed pruning method are almost the same as those of OMP and DOMP, while using the fewest real basis function terms. In case B, the NMSE performance of the proposed pruning method reaches -34.07 dB (0.4 dB higher than the NMSE of the GMP model) using only 21% of the real basis function terms. Similarly, the proposed pruning method achieves nearly the same NMSE and ACPR performance as OMP and DOMP using the fewest real basis function terms in case B.
Figure 9 shows the NMSE performance of the proposed order reduction model under different I/Q imbalance levels. It can be found that under different I/Q imbalance levels, the proposed order reduction model can quickly find the important real basis function terms to construct a low-complexity predistortion model, which proves that the proposed order reduction model is suitable for different transmitter nonlinearity conditions. The higher the amplitude and phase imbalance levels, the worse the NMSE performance. However, the proposed order reduction model can almost achieve the optimal NMSE performance when the number of selected real basis function terms is 30.

Figure 10 shows the contributions of the input items of the NN-based predistortion model, where the proposed method is used for the input term pruning of the NN-based predistortion model and the pruning structure is described in reference [24]. It can be found that this method obtains the contribution values of the input items, which are then sorted according to their contribution values. Figure 11 shows the NMSE performance for different numbers of input items. It can be found that as the number of input items increases, the NMSE performance improves rapidly. When the number of input items is 11, the NMSE performance is almost equal to that of the full model, which proves that this method is also suitable for the pruning of NN-based models.


6. Conclusions
In this paper, an order reduction method of the predistortion model based on the attention mechanism is proposed. This method calculates the contributions of the real basis function terms to the I/Q components of the predistortion output using the attention mechanism, to select the important real basis function terms for building the I/Q branch models. The experimental results based on a 100 MHz Doherty PA and I/Q imbalance verify the superior pruning performance of this method. In case A, the proposed method prunes the number of real basis function terms to 21%, and the NMSE is maintained at -36.3 dB; in case B, the proposed method prunes the number of real basis function terms to 21%, and the NMSE is maintained at -34.1 dB. Meanwhile, to achieve almost the same ACPR and NMSE performance, the number of basis function terms required by the proposed method is only 67% of that of DOMP. In order to further reduce the complexity of the digital predistortion model in wideband systems, we will consider designing a fixed core suitable for most nonlinear transmitters and then pruning the I/Q branch models over the remaining basis function terms.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Low-orbit satellite under-sampling broadband predistortion high-efficiency transmission technology (No. A2021023) and the BUPT Excellent Ph.D. Students Foundation (No. CX2020112).