Abstract
To improve the efficiency of antenna optimization, surrogate models are commonly used in place of full-wave electromagnetic simulation software. The broad learning system (BLS) is an emerging network with strong feature extraction ability and remarkable computational efficiency, and it overcomes the excessively time-consuming training process of deep learning (DL). However, the limited fitting capability of the original BLS makes it difficult to model the regression relationship between input and output variables in electromagnetics. This paper aims to further improve model performance, speed up the design of microwave components, and predict hard-to-measure quality variables more accurately from easy-to-measure parameters. To this end, it proposes the concept of auto-context (AC) for regression. The current BLS training results serve as prior knowledge, that is, as context information, and are combined with the original inputs to form new inputs for further training. Based on the previous prediction results, AC learns an iterated low-level and context model that approaches the ground truth iteratively, and the approach is very general and easy to implement. The effectiveness of the proposed model is verified on three antenna examples and 10 UCI regression datasets. The antennas are a rectangular microstrip antenna (RMSA), a circular MSA (CMSA), and a printed dipole antenna (PDA).
1. Introduction
Electromagnetic simulation software (EMSS), such as the High-Frequency Structure Simulator (HFSS) and Computer Simulation Technology (CST), is the most commonly used tool in the optimization design of electromagnetic devices. It yields high-precision results, but at high computational and time cost. Using surrogate models instead of EMSS to evaluate the fitness of electromagnetic components can therefore save much optimization time, and this is currently a hot topic in electromagnetic optimization design. Many popular modeling methods have been widely used, such as Gaussian process (GP) [1, 2], backpropagation (BP) [3], artificial neural networks (ANN) [4–6], support vector machines (SVM) [7, 8], extreme learning machines (ELM) [9, 10], and kernel ELM (KELM) [11]. Traditional deep neural networks are generally composed of multiple layers that learn complex knowledge and abstract data characteristics from simple concepts, and they have achieved breakthrough success [12]. Examples include deep belief networks (DBN) [13] and convolutional neural networks (CNNs) [14]. However, such deep structures consume large amounts of time and computing resources, because many parameters must be tuned and the network structure must be designed manually. The broad learning system (BLS) [15] was proposed to solve this problem, and BLS was subsequently proven to have universal approximation capability [16]. BLS also benefits from the fast incremental learning algorithm [17]. When new samples or hidden nodes are added, the system can be updated incrementally instead of being rebuilt from scratch.
The construction of BLS is based on the random vector functional-link neural network (RVFLNN) [18, 19]. However, BLS does not feed the original inputs directly into the network. Instead, it first maps them into feature nodes and then, following the practice of RVFLNN, nonlinearly transforms these into enhancement nodes; the two parts together constitute the hidden layer of BLS. More importantly, BLS retains the powerful mechanism of randomly generating hidden-layer weights from any continuous probability distribution. As a result, only the output weights need to be trained, using the pseudoinverse [20]. In particular, the input weights are first randomly generated and then fine-tuned by a sparse autoencoder. BLS has two fundamental characteristics: it computes the Moore–Penrose pseudoinverse through its limit formula, and it adopts an incremental pseudoinverse update formula. Together these guarantee its training accuracy and fast incremental learning ability. Master-apprentice BLS (MABLS) [22] was inspired by boosted neural nets [21] and by stacking, a model fusion method, and was applied to antenna optimization. In [23], the probabilities that the current classification result belongs to each category are used as the context and combined with the original inputs as new inputs. Furthermore, Stacked BLS [24] stacks several BLS blocks, each of which approximates the residual outputs of the preceding blocks. It therefore uses not only the outputs of the current BLS but also a training algorithm with a residual characteristic. In addition, a novel k-means clustering algorithm based on a noise algorithm [25] was developed. It addresses two problems of traditional k-means clustering: determining the number of clusters and sensitivity to the initial cluster centers. It is an iterative clustering algorithm that starts each iteration from the current clustering results.
Similarly, coupled multistable stochastic resonance (CMSR) [26] adaptively optimizes and determines the system parameters of SR using the output signal-to-noise ratio and the seeker optimization algorithm. It then feeds the preprocessed signal into CMSR for further training. Inspired by these approaches, we propose auto-context BLS (ACBLS) as another variant of MABLS. In ACBLS, the context for a regression problem is defined as the predicted values of the current model.
The rest of the paper is organized as follows. Section 2 briefly reviews related work on BLS. Section 3 presents the original BLS and then introduces the structure and algorithm of the proposed ACBLS in detail. Section 4 reports experiments on RMSA, CMSA, PDA, and 10 UCI regression datasets, together with the results and analysis. Finally, Section 5 concludes the paper.
2. Literature Review
Thanks to its extraordinary efficiency, good generalization performance, and easy extendibility, BLS has been applied in many domains. Due to space constraints, we review only some of the innovations and applications of BLS. Fuzzy BLS [27] incorporates Takagi–Sugeno (TS) fuzzy systems into BLS by replacing the feature nodes with groups of TS fuzzy subsystems, each of which processes the input data. Fuzzy BLS thus retains the fast computation of BLS while achieving high accuracy. A deep-broad learning system [28] was proposed to jointly consider effectiveness and efficiency in the 5G era. It builds on the standard BLS and uses long short-term memory (LSTM) to extract the mapped features, which further improves prediction performance. Xu et al. [29] proposed a recurrent BLS to capture the dynamic nature of time series. Its enhancement nodes are connected recurrently so that the network remembers historical information. To model uncertain data, especially data with noise and outliers, Chen et al. [30] proposed a robust BLS based on regularization that achieves good generalization. For semisupervised classification, Zhao et al. [31] extended BLS with a manifold regularization framework, forming a semisupervised BLS. However, BLS has so far seen few applications in electromagnetics. Only recently was MABLS [22] applied to antenna optimization, and this paper continues that line of research.
3. The Proposed Algorithm Model
In this section, we will first briefly introduce the training process of standard BLS and then propose the detailed modeling strategy of ACBLS.
3.1. Standard Broad Learning System
BLS is a typical feedforward neural network whose structure draws on the concept of RVFLNN. It consists of three layers: the input layer, the hidden layer, and the output layer. Constructing a BLS involves two essential procedures: (1) randomly generating the weights of both the mapped feature nodes and the enhancement nodes, and (2) calculating the weights between the hidden and output layers. The architecture of BLS is shown in Figure 1.

First, the inputs X are converted into n groups of random feature nodes using mappings $\eta_i$, which are normally linear. The output of the ith group of mapped feature nodes can be written as

$$Z_i = \eta_i\left(X W_{e_i} + \beta_{e_i}\right), \quad i = 1, 2, \dots, n, \tag{1}$$

where the weights $W_{e_i}$ and bias terms $\beta_{e_i}$ are randomly generated with proper dimensions. In particular, the matrix $W_{e_i}$ is first generated randomly and then fine-tuned by a sparse autoencoder based on lasso regression.
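The feature-mapping step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation, and it rests on three assumptions. First, $\eta_i$ is taken to be the identity, i.e., a linear mapping. Second, the random weights are drawn uniformly from [−1, 1]. Third, the sparse-autoencoder fine-tuning is implemented as a lasso problem solved by plain ISTA (iterative soft-thresholding), whereas reference BLS code typically uses ADMM. The helper names and the lasso weight `alpha` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, B, alpha=1e-3, n_iter=200):
    """Solve min_W 0.5 * ||A W - B||_F^2 + alpha * ||W||_1 with ISTA."""
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    W = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(n_iter):
        grad = A.T @ (A @ W - B)
        W = soft_threshold(W - grad / L, alpha / L)
    return W

def feature_group(X, n_nodes, rng, alpha=1e-3):
    """One group of mapped feature nodes Z_i = X W_ei + beta_ei (linear eta_i)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # last column carries the bias
    W_rand = rng.uniform(-1.0, 1.0, size=(X_aug.shape[1], n_nodes))
    A = X_aug @ W_rand                                 # random projection of the inputs
    # Sparse autoencoder: learn a sparse map from A back to X_aug and reuse its
    # transpose as the fine-tuned input weights (bias folded into the last row).
    W_e = lasso_ista(A, X_aug, alpha=alpha).T          # shape (M + 1, n_nodes)
    return X_aug @ W_e, W_e

def feature_layer(X, n_groups, n_nodes, seed=0):
    """Concatenate the n groups of feature nodes into the feature layer F^n."""
    rng = np.random.default_rng(seed)
    groups = [feature_group(X, n_nodes, rng) for _ in range(n_groups)]
    F = np.hstack([Z for Z, _ in groups])              # F^n = [Z_1, ..., Z_n]
    return F, [W for _, W in groups]

X = np.random.default_rng(1).uniform(size=(26, 4))    # toy data: 26 samples, 4 inputs
F, weights = feature_layer(X, n_groups=3, n_nodes=6)
print(F.shape)  # (26, 18)
```

Folding the bias into an extra column of ones lets a single weight matrix hold both $W_{e_i}$ and $\beta_{e_i}$.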
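Next, combining all the feature nodes, we obtain the feature layer $F^n \equiv [Z_1, Z_2, \dots, Z_n]$. The enhancement layer consists of m groups of enhancement nodes, obtained by transforming the feature layer $F^n$ with a nonlinear function $\varepsilon_j$. The jth group of enhancement nodes can be represented as

$$E_j = \varepsilon_j\left(F^n W_{h_j} + \beta_{h_j}\right), \quad j = 1, 2, \dots, m. \tag{2}$$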
Again, $W_{h_j}$ and $\beta_{h_j}$ are randomly generated weight matrices and bias terms. The hidden layer consists of the feature layer and the enhancement layer:

$$H = \left[F^n \mid E^m\right], \quad E^m \equiv [E_1, E_2, \dots, E_m]. \tag{3}$$
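The sketch below shows the enhancement layer and the hidden layer H, formed by concatenating the feature layer with the enhancement nodes. It assumes a single enhancement group (m = 1) with tanh as the nonlinear function, which is the activation used in Section 4, and weights drawn uniformly from [−1, 1]. Practical BLS implementations often also orthogonalize and rescale the enhancement weights; that step is omitted here. The function name is hypothetical.

```python
import numpy as np

def enhancement_layer(F, n_enh, seed=0):
    """One group of enhancement nodes E = tanh(F W_h + beta_h)."""
    rng = np.random.default_rng(seed)
    W_h = rng.uniform(-1.0, 1.0, size=(F.shape[1], n_enh))
    beta_h = rng.uniform(-1.0, 1.0, size=(1, n_enh))   # broadcast over all samples
    return np.tanh(F @ W_h + beta_h), (W_h, beta_h)

F = np.random.default_rng(2).normal(size=(26, 15))     # stand-in for the feature layer
E, _ = enhancement_layer(F, n_enh=20)
H = np.hstack([F, E])                                  # hidden layer: features | enhancements
print(H.shape)  # (26, 35)
```

Because the same random $W_h$ and $\beta_h$ must be reused at prediction time, the function returns them alongside the enhancement nodes.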
Hence, the output of the model is

$$Y = H W, \tag{4}$$

where W denotes the weights from the hidden layer to the output layer. W can be obtained rapidly by ridge regression, i.e., by solving the L2-norm-regularized least-squares problem

$$\arg\min_{W} \; \left\| H W - Y \right\|_2^2 + \lambda \left\| W \right\|_2^2. \tag{5}$$
The regularization coefficient λ is added to the ordinary least-squares objective so that a stable solution can still be found when $H^T H$ is ill-conditioned. The solution of the above problem is

$$W = \left(\lambda I + H^T H\right)^{-1} H^T Y, \tag{6}$$

where $H^T$ is the transpose of H and I is the identity matrix. The solution tends to 0 as $\lambda \to \infty$. In the opposite limit $\lambda \to 0$, the problem reduces to ordinary least squares, and (6) becomes the Moore–Penrose pseudoinverse solution $W = H^{+} Y$.
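The ridge-regression solution for the output weights, and its two limiting cases, can be checked numerically with the short sketch below. The function name is hypothetical. The sketch solves the regularized normal equations with a linear solve rather than forming an explicit inverse. The default λ = 2^(−30) follows the setting used in Section 4.

```python
import numpy as np

def ridge_output_weights(H, Y, lam=2.0 ** -30):
    """W = (lam * I + H^T H)^{-1} H^T Y, via a linear solve."""
    k = H.shape[1]
    return np.linalg.solve(lam * np.eye(k) + H.T @ H, H.T @ Y)

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 10))          # toy hidden-layer matrix
Y = rng.normal(size=(50, 2))           # toy targets with two outputs

W_small = ridge_output_weights(H, Y, lam=1e-10)
W_pinv = np.linalg.pinv(H) @ Y         # pseudoinverse (least-squares) solution
print(np.allclose(W_small, W_pinv, atol=1e-6))   # True: lam -> 0 recovers H^+ Y
W_big = ridge_output_weights(H, Y, lam=1e8)
print(np.abs(W_big).max() < 1e-5)                # True: W shrinks toward 0 as lam grows
```

When H has more columns than rows, the equivalent form $H^T(\lambda I + H H^T)^{-1} Y$ is cheaper, because it requires solving a smaller system. This form follows from the push-through identity.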
3.2. Auto-Context Broad Learning System
Given a set of samples, a model is first trained to obtain its predictions. These predictions are then used as context information, that is, as prior knowledge, to train a new model. This procedure resembles part of the training process of three common ensemble learning frameworks: bagging, boosting, and stacking. In stacking, the predictors within a given layer are independent of each other, so they can be trained in parallel on multiple servers. However, the predictors of a layer can start training only after all predictors of the previous layer have been trained. In both AC and MA [22], each predictor depends on the results of its predecessor, so training must proceed sequentially. Distributing it across multiple servers therefore brings no benefit, a property it shares with boosting. The training process of AC can be summarized as follows:
(1) Consider a dataset with its labels, $S = \{X, Y\}$. Here $X \in \mathbb{R}^{N \times M}$ contains N samples of M input variables, and $Y \in \mathbb{R}^{N \times C}$ contains the corresponding C-dimensional labels. Define $Y_u$ as the outputs of model-u, where u is the iteration index, u = 1, 2, 3, …, U.
(2) Combine the context with the original inputs to form each iteration's inputs, $X_u = [X \mid Y_{u-1}]$, so that $X_u \in \mathbb{R}^{N \times (M + C)}$ for $u \geq 2$. Note that $Y_0$ is the null (empty) matrix, so model-1 is trained on X alone.
(3) Use the model to calculate the new outputs $Y_u$ from the new inputs $X_u$.
In the u-th AC iteration, model-u passes its predictions on to model-(u + 1). That is, the original BLS must first be trained to obtain the initial $Y_1$ before the AC iterations can start. Each time a BLS finishes training, the algorithm repeats the same procedure to approximate the ground truth more closely. Without loss of generality, taking the kth iteration as an example, the structure of the proposed ACBLS is shown in Figure 2, and its training steps are presented in detail in Algorithm 1.
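The AC procedure above can be sketched as follows, under several assumptions. `TinyBLS` is a deliberately compact stand-in for BLS: it uses random linear feature nodes without the sparse-autoencoder step, one tanh enhancement group, and ridge output weights. All names are hypothetical. At prediction time, each model receives its predecessor's predictions on the new inputs as context; this is our reading of the procedure rather than a detail stated in the text.

```python
import numpy as np

class TinyBLS:
    """Compact BLS stand-in: random linear feature nodes, tanh enhancement
    nodes, ridge output weights (sparse-autoencoder fine-tuning omitted)."""
    def __init__(self, n_feat=10, n_groups=3, n_enh=40, lam=2.0 ** -30, seed=0):
        self.n_f, self.n_e, self.lam = n_feat * n_groups, n_enh, lam
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # bias column
        F = Xb @ self.W_e                                 # feature nodes
        Fb = np.hstack([F, np.ones((F.shape[0], 1))])
        E = np.tanh(Fb @ self.W_h)                        # enhancement nodes
        return np.hstack([F, E])                          # hidden layer H

    def fit(self, X, Y):
        self.W_e = self.rng.uniform(-1, 1, (X.shape[1] + 1, self.n_f))
        self.W_h = self.rng.uniform(-1, 1, (self.n_f + 1, self.n_e))
        H = self._hidden(X)
        k = H.shape[1]
        self.W = np.linalg.solve(self.lam * np.eye(k) + H.T @ H, H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.W

def fit_acbls(X, Y, n_iter=8, **bls_kwargs):
    """Model-1 is the plain BLS (Y_0 empty); model-(u+1) is trained on [X | Y_u]."""
    models, context = [], np.empty((X.shape[0], 0))      # Y_0: empty context
    for u in range(n_iter + 1):                          # original BLS + n_iter AC iterations
        X_u = np.hstack([X, context])
        model = TinyBLS(seed=u, **bls_kwargs).fit(X_u, Y)
        context = model.predict(X_u)                     # context for the next model
        models.append(model)
    return models

def predict_acbls(models, X):
    """Run the chain on new inputs; each model sees X plus its predecessor's output."""
    context = np.empty((X.shape[0], 0))
    for model in models:
        context = model.predict(np.hstack([X, context]))
    return context

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 3))
Y = np.sin(3 * X[:, :1]) + X[:, 1:2] * X[:, 2:3]         # toy regression target, shape (200, 1)
models = fit_acbls(X[:150], Y[:150], n_iter=8)
Y_test_hat = predict_acbls(models, X[150:])
print(len(models), Y_test_hat.shape)  # 9 (50, 1)
```

The loop is strictly sequential, since each model needs its predecessor's outputs. This mirrors the point made above that AC, like boosting, gains nothing from distributing training across servers.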

Most traditional ANNs have a fixed structure, and their parameters must be tuned repeatedly to optimize model performance. In contrast, the parameters of BLS are randomly generated and then fixed, and the structure is adjusted through horizontal expansion. This property is retained whether AC or MA is applied to BLS. The resulting model can therefore perform well without any manual parameter tuning once training starts.
4. Experiments
First, we define the structure of the hidden layer of BLS as a triple: the number of feature nodes $F_n$, the number of mapping groups, and the number of enhancement nodes $E_n$. For MABLS, the hidden layer is extended to the right by C columns containing the predictions of the previous model. Since C equals the dimension of the model's output, it is fixed. Therefore, for the original BLS as well as for its AC and MA versions, only this triple needs to be determined. The regularization parameter λ in ridge regression is set to $2^{-30}$. The output weights are computed by ridge regression; all other weights and biases are randomly generated from the uniform distribution on [−1, 1]. In particular, the input weights are fine-tuned by a sparse autoencoder, using lasso regression on the input data, to obtain better feature nodes. The enhancement nodes use the hyperbolic tangent as their nonlinear activation function. All experiments are conducted on a computer with an Intel Core i7-4790K CPU @ 4.00 GHz and 16 GB of RAM.
4.1. Resonant Frequency of RMSA
We take the rectangular microstrip antenna (RMSA) [32, 33] as the first example. Figure 3 shows its top-view schematic (above) and side view (below). For this case, the output is the resonant frequency, so C = 1. Of the samples, 26 are selected for training, and the remaining 7, marked with asterisks, are used for testing. All experimental data can be found in [22] and are not repeated here. The average percentage error (APE), given by (7), is used as the performance index to evaluate the prediction errors of the different modeling methods:

$$\mathrm{APE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{Y_i - y_i}{y_i} \right| \times 100\%, \tag{7}$$

where $Y_i$ and $y_i$ are the predicted and actual values, respectively, and N is the number of samples.
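A minimal sketch of the APE in (7) is given below. It assumes the absolute relative error is averaged over all entries, which also covers the multi-output PDA case later in this section. The function name is hypothetical.

```python
import numpy as np

def ape(pred, actual):
    """Average percentage error: mean of |Y_i - y_i| / |y_i|, in percent."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return 100.0 * np.mean(np.abs(pred - actual) / np.abs(actual))

print(ape([2.5, 4.0], [2.0, 4.0]))  # 12.5 (25 % error on one sample, 0 % on the other)
```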

To demonstrate the performance of the proposed ACBLS, six modeling methods are compared on the same experiment: backpropagation (BP) [3], Parallel Tabu Search (PTS) [34], an NN ensemble based on binary particle swarm optimization (BiPSO-NNE) [35], GP with the ARD Matérn 5/2 kernel function (GP52) [36], DKL with the ARD Matérn 5/2 kernel function (DKL52) [36], and MABLS [22].
For a fair comparison, as in MABLS [22], we perform a grid search over [1,30] × [1,30] × [1,30] with a step of 1 to determine the best structure, and we run 8 iterations. The optimal structure and test results of MABLS and ACBLS for each iteration are reported in Tables 1 and 2, respectively. Note that the kth iteration generates model k + 1. The first row, numbered 0, corresponds to the original BLS, which is model 1. Two rows, Effect-1 and Effect-2, show the effect of the iterations more clearly. Effect-1 gives the improvement of each iteration over the previous result, and Effect-2 gives its improvement over the original BLS.
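The exhaustive structure search can be sketched as follows. This is an illustration only: `score` is a hypothetical callable that would train a BLS with the given (feature nodes, mapping groups, enhancement nodes) triple and return its validation APE. Here a toy quadratic stands in for it so that the example runs instantly.

```python
import itertools

def grid_search(score, ranges):
    """Exhaustive search over integer structures with step 1.

    `ranges` holds one inclusive (low, high) pair per structure parameter;
    `score` maps a structure tuple to an error (lower is better)."""
    best = None
    for structure in itertools.product(*(range(lo, hi + 1) for lo, hi in ranges)):
        err = score(structure)
        if best is None or err < best[1]:      # keep the first structure with the lowest error
            best = (structure, err)
    return best

# Hypothetical stand-in for "train a BLS with this structure and return its APE".
toy_score = lambda s: (s[0] - 7) ** 2 + (s[1] - 3) ** 2 + abs(s[2] - 12)
print(grid_search(toy_score, [(1, 30), (1, 30), (1, 30)]))  # ((7, 3, 12), 0)
```

The search over [1,30]³ evaluates 27,000 structures, which is why Section 4.3 narrows the range for each AC iteration.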
As observed in Table 1, the iterative results of MA show a decreasing trend, whereas those of AC in Table 2 show the opposite behavior. After 8 iterations, the effect of MA reaches 76.934%, while the effect of AC after 8 iterations is only 48.997%, which falls short of the effect of MA after its first iteration. We can preliminarily conclude that MA is more suitable than AC for modeling the resonant frequency of the RMSA. The best test results of all compared methods are presented in Table 3. Relative to BP, PTS, BiPSO + NNE, GP52, DKL52, and BLS, ACBLS reduces the APE by 92.656%, 91.437%, 85.365%, 77.75%, 72.563%, and 48.997%, respectively. For this case, the APE of MA is 54.775% lower than that of AC.
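The percentages in Tables 1 and 2 and in the comparisons above are consistent with a relative APE reduction. The sketch below assumes that definition; the function names are hypothetical. As a check, normalizing the APE of the original BLS to 1 and applying the two reported 8-iteration effects reproduces the reported gap between MA and AC.

```python
def relative_reduction(err_ref, err_new):
    """Percentage by which err_new is lower than err_ref."""
    return 100.0 * (err_ref - err_new) / err_ref

def effects(apes):
    """apes[0] is the APE of the original BLS (row 0); apes[k] is after iteration k.
    Effect-1: gain over the previous iteration; Effect-2: gain over the original BLS."""
    effect1 = [relative_reduction(apes[k - 1], apes[k]) for k in range(1, len(apes))]
    effect2 = [relative_reduction(apes[0], apes[k]) for k in range(1, len(apes))]
    return effect1, effect2

ac_ape = 1 - 0.48997   # AC after 8 iterations: 48.997 % below BLS
ma_ape = 1 - 0.76934   # MA after 8 iterations: 76.934 % below BLS
print(round(relative_reduction(ac_ape, ma_ape), 2))  # 54.78, matching the reported 54.775 %
```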
4.2. Resonant Frequency of CMSA
The second example is the circular microstrip antenna (CMSA) [37, 38]. Its relevant parameters are the radius a of the circular patch, the thickness h of the substrate, and the relative dielectric constant εr. These three parameters are the inputs, and the output is the corresponding resonant frequency f. The top-view schematic (above) and side view (below) are shown in Figure 4. For this case, the input dimension is 3 and the output dimension is C = 1. All experimental data are tabulated in Table 4. Of these, 16 samples are selected for training, and the remaining 4, marked with asterisks, are used for testing.

Six modeling methods are compared on the same experiment to validate the prediction precision of the proposed ACBLS: BiPSO-NNE, delta-bar-delta (DBD) [3], BP, PTS, extended DBD (EDBD) [3], and MABLS. For a fair comparison, the same grid search over [1,30] × [1,30] × [1,15] with a step of 1 is performed for BLS, MABLS, and ACBLS to determine the best structure, and 8 iterations are run.
The optimal structure and test results of MABLS and ACBLS for each iteration are listed in Tables 5 and 6, respectively. Tables 5 and 6 show that both AC and MA perform well in predicting the resonant frequency of the CMSA. MA gives better results than AC in the first 4 iterations. In the subsequent iterations, however, AC improves more markedly and remains ahead. Table 7 gives the best prediction results of the different methods. Relative to BiPSO + NNE, DBD, BP, PTS, EDBD, and BLS, ACBLS reduces the APE by 99.978%, 99.963%, 99.951%, 99.931%, 99.898%, and 99.843%, respectively. For this case, the APE of AC is 76.25% lower than that of MA.
4.3. Printed Dipole Antenna
The top view of the printed dipole antenna (PDA) is shown in Figure 5, and the corresponding three-dimensional view in HFSS is presented in Figure 6. The design goal of the PDA is operation at 2.45 GHz. The antenna structure can be divided into five parts: the dielectric layer, the dipole arms, the microstrip balun, the microstrip transmission line, and the feed surface. The inputs of the model are five influential geometrical variables, each with five levels, X = [L1, L2, L3, L4, W3]:
- L1, the transmission line length;
- L2, the dipole arm length;
- L3 and L4, the lengths of the side and base legs of the right-triangular balun section;
- W3, the width of the rectangular balun section.

The value ranges of the parameters and the sampling intervals are defined in Table 8, in millimeters. The other parameters are fixed: the dielectric layer thickness H = 1.6 mm, the transmission line width W1 = 3 mm, and the dipole sheet width W2 = 3 mm. The relative dielectric constant εr of the substrate is 4.4.


The return loss (S11) is one of the most important indicators of antenna performance, so this paper verifies the effectiveness of the proposed algorithm by fitting the S11 curve. The frequency sweep of the PDA covers 2–3 GHz with a step of 0.001 GHz, so each set of inputs corresponds to 1001 outputs. Thus, for this case, the input dimension is 5 and the output dimension is C = 1001.
HFSS is driven by MATLAB scripts through the HFSS-MATLAB-API [39] to obtain the outputs, i.e., S11. A partial orthogonal experimental design generates 31 samples; 25 are used for training and the rest for testing. The APE given by (7) is used as the performance index. To determine the optimal structure of the original BLS, we perform a grid search over [1,10] × [1,10] × [1,70] with a step of 1, and building this BLS model takes 111.87 s. Because AC requires repeated training, the grid search range for each AC iteration is narrowed to [1,10] × [1,10] × [1,10] to save training time. Each AC iteration then takes only 26.33 s. The total training time of ACBLS is therefore the training time of the original BLS plus the time required for K AC iterations. The optimization times of the related methods are listed in Table 9.
All AC iteration results are recorded in Table 10. After 8 iterations, the APE of the original BLS is reduced by 40.029%. With 8 iterations, the total training time is 111.87 s (BLS) + 8 × 26.33 s (AC iterations) = 322.51 s, which is far less CPU time than optimization by direct EM simulation. One optimal S11 solution satisfying the antenna criterion is plotted in Figure 7; the corresponding geometry is X = [22, 21, 10, 12, 3] mm. The blue “HFSS” curve is the simulation result, and the red “proposed” curve is the prediction. S11 reaches −23.991 dB at 2.45 GHz, which meets the design requirement. The modeled and simulated results agree closely, which confirms the validity of the proposed model.

4.4. Regression Data Sets
Some of the methods compared above are neither widely used nor state of the art. To further demonstrate the validity of the proposed algorithm, we also compare typical models on 10 real-world regression datasets from the University of California, Irvine (UCI) repository [42]. These models are SVM [8], LSSVM, ELM [40], and Greedy BLS (GBLS) [41], a recent improved version of BLS. Details of the datasets are given in Table 11. For a fair comparison, the same grid search over [1,10] × [1,30] × [1,200] with a step of 1 is performed for BLS, GBLS, MABLS, and ACBLS.
As in the previous cases, 8 iterations are used for MA and AC. The root mean square error (RMSE) [43] is selected as the performance index:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(Y_i - y_i\right)^2}, \tag{8}$$

where N is the number of samples, $Y_i$ is the predicted value, and $y_i$ is the actual value. The optimal test results of the different models are tabulated in Table 12, where the best RMSE for each dataset is shown in bold.
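A minimal sketch of the RMSE index follows; the function name is hypothetical. Unlike the APE, the RMSE is expressed in the units of the target and does not divide by the actual value, so it remains defined when targets are zero.

```python
import numpy as np

def rmse(pred, actual):
    """Root mean square error: sqrt(mean((Y_i - y_i)^2))."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

print(rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))  # 1.0
```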
Table 12 shows that, under the same conditions, ACBLS outperforms all compared models on the 10 function approximation datasets, except MABLS on Pyrim. In particular, AC improves on MA by 10% for Bodyfat, 9.259% for Housing, 4.267% for Strike, and 3.933% for Basketball, and by less than 3% on the remaining datasets. Compared with the original BLS, ACBLS improves the results by 47.924% on Pyrim, 40% on Bodyfat, 28.229% on Housing, and 22.155% on Basketball, and by less than 20% on the remaining six datasets. In summary, ACBLS is slightly better than MABLS in most cases and clearly better than the other compared models, which further verifies the effectiveness of the proposed model.
5. Conclusion
In this paper, we have developed and evaluated ACBLS, which extracts context features from the previous regression results. Our goal is an iterative framework that propagates and uses context information rapidly and effectively. The framework is general, easy to implement, and not tied to any particular type of model. It also avoids heavy algorithm design, such as hand-crafted energy terms and procedures. The comparative results on three antenna cases and 10 UCI regression datasets show two things. First, the proposed method substantially improves the limited generalization ability of the original BLS. Second, its modeling capability exceeds that of several mainstream methods. The proposed model may therefore provide an efficient and powerful parametric modeling tool for antenna optimization, replacing time-consuming EMSS.
Data Availability
The data that support the findings of this study are available from the corresponding author upon request.
Disclosure
Weitong Ding and Fei Meng are co-first authors.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) under No. 61771225, and the scientific research capacity improvement project of key developing disciplines in Guangdong Province of China under No. 2021ZDJS057.