Abstract
For the optimal design of electromagnetic devices, obtaining training samples from full-wave electromagnetic simulation software, such as HFSS, CST, and IE3D, is the most time-consuming step. Traditional machine learning methods usually use only labeled samples or only unlabeled samples, but in practical problems labeled and unlabeled samples coexist, and the acquisition cost of labeled samples is relatively high. This paper proposes a semisupervised learning Gaussian process (GP), which exploits unlabeled samples to improve the accuracy of the GP model and to reduce the number of labeled training samples required. The proposed GP model consists of two parts: initial training and self-training. In the initial training, a small number of labeled samples obtained by full-wave electromagnetic simulation are used to train the initial GP model. Afterwards, the trained GP model is copied to another GP model in the self-training process, and the two GP models are updated by cross-training with different unlabeled samples. Using the same test samples for testing and updating, the model with the smaller error replaces the other. The self-training process is repeated until a predefined stopping criterion is met. Four different benchmark functions and the resonant frequency modeling problems of three different microstrip antennas are used to evaluate the effectiveness of the GP model. The results show that the proposed GP model fits the benchmark functions well. For the microstrip antenna resonant frequency modeling problems, when the same labeled samples are used, its predictive ability is better than that of the traditional supervised GP model.
1. Introduction
In recent years, for the optimization design of electromagnetic devices, excellent research results have been achieved by numerical simulation or by combining full-wave electromagnetic simulation software, such as HFSS, with global optimization algorithms such as particle swarm optimization (PSO) [1]. In general, microwave devices can be simulated by the HFSS software to obtain accurate results. However, when a global optimization algorithm is combined with the HFSS software, each update needs to call HFSS for evaluation, which is costly and very time-consuming. Therefore, using a surrogate method instead of HFSS to evaluate the fitness of electromagnetic devices can greatly save optimization time, which is a hot topic in electromagnetic optimization design. Many researchers have proposed surrogate methods, such as the artificial neural network (ANN) [2, 3], support vector machine (SVM) [4, 5], kernel extreme learning machine (KELM) [6, 7], and Gaussian process (GP) [8, 9].
GP is a machine learning method that has developed rapidly in recent years. It has a strict statistical theoretical basis and is suitable for dealing with complex problems such as high dimensionality, small samples, and nonlinearity [10, 11]. GP has developed out of continuous research on Bayesian neural networks (NNs) and has advantages such as flexible nonparametric inference, adaptive acquisition of super-parameters, and predictive outputs. In the electromagnetic field, many scholars have made achievements in the application of GP, verifying the feasibility of GP as an alternative to electromagnetic simulation software.
However, to the best of our knowledge, most GP modeling of electromagnetic behavior is based on supervised learning: the labeled training samples used in GP modeling are obtained from HFSS. Acquiring labeled samples with HFSS consumes a lot of time, which is also the main factor limiting the efficiency of antenna optimization. Therefore, a semisupervised learning (SSL) [12, 13] method is proposed in this study on the basis of existing research. Traditional machine learning techniques rely on large numbers of labeled samples for training. In practical electromagnetic engineering, labeled samples are difficult to obtain, while unlabeled samples are cheap and easy to obtain [14]. SSL is a learning method between supervised and unsupervised learning [15]; it mainly considers the combination of labeled and unlabeled samples to improve learning efficiency and is suitable for both regression and classification problems. Specific SSL methods include self-training [16], co-training [17], graph-based methods [18], EM with generative models [19], and the transductive SVM [20]. In this paper, the self-training method is combined with GP modeling to address antenna optimization design.
Self-training is one of the SSL methods; it is simple and effective, requires no specific assumptions [21, 22], and is commonly used for classification problems. Based on the traditional self-training method, this paper proposes an SSL-based GP model, which is used to predict the resonant frequency of microstrip antennas (MSAs), a regression problem. The SSL-based GP model proposed in this study includes two parts: initial training and self-training. In the initial training, a few labeled samples are used to obtain a GP model with low accuracy, and the initial error of the GP model is obtained. Before self-training, the trained GP model from the initial training process is copied to another one. Next, different unlabeled samples are input into each GP model, and the corresponding outputs are obtained. The two models are cross-trained with the generated pseudolabeled samples, so the two GP models are updated and differ from each other. The same test samples are used to verify the two updated GP models, the model with the smaller error replaces the other one, and a more accurate training sample set is produced for further training of the GP model. The self-training process is repeated until a predefined error threshold is met. Four benchmark functions and the resonant frequency modeling problems of three MSAs are used to evaluate the effectiveness of the proposed algorithm. Through the experiments on the test functions and the resonant frequencies of the three different MSAs, we conclude that the predictive ability of the proposed GP model is better than that of the traditional supervised GP model.
2. Semisupervised Learning Model
2.1. Gaussian Process Modeling
2.1.1. Training
The properties of a GP are determined by its mean function and covariance function [23], so it can be expressed as
$$f(x) \sim \mathcal{GP}\left(m(x), k(x, x')\right),$$
where $m(x)$ is the mean function and $k(x, x')$ is the covariance function. Furthermore, they can be expressed as
$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\left[(f(x) - m(x))(f(x') - m(x'))\right].$$
For the regression model $y = f(x) + \varepsilon$, the observed target value $y$ is polluted by additive noise $\varepsilon$, a random variable subject to a normal distribution with mean 0 and variance $\sigma_n^2$; therefore,
$$\varepsilon \sim \mathcal{N}\left(0, \sigma_n^2\right).$$
The prior distribution of $y$ is given by
$$y \sim \mathcal{N}\left(0, K(X, X) + \sigma_n^2 I_n\right),$$
where $K(X, X)$ is the symmetric positive definite covariance matrix of order $n$, whose element $K_{ij} = k(x_i, x_j)$ measures the correlation between $x_i$ and $x_j$. The $n$ training sample outputs $y$ and the testing sample outputs $f_*$ constitute the joint Gaussian prior distribution, that is,
$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(X, X) + \sigma_n^2 I_n & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right).$$
GP can choose different covariance functions [24], usually the squared exponential covariance function:
$$k(x_p, x_q) = \sigma_f^2 \exp\left(-\frac{\lVert x_p - x_q \rVert^2}{2 l^2}\right),$$
where $\sigma_f^2$ is the signal variance and $l$ is the length scale.
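As a minimal illustration (not part of the original formulation), the squared exponential covariance matrix can be computed with a few lines of NumPy; the parameter names sigma_f (signal standard deviation) and ell (length scale) are our own.

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared exponential covariance matrix between two sets of inputs.

    X1: (n1, d) array, X2: (n2, d) array.
    Returns the (n1, n2) matrix k(x_p, x_q) = sigma_f^2 * exp(-||x_p - x_q||^2 / (2 ell^2)).
    """
    # Pairwise squared Euclidean distances between the rows of X1 and X2.
    sq_dist = (np.sum(X1**2, axis=1)[:, None]
               + np.sum(X2**2, axis=1)[None, :]
               - 2.0 * X1 @ X2.T)
    return sigma_f**2 * np.exp(-0.5 * sq_dist / ell**2)
```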
The properties of the mean function and the covariance function of the GP are determined by a set of super-parameters $\theta = \{\sigma_f, l, \sigma_n\}$. By establishing the negative log-likelihood function of the conditional probability of the training samples and taking its partial derivatives with respect to the super-parameters, the optimal super-parameters are found by the conjugate gradient optimization method. The form of the negative log-likelihood function is
$$L(\theta) = \frac{1}{2}\, y^{T} C^{-1} y + \frac{1}{2} \log \lvert C \rvert + \frac{n}{2} \log 2\pi, \qquad C = K(X, X) + \sigma_n^2 I_n.$$
After obtaining the optimal super-parameters, the trained GP model is used to perform the relevant prediction.
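The following sketch shows one way this training step could be implemented, minimizing the negative log-likelihood above with SciPy's conjugate gradient optimizer; it reuses se_kernel from the previous sketch, relies on numerical gradients for brevity (the analytic partial derivatives mentioned above could equally be supplied), and all function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(log_theta, X, y):
    """Negative log marginal likelihood L(theta) for theta = (sigma_f, ell, sigma_n),
    optimized in log space so the super-parameters stay positive."""
    sigma_f, ell, sigma_n = np.exp(log_theta)
    n = X.shape[0]
    C = se_kernel(X, X, sigma_f, ell) + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(C)                          # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = C^{-1} y
    return (0.5 * y @ alpha
            + np.sum(np.log(np.diag(L)))               # equals 0.5 * log|C|
            + 0.5 * n * np.log(2.0 * np.pi))

def train_gp(X, y):
    """Return the super-parameters (sigma_f, ell, sigma_n) found by conjugate gradient."""
    res = minimize(negative_log_likelihood, x0=np.zeros(3), args=(X, y), method='CG')
    return np.exp(res.x)
```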
2.1.2. Predicting
Given the new input $x_*$, the training inputs $X$, and the observed target values $y$, the maximum possible predicted posterior distribution of $f_*$ is inferred as
$$f_* \mid X, y, x_* \sim \mathcal{N}(m, \Sigma),$$
where $m$ and $\Sigma$ are the mean and the covariance of the prediction, given by
$$m = K(x_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} y,$$
$$\Sigma = K(x_*, x_*) - K(x_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} K(X, x_*).$$
The predicted mean and covariance describe the Gaussian distribution that the predicted output may follow. The predicted mean can be regarded as the predicted output value of the nonlinear fitting tool, and the predicted variance can be regarded as an uncertainty evaluation of the predicted mean. The magnitude of the prediction variance reflects the accuracy of the model at this point: the smaller the variance, the higher the accuracy of the model.
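For illustration, the standard GP posterior equations above can be coded as follows, reusing se_kernel and the super-parameters returned by the training sketch; this is a generic implementation, not necessarily the exact one used in this study.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, sigma_f, ell, sigma_n):
    """Posterior mean m and covariance Sigma of the GP at the test inputs."""
    n = X_train.shape[0]
    C = se_kernel(X_train, X_train, sigma_f, ell) + sigma_n**2 * np.eye(n)
    K_s = se_kernel(X_test, X_train, sigma_f, ell)     # K(x_*, X)
    K_ss = se_kernel(X_test, X_test, sigma_f, ell)     # K(x_*, x_*)
    mean = K_s @ np.linalg.solve(C, y_train)           # m = K(x_*, X) C^{-1} y
    cov = K_ss - K_s @ np.linalg.solve(C, K_s.T)       # Sigma = K(x_*,x_*) - K(x_*,X) C^{-1} K(X,x_*)
    return mean, cov
```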
2.2. Training Process of the Semisupervised Learning Model
For the proposed method in this study, there are two training processes: the first part is the initial training with one GP model, and the second part is the self-training with two GP models.
2.2.1. Initial Training Process
Firstly, we use HFSS to simulate a small number of labeled samples, denoted as N0. Then, we use these N0 samples as the original training samples. The ith training sample is expressed as $(x_i, y_i)$, where $x_i \in \mathbb{R}^n$ is the input variable, $n$ is the dimension of the input variables, $y_i \in \mathbb{R}^m$ is the corresponding output, and $m$ is the dimension of the output variables. After that, we train the GP model with the N0 training samples. Figure 1 is the flow chart of the initial training, and the process can be summarized as follows:
Step 1: use the HFSS software to simulate a few labeled training samples, denoted as N0.
Step 2: apply these N0 samples to train the GP model. After training, the GP model has relatively low accuracy because of the small number of samples.
Step 3: obtain the initial error of the trained GP model.
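A minimal sketch of these steps, assuming the GP routines sketched in Section 2.1 are available and that the N0 labeled samples and the labeled test samples have already been simulated with HFSS (Step 1); the function and variable names are ours.

```python
import numpy as np

def initial_training(X0, y0, X_test, y_test):
    """Initial training: fit a GP on the N0 labeled samples (already simulated with HFSS)
    and measure the initial error on labeled test samples, reusing train_gp and gp_predict."""
    theta = train_gp(X0, y0)                                       # Step 2: train the GP model
    y_hat, _ = gp_predict(X0, y0, X_test, *theta)
    init_error = np.mean(np.abs(y_hat - y_test) / np.abs(y_test))  # Step 3: initial mean relative error
    return theta, init_error
```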

2.2.2. Self-Training Process
Before the self-training, the trained GP model from the initial training process is copied to another one, and the two models are denoted as GP1 and GP2. Two groups of unlabeled sample sets are prepared, one for each model. In each iteration, we select one sample from each unlabeled sample set and input it into GP1 and GP2, respectively, obtaining the corresponding predicted outputs. In this way, GP1 and GP2 each generate a group of pseudolabeled samples. In each iteration, we also use the HFSS software to simulate one sample and obtain its label; this labeled sample is used as the test sample. The test sample is used to evaluate the performance of GP1 and GP2, and the test errors are denoted as e1 and e2, respectively.
Figure 2 is the flow chart of the self-training algorithm, where i is the number of iterations, and its process can be summarized as follows:
Step 1: input one unlabeled sample from each unlabeled sample set into GP1 and GP2, respectively, and obtain the corresponding pseudolabeled samples.
Step 2: cross-train the two models, that is, update GP1 with the pseudolabeled sample generated by GP2 and update GP2 with the pseudolabeled sample generated by GP1.
Step 3: simulate one test sample with HFSS and use it to evaluate the two updated models, obtaining the test errors e1 and e2.
Step 4: replace the model with the larger error by the model with the smaller error, and add the test sample to the training sample set.
Step 5: repeat Steps 1-4 until the predefined stopping criterion is met.
The pseudocode of the self-training process is shown in Algorithm 1. In the self-training process, the GP model is updated constantly with the information obtained from unlabeled samples. We add the test sample of each iteration to the training sample set in order to improve the accuracy of the GP model. The whole training process controls the number of unlabeled samples added and prevents the model from losing precision due to too many unlabeled samples being introduced.
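Since Algorithm 1 itself is not reproduced here, the following is a hedged sketch of how we read the self-training loop from the description above; the helper labels_from_hfss (standing in for a full-wave simulation call), the simplified stopping test, and all other names are our own assumptions rather than the authors' exact implementation. It reuses train_gp and gp_predict from the earlier sketches.

```python
import copy
import numpy as np

def sample_error(data, x_t, y_t):
    """Relative error of a GP (defined by its current training data) on one labeled test sample."""
    theta = train_gp(data['X'], data['y'])
    y_hat, _ = gp_predict(data['X'], data['y'], x_t[None, :], *theta)
    return float(np.abs(y_hat[0] - y_t) / np.abs(y_t))

def self_training(X_lab, y_lab, U1, U2, labels_from_hfss, X_test_pool, error_threshold):
    """Sketch of the self-training loop.

    X_lab, y_lab     : labeled samples from the initial training.
    U1, U2           : two lists of unlabeled input vectors, one list per GP copy.
    labels_from_hfss : hypothetical callable returning the simulated label of a test input.
    X_test_pool      : inputs reserved as the per-iteration test samples.
    """
    gp1 = {'X': np.asarray(X_lab, float).copy(), 'y': np.asarray(y_lab, float).copy()}
    gp2 = copy.deepcopy(gp1)          # the trained model is copied before self-training
    best = gp1

    for i in range(min(len(U1), len(U2), len(X_test_pool))):
        # Each model produces a pseudolabel for its own unlabeled sample.
        x1, x2 = np.asarray(U1[i], float), np.asarray(U2[i], float)
        t1, t2 = train_gp(gp1['X'], gp1['y']), train_gp(gp2['X'], gp2['y'])
        y1_hat, _ = gp_predict(gp1['X'], gp1['y'], x1[None, :], *t1)
        y2_hat, _ = gp_predict(gp2['X'], gp2['y'], x2[None, :], *t2)

        # Cross-training: each model is updated with the pseudolabel generated by the other.
        gp1['X'] = np.vstack([gp1['X'], x2]); gp1['y'] = np.append(gp1['y'], y2_hat[0])
        gp2['X'] = np.vstack([gp2['X'], x1]); gp2['y'] = np.append(gp2['y'], y1_hat[0])

        # One HFSS-labeled test sample evaluates both updated models.
        x_t = np.asarray(X_test_pool[i], float)
        y_t = labels_from_hfss(x_t)
        e1, e2 = sample_error(gp1, x_t, y_t), sample_error(gp2, x_t, y_t)

        # The model with the smaller error replaces the other; the test sample joins its data.
        best = gp1 if e1 <= e2 else gp2
        best['X'] = np.vstack([best['X'], x_t]); best['y'] = np.append(best['y'], y_t)
        gp1, gp2 = copy.deepcopy(best), copy.deepcopy(best)

        # Simplified stop test; the paper additionally checks that the next iteration is worse.
        if min(e1, e2) < error_threshold:
            break
    return best
```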
3. Case Studies
3.1. Benchmark Functions
To verify the feasibility of the proposed SSL-based GP model, we select four benchmark functions for testing: the Sphere function, the Sum Squares function, the Rastrigin function, and the Schwefel function. They all have five dimensions, and the value interval of each independent variable is [−30, 30]. Their formulas are given by (10)–(13). The first two functions are unimodal, and their error threshold is set as 1e-06. The last two functions are multimodal, and their error threshold is set as 1e-04. We set the maximum number of iterations to 100.
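For reference, the four benchmark functions may be written as below; because equations (10)–(13) are not reproduced here, the exact Schwefel variant (and any shift or scaling the authors used) is an assumption on our part.

```python
import numpy as np

def sphere(x):        # unimodal
    return np.sum(x**2)

def sum_squares(x):   # unimodal
    i = np.arange(1, x.size + 1)
    return np.sum(i * x**2)

def rastrigin(x):     # multimodal
    return np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

def schwefel(x):      # multimodal; Schwefel 2.26 form, assumed since the paper's variant is not shown
    return 418.9829 * x.size - np.sum(x * np.sin(np.sqrt(np.abs(x))))
```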
The selected four benchmark functions are used to test the performance of the proposed SSL-based GP model. For each iteration, the Relative Error (RE) is used for evaluation, and it is given by
$$\mathrm{RE} = \frac{\lvert \hat{f} - f \rvert}{\lvert f \rvert},$$
where $\hat{f}$ is the label predicted by the SSL-based GP model and $f$ is the true label of the test sample. Table 1 records the iterative results of the four benchmark functions. From Table 1, the error threshold of the unimodal functions is reached after about 25 iterations, while the error threshold of the multimodal functions is reached after about 75 iterations. All the models achieve high accuracy within 100 iterations, which verifies the effectiveness of the proposed GP model.
Figure 3 shows the iterative results of the four benchmark functions. The left side, labeled (a), presents the test error curves of the above four functions. From the curves, the test error in each iteration is small: for the unimodal functions, the order of magnitude of the maximum error is 1e-05, and for the multimodal functions, the order of magnitude of the maximum error is 1e-02. Meanwhile, the right side, labeled (b), presents the fitting effect diagrams of the four functions, showing the fitting effects of each function at 50 test points. As can be seen, although the fitting effects of the multimodal functions are not perfect at some points, the fitting effects of all four functions reach good levels.

3.2. Resonant Frequency of MSAs
3.2.1. Rectangular MSA
Figure 4 is a schematic diagram of the rectangular MSA [25], which is composed of the radiation element, the dielectric layer, and the reference ground. The width of the rectangular patch is W, the length is L, the thickness of the dielectric layer is h, and the relative dielectric constant is $\varepsilon_r$.

The design variables are W, L, h, and $\varepsilon_r$, and the resonant frequency points were measured by Mehmet Kara [26]. There are 33 sets of data in total, shown in Table 2. When selecting the training sample set, we should consider the information in each dimension and make sure that the samples in each dimension are uniformly dispersed. Then, 13 sets of data marked with the suffix # are used as the initial training samples, and another 10 sets of data are used as test samples. The label information of the 10 sets of data marked with the suffix ★ is removed first, and these are then used as unlabeled samples because there are no unlabeled samples in this case.
In the initial training, we take the four design variables W, L, h, and $\varepsilon_r$ as the input variables of the GP model and the resonant frequency as its output; then, we establish the initial GP model using the training samples in Table 2. The test samples are used to obtain the initial error, and the resulting mean RE is 0.0093.
In each iteration, GP1 and GP2 each select a different unlabeled sample for cross-training and take one test sample for verification. The iteration termination condition is that the test error has met the error threshold and, simultaneously, the error of the next iteration is worse than that of the current iteration; when the condition is satisfied, the program stops at the next iteration. In this case, the error threshold is 1e-05, and the smallest test error, 3.4683e-6, occurs at the 5th iteration. Therefore, the iteration stops at the 6th iteration. Table 3 shows the test errors of the six iterations, and Figure 5 shows the error curves.

From the above results, in addition to the unlabeled samples used in each iteration, the optimal model has been further trained with four more test samples. At the same time, these four test samples are added to the original training sample set, and a traditional GP model is trained with the updated training sample set. The 5th test sample is used to test the traditional GP model, and the resulting error is 7.7814e-4, larger than 3.4683e-6. We can therefore preliminarily consider that the SSL-based GP model has advantages over the traditional supervised GP model.
Considering the effect of different test samples on the error, we use the above trained SSL-based GP model to predict the 5th to 10th test samples; the predicted results for these six test samples are shown in bold and underlined in column 7, named fproposed, of Table 2. We also use the above traditional GP model to predict the 5th to 10th test samples for comparison. Here, we use the Mean Relative Error (MRE) to evaluate the performance of the model. The MRE is given by
$$\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N} \frac{\lvert \hat{f}_i - f_i \rvert}{\lvert f_i \rvert},$$
where $N$ is the number of test samples, $\hat{f}_i$ is the predicted resonant frequency, and $f_i$ is the measured resonant frequency.
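As a small worked example, the RE and MRE metrics above can be computed as follows (the array and function names are ours).

```python
import numpy as np

def relative_error(f_pred, f_true):
    """RE for a single test sample."""
    return abs(f_pred - f_true) / abs(f_true)

def mean_relative_error(f_pred, f_true):
    """MRE = (1/N) * sum_i |f_pred_i - f_true_i| / |f_true_i| over N test samples."""
    f_pred = np.asarray(f_pred, dtype=float)
    f_true = np.asarray(f_true, dtype=float)
    return float(np.mean(np.abs(f_pred - f_true) / np.abs(f_true)))
```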
For the rectangular MSA, the MRE of the proposed SSL-based GP model is 0.0075, while the MRE of the traditional supervised GP model is 0.0081. We may conclude that, with the same training samples and the same test samples, the performance of the SSL-based GP model is better than that of the traditional supervised GP model: the test error is smaller and the accuracy is improved.
3.2.2. Circular MSA
Figure 6 shows a schematic diagram of the circular MSA, which is obtained by replacing the rectangular patch with a circular patch. The radius of the circular patch is a, the thickness of the dielectric layer is h, and the relative dielectric constant is $\varepsilon_r$. Its resonant frequency was measured by Sing [27] and Seref Sagiroglu [28].

The design variables are a, h, and $\varepsilon_r$, as shown in Table 4, and there are 20 sets of data in total. As described above, considering the information in each dimension, 8 sets of data marked in Table 4 are used as test samples; their label information is removed first so that they can also be used as unlabeled samples. The remaining samples are used as training samples.
In the initial training, carried out in the same way as above, the initial error is 0.0807, which is large, so the GP model needs further improvement. In this case, the error threshold is 1e-02, and the smallest test error, 0.0071, occurs at the 7th iteration. Therefore, the program stops at the 8th iteration. Table 5 shows the test errors of the eight iterations, and Figure 7 shows the error curve.

From the above results, the GP model has been further trained with six more test samples. As above, these six test samples are added to the original training sample set to train a traditional supervised GP model. The 7th test sample is used for comparison, and the resulting error is 0.0073, which is larger than 0.0071.
We use the above trained SSL-based GP model to predict the 7th and 8th test samples, and the predicted results are shown in bold and underlined in column 7, named fproposed, of Table 4. The MRE of the SSL-based GP model is 0.0160, while that of the traditional GP model is 0.0240, which means the proposed GP model is better. In conclusion, for the resonant frequency of the circular MSA, the performance of the proposed SSL-based GP model is better than that of the traditional GP model.
3.2.3. Triangular MSA
Figure 8 shows a schematic diagram of the triangular MSA, obtained by replacing the rectangular patch with a triangular patch. The side length of the equilateral triangle is s, the thickness of the dielectric layer is h, and the relative dielectric constant is $\varepsilon_r$. The antenna has five different modes of operation. The design variables are the mode, s, h, and $\varepsilon_r$, and its resonant frequency was measured by Chen [29] and Danele [30].

As described above, considering the information in each dimension, 5 sets of samples marked in Table 6 are used as the test samples. At the same time, their label information is removed so that they can serve as unlabeled samples. The remaining samples are used as training samples.
In the initial training process, the error is 0.1428. Owing to the small number of training samples, the initial error is large and needs to be further improved. In this case, the error threshold is 1e-02, and the smallest test error, 0.0067, occurs at the 4th iteration. Table 7 shows the test errors, and Figure 9 shows the error curve. The optimal model has been further trained with three more test samples. As above, these three test samples are added to the original training sample set to train a traditional supervised GP model. The 4th test sample is used for testing, and the resulting error is 0.0781, which is larger than 0.0067.

We use the above trained SSL-based GP model to predict the 4th and 5th test samples, and the predicted results are shown in bold and underlined in column 7, named fproposed, of Table 6. The MRE of the SSL-based GP model is 0.0194, while that of the traditional GP model is 0.0423. From this result, we can conclude that this error is smaller than the initial training error, so the accuracy of the model can be improved by the proposed algorithm when the accuracy of the initial model is not good enough. At the same time, the MRE of the SSL-based GP model is smaller than that of the traditional supervised GP model, which means the SSL-based GP model is better. In conclusion, for the resonant frequency of the triangular MSA, the performance of the proposed SSL-based GP model is better than that of the traditional supervised GP model.
3.2.4. Comparison with Other Algorithms
The three basic geometries, including the rectangular MSA, circular MSA, and triangular MSA, have led to the development of fractal geometries for the design of multiband antennas. Many research studies have used the data of the three sets of resonant frequency points considered here. Firstly, we compare the proposed algorithm with the NNs in reference [26], including the backpropagation (BP), the delta-bar-delta (DBD), and the extended delta-bar-delta (EDBD) networks. Different models are used to predict all the samples, including training samples and test samples, and the total absolute errors are obtained. The predicted results of the proposed GP model are shown, respectively, in column 7, named fproposed, of Tables 2, 4, and 6 for the different MSAs. The comparison results are shown in Table 8. From Table 8, for the rectangular MSA, the total absolute error is obviously smaller than that of the other algorithms. For the circular MSA and the triangular MSA, the proposed algorithm is better than DBD and BP, but a little worse than EDBD. In a word, the proposed SSL-based GP model uses fewer training samples but has almost the same or better prediction ability than the NNs in [26]. However, as is well known, deciding the structure of an NN is difficult: it usually depends on the researcher's experience or the trial-and-error method. The proposed method does not have this problem and is very easy to model with.
We also cite some results from other references for comparison, and these results are also shown in Table 8. From references [31, 32], for the rectangular MSA, the proposed algorithm is better than the methods in those references. From references [29–33], for the circular MSA and the triangular MSA, we reach the same conclusion. In a word, using a small number of training samples, the proposed SSL-based GP model has better prediction ability than the methods in these references.
4. Conclusion
In order to improve the efficiency of the optimal design of electromagnetic devices and save the time required to collect training samples simulated by full-wave electromagnetic software, this study proposes a semisupervised GP model, which covers an initial training process and a self-training process. In the initial training process, a few labeled samples are used to train a GP model with relatively low accuracy. In each iteration of the self-training process, the trained GP model is first copied to another GP model, and the two GP models are further updated with unlabeled samples. After testing with the same test sample, the GP model with the smaller error replaces the other GP model for self-updating. The self-training process is repeated until the error threshold is met. Four benchmark functions are used to test the effectiveness of the proposed algorithm. Experimental results show that both the unimodal functions and the multimodal functions reach the expected error within relatively few iterations. Meanwhile, the resonant frequency modeling problems of three different microstrip antennas are used to verify the effectiveness of the proposed GP model. Compared with the supervised GP model, the results show that the accuracy of the proposed semisupervised GP model is improved and the error is smaller than that of the traditional supervised GP model. Compared with other algorithms, the proposed GP model uses fewer labeled samples, while its prediction ability has some advantages over the other methods. In a word, the proposed semisupervised GP model further promotes research on the optimal design of electromagnetic devices.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) under no. 61771225, the Postgraduate Research and Practice Innovation Program of Jiangsu Province, China, under no. SJCX19-0593, and the Qinglan Project of Jiangsu Higher Education.