Abstract
Due to the limited number of labeled samples, semisupervised learning often suffers from a considerable empirical distribution mismatch between labeled and unlabeled samples. To this end, this paper proposes a novel semisupervised algorithm named Local Gravitation-based Semisupervised Online Sequential Extreme Learning Machine (LGS-OSELM), which learns the unlabeled samples from easy to difficult. Each sample is formulated as an object with mass and is associated with the local gravitation generated by its neighbors. The similarity between samples is measured by two local gravitation measures: centrality (CE) and coordination (CO). First, the LGS-OSELM uses the labeled samples to learn an initial model by implementing the ELM. Second, the unlabeled samples with a high confidence level, which are easy to learn, are assigned pseudo labels, and these samples are then used to update the neural network by implementing the OS-ELM. By successively learning unlabeled samples and updating the network, the proposed approach ultimately realizes effective learning of all samples. We conduct experiments on several standard benchmark data sets to verify the performance of the proposed LGS-OSELM, and the results demonstrate that our approach outperforms state-of-the-art methods in terms of accuracy.
1. Introduction
In past decades, neural networks have been intensively studied and applied as computational tools to solve various engineering problems [1, 2]. Although most traditional neural networks (such as the BP algorithm) and their derivatives can approximate any nonlinear continuous function [3, 4], their learning speed is insufficient when they are exposed to large-scale data sets [5]. To address this issue, Huang et al. [6, 7] proposed an alternative called the extreme learning machine (ELM). Unlike traditional neural networks, which rely on iterative methods to adjust connection weights, the ELM determines its output weights using the Moore-Penrose generalized inverse, and its input weights are set by a random feature mapping [8]. Because the entire ELM learning process requires no iteration, it can learn rapidly [5, 9, 10].
The original ELM performs offline learning [11, 12]. For online prediction problems, the prediction model must be updated as new data arrive. If the ELM is directly employed for online prediction, it must be retrained on the past data together with the newly arriving data. As new data keep arriving and the past data are repeatedly retrained, the ELM's learning speed gradually deteriorates, and in the worst case its learning ability collapses altogether. To solve this problem, the Online Sequential Extreme Learning Machine (OS-ELM), an online ELM variant, has been proposed [13–15]. The OS-ELM trains only on the newly arriving data and then combines the existing prediction model with the training results to obtain an updated model. However, the OS-ELM depends heavily on abundant labeled samples. Many data sets contain only a few labeled samples and a large number of unlabeled samples [16, 17]. If only the labeled samples are used to train the prediction model, the valuable information in the unlabeled samples is wasted, which leads to inferior generalization performance. Fortunately, Semisupervised Learning (SSL) provides a feasible strategy for effectively exploiting the information in unlabeled samples to boost model performance [18, 19].
Several Semisupervised ELM (SS-ELM) variants [20–26] have been presented. These semisupervised ELMs label the unlabeled samples in a single pass or compute a graph Laplacian between labeled and unlabeled samples once, generally using distance to measure the similarity between samples. A Semisupervised OS-ELM [27], built on the OS-ELM and the SS-ELM [20], has been designed to learn from unlabeled samples chunk by chunk. The effectiveness of distance-based semisupervised learning algorithms relies on the hypothesis that samples of the same class are close in feature space. However, for many complex data sets, distance-based methods cannot achieve effective clustering [28]. To this end, Xia et al. [29] proposed a Density-based Semisupervised Online Sequential Extreme Learning Machine (D-SOS-ELM), which uses local density and distance to measure the similarity between labeled and unlabeled samples. Most existing SSL methods assume that the empirical distribution of the labeled samples follows the true sample distribution. With only limited labeled samples, however, the underlying distribution can hardly be recovered; we refer to this as the empirical distribution mismatch issue. An example on Jain's toy data [30] is illustrated in Figure 1, in which we plot a tiny number of labeled samples and abundant unlabeled samples. It can be observed that the unlabeled samples describe the distribution of Jain's toy data much better than the labeled samples. Most SSL methods assume that similar samples have similar labels [31]; that is, the similarity between labeled and unlabeled samples determines the likely labels of the unlabeled samples. However, due to the sampling bias issue, it is infeasible to determine the relationship between all unlabeled and labeled samples at once, which results in inferior model performance.

[Figure 1: Jain's toy data [30] with a small number of labeled samples and abundant unlabeled samples. (a)–(c)]
To tackle the empirical distribution mismatch issue, a novel semisupervised OS-ELM named the Local Gravitation Semisupervised OS-ELM (LGS-OSELM) is proposed in this paper. The underlying idea is that the larger the set of labeled samples is, the closer its empirical distribution is to the real sample distribution. Therefore, a label propagation strategy, from labeled samples to unlabeled samples and from easy to difficult, is used to gradually enlarge the set of labeled samples.
Recently, Wang et al. [32] proposed local gravitation clustering, which responds to the relevance between each sample and its neighbors and demonstrates that data with complex structures can be clustered better by local gravitation than by density- or distance-based clustering. Inspired by local gravitation clustering, our proposed method computes the gravitation measures, centrality (CE) and coordination (CO), between each unlabeled sample and the labeled samples of the different categories to measure similarity. These gravitation measures determine the confidence level of each unlabeled sample, and the unlabeled samples with higher confidence levels are easier to learn. Therefore, these unlabeled samples are assigned pseudo labels and are then employed to update the neural network by implementing the OS-ELM, embedding the information of the newly labeled samples. Through continuous incremental learning, effective learning of the unlabeled samples is finally realized, which improves the learning accuracy. This method inherits the advantages of the OS-ELM and local gravitation clustering: it uses local gravitation as the similarity measure and uses the OS-ELM for incremental learning to gradually reduce the empirical distribution mismatch. Experimental results on several benchmark data sets, including two synthetic data sets and twelve real-world data sets, demonstrate that the classification accuracy of our approach outperforms state-of-the-art methods.
Contributions of this work are summarized as follows:
(1) The proposed LGS-OSELM offers a new incremental learning perspective that gradually reduces the distribution mismatch between labeled and unlabeled samples through a label propagation strategy from labeled samples to unlabeled samples.
(2) This paper proposes a step-by-step learning strategy; that is, each iteration selects the samples with high similarity from the unlabeled sample set for learning.
(3) This paper proposes the local gravitation measures, extending the methods for measuring similarity between labeled and unlabeled samples.
The remainder of this paper is organized as follows. Section 2 presents representative works related to OS-ELM and gravitational clustering. Section 3 describes our newly proposed approach. Section 4 evaluates the performance of the proposed approach experimentally. Section 5 presents the conclusion.
2. Related Works
2.1. Extreme Learning Machine
The original ELM is a variant of the feed-forward neural network, in which the parameters of the hidden layer (input weights and biases) are randomly chosen and the output weight vector is determined analytically via the Moore-Penrose generalized inverse. With L hidden nodes, an original ELM model can be expressed as
$$f(x) = h(x)\beta, \tag{1}$$
where $x$ denotes the input feature vector, $h(x)$ is the output row vector of the hidden layer, and $\beta$ is the output weight vector. ELM learning aims to determine $\beta$ by minimizing the estimation error. The loss is expressed as follows:
$$\min_{\beta}\ \left\| H\beta - T \right\|^{2}, \tag{2}$$
where $T$ is the target matrix and $H$ is the hidden layer output matrix. It can be expressed as
$$H = \begin{bmatrix} g\left(w_{1}\cdot x_{1} + b_{1}\right) & \cdots & g\left(w_{L}\cdot x_{1} + b_{L}\right) \\ \vdots & \ddots & \vdots \\ g\left(w_{1}\cdot x_{N} + b_{1}\right) & \cdots & g\left(w_{L}\cdot x_{N} + b_{L}\right) \end{bmatrix}, \tag{3}$$
where $w$ denotes the input weights of the hidden layer, $b$ denotes the bias of a hidden node, and $g(\cdot)$ is the activation function. If regularized optimization is utilized, the optimal weight vector is given by the solution of the following optimization problem:
$$\min_{\beta}\ \frac{1}{2}\left\|\beta\right\|^{2} + \frac{C}{2}\left\| H\beta - T \right\|^{2}, \tag{4}$$
where C denotes the regularization factor and is a positive constant, and the solution for $\beta$ is given by
$$\beta = \left(\frac{I}{C} + H^{T}H\right)^{-1}H^{T}T. \tag{5}$$
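To make the procedure above concrete, the following is a minimal NumPy sketch of a regularized ELM with a sigmoid activation; the function names, the uniform weight initialization, and the use of a one-hot target matrix T are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def elm_train(X, T, n_hidden=200, C=1.0, seed=0):
    """Basic regularized ELM: random hidden layer + least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # hidden layer output matrix
    # beta = (I/C + H^T H)^{-1} H^T T, cf. equation (5)
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                                           # class = argmax over columns
```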
2.2. Online Sequential Extreme Learning Machine
The OS-ELM is an online version of the ELM that can gradually learn newly arriving samples. Suppose that an initial sample set is available. In the initial phase, the neural network is trained using the original ELM, and the initial output weight vector is given by
$$\beta^{(0)} = P_{0}H_{0}^{T}T_{0}, \tag{6}$$
where
$$P_{0} = \left(H_{0}^{T}H_{0}\right)^{-1}. \tag{7}$$
Suppose that subsequent sample sets arrive in sequence. For each newly arriving sample set, the neural network is updated as
$$\beta^{(k+1)} = \beta^{(k)} + P_{k+1}H_{k+1}^{T}\left(T_{k+1} - H_{k+1}\beta^{(k)}\right), \tag{8}$$
where $H_{k+1}$ is the hidden layer output matrix of the newly arriving chunk in the (k+1)-th iteration and $P_{k+1}$ can be calculated as
$$P_{k+1} = P_{k} - P_{k}H_{k+1}^{T}\left(I + H_{k+1}P_{k}H_{k+1}^{T}\right)^{-1}H_{k+1}P_{k}. \tag{9}$$
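A compact sketch of this recursive update is given below, under the assumption that the initial solution is the regularized one from the ELM sketch above; the chunk matrices Hk must be produced by the same random feature mapping as in the initial phase, and the helper names are again illustrative.

```python
import numpy as np

def oselm_init(H0, T0, C=1.0):
    """Initial phase: (regularized) least-squares solution on the first chunk."""
    P = np.linalg.inv(np.eye(H0.shape[1]) / C + H0.T @ H0)
    beta = P @ H0.T @ T0
    return P, beta

def oselm_update(P, beta, Hk, Tk):
    """Sequential phase: recursive least-squares update for a new chunk (Hk, Tk)."""
    K = np.linalg.inv(np.eye(Hk.shape[0]) + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ K @ Hk @ P
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)
    return P, beta
```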
2.3. Local Gravitation Clustering
The local gravitation clustering approach, inspired by Newton's law of gravitation, responds to the association between two objects. Each sample is viewed as an object with mass. The attractive force between two objects can be determined using the following formula:
$$\vec{f} = \frac{m_{1}m_{2}}{\left\|\Delta r\right\|^{2}}\,\widehat{\Delta r}, \tag{10}$$
where $\vec{f}$ denotes the attractive force between two objects $x_{1}$ and $x_{2}$, $m_{1}$ and $m_{2}$ respectively denote the masses of $x_{1}$ and $x_{2}$, $\left\|\Delta r\right\|$ is the distance between these two objects, and $\widehat{\Delta r}$ denotes the unit vector along the line that connects the two objects.
In a local region, it is assumed that the distances between the current object and its neighbors do not vary significantly. Therefore, equation (10) can be simplified as
$$\vec{f} = m_{1}m_{2}\,\widehat{\Delta r}. \tag{11}$$
The Local Resultant Force (LRF) of an object generated by its k-nearest neighbors can be computed as
$$\vec{F}_{i} = \sum_{j=1}^{k} m_{i}m_{j}\,\widehat{\Delta r}_{ij}, \tag{12}$$
where the unit vector $\widehat{\Delta r}_{ij}$ carries the directional information and k is the number of neighbors. An object with a larger mass exerts more influence on its neighbors, while an object with a smaller mass is more sensitive to the influence of its neighbors. Motivated by this observation, the LRF is redefined as
$$\vec{F}_{i} = \sum_{j=1}^{k} \frac{m_{j}}{m_{i}}\,\widehat{\Delta r}_{ij}, \tag{13}$$
where the mass m of an object is defined as follows:
$$m_{i} = \frac{1}{\sum_{j=1}^{k}\left\|\Delta r_{ij}\right\|}. \tag{14}$$
In higher density regions, the mass of an object becomes larger according to equation (14), and vice versa. Equation (13) is a sum of vectors, not scalars: the denser the region, the larger the mass $m_{i}$ and the smaller the magnitude of $\vec{F}_{i}$. Figure 2(a) shows that there exists a significant difference between the LRFs of the samples close to the center of a cluster and those at the border of the cluster. To encapsulate the information carried by the LRF, two local gravitation measures based on the local gravitation intensity have been proposed: (1) the centrality (CE) and (2) the coordination (CO). The CE and CO of a sample $x_{i}$ are defined as
$$CE_{i} = \frac{1}{k}\sum_{j=1}^{k}\frac{\vec{F}_{j}\cdot\Delta r_{ji}}{\left\|\vec{F}_{j}\right\|\left\|\Delta r_{ji}\right\|}, \tag{15}$$
$$CO_{i} = \left\|\sum_{j=1}^{k}\vec{F}_{j}\right\|, \tag{16}$$
where $\Delta r_{ji}$ refers to the displacement vector from the neighbor $x_{j}$ to the data point $x_{i}$ and $\vec{F}_{j}$ is the LRF of the neighbor $x_{j}$. Figure 2(b) shows that points in the cluster interior usually have CE values greater than 0. A sample with a larger CO value has an LRF whose direction is approximately the same as that of its neighbors, and it is likely to be located at the border. According to equations (15) and (16), $CE \in [-1, 1]$, while CO has no clear value range. Figure 3 shows the CE and CO values of samples in different localities. From Figure 3, the closer a sample is to the edge, the larger its CO and the smaller its CE. Table 1 summarizes the differences between interior and boundary samples in terms of the mass, the magnitude of the LRF, the CE, and the CO. Local gravitation clustering computes the LRF, the CE, and the CO of each data sample and uses this information to distinguish samples in the central region from those in the border region.
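The following NumPy sketch computes the mass, LRF, CE, and CO of every sample under the reading of equations (12)–(16) given above (mass as the inverse sum of k-NN distances, CE as the mean cosine between the neighbors' LRFs and the vectors pointing back to the sample, CO as the magnitude of the summed neighbor LRFs); the function name and the use of scikit-learn's NearestNeighbors are illustrative choices.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_gravitation(X, k=10):
    """Mass, LRF, CE, and CO for each row of X, following equations (12)-(16)."""
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]            # drop each point itself
    mass = 1.0 / (dist.sum(axis=1) + 1e-12)        # denser region -> larger mass

    lrf = np.zeros_like(X, dtype=float)
    for i in range(n):
        diff = X[idx[i]] - X[i]                    # displacements to the k neighbours
        unit = diff / (np.linalg.norm(diff, axis=1, keepdims=True) + 1e-12)
        lrf[i] = ((mass[idx[i]] / mass[i])[:, None] * unit).sum(axis=0)

    ce, co = np.zeros(n), np.zeros(n)
    for i in range(n):
        back = X[i] - X[idx[i]]                    # vectors from neighbours back to x_i
        back = back / (np.linalg.norm(back, axis=1, keepdims=True) + 1e-12)
        f = lrf[idx[i]]
        fn = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
        ce[i] = (fn * back).sum(axis=1).mean()     # average cosine, in [-1, 1]
        co[i] = np.linalg.norm(f.sum(axis=0))      # large when neighbour LRFs align
    return mass, lrf, ce, co
```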

[Figure 2: LRFs and CE values of samples in the interior and at the border of a cluster. (a) (b)]

[Figure 3: CE and CO values of samples in different localities. (a) (b)]
3. Proposed LGS-OSELM
In SSL, the available samples include a few labeled samples and plenty of unlabeled samples. Previous semisupervised learning is based on two assumptions: (1) the labeled and unlabeled samples follow the same true sample distribution, and (2) if two samples are similar to each other, their conditional label probabilities should be the same or similar. In most previous ELM-based SSL methods [20, 21, 33, 34], a Laplacian graph built from both labeled and unlabeled samples rests on these assumptions. However, the empirical distribution of the labeled samples usually deviates from the true sample distribution because of the randomness in sampling the labeled samples and the small sample size. The sampling bias issue is commonly discussed in works on supervised learning and domain adaptation [35, 36], where it refers to training data and testing data being differently distributed. In SSL, the number of labeled samples is usually small (far smaller than the number of unlabeled samples), which leads to a notable difference between their empirical distributions, yet sampling bias between labeled and unlabeled samples has rarely been discussed in previous SSL work.
In addition, the similarity between two samples is usually measured by distance, but distance cannot reflect the similarity in many cases [37]. Two examples that distance-based semisupervised learning cannot handle are shown in Figure 4. Figure 4(a) shows the point distribution of the Flame data [38]; the unlabeled samples located in the rectangular areas with black borders are easily misclassified by distance-based semisupervised learning, because the average distance from these rectangular areas to the other category is smaller than that to their true category. Another example, Jain's toy data [30] in Figure 4(b), also shows that the distance-based method may cause misclassification in the rectangular areas with black borders, making learning worse.

[Figure 4: (a) the Flame data [38]; (b) Jain's toy data [30].]
To effectively address the aforementioned problems, a new semisupervised learning strategy based on local gravitation and the OS-ELM is proposed in this paper. The strategy rests on the observation that the larger the set of labeled samples is, the closer its empirical distribution is to the real sample distribution. Therefore, a label propagation strategy from labeled samples to unlabeled samples, proceeding from easy to difficult, is used to gradually enlarge the set of labeled samples. Figure 5 illustrates this label propagation on Jain's toy data. Label propagation proceeds from left to right in Figure 5(a), and the kernel density estimates of the corresponding x-axis projections are plotted in Figure 5(b). From the figure, we can see that the empirical distribution of the labeled data deviates from the true sample distribution in the initial stage because of the limited number of labeled samples. As the size of the labeled set keeps increasing through label propagation, the empirical distribution of the labeled data gradually approaches the true sample distribution.

[Figure 5: (a) label propagation on Jain's toy data, from left to right; (b) kernel density estimates of the corresponding x-axis projections.]
First, the LGS-OSELM uses the labeled samples to learn an initial model by implementing the ELM. Second, in each incremental learning step, the unlabeled samples with a high confidence level, which are easy to learn, are assigned pseudo labels, and these samples are then used to update the neural network by implementing the OS-ELM. The confidence level with which an unlabeled sample belongs to a certain category is determined by the gravitational similarity. Through successively learning unlabeled samples and updating the neural network, the proposed approach ultimately realizes effective learning of all samples. We employ relative local gravitation measures between unlabeled and labeled samples to determine the confidence level of the unlabeled samples. The labeled sample set is divided into disjoint groups that share the same label; let $G_{c}$ denote the group of class c and let $x_{u}$ denote an unlabeled sample. In this work, we define a relative mass between $x_{u}$ and $G_{c}$ as
$$m_{u}^{c} = \frac{1}{\sum_{x_{j}\in\mathcal{N}_{k}^{c}(x_{u})}\left\|x_{u} - x_{j}\right\|},$$
where $\left\|x_{u} - x_{j}\right\|$ denotes the distance between $x_{u}$ and its neighbor $x_{j}$ from $G_{c}$, and $\mathcal{N}_{k}^{c}(x_{u})$ denotes the k-nearest neighbors of $x_{u}$ taken from $G_{c}$. An unlabeled sample has different relative LRFs with respect to different groups. The relative LRF is defined as
$$\vec{F}_{u}^{c} = \sum_{x_{j}\in\mathcal{N}_{k}^{c}(x_{u})}\frac{m_{j}}{m_{u}^{c}}\,\widehat{\Delta r}_{uj},$$
where $m_{j}$ is the mass of the neighbor $x_{j}$ computed within $G_{c}$ according to equation (14).
Wang et al.'s work [32] indicates that the CE gradually decreases from the interior to the edge. The relative CE is defined as
$$CE_{u}^{c} = \frac{1}{k}\sum_{x_{j}\in\mathcal{N}_{k}^{c}(x_{u})}\frac{\vec{F}_{j}\cdot\Delta r_{ju}}{\left\|\vec{F}_{j}\right\|\left\|\Delta r_{ju}\right\|},$$
where $\vec{F}_{j}$ denotes the LRF of the neighbor $x_{j}$ of $x_{u}$, calculated from the nearest neighbors within the same group. The relative CO between $x_{u}$ and $G_{c}$ is defined as
$$CO_{u}^{c} = \left\|\sum_{x_{j}\in\mathcal{N}_{k}^{c}(x_{u})}\vec{F}_{j}\right\|.$$
The closer the unlabeled sample $x_{u}$ is to the edge of $G_{c}$, the larger $CO_{u}^{c}$ is, and vice versa. Since $CO_{u}^{c}$ has no clear value range, it is modified accordingly so that the values of different samples are comparable. The points of the most central areas and of the edge areas differ significantly in terms of CE and CO. If $CE_{u}^{c} > 0$, $x_{u}$ has a higher probability of being located in the interior of $G_{c}$. However, a sample whose $CE_{u}^{c}$ is close to zero may still lie in the center of $G_{c}$, although with a smaller probability: because the points in the cluster center are subjected to forces in all directions, their CE values are not significantly larger than zero [32]. Fortunately, $CO_{u}^{c}$ provides auxiliary information to avoid samples in the central region being misclassified as edge samples, since a central sample usually has a small CO value. When all the internal samples have been labeled, the remaining edge samples have small CE and large CO values, and it is difficult to determine which cluster they belong to. To address this problem, we compare the magnitudes of the relative LRFs. As Figure 6(b) shows, for an edge data point, the group that exerts the relative LRF with the larger magnitude is the group to which the point belongs with the higher confidence level.
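A small sketch of the relative measures for one unlabeled point with respect to one labeled group follows, based on the definitions just given; the masses and LRFs of the group's own points are assumed to have been precomputed (for example with the local_gravitation sketch above), and the normalization applied to the relative CO is an illustrative stand-in for the modification mentioned in the text.

```python
import numpy as np

def relative_measures(x_u, Xc, mass_c, lrf_c, k=10):
    """Relative CE, CO, and LRF magnitude of x_u with respect to the group Xc."""
    d = np.linalg.norm(Xc - x_u, axis=1)
    nb = np.argsort(d)[:k]                                # k nearest neighbours inside Xc
    m_u = 1.0 / (d[nb].sum() + 1e-12)                     # relative mass of x_u

    unit = (Xc[nb] - x_u) / (d[nb][:, None] + 1e-12)
    rel_lrf = ((mass_c[nb] / m_u)[:, None] * unit).sum(axis=0)

    back = (x_u - Xc[nb]) / (d[nb][:, None] + 1e-12)      # from neighbours back to x_u
    f = lrf_c[nb]
    fn = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    rel_ce = (fn * back).sum(axis=1).mean()
    rel_co = np.linalg.norm(f.sum(axis=0)) / (np.linalg.norm(f, axis=1).sum() + 1e-12)
    return rel_ce, rel_co, np.linalg.norm(rel_lrf)
```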

[Figure 6: (a) (b); panel (b) illustrates comparing the relative LRF magnitudes of edge data points.]
To sum up, according to the local gravitation theory, the unlabeled samples with high confidence are selected in the following three cases:
(1) For each unlabeled sample, we calculate its relative CE with respect to each category of labeled samples and find the category with the largest relative CE. If this value is clearly greater than zero, the sample is located in the interior of that category with a higher confidence level.
(2) If the largest relative CE is not clearly greater than zero, we calculate the relative CO with respect to that category. If the relative CO is small, the sample is located near the center of that category with a higher confidence level.
(3) If neither (1) nor (2) is met, the sample is an edge data point. We calculate the magnitudes of its relative LRFs and assign it, with a higher confidence level, to the category generating the largest relative LRF.
The complete procedure is described in Algorithm 1. The local gravitation of each unlabeled sample is calculated to determine its confidence level; the unlabeled samples with higher confidence levels are fed into the neural network to generate pseudo labels [29], and the process is iterated until all the unlabeled samples have been learned. The proposed method is thus conducive to improving the reliability and accuracy of network learning, as the sketch below illustrates.
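The following is a minimal, self-contained sketch of the incremental loop described above: an ELM is first trained on the labeled data, and then the easiest unlabeled samples, as ranked by a user-supplied confidence function (for example one built from the three selection cases and the relative measures above), are pseudo-labeled and folded in with OS-ELM updates. The batch fraction, the sigmoid activation, and the helper names are assumptions for illustration, not details taken from Algorithm 1.

```python
import numpy as np

def lgs_oselm(Xl, Tl, Xu, confidence_fn, n_hidden=200, C=1.0, batch_frac=0.1, seed=0):
    """Skeleton of the easy-to-difficult incremental learning loop."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (Xl.shape[1], n_hidden))   # fixed random feature mapping
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = lambda X: 1.0 / (1.0 + np.exp(-(X @ W + b)))

    H0 = H(Xl)                                             # initial phase: plain ELM
    P = np.linalg.inv(np.eye(n_hidden) / C + H0.T @ H0)
    beta = P @ H0.T @ Tl

    remaining = Xu.copy()
    while len(remaining) > 0:
        scores = confidence_fn(remaining, Xl, Tl)          # higher score = easier to learn
        n_sel = max(1, int(batch_frac * len(remaining)))
        sel = np.argsort(scores)[-n_sel:]                  # pick the easiest samples
        Xs = remaining[sel]
        pseudo = np.eye(Tl.shape[1])[(H(Xs) @ beta).argmax(axis=1)]  # one-hot pseudo labels

        Hk = H(Xs)                                         # OS-ELM sequential update
        K = np.linalg.inv(np.eye(len(Xs)) + Hk @ P @ Hk.T)
        P = P - P @ Hk.T @ K @ Hk @ P
        beta = beta + P @ Hk.T @ (pseudo - Hk @ beta)

        Xl, Tl = np.vstack([Xl, Xs]), np.vstack([Tl, pseudo])  # grow the labeled set
        remaining = np.delete(remaining, sel, axis=0)
    return W, b, beta
```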
4. Experiments and Results
In order to verify the effectiveness of the proposed LGS-OSELM, several comparative experiments have been empirically conducted on synthetic and real data sets.
4.1. Baseline Methods
To demonstrate the capabilities of the LGS-OSELM algorithm, comparisons are made with the ELM, SS-ELM, Laplacian support vector machine (Lap-SVM), STAR-SVM, LapTELM, and D-SOS-ELM. These comparison algorithms are briefly introduced as follows:
(1) ELM [6]: an efficient single-hidden-layer feed-forward neural network. Its details have been introduced in Section 2.1.
(2) SS-ELM [20]: a semisupervised learning algorithm based on a Laplacian graph between labeled and unlabeled data. It initializes the ELM model and then iteratively selects the best solution until convergence.
(3) Lap-SVM [39]: a semisupervised learning algorithm based on a Laplacian graph and the SVM, which introduces an additional regularization term on the geometry of both labeled and unlabeled samples.
(4) STAR-SVM [40]: STAR-SVM adaptively modifies the optimization by adjusting the weights at each iteration. At each iteration, the regularization parameters are adapted to better reflect label confidence and class proportion and to gradually include more unlabeled samples.
(5) LapTELM [22]: LapTELM uses a manifold regularization term to exploit the geometric structure information of unlabeled samples and simultaneously trains two related, paired semisupervised ELMs with two nonparallel separating planes for the final classification.
(6) D-SOS-ELM [29]: D-SOS-ELM uses local density and distance to measure the similarity between samples.
4.2. Data Sets
The experimental data sets include synthetic data sets and real data sets. The two synthetic data sets are Jain's toy data set and the Flame data set, which are shown in Figure 4. For the real data sets, we use 12 benchmark data sets that are widely used to evaluate machine learning algorithms. The details of these data sets are shown in Table 2. Among them, USPS and MNIST are two well-known handwritten digit data sets with samples ranging from "0" to "9." The other 10 real-world data sets are selected from the UCI machine learning repository.
4.3. Experimental Setup
To evaluate the semisupervised learning algorithms, each data set is split into a training data set and a testing data set (denoted by T). The training data set is further divided into a labeled set (L), a validation set (V), and an unlabeled set (U). When a semisupervised learning algorithm is trained, the labeled data from L and the unlabeled data from U are utilized. In this paper, we randomly select 10% of the original data as L, 50% as U, 10% as V, and 30% as T. The performance of each model is sensitive to the parameters involved in the algorithm. In the experiments, we use the training set with 7-fold cross-validation to find the optimal parameters. In the simulations of the ELM, SS-ELM, and LGS-OSELM algorithms on EEG-Eye, MNIST, and USPS20, the number of hidden layer nodes is set to 2000, while on the remaining data sets it is set to 200, since ELM-based models can achieve good generalization performance as long as the number of hidden layer nodes is large enough. The parameters involved in each algorithm are listed as follows:
(i) ELM: the trade-off parameter c
(ii) SS-ELM: the trade-off parameters and the number of nearest neighbors k
(iii) Lap-SVM: the number of nearest neighbors k and the bandwidth parameter
(iv) STAR-SVM: the trade-off parameter c and the confidence weight
(v) LGS-OSELM: the trade-off parameter c and the number of nearest neighbors k
(vi) LapTELM: the trade-off parameters
(vii) D-SOS-ELM: the trade-off parameter c
These parameters are selected from the following candidate sequences:
(i) the number of nearest neighbors k: {5, 10, 15, 20}
(ii) {0.001, 0.01, 0.05, 0.1, 0.5, 1}
(iii) {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}
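As a concrete illustration of the data partitioning described at the start of this subsection, the sketch below performs the random 10%/50%/10%/30% split into the labeled set L, unlabeled set U, validation set V, and testing set T; the function name and the fixed seed are illustrative choices.

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Random 10/50/10/30 split into (L, U, V, T) as used in the experiments."""
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.permutation(n)
    cuts = np.cumsum([int(0.1 * n), int(0.5 * n), int(0.1 * n)])
    iL, iU, iV, iT = np.split(idx, cuts)
    return (X[iL], y[iL]), (X[iU], y[iU]), (X[iV], y[iV]), (X[iT], y[iT])
```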
4.4. Experimental Results
4.4.1. The Results on the Synthetic Data Sets
Each synthetic data set is randomly divided into four parts: the first part contains 10% of the data, the second part 50%, the third part 10%, and the fourth part 30%. The traditional ELM method is trained in two ways: one uses only the first part of labeled samples to train the model (ELMA), and the other uses both the first and second parts as labeled samples to train the model (ELMB). The SS-ELM, LapTELM, D-SOS-ELM, and LGS-OSELM take the first part as the labeled set L and the second part as the unlabeled set U. The third and fourth parts are used as the validation set V and the testing set T, respectively, for all algorithms.
This experiment is mainly used to find the optimal number of nearest neighbors k. Table 3 shows the experimental results of the LGS-OSELM on the synthetic data sets. It can be seen from Table 3 that the classification accuracy varies as the parameter k increases from 5 to 20. On the Flame data set, the best result is obtained when the number of nearest neighbors k is 15; on the Jain data set, the best number of nearest neighbors is 10.
Table 4 shows the classification accuracy of the ELM, SS-ELM, D-SOS-ELM, LapTELM, and LGS-OSELM on the synthetic data sets, where "Acc%" denotes the average classification accuracy on the testing set obtained with the optimal parameters of each algorithm. It can be seen from Table 4 that ELMB achieves the best result and ELMA the worst. In terms of the number of labeled samples used in training, ELMA uses the fewest, so its result is the worst. The number of training samples for the SS-ELM and LGS-OSELM is the same as that of ELMB; the difference is that ELMB is trained with a large number of labeled samples, whereas the SS-ELM, D-SOS-ELM, LapTELM, and LGS-OSELM have very few labeled samples and most of their samples are unlabeled. It is evident that when the labeled data are relatively scarce, the LGS-OSELM outperforms ELMA, the D-SOS-ELM, LapTELM, and the SS-ELM. On the Flame data set, the classification accuracy achieved by the LGS-OSELM is 8% and 19.1% higher than that of ELMA and the SS-ELM, respectively. On the Jain data set, the classification accuracy achieved by the LGS-OSELM is 13.13% and 21.61% higher than that of ELMA and the SS-ELM, respectively. The proposed LGS-OSELM is also slightly better than the D-SOS-ELM and LapTELM on the two synthetic data sets.
4.4.2. The Results on Real-World Data Sets
The performance comparison on the real-world data sets is displayed in Table 5. Note that the training set of the ELM consists of only the 10% labeled data. As can be seen from Table 5, the LGS-OSELM, LapTELM, and D-SOS-ELM achieve better results than the ELM, Lap-SVM, STAR-SVM, and SS-ELM on almost all the employed data sets, and the LGS-OSELM achieves better performance than the LapTELM and D-SOS-ELM in most cases. From the last line of Table 5, the average classification accuracy achieved by the LGS-OSELM on the testing set T is 82.15%, which is 11.29%, 4.25%, 8.94%, 6.51%, 1.87%, and 1.12% higher than that of the ELM, STAR-SVM, Lap-SVM, SS-ELM, D-SOS-ELM, and LapTELM across the twelve data sets, respectively. The average classification accuracy achieved by the LGS-OSELM on the unlabeled set U is 83.63%, which is 10.89%, 4.59%, 7.58%, 6.34%, 2.23%, and 2.21% higher than that of the ELM, STAR-SVM, Lap-SVM, SS-ELM, D-SOS-ELM, and LapTELM across the twelve data sets, respectively.
4.5. Experimental Analysis
For further analysis of the significant differences among the seven models on the employed real-world data sets, we resort to the well-known Friedman test with the corresponding post hoc tests. The Friedman statistic is written as follows:
$$\chi_{F}^{2} = \frac{12n}{m(m+1)}\left[\sum_{j=1}^{m}R_{j}^{2} - \frac{m(m+1)^{2}}{4}\right],$$
where m and n are the numbers of involved models and data sets, respectively, and $R_{j}$ denotes the average rank of the j-th model. The statistic follows a $\chi^{2}$ distribution with m − 1 degrees of freedom. The average ranks of the seven models on the testing sets of all the real-world data sets are displayed in Table 6. First, we compare the seven models on the testing set T, where m = 7 and n = 12. Based on the average ranks displayed in Table 6, we calculate the statistic $\chi_{F}^{2}$ = 56.267 and
$$F_{F} = \frac{(n-1)\chi_{F}^{2}}{n(m-1) - \chi_{F}^{2}},$$
which is distributed following the F-distribution with (m − 1) and (m − 1)(n − 1) degrees of freedom; this yields $F_{F} \approx 39.34$. At the 90% confidence level ($\alpha$ = 0.1), $F_{F}$ far exceeds the critical value, which means that the null hypothesis that there is no significant difference among all the models is rejected. Next, the Nemenyi post hoc test is performed to further compare the models in pairs. The critical difference (CD) is obtained as
$$CD = q_{\alpha}\sqrt{\frac{m(m+1)}{6n}},$$
where $q_{\alpha}$ is a critical value that depends on m and the significance level $\alpha$ and is available from standard tables; we thereby obtain the critical difference CD = 2.375. If the average ranks of two models differ by at least the CD, their performance is significantly different. The statistical variables and the differences of average ranks between the LGS-OSELM and the other models are displayed in Table 7. The differences of average ranks between the LGS-OSELM and the ELM, STAR-SVM, Lap-SVM, and SS-ELM are greater than the CD, which indicates that the corresponding null hypotheses are rejected. We can therefore summarize that the proposed LGS-OSELM performs significantly better than the ELM, STAR-SVM, Lap-SVM, and SS-ELM, while there is no significant difference between the D-SOS-ELM, LapTELM, and our proposed LGS-OSELM on either the testing set T or the unlabeled set U.
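The sketch below reproduces this analysis for an accuracy table of shape (data sets × models): it computes the average ranks, the Friedman statistic, its F-distributed form, and the Nemenyi critical difference. The default value q_alpha = 2.693 corresponds to seven models at the 0.10 significance level, and the function name is an illustrative choice.

```python
import numpy as np
from scipy import stats

def friedman_nemenyi(acc, q_alpha=2.693):
    """Friedman test statistics and Nemenyi CD for an (n_datasets x m_models) table."""
    n, m = acc.shape
    ranks = np.apply_along_axis(stats.rankdata, 1, -acc)   # rank 1 = highest accuracy
    R = ranks.mean(axis=0)                                 # average rank of each model
    chi2_f = 12.0 * n / (m * (m + 1)) * (np.sum(R ** 2) - m * (m + 1) ** 2 / 4.0)
    f_f = (n - 1) * chi2_f / (n * (m - 1) - chi2_f)        # F-distributed form
    cd = q_alpha * np.sqrt(m * (m + 1) / (6.0 * n))        # Nemenyi critical difference
    return R, chi2_f, f_f, cd
```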
We further analyze the performance of the proposed LGS-OSELM with different scales of labeled data. More specifically, for all the employed data sets, 70% of the data are randomly selected from each class to form the training set, and the rest compose the testing set. Then, Q% of the data in the training set are randomly selected as the labeled set L and the others as the unlabeled set U, where Q takes the values 10, 20, 30, 40, 50, 60, and 70. The learning results are displayed in Figure 7. It is easy to see that the classification accuracy of the compared algorithms generally increases with the value of Q; that is to say, increasing the scale of labeled data can generally boost the performance of an algorithm.

[Figure 7: classification accuracy with different scales of labeled data on the employed data sets, panels (a)–(l).]
Further, it can be found that the LGS-OSELM performs better than the other baseline methods in most cases, and its advantage is more significant when the value of Q is small. Specifically, no matter what the scale of labeled data is, the proposed LGS-OSELM performs better than the SS-ELM, Lap-SVM, STAR-SVM, D-SOS-ELM, and LapTELM.
In order to evaluate the computational efficiency of the proposed algorithm, we present the training times of the six semisupervised algorithms on the different data sets in Table 8. From this table, the training times are essentially of the same order of magnitude. The proposed algorithm is the most efficient among the six algorithms, spending the least training time on four of the twelve data sets, even though the calculation of the local gravitation, which relies on distance and density information, is relatively time consuming.
To summarize, two synthetic data sets and 12 real-world data sets are used as the experimental data. For the two synthetic data sets, two types of ELM (ELMA trained with few labeled samples and ELMB trained with abundant labeled samples), the SS-ELM, D-SOS-ELM, and LapTELM are employed as comparisons. The experimental results show that the LGS-OSELM, trained with few labeled samples and abundant unlabeled samples, achieves higher accuracy than the SS-ELM, D-SOS-ELM, LapTELM, and ELMA, and only slightly lower accuracy than ELMB. For the real-world data sets, the experimental results show that the LGS-OSELM has higher prediction accuracy than state-of-the-art semisupervised methods including STAR-SVM, Lap-SVM, SS-ELM, D-SOS-ELM, and LapTELM. For a further fair comparison of the methods, we resort to the well-known Friedman test with the corresponding post hoc tests, which indicates that the proposed LGS-OSELM performs significantly better than the ELM, STAR-SVM, Lap-SVM, and SS-ELM, and that there is no significant difference between the D-SOS-ELM, LapTELM, and our proposed LGS-OSELM on the T and the U across the 12 real-world data sets. Experiments with different scales of labeled samples are carried out to further explore the discrepancies among the methods. The results indicate that the classification accuracy of these algorithms generally increases as the scale of labeled samples increases, and that no matter what the scale of labeled data is, the proposed LGS-OSELM performs better than the other baseline methods.
5. Conclusion
In this work, a local gravitation-based incremental semisupervised classification algorithm is proposed. The proposed algorithm uses the local gravitation measures (CE and CO) between unlabeled and labeled data, based on an analysis of the relationship between each sample and its neighborhood, to screen the unlabeled samples with high confidence, which are then fed into the OS-ELM to improve the accuracy of semisupervised learning. For many complex data sets, the traditional approach of using distance as a similarity index cannot correctly capture the similarity between samples, which leads to low accuracy in semisupervised learning. By calculating the local gravitation measures (CE and CO) of the unlabeled samples, the proposed method can effectively determine the similarity of the samples and evaluate the reliability of the unlabeled samples. Because each iteration selects the samples with high confidence for incremental semisupervised learning, the learning accuracy is greatly improved compared with existing methods. The average test accuracy of the proposed model on two synthetic data sets and twelve real-world data sets is higher than that of the SS-ELM, Lap-SVM, and STAR-SVM. This method can make use of a small amount of labeled data and a large amount of unlabeled data to train a practicable model, which improves the utilization of data, reduces the cost of data labeling, and improves the efficiency of learning.
Data Availability
The data used in this study are available at https://archive.ics.uci.edu/ml/index.php.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Xinbiao Wang conceptualized the study, did formal analysis, developed methodology, provided software, wrote the original draft, and reviewed and edited the manuscript. Quanyi Zou provided software, did data analysis, developed methodology, and reviewed and edited the manuscript. Rui Lei conceptualized the study, reviewed and edited the manuscript, validated the study, and did project administration.
Acknowledgments
This work was supported by the project Building a Community with a Shared Future for Mankind through Scientific and Technological Innovation by 2035 (x2jmB1190800).