Abstract

With fast learning speed and high accuracy, the extreme learning machine (ELM) has achieved great success in pattern recognition and machine learning. Unfortunately, it fails when the labeled samples available for training are insufficient, and such labeled samples are often difficult to obtain because of their high cost. In this paper, we address this problem with transfer learning and propose the joint transfer extreme learning machine (JTELM). First, it applies cross-domain mean approximation (CDMA) to minimize the discrepancy between domains, thus obtaining one ELM model. Second, subspace alignment (SA) and weight approximation are jointly introduced into the output layer to enhance the capability of knowledge transfer and learn another ELM model. Third, the prediction of test samples is jointly determined by the two learned ELM models. Finally, a series of experiments are carried out to investigate the performance of JTELM, and the results show that it efficiently accomplishes transfer learning tasks and performs better than the traditional ELM and other transfer or nontransfer learning methods.

1. Introduction

The rapid development of the mobile Internet, the Internet of Things, and high-performance computing has caused a huge amount of data to emerge. How to mine the information in these data to help people make decisions has become a challenge. Machine learning uses numerous labeled data to train a statistical model for automatic prediction, and it has become a hot topic in artificial intelligence (AI). As a high-performance model in machine learning, ELM has achieved success in pattern recognition, computational science, and machine vision. It has the following two merits [1–4]: fast learning speed and outstanding generalization performance. There is no need for ELM to tune its input weights and biases; it only needs to optimize the output weights by solving a least-squares problem. Therefore, it has been widely adopted for classification and regression in various fields, including industrial fault diagnosis [5, 6], medical diagnosis [7], hyperspectral imagery classification [8, 9], facial expression recognition [10], and brain-computer interfaces [11, 12]. However, like traditional machine learning models, ELM performs less satisfactorily when the training samples are insufficient.

Transfer learning (TL) can handle this problem: labeled samples (data) from other domains (the source domain) related to the current domain (the target domain) are adopted to train an efficient model that helps the target tasks [13–15]. TL not only reduces the cost of collecting training samples through data reuse but also enhances the generalization performance of the model; it is an expression of advanced intelligence. TL is commonly divided into three categories [14, 16], namely, instance-based transfer [17–19], feature-based transfer [20–23], and classifier (or parameter)-based transfer [24–26]. Moreover, with the success of deep learning and adversarial networks in computer vision and machine learning, deep transfer learning [27] and transfer adversarial learning approaches [28] have appeared, further enriching transfer learning in both theory and application.

TL can help ELM cope with the shortage of available training samples, and many variant ELMs with the ability of knowledge transfer have appeared. Depending on how adaptation between domains is performed, we divide transfer ELMs (TELMs) into the following three types. (1) Target-supervised methods: these usually require a few labeled samples from the target domain to adjust a model trained on the source domain. The domain adaptation extreme learning machine (DAELM) [29] was put forward to enable ELM to handle domain adaptation problems in the E-nose system. The online domain adaptation extreme learning machine (ODAELM) [30] and the online weighted domain transfer extreme learning machine (OWDTELM) [31] extend DAELM to online tasks. To further improve DAELM, Xia et al. [32] proposed boosting for DAELM (BDAELM), which introduces boosting technology to ensemble DAELMs. (2) Parameter transformation or approximation: this approach realizes knowledge transfer across domains through a transformation matrix or output weight approximation, such as the transfer extreme learning machine with output weight alignment (TELM-OWA) [33], the parameter transfer ELM (PTELM) [34], and extreme learning machine (ELM)-based domain adaptation (EDA) [35]. Li et al. [36] designed a transfer learning algorithm based on ELM (TL-ELM) by adding a constraint that forces the output weights of the two domains to be close to each other. (3) Statistical adaptation: this approach usually introduces a statistical distribution metric, such as MMD [37], into ELM to reduce the domain shift. Many methods, including cross-domain extreme learning machines (CdELMs) [38], the extreme learning machine based on maximum weighted mean discrepancy (ELM-MWMD) [39], and the domain space transfer ELM (DST-ELM) [40], apply MMD to reduce the distribution discrepancy between the hidden-layer outputs of the source and target domains.

In this paper, we propose a novel ELM called the joint transfer extreme learning machine (JTELM) for transfer learning. It first obtains one ELM model by introducing cross-domain mean approximation (CDMA) [41] into ELM, in which CDMA effectively minimizes the marginal and conditional distribution differences between the two domains. Second, we apply subspace alignment technology [42] to align the output weights of the two domains and simultaneously add an approximation term to force the output weights to be close to each other, which boosts knowledge transfer; we thereby obtain the other ELM model. Finally, the target samples are tested by the two learned ELMs. JTELM is illustrated in Figure 1. We carry out experiments on public datasets for transfer learning tasks to evaluate the performance of JTELM, and the results demonstrate the superiority of our method.

We summarize our contributions as follows:
(1) The CDMA measure is added to the objective function of ELM to reduce the distribution discrepancy between the hidden-layer outputs of the source and target domains, which yields one transfer ELM model.
(2) We apply output weight alignment and the approximation of the output weights from the two domains to improve the efficiency of knowledge transfer and simultaneously obtain the other transfer ELM.
(3) We use the two obtained transfer ELMs to jointly predict test samples, which enhances the robustness of JTELM. To evaluate the performance of our approach, we conduct classification experiments on object recognition and text datasets, and the results demonstrate that JTELM has a remarkable knowledge transfer ability.

We organize the rest of this paper as follows. ELM, CDMA, and SA are briefly described in Section 2. JTELM is described in detail in Section 3. Then, the experiments are analyzed in Section 4, and the conclusion of this paper is presented in Section 5.

2. Preliminaries

In this section, we briefly introduce ELM, CDMA, and SA.

2.1. Extreme Learning Machine (ELM)

ELM, as a single-hidden-layer feedforward network, randomly initializes its input weights and biases and then solves for the optimal output weight, which leads to its fast learning speed and high accuracy. Given a labeled dataset with samples $x_i$, $i = 1, \ldots, N$, and corresponding labels $t_i$, we can construct a classic ELM model with $L$ nodes in the hidden layer in the following manner:

$$f(x_i) = h(x_i)\beta = g(x_i W + b)\beta, \quad (1)$$

where $f(x_i)$ is the output of ELM for the input sample $x_i$, $g(\cdot)$ is the activation function, and $W$ and $b$ are the input weights and biases, which are often randomly initialized. $\beta$ is a vector representation of the output weight. If we want an optimal $\beta$, then the following loss function is solved:

$$\min_{\beta}\; \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\|e_i\|^{2} \quad \text{s.t. } h(x_i)\beta = t_i - e_i, \quad (2)$$

where $\|\beta\|^{2}$ is a sparsity constraint on the parameters that avoids model overfitting, $e_i$ is the classification error, and $C$ is its tradeoff parameter. We then convert equation (2) into the following matrix form:

$$\min_{\beta}\; \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\|E\|^{2} \quad \text{s.t. } H\beta = T - E, \quad (3)$$

where $H = [h(x_1)^{\top}, \ldots, h(x_N)^{\top}]^{\top}$, $T = [t_1^{\top}, \ldots, t_N^{\top}]^{\top}$, and $E = [e_1^{\top}, \ldots, e_N^{\top}]^{\top}$.

According to [2], we get the optimal $\beta$ as

$$\beta = \left(\frac{I}{C} + H^{\top}H\right)^{-1}H^{\top}T. \quad (4)$$

Finally, we predict a testing sample $x$ as

$$y = h(x)\beta, \quad (5)$$

where $h(x) = g(xW + b)$ is the hidden-layer output of $x$.
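To make the training and prediction steps concrete, here is a minimal NumPy sketch of equations (1)–(5). The sigmoid activation, the one-hot label matrix T, and the variable names are illustrative assumptions rather than fixed choices of the paper.

```python
import numpy as np

def elm_train(X, T, n_hidden=500, C=1.0, seed=0):
    """Train a regularized ELM: random hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))     # random input weights (never tuned)
    b = rng.standard_normal(n_hidden)          # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # hidden-layer outputs, sigmoid activation
    # Closed-form solution of equation (3): beta = (I/C + H^T H)^{-1} H^T T
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Equation (5): propagate through the fixed random hidden layer and read out."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)         # predicted class index per sample
```

Training amounts to a single linear solve, which is where the fast learning speed of ELM comes from.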

2.2. Cross-Domain Mean Approximation (CDMA)

The distribution discrepancy measure is critical in transfer learning. Zang et al. [41] presented CDMA, which is nonparametric, easy to understand, efficient, and beneficial for mining local information. In transfer learning, there are two datasets: $D_s = \{(x_i^{s}, y_i^{s})\}_{i=1}^{n_s}$ from the source domain and $D_t = \{x_j^{t}\}_{j=1}^{n_t}$ from the target domain, where $n_s$ ($n_t$) is the number of samples in $D_s$ ($D_t$) and $y_i^{s}$, taking values in the shared classes, is the label of $x_i^{s}$. Then, we can get the CDMA measure as

$$d_{\mathrm{CDMA}} = \sum_{i=1}^{n_s}\|x_i^{s} - \bar{u}_t\|^{2} + \sum_{j=1}^{n_t}\|x_j^{t} - \bar{u}_s\|^{2}, \quad (6)$$

where $\bar{u}_t$ ($\bar{u}_s$) is the mean vector of the target (source) domain samples. If we further consider the label information of the samples, CDMA can also be represented as

$$d_{\mathrm{CDMA}}^{c} = \sum_{c}\Bigl(\sum_{x_i^{s} \in D_s^{c}}\|x_i^{s} - \bar{u}_t^{c}\|^{2} + \sum_{x_j^{t} \in D_t^{c}}\|x_j^{t} - \bar{u}_s^{c}\|^{2}\Bigr), \quad (7)$$

where $D_s^{c}$ ($D_t^{c}$) denotes the samples of category $c$ in the source (target) domain and $\bar{u}_t^{c}$ ($\bar{u}_s^{c}$) is the mean vector of the target (source) domain samples with category $c$.
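For illustration, the sketch below evaluates the marginal measure of equation (6) and the class-conditional measure of equation (7) directly from feature matrices; the unweighted squared-distance form follows the description above and should be treated as an assumed formulation of [41].

```python
import numpy as np

def cdma_marginal(Xs, Xt):
    """Marginal CDMA (equation (6)): each sample measured against the other domain's mean."""
    mu_s, mu_t = Xs.mean(axis=0), Xt.mean(axis=0)
    return np.sum((Xs - mu_t) ** 2) + np.sum((Xt - mu_s) ** 2)

def cdma_conditional(Xs, ys, Xt, yt_pseudo):
    """Conditional CDMA (equation (7)): the same distances, accumulated per shared class."""
    total = 0.0
    for c in np.unique(ys):
        Xs_c, Xt_c = Xs[ys == c], Xt[yt_pseudo == c]
        if len(Xs_c) == 0 or len(Xt_c) == 0:
            continue                         # skip classes absent from either domain
        total += np.sum((Xs_c - Xt_c.mean(axis=0)) ** 2)
        total += np.sum((Xt_c - Xs_c.mean(axis=0)) ** 2)
    return total
```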

2.3. Subspace Alignment (SA)

In transfer learning, especially feature-based transfer, SA usually aligns two feature subspaces of the source and target domains obtained by other feature extraction methods, thereby making the distributions of the two domains consistent. If we have learned the two subspace transformation matrices $X_s$ and $X_t$, then a transformation matrix $M$ is obtained by solving the following function:

$$M^{*} = \arg\min_{M}\|X_s M - X_t\|_{F}^{2}, \quad (8)$$

where $\|\cdot\|_{F}$ is the Frobenius norm. We add the orthogonalization operation $X_s^{\top}X_s = I$ into equation (8) and get

$$M^{*} = \arg\min_{M}\|X_s^{\top}X_s M - X_s^{\top}X_t\|_{F}^{2} = \arg\min_{M}\|M - X_s^{\top}X_t\|_{F}^{2}. \quad (9)$$

From equation (9), we can see that $M^{*} = X_s^{\top}X_t$. We set $X_a = X_s M^{*} = X_s X_s^{\top}X_t$, and from this it is clear that the sample distribution in the subspace $X_a$ is more similar to the one in $X_t$ than the one in $X_s$ is, which facilitates knowledge transfer.
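The whole alignment step therefore reduces to a matrix product once the subspaces are orthonormal. The sketch below illustrates this with PCA bases and an assumed subspace dimension k.

```python
import numpy as np

def subspace_alignment(Xs, Xt, k=20):
    """SA: align the source PCA subspace to the target one, M* = Xs_b^T Xt_b (equation (9))."""
    # Orthonormal bases of the two domains (top-k principal directions).
    Xs_b = np.linalg.svd(Xs - Xs.mean(axis=0), full_matrices=False)[2][:k].T
    Xt_b = np.linalg.svd(Xt - Xt.mean(axis=0), full_matrices=False)[2][:k].T
    M = Xs_b.T @ Xt_b                  # closed-form minimizer of ||Xs_b M - Xt_b||_F
    Xa = Xs_b @ M                      # aligned source basis, Xa = Xs_b Xs_b^T Xt_b
    return Xa, Xt_b                    # project source data onto Xa, target data onto Xt_b
```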

3. Joint Transfer Extreme Learning Machine (JTELM)

In response to the shortcoming that ELM has no ability of knowledge transfer, we propose a novel transfer ELM, abbreviated as JTELM, for handling unsupervised transfer learning tasks in which no labeled target samples appear. In unsupervised transfer learning, the source domain $D_s = \{(x_i^{s}, y_i^{s})\}_{i=1}^{n_s}$ and the target samples $X_t = \{x_j^{t}\}_{j=1}^{n_t}$ are given, but the target labels are unavailable, so we expect that a JTELM learned from $D_s$ and $X_t$ can precisely predict the samples in the target domain.

3.1. Extreme Learning Machine with CDMA

To equip ELM with the ability of knowledge transfer, we first use the random hidden-layer mapping $h(\cdot)$ to map the source and target samples into the hidden-layer outputs $H_s$ and $H_t$, and then construct ELM with CDMA by introducing CDMA into the loss function of ELM. CDMA minimizes the distribution difference between the source- and target-domain data in the output layer. We then get the loss function of ELM with CDMA as follows:

In equation (10), the first two terms are the loss of ELM, and the third term is the loss of CDMA in the output layer; $\lambda$ is a tradeoff parameter between the two losses. $\hat{y}_j^{t}$ is the pseudo label of the target sample $x_j^{t}$ obtained in the label refinement process, $H_s^{c}$ ($H_t^{c}$) represents the samples of category $c$ in $H_s$ ($H_t$), $\bar{u}_s$ and $\bar{u}_t$ are the mean vectors of the outputs of $H_s$ and $H_t$, respectively, and $\bar{u}_t^{c}$ ($\bar{u}_s^{c}$) is the mean vector of the target (source) domain outputs with category $c$.

By solving equation (10) according to [2], we can obtain the output weight $\beta_1$ of one ELM with knowledge transfer ability as follows:
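Because equation (11) is not reproduced above, the sketch below gives one plausible reconstruction of this step: the marginal CDMA term on the output layer is expressed as a quadratic penalty and folded into a ridge-style solve analogous to equation (4). The matrix A, the variable names, and the resulting formula are assumptions, not the authors' exact derivation.

```python
import numpy as np

def elm_cdma_train(Hs, Ts, Ht, C=1.0, lam=0.1):
    """Closed-form output weight for an ELM with a marginal-CDMA penalty on its outputs.

    Assumed objective (a reconstruction, not necessarily equation (11)):
        min_b 0.5*||b||^2 + 0.5*C*||Ts - Hs @ b||^2 + 0.5*lam*||A @ H @ b||^2
    where ||A @ H @ b||^2 equals the marginal CDMA measure on the output layer.
    """
    ns, nt = Hs.shape[0], Ht.shape[0]
    L = Hs.shape[1]
    H = np.vstack([Hs, Ht])
    # Each row of A subtracts the other domain's mean: source row i yields
    # h_i^s @ b - mean(Ht @ b); target row j yields h_j^t @ b - mean(Hs @ b).
    A = np.block([
        [np.eye(ns),               -np.ones((ns, nt)) / nt],
        [-np.ones((nt, ns)) / ns,   np.eye(nt)],
    ])
    G = H.T @ A.T @ A @ H                       # CDMA penalty as a quadratic form in b
    beta1 = np.linalg.solve(np.eye(L) + C * Hs.T @ Hs + lam * G,
                            C * Hs.T @ Ts)
    return beta1
```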

3.2. Extreme Learning Machine with Output Weight Alignment and Approximation

Suppose that there is an output weight in the target domain; then we can construct a loss function as follows (equation (12)), where the first term denotes the classification error in the source domain, the second term denotes the output weight approximation that forces the output weights of the two domains to be close to each other, facilitating knowledge transfer, and the remaining coefficients are the balance parameters.

Next, we apply SA to align the output layer of the source ELM to the target one. First, we obtain a transformation matrix $M$ as in equation (9), set the aligned source output weight accordingly, replace the original weight with it, and substitute it into equation (12) to get equation (13).

At this point, the classification error term changes into the source classification error under output weight alignment. We substitute it into equation (13) and get equation (14).

Consequently, equation (14) can be rewritten as equation (15).

By setting two auxiliary variables, equation (15) can be simplified as equation (16).

Letting the derivative of equation (16) with respect to the output weight be zero, we obtain the closed-form solution in equation (17).
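Equations (12)–(17) likewise lead to a closed-form weight. The sketch below is a hypothetical reconstruction in which the alignment matrix M of equation (9) acts between the source hidden outputs and the target output weight, and the approximation term pulls that weight toward the β1 learned in Section 3.1; the objective in the docstring is an assumption, not the paper's exact equation (16).

```python
import numpy as np

def elm_owa_wa_train(Hs, Ts, M, beta1, C=1.0, gamma=0.1):
    """Closed-form target output weight with output weight alignment + approximation.

    Assumed objective (a hypothesis standing in for equations (12)-(17)):
        min_b 0.5*||b||^2 + 0.5*C*||Ts - Hs @ M @ b||^2 + 0.5*gamma*||b - beta1||^2
    where M is the alignment matrix of equation (9) and beta1 comes from Section 3.1.
    """
    HsM = Hs @ M                                   # source hidden outputs after alignment
    L = HsM.shape[1]
    lhs = (1.0 + gamma) * np.eye(L) + C * HsM.T @ HsM
    rhs = C * HsM.T @ Ts + gamma * beta1
    return np.linalg.solve(lhs, rhs)               # beta2, the target-domain output weight
```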

3.3. Joint Decision

Up to now, we have learned two transfer ELMs with output weights $\beta_1$ and $\beta_2$, and the target samples are predicted jointly by both of them according to equation (5). We summarize JTELM in Algorithm 1.

Input: Source dataset $D_s$ and target dataset $X_t$, number of hidden-layer nodes $L$, and the tradeoff parameters.
Output: Predicted result $\hat{Y}_t$.
Step 1: Use $D_s$ and $X_t$ to calculate $\beta_1$ according to equation (11).
Step 2: Solve the output weight $\beta_2$ according to equation (17).
Step 3: Use $\beta_1$ and $\beta_2$ to predict $X_t$ and get its labels $\hat{Y}_t$.
Step 4: Repeat Steps 1–3 until no change on $\hat{Y}_t$.
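Since the combination rule of the joint decision is not written out explicitly above, the sketch below assumes a simple equal-weight sum of the two models' output scores before the arg-max, and outlines the refinement loop of Algorithm 1 using the hypothetical helpers from the previous sketches.

```python
import numpy as np

def jtelm_predict(Ht, beta1, beta2):
    """Joint decision: fuse the output scores of the two learned transfer ELMs."""
    scores = Ht @ beta1 + Ht @ beta2       # assumed equal-weight score fusion
    return np.argmax(scores, axis=1)

# Sketch of the outer loop of Algorithm 1 (pseudo-label refinement), reusing the
# hypothetical helpers elm_cdma_train and elm_owa_wa_train defined earlier:
#   y_old = None
#   while True:
#       beta1 = elm_cdma_train(Hs, Ts, Ht, C, lam)            # Step 1, equation (11)
#       beta2 = elm_owa_wa_train(Hs, Ts, M, beta1, C, gamma)  # Step 2, equation (17)
#       y_new = jtelm_predict(Ht, beta1, beta2)               # Step 3
#       (a conditional-CDMA variant would also pass y_new back into Step 1)
#       if y_old is not None and np.array_equal(y_new, y_old):
#           break                                             # Step 4: labels stable
#       y_old = y_new
```
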
3.4. Discussion

Inspired by TELM-OWA [33], we put forward JTELM to address the problem of unsupervised transfer learning. It has the following characteristics:
(1) Similar to TELM-OWA, output weight alignment (equation (8)) and weight approximation are used to learn a transfer ELM parameter $\beta_2$, but JTELM is an unsupervised TL method in which no labeled samples exist in the target domain. Therefore, JTELM faces a higher degree of difficulty and challenge.
(2) The authors in [41] have proved that CDMA is a more efficient distribution discrepancy metric than MMD. We apply it to ELM to add the ability of transferring knowledge from the source domain to the target domain. Thus, $\beta_1$ in equation (11) is a parameter of the model shared between the domains.
(3) JTELM utilizes $\beta_1$ and $\beta_2$ to jointly make decisions for test samples, which not only unifies statistical adaptation and parameter transformation into one learning framework to improve knowledge transfer but also enhances the robustness of our approach, similar to ensemble learning.

4. Experiment and Analysis

In this section, we demonstrate the validity of our JTELM by performing experiments on image and text datasets commonly used in transfer learning for the classification task. All experiments are run on a PC with 8 GB memory, the Windows 10 operating system, and MATLAB 2017b. Every experiment is run 20 times, and the average value is recorded. We evaluate all algorithms in the experiments with the accuracy metric, as in [21].

4.1. Datasets Description

Office31 + Caltech256 (shown in Figure 2): These datasets were first published in the year referred to in [43]. The benchmark contains two domains, namely, Office31 and Caltech256. Office31 consists of 4,652 images in 31 categories, collected from 3 subdomains, that is, Amazon (A), DSLR (D), and Webcam (W). Caltech (C) is also an object image dataset, consisting of 30,607 images from 256 categories.

During the experiment, we select 1,410 images with 10 categories from Office31 and 1,123 images with 10 categories from Caltech. SURF features with 800 dimensions are extracted from every image. Any two subdomains among A, W, D, and C are chosen as the source and target domain datasets, and 12 cross-domain tasks are built, i.e., C⟶A, C⟶W, C⟶D, …, and D⟶W (shown in Table 1).

USPS + MNIST (shown in Figure 3): USPS and MNIST are two image datasets depicting the digits 0 to 9, so they share 10 categories but have different distributions. USPS consists of 9,298 images with 16 × 16 pixels, and MNIST has 70,000 images with 28 × 28 pixels. During the experiment, 1,800 pictures from USPS and 2,000 pictures from MNIST are selected as the source domain and the target domain (shown in Table 1). Every image is converted into 16 × 16 pixels, and two cross-domain tasks, i.e., USPS vs. MNIST and MNIST vs. USPS, are constructed for transfer learning.

MSRC + VOC2007 (shown in Figure 4): MSRC is an object image dataset consisting of 4,323 images from 18 categories. VOC2007 is an image dataset of photos from Flickr, consisting of 5,011 images from 20 categories. The two datasets have similar but different distributions, as can be seen in Figure 4. In this experiment, we collect samples from the 6 categories shared by the two datasets, including aircraft, birds, cows, family cars, sheep, and bicycles. Then, we construct two transfer learning tasks, MSRC vs. VOC and VOC vs. MSRC, in which 1,269 images are selected from MSRC and 1,530 images are selected from VOC2007. In addition, we rescale all images to 256 gray pixels in length and extract 240-dimensional features as a new representation (shown in Table 1).

Reuters-21578: Reuters-21578 is a text dataset commonly used for text data mining and analysis. It has 21,577 news documents organized into 5 top categories, namely, “exchanges,” “orgs,” “people,” “places,” and “topics.” In this experiment, we select the three largest categories, “orgs,” “people,” and “place,” and construct 6 transfer learning tasks, i.e., orgs vs. people, people vs. orgs, orgs vs. place, place vs. orgs, people vs. place, and place vs. people, as shown in Table 1.

4.2. Experimental Settings

We choose the following classifiers to compare with JTELM:
1NN: one-nearest-neighbor classifier.
SVM: support vector machine with a linear kernel; its penalty parameter is selected from a predefined range.
ELM: standard extreme learning machine.
SSELM: semisupervised ELM with graph regularization [24].
TCA1(2): TCA [20] + NN (SVM).
JDA1(2): JDA [21] + NN (SVM). We set the dimension of the feature subspace in TCA and JDA, and the sparsity constraint parameter of the projection matrix is selected from a predefined range.
DAELM_S, DAELM_T [29]: domain adaptation ELMs.
AELM [44]: ELM with feature augmentation (AELM); its results in this section are cited from [44].
ARRLS [45]: a general transfer learning framework; we set its parameters according to [45].
TELM-OWA [33]: supervised transfer ELM; we set its parameters as referred to in [33].
CdELM-C [38]: unsupervised transfer ELM using MMD; we cite its results reported in [38] for comparison.

In addition, we set the penalty parameter in ELM, SSELM, DAELM_S, DAELM_T, and TELM-OWA to an appropriate value. In JTELM, the tradeoff parameters and the number of hidden-layer nodes are set separately for the Office + Caltech, USPS + MNIST, Reuters-21578, and MSRC + VOC2007 datasets.

To evaluate DAELM_S, DAELM_T, and TELM-OWA in the experiments for unsupervised transfer learning, we select a few labeled target samples to train the models: 0.5% of the target samples on the USPS + MNIST, MSRC + VOC2007, and Reuters-21578 datasets and 1% on the Office + Caltech dataset.

4.3. Results and Analysis

To investigate the performance of JTELM, we carry out classification tasks on image and text datasets, including the Office + Caltech, USPS + MNIST, MSRC + VOC2007, and Reuters-21578 datasets, and the results are reported in Tables 2 and 3. From the results, we make the following observations:
(1) JTELM has the highest total average accuracy of all algorithms in Tables 2 and 3. It gains improvements of 10.54% and 8.73% over the baseline ELM in Tables 2 and 3, respectively, indicating that our method has a better ability of knowledge transfer with the help of CDMA, output weight alignment, and weight approximation. It enriches ELM in both theory and application.
(2) TELM-OWA and DAELM, as supervised transfer learning mechanisms that require part of the labeled target samples, are not ideal under unsupervised learning. TCA, JDA, ARRLS, and CdELM-C apply MMD to reduce the distribution discrepancy between the two domains and gain good results. SSELM utilizes graph regularization to exploit the information in unlabeled target samples, and it performs well.
(3) TCA1(2) and JDA1(2) implement the classification task by combining the transfer feature extraction methods (TCA and JDA) with baseline classifiers; therefore, they outperform 1NN and SVM. ELM performs slightly better than 1NN and SVM owing to its good generalization ability.

In Table 4, we record the running time of JTELM and some compared algorithms. It can be seen that (1) ELM has the least running time among all methods since it does not tune the input weights and biases. (2) DAELM_S and DAELM_T cost slightly more time than ELM because part of the target samples participates in training the model. TELM-OWA consumes more time than ELM, DAELM_S, and DAELM_T because it solves two weights, similar to JTELM, and JTELM needs even more time to solve the two weights and refine the target pseudo labels. (3) JDA has the longest running time because it requires the construction of the MMD matrix, feature extraction, and label refinement. TCA1(2) and JDA1(2) cost more time than 1NN and SVM because they extract cross-domain features, and JDA spends more time than TCA on label refinement. (4) Due to the construction of the Laplacian matrix, SSELM costs more time than ELM.

4.4. Ablation Study

In JTELM, three mechanisms, namely, CDMA, output weight alignment (OWA), and weight approximation (WA), are applied to realize knowledge transfer. Since joint decision making must be executed under output weight alignment, we regard output weight alignment and joint decision making together as OWA; its influence is mainly adjusted by the corresponding tradeoff parameter. The impact of different combinations of the three mechanisms on JTELM's accuracy is shown in Table 5, which shows that (1) both CDMA and OWA can independently enhance ELM's knowledge transfer capability; (2) combining CDMA and OWA for knowledge transfer is better than using either alone; and (3) WA is also beneficial to knowledge transfer. Moreover, we compare ELM, ELM-CDMA, and JTELM in time consumption. ELM-CDMA is a special case of JTELM without output weight alignment and weight approximation, and the results in Table 6 show that CDMA, OWA, and WA help ELM perform better in transfer learning, but they require more time.

4.5. Parameter Analysis

We investigate the sensitivity of JTELM to its tradeoff parameters, to the number of hidden-layer nodes, and to its convergence, and we perform experiments on the orgs vs. people, MSRC vs. VOC, MNIST vs. USPS, and A vs. D datasets. The results are shown in Figures 5(a) to 5(f), and we make the following observations: (1) As the four tradeoff parameters increase, the accuracy of JTELM first rises and then falls on the 4 datasets, as shown in Figures 5(a) to 5(d). We can see that CDMA, the source classification error with OWA, and the output weight approximation, when adjusted to the appropriate range, can improve the accuracy and knowledge transfer ability of ELM in the transfer learning setting. (2) As shown in Figure 5(e), the accuracy first increases and then slightly decreases on the 4 datasets as the number of hidden-layer nodes grows. When this number increases, the nonlinear approximation ability of the network improves. However, for some datasets, too many hidden nodes may enlarge the distribution discrepancy of the hidden-layer outputs of the two domains, leading to the model's poor performance. (3) We also observe how the accuracy varies with the iteration number in Figure 5(f). It shows that the accuracy of JTELM gradually becomes stable and finally converges after about 10 iterations.

5. Conclusion

In this paper, we propose JTELM to address the problem that ELM degrades in transfer learning scenarios. It first applies CDMA to ELM, and one transfer ELM model is learned. Then, similar to TELM-OWA, it uses output weight alignment and output weight approximation to learn the other transfer ELM on the source domain. Finally, it adopts the two learned transfer ELMs to jointly predict the samples from the target domain. Extensive experiments have been performed on open image and text datasets, and the results show that JTELM achieves higher accuracy and stronger knowledge transfer ability than several state-of-the-art classifiers.

Data Availability

The data used to support the findings of this study are found in https://github.com/jindongwang/transferlearning/blob/master/data/dataset.md.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation for Distinguished Young Scholars of China under Grant No. 11905244, in part by the Key Scientific Research Projects of Universities in Henan Province under Grant No. 22A120005, in part by the National Aviation Fund Projects under Grant No. 201701420002, and in part by the Henan Province Key Scientific and Technological Projects under Grant Nos. 222102210095 and 212102210153.