Abstract

In the multilabel learning framework, each instance is no longer associated with a single semantic concept but instead exhibits concept ambiguity. Specifically, the ambiguity of an instance in the input space means that there are multiple corresponding labels in the output space. In most existing multilabel classification methods, a binary annotation vector is used to denote the multiple semantic concepts: +1 denotes that a label is relevant to the instance, while −1 means the opposite. However, this label representation contains too little semantic information to truly express the differences among multiple labels. Therefore, we propose a new approach to transform binary labels into real-valued labels. We adopt low-rank decomposition to obtain latent label information and then incorporate this information with the original features to generate new features. Then, using sparse representation to reconstruct the new instances, the reconstruction error can also be applied in the label space. In this way, we finally achieve the purpose of label conversion. Extensive experiments validate that the proposed method achieves results comparable to or even better than those of other state-of-the-art algorithms.

1. Introduction

Classification is a frequently used term in machine learning. In common usage, classification generally refers to single-label classification, that is, assigning a single category to an object. In multilabel learning, classification means multilabel classification. Specifically, an instance is associated with more than one class label simultaneously. Multilabel learning has many application fields, such as web mining [1–3], text categorization [4–6], multimedia content annotation [7–11], and bioinformatics [12–14].

In recent years, the field of multilabel learning has gradually attracted significant attention. A variety of algorithms have been proposed, which can be basically divided into two categories [15]: algorithm adaptation and problem transformation. The core idea of the former is to adapt an existing supervised learning algorithm so that it can solve multilabel learning problems, such as ML-kNN [16], while the latter converts the multilabel learning problem into other known problems, such as BR [17]. Some multilabel algorithms solve the multilabel learning problem without using the correlation among different labels, such as LIFT [18]. The main idea of LIFT is to obtain the discriminative characteristics of each label and build a new feature space. It first obtains the positive and negative examples corresponding to each label, then performs cluster analysis on the corresponding sets of examples to obtain the cluster centers, and finally uses the cluster centers to construct label-specific features. In the process of solving the multilabel learning problem, LIFT does not consider label correlations; hence, it can be regarded as a feature conversion method. Other algorithms consider label correlations [19–25] when solving the multilabel learning problem. For example, the basic idea in [20] is to model the correlation among labels with a Bayesian network and to achieve efficient learning by using an approximate strategy. Indeed, the rational use of the correlation among labels can effectively boost the performance of multilabel classification. For example, if an image has the labels "football" and "rainforest," it is likely to also be labeled "Brazil." Similarly, a document annotated with "desert" has a low probability of being labeled "river." Therefore, how to effectively explore and make full use of label correlations is a crucial problem in multilabel learning.

In fact, for an object with multiple labels, the importance of the related labels still differs. Although the importance of each label is not given directly, we can judge it through external observation. Generally speaking, the larger the proportion a concept occupies in the original object, the more important the corresponding label. Accordingly, how to accurately express the importance of each label is also a challenge.

The method in [26] decomposes the original output space in order to obtain latent label semantic information, which can effectively improve the subsequent feature selection. Motivated by the decomposition of the label space in [26], in this paper we propose a method named label low-rank decomposition (LLRD) for multilabel classification. The LLRD algorithm first performs low-rank decomposition on the label matrix; second, it combines the decomposed result with the original features to form new features and mines the structural information of the features through sparse reconstruction; third, it transforms the binary labels into real-valued ones, finally converting the classification problem into a regression problem.

The contributions of this paper are as follows:
(1) We utilize low-rank decomposition to reveal global label correlations and achieve good classification results.
(2) We combine the low-rank decomposition results with the original features, reducing the information loss in the subsequent label transformation process.
(3) We carry out extensive experiments on datasets from different fields to verify the effectiveness of the different algorithms.

2. Materials and Methods

2.1. Datasets

In this experiment, a total of 13 datasets were used, covering four fields: audio, text, image, and biology. All these datasets can be collected from Mulan (http://mulan.sourceforge.net/datasets.html) and Meka (http://meka.sourceforge.net/#datasetsru). Table 1 gives the specific details of the datasets. The number of instances, the size of the label space, and the dimension of the features are denoted by |S|, L(S), and D(S), respectively. LDen(S) is the label density, i.e., the label cardinality LCard(S) normalized by the number of labels.

2.2. Notations

Formally, suppose $\mathcal{X} = \mathbb{R}^d$ is the d-dimensional input space and $\mathcal{Y} = \{l_1, l_2, \ldots, l_q\}$ denotes the output domain of q class labels. Let $D = \{(x_i, y_i) \mid 1 \le i \le p\}$ be the multilabel training dataset with p examples, where $x_i \in \mathcal{X}$ is a d-dimensional instance vector and $y_i \in \{-1, +1\}^q$ is the label vector corresponding to $x_i$. Let $X = [x_1, \ldots, x_p]^T \in \mathbb{R}^{p \times d}$ represent the input data matrix, and let $X_{-i}$ denote the matrix obtained by removing $x_i$ from $X$. Let $Y = [y_1, \ldots, y_p]^T \in \{-1, +1\}^{p \times q}$ be the matrix composed of the label vectors.

2.3. The Process of LLRD

First, LLRD decomposes the label matrix with a low-rank method. In the framework of multilabel learning, the label matrix is often considered to be low rank [27, 28] due to the existence of label correlations. The low-rank structure is also a way to explore the global relationship between labels. Therefore, we can perform low-rank decomposition on the label matrix. Assuming that the rank of $Y$ is $r < q$, $Y$ can be written as follows:

$$Y \approx CB,$$

where $B \in \mathbb{R}^{r \times q}$ represents the dependency of the latent factors on the original label space and $C \in \mathbb{R}^{p \times r}$ is a low-dimensional mapping of the original labels that also contains label correlation information.
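As an illustration, the rank-r factorization can be sketched with a truncated SVD (a minimal NumPy sketch; the factor names C and B follow the notation above, and SVD is only one convenient way to obtain the factors, not necessarily the one used in the original method):

```python
import numpy as np

def low_rank_labels(Y, r):
    """Factor a p x q binary label matrix Y (+1/-1) as Y ~ C @ B,
    where C (p x r) is a latent label representation and
    B (r x q) maps the latent space back to the original labels."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :r] * s[:r]          # latent labels, one row per instance
    B = Vt[:r, :]                 # dependency on the original label space
    return C, B

# toy example: 4 instances, 3 labels, rank-2 latent space
Y = np.array([[ 1,  1, -1],
              [ 1,  1, -1],
              [-1,  1,  1],
              [-1, -1,  1]], dtype=float)
C, B = low_rank_labels(Y, r=2)
print(C.shape, B.shape)           # (4, 2) (2, 3)
```

The rows of C then serve as the latent label features that are appended to the original features in the next step.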

Second, we combine $C$ with $X$ to form a new feature space $Z = [X, C] \in \mathbb{R}^{p \times (d + r)}$. In order to reveal the inner structure of the feature space, we use the sparse reconstruction method [29] to model the relationship between the training instances. Specifically, we use $W \in \mathbb{R}^{p \times p}$ to represent the training object relationship matrix, where $w_{ij}$ is a measure of the relationship between $z_i$ and $z_j$. Let $w_i$ denote the sparse reconstruction coefficient vector related to $z_i$. According to sparse representation theory, $w_i$ can be calculated as follows:

$$\min_{w_i} \frac{1}{2}\left\|z_i - Z_{-i}^{T} w_i\right\|_2^2 + \lambda \left\|w_i\right\|_1,$$

where $Z_{-i}$ represents the combination of all training instances except $z_i$. We can solve the above problem using the alternating direction method of multipliers (ADMM) [30].
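A minimal sketch of the per-instance sparse reconstruction is given below. For brevity it uses ISTA (proximal gradient with soft-thresholding) rather than the ADMM solver of the original method; `lam` and the iteration count are illustrative choices:

```python
import numpy as np

def sparse_weights(Z, i, lam=0.1, n_iter=500):
    """Sparse reconstruction coefficients of instance i:
       min_w 0.5 * ||z_i - Z_{-i}^T w||^2 + lam * ||w||_1
    The paper solves this with ADMM; ISTA is used here as a simpler stand-in."""
    z = Z[i]
    D = np.delete(Z, i, axis=0).T                 # columns = other instances
    w = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2        # 1/L, L = sigma_max(D)^2
    for _ in range(n_iter):
        w -= step * (D.T @ (D @ w - z))           # gradient step on the l2 term
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 5))                  # 20 instances, 5 features
w = sparse_weights(Z, 0)
print(w.shape)                                    # (19,)
```

Stacking the vectors returned for each i (with a zero inserted at the diagonal position) yields the relationship matrix W used in the next step.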

Third, we transform the original binary label vector $y_i$ associated with each $x_i$ in the training set into a real-valued label vector $\mu_i = (\mu_{i1}, \ldots, \mu_{iq})$, where each $\mu_{ij}$ keeps the sign of the corresponding binary label $y_{ij}$. Because the real values contain more information, we can also infer the importance of each label from the magnitude of the value. Since the input space and the label space are often interrelated, it is assumed that the relationship between $z_i$ and the other instances in the input space also holds between $\mu_i$ and the other label vectors in the label space. Accordingly, the representation errors of the different elements in the label space can be written as follows:

$$\min_{\mu_1, \ldots, \mu_p} \sum_{i=1}^{p} \left\|\mu_i - \sum_{j \ne i} w_{ij} \mu_j\right\|_2^2,$$

where the $w_{ij}$ are the sparse reconstruction coefficients obtained above. This quadratic programming problem can be solved by mature tools related to quadratic programming. The original multilabel classification problem is thereby transferred into a multioutput regression problem, for which many solutions exist [31]. The learning of the LLRD method thus contains three phases: low-rank decomposition, sparse reconstruction, and multioutput regression, and its total time complexity is the sum of the complexities of these three phases (with multioutput support vector regression chosen to realize the final classification).
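The transfer of the reconstruction relation into the label space can be sketched as follows. This is a hypothetical stand-in for the original quadratic program: we assume, for illustration, the constraint that each real-valued label keeps the sign of its binary counterpart with a minimum magnitude `eps`, and solve by projected gradient descent instead of a dedicated QP solver:

```python
import numpy as np

def real_valued_labels(Y, W, eps=0.1, lr=0.05, n_iter=300):
    """Transfer the sparse-reconstruction relation into the label space:
       min_M ||M - W M||_F^2
    subject to an ASSUMED constraint that each entry keeps the sign of the
    binary label with magnitude at least eps (a stand-in for the QP of the
    original method), solved here by projected gradient descent."""
    M = Y.astype(float).copy()
    for _ in range(n_iter):
        R = M - W @ M                  # reconstruction residual in label space
        M -= lr * 2.0 * (R - W.T @ R)  # gradient of ||(I - W) M||_F^2
        # projection: stay on the side of zero dictated by the binary label
        M = np.where(Y > 0, np.maximum(M, eps), np.minimum(M, -eps))
    return M

rng = np.random.default_rng(1)
Y = np.where(rng.random((6, 4)) > 0.5, 1, -1)      # toy binary labels
W = rng.random((6, 6))
np.fill_diagonal(W, 0.0)                           # no self-reconstruction
W /= W.sum(axis=1, keepdims=True)                  # row-normalized weights
M = real_valued_labels(Y, W)
print(M.shape)                                     # (6, 4)
```

The resulting real-valued matrix M then serves as the regression target for the final multioutput regression phase.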

3. Results and Discussion

3.1. Experiment Setup

In this subsection, we compare our LLRD with six other multilabel learning methods on six multilabel evaluation criteria, which fall into two categories: example-based and label-based metrics [32]. An example-based metric first measures the performance of the learning system on each test example and then returns the average over the entire test set. In contrast, a label-based metric first evaluates the performance of the system on each label and then computes the macro-/microaveraged F1 value over all labels.

In this paper, one-error, coverage, ranking loss, and average precision are employed for example-based performance evaluation, and macroaveraging F1 and microaveraging F1 are the label-based metrics. For the example-based metrics except average precision, smaller values indicate better performance; for the remaining metrics, larger values indicate better performance.

Let $T = \{(x_i, y_i) \mid 1 \le i \le n\}$ be the multilabel test set, and let $f(x, l)$ be the confidence of $l$ being a label associated with $x$. In addition, $f(\cdot, \cdot)$ can be converted into a ranking function $\mathrm{rank}_f(x, l)$. If $f(x, l_1) > f(x, l_2)$ holds, then the corresponding ranking function has $\mathrm{rank}_f(x, l_1) < \mathrm{rank}_f(x, l_2)$.

The six evaluation criteria used in the paper are defined as follows, with $Y_i$ the set of relevant labels of $x_i$, $\bar{Y}_i$ its complement, and $[\![\pi]\!]$ equal to 1 if predicate $\pi$ holds and 0 otherwise:
(1) One-error: $\frac{1}{n}\sum_{i=1}^{n} [\![\arg\max_{l \in \mathcal{Y}} f(x_i, l) \notin Y_i]\!]$
(2) Coverage: $\frac{1}{n}\sum_{i=1}^{n} \max_{l \in Y_i} \mathrm{rank}_f(x_i, l) - 1$
(3) Ranking loss: $\frac{1}{n}\sum_{i=1}^{n} \frac{1}{|Y_i||\bar{Y}_i|}\left|\{(l', l'') \mid f(x_i, l') \le f(x_i, l''), (l', l'') \in Y_i \times \bar{Y}_i\}\right|$
(4) Average precision: $\frac{1}{n}\sum_{i=1}^{n} \frac{1}{|Y_i|}\sum_{l \in Y_i} \frac{|\{l' \in Y_i \mid \mathrm{rank}_f(x_i, l') \le \mathrm{rank}_f(x_i, l)\}|}{\mathrm{rank}_f(x_i, l)}$
(5) Macroaveraging F1: $\frac{1}{q}\sum_{j=1}^{q} \frac{2\,TP_j}{2\,TP_j + FP_j + FN_j}$
(6) Microaveraging F1: $\frac{2\sum_{j=1}^{q} TP_j}{\sum_{j=1}^{q}\left(2\,TP_j + FP_j + FN_j\right)}$
where $FN_j$, $TN_j$, $FP_j$, and $TP_j$ indicate the number of false-negative, true-negative, false-positive, and true-positive instances with regard to the $j$-th label.
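Assuming scores are given as a matrix F (n instances by q labels) and relevant labels are marked with 1, the four example-based metrics above can be sketched in NumPy as follows (ties in scores are broken arbitrarily in this sketch):

```python
import numpy as np

def _ranks(F):
    """Rank of each label per instance; rank 1 = highest score."""
    return (-F).argsort(axis=1).argsort(axis=1) + 1

def one_error(F, Y):
    """Fraction of examples whose top-ranked label is not relevant."""
    top = F.argmax(axis=1)
    return float(np.mean(Y[np.arange(len(Y)), top] != 1))

def coverage(F, Y):
    """Average ranking depth needed to cover all relevant labels, minus 1."""
    r = _ranks(F)
    return float(np.mean([r[i][Y[i] == 1].max() for i in range(len(Y))])) - 1

def ranking_loss(F, Y):
    """Average fraction of (relevant, irrelevant) pairs ordered wrongly."""
    losses = []
    for f, y in zip(F, Y):
        pos, neg = f[y == 1], f[y != 1]
        if len(pos) and len(neg):
            losses.append(np.mean(pos[:, None] <= neg[None, :]))
    return float(np.mean(losses))

def average_precision(F, Y):
    """For each relevant label, precision among labels ranked above it."""
    r = _ranks(F)
    ap = []
    for ri, y in zip(r, Y):
        rel = ri[y == 1]
        ap.append(np.mean([np.sum(rel <= rk) / rk for rk in rel]))
    return float(np.mean(ap))

F = np.array([[0.9, 0.2, 0.6]])   # scores: label 0 ranked first, then 2, then 1
Y = np.array([[1, 0, 1]])         # labels 0 and 2 are relevant
print(one_error(F, Y), coverage(F, Y))   # 0.0 1.0
```

The two label-based F1 metrics follow directly from the per-label confusion counts and are omitted here for brevity.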

In order to test the effectiveness of LLRD, we chose six multilabel learning algorithms, MLFE [33], RAKEL [34], ML2 [35], CLR [36], LIFT [18], and RELIAB [37], for performance comparison. MLFE makes full use of the intrinsic information in the feature space, making the semantics of the label space more abundant; its three tradeoff parameters are searched from {1, 2, …, 10}, {1, 10, 15}, and {1, 10}, respectively. RAKEL is a high-order approach whose basic idea is to transform the multilabel learning problem into an ensemble of multiclass classification problems; we use the default subset size and ensemble size recommended by the RAKEL algorithm. ML2 is the first multilabel learning algorithm that attempts to explore manifolds at the label level; its parameters are selected from {1, 2, …, 10}. CLR is a second-order problem transformation method that solves multilabel classification via label ranking, in which the ranking among labels is obtained by pairwise comparison. LIFT uses different feature sets to distinguish different labels by clustering positive and negative examples; the value of the ratio parameter r is 0.1, as suggested in [18]. RELIAB utilizes the implicit relative importance information of labels to achieve the multilabel learning task; its two parameters take values from {0.1, 0.15, …, 0.5} and {0.001, 0.01, …, 10}, respectively. For LLRD, the rank r can be selected from {1, 2, …, q − 1}. In a word, the parameter settings of the comparison algorithms follow the recommendations in the related papers.

3.2. Experimental Results

For each dataset in our experiment, we adopt the tenfold cross-validation strategy. Our experimental results are mainly reported in Tables 2 and 3, where we record the performance of the different algorithms on the multilabel datasets. Specifically, the mean and standard deviation of each evaluation criterion are recorded in the tables. For each evaluation metric, "↓" indicates "the smaller the better" and "↑" indicates "the larger the better". The best results are shown in bold.

We use the Friedman test [38] based on the average ranks to verify whether the differences between algorithms are statistically significant. If the hypothesis that "all algorithms have equal performance" is rejected, the performance of the algorithms differs significantly. As can be seen from the data presented in Table 4, the hypothesis that there is no significant difference among the algorithms does not hold at the 0.05 significance level. Therefore, we conduct a post hoc test to further distinguish the various algorithms. Usually, there are two options for the post hoc test: the Nemenyi test [38] and the Bonferroni–Dunn test [39]. For k algorithms, the former needs k(k − 1)/2 pairwise comparisons, while the latter only needs k − 1. Thus, we choose the latter. The Bonferroni–Dunn test is used to test whether LLRD is more competitive than the comparison algorithms, in which LLRD plays the role of the control algorithm. When the difference in average rank between two algorithms exceeds one critical difference (CD), the performance of the two algorithms is significantly different. The CD value mentioned here can be calculated from $CD = q_{\alpha}\sqrt{\frac{k(k+1)}{6N}}$, where k = 7 and N = 13; at the 0.05 significance level, the corresponding $q_{0.05} = 2.638$, giving CD ≈ 2.24.
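The CD computation can be reproduced in a few lines (q_alpha = 2.638 is the tabulated two-tailed Bonferroni–Dunn value for seven algorithms at the 0.05 level, from Demšar's tables):

```python
import math

def bonferroni_dunn_cd(k, N, q_alpha=2.638):
    """Critical difference CD = q_alpha * sqrt(k * (k + 1) / (6 * N)).
    q_alpha = 2.638 is the tabulated two-tailed value for k = 7 algorithms
    at the 0.05 significance level."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

cd = bonferroni_dunn_cd(k=7, N=13)   # 7 algorithms, 13 datasets
print(round(cd, 3))                  # 2.235
```

Any pair of algorithms whose average ranks differ by more than this value is judged significantly different in the CD diagrams.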

The CD diagram associated with LLRD and its comparison algorithm is shown in Figure 1. The numbers on the horizontal axis of the coordinate indicate the average rank value of each algorithm under different evaluation criteria. There is no significant difference in performance among the various algorithms connected by solid lines.

Through the analysis of the above experimental results, we can draw the following conclusions:
(1) In terms of the four evaluation criteria of one-error, coverage, ranking loss, and average precision, LLRD is clearly superior to RELIAB, RAKEL, and CLR.
(2) The smaller the average rank value, the better the performance of the corresponding algorithm. For LLRD, the average rank value is optimal in five of the six CD subdiagrams, which shows that LLRD outperforms the other algorithms.
(3) For regular-size datasets, LLRD ranks first in 69% of the cases under the different evaluation criteria, while for large-scale datasets, it ranks first in 36.1% of the cases.

4. Conclusions

In this work, we propose a novel multilabel classification algorithm named LLRD, which adopts low-rank decomposition to extract the latent information of the labels and further reduces the information loss of the label transformation via the new feature space. Experimental results show that the performance of the proposed LLRD is better than that of many state-of-the-art multilabel classification techniques. In the future, we will explore alternative models that combine low-rank decomposition and classification into a joint optimization problem in order to consider more complex label correlations.

Data Availability

The datasets used in our manuscript are all public datasets, which can be downloaded from “http://mulan.sourceforge.net/datasets.html” and “http://meka.sourceforge.net/#datasetsru”.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (2019YFC1521400), the National Natural Science Foundation of China (61806159, 61806160, and 61972312), and the China Postdoctoral Science Foundation (2018M631192).