
Feature-aware Label Space Dimension Reduction for Multi-label Classification


Source: pdf

Author: Yao-Nan Chen, Hsuan-Tien Lin

Abstract: Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective than existing LSDR approaches across many real-world datasets.
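
To make the abstract's description concrete, the following is a minimal NumPy sketch of the linear recipe it outlines: shift the labels to zero mean, weight the label covariance by the hat matrix of linear regression on the features, and take the top eigen directions as the projection. This is not the authors' code; the names cplst_fit and cplst_predict and the ridge parameter lam are illustrative assumptions, and the details follow the abstract's high-level description rather than the paper's exact formulation.

    import numpy as np

    def cplst_fit(X, Y, m, lam=1e-3):
        # Hypothetical sketch of conditional principal label space
        # transformation: X is n x d features, Y is n x K binary labels,
        # m < K is the target dimension, lam is an assumed ridge term.
        ybar = Y.mean(axis=0)
        Z = Y - ybar                      # shift labels to zero mean
        d = X.shape[1]
        # Hat matrix of ridge-stabilized linear regression on X.
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
        # Top-m eigenvectors of Z^T H Z give the projection V (m x K).
        evals, evecs = np.linalg.eigh(Z.T @ H @ Z)
        V = evecs[:, ::-1][:, :m].T
        T = Z @ V.T                       # n x m embedded labels
        # Ridge regression from features to the embedded labels.
        W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)
        return W, V, ybar

    def cplst_predict(X, W, V, ybar):
        # Regress into the embedded space, project back, round to 0/1.
        scores = X @ W @ V + ybar
        return (scores >= 0.5).astype(int)

The kernelized extension mentioned in the abstract would replace the linear regression steps with kernel ridge regression; the decoding step (project back and round) stays the same.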


reference text

[1] I. Katakis, G. Tsoumakas, and I. Vlahavas. Multilabel text classification for automated tag suggestion. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2008 Discovery Challenge, 2008.

[2] M. Boutell, J. Luo, X. Shen, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 2004.

[3] A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In Advances in Neural Information Processing Systems 14, 2001.

[4] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems 22, 2009.

[5] F. Tai and H.-T. Lin. Multi-label classification with principal label space transformation. Neural Computation, 2012.

[6] H. Hotelling. Relations between two sets of variates. Biometrika, 1936.

[7] M. Wall, A. Rechtsteiner, and L. Rocha. Singular value decomposition and principal component analysis. A Practical Approach to Microarray Data Analysis, 2003.

[8] I. Jolliffe. Principal Component Analysis. Springer, second edition, October 2002.

[9] E. Barshan, A. Ghodsi, Z. Azimifar, and M. Zolghadri Jahromi. Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recognition, 2011.

[10] K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 1991.

[11] K. Fukumizu, F. Bach, and M. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 2004.

[12] L. Sun, S. Ji, and J. Ye. Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.

[13] G. Tsoumakas, I. Katakis, and I. Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook. Springer US, 2010.

[14] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier. On label dependence and loss minimization in multi-label classification. Machine Learning, 2012.

[15] J. Weston, O. Chapelle, A. Elisseeff, B. Schölkopf, and V. Vapnik. Kernel dependency estimation. In Advances in Neural Information Processing Systems 15, 2002.

[16] J. Kettenring. Canonical analysis of several sets of variables. Biometrika, 1971.

[17] S. Yu, K. Yu, V. Tresp, and H.-P. Kriegel. Multi-output regularized feature projection. IEEE Transactions on Knowledge and Data Engineering, 2006.

[18] Y. Zhang and J. Schneider. Multi-label output codes using canonical correlation analysis. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

[19] D. Hoaglin and R. Welsch. The hat matrix in regression and ANOVA. The American Statistician, 1978.

[20] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1936.

[21] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, first edition, 2002.

[22] G. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proceedings of the Fifteenth International Conference on Machine Learning, 1998.

[23] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, and I. Vlahavas. MULAN: A Java library for multi-label learning. Journal of Machine Learning Research, 2011.

[24] B. Datta. Numerical Linear Algebra and Applications. SIAM, second edition, 2010.

[25] Y.-N. Chen. Feature-aware label space dimension reduction for multi-label classification problem. Master’s thesis, National Taiwan University, 2012.

[26] Y. Wang and I. Witten. Induction of model trees for predicting continuous classes. In Poster Papers of the Ninth European Conference on Machine Learning, 1997.

[27] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten. The WEKA data mining software: An update. SIGKDD Explorations Newsletter, 2009.