
Feature-aware Label Space Dimension Reduction for Multi-label Classification


Source: pdf

Author: Yao-Nan Chen, Hsuan-Tien Lin

Abstract: Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective than existing LSDR approaches across many real-world datasets.
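
To make the abstract's description concrete, the following is a minimal NumPy sketch of the linear recipe it outlines: shift the labels to zero mean, weight the label covariance by the hat matrix of linear regression on the features, and take the top eigen directions as the projection. This is not the authors' code; the names cplst_fit and cplst_predict and the ridge parameter lam are illustrative assumptions, and the details follow the abstract's high-level description rather than the paper's exact formulation.

    import numpy as np

    def cplst_fit(X, Y, m, lam=1e-3):
        # Hypothetical sketch of conditional principal label space
        # transformation: X is n x d features, Y is n x K binary labels,
        # m < K is the target dimension, lam is an assumed ridge term.
        ybar = Y.mean(axis=0)
        Z = Y - ybar                      # shift labels to zero mean
        d = X.shape[1]
        # Hat matrix of ridge-stabilized linear regression on X.
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
        # Top-m eigenvectors of Z^T H Z give the projection V (m x K).
        evals, evecs = np.linalg.eigh(Z.T @ H @ Z)
        V = evecs[:, ::-1][:, :m].T
        T = Z @ V.T                       # n x m embedded labels
        # Ridge regression from features to the embedded labels.
        W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)
        return W, V, ybar

    def cplst_predict(X, W, V, ybar):
        # Regress into the embedded space, project back, round to 0/1.
        scores = X @ W @ V + ybar
        return (scores >= 0.5).astype(int)

The kernelized extension mentioned in the abstract would replace the linear regression steps with kernel ridge regression; the decoding step (project back and round) stays the same.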


reference text

[1] I. Katakis, G. Tsoumakas, and I. Vlahavas. Multilabel text classification for automated tag suggestion. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2008 Discovery Challenge, 2008.

[2] M. Boutell, J. Luo, X. Shen, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 2004.

[3] A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In Advances in Neural Information Processing Systems 14, 2001.

[4] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems 22, 2009.

[5] F. Tai and H.-T. Lin. Multi-label classification with principal label space transformation. Neural Computation, 2012.

[6] H. Hotelling. Relations between two sets of variates. Biometrika, 1936.

[7] M. Wall, A. Rechtsteiner, and L. Rocha. Singular value decomposition and principal component analysis. A Practical Approach to Microarray Data Analysis, 2003.

[8] I. Jolliffe. Principal Component Analysis. Springer, second edition, October 2002.

[9] E. Barshan, A. Ghodsi, Z. Azimifar, and M. Zolghadri Jahromi. Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recognition, 2011.

[10] K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 1991.

[11] K. Fukumizu, F. Bach, and M. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 2004.

[12] L. Sun, S. Ji, and J. Ye. Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.

[13] G. Tsoumakas, I. Katakis, and I. Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook. Springer US, 2010.

[14] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier. On label dependence and loss minimization in multi-label classification. Machine Learning, 2012.

[15] J. Weston, O. Chapelle, A. Elisseeff, B. Schölkopf, and V. Vapnik. Kernel dependency estimation. In Advances in Neural Information Processing Systems 15, 2002.

[16] J. Kettenring. Canonical analysis of several sets of variables. Biometrika, 1971.

[17] S. Yu, K. Yu, V. Tresp, and H.-P. Kriegel. Multi-output regularized feature projection. IEEE Transactions on Knowledge and Data Engineering, 2006.

[18] Y. Zhang and J. Schneider. Multi-label output codes using canonical correlation analysis. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

[19] D. Hoaglin and R. Welsch. The hat matrix in regression and ANOVA. The American Statistician, 1978.

[20] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1936.

[21] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, first edition, 2002.

[22] G. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proceedings of the Fifteenth International Conference on Machine Learning, 1998.

[23] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, and I. Vlahavas. MULAN: A Java library for multi-label learning. Journal of Machine Learning Research, 2011.

[24] B. Datta. Numerical Linear Algebra and Applications. SIAM, second edition, 2010.

[25] Y.-N. Chen. Feature-aware label space dimension reduction for multi-label classification problem. Master’s thesis, National Taiwan University, 2012.

[26] Y. Wang and I. Witten. Induction of model trees for predicting continuous classes. In Poster Papers of the Ninth European Conference on Machine Learning, 1997.

[27] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten. The WEKA data mining software: An update. SIGKDD Explorations Newsletter, 2009.