nips2012-207: NIPS 2012 paper reference record
Source: pdf
Author: Wei Bi, James T. Kwok
Abstract: In hierarchical classification, the prediction paths may be required to always end at leaf nodes. This is called mandatory leaf node prediction (MLNP) and is particularly useful when the leaf nodes have much stronger semantic meaning than the internal nodes. However, while many MLNP methods exist for hierarchical multiclass classification, performing MLNP in hierarchical multilabel classification is much more difficult. In this paper, we propose a novel MLNP algorithm that (i) considers the global hierarchy structure; and (ii) can be used on both tree- and DAG-structured hierarchies. We show that one can efficiently maximize the joint posterior probability of all the node labels with a simple greedy algorithm. Moreover, this can be further extended to minimizing the expected symmetric loss. Experiments are performed on a number of real-world data sets with tree- and DAG-structured label hierarchies. The proposed method consistently outperforms other hierarchical and flat multilabel classification methods.
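To make the greedy idea in the abstract concrete, below is a minimal illustrative sketch in Python. It is not the authors' algorithm: it covers only the simpler multiclass special case, where MLNP reduces to picking the single root-to-leaf path with the highest joint posterior, and it assumes, purely for illustration, that the per-node posteriors factorize independently so path scores are sums of per-node log-probabilities. The toy hierarchy `tree`, the probabilities `p`, and the helper `best_leaf_path` are all hypothetical names.

```python
import math

# Hypothetical toy hierarchy: parent -> children (leaves have no entry).
tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
# Assumed per-node posteriors P(y_n = 1 | x), treated as independent
# here only to keep the illustration simple.
p = {"root": 1.0, "A": 0.7, "A1": 0.6, "A2": 0.3, "B": 0.4, "B1": 0.9}

def best_leaf_path(node):
    """Return (sum of log-posteriors, path) for the best root-to-leaf path."""
    score = math.log(p[node])
    children = tree.get(node, [])
    if not children:  # mandatory leaf node prediction: paths end only at leaves
        return score, [node]
    best = max((best_leaf_path(c) for c in children), key=lambda t: t[0])
    return score + best[0], [node] + best[1]

score, path = best_leaf_path("root")
print(path)  # ['root', 'A', 'A1']
```

The multilabel setting treated in the paper is harder: a set of leaves must be selected jointly under the hierarchy constraints, which is what the proposed greedy maximization of the joint posterior over all node labels addresses.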
[1] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel. Decision trees for hierarchical multi-label classification. Machine Learning, 73:185–214, 2008.
[2] J.J. Burred and A. Lerch. A hierarchical approach to automatic musical genre classification. In Proceedings of the 6th International Conference on Digital Audio Effects, 2003.
[3] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7:31–54, 2006.
[4] C.N. Silla and A.A. Freitas. A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1-2):31–72, 2011.
[5] Z. Barutcuoglu, R.E. Schapire, and O.G. Troyanskaya. Hierarchical multi-label prediction of gene function. Bioinformatics, 22:830–836, 2006.
[6] K. Punera, S. Rajan, and J. Ghosh. Automatically learning document taxonomies for hierarchical classification. In Proceedings of the 14th International Conference on World Wide Web, pages 1010–1011, 2005.
[7] M.-L. Zhang and K. Zhang. Multi-label learning by exploiting label dependency. In Proceedings of the 16th International Conference on Knowledge Discovery and Data Mining, pages 999–1008, 2010.
[8] S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In Advances in Neural Information Processing Systems 23, pages 163–171, 2010.
[9] J. Deng, S. Satheesh, A.C. Berg, and L. Fei-Fei. Fast and balanced: Efficient label tree learning for large scale object recognition. In Advances in Neural Information Processing Systems 24, pages 567–575, 2011.
[10] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. Journal of Machine Learning Research, 7:1601–1626, 2006.
[11] W. Bi and J.T. Kwok. Multi-label classification on tree- and DAG-structured hierarchies. In Proceedings of the 28th International Conference on Machine Learning, pages 17–24, 2011.
[12] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Hierarchical classification: Combining Bayes with SVM. In Proceedings of the 23rd International Conference on Machine Learning, pages 177–184, 2006.
[13] L. Tang, S. Rajan, and V.K. Narayanan. Large scale multi-label classification via MetaLabeler. In Proceedings of the 18th International Conference on World Wide Web, pages 211–220, 2009.
[14] R. Cerri, A. C. P. L. F. de Carvalho, and A. A. Freitas. Adapting non-hierarchical multilabel classification methods for hierarchical multilabel classification. Intelligent Data Analysis, 15:861–887, 2011.
[15] G. Tsoumakas and I. Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. In Proceedings of the 18th European Conference on Machine Learning, pages 406–417, 2007.
[16] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni. Incremental algorithms for hierarchical classification. In Advances in Neural Information Processing Systems 17, pages 233–240, 2005.
[17] J.H. Zaragoza, L.E. Sucar, and E.F. Morales. Bayesian chain classifiers for multidimensional classification. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 2192–2197, 2011.
[18] R.G. Baraniuk, V. Cevher, M.F. Duarte, and C. Hegde. Model-based compressive sensing. IEEE Transactions on Information Theory, 56:1982–2001, 2010.
[19] S.E. Shimony. Finding MAPs for belief networks is NP-hard. Artificial Intelligence, 68:399–410, 1994.
[20] C. Varin, N. Reid, and D. Firth. An overview of composite likelihood methods. Statistica Sinica, 21:5–42, 2011.
[21] Y. Zhang and J. Schneider. A composite likelihood view for multi-label classification. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, pages 1407–1415, 2012.
[22] J. Zhou, J. Chen, and J. Ye. MALSAR: Multi-tAsk Learning via StructurAl Regularization. Arizona State University, 2012.
[23] G. Tsoumakas, I. Katakis, and I. Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, pages 667–685. Springer, 2010.