
Large Margin Hierarchical Classification with Mutually Exclusive Class Membership (JMLR, 2011)



Authors: Huixin Wang, Xiaotong Shen, Wei Pan

Abstract: In hierarchical classification, class labels are structured; that is, each label value corresponds to one non-root node in a tree, where the inter-class relationship for classification is specified by directed paths of the tree. In such a situation, the focus has been on how to leverage the inter-class relationship to enhance the performance of flat classification, which ignores such dependency. This is critical when the number of classes becomes large relative to the sample size. This paper considers single-path or partial-path hierarchical classification, where only one path is permitted from the root to a leaf node. A large margin method is introduced based on a new concept of generalized margins with respect to hierarchy. For implementation, we consider support vector machines and ψ-learning. Numerical and theoretical analyses suggest that the proposed method achieves the desired objective and compares favorably against strong competitors in the literature, including its flat counterparts. Finally, an application to gene function prediction is discussed.

Keywords: difference convex programming, gene function annotation, margins, multi-class classification, structured learning
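The label structure described in the abstract can be illustrated with a small sketch. This is not the authors' method and does not implement generalized margins; it only shows one way to represent labels as non-root nodes of a tree and to check that a set of labels lies on a single path from the root. The toy hierarchy, the `PARENT` map, and both function names are hypothetical, and the check allows a path to stop before a leaf, which is one possible reading of "partial-path".

```python
# Hypothetical toy hierarchy (not from the paper): child -> parent.
# "root" is the unlabeled root; every other node is a class label.
PARENT = {
    "metabolism": "root",
    "transport": "root",
    "amino_acid": "metabolism",
    "lipid": "metabolism",
    "ion": "transport",
}


def path_from_root(label, parent=PARENT):
    """Return the non-root nodes on the directed path from the root to `label`."""
    path = []
    node = label
    while node != "root":
        path.append(node)
        node = parent[node]
    return path[::-1]  # reverse so the path reads root-side first


def is_single_path(labels, parent=PARENT):
    """Return True if `labels` is exactly the node set of one path from the root.

    The path may end at an internal node (assumed reading of "partial-path").
    """
    labels = set(labels)
    if not labels:
        return False
    # The deepest label determines the only candidate path.
    deepest = max(labels, key=lambda l: len(path_from_root(l, parent)))
    return set(path_from_root(deepest, parent)) == labels
```

Under this representation, `{"metabolism", "lipid"}` is a valid labeling, while `{"lipid", "ion"}` is not, because its members sit on two different branches, so class membership along different paths is mutually exclusive.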


reference text

L. An and P. Tao. Solving a class of linearly constrained indefinite quadratic problems by d.c. algorithms. J. Global Optimization, 11:253–285, 1997.

K. Astikainen, L. Holm, S. Szedmak, E. Pitkänen, and J. Rousu. Towards structured output prediction of enzyme function. BMC Proceedings, 2(S4):S2, 2008.

B. Boser, I. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. Proc. Fifth Ann. Conf. on Computat. Learning Theory, Pittsburgh, PA, pages 144–152, 1992.

L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. CIKM04, Washington, DC, 2004.

N. Cesa-Bianchi and G. Valentini. Hierarchical cost-sensitive algorithms for genome-wide gene function prediction. MLSB 09: The 3rd International Workshop on Machine Learning in Systems Biology, 2009.

N. Cesa-Bianchi, A. Conconi, and C. Gentile. Regret bounds for hierarchical classification with linear-threshold functions. Proc. the 17th Ann. Conf. on Computat. Learning Theory, pages 93–108, 2004.

N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Hierarchical classification: Combining Bayes with SVM. Proc. of the 23rd Int. Conf. on Machine Learning, ACM Press, pages 177–184, 2006.

M. N. Davies, A. Secker, A. A. Freitas, M. Mendao, J. Timmis, and D. R. Flower. On the hierarchical classification of G protein-coupled receptors. Bioinformatics, 23(23):3113–3118, 2007.

O. Dekel, J. Keshet, and Y. Singer. An efficient online algorithm for hierarchical phoneme classification. Proc. the 1st Int. Workshop on Machine Learning for Multimodal Interaction, pages 146–158, 2004.

L. Dong, E. Frank, and S. Kramer. Ensembles of balanced nested dichotomies for multi-class problems. Lecture Notes in Computer Science, 3721:84–95, 2005.

C. Gu. Multidimension smoothing with splines. Smoothing and Regression: Approaches, Computation and Application, 2000.

Y. Guan, C. Myers, D. Hess, Z. Barutcuoglu, A. Caudy, and O. Troyanskaya. Predicting gene function in a hierarchical context with an ensemble of classifiers. Genome Biology, 9(S2), 2008.

T. Hughes, M. Marton, A. Jones, C. Roberts, R. Stoughton, C. Armour, H. Bennett, E. Coffey, H. Dai, Y. He, M. Kidd, A. King, M. Meyer, D. Slade, P. Lum, S. Stepaniants, D. Shoemaker, D. Gachotte, K. Chakraburtty, J. Simon, M. Bard, and S. Friend. Functional discovery via a compendium of expression profiles. Cell, 102:109–126, 2000.

T. Jaakkola, M. Diekhans, and D. Haussler. Using the Fisher kernel method to detect remote protein homologies. In Proc. the Seventh Int. Conf. on Intelligent Systems for Molecular Biology, pages 149–158, 1999.

T. Joachims. Text categorization with support vector machines: learning with many relevant features. Proc. of the 10th European Conf. on Machine Learning (ECML1998), 1398:117–142, 1998.

A. N. Kolmogorov and V. M. Tihomirov. ε-entropy and ε-capacity of sets in function spaces. Uspekhi Mat. Nauk., 14:3–86, 1959. In Russian. English translation, Amer. Math. Soc. Transl. 2, 17:277–364, 1961.

D. Lewis. Naive (Bayes) at forty: The independence assumption in information retrieval. Proc. of the 10th European Conf. on Machine Learning (ECML1998), pages 4–15, 1998.

Y. Lin, Y. Lee, and G. Wahba. Support vector machines for classification in nonstandard situations. Machine Learning, 46:191–202, 2002.

S. Liu, X. Shen, and W. Wong. Computational development of ψ-learning. Proc. SIAM 2005 Int. Data Mining Conf., pages 1–12, 2005.

Y. Liu and X. Shen. Multicategory ψ-learning. J. Amer. Statist. Assoc., 101:500–509, 2006.

H. W. Mewes, D. Frishman, U. Güldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Münsterkoetter, S. Rudd, and B. Weil. MIPS: a database for genomes and protein sequences. Nucleic Acids Res., 30:31–34, 2002.

G. Obozinski, G. Lanckriet, C. Grant, M. Jordan, and W. Noble. Consistent probabilistic output for protein function prediction. Genome Biology, 9(S6), 2008.

J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. J. Mach. Learning Res., 7:1601–1626, 2006.

B. Shahbaba and R. Neal. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Analysis, 2:221–238, 2007.

X. Shen and L. Wang. Generalization error for multi-class margin classification. Electronic J. of Statist., 1:307–330, 2007.

X. Shen, G. Tseng, X. Zhang, and W. Wong. On ψ-learning. J. Amer. Statist. Assoc., 98:724–734, 2003.

I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. Proc. the 21st Int. Conf. on Machine Learning, 2004.

G. Valentini and M. Re. Weighted true path rule: a multilabel hierarchical algorithm for gene function prediction. The 1st International Workshop on Learning from Multi-Label Data, ECML/PKDD 2009, 2009.

V. Vapnik. Statistical Learning Theory. Wiley, New York, NY, 1998.

Y. Yang and X. Liu. A reexamination of text categorization methods. Proc. the 22nd Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 42–49, 1999.

J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. J. Comput. and Graph. Statist., 14:185–205, 2005.

A. Zimek, F. Buchwald, E. Frank, and S. Kramer. A study of hierarchical and flat classification of proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008.