52 jmlr-2011-Large Margin Hierarchical Classification with Mutually Exclusive Class Membership

Source: pdf

Author: Huixin Wang, Xiaotong Shen, Wei Pan

Abstract: In hierarchical classification, class labels are structured; that is, each label value corresponds to one non-root node in a tree, where the inter-class relationship for classification is specified by directed paths of the tree. In such a situation, the focus has been on how to leverage the inter-class relationship to enhance the performance of flat classification, which ignores such dependency. This is critical when the number of classes becomes large relative to the sample size. This paper considers single-path or partial-path hierarchical classification, where only one path is permitted from the root to a leaf node. A large margin method is introduced based on a new concept of generalized margins with respect to hierarchy. For implementation, we consider support vector machines and ψ-learning. Numerical and theoretical analyses suggest that the proposed method achieves the desired objective and compares favorably against strong competitors in the literature, including its flat counterparts. Finally, an application to gene function prediction is discussed.

Keywords: difference convex programming, gene function annotation, margins, multi-class classification, structured learning
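As a rough illustration of the label structure described in the abstract, the sketch below stores a tree as a child-to-parent map and checks whether a set of labels forms a single (possibly partial) path from the root. The toy hierarchy and the names `PARENT`, `path_from_root`, and `is_single_path` are hypothetical and are not taken from the paper; this is only meant to make the "one path from the root" constraint concrete, not to reproduce the proposed large margin method.

```python
# Hypothetical toy hierarchy (an assumption for illustration only).
# Each non-root node is a class label, stored as child -> parent;
# "root" itself is not a label.
PARENT = {
    "A": "root",
    "B": "root",
    "A1": "A",
    "A2": "A",
    "B1": "B",
}


def path_from_root(label, parent=PARENT):
    """Return the labels on the directed path from the root down to `label`."""
    path = []
    node = label
    while node != "root":
        path.append(node)
        node = parent[node]
    return path[::-1]


def is_single_path(labels, parent=PARENT):
    """True if `labels` is exactly one path that starts just below the root.

    The path may stop at an internal node (partial path) or at a leaf.
    """
    if not labels:
        return False
    # The deepest label determines the only path the set could be.
    deepest = max(labels, key=lambda l: len(path_from_root(l, parent)))
    return set(labels) == set(path_from_root(deepest, parent))


print(path_from_root("A2"))
print(is_single_path({"A", "A2"}))
print(is_single_path({"A", "B1"}))
```

Under this toy hierarchy, `{"A", "A2"}` is a valid single path, whereas `{"A", "B1"}` mixes two branches and therefore violates mutually exclusive class membership.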

reference text

L. An and P. Tao. Solving a class of linearly constrained indefinite quadratic problems by d.c. algorithms. J. Global Optimization, 11:253–285, 1997.
K. Astikainen, L. Holm, S. Szedmak, E. Pitkänen, and J. Rousu. Towards structured output prediction of enzyme function. BMC Proceedings, 2(S4):S2, 2008.
B. Boser, I. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. Proc. Fifth Ann. Conf. on Computat. Learning Theory, Pittsburgh, PA, pages 144–152, 1992.
L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. CIKM04, Washington, DC, 2004.
N. Cesa-Bianchi and G. Valentini. Hierarchical cost-sensitive algorithms for genome-wide gene function prediction. MLSB 09: The 3rd International Workshop on Machine Learning in Systems Biology, 2009.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. Regret bounds for hierarchical classification with linear-threshold functions. Proc. the 17th Ann. Conf. on Computat. Learning Theory, pages 93–108, 2004.
N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Hierarchical classification: Combining bayes with svm. Proc. of the 23rd Int. Conf. on Machine Learning, ACM Press, pages 177–184, 2006.
M. N. Davies, A. Secker, A. A. Freitas, M. Mendao, J. Timmis, and D. R. Flower. On the hierarchical classification of g protein-coupled receptors. Bioinformatics, 23(23):3113–3118, 2007.
O. Dekel, J. Keshet, and Y. Singer. An efficient online algorithm for hierarchical phoneme classification. Proc. the 1st Int. Workshop on Machine Learning for Multimodal Interaction, pages 146–158, 2004.
L. Dong, E. Frank, and S. Kramer. Ensembles of balanced nested dichotomies for multi-class problems. Lecture Notes in Computer Science, 3721/2005:84–95, 2005.
C. Gu. Multidimension smoothing with splines. Smoothing and Regression: Approaches, Computation and Application, 2000.
Y. Guan, C. Myers, D. Hess, Z. Barutcuoglu, A. Caudy, and O. Troyanskaya. Predicting gene function in a hierarchical context with an ensemble of classifiers. Genome Biology, 9(S2), 2008.
T. Hughes, M. Marton, A. Jones, C. Roberts, R. Stoughton, C. Armour, H. Bennett, E. Coffey, H. Dai, Y. He, M. Kidd, A. King, M. Meyer, D. Slade, P. Lum, S. Stepaniants, D. Shoemaker, D. Gachotte, K. Chakraburtty, J. Simon, M. Bard, and S. Friend. Functional discovery via a compendium of expression profiles. Cell, 102:109–126, 2000.
T. Jaakkola, M. Diekhans, and D. Haussler. Using the fisher kernel method to detect remote protein homologies. In Proc. the Seventh Int. Conf. on Intelligent Systems for Molecular Biology, pages 149–158, 1999.
T. Joachims. Text categorization with support vector machines: learning with many relevant features. Proc. of the 10th European Conf. on Machine Learning (ECML1998), 1398:117–142, 1998.
A. N. Kolmogorov and V. M. Tihomirov. ε-entropy and ε-capacity of sets in function spaces. Uspekhi Mat. Nauk., 14:3–86, 1959. In Russian. English translation, Amer. Math. Soc. Transl. 2, 17:277–364, 1961.
D. Lewis. Naive (bayes) at forty: The independence assumption in information retrieval. Proc. of the 10th European Conf. on Machine Learning (ECML1998), pages 4–15, 1998.
Y. Lin, Y. Lee, and G. Wahba. Support vector machines for classification in nonstandard situations. Machine Learning, 46:191–202, 2002.
S. Liu, X. Shen, and W. Wong. Computational development of ψ-learning. Proc. SIAM 2005 Int. Data Mining Conf., pages 1–12, 2005.
Y. Liu and X. Shen. Multicategory ψ-learning. J. Amer. Statist. Assoc., 101:500–509, 2006.
H. W. Mewes, D. Frishman, U. Güldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Münsterkoetter, S. Rudd, and B. Weil. Mips: a database for genomes and protein sequences. Nucleic Acids Res., 30:31–34, 2002.
G. Obozinski, G. Lanckriet, C. Grant, M. Jordan, and W. Noble. Consistent probabilistic output for protein function prediction. Genome Biology, 9(S6), 2008.
J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. J. Mach. Learning Res., 7:1601–1626, 2006.
B. Shahbaba and R. Neal. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Analysis, 2:221–238, 2007.
X. Shen and L. Wang. Generalization error for multi-class margin classification. Electronic J. of Statist., 1:307–330, 2007.
X. Shen, G. Tseng, X. Zhang, and W. Wong. On ψ-learning. J. Amer. Statist. Assoc., 98:724–734, 2003.
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. Proc. the 21st Int. Conf. on Machine Learning, 2004.
G. Valentini and M. Re. Weighted true path rule: a multilabel hierarchical algorithm for gene function prediction. The 1st International Workshop on Learning from Multi-Label Data, ECML/PKDD 2009, 2009.
V. Vapnik. Statistical Learning Theory. Wiley, New York, NY, 1998.
Y. Yang and X. Liu. A reexamination of text categorization methods. Proc. the 22nd Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 42–49, 1999.
J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. J. Comput. and Graph. Statist., 14:185–205, 2005.
A. Zimek, F. Buchwald, E. Frank, and S. Kramer. A study of hierarchical and flat classification of proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008.