Paper: nips2009-71 (NIPS 2009)
Author: Ofer Dekel
Abstract: While many advances have already been made in hierarchical classification learning, we take a step back and examine how a hierarchical classification problem should be formally defined. We pay particular attention to the fact that many arbitrary decisions go into the design of the label taxonomy that is given with the training data. Moreover, many hand-designed taxonomies are unbalanced and misrepresent the class structure in the underlying data distribution. We attempt to correct these problems by using the data distribution itself to calibrate the hierarchical classification loss function. This distribution-based correction must be done with care, to avoid introducing unmanageable statistical dependencies into the learning problem. This leads us off the beaten path of binomial-type estimation and into the unfamiliar waters of geometric-type estimation. In this paper, we present a new calibrated definition of statistical risk for hierarchical classification, an unbiased estimator for this risk, and a new algorithmic reduction from hierarchical classification to cost-sensitive classification.
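To make the abstract's central idea concrete, here is a minimal sketch, in Python, of a distribution-calibrated tree-distance loss. Everything in it is an illustrative assumption rather than the paper's actual construction: the names (Taxonomy, calibrated_loss, subtree_mass) are hypothetical, and the specific choice of weighting each tree edge by the negative log of the child-to-parent probability-mass ratio is one plausible way to let the data distribution, rather than raw edge counts, determine how far apart two labels are.

```python
# A minimal sketch of a distribution-calibrated tree-distance loss.
# Assumptions (not taken from the paper): the taxonomy is a rooted tree
# given as a child -> parent map, subtree_mass[v] is the empirical
# probability that an example's label falls in the subtree rooted at v,
# and each edge on the path between two labels is weighted by -log of
# the child/parent mass ratio, so the loss measures how much probability
# mass separates the labels rather than a raw edge count.
import math

class Taxonomy:
    def __init__(self, parent):
        self.parent = parent  # {node: parent}; the root maps to None

    def path_to_root(self, node):
        path = [node]
        while self.parent[node] is not None:
            node = self.parent[node]
            path.append(node)
        return path

    def edges_between(self, u, v):
        """Edges on the unique u-v path, as (child, parent) pairs."""
        pu, pv = self.path_to_root(u), self.path_to_root(v)
        common = set(pu) & set(pv)  # ancestors shared by u and v

        def climb(path):
            for child in path:
                if child in common:  # reached the lowest common ancestor
                    return
                yield (child, self.parent[child])

        return list(climb(pu)) + list(climb(pv))

def calibrated_loss(taxonomy, subtree_mass, y_true, y_pred):
    """Sum of per-edge -log mass ratios along the y_true -> y_pred path."""
    return sum(
        -math.log(subtree_mass[child] / subtree_mass[parent])
        for child, parent in taxonomy.edges_between(y_true, y_pred)
    )

if __name__ == "__main__":
    # Toy taxonomy: root r with children a (mass 0.9) and b (mass 0.1);
    # a splits further into a1 and a2.
    tax = Taxonomy({"r": None, "a": "r", "b": "r", "a1": "a", "a2": "a"})
    mass = {"r": 1.0, "a": 0.9, "b": 0.1, "a1": 0.5, "a2": 0.4}
    print(calibrated_loss(tax, mass, "a1", "a2"))  # ~1.40: dense siblings
    print(calibrated_loss(tax, mass, "a1", "b"))   # ~3.00: crossing a rare branch
```

One appealing property of this assumed weighting: the total weight from the root to any leaf telescopes to -log of the leaf's mass, independent of depth, so an arbitrarily fine subdivision of one region of the taxonomy does not inflate the loss the way plain edge counting would. The paper's actual calibration, its unbiased geometric-type estimator, and the reduction to cost-sensitive classification are defined formally in the text.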
[1] The Library of Congress Classification. http://www.loc.gov/aba/cataloging/classification/.
[2] The Open Directory Project. http://www.dmoz.org/about.html.
[3] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In Proceedings of the 13th ACM Conference on Information and Knowledge Management, 2004.
[4] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Hierarchical classification: combining Bayes with SVM. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
[5] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7:31–54, 2006.
[6] The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics, 25:25–29, 2000.
[7] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[8] O. Dekel, J. Keshet, and Y. Singer. Large margin hierarchical classification. In Proceedings of the 21st International Conference on Machine Learning, 2004.
[9] S. T. Dumais and H. Chen. Hierarchical classification of Web content. In Proceedings of the 23rd ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256–263, 2000.
[10] T. Evgeniou, C. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.
[11] W. Feller. An Introduction to Probability Theory and Its Applications, volume 2. Wiley, second edition, 1970.
[12] D. Koller and M. Sahami. Hierarchically classifying documents using very few words. In Proceedings of the 14th International Conference on Machine Learning, pages 171–178, 1997.
[13] A. K. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning, pages 359–367, 1998.
[14] S. Montgomery-Smith and T. Schürmann. Unbiased estimators for entropy and class number. Preprint.
[15] S. M. Ross and E. A. Peköz. A Second Course in Probability. 2007.
[16] E. Ruiz and P. Srinivasan. Hierarchical text categorization using neural networks. Information Retrieval, 5(1):87–118, 2002.
[17] C. Shirky. Ontology is overrated: Categories, links, and tags. In O’Reilly Media Emerging Technology Conference, 2005.
[18] A. S. Weigend, E. D. Wiener, and J. O. Pedersen. Exploiting hierarchy in text categorization. Information Retrieval, 1(3):193–216, 1999.
[19] J. Zhang, L. Tang, and H. Liu. Automatically adjusting content taxonomies for hierarchical classification. In Proceedings of the Fourth Workshop on Text Mining, SDM06, 2006.