Paper: nips2009-71 (NIPS 2009)
Author: Ofer Dekel
Abstract: While many advances have already been made in hierarchical classification learning, we take a step back and examine how a hierarchical classification problem should be formally defined. We pay particular attention to the fact that many arbitrary decisions go into the design of the label taxonomy that is given with the training data. Moreover, many hand-designed taxonomies are unbalanced and misrepresent the class structure in the underlying data distribution. We attempt to correct these problems by using the data distribution itself to calibrate the hierarchical classification loss function. This distribution-based correction must be done with care, to avoid introducing unmanageable statistical dependencies into the learning problem. This leads us off the beaten path of binomial-type estimation and into the unfamiliar waters of geometric-type estimation. In this paper, we present a new calibrated definition of statistical risk for hierarchical classification, an unbiased estimator for this risk, and a new algorithmic reduction from hierarchical classification to cost-sensitive classification.
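To make the abstract's central idea concrete, here is a minimal sketch, in Python, of a distribution-calibrated tree-distance loss. Everything in it is an illustrative assumption rather than the paper's actual construction: the names (Taxonomy, calibrated_loss, subtree_mass) are hypothetical, and the specific choice of weighting each tree edge by the negative log of the child-to-parent probability-mass ratio is one plausible way to let the data distribution, rather than raw edge counts, determine how far apart two labels are.

```python
# A minimal sketch of a distribution-calibrated tree-distance loss.
# Assumptions (not taken from the paper): the taxonomy is a rooted tree
# given as a child -> parent map, subtree_mass[v] is the empirical
# probability that an example's label falls in the subtree rooted at v,
# and each edge on the path between two labels is weighted by -log of
# the child/parent mass ratio, so the loss measures how much probability
# mass separates the labels rather than a raw edge count.
import math

class Taxonomy:
    def __init__(self, parent):
        self.parent = parent  # {node: parent}; the root maps to None

    def path_to_root(self, node):
        path = [node]
        while self.parent[node] is not None:
            node = self.parent[node]
            path.append(node)
        return path

    def edges_between(self, u, v):
        """Edges on the unique u-v path, as (child, parent) pairs."""
        pu, pv = self.path_to_root(u), self.path_to_root(v)
        common = set(pu) & set(pv)  # ancestors shared by u and v

        def climb(path):
            for child in path:
                if child in common:  # reached the lowest common ancestor
                    return
                yield (child, self.parent[child])

        return list(climb(pu)) + list(climb(pv))

def calibrated_loss(taxonomy, subtree_mass, y_true, y_pred):
    """Sum of per-edge -log mass ratios along the y_true -> y_pred path."""
    return sum(
        -math.log(subtree_mass[child] / subtree_mass[parent])
        for child, parent in taxonomy.edges_between(y_true, y_pred)
    )

if __name__ == "__main__":
    # Toy taxonomy: root r with children a (mass 0.9) and b (mass 0.1);
    # a splits further into a1 and a2.
    tax = Taxonomy({"r": None, "a": "r", "b": "r", "a1": "a", "a2": "a"})
    mass = {"r": 1.0, "a": 0.9, "b": 0.1, "a1": 0.5, "a2": 0.4}
    print(calibrated_loss(tax, mass, "a1", "a2"))  # ~1.40: dense siblings
    print(calibrated_loss(tax, mass, "a1", "b"))   # ~3.00: crossing a rare branch
```

One appealing property of this assumed weighting: the total weight from the root to any leaf telescopes to -log of the leaf's mass, independent of depth, so an arbitrarily fine subdivision of one region of the taxonomy does not inflate the loss the way plain edge counting would. The paper's actual calibration, its unbiased geometric-type estimator, and the reduction to cost-sensitive classification are defined formally in the text.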
[1] The Library of Congress Classification. http://www.loc.gov/aba/cataloging/classification/.
[2] The Open Directory Project. http://www.dmoz.org/about.html.
[3] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In Proceedings of the 13th ACM Conference on Information and Knowledge Management, 2004.
[4] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Hierarchical classification: combining Bayes with SVM. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
[5] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7:31–54, 2006.
[6] The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics, 25:25–29, 2000.
[7] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[8] O. Dekel, J. Keshet, and Y. Singer. Large margin hierarchical classification. In Proceedings of the 21st International Conference on Machine Learning, 2004.
[9] S. T. Dumais and H. Chen. Hierarchical classification of Web content. In Proceedings of the 23rd ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256–263, 2000.
[10] T. Evgeniou, C. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.
[11] W. Feller. An Introduction to Probability Theory and Its Applications, volume 2. Wiley, second edition, 1970.
[12] D. Koller and M. Sahami. Hierarchically classifying documents using very few words. In Proceedings of the 14th International Conference on Machine Learning, pages 171–178, 1997.
[13] A. K. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning, pages 359–367, 1998.
[14] S. Montgomery-Smith and T. Schürmann. Unbiased estimators for entropy and class number. Preprint.
[15] S. M. Ross and E. A. Peköz. A Second Course in Probability. 2007.
[16] E. Ruiz and P. Srinivasan. Hierarchical text categorization using neural networks. Information Retrieval, 5(1):87–118, 2002.
[17] C. Shirky. Ontology is overrated: Categories, links, and tags. In O’Reilly Media Emerging Technology Conference, 2005.
[18] A. S. Weigend, E. D. Wiener, and J. O. Pedersen. Exploiting hierarchy in text categorization. Information Retrieval, 1(3):193–216, 1999.
[19] J. Zhang, L. Tang, and H. Liu. Automatically adjusting content taxonomies for hierarchical classification. In Proceedings of the Fourth Workshop on Text Mining, SDM06, 2006.