nips nips2008 nips2008-114 nips2008-114-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kilian Q. Weinberger, Olivier Chapelle
Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat” classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal database yield state-of-the-art results on topic categorization.
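The nearest-prototype rule and the large margin condition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of squared Euclidean distance, and the constant margin of 1.0 are all assumptions made for illustration (the abstract does not specify the distance or how margins are set), and the instance `z` is assumed to be already mapped into the semantic space.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(z: Vector, prototypes: List[Vector]) -> int:
    """Nearest neighbor rule: index of the class prototype closest to z."""
    return min(range(len(prototypes)),
               key=lambda c: squared_distance(z, prototypes[c]))


def margin_violation(z: Vector, prototypes: List[Vector],
                     label: int, margin: float = 1.0) -> float:
    """Sum of hinge penalties over all wrong classes c.

    The penalty for class c is zero when the correct prototype is closer
    to z than prototype c by at least `margin` (a hypothetical constant).
    """
    d_true = squared_distance(z, prototypes[label])
    return sum(
        max(0.0, margin + d_true - squared_distance(z, prototypes[c]))
        for c in range(len(prototypes))
        if c != label
    )


# Hypothetical toy data: three class prototypes in a 2-D semantic space.
prototypes = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
z = [1.0, 0.5]  # an instance already mapped into the semantic space

predicted = nearest_prototype(z, prototypes)
```

In this toy setup, `margin_violation(z, prototypes, predicted)` is zero because the predicted prototype is closer than every other by more than the margin; passing a different label yields a positive penalty, which is the kind of quantity a large margin objective would drive down.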
[1] D. Blei, A. Ng, M. Jordan, and J. Lafferty. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(4-5):993–1022, 2003.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] A. Budanitsky and G. Hirst. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources, in the North American Chapter of the Association for Computational Linguistics (NAACL), 2001.
[4] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In ACM 13th Conference on Information and Knowledge Management, 2004.
[5] T. Cox and M. Cox. Multidimensional Scaling. Chapman & Hall, London, 1994.
[6] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001.
[7] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
[8] S. Dumais and H. Chen. Hierarchical classification of Web content. In Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval, pages 256–263. ACM Press, New York, US, 2000.
[9] W. Hersh, C. Buckley, T. J. Leone, and D. Hickam. OHSUMED: an interactive retrieval evaluation and new large test collection for research. In SIGIR ’94: Proceedings of the 17th annual international ACM conference on Research and development in information retrieval, pages 192–201. Springer-Verlag New York, Inc., 1994.
[10] G. Karypis, E. Hong, and S. Han. Concept indexing: A fast dimensionality reduction algorithm with applications to document retrieval & categorization. Technical Report 00-016, 2000.
[11] T.-Y. Liu, Y. Yang, H. Wan, H.-J. Zeng, Z. Chen, and W.-Y. Ma. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explorations Newsletter, 7(1):36–43, 2005.
[12] R. Rifkin and A. Klautau. In Defense of One-Vs-All Classification. The Journal of Machine Learning Research, 5:101–141, 2004.
[13] F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47, 2002.
[14] F. Sha and L. K. Saul. Large margin hidden Markov models for automatic speech recognition. In Advances in Neural Information Processing Systems 19, Cambridge, MA, 2007. MIT Press.
[15] A. Weigend, E. Wiener, and J. Pedersen. Exploiting Hierarchy in Text Categorization. Information Retrieval, 1(3):193–216, 1999.
[16] K. Q. Weinberger and L. K. Saul. Fast solvers and efficient implementations for distance metric learning. pages 1160–1167, 2008.