
130 nips-2000-Text Classification using String Kernels

Source: pdf

Author: Huma Lodhi, John Shawe-Taylor, Nello Cristianini, Christopher J. C. H. Watkins

Abstract: We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the kernel with a standard word feature space kernel [6] shows encouraging results.
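The abstract's definition can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`feature_map`, `ssk_naive`, `ssk`) and the decay parameter `lam` are hypothetical, and the recursion in `ssk` is the commonly cited dynamic programming formulation of the subsequence kernel, which the abstract itself does not spell out. `ssk_naive` builds the explicit feature vector, weighting each occurrence by `lam` raised to its full span in the text; `ssk` computes the same inner product in O(k|s||t|) time without enumerating subsequences.

```python
from collections import defaultdict
from itertools import combinations


def feature_map(s, k, lam):
    """Explicit feature vector: one coordinate per length-k subsequence.

    Each occurrence contributes lam ** span, where span is the distance
    from its first to its last character in s (inclusive).
    """
    phi = defaultdict(float)
    for idx in combinations(range(len(s)), k):
        u = "".join(s[i] for i in idx)
        phi[u] += lam ** (idx[-1] - idx[0] + 1)
    return phi


def ssk_naive(s, t, k, lam):
    """Inner product of the explicit feature vectors (exponential in k)."""
    ps, pt = feature_map(s, k, lam), feature_map(t, k, lam)
    return sum(v * pt[u] for u, v in ps.items() if u in pt)


def ssk(s, t, k, lam):
    """Same kernel value via dynamic programming, O(k * |s| * |t|)."""
    n, m = len(s), len(t)
    # kp[i][a][b] holds the auxiliary kernel K'_i on prefixes s[:a], t[:b].
    kp = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    for a in range(n + 1):
        for b in range(m + 1):
            kp[0][a][b] = 1.0
    for i in range(1, k):
        for a in range(i, n + 1):
            kpp = 0.0  # running sum over matches of s[a-1] inside t[:b]
            for b in range(i, m + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Close each subsequence pair on a matching final character.
    total = 0.0
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[k - 1][a - 1][b - 1]
    return total


if __name__ == "__main__":
    # "cat" and "car" share only the subsequence "ca" for k = 2.
    print(ssk("cat", "car", 2, 0.5), ssk_naive("cat", "car", 2, 0.5))
```

For k = 2, the strings "cat" and "car" share only "ca", which spans two characters in each, so both functions return lam^4. In practice the value is often normalised by sqrt(K(s,s) K(t,t)) so that document length does not dominate; that step is omitted here.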

reference text

[1] M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821-837, 1964.

[2] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144-152. ACM Press, 1992.

[3] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000. www.support-vector.net.

[4] D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California in Santa Cruz, Computer Science Department, July 1999.

[5] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. Technical Report 23, LS VIII, University of Dortmund, 1997.

[6] T. Joachims. Text categorization with support vector machines. In Proceedings of European Conference on Machine Learning (ECML), 1998.

[7] D. Lewis. Reuters-21578 collection. Technical report, 1987. Available at: http://www.research.att.com/~lewis/reuters21578.html.

[8] J. Shawe-Taylor and N. Cristianini. Margin distribution and soft margin. In Advances in Large Margin Classifiers. MIT Press, 2000.

[9] J. Shawe-Taylor, P. Bartlett, R. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 1998.

[10] V. Vapnik. Statistical Learning Theory. Wiley, 1998.

[11] C. Watkins. Dynamic alignment kernels. Technical Report CSD-TR-98-11, Royal Holloway, University of London, Computer Science department, January 1999.