Convolution Kernels for Natural Language (NIPS 2001)

Source: pdf

Authors: Michael Collins, Nigel Duffy

Abstract: We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations of these structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.
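The abstract does not spell out how a kernel over trees is computed. As an illustration only, the sketch below assumes a kernel that counts the tree fragments two parse trees share, with an optional decay factor `lam` that downweights larger fragments. The names `Tree`, `internal_nodes`, `tree_kernel`, and `lam` are hypothetical, and the recursion is one standard way to compute such a count, not a definition taken from this page. The point it shows is that the inner product over a very large fragment space can be computed without ever enumerating the fragments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """A parse-tree node; words are leaves with no children (hypothetical type)."""
    label: str
    children: tuple[Tree, ...] = ()

    def is_leaf(self) -> bool:
        return not self.children

    def is_preterminal(self) -> bool:
        # A preterminal (part-of-speech) node dominates exactly one word.
        return len(self.children) == 1 and self.children[0].is_leaf()

    def production(self) -> tuple[str, tuple[str, ...]]:
        # The grammar rule applied at this node, e.g. ("NP", ("D", "N")).
        return (self.label, tuple(c.label for c in self.children))


def internal_nodes(t: Tree) -> list[Tree]:
    """All non-leaf nodes of t in pre-order."""
    if t.is_leaf():
        return []
    nodes = [t]
    for child in t.children:
        nodes.extend(internal_nodes(child))
    return nodes


def tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Weighted count of tree fragments shared by t1 and t2 (assumed formulation).

    With lam == 1.0 every shared fragment counts once; lam < 1.0 downweights
    larger fragments. Memoising common() over node pairs avoids enumerating fragments.
    """
    memo: dict[tuple[int, int], float] = {}

    def common(n1: Tree, n2: Tree) -> float:
        # Weighted number of common fragments rooted at both n1 and n2.
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if n1.is_leaf() or n2.is_leaf() or n1.production() != n2.production():
            value = 0.0
        elif n1.is_preterminal():
            value = lam
        else:
            value = lam
            # Each child pair is either cut off (the 1) or expanded by a shared fragment.
            for c1, c2 in zip(n1.children, n2.children):
                value *= 1.0 + common(c1, c2)
        memo[key] = value
        return value

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


# Usage: two noun phrases that differ only in the determiner.
the_dog = Tree("NP", (Tree("D", (Tree("the"),)), Tree("N", (Tree("dog"),))))
a_dog = Tree("NP", (Tree("D", (Tree("a"),)), Tree("N", (Tree("dog"),))))
print(tree_kernel(the_dog, the_dog))  # 6.0
print(tree_kernel(the_dog, a_dog))    # 3.0
```

In this sketch, memoising over node pairs keeps the cost roughly proportional to the product of the two trees' node counts, even though the number of fragments can grow exponentially with tree size. Because the value is an inner product between (weighted) fragment-count vectors, it can in principle be plugged into any kernelized learner, such as a kernel perceptron.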

References

[1] Aizerman, M., Braverman, E., and Rozonoer, L. (1964). Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. Automation and Remote Control, 25:821–837.

[2] Bod, R. (1998). Beyond Grammar: An Experience-Based Theory of Language. CSLI Publications/Cambridge University Press.

[3] Charniak, E. (1997). Statistical techniques for natural language parsing. AI Magazine, 18(4).

[4] Collins, M. (2000). Discriminative Reranking for Natural Language Parsing. Proceedings of the Seventeenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann.

[5] Collins, M. and Duffy, N. (2001). Parsing with a Single Neuron: Convolution Kernels for Natural Language Problems. Technical report UCSC-CRL-01-01, University of California at Santa Cruz.

[6] Cortes, C. and Vapnik, V. (1995). Support-Vector Networks. Machine Learning, 20(3):273–297.

[7] Freund, Y. and Schapire, R. (1999). Large Margin Classification using the Perceptron Algorithm. Machine Learning, 37(3):277–296.

[8] Freund, Y., Iyer, R., Schapire, R. E., and Singer, Y. (1998). An efficient boosting algorithm for combining preferences. In Machine Learning: Proceedings of the Fifteenth International Conference. San Francisco: Morgan Kaufmann.

[9] Goodman, J. (1996). Efficient algorithms for parsing the DOP model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 96), pages 143–152.

[10] Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical report, University of California at Santa Cruz.

[11] Johnson, M. The DOP estimation method is biased and inconsistent. To appear in Computational Linguistics.

[12] Lodhi, H., Cristianini, N., Shawe-Taylor, J., and Watkins, C. (2001). Text Classification using String Kernels. To appear in Advances in Neural Information Processing Systems 13, MIT Press.

[13] Johnson, M., Geman, S., Canon, S., Chi, Z., and Riezler, S. (1999). Estimators for stochastic “unification-based” grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. San Francisco: Morgan Kaufmann.

[14] Marcus, M., Santorini, B., and Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19:313–330.

[15] Schölkopf, B., Smola, A., and Müller, K.-R. (1999). Kernel principal component analysis. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 327–352. MIT Press, Cambridge, MA.

[16] Watkins, C. (2000). Dynamic alignment kernels. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 39–50. MIT Press.