127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees

Source: pdf

Author: Danilo Croce; Alessandro Moschitti (DISI, University of Trento, 38123 Povo (TN), Italy, moschitti@disi.unitn.it); Roberto Basili (DII, University of Tor Vergata, 00133 Roma, Italy, basili@info.uniroma2.it)

Abstract: A central topic in natural language processing is the design of lexical and syntactic features suitable for the target application. In this paper, we study convolution dependency tree kernels for the automatic engineering of syntactic and semantic patterns that exploit lexical similarities. We define efficient and powerful kernels for measuring the similarity between dependency structures whose lexical nodes have partially or completely different surface forms. Experiments with such kernels on question classification show unprecedented results, e.g., a 41% error reduction with respect to the former state of the art. Additionally, experiments on semantic role classification confirm the benefit of semantic smoothing for dependency kernels.
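The central idea here is to compare dependency trees whose word nodes differ on the surface. The small sketch below illustrates that idea. It assumes a partial-tree-kernel style recursion (cf. Moschitti, 2006a) in which exact node matching is replaced by a node similarity σ: relation/POS nodes must match exactly, while word nodes may match softly. Concretely, Δσ(n1, n2) = μ·σ(n1, n2)·(λ² + Σ λ^(d(I1)+d(I2)) Π_j Δσ(c_n1[I1_j], c_n2[I2_j])), and K(T1, T2) sums Δσ over all node pairs. Several pieces are hypothetical and are not the authors' implementation: the `Node` class, the toy `TOY_SIM` table, and the default values of `mu` and `lam`. The paper's exact kernel may also differ, including its leaf handling, decay factors, normalization, and source of word similarity. Finally, the brute-force enumeration of child subsequences is exponential, unlike the efficient algorithms the paper refers to.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Dependency-tree node (hypothetical representation)."""
    label: str
    children: list[Node] = field(default_factory=list)
    lexical: bool = False  # True for word nodes, False for relation/POS nodes


# Hypothetical toy word similarities; a real system would derive them
# from a corpus-based word-space model or a lexical resource.
TOY_SIM = {frozenset({"cat", "dog"}): 0.5}


def sigma(a: Node, b: Node) -> float:
    """Node similarity: exact match for every node, soft match only between word nodes."""
    if a.label == b.label:
        return 1.0
    if a.lexical and b.lexical:
        return TOY_SIM.get(frozenset({a.label, b.label}), 0.0)
    return 0.0


def delta(n1: Node, n2: Node, mu: float, lam: float) -> float:
    """Smoothed PTK-style Delta: mu * sigma * (lam^2 + sum over aligned child subsequences)."""
    s = sigma(n1, n2)
    if s == 0.0:
        return 0.0
    c1, c2 = n1.children, n2.children
    total = lam ** 2
    # Brute force over all pairs of equal-length increasing child index sequences
    # (exponential in the number of children; fine only for toy trees).
    for k in range(1, min(len(c1), len(c2)) + 1):
        for i1 in combinations(range(len(c1)), k):
            for i2 in combinations(range(len(c2)), k):
                prod = 1.0
                for a, b in zip(i1, i2):
                    prod *= delta(c1[a], c2[b], mu, lam)
                    if prod == 0.0:
                        break
                if prod:
                    # d(I) = last index - first index + 1, so gapped sequences decay more
                    spans = (i1[-1] - i1[0] + 1) + (i2[-1] - i2[0] + 1)
                    total += lam ** spans * prod
    return mu * s * total


def nodes(t: Node):
    """Yield every node of the tree in pre-order."""
    yield t
    for c in t.children:
        yield from nodes(c)


def tree_kernel(t1: Node, t2: Node, mu: float = 0.4, lam: float = 0.4) -> float:
    """K(T1, T2): sum of Delta over all node pairs."""
    return sum(delta(a, b, mu, lam) for a in nodes(t1) for b in nodes(t2))


if __name__ == "__main__":
    t1 = Node("ROOT", [Node("cat", lexical=True)])
    t2 = Node("ROOT", [Node("dog", lexical=True)])
    t3 = Node("ROOT", [Node("car", lexical=True)])
    print(tree_kernel(t1, t2), tree_kernel(t1, t3))
```

With these toy values, the `cat`/`dog` pair contributes to the kernel through σ, while `cat`/`car` does not. As a result, `tree_kernel(t1, t2)` exceeds `tree_kernel(t1, t3)` even though neither pair shares a surface word.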

reference text

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226.
Roberto Basili, Marco Cammisa, and Alessandro Moschitti. 2005. Effective use of WordNet semantics via kernel-based learning. In Proceedings of CoNLL-2005, pages 1–8, Ann Arbor, Michigan. Association for Computational Linguistics.
Stephan Bloehdorn and Alessandro Moschitti. 2007a. Combined syntactic and semantic kernels for text classification. In Proceedings of ECIR 2007, Rome, Italy.
Stephan Bloehdorn and Alessandro Moschitti. 2007b. Structure and semantics for expressive text kernels. In Proceedings of CIKM ’07.
Stephan Bloehdorn, Roberto Basili, Marco Cammisa, and Alessandro Moschitti. 2006. Semantic kernels for text classification based on topological measures of feature similarity. In Proceedings of ICDM ’06, Hong Kong, 2006.
Ulrik Brandes. 2001. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology, 25:163–177.
Alexander Budanitsky and Graeme Hirst. 2006. Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32(1):13–47.
Razvan Bunescu and Raymond Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proceedings of HLT and EMNLP, pages 724–731, Vancouver, British Columbia, Canada, October.
Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern Recogn. Lett., 19(3-4):255–259, March.
Nicola Cancedda, Eric Gaussier, Cyril Goutte, and Jean-Michel Renders. 2003. Word sequence kernels. Journal of Machine Learning Research, 3:1059–1082.
O. Chapelle, B. Schölkopf, and A. Zien. 2006. Semi-Supervised Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, USA, 09.
Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of NAACL’00.
Wanxiang Che, Min Zhang, Ting Liu, and Sheng Li. 2006. A hybrid convolution tree kernel for semantic role labeling. In Proceedings of the COLING/ACL on Main Conference Poster Sessions, COLING-ACL ’06, pages 73–80, Stroudsburg, PA, USA. Association for Computational Linguistics.
Michael Collins and Nigel Duffy. 2002. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. In Proceedings of ACL’02.
Courtney Corley and Rada Mihalcea. 2005. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 13–18, Ann Arbor, Michigan, June. Association for Computational Linguistics.
Jim Cowie, Joe Guthrie, and Louise Guthrie. 1992. Lexical disambiguation using simulated annealing. In COLING, pages 359–365.
Nello Cristianini, John Shawe-Taylor, and Huma Lodhi. 2001. Latent semantic kernels. In Carla Brodley and Andrea Danyluk, editors, Proceedings of ICML-01, 18th International Conference on Machine Learning, pages 66–73, Williams College, US. Morgan Kaufmann Publishers, San Francisco, US.
Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of ACL, pages 423–429, Barcelona, Spain, July.
Chad Cumby and Dan Roth. 2003. Kernel Methods for Relational Learning. In Proceedings of ICML 2003.
Hal Daumé III and Daniel Marcu. 2004. NP bracketing by maximum entropy tagging and SVM reranking. In Proceedings of EMNLP’04.
Jason V. Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S. Dhillon. 2007. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning, ICML ’07, pages 209–216, New York, NY, USA. ACM.
Linton C. Freeman. 1977. A Set of Measures of Centrality Based on Betweenness. Sociometry, 40(1):35–41.
Hagen Fürstenau and Mirella Lapata. 2009. Graph alignment for semi-supervised semantic role labeling. In Proceedings of EMNLP ’09, pages 11–20, Morristown, NJ, USA.
Ana-Maria Giuglea and Alessandro Moschitti. 2004. Knowledge Discovering using FrameNet, VerbNet and PropBank. In Proceedings of the Workshop on Ontology and Knowledge Discovering at ECML 2004, Pisa, Italy.
A.-M. Giuglea and A. Moschitti. 2006. Semantic role labeling via FrameNet, VerbNet and PropBank. In Proceedings of ACL, Sydney, Australia.
Alfio Gliozzo, Claudio Giuliano, and Carlo Strapparava. 2005. Domain kernels for word sense disambiguation. In Proceedings of ACL’05, pages 403–410.
G. Golub and W. Kahan. 1965. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics: Series B, Numerical Analysis, 2(2):205–224.
Zellig Harris. 1964. Distributional structure. In Jerrold J. Katz and Jerry A. Fodor, editors, The Philosophy of Linguistics. Oxford University Press.
J. J. Jiang and D. W. Conrath. 1997. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In International Conference Research on Computational Linguistics (ROCLING X).
T. Joachims. 2000. Estimating the generalization performance of a SVM efficiently. In Proceedings of ICML’00.
Richard Johansson and Alessandro Moschitti. 2010a. Reranking models in fine-grained opinion analysis. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 519–527, Beijing, China.
Richard Johansson and Alessandro Moschitti. 2010b. Syntactic and semantic structure for opinion expression detection. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 67–76, Uppsala, Sweden.
Richard Johansson and Pierre Nugues. 2008a. Dependency-based syntactic–semantic analysis with PropBank and NomBank. In CoNLL 2008: Proceedings of the Twelfth Conference on Natural Language Learning, pages 183–187, Manchester, United Kingdom.
Richard Johansson and Pierre Nugues. 2008b. The effect of syntactic representation on semantic role labeling. In Proceedings of COLING, Manchester, UK, August 18-22.
Taku Kudo and Yuji Matsumoto. 2003. Fast methods for kernel-based text analysis. In Proceedings of ACL’03.
Taku Kudo, Jun Suzuki, and Hideki Isozaki. 2005. Boosting-based parse reranking with subtree features. In Proceedings of ACL’05.
Claudia Leacock and Martin Chodorow. 1998. Combining Local Context and WordNet Similarity for Word Sense Identification, chapter 11, pages 265–283. The MIT Press.
X. Li and D. Roth. 2002. Learning question classifiers. In Proceedings of ACL’02.
Yashar Mehdad, Alessandro Moschitti, and Fabio Massimo Zanzotto. 2010. Syntactic/semantic structures for textual entailment recognition. In HLT-NAACL, pages 1020–1028.
Rada Mihalcea, Courtney Corley, and Carlo Strapparava. 2005. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the American Association for Artificial Intelligence (AAAI 2006), Boston, July.
Rada Mihalcea. 2005. Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling. In HLT/EMNLP 2005, pages 411–418.
Alessandro Moschitti, Silvia Quarteroni, Roberto Basili, and Suresh Manandhar. 2007. Exploiting syntactic and shallow semantic kernels for question/answer classification. In Proceedings of ACL’07.
Alessandro Moschitti, Daniele Pighin, and Roberto Basili. 2008. Tree kernels for semantic role labeling. Computational Linguistics, 34(2):193–224.
A. Moschitti. 2004. A study on convolution kernels for shallow semantic parsing. In Proceedings of ACL, Barcelona, Spain.
Alessandro Moschitti. 2006a. Efficient convolution kernels for dependency and constituent syntactic trees. In Proceedings of ECML’06, pages 318–329.
Alessandro Moschitti. 2006b. Making tree kernels practical for natural language learning. In Proceedings of EACL’06.
Roberto Navigli and Mirella Lapata. 2010. An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):678–692.
Sebastian Padó and Mirella Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2).
Sebastian Padó. 2006. User’s guide to sigf: Significance testing by approximate randomisation.
Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004a. WordNet::Similarity - Measuring the Relatedness of Concepts. In Proc. of 5th NAACL, Boston, MA.
Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004b. WordNet::Similarity - Measuring the Relatedness of Concepts. In Daniel Marcu, Susan Dumais, and Salim Roukos, editors, HLT-NAACL 2004: Demonstration Papers, pages 38–41, Boston, Massachusetts, USA, May 2 - May 7. Association for Computational Linguistics.
Philip Resnik. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448–453.
Magnus Sahlgren. 2006. The Word-Space Model. Ph.D. thesis, Stockholm University.
Hinrich Schütze. 1998. Automatic word sense discrimination. Computational Linguistics, 24:97–123.
John Shawe-Taylor and Nello Cristianini. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.
Libin Shen, Anoop Sarkar, and Aravind K. Joshi. 2003. Using LTAG Based Features in Parse Reranking. In Empirical Methods for Natural Language Processing (EMNLP), pages 89–96, Sapporo, Japan.
Georges Siolas and Florence d’Alché-Buc. 2000. Support vector machines based on a semantic kernel for text categorization. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN’00)-Volume 5, page 5205. IEEE Computer Society.
Ivan Titov and James Henderson. 2006. Porting statistical parsers with data-defined kernels. In Proceedings of CoNLL-X.
Kristina Toutanova, Penka Markova, and Christopher Manning. 2004. The Leaf Path Projection View of Parse Trees: Exploring String Kernels for HPSG Parse Selection. In Proceedings of EMNLP 2004.
Yannick Versley, Alessandro Moschitti, Massimo Poesio, and Xiaofeng Yang. 2008. Coreference systems based on kernels methods. In The 22nd International Conference on Computational Linguistics (Coling ’08), Manchester, England.
Zhibiao Wu and Martha Palmer. 1994. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138, New Mexico State University, Las Cruces, New Mexico.
Xiaofeng Yang, Jian Su, and Chewlim Tan. 2006. Kernel-based pronoun resolution with structured syntactic knowledge. In Proc. COLING-ACL 06.
Alexander S. Yeh. 2000. More accurate tests for the statistical significance of result differences. In COLING, pages 947–953.
Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2002. Kernel methods for relation extraction. In Proceedings of EMNLP-ACL, pages 181–201.
Dell Zhang and Wee Sun Lee. 2003. Question classification using support vector machines. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 26–32. ACM Press.
Min Zhang, Jie Zhang, and Jian Su. 2006. Exploring Syntactic Features for Relation Extraction using a Convolution tree kernel. In Proceedings of NAACL.
Peixiang Zhao, Jiawei Han, and Yizhou Sun. 2009. P-Rank: a comprehensive structural similarity measure over information networks. In CIKM ’09: Proceeding of the 18th ACM Conference on Information and Knowledge Management, pages 553–562, New York, NY, USA. ACM.