
130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

Source: pdf

Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid Ó Séaghdha

Abstract: We present a novel approach for building verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. We conclude with future work to augment the approach.

reference text

Omri Abend and Ari Rappoport. 2010. Fully unsupervised core-adjunct argument classification. In ACL ’10.
Galen Andrew, Trond Grenager, and Christopher Manning. 2004. Verb sense and subcategorization: using joint inference to improve performance on complementary tasks. In EMNLP ’04.
Collin Baker, Charles Fillmore, and John Lowe. 1998. The Berkeley FrameNet project. In COLING-ACL ’98.
David Blei, Andrew Ng, Michael Jordan, and John Lafferty. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research.
Olivier Bodenreider. 2004. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32.
Bran Boguraev and Ted Briscoe. 1987. Large lexicons for natural language processing. Computational Linguistics, 13.
Ted Briscoe, John Carroll, and Rebecca Watson. 2006. The second release of the RASP system. In Proceedings of the COLING/ACL on Interactive presentation sessions.
John Carroll, Guido Minnen, and Ted Briscoe. 1998. Can subcategorisation probabilities help a statistical parser? In The 6th ACL/SIGDAT Workshop on Very Large Corpora.
K Bretonnel Cohen and Lawrence Hunter. 2006. A critical review of PASBio’s argument structures for biomedical verbs. BMC Bioinformatics, 7.
James Curran, Stephen Clark, and Johan Bos. 2007. Linguistically motivated large-scale NLP with C&C and Boxer. In ACL ’07.
Marie-Catherine De Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In LREC ’06.
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2007. The infinite tree. In ACL ’07.
Ralph Grishman, Catherine Macleod, and Adam Meyers. 1994. Comlex syntax: building a computational lexicon. In COLING ’94.
Xiwu Han, Chengguo Lv, and Tiejun Zhao. 2008. Weakly supervised SVM for Chinese-English crosslingual subcategorization lexicon acquisition. In The 11th Joint Conference on Information Science.
J.A. Hartigan and M.A. Wong. 1979. Algorithm AS 136: A K-Means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics).
Gregor Heinrich. 2009. Parameter estimation for text analysis. Technical report, Fraunhofer IGD.
Gregor Heinrich. 2011. Infinite LDA implementing the HDP with minimum code complexity. Technical report, arbylon.net.
Lawrence Hubert and Phipps Arabie. 1985. Comparing partitions. Journal of Classification, 2.
Eric Joanis and Suzanne Stevenson. 2003. A general feature space for automatic verb classification. In EACL ’03.
Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008. A large-scale classification of English verbs. In LREC ’08.
Anna Korhonen, Genevieve Gorrell, and Diana McCarthy. 2000. Statistical filtering and subcategorization frame acquisition. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.
Anna Korhonen, Yuval Krymolowski, and Ted Briscoe. 2006a. A large subcategorization lexicon for natural language processing applications. In LREC ’06.
Anna Korhonen, Yuval Krymolowski, and Nigel Collier. 2006b. Automatic classification of verbs in biomedical texts. In ACL ’06.
Anna Korhonen, Yuval Krymolowski, and Nigel Collier. 2008. The choice of features for classification of verbs in biomedical texts. In COLING ’08.
Ro Lenci, Barbara McGillivray, Simonetta Montemagni, and Vito Pirrelli. 2008. Unsupervised acquisition of verb subcategorization frames from shallow-parsed corpora. In LREC ’08.
Beth Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL.
Thomas Lippincott, Anna Korhonen, and Diarmuid Ó Séaghdha. 2010. Exploring subdomain variation in biomedical language. BMC Bioinformatics.
Diana McCarthy. 2000. Using semantic preferences to identify verbal participation in role switching alternations. In NAACL ’00.
Marina Meila. 2003. Comparing clusterings by the Variation of Information. In COLT.
Paola Merlo and Suzanne Stevenson. 2001. Automatic verb classification based on statistical distributions of argument structure. Computational Linguistics.
Cédric Messiant. 2008. A subcategorization acquisition system for French verbs. In ACL HLT ’08 Student Research Workshop.
Yusuke Miyao. 2005. Probabilistic disambiguation models for wide-coverage HPSG parsing. In ACL ’05.
Radford M. Neal. 1993. Probabilistic inference using Markov chain Monte Carlo methods. Technical report, University of Toronto.
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In The CoNLL Shared Task Session of EMNLP-CoNLL 2007.
Diarmuid Ó Séaghdha. 2010. Latent variable models of selectional preference. In ACL ’10.
Martha Palmer, Paul Kingsbury, and Daniel Gildea. 2005. The Proposition Bank: an annotated corpus of semantic roles. Computational Linguistics.
Judita Preiss, Ted Briscoe, and Anna Korhonen. 2007. A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. In ACL ’07.
Douglas Roland and Daniel Jurafsky. 1998. How verb subcategorization frequencies are affected by corpus choice. In ACL ’98.
Peter Rousseeuw. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics.
C.J. Rupp, Paul Thompson, William Black, and John McNaught. 2010. A specialised verb lexicon as the basis of fact extraction in the biomedical domain. In Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features.
Sabine Schulte im Walde. 2009. The induction of verb frames and verb classes from corpora. In Corpus Linguistics. An International Handbook. Mouton de Gruyter.
Lin Sun and Anna Korhonen. 2009. Improving verb clustering with automatically acquired selectional preferences. In EMNLP ’09.
Mihai Surdeanu, Sanda Harabagiu, John Williams, and Paul Aarseth. 2003. Using predicate-argument structures for information extraction. In ACL ’03.
Adam R. Teichert and Hal Daumé III. 2009. Unsupervised part of speech tagging without a lexicon. In NIPS Workshop on Grammar Induction, Representation of Language and Language Learning.
E. Uzun, Y. Klaslan, H.V. Agun, and E. Uar. 2008. Web-based acquisition of subcategorization frames for Turkish. In The Eighth International Conference on Artificial Intelligence and Soft Computing.
Giulia Venturi, Simonetta Montemagni, Simone Marchi, Yutaka Sasaki, Paul Thompson, John McNaught, and Sophia Ananiadou. 2009. Bootstrapping a verb lexicon for biomedical information extraction. In Computational Linguistics and Intelligent Text Processing. Springer Berlin / Heidelberg.