
Improved Lexical Acquisition through DPP-based Verb Clustering (ACL 2013)

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to use Determinantal Point Processes (DPPs) for clustering. DPPs are elegant probabilistic models defined over the possible subsets of a given dataset that give higher probability mass to high-quality and diverse subsets. Our novel clustering algorithm constructs a joint SCF-SP DPP kernel matrix and uses the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each.
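The abstract's description of DPPs (more probability mass on high-quality, diverse subsets) can be illustrated with a minimal sketch. It uses the standard L-ensemble form, where P(Y) = det(L_Y) / det(L + I) and L[i][j] = q_i * <phi_i, phi_j> * q_j. The three toy items, their quality scores, their feature vectors, and the helper names `det`, `build_kernel` and `subset_probability` are all hypothetical and chosen for illustration; this is not the paper's SCF-SP kernel or its clustering algorithm.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0  # determinant of the empty matrix is 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def build_kernel(quality, features):
    """L[i][j] = q_i * <phi_i, phi_j> * q_j (features assumed unit length)."""
    n = len(quality)
    return [
        [
            quality[i] * quality[j]
            * sum(x * y for x, y in zip(features[i], features[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def subset_probability(L, subset):
    """P(Y = subset) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    l_plus_i = [
        [L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return det(sub) / det(l_plus_i)


# Hypothetical toy data: items 0 and 1 are similar, items 0 and 2 are orthogonal.
quality = [1.0, 1.0, 1.0]
features = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0)]
L = build_kernel(quality, features)

p_similar = subset_probability(L, (0, 1))  # two near-duplicate items
p_diverse = subset_probability(L, (0, 2))  # two dissimilar items

# Probabilities over all subsets (including the empty set) sum to one.
total = sum(
    subset_probability(L, s)
    for k in range(len(L) + 1)
    for s in combinations(range(len(L)), k)
)

print(p_similar, p_diverse, total)
```

With equal quality scores, the pair of dissimilar items (0, 2) receives a higher probability than the pair of similar items (0, 1), roughly 0.17 versus 0.06 in this toy setup. Raising an item's quality score increases the probability of every subset that contains it relative to subsets that do not.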

reference text

Ivana Romina Altamirano and Laura Alonso i Alemany. 2010. IRASubcat, a highly customizable, language independent tool for the acquisition of verbal subcategorization information from corpus. In Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas.
Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In COLING-ACL-98.
Roberto Basili, Diego De Cao, Paolo Marocco, and Marco Pennacchiotti. 2007. Learning selectional preferences for entailment or paraphrasing rules. In RANLP 2007, Borovets, Bulgaria.
Rahul Bhagat, Patrick Pantel, and Eduard Hovy. 2007. LEDIR: An unsupervised algorithm for learning directionality of inference rules. In EMNLP-07, pages 161–170, Prague, Czech Republic.
Akshar Bharati, Sriram Venkatapathy, and Prashanth Reddy. 2005. Inferring semantic roles using subcategorization frames and maximum entropy model. In CoNLL-05.
Ted Briscoe and John Carroll. 1997. Automatic extraction of subcategorization from corpora. In ANLP-97.
E.J. Briscoe, J. Carroll, and R. Watson. 2006. The second release of the RASP system. In COLING/ACL interactive presentation session.
Glenn Carroll and Mats Rooth. 1996. Valence induction with a head-lexicalized PCFG. In EMNLP-96.
Paula Chesley and Susanne Salmon-Alt. 2006. Automatic extraction of subcategorization frames for French. In LREC-06.
Kostadin Cholakov and Gertjan van Noord. 2010. Using unknown word techniques to learn known words. In EMNLP-10.
Hoa Trang Dang. 2004. Investigations into the Role of Lexical Semantics in Word Sense Disambiguation. Ph.D. thesis, CIS, University of Pennsylvania.
Tim Van de Cruys, Laura Rimell, Thierry Poibeau, and Anna Korhonen. 2012. Multi-way tensor factorization for unsupervised lexical acquisition. In COLING-12.
Łukasz Dębowski. 2009. Valence extraction using EM selection and co-occurrence matrices. Language Resources and Evaluation, 43(4):301–327.
Katrin Erk. 2007. A simple, similarity-based model for selectional preferences. In ACL 2007, Prague, Czech Republic.
J. Gillenwater, A. Kulesza, and B. Taskar. 2012. Discovering diverse and salient threads in document collections. In EMNLP-12.
Ralph Grishman, Catherine Macleod, and Adam Meyers. 1994. COMLEX Syntax: Building a computational lexicon. In COLING-94.
G.E. Hinton. 2002. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800.
Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. OntoNotes: the 90% solution. In Proceedings of NAACL-HLT-06 short papers.
Dino Ienco, Serena Villata, and Cristina Bosco. 2008. Automatic extraction of subcategorization frames for Italian. In LREC-08.
Eric Joanis, Suzanne Stevenson, and David James. 2008. A general feature space for automatic verb classification. Natural Language Engineering.
Daisuke Kawahara and Sadao Kurohashi. 2010. Acquiring reliable predicate-argument structures from raw corpora for case frame compilation. In LREC-10.
Karin Kipper-Schuler. 2005. VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA, June.
Anna Korhonen, Yuval Krymolowski, and Nigel Collier. 2008. The choice of features for classification of verbs in biomedical texts. In Proceedings of COLING-08.
Anna Korhonen. 2002. Semantically motivated subcategorization acquisition. In Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition-Volume 9.
A. Kulesza and B. Taskar. 2010. Structured determinantal point processes. In NIPS-10.
A. Kulesza and B. Taskar. 2012a. k-DPPs: fixed-size determinantal point processes. In ICML-11.
A. Kulesza and B. Taskar. 2012b. Learning determinantal point processes. In UAI-12.
Alex Kulesza and Ben Taskar. 2012c. Determinantal point processes for machine learning. In arXiv:1207.6083.
A. Kulesza. 2012. Learning with determinantal point processes. Ph.D. thesis, CIS, University of Pennsylvania.
Alessandro Lenci, Barbara McGillivray, Simonetta Montemagni, and Vito Pirrelli. 2008. Unsupervised acquisition of verb subcategorization frames from shallow-parsed corpora. In LREC-08.
Beth Levin. 1993. English verb classes and alternations: A preliminary investigation. Chicago, IL.
Jianguo Li and Chris Brew. 2008. Which are the best features for automatic verb classification. In ACL-08.
Tom Lippincott, Anna Korhonen, and Diarmuid Ó Séaghdha. 2012. Learning syntactic verb frames using graphical models. In ACL-12, Jeju, Korea.
Cédric Messiant, Anna Korhonen, and Thierry Poibeau. 2008. LexSchem: A large subcategorization lexicon for French verbs. In LREC-08.
George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.
Alessandro Moschitti and Roberto Basili. 2005. Verb subcategorization kernels for automatic semantic labeling. In Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition.
Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith, and Andy Way. 2005. Large-scale induction and evaluation of lexical resources from the Penn-II and Penn-III treebanks. Computational Linguistics, 31:328–365.
Diarmuid Ó Séaghdha and Anna Korhonen. 2011. Probabilistic models of similarity in syntactic context. In EMNLP-11, Edinburgh, UK.
Diarmuid Ó Séaghdha. 2010. Latent variable models of selectional preference. In ACL-10, Uppsala, Sweden.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.
Judita Preiss, Ted Briscoe, and Anna Korhonen. 2007. A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. In ACL-07.
Joseph Reisinger and Raymond Mooney. 2011. Cross-cutting models of lexical semantics. In EMNLP-11, Edinburgh, UK.
Alan Ritter and Oren Etzioni. 2010. A latent Dirichlet allocation method for selectional preferences. In ACL-10.
Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. 1999. Inducing a semantically annotated lexicon via EM-based clustering. In ACL-99.
Karin Kipper Schuler. 2006. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania.
S. Schulte im Walde, C. Hying, C. Scheible, and H. Schmid. 2008. Combining EM training and the MDL principle for an automatic verb classification incorporating selectional preferences. In ACL-08, pages 496–504.
Sabine Schulte im Walde. 2006. Experiments on the automatic induction of German semantic verb classes. Computational Linguistics, 32(2):159–194.
Lei Shi and Rada Mihalcea. 2005. Putting pieces together: Combining FrameNet, VerbNet and WordNet for robust semantic parsing. In CICLING-05.
Lin Sun and Anna Korhonen. 2009. Improving verb clustering with automatically acquired selectional preferences. In EMNLP-09, Singapore.
Lin Sun and Anna Korhonen. 2011. Hierarchical verb clustering using graph factorization. In EMNLP-11.
Lin Sun, Anna Korhonen, and Yuval Krymolowski. 2008. Verb class discovery from rich syntactic data. Lecture Notes in Computer Science, 4919(16).
Robert Swier and Suzanne Stevenson. 2004. Unsupervised semantic role labelling. In EMNLP-04.
Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. 2010. Contextualizing semantic representations using syntactically enriched vector models. In ACL-10, Uppsala, Sweden.
Tim Van de Cruys. 2009. A non-negative tensor factorization model for selectional preference induction. In Proceedings of the workshop on Geometric Models for Natural Language Semantics (GEMS).
Andreas Vlachos, Anna Korhonen, and Zoubin Ghahramani. 2009. Unsupervised and constrained Dirichlet process mixture models for verb clustering. In Proceedings of the Workshop on Geometrical Models of Natural Language Semantics.
2008. Robustness and generalization of role sets: PropBank vs. VerbNet.
Beñat Zapirain, Eneko Agirre, and Lluís Màrquez. 2009. Generalizing over lexical features: Selectional preferences for semantic role classification. In ACL-IJCNLP-09, Singapore.
Guangyou Zhou, Jun Zhao, Kang Liu, and Li Cai. 2011. Exploiting web-derived selectional preference to improve statistical dependency parsing. In ACL-11, Portland, OR.