300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing

Source: pdf

Authors: Mohit Bansal; Dan Klein

Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
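As a rough illustration of the selection criterion described in the abstract, the toy sketch below picks, from a handful of candidate derivations, the one built from the fewest fragments. It is not the authors' parser: the `Derivation` type, the fragment names, and the "first candidate wins a tie" rule are all assumptions made for this example, and the abstract does not say which tie-breaking scheme the paper adopts.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a parse label paired with the training
# fragments used to build it. Neither name comes from the paper.
Derivation = Tuple[str, Sequence[str]]


def shortest_derivation(candidates: List[Derivation]) -> Derivation:
    """Return the candidate built from the fewest fragments.

    Assumption: ties go to the earliest candidate in the list, which is
    what Python's min() does. The paper studies tie-breaking separately.
    """
    if not candidates:
        raise ValueError("no candidate derivations")
    return min(candidates, key=lambda d: len(d[1]))


candidates = [
    ("parse_a", ["f1", "f2", "f3"]),  # three fragments
    ("parse_b", ["f4", "f5"]),        # two fragments
    ("parse_c", ["f6", "f7"]),        # two fragments, tied with parse_b
]
best = shortest_derivation(candidates)
```

Here `best` is the `parse_b` entry: it ties with `parse_c` on fragment count and wins only because it appears first. A real parser would search over derivations with a dynamic program rather than enumerate them.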

reference text

Mohit Bansal and Dan Klein. 2010. Simple, Accurate Parsing with an All-Fragments Grammar. In Proceedings of ACL.
Rens Bod. 1993. Using an Annotated Corpus as a Stochastic Grammar. In Proceedings of EACL.
Rens Bod. 2000. Parsing with the Shortest Derivation. In Proceedings of COLING.
Rens Bod. 2001. What is the Minimal Set of Fragments that Achieves Maximum Parse Accuracy? In Proceedings of ACL.
Eugene Charniak, Sharon Goldwater, and Mark Johnson. 1998. Edge-Based Best-First Chart Parsing. In Proceedings of the 6th Workshop on Very Large Corpora.
Eugene Charniak, Mark Johnson, et al. 2006. Multilevel Coarse-to-fine PCFG Parsing. In Proceedings of HLT-NAACL.
Trevor Cohn and Phil Blunsom. 2010. Blocked Inference in Bayesian Tree Substitution Grammars. In Proceedings of NAACL.
Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.
A. Dubey. 2005. What to do when lexicalization fails: parsing German with suffix analysis and smoothing. In ACL ’05.
Daniel Gildea. 2001. Corpus variation and parser performance. In Proceedings of EMNLP.
Joshua Goodman. 1996a. Efficient Algorithms for Parsing the DOP Model. In Proceedings of EMNLP.
Joshua Goodman. 1996b. Parsing Algorithms and Metrics. In Proceedings of ACL.
Joshua Goodman. 2003. Efficient parsing of DOP with PCFG-reductions. In Bod R, Scha R, Sima’an K (eds.) Data-Oriented Parsing. University of Chicago Press, Chicago, IL.
Mark Johnson. 1998. PCFG Models of Linguistic Tree Representations. Computational Linguistics, 24:613–632.
Dan Klein and Christopher Manning. 2003. Accurate Unlexicalized Parsing. In Proceedings of ACL.
Slav Petrov and Dan Klein. 2007. Improved Inference for Unlexicalized Parsing. In Proceedings of NAACL-HLT.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. In Proceedings of COLING-ACL.
Matt Post and Daniel Gildea. 2009. Bayesian Learning of a Tree Substitution Grammar. In Proceedings of ACL-IJCNLP.
Philip Resnik. 1992. Probabilistic Tree-Adjoining Grammar as a Framework for Statistical Natural Language Processing. In Proceedings of COLING.
Khalil Sima’an. 2000. Tree-gram Parsing: Lexical Dependencies and Structural Relations. In Proceedings of ACL.