ACL 2010 (211): Simple, Accurate Parsing with an All-Fragments Grammar

Authors: Mohit Bansal; Dan Klein

Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
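
Counting fragments shows why all-fragments parsing has to be implicit. The Python sketch below is not the authors' code. It assumes the standard data-oriented parsing counting argument used in Goodman's (1996a) PCFG reduction: a fragment rooted at a node keeps that node's full rule, and each nonterminal child is either left as a substitution site or expanded by any fragment rooted at that child. The count at a node is therefore the product over its children of (1 + count at child), and words contribute a factor of 1. The helpers `Node`, `parse`, `fragments_rooted_at`, and `total_fragments` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node; a node with no children is a terminal (word)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.children


def parse(s: str) -> Node:
    """Read a bracketed tree string, e.g. '(NP (D the) (N dog))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return Node(tok)  # a bare token is a terminal word
        node = Node(tokens[pos])  # the label follows the opening bracket
        pos += 1
        while tokens[pos] != ")":
            node.children.append(read())
        pos += 1  # consume the closing bracket
        return node

    return read()


def fragments_rooted_at(node: Node) -> int:
    """Count fragments whose root is `node` (assumed DOP counting argument).

    Each nonterminal child is either cut (a substitution site, 1 way) or
    expanded by one of its own rooted fragments; a terminal child is always
    kept, so it contributes a factor of 1 + 0 = 1.
    """
    if node.is_terminal():
        return 0  # words do not root fragments
    count = 1
    for child in node.children:
        count *= 1 + fragments_rooted_at(child)
    return count


def total_fragments(node: Node) -> int:
    """Sum the rooted counts over every nonterminal node in the tree."""
    if node.is_terminal():
        return 0
    return fragments_rooted_at(node) + sum(total_fragments(c) for c in node.children)


if __name__ == "__main__":
    tree = parse("(S (NP (D the) (N dog)) (VP (V barks)))")
    print(fragments_rooted_at(tree))  # 15
    print(total_fragments(tree))      # 24
```

On this toy tree the root alone heads 15 fragments and the whole tree contains 24. For a complete binary tree over preterminals the per-root count follows a(d) = (1 + a(d-1))^2, giving 1, 4, 25, 676 as depth grows, which is why treebank-scale fragment sets are represented implicitly rather than listed one by one.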

Reference text

Rens Bod. 1993. Using an Annotated Corpus as a Stochastic Grammar. In Proceedings of EACL.
Rens Bod. 2001. What is the Minimal Set of Fragments that Achieves Maximum Parse Accuracy? In Proceedings of ACL.
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of ACL.
Eugene Charniak, Sharon Goldwater, and Mark Johnson. 1998. Edge-Based Best-First Chart Parsing. In Proceedings of the 6th Workshop on Very Large Corpora.
Eugene Charniak, Mark Johnson, et al. 2006. Multilevel Coarse-to-fine PCFG Parsing. In Proceedings of HLT-NAACL.
Eugene Charniak. 2000. A Maximum-Entropy-Inspired Parser. In Proceedings of NAACL.
David Chiang. 2003. Statistical parsing with an automatically-extracted tree adjoining grammar. In Data-Oriented Parsing.
David Chiang. 2005. A Hierarchical Phrase-Based Model for Statistical Machine Translation. In Proceedings of ACL.
Trevor Cohn, Sharon Goldwater, and Phil Blunsom. 2009. Inducing Compact but Accurate Tree-Substitution Grammars. In Proceedings of NAACL.
Michael Collins and Nigel Duffy. 2002. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. In Proceedings of ACL.
Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.
Steve Deneefe and Kevin Knight. 2009. Synchronous Tree Adjoining Machine Translation. In Proceedings of EMNLP.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What’s in a translation rule? In Proceedings of HLT-NAACL.
Joshua Goodman. 1996a. Efficient Algorithms for Parsing the DOP Model. In Proceedings of EMNLP.
Joshua Goodman. 1996b. Parsing Algorithms and Metrics. In Proceedings of ACL.
Joshua Goodman. 2003. Efficient parsing of DOP with PCFG-reductions. In R. Bod, R. Scha, and K. Sima’an (eds.), Data-Oriented Parsing. University of Chicago Press, Chicago, IL.
James Henderson. 2004. Discriminative Training of a Neural Network Statistical Parser. In Proceedings of ACL.
Mark Johnson. 1998. PCFG Models of Linguistic Tree Representations. Computational Linguistics, 24:613–632.
Mark Johnson. 2002. The DOP Estimation Method Is Biased and Inconsistent. Computational Linguistics, 28(1).
Dan Klein and Christopher Manning. 2003. Accurate Unlexicalized Parsing. In Proceedings of ACL.
Philipp Koehn, Franz Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of HLT-NAACL.
Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2005. Probabilistic CFG with latent annotations. In Proceedings of ACL.
Slav Petrov and Dan Klein. 2007. Improved Inference for Unlexicalized Parsing. In Proceedings of NAACL-HLT.
Slav Petrov and Dan Klein. 2008. Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing. In Proceedings of EMNLP.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. In Proceedings of COLING-ACL.
Slav Petrov, Aria Haghighi, and Dan Klein. 2008. Coarse-to-Fine Syntactic Machine Translation using Language Projections. In Proceedings of EMNLP.
Matt Post and Daniel Gildea. 2009. Bayesian Learning of a Tree Substitution Grammar. In Proceedings of ACL-IJCNLP.
Philip Resnik. 1992. Probabilistic Tree-Adjoining Grammar as a Framework for Statistical Natural Language Processing. In Proceedings of COLING.
Remko Scha. 1990. Taaltheorie en taaltechnologie; competence en performance [Language theory and language technology: competence and performance]. In R. de Kort and G.L.J. Leerdam (eds.): Computertoepassingen in de Neerlandistiek.
Khalil Sima’an. 1996. Computational Complexity of Probabilistic Disambiguation by means of Tree-Grammars. In Proceedings of COLING.
Khalil Sima’an. 2000. Tree-gram Parsing: Lexical Dependencies and Structural Relations. In Proceedings of ACL.
Andreas Zollmann and Khalil Sima’an. 2005. A Consistent and Efficient Estimator for Data-Oriented Parsing. Journal of Automata, Languages and Combinatorics (JALC), 10(2/3):367–388.
Willem Zuidema. 2007. Parsimonious Data-Oriented Parsing. In Proceedings of EMNLP-CoNLL.