acl acl2011 acl2011-316 acl2011-316-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
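The span-1 pruning idea in the abstract can be illustrated with a minimal sketch. It assumes a toy lexicon and toy unary rules, and it stands in for the classifier's decision with a plain boolean flag per word. The names `LEXICON`, `UNARY`, and `fill_span1_cell` are hypothetical and are not taken from the paper, and unary chains longer than one step are ignored for brevity.

```python
# Toy grammar, invented for illustration only (log probabilities).
LEXICON = {                      # word -> {POS tag: score}
    "dogs": {"NNS": -0.1},
    "bark": {"VBP": -0.2},
}
UNARY = {                        # POS tag -> [(phrase-level parent, score)]
    "NNS": [("NP", -0.5)],
    "VBP": [("VP", -0.7)],
}

def fill_span1_cell(word, allow_unary):
    """Populate a span-1 CKY chart cell for `word`.

    When `allow_unary` is False the cell is "closed" to phrase-level
    unary productions, so it keeps only the POS-tag entries.
    """
    cell = dict(LEXICON.get(word, {}))
    if allow_unary:
        # Apply one level of unary productions on top of the POS tags.
        for tag, score in list(cell.items()):
            for parent, rule_score in UNARY.get(tag, []):
                candidate = score + rule_score
                if candidate > cell.get(parent, float("-inf")):
                    cell[parent] = candidate
    return cell

# Hypothetical classifier output: open the first cell, close the second.
sentence = ["dogs", "bark"]
allow = [True, False]
chart_row = [fill_span1_cell(w, a) for w, a in zip(sentence, allow)]
```

A closed cell holds fewer entries, so every larger span that combines with it has fewer candidates to consider. That is where a downstream saving could come from in a sketch like this one.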
Nathan Bodenstab, Aaron Dunlop, Keith Hall, and Brian Roark. 2011. Beam-width prediction for efficient context-free parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon. Association for Computational Linguistics.
Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 598–603, Menlo Park, CA. AAAI Press/MIT Press.
Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, pages 132–139, Seattle, Washington. Morgan Kaufmann Publishers Inc.
John Cocke and Jacob T. Schwartz. 1970. Programming languages and their compilers. Technical report, preliminary notes, Courant Institute of Mathematical Sciences, NYU.
Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, pages 16–23, Morristown, NJ, USA. Association for Computational Linguistics.
Michael Collins. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 conference on Empirical Methods in Natural Language Processing, volume 10, pages 1–8, Philadelphia, July. Association for Computational Linguistics.
Dan Klein and Christopher D. Manning. 2001. Parsing with treebank grammars: Empirical bounds, theoretical models, and the structure of the Penn treebank. In Proceedings of 39th Annual Meeting of the Association for Computational Linguistics, pages 338–345, Toulouse, France, July. Association for Computational Linguistics.
Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz, and Ann Taylor. 1999. Treebank-3. Linguistic Data Consortium, Philadelphia.
Slav Petrov and Dan Klein. 2007a. Improved inference for unlexicalized parsing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 404–411, Rochester, New York, April. Association for Computational Linguistics.
Slav Petrov and Dan Klein. 2007b. Learning and inference for hierarchically split PCFGs. In AAAI 2007 (Nectar Track).
Brian Roark and Kristy Hollingshead. 2008. Classifying chart cells for quadratic complexity context-free inference. In Donia Scott and Hans Uszkoreit, editors, Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), pages 745–752, Manchester, UK, August. Association for Computational Linguistics.
Brian Roark and Kristy Hollingshead. 2009. Linear complexity context-free parsing pipelines via chart constraints. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 647–655, Boulder, Colorado, June. Association for Computational Linguistics.
Brian Roark and Richard W. Sproat. 2007. Computational Approaches to Morphology and Syntax. Oxford University Press, New York.
Yue Zhang, Byung-Gyu Ahn, Stephen Clark, Curt Van Wyk, James R. Curran, and Laura Rimell. 2010. Chart pruning for fast lexicalised-grammar parsing. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1472–1479, Beijing, China, June.