emnlp emnlp2011 emnlp2011-16 emnlp2011-16-reference knowledge-graph by maker-knowledge-mining

16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP


Source: pdf

Author: Federico Sangati ; Willem Zuidema

Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-ofthe-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.


reference text

Mohit Bansal and Dan Klein. 2010. Simple, accurate parsing with an all-fragments grammar. In Proceedings of the 48th Annual Meeting of the ACL, pages 1098–1 107. Association for Computational Linguistics, Uppsala, Sweden. Rens Bod. 1992. A computational model of language performance: Data oriented parsing. In Proceedings COLING’92 (Nantes, France), pages 855–859. Association for Computational Linguistics, Morristown, NJ. Rens Bod. 2001. What is the minimal set of fragments that achieves maximal parse accuracy? In Proceedings of the ACL. Morgan Kaufmann, San Francisco, CA. Rens Bod. 2003. An efficient implementation of a new DOP model. In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1, EACL ’03, pages 19–26. Association for Computational Linguistics, Morristown, NJ, USA. Rens Bod, Khalil Sima’an, and Remko Scha. 2003. Data-Oriented Parsing. University of Chicago Press, Chicago, IL, USA. Gideon Borensztajn, Willem Zuidema, and Rens Bod. 2009. Children’s Grammars Grow More Abstract with Age—Evidence from an Automatic Procedure for Identifying the Productive Units of Language. Topics in Cognitive Science, 1(1): 175– 188. Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 598–603. AAAI Press/MIT Press. Eugene Charniak and Mark Johnson. 2005. Coarseto-fine n-best parsing and maxent discriminative reranking. In Proc. 43nd Meeting of Association for Computational Linguistics (ACL 2005). Trevor Cohn, Phil Blunsom, and Sharon Goldwater. 2010. Inducing tree-substitution grammars. Journal of Machine Learning Research, 11:3053– 3096. Michael Collins and Nigel Duffy. 2001 . Convolution Kernels for Natural Language. In Thomas G. 94 Dietterich, Suzanna Becker, and Zoubin Ghahramani, editors, NIPS, pages 625–632. MIT Press. Michael Collins and Nigel Duffy. 2002. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In Proceedings of 40th Annual Meeting of the ACL, pages 263–270. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA. Michael J. Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania. Joshua Goodman. 1996. Efficient algorithms for parsing the DOP model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 143–152. Joshua Goodman. 2003. Efficient parsing of DOP with PCFG-reductions. In Bod et al. (2003). Joshua T. Goodman. 1998. Parsing inside-out. Ph.D. thesis, Harvard University, Cambridge, MA, USA. Ray Jackendoff. 2002. Foundations of Language. Oxford University Press, Oxford, UK. Mark Johnson. 2002. The dop estimation method is biased and inconsistent. Computational Linguistics, 28:71–76. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Adaptor grammars: A framework for specifying compositional nonparametric bayesian models. In Advances in Neural Information Processing Systems, volume 16, pages 641– 648. Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In ACL ’03: Proceedings of the 41st Annual Meeting on ACL, pages 423–430. Association for Computational Linguistics, Morristown, NJ, USA. K. Lari and S. J. Young. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35–56. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):3 13–330. Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2005. Probabilistic cfg with latent anno- tations. In ACL ’05: Proceedings of the 43rd Annual Meeting on ACL, pages 75–82. Association for Computational Linguistics, Morristown, NJ, USA. Alessandro Moschitti. 2006. Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In ECML, pages 3 18–329. Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Proceedings, Berlin, Germany. Timothy J. O’Donnell, Noah D. Goodman, and Joshua B. Tenenbaum. 2009. Fragment Grammars: Exploring Computation and Reuse in Language. Technical Report MIT-CSAIL-TR-2009013, MIT. Slav Petrov. 2009. Coarse-to-Fine Natural Language Processing. Ph.D. thesis, University of California at Bekeley, Berkeley, CA, USA. Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL, pages 433–440. Associ- ation for Computational Linguistics, Morristown, NJ, USA. Slav Petrov and Dan Klein. 2007. Improved inference for unlexicalized parsing. In Human Language Technologies 2007: The Conference of the North American Chapter ofthe ACL; Proceedings of the Main Conference, pages 404–41 1. Association for Computational Linguistics, Rochester, New York. Matt Post and Daniel Gildea. 2009. Bayesian learning of a tree substitution grammar. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 45–48. Association for Computational Linguistics, Suntec, Singapore. Federico Sangati and Willem Zuidema. 2009. Unsupervised Methods for Head Assignments. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), pages 701– 709. Association for Computational Linguistics, Athens, Greece. 95 Federico Sangati, Willem Zuidema, and Rens Bod. 2010. Efficiently extract recurring tree fragments from large treebanks. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10). European Language Resources Association (ELRA), Valletta, Malta. Remko Scha. 1990. Taaltheorie en taaltechnologie: competence en performance. In Q. A. M. de Kort and G. L. J. Leerdam, editors, Computertoepassingen in de Neerlandistiek, LVVNjaarboek, pages 7–22. Landelijke Vereniging van Neerlandici, Almere. [Language theory and language technology: Competence and Performance] in Dutch. Khalil Sima’an. 1996. Computational complexity of probabilistic disambiguation by means of treegrammars. In Proceedings of the 16th conference on Computational linguistics, pages 1175–1 180. Association for Computational Linguistics, Morristown, NJ, USA. Khalil Sima’an. 1999. Learning Efficient Disambiguation. Ph.D. thesis, Utrecht University and University of Amsterdam. Khalil Sima’an. 2003. On maximizing metrics for syntactic disambiguation. In Proceedings of the International Workshop on Parsing Technologies (IWPT’03). Elif Yamangil and Stuart M. Shieber. 2010. Bayesian synchronous tree-substitution grammar induction and its application to sentence compression. In Proceedings of the 48th Annual Meeting of the ACL, ACL ’ 10, pages 937–947. Association for Computational Linguistics, Stroudsburg, PA, USA. Andreas Zollmann and Khalil Sima’an. 2005. A consistent and efficient estimator for dataoriented parsing. Journal of Automata, Languages and Combinatorics, 10(2/3):367–388. Willem Zuidema. 2007. Parsimonious DataOriented Parsing. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 551–560. Association for Computational Linguistics, Prague, Czech Republic.