acl acl2012 acl2012-127 acl2012-127-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adam Pauls; Dan Klein
Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well as or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.
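The count-based estimation described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the trees, the choice of context (a node's own label plus its parent's label), and the helper names `collect`, `train`, and `prob` are all assumptions made for illustration, and the estimate is an unsmoothed relative frequency rather than the smoothed n-gram estimators the abstract refers to.

```python
from collections import Counter

# Hypothetical toy trees as nested tuples: (label, child, ...); leaves are strings.
TREES = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"), ("VP", ("V", "sleep"))),
    ("S", ("NP", "dogs"), ("VP", ("V", "sleep"))),
]

def label(node):
    """Return the label of an internal node, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]

def collect(node, parent, rule_counts, context_counts):
    """Count each expansion together with its (parent label, own label) context.

    Assumption: this two-label context stands in for a treelet; the paper's
    actual contexts are not reproduced here.
    """
    if isinstance(node, str):
        return
    context = (parent, node[0])
    rule = tuple(label(child) for child in node[1:])
    rule_counts[(context, rule)] += 1
    context_counts[context] += 1
    for child in node[1:]:
        collect(child, node[0], rule_counts, context_counts)

def train(trees):
    """Collect expansion and context counts over a list of trees."""
    rule_counts, context_counts = Counter(), Counter()
    for tree in trees:
        collect(tree, "ROOT", rule_counts, context_counts)
    return rule_counts, context_counts

def prob(rule, context, rule_counts, context_counts):
    """Unsmoothed relative-frequency estimate of P(rule | context)."""
    total = context_counts[context]
    return rule_counts[(context, rule)] / total if total else 0.0

rule_counts, context_counts = train(TREES)
p = prob(("dogs",), ("S", "NP"), rule_counts, context_counts)
```

Here `p` is the fraction of NP nodes under S that expand to "dogs" in the toy data. A realistic version would replace the relative-frequency estimate with a smoothed, backed-off one, just as an n-gram model backs off to shorter histories.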
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean, and Google Inc. 2007. Large language models in machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and maxent discriminative reranking. In Proceedings of the Association for Computational Linguistics.
Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the North American chapter of the Association for Computational Linguistics.
Eugene Charniak. 2001. Immediate-head parsing for language models. In Proceedings of the Association for Computational Linguistics.
Ciprian Chelba. 1997. A structured language model. In Proceedings of the Association for Computational Linguistics.
Stanley F. Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. In Proceedings of the Association for Computational Linguistics.
Colin Cherry and Chris Quirk. 2008. Discriminative, syntactic language modeling through latent SVMs. In Proceedings of The Association for Machine Translation in the Americas.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In The Annual Conference of the Association for Computational Linguistics.
Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proceedings of Association for Computational Linguistics.
Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
Jennifer Foster, Joachim Wagner, and Josef van Genabith. 2008. Adapting a WSJ-trained parser to grammatically noisy text. In Proceedings of the Association for Computational Linguistics: Short Paper Track.
Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In The Annual Conference of the Association for Computational Linguistics (ACL).
David Graff. 2003. English Gigaword, version 3. In Linguistic Data Consortium, Philadelphia, Catalog Number LDC2003T05.
Keith Hall. 2004. Best-first Word-lattice Parsing: Techniques for Integrated Syntactic Language Modeling. Ph.D. thesis, Brown University.
Kenneth Heafield. 2011. KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation.
Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Richard Zens, RWTH Aachen, Alexandra Constantin, Marcello Federico, Nicola Bertoldi, Chris Dyer, Brooke Cowan, Wade Shen, Christine Moran, and Ondřej Bojar. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the Association for Computational Linguistics: Demonstration Session.
Mark Johnson. 1998. PCFG models of linguistic tree representations. Computational Linguistics, 24.
Dan Klein and Chris Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL).
Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In IEEE International Conference on Acoustics, Speech and Signal Processing.
Philipp Koehn. 2004. Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In Proceedings of The Association for Machine Translation in the Americas.
Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Jonathan Weese, and Omar F. Zaidan. 2009. Joshua: an open source toolkit for parsing-based machine translation. In Proceedings of the Fourth Workshop on Statistical Machine Translation.
M. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. In Computational Linguistics.
Franz J. Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2004. A Smorgasbord of Features for Statistical Machine Translation. In Proceedings of the North American Association for Computational Linguistics.
Daisuke Okanohara and Jun’ichi Tsujii. 2007. A discriminative language model with pseudo-negative samples. In Proceedings of the Association for Computational Linguistics.
Adam Pauls and Dan Klein. 2011. Faster and smaller n-gram language models. In Proceedings of the Association for Computational Linguistics.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of COLING-ACL 2006.
Matt Post and Daniel Gildea. 2009. Language modeling with tree substitution grammars. In Proceedings of the Conference on Neural Information Processing Systems.
Matt Post. 2011. Judging grammaticality with tree substitution grammar. In Proceedings of the Association for Computational Linguistics: Short Paper Track.
Chris Quirk, Arul Menezes, and Colin Cherry. 2005. Dependency treelet translation: Syntactically informed phrasal SMT. In Proceedings of the Association of Computational Linguistics.
Brian Roark. 2004. Probabilistic top-down parsing and language modeling. Computational Linguistics.
Ming Tan, Wenli Zhou, Lei Zheng, and Shaojun Wang. 2011. A large scale distributed syntactic, semantic and lexical language model for machine translation. In Proceedings of the Association for Computational Linguistics.
Ashish Vaswani, Haitao Mi, Liang Huang, and David Chiang. 2011. Rule Markov models for fast tree-to-string translation. In Proceedings of the Association for Computational Linguistics.
Peng Xu, Ciprian Chelba, and Fred Jelinek. 2002. A study on richer syntactic dependencies for structured language modeling. In Proceedings of the Association for Computational Linguistics.
Ying Zhang. 2009. Structured language models for statistical machine translation. Ph.D. thesis, Johns Hopkins University.