acl acl2011 acl2011-38 acl2011-38-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Greg Durrett; Dan Klein
Abstract: We investigate the empirical behavior of n-gram discounts within and across domains. When a language model is trained and evaluated on two corpora from exactly the same domain, discounts are roughly constant, matching the assumptions of modified Kneser-Ney LMs. However, when training and test corpora diverge, the empirical discount grows essentially as a linear function of the n-gram count. We adapt a Kneser-Ney language model to incorporate such growing discounts, resulting in perplexity improvements over modified Kneser-Ney and Jelinek-Mercer baselines.
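The abstract's central idea, replacing the roughly constant discounts of modified Kneser-Ney with a discount that grows linearly in the n-gram count, can be illustrated with a short sketch. Below is a minimal Python example of an interpolated bigram model whose discount is D(c) = a + b*c, clipped to [0, c]. This is an assumption-laden illustration, not the paper's implementation: the constants a and b, the function names, and the toy data are all placeholders, and real systems would fit the discount parameters and handle higher orders.

```python
from collections import defaultdict

def train_bigram_lm(tokens, a=0.75, b=0.1):
    """Interpolated bigram LM with a count-dependent "growing" discount.

    Modified Kneser-Ney assumes roughly constant discounts; per the
    abstract, cross-domain discounts grow about linearly in the n-gram
    count, so here D(c) = a + b*c, clipped to [0, c].  The constants
    a and b are illustrative placeholders, not the paper's fitted values.
    """
    bigram = defaultdict(int)      # (u, w) -> count
    ctx_total = defaultdict(int)   # u -> total count of bigrams starting with u
    left_ctxs = defaultdict(set)   # w -> set of distinct left contexts of w

    for u, w in zip(tokens, tokens[1:]):
        bigram[(u, w)] += 1
        ctx_total[u] += 1
        left_ctxs[w].add(u)

    n_bigram_types = len(bigram)

    def discount(c):
        return min(c, a + b * c)   # never remove more mass than the count

    # Mass reserved in each context for the backoff distribution.
    reserved = defaultdict(float)
    for (u, _), c in bigram.items():
        reserved[u] += discount(c)

    def prob(u, w):
        # Kneser-Ney continuation probability: fraction of bigram types ending in w.
        p_cont = len(left_ctxs.get(w, ())) / n_bigram_types
        c_u = ctx_total.get(u, 0)
        if c_u == 0:
            return p_cont              # unseen context: back off entirely
        c_uw = bigram.get((u, w), 0)
        lam = reserved[u] / c_u        # interpolation weight
        return max(c_uw - discount(c_uw), 0.0) / c_u + lam * p_cont

    return prob

# Toy usage: for any seen context, probabilities sum to 1 over the vocabulary.
lm = train_bigram_lm("the cat sat on the mat the cat ran".split())
print(lm("the", "cat"))
```

Because D(c) <= c by construction, the discounted mass in each context exactly equals the interpolation weight's numerator, so the distribution stays properly normalized even as the discount grows with the count.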
Michiel Bacchiani and Brian Roark. 2003. Unsupervised Language Model Adaptation. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing.
Michiel Bacchiani, Michael Riley, Brian Roark, and Richard Sproat. 2006. MAP adaptation of stochastic grammars. Computer Speech & Language, 20(1):41–68.
Jerome R. Bellegarda. 2004. Statistical language model adaptation: review and perspectives. Speech Communication, 42:93–108.
Stanley Chen and Joshua Goodman. 1998. An Empirical Study of Smoothing Techniques for Language Modeling. Technical report, Harvard University, August.
Kenneth Church and William Gale. 1991. A Comparison of the Enhanced Good-Turing and Deleted Estimation Methods for Estimating Probabilities of English Bigrams. Computer Speech & Language, 5(1):19–54.
Joshua Goodman. 2001. A Bit of Progress in Language Modeling. Computer Speech & Language, 15(4):403–434.
Bo-June (Paul) Hsu and James Glass. 2008. N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 829–838.
Dietrich Klakow. 2000. Selecting articles from the language model training corpus. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1695–1698.
Reinhard Kneser and Hermann Ney. 1995. Improved Backing-off for M-Gram Language Modeling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing.
Robert C. Moore and William Lewis. 2010. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, pages 220–224, July.
Robert C. Moore and Chris Quirk. 2009. Improved Smoothing for N-gram Language Models Based on Ordinary Counts. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 349–352.
Ronald Rosenfeld. 1996. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech & Language, 10:187–228.
Yee Whye Teh. 2006. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes. In Proceedings of ACL, pages 985–992, Sydney, Australia, July. Association for Computational Linguistics.