
74 emnlp-2012: Language Model Rest Costs and Space-Efficient Storage


Source: pdf

Author: Kenneth Heafield; Philipp Koehn; Alon Lavie

Abstract: Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level scores, at the expense of less accurate estimates for sentence fragments. These changes can be stacked, achieving better estimates with unchanged memory usage. In order to interpret changes in search accuracy, we adjust the pop limit so that accuracy is unchanged and report the change in CPU time. In a German-English Moses system with target-side syntax, improved estimates yielded a 63% reduction in CPU time; for a Hiero-style version, the reduction is 21%. The compressed language model uses 26% less RAM while equivalent search quality takes 27% more CPU. Source code is released as part of KenLM (http://kheafield.com/code/kenlm/).
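
The abstract's first contribution is easiest to see on a toy example. The sketch below (plain Python with invented log10 probabilities; the model values, fragment, and function names are hypothetical, not KenLM's actual API) contrasts common practice, which scores a fragment's first words with the trigram model's own lower-order entries, against the paper's alternative of scoring them with separately trained unigram and bigram models. Kneser-Ney lower-order entries estimate how likely a word is in backed-off contexts, not how likely it is as the genuine start of a fragment, which is why the two estimates differ.

    # Toy sketch of rest-cost estimation for a sentence fragment.
    # All values are invented log10 probabilities, not from a real model.

    # ARPA-style trigram model: lower-order entries are Kneser-Ney
    # lower orders, tuned for contexts the model has backed off from.
    trigram_model = {
        ("the",): -1.9,                # unigram entry (KN lower order)
        ("the", "cat"): -1.1,          # bigram entry (KN lower order)
        ("the", "cat", "sat"): -0.4,   # full-order trigram entry
    }

    # Separately trained unigram and bigram models (the extra storage
    # the paper spends): proper estimates of short-context probabilities.
    unigram_model = {("the",): -1.3}
    bigram_model = {("the", "cat"): -0.8}

    def rest_cost_common_practice(fragment):
        """Score the first words with the trigram model's own
        lower-order entries (the practice the paper argues against)."""
        total = 0.0
        for i, word in enumerate(fragment):
            context = tuple(fragment[max(0, i - 2):i])
            total += trigram_model[context + (word,)]
        return total

    def rest_cost_paper(fragment):
        """Score the first word with a real unigram model, the second
        with a real bigram model, later words with the full model."""
        total = 0.0
        for i, word in enumerate(fragment):
            context = tuple(fragment[max(0, i - 2):i])
            if i == 0:
                total += unigram_model[(word,)]
            elif i == 1:
                total += bigram_model[context + (word,)]
            else:
                total += trigram_model[context + (word,)]
        return total

    fragment = ["the", "cat", "sat"]
    print(rest_cost_common_practice(fragment))  # -1.9 + -1.1 + -0.4 = -3.4
    print(rest_cost_paper(fragment))            # -1.3 + -0.8 + -0.4 = -2.5

Both estimates converge to the same sentence-level score once full context is available; they differ only on fragment prefixes, which is exactly what cube pruning compares. The separate unigram and bigram models cost memory per entry, which is why the paper's second contribution moves in the opposite direction, collapsing each entry's probability and backoff into a single value.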


reference text

Djamal Belazzougui, Fabiano C. Botelho, and Martin Dietzfelbinger. 2008. Hash, displace, and compress. In Proceedings of the 35th International Colloquium on Automata, Languages and Programming (ICALP '08), pages 385–396.
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 858–867, June.
Chris Callison-Burch, Philipp Koehn, Christof Monz, and Omar Zaidan. 2011. Findings of the 2011 workshop on statistical machine translation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 22–64, Edinburgh, Scotland, July. Association for Computational Linguistics.
Stanley Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, August.
David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33:201–228, June.
Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
Chris Dyer, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman, and Philip Resnik. 2010. cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proceedings of the ACL 2010 System Demonstrations, ACLDemos '10, pages 7–12.
Marcello Federico and Nicola Bertoldi. 2006. How many bits are needed to store probabilities for phrase-based translation? In Proceedings of the Workshop on Statistical Machine Translation, pages 94–101, New York City, June.
Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. 2008. IRSTLM: an open source toolkit for handling large scale language models. In Proceedings of Interspeech, Brisbane, Australia.
David Guthrie and Mark Hepple. 2010. Storing the web in memory: Space efficient language models with constant time retrieval. In Proceedings of EMNLP 2010, Los Angeles, CA.
Kenneth Heafield, Hieu Hoang, Philipp Koehn, Tetsuo Kiso, and Marcello Federico. 2011. Left language model state for syntactic machine translation. In Proceedings of the International Workshop on Spoken Language Translation, San Francisco, CA, USA, December.
Kenneth Heafield. 2011. KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, Edinburgh, UK, July. Association for Computational Linguistics.
Liang Huang and David Chiang. 2007. Forest rescoring: Faster decoding with integrated language models. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.
Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 181–184.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic, June.
Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of MT Summit.
Zhifei Li and Sanjeev Khudanpur. 2008. A scalable decoder for parsing-based machine translation with equivalent language model state maintenance. In Proceedings of the Second ACL Workshop on Syntax and Structure in Statistical Translation (SSST-2), pages 10–18, Columbus, Ohio, June.
Zhifei Li, Chris Callison-Burch, Chris Dyer, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Jonathan Weese, and Omar Zaidan. 2009. Joshua: An open source toolkit for parsing-based machine translation. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 135–139, Athens, Greece, March. Association for Computational Linguistics.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pages 160–167, Morristown, NJ, USA. Association for Computational Linguistics.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, PA, July.
Bhiksha Raj and Ed Whittaker. 2003. Lossless compression of language model structure and word identifiers. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 388–391.
Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proceedings of the Seventh International Conference on Spoken Language Processing, pages 901–904.
David Talbot and Miles Osborne. 2007. Randomised language modelling for statistical machine translation. In Proceedings of ACL, pages 512–519, Prague, Czech Republic.
David Vilar and Hermann Ney. 2011. Cardinality pruning and language model heuristics for hierarchical phrase-based translation. Machine Translation, pages 1–38, November. DOI 10.1007/s10590-011-9119-4.
Ed Whittaker and Bhiksha Raj. 2001. Quantization-based language model compression. In Proceedings of EUROSPEECH, pages 33–36, September.
Ian H. Witten and Timothy C. Bell. 1991. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4):1085–1094.
Richard Zens and Hermann Ney. 2008. Improvements in dynamic programming beam search for phrase-based statistical machine translation. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), Honolulu, Hawaii, October.