acl acl2013 acl2013-325 acl2013-325-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Brian Roark ; Cyril Allauzen ; Michael Riley
Abstract: We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of wellknown Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff ngram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.1
Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In Proceedings of the Twelfth International Conference on Implementation and Application of Automata (CIAA 2007), Lecture Notes in Computer Science, volume 4793, pages 11–23. Ciprian Chelba, Thorsten Brants, Will Neveitt, and Peng Xu. 2010. Study on interaction between entropy pruning and Kneser-Ney smoothing. In Proceedings of Interspeech, page 24222425. Stanley Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report, TR-10-98, Harvard University. Joshua Goodman. 2001 . A bit of progress in language modeling. Computer Speech and Language, 15(4):403–434. Slava M. Katz. 1987. sparse data for the speech recogniser. Speech, and Signal Estimation of probabilities from language model component of a IEEE Transactions on Acoustic, Processing, 35(3):400–401 . Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 181–184. Hermann Ney, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependences in stochastic language modeling. Computer Speech and Language, 8: 1–38. Eric S. Ristad. 1995. A natural law of succession. Technical Report, CS-TR-495-95, Princeton University. Brian Roark, Richard Sproat, Cyril Allauzen, Michael Riley, Jeffrey Sorensen, and Terry Tai. 2012. The OpenGrm open-source finite-state grammar software libraries. In Proceedings of the ACL 2012 System Demonstrations, pages 61–66. Kristie Seymore and Ronald Rosenfeld. 1996. Scalable backoff language models. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). Vesa Siivola, Teemu Hirsimaki, and Sami Virpioja. 2007. On growing and pruning kneserney smoothed n-gram models. IEEE Transactions on Audio, Speech, and Language Processing, 15(5): 1617– 1624. William J Stewart. 1999. Numerical methods for computing stationary distributions of finite irreducible markov chains. Computational Probability, pages 81–1 11. Andreas Stolcke, Jing Zheng, Wen Wang, and Victor Abrash. 2011. Srilm at sixteen: Update and outlook. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Andreas Stolcke. 1998. Entropy-based pruning of backoff language models. In Proc. DARPA Broadcast News Transcription and Understanding Workshop, pages 270–274. David Talbot and Thorsten Brants. 2008. Randomized language models via perfect hash functions. In Proceedings of ACL-08: HLT, pages 505–513. David Talbot and Miles Osborne. 2007. Smoothed Bloom filter language models: Tera-scale LMs on the cheap. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 468–476. Ian H. Witten and Timothy C. Bell. 1991. The zerofrequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4): 1085– 1094. A. Yeh. 2000. More accurate tests for the statistical significance of result differences. In Proceedings of the 18th International COLING, pages 947–953. 52