acl acl2011 acl2011-175 acl2011-175-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hinrich Schütze
Abstract: Building on earlier work that integrates different factors in language modeling, we view (i) backing off to a shorter history and (ii) class-based generalization as two complementary mechanisms of using a larger equivalence class for prediction when the default equivalence class is too small for reliable estimation. This view entails that the classes in a language model should be learned from rare events only and should be preferably applied to rare events. We construct such a model and show that both training on rare events and preferable application to rare events improve perplexity when compared to a simple direct interpolation of class-based with standard language models.
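The baseline the abstract compares against, a simple direct interpolation of a class-based model with a standard word-based model, can be sketched as follows. This is a minimal illustration and not the paper's model: the bigram order, the add-one smoothing, the hand-written word-to-class map, the toy corpus, and the weight `lam` are all assumptions made here for brevity.

```python
from collections import Counter

def train(tokens, word2class):
    """Collect unigram and bigram counts over words and over their classes."""
    classes = [word2class[w] for w in tokens]
    return {
        "w_uni": Counter(tokens),
        "w_bi": Counter(zip(tokens, tokens[1:])),
        "c_uni": Counter(classes),
        "c_bi": Counter(zip(classes, classes[1:])),
        "w2c": word2class,
    }

def p_word(m, prev, w):
    """Add-one smoothed word bigram estimate of P(w | prev)."""
    vocab_size = len(m["w_uni"])
    hist = sum(c for (a, _), c in m["w_bi"].items() if a == prev)
    return (m["w_bi"][(prev, w)] + 1) / (hist + vocab_size)

def p_class(m, prev, w):
    """Class bigram estimate: P(class(w) | class(prev)) * P(w | class(w))."""
    c_prev, c_w = m["w2c"][prev], m["w2c"][w]
    n_classes = len(m["c_uni"])
    hist = sum(c for (a, _), c in m["c_bi"].items() if a == c_prev)
    p_transition = (m["c_bi"][(c_prev, c_w)] + 1) / (hist + n_classes)
    p_emission = m["w_uni"][w] / m["c_uni"][c_w]
    return p_transition * p_emission

def p_interp(m, prev, w, lam):
    """Direct linear interpolation of the class-based and word-based models."""
    return lam * p_class(m, prev, w) + (1.0 - lam) * p_word(m, prev, w)

# Toy corpus and hand-made classes (both hypothetical).
TOKENS = "the cat sat on the mat the dog sat on the rug".split()
WORD2CLASS = {"the": "DET", "cat": "N", "dog": "N", "mat": "N",
              "rug": "N", "sat": "V", "on": "P"}
model = train(TOKENS, WORD2CLASS)
```

With `lam = 0` the sketch reduces to the word model and with `lam = 1` to the class model. The abstract's proposal differs from this baseline in two ways: the classes are learned from rare events only, and the class-based estimate is applied preferably to rare events rather than uniformly through a fixed weight.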
Jeff Bilmes and Katrin Kirchhoff. 2003. Factored language models and generalized parallel backoff. In HLT-NAACL.
Peter F. Brown, Vincent J. Della Pietra, Peter V. de Souza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479.
Stanley F. Chen and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. CoRR, cmp-lg/9606011.
Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech & Language, 13(4):359–393.
Stanley F. Chen. 2009. Shrinking exponential language models. In HLT/NAACL, pages 468–476.
Alexander Clark. 2003. Combining distributional and morphological information for part of speech induction. In EACL, pages 59–66.
Sabine Deligne and Yoshinori Sagisaka. 2000. Statistical language modeling with a class-based n-multigram model. Computer Speech & Language, 14(3):261–279.
Pierre Dupont and Ronald Rosenfeld. 1997. Lattice based language models. Technical Report CMU-CS-97-173, Carnegie Mellon University.
Ahmad Emami and Frederick Jelinek. 2005. Random clustering for language modeling. In ICASSP, volume 1, pages 581–584.
Frederick Jelinek and Robert L. Mercer. 1980. Interpolated estimation of Markov source parameters from sparse data. In Edzard S. Gelsema and Laveen N. Kanal, editors, Pattern Recognition in Practice, pages 381–397. North-Holland.
Frederick Jelinek. 1990. Self-organized language modeling for speech recognition. In Alex Waibel and Kai-Fu Lee, editors, Readings in speech recognition, pages 450–506. Morgan Kaufmann.
Raquel Justo and M. Inés Torres. 2009. Phrase classes in two-level language models for ASR. Pattern Analysis & Applications, 12(4):427–437.
Slava M. Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(3):400–401.
Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In ICASSP, volume 1, pages 181–184.
Hong-Kwang J. Kuo and Wolfgang Reichl. 1999. Phrase-based language models for speech recognition. In European Conference on Speech Communication and Technology, volume 4, pages 1595–1598.
John G. McMahon and Francis J. Smith. 1996. Improving statistical language model performance with automatically generated word hierarchies. Computational Linguistics, 22:217–247.
Saeedeh Momtazi and Dietrich Klakow. 2009. A word clustering approach for language model-based sentence retrieval in question answering systems. In ACM Conference on Information and Knowledge Management, pages 1911–1914.
Hermann Ney, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependencies in stochastic language modelling. Computer Speech and Language, 8:1–38.
Roi Reichart, Omri Abend, and Ari Rappoport. 2010. Type level clustering evaluation: new measures and a pos induction case study. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 77–87.
Hinrich Schütze. 1995. Distributional part-of-speech tagging. In EACL 7, pages 141–148.
Andreas Stolcke. 2002. SRILM - An extensible language modeling toolkit. In International Conference on Spoken Language Processing, pages 901–904.
Bernhard Suhm and Alex Waibel. 1994. Towards better language models for spontaneous speech. In International Conference on Spoken Language Processing, pages 831–834.
Jakob Uszkoreit and Thorsten Brants. 2008. Distributed word clustering for large scale class-based language modeling in machine translation. In Annual Meeting of the Association for Computational Linguistics, pages 755–762.
E.W.D. Whittaker and P.C. Woodland. 2001. Efficient class-based language modelling for very large vocabularies. In ICASSP, volume 1, pages 545–548.
Michael Wiegand and Dietrich Klakow. 2008. Optimizing language models for polarity classification. In ECIR, pages 612–616.
T. Yokoyama, T. Shinozaki, K. Iwano, and S. Furui. 2003. Unsupervised class-based language model adaptation for spontaneous speech recognition. In ICASSP, volume 1, pages 236–239.
Imed Zitouni and Qiru Zhou. 2007. Linearly interpolated hierarchical n-gram language models for speech recognition engines. In Michael Grimm and Kristian Kroschel, editors, Robust Speech Recognition and Understanding, pages 301–318. I-Tech Education and Publishing.
Imed Zitouni and Qiru Zhou. 2008. Hierarchical linear discounting class n-gram language models: A multilevel class hierarchy approach. In International Conference on Acoustics, Speech, and Signal Processing, pages 4917–4920.