acl acl2011 acl2011-142 acl2011-142-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Denis Filimonov ; Mary Harper
Abstract: In the face of sparsity, statistical models are often interpolated with lower-order (backoff) models, particularly in language modeling. In this paper, we argue that there is a relation between the higher-order model and the backoff model that must be satisfied for the interpolation to be effective. We show that in n-gram models the relation holds trivially, but in models that allow arbitrary clustering of context (such as decision tree models) it is generally not satisfied. Based on this insight, we propose a generalization of linear interpolation that significantly improves the performance of a decision tree language model.
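For orientation, the standard fixed-weight linear interpolation that the abstract takes as its starting point can be sketched as below. This is only a minimal illustration of interpolating a bigram estimate with its unigram backoff, in the spirit of Jelinek and Mercer (1980). It is not the generalization proposed in the paper, which the abstract does not spell out. The function names, the toy corpus, and the single fixed weight `lam` are assumptions made for the example.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram, and bigram-history counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Every token except the last one serves as the history of one bigram.
    contexts = Counter(tokens[:-1])
    return unigrams, bigrams, contexts


def interpolated_prob(word, prev, unigrams, bigrams, contexts, lam=0.7):
    """p(word | prev) = lam * p_bigram + (1 - lam) * p_unigram.

    `lam` is a hypothetical fixed weight; in practice it would be tuned
    on held-out data and is often made dependent on the context.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total
    if contexts[prev] == 0:
        # Unseen history: no bigram estimate exists, so use the backoff alone.
        return p_uni
    p_bi = bigrams[(prev, word)] / contexts[prev]
    return lam * p_bi + (1 - lam) * p_uni


if __name__ == "__main__":
    tokens = "a b a b a c".split()
    uni, bi, ctx = train_counts(tokens)
    print(interpolated_prob("b", "a", uni, bi, ctx))
```

In an n-gram model the backoff history is a suffix of the higher-order history, so every higher-order context maps to exactly one backoff context. That is the case in which, according to the abstract, the required relation holds trivially.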
Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, and Robert L. Mercer. 1990. A tree-based statistical language model for natural language speech recognition. Readings in Speech Recognition, pages 507–514.
Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech & Language, 13(4):359–393.
Denis Filimonov and Mary Harper. 2009. A joint language model with fine-grain syntactic tags. In Proceedings of the EMNLP.
Peter A. Heeman. 1999. POS tags and decision trees for language modeling. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 129–137.
Zhongqiang Huang and Mary Harper. 2009. Self-training PCFG grammars with latent annotations across languages. In Proceedings of the EMNLP 2009.
Frederick Jelinek and Robert L. Mercer. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381–397.
Peng Xu and Frederick Jelinek. 2004. Random forests in language modeling. In Proceedings of the EMNLP.