Author: Peng Xu, Frederick Jelinek
Abstract: In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on the words already seen. The goal of this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and to investigate the performance of the resulting SLM in automatic speech recognition. RFs, originally developed as classifiers, are ensembles of decision tree classifiers. Each tree is grown from training data sampled independently with the same distribution for all trees in the forest, and with a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.
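
The two sources of randomness described in the abstract (an independent data sample for each tree, and a random selection of candidate questions at each node) can be made concrete with a minimal sketch. The Python below is not the authors' SLM implementation: the names (grow_forest, best_random_question), the word-set membership form of the questions, and all parameter values are illustrative assumptions; a real SLM would ask questions about syntactic heads and tags and would smooth the leaf distributions.

    # Minimal sketch of randomized decision-tree growing for language modeling.
    # Assumption: training data is a list of (context, next_word) pairs and a
    # "question" tests whether the most recent context word lies in a random
    # word set; entropy reduction of the next-word distribution guides splits.
    import math
    import random
    from collections import Counter

    def entropy(words):
        """Empirical entropy (bits) of a next-word sample."""
        counts = Counter(words)
        total = len(words)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def best_random_question(data, n_candidates=5):
        """Try a few randomly generated word-set questions; keep the best one."""
        vocab = list({ctx[-1] for ctx, _ in data})
        best = None
        for _ in range(n_candidates):
            word_set = set(random.sample(vocab, max(1, len(vocab) // 2)))
            left = [w for ctx, w in data if ctx[-1] in word_set]
            right = [w for ctx, w in data if ctx[-1] not in word_set]
            if not left or not right:
                continue
            gain = entropy([w for _, w in data]) - (
                len(left) / len(data) * entropy(left)
                + len(right) / len(data) * entropy(right))
            if best is None or gain > best[0]:
                best = (gain, word_set)
        return best

    def grow_tree(data, depth=0, max_depth=3):
        """Recursively grow one randomized decision tree."""
        if depth >= max_depth or len(data) < 4:
            return Counter(w for _, w in data)      # leaf: next-word counts
        choice = best_random_question(data)
        if choice is None:
            return Counter(w for _, w in data)
        _, word_set = choice
        left = [(c, w) for c, w in data if c[-1] in word_set]
        right = [(c, w) for c, w in data if c[-1] not in word_set]
        return {"question": word_set,
                "yes": grow_tree(left, depth + 1, max_depth),
                "no": grow_tree(right, depth + 1, max_depth)}

    def grow_forest(data, n_trees=10):
        """Each tree sees an independent bootstrap sample of the training data."""
        return [grow_tree([random.choice(data) for _ in range(len(data))])
                for _ in range(n_trees)]

At prediction time, a forest of such trees would be aggregated (e.g., by averaging the leaf distributions reached in each tree), which is what lets the ensemble generalize better to unseen data than any single tree.
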
[1] Stanley F. Chen and Joshua Goodman, “An empirical study of smoothing techniques for language modeling,” Tech. Rep. TR-10-98, Computer Science Group, Harvard University, Cambridge, Massachusetts, 1998.
[2] L. Bahl, P. Brown, P. de Souza, and R. Mercer, “A tree-based statistical language model for natural language speech recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, pp. 1001–1008, July 1989.
[3] Gerasimos Potamianos and Frederick Jelinek, “A study of n-gram and decision tree letter language modeling methods,” Speech Communication, vol. 24, no. 3, pp. 171–192, 1998.
[4] Peng Xu and Frederick Jelinek, “Random forests in language modeling,” in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, July 2004.
[5] Ciprian Chelba and Frederick Jelinek, “Structured language modeling,” Computer Speech and Language, vol. 14, no. 4, pp. 283–332, October 2000.
[6] Eugene Charniak, “Immediate-head parsing for language models,” in Proceedings of the 39th Annual Meeting and 10th Conference of the European Chapter of ACL, Toulouse, France, July 2001, pp. 116–123.
[7] Brian Roark, Robust Probabilistic Predictive Syntactic Processing: Motivations, Models and Applications, Ph.D. thesis, Brown University, Providence, RI, 2001.
[8] Reinhard Kneser and Hermann Ney, “Improved backing-off for m-gram language modeling,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1995, vol. 1, pp. 181–184.
[9] Y. Amit and D. Geman, “Shape quantization and recognition with randomized trees,” Neural Computation, vol. 9, pp. 1545–1588, 1997.
[10] Leo Breiman, “Random forests,” Tech. Rep., Statistics Department, University of California, Berkeley, Berkeley, CA, 2001.
[11] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832–844, 1998.
[12] S. Martin, J. Liermann, and H. Ney, “Algorithms for bigram and trigram word clustering,” Speech Communication, vol. 24, no. 3, pp. 171–192, 1998.
[13] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees, Chapman and Hall, New York, 1984.