emnlp emnlp2011 emnlp2011-46 emnlp2011-46-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Puyang Xu ; Asela Gunawardana ; Sanjeev Khudanpur
Abstract: We propose an efficient way to train maximum entropy language models (MELM) and neural network language models (NNLM). The advantage of the proposed method comes from a more robust and efficient subsampling technique. The original multi-class language modeling problem is transformed into a set of binary problems where each binary classifier predicts whether or not a particular word will occur. We show that the binarized model is as powerful as the standard model and allows us to aggressively subsample negative training examples without sacrificing predictive performance. Empirical results show that we can train MELM and NNLM at 1%∼5% of the standard complexity with no loss in performance.
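The binarization and subsampling idea in the abstract can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' implementation: it assumes toy bag-of-history indicator features and plain SGD on a weighted logistic loss, and the names (train_binarized_melm, neg_rate, the "prev=" features) are invented for this example. Each vocabulary word gets its own binary classifier, positives are always kept, and kept negatives are re-weighted by 1/neg_rate so the subsampled objective approximates the full one.

    # Minimal sketch (not the paper's released code): binarized ME language model
    # with aggressive negative subsampling. Names and features are illustrative.
    import math
    import random
    from collections import defaultdict

    random.seed(0)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def train_binarized_melm(corpus, vocab, neg_rate=0.05, lr=0.1, epochs=5):
        # One independent binary logistic classifier per vocabulary word w,
        # modeling P(w occurs at this position | history).
        weights = {w: defaultdict(float) for w in vocab}
        for _ in range(epochs):
            for history, target in corpus:               # history: list of words, target: next word
                feats = ["prev=" + h for h in history]   # toy n-gram-style indicator features
                for w in vocab:
                    if w == target:
                        label, imp = 1.0, 1.0            # positive example: always kept
                    elif random.random() < neg_rate:
                        label, imp = 0.0, 1.0 / neg_rate # kept negative, re-weighted by 1/neg_rate
                    else:
                        continue                         # subsampled-away negative: no work done
                    p = sigmoid(sum(weights[w][f] for f in feats))
                    for f in feats:
                        weights[w][f] += lr * imp * (label - p)  # SGD step on weighted log-loss
        return weights

    # Toy usage: a three-word vocabulary and a few (history, next-word) pairs.
    vocab = ["the", "cat", "sat"]
    corpus = [(["the"], "cat"), (["cat"], "sat"), (["the", "cat"], "sat")]
    model = train_binarized_melm(corpus, vocab)

The key design point mirrored here is that skipped negatives cost no computation at all, which is where the 1%∼5% training-cost figure in the abstract comes from, while the 1/neg_rate importance weight keeps the subsampled loss an approximation of the full binarized objective.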
References:
Allwein, Erin, Robert Schapire and Yoram Singer. 2000. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers. Journal of Machine Learning Research, 1:113–141.
Bengio, Yoshua, Rejean Ducharme and Pascal Vincent. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155.
Bengio, Yoshua and J. Senecal. 2008. Adaptive importance sampling to accelerate training of a neural probabilistic language model. IEEE Transactions on Neural Networks, April 2008.
Berger, Adam, Stephen A. Della Pietra and Vincent J. Della Pietra. 1996. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22:39–71.
Brants, Thorsten, Ashok C. Popat, Peng Xu, Franz J. Och and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of the 2007 Conference on Empirical Methods in Natural Language Processing, 858–867.
Goodman, Joshua. 2001. Classes for Fast Maximum Entropy Training. In Proceedings of the 2001 IEEE International Conference on Acoustics, Speech and Signal Processing.
Goodman, Joshua. 2001. A Bit of Progress in Language Modeling. Computer Speech and Language, 403–434.
Khudanpur, Sanjeev and Jun Wu. 2000. Maximum Entropy Techniques for Exploiting Syntactic, Semantic and Collocational Dependencies in Language Modeling. Computer Speech and Language, 14(4):355–372.
Mikolov, Tomas, Stefan Kombrink, Lukas Burget, Jan "Honza" Cernocky and Sanjeev Khudanpur. 2011. Extensions of recurrent neural network language model. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing.
Morin, Frederic and Yoshua Bengio. 2005. Hierarchical probabilistic neural network language model. In Proceedings of AISTATS 2005, 246–252.
Neyman, Jerzy. 1934. On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection. Journal of the Royal Statistical Society, 97(4):558–625.
Rifkin, Ryan and Aldebaro Klautau. 2004. In Defense of One-Vs-All Classification. Journal of Machine Learning Research.
Rosenfeld, Roni. 1996. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, 10:187–228.
Schwenk, Holger. 2007. Continuous space language models. Computer Speech and Language, 21(3):492–518.
Wu, Jun and Sanjeev Khudanpur. 2000. Efficient training methods for maximum entropy language modeling. In Proceedings of the 6th International Conference on Spoken Language Processing, 114–117.
Xu, Puyang, Damianos Karakos and Sanjeev Khudanpur. 2009. Self-supervised discriminative training of statistical language models. In Proceedings of the 2009 IEEE Automatic Speech Recognition and Understanding Workshop.
Zhang, Tong. 2004. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the 2004 International Conference on Machine Learning.