emnlp emnlp2010 emnlp2010-108 emnlp2010-108-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hai Son Le ; Alexandre Allauzen ; Guillaume Wisniewski ; François Yvon
Abstract: Using multi-layer neural networks to estimate the probabilities of word sequences is a promising research area in statistical language modeling, with applications in speech recognition and statistical machine translation. However, training such models for large vocabulary tasks is computationally challenging and does not scale easily to the huge corpora that are nowadays available. In this work, we study the performance and behavior of two neural statistical language models so as to highlight some important caveats of the classical training algorithms. The induced word embeddings for extreme cases are also analysed, thus providing insight into the convergence issues. A new initialization scheme and new training techniques are then introduced. These methods are shown to greatly reduce the training time and to significantly improve performance, both in terms of perplexity and on a large-scale translation task.
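As an illustration of the kind of model the abstract refers to, here is a minimal sketch of a Bengio-style feed-forward neural language model in Python with NumPy; all sizes and variable names are illustrative assumptions, not the authors' configuration. The point to notice is the final softmax, which normalizes over the entire vocabulary: that O(|V|) per-example cost is what makes training hard to scale to large vocabularies.

    import numpy as np

    # Illustrative sizes only; the paper targets far larger vocabularies.
    V, d, h, n = 10000, 100, 200, 4   # vocabulary, embedding dim, hidden dim, n-gram order

    rng = np.random.default_rng(0)
    R = rng.normal(scale=0.01, size=(V, d))            # word embedding matrix
    W = rng.normal(scale=0.01, size=(h, (n - 1) * d))  # hidden layer weights
    U = rng.normal(scale=0.01, size=(V, h))            # output layer weights

    def ngram_probs(context_ids):
        """Return P(w | context) for every word w in the vocabulary."""
        x = np.concatenate([R[i] for i in context_ids])  # concatenated context embeddings
        a = np.tanh(W @ x)                               # hidden layer activation
        logits = U @ a                                   # one score per vocabulary word
        e = np.exp(logits - logits.max())                # softmax over the FULL vocabulary:
        return e / e.sum()                               # this O(V) step is the bottleneck

    p = ngram_probs([12, 7, 1984])   # three-word context for a 4-gram model
    print(p.shape, p.sum())          # (10000,) 1.0

Hierarchical and class-based output factorizations, such as the scalable hierarchical model of Mnih and Hinton (2008) cited below, are standard ways of reducing that normalization cost.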
Alexandre Allauzen, Josep Crego, Aurélien Max, and François Yvon. 2009. LIMSI's statistical translation systems for WMT'09. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 100–104, Athens, Greece. Association for Computational Linguistics.
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. JMLR, 3:1137–1155.
J. Bilmes, K. Asanovic, C. Chin, and J. Demmel. 1997. Using PHiPAC to speed error back-propagation learning. In Proc. ICASSP'97, volume 5, page 4153.
Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Comput. Linguist., 18(4):467–479.
Stanley F. Chen and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proc. ACL'96, pages 310–318, San Francisco.
Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proc. ICML'08, pages 160–167, New York, NY, USA. ACM.
Ahmad Emami and Lidia Mangu. 2007. Empirical study of neural network language models for Arabic speech recognition. In Proc. ASRU'07, pages 147–152, Kyoto. IEEE.
Geoffrey E. Hinton and Ruslan R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proc. ACL'07, pages 177–180, Prague, Czech Republic.
Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press.
Hong-Kwang Kuo, Lidia Mangu, Ahmad Emami, and Imed Zitouni. 2010. Morphological and syntactic features for Arabic speech recognition. In Proc. ICASSP'10.
Raymond Lau, Ronald Rosenfeld, and Salim Roukos. 1993. Adaptive language modeling using the maximum entropy principle. In Proc. HLT'93, pages 108–113, Princeton, New Jersey.
Andriy Mnih and Geoffrey Hinton. 2007. Three new graphical models for statistical language modelling. In Proc. ICML'07, pages 641–648, New York, NY, USA.
Andriy Mnih and Geoffrey E. Hinton. 2008. A scalable hierarchical distributed language model. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1081–1088.
Thomas R. Niesler. 1997. Category-based statistical language models. Ph.D. thesis, University of Cambridge.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proc. ACL'03, pages 160–167, Sapporo, Japan.
Ilya Oparin, Ondřej Glembek, Lukáš Burget, and Jan Černocký. 2008. Morphological random forests for language modeling of inflectional languages. In Proc. SLT'08, pages 189–192.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proc. ACL'02, pages 311–318, Philadelphia.
Ronald Rosenfeld. 1996. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, 10:187–228.
Holger Schwenk and Jean-Luc Gauvain. 2002. Connectionist language modeling for large vocabulary continuous speech recognition. In Proc. ICASSP, pages 765–768, Orlando, FL.
Holger Schwenk, Daniel Déchelotte, and Jean-Luc Gauvain. 2006. Continuous space language models for statistical machine translation. In Proc. COLING/ACL'06, pages 723–730.
Holger Schwenk. 2007. Continuous space language models. Comput. Speech Lang., 21(3):492–518.
Yee Whye Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proc. ACL'06, pages 985–992, Sydney, Australia.
Peng Xu and Frederick Jelinek. 2004. Random forests in language modeling. In Proc. EMNLP'04, pages 325–332, Barcelona, Spain.