78 acl-2011-Confidence-Weighted Learning of Factored Discriminative Language Models

Source: pdf

Author: Viet Ha Thuc ; Nicola Cancedda

Abstract: Language models based on word surface forms only are unable to benefit from available linguistic knowledge, and tend to suffer from poor estimates for rare features. We propose an approach to overcome these two limitations. We use factored features that can flexibly capture linguistic regularities, and we adopt confidence-weighted learning, a form of discriminative online learning that can better take advantage of a heavy tail of rare features. Finally, we extend the confidence-weighted learning to deal with label noise in training data, a common case with discriminative language modeling.
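To make the idea of confidence-weighted learning concrete, here is a minimal sketch of a related online update. It is not the authors' algorithm: it implements a diagonal approximation of AROW (Crammer, Kulesza, and Dredze, 2009, listed in the references below), one member of the confidence-weighted family. The class name `DiagonalAROW`, the regularizer `r`, and the feature names mixing word and part-of-speech factors are all assumptions made for illustration. Each feature keeps a mean weight and a variance; the variance shrinks each time the feature takes part in an update, so rarely seen features keep a high variance and receive larger steps.

```python
class DiagonalAROW:
    """Illustrative diagonal AROW learner over sparse binary-label data.

    Hypothetical sketch: keeps only the diagonal of the covariance matrix,
    which is an approximation of the full AROW update.
    """

    def __init__(self, r=1.0):
        self.r = r        # regularization constant (assumed value)
        self.mean = {}    # feature -> mean weight (implicit default 0.0)
        self.var = {}     # feature -> variance (implicit default 1.0)

    def score(self, x):
        # x maps feature names to values; the score is the dot product with the means.
        return sum(self.mean.get(f, 0.0) * val for f, val in x.items())

    def update(self, x, y):
        # y is +1 or -1; update only when the hinge loss is positive.
        loss = max(0.0, 1.0 - y * self.score(x))
        if loss == 0.0:
            return
        # Confidence term: variance of the margin under the current model.
        v = sum(self.var.get(f, 1.0) * val * val for f, val in x.items())
        beta = 1.0 / (v + self.r)
        alpha = loss * beta
        for f, val in x.items():
            s = self.var.get(f, 1.0)
            # High-variance (rarely updated) features take larger steps.
            self.mean[f] = self.mean.get(f, 0.0) + alpha * y * s * val
            # Diagonal of the covariance update: variance only ever shrinks.
            self.var[f] = s - beta * s * s * val * val


# Usage: a frequent feature seen three times, a rare factored feature seen once.
model = DiagonalAROW(r=1.0)
model.update({"w=the": 1.0}, +1)
model.update({"w=the": 1.0}, +1)
model.update({"w=the": 1.0, "pos=NN|w=zygote": 1.0}, +1)
```

After the last update, the rare feature still has a larger variance than the frequent one, so later examples containing it move its weight more. This is the property the abstract refers to when it says confidence-weighted learning can better exploit a heavy tail of rare features.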

reference text

Salah Ait-Mokhtar, Jean-Pierre Chanod, and Claude Roux. 2001. A multi-input dependency parser. In Proceedings of the Seventh International Workshop on Parsing Technologies, Beijing, China.

Jeff A. Bilmes and Katrin Kirchhoff. 2003. Factored language models and generalized parallel backoff. In Proceedings of HLT/NAACL, Edmonton, Alberta, Canada.

Koby Crammer and Daniel D. Lee. 2010. Learning via Gaussian herding. In Pre-proceeding of NIPS 2010.

Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7.

Koby Crammer, Alex Kulesza, and Mark Dredze. 2009. Adaptive regularization of weight vectors. In Advances in Neural Information Processing Systems (NIPS 2009).

Mark Dredze, Koby Crammer, and Fernando Pereira. 2008. Confidence-weighted linear classifiers. In Proceedings of ICML, Helsinki, Finland.

Zhifei Li and Sanjeev Khudanpur. 2008. Large-scale discriminative n-gram language models for statistical machine translation. In Proceedings of AMTA.

Pierre Mahé and Nicola Cancedda. 2009. Linguistically enriched word-sequence kernels for discriminative language modeling. In Learning Machine Translation, NIPS Workshop Series. MIT Press, Cambridge, Mass.

Brian Roark, Murat Saraclar, Michael Collins, and Mark Johnson. 2004. Discriminative language modeling with conditional random fields and the perceptron algorithm. In Proceedings of the annual meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain.

Brian Roark, Murat Saraclar, and Michael Collins. 2007. Discriminative n-gram language modeling. Computer Speech and Language, 21(2).

M. Simard, N. Cancedda, B. Cavestro, M. Dymetman, E. Gaussier, C. Goutte, and K. Yamada. 2005. Translating with non-contiguous phrases. In Association for Computational Linguistics, editor, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 755–762, October.