
233 ACL 2011: On-line Language Model Biasing for Statistical Machine Translation

Source: pdf

Author: Sankaranarayanan Ananthakrishnan ; Rohit Prasad ; Prem Natarajan

Abstract: The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).
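The abstract does not spell out the similarity measure or the biasing scheme, so the following is only a minimal sketch of the general idea of input-dependent LM biasing. It assumes (hypothetically) that some similarity measure has already selected a subset of target-side training sentences for the current input, and it uses linear interpolation of add-alpha smoothed unigram models as a stand-in for whatever biasing the paper actually performs. All names (`unigram_lm`, `interpolate`, `perplexity`) and the toy corpus are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def unigram_lm(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(static_lm, bias_lm, lam):
    """Linear mixture: lam * bias + (1 - lam) * static."""
    return {w: lam * bias_lm[w] + (1.0 - lam) * static_lm[w] for w in static_lm}


def perplexity(lm, sentence):
    """exp of the negative mean log-probability of the words in `sentence`."""
    words = sentence.split()
    log_prob = sum(math.log(lm[w]) for w in words)
    return math.exp(-log_prob / len(words))


# Toy target-side training corpus (hypothetical data).
corpus = [
    "the checkpoint is closed",
    "open the gate at the checkpoint",
    "the doctor will see you now",
    "where does it hurt",
]
vocab = sorted({w for s in corpus for w in s.split()})

# Static LM: estimated once from the whole corpus, independent of the input.
static_lm = unigram_lm(corpus, vocab)

# Bias LM: estimated only from the sentences assumed to be most similar to
# the current input (here, hypothetically, the first two).
bias_lm = unigram_lm(corpus[:2], vocab)
biased_lm = interpolate(static_lm, bias_lm, lam=0.5)

reference = "the checkpoint is open"
pp_static = perplexity(static_lm, reference)
pp_biased = perplexity(biased_lm, reference)
```

On this toy data the biased model assigns higher probability to every word of the reference than the static model does, so `pp_biased` comes out lower than `pp_static`. That is the kind of target-perplexity reduction the abstract reports, though the real system's models and selection criterion are more elaborate.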

reference text

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19:263–311.

Woosung Kim. 2005. Language Model Adaptation for Automatic Speech Recognition and Statistical Machine Translation. Ph.D. thesis, The Johns Hopkins University, Baltimore, MD.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In NAACL ’03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 48–54, Morristown, NJ, USA. Association for Computational Linguistics.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL ’07, pages 177–180, Stroudsburg, PA, USA. Association for Computational Linguistics.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In ACL ’03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pages 160–167, Morristown, NJ, USA. Association for Computational Linguistics.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2001. BLEU: A method for automatic evaluation of machine translation. In ACL ’02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318, Morristown, NJ, USA. Association for Computational Linguistics.

Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings AMTA, pages 223–231, August.

Matthew Snover, Bonnie Dorr, and Richard Schwartz. 2008. Language and translation model adaptation using comparable corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 857–866, Stroudsburg, PA, USA. Association for Computational Linguistics.