emnlp emnlp2011 emnlp2011-93 emnlp2011-93-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhifei Li ; Ziyuan Wang ; Jason Eisner ; Sanjeev Khudanpur ; Brian Roark
Abstract: Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation system. Intuitively, our method strives to ensure that probabilistic “round-trip” translation from a target-language sentence to the source language and back will have low expected loss. Theoretically, this may be justified as (discriminatively) minimizing an imputed empirical risk. Empirically, we demonstrate that augmenting supervised training with unsupervised data improves translation performance over the supervised case for both IWSLT and NIST tasks.
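As a hedged sketch of the objective the abstract describes (all notation below is assumed for illustration, not drawn from the paper): each monolingual target sentence y* is translated into the source language by a rough, fixed reverse system q, and the forward model p_theta is trained so that translating back incurs low expected loss, i.e., so that an imputed empirical risk is minimized.

% Minimal LaTeX sketch of the imputed empirical risk described in the
% abstract; the symbols are illustrative assumptions, not the paper's
% exact notation.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% y^*      : observed target-language sentence (monolingual data)
% q        : rough reverse (target-to-source) system, held fixed
% p_\theta : forward (source-to-target) model being trained
% L        : task loss, e.g. 1 - BLEU
\[
  \hat{R}(\theta) \;=\; \sum_{y^*}\;
  \mathbb{E}_{x \sim q(\cdot \mid y^*)}\;
  \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}
  \bigl[\, L(y,\, y^*) \,\bigr]
\]
\end{document}

On supervised sentence pairs, where the source side is observed rather than imputed, an objective of this shape reduces to ordinary minimum-risk training, which is presumably how the supervised and unsupervised data are combined in the experiments the abstract reports.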
Reference:
Phil Blunsom, Trevor Cohn, and Miles Osborne. 2008. A discriminative latent variable model for statistical machine translation. In ACL, pages 200–208.
Stanley F. Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical report.
David Chiang, Kevin Knight, and Wei Wang. 2009. 11,001 new features for statistical machine translation. In NAACL, pages 218–226.
David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.
Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, pages 1–8.
Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585.
Christopher Dyer, Smaranda Muresan, and Philip Resnik. 2008. Generalizing word lattice translation. In ACL, pages 1012–1020.
Matthias Eck and Chiori Hori. 2005. Overview of the IWSLT 2005 evaluation campaign. In IWSLT.
Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In ACL, pages 961–968.
Liang Huang and David Chiang. 2005. Better k-best parsing. In IWPT, pages 53–64.
Jui-Ting Huang, Xiao Li, and Alex Acero. 2010. Discriminative training methods for language models using conditional entropy criteria. In ICASSP.
Mark Johnson, Thomas Griffiths, and Sharon Goldwater. 2007. Bayesian inference for PCFGs via Markov chain Monte Carlo. In NAACL, pages 139–146.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In NAACL, pages 48–54.
John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML.
Zhifei Li and Jason Eisner. 2009. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In EMNLP, pages 40–51.
Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Jonathan Weese, and Omar Zaidan. 2009a. Joshua: An open source toolkit for parsing-based machine translation. In WMT, pages 26–30.
Zhifei Li, Jason Eisner, and Sanjeev Khudanpur. 2009b. Variational decoding for statistical machine translation. In ACL, pages 593–601.
Zhifei Li, Ziyuan Wang, Sanjeev Khudanpur, and Jason Eisner. 2010. Unsupervised discriminative language model training for machine translation using simulated confusion sets. In COLING, pages 556–564.
Percy Liang, Alexandre Bouchard-Côté, Dan Klein, and Ben Taskar. 2006. An end-to-end discriminative approach to machine translation. In ACL, pages 761–768.
R. J. A. Little and D. B. Rubin. 1987. Statistical Analysis with Missing Data. J. Wiley & Sons, New York.
Dong C. Liu and Jorge Nocedal. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528.
Wolfgang Macherey, Franz Och, Ignacio Thayer, and Jakob Uszkoreit. 2008. Lattice-based minimum error rate training for statistical machine translation. In EMNLP, pages 725–734.
Jonathan May and Kevin Knight. 2006. A better n-best list: Practical determinization of weighted finite tree automata. In NAACL, pages 351–358.
Thomas Minka. 2000. Empirical risk minimization is an incomplete inductive principle. MIT Media Lab note.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In ACL, pages 160–167.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311–318.
D. B. Rubin. 1987. Multiple Imputation for Nonresponse in Surveys. J. Wiley & Sons, New York.
David A. Smith and Jason Eisner. 2006. Minimum risk annealing for training log-linear models. In ACL, pages 787–794.
Roy Tromble, Shankar Kumar, Franz Och, and Wolfgang Macherey. 2008. Lattice minimum-Bayes-risk decoding for statistical machine translation. In EMNLP, pages 620–629.
Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In EMNLP-CoNLL, pages 764–773.