
A Systematic Exploration of Diversity in Machine Translation (EMNLP 2013)



Authors: Kevin Gimpel, Dhruv Batra, Chris Dyer, Gregory Shakhnarovich

Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.


reference text

N. Bach, F. Huang, and Y. Al-Onaizan. 2011. Goodness: A method for measuring machine translation confidence. In Proc. of ACL.
D. Batra, P. Yadollahpour, A. Guzman-Rivera, and G. Shakhnarovich. 2012. Diverse M-best solutions in Markov random fields. In Proc. of ECCV.
O. Bojar, C. Buck, C. Callison-Burch, C. Federmann, B. Haddow, P. Koehn, C. Monz, M. Post, R. Soricut, and L. Specia. 2013. Findings of the 2013 Workshop on Statistical Machine Translation. In Proc. of WMT.
P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. 1992. Class-based N-gram models of natural language. Computational Linguistics, 18.
C. Callison-Burch, P. Koehn, C. Monz, and O. Zaidan. 2011. Findings of the 2011 Workshop on Statistical Machine Translation. In Proc. of WMT.
C. Callison-Burch, P. Koehn, C. Monz, M. Post, R. Soricut, and L. Specia. 2012. Findings of the 2012 Workshop on Statistical Machine Translation. In Proc. of WMT.
D. Cer, C. D. Manning, and D. Jurafsky. 2013. Positive diversity tuning for machine translation system combination. In Proc. of WMT.
P. Chang, M. Galley, and C. D. Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proc. of WMT.
E. Charniak and M. Johnson. 2005. Coarse-to-fine n-best parsing and maxent discriminative reranking. In Proc. of ACL.
S. Chatterjee and N. Cancedda. 2010. Minimum error rate training by sampling the translation lattice. In Proc. of EMNLP.
S. Chen and J. Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical report 10-98, Harvard University.
D. Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2).
M. Collins and T. Koo. 2005. Discriminative reranking for natural language parsing. Computational Linguistics, 31(1).
M. Collins. 2000. Discriminative reranking for natural language parsing. In Proc. of ICML.
J. DeNero, D. Chiang, and K. Knight. 2009. Fast consensus decoding over translation forests. In Proc. of ACL.
J. Devlin and S. Matsoukas. 2012. Trait-based hypothesis selection for machine translation. In Proc. of NAACL.
C. Dyer. 2009. Using a maximum entropy model to build segmentation lattices for MT. In Proc. of HLT-NAACL.
C. Dyer. 2010. A Formal Model of Ambiguity and its Applications in Machine Translation. Ph.D. thesis, University of Maryland.
J. R. Finkel, C. D. Manning, and A. Y. Ng. 2006. Solving the problem of cascading errors: Approximate Bayesian inference for linguistic annotation pipelines. In Proc. of EMNLP.
E. M. Gertz and S. J. Wright. 2003. Object-oriented software for quadratic programming. ACM Transactions on Mathematical Software, 29(1).
J. Gillenwater, A. Kulesza, and B. Taskar. 2012. Discovering diverse and salient threads in document collections. In Proc. of EMNLP.
K. Heafield and A. Lavie. 2010a. Combining machine translation output with open source: The Carnegie Mellon multi-engine machine translation scheme. The Prague Bulletin of Mathematical Linguistics, 93.
K. Heafield and A. Lavie. 2010b. Voting on n-grams for machine translation system combination. In Proc. of AMTA.
K. Heafield. 2011. KenLM: Faster and smaller language model queries. In Proc. of WMT.
A. Hildebrand and S. Vogel. 2008. Combination of machine translation systems via hypothesis selection from combined n-best lists. In Proc. of AMTA.
H. Hoang, P. Koehn, and A. Lopez. 2009. A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation. In Proc. of IWSLT.
M. Hopkins and J. May. 2011. Tuning as ranking. In Proc. of EMNLP.
L. Huang. 2008. Forest reranking: Discriminative parsing with non-local features. In Proc. of ACL.
T. Joachims, T. Finley, and C. Yu. 2009. Cutting-plane training of structural SVMs. Machine Learning, 77(1).
P. Koehn, F. J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proc. of HLT-NAACL.
P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proc. of ACL (demo session).
P. Koehn. 2010. Enabling monolingual translators: Post-editing vs. options. In Proc. of NAACL.
M. Koponen. 2012. Comparing human perceptions of post-editing effort with post-editing operations. In Proc. of WMT.
A. Kulesza and B. Taskar. 2010. Structured determinantal point processes. In Proc. of NIPS.
A. Kulesza and B. Taskar. 2011. Learning determinantal point processes. In Proc. of UAI.
S. Kumar and W. Byrne. 2004. Minimum Bayes-risk decoding for statistical machine translation. In Proc. of HLT-NAACL.
S. Kumar, W. Macherey, C. Dyer, and F. Och. 2009. Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. In Proc. of ACL-IJCNLP.
Y. Lee, K. Papineni, S. Roukos, O. Emam, and H. Hassan. 2003. Language model based Arabic word segmentation. In Proc. of ACL.
Z. Li, J. Eisner, and S. Khudanpur. 2009. Variational decoding for statistical machine translation. In Proc. of ACL.
P. Liang. 2005. Semi-supervised learning for natural language. Master's thesis, Massachusetts Institute of Technology.
C. Lin and F. J. Och. 2004. Orange: a method for evaluating automatic evaluation metrics for machine translation. In Proc. of COLING.
W. Macherey and F. J. Och. 2007. An empirical study on computing consensus translations from multiple machine translation systems. In Proc. of EMNLP-CoNLL.
W. Macherey, F. J. Och, I. Thayer, and J. Uszkoreit. 2008. Lattice-based minimum error rate training for statistical machine translation. In Proc. of EMNLP.
F. J. Och and H. Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proc. of ACL.
F. J. Och and H. Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1).
F. J. Och, D. Gildea, S. Khudanpur, A. Sarkar, K. Yamada, A. Fraser, S. Kumar, L. Shen, D. Smith, K. Eng, V. Jain, Z. Jin, and D. Radev. 2004. A smorgasbord of features for statistical machine translation. In Proc. of HLT-NAACL.
F. J. Och. 2003. Minimum error rate training for statistical machine translation. In Proc. of ACL.
K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proc. of ACL.
A. Pauls and D. Klein. 2012. Large-scale syntactic language modeling with treelets. In Proc. of ACL.
A.-V. Rosti, N. F. Ayan, B. Xiang, S. Matsoukas, R. Schwartz, and B. Dorr. 2007. Combining outputs from multiple machine translation systems. In Proc. of HLT-NAACL.
L. Shen and A. K. Joshi. 2003. An SVM-based voting algorithm with application to parse reranking. In Proc. of CoNLL.
L. Shen, A. Sarkar, and F. J. Och. 2004. Discriminative reranking for machine translation. In Proc. of HLT-NAACL.
L. Specia, N. Hajlaoui, C. Hallett, and W. Aziz. 2011. Predicting machine translation adequacy. In Proc. of MT Summit XIII.
L. Specia. 2011. Exploiting objective annotations for measuring translation post-editing effort. In Proc. of EAMT.
A. Stolcke. 2002. SRILM: an extensible language modeling toolkit. In Proc. of ICSLP.
M. Tatsumi. 2009. Correlation between automatic evaluation metric scores, post-editing speed, and some other factors. In Proc. of MT Summit XII.
K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proc. of HLT-NAACL.
R. Tromble, S. Kumar, F. J. Och, and W. Macherey. 2008. Lattice Minimum Bayes-Risk decoding for statistical machine translation. In Proc. of EMNLP.
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. 2005. Large margin methods for structured and interdependent output variables. JMLR, 6.
A. Venugopal, A. Zollmann, N. A. Smith, and S. Vogel. 2008. Wider pipelines: N-best alignments and parses in MT training. In Proc. of AMTA.
I. H. Witten and T. C. Bell. 1991. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4).
T. Xiao, J. Zhu, and T. Liu. 2013. Bagging and boosting statistical machine translation systems. Artif. Intell., 195.
P. Yadollahpour, D. Batra, and G. Shakhnarovich. 2013. Discriminative re-ranking of diverse segmentations. In Proc. of CVPR.