acl acl2012 acl2012-136 acl2012-136-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kevin Duh ; Katsuhito Sudoh ; Xianchao Wu ; Hajime Tsukada ; Masaaki Nagata
Abstract: We introduce an approach to optimize a machine translation (MT) system on multiple metrics simultaneously. Different metrics (e.g. BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality. Our approach is based on the theory of Pareto Optimality. It is simple to implement on top of existing single-objective optimization methods (e.g. MERT, PRO) and outperforms ad hoc alternatives based on linear combinations of metrics. We also discuss the issue of metric tunability and show that our Pareto approach is more effective in incorporating new metrics from MT evaluation for MT optimization.
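The notion of Pareto optimality that the abstract relies on can be illustrated with a small sketch. This is a generic illustration of Pareto dominance, not the authors' optimization procedure: the function names `dominates` and `pareto_frontier`, the score values, and the convention of negating TER so that larger is always better are all assumptions made for this example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one.

    Assumes every metric is oriented so that larger is better.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(points: List[Sequence[float]]) -> List[int]:
    """Return indices of points that no other point dominates (simple O(n^2) scan)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (BLEU, -TER) scores for five candidate systems; TER is negated
# because lower TER is better, and this sketch assumes larger is better.
scores = [(0.30, -0.60), (0.32, -0.58), (0.28, -0.55), (0.31, -0.62), (0.25, -0.70)]
frontier = pareto_frontier(scores)
print(frontier)  # prints [1, 2]
```

Candidates 1 and 2 survive because neither beats the other on both metrics: candidate 1 has the higher BLEU, while candidate 2 has the lower TER. Every other candidate is at least matched on both metrics by candidate 1.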
References:
Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, and Xuanhui Wang. 2011. Click shaping to optimize multiple objectives. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '11, pages 132–140, New York, NY, USA. ACM.
J. Albrecht and R. Hwa. 2007. A re-examination of machine learning approaches for sentence-level MT evaluation. In ACL.
J. L. Bentley, H. T. Kung, M. Schkolnick, and C. D. Thompson. 1978. On the average number of maxima in a set of vectors and applications. Journal of the Association for Computing Machinery (JACM), 25(4).
Alexandra Birch, Phil Blunsom, and Miles Osborne. 2010. Metrics for MT evaluation: Evaluating reordering. Machine Translation, 24(1).
S. Börzsönyi, D. Kossmann, and K. Stocker. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering (ICDE).
Chris Callison-Burch, Philipp Koehn, Christof Monz, and Omar Zaidan. 2011. Findings of the 2011 workshop on statistical machine translation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 22–64, Edinburgh, Scotland, July. Association for Computational Linguistics.
Daniel Cer, Christopher Manning, and Daniel Jurafsky. 2010. The best lexical metric for phrase-based statistical MT system optimization. In NAACL HLT.
David Chiang, Wei Wang, and Kevin Knight. 2009. 11,001 new features for statistical machine translation. In NAACL.
Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7.
Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2).
Chris Dyer, Hendra Setiawan, Yuval Marton, and Philip Resnik. 2009. The University of Maryland statistical machine translation system for the fourth workshop on machine translation. In Proc. of the Fourth Workshop on Machine Translation.
Jason Eisner and Hal Daumé III. 2011. Learning speed-accuracy tradeoffs in nondeterministic inference algorithms. In COST: NIPS 2011 Workshop on Computational Trade-offs in Statistical Learning.
Jesús Giménez and Lluís Màrquez. 2008. Heterogeneous automatic MT evaluation through non-parametric metric combinations. In IJCNLP.
Parke Godfrey, Ryan Shipley, and Jarek Gryz. 2007. Algorithms and analyses for maximal vector computation. VLDB Journal, 16.
Isao Goto, Bin Lu, Ka Po Chow, Eiichiro Sumita, and Benjamin K. Tsou. 2011. Overview of the patent machine translation task at the NTCIR-9 workshop. In Proceedings of the NTCIR-9 Workshop Meeting.
Keith Hall, Ryan McDonald, Jason Katz-Brown, and Michael Ringgaard. 2011. Training dependency parsers by jointly optimizing multiple objectives. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1489–1499, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.
Yifan He and Andy Way. 2009. Improving the objective function in minimum error rate training. In MT Summit.
Mark Hopkins and Jonathan May. 2011. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1352–1362, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.
H. Isozaki, T. Hirao, K. Duh, K. Sudoh, and H. Tsukada. 2010. Automatic evaluation of translation quality for distant language pairs. In EMNLP.
T. Joachims. 2006. Training linear SVMs in linear time. In KDD.
P. Koehn et al. 2007. Moses: open source toolkit for statistical machine translation. In ACL.
A. Lavie and A. Agarwal. 2007. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Workshop on Statistical Machine Translation.
P. Liang, A. Bouchard-Côté, D. Klein, and B. Taskar. 2006. An end-to-end discriminative approach to machine translation. In ACL.
Ding Liu and Daniel Gildea. 2007. Source-language features and maximum correlation training for machine translation evaluation. In NAACL.
Chang Liu, Daniel Dahlmeier, and Hwee Tou Ng. 2011. Better evaluation metrics lead to better machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Wolfgang Macherey, Franz Och, Ignacio Thayer, and Jakob Uszkoreit. 2008. Lattice-based minimum error rate training for statistical machine translation. In EMNLP.
R. T. Marler and J. S. Arora. 2004. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26.
Arne Mauser, Saša Hasan, and Hermann Ney. 2008. Automatic evaluation measures for statistical machine translation system optimization. In International Conference on Language Resources and Evaluation, Marrakech, Morocco, May.
Kaisa Miettinen. 1998. Nonlinear Multiobjective Optimization. Springer.
J.A. Nelder and R. Mead. 1965. The downhill simplex method. Computer Journal, 7(308).
Franz Och. 2003. Minimum error rate training in statistical machine translation. In ACL.
Karolina Owczarzak, Josef van Genabith, and Andy Way. 2007. Labelled dependencies in machine translation evaluation. In Proceedings of the Second Workshop on Statistical Machine Translation.
Sebastian Padó, Daniel Cer, Michel Galley, Dan Jurafsky, and Christopher D. Manning. 2009. Measuring machine translation quality as semantic equivalence: A metric based on entailment features. Machine Translation, 23(2-3).
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL.
Vilfredo Pareto. 1906. Manuale di Economia Politica, (Translated into English by A.S. Schwier as Manual of Political Economy, 1971). Società Editrice Libraria, Milan.
Michael Paul. 2010. Overview of the IWSLT 2010 evaluation campaign. In IWSLT.
Yoshikazu Sawaragi, Hirotaka Nakayama, and Tetsuzo Tanino, editors. 1985. Theory of Multiobjective Optimization. Academic Press.
M. Snover, B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul. 2006. A study of translation edit rate with targeted human annotation. In AMTA.
Valentin I. Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2011. Lateen EM: Unsupervised training with multiple objectives, applied to dependency grammar induction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1269–1280, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.
Omar Zaidan. 2009. Z-MERT: A fully configurable open source tool for minimum error rate training of machine translation systems. In The Prague Bulletin of Mathematical Linguistics.