emnlp emnlp2012 emnlp2012-86 emnlp2012-86-reference knowledge-graph by maker-knowledge-mining

86 emnlp-2012-Locally Training the Log-Linear Model for SMT

Source: pdf

Author: Lemao Liu ; Hailong Cao ; Taro Watanabe ; Tiejun Zhao ; Mo Yu ; Conghui Zhu

Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development set, which may lead to an unstable performance for testing. Second, translations become inconsistent at the sentence level since tuning is performed globally on a document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method, such as MERT, in which a single weight is learned and used for all the input sentences, we perform training and testing in one step by learning a sentencewise weight for each input sentence. We pro- pose efficient incremental training methods to put the local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT with the maximal improvements up to 2.0 BLEU points, meanwhile its efficiency is comparable to that of the global method.

reference text

Alexandr Andoni and Piotr Indyk. 2008. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 5 1(1): 117–122, January. Phil Blunsom, Trevor Cohn, and Miles Osborne. 2008. A discriminative latent variable model for statistical machine translation. In Proceedings of ACL, pages 200–208, Columbus, Ohio, June. Association for Computational Linguistics. L e´on Bottou and Vladimir Vapnik. 1992. Local learning algorithms. Neural Comput., 4:888–900, November. G. Cauwenberghs and T. Poggio. 2001. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems (NIPS*2000), volume 13. Stanley F Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. In Technical Report TR-10-98. Harvard University. Haibin Cheng, Pang-Ning Tan, and Rong Jin. 2010. Efficient algorithm for localized support vector machine. IEEE Trans. on Knowl. and Data Eng., 22:537–549, April. David Chiang, Yuval Marton, and Philip Resnik. 2008. Online large-margin training of syntactic and structural translation features. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 224–233, Stroudsburg, PA, USA. Association for Computational Linguistics. David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL ’05, pages 263–270, Stroudsburg, PA, USA. Association for Computational Linguistics. Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res., 3:95 1–991, March. Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev- Shwartz, and Yoram Singer. 2006. Online passive- 410 J. Mach. Learn. Res., 7:55 1– aggressive algorithms. 585, December. Michel Galley and Chris Quirk. 2011. Optimal search for minimum error rate training. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 38–49, Edinburgh, Scotland, UK., July. Association for Computational Linguistics. Yifan He, Yanjun Ma, Josef van Genabith, and Andy Way. 2010. Bridging smt and tm with translation recommendation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 622–630, Uppsala, Sweden, July. Association for Computational Linguistics. S. Hildebrand, M. Eck, S. Vogel, and Alex Waibel. 2005. Adaptation of the translation model for statistical machine translation based on information retrieval. In Proceedings of EAMT. Association for Computational Linguistics. Mark Hopkins and Jonathan May. 2011. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1352–1362, Edinburgh, Scotland, UK., July. Association for Computational Linguistics. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proc. of HLT-NAACL. ACL. Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ond ˇrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL ’07, pages 177–180, Stroudsburg, PA, USA. Association for Computational Linguistics. Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proc. of EMNLP. ACL. Mu Li, Yinggong Zhao, Dongdong Zhang, and Ming Zhou. 2010. Adaptive development data selection for log-linear model in statistical machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’ 10, pages 662– 670, Stroudsburg, PA, USA. Association for Computational Linguistics. Yajuan L u¨, Jin Huang, and Qun Liu. 2007. Improving statistical machine translation performance by training data selection and optimization. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 343–350, Prague, Czech Republic, June. Association for Computational Linguistics. Yanjun Ma, Yifan He, Andy Way, and Josef van Genabith. 2011. Consistent translation using discriminative learning - a translation memory-inspired approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1239–1248, Portland, Oregon, USA, June. Association for Computational Linguistics. Christopher D. Manning and Hinrich Sch u¨tze. 1999. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, USA. Robert C. Moore and Chris Quirk. 2008. Random restarts in minimum error rate training for statistical machine translation. In Proceedings of the 22nd International Conference on Computational Linguistics Volume 1, COLING ’08, pages 585–592, Stroudsburg, PA, USA. Association for Computational Linguistics. Franz Josef Och and Hermann Ney. 2000. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL ’00, pages 440–447, Stroudsburg, PA, USA. Association for Computational Linguistics. Franz Josef Och and Hermann Ney. 2002. Discrimi- native training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pages 295–302, Stroudsburg, PA, USA. Association for Computational Linguistics. Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160–167, Sapporo, Japan, July. Association for Computational Linguistics. Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 3 11–318, Philadelphia, Pennsylvania, USA, July. Association for Computational Linguistics. Adam Pauls, John Denero, and Dan Klein. 2009. Consensus training for consensus decoding in machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1418–1427, Singapore, August. Association for Computational Linguistics. Alistair Shilton, Marimuthu Palaniswami, Daniel Ralph, and Ah Chung Tsoi. 2005. Incremental training of support vector machines. IEEE Transactions on Neural Networks, 16(1): 114–131 . 411 Andreas Stolcke. 2002. Srilm - an extensible language modeling toolkit. In Proc. of ICSLP. Taro Watanabe and Eiichiro Sumita. 2003. Examplebased decoding for statistical machine translation. In Proc. of MT Summit IX, pages 410–417. Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 764– 773, Prague, Czech Republic, June. Association for Computational Linguistics. Hao Zhang, Alexander C. Berg, Michael Maire, and Jitendra Malik. 2006. Svm-knn: Discriminative nearest neighbor classification for visual category recognition. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2, CVPR ’06, pages 2126–2136, Washington, DC, USA. IEEE Computer Society. Bing Zhao and Shengyuan Chen. 2009. A simplex armijo downhill algorithm for optimizing statistical machine translation decoding parameters. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, NAACL-Short ’09, pages 21–24, Stroudsburg, PA, USA. Association for Computational Linguistics.