
38 acl-2013-Additive Neural Networks for Statistical Machine Translation

Source: pdf

Author: Lemao Liu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao

Abstract: Most statistical machine translation (SMT) systems are modeled using a log-linear framework. Although the log-linear model has been successful in SMT, it still suffers from some limitations: (1) the features are required to be linear with respect to the model itself; (2) features cannot be further interpreted to reach their potential. A neural network is a reasonable method to address these pitfalls. However, modeling SMT with a neural network is not trivial, especially when decoding efficiency is taken into consideration. In this paper, we propose a variant of a neural network, namely additive neural networks, for SMT to go beyond the log-linear translation model. In addition, word embedding, which encodes each word as a feature vector, is employed as the input to the neural network. Our model outperforms the log-linear translation models with/without embedding features on Chinese-to-English and Japanese-to-English translation tasks.
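The abstract does not give the model's equations, so the following is only a rough sketch of the general idea it describes: a linear (log-linear style) score extended by an additive, non-linear term computed by a small network over feature vectors. It assumes one sigmoid hidden layer shared across local units (for example, per-rule embedding features); all names (`loglinear_score`, `hidden_layer`, `additive_score`, `w_hidden`, `M`, `b`, `local_inputs`) are hypothetical and not taken from the paper.

```python
import math


def loglinear_score(weights, features):
    """Linear part: dot product of feature weights and feature values."""
    return sum(w * h for w, h in zip(weights, features))


def hidden_layer(M, b, x):
    """One sigmoid hidden layer, sigma(M x + b), written without NumPy."""
    activations = []
    for row, bias in zip(M, b):
        z = sum(m * xi for m, xi in zip(row, x)) + bias
        activations.append(1.0 / (1.0 + math.exp(-z)))
    return activations


def additive_score(weights, features, w_hidden, M, b, local_inputs):
    """Linear score plus one network output per local unit (assumed form).

    Each element of local_inputs is a feature vector for one local unit;
    its contribution is simply added, so the total stays decomposable.
    """
    score = loglinear_score(weights, features)
    for x in local_inputs:
        hidden = hidden_layer(M, b, x)
        score += sum(w * h for w, h in zip(w_hidden, hidden))
    return score
```

Under these assumptions, setting `w_hidden` to all zeros recovers the plain linear score, which is one way to read "going beyond" the log-linear model: the linear model is a special case. Because the non-linear term is a sum over local units, each unit's contribution can be computed independently, which is the kind of property a decoder would need for efficiency.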

reference text

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137–1155, March.
Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist., 19:263–311, June.
Andreas Buja, Trevor Hastie, and Robert Tibshirani. 1989. Linear smoothers and additive models. The Annals of Statistics, 17:453–510.
M. Asunción Castaño, Francisco Casacuberta, and Enrique Vidal. 1997. Machine translation using neural networks and finite-state models. In TMI, pages 160–167.
David Chiang, Yuval Marton, and Philip Resnik. 2008. Online large-margin training of syntactic and structural translation features. In Proc. of EMNLP. ACL.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL ’05, pages 263–270, Stroudsburg, PA, USA. Association for Computational Linguistics.
David Chiang. 2007. Hierarchical phrase-based translation. Comput. Linguist., 33(2):201–228, June.
Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better hypothesis testing for statistical machine translation: controlling for optimizer instability. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, HLT ’11, pages 176–181, Stroudsburg, PA, USA. Association for Computational Linguistics.
R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In International Conference on Machine Learning, ICML.
R. Collobert. 2011. Deep learning for efficient discriminative parsing. In AISTATS.
Thomas Deselaers, Saša Hasan, Oliver Bender, and Hermann Ney. 2009. A deep learning approach to machine transliteration. In Proceedings of the Fourth Workshop on Statistical Machine Translation, StatMT ’09, pages 233–241, Stroudsburg, PA, USA. Association for Computational Linguistics.
Kevin Duh and Katrin Kirchhoff. 2008. Beyond log-linear models: Boosted minimum error rate training for n-best re-ranking. In Proceedings of ACL-08: HLT, Short Papers, pages 37–40, Columbus, Ohio, June. Association for Computational Linguistics.
Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, and Takehito Utsuro. 2010. Overview of the patent translation task at the NTCIR-8 workshop. In Proceedings of the 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-lingual Information Access, pages 293–302.
William W. Hager and Hongchao Zhang. 2006. Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw., 32(1):113–137, March.
Mark Hopkins and Jonathan May. 2011. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1352–1362, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proc. of HLT-NAACL. ACL.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL ’07, pages 177–180, Stroudsburg, PA, USA. Association for Computational Linguistics.
Philipp Koehn. 2004a. Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In AMTA.
Philipp Koehn. 2004b. Statistical significance tests for machine translation evaluation. In Proc. of EMNLP. ACL.
Quoc V. Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, and Andrew Y. Ng. 2011. On optimization methods for deep learning. In ICML, pages 265–272.
Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In INTERSPEECH, pages 1045–1048.
Patrick Nguyen, Milind Mahajan, and Xiaodong He. 2007. Training non-parametric features for statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 72–79, Prague, Czech Republic, June. Association for Computational Linguistics.
Franz Josef Och and Hermann Ney. 2000. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL ’00, pages 440–447, Stroudsburg, PA, USA. Association for Computational Linguistics.
Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pages 295–302, Stroudsburg, PA, USA. Association for Computational Linguistics.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160–167, Sapporo, Japan, July. Association for Computational Linguistics.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA, July. Association for Computational Linguistics.
William J. E. Potts. 1999. Generalized additive neural networks. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’99, pages 194–200, New York, NY, USA. ACM.
Holger Schwenk. 2012. Continuous space translation models for phrase-based statistical machine translation. In Proceedings of the 24th International Conference on Computational Linguistics, COLING ’12, Mumbai, India. Association for Computational Linguistics.
Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, and Christopher D. Manning. 2011. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In Proceedings of the 26th International Conference on Machine Learning (ICML).
A. Sokolov, G. Wisniewski, and F. Yvon. 2012. Nonlinear n-best list reranking with few features. In AMTA, San Diego, USA.
Le Hai Son, Alexandre Allauzen, and François Yvon. 2012. Continuous space translation models with neural networks. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT ’12, pages 39–48, Stroudsburg, PA, USA. Association for Computational Linguistics.
Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proc. of ICSLP.
Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 384–394, Stroudsburg, PA, USA. Association for Computational Linguistics.
D. A. de Waal and J. V. du Toit. 2007. Generalized additive models from a neural network perspective. In Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, ICDMW ’07, pages 265–270, Washington, DC, USA. IEEE Computer Society.
Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 764–773, Prague, Czech Republic, June. Association for Computational Linguistics.
Tong Xiao, Jingbo Zhu, Muhua Zhu, and Huizhen Wang. 2010. Boosting-based system combination for machine translation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 739–748, Stroudsburg, PA, USA. Association for Computational Linguistics.