
38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

Source: pdf

Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning

Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly outperform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds nearly half a BLEU point to the results of the NIST08 Chinese-English machine translation task.

reference text

A. Klementiev, I. Titov and B. Bhattarai. 2012. Inducing Crosslingual Distributed Representation of Words. COLING.
Y. Bengio, J. Louradour, R. Collobert and J. Weston. 2009. Curriculum Learning. ICML.
Y. Bengio, R. Ducharme, P. Vincent and C. Jauvin. 2003. A Neural Probabilistic Language Model. Journal of Machine Learning Research.
Y. Bengio and Y. LeCun. 2007. Scaling learning algorithms towards AI. Large-Scale Kernel Machines.
J. Boyd-Graber and P. Resnik. 2010. Holistic sentiment analysis across languages: multilingual supervised latent Dirichlet allocation. EMNLP.
D. Cer, M. Galley, D. Jurafsky and C. Manning. 2010. Phrasal: A Toolkit for Statistical Machine Translation with Facilities for Extraction and Incorporation of Arbitrary Model Features. In Proceedings of the North American Association of Computational Linguistics - Demo Session (NAACL-10).
D. Cer, C. Manning and D. Jurafsky. 2010. The Best Lexical Metric for Phrase-Based Statistical MT System Optimization. NAACL.
R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML.
G. Foster and R. Kuhn. 2009. Stabilizing minimum error rate training. Proceedings of the Fourth Workshop on Statistical Machine Translation.
M. Galley, P. Chang, D. Cer, J. R. Finkel and C. D. Manning. 2008. NIST Open Machine Translation 2008 Evaluation: Stanford University’s System Description. Unpublished working notes of the 2008 NIST Open Machine Translation Evaluation Workshop.
S. Green, S. Wang, D. Cer and C. Manning. 2013. Fast and adaptive online training of feature-rich translation models. ACL.
G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath and B. Kingsbury. 2012. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Processing Magazine.
E. Hovy, M. Marcus, M. Palmer, L. Ramshaw and R. Weischedel. 2006. OntoNotes: the 90% solution. NAACL-HLT.
E. H. Huang, R. Socher, C. D. Manning and A. Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. ACL.
P. Jin and Y. Wu. 2012. SemEval-2012 Task 4: Evaluating Chinese Word Similarity. Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation. Association for Computational Linguistics.
R. Jones. 2006. Generating query substitutions. In Proceedings of the 15th international conference on World Wide Web.
P. Koehn, F. J. Och and D. Marcu. 2003. Statistical Phrase-Based Translation. HLT.
H. Le, A. Allauzen and F. Yvon. 2012. Continuous space translation models with neural networks. NAACL.
P. Liang, B. Taskar and D. Klein. 2006. Alignment by agreement. NAACL.
M. Luong, R. Socher and C. Manning. 2013. Better word representations with recursive neural networks for morphology. CoNLL.
L. van der Maaten and G. Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research.
A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng and C. Potts. 2011. Learning word vectors for sentiment analysis. ACL.
C. Manning, P. Raghavan and H. Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA.
T. Mikolov, M. Karafiat, L. Burget, J. Cernocky and S. Khudanpur. 2010. Recurrent neural network based language model. INTERSPEECH.
T. Mikolov, K. Chen, G. Corrado and J. Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781v1.
A. Mnih and G. Hinton. 2008. A scalable hierarchical distributed language model. NIPS.
F. Morin and Y. Bengio. 2005. Hierarchical probabilistic neural network language model. AISTATS.
F. Och. 2003. Minimum error rate training in statistical machine translation. ACL.
M. Paşca, D. Lin, J. Bigham, A. Lifchits and A. Jain. 2006. Names and similarities on the web: fact extraction in the fast lane. ACL.
Y. Peirsman and S. Padó. 2010. Cross-lingual induction of selectional preferences with bilingual vector spaces. ACL.
J. Reisinger and R. J. Mooney. 2010. Multi-prototype vector-space models of word meaning. NAACL.
F. Sebastiani. 2002. Machine learning in automated text categorization. ACM Comput. Surv., 34:1-47, March.
R. Socher, J. Pennington, E. Huang, A. Y. Ng and C. D. Manning. 2011. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP.
R. Socher, E. H. Huang, J. Pennington, A. Y. Ng and C. D. Manning. 2011. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS.
E. Sumita. 2000. Lexical transfer using a vector-space model. ACL.
Y. Tam, I. Lane and T. Schultz. 2007. Bilingual-LSA based LM adaptation for spoken language translation. ACL.
S. Tellex, B. Katz, J. Lin, A. Fernandes and G. Marton. 2003. Quantitative evaluation of passage retrieval algorithms for question answering. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41-47. ACM Press.
J. Turian, L. Ratinov and Y. Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. ACL.
M. Wang, W. Che and C. D. Manning. 2013. Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition. ACL.
K. Yamada and K. Knight. 2001. A Syntax-based Statistical Translation Model. ACL.
B. Zhao and E. P. Xing. 2006. BiTAM: Bilingual topic AdMixture Models for word alignment. ACL.