
255 acl-2013-Name-aware Machine Translation

Source: pdf

Author: Haibo Li ; Jing Zheng ; Heng Ji ; Qi Li ; Wen Wang

Abstract: We propose a Name-aware Machine Translation (MT) approach that tightly integrates name processing into the MT model by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding a name phrase table, and using name-translation-driven decoding. We also propose a new MT metric that evaluates the translation quality of informative words by weighting each word according to its importance in a document. Experiments on Chinese-English translation demonstrate that the approach improves overall translation, name translation, and word alignment over a high-quality MT baseline.
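The abstract does not give the metric's formula, so the following is only a minimal sketch of the general idea of weighting words by importance, not the authors' metric. It computes a clipped unigram precision in which each word counts in proportion to a weight. The function name `weighted_unigram_precision`, the `weights` mapping, and the `default_weight` parameter are all hypothetical.

```python
from collections import Counter


def weighted_unigram_precision(hypothesis, reference, weights, default_weight=1.0):
    """Importance-weighted, clipped unigram precision (illustrative assumption).

    hypothesis, reference: lists of tokens.
    weights: dict mapping a token to its importance weight (hypothetical input;
             how importance is estimated is not specified in the abstract).
    """
    hyp_counts = Counter(hypothesis)
    ref_counts = Counter(reference)
    matched = 0.0
    total = 0.0
    for word, count in hyp_counts.items():
        w = weights.get(word, default_weight)
        total += w * count
        # Clip matches at the reference count, as in BLEU-style precision.
        matched += w * min(count, ref_counts[word])
    return matched / total if total > 0 else 0.0


# Usage: a mistranslated name costs more than a mistranslated function word
# when names carry higher weights.
reference = ["obama", "visited", "the", "capital"]
weights = {"obama": 3.0, "the": 0.5}
name_error = weighted_unigram_precision(["obamma", "visited", "the", "capital"], reference, weights)
function_word_error = weighted_unigram_precision(["obama", "visited", "a", "capital"], reference, weights)
```

Under these assumed weights, the hypothesis with the wrong name scores lower than the one with the wrong article, which is the behavior the abstract motivates for informative words.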

reference text

Y. Al-Onaizan and K. Knight. 2002. Translating Named Entities Using Monolingual and Bilingual Resources. In Proceeding ACL’02, pages 400–408.
N. Aswani and R. Gaizauskas. 2005. A Hybrid Approach to Align Sentences and Words in English-Hindi Parallel Corpora. In Proceeding ACL’05 Workshop on Building and Using Parallel Texts, pages 57–64.
Bogdan Babych and Anthony Hartley. 2003. Improving Machine Translation Quality with Automatic Named Entity Recognition. In Proceeding EAMT’03 workshop on MT and other Language Technology Tools, Improving MT through other Language Technology Tools: Resources and Tools for Building MT, pages 1–8.
O. Bojar and D. Wu. 2012. Towards a Predicate-Argument evaluation for MT. In Proceeding of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 30–38, July.
Marine Carpuat and Dekai Wu. 2007a. How Phrase Sense Disambiguation outperforms Word Sense Disambiguation for Statistical Machine Translation. In Proceeding TMI’07, pages 43–52.
Marine Carpuat and Dekai Wu. 2007b. Improving Statistical Machine Translation using Word Sense Disambiguation. In Proceeding EMNLP-CoNLL’07, pages 61–72.
Taylor Cassidy, Heng Ji, Hongbo Deng, Jing Zheng, and Jiawei Han. 2012. Analysis and Refinement of Cross-lingual Entity Linking. In Proceeding CLEF’12, pages 1–12.
Stanley F. Chen and Joshua Goodman. 1996. An Empirical Study of Smoothing Techniques for Language Modeling. In Proceeding ACL’96, pages 310–318.
David Chiang. 2005. A Hierarchical Phrase-based Model for Statistical Machine Translation. In Proceeding ACL’05, pages 263–270.
F. Dayne and K. Shahram. 2007. A Sequence Alignment Model Based on the Averaged Perceptron. In Proceeding EMNLP-CoNLL’07, pages 238–247.
C. Dyer, S. Muresan, and P. Resnik. 2008. Generalizing Word Lattice Translation. In Proceeding ACL-HLT’08, pages 1012–1020.
D. Feng, Y. Lv, and M. Zhou. 2004. A New Approach for English-Chinese Named Entity Alignment. In Proceeding PACLIC’04, pages 372–379.
R. Florian, H. Jing, N. Kambhatla, and I. Zitouni. 2006. Factorizing Complex Models: A Case Study in Mention Detection. In Proceeding COLING-ACL’06, pages 473–480.
P. Fung and L. Y. Yee. 1998. An IR Approach for Translating New Words from Nonparallel and Comparable Texts. In Proceeding COLING-ACL’98, pages 414–420.
D. Hakkani-Tur, H. Ji, and R. Grishman. 2007. Using Information Extraction to Improve Cross-lingual Document Retrieval. In Proceeding RANLP Workshop on Multi-source, Multilingual Information Extraction and Summarization, pages 17–23.
A. Hassan, H. Fahmy, and H. Hassan. 2007. Improving Named Entity Translation by Exploiting Comparable and Parallel Corpora. In Proceeding RANLP’07, pages 1–6.
U. Hermjakob, K. Knight, and H. Daume III. 2008. Name Translation in Statistical Machine Translation: Learning When to Transliterate. In Proceeding ACL’08, pages 389–397.
F. Huang, S. Vogel, and A. Waibel. 2004. Improving Named Entity Translation Combining Phonetic and Semantic Similarities. In Proceeding HLT/NAACL’04, pages 281–288.
H. Ji and R. Grishman. 2006. Analysis and Repair of Name Tagger Errors. In Proceeding COLING-ACL’06, pages 420–427.
H. Ji, R. Grishman, D. Freitag, M. Blume, J. Wang, S. Khadivi, R. Zens, and H. Ney. 2009. Name Extraction and Translation for Distillation. Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation.
H. Ji. 2009. Mining Name Translations from Comparable Corpora by Creating Bilingual Information Networks. In Proceeding ACL-IJCNLP’09 workshop on Building and Using Comparable Corpora, pages 34–37.
B. Jones, J. Andreas, D. Bauer, K. M. Hermann, and K. Knight. 2012. Semantics-Based Machine Translation with Hyperedge Replacement Grammars. In Proceeding COLING’12, pages 1359–1376.
K. Knight and J. Graehl. 1998. Machine Transliteration. In Computational Linguistics, volume 24, pages 599–612, Cambridge, MA, USA, December. MIT Press.
P. Koehn, F. Josef Och, and D. Marcu. 2003. Statistical Phrase-Based Translation. In Proceeding HLT-NAACL’03, pages 127–133.
T. Kutsumi, T. Yoshimi, K. Kotani, and I. Sata. 2004. Integrated Use of Internal and External Evidence in The Alignment of Multi-Word Named Entities. In Proceeding PACLIC’04, pages 187–196.
X. Li, S. Strassel, S. Grimes, S. Ismael, X. Ma, N. Ge, A. Bies, N. Xue, and M. Maamouri. 2010. Parallel Aligned Treebank Corpora at LDC: Methodology, Annotation and Integration. In Workshop on Annotation and Exploitation of Parallel Corpora (AEPC).
Q. Li, H. Li, H. Ji, W. Wang, J. Zheng, and F. Huang. 2012. Joint Bilingual Name Tagging for Parallel Corpora. In Proceeding CIKM’12, pages 1727–1731.
D. Liu and D. Gildea. 2009. Semantic Role Features for Machine Translation. In Proceeding COLING’09, pages 716–724.
C. Lo, A. K. Tumuluru, and D. Wu. 2012. Fully Automatic Semantic MT Evaluation. In Proceeding of the Seventh Workshop on Statistical Machine Translation, pages 243–252.
M. Lu and J. Zhao. 2006. Multi-feature based Chinese-English Named Entity Extraction from Comparable Corpora. In Proceeding PACLIC’06, pages 134–141.
W. Ma and K. McKeown. 2009. Where’s the Verb? Correcting Machine Translation During Question Answering. In Proceeding ACL-IJCNLP’09, pages 333–336.
P. McNamee, J. Mayfield, D. Lawrie, D. W. Oard, and D. Doermann. 2011. Cross-Language Entity Linking. In Proceeding IJCNLP’11.
A. Meyer, M. Kosaka, S. Liao, and N. Xue. 2011. Improving MT Word Alignment Using Aligned Multi-Stage Parses. In Proceeding ACL-HLT 2011 Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 88–97.
T. T. Nguyen, A. Moschitti, and G. Riccardi. 2010. Kernel-based Reranking for Named-Entity Extraction. In Proceeding COLING’10, pages 901–909.
F. J. Och and H. Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19–51.
F. J. Och. 2003. Minimum Error Rate Training in Statistical Machine Translation. In Proceeding ACL’03, pages 160–167.
K. Papineni, S. Roukos, T. Ward, and W. Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceeding ACL’02, pages 311–318.
K. Parton and K. McKeown. 2010. MT Error Detection for Cross-Lingual Question Answering. In Proceeding COLING’10, pages 946–954.
K. Parton, K. R. McKeown, R. Coyne, M. T. Diab, R. Grishman, D. Hakkani-Tur, M. Harper, H. Ji, W. Y. Ma, A. Meyers, S. Stolbach, A. Sun, G. Tur, W. Xu, and S. Yaman. 2009. Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task. In Proceeding ACL-IJCNLP’09, pages 423–431.
K. Parton, N. Habash, K. McKeown, G. Iglesias, and A. de Gispert. 2012. Can Automatic Post-Editing Make MT More Meaningful? In Proceeding EAMT’12, pages 111–118.
R. Rapp. 1999. Automatic Identification of Word Translations from Unrelated English and German Corpora. In Proceeding ACL’99, pages 519–526.
L. Shao and H. T. Ng. 2004. Mining New Word Translations from Comparable Corpora. In Proceeding COLING’04.
M. Snover, B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul. 2006. A Study of Translation Edit Rate with Targeted Human Annotation. In Proceeding of Association for Machine Translation in the Americas, pages 223–231.
M. Snover, X. Li, W. Lin, Z. Chen, S. Tamang, M. Ge, A. Lee, Q. Li, H. Li, S. Anzaroot, and H. Ji. 2011. Cross-lingual Slot Filling from Comparable Corpora. In Proceeding ACL’11 Workshop on Building and Using Comparable Corpora, pages 110–119.
D. Talbot and T. Brants. 2008. Randomized Language Models via Perfect Hash Functions. In Proceeding of ACL/HLT’08, pages 505–513.
R. Udupa, K. Saravanan, A. Kumaran, and J. Jagarlamudi. 2009. MINT: A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora. In Proceeding EACL’09, pages 799–807.
A. Venkataraman, A. Stolcke, W. Wang, D. Vergyri, V. R. R. Gadde, and J. Zheng. 2004. An Efficient Repair Procedure For Quick Transcriptions. In Proceeding INTERSPEECH’04, pages 1961–1964.
D. Wu and P. Fung. 2009. Semantic Roles for SMT: A Hybrid Two-Pass Model. In NAACL HLT’09, pages 13–16.
R. Zens, O. Bender, S. Hasan, S. Khadivi, E. Matusov, J. Xu, Y. Zhang, and H. Ney. 2005. The RWTH Phrase-based Statistical Machine Translation System. In Proceeding IWSLT’05, pages 155–162.
J. Zheng, N. F. Ayan, W. Wang, and D. Burkett. 2009. Using Syntax in Large-Scale Audio Document Translation. In Proceeding Interspeech’09, pages 440–443.
J. Zheng. 2008. SRInterp: SRI’s Scalable Multipurpose SMT Engine. In Technical Report.
I. Zitouni and R. Florian. 2008. Mention Detection Crossing the Language Barrier. In Proceeding EMNLP’08, pages 600–609.