
Bridging SMT and TM with Translation Recommendation (ACL 2010, entry 56)


Authors: Yifan He, Yanjun Ma, Josef van Genabith, Andy Way

Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
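A minimal sketch of the decision the abstract describes, under stated assumptions: a training example is labeled 1 when the SMT output needs less editing than the TM hit, and SMT is recommended only when a classifier's probability for that label reaches a chosen confidence threshold. The edit metric here is a plain word-level Levenshtein rate without block shifts, a simplified stand-in for an automatic metric such as TER (Snover et al., 2006). The probabilities are taken as given rather than produced by a trained SVM, and the function names `edit_rate`, `label`, and `precision_recall` are hypothetical.

```python
from typing import Sequence, Tuple


def edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Simplified stand-in for TER (an assumption): counts insertions,
    deletions and substitutions, but not the block shifts real TER allows.
    """
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] = edit distance between the hyp words seen so far and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # drop hypothesis word h
                curr[j - 1] + 1,         # add missing reference word r
                prev[j - 1] + (h != r),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def label(smt_output: str, tm_hit: str, reference: str) -> int:
    """Return 1 if the SMT output is closer to the reference than the TM hit.

    Ties go to the TM hit (label 0).
    """
    return int(edit_rate(smt_output, reference) < edit_rate(tm_hit, reference))


def precision_recall(
    probs: Sequence[float], gold: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision/recall of recommending SMT whenever prob >= threshold.

    Each value is reported as 0.0 when its denominator is zero.
    """
    tp = fp = fn = 0
    for p, g in zip(probs, gold):
        recommend = p >= threshold
        if recommend and g == 1:
            tp += 1
        elif recommend and g == 0:
            fp += 1
        elif not recommend and g == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(label("the cat sat on a mat", "a dog sat on the rug", ref))  # 1

    # Invented scores and gold labels, for illustration only.
    probs = [0.9, 0.8, 0.6, 0.4, 0.2]
    gold = [1, 1, 0, 1, 0]
    for t in (0.5, 0.85):
        p, r = precision_recall(probs, gold, t)
        print(t, round(p, 2), round(r, 2))  # 0.5 0.67 0.67, then 0.85 1.0 0.33
```

On the invented scores, raising the threshold from 0.5 to 0.85 raises precision and lowers recall, which is the kind of trade-off the abstract says an end-user can tune. The toy numbers say nothing about the paper's reported results.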

References

John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto Sanchis, and Nicola Ueffing. 2004. Confidence estimation for machine translation. In The 20th International Conference on Computational Linguistics (Coling-2004), pages 315-321, Geneva, Switzerland.

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263-311.

Chris Callison-Burch. 2009. Fast, cheap, and creative: Evaluating translation quality using Amazon's Mechanical Turk. In The 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), pages 286-295, Singapore.

Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20(3):273-297.

R. Kneser and H. Ney. 1995. Improved backing-off for m-gram language modeling. In The 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), pages 181-184, Detroit, MI.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In The 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL/HLT-2003), pages 48-54, Edmonton, Alberta, Canada.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In The 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions (ACL-2007), pages 177-180, Prague, Czech Republic.

Vladimir Iosifovich Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707-710.

Hsuan-Tien Lin, Chih-Jen Lin, and Ruby C. Weng. 2007. A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267-276.

Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), pages 295-302, Philadelphia, PA.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In The 41st Annual Meeting on Association for Computational Linguistics (ACL-2003), pages 160-167.

John C. Platt. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, pages 61-74.

Christopher B. Quirk. 2004. Training a sentence-level machine translation confidence measure. In The Fourth International Conference on Language Resources and Evaluation (LREC-2004), pages 825-828, Lisbon, Portugal.

Richard Sikes. 2007. Fuzzy matching in theory and practice. Multilingual, 18(6):39-43.

Michel Simard and Pierre Isabelle. 2009. Phrase-based machine translation in a computer-assisted translation environment. In The Twelfth Machine Translation Summit (MT Summit XII), pages 120-127, Ottawa, Ontario, Canada.

Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In The 2006 Conference of the Association for Machine Translation in the Americas (AMTA-2006), pages 223-231, Cambridge, MA.

Lucia Specia, Nicola Cancedda, Marc Dymetman, Marco Turchi, and Nello Cristianini. 2009a. Estimating the sentence-level quality of machine translation systems. In The 13th Annual Conference of the European Association for Machine Translation (EAMT-2009), pages 28-35, Barcelona, Spain.

Lucia Specia, Craig Saunders, Marco Turchi, Zhuoran Wang, and John Shawe-Taylor. 2009b. Improving the confidence of machine translation quality estimates. In The Twelfth Machine Translation Summit (MT Summit XII), pages 136-143, Ottawa, Ontario, Canada.

Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In The Seventh International Conference on Spoken Language Processing, volume 2, pages 901-904, Denver, CO.

Nicola Ueffing and Hermann Ney. 2005. Application of word-level confidence measures in interactive statistical machine translation. In The Ninth Annual Conference of the European Association for Machine Translation (EAMT-2005), pages 262-270, Budapest, Hungary.

Nicola Ueffing, Klaus Macherey, and Hermann Ney. 2003. Confidence measures for statistical machine translation. In The Ninth Machine Translation Summit (MT Summit IX), pages 394-401, New Orleans, LA.