90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

Source: pdf

Authors: Omar F. Zaidan; Chris Callison-Burch

Abstract: Naively collecting translations by crowdsourcing the task to non-professional translators yields disfluent, low-quality results if no quality control is exercised. We demonstrate a variety of mechanisms that increase the translation quality to near professional levels. Specifically, we solicit redundant translations and edits to them, and automatically select the best output among them. We propose a set of features that model both the translations and the translators, such as country of residence, LM perplexity of the translation, edit rate from the other translations, and (optionally) calibration against professional translators. Using these features to score the collected translations, we are able to discriminate between acceptable and unacceptable translations. We recreate the NIST 2009 Urdu-to-English evaluation set with Mechanical Turk, and quantitatively show that our models are able to select translations within the range of quality that we expect from professional translators. The total cost is more than an order of magnitude lower than professional translation.
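
To make the selection idea in the abstract concrete, here is a minimal sketch of one of the listed signals, the edit rate from the other translations. It is not the authors' model: it uses plain word-level Levenshtein distance rather than TER, applies no learned feature weights, and ignores the translator-level features. The function names `word_edit_distance`, `avg_edit_rate`, and `select_best` are hypothetical and chosen only for illustration.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def avg_edit_rate(index: int, candidates: List[str]) -> float:
    """Mean edit distance from one candidate to every other candidate,
    each normalized by the length of the other candidate."""
    hyp = candidates[index].lower().split()
    rates = []
    for k, other in enumerate(candidates):
        if k == index:
            continue
        ref = other.lower().split()
        rates.append(word_edit_distance(hyp, ref) / max(len(ref), 1))
    return sum(rates) / len(rates) if rates else 0.0


def select_best(candidates: List[str]) -> str:
    """Return the candidate with the lowest average edit rate to the rest
    (an assumed stand-in for a full feature-based scoring model)."""
    scores = [avg_edit_rate(i, candidates) for i in range(len(candidates))]
    return candidates[scores.index(min(scores))]


if __name__ == "__main__":
    # Made-up redundant translations of one source sentence.
    redundant = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "cat mat the sitting is",
        "the cat sat on the mat today",
    ]
    print(select_best(redundant))
```

The intuition is that a translation that agrees with most of its siblings is less likely to be an outlier; a fuller system in the spirit of the abstract would combine this signal with others, such as LM perplexity, under tuned weights.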

reference text

Yaser Al-Onaizan, Ulrich Germann, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Daniel Marcu, and Kenji Yamada. 2002. Translation with scarce bilingual resources. Machine Translation, 17(1), March.
Vamshi Ambati and Stephan Vogel. 2010. Can crowds build parallel corpora for machine translation systems? In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data With Amazon's Mechanical Turk, pages 62–65.
Ben Bederson and Philip Resnik. 2010. Workshop on crowdsourcing and translation. http://www.cs.umd.edu/hcil/monotrans/workshop/.
Michael Bloodgood and Chris Callison-Burch. 2010. Using Mechanical Turk to build machine translation evaluation sets. In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data With Amazon's Mechanical Turk, pages 208–211.
Chris Callison-Burch and Mark Dredze. 2010. Creating speech and language data with Amazon's Mechanical Turk. In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data With Amazon's Mechanical Turk, pages 1–12.
Chris Callison-Burch. 2009. Fast, cheap, and creative: Evaluating translation quality using Amazon's Mechanical Turk. In Proceedings of EMNLP, pages 286–295.
A. P. Dawid and A. M. Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 28(1):20–28.
Alain Désilets. 2010. AMTA 2010 workshop on collaborative translation: technology, crowdsourcing, and the translator perspective. http://bit.ly/gPnqR2.
Pascale Fung and Lo Yuen Yee. 1998. An IR approach for translating new words from nonparallel, comparable texts. In Proceedings of ACL/CoLing.
Ulrich Germann. 2001. Building a statistical machine translation system from scratch: How much bang for the buck can we expect? In ACL 2001 Workshop on Data-Driven Machine Translation, Toulouse, France.
Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. 2008. Learning bilingual lexicons from monolingual corpora. In Proceedings of ACL/HLT.
Panos Ipeirotis. 2010. New demographics of Mechanical Turk. http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html.
Ann Irvine and Alexandre Klementiev. 2010. Using Mechanical Turk to annotate lexicons for less commonly used languages. In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data With Amazon's Mechanical Turk, pages 108–113.
Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Ann Irvine, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Ziyuan Wang, Jonathan Weese, and Omar Zaidan. 2010. Joshua 2.0: A toolkit for parsing-based machine translation with syntax, semirings, discriminative training and other goodies. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 133–137.
Dragos Munteanu and Daniel Marcu. 2005. Improving machine translation performance by exploiting comparable corpora. Computational Linguistics, 31(4):477–504, December.
Sonja Niessen and Hermann Ney. 2004. Statistical machine translation with scarce resources using morpho-syntactic analysis. Computational Linguistics, 30(2):181–204.
Doug Oard, David Doermann, Bonnie Dorr, Daqing He, Philip Resnik, William Byrne, Sanjeev Khudanpur, David Yarowsky, Anton Leuski, Philipp Koehn, and Kevin Knight. 2003. Desperately seeking Cebuano. In Proceedings of HLT/NAACL.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of ACL, pages 160–167.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL, pages 311–318.
Katharina Probst, Lori Levin, Erik Peterson, Alon Lavie, and Jaime Carbonell. 2002. MT for minority languages using elicitation-based learning of syntactic transfer rules. Machine Translation, 17(4).
Reinhard Rapp. 1995. Identifying word translations in non-parallel texts. In Proceedings of ACL.
Philip Resnik and Noah Smith. 2003. The web as a parallel corpus. Computational Linguistics, 29(3):349–380, September.
Philip Resnik, Olivia Buzek, Chang Hu, Yakov Kronrod, Alex Quinn, and Benjamin Bederson. 2010. Improving translation via targeted paraphrasing. In Proceedings of EMNLP, pages 127–137.
Charles Schafer and David Yarowsky. 2002. Inducing translation lexicons via diverse similarity measures and bridge languages. In Conference on Natural Language Learning-2002, pages 146–152.
Jason R. Smith, Chris Quirk, and Kristina Toutanova. 2010. Extracting parallel sentences from comparable corpora using document level alignment. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 403–411, Los Angeles, California, June. Association for Computational Linguistics.
Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas (AMTA).
Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of EMNLP, pages 254–263.
Jakob Uszkoreit, Jay M. Ponte, Ashok C. Popat, and Moshe Dubiner. 2010. Large scale parallel document mining for machine translation. In Proc. of the International Conference on Computational Linguistics (COLING).
Jacob Whitehill, Paul Ruvolo, Tingfan Wu, Jacob Bergsma, and Javier Movellan. 2009. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Proceedings of NIPS, pages 2035–2043.
Omar F. Zaidan. 2009. Z-MERT: A fully configurable open source tool for minimum error rate training of machine translation systems. The Prague Bulletin of Mathematical Linguistics, 91:79–88.