
ACL 2013: Automatically Predicting Sentence Translation Difficulty (acl2013-64)


Authors: Abhijit Mishra; Pushpak Bhattacharyya; Michael Carl

Abstract: In this paper we introduce the Translation Difficulty Index (TDI), a measure of difficulty in text translation. We first define translation difficulty and quantify it in terms of TDI. Any measure of TDI based on direct input from translators is fraught with subjectivity and ad hoc judgment, so we rely instead on cognitive evidence from eye tracking. TDI is measured as the sum of the fixation (gaze) and saccade (rapid eye movement) times of the eye. We then establish that TDI is correlated with three properties of the input sentence, namely length (L), degree of polysemy (DP) and structural complexity (SC). We train a Support Vector Regression (SVR) system that takes these features as input and predicts TDIs for new sentences. The predictions of our framework correlate well with the empirical gold-standard data, which is a repository of ⟨L, DP, SC⟩ and TDI pairs for a set of sentences. The primary use of our work is a way of “binning” sentences to be translated into “easy”, “medium” and “hard” categories according to their predicted TDI. This binning can inform the pricing of a translation task, which is especially useful when parallel corpora for Machine Translation are built through translation crowdsourcing or outsourcing. It can also provide a way of monitoring the progress of second-language learners.
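The abstract describes a three-step pipeline: compute TDI from eye-tracking data, learn an SVR that maps ⟨L, DP, SC⟩ to TDI, and bin sentences by predicted TDI. The Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. The `GazeEvent` record, the names `tdi`, `LinearSVR` and `bin_sentence`, the hyperparameters and the bin cut-offs are all hypothetical. TDI is taken literally as the sum of fixation and saccade durations, without any normalization the paper itself may apply. The regressor is a hand-rolled linear epsilon-insensitive SVR trained by subgradient descent, standing in for whatever SVR package and kernel the authors actually used. Extracting L, DP and SC from raw sentences is not shown; the feature vectors are assumed to be precomputed.

```python
"""Minimal sketch of the TDI pipeline described in the abstract.

Assumptions (not from the paper): the GazeEvent record format, the hand-rolled
linear SVR solver and its hyperparameters, and the easy/medium/hard cut-offs.
"""
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class GazeEvent:
    kind: str           # "fixation", "saccade", or anything else (ignored)
    duration_ms: float


def tdi(events: Sequence[GazeEvent]) -> float:
    """TDI taken literally from the abstract: total fixation time plus total saccade time."""
    return sum(e.duration_ms for e in events if e.kind in ("fixation", "saccade"))


@dataclass
class LinearSVR:
    """Linear epsilon-insensitive SVR fitted by full-batch subgradient descent.

    Minimizes (lam / 2) * ||w||^2 + mean(max(0, |w.z + b - y| - epsilon)),
    where z is the standardized feature vector (e.g. L, DP, SC).
    """
    epsilon: float = 0.1
    lam: float = 1e-3
    lr: float = 0.05
    epochs: int = 3000

    def fit(self, X: List[List[float]], y: List[float]) -> "LinearSVR":
        cols = list(zip(*X))
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
        Z = [self._scale(x) for x in X]
        n, d = len(Z), len(Z[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw = [self.lam * wj for wj in self.w]  # gradient of the L2 term
            gb = 0.0
            for z, target in zip(Z, y):
                r = self._raw(z) - target
                if abs(r) <= self.epsilon:
                    continue  # inside the epsilon tube: zero loss, zero subgradient
                s = 1.0 if r > 0 else -1.0
                for j in range(d):
                    gw[j] += s * z[j] / n
                gb += s / n
            self.w = [wj - self.lr * g for wj, g in zip(self.w, gw)]
            self.b -= self.lr * gb
        return self

    def _scale(self, x: Sequence[float]) -> List[float]:
        return [(xj - m) / s for xj, m, s in zip(x, self.mu, self.sd)]

    def _raw(self, z: Sequence[float]) -> float:
        return sum(wj * zj for wj, zj in zip(self.w, z)) + self.b

    def predict(self, x: Sequence[float]) -> float:
        return self._raw(self._scale(x))


def bin_sentence(score: float, easy_max: float, medium_max: float) -> str:
    """Map a (predicted) TDI to a bin; the cut-offs are caller-chosen, hypothetical values."""
    if score <= easy_max:
        return "easy"
    if score <= medium_max:
        return "medium"
    return "hard"


if __name__ == "__main__":
    # Made-up (L, DP, SC) vectors and TDI targets, for illustration only.
    X = [[5, 8, 4], [10, 15, 9], [20, 31, 22], [30, 44, 35]]
    y = [1.0, 2.0, 4.0, 6.0]
    model = LinearSVR().fit(X, y)
    for features in ([8, 12, 6], [25, 40, 30]):
        score = model.predict(features)
        print(features, round(score, 2), bin_sentence(score, easy_max=2.0, medium_max=4.0))
```

Standardizing the features before training keeps a single learning rate workable even though L, DP and SC live on different scales. With real data, an established SVR library and cross-validated hyperparameters would be the more reliable choice than this hand-rolled solver.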

reference text

Campbell, S., and Hale, S. 1999. What makes a text difficult to translate? Refereed Proceedings of the 23rd Annual ALAA Congress.

Carl, M. 2012. Translog-II: A Program for Recording User Activity Data for Empirical Reading and Writing Research. In Proceedings of the Eighth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).

Carl, M. 2012. The CRITT TPR-DB 1.0: A Database for Empirical Human Translation Process Research. AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP-2012).

Chall, J. S., and Dale, E. 1995. Readability Revisited: The New Dale-Chall Readability Formula. Cambridge, Mass.: Brookline Books.

Dragsted, B. 2010. Co-ordination of reading and writing processes in translation. In Shreve, G., and Angelone, E. (eds.), Translation and Cognition. Cognitive Science Society.

Fry, E. 1977. Fry's readability graph: Clarification, validity, and extension to level 17. Journal of Reading, 21(3), 242-252.

Hornof, A. J., and Halverson, T. 2002. Cleaning up systematic error in eye-tracking data by using required fixation locations. Behavior Research Methods, Instruments, and Computers, 34, 592-604.

Joachims, T. 1999. Making Large-Scale SVM Learning Practical. In Schölkopf, B., Burges, C., and Smola, A. (eds.), Advances in Kernel Methods - Support Vector Learning. MIT Press.

Joachims, T. 2006. Training Linear SVMs in Linear Time. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD).

Kincaid, J. P., Fishburne, R. P., Jr., Rogers, R. L., and Chissom, B. S. 1975. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Millington, Tennessee: Naval Air Station Memphis, pp. 8-75.

Lin, D. 1996. On the structural complexity of natural language sentences. In Proceedings of the 16th International Conference on Computational Linguistics (COLING), pp. 729-733.

Mishra, A., Carl, M., and Bhattacharyya, P. 2012. A heuristic-based approach for systematic error correction of gaze data for reading. In Carl, M., Bhattacharyya, P., and Choudhary, K. K. (eds.), Proceedings of the First Workshop on Eye-tracking and Natural Language Processing, Mumbai, India. The COLING 2012 Organizing Committee.

von der Malsburg, T., Vasishth, S., and Kliegl, R. 2012. Scanpaths in reading are informative about sentence processing. In Carl, M., Bhattacharyya, P., and Choudhary, K. K. (eds.), Proceedings of the First Workshop on Eye-tracking and Natural Language Processing, Mumbai, India. The COLING 2012 Organizing Committee.