Reconstructing an Indo-European Family Tree from Non-native English Texts (ACL 2013, paper 299)

Source: pdf

Authors: Ryo Nagata; Edward Whittaker

Abstract: Mother tongue interference is the phenomenon where linguistic systems of a mother tongue are transferred to another language. Although there has been plenty of work on mother tongue interference, very little is known about how strongly it is transferred to another language and about what relation there is across mother tongues. To address these questions, this paper explores and visualizes mother tongue interference preserved in English texts written by Indo-European language speakers. This paper further explores linguistic features that explain why certain relations are preserved in English writing, and which contribute to related tasks such as native language identification.

reference text

Jan Aarts and Sylviane Granger. 1998. Tag sequences in learner corpora: a key to interlanguage grammar and discourse, pages 132–141. Longman, New York.

Bengt Altenberg and Marie Tapper. 1998. The use of adverbial connectors in advanced Swedish learners' written English, pages 80–93. Longman, New York.

François Barbançon, Tandy Warnow, Steven N. Evans, Donald Ringe, and Luay Nakhleh. 2007. An experimental study comparing linguistic phylogenetic reconstruction methods. Statistics Technical Reports, page 732.

Vladimir Batagelj, Tomaž Pisanski, and Damijana Keržič. 1992. Automatic clustering of languages. Computational Linguistics, 18(3):339–352.

Robert S.P. Beekes. 2011. Comparative Indo-European Linguistics: An Introduction (2nd ed.). John Benjamins Publishing Company, Amsterdam.

Martin Chodorow, Michael Gamon, and Joel R. Tetreault. 2010. The utility of article and preposition error correction systems for English language learners: feedback and assessment. Language Testing, 27(3):419–436.

David Crystal. 1997. The Cambridge Encyclopedia of Language (2nd ed.). Cambridge University Press, Cambridge.

Niels Davidsen-Nielsen and Peter Harder. 2001. Speakers of Scandinavian languages: Danish, Norwegian, Swedish, pages 21–36. Cambridge University Press, Cambridge.

Alvar Ellegård. 1959. Statistical measurement of linguistic relationship. Language, 35(2):131–156.

Jessica Enright and Grzegorz Kondrak. 2011. The application of chordal graphs to inferring phylogenetic trees of languages. In Proc. of 5th International Joint Conference on Natural Language Processing, pages 8–13.

Sylviane Granger, Estelle Dagneaux, Fanny Meunier, and Magali Paquot. 2009. International Corpus of Learner English v2. Presses universitaires de Louvain, Louvain.

Russell D. Gray and Quentin D. Atkinson. 2003. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature, 426:435–438.

Jiawei Han and Micheline Kamber. 2006. Data Mining: Concepts and Techniques (2nd ed.). Morgan Kaufmann Publishers, San Francisco.

Bing-Hwang Juang and Lawrence R. Rabiner. 1985. A probabilistic distance measure for hidden Markov models. AT&T Technical Journal, 64(2):391–408.

Kenji Kita. 1999. Automatic clustering of languages based on probabilistic models. Journal of Quantitative Linguistics, 6(2):167–171.

Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proc. of International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 181–184.

Philipp Koehn. 2011. Europarl: A parallel corpus for statistical machine translation. In Proc. of 10th Machine Translation Summit, pages 79–86.

Moshe Koppel and Noam Ordan. 2011. Translationese and its dialects. In Proc. of 49th Annual Meeting of the Association for Computational Linguistics, pages 1318–1326.

Moshe Koppel, Jonathan Schler, and Kfir Zigdon. 2005. Determining an author's native language by mining a text for errors. In Proc. of 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 624–628.

Alfred L. Kroeber and Charles D. Chrétien. 1937. Quantitative classification of Indo-European languages. Language, 13(2):83–103.

Ryo Nagata, Edward Whittaker, and Vera Sheinman. 2011. Creating a manually error-tagged and shallow-parsed learner corpus. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1210–1219.

Luay Nakhleh, Tandy Warnow, Don Ringe, and Steven N. Evans. 2005. A comparison of phylogenetic reconstruction methods on an Indo-European dataset. Transactions of the Philological Society, 103(2):171–192.

Taraka Rama and Anil Kumar Singh. 2009. From bag of languages to family trees from noisy corpus. In Proc. of Recent Advances in Natural Language Processing, pages 355–359.

Anna Giacalone Ramat and Paolo Ramat. 2006. The Indo-European Languages. Routledge, New York.

William Snyder. 1996. The acquisitional role of the syntax-morphology interface: Morphological compounds and syntactic complex predicates. In Proc. of Annual Boston University Conference on Language Development, volume 2, pages 728–735.

Masatoshi Sugiura, Masumi Narita, Tomomi Ishida, Tatsuya Sakaue, Remi Murao, and Kyoko Muraki. 2007. A discriminant analysis of non-native speakers and native speakers of English. In Proc. of Corpus Linguistics Conference CL2007, pages 84–89.

Michael Swan and Bernard Smith. 2001. Learner English (2nd ed.). Cambridge University Press, Cambridge.

Hans van Halteren. 2008. Source language markers in EUROPARL translations. In Proc. of 22nd International Conference on Computational Linguistics, pages 937–944.

Sze-Meng J. Wong and Mark Dras. 2009. Contrastive analysis and native language identification. In Proc. Australasian Language Technology Workshop, pages 53–61.

Sze-Meng J. Wong, Mark Dras, and Mark Johnson. 2011. Exploiting parse structures for native language identification. In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1600–1611.

Sze-Meng J. Wong, Mark Dras, and Mark Johnson. 2012. Exploring adaptor grammars for native language identification. In Proc. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 699–709.