Native Language Detection with Tree Substitution Grammars (ACL 2012, paper 154)

Source: pdf

Authors: Benjamin Swanson; Eugene Charniak

Abstract: We investigate the potential of Tree Substitution Grammars as a source of features for native language detection, the task of inferring an author’s native language from text written in a different language. We compare two state-of-the-art methods for Tree Substitution Grammar induction and show that features from both methods outperform previous state-of-the-art results in native language detection. Furthermore, we contrast the two induction algorithms and show that the Bayesian approach produces superior classification results with a smaller feature set.

reference text

Mohit Bansal and Dan Klein. 2010. Simple, accurate parsing with an all-fragments grammar. Association for Computational Linguistics.
Phil Blunsom and Trevor Cohn. 2010. Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing. Empirical Methods in Natural Language Processing.
Rens Bod. 1991. A Computational Model of Language Performance: Data Oriented Parsing. Computational Linguistics in the Netherlands.
Trevor Cohn, Sharon Goldwater, and Phil Blunsom. 2009. Inducing Compact but Accurate Tree-Substitution Grammars. In Proceedings of NAACL.
Trevor Cohn and Phil Blunsom. 2010. Blocked inference in Bayesian tree substitution grammars. Association for Computational Linguistics.
Michael Collins and Nigel Duffy. 2001. Convolution Kernels for Natural Language. Advances in Neural Information Processing Systems.
Joshua Goodman. 2003. Efficient parsing of DOP with PCFG-reductions. In Bod et al., chapter 8.
S. Granger, E. Dagneaux, and F. Meunier. 2002. International Corpus of Learner English (ICLE).
Sangkyum Kim, Hyungsul Kim, Tim Weninger, and Jiawei Han. 2010. Authorship classification: a syntactic tree mining approach. Proceedings of the ACM SIGKDD Workshop on Useful Patterns.
Moshe Koppel, Jonathan Schler, and Kfir Zigdon. 2005. Determining an author’s native language by mining a text for errors. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Alessandro Moschitti, Daniele Pighin, and Roberto Basili. 2008. Tree Kernels for Semantic Role Labeling. Computational Linguistics.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. Association for Computational Linguistics.
Matt Post and Daniel Gildea. 2009. Bayesian Learning of a Tree Substitution Grammar. Association for Computational Linguistics.
Matt Post. 2011. Judging Grammaticality with Tree Substitution Grammar Derivations. Association for Computational Linguistics.
Sindhu Raghavan, Adriana Kovashka, and Raymond Mooney. 2010. Authorship attribution using probabilistic context-free grammars. Association for Computational Linguistics.
Federico Sangati and Willem Zuidema. 2011. Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.
Jun Suzuki and Hideki Isozaki. 2006. Sequence and tree kernels with statistical feature mining. Advances in Neural Information Processing Systems.
Sze-Meng Jojo Wong and Mark Dras. 2011. Exploiting Parse Structures for Native Language Identification. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.
Sze-Meng Jojo Wong and Mark Dras. 2011. Topic Modeling for Native Language Identification. Proceedings of the Australasian Language Technology Association Workshop.
Elif Yamangil and Stuart M. Shieber. 2010. Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression. Association for Computational Linguistics.