acl acl2011 acl2011-330 acl2011-330-reference knowledge-graph by maker-knowledge-mining

330 acl-2011-Using Derivation Trees for Treebank Error Detection


Source: pdf

Author: Seth Kulick ; Ann Bies ; Justin Mott

Abstract: This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.


reference text

Adriane Boyd, Markus Dickinson, and Detmar Meurers. 2007. Increasing the recall of corpus annotation error detection. In Proceedings of the Sixth Workshop on Treebanks and Linguistic Theories (TLT 2007), Bergen, Norway. Tim Buckwalter. 2004. Buckwalter Arabic morphological analyzer version 2.0. Linguistic Data Consortium LDC2004L02. David Chiang. 2003. Statistical parsing with an automatically extracted tree adjoining grammar. In Data Oriented Parsing. CSLI. Markus Dickinson and Detmar Meurers. 2003a. Detecting errors in part-of-speech annotation. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL-03), pages 107–1 14, Budapest, Hungary. Markus Dickinson and Detmar Meurers. 2003b. Detecting inconsistencies in treebanks. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003), Sweden. Treebanks and Linguistic Theories. Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics, pages 205–208, Sapporo, Japan, July. Association for Computational Linguistics. Spence Green and Christopher D. Manning. 2010. Better Arabic parsing: Baselines, evaluations, and analysis. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 394–402, Beijing, China, August. Coling 2010 Organizing Committee. A.K. Joshi and Y. Schabes. 1997. Tree-adjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, Volume 3: Beyond Words, pages 69–124. Springer, New York. Yoshihide Kato and Shigeki Matsubara. 2010. Correcting errors in a treebank based on synchronous tree substitution grammar. In Proceedings of the ACL 2010 Conference Short Papers, pages 74–79, Uppsala, Sweden, July. Association for Computational Linguistics. Seth Kulick and Ann Bies. 2010. A TAG-derived database for treebank search and parser analysis. In TAG+10: The 10th International Conference on Tree Adjoining Grammars and Related Formalisms, Yale. Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Wigdan Mekki, Sondos Krouna, and Basma Bouziri. 2008a. Arabic treebank part 1 - v4.0. Linguistic Data Consortium LDC2008E61, December 4. Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Wigdan Mekki, Sondos Krouna, and Basma Bouziri. 2008b. Arabic treebank part 3 - v3.0. Linguistic Data Consortium LDC2008E22, August 20. Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Wigdan Mekki, Sondos Krouna, and Basma Bouziri. 2009. Arabic treebank part 2- v3.0. 698 Linguistic Data Consortium LDC2008E62, 20. January