94 acl-2013-Coordination Structures in Dependency Treebanks


Source: pdf

Author: Martin Popel ; David Mareček ; Jan Štěpánek ; Daniel Zeman ; Zdeněk Žabokrtský

Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences, such as a high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by offering a systematic view of the various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. We also present empirical observations on convertibility between selected styles of representation.


reference text

Itziar Aduriz et al. 2003. Construction of a Basque dependency treebank. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories.
Susana Afonso, Eckhard Bick, Renato Haber, and Diana Santos. 2002. “Floresta sintá(c)tica”: a treebank for Portuguese. In LREC, pages 1698–1703.
Nart B. Atalay, Kemal Oflazer, and Bilge Say. 2003. The annotation process in the Turkish treebank. In Proceedings of the 4th Intern. Workshop on Linguistically Interpreted Corpora (LINC).
David Bamman and Gregory Crane. 2011. The Ancient Greek and Latin dependency treebanks. In Language Technology for Cultural Heritage, Theory and Applications of Natural Language Processing, pages 79–98. Springer Berlin Heidelberg.
Igor Boguslavsky, Svetlana Grigorieva, Nikolai Grigoriev, Leonid Kreidlin, and Nadezhda Frid. 2000. Dependency treebank for Russian: Concept, tools, types of information. In Proceedings of the 18th conference on Computational linguistics-Volume 2, pages 987–991. Association for Computational Linguistics, Morristown, NJ, USA.
Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, and George Smith. 2002. The TIGER treebank. In Proceedings of the Workshop on Treebanks and Linguistic Theories, Sozopol.
Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, pages 149–164.
Montserrat Civit, Maria Antònia Martí, and Núria Bufí. 2006. Cat3LB and Cast3LB: From constituents to dependencies. In FinTAL, volume 4139 of Lecture Notes in Computer Science, pages 141–152. Springer.
Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637.
Dóra Csendes, János Csirik, Tibor Gyimóthy, and András Kocsor. 2005. The Szeged treebank. In TSD, volume 3658 of Lecture Notes in Computer Science, pages 123–131. Springer.
Mihaela Călăcean. 2008. Data-driven dependency parsing for Romanian. Master’s thesis, Uppsala University, August.
Sašo Džeroski, Tomaž Erjavec, Nina Ledinek, Petr Pajas, Zdeněk Žabokrtský, and Andreja Žele. 2006. Towards a Slovene dependency treebank. In LREC 2006, pages 1388–1391, Genova, Italy. European Language Resources Association (ELRA).
Nathan Green and Zdeněk Žabokrtský. 2012. Hybrid combination of constituency and dependency trees into an ensemble dependency parser. In Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, pages 19–26, Avignon, France. Association for Computational Linguistics.
Jan Hajič, Jarmila Panevová, Eva Hajičová, Petr Sgall, Petr Pajas, Jan Štěpánek, Jiří Havelka, Marie Mikulová, Zdeněk Žabokrtský, and Magda Ševčíková-Razímová. 2006. Prague Dependency Treebank 2.0. CD-ROM, Linguistic Data Consortium, LDC Catalog No.: LDC2006T01, Philadelphia.
Jan Hajič et al. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009), June 4-5, Boulder, Colorado, USA.
Jirka Hana and Jan Štěpánek. 2012. Prague markup language framework. In Proceedings of the Sixth Linguistic Annotation Workshop, pages 12–21, Stroudsburg, PA, USA. Association for Computational Linguistics.
Katri Haverinen, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Filip Ginter, and Tapio Salakoski. 2010. Treebanking Finnish. In Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT9), pages 79–90.
Samar Husain, Prashanth Mannem, Bharat Ambati, and Phani Gadde. 2010. The ICON-2010 tools contest on Indian language dependency parsing. In Proceedings of ICON-2010 Tools Contest on Indian Language Dependency Parsing, Kharagpur, India.
ISO 24615. 2010. Language resource management – Syntactic annotation framework (SynAF).
Sylvain Kahane. 1997. Bubble trees and syntactic representations. In Proceedings of the 5th Meeting of the Mathematics of the Language, DFKI, Saarbrücken.
Matthias T. Kromann, Line Mikkelsen, and Stine Kern Lynge. 2004. Danish dependency treebank.
Sandra Kübler, Erhard Hinrichs, Wolfgang Maier, and Eva Klett. 2009. Parsing coordinations. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), pages 406–414, Athens, Greece, March. Association for Computational Linguistics.
Vincenzo Lombardo and Leonardo Lesmo. 1998. Unit coordination and gapping in dependency theory. In Processing of Dependency-Based Grammars; proceedings of the workshop. COLING-ACL, Montreal.
Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19:313–330.
Nicolas Mazziotta. 2011. Coordination of verbal dependents in Old French: Coordination as a specified juxtaposition or apposition. In Proceedings of International Conference on Dependency Linguistics (DepLing 2011).
Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 122–131.
Igor A. Mel’čuk. 1988. Dependency Syntax: Theory and Practice. State University of New York Press.
Simonetta Montemagni et al. 2003. Building the Italian syntactic-semantic treebank. In Building and using Parsed Corpora, Language and Speech series, pages 189–210, Dordrecht. Kluwer.
Jens Nilsson, Johan Hall, and Joakim Nivre. 2005. MAMBA meets TIGER: Reconstructing a Swedish treebank from antiquity. In Proceedings of the NODALIDA Special Session on Treebanks.
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL 2007 Shared Task. EMNLP-CoNLL, June.
Martin Popel and Zdeněk Žabokrtský. 2009. Improving English-Czech Tectogrammatical MT. The Prague Bulletin of Mathematical Linguistics, (92):1–20.
Prokopis Prokopidis, Elina Desipri, Maria Koutsombogera, Harris Papageorgiou, and Stelios Piperidis. 2005. Theoretical and practical issues in the construction of a Greek dependency treebank. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT), pages 149–160.
Loganathan Ramasamy and Zdeněk Žabokrtský. 2012. Prague dependency style treebank for Tamil. In Proceedings of LREC 2012, pages 23–25, İstanbul, Turkey. European Language Resources Association.
Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Manouchehr Kouhestani, and Behrouz Minaei-Bidgoli. 2011. A syntactic valency lexicon for Persian verbs: The first steps towards Persian dependency treebank. In 5th Language & Technology Conference (LTC): Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 227–231, Poznań, Poland.
Kiril Simov and Petya Osenova. 2005. Extending the annotation of BulTreeBank: Phase 2. In The Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), pages 173–184, Barcelona, December.
Otakar Smrž, Viktor Bielický, Iveta Kouřilová, Jakub Kráčmar, Jan Hajič, and Petr Zemánek. 2008. Prague Arabic dependency treebank: A word on the million words. In Proceedings of the Workshop on Arabic and Local Languages (LREC) 2008, pages 16–23, Marrakech, Morocco. European Language Resources Association.
Leon Stassen. 2000. And-languages and with-languages. Linguistic Typology, 4(1):1–54.
Jan Štěpánek. 2006. Capturing a Sentence Structure by a Dependency Relation in an Annotated Syntactical Corpus (Tools Guaranteeing Data Consistence) (in Czech). Ph.D. thesis, Charles University in Prague, Faculty of Mathematics and Physics, Prague, Czech Republic.
Pavel Straňák and Jan Štěpánek. 2010. Representing layered and structured data in the CoNLL-ST format. In Alex Fang, Nancy Ide, and Jonathan Webster, editors, Proceedings of the Second International Conference on Global Interoperability for Language Resources, pages 143–152, Hong Kong, China. City University of Hong Kong.
Mariona Taulé, Maria Antònia Martí, and Marta Recasens. 2008. AnCora: Multilevel annotated corpora for Catalan and Spanish. In LREC. European Language Resources Association.
TEI Consortium. 2013. TEI P5: Guidelines for Electronic Text Encoding and Interchange.
Lucien Tesnière. 1959. Éléments de syntaxe structurale. Paris.
Stephen Tratz and Eduard Hovy. 2011. A fast, accurate, non-projective, semantically-enriched parser. In Proceedings of EMNLP, pages 1257–1268, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.
Leonoor van der Beek et al. 2002. Chapter 5. The Alpino dependency treebank. In Algorithms for Linguistic Processing NWO PIONIER Progress Report, Groningen, The Netherlands.
Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, and Jan Hajič. 2012. HamleDT: To parse or not to parse? In Proceedings of LREC 2012, pages 2735–2741, İstanbul, Turkey. European Language Resources Association.