130 acl-2010-Hard Constraints for Grammatical Function Labelling

Source: pdf

Author: Wolfgang Seeker ; Ines Rehbein ; Jonas Kuhn ; Josef Van Genabith

Abstract: For languages with (semi-)free word order (such as German), labelling grammatical functions on top of phrase-structural constituent analyses is crucial for making them interpretable. Unfortunately, most statistical classifiers consider only local information for function labelling and fail to capture important restrictions on the distribution of core argument functions such as subject, object, etc., namely that there is at most one subject (etc.) per clause. We augment a statistical classifier with an integer linear program imposing hard linguistic constraints on the solution space output by the classifier, capturing global distributional restrictions. We show that this improves labelling quality, in particular for argument grammatical functions, in an intrinsic evaluation, and, importantly, grammar coverage for treebank-based (Lexical-Functional) grammar acquisition and parsing, in an extrinsic evaluation.
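The constraint named in the abstract (at most one subject, object, etc. per clause) can be illustrated with a small sketch. This is not the authors' implementation: it replaces the integer linear program with exhaustive enumeration, which finds the same optimum for a single small clause, and the label names (SB, OA, DA, MO), the function names, and the probabilities are all assumptions made up for the example.

```python
import itertools
import math

# Assumed label inventory for illustration: these core argument
# functions may occur at most once per clause.
UNIQUE_LABELS = {"SB", "OA", "DA"}


def label_locally(scores):
    """Pick the most probable label for each constituent independently."""
    return [max(s, key=s.get) for s in scores]


def label_clause(scores):
    """Return the highest-scoring joint labelling of one clause that uses
    each label in UNIQUE_LABELS at most once.

    scores: one dict per constituent, mapping label -> probability (> 0).
    Exhaustive search stands in for an ILP solver here.
    """
    best, best_logp = None, -math.inf
    for assignment in itertools.product(*(s.keys() for s in scores)):
        # Hard constraint: no core argument label may be repeated.
        if any(assignment.count(lab) > 1 for lab in UNIQUE_LABELS):
            continue
        logp = sum(math.log(s[lab]) for s, lab in zip(scores, assignment))
        if logp > best_logp:
            best, best_logp = list(assignment), logp
    return best


# Made-up classifier output for a clause with three constituents.
clause = [
    {"SB": 0.6, "OA": 0.3, "MO": 0.1},
    {"SB": 0.5, "OA": 0.4, "MO": 0.1},
    {"MO": 0.8, "OA": 0.1, "SB": 0.1},
]

print(label_locally(clause))  # two subjects: violates the constraint
print(label_clause(clause))   # best labelling with at most one subject
```

With these invented scores, the purely local decision assigns the subject label twice, while the constrained search falls back to the second-best label for one of the two constituents. Enumeration grows exponentially with the number of constituents, which is one reason to hand the same objective and constraints to an ILP solver instead.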

reference text

Steven J. Benson and Jorge J. More. 2001. A limited memory variable metric method in subspaces and bound constrained optimization problems. Technical report, Argonne National Laboratory.
Adam L. Berger, Vincent J.D. Pietra, and Stephen A.D. Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):71.
Don Blaheta and Eugene Charniak. 2000. Assigning function tags to parsed text. In Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, pages 234–240, Seattle, Washington. Morgan Kaufmann Publishers Inc.
Thorsten Brants, Wojciech Skut, and Brigitte Krenn. 1997. Tagging grammatical functions. In Proceedings of EMNLP, volume 97, pages 64–74.
Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, and George Smith. 2002. The TIGER treebank. In Proceedings of the Workshop on Treebanks and Linguistic Theories, pages 24–41.
Joan Bresnan. 2001. Lexical-Functional Syntax. Blackwell Publishers.
Miriam Butt, Helge Dyvik, Tracy Holloway King, Hiroshi Masuichi, and Christian Rohrer. 2002. The parallel grammar project. In COLING-02 on Grammar Engineering and Evaluation - Volume 15, page 7. Association for Computational Linguistics.
Aoife Cahill, Martin Forst, Mairead McCarthy, Ruth O'Donovan, Christian Rohrer, Josef van Genabith, and Andy Way. 2003. Treebank-based multilingual unification-grammar development. In Proceedings of the Workshop on Ideas and Strategies for Multilingual Grammar Development at the 15th ESSLLI, pages 17–24.
Aoife Cahill, Michael Burke, Ruth O'Donovan, Josef van Genabith, and Andy Way. 2004. Long-distance dependency resolution in automatically acquired wide-coverage PCFG-based LFG approximations. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04, pages 319–es.
Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith, and Andy Way. 2008. Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation. Computational Linguistics, 34(1):81–124, March.
Aoife Cahill. 2004. Parsing with Automatically Acquired, Wide-Coverage, Robust, Probabilistic LFG Approximations. Ph.D. thesis, Dublin City University.
Grzegorz Chrupała and Josef Van Genabith. 2006. Using machine-learning to assign function labels to parser output for Spanish. In Proceedings of the COLING/ACL main conference poster session, pages 136–143, Sydney. Association for Computational Linguistics.
Stephen Clark and Judith Hockenmaier. 2002. Evaluating a wide-coverage CCG parser. In Proceedings of the LREC 2002, pages 60–66.
James Clarke and Mirella Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, 31:399–429.
Richard Crouch, Ronald M. Kaplan, Tracy Holloway King, and Stefan Riezler. 2002. A comparison of evaluation metrics for a broad-coverage stochastic parser. In Proceedings of LREC 2002 Workshop, pages 67–74, Las Palmas, Canary Islands, Spain.
Peter Eisenberg. 2006. Grundriss der deutschen Grammatik: Das Wort. J.B. Metzler, Stuttgart, 3rd edition.
Martin Forst, Núria Bertomeu, Berthold Crysmann, Frederik Fouvry, Silvia Hansen-Shirra, and Valia Kordoni. 2004. Towards a dependency-based gold standard for German parsers: The TiGer Dependency Bank. In Proceedings of the COLING Workshop on Linguistically Interpreted Corpora (LINC '04), Geneva, Switzerland.
Martin Forst. 2007. Filling Statistics with Linguistics: Property Design for the Disambiguation of German LFG Parses. In Proceedings of ACL 2007. Association for Computational Linguistics.
Jun'Ichi Kazama and Jun'Ichi Tsujii. 2005. Maximum entropy models with inequality constraints: A case study on text categorization. Machine Learning, 60(1):159–194.
Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of ACL 2003, pages 423–430, Morristown, NJ, USA. Association for Computational Linguistics.
Manfred Klenner. 2005. Extracting Predicate Structures from Parse Trees. In Proceedings of the RANLP 2005.
Manfred Klenner. 2007. Shallow dependency labeling. In Proceedings of the ACL 2007 Demo and Poster Sessions, pages 201–204, Prague. Association for Computational Linguistics.
Terry Koo and Michael Collins. 2005. Hidden-variable models for discriminative reranking. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05, pages 507–514, Morristown, NJ, USA. Association for Computational Linguistics.
Sandra Kübler. 2005. How Do Treebank Annotation Schemes Influence Parsing Results? Or How Not to Compare Apples And Oranges. In Proceedings of RANLP 2005, Borovets, Bulgaria.
David M. Magerman. 1995. Statistical decision-tree models for parsing. In Proceedings of the 33rd annual meeting on Association for Computational Linguistics, pages 276–283, Morristown, NJ, USA. Association for Computational Linguistics.
André F. T. Martins, Noah A. Smith, and Eric P. Xing. 2009. Concise integer linear programming formulations for dependency parsing. In Proceedings of ACL 2009.
Ryan McDonald and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL, volume 6.
Yusuke Miyao, Takashi Ninomiya, and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the Conference on Recent Advances in Natural Language Processing RANLP 2003, volume 2.
Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135, January.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL '06, pages 433–440, Morristown, NJ, USA. Association for Computational Linguistics.
Vasin Punyakanok, Wen-Tau Yih, Dan Roth, and Dav Zimak. 2004. Semantic role labeling via integer linear programming inference. In Proceedings of the 20th international conference on Computational Linguistics - COLING '04, Morristown, NJ, USA. Association for Computational Linguistics.
Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2008. The Importance of Syntactic Parsing and Inference in Semantic Role Labeling. Computational Linguistics, 34(2):257–287, June.
Adwait Ratnaparkhi. 1998. Maximum Entropy Models for Natural Language Ambiguity Resolution. Ph.D. thesis, University of Pennsylvania.
Ines Rehbein and Josef van Genabith. 2009. Automatic Acquisition of LFG Resources for German: As Good as it gets. In Miriam Butt and Tracy Holloway King, editors, Proceedings of LFG Conference 2009. CSLI Publications.
Ines Rehbein. 2009. Treebank-based grammar acquisition for German. Ph.D. thesis, Dublin City University.
Dan Roth and Wen-Tau Yih. 2004. A linear programming formulation for global inference in natural language tasks. In Proceedings of CoNLL 2004.
Anne Schiller, Simone Teufel, and Christine Stöckert. 1999. Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset). Technical Report August, Universität Stuttgart.
Anne Schiller. 1994. Dmor - user's guide. Technical report, University of Stuttgart.
Helmut Schmid. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of International Conference on New Methods in Language Processing, volume 12. Manchester, UK.
Reut Tsarfaty and Khalil Sima'an. 2008. Relational-realizational parsing. In Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08, pages 889–896, Morristown, NJ, USA. Association for Computational Linguistics.