
175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

Source: pdf

Author: Seyed Abolghasem Mirroshandel ; Alexis Nasr ; Joseph Le Roux

Abstract: Treebanks are not large enough to reliably model precise lexical phenomena. This deficiency provokes attachment errors in the parsers trained on such data. We propose in this paper to compute lexical affinities, on large corpora, for specific lexico-syntactic configurations that are hard to disambiguate and introduce the new information in a parser. Experiments on the French Treebank showed a relative decrease of the error rate of 7.1% Labeled Accuracy Score, yielding the best parsing results on this treebank.

reference text

A. Abeillé, L. Clément, and F. Toussenel. 2003. Building a treebank for french. In Anne Abeillé, editor, Treebanks. Kluwer, Dordrecht.
E.H. Anguiano and M. Candito. 2011. Parse correction with specialized models for difficult attachment types. In Proceedings of EMNLP.
M. Bansal and D. Klein. 2011. Web-scale features for full-scale parsing. In Proceedings of ACL, pages 693–702.
D. Bikel. 2004. Intricacies of Collins’ parsing model. Computational Linguistics, 30(4):479–511.
B. Bohnet. 2010. Very high accuracy and fast dependency parsing is not a contradiction. In Proceedings of ACL, pages 89–97.
M. Candito and B. Crabbé. 2009. Improving generative statistical parsing with semi-supervised word clustering. In Proceedings of the 11th International Conference on Parsing Technologies, pages 138–141.
M. Candito and D. Seddah. 2010. Parsing word clusters. In Proceedings of the NAACL HLT Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 76–84.
M. Candito, B. Crabbé, P. Denis, and F. Guérin. 2009. Analyse syntaxique du français : des constituants aux dépendances. In Proceedings of Traitement Automatique des Langues Naturelles.
W. Chen, J. Kazama, K. Uchimoto, and K. Torisawa. 2009. Improving dependency parsing with subtrees from auto-parsed data. In Proceedings of EMNLP, pages 570–579.
K.W. Church and P. Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29.
P. Denis and B. Sagot. 2010. Exploitation d’une ressource lexicale pour la construction d’un étiqueteur morphosyntaxique état-de-l’art du français. In Proceedings of Traitement Automatique des Langues Naturelles.
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.
R. Hwa. 2004. Sample selection for statistical parsing. Computational Linguistics, 30(3):253–276.
T. Koo, X. Carreras, and M. Collins. 2008. Simple semisupervised dependency parsing. In Proceedings of the ACL HLT, pages 595–603.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency parsing. Synthesis Lectures on Human Language Technologies, 1(1):1–127.
M.P. Marcus, M.A. Marcinkiewicz, and B. Santorini. 1993. Building a large annotated corpus of english: The penn treebank. Computational Linguistics, 19(2):313–330.
D. McClosky, E. Charniak, and M. Johnson. 2006. Effective self-training for parsing. In Proceedings of HLT NAACL, pages 152–159.
R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT-EMNLP, pages 523–530.
S.A. Mirroshandel and A. Nasr. 2011. Active learning for dependency parsing using partially annotated sentences. In Proceedings of International Conference on Parsing Technologies.
P. Nakov and M. Hearst. 2005. Using the web as an implicit training set: application to structural ambiguity resolution. In Proceedings of HLT-EMNLP, pages 835–842.
A. Nasr, F. Béchet, J-F. Rey, B. Favre, and J. Le Roux. 2011. MACAON: An NLP tool suite for processing word lattices. In Proceedings of ACL.
E. Pitler, S. Bergsma, D. Lin, and K. Church. 2010. Using web-scale N-grams to improve base NP parsing performance. In Proceedings of COLING, pages 886–894.
K. Sagae and J. Tsujii. 2007. Dependency parsing and domain adaptation with lr models and parser ensembles. In Proceedings of the CoNLL shared task session of EMNLP-CoNLL, volume 7, pages 1044–1050.
R. Sánchez-Sáez, J.A. Sánchez, and J.M. Benedí. 2009. Statistical confidence measures for probabilistic parsing. In Proceedings of RANLP, pages 388–392.
M. Steedman, M. Osborne, A. Sarkar, S. Clark, R. Hwa, J. Hockenmaier, P. Ruhlen, S. Baker, and J. Crim. 2003. Bootstrapping statistical parsers from small datasets. In Proceedings of EACL, pages 331–338.
J. Suzuki, H. Isozaki, X. Carreras, and M. Collins. 2009. An empirical study of semi-supervised structured conditional models for dependency parsing. In Proceedings of EMNLP, pages 551–560.
M. Volk. 2001. Exploiting the WWW as a corpus to resolve PP attachment ambiguities. In Proceedings of Corpus Linguistics.
G. Zhou, J. Zhao, K. Liu, and L. Cai. 2011. Exploiting web-derived selectional preference to improve statistical dependency parsing. In Proceedings of HLT-ACL, pages 1556–1565.