127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

Source: pdf

Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai

Abstract: In this paper, we present a novel approach that incorporates web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to word-to-word selectional preferences by using web-scale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data: performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.

reference text

T. Abekawa and M. Okumura. 2006. Japanese dependency parsing using co-occurrence information and a combination of case elements. In Proceedings of ACL-COLING.
S. Bergsma, D. Lin, and R. Goebel. 2008. Discriminative learning of selectional preference from unlabeled text. In Proceedings of EMNLP, pages 59-68.
S. Bergsma, E. Pitler, and D. Lin. 2010. Creating robust supervised classifier via web-scale N-gram data. In Proceedings of ACL.
T. Brants and A. Franz. 2006. The Google Web 1T 5-gram Corpus Version 1.1. LDC2006T13.
H. Calvo and A. Gelbukh. 2004. Acquiring selectional preferences from untagged text for prepositional phrase attachment disambiguation. In Proceedings of VLDB.
H. Calvo and A. Gelbukh. 2006. DILUCT: An open-source Spanish dependency parser based on rules, heuristics, and selectional preferences. In Lecture Notes in Computer Science 3999, pages 164-175.
X. Carreras. 2007. Experiments with a higher-order projective dependency parser. In Proceedings of EMNLP-CoNLL, pages 957-961.
X. Carreras, M. Collins, and T. Koo. 2008. TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing. In Proceedings of CoNLL.
E. Charniak, D. Blaheta, N. Ge, K. Hall, and M. Johnson. 2000. BLLIP 1987-89 WSJ Corpus Release 1, LDC No. LDC2000T43. Linguistic Data Consortium.
W. Chen, D. Kawahara, K. Uchimoto, and Torisawa. 2009. Improving dependency parsing with subtrees from auto-parsed data. In Proceedings of EMNLP, pages 570-579.
K. W. Church and P. Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22-29.
R. L. Cilibrasi and P. M. B. Vitanyi. 2007. The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370-383.
M. Collins, A. Globerson, T. Koo, X. Carreras, and P. L. Bartlett. 2008. Exponentiated gradient algorithm for conditional random fields and max-margin Markov networks. Journal of Machine Learning Research, pages 1775-1822.
M. Collins, P. Koehn, and I. Kucerova. 2005. Clause restructuring for statistical machine translation. In Proceedings of ACL, pages 531-540.
S. Corston-Oliver, A. Aue, K. Duh, and E. Ringger. 2006. Multilingual dependency parsing using Bayes point machines. In Proceedings of NAACL.
H. Daumé III. 2007. Frustratingly easy domain adaptation. In Proceedings of ACL.
E. F. Drabek and Q. Zhou. 2000. Using co-occurrence statistics as an information source for partial parsing of Chinese. In Proceedings of Second Chinese Language Processing Workshop, ACL, pages 22-28.
Y. Goldberg and M. Elhadad. 2010. An efficient algorithm for easy-first non-directional dependency parsing. In Proceedings of NAACL, pages 742-750.
D. Graff. 2003. English Gigaword, LDC2003T05.
J. Hall, J. Nivre, and J. Nilsson. 2006. Discriminative classifier for deterministic dependency parsing. In Proceedings of ACL, pages 316-323.
M. Johnson and S. Riezler. 2000. Exploiting auxiliary distribution in stochastic unification-based grammars. In Proceedings of NAACL.
T. Koo, X. Carreras, and M. Collins. 2008. Simple semi-supervised dependency parsing. In Proceedings of ACL, pages 595-603.
F. Keller and M. Lapata. 2003. Using the web to obtain frequencies for unseen bigrams. Computational Linguistics, 29(3):459-484.
M. Lapata and F. Keller. 2005. Web-based models for natural language processing. ACM Transactions on Speech and Language Processing, 2(1), pages 1-30.
M. Lauer. 1995. Corpus statistics meet the noun compound: some empirical results. In Proceedings of ACL.
D. Lin, K. Church, H. Ji, S. Sekine, D. Yarowsky, S. Bergsma, K. Patil, E. Pitler, E. Lathbury, V. Rao, K. Dalwani, and S. Narsale. 2010. New tools for web-scale n-grams. In Proceedings of LREC.
M. P. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics.
A. F. T. Martins, D. Das, N. A. Smith, and E. P. Xing. 2008. Stacking dependency parsers. In Proceedings of EMNLP, pages 157-166.
D. McClosky, E. Charniak, and M. Johnson. 2006. Reranking and self-training for parser adaptation. In Proceedings of ACL.
D. McClosky, E. Charniak, and M. Johnson. 2010. Automatic domain adaptation for parsing. In Proceedings of NAACL-HLT.
R. McDonald and J. Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of EMNLP-CoNLL.
R. McDonald and F. Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL, pages 81-88.
R. McDonald, K. Crammer, and F. Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of ACL, pages 91-98.
P. Nakov and M. Hearst. 2005. Search engine statistics beyond the n-gram: application to noun compound bracketing. In Proceedings of CoNLL.
J. Nivre and R. McDonald. 2008. Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL, pages 950-958.
G. van Noord. 2007. Using self-trained bilexical preferences to improve disambiguation accuracy. In Proceedings of IWPT, pages 1-10.
PennBioIE. 2005. Mining the bibliome project, 2005. http://bioie.ldc.upenn.edu/.
E. Pitler, S. Bergsma, D. Lin, and K. Church. 2010. Using web-scale N-grams to improve base NP parsing performance. In Proceedings of COLING, pages 886-894.
P. Resnik. 1993. Selection and information: a class-based approach to lexical relationships. Ph.D. thesis, University of Pennsylvania.
J. Suzuki, H. Isozaki, X. Carreras, and M. Collins. 2009. An empirical study of semi-supervised structured conditional models for dependency parsing. In Proceedings of EMNLP, pages 551-560.
J. Suzuki and H. Isozaki. 2008. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In Proceedings of ACL, pages 665-673.
P. D. Turney. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4).
J. Veronis. 2005. Web: Google adjusts its counts. Jean Veronis’ blog: http://aixtal.blogsplot.com/2005/03/web-google-adjusts-its-count.html.
M. Volk. 2001. Exploiting the WWW as corpus to resolve PP attachment ambiguities. In Proceedings of the Corpus Linguistics.
Q. I. Wang, D. Lin, and D. Schuurmans. 2007. Simple training of dependency parsers via structured boosting. In Proceedings of IJCAI, pages 1756-1762.
Yamada and Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of IWPT, pages 195-206.
A. Yates, S. Schoenmackers, and O. Etzioni. 2006. Detecting parser errors using web-based semantic filters. In Proceedings of EMNLP, pages 27-34.
Y. Zhang and S. Clark. 2008. A tale of two parsers: investigating and combining graph-based and transition-based dependency parsing using beam-search. In Proceedings of EMNLP, pages 562-571.