acl acl2010 acl2010-60 acl2010-60-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gerlof Bouma
Abstract: In this paper we start to explore two-part collocation extraction association measures that do not estimate expected probabilities on the basis of the independence assumption. We propose two new measures based upon the well-known measures of mutual information and pointwise mutual information. Expected probabilities are derived from automatically trained Aggregate Markov Models. On three collocation gold standards, we find the new association measures vary in their effectiveness.
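The contrast drawn in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `pmi` and the `expected` callback are hypothetical, probabilities are plain relative frequencies, and the logarithm base (2) is an assumption. Pointwise mutual information compares the observed pair probability with an expected probability. The conventional choice for the latter is the product of the marginals (the independence assumption). Passing a different `expected` function stands in for a model-based estimate such as one from an Aggregate Markov Model, which is not implemented here.

```python
import math
from collections import Counter


def pmi(pairs, expected=None):
    """Score each distinct (x, y) pair as log2(p_observed / p_expected).

    pairs:    list of (x, y) co-occurrence tokens.
    expected: optional callable (x, y) -> expected probability.
              Hypothetical hook; defaults to the independence
              estimate p(x) * p(y).
    """
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)

    def independence(x, y):
        # Expected probability under the independence assumption.
        return (left[x] / n) * (right[y] / n)

    estimate = expected if expected is not None else independence
    return {
        (x, y): math.log2((count / n) / estimate(x, y))
        for (x, y), count in joint.items()
    }
```

For example, `pmi([("a", "b"), ("a", "b"), ("c", "d"), ("c", "d")])` scores each pair against the independence estimate, while `pmi(pairs, expected=some_model)` scores the same pairs against whatever probabilities `some_model` returns.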
References:
Timothy Baldwin. 2005. The deep lexical acquisition of English verb-particle constructions. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4):398–414.
Timothy Baldwin. 2008. A resource for evaluating the deep lexical acquisition of English verb-particle constructions. In Proceedings of the LREC 2008 Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), pages 1–2, Marrakech.
John Blitzer, Amir Globerson, and Fernando Pereira. 2005. Distributed latent variable models of lexical co-occurrences. In Tenth International Workshop on Artificial Intelligence and Statistics.
Kenneth W. Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29.
Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74.
Stefan Evert. 2007. Corpora and collocations. Extended Manuscript of Chapter 58 of A. Lüdeling and M. Kytö, 2008, Corpus Linguistics. An International Handbook, Mouton de Gruyter, Berlin.
Stefan Evert. 2008. A lexicographic evaluation of German adjective-noun collocations. In Proceedings of the LREC 2008 Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), pages 3–6, Marrakech.
Thomas Hofmann and Jan Puzicha. 1998. Statistical models for co-occurrence data. Technical report, MIT. AI Memo 1625, CBCL Memo 159.
Brigitte Krenn and Stefan Evert. 2001. Can we do better than frequency? A case study on extracting PP-verb collocations. In Proceedings of the ACL Workshop on Collocations, Toulouse.
Brigitte Krenn. 2008. Description of evaluation resource German PP-verb data. In Proceedings of the LREC 2008 Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), pages 7–10, Marrakech.
Chris Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
Pavel Pecina. 2008. A machine learning approach to multiword expression extraction. In Proceedings of the LREC 2008 Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), pages 54–57, Marrakech.
Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. 1999. Inducing a semantically annotated lexicon via EM-based clustering. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD.
Lawrence Saul and Fernando Pereira. 1997. Aggregate and mixed-order Markov models for statistical language processing. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, pages 81–89.
Noah A. Smith and Jason Eisner. 2004. Annealing techniques for unsupervised statistical language learning. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.