
32 emnlp-2013-Automatic Idiom Identification in Wiktionary

Source: pdf

Author: Grace Muzny ; Luke Zettlemoyer

Abstract: Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed compositionally. Experiments demonstrate that the learned classifier can provide high-quality idiom labels, more than doubling the number of idiomatic entries from 7,764 to 18,155 at precision levels of over 65%. These gains also translate to idiom detection in sentences, by simply using known word sense disambiguation algorithms to match phrases to their definitions. In a set of Wiktionary definition example sentences, the more complete set of idioms boosts detection recall by over 28 percentage points.
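The abstract's last step, matching a phrase in a sentence to one of its dictionary definitions, can be illustrated with a simplified Lesk-style word-overlap heuristic (Lesk, 1986). The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names, the stopword list, and the two example definitions are hypothetical and are not taken from Wiktionary or from the paper.

```python
import re

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or",
             "is", "it", "for", "with", "that", "by"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def best_definition(phrase, sentence, definitions):
    """Return the index of the definition that shares the most words with
    the sentence context. Words of the phrase itself are excluded from the
    context; ties go to the earliest definition."""
    context = tokenize(sentence) - tokenize(phrase)
    overlaps = [len(context & tokenize(d)) for d in definitions]
    return max(range(len(definitions)), key=lambda i: overlaps[i])


# Hypothetical definitions written for this example only.
definitions = [
    "to strike a pail or bucket with the foot",  # literal sense
    "to die; to stop living or working",         # idiomatic sense
]
sentence = "After years of working, the old tractor finally kicked the bucket and stopped."
chosen = best_definition("kick the bucket", sentence, definitions)
```

Here `chosen` is the index of the idiomatic definition, because the context word "working" appears only in that definition. If each definition carries an idiom label, the label of the selected definition can then serve as the sentence-level prediction; a practical system would likely add stemming and a more robust tie-breaking rule.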

reference text

J. Birke and A. Sarkar. 2006. A clustering approach for nearly unsupervised recognition of nonliteral language. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics.

A. Budanitsky and G. Hirst. 2006. Evaluating wordnet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13–47.

P. Cook, A. Fazly, and S. Stevenson. 2007. Pulling their weight: Exploiting syntactic forms for the automatic identification of idiomatic expressions in context. In Proceedings of the workshop on a broader perspective on multiword expressions.

P. Cook, A. Fazly, and S. Stevenson. 2008. The vnc-tokens dataset. In Proceedings of the Language Resources and Evaluation Conference Workshop Towards a Shared Task for Multiword Expressions.

M. Diab and P. Bhutada. 2009. Verb noun construction mwe token supervised classification. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications.

A. Fazly and S. Stevenson. 2006. Automatically constructing a lexicon of verb phrase idiomatic combinations. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics.

R. Fothergill and T. Baldwin. 2012. Combining resources for mwe-token classification. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation.

Y. Freund and R.E. Schapire. 1999. Large margin classification using the perceptron algorithm. Machine learning, 37(3):277–296.

M. Gedigian, J. Bryant, S. Narayanan, and B. Ciric. 2006. Catching metaphors. In Proceedings of the Third Workshop on Scalable Natural Language Understanding.

G. Katz and E. Giesbrecht. 2006. Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties.

M. Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of Special Interest Group on the Design of Communication.

L. Li and C. Sporleder. 2009. Classifier combination for contextual idiom detection without labelled data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

I. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger. 2002. Multiword expressions: A pain in the neck for nlp. In Computational Linguistics and Intelligent Text Processing. Springer.

E. Shutova, L. Sun, and A. Korhonen. 2010. Metaphor identification using verb and noun clustering. In Proceedings of the International Conference on Computational Linguistics.

E. Shutova, S. Teufel, and A. Korhonen. 2012. Statistical metaphor processing. Computational Linguistics, 39(2):301–353.

T. Zesch, C. Müller, and I. Gurevych. 2008. Extracting lexical semantic knowledge from wikipedia and wiktionary. In Proceedings of the International Conference on Language Resources and Evaluation.