
6 acl-2011-A Comprehensive Dictionary of Multiword Expressions


Source: pdf

Author: Kosho Shudo ; Akira Kurahone ; Toshifumi Tanabe

Abstract: It has been widely recognized that one of the most difficult and intriguing problems in natural language processing (NLP) is how to cope with idiosyncratic multiword expressions. This paper presents an overview of the comprehensive dictionary (JDMWE) of Japanese multiword expressions. The JDMWE is characterized by the large notational, syntactic, and semantic diversity of the expressions it contains, as well as by a detailed description of their syntactic functions, structures, and flexibilities. The dictionary contains about 104,000 expressions, potentially 750,000. This paper shows that the JDMWE’s validity can be supported by comparing the dictionary with a large-scale Japanese N-gram frequency dataset, namely the LDC2009T08, generated by Google Inc. (Kudo et al. 2009).
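The comparison mentioned at the end of the abstract can be pictured with a minimal sketch. It is not the authors' procedure, which the abstract does not describe; it only shows one plausible way to measure how many dictionary expressions are attested in an N-gram frequency table. The function name `ngram_coverage`, the `min_count` threshold, the toy entries, and the space-joined token format are all assumptions made for illustration, and do not reflect the actual layout of the JDMWE or of LDC2009T08.

```python
from collections import Counter


def ngram_coverage(expressions, ngram_counts, min_count=1):
    """Return (coverage, hits): the fraction of expressions whose N-gram
    count is at least min_count, and the list of those expressions.

    Hypothetical helper: expressions are assumed to be pre-tokenized
    strings that use the same token separator as the N-gram table keys.
    """
    hits = [e for e in expressions if ngram_counts.get(e, 0) >= min_count]
    coverage = len(hits) / len(expressions) if expressions else 0.0
    return coverage, hits


# Toy data, invented for illustration only (not taken from either resource).
toy_dictionary = ["猫 の 手 も 借り たい", "目 が 高い", "存在 し ない 表現"]
toy_ngrams = Counter({
    "猫 の 手 も 借り たい": 120,
    "目 が 高い": 4500,
    "今日 は 晴れ": 9000,
})

coverage, hits = ngram_coverage(toy_dictionary, toy_ngrams)
print(f"coverage = {coverage:.2f}")
print(hits)
```

Raising `min_count` makes the check stricter by ignoring expressions that occur only a handful of times in the frequency table.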


reference text

Asahara, M. and Matsumoto, Y. 2003. IPADIC version 2.7.0 User’s Manual (in Japanese). NAIST, Information Science Division.

Baldwin, T. and Bond, F. 2003. Multiword Expressions: Some Problems for Japanese NLP. Proceedings of the 8th Annual Meeting of the Association for Natural Language Processing (Japan): 379–382.

Bannard, C. 2007. A Measure of Syntactic Flexibility for Automatically Identifying Multiword Expressions in Corpora. Proceedings of A Broader Perspective on Multiword Expressions, Workshop at the ACL 2007 Conference: 1–8.

Baptista, J., Correia, A., and Fernandes, G. 2004. Frozen Sentences of Portuguese: Formal Descriptions for NLP. Proceedings of ACL 2004 Workshop on Multiword Expressions: Integrating Processing: 72–79.

Fazly, A. and Stevenson, S. 2006. Automatically Constructing a Lexicon of Verb Phrase Idiomatic Combinations. Proceedings of the 11th Conference of the European Chapter of the ACL: 337–344.

Fellbaum, C. (ed.) 1999. WordNet. An Electronic Lexical Database, Cambridge, MA: MIT Press.

Fellbaum, C., Geyken, A., Herold, A., Koerner, F., and Neumann, G. 2006. Corpus-Based Studies of German Idioms and Light Verbs. International Journal of Lexicography, Vol. 19, No. 4: 349–360.

Gross, M. 1986. Lexicon-Grammar. The Representation of Compound Words. Proceedings of the 11th International Conference on Computational Linguistics, COLING86: 1–6.

Hashimoto, C. and Kawahara, D. 2009. Compilation of an Idiom Example Database for Supervised Idiom Identification. Language Resource and Evaluation, Vol. 43, No. 4: 355–384.

Jackendoff, R. 1997. The Architecture of Language Faculty. Cambridge, MA: MIT Press.

Koyama, Y., Yasutake, M., Yoshimura, K., and Shudo, K. 1998. Large Scale Collocation Data and Their Application to Japanese Word Processor Technology. Proceedings of the 17th International Conference on Computational Linguistics, COLING98: 694–698.

Kudo, T. and Kazawa, H. 2009. Japanese Web N-gram Version 1. Linguistic Data Consortium, Philadelphia.

Kuiper, K., McCan, H., Quinn, H., Aitchison, T., and Van der Veer, K. 2003. SAID: A Syntactically Annotated Idiom Dataset. Linguistic Data Consortium 2003T10.

Laporte, É. and Voyatzi, S. 2008. An Electronic Dictionary of French Multiword Adverbs. Proceedings of the LREC Workshop towards a Shared Task for Multiword Expressions (MWE 2008): 31–34.

Pantel, P. and Lin, D. 2001. A Statistical Corpus-Based Term Extractor. Proceedings of the 14th Biennial Conference of the Canadian Society on Computational Studies of Intelligence, Springer-Verlag: 36–46.

Sag, I. A., Baldwin, T., Bond, F., Copestake, A., and Flickinger, D. 2002. Multiword Expressions: A Pain in the Neck for NLP. Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics, CICLING2002: 1–15.

Sato, S. 2007. Compilation of a Comparative List of Basic Japanese Idioms from Five Sources (in Japanese). IPSJ SIG Notes 178: 1–6.

Shudo, K., Narahara, T., and Yoshida, S. 1980. Morphological Aspect of Japanese Language Processing. Proceedings of the 8th International Conference on Computational Linguistics, COLING80: 1–8.

Shudo, K., Tanabe, T., Takahashi, M., and Yoshimura, K. 2004. MWEs as Non-Propositional Content Indicators. Proceedings of ACL 2004 Workshop on Multiword Expressions: Integrating Processing: 31–39.

Uchiyama, K. and Ishizaki, S. 2003. A Disambiguation of Compound Verbs. Proceedings of ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment: 81–88.

Villavicencio, A. 2004. Lexical Encoding of MWEs. Proceedings of ACL 2004 Workshop on Multiword Expressions: Integrating Processing: 80–87.

Footnote 7: The time required to compile this dictionary is estimated at 24,000 working hours.

Footnote 8: A portion of the JDMWE is available at http://jefi.info/.