
229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon


Source: pdf

Author: Clifton McFate ; Kenneth Forbus

Abstract: Broad coverage lexicons for the English language have traditionally been handmade. This approach, while accurate, requires too much human labor. Furthermore, resources contain gaps in coverage, contain specific types of information, or are incompatible with other resources. We believe that the state of open-license technology is such that a comprehensive syntactic lexicon can be automatically compiled. This paper describes the creation of such a lexicon, NU-LEX, an open-license feature-based lexicon for general purpose parsing that combines WordNet, VerbNet, and Wiktionary and contains over 100,000 words. NU-LEX was integrated into a bottom-up chart parser. We ran the parser through three sets of sentences, 50 sentences total, from the Simple English Wikipedia and compared its performance to the same parser using Comlex. Both parsers performed almost equally, with NU-LEX finding all lex-items for 50% of the sentences and Comlex succeeding for 52%. Furthermore, NU-LEX’s shortcomings primarily fell into two categories, suggesting future research directions.


reference text

Allen, James. 1995. Natural Language Understanding: 2nd edition. Benjamin/Cummings Publishing Company, Inc. Redwood City, CA.

Fellbaum, Christiane. Ed. 1998. WordNet: An Electronic Database. MIT Press, Cambridge, MA.

Forbus, K., Hinrichs, T., de Kleer, J., and Usher, J. 2010. FIRE: Infrastructure for Experience-based Systems with Common Sense. AAAI Fall Symposium on Commonsense Knowledge. Menlo Park, CA. AAAI Press.

Kipper, Karin, Hoa Trang Dang, and Martha Palmer. 2000. Class-Based Construction of a Verb Lexicon. In AAAI-2000 Seventeenth National Conference on Artificial Intelligence, Austin, TX.

Kipper, Karin, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006. Extending VerbNet with Novel Verb Classes. In Fifth International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy.

Levin, Beth. 1993. English Verb Classes and Alternation: A Preliminary Investigation. The University of Chicago Press, Chicago.

Macleod, Catherine, Ralph Grishman, and Adam Meyers. 1994. Creating a Common Syntactic Dictionary of English. Presented at SNLR: International Workshop on Sharable Natural Language Resources, Nara, Japan.

Marcus, Mitchell, Beatrice Santorini, Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics. 19(2): 313-330.

Matuszek, Cynthia, John Cabral, Michael Witbrock, and John DeOliveira. 2006. An Introduction to the Syntax and Content of Cyc. In Proceedings of the 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, Stanford, CA.

McFate, Clifton. 2010. Expanding Verb Coverage in Cyc With VerbNet. In Proceedings of the ACL 2010 Student Research Workshop. Uppsala, Sweden.

Miller, George, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. 1993. Introduction to WordNet: An On-line Lexical Database. In Fellbaum, Christiane. Ed. 1998. WordNet: An Electronic Database. MIT Press, Cambridge, MA.

Sekine, Satoshi, and Ralph Grishman. 1995. A Corpus-based Probabilistic Grammar with Only Two Non-terminals. In Fourth International Workshop on Parsing Technologies. Prague, Czech Republic.

Tomai, Emmet, and Kenneth Forbus. 2009. EA NLU: Practical Language Understanding for Cognitive Modeling. In Proceedings of the 22nd International Florida Artificial Intelligence Research Society Conference, Sanibel Island, FL.