
27 acl-2010-An Active Learning Approach to Finding Related Terms


Source: pdf

Author: David Vickrey ; Oscar Kipersztok ; Daphne Koller

Abstract: We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.
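
The interaction described in the abstract (seed words, iterative suggestions, accept/reject feedback, several similarity sources) can be illustrated with a minimal sketch. This is not the authors' model: a simple hand-written score (mean similarity to accepted words minus mean similarity to rejected words) stands in for the learned classifier, and the `SOURCES` data, function names, and `oracle` callback are all hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical toy similarity sources (assumption, not data from the paper):
# each source maps a word to the set of words it considers similar.
SOURCES: Dict[str, Dict[str, Set[str]]] = {
    "thesaurus": {
        "crash": {"collision", "wreck", "smash"},
        "collision": {"crash", "wreck"},
        "wreck": {"crash", "collision"},
        "smash": {"crash", "hit"},
        "hit": {"smash", "song"},
        "song": {"hit"},
    },
    "context": {
        "crash": {"wreck", "accident"},
        "accident": {"crash", "wreck"},
        "wreck": {"crash", "accident"},
    },
}


def similarity(a: str, b: str) -> float:
    """Fraction of sources that list b as a neighbor of a."""
    hits = sum(1 for nbrs in SOURCES.values() if b in nbrs.get(a, set()))
    return hits / len(SOURCES)


def score(candidate: str, accepted: Set[str], rejected: Set[str]) -> float:
    """Mean similarity to accepted words minus mean similarity to rejected words."""
    pos = sum(similarity(w, candidate) for w in accepted) / len(accepted)
    neg = (
        sum(similarity(w, candidate) for w in rejected) / len(rejected)
        if rejected
        else 0.0
    )
    return pos - neg


def suggest(accepted: Set[str], rejected: Set[str]) -> Optional[str]:
    """Return the best-scoring unlabeled neighbor of any accepted word."""
    candidates: Set[str] = set()
    for nbrs_by_word in SOURCES.values():
        for w in accepted:
            candidates |= nbrs_by_word.get(w, set())
    candidates -= accepted | rejected
    if not candidates:
        return None
    # Sort first so that ties are broken alphabetically and runs are repeatable.
    return max(sorted(candidates), key=lambda c: score(c, accepted, rejected))


def run_session(
    seeds: Set[str], oracle: Callable[[str], bool], max_rounds: int = 10
) -> Tuple[Set[str], Set[str]]:
    """Suggest words one at a time; the oracle plays the user's accept/reject role."""
    accepted, rejected = set(seeds), set()
    for _ in range(max_rounds):
        word = suggest(accepted, rejected)
        if word is None:
            break
        (accepted if oracle(word) else rejected).add(word)
    return accepted, rejected
```

In this sketch, candidates are drawn only from neighbors of accepted words, so a rejected word such as "hit" also stops its own neighbors ("song") from being explored. The real system presumably uses feedback in a more principled way; this only shows the shape of the loop.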


reference text

Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. MIT Press.

Ghahramani, Z., & Heller, K. (2005). Bayesian sets. Advances in Neural Information Processing Systems (NIPS).

Hughes, T., & Ramage, D. (2007). Lexical semantic relatedness with random graph walks. EMNLP-CoNLL.

Jiang, J., & Conrath, D. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of International Conference on Research in Computational Linguistics.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of ICML.

Liu, D. C., & Nocedal, J. (1989). On the limited memory method for large scale optimization. Mathematical Programming B.

Pantel, P., Crestan, E., Borkovsky, A., Popescu, A., & Vyas, V. (2009). Web-scale distributional similarity and entity set expansion. EMNLP.

Pasca, M., & Durme, B. V. (2008). Weakly-supervised acquisition of open-domain classes and class attributes from web documents and query logs. ACL.

Roark, B., & Charniak, E. (1998). Noun-phrase co-occurrence statistics for semiautomatic semantic lexicon construction. ACL-COLING.

Snow, R., Jurafsky, D., & Ng, A. (2006). Semantic taxonomy induction from heterogenous evidence. ACL.

Vyas, V., & Pantel, P. (2009). Semi-automatic entity set refinement. NAACL/HLT.

Vyas, V., Pantel, P., & Crestan, E. (2009). Helping editors choose better seed sets for entity expansion. CIKM.

Wang, R., & Cohen, W. (2007). Language-independent set expansion of named entities using the web. Seventh IEEE International Conference on Data Mining.