
334 acl-2011-Which Noun Phrases Denote Which Concepts?

Source: pdf

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: Resolving polysemy and synonymy is required for high-quality information extraction. We present ConceptResolver, a component for the Never-Ending Language Learner (NELL) (Carlson et al., 2010) that handles both phenomena by identifying the latent concepts that noun phrases refer to. ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data. Domain knowledge (the ontology) guides concept creation by defining a set of possible semantic types for concepts. Word sense induction is performed by inferring a set of semantic types for each noun phrase. Synonym detection exploits redundant information to train several domain-specific synonym classifiers in a semi-supervised fashion. When ConceptResolver is run on NELL’s knowledge base, 87% of the word senses it creates correspond to real-world concepts, and 85% of noun phrases that it suggests refer to the same concept are indeed synonyms.
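
The two stages in the abstract (one sense per inferred semantic type, then synonym merging within a type) can be illustrated with a minimal Python sketch. It assumes the semantic types for each noun phrase have already been inferred and arrive as hypothetical (noun phrase, type) pairs. The token-overlap synonym_score, the threshold value and all function names are hypothetical stand-ins: ConceptResolver's actual synonym classifiers are domain-specific and trained semi-supervised, which this sketch does not reproduce.

```python
import re
from collections import defaultdict


def induce_senses(np_type_pairs):
    """Word sense induction (sketch): one sense per (noun phrase, semantic type)
    pair, grouped by type so synonym resolution can run per domain."""
    senses = defaultdict(list)  # semantic type -> noun phrases with a sense of that type
    for noun_phrase, sem_type in sorted(set(np_type_pairs)):
        senses[sem_type].append(noun_phrase)
    return dict(senses)


def synonym_score(a, b):
    """Hypothetical stand-in for a trained synonym classifier:
    Jaccard overlap of lowercase alphanumeric tokens."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster(items, threshold):
    """Single-link clustering with union-find: items whose pairwise score
    reaches the threshold end up in the same cluster."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if synonym_score(items[i], items[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, item in enumerate(items):
        groups[find(i)].append(item)
    return list(groups.values())


def resolve_synonyms(senses_by_type, threshold=0.5):
    """Synonym resolution (sketch): merge senses only within one semantic type,
    so 'apple' the company never merges with 'apple' the fruit."""
    concepts = []
    for sem_type, noun_phrases in senses_by_type.items():
        for group in cluster(noun_phrases, threshold):
            concepts.append((sem_type, group))
    return concepts


if __name__ == "__main__":
    # Hypothetical toy input, not data from NELL.
    pairs = [("apple", "company"), ("apple", "fruit"), ("apple inc", "company"),
             ("microsoft", "company"), ("banana", "fruit")]
    print(resolve_synonyms(induce_senses(pairs)))
    # → [('company', ['apple', 'apple inc']), ('company', ['microsoft']),
    #    ('fruit', ['apple']), ('fruit', ['banana'])]
```

Restricting merges to senses of the same type is what lets one string ("apple") yield two concepts while two strings ("apple", "apple inc") collapse into one. The single-link threshold rule is an arbitrary choice made for the sketch.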

reference text

Eneko Agirre and Aitor Soroa. 2007. SemEval-2007 task 02: Evaluating word sense induction and discrimination systems. In Proceedings of the 4th International Workshop on Semantic Evaluations, pages 7–12.
Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, pages 2670–2676.
Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney. 2004. A probabilistic framework for semi-supervised clustering. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 59–68.
Indrajit Bhattacharya and Lise Getoor. 2006. A latent Dirichlet model for unsupervised entity resolution. In Proceedings of the 2006 SIAM International Conference on Data Mining, pages 47–58.
Indrajit Bhattacharya and Lise Getoor. 2007. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data, 1(1).
Avrim Blum and Tom Mitchell. 1998. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 92–100.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence.
William W. Cohen, Pradeep Ravikumar, and Stephen E. Fienberg. 2003. A Comparison of String Distance Metrics for Name-Matching Tasks. In Proceedings of the IJCAI-03 Workshop on Information Integration, pages 73–78, August.
Ivan P. Fellegi and Alan B. Sunter. 1969. A theory for record linkage. Journal of the American Statistical Association, 64:1183–1210.
Lise Getoor and Christopher P. Diehl. 2005. Link mining: a survey. SIGKDD Explorations Newsletter, 7:3–12.
Aria Haghighi and Dan Klein. 2010. Coreference resolution in a modular, entity-centered model. In Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 385–393, June.
Abraham Kaplan. 1955. An experimental study of ambiguity and context. Mechanical Translation, 2:39–46.
Dan Klein, Sepandar D. Kamvar, and Christopher D. Manning. 2002. From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 307–314.
Dekang Lin and Patrick Pantel. 2002. Concept discovery from text. In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, pages 1–7.
Suresh Manandhar, Ioannis P. Klapaftis, Dmitriy Dligach, and Sameer S. Pradhan. 2010. SemEval-2010 task 14: Word sense induction & disambiguation. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 63–68.
Andrew McCallum and Ben Wellner. 2004. Conditional models of identity uncertainty with application to noun coreference. In Advances in Neural Information Processing Systems 18.
Andrew McCallum, Kamal Nigam, and Lyle H. Ungar. 2000. Efficient clustering of high-dimensional data sets with application to reference matching. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 169–178.
George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38:39–41.
Alvaro Monge and Charles Elkan. 1996. The field matching problem: Algorithms and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 267–270.
Vincent Ng. 2008. Unsupervised models for coreference resolution. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 640–649.
Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of ACM SIGKDD Knowledge Discovery and Data Mining, pages 613–619.
Hoifung Poon and Pedro Domingos. 2007. Joint inference in information extraction. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence - Volume 1, pages 913–918.
Hoifung Poon and Pedro Domingos. 2008. Joint unsupervised coreference resolution with Markov logic. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 650–659.
Pradeep Ravikumar and William W. Cohen. 2004. A hierarchical graphical model for record linkage. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages 454–461.
Ariel S. Schwartz and Marti A. Hearst. 2003. A simple algorithm for identifying abbreviation definitions in biomedical text. In Proceedings of the Pacific Symposium on Biocomputing 2003, pages 451–462.
Parag Singla and Pedro Domingos. 2006. Entity resolution with Markov logic. In Proceedings of the Sixth International Conference on Data Mining, pages 572–582.
Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2006. Semantic taxonomy induction from heterogenous evidence. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 801–808, Morristown, NJ, USA.
Rion Snow, Sushant Prakash, Daniel Jurafsky, and Andrew Y. Ng. 2007. Learning to merge word senses. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1005–1014, June.
Richard C. Wang and William W. Cohen. 2007. Language-independent set expansion of named entities using the web. In Proceedings of the Seventh IEEE International Conference on Data Mining, pages 342–350.
William E. Winkler. 1999. The state of record linkage and current research problems. Technical report, Statistical Research Division, U.S. Census Bureau.
Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart Russell. 2003. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems 17, pages 505–512.
Alexander Yates and Oren Etzioni. 2007. Unsupervised resolution of objects and relations on the web. In Proceedings of the 2007 Annual Conference of the North American Chapter of the Association for Computational Linguistics.