emnlp emnlp2012 emnlp2012-84 emnlp2012-84-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Avirup Sil ; Ernest Cronin ; Penghai Nie ; Yinfei Yang ; Ana-Maria Popescu ; Alexander Yates
Abstract: Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.
Kedar Bellare and Andrew McCallum. 2007. Learning extractors from unlabeled text using relevant databases. In Sixth International Workshop on Information Integration on the Web. Kedar Bellare and Andrew McCallum. 2009. Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment. In Empirical Methods in Natural Language Processing (EMNLP-09). Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79: 151–175. John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In EMNLP. Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal super- vision. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL07). R. Bunescu and M. Pasca. 2006. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06). Ying Chen and James Martin. 2007. Towards Robust Unsupervised Personal Name Disambiguation. In EMNLP, pages 190–198. Silviu Cucerzan. 2007. Large-scale named entity disambiguation based on wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 708–716. Nilesh N. Dalvi, Ravi Kumar, Bo Pang, and Andrew Tomkins. 2009. Matching Reviews to Objects using a Language Model. In EMNLP, pages 609–618. Nilesh N. Dalvi, Ravi Kumar, and Bo Pang. 2012. Object matching in tweets with spatial models. In WSDM, pages 43–52. Hal Daum e´ III, Abhishek Kumar, and Avishek Saha. 2010. Frustratingly easy semi-supervised domain adaptation. In Proceedings of the ACL Workshop on Domain Adaptation (DANLP). D. Downey, M. Broadhead, and O. Etzioni. 2007. Lo- cating complex named entities in web text. In Procs. of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007). Anthony Fader, Stephen Soderland, and Oren Etzioni. 2009. Scaling wikipedia-based named entity disambiguation to arbitrary web text. In Proceedings of the WikiAI 09 - IJCAI Workshop: User Contributed 126 An Evolving Knowledge and Artificial Intelligence: Synergy. Xianpei Han and Jun Zhao. 2009. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In Proceeding of the 18th ACM Conference on Information and Knowledge Management (CIKM), pages 215–224. Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Furstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum1. 2011. Robust Disambiguation of Named Entities in Text. In EMNLP, pages 782–792. Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. KnowledgeBased Weak Supervision for Information Extraction of Overlapping Relations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). Fei Huang and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence labeling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti. 2009. Collective annotation of wikipedia entities in web text. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 457–466. Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical Generalization in CCG Grammar Induction for Semantic Parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). M.E. Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the SIGDOC Conference. Thomas Lin, Mausam, and Oren Etzioni. 2012. Entity linking at web scale. In Knowledge Extraction Workshop (AKBC-WEKEX), 2012. D.C. Liu and J. Nocedal. 1989. On the limited memory method for large scale optimization. Mathematical Programming B, 45(3):503–528. G.S. Mann and D. Yarowsky. 2003. Unsupervised per- sonal name disambiguation. In CoNLL. Paul McNamee, Mark Dredze, Adam Gerber, Nikesh Garera, Tim Finin, James Mayfield, Christine Piatko, Delip Rao, David Yarowsky, and Markus Dreyer. 2009. HLTCOE Approaches to Knowledge Base Population at TAC 2009. In Text Analysis Conference. Rada Mihalcea and Andras Csomai. 2007. Wikify!: Linking documents to encyclopedic knowledge. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), pages 233–242. Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL-2009), pages 1003–101 1. Patrick Pantel and Ariel Fuxman. 2011. Jigs and Lures: Associating Web Queries with Structured Entities. In ACL. L. Ratinov, D. Roth, D. Downey, and M. Anderson. 2011. Local and global algorithms for disambiguation to wikipedia. In Proc. of the Annual Meeting of the Association of Computational Linguistics (ACL). Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of the Sixteenth European Conference on Machine Learning (ECML-2010), pages 148–163. Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 2010. Database System Concepts. McGraw-Hill, sixth edition. Daniel S. Weld, Raphael Hoffmann, and Fei Wu. 2009. Using Wikipedia to Bootstrap Open Information Extraction. In ACM SIGMOD Record. Limin Yao, Sebastian Riedel, and Andrew McCallum. 2010. Collective cross-document relation extraction without labelled data. In Proceedings ofthe 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), pages 1013–1023. Yiping Zhou, Lan Nie, Omid Rouhani-Kalleh, Flavian Vasile, and Scott Gaffney. 2010. Resolving surface forms to wikipedia topics. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling), pages 1335–1343. 127