acl acl2012 acl2012-208 acl2012-208-reference knowledge-graph by maker-knowledge-mining

208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Source: pdf

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

reference text

Amit Bagga and Breck Baldwin. 1998. Algorithms for scoring coreference chains. In The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference. Pierre Baldi, Søren Brunak, Yves Chauvin, Claus A. F. Andersen, and Henrik Nielsen. 2000. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16:412–424. Michele Banko and Oren Etzioni. 2008. The tradeoffs between open and traditional relation extraction. In Proceedings of ACL-08: HLT. Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of IJCAI2007. Christian Bizer, Jens Lehmann, Georgi Kobilarov, S o¨ren Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. DBpedia - a crystallization point for the web of data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, pages 154–165. David Blei, Andrew Ng, and Michael Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, January. Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250, New York, NY, USA. ACM. Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2010. Relational duality: Unsupervised extraction of semantic relations between entities on the web. In Proceedings of WWW. Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David Blei. 2009. Reading tea leaves: How humans interpret topic models. In Proceedings of NIPS. Harr Chen, Edward Benson, Tahira Naseem, and Regina Barzilay. 2011. In-domain relation discovery with meta-constraints via posterior regularization. In Proceedings of ACL. Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL ’05), pages 363–370, June. Benjamin Hachey. 2009. Towards Generic Relation Extraction. Ph.D. thesis, University of Edinburgh. Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. 2004. Discovering relations among named entities from large corpora. In ACL. Dekang Lin and Patrick Pantel. 2001 . DIRT - Discovery of Inference Rules from Text. In Proceedings of KDD. 720 J. Nivre, J. Hall, and J. Nilsson. 2004. Memory-based dependency parsing. In Proceedings of CoNLL, pages 49–56. Patrick Pantel, Rahul Bhagat, Bonaventura Coppola, Timothy Chklovski, and Eduard Hovy. 2007. ISP: Learning Inferential Selectional Preferences. In Proceedings of NAACL HLT. Lev Ratinov, Dan Roth, Doug Downey, and Mike Anderson. 2011. Local and global algorithms for disambiguation to Wikipedia. In Proceedings of ACL. Deepak Ravichandran and Eduard Hovy. 2002. Learning surface text patterns for a question answering system. In Proceedings of ACL. Joseph Reisinger and Raymond J. Mooney. 2011. Crosscutting models of lexical semantics. In Proceedings of EMNLP. Bryan Rink and Sanda Harabagiu. 2011. A generative model for unsupervised discovery of relations and argument classes from clinical texts. In Proceedings of EMNLP. Alan Ritter, Mausam, and Oren Etzioni. 2010. A La- tent Dirichlet Allocation method for Selectional Preferences. In Proceedings of ACL10. Evan Sandhaus, 2008. The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia. Diarmuid O Seaghdha. 2010. Latent variable models of selectional preference. In Proceedings of ACL 10. Idan Szpektor, Hristo Tanev, Ido Dagan, and Bonaventura Coppola. 2004. Scaling web-based acquisition of entailment relations. In Proceedings of EMNLP. Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. 2011. Structured relation discovery using generative models. In Proceedings of EMNLP. Alexander Yates and Oren Etzioni. 2009. Unsupervised methods for determining object and relation synonyms on the web. Journal ofArtificial Intelligence Research, 34:255–296.