
Unsupervised Relation Extraction with General Domain Knowledge (EMNLP 2013, paper 194)


Authors: Oier Lopez de Lacalle; Mirella Lapata

Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.
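The idea of clusters being shaped by rules as well as data can be illustrated with a minimal sketch. It is not the authors' model, which combines First Order Logic rules with a topic model. It only scores a fixed cluster assignment against two hand-written pairwise rules. The `RelationTuple` fields, the `must_link` and `cannot_link` rules, and the example entities are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RelationTuple:
    """One observed syntactic relationship between two named entities (hypothetical layout)."""
    source: str
    source_type: str
    trigger: str  # the syntactic pattern, e.g. "born in"
    dest: str
    dest_type: str


def must_link(a, b):
    # Hypothetical rule: same trigger and same entity types should share a cluster.
    return (a.trigger, a.source_type, a.dest_type) == (b.trigger, b.source_type, b.dest_type)


def cannot_link(a, b):
    # Hypothetical rule: different entity-type signatures should not share a cluster.
    return (a.source_type, a.dest_type) != (b.source_type, b.dest_type)


def rule_satisfaction(tuples, assignment):
    """Fraction of rule groundings (tuple pairs a rule applies to) that the assignment satisfies."""
    satisfied = total = 0
    for i, j in combinations(range(len(tuples)), 2):
        same_cluster = assignment[i] == assignment[j]
        if must_link(tuples[i], tuples[j]):
            total += 1
            satisfied += same_cluster
        elif cannot_link(tuples[i], tuples[j]):
            total += 1
            satisfied += not same_cluster
    return satisfied / total if total else 1.0


# Made-up example tuples, not data from the paper.
examples = [
    RelationTuple("Obama", "PER", "born in", "Hawaii", "GPE"),
    RelationTuple("Turing", "PER", "born in", "London", "GPE"),
    RelationTuple("Google", "ORG", "based in", "California", "GPE"),
]

print(rule_satisfaction(examples, [0, 0, 1]))  # both "born in" tuples together
print(rule_satisfaction(examples, [0, 1, 1]))  # "born in" tuples split apart
```

A score like this could serve as a sanity check on an induced clustering. A soft-constraint model would instead trade rule violations off against the likelihood of the data.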

reference text

Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Proceedings of the 5th ACM International Conference on Digital Libraries, pages 85–94, San Antonio, Texas.
Eneko Agirre and Aitor Soroa. 2007. Semeval-2007 task 02: Evaluating word sense induction and discrimination systems. In Proceedings of the 4th International Workshop on Semantic Evaluations, pages 7–12, Prague, Czech Republic.
David Andrzejewski, Xiaojin Zhu, Mark Craven, and Ben Recht. 2011. A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1171–1177, Barcelona, Spain.
Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2670–2676, Hyderabad, India.
Amir Beck and Marc Teboulle. 2003. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175.
Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.
Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 576–583, Prague, Czech Republic.
Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics, Main Volume, pages 423–429, Barcelona, Spain.
Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74.
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 363–370, Ann Arbor, Michigan.
David Gondek and Thomas Hofmann. 2004. Nonredundant data clustering. In IEEE International Conference on Data Mining, pages 75–82. IEEE Computer Society.
Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. PNAS, 101(1):5228–5235.
Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. 2004. Discovering relations among named entities from large corpora. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pages 415–422, Barcelona, Spain.
D. Koller and N. Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Dekang Lin and Patrick Pantel. 2001. DIRT – discovery of inference rules from text. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 323–328, San Francisco, California.
Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011, Suntec, Singapore.
Joakim Nivre, Johan Hall, and Jens Nilsson. 2004. Memory-based dependency parsing. In Proceedings of the 8th Conference on Computational Natural Language Learning, pages 49–56, Boston, Massachusetts.
Eric W. Noreen. 1989. Computer-intensive Methods for Testing Hypotheses: An Introduction. John Wiley and Sons Inc.
Patrick Pantel and Marco Pennacchiotti. 2006. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 113–120, Sydney, Australia.
Hoifung Poon and Pedro Domingos. 2009. Unsupervised semantic parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1–10, Suntec, Singapore.
Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning, 62(1–2):107–136.
Ellen Riloff and Rosie Jones. 1999. Learning dictionaries for information extraction. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, pages 474–479, Stockholm, Sweden.
Stefan Schoenmackers, Jesse Davis, Oren Etzioni, and Daniel Weld. 2010. Learning first-order Horn clauses from web text. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1088–1098, Cambridge, MA, October. Association for Computational Linguistics.
Yusuke Shinyama and Satoshi Sekine. 2006. Preemptive information extraction using unrestricted relation discovery. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 304–311, New York City, USA.
Mihai Surdeanu and Massimiliano Ciaramita. 2007. Robust information extraction with perceptrons. In Proceedings of the NIST 2007 Automatic Content Extraction Workshop.
Kiri Wagstaff, Claire Cardie, C. Rogers, and S. Schrödl. 2001. Constrained k-means clustering with background knowledge. In International Conference on Machine Learning, pages 577–584. Morgan Kaufmann.
Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. 2011. Structured relation discovery using generative models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1456–1466, Edinburgh, Scotland, UK.
GuoDong Zhou, Min Zhang, DongHong Ji, and QiaoMing Zhu. 2007. Tree kernel-based relation extraction with context-sensitive structured parse tree information. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 728–736, Prague, Czech Republic.