emnlp emnlp2012 emnlp2012-100 emnlp2012-100-reference knowledge-graph by maker-knowledge-mining

100 emnlp-2012-Open Language Learning for Information Extraction

Source: pdf

Author: Mausam ; Michael Schmitz ; Stephen Soderland ; Robert Bart ; Oren Etzioni

Abstract: Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, stateof-the-art Open IE systems such as REVERB and WOE share two important weaknesses (1) they extract only relations that are mediated by verbs, and (2) they ignore context, thus extracting tuples that are not asserted as factual. This paper presents OLLIE, a substantially improved Open IE system that addresses both these limitations. First, OLLIE achieves high yield by extracting relations mediated by nouns, adjectives, and more. Second, a context-analysis step increases precision by including contextual information from the sentence in the extractions. OLLIE obtains 2.7 times the area under precision-yield curve (AUC) compared to REVERB and 1.9 times the AUC of WOEparse. –

reference text

E. Agichtein and L. Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Procs. of the Fifth ACM International Conference on Digital Libraries. ARPA. 1991. Proc. 3rd Message Understanding Conf. Morgan Kaufmann. ARPA. 1998. Proc. 7th Message Understanding Conf. Morgan Kaufmann. M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. 2007. Open information extraction from the Web. In Procs. of IJCAI. S. Brin. 1998. Extracting Patterns and Relations from the World Wide Web. In WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT’98, pages 172–183, Valencia, Spain. Razvan C. Bunescu and Raymond J. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proc. of HLT/EMNLP. Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for neverending language learning. In Procs. of AAAI. Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2011. An analysis of open information extraction based on semantic role labeling. In Proceedings of the 6th International Conference on Knowledge Capture (K-CAP ’11). Paul R. Cohen. 1995. Empirical Methods for Artificial Intelligence. MIT Press. Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. 1999. Similarity-based models of word cooccurrence probabilities. Machine Learning, 34(1-3):43–69. 533 Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Language Resources and Evaluation (LREC 2006). Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. 2011. Open information extraction: the second generation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI ’11). Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of EMNLP. Raphael Hoffmann, Congle Zhang, and Daniel S. Weld. 2010. Learning 5000 relational extractors. In Proceedings ofthe 48thAnnual Meeting ofthe Association for Computational Linguistics, ACL ’ 10, pages 286– 295. Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke S. Zettlemoyer, and Daniel S. Weld. 2011. Knowledgebased weak supervision for information extraction of overlapping relations. In ACL, pages 541–550. Richard Johansson and Pierre Nugues. 2008. The effect of syntactic representation on semantic role labeling. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 08), pages 393–400. Paul Kingsbury Martha and Martha Palmer. 2002. From treebank to propbank. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 02). A. Meyers, R. Reeves, C. Macleod, R. Szekely, V. Zielinska, B. Young, and R. Grishman. 2004. Annotating Noun Argument Structure for NomBank. In Proceedings of LREC-2004, Lisbon, Portugal. Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In ACL-IJCNLP ’09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 1003–101 1. Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. 2011. Scalable knowledge harvesting with high precision and high recall. In Proceedings of the Fourth International Conference on Web Search and Web Data Mining (WSDM 2011), pages 227–236. Joakim Nivre and Jens Nilsson. 2004. Memory-based dependency parsing. In Proceedings ofthe Conference on Natural Language Learning (CoNLL-04), pages 49–56. Patrick Pantel and Marco Pennacchiotti. 2006. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of 21st International Conference on Computational Linguistics 44th Annual Meeting of the Association for Computational Linguistics (ACL’06). P. Resnik. 1996. Selectional constraints: an informationtheoretic model and its computational realization. Cognition. Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In ECML/PKDD (3), pages 148–163. Alan Ritter, Mausam, and Oren Etzioni. 2010. A latent dirichlet allocation method for selectional preferences. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10). Karin Kipper Schuler. 2006. VerbNet: A BroadCoverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania. Y. Shinyama and S. Sekine. 2006. Preemptive information extraction using unrestricted relation discovery. In Procs. of HLT/NAACL. Fabian M. Suchanek, Georgiana Ifrim, and Gerhard Weikum. 2006. Combining linguistic and statistical analysis to extract relations from web documents. In Procs. of KDD, pages 712–717. Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. 2009. Sofie: a self-organizing framework for information extraction. In Proceedings of WWW, pages 63 1–640. Gang Wang, Yong Yu, and Haiping Zhu. 2007. Pore: Positive-only relation extraction from wikipedia text. In Proceedings of 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC’07), pages 580–594. Fei Wu and Daniel S. Weld. 2010. Open information and extraction using Wikipedia. In Proceedings ofthe 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10). Shubin Zhao and Ralph Grishman. 2005. Extracting relations with integrated information using kernel methods. In Procs. of ACL. Jun Zhu, Zaiqing Nie, Xiaojiang Liu, Bo Zhang, and Ji-Rong Wen. 2009. StatSnowball: a statistical approach to extracting entity relationships. In WWW ’09: Proceedings of the 18th international conference on World Wide Web, pages 101–1 10, New York, NY, USA. ACM. 534