
113 emnlp-2011-Relation Acquisition using Word Classes and Partial Patterns


Author: Stijn De Saeger ; Kentaro Torisawa ; Masaaki Tsuchida ; Jun'ichi Kazama ; Chikara Hashimoto ; Ichiro Yamada ; Jong Hoon Oh ; Istvan Varga ; Yulan Yan

Abstract: This paper proposes a semi-supervised relation acquisition method that does not rely on extraction patterns (e.g. “X causes Y” for causal relations) but instead learns a combination of indirect evidence for the target relation: semantic word classes and partial patterns. This method can extract long-tail instances of semantic relations like causality from rare and complex expressions in a large Japanese Web corpus, including, in extreme cases, patterns that occur only once in the entire corpus. Such patterns are beyond the reach of current pattern-based methods. We show that our method performs on par with state-of-the-art pattern-based methods, and maintains a reasonable level of accuracy even for instances acquired from infrequent patterns. This ability to acquire long-tail instances is crucial for risk management and innovation, where an exhaustive database of high-level semantic relations like causation is of vital importance.
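The idea of combining indirect evidence can be illustrated with a minimal sketch. It is not the authors' implementation: the word-class table, the weights, and the function names (`partial_patterns`, `features`, `score`) are all hypothetical, and treating a partial pattern as a contiguous token n-gram of the full pattern is an assumption made only for illustration. The point is that a pattern seen only once can still be scored, because its fragments and the word classes of its arguments may have been seen before.

```python
from typing import Dict, Set

# Hypothetical word-class lookup (toy data, not taken from the paper).
WORD_CLASS: Dict[str, str] = {
    "smoking": "C_habit",
    "cancer": "C_disease",
}

# Hypothetical learned weights for individual pieces of evidence.
WEIGHTS: Dict[str, float] = {
    "class:C_habit|C_disease": 1.0,
    "part:factor behind": 0.5,
}


def partial_patterns(pattern: str, max_len: int = 2) -> Set[str]:
    """Return all contiguous token n-grams (n <= max_len) of a pattern.

    Assumption: a 'partial pattern' is modeled here as a token n-gram.
    """
    tokens = pattern.split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def features(x: str, y: str, pattern: str) -> Set[str]:
    """Collect word-class and partial-pattern evidence for a candidate pair."""
    class_pair = f"{WORD_CLASS.get(x, 'UNK')}|{WORD_CLASS.get(y, 'UNK')}"
    feats = {f"class:{class_pair}"}
    feats |= {f"part:{p}" for p in partial_patterns(pattern)}
    return feats


def score(feats: Set[str], weights: Dict[str, float]) -> float:
    """Linear combination of evidence; unseen features contribute nothing."""
    return sum(weights.get(f, 0.0) for f in feats)


# A rare full pattern still receives a score from its parts.
rare = "X is a leading factor behind Y"
print(score(features("smoking", "cancer", rare), WEIGHTS))
```

In this toy setup the full pattern "X is a leading factor behind Y" never needs to appear in the weight table; the candidate is scored from the class pair and the fragment "factor behind" alone.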

reference text

Eugene Agichtein and Luis Gravano. 2000. Snowball: extracting relations from large plain-text collections. In Proc. of the fifth ACM conference on Digital libraries, pages 85–94.
Michele Banko and Oren Etzioni. 2008. The tradeoffs between open and traditional relation extraction. In Proc. of the 46th ACL-08:HLT, pages 28–36.
Matthew Berland and Eugene Charniak. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 57–64, College Park, Maryland, USA, June.
Razvan C. Bunescu and Raymond J. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT ’05), pages 724–731.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proc. of the 24th AAAI, pages 1306–1313.
Aron Culotta, Andrew McCallum, and Jonathan Betz. 2006. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL), pages 296–303.
Aron Culotta. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pages 423–429.
Stijn De Saeger, Kentaro Torisawa, Jun’ichi Kazama, Kow Kuroda, and Masaki Murata. 2009. Large Scale Relation Acquisition Using Class Dependent Patterns. In Proc. of the 9th International Conference on Data Mining (ICDM), pages 764–769.
Doug Downey, Stefan Schoenmackers, and Oren Etzioni. 2007. Sparse information extraction: Unsupervised language models to the rescue. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL2007).
Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel Weld, and Alexander Yates. 2004. Web-scale information extraction in KnowItAll. In Proc. of the 13th international conference on World Wide Web (WWW04), pages 100–110.
Zhou GuoDong, Su Jian, Zhang Jie, and Zhang Min. 2005. Exploring various knowledge in relation extraction. In Proc. of the 43rd Annual Meeting on Association for Computational Linguistics, ACL ’05, pages 419–444.
Marti Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proc. of the 14th International Conference on Computational Linguistics (COLING’92), pages 539–545.
Tuyen N. Huynh and Raymond J. Mooney. 2008. Discriminative structure and parameter learning for markov logic networks. In Proc. of the 25th ICML, pages 416–423.
Jun’ichi Kazama and Kentaro Torisawa. 2008. Inducing gazetteers for named entity recognition by large-scale clustering of dependency relations. In Proc. of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), pages 407–415.
Mamoru Komachi, Taku Kudo, Masashi Shimbo, and Yuji Matsumoto. 2008. Graph-based analysis of semantic drift in espresso-like bootstrapping algorithms. In Proc. of EMNLP’08, Honolulu, USA, pages 1011–1020.
Dekang Lin and Patrick Pantel. 2001. Dirt - discovery of inference rules from text. In Proc. of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 323–328.
Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proc. of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011.
Marius Paşca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, and Alpa Jain. 2006. Names and Similarities on the Web: Fact Extraction in the Fast Lane. In Proc. of the COLING-ACL06, pages 809–816.
Patrick Pantel and Marco Pennacchiotti. 2006a. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL-06), pages 113–120.
Patrick Pantel and Marco Pennacchiotti. 2006b. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proc. of the COLING-ACL06, pages 113–120.
J. R. Quinlan. 1990. Learning logical definitions from relations. Machine Learning, 5(3):239–266.
Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning, 26:107–136.
Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld, and Jesse Davis. 2010. Learning first-order horn clauses from web text. In Proc. of EMNLP2010, pages 1088–1098.
Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Chikara Hashimoto, and Sadao Kurohashi. 2008. TSUBAKI: An open search engine infrastructure for developing new information access. In Proc. of IJCNLP, pages 189–196.
Kentaro Torisawa, Stijn De Saeger, Jun’ichi Kazama, Asuka Sumida, Daisuke Noguchi, Yasunari Kakizawa, Masaki Murata, Kow Kuroda, and Ichiro Yamada. 2010. Organizing the web’s information explosion to discover unknown unknowns. New Generation Computing, 28(3):217–236.
Masaaki Tsuchida, Stijn De Saeger, Kentaro Torisawa, Masaki Murata, Jun’ichi Kazama, Kow Kuroda, and Hayato Ohwada. 2010. Large scale similarity-based relation expansion. In Proc. of the 4th IUCS, pages 140–147.
Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, pages 1083–1106.