
110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

Source: pdf

Author: Ni Lao ; Amarnag Subramanya ; Fernando Pereira ; William W. Cohen

Abstract: We study how to extend a large knowledge base (Freebase) by reading relational information from a large Web text corpus. Previous studies on extracting relational knowledge from text show the potential of syntactic patterns for extraction, but they do not exploit background knowledge of other relations in the knowledge base. We describe a distributed, Web-scale implementation of a path-constrained random walk model that learns syntactic-semantic inference rules for binary relations from a graph representation of the parsed text and the knowledge base. Experiments show significant accuracy improvements in binary relation prediction over methods that consider only text, or only the existing knowledge base.
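The abstract names a path-constrained random walk model but does not spell it out. The following is a minimal sketch of the general idea only, assuming uniform transitions over the edges that match each step's label. The toy graph, the edge labels (`parent_of`, `profession`), the path weight, and the function names are all hypothetical illustrations, not taken from the paper or its distributed implementation.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> edge label -> list of neighbor nodes.
# In the paper's setting, edge labels would come from both the knowledge
# base and the parsed text; here they are invented for illustration.
toy_graph = {
    "alice": {"parent_of": ["bob", "carol"]},
    "bob": {"profession": ["doctor"]},
    "carol": {"profession": ["doctor", "writer"]},
}


def path_constrained_walk(graph, source, path):
    """Distribution over end nodes of a walk from `source` that must
    follow the edge labels in `path`, one label per step."""
    dist = {source: 1.0}
    for label in path:
        next_dist = defaultdict(float)
        for node, prob in dist.items():
            neighbors = graph.get(node, {}).get(label, [])
            if not neighbors:
                continue  # mass at a node with no matching edge is dropped
            share = prob / len(neighbors)  # uniform choice among matches
            for neighbor in neighbors:
                next_dist[neighbor] += share
        dist = dict(next_dist)
    return dist


def score(graph, source, target, weighted_paths):
    """Weighted sum of path-walk probabilities of reaching `target`.
    `weighted_paths` is a list of (path, weight) pairs; in a learned
    model the weights would be fit from data, here they are made up."""
    return sum(
        weight * path_constrained_walk(graph, source, path).get(target, 0.0)
        for path, weight in weighted_paths
    )


walk = path_constrained_walk(toy_graph, "alice", ("parent_of", "profession"))
example_score = score(
    toy_graph, "alice", "doctor", [(("parent_of", "profession"), 2.0)]
)
```

Each label sequence acts as one candidate inference rule, and the probability of reaching a target node along it serves as a feature whose weight a classifier could learn.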

reference text

Eugene Agichtein and Luis Gravano. 2000. Snowball: extracting relations from large plain-text collections. In Proceedings of the fifth ACM conference on Digital libraries, DL ’00, pages 85–94, New York, NY, USA. ACM.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD ’08, pages 1247–1250, New York, NY, USA. ACM.

Aron Culotta, Andrew McCallum, and Jonathan Betz. 2006. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 296–303, New York City, USA, June. Association for Computational Linguistics.

Marie-Catherine de Marneffe and Chris Manning. 2008. Stanford dependencies. http://www.tex.ac.uk/cgi-bin/texfaq2html?label=citeURL.

Jeffrey Dean and Sanjay Ghemawat. 2008. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January.

Quang Do and Dan Roth. 2010. Constraints based taxonomic relation classification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1099–1109, Cambridge, MA, October. Association for Computational Linguistics.

Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2004. Web-scale information extraction in knowitall: (preliminary results). In Proceedings of the 13th international conference on World Wide Web, WWW ’04, pages 100–110, New York, NY, USA. ACM.

Nir Friedman, Lise Getoor, Daphne Koller, and Avi Pfeffer. 1999. Learning Probabilistic Relational Models. In IJCAI, volume 16, pages 1300–1309.

Aria Haghighi and Dan Klein. 2009. Simple coreference resolution with rich syntactic and semantic features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1152–1161, Singapore, August. Association for Computational Linguistics.

Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of COLING-92, pages 539–545. Association for Computational Linguistics, August.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Erhard Hinrichs and Dan Roth, editors, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pages 423–430. Association for Computational Linguistics, July.

Ni Lao and William Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine Learning, 81:53–67.

Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random walk inference and learning in a large scale knowledge base. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 529–539, Edinburgh, Scotland, UK., July. Association for Computational Linguistics.

Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011, Suntec, Singapore, August. Association for Computational Linguistics.

Deepak Ravichandran and Eduard Hovy. 2002. Learning surface text patterns for a question answering system. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 41–47, Philadelphia, Pennsylvania, USA, July. Association for Computational Linguistics.

Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning, 62:107–136.

Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1297–1304, Cambridge, MA. NIPS Foundation, MIT Press.

Fabian M. Suchanek, Georgiana Ifrim, and Gerhard Weikum. 2006. Combining linguistic and statistical analysis to extract relations from web documents. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’06, pages 712–717, New York, NY, USA. ACM.