acl acl2013 acl2013-160 acl2013-160-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ndapandula Nakashole ; Tomasz Tylenda ; Gerhard Weikum
Abstract: Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied. However, a largely under-explored case is tapping into highly dynamic sources like news streams and social media, where new entities are continuously emerging. In this paper, we present a method for discovering and semantically typing newly emerging out-of-KB entities, thus improving the freshness and recall of ontology-based IE and improving the precision and semantic rigor of open IE. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. Our experimental evaluation, based on crowdsourced user studies, shows that our method performs significantly better than prior work.
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z.G. Ives: DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th International Semantic Web Conference (ISWC), pages 722–735, Busan, Korea, 2007.
M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, O. Etzioni: Open Information Extraction from the Web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 2670–2676, Hyderabad, India, 2007.
K. D. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor: Freebase: a Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 1247–1250, Vancouver, BC, Canada, 2008.
Lawrence D. Brown, T. Tony Cai, Anirban Dasgupta: Interval Estimation for a Binomial Proportion. Statistical Science 16: pages 101–133, 2001.
R. C. Bunescu, M. Pasca: Using Encyclopedic Knowledge for Named Entity Disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, 2006.
A. Carlson, J. Betteridge, R.C. Wang, E.R. Hruschka, T.M. Mitchell: Coupled Semi-supervised Learning for Information Extraction. In Proceedings of the Third International Conference on Web Search and Web Data Mining (WSDM), pages 101–110, New York, NY, USA, 2010.
S. Cucerzan: Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 708–716, Prague, Czech Republic, 2007.
A. Fader, S. Soderland, O. Etzioni: Identifying Relations for Open Information Extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1535–1545, Edinburgh, UK, 2011.
J.R. Finkel, T. Grenager, C. Manning: Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 363–370, Ann Arbor, Michigan, 2005.
Michael Fleischman, Eduard H. Hovy: Fine Grained Classification of Named Entities. In Proceedings of the International Conference on Computational Linguistics (COLING), 2002.
X. Han, J. Zhao: Named Entity Disambiguation by Leveraging Wikipedia Semantic Knowledge. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), pages 215–224, Hong Kong, China, 2009.
C. Havasi, R. Speer, J. Alonso: ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge. In Proceedings of the Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2007.
Sebastian Hellmann, Claus Stadler, Jens Lehmann, Sören Auer: DBpedia Live Extraction. OTM Conferences (2) 2009: 1209–1223.
J. Hoffart, M. A. Yosef, I. Bordino, H. Fuerstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, G. Weikum: Robust Disambiguation of Named Entities in Text. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 782–792, Edinburgh, UK, 2011.
J. Hoffart, F. Suchanek, K. Berberich, E. Lewis-Kelham, G. de Melo, G. Weikum: YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages. In Proceedings of the 20th International Conference on World Wide Web (WWW), pages 229–232, Hyderabad, India, 2011.
J. Hoffart, F. Suchanek, K. Berberich, G. Weikum: YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. Artificial Intelligence, 2012.
Z. Kozareva, L. Voevodski, S.-H. Teng: Class Label Enhancement via Related Instances. EMNLP 2011: 118–128.
J. R. Landis, G. G. Koch: The Measurement of Observer Agreement for Categorical Data. Biometrics, Vol. 33, pp. 159–174, 1977.
C. Lee, Y-G. Hwang, M.-G. Jang: Fine-grained Named Entity Recognition and Relation Extraction for Question Answering. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 799–800, Amsterdam, The Netherlands, 2007.
T. Lin, Mausam, O. Etzioni: No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 893–903, Jeju, South Korea, 2012.
Xiao Ling, Daniel S. Weld: Fine-Grained Entity Recognition. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2012.
D. N. Milne, I. H. Witten: Learning to Link with Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM), pages 509–518, Napa Valley, California, USA, 2008.
N. Nakashole, G. Weikum, F. Suchanek: PATTY: A Taxonomy of Relational Patterns with Semantic Types. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 1135–1145, Jeju, South Korea, 2012.
V. Nastase, M. Strube, B. Boerschinger, Cäcilia Zirn, Anas Elghafari: WikiNet: A Very Large Scale Multi-Lingual Concept Network. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), Malta, 2010.
H. T. Nguyen, T. H. Cao: Named Entity Disambiguation on an Ontology Enriched by Wikipedia. In Proceedings of the IEEE International Conference on Research, Innovation and Vision for the Future in Computing & Communication Technologies (RIVF), pages 247–254, Ho Chi Minh City, Vietnam, 2008.
Feng Niu, Ce Zhang, Christopher Ré, Jude W. Shavlik: DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference. In the VLDS Workshop, pages 25–28, 2012.
A. Rahman, Vincent Ng: Inducing Fine-Grained Semantic Classes via Hierarchical and Collective Classification. In Proceedings of the International Conference on Computational Linguistics (COLING), pages 931–939, 2010.
F. M. Suchanek, G. Kasneci, G. Weikum: Yago: a Core of Semantic Knowledge. In Proceedings of the 16th International Conference on World Wide Web (WWW), pages 697–706, Banff, Alberta, Canada, 2007.
F. M. Suchanek, M. Sozio, G. Weikum: SOFIE: A Self-organizing Framework for Information Extraction. In Proceedings of the 18th International Conference on World Wide Web (WWW), pages 631–640, Madrid, Spain, 2009.
P.P. Talukdar, F. Pereira: Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 1473–1481, 2010.
P. Venetis, A. Halevy, J. Madhavan, M. Pasca, W. Shen, F. Wu, G. Miao, C. Wu: Recovering Semantics of Tables on the Web. In Proceedings of the VLDB Endowment, PVLDB 4(9), pages 528–538, 2011.
W. Wu, H. Li, H. Wang, K. Zhu: Probase: A Probabilistic Taxonomy for Text Understanding. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 481–492, Scottsdale, AZ, USA, 2012.
M. A. Yosef, S. Bauer, J. Hoffart, M. Spaniol, G. Weikum: HYENA: Hierarchical Type Classification for Entity Names. In Proceedings of the International Conference on Computational Linguistics (COLING), to appear, 2012.