
113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web


Source: pdf

Author: Dmitry Davidov ; Ari Rappoport

Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation-defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, along with information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
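The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `approximate_attribute`, the in-memory inputs, the use of the median, and the midpoint and clamping rules are all assumptions made for illustration. The sketch assumes the list of similar terms, their extracted values (in one unit), and pairwise comparisons have already been retrieved, and shows only how such data might be combined to select or approximate a value.

```python
from statistics import median


def approximate_attribute(target, values, comparisons):
    """Select or approximate a numerical attribute for `target`.

    values: dict mapping each term (the target and its similar terms) to a
        list of extracted numbers, all in the same unit. Hypothetical input.
    comparisons: list of (smaller, larger) term pairs, standing in for
        comparison information found in text. Hypothetical input.
    """
    # Select: if exact values were found for the target, use them directly.
    if values.get(target):
        return median(values[target])

    # One representative value per similar term that has any data.
    reps = {term: median(v) for term, v in values.items() if v}

    # Infer a value range from comparisons that involve the target.
    lower = [reps[a] for a, b in comparisons if b == target and a in reps]
    upper = [reps[b] for a, b in comparisons if a == target and b in reps]
    lo = max(lower) if lower else None
    hi = min(upper) if upper else None

    if lo is not None and hi is not None:
        if lo <= hi:
            # Approximate: midpoint of the inferred range (an assumption).
            return (lo + hi) / 2
        lo = hi = None  # contradictory evidence, so ignore both bounds

    if not reps:
        return None  # nothing retrieved, nothing to approximate

    # Fallback: median over similar terms, clamped to any one-sided bound.
    estimate = median(reps.values())
    if lo is not None:
        estimate = max(estimate, lo)
    if hi is not None:
        estimate = min(estimate, hi)
    return estimate


# Made-up example data (body length in cm); no value was found for "toad".
values = {
    "toad": [],
    "frog": [7.0, 9.0],
    "mouse": [8.0, 10.0],
    "rat": [20.0, 24.0],
    "rabbit": [40.0],
}
comparisons = [("frog", "toad"), ("toad", "rat")]
print(approximate_attribute("toad", values, comparisons))
```

In this made-up example the comparisons place "toad" between "frog" and "rat", so the estimate is the midpoint of their representative values. With no comparisons at all, the sketch falls back to the median over the similar terms.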


reference text

Eiji Aramaki, Takeshi Imai, Kengo Miyo and Kazuhiko Ohe. 2007. UTH: SVM-based Semantic Relation Classification using Physical Sizes. Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007).
Somnath Banerjee, Soumen Chakrabarti and Ganesh Ramakrishnan. 2009. Learning to Rank for Quantity Consensus Queries. SIGIR ’09.
Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni. 2007. Open information extraction from the Web. IJCAI ’07.
Matthew Berland and Eugene Charniak. 1999. Finding parts in very large corpora. ACL ’99.
Michael Cafarella, Alon Halevy, Yang Zhang, Daisy Zhe Wang and Eugene Wu. 2008. WebTables: Exploring the Power of Tables on the Web. VLDB ’08.
Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: mining the Web for fine-grained semantic verb relations. EMNLP ’04.
Eric Crestan and Patrick Pantel. 2010. Web-Scale Knowledge Extraction from Semi-Structured Tables. WWW ’10.
Dmitry Davidov and Ari Rappoport. 2006. Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words. ACL-Coling ’06.
Dmitry Davidov, Ari Rappoport and Moshe Koppel. 2007. Fully unsupervised discovery of concept-specific relationships by web mining. ACL ’07.
Dmitry Davidov and Ari Rappoport. 2008a. Classification of Semantic Relationships between Nominals Using Pattern Clusters. ACL ’08.
Dmitry Davidov and Ari Rappoport. 2008b. Unsupervised Discovery of Generic Relationships Using Pattern Clusters and its Evaluation by Automatically Generated SAT Analogy Questions. ACL ’08.
Roxana Girju, Adriana Badulescu, and Dan Moldovan. 2006. Automatic discovery of part-whole relations. Computational Linguistics, 32(1).
Marty Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. COLING ’92.
Veronique Moriceau. 2006. Numerical Data Integration for Cooperative Question-Answering. EACL KRAQ06 ’06.
John Prager. 2006. Open-domain question-answering. In Foundations and Trends in Information Retrieval, vol. 1, pp. 91-231.
Patrick Pantel and Marco Pennacchiotti. 2006. Espresso: leveraging generic patterns for automatically harvesting semantic relations. COLING-ACL ’06.
Deepak Ravichandran and Eduard Hovy. 2002. Learning Surface Text Patterns for a Question Answering System. ACL ’02.
Ellen Riloff and Rosie Jones. 1999. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. AAAI ’99.
Benjamin Rosenfeld and Ronen Feldman. 2007. Clustering for unsupervised relation identification. CIKM ’07.
Peter Turney. 2005. Measuring semantic similarity by latent relational analysis. IJCAI ’05.
Dominic Widdows and Beate Dorow. 2002. A graph model for unsupervised Lexical acquisition. COLING ’02.