acl acl2010 acl2010-113 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dmitry Davidov ; Ari Rappoport
Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
Reference: text
1 Introduction

Information on various numerical properties of physical objects, such as length, width and weight, is fundamental in question answering frameworks and for answering search engine queries. While in some cases manual annotation of objects with numerical properties is possible, it is a hard and labor-intensive task, and is impractical for dealing with the vast number of objects of interest. In addition to answering direct questions, the ability to make a crude comparison or estimation of object attributes is important as well. Due to the importance of relationship and attribute acquisition in NLP, numerous methods have been proposed for extracting various lexical relationships and attributes from text. However, numerical attribute extraction is substantially different in two aspects: verification and approximation. First, unlike most general lexical attributes, numerical attribute values are comparable. The ability to compare values of different objects allows us to improve attribute extraction precision by verifying consistency with the attributes of other similar objects. Extracting and examining width values for different car brands and for ‘cars’ in general, we find:

• Boundaries: Maximal car width is 2.
Second, while it is usually meaningless and impossible to approximate general lexical attribute values such as an actor’s name, numerical attributes can be estimated even if they are not explicitly mentioned in the text. In general, attribute extraction frameworks usually attempt to discover a single correct value. In contrast, in numerical attribute extraction it is possible to provide an approximation even when no explicit information is present in the text, by using the values of comparable objects for which information is provided. In this paper we present a pattern-based framework that takes advantage of the properties of similar objects to improve extraction precision and to allow approximation of requested numerical object properties.
First, given an object name, we utilize WordNet and pattern-based extraction to find a list of similar objects and their category labels. Second, we utilize a predefined set of lexical patterns to extract attribute values of these objects and the available comparison/boundary information. Finally, we analyze the obtained information and select or approximate the attribute value for the given (object, attribute) pair.
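The three-step flow described above can be sketched as follows; the lookup tables and helper names are invented stand-ins for the Web- and WordNet-based components, not the authors' implementation:

```python
# Toy sketch of the three-stage pipeline: find similar objects, extract
# values for all of them, then select or approximate a value for the
# target. Both tables below are illustrative; the real system mines
# them from the Web and WordNet.

TOY_COHYPONYMS = {"Toyota": (["Honda", "Mazda", "Fiat"], "car")}
TOY_VALUES = {("Honda", "width"): [1.7], ("Mazda", "width"): [1.8]}

def find_similar_objects(obj):
    # Stage 1: similar objects (co-hyponyms) and a class label.
    return TOY_COHYPONYMS.get(obj, ([], None))

def extract_values(objects, attribute):
    # Stage 2: pattern-based value extraction (canned values here).
    return {o: TOY_VALUES.get((o, attribute), []) for o in objects}

def select_or_approximate(obj, attribute):
    # Stage 3: explicit value if available, otherwise fall back to the
    # average over comparable objects.
    similar, _label = find_similar_objects(obj)
    values = extract_values([obj] + similar, attribute)
    if values[obj]:
        return values[obj][0], "exact"
    pool = [v for o in similar for v in values[o]]
    if not pool:
        return None, "none"
    return sum(pool) / len(pool), "approx"
```

With the toy tables above, `select_or_approximate("Toyota", "width")` finds no explicit value and falls back to the average of the similar cars' widths.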
For the WN enrichment evaluation, our framework discovered size and weight values for 300 WN physical objects, and the quality of the results was evaluated by human judges. Utilization of information about comparable objects provided a significant boost to numerical attribute extraction quality, and allowed a meaningful approximation of missing attribute values.
However, most of them do not exploit the numerical nature of the attribute data. Our research is related to a sub-domain of question answering (Prager, 2006), since one of the applications of our framework is answering questions about numerical values. Several recent studies directly target the acquisition of numerical attributes from the Web and attempt to deal with the ambiguity and noise of the retrieved attribute values. (…, 2007) utilize a small set of patterns to extract physical object sizes and use the averages of the obtained values for a noun compound classification task. They utilize a textual snippet feature and snippet quantity in order to select and rank intervals of the requested values. This approach is particularly useful when it is possible to obtain a substantial amount of the desired attribute values for the requested query. (Moriceau, 2006) proposed a rule-based system which analyzes the variation of the extracted numerical attribute values using information in the textual context of these values. While in the current research we do not utilize this type of information, incorporating numerical data extracted from semi-structured web pages could be extremely beneficial for our framework. All of the above numerical attribute extraction systems utilize only the direct information available in the discovered object-attribute co-occurrences and their contexts. Also, since the above studies utilize only explicitly available information, they were unable to approximate object values in cases where no explicit information was found.
The algorithm comprises three main stages: (1) mining for similar objects and determination of a class label; (2) mining for attribute values and comparison statements; (3) processing the results.
3.1 Similar objects and class label

To verify and estimate attribute values for the given object, we utilize similar objects (co-hyponyms) and the object’s class label (hypernym). For an object name Obj, we query the Web using a small set of pre-defined co-hyponymy patterns like “as * and/or [Obj]”. For example, when searching for the object ‘Toyota’, we execute the search engine query [“as * and Toyota”] and might retrieve a text snippet containing “…”. Note that we perform this stage only once for each object and do not need to repeat it for different attribute types.
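A minimal sketch of how co-hyponyms might be pulled from such snippets, assuming a simplified regex version of the “as * and [Obj]” pattern (the real system issues Web queries and handles many more surface forms):

```python
import re

def cohyponyms_from_snippet(snippet, obj):
    """Extract candidate co-hyponyms of `obj` from a search snippet,
    using a simplified regex version of the "as * and [Obj]" pattern."""
    # Matches e.g. "... as Honda and Toyota ..." and captures "Honda".
    pattern = re.compile(r"as\s+([A-Z][\w-]+)\s+and\s+" + re.escape(obj))
    return pattern.findall(snippet)

snippet = "Large sedans such as Honda and Toyota models dominate the market."
cohyponyms_from_snippet(snippet, "Toyota")  # -> ["Honda"]
```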
3.2 Querying for values, bounds and comparison data

Now we would like to extract the attribute values for the given object and its similar objects. We also extract bounds and comparison information in order to verify the extracted values and to approximate the missing ones. To allow extraction of attribute-specific information, we provide the system with a seed set of extraction patterns for each attribute type.
The first pattern group, Pvalues, allows extraction of attribute values from the Web. All seed patterns of this group contain a measurement unit name, the attribute name, and some additional anchoring words.
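A Pvalues-style pattern can be sketched as a regular expression; the concrete wording below is a hypothetical example, not one of the authors' actual seed patterns:

```python
import re

# A hypothetical Pvalues-style seed pattern (not the authors' actual
# one): anchoring words + attribute name + a number + a unit name,
# as in "the width of X is 1.8 meters".
VALUE_PATTERN = re.compile(
    r"the\s+(?P<attr>width|height|weight)\s+of\s+(?P<obj>[\w\s-]+?)\s+is\s+"
    r"(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>meters?|kg|grams?|m)"
)

def extract_value(text):
    """Return (object, attribute, value, unit) for the first match, or None."""
    m = VALUE_PATTERN.search(text)
    if m is None:
        return None
    return m.group("obj"), m.group("attr"), float(m.group("num")), m.group("unit")

extract_value("the width of the Toyota Corolla is 1.8 meters")
# -> ("the Toyota Corolla", "width", 1.8, "meters")
```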
As in Section 3.1, we execute search engine queries and collect a set of numerical values for each pattern. To acquire additional patterns, we re-query the Web with the obtained (object, attribute value, attribute name) triplets. We then extract new patterns from the retrieved search engine snippets and re-query the Web with the new patterns to obtain more attribute values. At the end of the value extraction stage, we convert all values to a single unit format for comparison.
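The unit-normalization step can be sketched with a small conversion table; the units covered here are illustrative:

```python
# Convert every extracted (value, unit) pair to a single base unit
# (meters for lengths, kilograms for weights) so values become
# comparable. The table covers only a few example units.
TO_BASE = {
    "mm": 0.001, "cm": 0.01, "m": 1.0, "meter": 1.0, "meters": 1.0,
    "ft": 0.3048, "in": 0.0254,                            # lengths -> meters
    "g": 0.001, "gr": 0.001, "kg": 1.0, "lb": 0.45359237,  # weights -> kg
}

def to_base_unit(value, unit):
    return value * TO_BASE[unit.lower()]

to_base_unit(180, "cm")   # ~1.8 (meters)
to_base_unit(1100, "gr")  # ~1.1 (kilograms)
```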
The second pattern group, Pboundary, allows us to find maximal and minimal values for the object category defined by the labels. The third group, Pcompare, allows us to compare objects directly even when no attribute values are mentioned. This group includes attribute equality patterns such as ‘[Object1] has the same width as [Object2]’, and attribute inequality ones such as ‘[Object1] is wider than [Object2]’. We use these pairs to build a directed graph (Widdows and Dorow, 2002; Davidov and Rappoport, 2006) in which nodes are objects (not necessarily with assigned values) and edges correspond to extracted co-appearances of objects inside the comparison patterns.
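The graph construction can be sketched as follows, assuming the comparison statements have already been reduced to ordered pairs; the paper's actual graph machinery is richer than this:

```python
from collections import defaultdict

# Sketch of the comparison graph: nodes are objects, a directed edge
# a -> b records "a's attribute value <= b's" (e.g. from "[b] is wider
# than [a]"); equality statements add edges in both directions.

def build_comparison_graph(comparisons):
    """`comparisons`: (smaller, larger) pairs or (a, b, "eq") triples."""
    graph = defaultdict(set)
    for c in comparisons:
        if len(c) == 3 and c[2] == "eq":
            a, b, _ = c
            graph[a].add(b)
            graph[b].add(a)
        else:
            smaller, larger = c
            graph[smaller].add(larger)
    return graph

g = build_comparison_graph([("Fiat", "Toyota"), ("Toyota", "Hummer"),
                            ("Corolla", "Toyota", "eq")])
```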
3.3 Processing the collected data

As a result of the information collection stage, for each object and attribute type we get:

• A set of attribute values for the requested object.
• A set of objects similar or comparable to the requested object, some of them annotated with one or many attribute values.
• Upper and lower bounds on attribute values for the given object category.
• A comparison graph connecting some of the retrieved objects by comparison edges.

Now we combine these information sources to select a single attribute value for the requested object or to approximate this value. However, if we obtain a set of values, or no values at all, we have to utilize the comparison data to select one of the retrieved values or to approximate the value when we do not have an exact answer.
The goal now is to select an attribute value for the given object. During the first stage it is possible that we directly extract from the text a set of values for the requested object. If several values remain, we choose the most frequent value, based on the number of web snippets retrieved during the value acquisition stage.
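The bounds-then-frequency selection can be sketched as follows; the candidate values, snippet counts, and bounds below are invented for illustration:

```python
from collections import Counter

# Discard candidate values outside the category bounds, then keep the
# value supported by the most snippets.

def select_value(candidates, lower, upper):
    """`candidates` maps candidate value -> number of supporting snippets."""
    in_bounds = Counter({v: n for v, n in candidates.items()
                         if lower <= v <= upper})
    if not in_bounds:
        return None  # fall through to graph-based approximation
    return in_bounds.most_common(1)[0][0]

select_value({1.8: 12, 18.0: 2, 1.7: 5}, lower=1.5, upper=2.2)  # -> 1.8
```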
When no values remain after the bounds-processing step, the object node remains unassigned after construction of the comparison graph, and we would like to estimate its value. The comparison graph allows us to set the values of all unassigned nodes, including the node of the requested object. These dummy nodes are used when we encounter a graph which ends with one or more nodes with no available numerical information.
First, we assign values for all unassigned nodes that belong to a single legal unassigned path, using a simple linear combination:

Val(A_i) = ((n + 1 − i) / (n + 1)) · Avg(A_0) + (i / (n + 1)) · Avg(A_{n+1}),  for i ∈ (1..n)

Then, for all unassigned nodes that belong to multiple legal unassigned paths, we compute the node value as above for each path separately and assign to the node the average of the computed values.
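This linear combination amounts to interpolating between the two assigned endpoints A_0 and A_{n+1} of a path whose n interior nodes are unassigned; a sketch:

```python
# Interpolate values for the n unassigned nodes A_1..A_n on a path
# whose endpoints A_0 and A_{n+1} carry averaged values, following
# Val(A_i) = (n+1-i)/(n+1) * Avg(A_0) + i/(n+1) * Avg(A_{n+1}).

def interpolate_path(avg_start, avg_end, n):
    return [(n + 1 - i) / (n + 1) * avg_start + i / (n + 1) * avg_end
            for i in range(1, n + 1)]

interpolate_path(1.0, 4.0, 2)  # ~[2.0, 3.0]
```

Nodes closer to A_0 receive values closer to Avg(A_0), and symmetrically for A_{n+1}.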
Finally, we assign the average of all extracted values within bounds to all remaining unassigned nodes. Note that if we have no comparison information and no value information for the requested object, the requested object receives the average of the extracted values of the whole set of retrieved comparable objects, and the comparison step is essentially empty.
4 Experimental Setup

We performed an automated question answering (QA) evaluation, a human-based WN enrichment evaluation, and a human-based comparison of our results to data available through Wikipedia and to the top results of leading search engines. As a baseline, only direct value extraction (Section 3.2) for the given object was used, as done in most general-purpose attribute acquisition systems. For each attribute we extracted 250 questions from the Web in the following way.
For each question and each condition, our framework returned a numerical value marked as either an exact answer or an approximation. In cases where no data was found for an approximation (no similar objects with values were found), our framework returned no answer.
4.3 WN enrichment evaluation

We manually selected 300 WN entities from about 1000 randomly selected objects below the object tree in WN, filtering out entities that clearly do not possess any of the addressed numerical attributes. There is usually little or no variability in the attribute values of such objects, and the major source of extraction errors is name ambiguity of the requested objects. WordNet physical objects, in contrast, are much less specific, and their attributes, such as size and weight, rarely have a single correct value, but usually possess an acceptable numerical range. (Due to the nature of the task, recall/f-score measures are redundant here.) For example, the majority of the selected objects, like ‘apple’, are too general to assign an exact size. The crudeness of the desired approximation depends both on potential applications and on object type.
We mixed the (Object, Attribute name, Attribute value) triplets obtained through each of the conditions, and asked human subjects to assign them to one of the following categories:

• The attribute value is reasonable for the given object.
• The value is a very crude approximation of the given object attribute.
Search engine queries frequently provide attribute values in the top snippets or in search result web pages. Many Wikipedia articles include infoboxes with well-organized attribute values. Recently, the Wolfram Alpha computational knowledge engine introduced the computation of attribute values from a given query text.
Wi: There is a Wikipedia page for the given object, and it contains an appropriate attribute value or an approximation in an infobox.
Wf: A Wolfram Alpha query for [object-name attribute-name] retrieves a correct value or a good approximation.

5 Results
Looking at %Exact+%Approx, we can see that for all datasets only 1-9% of the questions remain unanswered, while correct exact answers are found for 65%/87% of the questions for Web/TREC (%Exact and Prec(Exact) in the table). Thus approximation allows us to answer 13-24% of the requested values, which are either simply missing from the retrieved text or cannot be detected using the current pattern-based framework. Comparing the performance of FULL to DIRECT, we see that our framework not only allows an approximation when no exact answer can be found, but also significantly increases the precision of exact answers by using the comparison and boundary information. Comparing results for different question types, we see substantial performance differences between the attribute types. This is likely due to the lesser difficulty of depth questions, or to the more exact nature of available depth information compared to width or size.
5.2 WN enrichment

As shown in Table 2, for the majority of the examined WN objects the algorithm returned an approximate value, and only for 13-15% of the objects (vs. …). Note that the common pattern-based acquisition framework, represented by the DIRECT condition, could only extract attribute values for 15% of the objects, since it does not allow approximations.

Table 2: Percentage of exact and approximate values for the WordNet enrichment dataset.

Some examples of WN objects and approximate values discovered by the algorithm are: sandfish, 15gr; skull, 1100gr; pilot, 80.
5.3 Comparison with search engines and Wikipedia

Table 4 shows results for the above datasets in comparison to the proportion of correct results and approximations returned by our framework under the FULL condition (correct exact values and approximations are taken together).
6 Conclusion

We presented a novel framework which allows automated extraction and approximation of numerical attributes from the Web, even when no explicit attribute values can be found in the text for the given object. Our framework retrieves similarity, boundary and comparison information for objects similar to the desired object, and combines this information to approximate the desired attribute. While in this study we explored only a few specific numerical attributes, such as size and weight, our framework can easily be augmented to work with any other consistent and comparable attribute type. The only change required for the incorporation of a new attribute type is the development of attribute-specific Pboundary, Pvalues, and Pcompare pattern groups; the rest of the system remains unchanged.
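Concretely, adding a new attribute type would mean supplying three such pattern groups; the regexes below are hypothetical illustrations for a "length" attribute, not the authors' seed patterns:

```python
import re

# Hypothetical pattern groups for a new attribute type ("length").
LENGTH_PATTERNS = {
    "Pvalues": [re.compile(
        r"the length of (?P<obj>[\w\s-]+?) is "
        r"(?P<num>\d+(?:\.\d+)?) ?(?P<unit>km|cm|m)")],
    "Pboundary": [re.compile(
        r"the longest (?P<label>\w+) is "
        r"(?P<num>\d+(?:\.\d+)?) ?(?P<unit>km|cm|m)")],
    "Pcompare": [re.compile(
        r"(?P<obj1>[\w\s-]+?) is longer than (?P<obj2>[\w\s-]+)")],
}

m = LENGTH_PATTERNS["Pcompare"][0].search("the Nile is longer than the Amazon")
m.group("obj1"), m.group("obj2")  # -> ("the Nile", "the Amazon")
```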
In our evaluation we showed that our framework achieves good results and significantly outperforms the baseline commonly used for general lexical attribute retrieval. Hence, a promising direction would be to use our approach in combination with Wikipedia data and with additional manually created attribute-rich sources, such as Web tables, to achieve the best possible performance and coverage. We would also like to explore the incorporation of discovered approximate numerical attribute data into existing NLP tasks, such as noun compound classification and textual entailment.
wordName wordTfidf (topN-words)
[('attribute', 0.488), ('numerical', 0.241), ('objects', 0.227), ('width', 0.21), ('object', 0.193), ('qa', 0.183), ('requested', 0.177), ('enrichment', 0.159), ('wn', 0.15), ('unassigned', 0.149), ('values', 0.144), ('approximation', 0.13), ('davidov', 0.126), ('toyota', 0.114), ('bounds', 0.103), ('patterns', 0.098), ('exact', 0.095), ('height', 0.093), ('questions', 0.092), ('retrieved', 0.09), ('web', 0.088), ('attributes', 0.085), ('approximate', 0.084), ('wikipedia', 0.081), ('value', 0.079), ('extraction', 0.076), ('pcompare', 0.074), ('utilize', 0.074), ('approximations', 0.072), ('engine', 0.068), ('triplets', 0.065), ('desired', 0.063), ('answering', 0.061), ('rappoport', 0.061), ('ari', 0.06), ('snippets', 0.058), ('dmitry', 0.058), ('boundary', 0.058), ('comparison', 0.057), ('answers', 0.056), ('wolfram', 0.056), ('acquisition', 0.056), ('automated', 0.055), ('obj', 0.055), ('framework', 0.052), ('snippet', 0.051), ('subtree', 0.05), ('name', 0.049), ('frameworks', 0.049), ('px', 0.048), ('car', 0.047), ('depth', 0.046), ('wordnet', 0.046), ('discovered', 0.046), ('nodes', 0.045), ('tall', 0.045), ('equality', 0.045), ('querying', 0.042), ('physical', 0.042), ('datasets', 0.042), ('unit', 0.041), ('relationships', 0.041), ('avg', 0.04), ('dummy', 0.04), ('scenario', 0.039), ('question', 0.039), ('legal', 0.038), ('retrieve', 0.037), ('aec', 0.037), ('alpha', 0.037), ('axcrt', 0.037), ('classifi', 0.037), ('corolla', 0.037), ('fulldirectnocbnobnoc', 0.037), ('honda', 0.037), ('moriceau', 0.037), ('pboundary', 0.037), ('pbounds', 0.037), ('reea', 0.037), ('graph', 0.037), ('subjects', 0.036), ('search', 0.036), ('quantity', 0.035), ('reject', 0.035), ('execute', 0.035), ('group', 0.035), ('answer', 0.035), ('compound', 0.034), ('select', 0.034), ('stage', 0.034), ('returned', 0.034), ('house', 0.034), ('missing', 0.033), ('majority', 0.033), ('direct', 0.033), ('correct', 0.033), ('google', 0.033), ('incorporation', 0.033), ('jacket', 0.033), ('berland', 
0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
Author: Dmitry Davidov ; Ari Rappoport
Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
2 0.12974977 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
Author: Xiao Li
Abstract: Determining the semantic intent of web queries not only involves identifying their semantic class, which is a primary focus of previous works, but also understanding their semantic structure. In this work, we formally define the semantic structure of noun phrase queries as comprised of intent heads and intent modifiers. We present methods that automatically identify these constituents as well as their semantic roles based on Markov and semi-Markov conditional random fields. We show that the use of semantic features and syntactic features significantly contribute to improving the understanding performance.
3 0.12685408 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
4 0.12669182 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
Author: Cigdem Toprak ; Niklas Jakob ; Iryna Gurevych
Abstract: In this paper, we introduce a corpus of consumer reviews from the rateitall and the eopinions websites annotated with opinion-related information. We present a two-level annotation scheme. In the first stage, the reviews are analyzed at the sentence level for (i) relevancy to a given topic, and (ii) expressing an evaluation about the topic. In the second stage, on-topic sentences containing evaluations about the topic are further investigated at the expression level for pinpointing the properties (semantic orientation, intensity), and the functional components of the evaluations (opinion terms, targets and holders). We discuss the annotation scheme, the inter-annotator agreement for different subtasks and our observations.
5 0.12334646 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
Author: Matthias H. Heie ; Edward W. D. Whittaker ; Sadaoki Furui
Abstract: In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.
6 0.11778128 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
7 0.11361976 185 acl-2010-Open Information Extraction Using Wikipedia
8 0.11146779 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
9 0.10909519 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
10 0.10258164 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
11 0.10229772 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
12 0.096379258 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
13 0.094538644 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation
14 0.092941754 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
15 0.091687508 159 acl-2010-Learning 5000 Relational Extractors
16 0.091005355 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
17 0.090273321 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns
18 0.083795339 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
19 0.077034041 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
20 0.075771339 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
topicId topicWeight
[(0, -0.199), (1, 0.121), (2, -0.098), (3, 0.005), (4, 0.057), (5, -0.051), (6, 0.043), (7, 0.07), (8, -0.058), (9, -0.027), (10, -0.048), (11, -0.051), (12, -0.116), (13, -0.14), (14, 0.083), (15, 0.145), (16, -0.071), (17, 0.128), (18, -0.005), (19, 0.071), (20, -0.018), (21, -0.019), (22, 0.029), (23, -0.034), (24, -0.007), (25, 0.092), (26, 0.005), (27, -0.082), (28, -0.141), (29, 0.02), (30, -0.062), (31, -0.052), (32, 0.021), (33, -0.013), (34, 0.022), (35, 0.06), (36, 0.069), (37, 0.063), (38, 0.077), (39, 0.008), (40, -0.043), (41, 0.044), (42, -0.016), (43, 0.131), (44, 0.018), (45, -0.037), (46, 0.057), (47, 0.098), (48, -0.062), (49, 0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.97091907 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
Author: Dmitry Davidov ; Ari Rappoport
Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
2 0.67819262 185 acl-2010-Open Information Extraction Using Wikipedia
Author: Fei Wu ; Daniel S. Weld
Abstract: Information-extraction (IE) systems seek to distill semantic relations from naturallanguage text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner’s precision and recall. The key to WOE’s performance is a novel form of self-supervised learning for open extractors using heuris— tic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE’s extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
3 0.6217519 159 acl-2010-Learning 5000 Relational Extractors
Author: Raphael Hoffmann ; Congle Zhang ; Daniel S. Weld
Abstract: Many researchers are trying to use information extraction (IE) to create large-scale knowledge bases from natural language text on the Web. However, the primary approach (supervised learning of relation-specific extractors) requires manually-labeled training data for each relation and doesn’t scale to the thousands of relations encoded in Web text. This paper presents LUCHS, a self-supervised, relation-specific IE system which learns 5025 relations more than an order of magnitude greater than any previous approach with an average F1 score of 61%. Crucial to LUCHS’s performance is an automated system for dynamic lexicon learning, which allows it to learn accurately from heuristically-generated training data, which is often noisy and sparse. — —
4 0.60770315 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
5 0.59768248 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
Author: Matthias H. Heie ; Edward W. D. Whittaker ; Sadaoki Furui
Abstract: In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.
6 0.58370274 63 acl-2010-Comparable Entity Mining from Comparative Questions
7 0.57860434 248 acl-2010-Unsupervised Ontology Induction from Text
8 0.57509404 111 acl-2010-Extracting Sequences from the Web
9 0.53318131 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
10 0.52929914 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
11 0.51989758 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
12 0.50711399 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
13 0.49427336 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction
14 0.48250592 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
15 0.47490731 224 acl-2010-Talking NPCs in a Virtual Game World
16 0.4632836 64 acl-2010-Complexity Assumptions in Ontology Verbalisation
17 0.44064987 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds
18 0.42257288 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation
19 0.41515744 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
20 0.4126102 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
topicId topicWeight
[(14, 0.02), (25, 0.086), (42, 0.018), (44, 0.011), (59, 0.119), (72, 0.065), (73, 0.056), (76, 0.013), (77, 0.206), (78, 0.033), (80, 0.014), (83, 0.08), (84, 0.032), (98, 0.146)]
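The topic weights above form a sparse vector over topic IDs, and the simValue column in the listings pairs each candidate paper with a similarity score to the target. As an illustrative sketch only (the listing does not document its actual scoring method), a cosine similarity between two such sparse topic vectors could be computed as follows; the second vector here is hypothetical:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors represented as
    {topic_id: weight} dictionaries; missing topics count as weight 0."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Topic distribution of the target paper (from the listing above, truncated)
paper_a = {14: 0.02, 25: 0.086, 59: 0.119, 77: 0.206, 98: 0.146}
# A hypothetical candidate paper's topic distribution
paper_b = {25: 0.05, 59: 0.10, 77: 0.18, 83: 0.07}

score = cosine_similarity(paper_a, paper_b)
```

A paper compared with itself scores exactly 1.0, and papers sharing no topics score 0.0, which matches the bounded simValue range seen in the listing.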
simIndex simValue paperId paperTitle
same-paper 1 0.84702504 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
Author: Dmitry Davidov ; Ari Rappoport
Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
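The framework in this abstract combines attribute values retrieved for a list of similar terms to approximate the target value. As a hedged sketch (one plausible aggregation, not necessarily the paper's actual combination method), the retrieved values can be pooled and summarized with a robust statistic such as the median; the object names and values below are hypothetical:

```python
import statistics

def approximate_attribute(values_by_term):
    """Pool attribute values retrieved for terms similar to the target
    object and approximate the target's value with the median, a robust
    choice when some retrieved values are noisy or wrong."""
    pooled = [v for vals in values_by_term.values() for v in vals]
    if not pooled:
        return None  # nothing was retrieved for any similar term
    return statistics.median(pooled)

# Hypothetical retrieved shoulder heights (cm) for terms similar to "tiger"
retrieved = {
    "lion": [110, 120],
    "leopard": [70],
    "jaguar": [75, 80],
}
estimate = approximate_attribute(retrieved)
```

This yields an approximation even when no value for the target object itself appears in the text, mirroring the motivation stated in the abstract.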
2 0.73855472 127 acl-2010-Global Learning of Focused Entailment Graphs
Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger
Abstract: We propose a global algorithm for learning entailment relations between predicates. We define a graph structure over predicates that represents entailment relations as directed edges, and use a global transitivity constraint on the graph to learn the optimal set of edges, formulating the optimization problem as an Integer Linear Program. We motivate this graph with an application that provides a hierarchical summary for a set of propositions that focus on a target concept, and show that our global algorithm improves performance by more than 10% over baseline algorithms.
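The global step described in this abstract selects an edge set under a transitivity constraint. The paper formulates this as an Integer Linear Program; the sketch below substitutes a brute-force search over a toy three-predicate graph just to show the constraint's effect (the edge scores are hypothetical, and unscored edges receive a fixed penalty):

```python
from itertools import combinations, product

def is_transitive(edges, nodes):
    """True iff (a,b) and (b,c) in edges always implies (a,c) in edges."""
    for a, b, c in product(nodes, repeat=3):
        if (a, b) in edges and (b, c) in edges and (a, c) not in edges:
            return False
    return True

def best_transitive_graph(nodes, scores, penalty=-1.0):
    """Brute-force stand-in for the ILP: pick the edge set maximizing
    total local score subject to transitivity."""
    candidates = [(u, v) for u in nodes for v in nodes if u != v]
    best_edges, best_score = frozenset(), 0.0
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            edges = set(subset)
            if not is_transitive(edges, nodes):
                continue
            total = sum(scores.get(e, penalty) for e in edges)
            if total > best_score:
                best_edges, best_score = frozenset(edges), total
    return best_edges, best_score

# Local classifier strongly favors X->Y and Y->Z but disfavors X->Z;
# transitivity still forces X->Z into the optimal solution.
nodes = ["X", "Y", "Z"]
scores = {("X", "Y"): 0.9, ("Y", "Z"): 0.8, ("X", "Z"): -0.2}
best_edges, best_score = best_transitive_graph(nodes, scores)
```

The brute force is exponential in the number of candidate edges, which is exactly why the paper uses an ILP solver for realistic graph sizes.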
3 0.73339134 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
Author: Wei Wei ; Jon Atle Gulla
Abstract: Existing works on sentiment analysis of product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships among product attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product's attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a human-labeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper mainly addresses sentiment analysis of reviews of one product, the proposed HL-SOT approach generalizes easily to labeling a mix of reviews of more than one product.
4 0.73153603 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
Author: Xianpei Han ; Jun Zhao
Abstract: The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) creates new opportunities to enhance named entity disambiguation by developing algorithms which can best exploit these knowledge sources. The problem is that these knowledge sources are heterogeneous, and most of the semantic knowledge within them is embedded in complex structures such as graphs and networks. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. Empirical results show that, in comparison with classical BOW-based methods and social network based methods, our method can significantly improve disambiguation performance by 8.7% and 14.7%, respectively.
5 0.72333753 159 acl-2010-Learning 5000 Relational Extractors
Author: Raphael Hoffmann ; Congle Zhang ; Daniel S. Weld
Abstract: Many researchers are trying to use information extraction (IE) to create large-scale knowledge bases from natural language text on the Web. However, the primary approach (supervised learning of relation-specific extractors) requires manually-labeled training data for each relation and doesn't scale to the thousands of relations encoded in Web text. This paper presents LUCHS, a self-supervised, relation-specific IE system which learns 5025 relations, more than an order of magnitude more than any previous approach, with an average F1 score of 61%. Crucial to LUCHS's performance is an automated system for dynamic lexicon learning, which allows it to learn accurately from heuristically-generated training data, which is often noisy and sparse.
6 0.72253633 169 acl-2010-Learning to Translate with Source and Target Syntax
7 0.71959966 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
8 0.7191236 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
9 0.7184267 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
10 0.71776402 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
11 0.71775007 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
12 0.71750093 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
13 0.71677852 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
14 0.7153517 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
15 0.71299005 248 acl-2010-Unsupervised Ontology Induction from Text
16 0.71198595 162 acl-2010-Learning Common Grammar from Multilingual Corpus
17 0.71160293 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
18 0.71038455 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
19 0.7101239 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
20 0.70934641 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans