acl acl2011 acl2011-190 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld
Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
Reference: text
sentIndex sentText sentNum sentScore
1 Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. [sent-3, score-0.4]
2 Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). [sent-4, score-0.259]
3 This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. [sent-5, score-0.282]
4 We apply our model to learn extractors for NY Times text using weak supervision from Freebase. [sent-6, score-0.528]
5 Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level. [sent-7, score-0.266]
6 For example, suppose that r(e1, e2) = Founded(Jobs, Apple) is a ground tuple in the database and s =“Steve Jobs founded Apple, Inc. [sent-15, score-0.28]
7 While weak supervision works well when the textual corpus is tightly aligned to the database contents (e. [sent-17, score-0.593]
8 To fix this problem, they cast weak supervision as a form of multi-instance learning, assuming only that at least one of the sentences containing e1 and e2 is expressing r(e1, e2), and their method yields a substantial improvement in extraction performance. [sent-24, score-0.664]
9 , 2009)) assumes that relations do not overlap — there cannot exist two facts r(e1, e2) and q(e1, e2) that are both true for any pair of entities, e1 and e2. [sent-27, score-0.381]
10 3% of the weak supervision facts in Freebase that match sentences in the NY Times 2007 corpus have overlapping relations. [sent-32, score-0.795]
11 This paper presents MULTIR, a novel model of weak supervision that makes the following contributions: • MULTIR introduces a probabilistic, graphical model of multi-instance learning which handles overlapping relations. [sent-33, score-0.577]
12 (2010)’s approach on both aggregate (corpus as a whole) and sentential extractions. [sent-39, score-0.433]
13 Weak Supervision from a Database Given a corpus of text, we seek to extract facts about entities, such as the company Apple or the city Boston. [sent-41, score-0.204]
14 A ground fact (or relation instance) is an expression r(e) where r is a relation name, for example Founded or CEO-of, and e = e1, . [sent-42, score-0.347]
15 A relation mention is a sequence of text (including one or more entity mentions) which states that some ground fact r(e) is true. [sent-51, score-0.354]
16 ” contains three entity mentions as well as a relation mention for CEO-of(Steve Ballmer, Microsoft). [sent-53, score-0.353]
17 The task of aggregate extraction takes two inputs, Σ, a set of sentences comprising the corpus, and an extraction model; as output it should produce a set of ground facts, I, such that each fact r(e) ∈ I is expressed somewhere in the corpus. [sent-56, score-0.73]
18 In general, the corpus-level extraction problem is easier, since it need only make aggregate predictions, perhaps using corpus-wide statistics. [sent-58, score-0.412]
19 In contrast, sentence-level extraction must justify each extraction with every sentence which expresses the fact. [sent-59, score-0.292]
20 The knowledge-based weakly supervised learning problem takes as input (1) Σ, a training corpus, (2) E, a set of entities mentioned in that corpus, (3) R, a set of relation names, and (4) ∆, a set of ground facts of relations in R. [sent-60, score-0.562]
21 3 Modeling Overlapping Relations We define an undirected graphical model that allows joint reasoning about aggregate (corpus-level) and sentence-level extraction decisions. [sent-62, score-0.44]
22 1 Random Variables There exists a connected component for each pair of entities e = (e1, e2) ∈ E × E that models all of the extraction decisions for this pair. [sent-65, score-0.234]
23 There is one Boolean output variable Yr for each relation name r ∈ R, which represents whether the ground fact r(e) is true. [sent-66, score-0.247]
24 Including this set of binary random variables enables our model to extract overlapping relations. [sent-67, score-0.245]
25 For each sentence xi ∈ S(e1,e2) there exists a latent variable Zi which ranges over the relation names r ∈ R and, Figure 1: (a) Network structure depicted as plate model and (b) an example network instantiation for the pair of entities Steve Jobs, Apple. [sent-69, score-0.461]
26 Zi should be assigned a value r ∈ R only when xi expresses the ground fact r(e), thereby modeling sentence-level extraction. [sent-71, score-0.227]
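As a concrete illustration of this variable layout, the following Python sketch (hypothetical class and field names, not the authors' code) holds one connected component: a Boolean Yr per relation name and a latent Zi per sentence in S(e1,e2):

    from dataclasses import dataclass, field
    from typing import Dict, List

    NONE_LABEL = "none"  # sketch assumption: Zi may also take a "none" value

    @dataclass
    class EntityPairComponent:
        """Illustrative container for the extraction variables of one pair (e1, e2)."""
        sentences: List[str]       # the sentences x_i in S(e1, e2)
        relation_names: List[str]  # the relation names R
        y: Dict[str, bool] = field(default_factory=dict)  # aggregate Y_r variables
        z: List[str] = field(default_factory=list)        # latent sentence-level Z_i

        def __post_init__(self):
            # Y_r = False until a fact r(e1, e2) is asserted or predicted.
            self.y = {r: False for r in self.relation_names}
            # Each Z_i starts unassigned, i.e. "none".
            self.z = [NONE_LABEL] * len(self.sentences)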
27 2 A Joint, Conditional Extraction Model We use a conditional probability model that defines a joint distribution over all of the extraction random variables defined above. [sent-74, score-0.246]
28 For each entity pair e = (e1, e2), define x to be a vector concatenating the individual sentences xi ∈ S(e1,e2), Y to be a vector of binary Yr random variables, one for each r ∈ R, and Z to be the vector of Zi variables, one for each sentence xi. [sent-76, score-0.306]
29 The extraction factors Φextract are given by Φextract(zi, xi) =def exp(Σj θj φj(zi, xi)), where the features φj are sensitive to the relation name assigned to extraction variable zi, if any, and cues from the sentence xi. [sent-79, score-0.484]
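A minimal sketch of this log-linear factor, under the assumption (for illustration only) that each feature φj conjoins the assigned relation name with a lexical or syntactic cue from the sentence:

    import math
    from typing import Dict, Tuple

    def extraction_factor(z_i: str,
                          sentence_cues: Dict[str, float],
                          theta: Dict[Tuple[str, str], float]) -> float:
        """Sketch of Phi_extract(z_i, x_i) = exp( sum_j theta_j * phi_j(z_i, x_i) ).

        Each feature phi_j is emulated as the pair (relation name, sentence cue),
        so its weight is looked up under (z_i, cue)."""
        log_score = sum(theta.get((z_i, cue), 0.0) * value
                        for cue, value in sentence_cues.items())
        return math.exp(log_score)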
30 However, defining the Yr random variables and tying them to the sentence-level variables, Zi, provides a direct method for modeling weak supervision. [sent-84, score-0.337]
31 We can simply train the model so that the Y variables match the facts in the database, treating the Zi as hidden variables that can take any value, as long as they produce the correct aggregate predictions. [sent-85, score-0.634]
32 (2010), in that both models include sentence-level and aggregate random variables. [sent-87, score-0.266]
33 However, their sentence-level variables are binary and they only have a single aggregate variable that takes values r ∈ R ∪ {none}, thereby ruling out overlapping relations. [sent-88, score-0.534]
34 Additionally, their aggregate decisions make use of Mintz-style aggregate features (Mintz et al. [sent-89, score-0.532]
35 , 2009), that collect evidence from multiple sentences, while we use Inputs: (1) Σ, a set of sentences, (2) E, a set of entities mentioned in the sentences, (3) R, a set of relation names, and (4) ∆, a database of atomic facts of the form r(e1, e2) for r ∈ R and ei ∈ E. [sent-90, score-0.478]
36 n}, where i is an index corresponding to a particular entity pair (ej, ek) in ∆, xi contains all of the sentences in Σ with mentions of this pair, and yi = relVector(ej, ek). [sent-94, score-0.478]
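A small sketch of how yi = relVector(ej, ek) could be constructed, assuming (for illustration) that the database ∆ stores facts as (relation, e1, e2) triples:

    from typing import Dict, List, Set, Tuple

    Fact = Tuple[str, str, str]  # (relation name, e1, e2)

    def rel_vector(e_j: str, e_k: str,
                   relation_names: List[str],
                   database: Set[Fact]) -> Dict[str, bool]:
        """Sketch of relVector: one Boolean per relation name r in R,
        true iff r(e_j, e_k) is a fact in the database Delta."""
        return {r: (r, e_j, e_k) in database for r in relation_names}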
37 n do (y0, z0) ← arg maxy,z p(y, z|xi; θ); if y0 ≠ yi then z∗ ← arg maxz p(z|xi, yi; θ); Θ ← Θ + φ(xi, z∗) − φ(xi, z0); end if; end for; end for; Return Θ. Figure 2: The MULTIR Learning Algorithm. only the deterministic OR nodes. [sent-101, score-0.381]
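The Figure 2 pseudocode above is a perceptron-style latent-variable update. Below is a minimal Python sketch of that loop, assuming hypothetical callables for the two inference problems and for the feature vector φ (these helpers are illustrative, not the authors' code):

    from collections import defaultdict
    from typing import Callable, Dict, List, Tuple

    def multir_train(data: List[Tuple[List[str], Dict[str, bool]]],
                     epochs: int,
                     argmax_joint: Callable,        # (x, theta) -> (y_pred, z_pred)
                     argmax_given_facts: Callable,  # (x, y_gold, theta) -> z_star
                     feature_vector: Callable       # (x, z) -> Dict[feature, value]
                     ) -> Dict:
        """Sketch of the MULTIR learning loop (Figure 2): the Z_i are latent,
        and weights are updated only when the aggregate prediction differs
        from the weak-supervision labels y_i."""
        theta: Dict = defaultdict(float)
        for _ in range(epochs):
            for x_i, y_i in data:
                y_pred, z_pred = argmax_joint(x_i, theta)
                if y_pred != y_i:
                    # Most likely sentence assignment consistent with the facts.
                    z_star = argmax_given_facts(x_i, y_i, theta)
                    # Theta <- Theta + phi(x_i, z*) - phi(x_i, z')
                    for f, v in feature_vector(x_i, z_star).items():
                        theta[f] += v
                    for f, v in feature_vector(x_i, z_pred).items():
                        theta[f] -= v
        return theta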
38 Perhaps surprisingly, we are still able to improve performance at both the sentential and aggregate extraction tasks. [sent-102, score-0.266]
39 4 Learning We now present a multi-instance learning algorithm for our weak-supervision model that treats the sentence-level extraction random variables Zi as latent, and uses facts from a database (e. [sent-103, score-0.51]
40 As input we have (1) Σ, a set of sentences, (2) E, a set of entities mentioned in the sentences, (3) R, a set of relation names, and (4) ∆, a database of atomic facts of the form r(e1, e2) for r ∈ R and ei ∈ E. [sent-106, score-0.478]
41 Since we are using weak learning, the Yr variables are not directly observed. [sent-107, score-0.237]
42 n}, where i is an index corresponding to a particular entity pair (ej, ek), xi contains all of the sentences with mentions of this pair, and yi = relVector(ej, ek). [sent-114, score-0.478]
43 Specifically, we compute the most likely sentence extractions for the label facts arg maxz p(z|xi, yi; θ) and the most likely extraction for the input, without regard to the labels, arg maxy,z p(y, z|xi; θ). [sent-124, score-0.663]
44 5 Inference To support learning, as described above, we need to compute assignments arg maxz p(z|x, y; θ) and arg maxy,z p(y, z|x; θ). [sent-127, score-0.298]
45 Predicting the most likely joint extraction arg maxy,z p(y, z|x; θ) can be done efficiently given the structure of our model. [sent-129, score-0.214]
46 It is thus sufficient to independently compute an assignment for each sentence-level extraction variable Zi, ignoring the deterministic dependencies. [sent-131, score-0.314]
47 The optimal setting for the aggregate variables Y is then simply the assignment that is consistent with these extractions. [sent-132, score-0.416]
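A minimal sketch of this decomposed inference, with a hypothetical log_factor scoring function standing in for the learned extraction factors:

    from typing import Callable, Dict, List, Tuple

    def argmax_joint(sentences: List[str],
                     relation_names: List[str],
                     log_factor: Callable[[str, str], float]
                     ) -> Tuple[Dict[str, bool], List[str]]:
        """Sketch of arg max_{y,z} p(y, z | x): choose the best label for each
        Z_i independently, then set Y_r true iff some Z_i takes the value r."""
        candidates = relation_names + ["none"]
        z = [max(candidates, key=lambda r: log_factor(r, x_i)) for x_i in sentences]
        y = {r: any(z_i == r for z_i in z) for r in relation_names}
        return y, z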
48 ions given weak supervision facts, arg maxz p(z|x, y; θ), is more challenging. [sent-135, score-0.635]
49 We start by computing extraction scores Φextract(xi, zi) for each possible extraction assignment Zi = zi at each sentence xi ∈ S, and storing the values in a dynamic programming table. [sent-136, score-0.528]
50 Let G = (E, V = VS ∪ Vy) be a complete weighted bipartite graph with one node viS ∈ VS for each sentence xi ∈ S and one node vry ∈ Vy for each relation r ∈ R. [sent-139, score-0.27]
51 Our goal is to select a subset of the edges which maximizes the sum of their weights, subject to each node viS ∈ VS being incident to exactly one edge, and each node vry ∈ Vy being incident to at least one edge. [sent-141, score-0.326]
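A simple greedy sketch that respects the two constraints above (each sentence receives exactly one label; each true relation is used at least once); the excerpt does not show whether this assignment is solved exactly, so treat the helper below as illustrative only:

    from typing import Callable, Dict, List

    def constrained_assignment(sentences: List[str],
                               true_relations: List[str],
                               weight: Callable[[str, str], float]
                               ) -> Dict[int, str]:
        """Greedy sketch of arg max_z p(z | x, y): each sentence receives exactly
        one label, and every relation marked true in y is used at least once.
        Assumes len(sentences) >= len(true_relations)."""
        if not true_relations:
            return {i: "none" for i in range(len(sentences))}
        assignment: Dict[int, str] = {}
        free = set(range(len(sentences)))
        # Phase 1: cover every true relation with its best remaining sentence.
        for r in true_relations:
            best = max(free, key=lambda i: weight(sentences[i], r))
            assignment[best] = r
            free.remove(best)
        # Phase 2: give each leftover sentence its individually best true relation.
        for i in free:
            assignment[i] = max(true_relations, key=lambda r: weight(sentences[i], r))
        return assignment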
52 (2010) for generating weak supervision data, computing features, and evaluating aggregate extraction. [sent-160, score-0.734]
53 We also introduce new metrics for measuring sentential extraction performance, both relation-independent and relation-specific. [sent-161, score-0.313]
54 However, unlike the previous work, we did not make use of any features that explicitly aggregate these properties across multiple mention instances. [sent-177, score-0.318]
55 3 Evaluation Metrics Evaluation is challenging, since only a small percentage (approximately 3%) of sentences match facts in Freebase, and the number of matches is highly unbalanced across relations, as we will see in more detail later. [sent-181, score-0.281]
56 Aggregate Extraction Let ∆e be the set of extracted relations for any of the systems; we compute aggregate precision and recall by comparing ∆e with ∆. [sent-183, score-0.569]
57 This metric is easily computed but underestimates extraction accuracy because Freebase is incomplete and some true relations in ∆e will be marked wrong. [sent-184, score-0.328]
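A small sketch of this aggregate metric, assuming facts are represented as (relation, e1, e2) triples; note that incompleteness of the database depresses measured precision exactly as described above:

    from typing import Set, Tuple

    Fact = Tuple[str, str, str]  # (relation name, e1, e2)

    def aggregate_precision_recall(extracted: Set[Fact],
                                   database: Set[Fact]) -> Tuple[float, float]:
        """Sketch: compare the extracted set (Delta_e) against the database (Delta)."""
        if not extracted or not database:
            return 0.0, 0.0
        correct = len(extracted & database)
        return correct / len(extracted), correct / len(database)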
58 Sentential Extraction Let Se be the sentences where some system extracted a relation and SF be the sentences that match the arguments of a fact in ∆. [sent-185, score-0.257]
59 We compute sentential extraction accuracy by sampling a set of 1000 sentences from Se ∪ SF and manually labeling the correct extraction decision, either a relation r ∈ R or none. [sent-186, score-0.521]
60 4 Precision / Recall Curves To compute precision / recall curves for the tasks, we ranked the MULTIR extractions as follows. [sent-190, score-0.322]
61 Figure 4: Aggregate extraction precision / recall curves for Riedel et al. [sent-193, score-0.354]
62 For aggregate comparisons, we set the score for an extraction Yr = true to be the max of the extraction factor scores for the sentences where r was extracted. [sent-196, score-0.67]
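A sketch of this ranking step, assuming a hypothetical per-sentence extractor that returns a relation and a factor score; each aggregate prediction Yr = true is scored by the maximum over its supporting sentences and the sorted list is then thresholded to trace a precision/recall curve:

    from typing import Callable, Dict, List, Tuple

    def rank_aggregate_extractions(
            bags: Dict[Tuple[str, str], List[str]],      # entity pair -> sentences
            extract: Callable[[str], Tuple[str, float]]  # sentence -> (relation, score)
            ) -> List[Tuple[Tuple[str, str], str, float]]:
        """Sketch: score Y_r = true by the max extraction factor score over the
        sentences where r was extracted, then sort by descending score."""
        ranked = []
        for pair, sentences in bags.items():
            best: Dict[str, float] = {}
            for s in sentences:
                relation, score = extract(s)
                if relation != "none":
                    best[relation] = max(score, best.get(relation, float("-inf")))
            for relation, score in best.items():
                ranked.append((pair, relation, score))
        return sorted(ranked, key=lambda item: -item[2])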
63 7 Experiments To evaluate our algorithm, we first compare it to an existing approach for using multi-instance learning with weak supervision (Riedel et al. [sent-197, score-0.468]
64 We report both aggregate extraction and sentential extraction results. [sent-199, score-0.725]
65 1 Aggregate Extraction Figure 4 shows approximate precision / recall curves for three systems computed with aggregate metrics (Section 6. [sent-203, score-0.504]
66 3) that test how closely the extractions match the facts in Freebase. [sent-204, score-0.254]
67 To investigate the low precision in the 0-1% recall range, we manually checked the ten highest con- Figure 5: Sentential extraction precision / recall curves for MULTIR and SOLOR. [sent-211, score-0.509]
68 We found that all ten were true facts that were simply missing from Freebase. [sent-213, score-0.23]
69 2 Sentential Extraction Although their model includes variables to model sentential extraction, Riedel et al. [sent-216, score-0.267]
70 To generate the precision / recall curve we used the joint model assignment score for each of the sentences that contributed to the aggregate extraction decision. [sent-218, score-0.667]
71 Figure 5 shows approximate precision / recall curves for MULTIR and SOLOR computed against manually generated sentence labels, as defined in Section 6. [sent-219, score-0.238]
72 Let SrM be the sentences where MULTIR extracted an instance of relation r ∈ R, and let SrF be the sentences that match the arguments of a fact about relation r in ∆. [sent-228, score-0.387]
73 To estimate precision we compute the ratio of true relation mentions in SrM, and to estimate recall we take the ratio of true relation mentions in SrF which are returned by our system. [sent-230, score-0.739]
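A small sketch of these per-relation estimates, assuming sentences are identified by ids and the true-mention judgments come from the manual labeling described above:

    from typing import Set, Tuple

    def sentential_precision_recall(extracted_r: Set[str],     # S^M_r
                                    matched_r: Set[str],       # S^F_r
                                    true_mentions_r: Set[str]  # manually judged true for r
                                    ) -> Tuple[float, float]:
        """Sketch: precision is the fraction of true mentions among the sentences
        where relation r was extracted; recall is the fraction of true mentions in
        the matched sentences that the system returned."""
        precision = (len(extracted_r & true_mentions_r) / len(extracted_r)
                     if extracted_r else 0.0)
        true_matched = matched_r & true_mentions_r
        recall = (len(true_matched & extracted_r) / len(true_matched)
                  if true_matched else 0.0)
        return precision, recall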
74 Table 1 presents this approximate precision and recall for MULTIR on each of the relations, along with statistics we computed to measure the quality of the weak supervision. [sent-231, score-0.422]
75 Precision is high for the majority of relations but recall is consistently lower. [sent-232, score-0.202]
76 The approach generally performs best on the relations with a sufficiently large number of true matches, in many cases even achieving precision that outperforms the accuracy of the heuristic matches, at reasonable recall levels. [sent-234, score-0.337]
77 For example, in the data, almost all of the matches for the administrative divisions relation overlap with the contains relation, because they both model relationships for a pair of locations. [sent-237, score-0.224]
78 Instead of labeling each entity pair with the set of all true Freebase facts, we created a dataset where each true relation was used to create a different training example. [sent-243, score-0.37]
79 Training MULTIR on this data simulates the effects of conflicting supervision that can come from not modeling overlaps. [sent-244, score-0.231]
80 Our implementation of the Table 1: Estimated precision and recall by relation, as well as the number of matched sentences (#sents) and accuracy (% true) of matches between sentences and facts in Freebase. [sent-250, score-0.486]
81 6 Discussion The sentential extraction results demonstrate the advantages of learning a model that is primarily driven by sentence-level features. [sent-258, score-0.313]
82 approach does include a model of which sentences express relations, it makes significant use of aggregate features that are primarily designed to do entity-level relation predictions and has a less detailed model of extractions at the individual sentence level. [sent-261, score-0.532]
83 Perhaps surprisingly, our model is able to do better at both the sentential and aggregate levels. [sent-262, score-0.433]
84 While they offer high precision and recall, these methods are unlikely to scale to the thousands of relations found in text on the Web. [sent-265, score-0.193]
85 1 Weak Supervision Weak supervision (also known as distant or self supervision) refers to a broad class of methods, but we focus on the increasingly popular idea of using a store of structured data to heuristically label a textual corpus. [sent-271, score-0.231]
86 The KYLIN system applied weak supervision to learn relations from Wikipedia, treating infoboxes as the associated database (Wu and Weld, 2007); Wu et al. [sent-274, score-0.716]
87 (2009) used Freebase facts to train 100 relational extractors on Wikipedia. [sent-277, score-0.282]
88 (2010) perform weak supervision, while using selectional preference constraints to jointly reason about entity types. [sent-281, score-0.355]
89 , 2010) can also be viewed as performing weak supervision. [sent-283, score-0.237]
90 NELL then matches entity pairs from the seeds to a Web corpus, but instead of learning a probabilistic model, it bootstraps a set of extraction patterns using semisupervised methods for multitask learning. [sent-285, score-0.294]
91 Bunescu and Mooney (2007) connect weak supervision with multi-instance learning and extend their relational extraction kernel to this context. [sent-289, score-0.668]
92 (2010) combine weak supervision and multi-instance learning in a more sophisticated manner, training a graphical model, which assumes only that at least one of the matches between the arguments of a Freebase fact and sentences in the corpus is a true relational mention. [sent-291, score-0.724]
93 Our model may be seen as an extension of theirs, since both models include sentence-level and aggregate random variables. [sent-292, score-0.266]
94 9 Conclusion We argue that weak supervision is a promising method for scaling information extraction to the level where it can handle the myriad different relations on the Web. [sent-296, score-0.734]
95 By using the contents of a database to heuristically label a training corpus, we may be able to automatically learn a nearly unbounded number of relational extractors. [sent-297, score-0.212]
96 Unfortunately, previous approaches assume that all relations are disjoint — for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple), because two relations are not allowed to have the same arguments. [sent-299, score-0.307]
97 This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. [sent-300, score-0.468]
98 We apply our model to learn extractors for NY Times text using weak supervision from Freebase. [sent-301, score-0.528]
99 Experiments show improvements for both sentential and aggregate (corpus level) extraction, and demonstrate that the approach is computationally efficient. [sent-302, score-0.433]
100 Finally, we are also interested in applying the overall learning approaches to other tasks that could be modeled with weak supervision, such as coreference and named entity classification. [sent-307, score-0.322]
wordName wordTfidf (topN-words)
[('multir', 0.417), ('aggregate', 0.266), ('weak', 0.237), ('supervision', 0.231), ('riedel', 0.2), ('zi', 0.192), ('freebase', 0.179), ('facts', 0.168), ('sentential', 0.167), ('extraction', 0.146), ('xi', 0.14), ('jobs', 0.138), ('relation', 0.13), ('relations', 0.12), ('apple', 0.117), ('overlapping', 0.109), ('variables', 0.1), ('maxz', 0.099), ('founded', 0.097), ('yr', 0.096), ('database', 0.096), ('ground', 0.087), ('mentions', 0.086), ('mintz', 0.086), ('extractions', 0.086), ('yi', 0.086), ('entity', 0.085), ('recall', 0.082), ('precision', 0.073), ('join', 0.069), ('arg', 0.068), ('vy', 0.064), ('hoffmann', 0.064), ('matches', 0.063), ('true', 0.062), ('ek', 0.061), ('deterministic', 0.06), ('extractors', 0.06), ('relvector', 0.06), ('solor', 0.06), ('vry', 0.06), ('entities', 0.057), ('incident', 0.055), ('relational', 0.054), ('curves', 0.053), ('aggregating', 0.053), ('srf', 0.052), ('vis', 0.052), ('mention', 0.052), ('sentences', 0.05), ('assignment', 0.05), ('yao', 0.05), ('srm', 0.048), ('infobox', 0.048), ('craven', 0.048), ('ej', 0.048), ('ie', 0.045), ('banko', 0.044), ('def', 0.043), ('limin', 0.043), ('ny', 0.043), ('names', 0.043), ('kumlien', 0.04), ('kylin', 0.04), ('multiinstance', 0.04), ('raphae', 0.04), ('zettlemoyer', 0.04), ('reimplementation', 0.04), ('wu', 0.037), ('weld', 0.036), ('extract', 0.036), ('assignments', 0.035), ('congle', 0.035), ('cinh', 0.035), ('factorie', 0.035), ('nell', 0.035), ('heuristically', 0.033), ('selectional', 0.033), ('infoboxes', 0.032), ('dietterich', 0.032), ('combat', 0.032), ('factors', 0.032), ('pair', 0.031), ('approximate', 0.03), ('plate', 0.03), ('soderland', 0.03), ('variable', 0.03), ('matching', 0.029), ('contents', 0.029), ('bellare', 0.029), ('raphael', 0.029), ('preemptive', 0.029), ('ruling', 0.029), ('vs', 0.028), ('compute', 0.028), ('fei', 0.028), ('reasoning', 0.028), ('arguments', 0.027), ('running', 0.027), ('ei', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld
Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
2 0.24157053 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
Author: Truc Vien T. Nguyen ; Alessandro Moschitti
Abstract: In this paper, we extend distant supervision (DS) based on Wikipedia for Relation Extraction (RE) by considering (i) relations defined in external repositories, e.g. YAGO, and (ii) any subset of Wikipedia documents. We show that training data constituted by sentences containing pairs of named entities in target relations is enough to produce reliable supervision. Our experiments with state-of-the-art relation extraction models, trained on the above data, show a meaningful F1 of 74.29% on a manually annotated test set: this highly improves the state-of-art in RE using DS. Additionally, our end-to-end experiments demonstrated that our extractors can be applied to any general text document.
3 0.17273363 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
Author: Harr Chen ; Edward Benson ; Tahira Naseem ; Regina Barzilay
Abstract: We present a novel approach to discovering relations and their instantiations from a collection of documents in a single domain. Our approach learns relation types by exploiting meta-constraints that characterize the general qualities of a good relation in any domain. These constraints state that instances of a single relation should exhibit regularities at multiple levels of linguistic structure, including lexicography, syntax, and document-level context. We capture these regularities via the structure of our probabilistic model as well as a set of declaratively-specified constraints enforced during posterior inference. Across two domains our approach successfully recovers hidden relation structure, comparable to or outperforming previous state-of-the-art approaches. Furthermore, we find that a small set of constraints is applicable across the domains, and that using domain-specific constraints can further improve performance.
4 0.16308776 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Author: Ryan Gabbard ; Marjorie Freedman ; Ralph Weischedel
Abstract: As an alternative to requiring substantial supervised relation training data, many have explored bootstrapping relation extraction from a few seed examples. Most techniques assume that the examples are based on easily spotted anchors, e.g., names or dates. Sentences in a corpus which contain the anchors are then used to induce alternative ways of expressing the relation. We explore whether coreference can improve the learning process. That is, if the algorithm considered examples such as his sister, would accuracy be improved? With coreference, we see on average a 2-fold increase in F-Score. Despite using potentially errorful machine coreference, we see significant increase in recall on all relations. Precision increases in four cases and decreases in six.
5 0.13471517 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum
Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
6 0.13020292 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
7 0.12979904 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
8 0.12464187 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
9 0.10905162 334 acl-2011-Which Noun Phrases Denote Which Concepts?
10 0.10714084 121 acl-2011-Event Discovery in Social Media Feeds
11 0.10342228 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base
12 0.10027461 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
13 0.097171202 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
14 0.092578337 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
15 0.090222768 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
16 0.083356321 200 acl-2011-Learning Dependency-Based Compositional Semantics
17 0.07619369 293 acl-2011-Template-Based Information Extraction without the Templates
18 0.072832473 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
19 0.072340265 117 acl-2011-Entity Set Expansion using Topic information
20 0.071159437 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
topicId topicWeight
[(0, 0.203), (1, 0.058), (2, -0.141), (3, 0.003), (4, 0.109), (5, 0.011), (6, 0.008), (7, -0.028), (8, -0.173), (9, -0.007), (10, 0.094), (11, 0.028), (12, -0.006), (13, 0.004), (14, 0.003), (15, -0.015), (16, -0.07), (17, -0.157), (18, 0.011), (19, 0.013), (20, -0.037), (21, 0.02), (22, 0.065), (23, -0.019), (24, -0.021), (25, -0.038), (26, 0.077), (27, 0.018), (28, 0.141), (29, 0.035), (30, -0.089), (31, 0.086), (32, 0.05), (33, 0.068), (34, -0.008), (35, 0.007), (36, -0.02), (37, 0.041), (38, -0.011), (39, -0.036), (40, -0.044), (41, -0.049), (42, 0.136), (43, -0.064), (44, -0.044), (45, -0.078), (46, -0.064), (47, -0.002), (48, 0.107), (49, -0.085)]
simIndex simValue paperId paperTitle
same-paper 1 0.94958669 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld
Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
2 0.86519408 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
Author: Truc Vien T. Nguyen ; Alessandro Moschitti
Abstract: In this paper, we extend distant supervision (DS) based on Wikipedia for Relation Extraction (RE) by considering (i) relations defined in external repositories, e.g. YAGO, and (ii) any subset of Wikipedia documents. We show that training data constituted by sentences containing pairs of named entities in target relations is enough to produce reliable supervision. Our experiments with state-of-the-art relation extraction models, trained on the above data, show a meaningful F1 of 74.29% on a manually annotated test set: this highly improves the state-of-art in RE using DS. Additionally, our end-to-end experiments demonstrated that our extractors can be applied to any general text document.
3 0.82402623 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
Author: Harr Chen ; Edward Benson ; Tahira Naseem ; Regina Barzilay
Abstract: We present a novel approach to discovering relations and their instantiations from a collection of documents in a single domain. Our approach learns relation types by exploiting meta-constraints that characterize the general qualities of a good relation in any domain. These constraints state that instances of a single relation should exhibit regularities at multiple levels of linguistic structure, including lexicography, syntax, and document-level context. We capture these regularities via the structure of our probabilistic model as well as a set of declaratively-specified constraints enforced during posterior inference. Across two domains our approach successfully recovers hidden relation structure, comparable to or outperforming previous state-of-the-art approaches. Furthermore, we find that a small set of constraints is applicable across the domains, and that using domain-specific constraints can further improve performance.
4 0.78127015 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
Author: Ang Sun ; Ralph Grishman ; Satoshi Sekine
Abstract: We present a simple semi-supervised relation extraction system with large-scale word clustering. We focus on systematically exploring the effectiveness of different cluster-based features. We also propose several statistical methods for selecting clusters at an appropriate level of granularity. When training on different sizes of data, our semi-supervised approach consistently outperformed a state-of-the-art supervised baseline system. 1
5 0.76091784 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Author: Ryan Gabbard ; Marjorie Freedman ; Ralph Weischedel
Abstract: As an alternative to requiring substantial supervised relation training data, many have explored bootstrapping relation extraction from a few seed examples. Most techniques assume that the examples are based on easily spotted anchors, e.g., names or dates. Sentences in a corpus which contain the anchors are then used to induce alternative ways of expressing the relation. We explore whether coreference can improve the learning process. That is, if the algorithm considered examples such as his sister, would accuracy be improved? With coreference, we see on average a 2-fold increase in F-Score. Despite using potentially errorful machine coreference, we see significant increase in recall on all relations. Precision increases in four cases and decreases in six.
6 0.73645258 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
7 0.71166056 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
8 0.69959795 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
9 0.65898156 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
10 0.59928024 342 acl-2011-full-for-print
11 0.59439307 121 acl-2011-Event Discovery in Social Media Feeds
12 0.55369645 291 acl-2011-SystemT: A Declarative Information Extraction System
13 0.52503687 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
14 0.49518496 150 acl-2011-Hierarchical Text Classification with Latent Concepts
15 0.48835969 239 acl-2011-P11-5002 k2opt.pdf
16 0.46663278 294 acl-2011-Temporal Evaluation
17 0.4658421 200 acl-2011-Learning Dependency-Based Compositional Semantics
18 0.46345535 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
19 0.45665166 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
20 0.45571324 334 acl-2011-Which Noun Phrases Denote Which Concepts?
topicId topicWeight
[(5, 0.025), (9, 0.039), (17, 0.066), (26, 0.018), (37, 0.121), (39, 0.038), (41, 0.067), (48, 0.109), (53, 0.017), (55, 0.035), (59, 0.082), (72, 0.036), (77, 0.034), (88, 0.017), (91, 0.048), (96, 0.136), (97, 0.016)]
simIndex simValue paperId paperTitle
1 0.94183236 68 acl-2011-Classifying arguments by scheme
Author: Vanessa Wei Feng ; Graeme Hirst
Abstract: Argumentation schemes are structures or templates for various kinds of arguments. Given the text of an argument with premises and conclusion identified, we classify it as an instance ofone offive common schemes, using features specific to each scheme. We achieve accuracies of 63–91% in one-against-others classification and 80–94% in pairwise classification (baseline = 50% in both cases).
same-paper 2 0.89106613 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld
Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
3 0.8657251 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
Author: Ang Sun ; Ralph Grishman ; Satoshi Sekine
Abstract: We present a simple semi-supervised relation extraction system with large-scale word clustering. We focus on systematically exploring the effectiveness of different cluster-based features. We also propose several statistical methods for selecting clusters at an appropriate level of granularity. When training on different sizes of data, our semi-supervised approach consistently outperformed a state-of-the-art supervised baseline system. 1
4 0.85537899 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
Author: Yee Seng Chan ; Dan Roth
Abstract: In this paper, we observe that there exists a second dimension to the relation extraction (RE) problem that is orthogonal to the relation type dimension. We show that most of these second dimensional structures are relatively constrained and not difficult to identify. We propose a novel algorithmic approach to RE that starts by first identifying these structures and then, within these, identifying the semantic type of the relation. In the real RE problem where relation arguments need to be identified, exploiting these structures also allows reducing pipelined propagated errors. We show that this RE framework provides significant improvement in RE performance.
5 0.8546468 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
6 0.85041428 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
7 0.84600157 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
8 0.84314609 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
9 0.84310049 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
10 0.83820319 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
11 0.83739436 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
12 0.83730847 311 acl-2011-Translationese and Its Dialects
13 0.8354218 85 acl-2011-Coreference Resolution with World Knowledge
14 0.83541059 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
15 0.83437437 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
16 0.83277738 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
17 0.82879019 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
18 0.82862025 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
19 0.82835251 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
20 0.82824385 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment