acl acl2011 acl2011-293 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to hand-created gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). [sent-2, score-0.849]
2 This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. [sent-5, score-0.489]
3 Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. [sent-6, score-1.019]
4 We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. [sent-9, score-0.369]
5 We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to hand-created gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates. [sent-10, score-0.71]
6 1 Introduction A template defines a specific type of event (e.g., a bombing) [sent-12, score-0.539]
7 with a set of semantic roles (or slots) for the typical entities involved in such an event (e.g., perpetrator, target, instrument). [sent-14, score-0.397]
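A template in this sense can be sketched as a tiny data structure, an event type plus its slots; the class and field names below are illustrative stand-ins, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    """Minimal stand-in for a template schema: an event type plus the
    semantic roles (slots) that instances of the event fill."""
    event_type: str                            # e.g., "bombing"
    slots: dict = field(default_factory=dict)  # role name -> filler entity

# A bombing template with three (as yet unfilled) roles.
bombing = Template("bombing",
                   {"Perpetrator": None, "Target": None, "Instrument": None})
```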
8 , 2010), templates can extract a richer representation of a particular domain. [sent-19, score-0.332]
9 Very little work addresses how to learn the template structure itself. [sent-21, score-0.433]
10 Our goal in this paper is to perform the standard template filling task, but to first automatically induce the templates from an unlabeled corpus. [sent-22, score-0.801]
11 , 1998) to sequential events in scripts (Schank and Abelson, 1977) and narrative schemas (Chambers and Jurafsky, 2009; Kasch and Oates, 2010). [sent-24, score-0.36]
12 Our goal is to characterize a domain by learning this template structure completely automatically. [sent-28, score-0.466]
13 We learn templates by first clustering event words based on their proximity in a training corpus. [sent-29, score-0.565]
14 We then use a novel approach to role induction that clusters the syntactic functions of these events based on selectional preferences and coreferring arguments. [sent-30, score-0.525]
15 After learning a domain’s template schemas, we perform the standard IE task of role filling from individual documents, for example: Perpetrator: guerrillas Instrument: dynamite Target: embassy [sent-36, score-0.538]
(Page footer: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 976–986, Portland, Oregon, June 19-24, 2011.)
16 This extraction stage identifies entities using the learned syntactic functions of our roles. [sent-38, score-0.327]
17 The core of this paper focuses on how to characterize a domain-specific corpus by learning rich template structure. [sent-40, score-0.424]
18 2 Previous Work Many template extraction algorithms require full knowledge of the templates and labeled corpora, such as in rule-based systems (Chinchor et al. [sent-43, score-0.785]
19 Bootstrapping with seed examples of known slot fillers has been shown to be effective (Surdeanu et al. [sent-56, score-0.301]
20 Shinyama and Sekine (2006) describe an approach to template learning without labeled data. [sent-60, score-0.436]
21 Central to the algorithm is collecting multiple documents describing the same exact event (e. [sent-62, score-0.27]
22 Our approach draws on this idea of using unlabeled documents to discover relations in text, and of defining semantic roles by sets of entities. [sent-66, score-0.382]
23 However, the limitations to their approach are that (1) redundant documents about specific events are required, (2) relations are binary, and (3) only slots with named entities are learned. [sent-67, score-0.563]
24 We will extend their work by showing how to learn without these assumptions, obviating the need for redundant documents, and learning templates with any type and any number of slots. [sent-68, score-0.336]
25 Large-scale learning of scripts and narrative schemas also captures template-like knowledge from unlabeled text (Chambers and Jurafsky, 2008; Kasch and Oates, 2010). [sent-69, score-0.263]
26 Scripts are sets of related event words and semantic roles learned by linking syntactic functions with coreferring arguments. [sent-70, score-0.606]
27 Further, we are the first to apply this knowledge to the IE task of filling in template mentions in documents. [sent-73, score-0.427]
28 We are the first to learn MUC-4 templates, and we are the first to extract entities without knowing how many templates exist, without examples of slot fillers, and without event-clustered documents. [sent-75, score-0.649]
29 3 The Domain and its Templates Our goal is to learn the general event structure of a domain, and then extract the instances of each learned event. [sent-76, score-0.365]
30 This corpus was chosen because it is annotated with templates that describe all of the entities involved in each event. [sent-78, score-0.358]
31 An example snippet from a bombing document is given here: The terrorists used explosives against the town hall. [sent-79, score-0.38]
32 The entities from this document fill the following slots in a MUC-4 bombing template. [sent-82, score-0.639]
33 There are six types of templates, but only four are modestly frequent: bombing (208 docs), kidnap (83 docs), attack (479 docs), and arson (40 docs). [sent-85, score-0.68]
34 After learning event words that represent templates, we induce their slots, not knowing a priori how many there are, and then fill them in by extracting entities as in the standard task. [sent-88, score-0.352]
35 4 Learning Templates from Raw Text Our goal is to learn templates that characterize a domain as described in unclustered, unlabeled documents. [sent-90, score-0.461]
36 This presents a two-fold problem to the learner: it does not know how many events exist, and it does not know which documents describe which event (some may describe multiple events). [sent-91, score-0.41]
37 1 Clustering Events to Learn Templates We cluster event patterns to create templates. [sent-94, score-0.356]
38 An event pattern is either (1) a verb, (2) a noun in WordNet ... (Footnote 1: There are two Perpetrator slots in MUC-4: Organization and Individual.) [sent-95, score-0.339]
39 However, we first need an algorithm to cluster these patterns to learn the domain’s core events. [sent-103, score-0.25]
40 It learns topics as discrete distributions (multinomials) over the event patterns, and thus meets our needs as it clusters patterns based on co-occurrence in documents. [sent-109, score-0.288]
41 2 Clustering on Event Distance Agglomerative clustering does not require foreknowledge of the templates, but its success relies on how event pattern similarity is determined. [sent-115, score-0.303]
42 Ideally, we want to learn that detonate and destroy belong in the same cluster representing a bombing. [sent-116, score-0.367]
43 Let Cdist(wi, wj) be the distance-weighted frequency of two events occurring together: Cdist(wi, wj) = Σ_{d ∈ D} Σ_{wi, wj ∈ d} (1 − log4(g(wi, wj)))   (1), where d is a document in the set of all documents D. [sent-123, score-0.334]
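A minimal sketch of this distance-weighted count, assuming g(wi, wj) is simply the token distance between the two event mentions and that pairs whose weight would go negative (distance beyond the log base) are ignored; both assumptions, since the dump does not spell g out:

```python
import math
from collections import defaultdict

def cdist_counts(documents, base=4):
    """Distance-weighted co-occurrence counts (sketch of Eq. 1).

    `documents` is a list of token lists.  Nearby event pairs contribute
    close to 1; the contribution decays with log-distance."""
    counts = defaultdict(float)
    for tokens in documents:
        for i, wi in enumerate(tokens):
            for j in range(i + 1, len(tokens)):
                wj = tokens[j]
                g = j - i                         # assumed: token distance
                weight = 1.0 - math.log(g, base)  # 1 - log4(g)
                if weight > 0:                    # assumed: drop far pairs
                    counts[(wi, wj)] += weight
                    counts[(wj, wi)] += weight
    return counts
```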
44 We continue merging clusters until any single cluster grows beyond m patterns. [sent-129, score-0.239]
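The merge loop with its size cap might look like the following greedy sketch; the cluster-similarity function is left abstract and passed in, since its exact form is not reproduced in this dump:

```python
def agglomerative_cluster(patterns, sim, max_size):
    """Greedy agglomerative clustering sketch: repeatedly merge the two
    most similar clusters, refusing any merge that would produce a
    cluster larger than `max_size` patterns (the paper's parameter m).
    `sim(a, b)` scores two clusters of patterns."""
    clusters = [{p} for p in patterns]
    while len(clusters) > 1:
        best, best_pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if len(clusters[i]) + len(clusters[j]) > max_size:
                    continue  # merging would exceed the size cap
                s = sim(clusters[i], clusters[j])
                if best is None or s > best:
                    best, best_pair = s, (i, j)
        if best_pair is None:
            break  # every remaining merge would exceed max_size
        i, j = best_pair
        clusters[i] |= clusters.pop(j)
    return clusters
```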
45 Figure 1 shows 3 clusters (of 77 learned) that characterize the main template types. [sent-133, score-0.491]
46 The previous section clustered events from the MUC-4 corpus, but its 1300 documents do not provide enough examples of verbs and argument counts to further learn the semantic roles in each cluster. [sent-136, score-0.511]
47 For example, MUC-4 labels 83 documents with Kidnap, but our learned cluster (kidnap, abduct, release, ...) [sent-138, score-0.366]
48 A document’s match score is defined as the average number of times the words in cluster c appear in document d: avgm(d, c) = (Σ_{w ∈ c} Σ_{t ∈ d} 1{w = t}) / |c|   (5) We define word coverage as the number of seen cluster words. [sent-147, score-0.349]
49 Coverage penalizes documents that score highly by repeating a single cluster word a lot. [sent-148, score-0.25]
50 ir(d, c) = avgm(d, c) if covrg(d, c) > min(3, |c|/4), and 0 otherwise. A document d is retrieved for a cluster c if ir(d, c) > 0. [sent-150, score-0.255]
51 Finally, we emphasize precision by pruning away 50% of a cluster’s retrieved documents that are farthest in distance from the mean document of the retrieved set. [sent-152, score-0.276]
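The match score, coverage check, and retrieval rule can be sketched directly (the final pruning of the 50% farthest retrieved documents is omitted for brevity; function names here mirror the formulas but are otherwise illustrative):

```python
def avgm(doc_tokens, cluster):
    """Match score (Eq. 5): total count of cluster-word tokens in the
    document, normalized by cluster size."""
    return sum(doc_tokens.count(w) for w in cluster) / len(cluster)

def covrg(doc_tokens, cluster):
    """Word coverage: how many distinct cluster words the document contains."""
    return sum(1 for w in cluster if w in doc_tokens)

def ir_score(doc_tokens, cluster):
    """Thresholded relevance: the match score only counts if the document
    covers enough distinct cluster words, penalizing documents that just
    repeat a single cluster word a lot."""
    if covrg(doc_tokens, cluster) > min(3, len(cluster) / 4):
        return avgm(doc_tokens, cluster)
    return 0.0

def retrieve(docs, cluster):
    """A document is retrieved for the cluster if its score is positive."""
    return [d for d in docs if ir_score(d, cluster) > 0]
```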
52 3 Inducing Semantic Roles (Slots) Having successfully clustered event words and retrieved an IR-corpus for each cluster, we now address the problem of inducing semantic roles. [sent-159, score-0.278]
53 Our learned roles will then extract entities in the next section and we will evaluate their per-role accuracy. [sent-160, score-0.357]
54 Most work on unsupervised role induction focuses on learning verb-specific roles, starting with seed examples (Swier and Stevenson, 2004; He and Gildea, 2006) and/or knowing the number of roles (Grenager and Manning, 2006; Lang and Lapata, 2010). [sent-161, score-0.327]
55 Our previous work (Chambers and Jurafsky, 2009) learned situation-specific roles over narrative schemas, similar to frame roles in FrameNet (Baker et al. [sent-162, score-0.434]
56 Schemas link the syntactic relations of verbs by clustering them based on observing coreferring arguments in those positions. [sent-164, score-0.33]
57 1 Syntactic Relations as Roles We learn the roles of cluster C by clustering the syntactic relations RC of its words. [sent-168, score-0.475]
58 We ideally want to cluster RC as: bomb = {go off:s, explode:s, set off:o, destroy:s}; suspect = {set off:s}; target = {go off:p_in, destroy:o}. We want to cluster all subjects, objects, and prepositions. [sent-170, score-0.359]
59 Once labeled by type, we separately cluster the syntactic functions for each role type. [sent-214, score-0.341]
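One way to sketch the similarity used when clustering syntactic functions: represent each verb:dependency position by the argument heads observed in it and compare the resulting count vectors. This is a simplification standing in for the paper's combination of selectional preferences and coreferring arguments:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relation_similarity(args_a, args_b):
    """Score two syntactic functions (e.g., 'detonate:s' and 'set off:s')
    by the overlap of argument heads observed in those positions."""
    return cosine(Counter(args_a), Counter(args_b))
```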
60 Finally, since agglomerative clustering makes hard decisions, events related to a template may have been excluded in the initial event clustering stage. [sent-217, score-0.873]
61 To address this problem, we identify the 200 nearby events to each event cluster. [sent-218, score-0.295]
62 4 Template Evaluation We now compare our learned templates to those hand-created by human annotators for the MUC-4 terrorism corpus. [sent-225, score-0.461]
63 The corpus contains 6 template types. (Footnote 3: Physical objects are defined as non-person physical objects.) Figure 3: Slots (Victim, Perpetrator, Target, Instrument) in the hand-crafted MUC-4 templates (Bombing, Kidnap, Attack, Arson). [sent-226, score-0.479]
64 We thus only evaluate the 4 main templates (bombing, kidnapping, attack, and arson). [sent-228, score-0.287]
65 We evaluate the four learned templates that score highest in the document classification evaluation (to be described in section 5. [sent-230, score-0.482]
66 Of the four templates, we learned 12 of the 13 semantic roles as created for MUC. [sent-233, score-0.287]
67 We thus report 92% slot recall, and precision as 14 of 16 (88%) learned slots. [sent-237, score-0.27]
68 We only measure agreement with the MUC template schemas, but our system learns other events as well. [sent-238, score-0.524]
69 5 Information Extraction: Slot Filling We now present how to apply our learned templates to information extraction. [sent-240, score-0.403]
70 This section will describe how to extract slot fillers using our templates, but without knowing which templates are correct. [sent-241, score-0.632]
71 We consider each learned semantic role as a potential slot, and we extract slot fillers using the syntactic functions that were previously learned. [sent-244, score-0.618]
72 , the subject of release) serve the dual purpose of both inducing the template slots, and extracting appropriate slot fillers from text. [sent-247, score-0.641]
73 1 Document Classification A document is labeled for a template if two different conditions are met: (1) it contains at least one trigger phrase, and (2) its average per-token conditional probability meets a strict threshold. [sent-249, score-0.564]
74 Both conditions require a definition of the conditional probability of a template given a token. [sent-250, score-0.384]
75 P(t|w) = P_IRt(w) / Σ_s P_IRs(w)   (8), where P_IRt(w) is the probability of pattern w in the IR-corpus of template t. [sent-253, score-0.384]
76 P_IRt(w) = Ct(w) / Σ_v Ct(v)   (9), where Ct(w) is the number of times word w appears in the IR-corpus of template t. [sent-254, score-0.384]
77 A document is labeled with a template if it contains at least one trigger, and its average word probability is greater than a parameter optimized on the training set. [sent-269, score-0.515]
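Equations 8 and 9 together with the two labeling conditions can be sketched as follows; the trigger set and the threshold value are stand-ins (the paper optimizes the threshold on training data):

```python
def p_template_given_word(w, ir_corpora):
    """P(t|w): normalize w's relative frequency in each template's
    IR-corpus across all templates (Eqs. 8-9).  `ir_corpora` maps a
    template id to a Counter of word counts."""
    per_t = {}
    for t, counts in ir_corpora.items():
        total = sum(counts.values())
        per_t[t] = counts[w] / total if total else 0.0
    z = sum(per_t.values())
    return {t: (p / z if z else 0.0) for t, p in per_t.items()}

def label_document(tokens, t, ir_corpora, triggers, threshold):
    """Label a document with template t if (1) it contains a trigger
    phrase and (2) its average per-token P(t|w) clears the threshold."""
    if not any(w in triggers for w in tokens):
        return False
    avg = sum(p_template_given_word(w, ir_corpora).get(t, 0.0)
              for w in tokens) / len(tokens)
    return avg > threshold
```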
78 1 Experiment: Document Classification The MUC-4 corpus links templates to documents, allowing us to evaluate our document labels. [sent-275, score-0.366]
79 Our learned clusters naturally do not have MUC labels, so we report results on the four clusters that score highest with each label. [sent-277, score-0.25]
80 The bombing template performs best with an F1 score of . [sent-279, score-0.65]
81 2 Entity Extraction Once documents are labeled with templates, we next extract entities into the template slots. [sent-283, score-0.667]
82 The verb plant is in our learned bombing cluster, so step (1) will extract its passive subject bombs and map it to the correct instrument role (see figure 2). [sent-294, score-0.667]
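Step (1) of extraction can be sketched as a lookup from learned verb:relation patterns to roles; the "verb:relation" string format and the relation labels below are assumptions for illustration:

```python
def extract_fillers(dep_triples, role_patterns):
    """Walk (head, relation, argument) triples from a parsed document and
    emit a slot filler whenever head:relation matches one of the
    syntactic functions learned for a role."""
    fillers = []
    for head, rel, arg in dep_triples:
        role = role_patterns.get(f"{head}:{rel}")  # e.g., "plant:s_pass"
        if role:
            fillers.append((role, arg))
    return fillers
```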
83 We merge MUC’s two perpetrator slots (individuals and orgs) into one gold Perpetrator slot. [sent-307, score-0.403]
84 The standard evaluation for this corpus is to report the F1 score for slot type accuracy, ignoring the template type. [sent-312, score-0.538]
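The slot-type scoring can be sketched as micro-averaged F1 over (document, slot-type, entity) triples, with template type dropped from the key:

```python
def slot_f1(gold, predicted):
    """Micro P/R/F1 over (document, slot-type, entity) triples, ignoring
    template type as in the standard MUC-4 evaluation."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```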
85 For instance, a perpetrator of a bombing and a perpetrator of an attack are treated the same. [sent-313, score-0.852]
86 Figure 5 thus shows our results with previous work that is comparable: the fully supervised and weakly supervised systems of Patwardhan and Riloff. Figure 5: MUC-4 extraction, ignoring template type. [sent-322, score-0.384]
87 We give two numbers for our system: mapping one learned template to Attack, and mapping five. [sent-331, score-0.5]
88 Our learned templates for Attack have a different granularity than MUC-4. [sent-332, score-0.403]
89 We thus show results when we apply the best five learned templates to Attack, rather than just one. [sent-335, score-0.403]
90 Our precision is as good as (and our F1 score near) two algorithms that require knowledge of the templates and/or labeled data. [sent-338, score-0.339]
91 Instead of merging all slots across all template types, we score the slots within each template type. [sent-341, score-1.173]
92 This is a stricter evaluation than Section 6; for example, bombing victims assigned to attacks were previously deemed correct (see footnote 4). [sent-342, score-0.266]
93 Arson also unexpectedly ... (Footnote 4: We do not address the task of template instance identification (e. [sent-346, score-0.27]
94 Figure 7: Performance of each template type, but only evaluated on documents labeled with each type. [sent-355, score-0.27]
95 Most of the false positives in the system thus do not originate from the unlabeled documents (the 74 unlabeled), but rather from extracting incorrect entities from correctly identified documents (the 126 labeled). [sent-369, score-0.344]
96 We began by showing that domain knowledge isn’t necessarily required; we learned the MUC-4 template structure with surprising accuracy, learning new semantic roles and several new template structures. [sent-371, score-1.097]
97 It is possible to take these learned slots and use a previous approach to IE (such as seed-based bootstrapping), but we presented an algorithm that instead uses our learned syntactic patterns. [sent-373, score-0.455]
98 The extraction results are encouraging, but the template induction itself is a central contribution of this work. [sent-376, score-0.485]
99 We learned more templates than just the main MUC-4 templates. [sent-380, score-0.403]
100 We believe the IR parameters are quite robust, and did not heavily focus on improving this stage, but the two clustering steps during template induction require parameters to control stopping conditions and word filtering. [sent-387, score-0.497]
wordName wordTfidf (topN-words)
[('template', 0.384), ('templates', 0.287), ('bombing', 0.266), ('perpetrator', 0.219), ('slots', 0.184), ('event', 0.155), ('slot', 0.154), ('attack', 0.148), ('kidnap', 0.142), ('events', 0.14), ('cluster', 0.135), ('roles', 0.125), ('arson', 0.124), ('learned', 0.116), ('documents', 0.115), ('schemas', 0.111), ('explode', 0.109), ('instrument', 0.106), ('fillers', 0.103), ('ie', 0.1), ('physical', 0.095), ('destroy', 0.094), ('bomb', 0.089), ('detonate', 0.089), ('coreferring', 0.086), ('chambers', 0.086), ('document', 0.079), ('arguments', 0.078), ('role', 0.076), ('clustering', 0.074), ('bombings', 0.071), ('sps', 0.071), ('object', 0.071), ('entities', 0.071), ('patwardhan', 0.068), ('narrative', 0.068), ('clusters', 0.067), ('riloff', 0.067), ('patterns', 0.066), ('docs', 0.062), ('extraction', 0.062), ('terrorism', 0.058), ('plant', 0.058), ('relations', 0.053), ('blown', 0.053), ('exploded', 0.053), ('kasch', 0.053), ('raids', 0.053), ('victim', 0.053), ('labeled', 0.052), ('police', 0.051), ('muc', 0.049), ('learn', 0.049), ('trigger', 0.049), ('semantic', 0.046), ('agglomerative', 0.046), ('extract', 0.045), ('lda', 0.045), ('seed', 0.044), ('wj', 0.044), ('induce', 0.044), ('go', 0.044), ('maslennikov', 0.043), ('knowing', 0.043), ('unlabeled', 0.043), ('filling', 0.043), ('domain', 0.042), ('retrieved', 0.041), ('scripts', 0.041), ('nathanael', 0.041), ('chua', 0.041), ('characterize', 0.04), ('induced', 0.04), ('functions', 0.039), ('syntactic', 0.039), ('similarity', 0.039), ('induction', 0.039), ('banko', 0.039), ('fill', 0.039), ('selectional', 0.039), ('merging', 0.037), ('message', 0.036), ('clustered', 0.036), ('weakly', 0.036), ('ambushes', 0.035), ('authorizes', 0.035), ('avgm', 0.035), ('blamed', 0.035), ('blows', 0.035), ('cdist', 0.035), ('defused', 0.035), ('detained', 0.035), ('detonates', 0.035), ('embassy', 0.035), ('explosives', 0.035), ('foreknowledge', 0.035), ('horna', 0.035), ('pdist', 0.035), ('pirt', 0.035), ('schank', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 293 acl-2011-Template-Based Information Extraction without the Templates
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to handcreated gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
2 0.27848059 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
Author: Ruihong Huang ; Ellen Riloff
Abstract: The goal of our research is to improve event extraction by learning to identify secondary role filler contexts in the absence of event keywords. We propose a multilayered event extraction architecture that progressively “zooms in” on relevant information. Our extraction model includes a document genre classifier to recognize event narratives, two types of sentence classifiers, and noun phrase classifiers to extract role fillers. These modules are organized as a pipeline to gradually zero in on event-related information. We present results on the MUC-4 event extraction data set and show that this model performs better than previous systems.
3 0.25828654 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
Author: Shasha Liao ; Ralph Grishman
Abstract: Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference. 1
4 0.23000513 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction
Author: Yu Hong ; Jianfeng Zhang ; Bin Ma ; Jianmin Yao ; Guodong Zhou ; Qiaoming Zhu
Abstract: Event extraction is the task of detecting certain specified types of events that are mentioned in the source language data. The state-of-the-art research on the task is transductive inference (e.g. cross-event inference). In this paper, we propose a new method of event extraction by well using cross-entity inference. In contrast to previous inference methods, we regard entitytype consistency as key feature to predict event mentions. We adopt this inference method to improve the traditional sentence-level event extraction system. Experiments show that we can get 8.6% gain in trigger (event) identification, and more than 11.8% gain for argument (role) classification in ACE event extraction. 1
5 0.19261345 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
6 0.18241069 122 acl-2011-Event Extraction as Dependency Parsing
7 0.15346737 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
8 0.13308933 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
9 0.12284269 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application
10 0.11643475 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
11 0.11617176 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
12 0.11022758 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
13 0.095194079 216 acl-2011-MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles
14 0.09131135 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
15 0.084123418 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
16 0.084107287 117 acl-2011-Entity Set Expansion using Topic information
17 0.083343655 121 acl-2011-Event Discovery in Social Media Feeds
18 0.082670063 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
19 0.079504982 175 acl-2011-Integrating history-length interpolation and classes in language modeling
20 0.078867577 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
topicId topicWeight
[(0, 0.223), (1, 0.11), (2, -0.229), (3, 0.016), (4, 0.237), (5, 0.111), (6, -0.057), (7, -0.004), (8, 0.124), (9, -0.0), (10, 0.02), (11, -0.022), (12, 0.036), (13, 0.051), (14, -0.005), (15, -0.042), (16, -0.034), (17, -0.014), (18, 0.011), (19, 0.004), (20, -0.032), (21, 0.013), (22, -0.003), (23, -0.054), (24, 0.031), (25, -0.016), (26, 0.006), (27, 0.013), (28, 0.054), (29, -0.038), (30, -0.036), (31, -0.062), (32, 0.0), (33, -0.064), (34, -0.007), (35, 0.005), (36, 0.026), (37, 0.001), (38, 0.014), (39, 0.04), (40, -0.024), (41, 0.007), (42, -0.05), (43, 0.071), (44, 0.079), (45, 0.008), (46, 0.014), (47, -0.004), (48, -0.011), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.94902152 293 acl-2011-Template-Based Information Extraction without the Templates
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to handcreated gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
2 0.89285058 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
Author: Shasha Liao ; Ralph Grishman
Abstract: Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference. 1
3 0.8192091 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
Author: Ruihong Huang ; Ellen Riloff
Abstract: The goal of our research is to improve event extraction by learning to identify secondary role filler contexts in the absence of event keywords. We propose a multilayered event extraction architecture that progressively “zooms in” on relevant information. Our extraction model includes a document genre classifier to recognize event narratives, two types of sentence classifiers, and noun phrase classifiers to extract role fillers. These modules are organized as a pipeline to gradually zero in on event-related information. We present results on the MUC-4 event extraction data set and show that this model performs better than previous systems.
4 0.78725237 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction
Author: Yu Hong ; Jianfeng Zhang ; Bin Ma ; Jianmin Yao ; Guodong Zhou ; Qiaoming Zhu
Abstract: Event extraction is the task of detecting certain specified types of events that are mentioned in the source language data. The state-of-the-art research on the task is transductive inference (e.g. cross-event inference). In this paper, we propose a new method of event extraction by well using cross-entity inference. In contrast to previous inference methods, we regard entitytype consistency as key feature to predict event mentions. We adopt this inference method to improve the traditional sentence-level event extraction system. Experiments show that we can get 8.6% gain in trigger (event) identification, and more than 11.8% gain for argument (role) classification in ACE event extraction. 1
5 0.75076318 122 acl-2011-Event Extraction as Dependency Parsing
Author: David McClosky ; Mihai Surdeanu ; Christopher Manning
Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause a “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with a F1 score of 53.5% in development and 48.6% in testing.
6 0.64922345 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
7 0.58849603 121 acl-2011-Event Discovery in Social Media Feeds
8 0.57673591 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
9 0.57382733 68 acl-2011-Classifying arguments by scheme
10 0.54769087 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
11 0.54203129 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
12 0.51592255 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
13 0.51397657 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
14 0.50555116 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
15 0.50145876 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
16 0.4967075 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
17 0.48678356 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application
18 0.47190434 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
19 0.46623072 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
20 0.46300554 174 acl-2011-Insights from Network Structure for Text Mining
topicId topicWeight
[(5, 0.015), (17, 0.041), (26, 0.013), (31, 0.017), (37, 0.1), (39, 0.028), (41, 0.056), (55, 0.017), (59, 0.451), (72, 0.027), (91, 0.038), (96, 0.095)]
simIndex simValue paperId paperTitle
1 0.92200762 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
Author: Eduardo Blanco ; Dan Moldovan
Abstract: This paper presents an unsupervised method for deriving inference axioms by composing semantic relations. The method is independent of any particular relation inventory. It relies on describing semantic relations using primitives and manipulating these primitives according to an algebra. The method was tested using a set of eight semantic relations yielding 78 inference axioms which were evaluated over PropBank.
2 0.88087064 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
Author: Oscar Tackstrom ; Ryan McDonald
Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis, an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exists naturally, and thus requires labor-intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative than fine-grained supervision; however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account, and models that use latent variables to learn unobserved phenomena from that which can be observed.
Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarse-grained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction, and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Täckström and McDonald (2011) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines by quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to predict well only the two dominant fine-grained sentiment categories for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models.
[Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences s_i are always observed. Note that there are no factors connecting the document node, y^d, with the input nodes, s, so that the sentence-level variables, y^s, in effect form a bottleneck between the document sentiment and the input sentences.]
Contrary to (generative) topic models (Mei et al., 2007; Titov and McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient-based estimation. The former models are largely orthogonal to the one we propose in this work, and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task-specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained task-specific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context-independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model. 1.1 Preliminaries Let d be a document consisting of n sentences, s = (s_i)_{i=1}^n, with a document–sentence-sequence pair denoted d = (d, s).
Let y_d = (y^d, y^s) denote random variables[1] for the document-level sentiment, y^d, and the sequence of sentence-level sentiments, y^s = (y^s_i)_{i=1}^n. In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, D_F = {(d_j, y_{d_j})}_{j=1}^{m_f}, and a large set of coarsely labeled instances, D_C = {(d_j, y^d_j)}_{j=m_f+1}^{m_f+m_c}. Furthermore, we assume that y^d and all y^s_i take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization
p_θ(y^d, y^s | s) = exp{⟨φ(y^d, y^s, s), θ⟩ − A_θ(s)}
[1] We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments.
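The exponential-family parametrization above can be illustrated with a small brute-force sketch that enumerates all joint assignments to compute the log-partition function A_θ(s). The feature function, weights, and toy data below are invented for the example and are not taken from the paper:

```python
import itertools
import math

# Toy illustration of p_theta(y^d, y^s | s) =
#   exp{ <phi(y^d, y^s, s), theta> - A_theta(s) }.
# Features and weights here are hypothetical.

LABELS = ["POS", "NEG", "NEU"]

def phi(y_d, y_s, s):
    # Sparse features: document label, (document, sentence) label pairs,
    # and sentence-label/word co-occurrences.
    feats = {}
    feats[("doc", y_d)] = 1.0
    for y_i, sent in zip(y_s, s):
        key = ("doc-sent", y_d, y_i)
        feats[key] = feats.get(key, 0.0) + 1.0
        for w in sent:
            key = ("word", y_i, w)
            feats[key] = feats.get(key, 0.0) + 1.0
    return feats

def score(y_d, y_s, s, theta):
    # Inner product <phi(y^d, y^s, s), theta> over the sparse features.
    return sum(v * theta.get(k, 0.0) for k, v in phi(y_d, y_s, s).items())

def log_partition(s, theta):
    # A_theta(s): log-sum-exp over all joint assignments (brute force).
    scores = [score(y_d, y_s, s, theta)
              for y_d in LABELS
              for y_s in itertools.product(LABELS, repeat=len(s))]
    m = max(scores)
    return m + math.log(sum(math.exp(t - m) for t in scores))

def prob(y_d, y_s, s, theta):
    return math.exp(score(y_d, y_s, s, theta) - log_partition(s, theta))

# Toy document: two one-word sentences, with weights favoring
# "good" as positive and "bad" as negative.
toy_s = [["good"], ["bad"]]
toy_theta = {("word", "POS", "good"): 2.0, ("word", "NEG", "bad"): 2.0}
```

In the actual model, the sentence-level variables y^s are latent for the coarsely labeled set D_C, so training marginalizes over them rather than enumerating joint assignments explicitly; the sketch only shows how the conditional probability is normalized.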
same-paper 3 0.87349021 293 acl-2011-Template-Based Information Extraction without the Templates
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to handcreated gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
4 0.87210542 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?
Author: Mariet Theune ; Ruud Koolen ; Emiel Krahmer ; Sander Wubben
Abstract: In this paper we investigate how much data is required to train an algorithm for attribute selection, a subtask of Referring Expressions Generation (REG). To enable comparison between different-sized training sets, a systematic training method was developed. The results show that depending on the complexity of the domain, training on 10 to 20 items may already lead to a good performance.
5 0.81810153 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
Author: Dirk Hovy ; Ashish Vaswani ; Stephen Tratz ; David Chiang ; Eduard Hovy
Abstract: We present a preliminary study on unsupervised preposition sense disambiguation (PSD), comparing different models and training techniques (EM, MAP-EM with L0 norm, Bayesian inference using Gibbs sampling). To our knowledge, this is the first attempt at unsupervised preposition sense disambiguation. Our best accuracy reaches 56%, a significant improvement (at p <.001) of 16% over the most-frequent-sense baseline.
6 0.78726393 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
7 0.70900035 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
8 0.62305468 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
9 0.61717951 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
10 0.58292764 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
11 0.57602185 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality
12 0.57539594 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
13 0.55013722 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
14 0.54387677 167 acl-2011-Improving Dependency Parsing with Semantic Classes
15 0.5403887 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
16 0.53302848 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
17 0.53226507 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
18 0.53183204 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
19 0.53122902 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
20 0.5294562 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters