acl acl2012 acl2012-33 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wei Lu ; Dan Roth
Abstract: This paper presents a novel sequence labeling model based on the latent-variable semi-Markov conditional random fields for jointly extracting argument roles of events from texts. The model takes in coarse mention and type information and predicts argument roles for a given event template. This paper addresses the event extraction problem in a primarily unsupervised setting, where no labeled training instances are available. Our key contribution is a novel learning framework called structured preference modeling (PM), that allows arbitrary preference to be assigned to certain structures during the learning procedure. We establish and discuss connections between this framework and other existing works. We show empirically that the structured preferences are crucial to the success of our task. Our model, trained without annotated data and with a small number of structured preferences, yields performance competitive to some baseline supervised approaches.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper presents a novel sequence labeling model based on the latent-variable semi-Markov conditional random fields for jointly extracting argument roles of events from texts. [sent-2, score-0.393]
2 The model takes in coarse mention and type information and predicts argument roles for a given event template. [sent-3, score-0.898]
3 This paper addresses the event extraction problem in a primarily unsupervised setting, where no labeled training instances are available. [sent-4, score-0.55]
4 Our key contribution is a novel learning framework called structured preference modeling (PM), that allows arbitrary preference to be assigned to certain structures during the learning procedure. [sent-5, score-1.283]
5 We show empirically that the structured preferences are crucial to the success of our task. [sent-7, score-0.301]
6 1 Introduction Automatic template-filling-based event extraction is an important and challenging task. [sent-9, score-0.4]
7 Each row shows an argument for the event, together with a set of its acceptable mention types, where the type specifies a high-level semantic class a mention belongs to. [sent-15, score-0.742]
8 One typical assumption is that certain coarse mention-level information, such as mention boundaries and their semantic class (a. [sent-19, score-0.428]
9 However, in practice, outputs from existing mention identification and typing systems can be far from ideal. [sent-31, score-0.381]
10 Instead of obtaining the above ideal annotation, one might observe the following noisy and ambiguous annotation for the given event span: O[f. [sent-32, score-0.352]
11 [...] mentions in an event span and assign them corresponding argument information, given such coarse [...] (Proceedings of the 50th Annual Meeting of the A[...], Jeju, Republic of Korea, 8-14 July 2012.) [sent-36, score-0.862]
12 [...] and the correct event template annotation for the example event span given in Sec 1 (right). [sent-40, score-0.911]
13 This motivates us to build a novel latent-variable semi-Markov conditional random fields model (Sarawagi and Cohen, 2004) for such an event extraction task. [sent-44, score-0.564]
14 The learned model takes in coarse information as produced by existing mention identification and typing modules, and jointly outputs selected mentions and their corresponding argument roles. [sent-45, score-0.759]
15 We propose a novel general learning framework called structured preference modeling (or preference modeling, PM), which encompasses both the fully supervised and the latent-variable conditional models as special cases. [sent-47, score-1.229]
16 The framework allows arbitrary declarative structured preference knowledge to be introduced to guide the learning procedure in a primarily unsupervised setting. [sent-48, score-0.915]
17 We present our semi-Markov model and discuss our preference modeling framework in Sections 2 and 3, respectively. [sent-49, score-0.558]
18 Finally, we demonstrate through experiments that structured preference information is crucial to the model, and present empirical results on a standard dataset in Section 5. [sent-51, score-0.597]
19 [Figure caption fragment: "... Markov CRF, under a specific segm..."] In a supervised setting, only correct arguments are observed, but their associated correct mention types are hidden (shaded). [sent-59, score-0.386]
20 This motivates us to build a joint model for extracting the event structures from the text. [sent-61, score-0.413]
21 Cn refer to a particular segmentation of the event span, where C1, C3 . [sent-66, score-0.386]
22 correspond to in-between mention word sequences (we call them gaps) (e. [sent-74, score-0.258]
23 refer to event arguments that carry specific roles (e. [sent-86, score-0.447]
24 The event span is split into segments, where each segment is either linked to a mention type (Ti; these segments can be referred to as “argument segments”), or directly linked to an inter-argument gap (Bj ; they can be referred to as “gap segments”). [sent-93, score-1.126]
25 In the figure, for example, the segments C1 and C3 are identified as two argument segments (which are mentions of types T1 and T3 respectively) and are mapped to two “nodes”, and the segment C2 is identified as a gap segment that connects the two arguments A1 and A3. [sent-95, score-0.969]
26 We use s to denote an event span and t to denote a specific realization (filling) of the event template. [sent-97, score-0.899]
27 Denote by h a particular mention boundary and type assignment for an event span, which gives us a specific segmentation of the given span. [sent-99, score-0.733]
28 Following the conditional [...] (Footnote 1: Extending the model to support certain argument overlapping is possible; we leave it for future work.) [sent-100, score-0.299]
29 , 2001), we parameterize the conditional probability of the (t, h) pair given an event span s as follows: $P_\Theta(t, h \mid s) = \frac{\exp(f(s, h, t) \cdot \Theta)}{\sum_{t', h'} \exp(f(s, h', t') \cdot \Theta)}$ (1) where f gives the feature functions defined on the tuple (s, h, t), and Θ defines the parameter vector. [sent-102, score-0.679]
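To make the log-linear form of Equation 1 concrete, here is a minimal sketch that normalizes exponentiated feature scores over an explicitly enumerated set of (t, h) pairs. The feature names, weights, and candidate pairs are hypothetical illustrations, and brute-force enumeration stands in for the lattice-based computation a real system would need.

```python
import math

def dot(features, theta):
    """Sparse dot product f(s, h, t) · Θ over named features."""
    return sum(value * theta.get(name, 0.0) for name, value in features.items())

def conditional_probs(candidates, theta):
    """Return P(t, h | s) for every candidate (t, h) pair of one event span.

    `candidates` maps each (t, h) pair to its sparse feature dict.
    """
    scores = {pair: dot(feats, theta) for pair, feats in candidates.items()}
    shift = max(scores.values())  # subtract the max for numerical stability
    unnorm = {pair: math.exp(score - shift) for pair, score in scores.items()}
    z = sum(unnorm.values())      # partition function over all (t, h) pairs
    return {pair: value / z for pair, value in unnorm.items()}

# Hypothetical toy span with three candidate (t, h) pairs and made-up weights.
theta = {"ARGTYPE=ATTACKER:PER": 1.5, "ARGTYPE=ATTACKER:ORG": 0.5}
candidates = {
    ("t1", "h1"): {"ARGTYPE=ATTACKER:PER": 1.0},
    ("t1", "h2"): {"ARGTYPE=ATTACKER:ORG": 1.0},
    ("t2", "h1"): {},
}
probs = conditional_probs(candidates, theta)
```

The resulting values sum to one, and pairs with higher feature scores receive more probability mass.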
30 Our objective function is the logarithm of the joint conditional probability of observing the template realization for the observed event span s: $L(\Theta) = \sum_i \log P_\Theta(t_i \mid s_i) = \sum_i \log \frac{\sum_h \exp(f(s_i, h, t_i) \cdot \Theta)}{\sum_{t', h'} \exp(f(s_i, h', t') \cdot \Theta)}$ (2) This function is not convex due to the summation over the hidden variable h. [sent-103, score-0.988]
31 Inference involves computing the most probable template realization t for a given event span: $\arg\max_t P_\Theta(t \mid s) = \arg\max_t \sum_h P_\Theta(t, h \mid s)$ (4) where the possible hidden assignments h need to be marginalized out. [sent-111, score-0.523]
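The marginalization over h in the inference rule can be sketched as follows. The joint probabilities are made-up numbers chosen so that the best single (t, h) pair and the best marginal t disagree, which is why summing over h matters; the names `joint` and `most_probable_template` are illustrative only.

```python
from collections import defaultdict

def most_probable_template(joint):
    """Sum P(t, h | s) over h for each t, then return the best t and all marginals."""
    marginal = defaultdict(float)
    for (t, _h), p in joint.items():
        marginal[t] += p
    best = max(marginal, key=marginal.get)
    return best, dict(marginal)

# Hypothetical joint distribution over (t, h) pairs for one event span.
joint = {("t1", "h1"): 0.30, ("t1", "h2"): 0.25, ("t2", "h1"): 0.45}
best, marginal = most_probable_template(joint)
```

Here the single most probable pair belongs to `t2`, yet `t1` wins once its two hidden assignments are added together.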
32 In this task, a particular realization t already uniquely defines a particular segmentation (mention boundaries) of the event span, thus the h only contributes type information to t. [sent-112, score-0.532]
33 Since one primary assumption is that we have access to the output of existing mention identification and typing systems, the set of all possible mentions defines a lattice representation containing the set of all possible segmentations that comply with such mention-level information. [sent-118, score-0.554]
34 Assuming there are A possible arguments for the event and K annotated mentions, the complexity of the forward-backward style algorithm is $O(A^3 K^2)$ under the “second-order” setting that we will discuss in Section 2. [sent-119, score-0.49]
35 Our model will need to disambiguate the mention boundaries as well as their types. [sent-126, score-0.309]
36 Since we are only interested in modeling dependencies between adjacent argument segments, we assign hard labels to each gap segment based on its contextual argument information. [sent-130, score-0.629]
37 Specifically, the label of each gap segment is uniquely determined by its surrounding argument segments with a list representation. [sent-131, score-0.522]
38 For example, in a “first-order” setting, the gap segment that appears between its previous argument segment “ATTACKER” and its next argument segment “INSTRUMENT” is annotated as the list consisting of two elements: [ATTACKER, INSTRUMENT]. [sent-132, score-0.735]
39 To capture longer-range dependencies, in this work we use a “second-order” setting (as shown in Figure 2), (Footnote 2: The length of a gap segment is arbitrary (including zero), unlike in the seminal semi-Markov CRF model of Sarawagi and Cohen (2004).) [sent-133, score-0.258]
40 which means each gap segment is annotated with a list that consists of its previous two argument segments as well as its subsequent one. [sent-134, score-0.522]
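The gap-labeling rule for the first-order and second-order settings can be sketched as below. The `NONE` padding for gaps that lack enough left context and the `TARGET` role are assumptions made for illustration; the text does not say how boundary gaps are handled.

```python
def gap_labels(arguments, order=2, pad="NONE"):
    """Label each gap between consecutive argument segments.

    The label is the list of the `order` preceding argument roles followed by
    the next argument role. Missing left context is padded (an assumption).
    """
    labels = []
    for i in range(len(arguments) - 1):
        left = list(arguments[max(0, i - order + 1): i + 1])
        left = [pad] * (order - len(left)) + left
        labels.append(left + [arguments[i + 1]])
    return labels

# Hypothetical argument sequence for one event span.
roles = ["ATTACKER", "INSTRUMENT", "TARGET"]
first_order = gap_labels(roles, order=1)
second_order = gap_labels(roles, order=2)
```

With `order=1` the first gap is labeled `["ATTACKER", "INSTRUMENT"]`, matching the first-order example in the text; with `order=2` each label carries one more argument of left context.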
41 Indicator function for the combination of its immediate two left arguments and its immediate right argument. [sent-139, score-0.281]
42 For argument segments, we also define the same input feature templates as above, with the following additional ones to capture contextual information: CWORDS : CPOS : Indicator function for the previous and next k (= 1, 2, 3) words. [sent-140, score-0.321]
43 and we define the following output feature template: ARGTYPE: Indicator function for the combination of the argument and its associated type. [sent-142, score-0.284]
44 We introduce a novel general learning framework called structured preference modeling, which allows arbitrary prior knowledge about structures to be introduced to the learning process in a declarative manner. [sent-150, score-0.826]
45 [...] the following objective function: $L_u(\Theta) = \sum_i \log \frac{\sum_y p_\Theta(x_i, y) \times \kappa(x_i, y)}{\sum_y p_\Theta(x_i, y)}$ (5) Intuitively, optimizing such an objective function is equivalent to pushing the probability mass from bad structures to good structures corresponding to the same input. [sent-166, score-0.47]
46 When the preference function κ is defined as the indicator function for the correct structure (xi, yi), the numerator terms of the above formula are simply of the form pΘ(xi, yi), and the model corresponds to the fully supervised CRF model. [sent-167, score-0.842]
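The supervised special case can be checked numerically with a small sketch. It computes one term of the preference objective for a single input by enumerating candidate structures, taking p proportional to the exponentiated model score (any normalizer cancels in the ratio). The structure names and scores are hypothetical.

```python
import math

def pm_objective_term(scores, kappa):
    """log( sum_y p(x, y) * kappa(y) / sum_y p(x, y) ) for one input x.

    `scores` maps each candidate structure y to its model score;
    `kappa` returns a non-negative preference weight for y.
    """
    shift = max(scores.values())  # stabilizes exp; cancels in the ratio
    unnorm = {y: math.exp(s - shift) for y, s in scores.items()}
    numer = sum(u * kappa(y) for y, u in unnorm.items())
    denom = sum(unnorm.values())
    return math.log(numer / denom)

# Hypothetical candidate structures and scores for one input.
scores = {"y_gold": 0.0, "y_other1": 1.0, "y_other2": 1.0}

# Indicator preference: only the gold structure is preferred.
supervised_term = pm_objective_term(scores, lambda y: 1.0 if y == "y_gold" else 0.0)

# The same quantity computed as an ordinary conditional log-likelihood.
log_z = math.log(sum(math.exp(s) for s in scores.values()))
crf_log_likelihood = scores["y_gold"] - log_z
```

The two values agree, and a preference function that is constant at 1 gives a term of zero, so an uninformative κ provides no learning signal.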
47 The preference function κ serves as a source from which certain prior knowledge about the structure can be injected into our model in a principled way. [sent-176, score-0.618]
48 This allows us to incorporate both local and arbitrary global structured information into the preference function. [sent-178, score-0.65]
49 The preference function κ is defined at the complete structure level. [sent-182, score-0.616]
50 In this work, we exploit a specific form of the preference function κ. [sent-185, score-0.522]
51 We show some actual κp functions used for a particular event in Section 5. [sent-191, score-0.403]
52 3 Event Extraction Now we can obtain the objective function for our event extraction task. [sent-195, score-0.571]
53 Constraints Note that the objective function in Equation 5, if written in the additive form, leads to a cost function reminiscent of the one used in the constraint-driven learning algorithm (CoDL) (Chang et al. [sent-200, score-0.273]
54 Specifically, in CoDL, the following cost function is involved in its EM-like inference procedure: $\arg\max_y \; \Theta \cdot f(x, y) - \rho \sum_c d(y, Y_c)$ (14) where $Y_c$ defines the set of y’s that all satisfy a certain constraint c, and d defines a distance function from y to that set. [sent-203, score-0.282]
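A penalized argmax of this shape can be sketched by brute force over a handful of candidates. The constraint, its distance function (a count of surplus labels), the candidate scores, and the value of ρ are all hypothetical; this illustrates the cost function only, not the CoDL training procedure.

```python
def surplus_attackers(y):
    """Hypothetical distance to the set of structures with at most one ATTACKER."""
    return max(0, y.count("ATTACKER") - 1)

def penalized_score(model_score, y, distances, rho):
    """Model score minus rho times the summed constraint distances."""
    return model_score - rho * sum(d(y) for d in distances)

def best_structure(candidates, distances, rho):
    """Return the candidate y that maximizes the penalized score."""
    return max(candidates,
               key=lambda y: penalized_score(candidates[y], y, distances, rho))

# Hypothetical candidates: role tuple -> model score Θ · f(x, y).
candidates = {("ATTACKER", "ATTACKER"): 3.0, ("ATTACKER", "NONE"): 2.5}
distances = [surplus_attackers]

with_penalty = best_structure(candidates, distances, rho=1.0)
without_penalty = best_structure(candidates, distances, rho=0.0)
```

With the penalty switched on, the structure that violates the constraint loses to the lower-scoring one that satisfies it.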
55 There are some important distinctions between structured preference modeling (PM) and CoDL. [sent-205, score-0.657]
56 Constraints are typically useful when one works on structured prediction problems for data with certain (often rigid) regularities, such as citations, advertisements, or POS tagging for complete sentences. [sent-208, score-0.289]
57 For example, there is no guarantee that a certain argument will always be present in the event span, nor should a particular mention, if it appears, always be selected and assigned to a specific argument. [sent-217, score-0.591]
58 For example, in the example event span given in Section 1, both “March” and “Tuesday” are valid candidate mentions for the TIME-WITHIN argument given their annotated type TME. [sent-218, score-0.844]
59 In this work, our preference function is related to another function that can be decomposed into a collection of property functions κp. [sent-222, score-0.675]
60 This formulation gives us complete flexibility to assign arbitrary structured preferences, where positive scores can be assigned to good properties, and negative scores to bad ones. [sent-224, score-0.398]
61 To summarize, preferences are an effective way to “define” the event structure to the learner. This is essential in an unsupervised setting and may not be easy to do with other forms of constraints. [sent-226, score-0.584]
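One way to picture a preference built from signed property scores is sketched below. The text here only says the preference function is related to a decomposable function of property functions κp; exponentiating their sum (which keeps the preference positive) is an assumption of this sketch, and both property functions are invented for illustration.

```python
import math

def mentions_time(structure):
    """Hypothetical good property: reward structures that fill TIME-WITHIN."""
    return 1.0 if "TIME-WITHIN" in structure else 0.0

def repeats_a_role(structure):
    """Hypothetical bad property: penalize assigning the same role twice."""
    roles = [role for role in structure if role != "NONE"]
    return -2.0 if len(roles) != len(set(roles)) else 0.0

def preference(structure, properties):
    """Assumed link: kappa = exp(sum of property scores), always positive."""
    return math.exp(sum(prop(structure) for prop in properties))

properties = [mentions_time, repeats_a_role]
good = preference(("ATTACKER", "TIME-WITHIN"), properties)
neutral = preference(("ATTACKER", "NONE"), properties)
bad = preference(("ATTACKER", "ATTACKER"), properties)
```

Structures with good properties get a preference above 1, neutral ones exactly 1, and structures with bad properties fall below 1.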
62 To present general results while making minimal assumptions, our primary event extraction results 3http://www. [sent-232, score-0.4]
63 Table 1: Performance for different events under different experimental settings, with gold mention boundaries and types. [sent-244, score-0.356]
64 ) are independent of mention identification and typing modules, which are based on the gold mention information as given by the dataset. [sent-247, score-0.639]
65 Additionally, we present results obtained by exploiting our in-house automatic mention identification and typing module, which is a hybrid system that combines statistical and rule-based approaches. [sent-248, score-0.381]
66 In these approaches, we treat each argument of the template as one possible output class, plus a special “NONE” class for not selecting it as an argument. [sent-255, score-0.257]
67 We train and apply the classifiers on argument segments (i. [sent-256, score-0.317]
68 In the simplest baseline approach MaxEnt-b, type information for each mention is simply treated as one special feature. [sent-260, score-0.302]
69 To assess the importance of structured preference, we also perform experiments where structured preference information is incorporated at the inference time of the MaxEnt classifiers. [sent-265, score-0.774]
70 Next, we re-rank this list based on scores from our structured preference functions (we use the same preferences as those discussed in the next section). [sent-267, score-0.772]
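Re-ranking an n-best list with preference scores can be sketched as follows. How the preference score and the classifier score are combined is not stated in the text; sorting by preference score first and breaking ties by model score is an assumption, and the n-best entries and the preference function are hypothetical.

```python
def rerank(nbest, preference_score):
    """Sort (structure, model_score) pairs by preference score, then model score."""
    return sorted(nbest,
                  key=lambda item: (preference_score(item[0]), item[1]),
                  reverse=True)

def no_repeated_roles(structure):
    """Hypothetical preference: penalize assigning the same role twice."""
    roles = [role for role in structure if role != "NONE"]
    return -1.0 if len(roles) != len(set(roles)) else 0.0

# Hypothetical n-best list from a classifier: (role tuple, model score).
nbest = [
    (("ATTACKER", "ATTACKER"), 0.6),
    (("ATTACKER", "INSTRUMENT"), 0.3),
    (("ATTACKER", "NONE"), 0.1),
]
reranked = rerank(nbest, no_repeated_roles)
```

The top classifier output repeats a role, so it drops to the bottom of the list after re-ranking, while the remaining candidates keep their model-score order.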
71 Note that no structured preference information is used when training and evaluating our semi-CRF model. [sent-270, score-0.597]
72 This clearly indicates that structured preference information is crucial to the model. [sent-273, score-0.597]
73 We first build our simplest baseline by randomly assigning arguments to each mention with mention type information serving as constraints. [sent-276, score-0.655]
74 Figure 3: The complete list of preference patterns used for the “Die” and “Transport” events. [sent-279, score-0.517]
75 However, to demonstrate its general effectiveness, in this work we only choose a minimal amount of general preference patterns for evaluations. [sent-293, score-0.462]
76 We make our preference patterns as general as possible. [sent-294, score-0.462]
77 As shown in the last column (#P) of Table 2, we use only 7 preference patterns each for the “Attack” and “Meet” events, and 6 patterns each for the other two events. [sent-295, score-0.545]
78 In Figure 3, we show the complete list of the 6 preference patterns for the “Die” and “Transport” events used for our experiments. [sent-296, score-0.869]
79 On the other hand, a completely unsupervised approach where structured preferences are not specified, performs substantially worse. [sent-302, score-0.37]
80 To run such completely unsupervised models, we essentially follow the same training procedure as that of the preference modeling, except that structured preference information is not in place when generating the n-best list. [sent-303, score-1.086]
81 As a result, the unsupervised model without preference information can even perform worse than the random baseline. [sent-307, score-0.535]
82 Such an approach performs worse than our approach with preference modeling. [sent-311, score-0.42]
83 However, we also note that the performance of preference modeling depends on the actual quality and amount of preferences used for learning. [sent-314, score-0.604]
84 In the extreme case, where only a few preferences are used, the performance of preference modeling will be close to that of the unsupervised approach, while the rule-based approach will yield performance close to that of the random baseline. [sent-315, score-0.719]
85 The results with automatically predicted mention boundaries and types are given in Table 3. [sent-316, score-0.309]
86 Similar observations can be made when comparing the performance of preference modeling with other approaches. [sent-317, score-0.48]
87 This set of results further confirms the effectiveness of our approach using preference modeling for the event extraction task. [sent-318, score-0.88]
88 Table 3: Event extraction performance with automatic mention identifier and typer. [sent-329, score-0.306]
89 We report F1 percentage scores for preference modeling (PM) as well as two baseline approaches. [sent-330, score-0.48]
90 Contrastive estimation (CE) (Smith and Eisner, 2005a) is another log-linear framework for primarily unsupervised structured prediction. [sent-342, score-0.362]
91 , 2010) is another recently proposed framework for unsupervised structured prediction. [sent-352, score-0.281]
92 Empirically the model is effective in various unsupervised structured prediction tasks, and outperforms the globally normalized model. [sent-354, score-0.246]
93 Although modeling the semi-Markov properties of our segments (especially the gap segments) in our task is potentially challenging, we plan to investigate in the future the feasibility for our task with such a framework. [sent-355, score-0.317]
94 7 Conclusions In this paper, we present a novel model based on the semi-Markov conditional random fields for the challenging event extraction task. [sent-356, score-0.564]
95 The model takes in coarse mention boundary and type information and predicts complete structures indicating the corresponding argument role for each mention. [sent-357, score-0.662]
96 To learn the model in an unsupervised manner, we further develop a novel learning approach called structured preference modeling that allows structured knowledge to be incorporated effectively in a declarative manner. [sent-358, score-0.983]
97 Empirically, we show that knowledge about structured preferences is crucial to the model and that preference modeling is an effective way to guide learning in this setting. [sent-359, score-1.077]
98 Trained in a primarily unsupervised manner, our model incorporating structured preference information exhibits performance that is competitive to that of some supervised baseline approaches. [sent-360, score-0.804]
99 Our event extraction system and code will be available for download from our group web page. [sent-361, score-0.4]
100 Acknowledgments We would like to thank Yee Seng Chan, Mark Sammons, and Quang Xuan Do for their help with the mention identification and typing system used in this paper. [sent-362, score-0.381]
wordName wordTfidf (topN-words)
[('preference', 0.42), ('event', 0.352), ('mention', 0.258), ('argument', 0.182), ('structured', 0.177), ('codl', 0.16), ('segments', 0.135), ('mentions', 0.134), ('span', 0.132), ('xi', 0.129), ('preferences', 0.124), ('gap', 0.122), ('function', 0.102), ('arguments', 0.095), ('pm', 0.092), ('crf', 0.089), ('typing', 0.084), ('segment', 0.083), ('indicator', 0.082), ('primarily', 0.081), ('declarative', 0.08), ('template', 0.075), ('ratinov', 0.073), ('military', 0.073), ('die', 0.073), ('unsupervised', 0.069), ('equation', 0.069), ('objective', 0.069), ('attacker', 0.069), ('xep', 0.069), ('xiep', 0.069), ('bad', 0.068), ('korea', 0.065), ('si', 0.065), ('realization', 0.063), ('yi', 0.062), ('coarse', 0.062), ('structures', 0.061), ('attack', 0.061), ('conditional', 0.06), ('modeling', 0.06), ('constraints', 0.059), ('fields', 0.058), ('supervised', 0.057), ('certain', 0.057), ('roth', 0.056), ('locally', 0.056), ('chang', 0.056), ('complete', 0.055), ('arbitrary', 0.053), ('fk', 0.052), ('posterior', 0.051), ('boundaries', 0.051), ('functions', 0.051), ('transport', 0.051), ('ef', 0.051), ('sarawagi', 0.048), ('extraction', 0.048), ('events', 0.047), ('random', 0.046), ('neighborhood', 0.046), ('ganchev', 0.046), ('byy', 0.046), ('decomposable', 0.046), ('fired', 0.046), ('imn', 0.046), ('laser', 0.046), ('lecun', 0.046), ('samdani', 0.046), ('tgen', 0.046), ('gives', 0.045), ('contrastive', 0.044), ('type', 0.044), ('discuss', 0.043), ('patterns', 0.042), ('immediate', 0.042), ('column', 0.041), ('gpe', 0.04), ('numerator', 0.04), ('pushing', 0.04), ('defines', 0.039), ('structure', 0.039), ('fj', 0.039), ('smith', 0.039), ('identification', 0.039), ('templates', 0.037), ('guiding', 0.036), ('goldwasser', 0.036), ('instrument', 0.036), ('str', 0.036), ('unless', 0.036), ('framework', 0.035), ('okanohara', 0.034), ('semimarkov', 0.034), ('derivatives', 0.034), ('march', 0.034), ('regularities', 0.034), ('segmentation', 0.034), ('eisner', 0.033), ('hidden', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
2 0.29233903 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
Author: Joel Nothman ; Matthew Honnibal ; Ben Hachey ; James R. Curran
Abstract: Interpreting news requires identifying its constituent events. Events are complex linguistically and ontologically, so disambiguating their reference is challenging. We introduce event linking, which canonically labels an event reference with the article where it was first reported. This implicitly relaxes coreference to co-reporting, and will practically enable augmenting news archives with semantic hyperlinks. We annotate and analyse a corpus of 150 documents, extracting 501 links to a news archive with reasonable inter-annotator agreement.
3 0.18573263 64 acl-2012-Crosslingual Induction of Semantic Roles
Author: Ivan Titov ; Alexandre Klementiev
Abstract: We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations. Specifically, we consider unsupervised induction of semantic roles from sentences annotated with automatically-predicted syntactic dependency representations and use a state-of-the-art generative Bayesian non-parametric model. At inference time, instead of only seeking the model which explains the monolingual data available for each language, we regularize the objective by introducing a soft constraint penalizing for disagreement in argument labeling on aligned sentences. We propose a simple approximate learning algorithm for our set-up which results in efficient inference. When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on non-parallel sentences.
4 0.1795374 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, first-order dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
5 0.15240978 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention detection and template filling tasks.
6 0.14314452 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
7 0.13005283 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
8 0.12341409 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
9 0.12078315 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
10 0.12036063 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
11 0.11016322 50 acl-2012-Collective Classification for Fine-grained Information Status
12 0.099527277 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
13 0.098916605 53 acl-2012-Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions
14 0.0945848 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
15 0.091094881 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
16 0.090184063 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
17 0.088245012 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
18 0.086897038 194 acl-2012-Text Segmentation by Language Using Minimum Description Length
19 0.086320177 58 acl-2012-Coreference Semantics from Web Features
20 0.085987903 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
topicId topicWeight
[(0, -0.273), (1, 0.125), (2, -0.111), (3, 0.161), (4, 0.013), (5, -0.004), (6, -0.003), (7, 0.006), (8, 0.025), (9, -0.038), (10, -0.105), (11, -0.152), (12, -0.049), (13, -0.111), (14, -0.16), (15, 0.099), (16, 0.112), (17, 0.11), (18, -0.15), (19, 0.058), (20, -0.042), (21, 0.123), (22, -0.007), (23, -0.0), (24, -0.108), (25, -0.054), (26, -0.126), (27, 0.042), (28, -0.102), (29, -0.004), (30, 0.13), (31, -0.074), (32, 0.162), (33, -0.056), (34, -0.042), (35, 0.049), (36, -0.11), (37, -0.043), (38, -0.091), (39, -0.102), (40, -0.011), (41, -0.096), (42, -0.024), (43, -0.055), (44, 0.034), (45, 0.053), (46, 0.057), (47, 0.073), (48, 0.146), (49, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.97125095 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
2 0.76397479 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
Author: Joel Nothman ; Matthew Honnibal ; Ben Hachey ; James R. Curran
Abstract: Interpreting news requires identifying its constituent events. Events are complex linguistically and ontologically, so disambiguating their reference is challenging. We introduce event linking, which canonically labels an event reference with the article where it was first reported. This implicitly relaxes coreference to co-reporting, and will practically enable augmenting news archives with semantic hyperlinks. We annotate and analyse a corpus of 150 documents, extracting 501 links to a news archive with reasonable inter-annotator agreement.
3 0.56425524 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention detection and template filling tasks.
4 0.53012478 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, first-order dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
5 0.49409366 50 acl-2012-Collective Classification for Fine-grained Information Status
Author: Katja Markert ; Yufang Hou ; Michael Strube
Abstract: Previous work on classifying information status (Nissim, 2006; Rahman and Ng, 2011) is restricted to coarse-grained classification and focuses on conversational dialogue. We here introduce the task of classifying fine-grained information status and work on written text. We add a fine-grained information status layer to the Wall Street Journal portion of the OntoNotes corpus. We claim that the information status of a mention depends not only on the mention itself but also on other mentions in the vicinity and solve the task by collectively classifying the information status of all mentions. Our approach strongly outperforms reimplementations of previous work.
6 0.4899655 53 acl-2012-Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions
7 0.4785741 64 acl-2012-Crosslingual Induction of Semantic Roles
8 0.46670979 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
9 0.45365462 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
10 0.43414986 129 acl-2012-Learning High-Level Planning from Text
11 0.42911071 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
12 0.42822921 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
13 0.4146761 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
14 0.3923701 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
15 0.38674581 194 acl-2012-Text Segmentation by Language Using Minimum Description Length
16 0.38142902 58 acl-2012-Coreference Semantics from Web Features
17 0.37902886 209 acl-2012-Unsupervised Semantic Role Induction with Global Role Ordering
18 0.36851889 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
19 0.36647299 195 acl-2012-The Creation of a Corpus of English Metalanguage
20 0.36340022 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
topicId topicWeight
[(26, 0.031), (28, 0.043), (30, 0.014), (37, 0.041), (39, 0.03), (74, 0.021), (82, 0.024), (84, 0.019), (85, 0.016), (90, 0.579), (92, 0.036), (94, 0.02), (99, 0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.99936473 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
Author: Wei Lu ; Dan Roth
Abstract: This paper presents a novel sequence labeling model based on the latent-variable semi-Markov conditional random fields for jointly extracting argument roles of events from texts. The model takes in coarse mention and type information and predicts argument roles for a given event template. This paper addresses the event extraction problem in a primarily unsupervised setting, where no labeled training instances are available. Our key contribution is a novel learning framework called structured preference modeling (PM), that allows arbitrary preference to be assigned to certain structures during the learning procedure. We establish and discuss connections between this framework and other existing works. We show empirically that the structured preferences are crucial to the success of our task. Our model, trained without annotated data and with a small number of structured preferences, yields performance competitive to some baseline supervised approaches.
2 0.99698466 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
Author: Majid Razmara ; George Foster ; Baskaran Sankaran ; Anoop Sarkar
Abstract: Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. In this paper, we evaluate performance on a domain adaptation setting where we translate sentences from the medical domain. Our experimental results show that ensemble decoding outperforms various strong baselines including mixture models, the current state-of-the-art for domain adaptation in machine translation.
3 0.99647981 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
Author: Zhonghua Qu ; Yang Liu
Abstract: Online forums are becoming a popular resource in the state of the art question answering (QA) systems. Because of its nature as an online community, it contains more updated knowledge than other places. However, going through tedious and redundant posts to look for answers could be very time consuming. Most prior work focused on extracting only question answering sentences from user conversations. In this paper, we introduce the task of sentence dependency tagging. Finding dependency structure can not only help find answer quickly but also allow users to trace back how the answer is concluded through user conversations. We use linear-chain conditional random fields (CRF) for sentence type tagging, and a 2D CRF to label the dependency relation between sentences. Our experimental results show that our proposed approach performs well for sentence dependency tagging. This dependency information can benefit other tasks such as thread ranking and answer summarization in online forums.
4 0.99612427 212 acl-2012-Using Search-Logs to Improve Query Tagging
Author: Kuzman Ganchev ; Keith Hall ; Ryan McDonald ; Slav Petrov
Abstract: Syntactic analysis of search queries is important for a variety of information-retrieval tasks; however, the lack of annotated data makes training query analysis models difficult. We propose a simple, efficient procedure in which part-of-speech tags are transferred from retrieval-result snippets to queries at training time. Unlike previous work, our final model does not require any additional resources at run-time. Compared to a state-of-the-art approach, we achieve more than 20% relative error reduction. Additionally, we annotate a corpus of search queries with part-of-speech tags, providing a resource for future work on syntactic query analysis.
5 0.99513942 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
Author: Fei Liu ; Fuliang Weng ; Xiao Jiang
Abstract: Social media language contains huge amount and wide variety of nonstandard tokens, created both intentionally and unintentionally by the users. It is of crucial importance to normalize the noisy nonstandard tokens before applying other NLP techniques. A major challenge facing this task is the system coverage, i.e., for any user-created nonstandard term, the system should be able to restore the correct word within its top n output candidates. In this paper, we propose a cognitively-driven normalization system that integrates different human perspectives in normalizing the nonstandard tokens, including the enhanced letter transformation, visual priming, and string/phonetic similarity. The system was evaluated on both word- and message-level using four SMS and Twitter data sets. Results show that our system achieves over 90% word-coverage across all data sets (a 10% absolute increase compared to state-of-the-art); the broad word-coverage can also successfully translate into message-level performance gain, yielding 6% absolute increase compared to the best prior approach.
6 0.9746049 23 acl-2012-A Two-step Approach to Sentence Compression of Spoken Utterances
7 0.97094077 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time
8 0.96672529 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
9 0.96640933 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
10 0.95989895 55 acl-2012-Community Answer Summarization for Multi-Sentence Question with Group L1 Regularization
11 0.95969409 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
12 0.9584052 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
13 0.95736188 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
14 0.95732194 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
16 0.94453287 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
17 0.94296867 137 acl-2012-Lemmatisation as a Tagging Task
18 0.94290316 20 acl-2012-A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining
19 0.94244915 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
20 0.93826252 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging