acl acl2012 acl2012-191 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
Reference: text
sentIndex sentText sentNum sentScore
1 Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. [sent-3, score-0.602]
2 This paper proposes a methodological approach to temporally anchored relation extraction. [sent-4, score-0.551]
3 Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. [sent-5, score-1.197]
4 Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. [sent-7, score-1.27]
5 1 Introduction. A question that arises when extracting a relation is how to capture its temporal validity: Can we assign a period of time when the obtained relation held? [sent-9, score-1.188]
6 As pointed out in (Ling and Weld, 2010), while much research in automatic relation extraction has focused on distilling static facts from text, many of the target relations are in fact fluents, dynamic relations whose truth value is dependent on time (Russell and Norvig, 2010). [sent-10, score-0.648]
7 The temporally anchored relation extraction problem consists in, given a natural language text document corpus, C, a target entity, e, and a target relation, r, extracting from the corpus the value of that relation for the entity, and a temporal interval for which the relation was valid. [sent-11, score-1.92]
8 In this paper, we introduce a methodological approach to temporal anchoring of relations automatically extracted from unrestricted text. [sent-12, score-1.115]
9 , 2009) and then anchors the relation to an interval of temporal validity. [sent-14, score-1.058]
10 The intuition is that a distant supervised system can effectively extract relations from the source text collection, and a straightforward date aggregation can then be applied to anchor them. [sent-15, score-0.357]
11 In contrast with previous approaches that aim at intra-document temporal information extraction (Ling and Weld, 2010), we focus on mining a corpus, aggregating temporal evidence across the supporting documents. [sent-17, score-1.495]
12 (3) Compare the use of document metadata against temporal expressions within the document for relation temporal anchoring. [sent-20, score-1.904]
13 The representation we use for temporal information is detailed in section 2; the rich document-level representation we exploit is described in section 3. [sent-25, score-0.769]
14 For a query entity and target relation, the system first performs relation extraction (section 4); then, we find and aggregate time constraint evidence for the same relation across different documents, to establish a temporal validity anchor interval (section 5). [sent-26, score-1.626]
15 2 Temporal Anchors. We will call a triple ⟨entity, relation name, value⟩ a relation instance. [sent-28, score-0.609]
16 We aim at anchoring relation instances to their temporal validity. [sent-29, score-0.947]
17 We need a representation flexible enough to capture the imprecise temporal information available in text, but expressed in a structured style. [sent-30, score-0.822]
18 Allen’s (1983) interval-based algebra for temporal representation and reasoning underlies much research, such as the TempEval challenges (Verhagen et al. [sent-31, score-0.71]
19 Our task is different, as we focus on obtaining the temporal interval associated with a fact, rather than reasoning about the temporal relations among the events appearing in a single text. [sent-33, score-1.605]
20 Let us assume that each relation instance is valid during a certain temporal interval, I = [t0, tf]. [sent-34, score-0.901]
21 This sharp temporal interval fails to capture the imprecision of temporal boundaries conveyed in natural language text. [sent-35, score-1.422]
22 An imprecise temporal interval is defined as an ordered 4-tuple of time points: (t1, t2, t3, t4), with the following semantics: the relation is true for a period which starts at some point between t1 and t2 and ends between t3 and t4. [sent-38, score-1.17]
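The 4-tuple semantics above can be sketched as a small data structure. This is an illustrative sketch only: the class name `ImpreciseInterval`, the use of `None` for an unknown bound, and the `is_consistent` check are assumptions made for the example and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ImpreciseInterval:
    """Hypothetical container for the 4-tuple (t1, t2, t3, t4).

    The relation starts at some point between t1 and t2 and ends at
    some point between t3 and t4. None marks an unknown bound (an
    assumption of this sketch).
    """
    t1: Optional[date] = None  # earliest possible start
    t2: Optional[date] = None  # latest possible start
    t3: Optional[date] = None  # earliest possible end
    t4: Optional[date] = None  # latest possible end

    def is_consistent(self) -> bool:
        """Check t1 <= t2, t3 <= t4 and t1 <= t4 wherever both bounds are known."""
        def ordered(lo: Optional[date], hi: Optional[date]) -> bool:
            return lo is None or hi is None or lo <= hi

        return (ordered(self.t1, self.t2)
                and ordered(self.t3, self.t4)
                and ordered(self.t1, self.t4))


# A relation that started during 2000 and ended during 2005.
example = ImpreciseInterval(date(2000, 1, 1), date(2000, 12, 31),
                            date(2005, 1, 1), date(2005, 12, 31))
```

A sharp interval [t0, tf] is the special case t1 = t2 = t0 and t3 = t4 = tf.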
23 3 Document Representation. We use a rich document representation that employs a graph structure obtained by augmenting the syntactic dependency analysis of the document with semantic information. [sent-43, score-0.432]
24 A document D is represented as a document graph GD, with node set VD and edge set ED. [sent-44, score-0.409]
25 Figure 2: Collapsed document graph representation, GC, for the sample text document “David’s wife, Julia, is celebrating her birthday. [sent-52, score-0.373]
26 The processing includes dependency parsing, named entity recognition and coreference resolution, done with the Stanford CoreNLP software (Klein and Manning, 2003); and events and temporal information extraction, via the TARSQI Toolkit (Verhagen et al. [sent-57, score-0.8]
27 The document graph GD is then further transformed into a collapsed document graph, GC. [sent-59, score-0.418]
28 4 Distant Supervised Relation Extraction. To perform relation extraction, our proposal follows a distant supervision approach (Mintz et al. [sent-72, score-0.438]
29 From a reference Knowledge Base (KB), we extract a set of relation triples or seeds: ⟨entity, relation, value⟩, where the relation is one of the target relations. [sent-78, score-0.573]
30 Our document-level distant supervision assumption is that if entity and value are found in a document graph (see section 3), and there is a path connecting them, then the document expresses the relation. [sent-79, score-0.781]
31 Then, entity and value are matched to the document graph representation. [sent-82, score-0.37]
32 Our procedure traverses the document graph looking for entity and value nodes meeting those conditions; when found, we generate features for a positive example for the relation (footnote 2). [sent-86, score-0.407]
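The document-level distant supervision assumption can be illustrated with a toy graph search. This is a minimal sketch under stated assumptions: the graph is a plain undirected adjacency dictionary, the path is found with breadth-first search, and the names `shortest_path`, `label_document` and `toy_graph` are hypothetical. The edges below are invented for the sample sentence and do not reproduce the paper's collapsed graph or its feature generation.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == target:
                # Walk the parent links back to the source.
                path = [neighbour]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def label_document(graph, seed):
    """Return (relation, path) if the seed's entity and value are connected, else None."""
    entity, relation, value = seed
    if entity not in graph or value not in graph:
        return None
    path = shortest_path(graph, entity, value)
    return None if path is None else (relation, path)


# Invented toy graph for "David's wife, Julia, is celebrating her birthday."
toy_graph = {
    "David": ["wife"],
    "wife": ["David", "Julia"],
    "Julia": ["wife", "celebrating"],
    "celebrating": ["Julia", "birthday"],
    "birthday": ["celebrating"],
}

positive = label_document(toy_graph, ("David", "spouse", "Julia"))
```

In this sketch the returned path stands in for the context from which features of a positive example would be generated.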
33 Given an input entity and a target relation, we aim at finding a filler value for a relation instance. [sent-96, score-0.395]
34 From the set of retrieved documents relevant to the query entity, represented as document graphs, [Footnote 2: From the collapsed document graph representation we obtained an average of 9213 positive training examples per slot; from the uncollapsed document graph, a slightly lower average of 8178.] [sent-98, score-0.693]
35 For each particular relation classifier, only candidates with the expected entity and value types for the relation were used in the application phase. [sent-105, score-0.645]
36 5 Temporal Anchoring of Relations. In this section, we propose and discuss a unified methodological approach for temporal anchoring of relations. [sent-112, score-1.004]
37 The task is establishing an imprecise temporal anchor interval for the relation. [sent-114, score-0.95]
38 The first methodological step is to obtain and represent the available intradocument temporal information; the input is a document, and the task is to identify temporal signals and possible links among them. [sent-117, score-1.432]
39 We use the term link for a relation between a temporal expression (a date) and an event; we want to avoid confusion with the term relation (a relational fact extracted from text). [sent-118, score-1.31]
40 In our particular implementation: • We use TARSQI to extract temporal expressions and link them to events. [sent-119, score-0.707]
41 In particular, TARSQI uses the following temporal links: included, simultaneous, after, before, begun by or ended. [sent-120, score-0.651]
42 • Both are normalized into one of a set of predefined temporal links: within, throughout, beginning, ending, after and before. [sent-122, score-0.651]
43 For each document and relational instance, we have to select those temporal expressions that are relevant. [sent-124, score-0.919]
44 Temporal evidence comes also from the temporal expressions present in the context of a relation. [sent-131, score-0.757]
45 In our particular implementation, we followed a straightforward approach, looking for the time expression closest in the document graph to the shortest path between the entity and value nodes. [sent-132, score-0.492]
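One way to read "closest in the document graph to the shortest path" is a multi-source breadth-first search started from every node on that path. The sketch below rests on that assumption; the function name `closest_time_node`, the adjacency-dictionary graph and the example nodes are hypothetical, and the paper's actual distance measure may differ.

```python
from collections import deque


def closest_time_node(graph, path_nodes, time_nodes):
    """Multi-source BFS from the path; returns (time node, distance) or None.

    graph: undirected adjacency dict; path_nodes: nodes on the
    entity-value shortest path; time_nodes: set of nodes that hold
    time expressions. Ties are broken by visiting order.
    """
    distance = {node: 0 for node in path_nodes}
    queue = deque(path_nodes)
    while queue:
        node = queue.popleft()
        if node in time_nodes:
            return node, distance[node]
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None


# Invented example: two dates at different distances from the path.
example_graph = {
    "David": ["wife"],
    "wife": ["David", "Julia"],
    "Julia": ["wife", "married"],
    "married": ["Julia", "1998", "church"],
    "1998": ["married"],
    "church": ["married", "2005"],
    "2005": ["church"],
}

nearest = closest_time_node(example_graph,
                            ["David", "wife", "Julia"],
                            {"1998", "2005"})
```

Because a breadth-first queue yields nodes in order of non-decreasing distance, the first time expression dequeued is one of the nearest to the path.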
46 The third step is deciding how a relational fact and its relevant temporal information are themselves related. [sent-135, score-0.715]
47 Let T be a temporal expression identified in the document or its metadata. [sent-139, score-0.839]
48 Now, the mapping of temporal constraints depends on the temporal link to the time expression identified; also, the semantics of the event have to be considered in order to decide the time period associated to a relation instance. [sent-140, score-1.786]
49 For instance, it is obvious that having the event marry is different to having the event divorce, when deciding the temporal constraints associated to the spouse relation. [sent-142, score-0.767]
50 Table 2 shows our particular mapping between temporal links and constraints. [sent-143, score-0.727]
51 In particular, for the default document creation time, we suppose that a relation which appears in a document with creation time d held true at least on that date; that is, we are assuming a within link, and we map t2 = d, t3 = d. [sent-144, score-0.701]
52 The last step is aggregating all the time constraints found for the same relation and value across different documents. [sent-146, score-0.396]
53 If we found that a relation started after two dates d and d′, where d′ > d, the closest constraint to the real start of the relation is d′. [sent-147, score-0.537]
54 Mapped to temporal constraints, it means that we would choose the biggest t1 possible. [sent-148, score-0.651]
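The within mapping and the aggregation step can be sketched together. The excerpt states only two rules: a within link on date d maps to t2 = d, t3 = d, and the biggest possible t1 is chosen. Treating t3 like t1 (keep the maximum) and t2, t4 as upper bounds (keep the minimum) is an assumption made here by the same reasoning, as are the names `within_constraint` and `aggregate` and the use of `None` for an unconstrained bound. The sketch does no handling of noisy or contradictory constraints.

```python
from datetime import date


def within_constraint(d):
    """(t1, t2, t3, t4) for a relation known to hold on date d."""
    return (None, d, d, None)


def _pick(values, chooser):
    """Apply chooser (max or min) to the known values, or return None."""
    known = [v for v in values if v is not None]
    return chooser(known) if known else None


def aggregate(constraints):
    """Merge 4-tuples: tightest lower bounds (t1, t3), tightest upper bounds (t2, t4)."""
    if not constraints:
        return (None, None, None, None)
    t1s, t2s, t3s, t4s = zip(*constraints)
    return (_pick(t1s, max), _pick(t2s, min),
            _pick(t3s, max), _pick(t4s, min))


# Three supporting documents, each contributing a within link on its creation date.
doc_dates = [date(2009, 3, 1), date(2010, 7, 15), date(2008, 11, 30)]
anchor = aggregate([within_constraint(d) for d in doc_dates])
```

With only within links, the aggregated tuple says the relation started no later than the earliest supporting document and ended no earlier than the latest one.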
55 1 Evaluation of Relation Extraction. System response in the relation extraction step consists of a set of triples ⟨entity, slot type, value⟩. [sent-165, score-0.608]
56 Target relations (slots) are potentially list-valued, that is, more than one value can be valid for a relation (possibly at different points in time). [sent-168, score-0.385]
57 In SETTING 1, each document is represented as a document graph, GD, while in SETTING 2 the collapsed document graph representation, GC, is employed. [sent-173, score-0.566]
58 This number means that no matter how good the relation extraction method is, 38% of relations will not be found. [sent-184, score-0.404]
59 Second, the distant supervision assumption underlying our approach is that for a seed relation instance ⟨entity, relation, value⟩, any textual mention of entity and value expresses the relation. [sent-185, score-0.589]
60 While for the relation cities of residence only 30% of the training examples express the relation, for spouse the number goes up to 59%. [sent-192, score-0.43]
61 2 Evaluation of Temporal Anchoring. Under the evaluation metrics proposed by TAC-KBP 2011, if the value of the relation instance is judged as correct, the score for temporal anchoring depends on how well the returned interval matches the one provided in the key. [sent-196, score-1.371]
62 The score for the system response is: $Q(S) = \frac{1}{4} \sum_{i=1}^{4} \frac{1}{1 + d_i}$. The score for a target relation, Q(r), is computed by summing Q(S) over all unique instances of the relation whose value is correct. [sent-205, score-0.554]
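The per-response score can be worked through numerically, reading the formula as Q(S) = (1/4) Σ 1/(1 + d_i) for i = 1..4. The excerpt does not define d_i; the sketch below assumes each d_i is a non-negative distance between the i-th element of the response 4-tuple and the corresponding element of the key (in the TAC-KBP 2011 setting this would be a difference in years). The function name `temporal_score` is hypothetical.

```python
def temporal_score(distances):
    """Q(S) = (1/4) * sum over i of 1 / (1 + d_i), for four distances d_i >= 0."""
    if len(distances) != 4:
        raise ValueError("expected one distance per element of the 4-tuple")
    if any(d < 0 for d in distances):
        raise ValueError("distances must be non-negative")
    return sum(1.0 / (1.0 + d) for d in distances) / 4.0


perfect = temporal_score([0, 0, 0, 0])   # response matches the key exactly
one_off = temporal_score([1, 1, 1, 1])   # every element off by one unit
mixed = temporal_score([0, 0, 1, 3])     # (1 + 1 + 1/2 + 1/4) / 4
```

Each element contributes at most 1/4, so the score lies in (0, 1] and decays as any bound drifts away from the key.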
63 We evaluated two different settings for the temporal anchoring step; both use the collapsed document graph representation, GC (SETTING 2). [sent-208, score-1.217]
64 First, test the strength of the document creation time as evidence for temporal anchoring. [sent-210, score-0.945]
65 Second, test how hard this metadata-level baseline is to beat using contextual temporal expressions. [sent-211, score-0.651]
66 The SETTING 2-I assumes a within temporal link between the document creation time and any relation expressed inside the document, and aggregates this information across the documents that we have identified as supporting the relation. [sent-212, score-1.277]
67 The SETTING 2-II considers document content in order to extract temporal links from the context of the text that expresses the relation. [sent-213, score-0.762]
68 If no temporal expression is found, the date of the document is used as default. [sent-214, score-0.867]
69 The performance on relation extraction is an upper bound for temporal anchoring, attainable if temporal anchoring is perfect. [sent-216, score-1.921]
70 Thus, we also evaluate the temporal anchoring performance as the percentage the final system achieves with respect to the relation extraction upper bound. [sent-217, score-1.27]
71 They are low, due to the upper bound that error propagation in candidate retrieval and relation extraction imposes upon this step: temporal anchoring alone achieves 69% of its upper bound. [sent-220, score-0.739]
72 The difference with SETTING 2-II shows that this baseline is difficult to beat by considering temporal evidence inside the document content. [sent-222, score-0.849]
73 The temporal link mapping into time intervals does not depend only on the type of link, but also on the semantics of the text that expresses the relation as we pointed out above. [sent-224, score-1.097]
74 We have to decide how to transform the link between relation and temporal expression into a temporal interval. [sent-225, score-1.647]
75 However, our system achieves the highest precision for the complete task of temporally anchored relation extraction. [sent-264, score-0.494]
76 Despite the vast amount of research focusing on understanding temporal expressions and their relation to events in natural language, the complete problem of temporally anchored relation extraction remains relatively unexplored. [Footnote 4: Slot-fillers from human assessors were not considered.] [sent-269, score-1.582]
77 Also, while much research has focused on single-document extraction, it seems clear that extracting temporally anchored relations needs the aggregation of evidence across multiple documents. [sent-270, score-0.407]
78 (2012) focus on the partial task of temporally anchoring already known facts, showing the usefulness of the document creation time as temporal signal, aggregated across documents. [sent-280, score-1.311]
79 The TempEval community focused on the classification of the temporal links between pairs of events, or an event and a temporal expression; using shallow features (Mani et al. [sent-282, score-1.375]
80 Aggregating evidence across different documents to temporally anchor facts has been explored in settings different to Information Extraction, such as answering of definition questions (Paşca, 2008) or extracting possible dates of well-known historical events (Schockaert et al. [sent-286, score-0.443]
81 Temporal inference or reasoning to solve conflicting temporal expressions and induce temporal order of events has been used in TempEval (Tatu and Srikanth, 2008; Yoshikawa et al. [sent-288, score-1.46]
82 (2010), use cross-event joint inference to extract temporal facts, but only inside a single document. [sent-291, score-0.651]
83 While ACE only required identifying time expressions and classifying their relation to events, KBP requires explicitly inferring the start/end time of relations, which is a realistic approach in the context of building time-aware knowledge bases. [sent-293, score-0.38]
84 KBP represents an important step for the evaluation of temporal information extraction systems. [sent-294, score-0.724]
85 In general, the participant systems adapted existing slot filling systems, adding a temporal classification component: distant supervised (Chen et al. [sent-295, score-1.027]
86 8 Conclusions. This paper introduces the problem of extracting, from unrestricted natural language text, relational knowledge anchored to a temporal span, aggregating temporal evidence from a collection of documents. [sent-298, score-1.625]
87 We have elucidated the two challenges of the task, namely relation extraction and temporal anchoring of the extracted relations. [sent-303, score-1.27]
88 The performance attainable in the full task is limited by the quality of the output of the three main phases: retrieval of candidate passages/ documents, extraction of relations and temporal anchoring of those. [sent-305, score-1.101]
89 We have also studied the limits of the distant supervision approach to relation extraction, showing empirically that its performance depends not only on the nature of the reference knowledge base and document corpus (Riedel et al. [sent-306, score-0.629]
90 Given a relation between two arguments, if it is not dominant among textual expressions of those arguments, the distant supervision assumption will be more often violated. [sent-308, score-0.494]
91 We have introduced a novel graph-based document-level representation that has allowed us to generate new features for the task of relation extraction, capturing long-distance structured contexts. [sent-309, score-0.398]
92 It has been able to extract temporally anchored relational information with the highest precision among the participant systems taking part in the competitive evaluation TAC-KBP 2011. [sent-313, score-0.338]
93 For the temporal anchoring sub-problem, we have demonstrated the strength of the document creation time as a temporal signal. [sent-314, score-1.842]
94 It is possible to achieve a performance of 69% of the upper-bound imposed by relation extraction by assuming that any relation mentioned in a document held at the document creation time (there is a within link between the relational fact and the document creation time). [sent-315, score-1.291]
95 This baseline has proved stronger than extracting and analyzing the temporal expressions present in the document content. [sent-316, score-0.855]
96 Cu-tmp: temporal relation classification using syntactic and semantic features. [sent-335, score-0.901]
97 SemEval2010 task 13: evaluating events, time expressions, and temporal relations (TempEval-2). [sent-415, score-0.769]
98 Reasoning about fuzzy temporal information from the web: towards retrieval of historical events. [sent-433, score-0.651]
99 A simple distant supervision approach for the tac-kbp slot filling task. [sent-444, score-0.408]
100 Timely YAGO: harvesting, querying, and visualizing temporal knowledge from Wikipedia. [sent-466, score-0.651]
wordName wordTfidf (topN-words)
[('temporal', 0.651), ('anchoring', 0.296), ('relation', 0.25), ('document', 0.148), ('slot', 0.134), ('distant', 0.126), ('anchored', 0.124), ('temporally', 0.12), ('interval', 0.12), ('imprecise', 0.112), ('residence', 0.093), ('entity', 0.091), ('filling', 0.086), ('relations', 0.081), ('hentity', 0.078), ('valuei', 0.078), ('graph', 0.077), ('triples', 0.073), ('extraction', 0.073), ('facts', 0.072), ('ne', 0.069), ('tac', 0.067), ('anchor', 0.067), ('relational', 0.064), ('supervision', 0.062), ('surdeanu', 0.06), ('gc', 0.06), ('creation', 0.059), ('representation', 0.059), ('events', 0.058), ('methodological', 0.057), ('expressions', 0.056), ('link', 0.055), ('aggregation', 0.055), ('aggregating', 0.055), ('value', 0.054), ('kbp', 0.054), ('spouse', 0.054), ('evidence', 0.05), ('verhagen', 0.05), ('bfs', 0.047), ('tarsqi', 0.047), ('mintz', 0.046), ('responses', 0.046), ('path', 0.045), ('collapsed', 0.045), ('reasoning', 0.044), ('gerhard', 0.044), ('limits', 0.043), ('stroudsburg', 0.042), ('links', 0.042), ('tempeval', 0.042), ('gd', 0.041), ('garrido', 0.041), ('expression', 0.04), ('intervals', 0.04), ('documents', 0.039), ('semeval', 0.038), ('riedel', 0.038), ('marc', 0.038), ('supporting', 0.038), ('nodes', 0.037), ('timely', 0.037), ('anchors', 0.037), ('time', 0.037), ('dates', 0.037), ('validity', 0.037), ('node', 0.036), ('ji', 0.035), ('pa', 0.035), ('talukdar', 0.035), ('heng', 0.035), ('mapping', 0.034), ('business', 0.034), ('cities', 0.033), ('sca', 0.033), ('mani', 0.033), ('weikum', 0.033), ('ling', 0.033), ('event', 0.031), ('triple', 0.031), ('angel', 0.031), ('anselmo', 0.031), ('cabaleiro', 0.031), ('fluents', 0.031), ('genitives', 0.031), ('intradocument', 0.031), ('schockaert', 0.031), ('tackbp', 0.031), ('tatu', 0.031), ('unrestricted', 0.03), ('participant', 0.03), ('seed', 0.03), ('expresses', 0.03), ('per', 0.029), ('date', 0.028), ('ace', 0.028), ('evidences', 0.027), ('lizhen', 0.027), ('spaniol', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 191 acl-2012-Temporally Anchored Relation Extraction
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
2 0.42194211 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens
Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.
3 0.3646563 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
Author: Yafang Wang ; Maximilian Dylla ; Marc Spaniol ; Gerhard Weikum
Abstract: The Web and digitized text sources contain a wealth of information about named entities such as politicians, actors, companies, or cultural landmarks. Extracting this information has enabled the automated construction of large knowledge bases, containing hundreds of millions of binary relationships or attribute values about these named entities. However, in reality most knowledge is transient, i.e. changes over time, requiring a temporal dimension in fact extraction. In this paper we develop a methodology that combines label propagation with constraint reasoning for temporal fact extraction. Label propagation aggressively gathers fact candidates, and an Integer Linear Program is used to clean out false hypotheses that violate temporal constraints. Our method is able to improve on recall while keeping up with precision, which we demonstrate by experiments with biography-style Wikipedia pages and a large corpus of news articles.
4 0.27593786 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
Author: Preethi Raghavan ; Albert Lai ; Eric Fosler-Lussier
Abstract: We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. Such a representation allows us to learn all of Allen’s temporal relations between medical events. Interestingly, we observe that this methodology performs better than a classification-based approach for this domain, but worse on the relationships found in the Timebank corpus. This finding has important implications for styles of data representation and resources used for temporal relation learning: clinical narratives may have different language attributes corresponding to temporal ordering relative to Timebank, implying that the field may need to look at a wider range of domains to fully understand the nature of temporal ordering.
5 0.24194185 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar
Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.
6 0.22817378 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
7 0.20879154 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
8 0.19381328 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
9 0.16741998 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
10 0.15374938 91 acl-2012-Extracting and modeling durations for habits and events from Twitter
11 0.14733973 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
12 0.14490519 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context
13 0.14353473 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
14 0.13362168 169 acl-2012-Reducing Wrong Labels in Distant Supervision for Relation Extraction
15 0.12698989 73 acl-2012-Discriminative Learning for Joint Template Filling
16 0.1191442 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
17 0.1135877 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
18 0.081513837 14 acl-2012-A Joint Model for Discovery of Aspects in Utterances
19 0.078693964 157 acl-2012-PDTB-style Discourse Annotation of Chinese Text
20 0.067657493 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
topicId topicWeight
[(0, -0.236), (1, 0.263), (2, -0.165), (3, 0.369), (4, 0.069), (5, -0.285), (6, -0.096), (7, -0.082), (8, -0.012), (9, -0.28), (10, -0.045), (11, 0.058), (12, -0.038), (13, -0.024), (14, 0.054), (15, -0.043), (16, -0.107), (17, -0.038), (18, 0.133), (19, 0.032), (20, 0.076), (21, -0.064), (22, -0.046), (23, -0.014), (24, 0.076), (25, 0.049), (26, 0.059), (27, -0.076), (28, 0.027), (29, -0.031), (30, -0.095), (31, 0.104), (32, 0.003), (33, 0.024), (34, 0.065), (35, -0.014), (36, -0.012), (37, 0.054), (38, 0.024), (39, 0.016), (40, 0.027), (41, -0.005), (42, -0.034), (43, 0.009), (44, -0.015), (45, -0.06), (46, -0.097), (47, -0.059), (48, -0.051), (49, -0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.9831863 191 acl-2012-Temporally Anchored Relation Extraction
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
2 0.92062056 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
Author: Yafang Wang ; Maximilian Dylla ; Marc Spaniol ; Gerhard Weikum
Abstract: The Web and digitized text sources contain a wealth of information about named entities such as politicians, actors, companies, or cultural landmarks. Extracting this information has enabled the automated construction of large knowledge bases, containing hundreds of millions of binary relationships or attribute values about these named entities. However, in reality most knowledge is transient, i.e. changes over time, requiring a temporal dimension in fact extraction. In this paper we develop a methodology that combines label propagation with constraint reasoning for temporal fact extraction. Label propagation aggressively gathers fact candidates, and an Integer Linear Program is used to clean out false hypotheses that violate temporal constraints. Our method is able to improve on recall while keeping up with precision, which we demonstrate by experiments with biography-style Wikipedia pages and a large corpus of news articles.
3 0.8803907 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
Author: Preethi Raghavan ; Albert Lai ; Eric Fosler-Lussier
Abstract: We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. Such a representation allows us to learn all of Allen’s temporal relations between medical events. Interestingly, we observe that this methodology performs better than a classification-based approach for this domain, but worse on the relationships found in the Timebank corpus. This finding has important implications for styles of data representation and resources used for temporal relation learning: clinical narratives may have different language attributes corresponding to temporal ordering relative to Timebank, implying that the field may need to look at a wider range of domains to fully understand the nature of temporal ordering.
4 0.75494915 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
Author: Nathanael Chambers
Abstract: Temporal reasoners for document understanding typically assume that a document’s creation date is known. Algorithms to ground relative time expressions and order events often rely on this timestamp to assist the learner. Unfortunately, the timestamp is not always known, particularly on the Web. This paper addresses the task of automatic document timestamping, presenting two new models that incorporate rich linguistic features about time. The first is a discriminative classifier with new features extracted from the text’s time expressions (e.g., ‘since 1999’). This model alone improves on previous generative models by 77%. The second model learns probabilistic constraints between time expressions and the unknown document time. Imposing these learned constraints on the discriminative model further improves its accuracy. Finally, we present a new experiment design that facilitates easier comparison by future work.
5 0.73684263 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar
Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.
6 0.72579926 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
7 0.56502664 91 acl-2012-Extracting and modeling durations for habits and events from Twitter
8 0.52680415 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
9 0.42841309 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
10 0.41922176 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
11 0.4179998 73 acl-2012-Discriminative Learning for Joint Template Filling
12 0.41717115 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
13 0.416161 169 acl-2012-Reducing Wrong Labels in Distant Supervision for Relation Extraction
14 0.39060205 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
15 0.37283781 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
16 0.36640731 129 acl-2012-Learning High-Level Planning from Text
17 0.34898359 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
18 0.34386143 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context
19 0.33654284 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
20 0.29968235 42 acl-2012-Bootstrapping via Graph Propagation
topicId topicWeight
[(25, 0.028), (26, 0.052), (28, 0.026), (30, 0.031), (34, 0.157), (37, 0.038), (39, 0.053), (49, 0.016), (57, 0.015), (64, 0.03), (74, 0.024), (82, 0.091), (84, 0.025), (85, 0.027), (90, 0.101), (92, 0.071), (94, 0.015), (96, 0.011), (99, 0.098)]
simIndex simValue paperId paperTitle
1 0.92777181 200 acl-2012-Toward Automatically Assembling Hittite-Language Cuneiform Tablet Fragments into Larger Texts
Author: Stephen Tyndall
Abstract: This paper presents the problem within Hittite and Ancient Near Eastern studies of fragmented and damaged cuneiform texts, and proposes to use well-known text classification metrics, in combination with some facts about the structure of Hittite-language cuneiform texts, to help classify a number of fragments of clay cuneiform-script tablets into more complete texts. In particular, I propose using Sumerian and Akkadian ideogrammatic signs within Hittite texts to improve the performance of Naive Bayes and Maximum Entropy classifiers. The performance in some cases is improved, and in some cases very much not, suggesting that the variable frequency of occurrence of these ideograms in individual fragments makes considerable difference in the ideal choice for a classification method. Further, complexities of the writing system and the digital availability of Hittite texts complicate the problem.
2 0.92099869 112 acl-2012-Humor as Circuits in Semantic Networks
Author: Igor Labutov ; Hod Lipson
Abstract: This work presents a first step to a general implementation of the Semantic-Script Theory of Humor (SSTH). Of the scarce amount of research in computational humor, no research had focused on humor generation beyond simple puns and punning riddles. We propose an algorithm for mining simple humorous scripts from a semantic network (ConceptNet) by specifically searching for dual scripts that jointly maximize overlap and incongruity metrics in line with Raskin’s Semantic-Script Theory of Humor. Initial results show that a more relaxed constraint of this form is capable of generating humor of deeper semantic content than wordplay riddles. We evaluate the said metrics through a user-assessed quality of the generated two-liners.
3 0.82778674 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
Author: Marco Lui ; Timothy Baldwin
Abstract: We present langid.py, an off-the-shelf language identification tool. We discuss the design and implementation of langid.py, and provide an empirical comparison on 5 long-document datasets, and 2 datasets from the microblog domain. We find that langid.py maintains consistently high accuracy across all domains, making it ideal for end-users that require language identification without wanting to invest in preparation of in-domain training data.
same-paper 4 0.81618953 191 acl-2012-Temporally Anchored Relation Extraction
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
5 0.7307176 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
Author: Seokhwan Kim ; Gary Geunbae Lee
Abstract: Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performances. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. This paper proposes a novel graph-based projection approach and demonstrates the merits of it by using a Korean relation extraction system based on projected dataset from an English-Korean parallel corpus.
6 0.72217691 187 acl-2012-Subgroup Detection in Ideological Discussions
7 0.71111155 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
8 0.71069771 188 acl-2012-Subgroup Detector: A System for Detecting Subgroups in Online Discussions
9 0.70533454 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
10 0.69076484 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
11 0.68756396 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
13 0.67965907 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
14 0.6794318 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
15 0.67184609 31 acl-2012-Authorship Attribution with Author-aware Topic Models
16 0.67152739 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
17 0.67072022 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
18 0.66712224 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
19 0.66573471 139 acl-2012-MIX Is Not a Tree-Adjoining Language
20 0.6654458 99 acl-2012-Finding Salient Dates for Building Thematic Timelines