acl acl2012 acl2012-135 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Preethi Raghavan ; Albert Lai ; Eric Fosler-Lussier
Abstract: We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. Such a representation allows us to learn all of Allen’s temporal relations between medical events. Interestingly, we observe that this methodology performs better than a classification-based approach for this domain, but worse on the relationships found in the Timebank corpus. This finding has important implications for styles of data representation and resources used for temporal relation learning: clinical narratives may have different language attributes corresponding to temporal ordering relative to Timebank, implying that the field may need to look at a wider range of domains to fully understand the nature of temporal ordering.
Reference: text
sentIndex sentText sentNum sentScore
1 We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. [sent-6, score-1.135]
2 We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. [sent-7, score-0.722]
3 Such a representation allows us to learn all of Allen’s temporal relations between medical events. [sent-8, score-0.689]
4 1 Introduction There has been considerable research on learning temporal relations between events in natural language. [sent-11, score-0.618]
5 Most learning problems try to classify event pairs as related by one of Allen’s temporal relations (Allen, 1981) i. [sent-12, score-0.52]
6 The Timebank corpus, widely used for temporal relation learning, consists of newswire text annotated for events, temporal expressions, and temporal relations between events using TimeML (Pustejovsky et al. [sent-16, score-1.302]
7 However, there may be a need to rethink how we learn temporal relations between events in different domains. [sent-19, score-0.669]
8 Timebank, its features, and established learning techniques like classification, may not work optimally in many real-world problems where temporal relation learning is of great importance. [sent-20, score-0.408]
9 We study the problem of learning temporal relations between medical events in clinical text. [sent-21, score-1.21]
10 The idea of a medical “event” in clinical text is very different from events in Timebank. [sent-22, score-0.868]
11 Medical events are temporally-associated concepts in clinical text that describe a medical condition affecting the patient’s health, or procedures performed on a patient. [sent-23, score-0.868]
12 Learning to temporally order events in clinical text is fundamental to understanding patient narratives and key to applications such as longitudinal studies, question answering, document summarization and information retrieval with temporal constraints. [sent-24, score-1.223]
13 We propose learning temporal relations between medical events found in clinical narratives by learning to rank them. [sent-25, score-1.463]
14 This is achieved by representing medical events as time durations with starts and stops and ranking them based on their proximity to the admission date. [sent-26, score-0.926]
15 This implicitly allows us to learn all of Allen’s temporal relations between medical events. [sent-27, score-0.689]
16 In this paper, we establish the need to rethink the methods and resources used in temporal relation learning, as we demonstrate that the resources widely used for learning temporal relations in newswire text do not work on clinical text. [sent-28, score-1.326]
17 When we model the temporal ordering problem in clinical text as a ranking problem, we empirically show that it outperforms classification; we perform similar experiments with Timebank and observe the opposite conclusion (classification outperforms ranking). [sent-29, score-0.904]
18 Footnote 1: The admission date is the only explicit date always present in each clinical narrative. [sent-30, score-0.819]
19 © 2012 Association for Computational Linguistics, pages 70–74. events can be realized by ordering the starts and stops. 2 Related Work The Timebank corpus provides hand-tagged features, including tense, aspect, modality, polarity and event class. [sent-33, score-0.447]
20 There have been significant efforts in machine learning of temporal relations between events using these features and a wide range of other features extracted from the Timebank corpus (Mani et al. [sent-34, score-0.618]
21 , 2009) challenges have often focused on temporal relation learning between different types of events from Timebank. [sent-38, score-0.564]
22 Zhou and Hripcsak (2007) provide a comprehensive survey of temporal reasoning with clinical data. [sent-39, score-0.76]
23 There has also been some work in generating annotated corpora of clinical text for temporal relation learning (Roberts et al. [sent-40, score-0.784]
24 (2006) propose a Temporal Constraint Structure (TCS) for medical events in discharge summaries. [sent-45, score-0.597]
25 We demonstrate the need to rethink resources, features and methods of learning temporal relations between events in different domains with the help of experiments in learning temporal relations in clinical text. [sent-47, score-1.526]
26 Specifically, we observe that we get better results in learning to rank chains of medical events to derive temporal relations (and their inverses) than learning a classifier for the same task. [sent-48, score-1.06]
27 The problem of learning to rank from examples has gained significant interest in the machine learning community, with important similarities and differences with the problems of regression and classification (Joachims et al. [sent-49, score-0.121]
28 The patient gives a history of fever on and off associated with chills for the last 1 month. [sent-52, score-0.372]
29 He does give a history of decubitus ulcer on the back but his main complaint is fever associated with epigastric discomfort. [sent-53, score-0.316]
30 PAST MEDICAL HISTORY Significant for polymicrobial infection in the blood as well as in the urine in July 2007 history of back injury with paraparesis. [sent-54, score-0.334]
31 REVIEW OF SYSTEMS Positive for decubitus ulcer. [sent-56, score-0.07]
32 PHYSICAL EXAMINATION On physical exam the patient is a debilitated malnourished gentleman in mild distress. [sent-60, score-0.189]
33 Extremities revealed pain and atrophied muscles in the lower extremities with decubitus ulcer which had a transparent bandage in the decubitus area which was stage 2-3. [sent-64, score-0.21]
34 Figure 1: Excerpt from a sanitized clinical narrative (history & physical report) with medical events underlined. [sent-68, score-1.075]
35 problems of learning to rank objects in information retrieval and various other domains. [sent-69, score-0.07]
36 To the best of our understanding, there have been no previous attempts to learn temporal relations between events using a ranking approach. [sent-70, score-0.698]
37 3 Representation of Medical Events (MEs) Clinical narratives contain unstructured text describing various MEs including conditions, diagnoses and tests in the history of a patient, along with some information on when they occurred. [sent-71, score-0.241]
38 Much of the temporal information in clinical text is implicit and embedded in relative temporal relations between MEs. [sent-72, score-1.164]
39 , last 1 month, post colostomy rather than on July 2007). [sent-81, score-0.134]
40 Temporal expressions may also be fuzzy where history may refer to an event 1 year ago or 3 months ago. [sent-82, score-0.235]
41 Given the ranking of all starts and stops, we can now compose every one of Allen’s temporal relations (Allen, 1981). [sent-96, score-0.537]
42 For instance, “history of paresis secondary to back injury who is bedridden status post colostomy ” indicates the start of paresis is in the past history of the patient prior to colostomy. [sent-98, score-0.736]
43 For recurring and continuous events like chills and fever, if the time period of recurrence is continuous (last 1 month), we consider it to be the time duration of the event. [sent-102, score-0.425]
44 For MEs that are associated with a fixed date or time, the start and stop are assumed to be the same (e. [sent-104, score-0.154]
45 , polymicrobial infection in the blood as well as in the urine in July 2007). [sent-106, score-0.147]
46 In case of negated events like no cough, we consider cough as the ME with a negative polarity. [sent-107, score-0.244]
47 Its start and stop time are assumed to be the same. [sent-108, score-0.085]
48 Polarity allows us to identify events that actually occurred in the patient’s history. [sent-109, score-0.21]
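The representation described in this section can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the class name `MedicalEvent`, the helper `point_event`, the field names, and the use of day offsets relative to admission are all assumptions made for the example, and the offsets shown are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedicalEvent:
    """A medical event (ME) as a time duration with a start and a stop.

    Offsets are hypothetical: days relative to the admission date
    (negative means before admission).
    """
    name: str
    start: float
    stop: float
    negated: bool = False  # True for mentions such as "no cough"

    def __post_init__(self):
        if self.start > self.stop:
            raise ValueError("start must not be later than stop")


def point_event(name, when, negated=False):
    """ME tied to a fixed date, or a negated ME: start and stop coincide."""
    return MedicalEvent(name, when, when, negated)


# "fever ... for the last 1 month": a duration ending at admission (assumed offsets).
fever = MedicalEvent("fever", start=-30, stop=0)
# "no cough": negative polarity, start and stop assumed to be the same.
cough = point_event("cough", 0, negated=True)

# Polarity filters out events that did not actually occur.
occurred = [e.name for e in (fever, cough) if not e.negated]
```

Keeping polarity as a separate flag, rather than dropping negated mentions, mirrors the text: the negated ME is still represented, but can be excluded when reconstructing what happened to the patient.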
49 4 Ranking Model and Experiments Given a patient with multiple clinical narratives, our objective is to induce a partial temporal ordering of all medical events in each clinical narrative based on their proximity to a reference date (admission). [sent-110, score-2.092]
50 The training data consists of medical event (ME) chains, where each chain consists of an instance of the start or stop of a ME belonging to the same clinical narrative along with a rank. [sent-111, score-0.988]
51 The assumption is that the MEs in the same narrative are more or less semantically related by virtue of narrative discourse structure and are hence considered part of the same ME chain. [sent-112, score-0.326]
52 The rank assigned to an instance indicates the temporal order of the event instance in the chain. [sent-113, score-0.468]
53 Based on the rank of the starts and stops of event instances relative to other event instances, the temporal relations between them can be derived as indicated in Table 1. [sent-115, score-0.753]
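As a rough sketch of how relations might be read off start/stop ranks, the function below names the Allen relation between two events from their four endpoint ranks. Table 1 is not reproduced in this extract, so the mapping follows the standard definitions of Allen's interval relations rather than the paper's table; the function name `allen_relation` and the order in which degenerate (start equals stop) cases are resolved are assumptions.

```python
def allen_relation(a_start, a_stop, b_start, b_stop):
    """Name the Allen relation of event A to event B from endpoint ranks.

    Lower rank means earlier. For point events (start == stop) several
    relations coincide; the check order below picks one of them.
    """
    if a_start > a_stop or b_start > b_stop:
        raise ValueError("each start must not be ranked after its stop")
    if a_stop < b_start:
        return "before"
    if b_stop < a_start:
        return "after"
    if a_start == b_start and a_stop == b_stop:
        return "equals"
    if a_start == b_start:
        return "starts" if a_stop < b_stop else "started-by"
    if a_stop == b_stop:
        return "finishes" if a_start > b_start else "finished-by"
    if a_stop == b_start:
        return "meets"
    if b_stop == a_start:
        return "met-by"
    if a_start > b_start and a_stop < b_stop:
        return "during"
    if a_start < b_start and a_stop > b_stop:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical ranks for two MEs in one chain.
relation = allen_relation(1, 3, 2, 4)
```

Each relation and its inverse come from the same comparison with the arguments swapped, which is why ranking the endpoints once is enough to recover all pairwise relations in a chain.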
54 Our corpus for ranking consisted of 47 clinical narratives obtained from the medical center and annotated with MEs, temporal expressions, relations and event chains. [sent-116, score-1.375]
55 5% of the events and our overall inter-annotator Cohen’s kappa statistic (Conger, 1980) for MEs was 0. [sent-118, score-0.192]
56 Thus, we extracted 47 ME chains across 4 patients. [sent-120, score-0.108]
57 The distribution of MEs across event chains and chains across patients (p) is as follows. [sent-121, score-0.31]
58 p1 had 5 chains with 68 MEs, p2 had 9 chains with 90 MEs, p3 had 20 chains with 119 MEs and p4 had 13 chains with 82 MEs. [sent-122, score-0.432]
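The per-patient counts above can be checked against the stated total of 47 chains with a few lines of arithmetic. The dictionary name is made up for the example, and the total ME count is simply the sum of the reported figures, not a number quoted in the text.

```python
# (chains, MEs) per patient, copied from the sentence above.
chains_and_mes = {"p1": (5, 68), "p2": (9, 90), "p3": (20, 119), "p4": (13, 82)}

total_chains = sum(chains for chains, _ in chains_and_mes.values())  # 47
total_mes = sum(mes for _, mes in chains_and_mes.values())           # 359
mean_mes_per_chain = total_mes / total_chains
```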
59 The distribution of chains across different types of clinical narratives is shown in Figure 2. [sent-123, score-0.619]
60 We construct a vector of features, from the manually annotated corpus, for each medical event instance. [sent-124, score-0.358]
61 [Figure 2: Distribution of the 47 medical event chains derived from discharge summaries, history and physical reports, pathology and radiology notes across the 4 patients.] [sent-125, score-0.775]
62 Although there is no real query in our setup, the admission date for each chain can be thought of as the query “date” and the MEs are ordered based on how close or far they are from each other and the admission date. [sent-126, score-0.547]
63 The features extracted for each ME include the type of clinical narrative, section information, ME polarity, position of the medical concept in the narrative and verb pattern. [sent-127, score-0.851]
64 We extract temporal expressions linked to the ME like history, before admission, past, during examination, on discharge, after discharge, on admission. [sent-128, score-0.367]
65 We also extract features from each temporal expression indicating its closeness to the admission date. [sent-130, score-0.552]
66 The differences between explicit dates in the narrative are also extracted. [sent-131, score-0.242]
67 The UMLS (Bodenreider, 2004) semantic category of each medical concept is also included based on the intuition that MEs of a certain semantic group may occur closer to admission. [sent-132, score-0.264]
68 We tried using features like the tense of the ME or of the verb preceding the ME (if any) and the POS tag in ranking. [sent-133, score-0.082]
69 where the observation sequence is MEs in the order in which they appear in a clinical narrative, and the state sequence is the corresponding label sequence of time-bins. [sent-138, score-0.412]
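The date-based features mentioned above (closeness of a temporal expression to the admission date, and differences between explicit dates) might be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the text does not say which units or encodings the authors used, and the admission date below is invented for illustration.

```python
from datetime import date


def days_from_admission(event_date, admission_date):
    """Signed distance in days; negative means before admission."""
    return (event_date - admission_date).days


def pairwise_date_differences(dates):
    """Day differences between every pair of explicit dates, earliest first."""
    ordered = sorted(dates)
    return [
        (later - earlier).days
        for i, earlier in enumerate(ordered)
        for later in ordered[i + 1:]
    ]


# Hypothetical admission date; "July 2007" is resolved to its first day
# purely for the sake of the example.
admission = date(2007, 9, 1)
infection = date(2007, 7, 1)
offset = days_from_admission(infection, admission)
```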
70 We ran ranking experiments using SVM-rank (Joachims, 2006), and based on the ranking score assigned to each start/stop instance, we derive the relative temporal order of MEs in a chain. [sent-139, score-0.543]
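For readers who want to reproduce a setup of this kind, the sketch below writes ME chains in the plain-text input format that SVM-rank reads (`<target> qid:<id> <feature>:<value> ...`), with one `qid` per chain so that ranking constraints are only formed within a chain. The function name and the toy feature vectors are assumptions; the actual feature encoding used in the experiments is not given in this extract, and how target values map to temporal order is a choice left to the user.

```python
def to_svmrank_lines(chains):
    """Format chains for SVM-rank.

    chains: list of chains; each chain is a list of (rank, feature_vector)
    pairs for the start/stop instances in that chain.
    Feature ids are 1-based and increasing, as the format requires.
    """
    lines = []
    for qid, chain in enumerate(chains, start=1):
        for rank, features in chain:
            feats = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
            lines.append(f"{rank} qid:{qid} {feats}")
    return lines


# One toy chain with two instances and two made-up features each.
example = to_svmrank_lines([[(1, [0.5, 1.0]), (2, [0.1, 0.0])]])
```

Treating each chain as its own query matches the description that the admission date of a chain plays the role of the query.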
71 3 This in turn allows us to infer temporal relations between 2http://mallet. [sent-140, score-0.425]
72 In evaluating simultaneous, a ±0.05 difference in ranking score of starts/stops of MEs is counted. [sent-145, score-0.099]
73 On introducing the time-bin feature, the ranking error drops to 16. [sent-159, score-0.099]
74 The overall accuracy of ranking MEs on including the time-bin feature is 82. [sent-161, score-0.099]
75 Each learned relation is now compared with the pairwise classification of temporal relations between MEs. [sent-163, score-0.469]
76 We train a SVM classifier (Joachims, 1999) with an RBF kernel for pairwise classification of temporal relations. [sent-164, score-0.355]
77 The average classification accuracy for clinical text using the same feature set is 71. [sent-165, score-0.444]
78 1) for evaluation, 186 newswire documents with 3345 event pairs. [sent-168, score-0.121]
79 We traverse transitive relations between events in Timebank, increasing the number of event-event links to 6750 and create chains of related events to be ranked. [sent-169, score-0.598]
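Traversing transitive relations to add event-event links can be illustrated with a simple closure over BEFORE pairs. This is a generic sketch, not the procedure used for Timebank in the paper: the function name is hypothetical, only a single relation type is handled, and the input is assumed to be consistent (no cycles).

```python
def transitive_closure(before_pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(before_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Two annotated links yield one inferred link: e1 before e3.
links = transitive_closure({("e1", "e2"), ("e2", "e3")})
```

The quadratic scan per pass is fine for document-sized link sets; a graph library would be a reasonable substitute for larger inputs.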
80 All classification and ranking results from 10-fold cross validation are presented in Table 2. [sent-173, score-0.131]
81 This model is well suited to the features that are available in clinical text. [sent-175, score-0.43]
82 The assumption that all MEs in a clinical narrative are temporally related allows us to totally order events within each narrative. [sent-176, score-0.875]
83 This works because a clinical narrative usually has a single protagonist, the patient. [sent-177, score-0.565]
84 This assumption, along with the availability of a fixed reference date in each narrative, allows us to effectively extract features that work in ranking MEs. [sent-178, score-0.206]
85 However, this assumption does not hold in newswire text: there tend to be multiple protagonists, and it may be possible to totally order only events that are linked to the same protagonist. [sent-179, score-0.257]
86 Ranking implicitly allows us to learn the transitive relations between MEs in the chain. [sent-180, score-0.124]
87 Ranking ME starts/ stops captures relations like includes and begins much better than classification, primarily because of the date difference and time-bin difference features. [sent-181, score-0.244]
88 Even if we consider verbs co-occurring with MEs, they are not always accurately reflective of the MEs’ temporal nature. [sent-186, score-0.343]
89 Moreover, in discharge summaries, almost all MEs or co-occurring verbs are in the past tense (before the discharge date). [sent-187, score-0.368]
90 Based on the type of clinical narrative, when it was generated, the reference date for the tense of the verb could be in the patient’s history, admission, discharge, or an intermediate date between admission and discharge. [sent-189, score-0.884]
91 For similar reasons, features like POS and aspect are not very informative in ordering MEs. [sent-190, score-0.086]
92 Moreover, features like aspect require annotators with not only a clinical background but also some expert knowledge in linguistics, which is not feasible. [sent-191, score-0.466]
93 6 Conclusions Representing and reasoning with temporal information in unstructured text is crucial to the field of natural language processing and biomedical informatics. [sent-192, score-0.419]
94 We presented a study on learning to rank medical events. [sent-193, score-0.334]
95 Temporally ordering medical events allows us to induce a partial order of medical events over the patient’s history. [sent-194, score-1.005]
96 We noted many differences between learning temporal relations in clinical text and Timebank. [sent-195, score-0.838]
97 The ranking experiments on clinical text yield better performance than classification, whereas the performance is the exact opposite in Timebank. [sent-196, score-0.531]
98 Based on experiments in two very different domains, we demonstrate the need to rethink the resources and methods for temporal relation learning. [sent-197, score-0.442]
99 TimeML: Robust specification of event and temporal expressions in text. [sent-249, score-0.444]
100 Temporal reasoning with medical data - a review with emphasis on medical natural language processing. [sent-273, score-0.553]
wordName wordTfidf (topN-words)
[('mes', 0.511), ('clinical', 0.412), ('temporal', 0.323), ('medical', 0.264), ('admission', 0.229), ('events', 0.192), ('timebank', 0.157), ('narrative', 0.153), ('discharge', 0.141), ('patient', 0.135), ('history', 0.114), ('chains', 0.108), ('colostomy', 0.106), ('ranking', 0.099), ('narratives', 0.099), ('event', 0.094), ('date', 0.089), ('relations', 0.084), ('allen', 0.071), ('decubitus', 0.07), ('fever', 0.07), ('paresis', 0.07), ('rethink', 0.07), ('temporally', 0.062), ('stops', 0.054), ('physical', 0.054), ('chills', 0.053), ('tube', 0.053), ('rank', 0.051), ('ordering', 0.05), ('blood', 0.046), ('injury', 0.046), ('peg', 0.046), ('biomedical', 0.043), ('tense', 0.043), ('stop', 0.038), ('proximity', 0.037), ('mani', 0.037), ('bedridden', 0.035), ('cough', 0.035), ('extremities', 0.035), ('inverses', 0.035), ('istory', 0.035), ('polymicrobial', 0.035), ('roberts', 0.035), ('tcs', 0.035), ('ulcer', 0.035), ('urine', 0.035), ('verhagen', 0.034), ('joachims', 0.033), ('continuous', 0.033), ('zhou', 0.032), ('duration', 0.032), ('classification', 0.032), ('gaizauskas', 0.031), ('starts', 0.031), ('infection', 0.031), ('savova', 0.031), ('status', 0.03), ('relation', 0.03), ('unstructured', 0.028), ('july', 0.028), ('post', 0.028), ('thorsten', 0.028), ('newswire', 0.027), ('start', 0.027), ('back', 0.027), ('expressions', 0.027), ('polarity', 0.026), ('reasoning', 0.025), ('induce', 0.025), ('secondary', 0.025), ('recurring', 0.025), ('hepple', 0.025), ('timeml', 0.025), ('james', 0.024), ('excerpt', 0.024), ('past', 0.023), ('verb', 0.022), ('transitive', 0.022), ('simultaneous', 0.022), ('relative', 0.022), ('pustejovsky', 0.021), ('time', 0.02), ('opposite', 0.02), ('examination', 0.02), ('chambers', 0.02), ('verbs', 0.02), ('assumption', 0.02), ('month', 0.02), ('summaries', 0.02), ('learning', 0.019), ('resources', 0.019), ('aspect', 0.019), ('totally', 0.018), ('suited', 0.018), ('annotators', 0.018), ('allows', 0.018), ('ohio', 0.018), ('like', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
2 0.29658058 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens
Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.
3 0.27593786 191 acl-2012-Temporally Anchored Relation Extraction
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
4 0.18702187 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar
Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.
5 0.15224873 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
Author: Yafang Wang ; Maximilian Dylla ; Marc Spaniol ; Gerhard Weikum
Abstract: The Web and digitized text sources contain a wealth of information about named entities such as politicians, actors, companies, or cultural landmarks. Extracting this information has enabled the automated construction of large knowledge bases, containing hundreds of millions of binary relationships or attribute values about these named entities. However, in reality most knowledge is transient, i.e. changes over time, requiring a temporal dimension in fact extraction. In this paper we develop a methodology that combines label propagation with constraint reasoning for temporal fact extraction. Label propagation aggressively gathers fact candidates, and an Integer Linear Program is used to clean out false hypotheses that violate temporal constraints. Our method is able to improve on recall while keeping up with precision, which we demonstrate by experiments with biography-style Wikipedia pages and a large corpus of news articles.
6 0.14510797 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
7 0.13708855 91 acl-2012-Extracting and modeling durations for habits and events from Twitter
8 0.11546271 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
9 0.11460845 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
10 0.10156913 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
11 0.084508248 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context
12 0.061443709 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
13 0.050713003 96 acl-2012-Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
14 0.042105082 98 acl-2012-Finding Bursty Topics from Microblogs
15 0.041441694 73 acl-2012-Discriminative Learning for Joint Template Filling
16 0.041439097 101 acl-2012-Fully Abstractive Approach to Guided Summarization
17 0.039631873 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
18 0.039604153 157 acl-2012-PDTB-style Discourse Annotation of Chinese Text
19 0.036432128 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
20 0.033525031 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
topicId topicWeight
[(0, -0.124), (1, 0.16), (2, -0.114), (3, 0.228), (4, 0.041), (5, -0.264), (6, -0.0), (7, -0.098), (8, -0.037), (9, -0.201), (10, -0.187), (11, 0.056), (12, 0.031), (13, 0.064), (14, -0.005), (15, -0.101), (16, 0.024), (17, 0.056), (18, 0.048), (19, 0.02), (20, 0.069), (21, 0.018), (22, -0.041), (23, 0.012), (24, 0.066), (25, 0.014), (26, 0.02), (27, -0.039), (28, 0.028), (29, 0.002), (30, -0.081), (31, 0.1), (32, 0.021), (33, -0.014), (34, 0.078), (35, 0.023), (36, -0.037), (37, 0.058), (38, 0.048), (39, 0.002), (40, -0.047), (41, -0.034), (42, -0.04), (43, -0.072), (44, -0.006), (45, -0.001), (46, -0.042), (47, 0.009), (48, 0.006), (49, -0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.97588044 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
2 0.83263731 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar
Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.
3 0.78832638 191 acl-2012-Temporally Anchored Relation Extraction
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graphbased document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve a 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
4 0.73120874 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens
Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.
5 0.72674614 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
Author: Yafang Wang ; Maximilian Dylla ; Marc Spaniol ; Gerhard Weikum
Abstract: The Web and digitized text sources contain a wealth of information about named entities such as politicians, actors, companies, or cultural landmarks. Extracting this information has enabled the automated construction oflarge knowledge bases, containing hundred millions of binary relationships or attribute values about these named entities. However, in reality most knowledge is transient, i.e. changes over time, requiring a temporal dimension in fact extraction. In this paper we develop a methodology that combines label propagation with constraint reasoning for temporal fact extraction. Label propagation aggressively gathers fact candidates, and an Integer Linear Program is used to clean out false hypotheses that violate temporal constraints. Our method is able to improve on recall while keeping up with precision, which we demonstrate by experiments with biography-style Wikipedia pages and a large corpus of news articles.
6 0.67170376 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
7 0.64931631 91 acl-2012-Extracting and modeling durations for habits and events from Twitter
8 0.49519163 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
9 0.38163054 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
10 0.33728778 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
11 0.29656479 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions
12 0.28788337 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context
13 0.23595591 101 acl-2012-Fully Abstractive Approach to Guided Summarization
14 0.23268087 129 acl-2012-Learning High-Level Planning from Text
15 0.21798316 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
16 0.19787857 73 acl-2012-Discriminative Learning for Joint Template Filling
17 0.18679899 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
18 0.15492234 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
19 0.14685573 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
20 0.14430474 157 acl-2012-PDTB-style Discourse Annotation of Chinese Text
topicId topicWeight
[(25, 0.018), (26, 0.031), (28, 0.021), (30, 0.03), (37, 0.018), (39, 0.051), (64, 0.015), (74, 0.024), (82, 0.084), (84, 0.433), (85, 0.033), (90, 0.077), (92, 0.021), (94, 0.021), (99, 0.028)]
simIndex simValue paperId paperTitle
1 0.90293431 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing
Author: Reut Tsarfaty ; Joakim Nivre ; Evelina Andersson
Abstract: We present novel metrics for parse evaluation in joint segmentation and parsing scenarios where the gold sequence of terminals is not known in advance. The protocol uses distance-based metrics defined for the space of trees over lattices. Our metrics allow us to precisely quantify the performance gap between non-realistic parsing scenarios (assuming gold segmented and tagged input) and realistic ones (not assuming gold segmentation and tags). Our evaluation of segmentation and parsing for Modern Hebrew sheds new light on the performance of the best parsing systems to date in the different scenarios.
same-paper 2 0.87780917 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
Author: Preethi Raghavan ; Albert Lai ; Eric Fosler-Lussier
Abstract: We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. Such a representation allows us to learn all of Allen’s temporal relations between medical events. Interestingly, we observe that this methodology performs better than a classification-based approach for this domain, but worse on the relationships found in the Timebank corpus. This finding has important implications for styles of data representation and resources used for temporal relation learning: clinical narratives may have different language attributes corresponding to temporal ordering relative to Timebank, implying that the field may need to look at a wider range of domains to fully understand the nature of temporal ordering.
3 0.86625564 68 acl-2012-Decoding Running Key Ciphers
Author: Sravana Reddy ; Kevin Knight
Abstract: There has been recent interest in the problem of decoding letter substitution ciphers using techniques inspired by natural language processing. We consider a different type of classical encoding scheme known as the running key cipher, and propose a search solution using Gibbs sampling with a word language model. We evaluate our method on synthetic ciphertexts of different lengths, and find that it outperforms previous work that employs Viterbi decoding with character-based models.
4 0.8571375 195 acl-2012-The Creation of a Corpus of English Metalanguage
Author: Shomir Wilson
Abstract: Metalanguage is an essential linguistic mechanism which allows us to communicate explicit information about language itself. However, it has been underexamined in research in language technologies, to the detriment of the performance of systems that could exploit it. This paper describes the creation of the first tagged and delineated corpus of English metalanguage, accompanied by an explicit definition and a rubric for identifying the phenomenon in text. This resource will provide a basis for further studies of metalanguage and enable its utilization in language technologies.
5 0.78308809 93 acl-2012-Fast Online Lexicon Learning for Grounded Language Acquisition
Author: David Chen
Abstract: Learning a semantic lexicon is often an important first step in building a system that learns to interpret the meaning of natural language. It is especially important in language grounding where the training data usually consist of language paired with an ambiguous perceptual context. Recent work by Chen and Mooney (2011) introduced a lexicon learning method that deals with ambiguous relational data by taking intersections of graphs. While the algorithm produced good lexicons for the task of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. In this paper we introduce a new online algorithm that is an order of magnitude faster and surpasses the state-of-the-art results. We show that by changing the grammar of the formal meaning representation language and training on additional data collected from Amazon’s Mechanical Turk we can further improve the results. We also include experimental results on a Chinese translation of the training data to demonstrate the generality of our approach.
6 0.43487614 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
7 0.40572742 34 acl-2012-Automatically Learning Measures of Child Language Development
8 0.40306139 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
9 0.40251595 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation
10 0.39977235 139 acl-2012-MIX Is Not a Tree-Adjoining Language
11 0.39957887 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
12 0.39950681 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
13 0.398386 194 acl-2012-Text Segmentation by Language Using Minimum Description Length
15 0.39713126 210 acl-2012-Unsupervized Word Segmentation: the Case for Mandarin Chinese
16 0.39225632 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
17 0.38471574 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
18 0.3709611 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
19 0.36867043 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions
20 0.36815435 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning