
99 acl-2012-Finding Salient Dates for Building Thematic Timelines


Source: pdf

Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar

Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Abstract We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e. [sent-8, score-1.129]

2 In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. [sent-13, score-1.909]

3 We focused only on extracting the dates and not the events to which they are related. [sent-14, score-0.646]

4 1 Introduction Our aim here was to build thematic timelines for a general domain topic defined by a user query. [sent-15, score-0.182]

5 They use little temporal information, generally only using document metadata, such as the document creation time (DCT). [sent-19, score-0.308]

6 The few systems that do make use of temporal information (such as the now discontinued Google timeline), only extract absolute, full dates (that feature a day, month and year). [sent-20, score-0.8]

7 1, we found that only 7% of extracted temporal expressions are absolute dates. [sent-22, score-0.359]

8 We distinguish our work from that of previous researchers in that we have focused primarily on extracted temporal information as opposed to other textual content. [sent-25, score-0.289]

9 We show that using linguistic temporal processing helps extract important events in texts. [sent-26, score-0.383]

10 Our system extracts a maximum of temporal information and uses only this information to detect salient dates for the construction of event timelines. [sent-27, score-1.101]

11 Each date is presented with a set of relevant sentences. [sent-30, score-0.232]

12 The system used for temporal analysis is described in Section 4, and the strategy used for indexing and finding salient dates, as well as the results obtained, are given in Section 5. [sent-34, score-0.444]

13 , 2010) is a specification language for manual annotation of temporal information in texts, but, to the best of our knowledge, it has not yet actually been used in information retrieval systems. [sent-36, score-0.283]

14 , 2009), among others, have highlighted that the analysis of temporal information is often an essential component in text understanding and is useful in a wide range of information retrieval applications. [sent-43, score-0.283]

15 , 2009) highlight the importance of processing temporal expressions in Question Answering systems. [sent-45, score-0.293]

16 For example, in the TREC-10 QA evaluation campaign, more than 10% of questions required an element of temporal processing in order to be correctly processed (Li et al. [sent-46, score-0.236]

17 In multidocument summarization, temporal processing enables a system to detect redundant excerpts from various texts on the same topic and to present results in a relevant chronological order (Barzilay and Elhadad, 2002). [sent-48, score-0.356]

18 (Kim and Choi, 2011) present work on the extraction of temporal information in clinical narrative texts. [sent-50, score-0.282]

19 , 2011) present an end-to-end system that processes clinical records, detects events and constructs timelines of patients’ medical histories. [sent-52, score-0.305]

20 His method used place/time collocations and ranked events according to statistical measures. [sent-59, score-0.167]

21 (Chieu and Lee, 2004) propose a similar system that extracts events relevant to a query from a collection of documents. [sent-63, score-0.2]

22 Important events are those reported in a large number of news articles and each event is constructed according to one single query and represented by a set of sentences. [sent-64, score-0.416]

23 (Swan and Allen, 2000) present an approach to generating graphical timelines that involves extracting clusters of noun phrases and named entities. [sent-65, score-0.182]

24 Each document is an XML file containing a title, a date of creation (DCT), set of keywords, and textual content split into paragraphs. [sent-73, score-0.311]

25 2 AFP Chronologies AFP “chronologies” (textual event timelines) are a specific type of articles written by AFP journalists in order to contextualize current events. [sent-75, score-0.234]

26 These chronologies may concern any topic discussed in the media, and consist in a list of dates (typically between 10 and 20) associated with a text describing the related event(s). [sent-76, score-0.84]

27 We selected 91 chronologies satisfying the following constraints: • All dates in the chronologies are between 2004 and 2011, to be sure that the related events are described in the corpus. [sent-79, score-1.082]

28 For example, “Chronology of climax to Vietnam War” was excluded because its corresponding dates do not appear in the content of the articles. [sent-80, score-0.536]

29 • All dates in the chronology are anterior to the chronology’s creation date. [sent-81, score-0.656]

30 The temporal granularity of the chronology is the day. [sent-83, score-0.389]

31 For learning and evaluation purposes, all chronologies were converted to a single XML format. [sent-85, score-0.273]

32 Each document was manually associated with a user search query made up of the keywords required to retrieve the chronology. [sent-86, score-0.166]

33 First, pre-processing of the AFP corpus tags and normalizes temporal expressions in each of the articles (step in the Figure). [sent-89, score-0.293]

34 These documents can be filtered ({), and dates are extracted from the remaining documents. [sent-92, score-0.536]

35 These dates are then ranked in order to show the most important ones to the user (|). [sent-93, score-0.737]

36 4 Temporal and Linguistic Processing In this section, we describe the linguistic and temporal information extracted during the pre-processing phase and how the extraction is carried out. [sent-97, score-0.273]

37 It also performs named entity recognition (NER) of the most usual named entity categories and recognizes temporal expressions. [sent-104, score-0.302]

38 In the following subsections, we give details of the linguistic information that is used for the detection of salient dates. [sent-109, score-0.217]

39 3 Temporal Analysis A previous module for temporal analysis was developed and integrated into the English grammar (Hagège and Tannier, 2008), and evaluated during the TempEval campaign (Verhagen et al. [sent-116, score-0.264]

40 Our goal with temporal analysis is to be able to tag and normalize a selected subset of temporal expressions (TEs) which we consider to be relevant for our task. [sent-119, score-0.276]

41 1 Absolute Dates Absolute dates are dates that can be normalized without external or contextual knowledge. [sent-123, score-1.108]

42 (Footnote 3) We call normalization the operation of turning a temporal expression into a formatted, fully specified representation. [sent-126, score-0.265]
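As an illustration of what normalizing an absolute date can look like, here is a minimal Python sketch. It is not the XIP-based module the paper describes: the function name `normalize_absolute_date`, the list of accepted surface formats, and the choice of ISO 8601 as the target representation are all assumptions made for the example.

```python
from datetime import datetime


def normalize_absolute_date(text: str) -> str:
    """Turn an absolute date such as 'January 5, 2010' into YYYY-MM-DD.

    The accepted formats below are illustrative assumptions, not the
    grammar used in the paper. Month names are parsed in the default
    (English) C locale.
    """
    formats = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d")
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next surface format
    raise ValueError(f"not an absolute date: {text!r}")
```

An expression such as "next Friday" is rejected here on purpose, since it cannot be resolved without knowing the document creation time.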

43 However, absolute dates are relatively infrequent in our corpus (7%), so in order to broaden the coverage for the detection of salient dates, we decided to consider relative dates, which are far more frequent. [sent-128, score-0.82]

44 2 DCT-relative Dates DCT-relative temporal expressions are those which are relative to the creation date of the document. [sent-131, score-0.485]

45 This class represents 40% of dates extracted from the AFP corpus. [sent-132, score-0.536]

46 External information is required, in particular, the date which corresponds to the moment of utterance. [sent-134, score-0.192]

47 This is the case of expressions like next Friday, which correspond to the calendar date of the first Friday following the DCT. [sent-138, score-0.283]
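The "next Friday" rule above can be sketched in a few lines of Python. The helper name `next_weekday` and the `WEEKDAYS` table are hypothetical, and the sketch only covers this one pattern, not the full set of DCT-relative expressions handled by the system.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(dct: date, weekday_name: str) -> date:
    """Return the first given weekday strictly after the DCT."""
    target = WEEKDAYS.index(weekday_name.lower())
    # Offset in 1..7, so a DCT that is itself a Friday maps to the
    # Friday one week later rather than to the DCT.
    delta = (target - dct.weekday() - 1) % 7 + 1
    return dct + timedelta(days=delta)
```

For a document created on Monday 9 July 2012, `next_weekday(date(2012, 7, 9), "Friday")` resolves to 13 July 2012.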

48 However, these underspecified dates are not used in our experiments. [sent-150, score-0.536]

49 4 Modality and Reported Speech An important issue that can affect the calculation of salient dates is the modality associated with timestamped events in text. [sent-152, score-0.974]

50 For instance, the status of a salient date candidate in a sentence like “The meeting takes place on Friday” has to be distinguished from the one in “The meeting should take place on Friday” or “The meeting will take place on Friday, Mr. [sent-153, score-0.462]

51 The time-stamped event meeting takes place is factual in the first example and can be taken as granted. [sent-155, score-0.179]

52 This is expressed by the modality introduced by the modal auxiliary should (second example), or by the use of the future tense or reported speech (third example). [sent-157, score-0.284]

53 More specifically, we consider the following features: Events that are mentioned in the future: If a time-stamped event is in the future tense, we add a specific attribute MODALITY with value FUTURE to the corresponding TE annotation. [sent-159, score-0.189]

54 Events used with a modal verb: If a timestamped event is introduced by a modal verb such as should or would, then attribute MODALITY to the corresponding TE annotation has the value MODAL. [sent-160, score-0.381]

55 We dealt with time-stamped events governed by a reported speech verb, or otherwise appearing in reported speech. [sent-162, score-0.267]

56 If a relevant TE modifies a reported speech verb, the annotation of this TE contains a specific attribute, DECLARATION=”YES”. [sent-164, score-0.169]

57 If the relevant TE modifies a verb that appears in a clause introduced by a reported speech verb then the annotation contains the attribute REPORTED=”YES”. [sent-165, score-0.305]

58 modality and reported speech can occur for a same time-stamped event). [sent-168, score-0.186]

59 Hong said” is annotated with both modality and reported speech attributes. [sent-170, score-0.186]

60 5 Corpus-dependent Special Cases While we developed the linguistic and temporal annotators, we took into account some specificities of our corpus. [sent-172, score-0.273]

61 We decided that the TEs today and now were not relevant for the detection of salient dates. [sent-173, score-0.258]

62 In the AFP news corpus, these expressions are mostly generic expressions synonymous with nowadays and do not really time-stamp an event with respect to the DCT. [sent-174, score-0.312]

63 Another specificity of the corpus is the fact that if the DCT corresponds to a Monday, and if an event in a past tense is described with the associated TE on Monday or Monday, it means that this event occurs on the DCT day itself, and not on the Monday before. [sent-175, score-0.41]
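A small Python sketch of this corpus-specific rule follows. The same-day case is taken from the text; resolving any other weekday to its most recent occurrence before the DCT is an assumption of the sketch, as are the names `resolve_past_weekday` and `WEEKDAYS`.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_past_weekday(dct: date, weekday_name: str) -> date:
    """Resolve a weekday mention attached to a past-tense event.

    If the DCT falls on the mentioned weekday, the DCT itself is
    returned (the AFP convention described above). Otherwise the most
    recent such weekday before the DCT is returned (our assumption).
    """
    target = WEEKDAYS.index(weekday_name.lower())
    delta = (dct.weekday() - target) % 7  # 0 when the DCT is that weekday
    return dct - timedelta(days=delta)
```

With a DCT of Monday 9 July 2012, "on Monday" resolves to 9 July 2012 and not to 2 July 2012.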

64 The annotation of the relevant TE has the attribute indicating that it time-stamps an event realized by a reported speech verb. [sent-186, score-0.328]

65 5 millions temporal expressions were detected, among which 845,000 absolute dates (7%) and 4. [sent-190, score-0.923]

66 Although we have not yet evaluated our tagging of relative dates, the system on which our current date normalization is based achieved good results in the TempEval (Verhagen et al. [sent-192, score-0.221]

67 2, we present our experiments using simple filtering and statistics on dates calculated by Lucene. [sent-198, score-0.594]

68 luc(d) is the sum of Lucene scores for textual units containing the date d. [sent-203, score-0.275]
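The luc(d) ranking can be sketched without Lucene itself, given retrieval scores that have already been computed. In the Python sketch below, the input shape (a list of `(score, dates)` pairs, one per retrieved textual unit) and the function name `luc_ranking` are assumptions; the scores would come from whatever search engine is in use.

```python
from collections import defaultdict


def luc_ranking(hits):
    """Rank dates by the sum of the scores of the units containing them.

    hits: iterable of (retrieval_score, dates_in_unit) pairs.
    Returns (date, summed_score) pairs, best first.
    """
    totals = defaultdict(float)
    for score, dates in hits:
        for d in set(dates):  # count a unit once per distinct date
            totals[d] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```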

69 $\log \frac{N}{df(d)}$ where f(d) is the number of occurrences of date d in the sentence (generally, f(d) = 1), N is the number of indexed sentences and df(d) is the number of sentences containing date d. [sent-207, score-0.443]
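A date-level tf.idf score built from these quantities might be computed as in the Python sketch below. The extracted sentence is truncated, so two points are assumptions: that each sentence contributes f(d) · log(N / df(d)), and that contributions are summed over the retrieved sentences. The function name `date_tfidf` is hypothetical.

```python
import math
from collections import Counter


def date_tfidf(retrieved, indexed):
    """Score dates with f(d) * log(N / df(d)), summed over retrieved sentences.

    retrieved, indexed: lists of sentences, each given as a list of
    normalized dates. Retrieved sentences are assumed to be part of the
    index, so df(d) >= 1 for every date that gets scored.
    """
    n = len(indexed)
    df = Counter(d for sent in indexed for d in set(sent))
    scores = Counter()
    for sent in retrieved:
        for d, f in Counter(sent).items():
            scores[d] += f * math.log(n / df[d])
    return dict(scores)
```

Dates that occur in nearly every indexed sentence get a weight close to zero, which is the usual motivation for the idf term.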

70 In all experiments (including baselines), timelines have been built by considering only dates between the first and the last dates of the corresponding manual chronology. [sent-208, score-1.221]

71 Processing runs were evaluated on manually-written chronologies (see Section 3. [sent-209, score-0.364]

72 Note that in this baseline, as well as in all the subsequent runs, the information unit was the sentence because a date was associated to a small part of the text. [sent-236, score-0.223]

73 Same as BLabs, except that sentences con- taining no absolute dates were considered and associated to the DCT. [sent-239, score-0.633]

74 This sentence was indexed with the title and keywords of the AFP article containing it. [sent-255, score-0.153]

75 Combinations between the following filtering operations were possible, by removing all dates associated with a reported speech verb (R), a modal verb (M) and/or a future verb (F). [sent-257, score-0.925]

76 All these filtering operations were intended to remove references to events that were not certain, thereby minimizing noise in results. [sent-258, score-0.168]

77 These processing runs are named SD runs, with indices representing the filtering operations. [sent-259, score-0.182]
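The R, M and F filters can be illustrated with a small Python sketch over annotated temporal expressions. The `TimeExpression` class and its field names are hypothetical stand-ins for the MODALITY, DECLARATION and REPORTED attributes described earlier, and treating both DECLARATION and REPORTED as triggers for the R filter is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeExpression:
    date: str                       # normalized date, e.g. "2011-02-17"
    modality: Optional[str] = None  # "FUTURE", "MODAL" or None
    declaration: bool = False       # TE modifies a reported speech verb
    reported: bool = False          # TE sits inside reported speech


def filter_dates(tes, filters):
    """Drop TEs matched by any of the active filters ('R', 'M', 'F')."""
    kept = []
    for te in tes:
        if "R" in filters and (te.declaration or te.reported):
            continue
        if "M" in filters and te.modality == "MODAL":
            continue
        if "F" in filters and te.modality == "FUTURE":
            continue
        kept.append(te)
    return kept
```

Calling `filter_dates(tes, {"M", "F"})` would then correspond to a run that removes modal and future events but keeps reported speech.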

78 In all combinations, dates were ranked by the sum of Lucene scores for these sentences (luc) or by tf. [sent-261, score-0.593]

79 3 Machine-Learning Runs We used our set of manually-written chronologies as a training corpus to perform machine learning experiments. [sent-269, score-0.273]

80 We used IcsiBoost, an implementa- (Footnote 4) We do not present runs where dates are ranked by the number of times they appear in retrieved sentences (occ), as we did for baselines, since results are systematically lower. [sent-270, score-0.684]

81 In our approach, we consider two classes: salient dates are dates that have an entry in the manual chronologies, while non-salient dates are all other dates. [sent-274, score-1.788]

82 The choices of journalists are indeed very subjective, and chronologies must not exceed a certain length, which means that relevant dates can be thrown away. [sent-276, score-0.934]

83 We rather aggregated all sentences corresponding to the same date before learning the classifier. [sent-281, score-0.192]

84 Features representing the fact that an important event is still written about, a long time after it occurs: 1) Distance between the date and the most recent mention of this date; 2) Distance between the date and the DCT; 3. [sent-285, score-0.725]
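The two distance features listed above could be computed along the lines of the following Python sketch. It assumes that each mention of a date is represented by the DCT of the document containing it, measures distances in days, and averages the date-to-DCT distance over all mentions; none of these choices is stated in the extracted text, and the function and key names are hypothetical.

```python
from datetime import date


def persistence_features(d, mention_dcts):
    """Features describing how long a date keeps being written about.

    d: the candidate date.
    mention_dcts: non-empty list with the creation date (DCT) of each
    document that mentions d.
    """
    gaps = [(dct - d).days for dct in mention_dcts]
    return {
        # Feature 1: distance to the most recent mention of the date.
        "days_to_latest_mention": max(gaps),
        # Feature 2: distance to the DCT, averaged over mentions.
        "mean_days_to_dct": sum(gaps) / len(gaps),
    }
```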

85 Instead, we used the predicted probability P(d) returned by the classifier, and mixed it with the Lucene score of sentences, or with date tf. [sent-288, score-0.224]

86 Table 3: MAP results for salient date extraction with machine-learning. [sent-291, score-0.372]

87 Our 91 chronologies were randomly divided into 4 sub-samples, each of them being used once as test data. [sent-302, score-0.273]
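The evaluation protocol described here is a 4-fold cross-validation. A minimal Python sketch of such a split is given below; the seed, the round-robin assignment to folds and the generator name `four_fold_splits` are assumptions, since the text only states that the division was random.

```python
import random


def four_fold_splits(items, k=4, seed=0):
    """Yield (train, test) pairs so that each item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

With 91 chronologies this gives three test folds of 23 items and one of 22.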

88 However, assembling such a chronology is a very subjective task, and no clear method for evaluation agreement between two journalists seems immediately apparent. [sent-308, score-0.205]

89 We asked him to assess the first 30 dates of these runs. [sent-312, score-0.536]

90 (Footnote 6) Namely, “Arab revolt timeline for Morocco”, “Kyrgyzstan unrest timeline”, “Lebanon’s new government: a timeline”, “Libya timeline”. [sent-313, score-0.456]

91 Table 4: Average precision results for manual evaluation on 4 topics, against the original chronologies (APC), and the expert assessment (APE). [sent-316, score-0.303]

92 Table 4 presents results for this evaluation, comparing average precision values obtained 1) against the original, manual chronologies (APC), and 2) against the expert assessment (APE). [sent-317, score-0.303]
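For readers unfamiliar with the metric, average precision over a ranked list of dates can be computed as in the Python sketch below. This is the textbook definition, normalized by the number of reference dates; whether the paper uses exactly this variant is not stated in the extracted text. The function names are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of reference dates."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Scoring the same ranked list once against the journalist's chronology and once against the expert's judgments would give the APC and APE values respectively.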

93 These values show that, for 3 runs out of 4, many dates returned by the system are considered as valid by the expert, even if not presented in the original chronology. [sent-318, score-0.659]

94 In future work, we envisage providing, together with the detection of salient dates, a semantic analysis that will help determine the importance of events. [sent-322, score-0.218]

95 Another interesting direction in which we soon aim to work is to consider all textual excerpts that are associated with salient dates, and use clustering techniques to determine if textual excerpts correspond to the same event or not. [sent-323, score-0.556]

96 Finally, as our news corpus is available both for English and French (comparable corpus, not necessarily translations), we aim to investigate cross-lingual extraction of salient dates and salient events. [sent-324, score-0.945]

97 Building timelines from narrative clinical records: initial results based-on deep natural language understanding. [sent-382, score-0.195]

98 Recognizing temporal information in Korean clinical narratives through text normalization. [sent-391, score-0.282]

99 Text classification and named entities for new event detection. [sent-395, score-0.182]

100 Detecting events with date and place information in unstructured text. [sent-429, score-0.332]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dates', 0.536), ('chronologies', 0.273), ('temporal', 0.236), ('afp', 0.204), ('date', 0.192), ('friday', 0.19), ('salient', 0.18), ('timelines', 0.149), ('event', 0.149), ('lucene', 0.144), ('sigir', 0.121), ('chronology', 0.12), ('tes', 0.119), ('dct', 0.119), ('timeline', 0.114), ('events', 0.11), ('xip', 0.102), ('te', 0.094), ('runs', 0.091), ('modality', 0.087), ('journalists', 0.085), ('hag', 0.068), ('occ', 0.068), ('xml', 0.066), ('absolute', 0.066), ('allan', 0.06), ('luc', 0.06), ('reported', 0.058), ('filtering', 0.058), ('ranked', 0.057), ('modal', 0.057), ('yan', 0.057), ('expressions', 0.057), ('france', 0.054), ('textual', 0.053), ('alonso', 0.051), ('modelmap', 0.051), ('orsay', 0.051), ('tannier', 0.051), ('monday', 0.051), ('query', 0.05), ('news', 0.049), ('keywords', 0.049), ('verb', 0.048), ('retrieval', 0.047), ('clinical', 0.046), ('title', 0.045), ('excerpts', 0.045), ('tense', 0.041), ('speech', 0.041), ('day', 0.04), ('relevant', 0.04), ('attribute', 0.04), ('detection', 0.038), ('nes', 0.038), ('linguistic', 0.037), ('document', 0.036), ('normalized', 0.036), ('texts', 0.035), ('allen', 0.034), ('tempeval', 0.034), ('apc', 0.034), ('ape', 0.034), ('calendar', 0.034), ('chieu', 0.034), ('mestl', 0.034), ('wrt', 0.034), ('named', 0.033), ('james', 0.033), ('granularity', 0.033), ('ge', 0.033), ('thematic', 0.033), ('verhagen', 0.033), ('evolutionary', 0.033), ('ny', 0.033), ('acm', 0.032), ('returned', 0.032), ('january', 0.031), ('associated', 0.031), ('ims', 0.03), ('modifies', 0.03), ('ne', 0.03), ('containing', 0.03), ('swan', 0.03), ('timestamped', 0.03), ('claude', 0.03), ('jung', 0.03), ('saquete', 0.03), ('manipulated', 0.03), ('xrce', 0.03), ('expert', 0.03), ('place', 0.03), ('indexed', 0.029), ('map', 0.029), ('normalization', 0.029), ('month', 0.028), ('indexing', 0.028), ('module', 0.028), ('java', 0.028), ('millions', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar

Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.

2 0.25372803 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens

Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.

3 0.24194185 191 acl-2012-Temporally Anchored Relation Extraction

Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo

Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.

4 0.20128137 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive

Author: Joel Nothman ; Matthew Honnibal ; Ben Hachey ; James R. Curran

Abstract: Interpreting news requires identifying its constituent events. Events are complex linguistically and ontologically, so disambiguating their reference is challenging. We introduce event linking, which canonically labels an event reference with the article where it was first reported. This implicitly relaxes coreference to co-reporting, and will practically enable augmenting news archives with semantic hyperlinks. We annotate and analyse a corpus of 150 documents, extracting 501 links to a news archive with reasonable inter-annotator agreement.

5 0.18702187 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text

Author: Preethi Raghavan ; Albert Lai ; Eric Fosler-Lussier

Abstract: We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. Such a representation allows us to learn all of Allen’s temporal relations between medical events. Interestingly, we observe that this methodology performs better than a classification-based approach for this domain, but worse on the relationships found in the Timebank corpus. This finding has important implications for styles of data representation and resources used for temporal relation learning: clinical narratives may have different language attributes corresponding to temporal ordering relative to Timebank, implying that the field may need to look at a wider range of domains to fully understand the nature of temporal ordering.

6 0.14386429 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions

7 0.14185308 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection

8 0.13481668 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction

9 0.10533819 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

10 0.091094881 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

11 0.087204874 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations

12 0.085387617 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

13 0.079644389 73 acl-2012-Discriminative Learning for Joint Template Filling

14 0.067681953 98 acl-2012-Finding Bursty Topics from Microblogs

15 0.067020588 101 acl-2012-Fully Abstractive Approach to Guided Summarization

16 0.059684601 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

17 0.05487936 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora

18 0.051738963 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

19 0.049141265 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System

20 0.048701655 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.183), (1, 0.167), (2, -0.102), (3, 0.233), (4, 0.03), (5, -0.191), (6, 0.022), (7, -0.094), (8, -0.017), (9, -0.208), (10, -0.164), (11, 0.054), (12, 0.053), (13, 0.094), (14, -0.005), (15, -0.103), (16, 0.021), (17, 0.05), (18, -0.006), (19, 0.008), (20, 0.034), (21, 0.077), (22, 0.039), (23, -0.001), (24, 0.019), (25, 0.018), (26, 0.031), (27, -0.032), (28, 0.028), (29, -0.008), (30, 0.003), (31, 0.032), (32, 0.039), (33, -0.006), (34, 0.004), (35, 0.046), (36, 0.018), (37, 0.043), (38, 0.073), (39, -0.035), (40, -0.01), (41, 0.008), (42, -0.083), (43, -0.089), (44, -0.041), (45, -0.009), (46, -0.001), (47, 0.059), (48, 0.075), (49, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96080881 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar

Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.

2 0.91347885 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text

Author: Preethi Raghavan ; Albert Lai ; Eric Fosler-Lussier

Abstract: We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. Such a representation allows us to learn all of Allen’s temporal relations between medical events. Interestingly, we observe that this methodology performs better than a classification-based approach for this domain, but worse on the relationships found in the Timebank corpus. This finding has important implications for styles of data representation and resources used for temporal relation learning: clinical narratives may have different language attributes corresponding to temporal ordering relative to Timebank, implying that the field may need to look at a wider range of domains to fully understand the nature of temporal ordering.

3 0.72179806 191 acl-2012-Temporally Anchored Relation Extraction

Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo

Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.

4 0.71469188 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens

Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.

5 0.68419755 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions

Author: Nathanael Chambers

Abstract: Temporal reasoners for document understanding typically assume that a document’s creation date is known. Algorithms to ground relative time expressions and order events often rely on this timestamp to assist the learner. Unfortunately, the timestamp is not always known, particularly on the Web. This paper addresses the task of automatic document timestamping, presenting two new models that incorporate rich linguistic features about time. The first is a discriminative classifier with new features extracted from the text’s time expressions (e.g., ‘since 1999’). This model alone improves on previous generative models by 77%. The second model learns probabilistic constraints between time expressions and the unknown document time. Imposing these learned constraints on the discriminative model further improves its accuracy. Finally, we present a new experiment design that facilitates easier comparison by future work.

6 0.67371333 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive

7 0.65981185 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

8 0.65695697 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction

9 0.51950508 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection

10 0.43052971 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

11 0.39019924 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

12 0.38060707 101 acl-2012-Fully Abstractive Approach to Guided Summarization

13 0.36806977 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations

14 0.35030708 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

15 0.34398317 73 acl-2012-Discriminative Learning for Joint Template Filling

16 0.31010357 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

17 0.30828661 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

18 0.28218597 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora

19 0.27964512 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

20 0.27363434 129 acl-2012-Learning High-Level Planning from Text


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.053), (26, 0.04), (28, 0.022), (30, 0.027), (37, 0.022), (39, 0.055), (52, 0.015), (59, 0.015), (60, 0.194), (74, 0.028), (76, 0.016), (82, 0.069), (84, 0.054), (85, 0.035), (90, 0.117), (92, 0.057), (94, 0.01), (99, 0.066)]
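
The list above is a sparse topic distribution: each pair is a topic id and the weight the LDA model assigns to that topic for this paper. As a rough illustration of how such vectors can be compared, the sketch below computes a cosine similarity between two of them. The choice of cosine is an assumption, not something the listing states; the listing reports 0.8110466 for the paper against itself, so the page evidently uses a different measure or different inputs. The `other_paper` vector is invented for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs."""
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Topic weights copied from the list above.
this_paper = [
    (25, 0.053), (26, 0.04), (28, 0.022), (30, 0.027), (37, 0.022),
    (39, 0.055), (52, 0.015), (59, 0.015), (60, 0.194), (74, 0.028),
    (76, 0.016), (82, 0.069), (84, 0.054), (85, 0.035), (90, 0.117),
    (92, 0.057), (94, 0.01), (99, 0.066),
]

# Hypothetical second paper; these numbers are NOT from the listing.
other_paper = [(12, 0.3), (60, 0.21), (90, 0.08)]

# The dominant topic is simply the pair with the largest weight.
dominant = max(this_paper, key=lambda pair: pair[1])

print("dominant topic:", dominant)
print("similarity to hypothetical paper:", cosine(this_paper, other_paper))
```

For this paper the dominant topic is 60 (weight 0.194), followed by topic 90 (weight 0.117).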

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8110466 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

Author: Remy Kessler ; Xavier Tannier ; Caroline Hagege ; Veronique Moriceau ; Andre Bittar

Abstract: We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.

2 0.70252544 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu

Abstract: unknown-abstract

3 0.67746508 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries

Author: Brian Roark ; Richard Sproat ; Cyril Allauzen ; Michael Riley ; Jeffrey Sorensen ; Terry Tai

Abstract: In this paper, we present a new collection of open-source software libraries that provides command line binary utilities and library classes and functions for compiling regular expression and context-sensitive rewrite rules into finite-state transducers, and for n-gram language modeling. The OpenGrm libraries use the OpenFst library to provide an efficient encoding of grammars and general algorithms for building, modifying and applying models.

4 0.65835315 191 acl-2012-Temporally Anchored Relation Extraction

Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo

Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.

5 0.65649772 187 acl-2012-Subgroup Detection in Ideological Discussions

Author: Amjad Abu-Jbara ; Pradeep Dasigi ; Mona Diab ; Dragomir Radev

Abstract: The rapid and continuous growth of social networking sites has led to the emergence of many communities of communicating groups. Many of these groups discuss ideological and political topics. It is not uncommon that the participants in such discussions split into two or more subgroups. The members of each subgroup share the same opinion toward the discussion topic and are more likely to agree with members of the same subgroup and disagree with members from opposing subgroups. In this paper, we propose an unsupervised approach for automatically detecting discussant subgroups in online communities. We analyze the text exchanged between the participants of a discussion to identify the attitude they carry toward each other and towards the various aspects of the discussion topic. We use attitude predictions to construct an attitude vector for each discussant. We use clustering techniques to cluster these vectors and, hence, determine the subgroup membership of each participant. We compare our methods to text clustering and other baselines, and show that our method achieves promising results.

6 0.65054548 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

7 0.63553286 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

8 0.6352216 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

9 0.63462025 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

10 0.62901944 188 acl-2012-Subgroup Detector: A System for Detecting Subgroups in Online Discussions

11 0.62179166 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

12 0.62155861 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

13 0.61979544 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

14 0.61924487 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

15 0.61904347 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection

16 0.61693132 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions

17 0.61667305 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

18 0.61609823 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

19 0.61580378 73 acl-2012-Discriminative Learning for Joint Template Filling

20 0.61552608 167 acl-2012-QuickView: NLP-based Tweet Search