emnlp emnlp2013 emnlp2013-41 knowledge-graph by maker-knowledge-mining

41 emnlp-2013-Building Event Threads out of Multiple News Articles


Source: pdf

Author: Xavier Tannier ; Veronique Moriceau

Abstract: We present an approach for building multidocument event threads from a large corpus of newswire articles. An event thread is basically a succession of events belonging to the same story. It helps the reader to contextualize the information contained in a single article, by navigating backward or forward in the thread from this article. A specific effort is also made on the detection of reactions to a particular event. In order to build these event threads, we use a cascade of classifiers and other modules, taking advantage of the redundancy of information in the newswire corpus. We also share interesting comments concerning our manual annotation procedure for building a training and testing set.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present an approach for building multidocument event threads from a large corpus of newswire articles. [sent-4, score-0.423]

2 An event thread is basically a succession of events belonging to the same story. [sent-5, score-0.471]

3 In order to build these event threads, we use a cascade of classifiers and other modules, taking advantage of the redundancy of information in the newswire corpus. [sent-8, score-0.427]

4 1 Introduction. In this paper, we explore a new way of dealing with temporal relations between events. [sent-10, score-0.521]

5 Our task is somewhat between multidocument summarization and classification of temporal relations between events. [sent-11, score-0.598]

6 We work with a large collection of English newswire articles, where each article relates an event: the main topic of the article is a specific event, and other older events are mentioned in order to put it into perspective. [sent-12, score-0.41]

7 Thus, we consider that an event is associated with an article and that defining temporal relations between articles is a way to define temporal relations between events. [sent-13, score-1.451]

8 The task is to build a temporal graph of articles, linked to each other by the following relations: Same event, when two documents relate the same event, or when a document is an update of another one. [sent-18, score-0.592]

9 Continuation, when an event is the continuation or the consequence of a previous one. [sent-19, score-0.216]

10 We also define a subset of continuation, called reaction, concerning a document relating the reaction of someone to another event. [sent-20, score-0.545]

11 These relations can be represented by a directed graph where documents are vertices and relations are edges (as illustrated in all figures of this article). [sent-22, score-0.767]

12 On the one hand, this redundancy is an issue since a system must not show duplicate information to the user; on the other hand, we show in this article that it can also be of great help in the process of extracting temporal graphs. [sent-26, score-0.467]

13 However, the fact is that minor events hardly ever lead to dense temporal graphs. [sent-29, score-0.462]

14 The simple modules used to predict the same event, continuation and, possibly, reaction relations are described in Section 4, and results are given in Section 5. [sent-36, score-1.092]

15 When a user reads an article, the system will then be able to provide her with a thread of events having occurred before or after, helping her to contextualize the information she is reading. [sent-38, score-0.395]

16 2 Related work. The identification of temporal relations between events in texts has been the focus of increasing attention because of its importance in NLP applications such as information extraction, question-answering or summarization. [sent-40, score-0.67]

17 , 2010) focused on temporal relation identification, mainly on temporal relations between events and times in the same sentence or in consecutive sentences and between events and the creation time of documents. [sent-43, score-1.264]

18 In this context, the goal is to identify the type of a temporal relation which is [...] Figure 2: Example of “temporal graph”: Madrid attacks, with many updates of the initial information. [sent-44, score-0.408]

19 Note that articles gathered in this main pool of articles can be posterior to the continuations and reactions to the described event. [sent-45, score-0.398]

20 6) use statistical learning based on temporal features (modality, tense, aspect, etc. [sent-48, score-0.313]

21 More recently, Mirroshandel and Ghassem-Sani (2012) proposed a new method for temporal relation extraction by using a bootstrapping method on annotated data and have a better accuracy than state-of-the-art systems. [sent-52, score-0.408]

22 Their method is based on the assumption that similar event pairs in topically related documents are likely to have the same temporal relations. [sent-53, score-0.721]

23 In the 2012 i2b2 challenge (i2b, 2012), the problem was not only to identify the type of temporal relations, but also to decide whether a temporal relation existed or not between two elements, either clinical concepts or temporal expressions. [sent-55, score-1.061]

24 But, as in TempEval, the temporal analysis was only to be performed within a single document. [sent-56, score-0.313]

25 They did not use temporal information present in texts and extracted sequences of events (e. [sent-61, score-0.462]

26 The first step consists in detecting narrative relations between events sharing coreferring arguments. [sent-65, score-0.438]

27 Then, a temporal classifier orders partially the connected events with the before relation. [sent-66, score-0.539]

28 Concerning the identification of the reaction relation, to our knowledge, there is no work on the detection of reaction between several documents. [sent-67, score-0.766]

29 (2009) focused on the identification of reported speech or opinions in quotations in a document, but not on the identification of an event which is the source of a reaction and which can possibly be in another document. [sent-71, score-0.599]

30 (2011) proposed a framework to group temporally and topically related news articles into same-story clusters in order to reveal the temporal evolution of stories. [sent-74, score-0.491]

31 But in these topically related clusters of documents, no temporal relation is detected between articles or events except chronological order. [sent-75, score-0.764]

32 (2002) propose a system for multidocument summarization from newswire articles describing the same event. [sent-78, score-0.245]

33 We assume that the title and the first paragraph describe the event associated with the document. [sent-93, score-0.273]

34 2 Relation Annotation. Two annotators were asked to attribute the following relations between each pair of articles presented by the annotation interface system. [sent-97, score-0.348]

35 details, when the second document gives more details about the events (see bottom of Figure 5). [sent-100, score-0.227]

36 development of same story, when the two documents relate two events which are included into a third one; continuation, when an event is the continuation or the consequence of a previous one. [sent-101, score-0.794]

37 It is important to make clear that a continuation relation is more than a simple thematic relation, it implies a natural prolongation between two events. [sent-103, score-0.583]

38 For example, two sport events of the same Olympic Games, or two different attacks in Iraq, shall not be linked together unless a direct link between both is specified in the articles. [sent-104, score-0.234]

39 Figure 5: Example of relations same-event between two documents: update on casualties (top) or details (bottom). [sent-106, score-0.263]

40 We therefore aggregated the number update, form update and details relations into a more generic and consensual same-event relation (see Figure 5). [sent-113, score-0.358]

41 We also discarded the development of same story relation, leaving only same-event, continuation and reaction. [sent-114, score-0.483]

42 Annotation guidelines were modified and a second annotation round was carried out: only the same-event, continuation, reaction and nil relations were annotated. [sent-115, score-0.716]

43 Same-event and continuation relations are transitive: if A same-event B and B same-event C, then A same-event C (and respectively for continuation). Table 1: Characteristics of the corpus. [sent-120, score-0.658]

44 Then, when the annotation was done, a transitive closure was performed on the entire graph, in order to get more relations with low effort (and to detect and correct some inconsistencies in the annotations). [sent-122, score-0.431]
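
The transitive closure mentioned here can be sketched as follows. This is only an illustration under stated assumptions: relations are stored as (older, newer) pairs of hypothetical document identifiers, and the function name is ours, not the authors'.

```python
from itertools import product

def transitive_closure(pairs):
    """Return the transitive closure of a set of (older, newer) pairs."""
    closure = set(pairs)
    while True:
        # If a -> b and b -> d are both known, a -> d can be deduced.
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new

# Hypothetical annotations: A same-event B, B same-event C.
annotated = {("A", "B"), ("B", "C")}
closed = transitive_closure(annotated)
```

Any deduced pair that contradicts a manual annotation would be a candidate inconsistency to review; how the authors handled such cases is not described in this excerpt.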

45 4 Building Temporal Graphs. As we explained in the introduction, the main purpose of this paper is to show that it is possible to extract temporal graphs of events from multiple documents in a news corpus. [sent-126, score-0.654]

46 Therefore, we will use a cascade of classifiers and other modules, each of them using the relations deduced by the previous one. [sent-128, score-0.29]

47 All modules predict a relation between two documents (i. [sent-129, score-0.274]

48 From now on, when considering a pair of documents, we will refer to the older document as document 1, and to the more recent one as document 2. [sent-144, score-0.265]

49 The relations found between documents will be represented by a directed graph where documents are vertices and relations are edges. [sent-145, score-0.779]
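
A minimal sketch of this representation is given below. The class name, the method names and the timestamps are illustrative assumptions; the only points taken from the text are that documents are vertices, relations are edges, and the older document of a pair is treated as document 1.

```python
from collections import defaultdict
from datetime import datetime

class TemporalGraph:
    """Directed graph: documents are vertices, typed relations are edges."""

    def __init__(self):
        self.dct = {}                    # doc id -> document creation time
        self.edges = defaultdict(dict)   # older doc id -> {newer doc id: relation}

    def add_document(self, doc_id, created):
        self.dct[doc_id] = created

    def add_relation(self, a, b, relation):
        # Orient the edge from the older document (document 1)
        # to the more recent one (document 2).
        doc1, doc2 = sorted((a, b), key=lambda d: self.dct[d])
        self.edges[doc1][doc2] = relation

    def relation(self, a, b):
        # Pairs without a stored edge are treated as nil.
        doc1, doc2 = sorted((a, b), key=lambda d: self.dct[d])
        return self.edges[doc1].get(doc2, "nil")

g = TemporalGraph()
g.add_document("A", datetime(2005, 4, 2, 21, 0))   # made-up timestamps
g.add_document("B", datetime(2005, 4, 3, 9, 30))
g.add_relation("B", "A", "continuation")            # stored as A -> B
```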

50 Nils versus non-nils. We first aim at separating nil relations (no relation between two events) from other relations. [sent-148, score-0.396]

51 2: Nil classifier, level 2. Finding relations on a document implies that the described event is important enough to be addressed by several articles (same-event) or to have consequences (continuation). [sent-160, score-0.61]

52 Consequently, if we find such relations concerning a document, we are more likely to find more of them, because this means that [...] (footnote 3: http://lucene. [...]) [sent-161, score-0.296]

53 A typical example is shown in Figure 7, where an event described by several documents (on the left) has many continuations. [sent-164, score-0.344]

54 2 using additional features related to the relations found at step A. [sent-166, score-0.244]

55 This new information will basically help the classifier to be more optimistic toward non-nil relations for documents having already non-nil relations. [sent-172, score-0.413]
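
One way to picture such second-level features is to count, for each document of a candidate pair, the non-nil relations already predicted at the previous step. The sketch below is an assumption about what these features could look like; the exact feature set is not given in this excerpt, and the function and key names are hypothetical.

```python
from collections import Counter

def relation_count_features(doc1, doc2, relations):
    """Count the non-nil relations already found for each document of a pair.

    `relations` maps (older, newer) pairs to the label predicted at the
    previous step of the cascade.
    """
    degree = Counter()
    for (a, b), label in relations.items():
        if label != "nil":
            degree[a] += 1
            degree[b] += 1
    return {"doc1_non_nil": degree[doc1], "doc2_non_nil": degree[doc2]}

# Hypothetical output of the level-1 classifier.
level1 = {
    ("A", "B"): "same-event",
    ("A", "C"): "continuation",
    ("B", "D"): "nil",
}
features = relation_count_features("A", "D", level1)
```

A pair involving a document that already has several non-nil relations would then carry larger counts, which is the signal the text describes as making the classifier more optimistic.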

56 Same-event versus Continuation. We are now working only with non-nil relations (even if some relations may switch between nil and non-nil during the transitive closure). [sent-175, score-0.583]

57 1: Relation classifier, level 1. Distinction between same-event and continuation is made by the following sets of features: Date features: difference between the two document creation times (DCTs): difference in days, in hours, in minutes (3 features); whether the creation time of doc. [sent-179, score-0.602]
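
The three DCT difference features can be sketched as below. Whether the original system used fractional or integer values is not stated here, so the fractional values, the function name and the timestamps are assumptions made for illustration.

```python
from datetime import datetime

def dct_difference_features(dct1, dct2):
    """Difference between two document creation times in days, hours and minutes."""
    seconds = (dct2 - dct1).total_seconds()
    return {
        "diff_days": seconds / 86400,
        "diff_hours": seconds / 3600,
        "diff_minutes": seconds / 60,
    }

features = dct_difference_features(
    datetime(2005, 4, 2, 21, 0),   # document 1 (older), made-up timestamp
    datetime(2005, 4, 3, 9, 30),   # document 2 (more recent), made-up timestamp
)
```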

58 Same events are grouped by cliques (see Section 4. [sent-192, score-0.313]

59 These last three features come from the idea that a continuation relation can be made explicit in text by mentioning the first event in the second document. [sent-195, score-0.761]

60 Temporal features: whether words introducing temporal relations occur in document 1 or document 2. [sent-196, score-0.313]

61 Only same-event relations classified with more than 90% confidence by the classifier are kept, in order to ensure a high precision (recall will be improved at next step). [sent-205, score-0.285]
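
The confidence filter described here could look like the sketch below. The data layout (a dictionary from document pairs to a label and a confidence) is an assumption, and what happens to the discarded pairs is not specified in this excerpt.

```python
def confident_same_event(predictions, threshold=0.9):
    """Return the pairs predicted as same-event with confidence above the threshold.

    `predictions` maps a (doc1, doc2) pair to a (label, confidence) tuple.
    """
    return {
        pair
        for pair, (label, confidence) in predictions.items()
        if label == "same-event" and confidence > threshold
    }

# Hypothetical classifier output.
predictions = {
    ("A", "B"): ("same-event", 0.97),
    ("A", "C"): ("same-event", 0.62),
    ("B", "D"): ("continuation", 0.95),
}
kept = confident_same_event(predictions)
```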

62 1 (collecting numbers of same-event and continuation relations that have been found by the previous classifier). [sent-214, score-0.658]

63 4: Transitive closure by vote. As already stated, same-event and continuation relations are transitive. [sent-219, score-0.842]

64 In the graph formed by documents (vertices) and relations (edges), it is then possible to find all cliques, i. [sent-221, score-0.389]

65 Starting from the result of the last step, we find all same-event cliques in the graph by using the Bron and Kerbosch (1973) algorithm. [sent-225, score-0.28]
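
For readers unfamiliar with it, the basic Bron and Kerbosch recursion (without pivoting) can be sketched as follows. This is a textbook version written for illustration, not the authors' implementation; the edge list and function names are hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch recursion (no pivoting): collect maximal cliques in `out`."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}   # v has been fully explored
        x = x | {v}   # exclude v from later cliques at this level

def same_event_cliques(edges):
    """Return all maximal cliques of an undirected same-event graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out

# Hypothetical same-event edges: A, B, C are mutually linked; D is linked to C only.
cliques = same_event_cliques([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
```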

66 The transitive closure process is then illustrated by Figure 8. [sent-226, score-0.229]

67 If the classifier proposed a relation between some documents of a clique and some other documents (as D1, D2 and D3), then a vote is necessary: If the document is linked to half or more of the clique, then all missing links are created (Figure 8. [sent-227, score-0.782]

68 a); Otherwise, the document is entirely disconnected from the clique (Figure 8. [sent-228, score-0.244]
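
The vote rule stated above (join the clique when linked to half or more of its members, otherwise disconnect) can be sketched as follows. The function name and the document identifiers are hypothetical.

```python
def vote(clique, linked_members):
    """Decide whether an outside document joins a same-event clique.

    `clique` is the set of documents in the clique and `linked_members`
    the clique members that the classifier linked to the outside document.
    Returns the set of clique members the document ends up linked to.
    """
    linked = set(linked_members) & set(clique)
    if 2 * len(linked) >= len(clique):
        return set(clique)   # half or more: create all missing links
    return set()             # otherwise: disconnect the document entirely

clique = {"C1", "C2", "C3", "C4", "C5"}
joined = vote(clique, {"C1", "C2", "C3", "C4"})   # 4 of 5 members linked
rejected = vote(clique, {"C1", "C2"})             # 2 of 5 members linked
```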

69 This vote is done for same-event and continuation relations (resp. [sent-230, score-0.725]

70 that no document will sit in two different cliques, or that two documents from the same clique will not have two different relations toward a third document. [sent-238, score-0.58]

71 Another way to ensure robustness of the vote would be to apply the transitive closure only on bigger cliques (e. [sent-243, score-0.422]

72 Continuation versus Reaction. The approach for reaction extraction is different. [sent-248, score-0.383]

73 We first try to determine which documents describe reactions, regardless of which event it is a reaction to. [sent-249, score-0.727]

74 In the training set, all documents having at least one incoming reaction edge are considered as reaction [...] Figure 8: Vote for same-event transitive closure. [sent-250, score-0.967]

75 ), four nodes from the 5-node clique are linked to document D1, which is enough to add D1 to the clique. [sent-252, score-0.315]

76 ), only two nodes from the clique are linked to documents D2 and D3, which is not enough to add them into the clique. [sent-254, score-0.365]

77 All edges from the clique to D2 and D3 are then deleted. [sent-255, score-0.244]

78 Once reaction documents have been selected, the question is how to decide to which other document(s) it must be linked. [sent-261, score-0.511]

79 For example, in Figure 1, “Queen Elizabeth expresses deep sorrow” is a reaction to the pope’s death, not to other documents in the temporal thread (for example, not to other reactions or to “Pope in serious condition”). [sent-262, score-1.112]

80 We then proposed the two following basic heuristics, applied on all continuation relations found after step B: A reaction reacts to only one event. [sent-264, score-1.119]

81 Then, among all continuation edges incoming to the reaction document, we choose the biggest same-event clique and create reaction edges instead of continuations. [sent-266, score-1.161]
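
One possible reading of this heuristic is sketched below: among the same-event cliques that send a continuation edge to the reaction document, pick the largest one and relabel its edges as reactions. The interpretation of "biggest", the treatment of isolated documents as one-element cliques, and all names are assumptions.

```python
def relabel_reactions(incoming, cliques):
    """Select the continuation edges to relabel as reaction edges.

    `incoming` is the set of documents with a continuation edge into the
    reaction document; `cliques` is a list of same-event cliques (sets of
    documents, with isolated documents given as one-element sets).
    Returns the sources whose edge becomes a reaction edge.
    """
    candidates = [c for c in cliques if c & incoming]
    if not candidates:
        return set()
    biggest = max(candidates, key=len)
    return biggest & incoming

# Hypothetical example: continuation edges arrive from A, B and D.
sources = relabel_reactions({"A", "B", "D"}, [{"A", "B", "C"}, {"D", "E"}])
```

Under this reading, the remaining members of the chosen clique would be reached by the transitive closure that the text applies to reactions afterwards.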

82 Finally, a transitive closure is performed for reactions (C. [sent-270, score-0.373]

83 Unsurprisingly, same-event relations are quite well classified by this baseline, since similarity is the major clue for this class. [sent-282, score-0.234]

84 Finally, if this condition on precision is true, transitive closure is a robust way to get new relations for free. [sent-300, score-0.402]

85 Results also tell that classification of relations same-event and continuation is encouraging. [sent-301, score-0.658]

86 is not catastrophic since most of the missed reactions are tagged as continuation, which is still true (only 10% of the reaction relations are mistagged as same-event). [sent-329, score-0.773]

87 6 Application. As we showed in the previous section, results for classification of same-event and continuation relations between documents are good enough to use this system in an application that builds “event threads” around an input document. [sent-331, score-0.786]

88 A link in the page suggests that the user visualize the event thread around this article. [sent-334, score-0.264]

89 Figure 9: An example of temporal thread obtained on the death of John Paul II for user visualization (see corresponding relation graph in Figure 1). [sent-335, score-0.328]

90 When same-event cliques are found, only the longest article (often, the most recent one) of each clique is presented to the user. [sent-338, score-0.33]
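
Choosing the representative of a clique can be sketched in a few lines. Measuring length in characters is an assumption, as are the function name and the sample texts.

```python
def clique_representative(clique, texts):
    """Return the id of the longest article in a same-event clique."""
    # Sorting first makes the choice deterministic when two articles tie.
    return max(sorted(clique), key=lambda doc_id: len(texts[doc_id]))

# Hypothetical article bodies.
texts = {
    "A": "short",
    "B": "a much longer article body",
    "C": "medium text",
}
shown = clique_representative({"A", "B", "C"}, texts)
```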

91 This leads to a graph with only continuation and reaction relations. [sent-340, score-0.503]

92 Edges are “cleaned” so that a unique thread is visible: relations that can be obtained by transitivity are removed, edges between two documents are kept only if no document can be inserted in-between. [sent-341, score-0.706]
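
Removing the relations that can be obtained by transitivity amounts to a transitive reduction of the graph. The sketch below assumes the edges form a directed acyclic graph of (older, newer) pairs; the function name is ours.

```python
def transitive_reduction(edges):
    """Remove every edge (a, c) for which another path from a to c exists.

    `edges` is a set of (older, newer) pairs forming a directed acyclic graph.
    """
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    def reachable(start, target, skip_edge):
        # Depth-first search that ignores the edge being tested.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(a, c) for a, c in edges if not reachable(a, c, (a, c))}

# Hypothetical thread: A -> C is implied by A -> B -> C and is dropped.
thread = transitive_reduction({("A", "B"), ("B", "C"), ("A", "C")})
```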

93 The user can visualize and navigate through this graph (the event thread shows only titles but full articles can be accessed by clicking on the node). [sent-343, score-0.565]

94 In case of very important events where “all pairs” would be too much, the temporal window is restricted. [sent-347, score-0.462]

95 Figure 9 presents the result of this process on the partial temporal graph shown in Figure 1. [sent-349, score-0.366]

96 7 Conclusion. This article presents a task of multidocument temporal graph building. [sent-350, score-0.528]

97 We make the assumption that each news article (after filtering) relates an event, and we present a system extracting relations between articles. [sent-351, score-0.33]

98 This system uses simple features and algorithms but takes advantage of the important redundancy of information in a news corpus, by incorporating redundancy information in a cascade of classifiers, and by using transitivity of relations to infer new links. [sent-352, score-0.502]

99 Now that the task is well defined and that encouraging results have been obtained, we plan to enrich classifiers with more fine-grained temporal and lexical information, such as narrative chains (Chambers and Jurafsky, 2008) for continuation relation or event clustering (Barzilay et al. [sent-354, score-1.159]

100 There is no doubt that reaction detection can be improved a lot, by going beyond simple lexical features and discovering specific patterns. [sent-356, score-0.383]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('continuation', 0.45), ('reaction', 0.383), ('temporal', 0.313), ('event', 0.216), ('relations', 0.208), ('reactions', 0.182), ('clique', 0.166), ('cliques', 0.164), ('events', 0.149), ('documents', 0.128), ('closure', 0.117), ('articles', 0.108), ('thread', 0.106), ('relation', 0.095), ('nil', 0.093), ('article', 0.085), ('document', 0.078), ('edges', 0.078), ('verhagen', 0.078), ('classifier', 0.077), ('transitivity', 0.077), ('multidocument', 0.077), ('transitive', 0.074), ('incoming', 0.073), ('smo', 0.073), ('threads', 0.07), ('redundancy', 0.069), ('vote', 0.067), ('tempeval', 0.067), ('contextualize', 0.063), ('pope', 0.063), ('sameevent', 0.063), ('chronological', 0.062), ('newswire', 0.06), ('concerning', 0.059), ('update', 0.055), ('vertices', 0.054), ('graph', 0.053), ('chambers', 0.051), ('modules', 0.051), ('pouliquen', 0.05), ('outgoing', 0.05), ('user', 0.048), ('narrative', 0.045), ('linked', 0.043), ('attacks', 0.042), ('dct', 0.042), ('dcts', 0.042), ('edt', 0.042), ('fujiki', 0.042), ('mirroshandel', 0.042), ('moriceau', 0.042), ('reacts', 0.042), ('tannier', 0.042), ('thoe', 0.042), ('cascade', 0.042), ('keywords', 0.041), ('classifiers', 0.04), ('thematic', 0.038), ('illustrated', 0.038), ('topically', 0.037), ('successive', 0.037), ('creation', 0.037), ('news', 0.037), ('krestel', 0.036), ('balahur', 0.036), ('bron', 0.036), ('orsay', 0.036), ('step', 0.036), ('titles', 0.034), ('steinberger', 0.033), ('afp', 0.033), ('date', 0.033), ('agency', 0.033), ('story', 0.033), ('annotation', 0.032), ('older', 0.031), ('ferro', 0.031), ('mani', 0.031), ('paragraph', 0.031), ('kept', 0.031), ('opinion', 0.03), ('reads', 0.029), ('mof', 0.029), ('lucene', 0.029), ('nodes', 0.028), ('weka', 0.028), ('talukdar', 0.028), ('barzilay', 0.027), ('graphs', 0.027), ('pairs', 0.027), ('newspaper', 0.027), ('clinical', 0.027), ('kessler', 0.027), ('similarity', 0.026), ('title', 0.026), ('death', 0.026), ('french', 0.025), ('platt', 0.025), ('someone', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 41 emnlp-2013-Building Event Threads out of Multiple News Articles

2 0.26927918 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

Author: Jun-Ping Ng ; Min-Yen Kan ; Ziheng Lin ; Wei Feng ; Bin Chen ; Jian Su ; Chew Lim Tan

Abstract: In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature which focuses on just event pairs which are found within the same or adjacent sentences. To achieve this, we leverage on discourse analysis as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.

3 0.26279843 118 emnlp-2013-Learning Biological Processes with Global Constraints

Author: Aju Thalappillil Scaria ; Jonathan Berant ; Mengqiu Wang ; Peter Clark ; Justin Lewis ; Brittany Harding ; Christopher D. Manning

Abstract: Biological processes are complex phenomena involving a series of events that are related to one another through various relationships. Systems that can understand and reason over biological processes would dramatically improve the performance of semantic applications involving inference such as question answering (QA) – specifically “How?” and “Why?” questions. In this paper, we present the task of process extraction, in which events within a process and the relations between the events are automatically extracted from text. We represent processes by graphs whose edges describe a set of temporal, causal and co-reference event-event relations, and characterize the structural properties of these graphs (e.g., the graphs are connected). Then, we present a method for extracting relations between the events, which exploits these structural properties by performing joint inference over the set of extracted relations. On a novel dataset containing 148 descriptions of biological processes (released with this paper), we show significant improvement comparing to baselines that disregard process structure.

4 0.23420407 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles

Author: Tao Ge ; Baobao Chang ; Sujian Li ; Zhifang Sui

Abstract: Since many applications such as timeline summaries and temporal IR involving temporal analysis rely on document timestamps, the task of automatic dating of documents has been increasingly important. Instead of using feature-based methods as conventional models, our method attempts to date documents in a year level by exploiting relative temporal relations between documents and events, which are very effective for dating documents. Based on this intuition, we proposed an event-based time label propagation model called confidence boosting in which time label information can be propagated between documents and events on a bipartite graph. The experiments show that our event-based propagation model can predict document timestamps in high accuracy and the model combined with a MaxEnt classifier outperforms the state-of-the-art method for this task especially when the size of the training set is small.

5 0.18475361 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

Author: Congle Zhang ; Daniel S. Weld

Abstract: The distributional hypothesis, which states that words that occur in similar contexts tend to have similar meanings, has inspired several Web mining algorithms for paraphrasing semantically equivalent phrases. Unfortunately, these methods have several drawbacks, such as confusing synonyms with antonyms and causes with effects. This paper introduces three Temporal Correspondence Heuristics, that characterize regularities in parallel news streams, and shows how they may be used to generate high precision paraphrases for event relations. We encode the heuristics in a probabilistic graphical model to create the NEWSSPIKE algorithm for mining news streams. We present experiments demonstrating that NEWSSPIKE significantly outperforms several competitive baselines. In order to spur further research, we provide a large annotated corpus of timestamped news articles as well as the paraphrases produced by NEWSSPIKE.

6 0.18121621 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

7 0.16934489 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes

8 0.13951923 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model

9 0.095343575 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model

10 0.091550797 152 emnlp-2013-Predicting the Presence of Discourse Connectives

11 0.090217248 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

12 0.087475367 160 emnlp-2013-Relational Inference for Wikification

13 0.081296697 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

14 0.07969927 182 emnlp-2013-The Topology of Semantic Knowledge

15 0.075417481 90 emnlp-2013-Generating Coherent Event Schemas at Scale

16 0.071408629 24 emnlp-2013-Application of Localized Similarity for Web Documents

17 0.066680439 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression

18 0.064196616 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction

19 0.056891136 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

20 0.05193422 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.216), (1, 0.196), (2, -0.077), (3, 0.333), (4, 0.023), (5, -0.215), (6, -0.194), (7, 0.013), (8, -0.114), (9, 0.008), (10, 0.083), (11, 0.04), (12, -0.004), (13, 0.016), (14, 0.023), (15, 0.019), (16, -0.108), (17, 0.048), (18, -0.055), (19, 0.146), (20, -0.014), (21, -0.055), (22, -0.013), (23, 0.015), (24, 0.047), (25, -0.012), (26, 0.048), (27, -0.029), (28, 0.072), (29, 0.061), (30, 0.002), (31, -0.058), (32, -0.129), (33, 0.019), (34, 0.027), (35, 0.059), (36, 0.024), (37, 0.076), (38, 0.003), (39, 0.018), (40, 0.016), (41, -0.005), (42, 0.055), (43, 0.001), (44, 0.029), (45, 0.056), (46, -0.023), (47, -0.01), (48, 0.008), (49, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97001189 41 emnlp-2013-Building Event Threads out of Multiple News Articles

2 0.89518434 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles

3 0.85513699 118 emnlp-2013-Learning Biological Processes with Global Constraints

4 0.78834444 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

5 0.75009215 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes

Author: Zhichao Hu ; Elahe Rahimtoroghi ; Larissa Munishkina ; Reid Swanson ; Marilyn A. Walker

Abstract: Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that results from such reasoning. Researchers in NLP have tackled modeling such expectations from a range of perspectives, including treating it as the inference of the CONTINGENT discourse relation, or as a type of common-sense causal reasoning. Our approach is to model likelihood between events by drawing on several of these lines of previous work. We implement and evaluate different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments of contingency. Our results indicate that the use of web search counts increases the average accuracy of our best method to 85.64% over a baseline of 50%, as compared to an average accuracy of 75.15% without web search.

6 0.64474213 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

7 0.57794303 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

8 0.49571234 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model

9 0.463844 152 emnlp-2013-Predicting the Presence of Discourse Connectives

10 0.37221801 24 emnlp-2013-Application of Localized Similarity for Web Documents

11 0.36343989 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction

12 0.35885477 182 emnlp-2013-The Topology of Semantic Knowledge

13 0.35221532 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression

14 0.33288282 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

15 0.32525212 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

16 0.31512886 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model

17 0.30553767 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

18 0.30190706 160 emnlp-2013-Relational Inference for Wikification

19 0.29999545 137 emnlp-2013-Multi-Relational Latent Semantic Analysis

20 0.28239807 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.034), (9, 0.019), (18, 0.023), (22, 0.503), (30, 0.048), (51, 0.145), (66, 0.035), (71, 0.021), (75, 0.037), (77, 0.011), (96, 0.02), (97, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94710147 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles

Author: Tao Ge ; Baobao Chang ; Sujian Li ; Zhifang Sui

Abstract: Since many applications such as timeline summaries and temporal IR involving temporal analysis rely on document timestamps, the task of automatic dating of documents has been increasingly important. Instead of using feature-based methods as conventional models, our method attempts to date documents in a year level by exploiting relative temporal relations between documents and events, which are very effective for dating documents. Based on this intuition, we proposed an event-based time label propagation model called confidence boosting in which time label information can be propagated between documents and events on a bipartite graph. The experiments show that our event-based propagation model can predict document timestamps in high accuracy and the model combined with a MaxEnt classifier outperforms the state-of-the-art method for this task especially when the size of the training set is small.

2 0.93628442 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI

Author: Om P. Damani ; Shweta Ghonge

Abstract: Two recent measures incorporate the notion of statistical significance in basic PMI formulation. In some tasks, we find that the new measures perform worse than the PMI. Our analysis shows that while the basic ideas in incorporating statistical significance in PMI are reasonable, they have been applied slightly inappropriately. By fixing this, we get new measures that improve performance over not just PMI but on other popular co-occurrence measures as well. In fact, the revised measures perform reasonably well compared with more resource intensive non co-occurrence based methods also.

same-paper 3 0.90230125 41 emnlp-2013-Building Event Threads out of Multiple News Articles

Author: Xavier Tannier ; Veronique Moriceau

Abstract: We present an approach for building multidocument event threads from a large corpus of newswire articles. An event thread is basically a succession of events belonging to the same story. It helps the reader to contextualize the information contained in a single article, by navigating backward or forward in the thread from this article. A specific effort is also made on the detection of reactions to a particular event. In order to build these event threads, we use a cascade of classifiers and other modules, taking advantage of the redundancy of information in the newswire corpus. We also share interesting comments concerning our manual annotation procedure for building a training and testing set.

4 0.88771814 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou

Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a large-scale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.

5 0.66343409 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.

6 0.64479887 118 emnlp-2013-Learning Biological Processes with Global Constraints

7 0.64195782 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning

8 0.59874541 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

9 0.59299886 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

10 0.58338219 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training

11 0.57809693 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

12 0.57578677 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes

13 0.56312943 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

14 0.56238014 125 emnlp-2013-Lexical Chain Based Cohesion Models for Document-Level Statistical Machine Translation

15 0.5547992 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

16 0.55474347 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing

17 0.55307889 152 emnlp-2013-Predicting the Presence of Discourse Connectives

18 0.55097163 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

19 0.54590237 90 emnlp-2013-Generating Coherent Event Schemas at Scale

20 0.5454312 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM