emnlp emnlp2013 emnlp2013-192 knowledge-graph by maker-knowledge-mining

192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes


Source: pdf

Author: Zhichao Hu ; Elahe Rahimtoroghi ; Larissa Munishkina ; Reid Swanson ; Marilyn A. Walker

Abstract: Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that results from such reasoning. Researchers in NLP have tackled modeling such expectations from a range of perspectives, including treating it as the inference of the CONTINGENT discourse relation, or as a type of common-sense causal reasoning. Our approach is to model likelihood between events by drawing on several of these lines of previous work. We implement and evaluate different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments of contingency. Our results indicate that the use of web search counts increases the average accuracy of our best method to 85.64% over a baseline of 50%, as compared to an average accuracy of 75.15% without web search.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We implement and evaluate different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. [sent-5, score-0.431]

2 We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments of contingency. [sent-6, score-0.972]

3 1 Introduction Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that results from such reasoning (Gerrig, 1993; Graesser et al. [sent-10, score-0.593]

4 Thus discourse relations are one of the primary means to structure narrative in genres as diverse as weblogs, search queries, stories, film scripts and news articles (Chambers and Jurafsky, 2009; Manshadi et al. [sent-13, score-0.754]

5 Recent work in NLP has tackled the inference of relations between events from a broad range of perspectives: (1) as inference of a discourse relation (e. [sent-19, score-0.429]

6 to enable systems to infer which events are likely to have happened even though they have not been mentioned in the text (Schank et al. [sent-24, score-0.318]

7 , 1977), and which events are likely to happen in the future. [sent-25, score-0.318]

8 We model this likelihood between events by drawing on the PDTB’s general definition of the CONTINGENT relation, which encapsulates relations elsewhere called CAUSE, CONDITION and ENABLEMENT (Prasad et al. [sent-29, score-0.312]

9 Our aim in this paper is to implement and evaluate a range of different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. [sent-34, score-0.431]

10 In addition, scenes in film represent many typical sequences from real life, while providing a rich source of event clusters related to battles, love and mystery. [sent-37, score-0.712]

11 We carry out separate experiments for the action movie genre and the romance movie genre. [sent-38, score-0.341]

12 For example, in the scene from Total Recall, from the action movie genre (See Fig. [sent-39, score-0.312]

13 1), we might learn that the event of sits up is CONTINGENT on the event of clock chimes. [sent-40, score-0.766]

14 The subset of the corpus we use comprises 123,869 total unique event pairs. [sent-41, score-0.351]

15 We produce initial scalar estimates of potential CONTINGENCY between events using four previously defined measures of distributional cooccurrence. [sent-42, score-0.385]

16 We then refine these estimates through web searches that explicitly model the patterns of narrative event sequences that were previously observed to be likely within a particular genre. [sent-43, score-0.791]

17 To test our method, we conduct perceptual experiments with human subjects on Mechanical Turk by asking them to select which of two pairs of events are the most likely. [sent-45, score-0.33]

18 1, Mechanical Turkers are asked to select whether the sequential event pair clock chimes, sits up is more likely than clock chimes followed by a randomly selected event from the action film genre. [sent-47, score-1.179]

19 2 Experimental Method Our method uses a combination of estimating the likelihood of a CONTINGENT relation between events in a corpus of film scenes (Walker et al. [sent-62, score-0.645]

20 Our experiments are based on two subsets of 862 film screen plays collected from the IMSDb website using its ontology of film genres (Walker et al. [sent-64, score-0.664]

21 We assume that the CONTINGENCY relation will result in pairs of events that are likely to occur together and in a particular order. [sent-74, score-0.429]

22 We define an event as a verb lemma with its subject and object. [sent-86, score-0.382]

23 Word sense ambiguities are also reduced in specific genres (Action and Romance) of film scenes. [sent-91, score-0.354]

24 COMPUTE EVENT REPRESENTATIONS: We form intermediate artifacts such as events, protagonists and event pairs from the annotated documents. [sent-96, score-0.397]

25 We calculate the frequency of the event across the relevant genre (Sec. [sent-98, score-0.497]

26 WEB SEARCH REFINEMENT: We select the top 100 event pairs calculated by each contingency measure, and construct a RANDOM EVENT PAIR (REP) for each PCEP that preserves the first element of the PCEP, and replaces the second element with another event selected randomly from within the same genre. [sent-106, score-0.968]

27 Because we are interested in the event descriptions that are part of the scene descriptions, we excise the dialog from each screen play. [sent-112, score-0.517]

28 Then using the Stanford CoreNLP pipeline, we annotate the film scene files. [sent-113, score-0.39]

29 2 Compute Event Representations Given the results of Step 1 we start by generalizing the subject and object stored with each event by substituting tokens with named entities if there are any named entities tagged. [sent-119, score-0.382]

30 We then integrate all the subjects and objects across all film scene files, keeping a record of the frequency of each subject and object. [sent-122, score-0.421]

31 We then count the frequency of each event across all the film scene files. [sent-125, score-0.77]

32 Within each film scene file, we count adjacent events as potential CONTINGENT event pairs. [sent-126, score-1.054]

33 Two event pairs are defined as equal if they have the same verbs in the same order. [sent-127, score-0.397]
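
To make this representation step concrete, here is a minimal Python sketch of the event representation and adjacent-pair counting described above. The Event tuple, the per-scene input format, and all names are illustrative assumptions, not the authors' code; its output feeds the PMI and causal potential sketches further down.

```python
from collections import Counter, namedtuple

# An event is a verb lemma together with its (generalized) subject and object.
Event = namedtuple("Event", ["verb", "subject", "object"])

def count_events_and_pairs(scenes):
    """scenes: list of scenes, each a list of Event tuples in textual order.

    Returns per-verb event frequencies and adjacent-pair frequencies; two
    pairs count as equal if they have the same verbs in the same order.
    """
    event_counts = Counter()
    pair_counts = Counter()
    for scene in scenes:
        for event in scene:
            event_counts[event.verb] += 1
        # Adjacent events within a scene file are candidate CONTINGENT pairs.
        pair_counts.update(
            (first.verb, second.verb) for first, second in zip(scene, scene[1:])
        )
    return event_counts, pair_counts
```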

34 These measures are pointwise mutual information, causal potential, bigram probability and protagonist-based causal potential as described in detail below. [sent-135, score-0.401]

35 We calculate each measure separately by genre for the action and romance genres of the film corpus. [sent-136, score-0.669]

36 Given a set of events (a verb and its collected set of subjects and objects), we calculate the PMI using the standard definition: pmi(e1, e2) = log [ P(e1, e2) / (P(e1) P(e2)) ] (1) in which e1 and e2 are two events. [sent-140, score-0.324]

37 P(e1) is the probability that event e1 occurs in the corpus: P(e1) = count(e1) / Σx count(ex) (2) where count(e1) is the count of how many times event e1 occurs in the corpus, and Σx count(ex) is the count of all the events in the corpus. [sent-141, score-1.074]

38 The numerator is the probability that the two events occur together in the corpus: P(e1, e2) = count(e1, e2) / Σx Σy count(ex, ey) (3) in which count(e1, e2) is the number of times the two events e1 and e2 occur together in the corpus regardless of their order. [sent-142, score-0.628]
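
A small sketch of Equations (1)-(3) over the count dictionaries from the previous sketch; the dictionary formats and the canonical pair key are assumptions.

```python
import math

def pmi(e1, e2, event_counts, pair_counts):
    """Equations (1)-(3): pmi(e1, e2) = log[ P(e1, e2) / (P(e1) P(e2)) ].

    event_counts maps a verb event to its corpus frequency; pair_counts maps
    a pair key (here a single canonical (e1, e2) tuple per unordered pair) to
    how often the two events occur together, regardless of order. Unseen
    pairs should be smoothed before calling this (log(0) otherwise).
    """
    total_events = sum(event_counts.values())        # sum_x count(e_x)
    total_pairs = sum(pair_counts.values())          # sum_x sum_y count(e_x, e_y)
    p_e1 = event_counts[e1] / total_events           # Equation (2)
    p_e2 = event_counts[e2] / total_events
    p_joint = pair_counts[(e1, e2)] / total_pairs    # Equation (3)
    return math.log(p_joint / (p_e1 * p_e2))         # Equation (1)
```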

39 An annotator deciding whether event A causes event B asks herself the following questions, where answering yes to both means the two events are causally related: • Does event A occur before (or simultaneously with) event B? [sent-148, score-1.436]

40 • Keeping constant as many other states of affairs of the world in the given text context as possible, does modifying event A entail predictably modifying event B? [sent-149, score-0.702]

41 event e1 occurs before event e2: φ(e1, e2) = pmi(e1, e2) + log [ P(e1 → e2) / P(e2 → e1) ] (4) where pmi(e1, e2) = log [ P(e1, e2) / (P(e1) P(e2)) ]. The causal potential consists of two terms: the first is pairwise mutual information (PMI) and the second is the relative ordering of bigrams. [sent-154, score-0.913]

42 PMI measures how often events occur as a pair; whereas relative ordering counts how often event order occurs in the bigram. [sent-155, score-0.776]

43 We smooth unseen event pairs by setting their frequency equal to 1 to avoid zero probabilities. [sent-157, score-0.397]
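
A sketch of the causal potential of Equation (4) with the add-one smoothing just described. The count structures are the same assumed ones as above; the ratio of smoothed ordered counts stands in for P(e1 → e2) / P(e2 → e1), since the normalizers cancel.

```python
import math

def causal_potential(e1, e2, event_counts, ordered_pair_counts):
    """phi(e1, e2) = pmi(e1, e2) + log[ P(e1 -> e2) / P(e2 -> e1) ]  (Equation 4).

    ordered_pair_counts keeps pairs in their textual order; unseen ordered
    pairs get frequency 1 (the add-one smoothing described above).
    """
    total_events = sum(event_counts.values())
    total_pairs = sum(ordered_pair_counts.values())
    fwd = ordered_pair_counts.get((e1, e2), 0) or 1   # smoothed count of e1 -> e2
    bwd = ordered_pair_counts.get((e2, e1), 0) or 1   # smoothed count of e2 -> e1
    # PMI term: combine the counts of both orders of the pair.
    p_e1 = event_counts[e1] / total_events
    p_e2 = event_counts[e2] / total_events
    p_joint = (fwd + bwd) / total_pairs
    pmi_term = math.log(p_joint / (p_e1 * p_e2))
    # Ordering term: the normalizers of P(e1 -> e2) and P(e2 -> e1) cancel,
    # so the ratio of smoothed ordered counts is sufficient.
    return pmi_term + math.log(fwd / bwd)
```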

44 Our third method models event sequences using statistical language models (Manshadi et al. [sent-161, score-0.386]

45 To identify contingent event sequences, we apply a bigram model which estimates the probability of observing the sequence of two words w1 and w2 as follows: P(w1, w2) ≈ P(w2|w1) = count(w1, w2) / count(w1) (5) Here, the words are events. [sent-164, score-0.749]

46 Each verb is a single event and each film scene is treated as a sequence of verbs. [sent-165, score-0.741]
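
A sketch of the bigram estimate in Equation (5), treating each film scene as a sequence of verb events; the input format is an assumption.

```python
from collections import Counter

def bigram_probabilities(scenes):
    """Equation (5): P(w2 | w1) = count(w1, w2) / count(w1), where each film
    scene is treated as a sequence of verb events.

    scenes: list of scenes, each a list of verb lemmas in textual order.
    """
    unigrams = Counter()
    bigrams = Counter()
    for verbs in scenes:
        unigrams.update(verbs)
        bigrams.update(zip(verbs, verbs[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}
```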

47 We also used a method of generating event pairs based not only on the consecutive events in text but on their protagonist. [sent-171, score-0.681]

48 We called this method protagonist-based because all events were partitioned into multiple sets where each set of events has one protagonist. [sent-173, score-0.568]

49 We preserve the order of events based on their textual order, assuming as above that film scripts tend to preserve temporal order. [sent-180, score-0.36]

50 An ordered event pair is generated if both events share a protagonist. [sent-181, score-0.635]

51 We further filter event pairs by eliminating those whose frequency is less than 5, removing insignificant and rare pairs. [sent-182, score-0.748]

52 To calculate the PMI part of CP, we combine the frequencies of event pairs in both orders. [sent-185, score-0.437]
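
A sketch of the protagonist-based pairing described above. Whether all co-protagonist pairs or only adjacent ones are generated is not spelled out on this page, so this sketch pairs adjacent events within each protagonist's partition; the names and input format are assumptions, and only the frequency cutoff of 5 comes from the text.

```python
from collections import Counter, defaultdict

def protagonist_based_pairs(scenes, min_freq=5):
    """Ordered event pairs whose two events share a protagonist.

    scenes: list of scenes, each a list of (verb, protagonist) tuples in
    textual order (textual order is taken as temporal order, as above).
    Pairs observed fewer than min_freq times are discarded.
    """
    pair_counts = Counter()
    for scene in scenes:
        by_protagonist = defaultdict(list)
        for verb, protagonist in scene:
            by_protagonist[protagonist].append(verb)   # preserves textual order
        for verbs in by_protagonist.values():
            pair_counts.update(zip(verbs, verbs[1:]))
    return {pair: c for pair, c in pair_counts.items() if c >= min_freq}
```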

53 Our hypothesis is that using the film corpus within a particular genre to do the initial estimates of contingency takes advantage of genre properties such as similar events and narration of scenes in chronological order. [sent-188, score-1.108]

54 However the film corpus is necessarily small, and we can augment the evidence for a particular contingent relation by defining specific narrative sequence patterns and collecting web counts. [sent-189, score-0.952]

55 Recall that PCEP stands for predicted contingent event pair and that REP stands for random event pair. [sent-190, score-1.034]

56 We first select the top 100 event pairs calculated by each CONTINGENCY measure, and construct a RANDOM EVENT PAIR (REP) for each PCEP that preserves the first element of the PCEP, and replaces the second element with another event selected randomly from within the same genre. [sent-191, score-0.748]

57 Our web refinement procedure is: • For each event pair, PCEPs and REPs, create a Google search pattern as illustrated by Table 1, and described in more detail below. [sent-194, score-0.525]

58 Column 5 shows the results of web search hits for the PCEP patterns and Column 8 shows the results of web search hits for the REP patterns. [sent-198, score-0.495]

59 In addition, we use the “*” operator in Google Search to limit search to pairs of events reported in the historical present tense, that are “near” one another, and in a particular sequence. [sent-205, score-0.399]

60 We don’t care whether the events are in the same utterance or in sequential utterances, thus for the second verb (event) we do not include a subject pronoun he. [sent-206, score-0.315]
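
A heavily hedged sketch of the web search refinement step: Table 1 is not reproduced on this page, so the query template below (subject pronoun on the first verb only, both verbs in the historical present, the "*" proximity operator) is only an approximation of the described pattern, and get_hit_count is a placeholder for whatever call to the Google Search API returns a hit count for a query string.

```python
import random

def make_search_pattern(first_verb, second_verb):
    """Build a quoted query with the "*" proximity operator.

    This template is a guess at the pattern described in the text,
    not the paper's exact pattern from Table 1.
    """
    return f'"he {first_verb}s" * "{second_verb}s"'

def refine_with_web_counts(pcep, genre_events, get_hit_count):
    """Compare hit counts for a predicted pair (PCEP) and a random pair (REP)
    that keeps the first event and swaps in a random same-genre event.

    get_hit_count is a placeholder for a search-API call returning the
    number of hits for a query string.
    """
    first, second = pcep
    rep = (first, random.choice(genre_events))
    pcep_hits = get_hit_count(make_search_pattern(*pcep))
    rep_hits = get_hit_count(make_search_pattern(*rep))
    return pcep_hits, rep_hits, rep
```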

61 These search patterns are not intended to match the original instances in the film corpus and in general they are unlikely to match those instances. [sent-207, score-0.394]

62 Even though in general the PCEP pairs are more likely (as measured by the paired t-test comparing web search counts for PCEPs vs REPs), there are cases where the REP is highly likely, as shown by the REP (person TAKE person, person CATCH person) in Row 7. [sent-216, score-0.515]

63 After the web search refinement, we retain the PCEP/REP pairs with initially high PCEP estimates, for which we found good evidence for contingency and for randomness, e. [sent-220, score-0.425]

64 We also decided to utilize event patterns without typical objects, such as head in person REST head in Row 2 of Table 1. [sent-226, score-0.444]

65 The differences in the different types of HITS involve: (1) whether the arguments of events were given in the HIT, as in Fig. [sent-238, score-0.337]

66 2 and (2): whether the Turkers were told that the order of the events mattered, as in Fig. [sent-239, score-0.326]

67 We initially thought that providing the arguments to the events as shown in Fig. [sent-241, score-0.337]

68 2 would help Turkers to reason about which event was more likely. [sent-242, score-0.351]

69 2 illustrates the instructions that were given with the HIT when the event order doesn’t matter. [sent-247, score-0.381]

70 For each measure of CONTINGENCY, we take the 100 event pairs with the highest PCEP scores, and put them in 5 HITs with twenty items per HIT. [sent-259, score-0.397]
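
A trivial sketch of the HIT batching just described (the top 100 pairs split into 5 HITs of 20 items each); purely illustrative.

```python
def batch_into_hits(ranked_pairs, items_per_hit=20):
    """Split the ranked PCEP/REP items into HITs of a fixed size
    (5 HITs of 20 items each for the top 100 pairs)."""
    return [ranked_pairs[i:i + items_per_hit]
            for i in range(0, len(ranked_pairs), items_per_hit)]
```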

71 The results of using event arguments (person KNOW person) in the Mechanical Turk evaluation task (i. [sent-275, score-0.504]

72 The accuracies for Rows 1 and 2 are ... (Figure 2: Mechanical Turk HIT with event arguments provided.) [sent-279, score-0.404]

73 This HIT also illustrates instructions where Turkers are told that the order of the events does not matter. [sent-280, score-0.356]

74 Table 2: Evaluation results for the top 100 event pairs using all methods. [sent-309, score-0.397]

75 Comparing Rows 1 and 2 with Rows 3 and 4 suggests that even if the arguments provide extra information that helps to ground the events, the HIT that omits the event arguments (i. [sent-311, score-0.457]

76 Thus omitting the arguments of events in evaluations actually appears to allow Turkers to make better judgments. [sent-315, score-0.337]

77 Figure 3: Mechanical Turk HIT for evaluation with no event arguments provided. [sent-318, score-0.404]

78 This HIT also illustrates instructions where Turkers are told that the order of the events does matter. [sent-319, score-0.356]

79 5% when averaged over both film genres (Column 5 Average Acc%). [sent-322, score-0.354]

80 However PMI+Web Search does not beat CP+Web Search on average over both genres we tested, even though the Mechanical Turk HIT for CP specifies that the order of the events matters: a more stringent criterion. [sent-326, score-0.358]

81 However even in this case of romance with PMI, adding web search refinement provides an almost 10% increase in absolute accuracy to the highest accuracy of any combination, i. [sent-334, score-0.413]

82 There is also an interesting case of Protag CP for the romance genre where web search refinement actually decreases accuracy by 4. [sent-338, score-0.851]

83 In future work we plan to examine more genres from the film corpus and also examine the role of corpus size in more detail. [sent-340, score-0.354]

84 4 Discussion and Future Work We induced event pairs using several methods from previous work with similar aims but widely different problem formulations and evaluation methods. [sent-341, score-0.397]

85 We used a verb-rich film scene corpus where events are normally narrated in temporal order. [sent-342, score-0.752]

86 We used Mechanical Turk to evaluate the learned pairs of CONTINGENT events using human perceptions. [sent-343, score-0.33]

87 We then implemented a novel method of defining narrative sequence patterns using the Google Search API, and used web counts to further refine our estimates of the contingency of the learned event pairs. [sent-345, score-0.99]

88 Other work by Girju and her students defined a measure called causal potential and then used film screen plays to learn a knowledge base of causal pairs of events. [sent-357, score-0.722]

89 Work on commonsense causal reasoning aims to learn causal relations between pairs of events using a range of methods applied to a large corpus of weblog narratives (Gordon et al. [sent-360, score-0.844]

90 One form of evaluation aimed to predict the last event in a sequence (Manshadi et al. [sent-363, score-0.351]

91 Related work on SCRIPT LEARNING induces likely sequences of temporally ordered events in news, rather than CONTINGENCY or CAUSALITY (Chambers and Jurafsky, 2008; Chambers and Jurafsky, 2009). [sent-366, score-0.353]

92 Chambers & Jurafsky also evaluate against a corpus of existing documents, by leaving one event out of a document (news story), and then testing the system’s ability to predict the missing event. [sent-367, score-0.351]

93 We are also the first to evaluate the learned event pairs with a human perceptual evaluation with native speakers. [sent-369, score-0.397]

94 Our work capitalizes on event sequences narrated in temporal order as a cue to causality. [sent-371, score-0.464]

95 We do not expect this technique to generalize without further refinements to genres frequently told out of temporal order or when events are not mentioned consecutively in the text, for example in certain types of fiction. [sent-373, score-0.436]

96 For example, previous work would suggest that the higher the measure is, the more likely the two events are to be contingent on one another. [sent-375, score-0.65]

97 As shown in Table 2, web search refinement is able to eliminate most noise in event pairs, but we would still aim to achieve a better understanding of the circumstances which lead particular methods to work better. [sent-378, score-0.594]

98 In future work we also want to explore ways of inducing larger event structures than event pairs, such as the causal chains, scripts, or narrative schemas of previous work. [sent-379, score-1.083]

99 Using a bigram event model to predict causal potential. [sent-386, score-0.534]

100 Learning a probabilistic model of event sequences from internet weblog stories. [sent-534, score-0.422]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('event', 0.351), ('contingent', 0.332), ('events', 0.284), ('film', 0.28), ('pcep', 0.277), ('contingency', 0.22), ('causal', 0.183), ('narrative', 0.17), ('girju', 0.169), ('cp', 0.126), ('beamer', 0.125), ('pceps', 0.125), ('reps', 0.125), ('rep', 0.123), ('pmi', 0.119), ('swanson', 0.111), ('scene', 0.11), ('romance', 0.106), ('genre', 0.106), ('gordon', 0.101), ('web', 0.09), ('discourse', 0.089), ('turkers', 0.088), ('refinement', 0.084), ('manshadi', 0.083), ('chambers', 0.075), ('genres', 0.074), ('causally', 0.069), ('riaz', 0.069), ('search', 0.069), ('turk', 0.067), ('hits', 0.066), ('estimates', 0.066), ('action', 0.063), ('hit', 0.063), ('mechanical', 0.062), ('causality', 0.06), ('jurafsky', 0.057), ('aarrggss', 0.055), ('nnoo', 0.055), ('pe', 0.054), ('arguments', 0.053), ('pdtb', 0.051), ('reasoning', 0.051), ('pers', 0.048), ('counts', 0.048), ('pairs', 0.046), ('rs', 0.046), ('scenes', 0.046), ('patterns', 0.045), ('row', 0.044), ('scripts', 0.044), ('prasad', 0.044), ('told', 0.042), ('narrated', 0.042), ('calculate', 0.04), ('pitler', 0.037), ('temporal', 0.036), ('clock', 0.036), ('weblog', 0.036), ('measures', 0.035), ('sequences', 0.035), ('relation', 0.035), ('likely', 0.034), ('louis', 0.034), ('movie', 0.033), ('knows', 0.033), ('commonsense', 0.033), ('logpp', 0.033), ('walker', 0.032), ('accuracy', 0.032), ('subject', 0.031), ('occur', 0.03), ('instructions', 0.03), ('screen', 0.03), ('count', 0.029), ('column', 0.029), ('relations', 0.028), ('chiarcos', 0.028), ('chime', 0.028), ('dawid', 0.028), ('graesser', 0.028), ('imsdb', 0.028), ('karger', 0.028), ('labov', 0.028), ('nsub', 0.028), ('numhits', 0.028), ('protag', 0.028), ('schank', 0.028), ('unlock', 0.028), ('welinder', 0.028), ('corenlp', 0.028), ('schemas', 0.028), ('ordering', 0.028), ('rows', 0.027), ('person', 0.027), ('personal', 0.027), ('door', 0.026), ('answers', 0.026), ('descriptions', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes

Author: Zhichao Hu ; Elahe Rahimtoroghi ; Larissa Munishkina ; Reid Swanson ; Marilyn A. Walker

Abstract: Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that results from such reasoning. Researchers in NLP have tackled modeling such expectations from a range of perspectives, including treating it as the inference of the CONTINGENT discourse relation, or as a type of common-sense causal reasoning. Our approach is to model likelihood between events by drawing on several of these lines of previous work. We implement and evaluate different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments of contingency. Our results indicate that the use of web search counts increases the average accuracy of our best method to 85.64% over a baseline of 50%, as compared to an average accuracy of 75.15% without web search.

2 0.27850765 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

Author: Qiming Diao ; Jing Jiang

Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant message. On the one hand, people tweets about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the asso- . ciation between events and topics. Our experiments shows that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.

3 0.27631807 118 emnlp-2013-Learning Biological Processes with Global Constraints

Author: Aju Thalappillil Scaria ; Jonathan Berant ; Mengqiu Wang ; Peter Clark ; Justin Lewis ; Brittany Harding ; Christopher D. Manning

Abstract: Biological processes are complex phenomena involving a series of events that are related to one another through various relationships. Systems that can understand and reason over biological processes would dramatically improve the performance of semantic applications involving inference such as question answering (QA) – specifically “How? ” and “Why? ” questions. In this paper, we present the task of process extraction, in which events within a process and the relations between the events are automatically extracted from text. We represent processes by graphs whose edges describe a set oftemporal, causal and co-reference event-event relations, and characterize the structural properties of these graphs (e.g., the graphs are connected). Then, we present a method for extracting relations between the events, which exploits these structural properties by performing joint in- ference over the set of extracted relations. On a novel dataset containing 148 descriptions of biological processes (released with this paper), we show significant improvement comparing to baselines that disregard process structure.

4 0.18088681 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

Author: Jun-Ping Ng ; Min-Yen Kan ; Ziheng Lin ; Wei Feng ; Bin Chen ; Jian Su ; Chew Lim Tan

Abstract: In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature which focuses on just event pairs which are found within the same or adjacent sentences. To achieve this, we leverage on discourse analysis as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.

5 0.16934489 41 emnlp-2013-Building Event Threads out of Multiple News Articles

Author: Xavier Tannier ; Veronique Moriceau

Abstract: We present an approach for building multidocument event threads from a large corpus of newswire articles. An event thread is basically a succession of events belonging to the same story. It helps the reader to contextualize the information contained in a single article, by navigating backward or forward in the thread from this article. A specific effort is also made on the detection of reactions to a particular event. In order to build these event threads, we use a cascade of classifiers and other modules, taking advantage of the redundancy of information in the newswire corpus. We also share interesting comments concerning our manual annotation procedure for building a training and testing set1.

6 0.14819726 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model

7 0.14261234 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles

8 0.13685657 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model

9 0.11686455 90 emnlp-2013-Generating Coherent Event Schemas at Scale

10 0.11147878 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

11 0.083554558 152 emnlp-2013-Predicting the Presence of Discourse Connectives

12 0.074437425 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI

13 0.06574706 78 emnlp-2013-Exploiting Language Models for Visual Recognition

14 0.063660778 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

15 0.05415282 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

16 0.052993637 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

17 0.050661225 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation

18 0.048418332 178 emnlp-2013-Success with Style: Using Writing Style to Predict the Success of Novels

19 0.044942603 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

20 0.043752778 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.184), (1, 0.185), (2, -0.069), (3, 0.304), (4, 0.017), (5, -0.179), (6, -0.244), (7, 0.008), (8, -0.135), (9, 0.048), (10, -0.019), (11, 0.073), (12, 0.055), (13, -0.046), (14, 0.037), (15, -0.095), (16, -0.035), (17, -0.05), (18, -0.049), (19, -0.04), (20, 0.04), (21, 0.015), (22, 0.031), (23, 0.008), (24, 0.081), (25, -0.036), (26, 0.02), (27, 0.043), (28, -0.071), (29, -0.032), (30, -0.021), (31, -0.117), (32, 0.034), (33, 0.071), (34, 0.028), (35, 0.016), (36, 0.034), (37, -0.064), (38, -0.019), (39, -0.015), (40, 0.018), (41, 0.068), (42, 0.032), (43, -0.034), (44, -0.022), (45, -0.076), (46, -0.018), (47, 0.08), (48, 0.039), (49, -0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97208387 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes

Author: Zhichao Hu ; Elahe Rahimtoroghi ; Larissa Munishkina ; Reid Swanson ; Marilyn A. Walker

Abstract: Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that results from such reasoning. Researchers in NLP have tackled modeling such expectations from a range of perspectives, including treating it as the inference of the CONTINGENT discourse relation, or as a type of common-sense causal reasoning. Our approach is to model likelihood between events by drawing on several of these lines of previous work. We implement and evaluate different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments ofcontingency. Our results indicate that the use of web search counts increases the av- , erage accuracy of our best method to 85.64% over a baseline of 50%, as compared to an average accuracy of 75. 15% without web search.

2 0.85539448 118 emnlp-2013-Learning Biological Processes with Global Constraints

Author: Aju Thalappillil Scaria ; Jonathan Berant ; Mengqiu Wang ; Peter Clark ; Justin Lewis ; Brittany Harding ; Christopher D. Manning

Abstract: Biological processes are complex phenomena involving a series of events that are related to one another through various relationships. Systems that can understand and reason over biological processes would dramatically improve the performance of semantic applications involving inference such as question answering (QA) – specifically “How? ” and “Why? ” questions. In this paper, we present the task of process extraction, in which events within a process and the relations between the events are automatically extracted from text. We represent processes by graphs whose edges describe a set oftemporal, causal and co-reference event-event relations, and characterize the structural properties of these graphs (e.g., the graphs are connected). Then, we present a method for extracting relations between the events, which exploits these structural properties by performing joint in- ference over the set of extracted relations. On a novel dataset containing 148 descriptions of biological processes (released with this paper), we show significant improvement comparing to baselines that disregard process structure.

3 0.73864752 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

Author: Qiming Diao ; Jing Jiang

Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant message. On the one hand, people tweets about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the asso- . ciation between events and topics. Our experiments shows that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.

4 0.69561881 41 emnlp-2013-Building Event Threads out of Multiple News Articles

Author: Xavier Tannier ; Veronique Moriceau

Abstract: We present an approach for building multidocument event threads from a large corpus of newswire articles. An event thread is basically a succession of events belonging to the same story. It helps the reader to contextualize the information contained in a single article, by navigating backward or forward in the thread from this article. A specific effort is also made on the detection of reactions to a particular event. In order to build these event threads, we use a cascade of classifiers and other modules, taking advantage of the redundancy of information in the newswire corpus. We also share interesting comments concerning our manual annotation procedure for building a training and testing set1.

5 0.66093278 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles

Author: Tao Ge ; Baobao Chang ; Sujian Li ; Zhifang Sui

Abstract: Since many applications such as timeline summaries and temporal IR involving temporal analysis rely on document timestamps, the task of automatic dating of documents has been increasingly important. Instead of using feature-based methods as conventional models, our method attempts to date documents in a year level by exploiting relative temporal relations between documents and events, which are very effective for dating documents. Based on this intuition, we proposed an eventbased time label propagation model called confidence boosting in which time label information can be propagated between documents and events on a bipartite graph. The experiments show that our event-based propagation model can predict document timestamps in high accuracy and the model combined with a MaxEnt classifier outperforms the state-ofthe-art method for this task especially when the size of the training set is small.

6 0.56872791 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

7 0.56530446 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model

8 0.50560957 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

9 0.50229967 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model

10 0.4332464 90 emnlp-2013-Generating Coherent Event Schemas at Scale

11 0.3189677 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

12 0.30717197 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI

13 0.30461955 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

14 0.29994646 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior

15 0.28704768 152 emnlp-2013-Predicting the Presence of Discourse Connectives

16 0.28579789 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation

17 0.27630228 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition

18 0.26011324 177 emnlp-2013-Studying the Recursive Behaviour of Adjectival Modification with Compositional Distributional Semantics

19 0.25932458 87 emnlp-2013-Fish Transporters and Miracle Homes: How Compositional Distributional Semantics can Help NP Parsing

20 0.24572562 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.02), (18, 0.042), (22, 0.106), (30, 0.059), (50, 0.037), (51, 0.158), (66, 0.045), (71, 0.018), (75, 0.025), (77, 0.014), (90, 0.325), (96, 0.022), (97, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91086709 155 emnlp-2013-Question Difficulty Estimation in Community Question Answering Services

Author: Jing Liu ; Quan Wang ; Chin-Yew Lin ; Hsiao-Wuen Hon

Abstract: In this paper, we address the problem of estimating question difficulty in community question answering services. We propose a competition-based model for estimating question difficulty by leveraging pairwise comparisons between questions and users. Our experimental results show that our model significantly outperforms a PageRank-based approach. Most importantly, our analysis shows that the text of question descriptions reflects the question difficulty. This implies the possibility of predicting question difficulty from the text of question descriptions.

2 0.88050598 33 emnlp-2013-Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese

Author: Ryohei Sasano ; Daisuke Kawahara ; Sadao Kurohashi ; Manabu Okumura

Abstract: We present a method for automatically acquiring knowledge for case alternation between the passive and active voices in Japanese. By leveraging several linguistic constraints on alternation patterns and lexical case frames obtained from a large Web corpus, our method aligns a case frame in the passive voice to a corresponding case frame in the active voice and finds an alignment between their cases. We then apply the acquired knowledge to a case alternation task and prove its usefulness.

3 0.85843724 177 emnlp-2013-Studying the Recursive Behaviour of Adjectival Modification with Compositional Distributional Semantics

Author: Eva Maria Vecchi ; Roberto Zamparelli ; Marco Baroni

Abstract: In this study, we use compositional distributional semantic methods to investigate restrictions in adjective ordering. Specifically, we focus on properties distinguishing AdjectiveAdjective-Noun phrases in which there is flexibility in the adjective ordering from those bound to a rigid order. We explore a number of measures extracted from the distributional representation of AAN phrases which may indicate a word order restriction. We find that we are able to distinguish the relevant classes and the correct order based primarily on the degree of modification of the adjectives. Our results offer fresh insight into the semantic properties that determine adjective ordering, building a bridge between syntax and distributional semantics.

4 0.83267403 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning

Author: Di Wang ; Chenyan Xiong ; William Yang Wang

Abstract: Chenyan Xiong School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213, USA cx@ c s . cmu .edu William Yang Wang School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213, USA ww@ cmu .edu might not be generalizable to other domains (BenDavid et al., 2006; Ben-David et al., 2010). Multi-Domain learning (MDL) assumes that the domain labels in the dataset are known. However, when there are multiple metadata at- tributes available, it is not always straightforward to select a single best attribute for domain partition, and it is possible that combining more than one metadata attributes (including continuous attributes) can lead to better MDL performance. In this work, we propose an automatic domain partitioning approach that aims at providing better domain identities for MDL. We use a supervised clustering approach that learns the domain distance between data instances , and then cluster the data into better domains for MDL. Our experiment on real multi-domain datasets shows that using our automatically generated domain partition improves over popular MDL methods.

same-paper 5 0.78822964 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes

Author: Zhichao Hu ; Elahe Rahimtoroghi ; Larissa Munishkina ; Reid Swanson ; Marilyn A. Walker

Abstract: Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that results from such reasoning. Researchers in NLP have tackled modeling such expectations from a range of perspectives, including treating it as the inference of the CONTINGENT discourse relation, or as a type of common-sense causal reasoning. Our approach is to model likelihood between events by drawing on several of these lines of previous work. We implement and evaluate different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments ofcontingency. Our results indicate that the use of web search counts increases the av- , erage accuracy of our best method to 85.64% over a baseline of 50%, as compared to an average accuracy of 75. 15% without web search.

6 0.55571634 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

7 0.55368561 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

8 0.55241436 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

9 0.54885083 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

10 0.54825747 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts

11 0.54750448 98 emnlp-2013-Image Description using Visual Dependency Representations

12 0.54706186 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

13 0.54484254 152 emnlp-2013-Predicting the Presence of Discourse Connectives

14 0.53991103 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

15 0.53658468 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

16 0.53601682 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing

17 0.53562474 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes

18 0.53219205 160 emnlp-2013-Relational Inference for Wikification

19 0.53161752 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

20 0.5314917 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM