acl acl2013 acl2013-153 knowledge-graph by maker-knowledge-mining

153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

Source: pdf

Author: Miaomiao Wen ; Zeyu Zheng ; Hyeju Jang ; Guang Xiang ; Carolyn Penstein Rose

Abstract: We present a system for extracting the dates of illness events (year and month of the event occurrence) from posting histories in the context of an online medical support community. A temporal tagger retrieves and normalizes dates mentioned informally in social media to actual month and year referents. Building on this, an event date extraction system learns to integrate the likelihood of candidate dates extracted from time-rich sentences with temporal constraints extracted from event-related sentences. Our integrated model achieves 89.7% of the maximum performance given the performance of the temporal expression retrieval step.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a system for extracting the dates of illness events (year and month of the event occurrence) from posting histories in the context of an online medical support community. [sent-3, score-1.091]

2 A temporal tagger retrieves and normalizes dates mentioned informally in social media to actual month and year referents. [sent-4, score-1.063]

3 Building on this, an event date extraction system learns to integrate the likelihood of candidate dates extracted from time-rich sentences with temporal constraints extracted from event-related sentences. [sent-5, score-1.568]

4 Our integrated model achieves 89.7% of the maximum performance given the performance of the temporal expression retrieval step. [sent-7, score-0.397]

5 1 Introduction In this paper we present a challenging new event date extraction task. [sent-8, score-0.806]

6 Our technical contribution is a temporal tagger that outperforms previously published baseline approaches in its ability to identify informal temporal expressions (TE) and that normalizes each of them to an actual month and year (Chang and Manning, 2012; Strotgen and Gertz, 2010). [sent-9, score-1.179]

7 This temporal tagger then contributes towards high performance at matching event mentions with the month and year in which they occurred based on the complete posting history of users. [sent-10, score-1.287]

8 It does so with high accuracy on informal event mentions in social media by learning to integrate the likelihood of multiple candidate dates extracted from event mentions in time-rich sentences with temporal constraints extracted from event-related sentences. [sent-11, score-1.659]

9 Despite considerable prior work in temporal information extraction, to date state-of-the-art resources are designed for extracting temporally scoped facts about public figures/organizations from newswire or Wikipedia articles (Ji et al. [sent-12, score-0.876]

10 Event keywords are in bold and temporal expressions are in italics. [sent-14, score-0.403]

11 When people are instead communicating informally about their lives, they refer to time more informally and frequently from their personal frame of reference rather than from an impersonal third person frame of reference. [sent-17, score-0.169]

12 , “last week”, “two days from now”), or personal time references in our data is more than one and a half times as high as in newswire and Wikipedia. [sent-21, score-0.117]

13 Therefore, it is not surprising that there would be difficulty in applying a temporal tagger designed for newswire to social media data (Strotgen and Gertz, 2012; Kolomiyets et al. [sent-22, score-0.518]

14 , 2012) demonstrate that user-focused event mentions extracted from social media data can provide a useful timeline-like tool for studying how behavior patterns change over time in response to mentioned events. [sent-26, score-0.478]

15 2 Task Our task is to extract personal illness events mentioned in the posting histories of online community participants. [sent-28, score-0.402]

16 [Page footer: © 2013 Association for Computational Linguistics, pages 836–842] … a candidate event and a posting history. [sent-31, score-0.54]

17 The output is the event date (month and year) for the event if it occurred, or “unknown” if it did not occur. [sent-32, score-1.129]

18 The process iterates through a list of 10 cancer events (CEs). [sent-33, score-0.184]

19 This list includes breast cancer Diagnosis, Metastasis, Recurrence, Mastectomy, Lumpectomy, Reconstruction, Chemotherapy-Start, Chemotherapy-End, Radiation-Start and Radiation-End. [sent-34, score-0.14]

20 For each of these target CEs, we manually designed an event keyword set that includes the name of the event, abbreviations, slang, aliases and related words. [sent-35, score-0.541]

21 For each of the 10 events, all sentences that mention a related event keyword are extracted from the user’s posting history. [sent-36, score-0.718]

22 Figure 1 shows several sentences that were extracted for one user for the start date of Radiation. [sent-37, score-0.527]

23 Note that the user began to post about Radiation before she started it. [sent-39, score-0.147]

24 Most of the TEs are non-standard and need to be resolved to calendar dates (year and month). [sent-41, score-0.275]

25 Once the full set of event mention sentences has been extracted for a user, all the temporal expressions (TEs) that appear in the same sentence with an event mention are resolved to a set of candidate dates. [sent-42, score-1.385]

26 Besides a standard event-time classifier for within-sentence event-time anchoring, we leverage a new source of temporal information to train a constraint-based event-time classifier. [sent-43, score-0.44]

27 However, sentences that contain only the event mention but no explicit TE can also be informative. [sent-47, score-0.404]

28 For example, the post time (usually referred to as document creation time or DCT) of the sentence “metastasis was found in my bone” might be labeled as being after the “metastasis” event date. [sent-48, score-0.415]

29 These DCTs impose constraints on the possible event dates, which can be integrated with the event-time classifier, as a variant on related work (Chambers, 2012). [sent-49, score-0.481]

30 3 Related Work Previous work on TE extraction has focused mainly on newswire text (Strotgen and Gertz, 2010; Chang and Manning, 2012). [sent-50, score-0.088]

31 This paper presents a rule-based TE extractor that identifies and resolves a higher percentage of nonstandard TEs than earlier state-of-art temporal taggers. [sent-51, score-0.403]

32 Our task is closest to the temporal slot filling track in the TAC-KBP 2011 shared task (Ji et al. [sent-52, score-0.363]

33 Their goal was to extract the temporal bounds of event relations. [sent-54, score-0.728]

34 First, they used newswire, Wikipedia and blogs as data sources from which they extract temporal bounds of facts found in Wikipedia infoboxes. [sent-56, score-0.393]

35 Second, in the KBP task, the set of gold event relations are provided as input, so that the task is only to identify a date for an event that is guaranteed to have been mentioned. [sent-57, score-1.16]

36 However, most of the candidate events won’t have ever been reported within a user’s posting history. [sent-59, score-0.253]

37 In most temporal relation bound extraction systems, the constraints are included as input rather than learned by the system (Talukdar et al. [sent-61, score-0.474]

38 Thus, there can be no universal logical constraints on the order of cancer events. [sent-69, score-0.175]

39 Our approach to using temporal constraints is a variant on previously published approaches. [sent-70, score-0.432]

40 (2012) made use of DCT (document creation time) as well, however, they have assumed the DCT is within the time-range of the event stated in the document, which is often not true in our data. [sent-72, score-0.365]

41 We learn the event-DCT relations to produce constraints for the event date. [sent-74, score-0.365]

42 4 Corpus Annotation We have scraped the posts, users, and profiles from a large online cancer support community. [sent-75, score-0.106]

43 From this collection we extracted and then annotated two separate corpora, one for evaluating our TE retrieval and normalization, the other one for event date extraction. [sent-76, score-0.795]

44 For creating the TE extraction corpus, we randomly picked one post from each of 1,000 randomly selected users. [sent-77, score-0.092]

45 We used this sampling technique because each user tends to use a narrow range of date expression forms. [sent-78, score-0.53]

46 From these posts, we manually extracted 601 TEs and resolved them to a specific month and year or just year if the month was not mentioned. [sent-79, score-0.648]

47 Our corpus for event date extraction consists of the complete posting history of 300 users that were randomly drawn from our dataset. [sent-81, score-0.985]

48 Three annotators were provided with guidelines for how to infer the date of the events (Wen et al. [sent-82, score-0.477]

49 94 Kappa on identification of whether an event has a reported event date in a user’s history or not. [sent-85, score-1.163]

50 From this corpus, 509 events were annotated with occurrence dates (year and month). [sent-88, score-0.278]

51 Given an event and a user’s post history, the system searches for all of the sentences that contain an event keyword (keyword sentence) and all the sentences that contain both a keyword and a TE (date sentence). [sent-91, score-1.132]
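The retrieval step in sentence 51 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword set, the sentence splitter and the tiny regular expression standing in for the temporal tagger are all hypothetical, and posts are assumed to arrive as (post time, text) pairs.

```python
import re

# Hypothetical keyword set for the Radiation-Start event (illustrative only).
RADIATION_KEYWORDS = {"radiation", "rads", "radiotherapy"}

# Tiny stand-in for a temporal tagger: it recognizes only a few TE shapes.
# "may" is left out because it collides with the modal verb.
MONTHS = ("january|february|march|april|june|july|august|"
          "september|october|november|december")
TE_PATTERN = re.compile(
    r"\b(?:last|next)\s+(?:week|month|year)\b"
    r"|\b\d{1,2}/\d{4}\b"
    r"|\b(?:" + MONTHS + r")\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter on sentence-final punctuation (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_sentences(posts, keywords):
    """Return (keyword_sentences, date_sentences) for one user and one event.

    posts: list of (dct, text) pairs, where dct is the post time.
    A keyword sentence mentions an event keyword; a date sentence is a
    keyword sentence that also contains a temporal expression.
    """
    keyword_sentences, date_sentences = [], []
    for dct, text in posts:
        for sentence in split_sentences(text):
            tokens = set(re.findall(r"[a-z]+", sentence.lower()))
            if tokens & keywords:
                keyword_sentences.append((dct, sentence))
                if TE_PATTERN.search(sentence):
                    date_sentences.append((dct, sentence))
    return keyword_sentences, date_sentences
```

Every date sentence is also a keyword sentence here, which matches the definition in the text: the date sentences are the subset that also carries a TE.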

52 The TEs in the date sentences are resolved and then used as candidate dates for the event. [sent-92, score-0.742]

53 First, the Date Classifier is trained from date sentences to predict how likely its candidate TE and the gold event date are to overlap. [sent-94, score-1.262]

54 Then, because constraints over event dates can be informed by temporal relations between the event date and the DCT, the Constraint-based Classifier provides an indication of the plausibility of candidate dates. [sent-95, score-1.829]

55 1 Temporal Tagger We design a rule-based temporal tagger that is built using regular expression patterns to recognize informal TEs. [sent-98, score-0.509]

56 The additional types of TE we handle include: 1) user-specific TEs: A user’s age, cancer anniversary and survivorship can provide temporal information about the user’s CEs. [sent-100, score-0.469]

57 We obtain the birth date of users from their personal profile to resolve age date expressions such as “at the age of 57”. [sent-101, score-1.045]

58 2) non-whole numbers such as “a year and half” and “1/2 weeks”. [sent-102, score-0.117]

59 4) underspecified month mentions: we resolve the year information according to the DCT month, the mentioned month and the verb tense. [sent-106, score-0.461]
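Two of the rule types above, age expressions and underspecified month mentions, might be sketched as below. The regular expression, the function names and especially the tense heuristic (past tense points to the most recent such month, future tense to the next one) are assumptions made for illustration; the summary does not spell out the authors' actual rules.

```python
import re

# Hypothetical pattern for age expressions such as "at the age of 57".
AGE_PATTERN = re.compile(r"\bat (?:the )?age (?:of )?(\d{1,3})\b", re.IGNORECASE)


def resolve_age_expression(text, birth_year):
    """Resolve an age expression to a calendar year using the profile birth year.

    The result is approximate: without the birth month it can be off by one year.
    Returns None when no age expression is found.
    """
    match = AGE_PATTERN.search(text)
    if match is None:
        return None
    return birth_year + int(match.group(1))


def resolve_month_year(mentioned_month, dct_year, dct_month, tense):
    """Pick a year for a month mentioned without one (months are 1..12).

    Assumed heuristic: a past-tense mention refers to the most recent such
    month at or before the DCT, a future-tense mention to the next such month
    at or after the DCT, and anything else to the DCT year.
    """
    if tense == "past":
        return dct_year if mentioned_month <= dct_month else dct_year - 1
    if tense == "future":
        return dct_year if mentioned_month >= dct_month else dct_year + 1
    return dct_year
```

For example, a past-tense mention of November in a post written in February 2010 resolves to November 2009 under this heuristic.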

60 2 Date Classifier We train a MaxEnt classifier to predict the temporal relationship between the retrieved TE and the event date as overlap or no-overlap, similar to the within-sentence event-time anchoring task in TempEval-2 (UzZaman and Allen, 2010). [sent-108, score-1.249]

61 , 2009): namely, event keyword and its dominant verb, verb and preposition that dominate TE, dependency path between TE and keyword and its length, unigram and bigram word and POS features. [sent-110, score-0.717]

62 So we add subject features to remove this kind of noise, which includes the governing subject of the event keyword and its POS tag. [sent-113, score-0.541]

63 Modality features include the appearance of modals before the event keyword (e. [sent-114, score-0.541]

64 To calculate the likelihood of a candidate date for an event, we need to aggregate the hard decisions from the classifier. [sent-121, score-0.467]

65 Let DSu be the set of the user’s date sentences, let Du be the set of dates resolved from each TE. [sent-122, score-0.674]

66 We represent a MaxEnt classifier by $P_{\text{relation}}(R \mid t, ds)$ for a candidate date $t$ in date sentence $ds$ and possible relation $R$ = {overlap, no-overlap}. [sent-123, score-0.943]

67 We map the distribution over relations to a distribution over dates by defining $P_{\text{DateSentence}}(t \mid DS_u)$: $P_{\text{DateSentence}}(t \mid DS_u) = \frac{1}{Z(D_u)} \sum_{t_j \in D_u} \delta_{t_j}(t)\, P_{\text{relation}}(\text{overlap} \mid t_j, ds_j)$ (1), where $\delta_{t_j}(t) = 1$ if $t = t_j$ and $0$ otherwise. We refer to this model as the Date Classifier. [sent-125, score-0.12]
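Equation (1) sums, for each candidate date, the overlap probabilities of all TEs that resolve to that date and then normalizes. A minimal sketch follows, assuming that Z(D_u) is the constant that makes the scores sum to one and that the classifier's overlap probabilities are already available as plain floats; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def date_sentence_distribution(candidates):
    """Aggregate per-TE overlap probabilities into a distribution over dates.

    candidates: list of (resolved_date, p_overlap) pairs, one per TE found in
    the user's date sentences. Dates resolving to the same value pool their
    probability mass, which rewards redundant mentions of the same date.
    """
    scores = defaultdict(float)
    for date, p_overlap in candidates:
        scores[date] += p_overlap
    z = sum(scores.values())  # assumed role of Z(D_u): normalization
    if z == 0:
        return {}
    return {date: score / z for date, score in scores.items()}
```

With this form, two moderately confident mentions of the same month can outweigh a single confident mention of a different month.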

68 However, keyword sentences can inform temporal constraints for events and therefore should not be ignored. [sent-132, score-0.686]

69 ” indicates the user has done radiation by the time of the post (DCT). [sent-134, score-0.286]

70 The topic of the sentence can also indicate the temporal relation. [sent-137, score-0.363]

71 This section departs from the above Date Classifier and instead predicts whether each keyword sentence is posted before or overlap-or-after the user’s event date. [sent-140, score-0.578]

72 We create training examples by computing the temporal relation between the DCT and the user’s gold event date. [sent-143, score-0.759]

73 If the user has not reported an event date, the label should be unknown. [sent-144, score-0.462]
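The labeling rule for training examples can be written down directly. The sketch below assumes dates are (year, month) tuples, so that Python's tuple comparison orders them chronologically, and that a missing gold date is represented as None; both choices are illustrative.

```python
def dct_label(dct, gold_event_date):
    """Label a keyword sentence by comparing its post time with the gold date.

    dct and gold_event_date are (year, month) tuples; gold_event_date is None
    when the user never reported a date for this event.
    """
    if gold_event_date is None:
        return "unknown"
    return "before" if dct < gold_event_date else "overlap-or-after"


def build_training_examples(keyword_sentences, gold_event_date):
    """Pair each (dct, sentence) with its label for classifier training."""
    return [(sentence, dct_label(dct, gold_event_date))
            for dct, sentence in keyword_sentences]
```

A post written in the same month as the event falls under overlap-or-after, because month granularity cannot order the two.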

74 We train a MaxEnt classifier on each event mention paired with its corresponding DCT. [sent-145, score-0.481]

75 Let KSu be the set of the user’s keyword sentences, let Du be the set of dates resolved from each date sentence. [sent-147, score-0.85]

76 We define a MaxEnt classifier by $P_{\text{relation}}(R \mid ks)$ for a keyword sentence $ks$ and possible relation $R$ = {before, overlap-or-after, unknown}. [sent-148, score-0.253]

77 DCT is the post time of the keyword sentence ks. [sent-149, score-0.226]

78 The rel(DCT, t) function simply determines if the DCT is before or overlap-or-after the candidate date t. [sent-151, score-0.467]

79 We map this distribution over relations to a distribution over dates by defining $P_{\text{KeywordSentence}}(t, KS_u)$: $P_{\text{KeywordSentence}}(t, KS_u) = \frac{1}{Z(D_u)} \sum_{ks_j \in KS_u} P_{\text{relation}}(\mathrm{rel}(dct_j, t) \mid ks_j)$ (2), where $\mathrm{rel}(dct, t)$ is before if $dct < t$ and overlap-or-after if $dct \geq t$. [sent-152, score-0.383]
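Equation (2) scores a candidate date by asking, for every keyword sentence, how probable the classifier finds the relation that the candidate would imply between that sentence's post time and the event. A minimal sketch follows, assuming (year, month) tuples for dates, classifier outputs given as label-to-probability dictionaries, and Z(D_u) acting as a normalizer over the candidate dates; all names are hypothetical.

```python
def rel(dct, t):
    """Relation between a post time and a candidate date, as in Eq. (2)."""
    return "before" if dct < t else "overlap-or-after"


def keyword_sentence_distribution(candidate_dates, keyword_sentences):
    """Score candidate dates by their consistency with keyword sentences.

    candidate_dates: iterable of (year, month) tuples (the set D_u).
    keyword_sentences: list of (dct, probs) pairs, where probs maps each
    relation label to the classifier's probability for that sentence.
    """
    scores = {}
    for t in candidate_dates:
        scores[t] = sum(probs[rel(dct, t)] for dct, probs in keyword_sentences)
    z = sum(scores.values())  # assumed role of Z(D_u): normalization
    if z == 0:
        return {}
    return {t: score / z for t, score in scores.items()}
```

A candidate date that would place a confidently "before" post after the event loses mass, which is how the keyword sentences act as soft constraints.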

80 1 Temporal Expression Retrieval We compare our temporal tagger’s performance with SUTime (Chang and Manning, 2012) on the 601 manually extracted TEs. [sent-160, score-0.394]

81 We first evaluate identification of the extent of a TE and then production of the correctly resolved date for each recognized expression. [sent-162, score-0.474]

82 1 Evaluation metric The extracted date is only considered correct if it completely matches the gold date. [sent-170, score-0.461]

83 For less than 4% of users, we have multiple dates for the same event (e. [sent-171, score-0.565]

84 , 2011), in these cases, we give the system the benefit of the doubt and the extracted date is considered correct if it matches one of the gold dates. [sent-175, score-0.461]

85 We choose a much stricter evaluation metric because we need a precise event date to study user behavior changes. [sent-178, score-0.901]

86 2 Baselines and oracle Based on our temporal tagger, we provide two baselines to describe heuristic methods of aggregating the hard decisions from the classifier learned in Section 5. [sent-181, score-0.488]

87 The first baseline, Baseline1, is to pick the date with the highest classifier’s prediction confidence. [sent-183, score-0.399]

88 For example, if the candidate date is “6/2009” and we have retrieved two TEs that are resolved to “6/2009” and “4/2008”, then P(“6/2009”) = Prelation(overlap | “6/2009”) × Prelation(no-overlap | “4/2008”). [sent-185, score-0.542]
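The worked example in sentence 88 suggests a product over all retrieved TEs: the overlap probability for TEs that resolve to the candidate, and the no-overlap probability for the rest. The sketch below follows that reading. It assumes a binary classifier, so that P(no-overlap) = 1 - P(overlap), and the probabilities used in the usage lines are made up for illustration.

```python
def product_score(candidate, retrieved):
    """Score a candidate date as a product over all retrieved TEs.

    retrieved: list of (resolved_date, p_overlap) pairs. TEs resolving to the
    candidate contribute P(overlap); all others contribute P(no-overlap),
    taken here as 1 - P(overlap) (binary-classifier assumption).
    """
    score = 1.0
    for date, p_overlap in retrieved:
        score *= p_overlap if date == candidate else (1.0 - p_overlap)
    return score


# Made-up probabilities for the two TEs of the example.
retrieved = [("6/2009", 0.8), ("4/2008", 0.3)]
best = max({date for date, _ in retrieved},
           key=lambda d: product_score(d, retrieved))
```

Unlike the summation in Equation (1), a product lets a single confident disagreement pull a candidate's score down sharply.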

89 To set an upper bound on performance given our TE retrieval system, we calculate the oracle score by considering an extraction as correct if the gold date is one of the retrieved candidate dates. [sent-186, score-0.588]
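The oracle score and the exact-match metric can be sketched together. The sketch assumes each evaluation item carries the set of retrieved candidate dates and the set of gold dates (more than one gold date is allowed, as in the multiple-date case described earlier); the data layout and function names are hypothetical, and events with no reported date are left out for simplicity.

```python
def is_correct(extracted_date, gold_dates):
    """Exact-match metric: the extracted date must equal one of the gold dates."""
    return extracted_date in gold_dates


def oracle_accuracy(items):
    """Upper bound given TE retrieval: correct if any candidate is a gold date.

    items: list of (candidate_dates, gold_dates) pairs, both sets of dates.
    """
    if not items:
        return 0.0
    hits = sum(1 for candidates, gold in items
               if any(c in gold for c in candidates))
    return hits / len(items)


def relative_to_oracle(system_accuracy, oracle):
    """System accuracy as a fraction of the oracle upper bound."""
    return system_accuracy / oracle if oracle > 0 else 0.0
```

Reporting accuracy relative to the oracle separates errors of date selection from errors already made by the temporal expression retrieval step.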

90 This shows the value of our approach to leveraging redundancy of event date mentions. [sent-193, score-0.764]

91 Table 3 shows the performance of our systems and baselines on individual event types. [sent-201, score-0.365]

92 This is mainly because Chemotherapy and Radiation last for a period of time and there are more event-related discussions containing the event keyword. [sent-203, score-0.365]

93 None of our systems improves on cancer Metastasis and Recurrence. [sent-204, score-0.106]

94 7 Conclusion We presented a novel event date extraction task that requires extraction and resolution of nonstandard TEs, namely personal illness event dates, from the posting histories of online community participants. [sent-206, score-1.577]

95 We constructed an evaluation corpus and designed a temporal tagger for non-standard TEs in social media. [sent-207, score-0.435]

96 By creating an analogous keyword set, our event date extraction method could be easily adapted to other datasets. [sent-209, score-0.982]

97 TimeML: Robust specification of event and temporal expressions in text. [sent-282, score-0.768]

98 TimeML: Robust specification of event and temporal expressions in text. [sent-283, score-0.768]

99 TRIPS and TRIOS system for TempEval-2: Extracting temporal information from text. [sent-310, score-0.363]

100 From diagnosis to death: A case study of coping with breast cancer as seen through online discussion group messages. [sent-328, score-0.171]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('date', 0.399), ('event', 0.365), ('temporal', 0.363), ('dates', 0.2), ('dct', 0.183), ('keyword', 0.176), ('tes', 0.17), ('month', 0.154), ('radiation', 0.139), ('te', 0.12), ('year', 0.117), ('prelation', 0.112), ('strotgen', 0.112), ('posting', 0.107), ('cancer', 0.106), ('mcclosky', 0.1), ('garrido', 0.099), ('user', 0.097), ('ksu', 0.09), ('metastasis', 0.09), ('dsu', 0.086), ('sutime', 0.079), ('events', 0.078), ('classifier', 0.077), ('resolved', 0.075), ('histories', 0.073), ('illness', 0.073), ('tagger', 0.072), ('personal', 0.071), ('constraints', 0.069), ('candidate', 0.068), ('chemotherapy', 0.067), ('pdatesentence', 0.067), ('pkeywordsentence', 0.067), ('wen', 0.067), ('ji', 0.066), ('manning', 0.06), ('gertz', 0.06), ('kolomiyets', 0.06), ('uzzaman', 0.055), ('post', 0.05), ('informally', 0.049), ('oracle', 0.048), ('integrated', 0.047), ('newswire', 0.046), ('allen', 0.045), ('anchoring', 0.045), ('jannik', 0.045), ('mastectomy', 0.045), ('oleksandr', 0.045), ('postsu', 0.045), ('mentions', 0.045), ('maxent', 0.043), ('rel', 0.043), ('extraction', 0.042), ('retrieves', 0.041), ('medical', 0.041), ('expressions', 0.04), ('informal', 0.04), ('hyeju', 0.04), ('miaomiao', 0.04), ('nonstandard', 0.04), ('stricter', 0.04), ('yoshikawa', 0.04), ('clinical', 0.039), ('mention', 0.039), ('narrative', 0.038), ('temporally', 0.038), ('users', 0.038), ('media', 0.037), ('roser', 0.037), ('birthday', 0.037), ('sifier', 0.037), ('carolyn', 0.037), ('jang', 0.037), ('setzer', 0.037), ('timeml', 0.037), ('resolve', 0.036), ('tac', 0.035), ('chang', 0.035), ('du', 0.035), ('breast', 0.034), ('chambers', 0.034), ('expression', 0.034), ('history', 0.034), ('sauri', 0.033), ('ces', 0.033), ('age', 0.031), ('bethard', 0.031), ('diagnosis', 0.031), ('ros', 0.031), ('extracted', 0.031), ('gold', 0.031), ('choudhury', 0.03), ('normalizes', 0.03), ('occurred', 0.03), ('facts', 0.03), ('pustejovsky', 0.029), ('gaizauskas', 0.029), ('talukdar', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999875 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

Author: Miaomiao Wen ; Zeyu Zheng ; Hyeju Jang ; Guang Xiang ; Carolyn Penstein Rose

2 0.30083472 339 acl-2013-Temporal Signals Help Label Temporal Relations

Author: Leon Derczynski ; Robert Gaizauskas

Abstract: Automatically determining the temporal order of events and times in a text is difficult, though humans can readily perform this task. Sometimes events and times are related through use of an explicit co-ordination which gives information about the temporal relation: expressions like “before” and “as soon as”. We investigate the role that these co-ordinating temporal signals have in determining the type of temporal relations in discourse. Using machine learning, we improve upon prior approaches to the problem, achieving over 80% accuracy at labelling the types of temporal relation between events and times that are related by temporal signals.

3 0.25583643 296 acl-2013-Recognizing Identical Events with Graph Kernels

Author: Goran Glavas ; Jan Snajder

Abstract: Identifying news stories that discuss the same real-world events is important for news tracking and retrieval. Most existing approaches rely on the traditional vector space model. We propose an approach for recognizing identical real-world events based on a structured, event-oriented document representation. We structure documents as graphs of event mentions and use graph kernels to measure the similarity between document pairs. Our experiments indicate that the proposed graph-based approach can outperform the traditional vector space model, and is especially suitable for distinguishing between topically similar, yet non-identical events.

4 0.25489706 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

Author: Peifeng Li ; Qiaoming Zhu ; Guodong Zhou

Abstract: As a paratactic language, sentence-level argument extraction in Chinese suffers much from the frequent occurrence of ellipsis with regard to inter-sentence arguments. To resolve this problem, this paper proposes a novel global argument inference model to explore specific relationships, such as Coreference, Sequence and Parallel, among relevant event mentions to recover those inter-sentence arguments in the sentence, discourse and document layers which represent the cohesion of an event or a topic. Evaluation on the ACE 2005 Chinese corpus justifies the effectiveness of our global argument inference model over a state-of-the-art baseline.

5 0.25354409 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions

Author: Gabor Angeli ; Jakob Uszkoreit

Abstract: Temporal resolution systems are traditionally tuned to a particular language, requiring significant human effort to translate them to new languages. We present a language independent semantic parser for learning the interpretation of temporal phrases given only a corpus of utterances and the times they reference. We make use of a latent parse that encodes a language-flexible representation of time, and extract rich features over both the parse and associated temporal semantics. The parameters of the model are learned using a weakly supervised bootstrapping approach, without the need for manually tuned parameters or any other language expertise. We achieve state-of-the-art accuracy on all languages in the TempEval2 temporal normalization task, reporting a 4% improvement in both English and Spanish accuracy, and to our knowledge the first results for four other languages.

6 0.19842003 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features

7 0.14644355 224 acl-2013-Learning to Extract International Relations from Political Context

8 0.11644142 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference

9 0.10233575 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

10 0.096148342 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality

11 0.087619893 301 acl-2013-Resolving Entity Morphs in Censored Data

12 0.086015649 126 acl-2013-Diverse Keyword Extraction from Conversations

13 0.075831831 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing

14 0.06678497 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

15 0.063578829 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

16 0.063139223 146 acl-2013-Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications

17 0.062045567 267 acl-2013-PARMA: A Predicate Argument Aligner

18 0.061623655 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

19 0.060751036 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use

20 0.060125902 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.157), (1, 0.088), (2, -0.055), (3, -0.101), (4, 0.078), (5, 0.267), (6, 0.06), (7, 0.145), (8, 0.012), (9, 0.055), (10, -0.016), (11, -0.059), (12, 0.013), (13, 0.022), (14, 0.016), (15, -0.083), (16, -0.075), (17, -0.054), (18, 0.074), (19, -0.038), (20, 0.082), (21, -0.279), (22, 0.077), (23, 0.055), (24, -0.058), (25, -0.1), (26, -0.177), (27, -0.029), (28, -0.227), (29, -0.069), (30, 0.01), (31, 0.049), (32, 0.024), (33, -0.08), (34, 0.179), (35, -0.099), (36, -0.058), (37, 0.013), (38, 0.021), (39, -0.046), (40, -0.097), (41, -0.042), (42, 0.014), (43, -0.037), (44, 0.1), (45, -0.011), (46, 0.047), (47, -0.03), (48, 0.007), (49, -0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97819942 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

Author: Miaomiao Wen ; Zeyu Zheng ; Hyeju Jang ; Guang Xiang ; Carolyn Penstein Rose

2 0.83248246 339 acl-2013-Temporal Signals Help Label Temporal Relations

Author: Leon Derczynski ; Robert Gaizauskas

3 0.74992555 296 acl-2013-Recognizing Identical Events with Graph Kernels

Author: Goran Glavas ; Jan Snajder

4 0.70493513 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions

Author: Gabor Angeli ; Jakob Uszkoreit

5 0.60710377 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

Author: Peifeng Li ; Qiaoming Zhu ; Guodong Zhou

6 0.57064158 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features

7 0.5685938 224 acl-2013-Learning to Extract International Relations from Political Context

8 0.4476563 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality

9 0.41170251 301 acl-2013-Resolving Entity Morphs in Censored Data

10 0.39923313 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference

11 0.39127916 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

12 0.38667622 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use

13 0.37859657 175 acl-2013-Grounded Language Learning from Video Described with Sentences

14 0.34445918 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

15 0.33320543 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics

16 0.33029237 61 acl-2013-Automatic Interpretation of the English Possessive

17 0.32121015 311 acl-2013-Semantic Neighborhoods as Hypergraphs

18 0.31624165 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations

19 0.29032141 146 acl-2013-Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications

20 0.278927 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (4, 0.015), (6, 0.04), (7, 0.259), (11, 0.047), (15, 0.011), (24, 0.036), (26, 0.06), (35, 0.077), (42, 0.046), (48, 0.022), (70, 0.139), (88, 0.032), (90, 0.025), (95, 0.055)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80842012 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

Author: Miaomiao Wen ; Zeyu Zheng ; Hyeju Jang ; Guang Xiang ; Carolyn Penstein Rose

2 0.72506255 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia

Author: Oliver Ferschke ; Iryna Gurevych ; Marc Rittberger

Abstract: With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.

3 0.71100241 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts

Author: Zornitsa Kozareva

Abstract: Metaphor is an important way of conveying the affect of people, hence understanding how people use metaphors to convey affect is important for the communication between individuals and increases cohesion if the perceived affect of the concrete example is the same for the two individuals. Therefore, building computational models that can automatically identify the affect in metaphor-rich texts like “The team captain is a rock.”, “Time is money.”, “My lawyer is a shark.” is an important challenging problem, which has been of great interest to the research community. To solve this task, we have collected and manually annotated the affect of metaphor-rich texts for four languages. We present novel algorithms that integrate triggers for cognitive, affective, perceptual and social processes with stylistic and lexical information. By running evaluations on datasets in English, Spanish, Russian and Farsi, we show that the developed affect polarity and valence prediction technology of metaphor-rich texts is portable and works equally well for different languages.

4 0.66141504 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning

Author: Xiaojun Quan ; Chunyu Kit ; Yan Song

Abstract: This paper studies the problem of non-monotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.

5 0.59666055 296 acl-2013-Recognizing Identical Events with Graph Kernels

Author: Goran Glavas ; Jan Snajder

6 0.5937407 220 acl-2013-Learning Latent Personas of Film Characters

7 0.58775383 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

8 0.58384854 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

9 0.58239108 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

10 0.58160973 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

11 0.57316941 89 acl-2013-Computerized Analysis of a Verbal Fluency Test

12 0.56092197 264 acl-2013-Online Relative Margin Maximization for Statistical Machine Translation

13 0.55734867 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

14 0.55152255 249 acl-2013-Models of Semantic Representation with Visual Attributes

15 0.54861939 80 acl-2013-Chinese Parsing Exploiting Characters

16 0.54679579 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

17 0.54626095 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

18 0.54060185 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

19 0.53702122 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers

20 0.53543264 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing