acl acl2013 acl2013-178 knowledge-graph by maker-knowledge-mining

178 acl-2013-HEADY: News headline abstraction through event pattern clustering


Source: pdf

Author: Enrique Alfonseca ; Daniele Pighin ; Guillermo Garrido

Abstract: This paper presents HEADY: a novel, abstractive approach for headline generation from news collections. From a web-scale corpus of English news, we mine syntactic patterns that a Noisy-OR model generalizes into event descriptions. At inference time, we query the model with the patterns observed in an unseen news collection, identify the event that best captures the gist of the collection and retrieve the most appropriate pattern to generate a headline. HEADY improves over a state-of-the-art open-domain title abstraction method, bridging half of the gap that separates it from extractive methods using human-generated titles in manual evaluations, and performs comparably to human-generated headlines as evaluated with ROUGE.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 HEADY: News headline abstraction through event pattern clustering Enrique Alfonseca Daniele Pighin Guillermo Garrido∗ Google Inc. [sent-1, score-0.764]

2 Abstract This paper presents HEADY: a novel, abstractive approach for headline generation from news collections. [sent-7, score-0.897]

3 From a web-scale corpus of English news, we mine syntactic patterns that a Noisy-OR model generalizes into event descriptions. [sent-8, score-0.239]

4 At inference time, we query the model with the patterns observed in an unseen news collection, identify the event that best captures the gist of the collection and retrieve the most appropriate pattern to generate a headline. [sent-9, score-0.728]

5 HEADY improves over a state-of-the-art open-domain title abstraction method, bridging half of the gap that separates it from extractive methods using human-generated titles in manual evaluations, and performs comparably to human-generated headlines as evaluated with ROUGE. [sent-10, score-0.615]

6 Different news agencies will interpret the event in different ways; various countries or locations may highlight different aspects of it depending on how they are affected; and opinions and in-depth analyses will be written after the fact. [sent-13, score-0.28]

7 On the positive side, we have the same events described in different ways; this redundancy is useful for summarization, as the information content reported by the majority of news sources most likely represents the central part of the event. [sent-15, score-0.247]

8 For some applications it is important to understand, given a collection of related news articles and re… (Footnote ∗: Work done during an internship at Google Zurich.) [sent-17, score-0.268]

9 As a motivating example, Table 1 shows the different headlines observed in news reporting the wedding between basketball player Carmelo Anthony and actress LaLa Vazquez. [sent-20, score-0.752]

10 When presenting this event to a user in a news-based information retrieval or recommendation system, different event descriptions may be more appropriate. [sent-22, score-0.202]

11 Our final goal in this research is to build a headline generation system that, given a news collection, is able to describe it with the most compact, objective and informative headline. [sent-28, score-0.852]

12 In particular, we want the system to be able to: • Generate headlines in an open-domain, unsupervised way, so that it does not need to rely on training data, which is expensive to produce. [sent-29, score-0.367]

13 In this paper we present HEADY, which is at the same time a novel system for abstractive headline generation, and a smooth clustering of patterns describing the same events. [sent-38, score-0.789]

14 By learning to generalize events across the boundaries of a single news story or news collection, HEADY produces compact and effective headlines that objectively convey the relevant information. [sent-40, score-0.793]

15 When compared to a state-of-the-art open-domain headline abstraction system (Filippova, 2010), the new headlines are statistically significantly better both in terms of readability and informativeness. [sent-41, score-0.968]

16 Also, automatic evaluations using ROUGE, having objective headlines for the news as references, show that the abstractive headlines are on par with human-produced headlines. [sent-42, score-1.053]

17 Most headline generation work in the past has focused on the problem of single-document summarization: given the main passage of a single news article, generate a very short summary of the article. [sent-44, score-0.823]

18 From early in the field, it was pointed out that a purely extractive approach is not good enough to generate headlines from the body text (Banko et al. [sent-45, score-0.431]

19 More importantly, quite often, the single sentence selected as the most informative for the news collection is already longer than the desired headline size. [sent-48, score-0.84]

20 For this reason, most early headline generation work focused on either extracting and reordering n-grams from the document to be summarized (Banko et al. [sent-49, score-0.612]

21 Single-document headline generation was also explored at the Document Understanding Conferences between 2002 and 20041. [sent-53, score-0.612]

22 All of them have direct applications for headline generation, as it can be construed as selecting one or a few sentences from the original document(s), and then reducing them to the target title size. [sent-59, score-0.585]

23 Filippova (2010) reports a system that is very close to our settings: the input is a collection of related news articles, and the system generates a headline that describes the main event. [sent-65, score-0.813]

24 There are not many fully abstractive systems for news summarization. [sent-69, score-0.285]

25 In contrast, HEADY automatically learns the templates or headline patterns, which allows it to work in open-domain settings without relying on supervision or manual annotations. [sent-71, score-0.712]

26 Pattern learning for relation extraction is an active area of research that is very related to our problem of event pattern learning for headline generation. [sent-73, score-0.764]

27 The main differences are that (a) HEADY is not limited to considering patterns expressing relations between pairs of entities; (b) we identify synonym patterns using a probabilistic, Bayesian approach that takes advantage of the multiplicity of news sources reporting the same events. [sent-82, score-0.455]

28 3 Headline generation In this section, we describe the HEADY system for news headline abstraction. [sent-92, score-0.791]

29 Our approach takes as input, for training, a corpus of news articles organized in news collections. [sent-93, score-0.358]

30 Once the model is trained, it can generate headlines for new collections. [sent-94, score-0.399]

31 Identify, in each of the news collections, syntactic patterns connecting k entities, for k ≥ 1. [sent-99, score-0.317]

32 Each pattern extracted in the previous step is added as an observed variable, and latent variables are used to represent the hidden events that generate patterns. [sent-103, score-0.321]

33 First, patterns are extracted using the pattern extraction procedure mentioned above. [sent-107, score-0.256]

34 The single pattern with the maximum probability is selected to generate a new headline from it. [sent-110, score-0.695]

35 Being the product of generalization across news collections, the retrieved pattern is more likely to be objective and informative than patterns directly observed in the news collection. [sent-111, score-0.625]

36 1 Pattern extraction In this section we detail the process for obtaining the event patterns that constitute the building blocks of learning and inference. [sent-114, score-0.239]

37 Patterns are extracted from a large repository N of news collections N1, . [sent-115, score-0.269]

38 Each news collection N = {ni} is an unordered collection of related news, each of which can be seen as an ordered sequence of sentences, i. [sent-119, score-0.268]

39 GETRELEVANTENTITIES: For each news collection N we collect the set E of the entities mentioned most often within the collection. [sent-137, score-0.345]
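
As a rough illustration of GETRELEVANTENTITIES and of the entity subsets used downstream, here is a minimal Python sketch. The paper only states that the most frequently mentioned entities are kept and that subsets of up to three entities are considered (footnote 2), so the top-k cutoff and the helper names are assumptions.

```python
# Hypothetical sketch of GETRELEVANTENTITIES: keep the entities mentioned
# most often in the collection, then enumerate the subsets E_i (|E_i| <= 3)
# that the pattern-extraction step iterates over.
from collections import Counter
from itertools import combinations

def get_relevant_entities(docs, top_k=5):
    """docs: one list of entity-mention strings per document."""
    counts = Counter(mention for doc in docs for mention in doc)
    return [entity for entity, _ in counts.most_common(top_k)]

def entity_subsets(entities, max_size=3):
    """All non-empty subsets E_i of the relevant entities, up to max_size."""
    for k in range(1, max_size + 1):
        yield from combinations(entities, k)

docs = [["Carmelo Anthony", "LaLa Vazquez", "Carmelo Anthony"],
        ["LaLa Vazquez", "New York"]]
for subset in entity_subsets(get_relevant_entities(docs)):
    print(subset)
```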

40 (1): an MST is extracted from the entity pair e1, e2 (2); nodes are heuristically added to the MST to enforce grammaticality (3); entity types are recombined to generate the final patterns (4). [sent-139, score-0.286]

41 2 EXTRACTPATTERNS: For each subset of relevant entities Ei, event patterns are mined from the articles in the news collection. [sent-142, score-0.495]

42 The process by which patterns are extracted from a news is explained in Algorithm 2 and exemplified graphically in Figure 1 (2–4). [sent-143, score-0.317]
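
The MST step can be approximated as in the sketch below; this is an illustration under assumptions, not the authors' implementation. It treats the dependency parse as an undirected networkx graph and keeps the minimal subtree connecting the entity mention nodes (in a tree, this is the union of the pairwise shortest paths); the grammaticality heuristics of step (3) and the entity-type recombination of step (4) are only hinted at in comments, and the function name is hypothetical.

```python
# Rough sketch of the MST extraction in Algorithm 2 (steps 2-3 of Figure 1).
# Assumes the parse is a tree given as an undirected graph over token indices.
import networkx as nx

def extract_pattern(dep_tree, tokens, entity_nodes):
    """Tokens on the minimal subtree connecting the entity mention nodes."""
    keep = set(entity_nodes)
    for i, a in enumerate(entity_nodes):
        for b in entity_nodes[i + 1:]:
            # In a tree, the union of pairwise shortest paths is exactly the
            # minimal connecting subtree (what the paper calls the MST).
            keep.update(nx.shortest_path(dep_tree, a, b))
    # Step (3), omitted here: heuristically add nodes (aux, neg, ...) so the
    # extracted fragment stays grammatical.
    return [tokens[n] for n in sorted(keep)]

tree = nx.Graph([(0, 1), (1, 2), (1, 3)])
tokens = ["[person:1]", "marries", "[person:2]", "Saturday"]
print(extract_pattern(tree, tokens, [0, 2]))  # keeps the placeholders + verb
```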

43 For example, the MST for the en… (Footnote 2: As our objective is to generate very short titles (under 10 words), we only consider combinations of up to three elements of E.) [sent-151, score-0.213]

44 The reason is that we want to limit ourselves, in each news collection, to the most relevant event reported in the collection, which appears most of the time in these two sentences. [sent-159, score-0.28]

45 The patterns mined from the same news collection and for the same set of entities are grouped together, and constitute the building blocks of the clustering algorithm which is described below. [sent-161, score-0.483]

46 2 Training The extracted patterns are used to learn a Noisy-OR (Pearl, 1988) model by estimating the probability that each (observed) pattern activates one or many (hidden) events. [sent-163, score-0.256]

47 Figure 2 represents the two levels: the hidden event variables at the top, and the observed pattern variables at the bottom. [sent-164, score-0.359]

48 The associations between latent event variables and observed pattern variables are modeled by noisy-OR gates. [sent-170, score-0.333]

49 The associations between latent events and observed patterns are modeled by noisy-OR gates. [sent-173, score-0.246]

50 In this model, the conditional probability of a hidden event ei given a configuration of observed patterns p ∈ {0, 1}^|P| is calculated as: P(ei = 0 | p) = (1 − qi0) · ∏_{j ∈ πi} (1 − qij)^{pj} = exp(−θi0 − Σ_{j ∈ πi} θij pj), where πi is the set of active patterns (i. [sent-174, score-0.477]

51 e., πi = {pj | pj = 1}), and qij = P(ei = 1 | pj = 1) is the estimated probability that the observed pattern pj can, in isolation, activate the event ei. [sent-176, score-0.378]

52 The term qi0 is the so-called “noise” term of the model, and it accounts for the fact that an event ei might be activated by some pattern that has never been observed (Jaakkola and Jordan, 1999). [sent-177, score-0.403]
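
As a minimal numeric check of the gate above, the sketch below evaluates both equivalent forms of P(ei = 0 | p); the q values are invented for illustration.

```python
# Noisy-OR gate: P(e_i = 0 | p) = (1 - q_i0) * prod_{j: p_j = 1} (1 - q_ij).
import math

def p_event_off(q_noise, q, p):
    prob = 1.0 - q_noise
    for q_ij, p_j in zip(q, p):
        if p_j:
            prob *= 1.0 - q_ij
    return prob

q_noise = 0.01            # q_i0, the noise term
q = [0.7, 0.2, 0.5]       # q_ij = P(e_i = 1 | only pattern j active)
p = [1, 0, 1]             # observed pattern configuration

off = p_event_off(q_noise, q, p)
print("P(e_i = 1 | p) =", 1.0 - off)

# Equivalent log-linear form: exp(-theta_i0 - sum_j theta_ij * p_j),
# with theta = -log(1 - q).
theta0 = -math.log(1.0 - q_noise)
thetas = [-math.log(1.0 - q_ij) for q_ij in q]
assert abs(off - math.exp(-theta0 - sum(t * pj for t, pj in zip(thetas, p)))) < 1e-12
```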

53 In Algorithm 1, at the end of the process we group in R[N, Ei] all the patterns extracted from the same news collection N and entity subset Ei. [sent-178, score-0.362]

54 3 Inference (generation of new headlines) Given an unseen news collection N, the inference component of HEADY generates a single headline that captures the main event reported by the news in N. [sent-183, score-1.124]

55 Having selected p∗, in order to generate a headline it is sufficient to replace the entity placeholders in p∗ with the surface forms observed in N. [sent-185, score-0.662]
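
A minimal sketch of this surface-realization step, assuming placeholders in the paper's [type:index] notation:

```python
# Replace typed entity placeholders in the selected pattern p* with the
# surface forms observed in the collection N.
import re

def realize(pattern, surface_forms):
    """surface_forms maps a placeholder like '[person:1]' to a string."""
    return re.sub(r"\[\w+:\d\]", lambda m: surface_forms[m.group(0)], pattern)

p_star = "[person:1] marries [person:2]"
print(realize(p_star, {"[person:1]": "Carmelo Anthony",
                       "[person:2]": "LaLa Vazquez"}))
# -> "Carmelo Anthony marries LaLa Vazquez"
```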

56 Given a set of entities E and sentences n, EXTRACTPATTERNSΨ(n, E) collects patterns involving those entities. [sent-188, score-0.215]

57 A two-step random walk traversing to the latent event nodes and back to the pattern nodes allows us to generalize across events. [sent-190, score-0.219]
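
The two-step walk can be sketched as two matrix products, under the assumption that the Noisy-OR weights can be row- and column-normalized into transition probabilities; the paper does not spell out the exact transition definition, so this only illustrates the pattern → event → pattern smoothing.

```python
# Two-step random walk over the bipartite pattern/event graph.
import numpy as np

def two_step_walk(w, Q):
    """w: distribution over patterns; Q: |patterns| x |events| weights."""
    to_event = Q / Q.sum(axis=1, keepdims=True)        # pattern -> event
    to_pattern = (Q / Q.sum(axis=0, keepdims=True)).T  # event -> pattern
    return w @ to_event @ to_pattern

Q = np.array([[0.9, 0.1],     # pattern 0 mostly activates event 0
              [0.8, 0.2],
              [0.1, 0.9]])
w = np.array([1.0, 0.0, 0.0])  # only pattern 0 observed in the collection
print(two_step_walk(w, Q))     # mass spreads to pattern 1 via shared event 0
```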

58 Given the set of entities E mentioned in the news collection, we consider each entity subset Ei ⊆ E including up to three entities3. [sent-193, score-0.301]

59 For each Ei, we run INFERENCE(n, Ei), which computes a distribution wi over patterns involving the entities in Ei. [sent-194, score-0.215]

60 This computes a probability distribution w over all patterns involving any admissible subset of the entities mentioned in the collection. [sent-197, score-0.215]

61 Third, we select the entity-specific distribution that best approximates the overall distribution: w∗ = arg max_i cos(w, wi). We assume that the corresponding set of entities Ei contains the most central entities in the collection, and therefore any headline should mention them all. [sent-199, score-0.788]
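
A minimal sketch of this selection with toy distributions; the subset labels are hypothetical, and the vectors are assumed to live in the same pattern index space.

```python
# Pick the entity subset whose pattern distribution w_i is closest (by
# cosine similarity) to the overall distribution w: w* = argmax_i cos(w, w_i).
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def select_distribution(w, per_subset):
    """per_subset: dict mapping an entity subset -> distribution over patterns."""
    return max(per_subset.items(), key=lambda kv: cosine(w, kv[1]))

w = np.array([0.5, 0.3, 0.2])
per_subset = {("e1",): np.array([0.6, 0.3, 0.1]),
              ("e1", "e2"): np.array([0.1, 0.2, 0.7])}
best_entities, w_star = select_distribution(w, per_subset)
print(best_entities)  # -> ('e1',)
```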

62 (Footnote 3) As we noted before, we impose this limitation to keep the generated headlines relatively short and to limit data sparsity issues. [sent-200, score-0.367]

63 A” we would produce patterns such as “[person:2] killed [person:1]” or “[person:1] was killed by [person:2]” since ea = “Mr. [sent-216, score-0.194]

64 At inference time, when we query the model with such patterns we can only activate events whose assignments are compatible with the entities observed in the text, making the replacement straightforward and unambiguous. [sent-219, score-0.354]

65 4 Experiment settings In our method we use patterns that are fully lexicalized (with the exception of entity placeholders) and enriched with syntactic data. [sent-220, score-0.183]

66 To the best of our knowledge, available data sets for headline generation are not large enough to support this kind of inference. [sent-222, score-0.612]

67 The total number of news collections after clustering is 1. [sent-227, score-0.269]

68 As we have no development set, we have done no tuning of the parameters for pattern extraction nor for the Bayesian network training (100,000 latent variables to represent the different events, 40 EM iterations, as mentioned in Section 3). [sent-230, score-0.19]

69 1 Systems used One of the questions we wanted to answer in this research was whether it was possible to obtain the same quality with automatically abstracted headlines as with human-generated headlines. [sent-234, score-0.367]

70 For every news collection we have as many human-generated headlines as documents. [sent-235, score-0.635]

71 To decide which human-generated headline should be used in this comparison, we used three different methods that pick one of the collection headlines: • Latest headline: selects the headline from the latest document in the collection. [sent-236, score-1.179]

Intuitively, this should be the most relevant one for news about sport matches and competitions, where the earlier headlines offer previews and predictions, and the later headlines report who won and the final scores. [sent-237, score-0.913]

73 • Most frequent headline: some headlines are repeated across the collection, and this method chooses the most frequent one. [sent-238, score-0.463]

• The news title that has the smallest Kullback-Leibler divergence from the collection's word distribution (this is the TopicSum-based selection; a sketch of it appears below). (Footnote 5: Even though we did not run any experiment to find an optimal value for this parameter, 50 documents seems like a reasonable choice to avoid redundancy while allowing for considerable lexical and syntactic variation.) [sent-241, score-0.219]

75 (Footnote 6) The most frequent headline only has a tie in 6 collections in the whole test set. [sent-242, score-0.683]

76 In 5 cases two headlines are tied at frequencies around 4, and in one case three headlines are tied at frequency 2. [sent-243, score-0.734]

77 All six are large collections with 50 news articles, so this baseline is significantly different from a random baseline. [sent-244, score-0.269]
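
The KL-based selection mentioned above can be sketched as follows: pick the human headline whose unigram distribution diverges least from the collection's word distribution. The smoothing constant and the whitespace tokenization are assumptions, and the real system may compute the divergence differently.

```python
# Toy KL-divergence headline selection (TopicSum-style).
from collections import Counter
import math

def unigram_dist(text, vocab, alpha=0.01):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl(p, q):
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def pick_headline(headlines, collection_text):
    vocab = set(collection_text.lower().split())
    for h in headlines:
        vocab |= set(h.lower().split())
    coll = unigram_dist(collection_text, vocab)
    return min(headlines, key=lambda h: kl(unigram_dist(h, vocab), coll))

print(pick_headline(["Carmelo Anthony weds LaLa Vazquez",
                     "Melo laughs off wedding rumours"],
                    "carmelo anthony weds lala vazquez in new york"))
```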

78 [Table 2 residue: ROUGE (R-1) scores for HEADY, Most frequent pattern, TopicSum, MSC, Most frequent headline, and Latest headline; the numeric columns were lost in extraction.] [sent-245, score-1.304]

79 In other words, the only difference with respect to HEADY is that in this case no generalization through the Noisy-OR network is carried out, and that headlines are generated from patterns directly observed in the test news collections. [sent-268, score-0.787]

80 First, from the set of collections that we had set aside at the beginning, we randomly chose 50 collections for which all the systems could generate an output, and we asked raters to manually write titles for them. [sent-272, score-0.472]

81 We collected between four and five reference titles for each of the fifty news collections, to be used to compare the headline generation systems. [Table residue: manual Readability ratings for TopicSum, Most frequent headline, and the other systems were lost in extraction.]

82 Raters were shown one headline and asked to rate it in terms of readability on a 5-point Likert scale. [sent-293, score-0.601]

83 In the instructions, the raters were provided with examples of ungrammatical and grammatical titles to guide them in this annotation. [sent-294, score-0.235]

84 After the previous rating was done, raters were shown a selection of five documents from the collection, and they were asked to judge the informativeness of the previous headline for the news in the collection, again on a 5-point Likert scale. [sent-296, score-0.859]

85 They did not know that the headlines they were rating were generated according to different methods. [sent-299, score-0.367]

86 Table 2 shows the results of the comparison of the headline generation systems using ROUGE (R-1, R-2 and R-SU4) (Lin, 2004) with the collected references. [sent-316, score-0.612]
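
For reference, here is a toy ROUGE-1 computation against multiple references, to make the metric concrete; the official ROUGE toolkit (Lin, 2004) applies stemming and other options not reproduced in this sketch.

```python
# Toy ROUGE-1: unigram overlap between a candidate headline and references,
# reported as the best (F1, recall, precision) over the reference set.
from collections import Counter

def rouge_1(candidate, references):
    cand = Counter(candidate.lower().split())
    best = (0.0, 0.0, 0.0)
    for ref in references:
        r = Counter(ref.lower().split())
        overlap = sum((cand & r).values())
        recall = overlap / max(sum(r.values()), 1)
        precision = overlap / max(sum(cand.values()), 1)
        f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
        best = max(best, (f1, recall, precision))
    return best

print(rouge_1("carmelo anthony marries lala vazquez",
              ["carmelo anthony weds lala vazquez",
               "anthony and vazquez wed in new york"]))
```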

87 The human annotators that created the references for this evaluation were explicitly instructed to write objective titles, which is the kind of headline that the abstractive systems aim to generate. [sent-320, score-0.507]

88 It is common to see real headlines that are catchy, joking, or with a double meaning, and therefore they use a different vocabulary than objective titles that simply mention what happened. [sent-321, score-0.548]

89 TopicSum sometimes selects objective titles amongst the human-made titles and that is why it also scores very well with the ROUGE scores. [sent-322, score-0.328]

90 But the other two criteria for choosing human-made headlines select non-objective titles much more often, and this lowers their performance when measured with ROUGE with respect to the objective references. [sent-323, score-0.548]

91 The manual evaluation is asking raters to judge whether real, human-written titles that were actually used for those news are grammatical and informative. [sent-329, score-0.443]

92 [Table 3 residue, partially recovered] Reds midfielder victim of racist tweet; Frequent pattern: Kyle McFadzean fired a equaliser Crawley were …; Frequent headline: Latics halt Crawley charge; HEADY: [sent-333, score-0.663]

93 McFadzean rescues point for Crawley Town; TopicSum, MSC, Latest headline: UCI to strip Lance Armstrong of his 7 Tour titles; The international cycling union said today … [sent-335, score-0.692]

94 Headlines based on the most frequent patterns were better than MSC for all metrics but ROUGE-2. [sent-338, score-0.186]

95 The first example shows a news collection containing news about a rumour that was immediately denied. [sent-343, score-0.447]

96 The examples also show that TopicSum is very effective in selecting a good human-generated headline for each collection. [sent-346, score-0.545]

97 6 Conclusions We have presented HEADY, an abstractive headline generation system based on the generalization of syntactic patterns by means of a Noisy-OR Bayesian network. [sent-348, score-0.884]

98 HEADY performs significantly better than a state-of-the-art open-domain abstractive model (Filippova, 2010) in all evaluations, and is on par with human-generated headlines in terms of ROUGE scores. [sent-350, score-0.473]

99 We have shown that it is possible to achieve high quality generation of news headlines in an open-domain, unsupervised setting by successfully exploiting syntactic and ontological information. [sent-351, score-0.613]

100 For future work, we plan to improve all components of HEADY in order to close the gap with the human-generated titles in terms of readability and informativeness. [sent-353, score-0.203]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('headline', 0.545), ('headlines', 0.367), ('heady', 0.346), ('news', 0.179), ('topicsum', 0.159), ('titles', 0.147), ('patterns', 0.138), ('pattern', 0.118), ('carmelo', 0.116), ('msc', 0.116), ('abstractive', 0.106), ('ei', 0.104), ('event', 0.101), ('collections', 0.09), ('collection', 0.089), ('raters', 0.088), ('rouge', 0.085), ('filippova', 0.083), ('actress', 0.083), ('wedding', 0.083), ('nnp', 0.082), ('entities', 0.077), ('charlize', 0.072), ('lala', 0.072), ('events', 0.068), ('generation', 0.067), ('armstrong', 0.058), ('celebrity', 0.058), ('crawley', 0.058), ('extractpatterns', 0.058), ('theron', 0.058), ('readability', 0.056), ('anthony', 0.054), ('compression', 0.052), ('latest', 0.051), ('frequent', 0.048), ('informativeness', 0.047), ('person', 0.047), ('entity', 0.045), ('marry', 0.045), ('collectiontopatterns', 0.043), ('stonestreet', 0.043), ('vazquez', 0.043), ('bayesian', 0.041), ('dating', 0.041), ('pi', 0.041), ('observed', 0.04), ('title', 0.04), ('pj', 0.04), ('qij', 0.038), ('variables', 0.037), ('banko', 0.036), ('vb', 0.035), ('mst', 0.035), ('network', 0.035), ('objective', 0.034), ('tour', 0.033), ('generate', 0.032), ('extractive', 0.032), ('patty', 0.032), ('eb', 0.032), ('inference', 0.031), ('wan', 0.03), ('katja', 0.03), ('jaakkola', 0.03), ('applyheuristics', 0.029), ('ccconjnsubjaux', 0.029), ('ciara', 0.029), ('combineentitytypes', 0.029), ('getmentionnodes', 0.029), ('getrelevantentities', 0.029), ('icc', 0.029), ('ingliar', 0.029), ('laughs', 0.029), ('mcfadzean', 0.029), ('onisko', 0.029), ('preprocessdata', 0.029), ('prim', 0.029), ('rumours', 0.029), ('stylist', 0.029), ('manual', 0.029), ('cc', 0.028), ('killed', 0.028), ('eric', 0.028), ('generalization', 0.028), ('informative', 0.027), ('fusion', 0.027), ('summarization', 0.026), ('hidden', 0.026), ('star', 0.026), ('freebase', 0.026), ('tom', 0.026), ('middleton', 0.026), ('genest', 0.026), ('uned', 0.026), ('wed', 0.026), ('wording', 0.026), ('enforce', 0.026), ('haghighi', 0.026), ('aside', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

Author: Enrique Alfonseca ; Daniele Pighin ; Guillermo Garrido

Abstract: This paper presents HEADY: a novel, abstractive approach for headline generation from news collections. From a web-scale corpus of English news, we mine syntactic patterns that a Noisy-OR model generalizes into event descriptions. At inference time, we query the model with the patterns observed in an unseen news collection, identify the event that best captures the gist of the collection and retrieve the most appropriate pattern to generate a headline. HEADY improves over a state-of-the-art open-domain title abstraction method, bridging half of the gap that separates it from extractive methods using human-generated titles in manual evaluations, and performs comparably to human-generated headlines as evaluated with ROUGE.

2 0.11523215 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities

Author: Ndapandula Nakashole ; Tomasz Tylenda ; Gerhard Weikum

Abstract: Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied. However, a largely under-explored case is tapping into highly dynamic sources like news streams and social media, where new entities are continuously emerging. In this paper, we present a method for discovering and semantically typing newly emerging out-ofKB entities, thus improving the freshness and recall of ontology-based IE and improving the precision and semantic rigor of open IE. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. Our experimental evaluation, based on crowdsourced user studies, show our method performing significantly better than prior work.

3 0.10340738 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie

Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,

4 0.092324205 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art

Author: Peter A. Rankel ; John M. Conroy ; Hoa Trang Dang ; Ani Nenkova

Abstract: How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of data from the Text Analysis Conference, we analyze the performance of eight ROUGE variants in terms of accuracy, precision and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher order n-grams and syntactic dependencies, are most accurate across the years; the commonly used ROUGE-1 scores find too many significant differences between systems which manual evaluation would deem comparable. We also test combinations ofROUGE variants and find that they considerably improve the accuracy of automatic prediction.

5 0.088823229 296 acl-2013-Recognizing Identical Events with Graph Kernels

Author: Goran Glavas ; Jan Snajder

Abstract: Identifying news stories that discuss the same real-world events is important for news tracking and retrieval. Most existing approaches rely on the traditional vector space model. We propose an approach for recognizing identical real-world events based on a structured, event-oriented document representation. We structure documents as graphs of event mentions and use graph kernels to measure the similarity between document pairs. Our experiments indicate that the proposed graph-based approach can outperform the traditional vector space model, and is especially suitable for distinguishing between topically similar, yet non-identical events.

6 0.087260082 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

7 0.082335211 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

8 0.081385046 224 acl-2013-Learning to Extract International Relations from Political Context

9 0.080488384 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

10 0.075204149 139 acl-2013-Entity Linking for Tweets

11 0.071765251 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

12 0.071171306 256 acl-2013-Named Entity Recognition using Cross-lingual Resources: Arabic as an Example

13 0.070507176 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

14 0.068299316 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features

15 0.066531911 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization

16 0.064227439 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

17 0.063578829 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

18 0.063108645 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

19 0.062413864 172 acl-2013-Graph-based Local Coherence Modeling

20 0.060375329 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.163), (1, 0.06), (2, -0.0), (3, -0.06), (4, 0.035), (5, 0.106), (6, 0.037), (7, 0.016), (8, -0.061), (9, -0.03), (10, -0.044), (11, -0.042), (12, -0.052), (13, -0.005), (14, -0.051), (15, 0.033), (16, 0.033), (17, -0.033), (18, -0.024), (19, 0.019), (20, -0.028), (21, -0.074), (22, -0.008), (23, 0.067), (24, 0.012), (25, -0.022), (26, 0.023), (27, -0.039), (28, -0.025), (29, 0.078), (30, -0.023), (31, -0.021), (32, -0.047), (33, 0.008), (34, 0.023), (35, -0.032), (36, -0.006), (37, -0.008), (38, -0.014), (39, -0.008), (40, 0.035), (41, -0.066), (42, -0.043), (43, -0.033), (44, -0.005), (45, -0.013), (46, -0.047), (47, 0.099), (48, 0.011), (49, 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92979079 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

Author: Enrique Alfonseca ; Daniele Pighin ; Guillermo Garrido

Abstract: This paper presents HEADY: a novel, abstractive approach for headline generation from news collections. From a web-scale corpus of English news, we mine syntactic patterns that a Noisy-OR model generalizes into event descriptions. At inference time, we query the model with the patterns observed in an unseen news collection, identify the event that better captures the gist of the collection and retrieve the most appropriate pattern to generate a headline. HEADY improves over a state-of-theart open-domain title abstraction method, bridging half of the gap that separates it from extractive methods using humangenerated titles in manual evaluations, and performs comparably to human-generated headlines as evaluated with ROUGE.

2 0.70478851 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities

Author: Ndapandula Nakashole ; Tomasz Tylenda ; Gerhard Weikum

Abstract: Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied. However, a largely under-explored case is tapping into highly dynamic sources like news streams and social media, where new entities are continuously emerging. In this paper, we present a method for discovering and semantically typing newly emerging out-ofKB entities, thus improving the freshness and recall of ontology-based IE and improving the precision and semantic rigor of open IE. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. Our experimental evaluation, based on crowdsourced user studies, show our method performing significantly better than prior work.

3 0.65623105 172 acl-2013-Graph-based Local Coherence Modeling

Author: Camille Guinaudeau ; Michael Strube

Abstract: We propose a computationally efficient graph-based approach for local coherence modeling. We evaluate our system on three tasks: sentence ordering, summary coherence rating and readability assessment. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.

4 0.61386353 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie

Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,

5 0.60327917 224 acl-2013-Learning to Extract International Relations from Political Context

Author: Brendan O'Connor ; Brandon M. Stewart ; Noah A. Smith

Abstract: We describe a new probabilistic model for extracting events between major political actors from news corpora. Our unsupervised model brings together familiar components in natural language processing (like parsers and topic models) with contextual political information— temporal and dyad dependence—to infer latent event classes. We quantitatively evaluate the model’s performance on political science benchmarks: recovering expert-assigned event class valences, and detecting real-world conflict. We also conduct a small case study based on our model’s inferences. A supplementary appendix, and replication software/data are available online, at: http://brenocon.com/irevents

6 0.60251957 352 acl-2013-Towards Accurate Distant Supervision for Relational Facts Extraction

7 0.59869963 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

8 0.5979355 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain

9 0.5926345 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization

10 0.58585715 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization

11 0.58007145 225 acl-2013-Learning to Order Natural Language Texts

12 0.57566142 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

13 0.57459193 333 acl-2013-Summarization Through Submodularity and Dispersion

14 0.57161945 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

15 0.56238103 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

16 0.54973966 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization

17 0.54650688 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits

18 0.544967 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization

19 0.54233032 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

20 0.52698702 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.056), (6, 0.048), (11, 0.057), (15, 0.013), (21, 0.011), (24, 0.06), (26, 0.052), (35, 0.082), (42, 0.037), (48, 0.044), (56, 0.26), (60, 0.011), (64, 0.011), (70, 0.054), (88, 0.03), (90, 0.024), (95, 0.066)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91125607 108 acl-2013-Decipherment

Author: Kevin Knight

Abstract: The first natural language processing systems had a straightforward goal: decipher coded messages sent by the enemy. This tutorial explores connections between early decipherment research and today’s NLP work. We cover classic military and diplomatic ciphers, automatic decipherment algorithms, unsolved ciphers, language translation as decipherment, and analyzing ancient writing as decipherment. 1 Tutorial Overview The first natural language processing systems had a straightforward goal: decipher coded messages sent by the enemy. Sixty years later, we have many more applications, including web search, question answering, summarization, speech recognition, and language translation. This tutorial explores connections between early decipherment research and today’s NLP work. We find that many ideas from the earlier era have become core to the field, while others still remain to be picked up and developed. We first cover classic military and diplomatic cipher types, including complex substitution ciphers implemented in the first electro-mechanical encryption machines. We look at mathematical tools (language recognition, frequency counting, smoothing) developed to decrypt such ciphers on proto-computers. We show algorithms and extensive empirical results for solving different types of ciphers, and we show the role of algorithms in recent decipherments of historical documents. We then look at how foreign language can be viewed as a code for English, a concept developed by Alan Turing and Warren Weaver. We describe recently published work on building automatic translation systems from non-parallel data. We also demonstrate how some of the same algorithmic tools can be applied to natural language tasks like part-of-speech tagging and word alignment. Turning back to historical ciphers, we explore a number of unsolved ciphers, giving results of initial computer experiments on several of them. Finally, we look briefly at writing as a way to encipher phoneme sequences, covering ancient scripts and modern applications. 2 Outline 1. Classical military/diplomatic ciphers (15 minutes) • 60 cipher types (ACA) • Ciphers vs. codes • Enigma cipher: the mother of natural language processing computer analysis of text language recognition Good-Turing smoothing – – – 2. Foreign language as a code (10 minutes) • • Alan Turing’s ”Thinking Machines” Warren Weaver’s Memorandum 3. Automatic decipherment (55 minutes) • Cipher type detection • Substitution ciphers (simple, homophonic, polyalphabetic, etc) plaintext language recognition ∗ how much plaintext knowledge is – nheowede mdu 3 Proce diSnogfsia, of B thuleg5a r1iast, A Anungu aslt M4-9e t2in01g3 o.f ? tc he20 A1s3so Acsiasoticoinat fio rn C fo rm Cpoumtaptuiotantaioln Lainlg Luinisgtuicis ,tpi casges 3–4, – ∗ index of coincidence, unicity distance, oanf dc oointhceidr measures navigating a difficult search space ∗ frequencies of letters and words ∗ pattern words and cribs ∗ pElMin,g ILP, Bayesian models, sam– recent decipherments ∗ Jefferson cipher, Copiale cipher, cJievfifle war ciphers, n Caovaplia Enigma • • • • Application to part-of-speech tagging, Awopprdli alignment Application to machine translation withoAuptp parallel t teoxtm Parallel development of cryptography aPnarda ltrleanls dlaetvioenlo Recently released NSA internal nReewcselnettlyter (1974-1997) 4. *** Break *** (30 minutes) 5. 
Unsolved ciphers (40 minutes) • Zodiac 340 (1969), including computatZioodnaial cw 3o4r0k • Voynich Manuscript (early 1400s), including computational ewarolyrk • Beale (1885) • Dorabella (1897) • Taman Shud (1948) • Kryptos (1990), including computatKiorynaplt owsor (k1 • McCormick (1999) • Shoeboxes in attics: DuPonceau jour- nal, Finnerana, SYP, Mopse, diptych 6. Writing as a code (20 minutes) • Does writing encode ideas, or does it encDoodees phonemes? • Ancient script decipherment Egyptian hieroglyphs Linear B Mayan glyphs – – – – wUgoarkritic, including computational Chinese N ¨ushu, including computational work • Automatic phonetic decipherment • Application to transliteration 7. Undeciphered writing systems (15 minutes) • Indus Valley Script (3300BC) • Linear A (1900BC) • Phaistos disc (1700BC?) • Rongorongo (1800s?) – 8. Conclusion and further questions (15 minutes) 3 About the Presenter Kevin Knight is a Senior Research Scientist and Fellow at the Information Sciences Institute of the University of Southern California (USC), and a Research Professor in USC’s Computer Science Department. He received a PhD in computer science from Carnegie Mellon University and a bachelor’s degree from Harvard University. Professor Knight’s research interests include natural language processing, machine translation, automata theory, and decipherment. In 2001, he co-founded Language Weaver, Inc., and in 2011, he served as President of the Association for Computational Linguistics. Dr. Knight has taught computer science courses at USC for more than fifteen years and co-authored the widely adopted textbook Artificial Intelligence. 4

2 0.9063772 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs

Author: Adam Vogel ; Christopher Potts ; Dan Jurafsky

Abstract: Conversational implicatures involve reasoning about multiply nested belief structures. This complexity poses significant challenges for computational models of conversation and cognition. We show that agents in the multi-agent DecentralizedPOMDP reach implicature-rich interpretations simply as a by-product of the way they reason about each other to maximize joint utility. Our simulations involve a reference game of the sort studied in psychology and linguistics as well as a dynamic, interactional scenario involving implemented artificial agents.

3 0.80629575 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

Author: Gozde Ozbal ; Daniele Pighin ; Carlo Strapparava

Abstract: Daniele Pighin Google Inc. Z ¨urich, Switzerland danie le . pighin@ gmai l com . Carlo Strapparava FBK-irst Trento, Italy st rappa@ fbk . eu you”. As another scenario, creative sentence genWe present BRAINSUP, an extensible framework for the generation of creative sentences in which users are able to force several words to appear in the sentences and to control the generation process across several semantic dimensions, namely emotions, colors, domain relatedness and phonetic properties. We evaluate its performance on a creative sentence generation task, showing its capability of generating well-formed, catchy and effective sentences that have all the good qualities of slogans produced by human copywriters.

same-paper 4 0.78330731 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

Author: Enrique Alfonseca ; Daniele Pighin ; Guillermo Garrido

Abstract: This paper presents HEADY: a novel, abstractive approach for headline generation from news collections. From a web-scale corpus of English news, we mine syntactic patterns that a Noisy-OR model generalizes into event descriptions. At inference time, we query the model with the patterns observed in an unseen news collection, identify the event that better captures the gist of the collection and retrieve the most appropriate pattern to generate a headline. HEADY improves over a state-of-theart open-domain title abstraction method, bridging half of the gap that separates it from extractive methods using humangenerated titles in manual evaluations, and performs comparably to human-generated headlines as evaluated with ROUGE.

5 0.75406867 258 acl-2013-Neighbors Help: Bilingual Unsupervised WSD Using Context

Author: Sudha Bhingardive ; Samiulla Shaikh ; Pushpak Bhattacharyya

Abstract: Word Sense Disambiguation (WSD) is one of the toughest problems in NLP, and in WSD, verb disambiguation has proved to be extremely difficult, because of high degree of polysemy, too fine grained senses, absence of deep verb hierarchy and low inter annotator agreement in verb sense annotation. Unsupervised WSD has received widespread attention, but has performed poorly, specially on verbs. Recently an unsupervised bilingual EM based algorithm has been proposed, which makes use only of the raw counts of the translations in comparable corpora (Marathi and Hindi). But the performance of this approach is poor on verbs with accuracy level at 25-38%. We suggest a modifica- tion to this mentioned formulation, using context and semantic relatedness of neighboring words. An improvement of 17% 35% in the accuracy of verb WSD is obtained compared to the existing EM based approach. On a general note, the work can be looked upon as contributing to the framework of unsupervised WSD through context aware expectation maximization.

6 0.55737972 318 acl-2013-Sentiment Relevance

7 0.5559842 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

8 0.55425304 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

9 0.55423617 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

10 0.5513683 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

11 0.55116904 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

12 0.55049795 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

13 0.54906368 224 acl-2013-Learning to Extract International Relations from Political Context

14 0.54892677 267 acl-2013-PARMA: A Predicate Argument Aligner

15 0.54875314 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

16 0.548262 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

17 0.54825002 175 acl-2013-Grounded Language Learning from Video Described with Sentences

18 0.5482313 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

19 0.54811841 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data

20 0.54743803 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization