emnlp emnlp2010 emnlp2010-102 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju
Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. [sent-3, score-1.002]
2 In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. [sent-4, score-0.571]
3 In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. [sent-6, score-0.774]
4 Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text. [sent-7, score-0.911]
5 Usually, online opinionated text is generated by multiple people, and thus often contains multiple viewpoints about an issue or topic. [sent-10, score-0.663]
6 An opinion is usually expressed in association with a particular viewpoint, even though the viewpoint is usually implicit. [sent-13, score-0.375]
7 Moreover, in an opinionated text with diverse opinions, the multiple viewpoints taken by opinion holders are often “contrastive”, leading to opposite polarities. [sent-24, score-0.757]
8 Such opinions may reflect many different types of viewpoints which cannot be modeled by current systems. [sent-27, score-0.6]
9 For this reason, we believe that a viewpoint summarization system would benefit from the ability to extract unlabeled viewpoints without supervision. [sent-28, score-1.017]
10 Thus, given a set of opinionated documents about a topic, we aim at automatically extracting and summarizing the multiple contrastive viewpoints implicitly expressed in the opinionated text to facilitate digestion and comparison of different viewpoints. [sent-30, score-1.09]
11 Specifically, we will generate two types of multi-view summaries: a macro multi-view summary and a micro multi-view summary. [sent-31, score-0.43]
12 A macro multi-view summary would contain multiple sets of sentences, each representing a different viewpoint; these different sets of sentences can be compared to understand the difference of multiple viewpoints at the “macro level. [sent-32, score-0.854]
13 ” A micro multi-view summary would contain a set of pairs of contrastive sentences (each pair [sent-33, score-0.633]
14 consists of two sentences representing two different viewpoints), making it easy to understand the difference between two viewpoints at the “micro level. [sent-35, score-0.571]
15 , 2006)), existing work has not attempted to generate our envisioned contrastive macro and micro multi-view summaries in an unsupervised way, which is the goal of our work. [sent-40, score-0.831]
16 The closest work to ours is perhaps that of Lerman and McDonald (2009) who present an approach to contrastive summarization. [sent-48, score-0.307]
17 They add an objective to their summarization model such that the summary model for one set of text is different from the model for the other set. [sent-49, score-0.267]
18 The idea is to highlight the key differences between the sets; however, this is a different type of contrast than the one we study here: our goal is instead to make the summaries similar to each other, to contrast how the same information is conveyed through different viewpoints. [sent-50, score-0.29]
19 2 Modeling Viewpoints The first challenge to be solved in order to generate a contrastive summary of multiple viewpoints is to model and extract these viewpoints, which are hidden in text. [sent-52, score-1.603]
20 , 2003) for jointly modeling topics and viewpoints in text. [sent-54, score-0.616]
21 These are more discriminative than single word tokens and can improve the accuracy of extracting multiple viewpoints, as we will show in the experimental results section. [sent-56, score-0.591]
22 , while the viewpoints might be the positive and negative sentiments. [sent-66, score-0.571]
23 A word like speakers, for instance, depends on the sound topic but not on a viewpoint, while good would be an example of a word that depends on a viewpoint but not on any particular topic. [sent-67, score-0.404]
24 A word is associated with variables denoting its topic and viewpoint assignments, as well as two binary variables to denote if the word depends on the topic and if the word depends on the viewpoint. [sent-72, score-0.433]
25 Sample a topic z from P(z|d) and a viewpoint v from P(v|d). [sent-76, score-0.383]
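To make this generative step concrete, here is a minimal Python sketch of how one word's assignments could be drawn in a TAM-like model; the function name and the flat switch probabilities p_l and p_r are illustrative assumptions, and the real model places priors on these distributions and fits them with approximate inference, which the sketch omits.

```python
import numpy as np

def sample_word_assignment(p_z_given_d, p_v_given_d, p_l, p_r, rng):
    """Sketch of one word's generative step in a TAM-like model: draw a
    topic z and a viewpoint v, plus two binary switches saying whether
    the word depends on the topic (l) and on the viewpoint (r)."""
    z = rng.choice(len(p_z_given_d), p=p_z_given_d)  # topic from P(z|d)
    v = rng.choice(len(p_v_given_d), p=p_v_given_d)  # viewpoint from P(v|d)
    l = rng.random() < p_l  # does this word depend on the topic?
    r = rng.random() < p_r  # does this word depend on the viewpoint?
    # The word itself would then be drawn from a word distribution
    # indexed by (z if l else None, v if r else None); omitted here.
    return z, v, l, r

rng = np.random.default_rng(0)
print(sample_word_assignment([0.2, 0.8], [0.5, 0.5], 0.6, 0.7, rng))
```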
26 The number of topics and number of viewpoints are parameters that must be specified. [sent-89, score-0.595]
27 TAM naturally gives us a very rich output to use in a viewpoint summarization application. [sent-91, score-0.446]
28 If we are doing unsupervised viewpoint extraction, we can use the output of the model to compute P(v|sentence), which could be used to generate summaries that contain only excerpts that strongly highlight one viewpoint over another. [sent-92, score-0.88]
29 Furthermore, the variables r and ℓ tell us if a word is dependent on the viewpoint and topic, and we could use this information to focus on sentences that contain informative content words. [sent-94, score-0.333]
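As one hedged illustration of how these outputs could be combined, the sketch below aggregates word-level viewpoint posteriors into a P(v|sentence) score and keeps only strongly one-sided excerpts; the simple averaging and the threshold are our assumptions, not the paper's exact estimator.

```python
import numpy as np

def viewpoint_posterior(word_posteriors):
    """word_posteriors: one length-k array P(v|w, d) per word in the
    sentence. Returns an aggregate P(v|sentence) by averaging (a simple
    choice; a product of word probabilities would be another)."""
    return np.mean(np.stack(word_posteriors), axis=0)

def is_strong(sentence_posterior, threshold=0.8):
    """Keep only excerpts that strongly favor one viewpoint."""
    return sentence_posterior.max() >= threshold

p = viewpoint_posterior([np.array([0.9, 0.1]), np.array([0.7, 0.3])])
print(p, is_strong(p))  # [0.8 0.2] True
```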
30 Note that without supervision, TAM’s clustering is based only on co-occurrences and the patterns it captures may or may not correspond with the viewpoints we wish to extract. [sent-95, score-0.61]
31 Nonetheless, we show in this research that it can indeed find meaningful viewpoints with reasonable accuracy on certain data sets. [sent-96, score-0.591]
32 Although we do not explore this in this paper, additional information about the viewpoints could be added to TAM by defining priors on the distributions to further improve the accuracy of viewpoint discovery. [sent-97, score-0.924]
33 For example, “Israel attacked Palestine” and “Palestine attacked Israel” are identical excerpts in an exchangeable bag-of-words representation, yet one is more likely to come from the perspective of a Palestinian and the other from an Israeli. [sent-100, score-0.288]
34 We evaluate the utility of these features to the task of modeling viewpoints by measuring the accuracy of unsupervised clustering. [sent-102, score-0.612]
35 1 Words We have experimented with simple bag-of-words features as baseline approaches, both with and without removing stop words, and found that the accuracy of clustering by viewpoint is better when retaining all words. [sent-105, score-0.427]
36 , x|X|} with k viewpoints and generate two types of multi-view contrastive summaries: 1) A macro contrastive summary Smacro consists of k disjoint sets of excerpts, X1, X2, . [sent-148, score-1.468]
37 2) A micro contrastive summary Smicro consists of a set of excerpt pairs, each containing two excerpts from two different viewpoints, i. [sent-160, score-0.909]
38 Note that both macro and micro summaries can reveal contrast between different viewpoints, though at different granularity levels. [sent-167, score-0.545]
39 To generate macro and micro summaries based on the probabilistic assignment of excerpts to viewpoints given by TAM, we propose a novel extension to the LexRank algorithm (Erkan and Radev, 2004), a graph-based method for scoring representative excerpts to be used in a summary. [sent-168, score-1.599]
40 Our key idea is to modify the definition of the jumping probability in the random walk model so that it would favor excerpts that represent a viewpoint well and encourage jumping to an excerpt comparable with the current one but from a different viewpoint. [sent-169, score-0.848]
41 As a result, the stationary distribution of the random walk model would capture representative contrastive excerpts and allow us to generate both macro and micro contrastive summaries within a unified framework. [sent-170, score-1.465]
42 The jumping probability from node xi to node xj. [sent-177, score-0.252]
43 Our extension is mainly to modify this jumping probability in two ways so as to favor visiting contrastive representative opinions from multiple viewpoints. [sent-179, score-0.497]
44 The first modification is to make it favor jumping to a good representative excerpt x of any viewpoint v (i.e., with high probability P(v|xi)). [sent-180, score-0.559]
45 The second modification is to further favor jumping between two excerpts that can potentially form a good contrastive pair for use in generating a micro contrastive summary. [sent-185, score-0.975]
46 Specifically, under our model, the random walker first decides whether to jump to a sentence of the same viewpoint or to a sentence of a different viewpoint. [sent-186, score-0.357]
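A minimal sketch of this two-stage walk, assuming cosine similarities between excerpts and viewpoint posteriors P(v|x) as inputs; the mixing parameter lam (the chance of crossing to the other viewpoint, playing the role of the λ tuned in the experiments below) and the exact weighting scheme are our reading of the description, not the published formulation.

```python
import numpy as np

def comparative_lexrank(sim, p_v_given_x, lam=0.5, d=0.15, iters=100):
    """sim: (n, n) cosine-similarity matrix between excerpts.
    p_v_given_x: (n, k) viewpoint posteriors; assumes both viewpoints
    occur in the data. Returns the stationary distribution over nodes."""
    n = sim.shape[0]
    labels = p_v_given_x.argmax(axis=1)
    same = labels[:, None] == labels[None, :]
    # Favor targets that are similar to the current node and that
    # strongly represent their own viewpoint (high P(v|x)).
    w = sim * p_v_given_x.max(axis=1)[None, :] + 1e-12
    w_same = np.where(same, w, 0.0)
    w_diff = np.where(same, 0.0, w)
    # Two-stage jump: with prob. (1 - lam) stay within the viewpoint,
    # with prob. lam cross over, then pick a target proportional to w.
    P = (1 - lam) * w_same / w_same.sum(axis=1, keepdims=True) \
        + lam * w_diff / w_diff.sum(axis=1, keepdims=True)
    P = d / n + (1 - d) * P          # damping, as in standard LexRank
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):           # power iteration to stationarity
        pi = pi @ P
    return pi
```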
47 It is also possible to score pairs of excerpts that contrast each other. [sent-195, score-0.282]
48 We define the score for a pair (xi, xj) as the probability of being at xi and transitioning to xj or vice versa, where xi and xj are of opposite viewpoints. [sent-196, score-0.476]
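Written out in our own notation (π is the stationary distribution and p(· → ·) the transition probability; this is one natural reading of the description, not a verbatim equation from the paper):

$$\mathrm{score}(x_i, x_j) = \pi(x_i)\, p(x_i \to x_j) + \pi(x_j)\, p(x_j \to x_i), \qquad v(x_i) \neq v(x_j)$$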
49 2 Summary Generation The final summary should be a set of excerpts that have a high relevance score according to our scoring algorithm, but are not redundant with each other. [sent-198, score-0.412]
50 Macro contrastive summarization: A macro-level summary consists of independent summaries for each viewpoint, which we generate by first using the random walk stationary distribution across all of the data to rank the excerpts. [sent-201, score-0.79]
51 We then separate the top-ranked excerpts into two disjoint sets according to their viewpoint based on whichever gives a greater value of P(v|x), and finally remove redundancy and produce the summary according to our method described above. [sent-202, score-0.725]
52 We refer to this as macro contrastive summarization, because the summaries will contrast each other in that they have related content, but the excerpts in the summaries are not explicitly aligned with each other. [sent-203, score-1.167]
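A sketch of this macro-level procedure (rank by the stationary scores, split by whichever viewpoint has the larger P(v|x), then greedily filter near-duplicates); the cosine-overlap redundancy threshold is an assumed stand-in for the redundancy-removal step.

```python
def macro_summary(excerpts, pi, labels, sim, per_view=6, max_sim=0.5):
    """Rank all excerpts by stationary score pi, assign each to the
    viewpoint given by labels (argmax of P(v|x)), and greedily keep
    non-redundant top excerpts for each viewpoint."""
    order = sorted(range(len(excerpts)), key=lambda i: -pi[i])
    chosen = {v: [] for v in set(labels)}
    for i in order:
        v = labels[i]
        if len(chosen[v]) < per_view and all(sim[i][j] < max_sim for j in chosen[v]):
            chosen[v].append(i)
    return {v: [excerpts[i] for i in idx] for v, idx in chosen.items()}
```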
53 Micro contrastive summarization: A candidate excerpt for a micro-level summary will consist of a pair (xi, xj) with the pairwise relevance score defined in Equation 3. [sent-204, score-0.57]
54 It is possible that both xi and xj in a high-scoring pair may belong to the same viewpoint; such a case would be filtered out since we are mainly interested in including contrastive pairs in our summary. [sent-206, score-0.533]
55 We refer to this as micro contrastive summarization, because the summaries will allow us to see contrast at the level of individual excerpts from different viewpoints. [sent-207, score-0.937]
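And a corresponding sketch for the micro level: rank pairs by the pairwise relevance score, drop same-viewpoint pairs, and avoid reusing an excerpt across pairs (the reuse rule is our simplification of the redundancy removal).

```python
def micro_summary(pair_scores, labels, n_pairs=5):
    """pair_scores: dict mapping (i, j) -> pairwise relevance score."""
    used, chosen = set(), []
    for (i, j), s in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        if labels[i] == labels[j]:   # filter out same-viewpoint pairs
            continue
        if i in used or j in used:   # limit redundancy across pairs
            continue
        chosen.append((i, j))
        used.update((i, j))
        if len(chosen) == n_pairs:
            break
    return chosen
```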
56 Respondents indicate if they are ‘for’ or ‘against’ the bill, and there is a roughly even mix of the two viewpoints (45% for and 48% against). [sent-216, score-0.571]
57 Moreover, for the healthcare data set, manually extracted opinion polls are available on the Web, which we further leverage to construct gold standard summaries to evaluate our method quantitatively. [sent-223, score-0.426]
58 2 Stage One: Modeling Viewpoints The main research question we want to answer in modeling viewpoints is whether richer feature sets would lead to better accuracy than word features. [sent-230, score-0.612]
59 We will use this approach for the summarization task as well, as it ensures we are only summarizing documents where we have high confidence about their viewpoint membership. [sent-236, score-0.506]
60 We tell the model to use 2 viewpoints as well as 5 topics for the healthcare corpus and 8 topics for the Bitterlemons corpus. [sent-244, score-0.755]
61 (1) In all cases, the proposed linguistic features yield higher accuracy than the word features, supporting our hypothesis that for viewpoint modeling, applying TAM to these features improves performance over using simple word features. [sent-251, score-0.353]
62 , the Comparative LexRank algorithm), we mainly want to evaluate the quality of the generated contrastive multi-viewpoint summary and study the effectiveness of our extension to the standard LexRank. [sent-270, score-0.483]
63 Below we present extensive evaluation of our summarization method on the healthcare data. [sent-271, score-0.249]
64 In a way, this represents an expert human-generated summary of our database, and we will use this as a gold standard macro contrastive summary against which the representative- [sent-276, score-0.744]
65 ness of a multi-viewpoint contrastive summary can be evaluated. [sent-279, score-0.461]
66 We also want to develop a reference set for micro contrastive summaries, where we are mainly interested in evaluating contrastiveness. [sent-282, score-0.491]
67 To do this, we asked 3 annotators to identify contrastive pairs in the “main reasons” table described above. [sent-283, score-0.332]
68 We take the set of pairs that were identified as being contrastive by at least 2 annotators to be our gold set of contrastive pairs. [sent-285, score-0.639]
69 The highlighted cells show an example of a contrastive pair identified by our annotators. [sent-297, score-0.307]
70 The basic form of this algorithm is to select a set of sentences Sm to minimize the KL-divergence between the models of the summary Sm and the entire collection Xm for a viewpoint m. [sent-299, score-0.487]
71 ) Lerman and McDonald introduce an additional term to maximize the KL-divergence between the summary of one viewpoint and the collection of the opposite viewpoint, so that each viewpoint’s summary is dissimilar to the other viewpoints. [sent-304, score-0.693]
72 We borrow this idea but instead do the opposite so that the viewpoints’ summaries are more (rather than less) similar to each other. [sent-305, score-0.3]
73 This contrastive version of our model-based baseline is formulated as: $-\sum_{m_1=1}^{k} \mathrm{KL}\big(L(S_{m_1}) \,\|\, L(X_{m_1})\big) - \sum_{m_1=1}^{k} \sum_{m_2 \neq m_1} \mathrm{KL}\big(L(S_{m_1}) \,\|\, L(X_{m_2})\big)$ [sent-306, score-0.307]
74 Our summary generation algorithm is to iteratively add excerpts to the summary in a greedy fashion, selecting the excerpt with the highest score in each iteration. [sent-308, score-0.477]
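A sketch of this greedy loop for the basic (non-contrastive) form of the baseline; the unigram language model L(·), the crude smoothing, and the whitespace tokenization are simplifying assumptions, and the contrastive variant would add the cross-viewpoint KL terms from the formula above to the per-step objective.

```python
import math
from collections import Counter

def unigram_lm(sentences):
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl(p, q, eps=1e-9):
    """KL(p || q) over the union vocabulary, with small-constant smoothing."""
    vocab = set(p) | set(q)
    return sum(p.get(w, eps) * math.log(p.get(w, eps) / q.get(w, eps)) for w in vocab)

def greedy_kl_summary(candidates, collection, budget=6):
    """Greedily add the excerpt that keeps KL(L(summary) || L(collection)) lowest."""
    target = unigram_lm(collection)
    summary = []
    for _ in range(budget):
        remaining = [s for s in candidates if s not in summary]
        summary.append(min(remaining, key=lambda s: kl(unigram_lm(summary + [s]), target)))
    return summary
```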
75 Since the same excerpt may appear in multiple pairs, there would be significant redundancy in our reference summary if we were to include every pair. [sent-314, score-0.302]
76 Thus, we will restrict a contrastive reference summary to exclude overlapping pairs, and we will have many reference sets for all possible combinations of pairs. [sent-315, score-0.535]
77 Our reference summaries have a unique property in that the summaries have already been annotated with the prominence of the different reasons in the data. [sent-317, score-0.583]
78 For evaluating the macro-level summaries, we will score the summaries for the two viewpoints separately, given a reference set Refi and a candidate summary Ci for a viewpoint v = i. [sent-321, score-1.365]
79 It would also be interesting to measure how well a viewpoint’s summary matches the gold summary of the opposite viewpoint, which will give insights into how well the Comparative LexRank algorithm makes the two summaries similar to each other. [sent-327, score-0.608]
80 Finally, to score the micro-level comparative summaries (recall that this gives explicitly-aligned pairs of excerpts), we will concatenate each pair (xi, xj) as a single excerpt, and use these as the excerpts in our reference and candidate summaries. [sent-333, score-0.597]
81 Note that we have multiple reference summaries for the micro-level evaluation. [sent-335, score-0.285]
82 In this case, the ROUGE score is defined as the maximum score among all possible reference summaries (Lin, 2004). [sent-350, score-0.329]
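A sketch of that max-over-references scoring, with a toy unigram-recall stand-in for ROUGE-1; a real evaluation would use the full ROUGE toolkit rather than this simplification.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Toy ROUGE-1 recall: clipped unigram overlap / reference length."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(n, c[w]) for w, n in r.items())
    return overlap / max(1, sum(r.values()))

def max_over_references(candidate, reference_sets):
    """Score against each non-overlapping reference combination and,
    following Lin (2004), report the maximum."""
    return max(rouge1_recall(candidate, " ".join(ref)) for ref in reference_sets)
```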
83 4 Evaluation Results In order to evaluate our Comparative LexRank algorithm by itself, in this subsection we will not use the output of TAM as part of our summarization input, and will assign excerpts fixed values of P(v|x) = 1for the correct label and 0 otherwise. [sent-354, score-0.353]
84 In general, increasing λ increases Srep, which suggests that tuning λ behaves as expected; high- and mid-range λ values indeed produce output where the summaries of the two viewpoints are more similar to each other. [sent-367, score-1.067]
85 Similarly, mid-range λ values produce substantially higher values of Sp-1, the unigram ROUGE scores for the micro contrastive summary, although there is not a large difference between the bigram scores. [sent-368, score-0.454]
86 As for our model-based baseline, we show results for both the basic algorithm (denoted MB) and the contrastive modification (denoted MC). [sent-370, score-0.307]
87 We see that the contrastive modification behaves as expected and produces much higher scores for Sopp; however, this method does not outperform our LexRank algorithm. [sent-371, score-0.307]
88 4 Unsupervised Summarization So far we have focused on evaluating our viewpoint clustering models and our multi-view summarization algorithms separately. [sent-375, score-0.485]
89 If humans can correctly identify the viewpoints, then this would suggest both that TAM accurately clustered documents by viewpoint and that the summarization algorithm is selecting sentences that coherently represent the viewpoints. [sent-382, score-0.474]
Table 4: An example of our micro-level contrastive summarization output on the healthcare data, using δ = 0. [sent-396, score-0.556]
We generated macro contrastive summaries of our data for the two viewpoints with 6 sentences per viewpoint. [sent-399, score-1.255]
92 Such contrastive pairs are perhaps the most difficult data points to model. [sent-409, score-0.332]
93 A good test of a viewpoint model may be whether it can capture the nuanced properties of the viewpoints needed to contrast them at the micro level. [sent-410, score-1.072]
94 ” For example, someone in favor of the healthcare bill might focus on the benefits and someone against the bill might focus on the cost. [sent-412, score-0.34]
95 However, our approach is different in that our contrastive objective encourages the summaries to include each point as addressed by all viewpoints, rather than each viewpoint selectively emphasizing only certain points. [sent-413, score-0.888]
96 For example, someone in favor of healthcare reform might cite the high cost of the current system, but someone against this might counter-argue that the proposed system in the new bill has its own high costs (as seen in the last row of Table 4). [sent-415, score-0.308]
97 In conclusion, we have presented steps toward a two-stage system that can automatically extract and summarize viewpoints in opinionated text. [sent-419, score-0.683]
98 First, we have shown that the accuracy of clustering documents by viewpoint can be enhanced by using simple but rich dependency features. [sent-420, score-0.42]
99 Second, we have introduced Comparative LexRank, an extension of the LexRank algorithm that aims to generate contrastive summaries both at the macro and micro level. [sent-422, score-0.853]
100 The algorithm presented is general enough that it can be applied to any number of viewpoints, and can accommodate input where the viewpoints are either given fixed labels or given probabilistic assignments. [sent-423, score-0.571]
wordName wordTfidf (topN-words)
[('viewpoints', 0.571), ('viewpoint', 0.333), ('contrastive', 0.307), ('summaries', 0.248), ('excerpts', 0.214), ('lexrank', 0.19), ('tam', 0.173), ('summary', 0.154), ('micro', 0.147), ('healthcare', 0.136), ('macro', 0.129), ('xj', 0.124), ('summarization', 0.113), ('rel', 0.11), ('opinionated', 0.092), ('bitterlemons', 0.092), ('excerpt', 0.087), ('xi', 0.077), ('polarity', 0.069), ('favor', 0.056), ('walk', 0.056), ('opposite', 0.052), ('jumping', 0.051), ('comparative', 0.051), ('topic', 0.05), ('contrastiveness', 0.048), ('sopp', 0.048), ('srep', 0.048), ('rouge', 0.045), ('someone', 0.042), ('marneffe', 0.042), ('opinion', 0.042), ('lerman', 0.041), ('clustering', 0.039), ('reference', 0.037), ('girju', 0.037), ('erkan', 0.036), ('ev', 0.036), ('futhermore', 0.036), ('refi', 0.036), ('ros', 0.036), ('paul', 0.034), ('negation', 0.034), ('political', 0.034), ('bag', 0.034), ('representative', 0.032), ('tuples', 0.032), ('summarizing', 0.032), ('bill', 0.032), ('hu', 0.031), ('rewrite', 0.03), ('opinions', 0.029), ('documents', 0.028), ('contradiction', 0.028), ('amod', 0.028), ('prominence', 0.028), ('responses', 0.027), ('subsection', 0.026), ('stanford', 0.026), ('greene', 0.025), ('stationary', 0.025), ('minqing', 0.025), ('pairs', 0.025), ('tuple', 0.025), ('il', 0.025), ('topics', 0.024), ('redundancy', 0.024), ('fulltuple', 0.024), ('gallup', 0.024), ('refj', 0.024), ('rrel', 0.024), ('smacro', 0.024), ('smicro', 0.024), ('unmodified', 0.024), ('verbatim', 0.024), ('policy', 0.024), ('palestinian', 0.024), ('walker', 0.024), ('urbana', 0.024), ('sim', 0.024), ('wi', 0.024), ('xk', 0.023), ('sentiment', 0.023), ('perspectives', 0.022), ('sm', 0.022), ('reasons', 0.022), ('score', 0.022), ('scoring', 0.022), ('extension', 0.022), ('stop', 0.021), ('sound', 0.021), ('modeling', 0.021), ('contrast', 0.021), ('vanderwende', 0.02), ('carbonell', 0.02), ('attacked', 0.02), ('illinois', 0.02), ('mixtures', 0.02), ('summarize', 0.02), ('accuracy', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text
Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju
Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.
2 0.2438195 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning
Author: Ahmet Aker ; Trevor Cohn ; Robert Gaizauskas
Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality of the best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.
3 0.096444346 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.
4 0.076791696 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective
Author: Amr Ahmed ; Eric Xing
Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical level. In this paper we address the problem of modeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using Collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hasting inference algorithm for a semi-supervised extension with decent results.
5 0.075001843 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid
Author: Xin Zhao ; Jing Jiang ; Hongfei Yan ; Xiaoming Li
Abstract: Discovering and summarizing opinions from online reviews is an important and challenging task. A commonly-adopted framework generates structured review summaries with aspects and opinions. Recently topic models have been used to identify meaningful review aspects, but existing topic models do not identify aspect-specific opinion words. In this paper, we propose a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. We show that with a relatively small amount of training data, our model can effectively identify aspect and opinion words simultaneously. We also demonstrate the domain adaptability of our model.
6 0.074065059 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
7 0.070003428 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
8 0.060117166 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
9 0.056118254 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields
10 0.051404346 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
11 0.043675877 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails
12 0.041914828 84 emnlp-2010-NLP on Spoken Documents Without ASR
13 0.039398737 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars
14 0.038562723 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation
15 0.037549358 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition
16 0.036807247 39 emnlp-2010-EMNLP 044
17 0.036749639 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification
18 0.035779923 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?
19 0.035282977 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
20 0.034531638 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging
topicId topicWeight
[(0, 0.15), (1, 0.1), (2, -0.129), (3, -0.061), (4, 0.116), (5, 0.025), (6, 0.063), (7, -0.037), (8, 0.033), (9, -0.076), (10, -0.076), (11, 0.013), (12, 0.081), (13, -0.088), (14, 0.0), (15, 0.014), (16, 0.231), (17, -0.334), (18, -0.212), (19, 0.217), (20, 0.126), (21, 0.221), (22, -0.045), (23, 0.068), (24, 0.129), (25, 0.033), (26, 0.014), (27, 0.007), (28, -0.072), (29, -0.065), (30, 0.005), (31, 0.051), (32, -0.086), (33, -0.013), (34, 0.038), (35, -0.093), (36, -0.025), (37, -0.125), (38, -0.119), (39, -0.008), (40, -0.031), (41, -0.082), (42, -0.04), (43, -0.041), (44, -0.103), (45, 0.122), (46, -0.091), (47, 0.046), (48, 0.096), (49, -0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.94587147 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text
Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju
Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.
2 0.86397755 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning
Author: Ahmet Aker ; Trevor Cohn ; Robert Gaizauskas
Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality of the best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.
3 0.40053359 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata
Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.
4 0.26798952 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.
5 0.22928444 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie
Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.
7 0.19975354 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
8 0.19973081 84 emnlp-2010-NLP on Spoken Documents Without ASR
9 0.19243497 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid
10 0.18413563 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition
11 0.17883338 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
12 0.16494755 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails
13 0.14873165 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields
14 0.14674574 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars
15 0.14170125 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL
16 0.13989092 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics
17 0.13874573 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars
18 0.13456841 66 emnlp-2010-Inducing Word Senses to Improve Web Search Result Clustering
19 0.13076822 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation
20 0.1285402 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
topicId topicWeight
[(3, 0.013), (10, 0.018), (12, 0.024), (29, 0.078), (30, 0.039), (52, 0.024), (56, 0.508), (62, 0.018), (66, 0.091), (72, 0.034), (76, 0.016), (79, 0.011), (82, 0.015), (87, 0.016)]
simIndex simValue paperId paperTitle
1 0.96054155 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter
Author: Dmitriy Genzel ; Jakob Uszkoreit ; Franz Och
Abstract: As a prerequisite to translation of poetry, we implement the ability to produce translations with meter and rhyme for phrase-based MT, examine whether the hypothesis space of such a system is flexible enough to accommodate such constraints, and investigate the impact of such constraints on translation quality.
2 0.95003164 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie
Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.
3 0.93404001 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.
same-paper 4 0.8914274 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text
Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju
Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.
5 0.67987609 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning
Author: Ahmet Aker ; Trevor Cohn ; Robert Gaizauskas
Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality of the best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.
6 0.66126877 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
7 0.65601075 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation
8 0.64445323 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
9 0.61763406 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
10 0.61113787 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields
11 0.60577667 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective
12 0.5964551 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text
13 0.59239781 80 emnlp-2010-Modeling Organization in Student Essays
14 0.57689416 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
15 0.57664919 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields
16 0.56002647 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
17 0.55439907 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
18 0.55315894 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails
19 0.5515458 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task
20 0.54771936 94 emnlp-2010-SCFG Decoding Without Binarization