acl acl2010 acl2010-188 knowledge-graph by maker-knowledge-mining

188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization


Source: pdf

Author: Hitoshi Nishikawa ; Takaaki Hasegawa ; Yoshihiro Matsuo ; Genichiro Kikui

Abstract: We propose a novel algorithm for sentiment summarization that takes account of informativeness and readability, simultaneously. Our algorithm generates a summary by selecting and ordering sentences taken from multiple review texts according to two scores that represent the informativeness and readability of the sentence order. The informativeness score is defined by the number of sentiment expressions and the readability score is learned from the target corpus. We evaluate our method by summarizing reviews on restaurants. Our method outperforms an existing algorithm as indicated by its ROUGE score and human readability experiments.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We propose a novel algorithm for sentiment summarization that takes account of informativeness and readability, simultaneously. [sent-8, score-0.831]

2 Our algorithm generates a summary by selecting and ordering sentences taken from multiple review texts according to two scores that represent the informativeness and readability of the sentence order. [sent-9, score-1.17]

3 The informativeness score is defined by the number of sentiment expressions and the readability score is learned from the target corpus. [sent-10, score-1.556]

4 We evaluate our method by summarizing reviews on restaurants. [sent-11, score-0.153]

5 Our method outperforms an existing algorithm as indicated by its ROUGE score and human readability experiments. [sent-12, score-0.67]

6 1 Introduction The Web holds a massive number of reviews describing the sentiments of customers about products and services. [sent-13, score-0.393]

7 These reviews can help the user reach purchasing decisions and guide companies’ business activities such as product improvements. [sent-14, score-0.126]

8 It is, however, almost impossible to read all reviews given their sheer number. [sent-15, score-0.174]

9 These reviews are best utilized by the development of automatic text summarization, particularly sentiment summarization. [sent-16, score-0.488]

10 Sentiment summarizers are divided into two categories in terms of output style. [sent-18, score-0.06]

11 One outputs lists of sentences (Hu and Liu, 2004; Blair-Goldensohn et al. [sent-19, score-0.084]

12 , 2008; Titov and McDonald, 2008), the other outputs texts consisting of ordered sentences (Carenini et al. [sent-20, score-0.146]

13 Our work lies in the latter category, and a typical summary is shown in Figure 1. [sent-23, score-0.057]

14 One crucial weakness of existing text-oriented summarizers is the poor readability of their results. [sent-28, score-0.569]

15 Good readability is essential because readability strongly affects text comprehension (Barzilay et al. [sent-29, score-1.018]

16 To achieve readable summaries, the extracted sentences must be appropriately ordered (Barzilay et al. [sent-31, score-0.173]

17 (2002) proposed an algorithm for ordering sentences according to the dates of the publications from which the sentences were extracted. [sent-34, score-0.302]

18 Lapata (2003) proposed an algorithm that computes the probability of two sentences being adjacent for ordering sentences. [sent-35, score-0.233]

19 Both methods delink sentence extraction from sentence ordering, so a sentence can be extracted that cannot be ordered naturally with the other extracted sentences. [sent-36, score-0.236]

20 To solve this problem, we propose an algorithm that chooses sentences and orders them simultaneously in such a way that the ordered sentences maximize the scores of informativeness and readability. [sent-37, score-0.626]

21 Our algorithm efficiently searches for the best sequence of sentences by using dynamic programming and beam search. [sent-38, score-0.287]

22 We verify that our method generates summaries that are significantly better than the baseline results in terms of ROUGE score (Lin, 2004) and subjective readability measures. [sent-39, score-0.842]

23 c 2010 Association for Computational Linguistics, Conference Short Papers, pages 325–330 … simultaneously achieve both informativeness and readability in the area of multi-document summarization. [sent-42, score-0.875]

24 This paper is organized as follows: Section 2 describes our summarization method. [sent-43, score-0.103]

25 2 Optimizing Sentence Sequence Formally, we define a summary S∗ = ⟨s0, s1, . [sent-46, score-0.057]

26 , sn, sn+1⟩ as a sequence consisting of n sentences where s0 and sn+1 are symbols indicating the beginning and ending of the sequence, respectively. [sent-49, score-0.155]

27 We introduce the informativeness score and the readability score, then describe how to optimize a sequence. [sent-53, score-1.007]

28 1 Informativeness Score Since we attempt to summarize reviews, we assume that a good summary must involve as many sentiments as possible. [sent-55, score-0.292]

29 With regard to restaurants, aspects include food, atmosphere and staff. [sent-58, score-0.132]

30 Polarity represents whether the sentiment is positive or negative. [sent-59, score-0.362]

31 Notice that Equation 2 defines the informativeness score of a summary as the sum of the score − of the sentiments contained in S. [sent-61, score-0.888]

32 To avoid duplicative sentences, each sentiment is counted only once for scoring. [sent-62, score-0.362]
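The scoring rule described in the two sentences above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the upstream sentiment extractor is assumed, and each unique sentiment gets a uniform weight of 1.

```python
# Minimal sketch of the informativeness score: a summary's score is the
# sum over the *unique* sentiments it contains, so a duplicated
# (aspect, polarity) pair contributes nothing extra. Extracting the
# sentiments from text is assumed to have happened upstream.

def informativeness(sentiments):
    """sentiments: iterable of (aspect, polarity) pairs gathered from
    every sentence in the candidate summary."""
    unique = set(sentiments)      # count each sentiment only once
    return float(len(unique))     # uniform weight of 1 per unique sentiment

# The example from the text: "delicious foods and a relaxing atmosphere"
# yields <foods, 1> and <atmosphere, 1>; a repeated pair is ignored.
pairs = [("foods", 1), ("atmosphere", 1), ("foods", 1)]
print(informativeness(pairs))  # -> 2.0
```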

33 In addition, the aspects are clustered and similar aspects (e. [sent-63, score-0.068]

34 Sentiments are extracted using a sentiment lexicon and pattern matching over the dependency trees of sentences. [sent-69, score-0.389]

35 The sentiment lexicon1 consists of pairs of sentiment expressions and their polarities; for example, delicious, friendly and good are positive sentiment expressions, while bad and expensive are negative sentiment expressions. [sent-70, score-1.503]

36 To extract sentiments from given sentences, first, we identify sentiment expressions among words consisting of parsed sentences. [sent-71, score-0.618]

37 For example, in the case of the sentence “This restaurant offers customers delicious foods and a relaxing atmosphere. [sent-72, score-0.362]

38 ” in Figure 1, delicious and relaxing are identified as sentiment expressions. [sent-73, score-0.513]

39 If the sentiment expressions are identified, the expressions and their aspects are extracted as aspect-sentiment expression pairs from the dependency tree using some rules. [sent-74, score-0.895]

40 In the case of the example sentence, foods and delicious, atmosphere and relaxing are extracted as aspect-sentiment expression pairs. [sent-75, score-0.262]

41 Finally, the extracted sentiment expressions are converted to polarities, and we acquire the set of sentiments from the sentences, for example, ⟨foods, 1⟩ and ⟨atmosphere, 1⟩. [sent-76, score-0.645]

42 Note that since our method relies on only a sentiment lexicon, extractable aspects are unlimited. [sent-77, score-0.396]

43 Since it is difficult to model all of them, we approximate readability as the natural order of sentences. [sent-80, score-0.509]

44 (2002) used the publication dates of documents to catch temporally-ordered events, but this approach is not really suitable for our goal because reviews focus on entities rather than events. [sent-82, score-0.209]

45 Lapata (2003) employed the probability of two sentences being adjacent as determined from a corpus. [sent-83, score-0.149]

46 If the cor- pus consists of reviews, it is expected that this approach would be effective for sentiment summarization. [sent-84, score-0.362]

47 We define the 1Since we aim to summarize Japanese reviews, we utilize Japanese sentiment lexicon (Asano et al. [sent-86, score-0.396]

48 However, our method is, except for sentiment extraction, language independent. [sent-88, score-0.362]

49 That is, the readability score of sentence sequence S is the sum of the connectivity of all adjacent sentences in the sequence. [sent-90, score-0.907]

50 As the features, Lapata (2003) proposed the Cartesian product of content words in adjacent sentences. [sent-91, score-0.065]

51 We observe that the first sentence of a review of a restaurant frequently contains named entities indicating location. [sent-95, score-0.113]

52 , sn, sn+1⟩ as follows: Φ(S) = Σ_{i=0}^{n} φ(si, si+1) (4) Therefore, the score of sequence S is w⊤Φ(S). [sent-100, score-0.171]
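Equation 4 can be sketched as below. The word-pair feature function here is a simplified stand-in for the Cartesian-product features of Lapata (2003) that the paper builds on, not the authors' exact feature set:

```python
from collections import Counter

# Sketch of Equation 4: the feature vector of a sequence S is the sum of
# pairwise features phi(s_i, s_{i+1}) over adjacent sentences, and the
# readability score is its dot product with the learned weight vector w.

def phi(sent_a, sent_b):
    """Features of one adjacent sentence pair: co-occurring word pairs
    (a simplified stand-in for the Cartesian-product features)."""
    return Counter((wa, wb) for wa in sent_a for wb in sent_b)

def readability_score(sequence, w):
    """sequence: tokenized sentences, including boundary symbols s0/sn+1.
    w: dict mapping a feature to its learned weight."""
    score = 0.0
    for a, b in zip(sequence, sequence[1:]):
        for feat, count in phi(a, b).items():
            score += w.get(feat, 0.0) * count
    return score
```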

53 Given a training set, if a trained parameter w assigns a score w⊤Φ(S+) to a correct order S+ that is higher than the score w⊤Φ(S−) of an incorrect order S−, it is expected that the trained parameter will give a higher score to naturally ordered sentences than to unnaturally ordered sentences. [sent-101, score-0.658]

54 Averaged Perceptron requires an argmax operation for parameter estimation. [sent-103, score-0.119]
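The training loop can be sketched as follows. This is a sketch, not the paper's code: `decode` stands for the argmax operation (in the paper, Held-Karp dynamic programming with beam search), and the averaging follows the standard averaged-perceptron recipe.

```python
# Sketch of averaged-perceptron training for the ordering model: when the
# decoded order differs from the reference order, the weights move toward
# the reference features and away from the predicted ones; the returned
# weights are averaged over all updates.

def train_averaged_perceptron(data, feats, decode, epochs=5):
    """data: list of (sentence_set, reference_order) pairs.
    feats(order) -> {feature: count}; decode(w, sentence_set) -> best order."""
    w, w_sum, steps = {}, {}, 0
    for _ in range(epochs):
        for sentences, ref in data:
            pred = decode(w, sentences)
            if tuple(pred) != tuple(ref):
                for f, v in feats(ref).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in feats(pred).items():
                    w[f] = w.get(f, 0.0) - v
            for f, v in w.items():          # accumulate for averaging
                w_sum[f] = w_sum.get(f, 0.0) + v
            steps += 1
    return {f: v / steps for f, v in w_sum.items()}
```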

55 Since we attempt to order a set of sentences, the operation is regarded as solving the Traveling Salesman Problem; that is, we locate the path that offers the maximum score through all n sentences, with s0 and sn+1 as the starting and ending points, respectively. [sent-104, score-0.292]

56 Thus the operation is NP-hard and it is difficult to find the global optimal solution. [sent-105, score-0.044]

57 To alleviate this, we find an approximate solution by adopting the dynamic programming technique of the Held and Karp Algorithm (Held and Karp, 1962) and beam search. [sent-106, score-0.139]

58 S indicates intended sentences and M is a distance matrix of the readability scores of adjacent sentence pairs. [sent-108, score-0.762]

59 Hi(C, j) indicates the score of the hypothesis that has covered the set of sentences C and has sentence j at the end of the path (Figure 2: Held and Karp Algorithm). [sent-109, score-0.256]

60 For example, H2({s0, s2, s5}, s2) indicates a hypothesis that covers s0, s2, s5 and whose last sentence is s2. [sent-113, score-0.124]

61 Initially, H0({s0}, s0) is assigned a score of 0, and new sentences are then added one by one. [sent-114, score-0.132]

62 In the search procedure, our dynamic programming based algorithm retains just the hypothesis with maximum score among the hypotheses that have the same sentences and the same last sentence. [sent-115, score-0.396]
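The search procedure above can be sketched compactly: hypotheses are keyed by (covered set, last sentence), only the best score per key is retained, and each layer is pruned to a beam. The matrix M and the beam width here are illustrative assumptions, not the paper's settings.

```python
# Sketch of the Held-Karp dynamic program with beam search: a hypothesis
# H(C, j) covers the sentence set C and ends at sentence j. Among
# hypotheses sharing the same (C, j), only the highest-scoring one is
# retained, and each layer is pruned to the beam width. M[i][j] is the
# readability (connectivity) score of placing sentence j right after i;
# sentences 0 and n+1 are the beginning/ending symbols.

def order_sentences(n, M, beam_width=8):
    """Return (score, path) for the best ordering of sentences 1..n."""
    start, end = 0, n + 1
    hyps = {(frozenset([start]), start): (0.0, [start])}
    for _ in range(n):                       # add one sentence per layer
        new = {}
        for (covered, last), (score, path) in hyps.items():
            for j in range(1, n + 1):
                if j in covered:
                    continue
                key = (covered | {j}, j)
                cand = (score + M[last][j], path + [j])
                if key not in new or cand[0] > new[key][0]:
                    new[key] = cand          # keep only the best per (C, j)
        hyps = dict(sorted(new.items(), key=lambda kv: -kv[1][0])[:beam_width])
    score, path = max(hyps.values(), key=lambda sp: sp[0] + M[sp[1][-1]][end])
    return score + M[path[-1]][end], path + [end]
```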

63 3 Optimization The argmax operation in Equation 1 also involves search, which is NP-hard as described in Section 2. [sent-120, score-0.092]

64 Therefore, we adopt the Held and Karp Algorithm and beam search to find approximate solutions. [sent-122, score-0.052]

65 The search algorithm is basically the same as in parameter estimation, except for the calculation of the informativeness score and the size limitation. [sent-123, score-0.525]

66 Therefore, when a new sentence is added to a hypothesis, both the informativeness and the readability scores are calculated. [sent-124, score-0.945]

67 The size of the hypothesis is also calculated and if the size exceeds the limit, the sentence can’t be added. [sent-125, score-0.09]

68 A hypothesis that can’t accept any more sentences is removed from the search procedure and preserved in memory. [sent-126, score-0.134]

69 After all hypotheses are removed, the best hypothesis is chosen from among the preserved hypotheses as the solution. [sent-127, score-0.136]

70 3 Experiments This section evaluates our method in terms of ROUGE score and readability. [sent-128, score-0.132]

71 We collected 2,940 reviews of 100 restaurants from a website. [sent-129, score-0.161]

72 We attempted to generate 300 byte summaries, so the summarization rate was about 6%. [sent-147, score-0.103]

73 , 2006) for sentiment extraction and constructing feature vectors for readability score, respectively. [sent-150, score-0.871]

74 We prepared four reference summaries for each document set. [sent-154, score-0.201]

75 To evaluate the effects of the informativeness score, the readability score and the optimization, we compared the following five methods. [sent-155, score-1.007]

76 We designed the score of a sentence as term frequencies of the content words in a document set. [sent-157, score-0.2]

77 Method1: uses optimization without the informativeness score or readability score. [sent-158, score-1.035]

78 Method2: uses the informativeness score and optimization without the readability score. [sent-160, score-1.035]

79 Following Equation 1, the summarizer searches for a sequence with a high informativeness and readability score. [sent-162, score-0.968]

80 The parameter vector w was trained on the same 2,940 reviews in 5-fold cross validation fashion. [sent-163, score-0.153]

81 To compare our summarizer to human summarization, we calculated ROUGE scores between each reference and the other references, and averaged them. [sent-166, score-0.168]

82 We discuss the contribution of readability to ROUGE scores. [sent-176, score-0.509]

83 It is interesting that the readability criterion also improved ROUGE scores. [sent-178, score-0.509]

84 We extracted sentiments from the summaries using the above sentiment extractor, and averaged the unique sentiment numbers. [sent-180, score-1.166]

85 The references (Human) have fewer sentiments than the summaries generated by our method. [sent-182, score-0.374]

86 In other words, the references included almost as many other sentences (e. [sent-183, score-0.084]

87 Including them in summaries would greatly improve summarizer appeal. [sent-188, score-0.241]

88 Three different summarizers generated summaries for each document set. [sent-191, score-0.261]

89 Before the evaluation, the judges read the evaluation criteria and gave points to summaries using a five-point scale. [sent-193, score-0.267]

90 The judges weren’t informed of which method generated which summary. [sent-194, score-0.046]

91 One important factor behind the higher readability of Method3 is that it yields longer sentences on average (6. [sent-204, score-0.593]

92 That is, Method2 and Method2+ tended to select short sentences, which made their summaries less readable. [sent-210, score-0.173]

93 4 Conclusion This paper proposed a novel algorithm for sentiment summarization that takes account of informativeness and readability, simultaneously. [sent-211, score-0.831]

94 To summarize reviews, the informativeness score is based on sentiments and the readability score is learned from a corpus of reviews. [sent-212, score-1.374]

95 The preferred sequence is determined by using dynamic programming and beam search. [sent-213, score-0.178]

96 Experiments showed that our method generated better summaries than the baseline in terms of ROUGE score and readability. [sent-214, score-0.305]

97 One future work is to include important information other than sentiments in the summaries. [sent-215, score-0.201]

98 We also plan to model the order of sentences globally. [sent-216, score-0.084]

99 Although the ordering model in this paper is local since it looks at only adjacent sentences, a model that can evaluate global order is important for better summaries. [sent-217, score-0.149]

100 A dynamic programming approach to sequencing problems. [sent-271, score-0.112]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('readability', 0.509), ('informativeness', 0.366), ('sentiment', 0.362), ('rouge', 0.215), ('sentiments', 0.201), ('summaries', 0.173), ('score', 0.132), ('reviews', 0.126), ('barzilay', 0.119), ('carenini', 0.113), ('summarization', 0.103), ('karp', 0.098), ('lerman', 0.098), ('atmosphere', 0.098), ('delicious', 0.098), ('foods', 0.084), ('sentences', 0.084), ('ordering', 0.084), ('si', 0.079), ('sn', 0.077), ('summarizer', 0.068), ('adjacent', 0.065), ('held', 0.063), ('ordered', 0.062), ('lapata', 0.062), ('summarizers', 0.06), ('ntt', 0.06), ('summary', 0.057), ('info', 0.057), ('asano', 0.056), ('genichiro', 0.056), ('yoshihiro', 0.056), ('expressions', 0.055), ('relaxing', 0.053), ('ryan', 0.052), ('beam', 0.052), ('hypothesis', 0.05), ('dates', 0.05), ('kikui', 0.049), ('read', 0.048), ('argmax', 0.048), ('restaurant', 0.047), ('judges', 0.046), ('imamura', 0.045), ('operation', 0.044), ('dynamic', 0.044), ('japanese', 0.043), ('hypotheses', 0.043), ('programming', 0.043), ('sasha', 0.042), ('mmr', 0.042), ('averaged', 0.041), ('sentence', 0.04), ('polarities', 0.04), ('customers', 0.04), ('regina', 0.04), ('perceptron', 0.04), ('sequence', 0.039), ('carbonell', 0.038), ('connectivity', 0.038), ('giuseppe', 0.038), ('aspect', 0.037), ('evaluative', 0.037), ('titov', 0.037), ('suzuki', 0.035), ('wilcoxon', 0.035), ('restaurants', 0.035), ('indicates', 0.034), ('mcdonald', 0.034), ('summarize', 0.034), ('aspects', 0.034), ('publication', 0.033), ('ending', 0.032), ('scores', 0.03), ('companion', 0.029), ('human', 0.029), ('optimization', 0.028), ('document', 0.028), ('subjective', 0.028), ('extracted', 0.027), ('parameter', 0.027), ('summarizing', 0.027), ('named', 0.026), ('hu', 0.026), ('polarity', 0.026), ('products', 0.026), ('searches', 0.025), ('coherence', 0.025), ('nlpix', 0.025), ('reis', 0.025), ('simplifications', 0.025), ('jade', 0.025), ('thirty', 0.025), ('cyber', 0.025), ('sincerely', 0.025), ('charts', 0.025), ('weren', 0.025), ('sequencing', 
0.025), ('aets', 0.025), ('grasp', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

Author: Hitoshi Nishikawa ; Takaaki Hasegawa ; Yoshihiro Matsuo ; Genichiro Kikui

Abstract: We propose a novel algorithm for sentiment summarization that takes account of informativeness and readability, simultaneously. Our algorithm generates a summary by selecting and ordering sentences taken from multiple review texts according to two scores that represent the informativeness and readability of the sentence order. The informativeness score is defined by the number of sentiment expressions and the readability score is learned from the target corpus. We evaluate our method by summarizing reviews on restaurants. Our method outperforms an existing algorithm as indicated by its ROUGE score and human readability experiments.

2 0.24680398 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

Author: Wei Wei ; Jon Atle Gulla

Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a humanlabeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HLSOT approach is easily generalized to labeling a mix of reviews of more than one products.

3 0.23566449 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

Author: Xiaojun Wan ; Huiying Li ; Jianguo Xiao

Abstract: Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach. 1

4 0.23095372 210 acl-2010-Sentiment Translation through Lexicon Induction

Author: Christian Scheible

Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.

5 0.18789279 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons

Author: Valentin Jijkoun ; Maarten de Rijke ; Wouter Weerkamp

Abstract: We present a method for automatically generating focused and accurate topicspecific subjectivity lexicons from a general purpose polarity lexicon that allow users to pin-point subjective on-topic information in a set of relevant documents. We motivate the need for such lexicons in the field of media analysis, describe a bootstrapping method for generating a topic-specific lexicon from a general purpose polarity lexicon, and evaluate the quality of the generated lexicons both manually and using a TREC Blog track test set for opinionated blog post retrieval. Although the generated lexicons can be an order of magnitude more selective than the general purpose lexicon, they maintain, or even improve, the performance of an opin- ion retrieval system.

6 0.18276834 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns

7 0.17301978 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

8 0.15446056 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization

9 0.15261948 264 acl-2010-Wrapping up a Summary: From Representation to Generation

10 0.13679624 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis

11 0.13546246 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization

12 0.12867877 11 acl-2010-A New Approach to Improving Multilingual Summarization Using a Genetic Algorithm

13 0.12570563 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

14 0.12145557 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval

15 0.10904238 39 acl-2010-Automatic Generation of Story Highlights

16 0.10530096 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

17 0.10108707 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification

18 0.099781387 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

19 0.091869585 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

20 0.089410901 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.193), (1, 0.095), (2, -0.235), (3, 0.165), (4, -0.137), (5, 0.004), (6, -0.011), (7, -0.259), (8, -0.049), (9, 0.006), (10, 0.003), (11, -0.01), (12, -0.097), (13, 0.073), (14, -0.086), (15, -0.065), (16, 0.231), (17, -0.057), (18, 0.003), (19, 0.018), (20, 0.019), (21, 0.108), (22, -0.046), (23, -0.105), (24, 0.027), (25, 0.067), (26, 0.008), (27, -0.064), (28, -0.062), (29, -0.003), (30, 0.084), (31, 0.034), (32, -0.153), (33, 0.057), (34, -0.026), (35, -0.114), (36, 0.008), (37, 0.035), (38, -0.006), (39, 0.031), (40, 0.032), (41, 0.104), (42, -0.069), (43, 0.084), (44, 0.033), (45, 0.094), (46, 0.028), (47, 0.037), (48, -0.114), (49, -0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96265733 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

Author: Hitoshi Nishikawa ; Takaaki Hasegawa ; Yoshihiro Matsuo ; Genichiro Kikui

Abstract: We propose a novel algorithm for sentiment summarization that takes account of informativeness and readability, simultaneously. Our algorithm generates a summary by selecting and ordering sentences taken from multiple review texts according to two scores that represent the informativeness and readability of the sentence order. The informativeness score is defined by the number of sentiment expressions and the readability score is learned from the target corpus. We evaluate our method by summarizing reviews on restaurants. Our method outperforms an existing algorithm as indicated by its ROUGE score and human readability experiments.

2 0.76670891 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

Author: Ainur Yessenalina ; Yejin Choi ; Claire Cardie

Abstract: One ofthe central challenges in sentimentbased text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007). We explore methods to automatically generate annotator rationales for document-level sentiment classification. Rather unexpectedly, we find the automatically generated rationales just as helpful as human rationales.

3 0.74195445 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

Author: Wei Wei ; Jon Atle Gulla

Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a humanlabeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HLSOT approach is easily generalized to labeling a mix of reviews of more than one products.

4 0.67397285 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis

Author: Georgios Paltoglou ; Mike Thelwall

Abstract: Most sentiment analysis approaches use as baseline a support vector machines (SVM) classifier with binary unigram weights. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classification accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing. The techniques are tested on a wide selection of data sets and produce the best accuracy to our knowledge.

5 0.66490424 210 acl-2010-Sentiment Translation through Lexicon Induction

Author: Christian Scheible

Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.

6 0.54951864 264 acl-2010-Wrapping up a Summary: From Representation to Generation

7 0.52819991 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification

8 0.52267224 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization

9 0.51692235 11 acl-2010-A New Approach to Improving Multilingual Summarization Using a Genetic Algorithm

10 0.51208419 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

11 0.51192844 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

12 0.50646782 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns

13 0.50436991 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

14 0.46526518 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs

15 0.45776525 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization

16 0.44861278 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons

17 0.43820286 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

18 0.42944351 39 acl-2010-Automatic Generation of Story Highlights

19 0.39743567 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

20 0.37939519 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.207), (14, 0.019), (25, 0.044), (33, 0.012), (39, 0.022), (42, 0.066), (44, 0.019), (59, 0.059), (72, 0.035), (73, 0.051), (76, 0.019), (78, 0.04), (83, 0.12), (84, 0.023), (98, 0.175)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86894119 197 acl-2010-Practical Very Large Scale CRFs

Author: Thomas Lavergne ; Olivier Cappe ; Francois Yvon

Abstract: Conditional Random Fields (CRFs) are a widely-used approach for supervised sequence labelling, notably due to their ability to handle large description spaces and to integrate structural dependency between labels. Even for the simple linearchain model, taking structure into account implies a number of parameters and a computational effort that grows quadratically with the cardinality of the label set. In this paper, we address the issue of training very large CRFs, containing up to hun- dreds output labels and several billion features. Efficiency stems here from the sparsity induced by the use of a ‘1 penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy. Our experiments demonstrate that very large CRFs can be trained efficiently and that very large models are able to improve the accuracy, while delivering compact parameter sets.

same-paper 2 0.85027754 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

Author: Hitoshi Nishikawa ; Takaaki Hasegawa ; Yoshihiro Matsuo ; Genichiro Kikui

Abstract: We propose a novel algorithm for sentiment summarization that takes account of informativeness and readability, simultaneously. Our algorithm generates a summary by selecting and ordering sentences taken from multiple review texts according to two scores that represent the informativeness and readability of the sentence order. The informativeness score is defined by the number of sentiment expressions and the readability score is learned from the target corpus. We evaluate our method by summarizing reviews on restaurants. Our method outperforms an existing algorithm as indicated by its ROUGE score and human readability experiments.

3 0.73611981 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval

Author: Binyang Li ; Lanjun Zhou ; Shi Feng ; Kam-Fai Wong

Abstract: There is a growing research interest in opinion retrieval as on-line users’ opinions are becoming more and more popular in business, social networks, etc. Practically speaking, the goal of opinion retrieval is to retrieve documents, which entail opinions or comments, relevant to a target subject specified by the user’s query. A fundamental challenge in opinion retrieval is information representation. Existing research focuses on document-based approaches and documents are represented by bag-of-word. However, due to loss of contextual information, this representation fails to capture the associative information between an opinion and its corresponding target. It cannot distinguish different degrees of a sentiment word when associated with different targets. This in turn seriously affects opinion retrieval performance. In this paper, we propose a sentence-based approach based on a new information representa- , tion, namely topic-sentiment word pair, to capture intra-sentence contextual information between an opinion and its target. Additionally, we consider inter-sentence information to capture the relationships among the opinions on the same topic. Finally, the two types of information are combined in a unified graph-based model, which can effectively rank the documents. Compared with existing approaches, experimental results on the COAE08 dataset showed that our graph-based model achieved significant improvement. 1

4 0.73499048 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

Author: Niklas Jakob ; Iryna Gurevych

Abstract: unkown-abstract

5 0.73485398 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

Author: Cigdem Toprak ; Niklas Jakob ; Iryna Gurevych

Abstract: In this paper, we introduce a corpus of consumer reviews from the rateitall and the eopinions websites annotated with opinion-related information. We present a two-level annotation scheme. In the first stage, the reviews are analyzed at the sentence level for (i) relevancy to a given topic, and (ii) expressing an evaluation about the topic. In the second stage, on-topic sentences containing evaluations about the topic are further investigated at the expression level for pinpointing the properties (semantic orientation, intensity), and the functional components of the evaluations (opinion terms, targets and holders). We discuss the annotation scheme, the inter-annotator agreement for different subtasks and our observations.

6 0.73424721 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

7 0.72758317 214 acl-2010-Sparsity in Dependency Grammar Induction

8 0.72175932 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

9 0.71930003 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

10 0.71796227 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

11 0.71581686 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities

12 0.71528977 133 acl-2010-Hierarchical Search for Word Alignment

13 0.713835 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

14 0.71359658 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

15 0.71249735 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

16 0.71224856 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

17 0.71196109 127 acl-2010-Global Learning of Focused Entailment Graphs

18 0.71174967 36 acl-2010-Automatic Collocation Suggestion in Academic Writing

19 0.71123707 39 acl-2010-Automatic Generation of Story Highlights

20 0.71122658 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation