acl acl2011 acl2011-82 knowledge-graph by maker-knowledge-mining

82 acl-2011-Content Models with Attitude


Source: pdf

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. This approach directly enables discovery of highly rated or inconsistent properties of a product. Our model admits an efficient variational meanfield inference algorithm which can be parallelized and run on large snippet collections. We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. We demonstrate that it outperforms applicable baselines by a considerable margin.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. [sent-6, score-0.367]

2 Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. [sent-7, score-0.41]

3 Our model admits an efficient variational meanfield inference algorithm which can be parallelized and run on large snippet collections. [sent-9, score-0.586]

4 We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. [sent-10, score-1.229]

5 For instance, product pages on Amazon prominently display the distribution of numerical scores across reviews. [Figure 1, coherent property cluster example: "The martinis were very good."] [sent-17, score-0.664]

6 Incoherent property cluster [Figure 1 examples: "The sushi is the best I've had." / "Best paella I'd ever had."] [sent-21, score-0.409]

7 The first cluster represents a coherent property of the underlying product, namely the cocktail property, and assesses distinctions in user sentiment. [sent-26, score-0.508]

8 The latter cluster simply shares a common attribute expression and does not represent snippets discussing the same product property. [sent-27, score-1.171]

9 In this work, we aim to produce the first type of property cluster with correct sentiment labeling. [sent-28, score-0.591]

10 Specifically, we are interested in identifying fine-grained product properties across reviews. [sent-31, score-0.432]

11 For this task, we assume as input a set of product review snippets. [sent-34, score-0.81]

12 These methods can effectively extract product properties from individual snippets along with their corresponding sentiment. [sent-42, score-0.792]

13 Consider, for instance, the two clusters of restaurant review snippets shown in Figure 1. [sent-44, score-0.936]

14 While both clusters have many words in common among their members, only the first describes a coherent property cluster, namely the cocktail property. [sent-45, score-0.5]

15 The snippets of the latter cluster do not discuss a single product property, but instead share similar expressions of sentiment. [sent-46, score-0.798]

16 To solve this issue, we need a method which can correctly identify both property and sentiment words. [sent-47, score-0.512]

17 In this work, we propose an approach that jointly analyzes the whole collection of product review snippets, induces a set of learned properties, and models the aggregate user sentiment towards these properties. [sent-48, score-0.574]

18 We capture this idea using a Bayesian topic model where a set of properties and corresponding attribute tendencies are represented as hidden variables. [sent-49, score-0.509]

19 The model takes product review snippets as input and explains how the observed text arises from the latent variables, thereby connecting text fragments with corresponding properties and attributes. [sent-50, score-0.955]

20 Second, our model yields an efficient mean-field variational inference procedure which can be parallelized and run on a large number of review snippets. [sent-54, score-0.332]

21 We evaluate our approach in the domain of snippets taken from restaurant reviews on Yelp. [sent-55, score-0.772]

22 8 snippets representing a wide spectrum of opinions about a restaurant. [sent-57, score-0.555]

23 We also show that the model can effectively identify binary snippet attributes with 9.2% error reduction over applicable baselines, demonstrating that learning to identify attributes in the context of other product reviews yields significant gains. [sent-59, score-0.417] [sent-60, score-0.41]

25 Finally, we evaluate our model on its ability to identify product properties for which there is significant sentiment disagreement amongst user snippets. [sent-61, score-0.579]

26 First, our work relates to research on extraction of product properties with associated sentiment from review text (Hu and Liu, 2004; Liu et al. [sent-64, score-0.613]

27 While our model captures similar high-level intuition, it analyzes fine-grained properties expressed at the snippet level, rather than document-level sentiment. [sent-87, score-0.439]

28 Input snippets are deterministically taken from the output of the Sauper et al. (2010) system. [sent-94, score-0.485]

29 For instance, the snippet “the pad thai was great” describes the pad thai property. [sent-97, score-0.497]

30 We assume that each snippet has a single property associated with it. [sent-98, score-0.604]

31 For the corpus of restaurant reviews, we assume that the set of properties are specific to a given product, in order to capture fine-grained, relevant properties for each restaurant. [sent-100, score-0.479]

32 For example, reviews from a sandwich shop may contrast the club sandwich with the turkey wrap, while for a more general restaurant, the snippets refer to sandwiches in general. [sent-101, score-0.68]

33 Attribute: An attribute is a description of a property. [sent-103, score-0.348]

34 There are multiple attribute types, which may correspond to semantic differences. [sent-104, score-0.348]

35 For example, in the case of product reviews, we select N = 2 attributes corresponding to positive and negative sentiment. [sent-106, score-0.344]

36 352 One of the goals of this work in the review domain is to improve sentiment prediction by exploiting correlations within a single property cluster. [sent-108, score-0.616]

37 For example, if there are already many snippets with the attribute representing positive sentiment in a given property cluster, additional snippets are biased towards positive sentiment as well; however, data can always override this bias. [sent-109, score-2.045]
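
This cluster-level bias can be sketched as a Dirichlet-multinomial posterior predictive over attributes; the function name and the pseudo-count `alpha` below are illustrative assumptions, not values from the paper:

```python
def attribute_posterior(counts, alpha=1.0):
    # Posterior predictive over the attribute of the next snippet in a
    # property cluster: observed counts plus a symmetric pseudo-count alpha.
    # A positive-heavy cluster biases new snippets toward positive, but
    # enough contrary evidence in the snippet's own words overrides it.
    total = sum(counts.values()) + alpha * len(counts)
    return {a: (c + alpha) / total for a, c in counts.items()}
```

With 8 positive and 2 negative snippets already in the cluster, the predictive mass on positive is (8 + 1) / 12 = 0.75.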

38 Snippets themselves are always observed; the goal of this work is to induce the latent property and attribute underlying each snippet. [sent-110, score-0.658]

39 4 Model Our model generates the words of all snippets for each product in a collection of products. [sent-111, score-0.687]

40 We use si,j,w to represent the wth word of the jth snippet of the ith product. [sent-112, score-0.362]

41 We use s to denote the collection of all snippet words. [sent-113, score-0.334]

42 We present an overview of our generative model in Figure 1 and describe each component in turn: Global Distributions: At the global level, we draw several unigram distributions: a global background distribution θB and attribute distributions θAa for each attribute. [sent-115, score-0.506]

43 In this domain, the positive and negative attribute distributions encode words with positive and negative sentiments (e. [sent-119, score-0.62]

44 The positive and negative attribute distributions are initialized using seed words (Vseeda in Figure 1). [sent-125, score-0.604]

45 These seeds are incorporated into the attribute priors: a non-seed word gets ε, while seed words for attribute a get λA. [sent-126, score-0.348]

46 Product Level: For the ith product, we draw property unigram distributions θPi,1, …, θPi,K. [sent-132, score-0.408]

47 The property distribution represents product-specific content distributions over properties discussed in reviews of the product; for instance, in the restaurant domain, properties may correspond to distinct menu items. Each θPi,k is drawn from a symmetric Dirichlet prior. [sent-136, score-0.527]

48 For the global attribute distribution, the prior hyper-parameter counts are ε for all vocabulary items and λA for Vseeda, the vector of vocabulary items in the set of seed words for attribute a. [sent-139, score-0.348] [sent-140, score-0.456]
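
The seed-based prior can be sketched as follows; `eps` and `lam` stand in for the paper's ε and λA, and the function name is an illustrative assumption:

```python
def attribute_prior(vocab, seed_words, eps=0.01, lam=1.0):
    # Dirichlet pseudo-counts for one attribute's word distribution:
    # every vocabulary item gets the small base count eps, while seed
    # words for this attribute get the larger count lam, biasing (but
    # not fixing) the drawn distribution toward the seeds.
    return {w: (lam if w in seed_words else eps) for w in vocab}
```

Drawing θAa from a Dirichlet with these counts keeps the seeds soft: non-seed words can still acquire attribute mass from the data.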

50 Snippet Level: For the jth snippet of the ith product, a property random variable ZPi,j is drawn according to the multinomial ψi. [sent-151, score-0.640]

51 Conditioned on this choice, we draw an attribute ZAi,j (positive or negative) from the property attribute distribution φi,ZPi,j. [sent-152, score-0.398]

52 Once the property and attribute have been selected, the tokens of the snippet are generated using a simple HMM. [sent-153, score-1.592]

53 The latent state ZWi,j,w underlying a token indicates whether the wth word comes from the property distribution, attribute distribution, or background distribution; we use P, A, or B to denote these respective values of ZWi,j,w. [sent-154, score-0.687]
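
The token-level generative story can be sketched as a toy sampler; the transition and emission parameters below are invented for illustration, not learned values:

```python
import random

STATES = ["P", "A", "B"]  # property, attribute, background

def generate_snippet(theta_P, theta_A, theta_B, trans, length, rng):
    # Sample a snippet: a first-order Markov chain over the latent
    # states P/A/B, each state emitting a word from its own unigram
    # distribution (dicts mapping word -> probability).
    emit = {"P": theta_P, "A": theta_A, "B": theta_B}
    words = []
    state = rng.choice(STATES)
    for _ in range(length):
        dist = emit[state]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
        state = rng.choices(STATES, weights=trans[state])[0]
    return words
```

A snippet like "the pad thai was great" then decomposes into background ("the", "was"), property ("pad", "thai"), and attribute ("great") emissions.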

54 Inference: The goal of inference is to predict the property and attribute distributions over each snippet given all the observed snippets, i.e. the posterior P(ZPi,j, ZAi,j | s) for all products i and snippets j. [sent-161, score-2.303]

55 Data Set Our data set consists of snippets from Yelp reviews generated by the system described in Sauper et al. (2010). [sent-175, score-0.61]

56 This system is trained to extract snippets containing short descriptions of user sentiment towards some aspect of a restaurant. [sent-177, score-0.731]

57 Figure 3: Example snippets from our data set, grouped according to property. [sent-194, score-0.485]

58 Property words are labeled P and colored blue, NEGATIVE attribute words are labeled - and colored red, and POSITIVE attribute words are labeled + and colored green. [sent-195, score-0.918]

59 We select only the snippets labeled by that system as referencing food, and we ignore restaurants with fewer than 20 snippets. [sent-197, score-0.64]

60 There are 13,879 snippets in total, taken from 328 restaurants in and around the Boston/Cambridge area. [sent-198, score-0.608]

61 1 snippets per restaurant, although there is high variance in number of snippets for each restaurant. [sent-201, score-0.97]

62 For sentiment attribute seed words, we use 42 and 33 words for the positive and negative distributions respectively. [sent-203, score-0.786]

63 These are hand-selected based on the restaurant review domain; therefore, they include domain-specific words such as delicious and gross. [sent-204, score-0.362]

64 First, a cluster prediction task is designed to test the quality of the learned property clusters. [sent-206, score-0.448]

65 Second, an attribute analysis task will evaluate the sentiment analysis portion of the model. [sent-207, score-0.53]

66 Third, we present a task designed to test whether the system can correctly identify properties which have conflicting attributes, which tests both clustering and sentiment analysis. [sent-208, score-0.407]

67 Figure 2: The mean-field variational algorithm used during learning and inference to obtain posterior predictions over snippet properties and attributes, as described in Section 5. [sent-209, score-0.628]
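
The coordinate updates of Figure 2 are not reproduced here; a heavily simplified sketch of the snippet-level property posterior, using point estimates of ψ and θP in place of the full variational expectations, might look like:

```python
import math

def property_posterior(snippet, psi, theta_P):
    # Unnormalized log score of property k = log prior psi[k] plus the
    # log-likelihood of the snippet's words under theta_P[k]; a softmax
    # then normalizes. Unseen words get a tiny floor probability.
    scores = [
        math.log(psi[k]) + sum(math.log(theta_P[k].get(w, 1e-9)) for w in snippet)
        for k in range(len(psi))
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Because each snippet's posterior depends only on global quantities, updates of this form can be computed for all snippets in parallel, which is what makes the mean-field procedure scale to large collections.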

68 The learned clusters should be cohesive (i.e., all snippets predicted for a given property are related to each other) and comprehensive (i.e., all snippets which are related to a property are predicted for it). [sent-215, score-0.756] [sent-217, score-0.756]

70 For example, a snippet will be assigned the property pad thai if and only if that snippet mentions some aspect of the pad thai. [sent-218, score-1.054]

71 Annotation For this task, we use a set of gold clusters over 3,250 snippets across 75 restaurants collected through Mechanical Turk. [sent-219, score-0.773]

72 In each task, a worker was given a set of 25 snippets from a single restaurant and asked to cluster them into as many clusters as they desired, with the option of leaving any number unclustered. [sent-220, score-0.95]

73 Because our model only uses property words to tie together clusters, it may miss correlations between words which are not correctly identified as property words. [sent-227, score-0.599]

74 The baseline is allowed 10 property clusters per restaurant. [sent-228, score-0.48]

75 While MUC has a deficiency in that putting everything into a single cluster will artificially inflate the score, the parameters of our model are set so that the model uses the same number of clusters as the baseline system. [sent-241, score-0.419]
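
MUC scores a clustering by the coreference links it preserves, which is exactly why a single all-in-one cluster inflates the score; a small sketch (simplified recall only, not the full scorer) makes this concrete:

```python
def muc_recall(gold_clusters, pred_clusters):
    # For each gold cluster, the number of preserved links is
    # |cluster| minus the number of predicted partitions it is split
    # into; unclustered mentions count as singleton partitions.
    def num_partitions(cluster):
        ids = set()
        for m in cluster:
            for i, pred in enumerate(pred_clusters):
                if m in pred:
                    ids.add(i)
                    break
            else:
                ids.add(("singleton", m))
        return len(ids)
    numer = sum(len(c) - num_partitions(c) for c in gold_clusters)
    denom = sum(len(c) - 1 for c in gold_clusters)
    return numer / denom
```

Putting every snippet into one predicted cluster preserves all gold links and yields recall 1.0, which is the inflation the text warns about and why the number of clusters is held fixed across systems.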

76 The most common cause of poor cluster choices in the baseline system is its inability to distinguish property words from attribute words. [sent-246, score-0.846]

77 For example, if many snippets in a given restaurant use the word delicious, there may end up being a cluster based on that alone. [sent-247, score-0.785]

78 Attribute analysis We also evaluate the system's predictions of snippet attribute using the predicted posterior over the attribute distribution for the snippet, i.e. q(ZAi,j). [sent-253, score-1.391]

79 This tests whether our model correctly distinguishes attribute words. [sent-257, score-0.379]

80 Annotation For this task, we use a set of 260 total snippets from the Yelp reviews for 30 restaurants, evenly split into training and test sets of 130 snippets each. [sent-258, score-1.095]

81 These snippets are manually labeled POSITIVE or NEGATIVE. [sent-259, score-0.517]

82 In the first example, the baseline mistakenly clusters some snippets about martinis with those containing the word very. [sent-264, score-0.694]

83 Neutral snippets are ignored for the purpose of this experiment. [sent-267, score-0.485]

84 Given enough snippets from enough unrelated properties, the classifier should be able to identify that words like great indicate positive sentiment and those like bad indicate negative sentiment, while words like chicken are neutral and have no effect. [sent-270, score-0.782]

85 If there are more words from Vseed+, the snippet is labeled positive, and if there are more words from Vseed−, the snippet is labeled negative. [sent-272, score-0.371]
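
This majority-seed baseline reduces to a simple count comparison; the seed lists in the example are toy stand-ins for the hand-selected lists described above:

```python
def seed_label(tokens, pos_seeds, neg_seeds):
    # Count matches against each seed list; the larger count wins,
    # with ties (including zero matches) left as neutral.
    pos = sum(t in pos_seeds for t in tokens)
    neg = sum(t in neg_seeds for t in tokens)
    if pos > neg:
        return "POSITIVE"
    if neg > pos:
        return "NEGATIVE"
    return "NEUTRAL"
```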

86 Because the seed word lists are specifically slanted toward restaurant reviews. [sent-274, score-0.395]

87 The advantage of our system is its ability to distinguish property words from attribute words in order to restrict judgment to only the relevant terms. [sent-279, score-0.681]

88 As in the cluster prediction case, the main flaw with the DISCRIMINATIVE baseline system is its inability to recognize which words are relevant for the task at hand, in this case the attribute words. [sent-286, score-0.684]

89 By learning to separate attribute words from the other words in the snippets, our full system is able to more accurately judge their sentiment. [sent-287, score-0.348]

90 3 Conflict identification Our final task requires both correct cluster prediction and correct sentiment judgments. [sent-291, score-0.359]

91 In many domains, it is interesting to know not only whether a product is rated highly, but also whether there is conflicting sentiment or debate. [sent-292, score-0.357]

92 Results are shown for the conflict identification task, over both property and attribute. [sent-294, score-0.322]

93 Property judgment (P) indicates whether the snippets are discussing the same item; attribute judgment (A) indicates whether there is a correct difference in attribute (sentiment), regardless of properties. [sent-295, score-1.241]

94 The goal is to identify whether these are true conflicts of sentiment or whether there was a failure in either property clustering or attribute classification. [sent-297, score-0.863]

95 For this task, the output clusters are manually annotated for correctness of both property and attribute judgments, as in Table 6. [sent-298, score-0.815]

96 From these numbers, we can see that 50% of the clusters are correct in both property (cohesiveness) and attribute (difference in sentiment) dimensions. [sent-302, score-0.784]

97 Overall, the properties are correctly identified (subject of NEG matches the subject of POS) 68% of the time and a correct difference in attribute is identified 67% of the time. [sent-303, score-0.511]

98 Of the clusters which are correct in property, 74% show a correctly labeled attribute conflict. [Table 7: counts of clusters by Yes/No judgments of property label (P) and attribute conflict (A)] [sent-304, score-0.898]

99 50% of the clusters are correct in both labels, and there are approximately the same number of errors toward both property and attribute. [sent-306, score-0.436]

100 7 Conclusion We have presented a probabilistic topic model for identifying properties and attitudes of product review snippets. [sent-308, score-0.46]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('snippets', 0.485), ('attribute', 0.348), ('snippet', 0.307), ('property', 0.271), ('sentiment', 0.182), ('product', 0.175), ('clusters', 0.165), ('restaurant', 0.162), ('cluster', 0.138), ('properties', 0.132), ('reviews', 0.125), ('review', 0.124), ('restaurants', 0.123), ('variational', 0.116), ('seed', 0.108), ('sauper', 0.095), ('vseed', 0.087), ('attributes', 0.082), ('delicious', 0.076), ('muc', 0.075), ('distributions', 0.061), ('hu', 0.061), ('yelp', 0.057), ('minqing', 0.056), ('parallelized', 0.053), ('conflict', 0.051), ('thai', 0.05), ('draw', 0.05), ('dirichlet', 0.048), ('carenini', 0.047), ('distribution', 0.047), ('opinion', 0.047), ('positive', 0.046), ('pad', 0.045), ('inability', 0.045), ('baseline', 0.044), ('opinions', 0.044), ('flaw', 0.043), ('inflate', 0.043), ('unp', 0.043), ('vseeda', 0.043), ('aggregation', 0.043), ('colored', 0.042), ('kim', 0.042), ('negative', 0.041), ('latent', 0.039), ('prediction', 0.039), ('inference', 0.039), ('bing', 0.038), ('christina', 0.038), ('vilain', 0.038), ('junsheng', 0.038), ('cocktail', 0.038), ('meanfield', 0.038), ('titov', 0.037), ('sentiments', 0.037), ('www', 0.037), ('drawn', 0.036), ('user', 0.035), ('sandwich', 0.035), ('judgment', 0.035), ('popescu', 0.035), ('liu', 0.034), ('mei', 0.034), ('posterior', 0.034), ('clustering', 0.034), ('numerical', 0.033), ('chengxiang', 0.033), ('admits', 0.033), ('summarization', 0.032), ('labeled', 0.032), ('correctly', 0.031), ('seki', 0.031), ('battery', 0.031), ('standalone', 0.031), ('aggregate', 0.031), ('correctness', 0.031), ('zhai', 0.03), ('everything', 0.029), ('wth', 0.029), ('aspect', 0.029), ('topic', 0.029), ('identify', 0.028), ('separation', 0.028), ('food', 0.028), ('discriminative', 0.028), ('regina', 0.027), ('hyperparameter', 0.027), ('amongst', 0.027), ('observer', 0.027), ('factorization', 0.027), ('collection', 0.027), ('relevant', 0.027), ('tie', 0.026), ('assume', 0.026), ('coherent', 0.026), ('ith', 0.026), ('duc', 0.026), 
('spectrum', 0.026), ('discussing', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 82 acl-2011-Content Models with Attitude

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. This approach directly enables discovery of highly rated or inconsistent properties of a product. Our model admits an efficient variational meanfield inference algorithm which can be parallelized and run on large snippet collections. We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. We demonstrate that it outperforms applicable baselines by a considerable margin.

2 0.22754358 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

Author: Gunter Neumann ; Sven Schmeier

Abstract: We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on an iPad. The topic graph is constructed from N web snippets which are determined by a standard search engine. We consider the extraction of a topic graph as a specific empirical collocation extraction task where collocations are extracted between chunks. Our measure of association strength is based on the pointwise mutual information between chunk pairs which explicitly takes their distance into account. An initial user evaluation shows that this system is especially helpful for finding new interesting information on topics about which the user has only a vague idea or even no idea at all.

3 0.20730136 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

Author: Danushka Bollegala ; David Weir ; John Carroll

Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automat- ically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.

4 0.20014974 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua

Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application ofdocumentlevel sentiment classification, and improve the performance significantly.

5 0.18737032 204 acl-2011-Learning Word Vectors for Sentiment Analysis

Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts

Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semanticterm–documentinformation as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it out-performs several previously introduced methods for sentiment classification. We also introduce a large dataset , of movie reviews to serve as a more robust benchmark for work in this area.

6 0.15537721 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

7 0.13405542 292 acl-2011-Target-dependent Twitter Sentiment Classification

8 0.12934291 159 acl-2011-Identifying Noun Product Features that Imply Opinions

9 0.12299592 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

10 0.12075065 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

11 0.1187626 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

12 0.11516154 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

13 0.1108273 117 acl-2011-Entity Set Expansion using Topic information

14 0.10802361 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?

15 0.10579815 253 acl-2011-PsychoSentiWordNet

16 0.10480613 105 acl-2011-Dr Sentiment Knows Everything!

17 0.10436974 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

18 0.10231494 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

19 0.097278558 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

20 0.091338329 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.209), (1, 0.224), (2, 0.088), (3, -0.011), (4, 0.047), (5, -0.036), (6, -0.046), (7, 0.072), (8, -0.023), (9, 0.001), (10, 0.078), (11, -0.027), (12, 0.016), (13, 0.044), (14, 0.018), (15, -0.034), (16, -0.089), (17, -0.031), (18, -0.019), (19, 0.048), (20, -0.036), (21, 0.028), (22, 0.016), (23, -0.057), (24, 0.014), (25, -0.034), (26, 0.083), (27, -0.045), (28, -0.009), (29, -0.014), (30, 0.076), (31, 0.003), (32, -0.128), (33, -0.01), (34, 0.022), (35, -0.019), (36, 0.012), (37, -0.021), (38, -0.082), (39, 0.053), (40, 0.079), (41, -0.139), (42, 0.003), (43, 0.116), (44, 0.183), (45, 0.102), (46, 0.048), (47, 0.062), (48, 0.017), (49, 0.106)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95820588 82 acl-2011-Content Models with Attitude

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. This approach directly enables discovery of highly rated or inconsistent properties of a product. Our model admits an efficient variational meanfield inference algorithm which can be parallelized and run on large snippet collections. We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. We demonstrate that it outperforms applicable baselines by a considerable margin.

2 0.74541855 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua

Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application ofdocumentlevel sentiment classification, and improve the performance significantly.

3 0.67162442 55 acl-2011-Automatically Predicting Peer-Review Helpfulness

Author: Wenting Xiong ; Diane Litman

Abstract: Identifying peer-review helpfulness is an important task for improving the quality of feedback that students receive from their peers. As a first step towards enhancing existing peerreview systems with new functionality based on helpfulness detection, we examine whether standard product review analysis techniques also apply to our new context of peer reviews. In addition, we investigate the utility of incorporating additional specialized features tailored to peer review. Our preliminary results show that the structural features, review unigrams and meta-data combined are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction.

4 0.6692369 204 acl-2011-Learning Word Vectors for Sentiment Analysis

Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts

Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semanticterm–documentinformation as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it out-performs several previously introduced methods for sentiment classification. We also introduce a large dataset , of movie reviews to serve as a more robust benchmark for work in this area.

5 0.60099888 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

Author: Oscar Tackstrom ; Ryan McDonald

Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exist naturally, and thus requires labor intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative compared to fine-grained supervision, however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account; – 569 Ryan McDonald Google, Inc., New York ryanmcd@ google com . and models that use latent variables to learn unobserved phenomena from that which can be observed. 
Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarse-grained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction, and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Täckström and McDonald (2011) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines by quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to predict only the two dominant fine-grained sentiment categories well for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: short papers, pages 569–574, Portland, Oregon, June 19–24, 2011.

Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences s_i are always observed. Note that there are no factors connecting the document node, y^d, with the input nodes, s, so that the sentence-level variables, y^s, in effect form a bottleneck between the document sentiment and the input sentences.

Contrary to (generative) topic models (Mei et al., 2007; Titov and McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient-based estimation. The former models are largely orthogonal to the one we propose in this work, and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task-specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained task-specific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context-independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model.

1.1 Preliminaries

Let d be a document consisting of n sentences, s = (s_i)_{i=1}^n, with the document–sentence-sequence pair denoted d = (d, s).
Let y = (y^d, y^s) denote random variables for the document-level sentiment, y^d, and the sequence of sentence-level sentiments, y^s = (y^s_i)_{i=1}^n. (We abuse notation throughout by using the same symbols to refer to random variables and to their particular assignments.) In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, D_F = {(d_j, y_j)}_{j=1}^{m_f}, and a large set of coarsely labeled instances, D_C = {(d_j, y_j^d)}_{j=m_f+1}^{m_f+m_c}. Furthermore, we assume that y^d and all y^s_i take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization

p_θ(y^d, y^s | s) = exp( ⟨φ(y^d, y^s, s), θ⟩ − A_θ(s) )
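The exponential-family parametrization above can be made concrete on a toy instance by brute-force enumeration: score every joint assignment of document and sentence labels, and normalize by the log-partition function A_θ(s). This is only an illustrative sketch; the feature map and weights below are invented for the example and are not the features used in the paper.

```python
import itertools
import math

# Label set from the paper; everything else here is a toy assumption.
LABELS = ["POS", "NEG", "NEU"]

def score(y_doc, y_sents, theta):
    """Linear score <phi(y^d, y^s, s), theta>: a weight for each
    (document label, sentence label) pair plus sentence-transition weights."""
    total = sum(theta.get(("doc-sent", y_doc, y_s), 0.0) for y_s in y_sents)
    total += sum(theta.get(("trans", a, b), 0.0)
                 for a, b in zip(y_sents, y_sents[1:]))
    return total

def conditional(theta, n):
    """p_theta(y^d, y^s | s) by enumerating all (y^d, y^s) assignments;
    the log-sum-exp over scores plays the role of A_theta(s)."""
    scores = {
        (y_doc, y_sents): score(y_doc, y_sents, theta)
        for y_doc in LABELS
        for y_sents in itertools.product(LABELS, repeat=n)
    }
    log_z = math.log(sum(math.exp(v) for v in scores.values()))  # A_theta(s)
    return {y: math.exp(v - log_z) for y, v in scores.items()}

# Toy weights preferring agreement between document and sentence labels.
theta = {
    ("doc-sent", "POS", "POS"): 1.0,
    ("doc-sent", "NEG", "NEG"): 1.0,
    ("trans", "POS", "POS"): 0.5,
}
dist = conditional(theta, n=2)
```

With these weights, a fully "POS" assignment receives more mass than a mixed one, and the 27 probabilities for a two-sentence document sum to one. Exhaustive enumeration is exponential in n; the chain structure in Figure 1 is what lets the actual model replace it with dynamic programming.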

6 0.57399237 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

7 0.54233533 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

8 0.50050747 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

9 0.48774815 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

10 0.47013414 159 acl-2011-Identifying Noun Product Features that Imply Opinions

11 0.46862975 292 acl-2011-Target-dependent Twitter Sentiment Classification

12 0.46761709 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components

13 0.46510187 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination

14 0.45846796 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

15 0.44387865 125 acl-2011-Exploiting Readymades in Linguistic Creativity: A System Demonstration of the Jigsaw Bard

16 0.43326777 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

17 0.43325216 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues

18 0.43311816 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

19 0.43270275 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

20 0.42854819 142 acl-2011-Generalized Interpolation in Decision Tree LM


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.015), (17, 0.038), (26, 0.017), (37, 0.09), (39, 0.038), (41, 0.036), (53, 0.02), (55, 0.024), (59, 0.031), (72, 0.02), (91, 0.04), (96, 0.549)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99803066 82 acl-2011-Content Models with Attitude

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. This approach directly enables discovery of highly rated or inconsistent properties of a product. Our model admits an efficient variational meanfield inference algorithm which can be parallelized and run on large snippet collections. We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. We demonstrate that it outperforms applicable baselines by a considerable margin.

2 0.99749732 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names

Author: Delip Rao ; David Yarowsky

Abstract: This paper presents an original approach to semi-supervised learning of personal name ethnicity from typed graphs of morphophonemic features and first/last-name co-occurrence statistics. We frame this as a general solution to an inference problem over typed graphs where the edges represent labeled relations between features that are parameterized by the edge types. We propose a framework for parameter estimation on different constructions of typed graphs for this problem using a gradient-free optimization method based on grid search. Results on both in-domain and out-of-domain data show significant gains of over 30% in accuracy using the techniques presented in the paper.

3 0.99692816 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles

Author: Nitin Agarwal ; Ravi Shankar Reddy ; Kiran GVR ; Carolyn Penstein Rose

Abstract: In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. The document collection to be summarized is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic-based clustering of fragments extracted from each article based on queries generated from the context surrounding the co-cited list of papers. This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. SciSumm is currently built over the 2008 ACL Anthology; however, the generalizable nature of the summarization techniques and the extensible architecture make it possible to use the system with other corpora where a citation network is available. Evaluation results on the same corpus demonstrate that our system performs better than an existing widely used multi-document summarization system (MEAD).

4 0.99682695 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System

Author: Wei-Bin Liang ; Chung-Hsien Wu ; Chia-Ping Chen

Abstract: In this study, a novel approach to robust dialogue act detection for error-prone speech recognition in a spoken dialogue system is proposed. First, partial sentence trees are proposed to represent a speech recognition output sentence. Semantic information and the derivation rules of the partial sentence trees are extracted and used to model the relationship between the dialogue acts and the derivation rules. The constructed model is then used to generate a semantic score for dialogue act detection given an input speech utterance. The proposed approach is implemented and evaluated in a Mandarin spoken dialogue system for tour-guiding service. Combined with scores derived from the ASR recognition probability and the dialogue history, the proposed approach achieves 84.3% detection accuracy, an absolute improvement of 34.7% over the baseline of the semantic slot-based method with 49.6% detection accuracy.

5 0.99660724 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

Author: Kristina Toutanova ; Michel Galley

Abstract: Contrary to popular belief, we show that the optimal parameters for IBM Model 1 are not unique. We demonstrate that, for a large class of words, IBM Model 1 is indifferent among a continuum of ways to allocate probability mass to their translations. We study the magnitude of the variance in optimal model parameters using a linear programming approach as well as multiple random trials, and demonstrate that it results in variance in test set log-likelihood and alignment error rate.

6 0.99608546 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

7 0.99552351 168 acl-2011-Improving On-line Handwritten Recognition using Translation Models in Multimodal Interactive Machine Translation

8 0.99366397 25 acl-2011-A Simple Measure to Assess Non-response

9 0.99292505 49 acl-2011-Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?

10 0.99024051 41 acl-2011-An Interactive Machine Translation System with Online Learning

11 0.9887619 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge

12 0.97906816 266 acl-2011-Reordering with Source Language Collocations

13 0.97273636 264 acl-2011-Reordering Metrics for MT

14 0.96910059 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

15 0.96893674 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

16 0.96730953 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

17 0.96630132 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment

18 0.96479023 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

19 0.96206874 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization

20 0.96201754 220 acl-2011-Minimum Bayes-risk System Combination