acl acl2011 acl2011-279 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Oscar Tackstrom ; Ryan McDonald
Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exist naturally, and thus requires labor intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative compared to fine-grained supervision, however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account; – 569 Ryan McDonald Google, Inc., New York ryanmcd@ google com . and models that use latent variables to learn unobserved phenomena from that which can be observed. Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarsegrained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Ta¨ckstro¨m and McDonald (201 1) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines with quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to only predict the two dominant fine-grained sentiment categories well for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. 
Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models. In contrast to (generative) topic models (Mei et al., 2007; Titov and McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient-based estimation. The former models are largely orthogonal to the one we propose in this work, and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task-specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained task-specific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context-independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model.

Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences $s_i$ are always observed. Note that there are no factors connecting the document node, $y^d$, with the input nodes, $s$, so that the sentence-level variables, $y^s$, in effect form a bottleneck between the document sentiment and the input sentences.

1.1 Preliminaries

Let $d$ be a document consisting of $n$ sentences, $s = (s_i)_{i=1}^{n}$, with a document–sentence-sequence pair denoted $\mathbf{d} = (d, s)$. Let $\mathbf{y}_{\mathbf{d}} = (y^d, y^s)$ denote random variables1: the document-level sentiment, $y^d$, and the sequence of sentence-level sentiment, $y^s = (y_i^s)_{i=1}^{n}$.

1 We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments.

In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, $D_F = \{(\mathbf{d}_j, \mathbf{y}_{\mathbf{d}_j})\}_{j=1}^{m_f}$, and a large set of coarsely labeled instances, $D_C = \{(\mathbf{d}_j, y_j^d)\}_{j=m_f+1}^{m_f+m_c}$. Furthermore, we assume that $y^d$ and all $y_i^s$ take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization

$$p_\theta(y^d, y^s \mid s) = \exp\left\{ \left\langle \phi(y^d, y^s, s), \theta \right\rangle - A_\theta(s) \right\}$$
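To make the model concrete, below is a minimal Python sketch of this parametrization and of the two likelihood terms a fused semi-supervised objective combines: the full log-likelihood of instances from $D_F$, where both $y^d$ and $y^s$ are observed, and the marginal log-likelihood of instances from $D_C$, where the sentence labels $y^s$ are latent and summed out. The bag-of-words feature map, the brute-force partition function, and all function names are illustrative assumptions, not the paper's implementation, which uses rich features and exact chain inference.

```python
import itertools
import math

LABELS = ["POS", "NEG", "NEU"]

def score(y_d, y_s, sents, theta):
    """Linear score <phi(y_d, y_s, s), theta> over the chain factor graph of
    Figure 1: document-sentence factors (y_d, y_i^s), transition factors
    (y_{i-1}^s, y_i^s), and emission factors (y_i^s, s_i). The bag-of-words
    feature map is a toy stand-in for the paper's rich feature definitions."""
    total = 0.0
    for i, (y_i, sent) in enumerate(zip(y_s, sents)):
        total += theta.get(("doc-sent", y_d, y_i), 0.0)
        if i > 0:
            total += theta.get(("trans", y_s[i - 1], y_i), 0.0)
        for tok in sent.lower().split():
            total += theta.get(("emit", y_i, tok), 0.0)
    return total

def log_sum_scores(sents, theta, y_d=None):
    """log sum over (y_d, y_s) of exp(score); fixing y_d gives the
    unnormalized marginal over latent sentence labels. Brute-force
    enumeration is O(|Y|^n); the chain structure of the model admits an
    O(n |Y|^2) forward recursion instead."""
    doc_labels = [y_d] if y_d is not None else LABELS
    scores = [score(d, ys, sents, theta)
              for d in doc_labels
              for ys in itertools.product(LABELS, repeat=len(sents))]
    m = max(scores)  # log-sum-exp with max-shift for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in scores))

def log_likelihood(sents, y_d, y_s, theta):
    """One term of a fused objective: full log-likelihood for an instance
    from D_F (y_s observed), marginal log-likelihood for an instance from
    D_C (y_s latent and summed out)."""
    log_Z = log_sum_scores(sents, theta)  # A_theta(s)
    if y_s is not None:  # fully labeled (D_F)
        return score(y_d, y_s, sents, theta) - log_Z
    # coarsely labeled (D_C): log p(y_d | s) = log sum_{y_s} p(y_d, y_s | s)
    return log_sum_scores(sents, theta, y_d=y_d) - log_Z

# Tiny usage example with hand-set weights.
theta = {("doc-sent", "POS", "POS"): 1.0,
         ("trans", "POS", "POS"): 0.5,
         ("emit", "POS", "great"): 2.0,
         ("emit", "NEG", "dull"): 2.0}
sents = ["great battery life", "the screen is dull"]
print(log_likelihood(sents, "POS", ("POS", "NEG"), theta))  # D_F-style term
print(log_likelihood(sents, "POS", None, theta))            # D_C-style term
```

Fixing $y^d$ and summing over $y^s$ mirrors the bottleneck structure of Figure 1: the coarse document label constrains the latent sentence labels only through the document–sentence factors.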
Reference: text
sentIndex sentText sentNum sentScore
1 Semi-supervised latent variable models for sentence-level sentiment analysis Oscar Täckström SICS, Kista / Uppsala University, Uppsala oscar@sics.se [sent-1, score-0.788]
2 Abstract We derive two variants of a semi-supervised model for fine-grained sentiment analysis. [sent-2, score-0.56]
3 Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. [sent-3, score-1.194]
4 The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. [sent-4, score-0.466]
5 This allows for highly efficient estimation and inference algorithms with rich feature definitions. [sent-5, score-0.038]
6 We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. [sent-6, score-0.776]
7 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis, an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). [sent-7, score-1.374]
8 Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. [sent-8, score-0.599]
9 While such fine-grained supervision rarely exists naturally, and thus requires labor-intensive manual annotation effort (Wiebe et al. [sent-9, score-0.47]
10 , 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. [sent-10, score-0.579]
11 Our work combines two strands of research: models for sentiment analysis that take document structure into account; [sent-12, score-0.807]
12 and models that use latent variables to learn unobserved phenomena from that which can be observed. [sent-14, score-0.438]
13 Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. [sent-15, score-0.891]
14 (2007) later showed that jointly learning fine-grained (sentence) and coarse-grained (document) sentiment improves predictions at both levels. [sent-17, score-0.546]
15 (2010) described how sentence-level latent variables can be used to improve document-level prediction and Nakagawa et al. [sent-19, score-0.333]
16 (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. [sent-20, score-0.336]
17 (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. [sent-22, score-0.682]
18 These approaches all rely on the availability of fine-grained annotations, but Täckström and McDonald (2011) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. [sent-23, score-0.857]
19 While this model was shown to beat a set of natural baselines by quite a wide margin, it has its shortcomings. [sent-24, score-0.053]
20 As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. [sent-26, score-0.561]
21 Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models. [sent-27, score-0.402]
22 Figure 1: a) Factor graph of the fully observed graphical model. [sent-31, score-0.136]
23 b) Factor graph of the corresponding latent variable model. [sent-32, score-0.271]
24 During training, shaded nodes are observed, while non-shaded nodes are unobserved. [sent-33, score-0.177]
25 Note that there are no factors connecting the document node, yd, with the input nodes, s, so that the sentence-level variables, ys, in effect form a bottleneck between the document sentiment and the input sentences. [sent-35, score-0.89]
26 McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient-based estimation. [sent-36, score-0.356]
27 The former models are largely orthogonal to the one we propose in this work and combining their merits might be fruitful. [sent-37, score-0.242]
28 (2010), it is possible to fuse generative document structure models and task-specific structured conditional models. [sent-39, score-0.634]
29 While we do model document structure in terms of sentiment transitions, we do not model topical structure. [sent-40, score-0.726]
30 An interesting avenue for future work would be to extend the model of Sauper et al. [sent-41, score-0.045]
31 (2010) to take coarse-grained task-specific supervision into account, while modeling fine-grained task-specific aspects with latent variables. [sent-42, score-0.52]
32 Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context-independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al. [sent-43, score-0.085]
33 The output of such models could readily be incorporated as features in the proposed model. [sent-45, score-0.092]
34 1.1 Preliminaries Let $d$ be a document consisting of $n$ sentences, $s = (s_i)_{i=1}^{n}$, with a document–sentence-sequence pair denoted $\mathbf{d} = (d, s)$. [sent-47, score-0.161]
35 Let $\mathbf{y}_{\mathbf{d}} = (y^d, y^s)$ denote random variables1: the document-level sentiment, $y^d$, and the sequence of sentence-level sentiment, $y^s = (y_i^s)_{i=1}^{n}$. [sent-48, score-0.436]
36 1We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments. [sent-49, score-0.354]
37 In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, $D_F = \{(\mathbf{d}_j, \mathbf{y}_{\mathbf{d}_j})\}_{j=1}^{m_f}$, and a large set of coarsely labeled instances, $D_C = \{(\mathbf{d}_j, y_j^d)\}_{j=m_f+1}^{m_f+m_c}$. [sent-50, score-0.338]
38 Furthermore, we assume that $y^d$ and all $y_i^s$ take values in {POS, NEG, NEU}. [sent-51, score-0.335]
wordName wordTfidf (topN-words)
[('sentiment', 0.479), ('supervision', 0.37), ('yd', 0.275), ('sauper', 0.218), ('document', 0.161), ('latent', 0.15), ('variables', 0.146), ('coarsely', 0.145), ('ys', 0.135), ('fuse', 0.134), ('jm', 0.126), ('mcdonald', 0.111), ('abundant', 0.106), ('structured', 0.102), ('dj', 0.097), ('fm', 0.092), ('conditional', 0.088), ('orthogonal', 0.085), ('variants', 0.081), ('fully', 0.077), ('supervised', 0.076), ('strands', 0.073), ('abusing', 0.073), ('sics', 0.073), ('pang', 0.069), ('cuts', 0.067), ('parametrization', 0.067), ('coarsegrained', 0.067), ('ysi', 0.067), ('loose', 0.067), ('ackstr', 0.067), ('nodes', 0.065), ('variable', 0.062), ('esuli', 0.06), ('yessenalina', 0.06), ('neu', 0.06), ('yis', 0.06), ('nakagawa', 0.06), ('graph', 0.059), ('merits', 0.057), ('vein', 0.057), ('sebastiani', 0.057), ('neg', 0.057), ('crafted', 0.057), ('uppsala', 0.056), ('naturally', 0.056), ('generative', 0.055), ('models', 0.054), ('si', 0.054), ('overcoming', 0.053), ('beat', 0.053), ('shortcomings', 0.053), ('bottleneck', 0.051), ('intensive', 0.051), ('unobserved', 0.05), ('df', 0.05), ('labor', 0.049), ('versa', 0.049), ('attracted', 0.049), ('review', 0.047), ('titov', 0.047), ('shaded', 0.047), ('fusion', 0.047), ('rao', 0.046), ('deemed', 0.046), ('ravichandran', 0.046), ('topical', 0.046), ('preliminaries', 0.046), ('combining', 0.046), ('dominant', 0.045), ('avenue', 0.045), ('rely', 0.044), ('experimentally', 0.043), ('turney', 0.043), ('car', 0.043), ('mei', 0.043), ('google', 0.043), ('amount', 0.043), ('contrary', 0.042), ('lee', 0.042), ('dc', 0.042), ('ratings', 0.042), ('notably', 0.041), ('labeled', 0.04), ('structure', 0.04), ('factor', 0.04), ('coarse', 0.039), ('gradient', 0.039), ('transitions', 0.039), ('verify', 0.038), ('rich', 0.038), ('learn', 0.038), ('connecting', 0.038), ('readily', 0.038), ('prediction', 0.037), ('subjective', 0.036), ('ta', 0.036), ('instances', 0.036), ('exponential', 0.035), ('overlapping', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
2 0.36336255 204 acl-2011-Learning Word Vectors for Sentiment Analysis
Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts
Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it outperforms several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.
Author: Danushka Bollegala ; David Weir ; John Carroll
Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.
4 0.27768138 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
Author: Awais Athar
Abstract: Sentiment analysis of citations in scientific papers and articles is a new and interesting problem due to the many linguistic differences between scientific texts and other genres. In this paper, we focus on the problem of automatic identification of positive and negative sentiment polarity in citations to scientific papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-specific lexical features, dependency relations, sentence splitting and negation features. Our results show that 3-grams and dependencies perform best in this task; they outperform the sentence splitting, science lexicon and negation based features.
5 0.25873911 292 acl-2011-Target-dependent Twitter Sentiment Classification
Author: Long Jiang ; Mo Yu ; Ming Zhou ; Xiaohua Liu ; Tiejun Zhao
Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.
6 0.2455537 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
7 0.23382793 253 acl-2011-PsychoSentiWordNet
8 0.23260619 105 acl-2011-Dr Sentiment Knows Everything!
9 0.22971819 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
10 0.20137414 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs
11 0.20065977 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages
12 0.19465701 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
13 0.15537721 82 acl-2011-Content Models with Attitude
14 0.13020292 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
15 0.12616503 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
16 0.11601381 159 acl-2011-Identifying Noun Product Features that Imply Opinions
17 0.11205387 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
18 0.10372786 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates
19 0.095656015 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
20 0.092899509 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application
topicId topicWeight
[(0, 0.221), (1, 0.326), (2, 0.272), (3, -0.098), (4, 0.112), (5, -0.023), (6, -0.055), (7, 0.055), (8, -0.013), (9, -0.017), (10, 0.148), (11, 0.02), (12, -0.031), (13, 0.033), (14, 0.053), (15, 0.057), (16, -0.005), (17, -0.029), (18, -0.001), (19, 0.059), (20, -0.029), (21, -0.015), (22, 0.007), (23, -0.016), (24, -0.008), (25, -0.023), (26, -0.019), (27, -0.043), (28, 0.039), (29, 0.015), (30, -0.085), (31, 0.023), (32, 0.018), (33, -0.031), (34, -0.043), (35, -0.002), (36, 0.01), (37, 0.011), (38, 0.015), (39, 0.023), (40, 0.041), (41, -0.068), (42, 0.06), (43, 0.02), (44, -0.01), (45, 0.003), (46, -0.023), (47, -0.078), (48, 0.075), (49, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.97450733 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
2 0.89338541 204 acl-2011-Learning Word Vectors for Sentiment Analysis
4 0.80016178 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
Author: Yulan He ; Chenghua Lin ; Harith Alani
Abstract: Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
5 0.76247776 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua
Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application of document-level sentiment classification, and improve the performance significantly.
6 0.75808865 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages
7 0.74038357 292 acl-2011-Target-dependent Twitter Sentiment Classification
8 0.71048397 253 acl-2011-PsychoSentiWordNet
9 0.70400709 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
10 0.70073581 105 acl-2011-Dr Sentiment Knows Everything!
11 0.68728608 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs
12 0.664478 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
13 0.63144141 82 acl-2011-Content Models with Attitude
14 0.5219624 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
15 0.51873428 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
16 0.44262642 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates
17 0.43849656 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?
18 0.41263482 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
19 0.40569678 55 acl-2011-Automatically Predicting Peer-Review Helpfulness
20 0.39257866 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing
topicId topicWeight
[(5, 0.014), (17, 0.023), (37, 0.15), (39, 0.029), (41, 0.029), (55, 0.02), (59, 0.506), (72, 0.02), (91, 0.024), (96, 0.104)]
simIndex simValue paperId paperTitle
1 0.9085291 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
Author: Eduardo Blanco ; Dan Moldovan
Abstract: This paper presents an unsupervised method for deriving inference axioms by composing semantic relations. The method is independent of any particular relation inventory. It relies on describing semantic relations using primitives and manipulating these primitives according to an algebra. The method was tested using a set of eight semantic relations yielding 78 inference axioms which were evaluated over PropBank.
same-paper 2 0.88787597 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
3 0.86476254 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?
Author: Mariet Theune ; Ruud Koolen ; Emiel Krahmer ; Sander Wubben
Abstract: In this paper we investigate how much data is required to train an algorithm for attribute selection, a subtask of Referring Expressions Generation (REG). To enable comparison between different-sized training sets, a systematic training method was developed. The results show that depending on the complexity of the domain, training on 10 to 20 items may already lead to a good performance.
4 0.86267245 293 acl-2011-Template-Based Information Extraction without the Templates
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to hand-created gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
5 0.81070983 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
Author: Dirk Hovy ; Ashish Vaswani ; Stephen Tratz ; David Chiang ; Eduard Hovy
Abstract: We present a preliminary study on unsupervised preposition sense disambiguation (PSD), comparing different models and training techniques (EM, MAP-EM with L0 norm, Bayesian inference using Gibbs sampling). To our knowledge, this is the first attempt at unsupervised preposition sense disambiguation. Our best accuracy reaches 56%, a significant improvement (at p <.001) of 16% over the most-frequent-sense baseline.
6 0.78373057 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
7 0.69052041 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
8 0.61951816 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
9 0.5994671 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
10 0.57310718 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
11 0.56346929 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality
12 0.56108928 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
13 0.54652268 167 acl-2011-Improving Dependency Parsing with Semantic Classes
14 0.53674853 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
15 0.53632545 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
16 0.53545874 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
17 0.52708542 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
18 0.51980621 229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon
19 0.51774198 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
20 0.51119196 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters