acl acl2011 acl2011-54 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yulan He ; Chenghua Lin ; Harith Alani
Abstract: Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. [sent-11, score-0.525]
2 Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. [sent-15, score-0.58]
3 1 Introduction Given a piece of text, sentiment classification aims to determine whether the semantic orientation of the text is positive, negative or neutral. [sent-17, score-0.432]
4 However, in many practical cases, we may have plentiful labeled examples in the source domain, but very few or no labeled examples in the target domain with a different distribution. [sent-25, score-0.277]
5 For example, we may have many labeled book reviews, but we are interested in detecting the polarity of electronics reviews. [sent-26, score-0.177]
6 Reviews for different products might have widely different vocabularies; thus, classifiers trained on one domain often fail to produce satisfactory results when shifting to another domain. [sent-27, score-0.169]
7 This has motivated much research on sentiment transfer learning which transfers knowledge from a source task or domain to a different but related task or domain (? [sent-28, score-0.36]
8 With prior polarity words extracted from both the MPQA subjectivity lexicon and the appraisal lexicon, the JST model achieves a sentiment classification accuracy of 74% on the movie review data and 71% on the multi-domain sentiment dataset. [sent-38, score-1.125]
9 Moreover, it is also able to extract coherent and informative topics grouped under different sentiment labels. [sent-39, score-0.178]
10 The fact that the JST model does not require any labeled documents for training makes it desirable for domain adaptation in sentiment classification. [sent-40, score-0.704]
11 Many existing approaches solve the sentiment transfer problem by associating words [sent-41, score-0.405]
12 from different domains which indicate the same sentiment (? [sent-57, score-0.434]
13 Indeed, the polarity-bearing topics extracted by JST essentially capture sentiment associations among words from different domains which effectively overcome the data distribution difference between source and target domains. [sent-61, score-0.701]
14 The previously proposed JST model uses the sentiment prior information in the Gibbs sampling inference step, in that a sentiment label will only be sampled if the current word token has no prior sentiment as defined in a sentiment lexicon. [sent-62, score-1.73]
15 This in fact implies a different generative process where many of the word prior sentiment labels are observed. [sent-63, score-0.464]
16 This essentially creates an informed prior distribution for the sentiment labels and would allow the model to actually be latent and would be consistent with the generative story. [sent-66, score-0.52]
17 We proceed with a review of related work on sentiment domain adaptation. [sent-72, score-0.553]
18 We then briefly describe the JST model and present another approach to incorporate word prior polarity information into JST learning. [sent-73, score-0.165]
19 We subsequently show that words from different domains can indeed be grouped under the same polarity-bearing topic through an illustration of example topic words extracted by JST before proposing a domain adaptation approach based on JST. [sent-74, score-0.559]
20 We verify our proposed approach by conducting experiments on both the movie review data and the multi-domain sentiment dataset. [sent-75, score-0.508]
21 2 Related Work There has been significant amount of work on algorithms for domain adaptation in NLP. [sent-77, score-0.258]
22 Earlier work treats the source domain data as “prior knowledge” and uses maximum a posterior (MAP) estimation to learn a model for the target domain data under this prior distribution (? [sent-78, score-0.522]
23 ) also uses the source domain data to estimate prior distribution but in the context of a maximum entropy (ME) model. [sent-81, score-0.285]
24 ) for domain adaptation where a mixture model is defined to learn differences between domains. [sent-83, score-0.28]
25 Other approaches rely on unlabeled data in the target domain to overcome feature distribution differences between domains. [sent-84, score-0.275]
26 ) proposed structural correspondence learning (SCL) for domain adaptation in sentiment classification. [sent-88, score-0.657]
27 Given labeled data from a source domain and unlabeled data from target domain, SCL selects a set of pivot features to link the source and target domains where pivots are selected based on their common frequency in both domains and also their mutual information with the source labels. [sent-89, score-0.531]
28 ) proposed a kernel-mapping function which maps both source and target domains data to a high-dimensional feature space so that data points from the same domain are twice as similar as those from different domains. [sent-92, score-0.368]
29 ) proposed translated learning which uses a language model to link the class labels to the features in the source spaces, which in turn is translated to the features in the target spaces. [sent-95, score-0.159]
30 ) proposed the spectral feature alignment (SFA) algorithm where some domain- independent words are used as a bridge to construct a bipartite graph to model the co-occurrence relationship between domain-specific words and domain-independent words. [sent-101, score-0.15]
31 The sentiment score of each unlabeled document is recursively calculated until convergence from its neighbors, i.e., from the actual labels of source domain documents and the pseudo-labels of target domain documents. [sent-105, score-0.745]
32 This approach was later extended by simultaneously considering relations between documents and words from both source and target domains (? [sent-106, score-0.176]
33 ) addressed the setting where the predictive distribution of the class label given the input data differs across domains, and proposed the Predictive Distribution Matching SVM to learn a robust classifier in the target domain by leveraging the labeled data from only the relevant regions of multiple sources. [sent-110, score-0.421]
34 For each sentiment label l under document d, choose a distribution θd,l ∼ Dir(α). [sent-128, score-0.47]
35 For each word wi in document d, choose a sentiment label li ∼ Mult(πd), choose a topic zi ∼ Mult(θd,li), and choose a word wi from ϕli,zi, a multinomial distribution over words conditioned on topic zi and sentiment label li. [sent-129, score-1.177]
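As an aside for readers who want to simulate this generative story, the following is a minimal sketch using numpy. It is not the authors' implementation: the corpus sizes and hyperparameter values are arbitrary assumptions, and the document-level step πd ∼ Dir(γ) is included even though the corresponding sentence is not reproduced above.

```python
import numpy as np

# Illustrative sizes and symmetric hyperparameters (assumptions, not the paper's settings).
S, T, V = 3, 5, 1000                 # sentiment labels, topics, vocabulary size
alpha, beta, gamma = 0.1, 0.01, 0.3
rng = np.random.default_rng(0)

# Corpus-level word distributions: one multinomial phi[l, z] per (sentiment, topic) pair.
phi = rng.dirichlet(np.full(V, beta), size=(S, T))

def generate_document(n_words):
    """Generate one document by following the JST generative story."""
    pi = rng.dirichlet(np.full(S, gamma))             # per-document sentiment proportions
    theta = rng.dirichlet(np.full(T, alpha), size=S)  # theta[l]: topic proportions under sentiment l
    words, labels, topics = [], [], []
    for _ in range(n_words):
        l = rng.choice(S, p=pi)                       # sentiment label l_i ~ Mult(pi_d)
        z = rng.choice(T, p=theta[l])                 # topic z_i ~ Mult(theta_{d,l_i})
        w = rng.choice(V, p=phi[l, z])                # word w_i ~ Mult(phi_{l_i, z_i})
        words.append(w); labels.append(l); topics.append(z)
    return words, labels, topics

words, labels, topics = generate_document(50)
```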
36 Gibbs sampling was used to estimate the posterior distribution by sequentially sampling each variable of interest, zt and lt here, from its conditional distribution given the current values of all other variables and the data. [sent-130, score-0.17]
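For reference, the collapsed Gibbs update used in the JST literature takes roughly the form below (a hedged sketch with symmetric priors; the exact count notation varies between papers, and in the modified model described next the word term would use the λ-transformed β):

```latex
P(z_t = j, l_t = k \mid \mathbf{w}, \mathbf{z}_{-t}, \mathbf{l}_{-t}, \alpha, \beta, \gamma)
\;\propto\;
\frac{N_{w_t,j,k,-t} + \beta}{N_{j,k,-t} + V\beta}
\cdot
\frac{N_{j,k,d,-t} + \alpha}{N_{k,d,-t} + T\alpha}
\cdot
\frac{N_{k,d,-t} + \gamma}{N_{d,-t} + S\gamma}
```

Here N_{w,j,k} counts assignments of word w to topic j under sentiment label k, N_{j,k,d} counts such assignments within document d, and the −t subscript excludes the current token.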
37 λl,w = 1 if S(w) = l, and 0 otherwise (Equation 2), where the function S(w) returns the prior sentiment label of w in a sentiment lexicon, i.e., neutral, positive, or negative. [sent-160, score-0.849]
38 The matrix λ can be considered as a transformation matrix which modifies the Dirichlet priors β of size S × T × V, so that the word prior polarity can be captured. [sent-163, score-0.171]
39 Thus, the word “excellent” can only be drawn from the positive topic word distributions generated from a Dirichlet distribution with parameter βlpos. [sent-167, score-0.18]
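A small sketch of how such a lexicon-derived transformation can be applied is given below; the word-id lexicon is a toy assumption, and treating out-of-lexicon words as unmodified (all-ones rows of λ) is inferred from the description above rather than taken from the authors' code.

```python
import numpy as np

S, T, V = 3, 5, 1000                  # sentiment labels (0=neutral, 1=positive, 2=negative), topics, vocab
beta = np.full((S, T, V), 0.01)       # symmetric base Dirichlet priors over words

# Toy prior-polarity lexicon mapping word ids to a sentiment label (hypothetical word ids).
lexicon = {17: 1, 42: 2}              # e.g. word 17 is "excellent" (positive), word 42 is negative

# lambda[l, w] = 1 if S(w) = l, 0 otherwise; words not in the lexicon are left untouched.
lam = np.ones((S, V))
for w, l in lexicon.items():
    lam[:, w] = 0.0
    lam[l, w] = 1.0

# Element-wise transformation of the priors: a lexicon word now has a non-zero prior only
# under its own sentiment label, so "excellent" can only be drawn from positive topics.
beta_prime = lam[:, None, :] * beta   # still of size S x T x V
```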
40 In this section, we study the polarity-bearing topics extracted by JST. [sent-169, score-0.146]
41 We combined reviews from the source and target domains and discarded document labels in both domains. [sent-170, score-0.281]
42 Words in each cell are grouped under one topic and the upper half of the table shows topic words under the positive sentiment label while the lower half shows topic words under the negative sentiment label. [sent-175, score-1.126]
43 We can see that JST appears to better capture sentiment association distribution in the source and target domains. [sent-176, score-0.494]
44 set, words from the DVD domain describe a rock concert DVD while words from the Electronics domain are likely relevant to stereo amplifiers and receivers, 126 and yet they are grouped under the same topic by the JST model. [sent-178, score-0.424]
45 × set, “stainless” rarely appears in the Book domain and “interest” does not occur often in the Kitchen domain and they are grouped under the same topic. [sent-181, score-0.32]
46 These observations motivate us to explore polarity-bearing topics extracted by JST for cross-domain sentiment classification, since grouping words from different domains but bearing similar sentiment has the effect of overcoming the data distribution difference of the two domains. [sent-182, score-1.108]
47 5 Domain Adaptation using JST Given input data x and a class label y, labeled patterns of one domain can be drawn from the joint distribution P(x, y) = P(y|x)P(x). [sent-183, score-0.258]
48 The task of domain adaptation is to predict the label yit corresponding to xit in the target domain. [sent-188, score-0.341]
49 We assume that we are given two sets of training data, Ds and Dt, the source domain and target domain data sets, respectively. [sent-189, score-0.231]
50 In the multiclass classification problem, the source domain data consist of labeled instances, Ds = {(xsn; yns) ∈ X × Y : 1 ≤ n ≤ Ns}, where X is the input space and Y is a finite set of class labels. [sent-190, score-0.266]
51 The source and target domain data are first merged with document labels discarded. [sent-198, score-0.276]
52 A JST model is then learned from the merged corpus to generate polarity-bearing topics for each document. [sent-199, score-0.23]
53 The original documents in the source domain are augmented with those polarity-bearing topics as shown in Step 4 of Algorithm ? [sent-200, score-0.393]
54 , where lizi denotes a combination of sentiment label li and topic zi for word wi. [sent-202, score-0.541]
55 Finally, feature selection is performed according to the information gain criteria and a classifier is then trained from the source domain using the new document representations. [sent-203, score-0.361]
56 The target domain documents are also encoded in a similar way with polarity-bearing topics added into their feature representations. [sent-204, score-0.415]
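The information gain computation itself is standard; the sketch below scores binary presence/absence features against class labels, and both the binary treatment and any cut-off on the ranked features are assumptions rather than details stated in the text.

```python
import numpy as np

def entropy(counts):
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(X, y):
    """IG of each binary (presence/absence) feature column of X with respect to labels y."""
    X = np.asarray(X) > 0
    y = np.asarray(y)
    n = len(y)
    _, class_counts = np.unique(y, return_counts=True)
    h_y = entropy(class_counts)
    ig = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        cond = 0.0
        for mask in (X[:, j], ~X[:, j]):
            if mask.any():
                _, c = np.unique(y[mask], return_counts=True)
                cond += (mask.sum() / n) * entropy(c)
        ig[j] = h_y - cond
    return ig

# e.g. keep the top-k features by IG (the value of k is an assumption):
# keep = np.argsort(-information_gain(X_train.toarray(), y_train))[:2000]
```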
57 Input: The source domain data Ds = {(xsn; yns) ∈ X × Y : 1 ≤ n ≤ Ns}, the target domain data Dt = {xtn ∈ X : 1 ≤ n ≤ Nt}, the ? [sent-206, score-0.231]
58 Output: A sentiment classifier for the target domain Dt. 1: Merge Ds and Dt with document labels discarded, D = {xsn, 1 ≤ n ≤ Ns; xtn, 1 ≤ n ≤ Nt} 2: Train a JST model on D 3: For each document xsn = (w1, w2, . [sent-208, score-0.321]
59 . . , wm) ∈ Ds do 4: Augment the document with polarity-bearing topics generated from JST, xsn0 = (w1, w2, . . . , wm, l1z1, l2z2, . . . , lmzm) 5: Add {xsn0; yns} into a document pool B 6: end for 7: Perform feature selection using IG on B 8: Return a classifier trained on B [sent-211, score-0.382]
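Putting the algorithm together, an end-to-end sketch might look as follows; `train_jst` and the per-word (sentiment, topic) assignments it returns are placeholders for a real JST implementation, the logistic-regression classifier stands in for the ME classifier used in the experiments, and the IG step from the earlier sketch would be applied to the vectorized source documents before training.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def augment_with_jst(docs, assignments):
    """Append one pseudo-token 'l{sentiment}z{topic}' per word to each tokenized document."""
    augmented = []
    for words, pairs in zip(docs, assignments):       # pairs: [(l_i, z_i), ...] for each word
        topic_feats = [f"l{l}z{z}" for l, z in pairs]
        augmented.append(" ".join(list(words) + topic_feats))
    return augmented

def domain_adapt(source_docs, source_labels, target_docs, train_jst):
    merged = list(source_docs) + list(target_docs)    # Step 1: merge, document labels discarded
    assignments = train_jst(merged)                   # Step 2: hypothetical JST trainer
    src_aug = augment_with_jst(source_docs, assignments[:len(source_docs)])   # Steps 3-5
    tgt_aug = augment_with_jst(target_docs, assignments[len(source_docs):])
    vec = CountVectorizer()                           # BOW plus polarity-bearing topic pseudo-tokens
    X_src = vec.fit_transform(src_aug)                # (IG-based feature selection would go here)
    clf = LogisticRegression(max_iter=1000).fit(X_src, source_labels)         # Steps 7-8
    return clf, vec, clf.predict(vec.transform(tgt_aug))
```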
60 Note that the JST model directly models P(l|d), the probability of the sentiment label given a document, and hence document polarity can be classified accordingly. [sent-219, score-0.471]
61 Since JST model learning does not require the availability of document labels, it is possible to augment the source domain data by adding the most confident pseudo-labeled documents from the target domain, as labeled by the JST model, as shown in Algorithm ? [sent-220, score-0.508]
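A minimal sketch of that pseudo-labeling step is given below, assuming the trained JST model exposes P(l|d) for each target document; the default threshold value is purely illustrative (the paper uses a threshold τ whose exact value is not reproduced here).

```python
def select_pseudo_labeled(target_docs, sentiment_probs, tau=0.8):
    """Keep target documents whose most probable JST sentiment label exceeds threshold tau.

    sentiment_probs[i] is assumed to be P(l | d_i) over the sentiment labels; tau=0.8 is
    an illustrative default, not the value used in the paper.
    """
    pseudo = []
    for doc, probs in zip(target_docs, sentiment_probs):
        label = max(range(len(probs)), key=lambda l: probs[l])
        if probs[label] >= tau:
            pseudo.append((doc, label))
    return pseudo
```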
62 6 Experiments We evaluate our proposed approach on two datasets, the movie review (MR) data and the multi-domain sentiment (MDS) dataset. [sent-223, score-0.508]
63 The movie review data consist of 1000 positive and 1000 negative movie reviews drawn from the IMDB movie archive, while the multi-domain sentiment dataset contains four different types of product reviews extracted from Amazon. [sent-224, score-0.745]
64 Input: The target domain data, Dt = {xtn ∈ X : 1 ≤ n ≤ Nt}, the ? [sent-227, score-0.191]
65 The MPQA subjectivity lexicon is used as a sentiment lexicon in our experiments. [sent-231, score-0.394]
66 1 Experimental Setup While the original JST model can produce reasonable results with a simple symmetric Dirichlet prior, here we use an asymmetric prior α over the topic proportions, which is learned directly from data using a fixed-point iteration method (? [sent-233, score-0.193]
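One commonly used fixed-point update for learning such an asymmetric α from per-document topic counts is the Minka-style iteration sketched below; whether this is exactly the method cited here is an assumption, and in the JST setting `n_dk` would be the topic counts for documents under a given sentiment label.

```python
import numpy as np
from scipy.special import psi          # digamma function

def update_alpha(alpha, n_dk, n_iter=20):
    """Fixed-point updates for an asymmetric Dirichlet prior alpha (length K),
    given per-document topic counts n_dk of shape (D, K)."""
    alpha = np.asarray(alpha, dtype=float).copy()
    D = n_dk.shape[0]
    n_d = n_dk.sum(axis=1)
    for _ in range(n_iter):
        alpha_sum = alpha.sum()
        num = psi(n_dk + alpha).sum(axis=0) - D * psi(alpha)
        den = psi(n_d + alpha_sum).sum() - D * psi(alpha_sum)
        alpha = alpha * num / den
    return alpha
```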
67 2 Supervised Sentiment Classification We performed 5-fold cross validation for the performance evaluation of supervised sentiment classification. [sent-242, score-0.396]
68 For example, when T is set to 5, there are 5 topic groups under each of the positive, negative, or neutral sentiment labels, and hence there are altogether 15 feature clusters. [sent-265, score-0.571]
69 The generated topics for each document from the JST model were simply added into its bag-of-words (BOW) feature representation prior to model training. [sent-266, score-0.389]
70 shows the classification results on the five different domains by varying the number of topics from 1 to 200. [sent-269, score-0.287]
71 It can be observed that the best classification accuracy is obtained when the number of topics is set to 1 (or 3 feature clusters). [sent-270, score-0.255]
72 Nevertheless, when the number of topics is set to 15, using JST feature augmentation still outperforms ME without feature augmentation (the baseline model) in all of the domains. [sent-272, score-0.424]
73 It is worth pointing out that the JST model with a single topic becomes the standard LDA model with only three sentiment topics. [sent-273, score-0.521]
74 Nevertheless, we have proposed an effective way to incorporate domain-independent word polarity prior information into model learning. [sent-274, score-0.191]
75 ) explored different methods to automatically generate annotator rationales to improve sentiment classification accuracy. [sent-318, score-0.463]
76 ) adopted a two-stage process by first classifying sentences as personal views and impersonal views and then using an ensemble method to perform sentiment classification. [sent-328, score-0.373]
77 3 Domain Adaptation We conducted domain adaptation experiments on the MDS dataset comprising four different domains, Book (B), DVD (D), Electronics (E), and Kitchen appliances (K). [sent-334, score-0.258]
78 A classifier trained on the training set of one domain is tested on the test set of a different domain. [sent-336, score-0.168]
79 LDA results were generated from an ME classifier trained on document vectors augmented with topics generated from the LDA model. [sent-342, score-0.308]
80 JST results were obtained in a similar way except that we used the polarity-bearing topics generated from the JST model. [sent-344, score-0.229]
81 We also tested with adding pseudo-labeled examples from the JST model into the source domain for ME classifier training (following Algorithm ? [sent-345, score-0.23]
82 The document sentiment classification probability threshold τ was set to 0. [sent-350, score-0.493]
83 where the result for each domain and for each method is averaged over all three possible adaptation tasks by varying the source domain. [sent-356, score-0.319]
84 The adaptation loss is calculated with respect to the in-domain gold standard classification result. [sent-357, score-0.2]
85 We plot the classification accuracy versus different topic numbers in Figure ? [sent-404, score-0.163]
86 It can be observed that for the relatively larger Book and DVD data sets, the accuracies peaked at topic number 10, whereas for the relatively smaller Electronics and Kitchen data sets, the best performance was obtained at topic number 50. [sent-407, score-0.208]
87 Increasing the number of topics further results in a decrease in classification accuracy. [sent-408, score-0.163]
88 Manually examining the extracted polarity topics from JST reveals that when the topic number is small, each topic cluster contains well-mixed words from different domains. [sent-409, score-0.43]
89 However, when the topic number is large, words under each topic cluster tend to be dominated by a single domain. [sent-410, score-0.208]
90 our proposed approach with two other domain adaptation algorithms for sentiment classification, SCL and SFA. [sent-413, score-0.657]
91 Each set of bars represents a cross-domain sentiment classification task. [sent-414, score-0.432]
92 The thick horizontal lines are in-domain sentiment classification accuracies. [sent-415, score-0.432]
93 Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach outperforms SCL and gives similar results as SFA. [sent-460, score-0.58]
94 First, polarity-bearing topics generated by the JST model were simply added into the original feature space of documents; it is worth investigating attaching a different weight to each topic, [sent-463, score-0.374]
95 perhaps proportional to the posterior probability of sentiment label and topic given a word, as estimated by the JST model. [sent-493, score-0.537]
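As an illustration of that suggestion (not something evaluated in the paper), the weighting could accumulate posterior mass instead of adding each topic pseudo-token with weight one; the `posterior` array indexed as P(l, z | w) is a hypothetical output of the JST model.

```python
def weighted_topic_features(word_ids, lz_pairs, posterior):
    """Map 'l{l}z{z}' features to weights, accumulating the assumed posterior P(l, z | w)."""
    feats = {}
    for w, (l, z) in zip(word_ids, lz_pairs):
        key = f"l{l}z{z}"
        feats[key] = feats.get(key, 0.0) + float(posterior[w, l, z])
    return feats
```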
96 ) on both pseudo-labeled examples and source domain training examples in order to improve the adaptation performance. [sent-497, score-0.298]
97 Customizing sentiment classifiers to new domains: a case study. [sent-510, score-0.398]
98 Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. [sent-528, score-0.487]
99 Employing personal/impersonal views in supervised and semi-supervised sentiment classification. [sent-589, score-0.396]
100 A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. [sent-626, score-0.394]
wordName wordTfidf (topN-words)
[('jst', 0.743), ('sentiment', 0.373), ('topics', 0.146), ('domain', 0.144), ('dvd', 0.131), ('adaptation', 0.114), ('xtn', 0.109), ('topic', 0.104), ('lda', 0.095), ('augmentation', 0.089), ('dd', 0.078), ('electronics', 0.078), ('polarity', 0.076), ('kitchen', 0.076), ('scl', 0.076), ('movie', 0.073), ('prior', 0.067), ('book', 0.066), ('polaritybearing', 0.062), ('bb', 0.062), ('kk', 0.062), ('document', 0.061), ('domains', 0.061), ('classification', 0.059), ('spectral', 0.052), ('feature', 0.05), ('reviews', 0.048), ('exeter', 0.047), ('xsn', 0.047), ('yns', 0.047), ('target', 0.047), ('dirichlet', 0.046), ('dai', 0.045), ('mds', 0.041), ('ee', 0.04), ('source', 0.04), ('zt', 0.038), ('label', 0.036), ('review', 0.036), ('augmented', 0.035), ('sfa', 0.034), ('yessenalina', 0.034), ('denoted', 0.034), ('distribution', 0.034), ('dt', 0.033), ('grouped', 0.032), ('transfer', 0.032), ('bbaasseelliinnee', 0.031), ('ftorra', 0.031), ('hne', 0.031), ('iena', 0.031), ('iigg', 0.031), ('jjsstt', 0.031), ('mmii', 0.031), ('mumodenelt', 0.031), ('rationales', 0.031), ('sdotc', 0.031), ('seah', 0.031), ('ssccll', 0.031), ('ssffaa', 0.031), ('nevertheless', 0.03), ('ds', 0.03), ('zi', 0.028), ('documents', 0.028), ('ns', 0.028), ('priors', 0.028), ('pivots', 0.028), ('loss', 0.027), ('ln', 0.027), ('mr', 0.027), ('proposed', 0.026), ('predictive', 0.026), ('appraisal', 0.025), ('cla', 0.025), ('classifiers', 0.025), ('posterior', 0.024), ('labels', 0.024), ('classifier', 0.024), ('pool', 0.023), ('supervised', 0.023), ('labeled', 0.023), ('augmenting', 0.023), ('mult', 0.023), ('model', 0.022), ('criteria', 0.022), ('adapted', 0.022), ('daum', 0.022), ('simpler', 0.021), ('subjectivity', 0.021), ('drawn', 0.021), ('varying', 0.021), ('ach', 0.021), ('mpqa', 0.021), ('generated', 0.021), ('gibbs', 0.02), ('sampling', 0.02), ('bow', 0.02), ('chelba', 0.02), ('altogether', 0.02), ('selection', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
Author: Yulan He ; Chenghua Lin ; Harith Alani
Abstract: Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
Author: Danushka Bollegala ; David Weir ; John Carroll
Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automat- ically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.
3 0.32509303 204 acl-2011-Learning Word Vectors for Sentiment Analysis
Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts
Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semanticterm–documentinformation as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it out-performs several previously introduced methods for sentiment classification. We also introduce a large dataset , of movie reviews to serve as a more robust benchmark for work in this area.
4 0.22971819 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
Author: Oscar Tackstrom ; Ryan McDonald
Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exist naturally, and thus requires labor intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative compared to fine-grained supervision, however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account; – 569 Ryan McDonald Google, Inc., New York ryanmcd@ google com . and models that use latent variables to learn unobserved phenomena from that which can be observed. Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarsegrained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Ta¨ckstro¨m and McDonald (201 1) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines with quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to only predict the two dominant fine-grained sentiment categories well for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. 
Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models. Contrary to (generative) topic models (Mei et al., 2007; Titov and Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 569–574, Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences si are always observed. Note that there are no factors connecting the document node, yd, with the input nodes, s, so that the sentence-level variables, ys, in effect form a bottleneck between the document sentiment and the input sentences. McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient based estimation. The former models are largely orthogonal to the one we propose in this work and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained taskspecific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model. 1.1 Preliminaries Let d be a document consisting of n sentences, s = (si)in=1, with a document–sentence-sequence pair denoted d = (d, s). Let yd = (yd, ys) denote random variables1 the document level sentiment, yd, and the sequence of sentence level sentiment, = (ysi)in=1 . – ys 1We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments. 570 In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, DF = {(dj, and a large set of ydj)}jm=f1, coarsely labeled instances DC = {(dj, yjd)}jm=fm+fm+c1. Furthermore, we assume that yd and all yis take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization pθ(yd,ys|s) = expnhφ(yd,ys,s),θi − Aθ(s)o
5 0.22181515 292 acl-2011-Target-dependent Twitter Sentiment Classification
Author: Long Jiang ; Mo Yu ; Ming Zhou ; Xiaohua Liu ; Tiejun Zhao
Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-ofthe-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification. 1
6 0.22010523 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
7 0.21631838 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
8 0.19188358 253 acl-2011-PsychoSentiWordNet
9 0.18771864 105 acl-2011-Dr Sentiment Knows Everything!
10 0.18126659 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
11 0.16829285 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?
12 0.16353247 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages
13 0.1626841 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
14 0.16027078 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs
15 0.12798584 52 acl-2011-Automatic Labelling of Topic Models
16 0.12317549 161 acl-2011-Identifying Word Translations from Comparable Corpora Using Latent Topic Models
17 0.12259737 117 acl-2011-Entity Set Expansion using Topic information
18 0.1187626 82 acl-2011-Content Models with Attitude
19 0.11426165 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application
20 0.1110779 178 acl-2011-Interactive Topic Modeling
topicId topicWeight
[(0, 0.209), (1, 0.307), (2, 0.257), (3, -0.03), (4, 0.079), (5, -0.054), (6, -0.073), (7, 0.112), (8, 0.005), (9, 0.026), (10, 0.114), (11, -0.025), (12, 0.051), (13, 0.036), (14, 0.177), (15, 0.101), (16, -0.036), (17, -0.015), (18, -0.006), (19, 0.02), (20, -0.017), (21, -0.047), (22, -0.02), (23, 0.009), (24, -0.004), (25, 0.039), (26, -0.077), (27, 0.013), (28, 0.041), (29, 0.014), (30, -0.086), (31, 0.048), (32, 0.01), (33, -0.05), (34, -0.088), (35, 0.003), (36, -0.087), (37, 0.01), (38, -0.048), (39, 0.009), (40, 0.017), (41, -0.038), (42, -0.026), (43, 0.043), (44, -0.024), (45, 0.047), (46, -0.006), (47, 0.018), (48, -0.052), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.96144652 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
Author: Yulan He ; Chenghua Lin ; Harith Alani
Abstract: Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
Author: Danushka Bollegala ; David Weir ; John Carroll
Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automat- ically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.
3 0.8773163 204 acl-2011-Learning Word Vectors for Sentiment Analysis
Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts
Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semanticterm–documentinformation as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it out-performs several previously introduced methods for sentiment classification. We also introduce a large dataset , of movie reviews to serve as a more robust benchmark for work in this area.
4 0.83428937 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
Author: Oscar Tackstrom ; Ryan McDonald
Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exist naturally, and thus requires labor intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative compared to fine-grained supervision, however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account; – 569 Ryan McDonald Google, Inc., New York ryanmcd@ google com . and models that use latent variables to learn unobserved phenomena from that which can be observed. Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarsegrained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Ta¨ckstro¨m and McDonald (201 1) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines with quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to only predict the two dominant fine-grained sentiment categories well for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. 
Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models. Contrary to (generative) topic models (Mei et al., 2007; Titov and Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 569–574, Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences si are always observed. Note that there are no factors connecting the document node, yd, with the input nodes, s, so that the sentence-level variables, ys, in effect form a bottleneck between the document sentiment and the input sentences. McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient based estimation. The former models are largely orthogonal to the one we propose in this work and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained taskspecific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model. 1.1 Preliminaries Let d be a document consisting of n sentences, s = (si)in=1, with a document–sentence-sequence pair denoted d = (d, s). Let yd = (yd, ys) denote random variables1 the document level sentiment, yd, and the sequence of sentence level sentiment, = (ysi)in=1 . – ys 1We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments. 570 In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, DF = {(dj, and a large set of ydj)}jm=f1, coarsely labeled instances DC = {(dj, yjd)}jm=fm+fm+c1. Furthermore, we assume that yd and all yis take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization pθ(yd,ys|s) = expnhφ(yd,ys,s),θi − Aθ(s)o
5 0.69097698 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua
Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application ofdocumentlevel sentiment classification, and improve the performance significantly.
6 0.68475556 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
7 0.67579454 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages
8 0.66410303 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
9 0.66005385 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?
10 0.62806332 292 acl-2011-Target-dependent Twitter Sentiment Classification
11 0.61543912 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
12 0.58200163 82 acl-2011-Content Models with Attitude
13 0.5768019 253 acl-2011-PsychoSentiWordNet
14 0.56731313 105 acl-2011-Dr Sentiment Knows Everything!
15 0.5528754 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs
16 0.53386474 109 acl-2011-Effective Measures of Domain Similarity for Parsing
17 0.50716215 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
18 0.46572086 287 acl-2011-Structural Topic Model for Latent Topical Structure Analysis
19 0.43616864 55 acl-2011-Automatically Predicting Peer-Review Helpfulness
20 0.43204144 117 acl-2011-Entity Set Expansion using Topic information
topicId topicWeight
[(5, 0.013), (17, 0.052), (26, 0.015), (36, 0.202), (37, 0.203), (39, 0.065), (41, 0.048), (53, 0.011), (55, 0.02), (59, 0.035), (72, 0.028), (88, 0.044), (91, 0.033), (96, 0.132)]
simIndex simValue paperId paperTitle
same-paper 1 0.84567106 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
Author: Yulan He ; Chenghua Lin ; Harith Alani
Abstract: Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
2 0.8435117 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search
Author: Zhuowei Bao ; Benny Kimelfeld ; Yunyao Li
Abstract: Spelling correction for keyword-search queries is challenging in restricted domains such as personal email (or desktop) search, due to the scarcity of query logs, and due to the specialized nature of the domain. For that task, this paper presents an algorithm that is based on statistics from the corpus data (rather than the query log). This algorithm, which employs a simple graph-based approach, can incorporate different types of data sources with different levels of reliability (e.g., email subject vs. email body), and can handle complex spelling errors like splitting and merging of words. An experimental study shows the superiority of the algorithm over existing alternatives in the email domain.
Author: Danushka Bollegala ; David Weir ; John Carroll
Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automat- ically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.
4 0.78902447 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System
Author: Jui-Yu Weng ; Cheng-Lun Yang ; Bo-Nian Chen ; Yen-Kai Wang ; Shou-De Lin
Abstract: This paper presents a system to summarize a Microblog post and its responses with the goal to provide readers a more constructive and concise set of information for efficient digestion. We introduce a novel two-phase summarization scheme. In the first phase, the post plus its responses are classified into four categories based on the intention, interrogation, sharing, discussion and chat. For each type of post, in the second phase, we exploit different strategies, including opinion analysis, response pair identification, and response relevancy detection, to summarize and highlight critical information to display. This system provides an alternative thinking about machinesummarization: by utilizing AI approaches, computers are capable of constructing deeper and more user-friendly abstraction. 1
5 0.77812892 122 acl-2011-Event Extraction as Dependency Parsing
Author: David McClosky ; Mihai Surdeanu ; Christopher Manning
Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause a “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with a F1 score of 53.5% in development and 48.6% in testing.
6 0.77530694 334 acl-2011-Which Noun Phrases Denote Which Concepts?
7 0.77472824 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
8 0.77259803 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
9 0.77078104 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
10 0.76843935 204 acl-2011-Learning Word Vectors for Sentiment Analysis
11 0.76769006 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
12 0.76315236 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
13 0.75604647 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
14 0.75550914 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
15 0.75378084 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
16 0.75316083 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
17 0.75160807 85 acl-2011-Coreference Resolution with World Knowledge
18 0.74966705 256 acl-2011-Query Weighting for Ranking Model Adaptation
19 0.74844205 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?
20 0.7482456 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation