emnlp emnlp2013 emnlp2013-170 knowledge-graph by maker-knowledge-mining

170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet


Source: pdf

Author: Marco Guerini ; Lorenzo Gatti ; Marco Turchi

Abstract: Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words’ prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. [sent-5, score-0.25]

2 In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. [sent-7, score-0.416]

3 Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words’ prior polarity for sentiment analysis. [sent-8, score-0.712]

4 We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered. [sent-9, score-0.484]

5 1 Introduction Many approaches to sentiment analysis make use of lexical resources i. [sent-10, score-0.168]

6 lists of positive and negative words often deployed as baselines or as features for other methods (usually machine learning based) for sentiment analysis research (Liu and Zhang, 2012). [sent-12, score-0.232]

7 For example, wonderful has a positive connotation prior polarity while horrible has a negative one. [sent-16, score-0.48]

8 eu needing deep semantic analysis or word sense disambiguation to assign an affective score to a word and are domain independent (they are thus less precise but more portable). [sent-20, score-0.166]

9 Given that SWN provides polarity scores for each word sense (also called ‘posterior polarities’), it is necessary to derive prior polarities from the posteriors. [sent-22, score-0.715]

10 For example, the word cold has a posterior polarity for the meaning “having a low temperature”, like in “cold beer”, that is different from the one in “cold person”, which refers to “being emotionless”. [sent-23, score-0.487]

11 This information must be considered when reconstructing the prior polarity of cold. [sent-24, score-0.384]

12 Several formulae to compute prior polarities starting from posterior polarity scores have been used in the literature. [sent-25, score-1.162]

13 We show that researchers have not paid sufficient attention to this posterior-to-prior polarity issue. [sent-27, score-0.27]

14 On top of this, we attempt to outperform the state-of-the-art formula using a learning framework that combines the various formulae together. [sent-29, score-0.503]

15 In Section 2 we briefly describe our approach and how it differentiates from similar sentiment analysis tasks. [sent-35, score-0.136]

16 Then, in Sections 3 and 4, we present SentiWordNet and overview various posterior-to-prior polarity formulae based on this resource that appeared in the literature (including some new ones we identified as potentially relevant). [sent-36, score-0.781]

17 Finally, in the two last sections, we present a series of experiments, both in regression and classification tasks, that give an answer to the aforementioned research questions. [sent-39, score-0.192]

18 2 Proposed Approach In the broad field of Sentiment Analysis we will focus on the specific problem of posterior-to-prior polarity assessment, using both regression and classification experiments. [sent-41, score-0.397]

19 For the regression task, we tackled the problem of assigning affective scores (along a continuum between -1 and 1) to words using the posterior-to-prior polarity formulae. [sent-43, score-0.544]

20 In these experiments we will also use a learning framework which combines the various formulae together. [sent-45, score-0.435]

21 the posterior polarities provided by SWN combined in various ways), we can give a better prediction. [sent-48, score-0.322]

22 The regression task is harder than binary classification, since we want to assess not only that pretty, beautiful and gorgeous are positive words, but also to define a partial or total order so that gorgeous is more positive than beautiful which, in turn, is more positive than pretty. [sent-49, score-0.268]

23 This is fundamental for tasks such as affective modification of existing texts, where words’ polarity together with their score are necessary for creating multiple graded variations of the original text (Guerini et al. [sent-50, score-0.388]

24 Some of the work that addresses the problem of sentiment strength are presented in (Wilson et al. [sent-52, score-0.184]

25 , 2010), however, their approach is modeled as a multi-class classification problem (neutral, low, medium or high sentiment) at the sentence level, rather than a regression problem at the word level. [sent-54, score-0.192]

26 On the other hand, even if approaches that go beyond pure prior polarities e. [sent-58, score-0.376]

27 using word bigram features (Wang and Manning, 2012) are better for sentiment analysis tasks, there are tasks that are intrinsically based on the notion of words’ prior polarity. [sent-60, score-0.25]

28 For example Mitsubishi changed the name of one of its SUV for the Spanish market, since the original name Pajero had a very negative prior polarity, as it meant ‘wanker’ in Spanish (Piller, 2003). [sent-64, score-0.163]

29 To our knowledge, the only work trying to address the SWN posterior-to-prior polarity issue, comparing some of the approaches that appeared in the literature, is (Gatti and Guerini, 2012). [sent-65, score-0.3]

30 However, in our previous study we only considered a regression framework, we did not use machine learning and we only tested SWN1. [sent-66, score-0.127]

31 These scores – automatically assigned starting from a bunch of seed terms – represent the positive and negative valence (or posterior polarity) of the synset and are inherited by each lemma-PoS in the synset. [sent-70, score-0.302]

32 In Table 1, the first 5 senses of cold#a present all possible combinations, including mixed scores (cold#a#4), where positive and negative valences are assigned to the same sense. [sent-73, score-0.247]

33 In SWN3 the annotation algorithm used in SWN1 was revised, leading to an increase in the accuracy of posterior polarities over the previous version. [sent-90, score-0.322]

34 4 Prior Polarities Formulae In this section we review the main strategies for computing prior polarities used in previous studies. [sent-91, score-0.376]

35 All the proposed approaches try to estimate the prior polarity score from the posterior polarities of all the senses for a single lemma-PoS. [sent-92, score-0.828]

36 Given a lemma-PoS with n senses (lemma#PoS#n), every formula f is independently applied to all the Pos(s) and Neg(s). [sent-93, score-0.19]

37 To obtain a final prior polarity that ranges from -1 to 1, the negative sign is imposed. [sent-97, score-0.433]

38 So, considering the first 5 senses of cold#a in Table 1, f(posScore) will be derived from the Pos(s) values <0. [sent-98, score-0.179]

39 Then, the final polarity strength returned will be either fm or fd. [sent-109, score-0.364]
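
A minimal sketch of how two aggregated scores could be folded into one signed value in [-1, 1]. The definitions of fm and fd are not reproduced in this summary, so the code assumes that fm keeps whichever of f(posScore) and f(negScore) is larger (negated when the negative side wins) and that fd takes their difference. The function names and the tie-breaking rule are hypothetical.

```python
def combine_max(pos: float, neg: float) -> float:
    """fm-style combination (assumed): keep the stronger side.

    Both inputs are non-negative strengths; the result is negated
    when the negative side is the stronger one. Ties go to positive,
    which is an arbitrary choice made for this sketch.
    """
    return pos if pos >= neg else -neg


def combine_diff(pos: float, neg: float) -> float:
    """fd-style combination (assumed): positive minus negative strength."""
    return pos - neg


if __name__ == "__main__":
    # Illustrative values only, not taken from SentiWordNet.
    print(combine_max(0.25, 0.5), combine_diff(0.25, 0.5))
```

Under these assumptions the max variant preserves the magnitude of the dominant side, while the difference variant shrinks the score of words whose senses pull in both directions.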

40 The formulae (f) we tested are the following: fs. [sent-110, score-0.435]

41 It calculates the mean of the positive and negative scores for all the senses of the given lemma#PoS. [sent-118, score-0.247]
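
The mean-based formula described above can be sketched as follows. The sense scores are made-up illustrative numbers, not values from SentiWordNet, and the function name is hypothetical.

```python
from statistics import fmean


def mean_scores(pos_scores, neg_scores):
    """Average the positive and the negative posterior scores separately.

    Each list holds one score per sense of the same lemma#PoS.
    Returns a (mean_pos, mean_neg) pair of non-negative strengths.
    """
    return fmean(pos_scores), fmean(neg_scores)


if __name__ == "__main__":
    # Four hypothetical senses of one lemma#PoS.
    pos = [0.0, 0.5, 0.25, 0.25]
    neg = [0.5, 0.0, 0.0, 0.0]
    print(mean_scores(pos, neg))
```

Every sense contributes equally here, so a rare but strongly polarized sense weighs as much as the most frequent one.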

42 , 2009), it considers only those senses that have a Pos(s) greater than or equal to the corresponding Neg(s), and greater than 0 (the stronglyPos set). [sent-124, score-0.179]
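
The selection rule for the stronglyPos set can be sketched as a simple filter. Representing each sense as a (pos, neg) tuple is an assumption of this sketch, and the summary does not show what is done with the set afterwards, so only the filtering step is illustrated.

```python
def strongly_pos(senses):
    """Keep the senses whose Pos(s) is >= Neg(s) and also > 0.

    `senses` is a list of (pos, neg) score pairs, one per sense
    (a hypothetical representation chosen for this example).
    """
    return [(pos, neg) for pos, neg in senses if pos >= neg and pos > 0]


if __name__ == "__main__":
    # Illustrative scores only.
    senses = [(0.5, 0.25), (0.0, 0.0), (0.25, 0.5), (0.25, 0.25)]
    print(strongly_pos(senses))
```

The second condition matters for senses scored (0, 0): they satisfy Pos(s) >= Neg(s) but carry no positive evidence, so they are excluded.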

43 This formula weighs each sense with a geometric series of ratio 1/2. [sent-131, score-0.148]

44 The rationale behind this choice is based on the assumption that more frequent senses should bear more “affective weight” than rare senses when computing the prior polarity of a word. [sent-132, score-0.628]

45 Similar to the previous one, this formula weighs each lemma with a harmonic series, see for example (Denecke, 2008). [sent-135, score-0.236]
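
A rough sketch of the two sense-rank weightings. It assumes the senses are listed from most to least frequent, that the geometric weights are 1, 1/2, 1/4, ... and the harmonic weights 1, 1/2, 1/3, ..., and that the weighted sum is normalized by the sum of the weights. The normalization is a guess made for this example and is not confirmed by the summary.

```python
def geometric_weights(n):
    """Weights 1, 1/2, 1/4, ... for n senses (ratio 1/2)."""
    return [1 / 2**i for i in range(n)]


def harmonic_weights(n):
    """Weights 1, 1/2, 1/3, ... for n senses."""
    return [1 / (i + 1) for i in range(n)]


def weighted_score(scores, weights):
    """Weighted average of per-sense scores (normalization is assumed)."""
    if not scores:
        return 0.0
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    pos = [0.5, 0.0, 0.25]  # illustrative scores, first sense = most frequent
    print(weighted_score(pos, geometric_weights(len(pos))))
    print(weighted_score(pos, harmonic_weights(len(pos))))
```

The harmonic weights decay more slowly than the geometric ones, so later senses retain more influence under the second scheme.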

46 On top of these formulae, we implemented some new formulae that were relevant to our task and have not been implemented before. [sent-136, score-0.435]

47 These formulae mimic the ones discussed previously, but they are built under a different assumption: that the saliency (Giora, 1997) of a word’s prior polarity might be more related to its posterior polarities score, rather than to sense frequencies. [sent-137, score-1.189]

48 Like w1 and w2, but senses are ordered by strength (sorting Pos(s) and Neg(s) independently). [sent-151, score-0.17]
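
The strength-ordered variant can be sketched by sorting the scores before weighting them. The geometric ratio, the normalization by the sum of weights, and the function name are assumptions of this example. The function would be called once on the Pos(s) values and once on the Neg(s) values, since the two lists are sorted independently.

```python
def strength_ordered_score(scores, ratio=0.5):
    """Weight scores by rank after sorting them from strongest to weakest.

    The strongest score gets weight 1, the next `ratio`, then `ratio**2`,
    and so on. Dividing by the sum of weights is an assumed normalization.
    """
    if not scores:
        return 0.0
    ordered = sorted(scores, reverse=True)
    weights = [ratio**i for i in range(len(ordered))]
    return sum(w * s for w, s in zip(weights, ordered)) / sum(weights)


if __name__ == "__main__":
    # Illustrative scores; the original sense order no longer matters.
    print(strength_ordered_score([0.0, 1.0]))
    print(strength_ordered_score([1.0, 0.0]))
```

Because of the sort, the result depends only on the multiset of scores, which is what distinguishes this family from the frequency-ordered weightings.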

49 median: return the median of the senses ordered by polarity score. [sent-157, score-0.424]

50 Finally, we implemented two variants of a prior polarity random baseline to assess possible advantages of approaches using SWN: rnd. [sent-159, score-0.384]

51 In classification SVMs use the geometric mean to discriminate between the positive and negative classes, while the GP model uses the posterior probability distribution over each class. [sent-176, score-0.221]

52 Both frameworks support learning algorithms for regression and classification. [sent-177, score-0.127]

53 GP regression models with Gaussian noise are a rare exception where the exact inference with likelihood functions is tractable, see §2 in (Rasmussen and Williams, 2006). [sent-182, score-0.127]

54 2) and the linear logistic (lll) and probit regression (prl) likelihood functions are evaluated in classification. [sent-186, score-0.159]

55 In our classification experiments we tried all possible combinations of kernels and likelihood functions, while in the regression tests we ranged only on different kernels. [sent-187, score-0.226]

56 Since the prior polarities formulae tend to cluster in groups that provide similar results (Gatti and Guerini, 2012), creating noise for the learner, we want to understand whether feature selection approaches can boost the performance of SVMs. [sent-195, score-0.843]

57 For this reason, we also test feature selection prior to the SVM training. [sent-196, score-0.146]

58 Re-sampling of the training data is performed several times and a Lasso regression model is fit on each sample. [sent-198, score-0.127]

59 6 Gold Standards To assess how well prior polarity formulae perform, a gold standard with word polarities provided by human annotators is needed. [sent-206, score-1.081]

60 , 2005) uses a similar binomial annotation for single words; another interesting resource is WordNetAffect (Strapparava and Valitutti, 2004) but it labels words senses and it cannot be used for the prior polarity validation task. [sent-215, score-0.552]

61 In the following we describe in detail the two resources we used for our experiments, namely ANEW for the regression experiments and the General Inquirer (GI) for the classification ones. [sent-216, score-0.224]

62 no context was provided) this resource represents a human validation of prior polarities scores for the given words, and can be used as a gold standard. [sent-223, score-0.451]

63 For this paper we consider the Positiv and Negativ categories (1,915 words the former, 2,291 words the latter, for a total of 4,206 affective words). [sent-232, score-0.176]

64 7 Experiments In order to use the ANEW dataset to measure prior polarities formulae performance, we had to assign a PoS to all the words to obtain the SWN lemma#PoS format. [sent-233, score-0.811]

65 , 2008) and check if the lemma is present instead. [sent-235, score-0.136]

66 Note that a lemma can have more than one PoS, for example, writer is present only as a noun (writer#n), while yellow is present as a verb, a noun and an adjective (yellow#v, yellow#n, yellow#a). [sent-240, score-0.168]

67 For each lemma#PoS in GI and ANEW, we then applied the prior polarity formulae described in Section 4, using both SWN1 and SWN3 and annotated the results. [sent-250, score-0.819]

68 According to the nature of the human labels (real numbers or -1/1), we ran several regression and classification experiments. [sent-251, score-0.192]

69 To evaluate the performance of our regression experiments on ANEW we used the Mean Absolute Error (MAE), which averages the absolute error over a given test set. [sent-256, score-0.127]
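
For reference, MAE over a test set can be computed as in the sketch below. The gold and predicted values are illustrative numbers on a -1 to 1 scale, not data from ANEW, and the function name is hypothetical.

```python
def mean_absolute_error(gold, predicted):
    """Average of |gold_i - predicted_i| over all test items."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    gold = [0.5, -0.5]       # hypothetical human ratings
    predicted = [0.0, 0.0]   # hypothetical prior polarity estimates
    print(mean_absolute_error(gold, predicted))
```

Lower values are better, and an error of 0 means every predicted score matches the human rating exactly.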

70 In the following sections, to check if there was a statistically significant difference in the results, we used Student’s t-test for regression experiments, while an approximate randomization test (Yeh, 2000) was used for the classification experiments. [sent-260, score-0.192]
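
A minimal sketch of an approximate randomization test for two classifiers evaluated on the same items. The test statistic (difference in number of correct predictions), the number of trials, and all function names are assumptions of this example, not details taken from the paper.

```python
import random


def count_correct(gold, predicted):
    """Number of items where the prediction equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted))


def approximate_randomization(gold, pred_a, pred_b, trials=1000, seed=0):
    """Estimate a p-value for the difference between two systems.

    In each trial the two systems' predictions are swapped item by item
    with probability 0.5, and the absolute difference in correct counts
    is compared with the observed one.
    """
    rng = random.Random(seed)
    observed = abs(count_correct(gold, pred_a) - count_correct(gold, pred_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        swapped_a, swapped_b = [], []
        for a, b in zip(pred_a, pred_b):
            if rng.random() < 0.5:
                a, b = b, a
            swapped_a.append(a)
            swapped_b.append(b)
        diff = abs(count_correct(gold, swapped_a) - count_correct(gold, swapped_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)


if __name__ == "__main__":
    gold = [1, -1, 1, 1, -1, -1]
    system_a = [1, -1, 1, 1, -1, 1]
    system_b = [1, 1, -1, 1, 1, 1]
    print(approximate_randomization(gold, system_a, system_b))
```

Integer counts are used instead of accuracies so that the comparison with the observed difference is exact rather than subject to floating-point rounding.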

71 In Tables 2 and 3, the results of regression experiments over the ANEW dataset, using SWN1 and SWN3, are presented. [sent-261, score-0.127]

72 Note that for classification we report the generics f and not the fm and fd variants. [sent-264, score-0.162]

73 So, using SWN for posterior-to-prior polarity computation brings benefits, since it increases the performance above the baseline in words’ prior polarity assessment. [sent-270, score-0.654]

74 The formulae described in Section 4 have very different results, along a continuum. [sent-285, score-0.435]

75 Furthermore, the new formulae we introduced, based on the “posterior polarities saliency” hypothesis, proved to be among the best performing in all experiments. [sent-289, score-0.735]

76 This entails that there is room for inspecting new formulae variants other than those already proposed in the literature. [sent-290, score-0.435]

77 On a side note, the approaches that rely on only one sense polarity (namely fs, median and max) have similar results which do not differ significantly from swnrnd (for maxm, fsd and fsm in Table 2, and for maxm in Table 3). [sent-292, score-0.495]

78 001); in Table 3, fs, max and median in both their fm and fd variants are significantly different from the best performing w2nm (p < 0. [sent-295, score-0.167]

79 For classification, in Table 4 and 5 the difference between the corresponding best performing formula and the single senses formulae is always significant (at least p < 0. [sent-297, score-0.663]

80 Combining the formulae in a learning framework further improves the results over the best performing formulae, both in regression (MAEµ with SWN1 0. [sent-302, score-0.6]

81 There is no significant difference in using linear logistic and probit regression likelihoods. [sent-326, score-0.159]

82 This is not surprising due to the high level of redundancy in the formulae scores. [sent-328, score-0.435]

83 Interestingly, inspecting the most frequently selected features by SVMfs, we see that features from different groups are selected, and even the worst performing formulae can add information, confirming the idea that viewing the same information from different perspectives (i. [sent-329, score-0.473]

84 the posterior polarities provided by SWN combined in various ways) can give better predictions. [sent-331, score-0.322]

85 In Table 6 we report the results for the best performing formulae and learning algorithm on the GI PoS classes. [sent-334, score-0.473]

86 Instead, in Table 8, which displays the results along gender and polarity dimensions, there is no statistically significant difference in MAE on positive words between male and female, while there is a strong statistical significance for negative words (p < 0. [sent-366, score-0.459]

87 Interestingly, there is also a large difference between positive and negative affective words (both for male and female dimensions). [sent-368, score-0.325]

88 This difference is maximum for male scores on positive words compared to female scores on negative words (0. [sent-369, score-0.265]

89 (2013) inspected the differences in prior polarity assessment due to gender. [sent-375, score-0.384]

90 At this stage we can only note that prior polarities calculated with SWN are closer to ANEW male annotations than female ones. [sent-376, score-0.487]

91 10 Conclusions We have presented a study on the posterior-to-prior polarity issue, i. [sent-390, score-0.27]

92 the problem of computing words’ prior polarity starting from their posterior polarities. [sent-392, score-0.444]

93 Indeed, we showed that the better variants outperform the others on different datasets both in regression and classification tasks, and that they can represent a fairer state-of-the-art baseline approach using SentiWordNet. [sent-394, score-0.23]

94 On top of this, we also showed that these state-of-the-art formulae can be further outperformed using a learning framework that combines the various formulae together. [sent-395, score-0.87]

95 Using syntactic and contextual information for sentiment polarity analysis. [sent-418, score-0.406]

96 0: An enhanced lexical resource for sentiment analysis and opinion mining. [sent-426, score-0.217]

97 Sentiment polarity identification in financial news: A cohesion-based approach. [sent-479, score-0.27]

98 Sentence level subjectivity and sentiment analysis experiments in NTCIR-7 MOAT challenge. [sent-601, score-0.136]

99 Development of a novel algorithm for sentiment analysis based on adverb-adjective-noun combinations. [sent-625, score-0.136]

100 Baselines and bigrams: Simple, good sentiment and topic classification. [sent-669, score-0.136]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('formulae', 0.435), ('swn', 0.323), ('polarity', 0.27), ('polarities', 0.262), ('anew', 0.256), ('sentiwordnet', 0.18), ('cold', 0.157), ('negscore', 0.145), ('sentiment', 0.136), ('lemma', 0.136), ('posscore', 0.129), ('regression', 0.127), ('gps', 0.126), ('senses', 0.122), ('affective', 0.118), ('prior', 0.114), ('guerini', 0.113), ('inquirer', 0.09), ('rasmussen', 0.084), ('valence', 0.083), ('mae', 0.083), ('gi', 0.082), ('neviarouskaya', 0.081), ('neg', 0.079), ('svms', 0.073), ('williams', 0.069), ('formula', 0.068), ('classification', 0.065), ('gatti', 0.065), ('lasswell', 0.065), ('maxm', 0.065), ('zbal', 0.065), ('pos', 0.062), ('posterior', 0.06), ('male', 0.059), ('os', 0.057), ('trento', 0.055), ('female', 0.052), ('strapparava', 0.051), ('fd', 0.051), ('negative', 0.049), ('fsm', 0.048), ('sommarive', 0.048), ('strength', 0.048), ('sense', 0.048), ('positive', 0.047), ('resource', 0.046), ('fm', 0.046), ('esuli', 0.045), ('gaussian', 0.044), ('gp', 0.042), ('dunphy', 0.042), ('mfs', 0.042), ('lemmatize', 0.042), ('emotion', 0.039), ('povo', 0.038), ('fairer', 0.038), ('rnd', 0.038), ('lrec', 0.038), ('harvard', 0.038), ('performing', 0.038), ('annotator', 0.037), ('bradley', 0.036), ('opinion', 0.035), ('gender', 0.034), ('kernels', 0.034), ('synset', 0.034), ('stone', 0.034), ('standards', 0.034), ('median', 0.032), ('arousal', 0.032), ('blending', 0.032), ('chowdhury', 0.032), ('denecke', 0.032), ('devitt', 0.032), ('fbk', 0.032), ('fsd', 0.032), ('meinshausen', 0.032), ('paltoglou', 0.032), ('pianta', 0.032), ('probit', 0.032), ('stronglypos', 0.032), ('textpro', 0.032), ('thet', 0.032), ('turchi', 0.032), ('warriner', 0.032), ('weighs', 0.032), ('yel', 0.032), ('resources', 0.032), ('selection', 0.032), ('pages', 0.031), ('literature', 0.03), ('revised', 0.029), ('iv', 0.029), ('wilson', 0.029), ('tables', 0.029), ('scores', 0.029), ('italy', 0.028), ('sv', 0.028), ('sebastiani', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

2 0.16516547 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky

Abstract: Different demographics, e.g., gender or age, can demonstrate substantial variation in their language use, particularly in informal contexts such as social media. In this paper we focus on learning gender differences in the use of subjective language in English, Spanish, and Russian Twitter data, and explore cross-cultural differences in emoticon and hashtag use for male and female users. We show that gender differences in subjective language can effectively be used to improve sentiment analysis, and in particular, polarity classification for Spanish and Russian. Our results show statistically significant relative F-measure improvement over the gender-independent baseline 1.5% and 1% for Russian, 2% and 0.5% for Spanish, and 2.5% and 5% for English for polarity and subjectivity classification.

3 0.16316837 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguistically informed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

4 0.16163754 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

Author: Shi Feng ; Le Zhang ; Binyang Li ; Daling Wang ; Ge Yu ; Kam-Fai Wong

Abstract: Extensive experiments have validated the effectiveness of the corpus-based method for classifying the word’s sentiment polarity. However, no work is done for comparing different corpora in the polarity classification task. Nowadays, Twitter has aggregated huge amount of data that are full of people’s sentiments. In this paper, we empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. Experiment results show that the Twitter data can achieve a much better performance than the Google, Web1T and Wikipedia based methods.

5 0.099943437 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

Author: Daniel Preotiuc-Pietro ; Trevor Cohn

Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.

6 0.087082475 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

7 0.084526964 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

8 0.076786362 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

9 0.071794972 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation

10 0.070844561 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations

11 0.067282625 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts

12 0.064655855 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

13 0.061155431 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

14 0.05541236 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

15 0.05454509 138 emnlp-2013-Naive Bayes Word Sense Induction

16 0.054169618 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition

17 0.053115748 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

18 0.049304523 121 emnlp-2013-Learning Topics and Positions from Debatepedia

19 0.046797272 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

20 0.045867421 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.167), (1, 0.054), (2, -0.163), (3, -0.176), (4, 0.068), (5, -0.042), (6, -0.012), (7, -0.13), (8, 0.057), (9, 0.079), (10, -0.019), (11, 0.006), (12, -0.005), (13, 0.021), (14, 0.009), (15, -0.019), (16, -0.041), (17, 0.048), (18, 0.088), (19, 0.089), (20, 0.036), (21, 0.034), (22, -0.012), (23, 0.039), (24, 0.029), (25, -0.078), (26, -0.09), (27, -0.066), (28, 0.002), (29, -0.025), (30, 0.064), (31, 0.076), (32, -0.137), (33, -0.069), (34, 0.035), (35, 0.044), (36, 0.057), (37, -0.018), (38, -0.036), (39, -0.112), (40, 0.063), (41, 0.008), (42, -0.048), (43, 0.094), (44, -0.063), (45, 0.036), (46, 0.014), (47, -0.003), (48, -0.026), (49, -0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92992443 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

2 0.69290954 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

Author: Thomas Scholz ; Stefan Conrad

Abstract: A very valuable piece of information in newspaper articles is the tonality of extracted statements. For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). To this end, we will compare several state-of-the-art approaches for Opinion Mining in newspaper articles in this paper. Furthermore, we will introduce a new technique to extract entropy-based word connections which identifies the word combinations which create a tonality. In the evaluation, we use two different corpora consisting of news articles, by which we show that the new approach achieves better results than the four state-of-the-art methods.

3 0.66661602 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky

Abstract: Different demographics, e.g., gender or age, can demonstrate substantial variation in their language use, particularly in informal contexts such as social media. In this paper we focus on learning gender differences in the use of subjective language in English, Spanish, and Russian Twitter data, and explore cross-cultural differences in emoticon and hashtag use for male and female users. We show that gender differences in subjective language can effectively be used to improve sentiment analysis, and in particular, polarity classification for Spanish and Russian. Our results show statistically significant relative F-measure improvement over the gender-independent baseline 1.5% and 1% for Russian, 2% and 0.5% for Spanish, and 2.5% and 5% for English for polarity and subjectivity classification.

4 0.66070741 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguistically informed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

5 0.65613043 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

Author: Shi Feng ; Le Zhang ; Binyang Li ; Daling Wang ; Ge Yu ; Kam-Fai Wong

Abstract: Extensive experiments have validated the effectiveness of the corpus-based method for classifying a word's sentiment polarity. However, no work has been done on comparing different corpora in the polarity classification task. Nowadays, Twitter has aggregated a huge amount of data that is full of people's sentiments. In this paper, we empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. Experiment results show that the Twitter data can achieve a much better performance than the Google, Web1T and Wikipedia based methods.

6 0.64770216 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation

7 0.59961355 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

8 0.49834064 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions

9 0.4765397 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

10 0.45556012 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations

11 0.43866664 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

12 0.42873594 123 emnlp-2013-Learning to Rank Lexical Substitutions

13 0.41193733 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

14 0.40036941 28 emnlp-2013-Automated Essay Scoring by Maximizing Human-Machine Agreement

15 0.36626962 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

16 0.34226686 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition

17 0.34199321 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

18 0.34029409 184 emnlp-2013-This Text Has the Scent of Starbucks: A Laplacian Structured Sparsity Model for Computational Branding Analytics

19 0.33390841 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs

20 0.32488197 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.032), (10, 0.013), (18, 0.033), (22, 0.042), (30, 0.067), (50, 0.019), (51, 0.172), (66, 0.066), (71, 0.049), (74, 0.321), (75, 0.016), (77, 0.019), (90, 0.016), (96, 0.023), (97, 0.016)]
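
The listing above is a sparse topic vector: each pair is a topic id and the weight of that topic for this paper. The page does not say how simValue is derived from these weights, so the following is only a minimal sketch of one common way to compare two such vectors, cosine similarity. The function name and the second vector are hypothetical, and the sketch should not be expected to reproduce the listed scores (the same-paper row below reports 0.73238343 rather than 1.0).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors of (topicId, weight) pairs."""
    wa, wb = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * wb.get(topic, 0.0) for topic, w in wa.items())
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Topic weights for this paper, copied from the listing above.
this_paper = [(3, 0.032), (10, 0.013), (18, 0.033), (22, 0.042), (30, 0.067),
              (50, 0.019), (51, 0.172), (66, 0.066), (71, 0.049), (74, 0.321),
              (75, 0.016), (77, 0.019), (90, 0.016), (96, 0.023), (97, 0.016)]

# Hypothetical second paper, invented purely for illustration.
other_paper = [(51, 0.20), (74, 0.30), (80, 0.10)]

print(cosine_similarity(this_paper, other_paper))
```

Because the vectors are sparse, a score is driven by the few topics two papers share; here those would be topics 51 and 74, the two heaviest topics of this paper.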

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73238343 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

Author: Marco Guerini ; Lorenzo Gatti ; Marco Turchi

Abstract: Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words’ prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.

2 0.6467936 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing

Author: Wenliang Chen ; Min Zhang ; Yue Zhang

Abstract: In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.

3 0.62636578 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.

4 0.5363878 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguistically informed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

5 0.53637552 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

Author: Jun-Ping Ng ; Min-Yen Kan ; Ziheng Lin ; Wei Feng ; Bin Chen ; Jian Su ; Chew Lim Tan

Abstract: In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature, which focuses only on event pairs found within the same or adjacent sentences. To achieve this, we leverage discourse analysis, as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.

6 0.53200859 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

7 0.53197747 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

8 0.52951616 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts

9 0.52906924 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

10 0.5284903 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

11 0.52756 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

12 0.52655661 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

13 0.52654678 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

14 0.52641016 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

15 0.5259403 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

16 0.5250622 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

17 0.52501917 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

18 0.5240218 152 emnlp-2013-Predicting the Presence of Discourse Connectives

19 0.5239678 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

20 0.52384531 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology