acl acl2011 acl2011-253 knowledge-graph by maker-knowledge-mining

253 acl-2011-PsychoSentiWordNet

Source: pdf

Author: Amitava Das

Abstract: Sentiment analysis has been one of the most active research areas of the last few decades. Although a formidable amount of research has been done, the reported solutions and available systems are still far from perfect and do not meet end users' expectations. The main issue may be that many conceptual rules govern sentiment, and that even more clues (possibly unlimited) can convey these concepts from realization to verbalization by a human being. Human psychology directly relates to these unrevealed clues and governs our sentiment realization. Human psychology relates to many things, such as social psychology, culture, pragmatics and countless other intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. PsychoSentiWordNet is an extension of SentiWordNet that holds human psychological knowledge and sentiment knowledge simultaneously.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Sentiment analysis is one of the hot demanding research areas since last few decades. [sent-3, score-0.042]

2 The main issue may be there are many conceptual rules that govern sentiment, and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization of a human being. [sent-5, score-0.195]

3 Human psychology directly relates to the unrevealed clues and governs our sentiment realization. [sent-6, score-0.757]

4 Human psychology relates to many things, such as social psychology, culture, pragmatics and countless other intelligent aspects of civilization. [sent-7, score-0.196]

5 Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. [sent-8, score-0.563]

6 PsychoSentiWordNet is an extension over SentiWordNet that holds human psychological knowledge and sentiment knowledge simultaneously. [sent-9, score-0.58]

7 1 Introduction In order to identify sentiment from a text, lexical analysis plays a crucial role. [sent-10, score-0.421]

8 For example, words like love, hate, good and favorite directly indicate sentiment or opinion. [sent-11, score-0.421]

9 , 2010) have already proposed techniques for making dictionaries for those sentiment words. [sent-15, score-0.452]

10 But polarity assignment of such sentiment lexicons is a hard semantic disambiguation problem. [sent-16, score-0.602]

11 The regulating aspects that govern the lexical-level semantic orientation are natural language context (Pang et al. [sent-17, score-0.33]

12 , 2002), language properties (Wiebe and Mihalcea, 2006), domain pragmatic knowledge (Aue and Gamon, 2005), time dimension (Read, 2005), colors and culture (Strapparava and Ozbal, 2010) and many more unrevealed hidden aspects. [sent-18, score-0.136]

13 What previous studies proposed is to attach prior polarity to each sentiment lexicon level. [sent-20, score-0.689]

14 Prior polarity is an approximate value derived from corpus-based heuristic statistics; it is not exact. [sent-21, score-0.181]

15 The probabilistic fixed-point prior polarity scores do not solve the problem completely; rather, they shove the problem to the next level, called contextual polarity classification. [sent-22, score-0.414]

16 The hypothesis we started with is that the sum of all the regulating aspects of sentiment orientation is human psychology, and thus it is called a multi-faceted problem (Liu, 2010). [sent-23, score-0.79]

17 More precisely, by human psychology we mean all the known and unknown aspects that directly or indirectly govern our knowledge of sentiment orientation. [sent-24, score-0.729]

18 The regulating aspects wrapped in the present PsychoSentiWordNet are Gender, Age, City, Country, Language and Profession. [sent-25, score-0.164]

19 The PsychoSentiWordNet is an extension of the existing SentiWordNet that holds the possible psychological ingredients governing our understanding of sentiment. [sent-26, score-0.549]

20 The PsychoSentiWordNet holds variable prior polarity scores, which can be fetched depending on those psychological regulating aspects. [sent-27, score-0.435]

21 An example may illustrate the definition better for the concept “Rock_Climbing”: Aspects (Age) / Polarity: Null / Positive; 50-54 / Negative; 26-29 / Positive. (Proceedings of the ACL-HLT 2011 Student Session, pages 52–57, Portland, OR, USA, 19-24 June 2011. © 2011 Association for Computational Linguistics.) [sent-28, score-0.041]

22 In the previous example the described concept “Rock_Climbing” is generally positive, as it is adventurous and people do it for fun or as an excursion. [sent-30, score-0.216]

23 But it demands considerable physical ability and thus may not be as appealing to older people as to younger people. [sent-31, score-0.113]
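The “Rock_Climbing” example can be read as a lookup keyed by concept, regulating aspect and aspect value. The following is a minimal sketch under that reading, not the author's implementation: the table layout, the function name `lookup_polarity` and the fallback to the Null row are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Toy table built only from the "Rock_Climbing" example rows.
# The key layout (concept, aspect, aspect value) is hypothetical.
PRIOR_POLARITY: Dict[Tuple[str, str, Optional[str]], str] = {
    ("Rock_Climbing", "Age", None): "Positive",     # the "Null" row
    ("Rock_Climbing", "Age", "50-54"): "Negative",
    ("Rock_Climbing", "Age", "26-29"): "Positive",
}


def lookup_polarity(concept: str, aspect: str,
                    value: Optional[str] = None) -> Optional[str]:
    """Return the aspect-specific polarity if one is stored.

    Assumption: when no entry exists for the given aspect value, fall back
    to the Null row for that concept and aspect; return None if that is
    missing as well.
    """
    key = (concept, aspect, value)
    if key in PRIOR_POLARITY:
        return PRIOR_POLARITY[key]
    return PRIOR_POLARITY.get((concept, aspect, None))


print(lookup_polarity("Rock_Climbing", "Age", "50-54"))  # Negative
print(lookup_polarity("Rock_Climbing", "Age"))           # Positive
```

Whether an unseen age range should inherit the Null polarity is a design choice that the source does not settle.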

24 In this paper, we propose an interactive gaming (Dr Sentiment) technology to collect psycho-sentimental polarity for lexicons. [sent-36, score-0.315]

25 Section 3 explains about some exciting outcomes that support the usefulness of the PsychoSentiWordNet. [sent-39, score-0.071]

26 We believe that the developed PsychoSentiWordNet will help automatic sentiment analysis research in many respects, as well as other disciplines, as described in Section 4. [sent-40, score-0.421]

27 2 Dr Sentiment Dr Sentiment is a template-based interactive online game that collects a player’s sentiment by asking a set of simple template-based questions and finally reveals the player’s sentimental status. [sent-42, score-0.519]

28 Dr Sentiment fetches random words from SentiWordNet synsets and asks every player for his/her understanding of the sentiment polarity of the concept behind each word. [sent-43, score-0.953]

29 There are several motivations behind developing an intuitive game to automatically collect human psycho-sentimental orientation information. [sent-44, score-0.349]

30 , 2004) innovate the concept of a game to automatically label images available in the World Wide Web. [sent-46, score-0.3]

31 It has been identified as the most reliable strategy to automatically annotate the online images. [sent-47, score-0.043]

32 These techniques can be broadly categorized in two genres, one follows classical manual annotation (Andreevskaia and Bergler, 2006);(Wiebe and Riloff, 2006); (Mohammad et al. [sent-54, score-0.06]

33 , 2008) techniques and the others proposed various automatic techniques (Tong, 2001). [sent-55, score-0.062]

34 Manual annotation techniques are undoubtedly trustworthy, but they generally take time. [sent-57, score-0.08]

35 Automatic techniques demand manual validation and depend on corpus availability in the respective domain. [sent-58, score-0.102]

36 Manual annotation requires a large number of annotators to balance individual sentimentality in order to reach agreement. [sent-59, score-0.168]

37 But sentiment is a property of human intelligence and is not entirely based on the features of a language. [sent-61, score-0.452]

38 Thus people’s involvement is required to capture the sentiment of the human society. [sent-62, score-0.504]

39 We have developed an online game to attract internet population for the creation of PsychoSentiWordNet automatically. [sent-63, score-0.303]

40 Involvement of Internet population is an effective approach as the population is very high in number and ever growing (approx. [sent-64, score-0.132]

41 The Internet population consists of people of various languages, cultures, ages, etc., and is thus not biased towards any domain, language or particular society. [sent-66, score-0.197]

42 The Sign Up form of the “Dr Sentiment” game asks the player to provide personal information such as Sex, Age, City, Country, Language and Profession. [sent-67, score-0.457]

43 The lexicons tagged by this system are credible, as they are tagged by human beings. [sent-68, score-0.263]

44 Either way, it is unlike a static sentiment lexicon set, as it is updated regularly. [sent-69, score-0.492]

45 Almost 100 players per day are currently playing it throughout the world in different languages. [sent-70, score-0.189]

46 To make the gaming interface more interesting, images have been added with the help of the Google image search API; to avoid bias, we randomize among the first ten images retrieved by Google. [sent-73, score-0.432]

47 Snapshots of different screens from the game are presented in Figure 1. [sent-74, score-0.184]

48 The distribution of question types is predefined: 11 for Q1, 11 for Q2, 4 for Q3 and 4 for Q4. [sent-83, score-0.066]

49 The questions are randomly asked to keep the game more interesting. [sent-85, score-0.324]
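The fixed question-type counts combined with random ordering can be sketched as follows. This is an illustration only: it assumes the 11/11/4/4 counts apply to one game session, and the names `QUESTION_COUNTS` and `build_session` are hypothetical.

```python
import random
from typing import List

# Counts per question type as stated in the text; treating them as the
# make-up of a single game session is an assumption.
QUESTION_COUNTS = {"Q1": 11, "Q2": 11, "Q3": 4, "Q4": 4}


def build_session(rng: random.Random) -> List[str]:
    """Return the question types of one session in random order."""
    questions = [qtype
                 for qtype, count in QUESTION_COUNTS.items()
                 for _ in range(count)]
    rng.shuffle(questions)  # random order keeps the game less predictable
    return questions


session = build_session(random.Random(0))
print(len(session))  # 30
```

Passing in a seeded `random.Random` keeps the sketch reproducible; a live game would presumably use an unseeded generator.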

50 The Google image search API is fired with the word as a query. [sent-88, score-0.059]

51 An image along with the word itself is shown in the Q1 page of the game. [sent-89, score-0.059]

52 Players press the different emoticons (Fig 2) to express their sentimentality. [sent-90, score-0.056]

53 2.3 Q2 This question type is specially designed for the relative scoring technique. [sent-93, score-0.066]

54 For example, good and better are both positive, but we need to know which one is more positive than the other. [sent-94, score-0.166]

55 With the present gaming technology relative polarity scoring has been assigned to each n-n word pair combination. [sent-96, score-0.284]

56 Randomly n (presently 2-4) words have been chosen from the source SentiWordNet synsets along with their images as retrieved by Google API. [sent-98, score-0.173]

57 Each player is then asked to select one of them that he/she likes most. [sent-99, score-0.254]

58 The relative score is calculated and stored in the corresponding log table. [sent-100, score-0.08]
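The text does not give the formula for the relative score. One plausible way to turn "the player picks one of n shown words" into pairwise preferences is a simple win tally, sketched below. The class `RelativeScores` and the win-ratio definition are assumptions for illustration, not the paper's method.

```python
from collections import defaultdict
from typing import Optional, Sequence


class RelativeScores:
    """Hypothetical pairwise tally for Q2-style choices."""

    def __init__(self) -> None:
        self.wins = defaultdict(int)         # (preferred, other) -> count
        self.comparisons = defaultdict(int)  # frozenset({a, b}) -> count

    def record_choice(self, shown: Sequence[str], chosen: str) -> None:
        """Log that `chosen` was preferred over every other shown word."""
        if chosen not in shown:
            raise ValueError("chosen word must be one of the words shown")
        for other in shown:
            if other != chosen:
                self.wins[(chosen, other)] += 1
                self.comparisons[frozenset((chosen, other))] += 1

    def preference(self, a: str, b: str) -> Optional[float]:
        """Fraction of a-versus-b comparisons won by a, or None if unseen."""
        total = self.comparisons.get(frozenset((a, b)), 0)
        if total == 0:
            return None
        return self.wins.get((a, b), 0) / total


scores = RelativeScores()
scores.record_choice(["good", "better"], "better")
scores.record_choice(["good", "better"], "better")
scores.record_choice(["good", "better"], "good")
print(scores.preference("better", "good"))
```

With more than two words shown, the chosen word is counted as beating each of the others, which is one of several reasonable readings of "select one of them that he/she likes most".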

59 2.4 Q3 The player is asked for any positive word that comes to his/her mind. [sent-104, score-0.337]

60 The word is then added to the PsychoSentiWordNet and later used in Q1, where other users record their sentimentality about that word. [sent-106, score-0.168]

61 2.5 Q4 A player is asked by Dr Sentiment for any negative word. [sent-108, score-0.32]

62 The word is then added to the PsychoSentiWordNet and later used in Q1, where other users record their sentimentality about that word. [sent-109, score-0.168]

63 2.6 Comment Architecture There are three types of comments: Comment type 1 (CMNT1), Comment type 2 (CMNT2) and the final comment, Dr Sentiment’s prescription. [sent-111, score-0.215]

64 CMNT1 and CMNT2 comments are associated with question types Q1 and Q2 respectively. [sent-112, score-0.141]

65 2.7 CMNT1 Comment type 1 has 5 variations, as shown in the Comment table in Table 3. [sent-114, score-0.067]

66 Comments are randomly retrieved from the comment type table according to their category. [sent-115, score-0.269]

67 • Positive word has been tagged as negative (PN) • Positive word has been tagged as positive (PP) • Negative word has been tagged as positive (NP) • Negative word has been tagged as negative (NN) • Neutral (NU) [sent-116, score-0.182]

68 2.8 CMNT2 The strategy here is the same as for CMNT1. [sent-117, score-0.623]

69 • Positive word has been tagged as negative (PN) • Negative word has been tagged as positive (NP) [sent-120, score-0.199]
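The CMNT1 categories pair a word's existing polarity with the player's tag. A minimal sketch of that mapping and of the random retrieval per category follows. The comment strings are placeholders, not the game's actual comments, and treating any neutral tag as NU is an assumption.

```python
import random
from typing import Dict, List

# Placeholder comment table; the real comments (Table 3 of the paper)
# are not reproduced on this page.
COMMENTS: Dict[str, List[str]] = {
    "PN": ["<comment for a positive word tagged as negative>"],
    "PP": ["<comment for a positive word tagged as positive>"],
    "NP": ["<comment for a negative word tagged as positive>"],
    "NN": ["<comment for a negative word tagged as negative>"],
    "NU": ["<comment for a neutral tag>"],
}


def comment_category(prior: str, tag: str) -> str:
    """Map (prior polarity of the word, player's tag) to a CMNT1 category.

    `prior` is "positive" or "negative"; `tag` may also be "neutral".
    Assumption: a neutral tag yields NU regardless of the prior polarity.
    """
    if tag == "neutral":
        return "NU"
    return prior[0].upper() + tag[0].upper()


def pick_comment(prior: str, tag: str, rng: random.Random) -> str:
    """Randomly retrieve one comment from the matching category."""
    return rng.choice(COMMENTS[comment_category(prior, tag)])


print(comment_category("positive", "negative"))  # PN
```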

70 2.9 Dr Sentiment’s Prescription The final prescription depends on various factors, such as the total number of positive, negative or neutral comments and the total time taken by the player. [sent-121, score-0.237]

71 The final prescription also depends on the range of the values of accumulating all the above factors. [sent-122, score-0.078]

72 The enticing message for players is that Dr Sentiment can reveal their sentimental status: whether they are extremely negative or positive, very neutral, diplomatic, etc. [sent-124, score-0.412]

73 A word previously tagged by a player is skipped by the tracking system the next time that player plays, as our intention is to tag more and more words by involving the Internet population. [sent-125, score-0.397]
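The tracking behaviour described here amounts to filtering a player's already-tagged words out of the candidate pool before drawing the next word. A small sketch, with the function name `next_word` and the data structures chosen purely for illustration:

```python
import random
from typing import Iterable, Optional, Set


def next_word(candidates: Iterable[str], already_tagged: Set[str],
              rng: random.Random) -> Optional[str]:
    """Pick a random word this player has not tagged yet.

    Returns None when the player has already tagged every candidate.
    """
    fresh = [word for word in candidates if word not in already_tagged]
    return rng.choice(fresh) if fresh else None


pool = ["love", "hate", "good", "favorite"]
print(next_word(pool, {"love", "hate", "good"}, random.Random(0)))  # favorite
```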

74 We observe that the strategy helps to keep the game interesting as a large number of players return to play the game after this strategy was implemented. [sent-126, score-0.664]

75 We do not claim that the status revealed by Dr Sentiment for a player is exact or ideal. [sent-127, score-0.283]

76 It is only for fun, but the outcomes of the game effectively help to store human sentimental psychology in the form of a computational lexicon. [sent-128, score-0.476]

77 3 Senti-Mentality PsychoSentiWordNet gives a good sketch to understand the psycho-sentimental behavior of society depending upon proposed psychological dimensions. [sent-129, score-0.09]

78 The PsychoSentiWordNet is basically the log records of every player’s tagged words. [sent-130, score-0.189]

79 3.1 Concept-Culture-Wise Analysis Figure 3: Geospatial Senti-Mentality. The word “blue” was tagged by different players around the world. [sent-132, score-0.265]

80 But surprisingly it has been tagged as positive from one part of the world and negative from another part of the world. [sent-133, score-0.265]

81 The observation is that most of the negative tags are coming from the middle-east and especially from the Islamic countries. [sent-135, score-0.066]

82 This information could be further retrieved from the developed source by giving information like (blue, Italy), (blue, Iraq) or (blue, USA) etc. [sent-139, score-0.061]
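Since the resource is described as log records of tagged words, a query such as (blue, Italy) can be pictured as filtering and counting those records. The sketch below assumes a flat record layout and uses made-up toy records; neither the schema nor the values come from the paper.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class TagRecord(NamedTuple):
    """Hypothetical log record: one word tagged by one player."""
    word: str
    polarity: str
    country: str
    age_range: str


def polarity_counts(log: Iterable[TagRecord], word: str, **aspects: str) -> Counter:
    """Count polarity tags for `word`, restricted to the given aspect values."""
    counts: Counter = Counter()
    for record in log:
        if record.word != word:
            continue
        if all(getattr(record, name) == value for name, value in aspects.items()):
            counts[record.polarity] += 1
    return counts


# Made-up toy records, for illustration only (not data from the paper).
LOG = [
    TagRecord("blue", "positive", "Italy", "26-29"),
    TagRecord("blue", "positive", "Italy", "50-54"),
    TagRecord("blue", "negative", "Iraq", "26-29"),
    TagRecord("rain", "negative", "Italy", "26-29"),
]

print(polarity_counts(LOG, "blue", country="Italy"))  # Counter({'positive': 2})
```

The same function covers age-wise queries, for example `polarity_counts(LOG, "blue", age_range="26-29")`.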

83 3.2 Age-Wise Analysis Another interesting observation is that sentimentality may vary age-wise. [sent-141, score-0.198]

84 The total number of players for each range of age is shown at top of every bar. [sent-146, score-0.237]

85 In Figure 4, the horizontal bars are divided into two colors (green depicts positivity and red depicts negativity) according to the total positivity and negativity scores gathered during play. [sent-147, score-0.237]

86 This sociological study gives an idea of the variation of sentimentality with age. [sent-148, score-0.168]

87 This information could be further retrieved from the developed source by giving information like (X, 36-39) or (X, 45-49) etc. [sent-149, score-0.061]

88 3.3 Gender Specific The collected statistics show that women are more positive than men. [sent-151, score-0.129]

89 The variations in sentimentality among men and women are shown in the following Figure 5. [sent-152, score-0.245]

90 Studies on combinations of the proposed psychological dimensions, such as location-age, location-profession and gender-location, may reveal some interesting results. [sent-155, score-0.12]

91 Moreover the other non linguistic psychological dimensions are very much important for further analysis and in several newly discovered sub-disciplines such as: Geospatial Information retrieval (Egenhofer, 2002), Personalized search (Gaucha et al. [sent-157, score-0.119]

92 Several tables are being used to keep user’s clicking log and their personal information. [sent-161, score-0.071]

93 As one of the research motivations was to generate up-to-date prior polarity scores, we decided to provide a web service API through which people can access the latest prior polarity scores. [sent-162, score-0.549]

94 We believe this method will outperform a static sentiment lexicon set. [sent-163, score-0.492]

95 No evaluation has been done yet as there is no data available for this kind of experimentation and to the best of our knowledge this is the first endeavor where sentiment meets psychology. [sent-165, score-0.421]

96 Our present goal is to collect such a corpus and run experiments to check whether the variable prior polarity scores of PsychoSentiWordNet outperform the fixed-point prior polarity scores of SentiWordNet. [sent-166, score-0.525]

97 CLaC and CLaC-NB: Knowledge-based and corpus-based approaches to sentiment tagging. [sent-169, score-0.421]

98 , Customizing sentiment classifiers to new domains: A case study. [sent-178, score-0.421]

99 Using emoticons to reduce dependency in machine learning techniques for sentiment classification. [sent-199, score-0.508]

100 An operational system for detecting and tracking opinions in online discussion. [sent-217, score-0.081]
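The page does not document how each sentScore above is computed. The listed values are, however, consistent with summing the tfidf weights of the whitespace-separated tokens of a sentence, with exact, case-sensitive matching (for example, sent-16 contains "polarity" and "sentiment", and 0.181 + 0.421 = 0.602). The sketch below implements that assumed rule; the function name is hypothetical.

```python
from typing import Mapping

# A few weights copied from the "tfidf for this paper" word list on this page.
TFIDF = {"sentiment": 0.421, "polarity": 0.181, "prior": 0.052, "lexicon": 0.035}


def sentence_score(sentence: str, weights: Mapping[str, float]) -> float:
    """Sum the weights of whitespace-separated tokens that match a key exactly.

    Assumption: matching is case-sensitive and punctuation is not stripped,
    so "problem." or "Sentiment" would not match "problem" or "sentiment".
    """
    return sum(weights.get(token, 0.0) for token in sentence.split())


sent_16 = ("But polarity assignment of such sentiment lexicons "
           "is a hard semantic disambiguation problem.")
print(round(sentence_score(sent_16, TFIDF), 3))  # 0.602, the score listed for sent-16
```

This rule is inferred from the numbers, not stated by the site, so it should be read as a guess that happens to fit the rows checked.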

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sentiment', 0.421), ('psychosentiwordnet', 0.421), ('sentiwordnet', 0.218), ('player', 0.212), ('dr', 0.201), ('game', 0.184), ('polarity', 0.181), ('sentimentality', 0.168), ('players', 0.149), ('comment', 0.143), ('tagged', 0.116), ('regulating', 0.112), ('psychology', 0.111), ('govern', 0.103), ('gaming', 0.103), ('psychological', 0.09), ('age', 0.088), ('baccianella', 0.084), ('positive', 0.083), ('prescription', 0.078), ('images', 0.075), ('blue', 0.075), ('wiebe', 0.066), ('negative', 0.066), ('population', 0.066), ('positivity', 0.064), ('orientation', 0.063), ('asks', 0.061), ('retrieved', 0.061), ('sentimental', 0.06), ('image', 0.059), ('emoticons', 0.056), ('unrevealed', 0.056), ('geospatial', 0.056), ('negativity', 0.056), ('neutral', 0.054), ('internet', 0.053), ('prior', 0.052), ('aspects', 0.052), ('operational', 0.052), ('involvement', 0.052), ('mohammad', 0.049), ('fun', 0.049), ('undoubtedly', 0.049), ('women', 0.046), ('presently', 0.046), ('api', 0.046), ('mihalcea', 0.044), ('andreevskaia', 0.044), ('bergler', 0.044), ('strategy', 0.043), ('people', 0.043), ('demands', 0.042), ('demanding', 0.042), ('asked', 0.042), ('gender', 0.042), ('aue', 0.041), ('colors', 0.041), ('outcomes', 0.041), ('strapparava', 0.041), ('concept', 0.041), ('pang', 0.04), ('motivations', 0.04), ('playing', 0.04), ('log', 0.04), ('comments', 0.039), ('culture', 0.039), ('questions', 0.038), ('extension', 0.038), ('depicts', 0.038), ('ahn', 0.038), ('synsets', 0.037), ('type', 0.036), ('static', 0.036), ('lexicon', 0.035), ('gamon', 0.034), ('realization', 0.033), ('country', 0.033), ('records', 0.033), ('relates', 0.033), ('pn', 0.032), ('variations', 0.031), ('keep', 0.031), ('techniques', 0.031), ('janyce', 0.031), ('human', 0.031), ('collect', 0.031), ('question', 0.03), ('interesting', 0.03), ('explains', 0.03), ('riloff', 0.03), ('status', 0.029), ('dimensions', 0.029), ('randomly', 0.029), ('manual', 0.029), ('interface', 0.029), ('tracking', 0.029), ('clues', 
0.028), ('excel', 0.028), ('aged', 0.028)]
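The page does not state how simValue is computed for the similar-paper lists. A self-similarity of about 1.0 for the same paper is what cosine similarity between tfidf vectors would produce, so the sketch below assumes cosine similarity over sparse word-weight dictionaries; the function name and the second toy vector are illustrative.

```python
import math
from typing import Mapping


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse word -> weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for empty vectors
    return dot / (norm_a * norm_b)


# First three weights from the list above; the second vector is made up.
this_paper = {"sentiment": 0.421, "psychosentiwordnet": 0.421, "sentiwordnet": 0.218}
other_paper = {"sentiment": 0.5, "game": 0.3}

print(round(cosine_similarity(this_paper, this_paper), 6))  # 1.0
print(cosine_similarity(this_paper, other_paper) < 1.0)     # True
```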

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 253 acl-2011-PsychoSentiWordNet

Author: Amitava Das

2 0.92360246 105 acl-2011-Dr Sentiment Knows Everything!

Author: Amitava Das ; Sivaji Bandyopadhyay

Abstract: Sentiment analysis is one of the hot demanding research areas since last few decades. Although a formidable amount of research has been done, the existing reported solutions or available systems are still far from perfect or do not meet the satisfaction level of end users. The main issue is the various conceptual rules that govern sentiment and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization of a human being. Human psychology directly relates to the unrevealed clues and governs the sentiment realization of us. Human psychology relates many things like social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. In the present paper we propose a template based online interactive gaming technology, called Dr Sentiment to automatically create the PsychoSentiWordNet involving internet population. The PsychoSentiWordNet is an extension of SentiWordNet that presently holds human psychological knowledge on a few aspects along with sentiment knowledge.

3 0.28877982 204 acl-2011-Learning Word Vectors for Sentiment Analysis

Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts

Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it out-performs several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.

4 0.28134492 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

Author: Danushka Bollegala ; David Weir ; John Carroll

Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.

5 0.27503645 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

Author: Awais Athar

Abstract: Sentiment analysis of citations in scientific papers and articles is a new and interesting problem due to the many linguistic differences between scientific texts and other genres. In this paper, we focus on the problem of automatic identification of positive and negative sentiment polarity in citations to scientific papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-specific lexical features, dependency relations, sentence splitting and negation features. Our results show that 3-grams and dependencies perform best in this task; they outperform the sentence splitting, science lexicon and negation based features.

6 0.25937411 292 acl-2011-Target-dependent Twitter Sentiment Classification

7 0.23382793 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

8 0.22439945 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

9 0.21123023 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

10 0.19860236 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

11 0.19188358 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

12 0.18953609 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework

13 0.18156151 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

14 0.1347049 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

15 0.13023485 159 acl-2011-Identifying Noun Product Features that Imply Opinions

16 0.1172285 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic

17 0.11550627 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models

18 0.10594748 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

19 0.10579815 82 acl-2011-Content Models with Attitude

20 0.085619792 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.189), (1, 0.387), (2, 0.379), (3, -0.119), (4, 0.103), (5, 0.08), (6, -0.018), (7, -0.076), (8, 0.014), (9, -0.103), (10, 0.125), (11, -0.06), (12, -0.077), (13, 0.046), (14, 0.034), (15, 0.079), (16, 0.14), (17, 0.064), (18, -0.067), (19, 0.101), (20, 0.203), (21, 0.186), (22, 0.014), (23, -0.082), (24, -0.036), (25, -0.195), (26, 0.014), (27, 0.057), (28, -0.066), (29, -0.144), (30, -0.167), (31, -0.163), (32, 0.277), (33, -0.05), (34, 0.05), (35, 0.023), (36, 0.147), (37, 0.064), (38, -0.019), (39, -0.07), (40, 0.0), (41, 0.119), (42, -0.053), (43, -0.083), (44, 0.036), (45, -0.04), (46, 0.052), (47, 0.064), (48, 0.02), (49, -0.065)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97487146 105 acl-2011-Dr Sentiment Knows Everything!

Author: Amitava Das ; Sivaji Bandyopadhyay

same-paper 2 0.96772063 253 acl-2011-PsychoSentiWordNet

Author: Amitava Das

3 0.63487738 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework

Author: S.R.K Branavan ; David Silver ; Regina Barzilay

Abstract: This paper presents a novel approach for leveraging automatically extracted textual knowledge to improve the performance of control applications such as games. Our ultimate goal is to enrich a stochastic player with high-level guidance expressed in text. Our model jointly learns to identify text that is relevant to a given game state in addition to learning game strategies guided by the selected text. Our method operates in the Monte-Carlo search framework, and learns both text analysis and game strategies based only on environment feedback. We apply our approach to the complex strategy game Civilization II using the official game manual as the text guide. Our results show that a linguistically-informed game-playing agent significantly outperforms its language-unaware counterpart, yielding a 27% absolute improvement and winning over 78% of games when playing against the built-in AI of Civilization II.

4 0.5532831 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

Author: Oscar Tackstrom ; Ryan McDonald

Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis, an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exists naturally, and thus requires labor intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative compared to fine-grained supervision; however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account; and models that use latent variables to learn unobserved phenomena from that which can be observed.
Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarse-grained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Täckström and McDonald (2011) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines with quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to only predict the two dominant fine-grained sentiment categories well for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models.
Contrary to (generative) topic models (Mei et al., 2007; Titov and McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient based estimation. The former models are largely orthogonal to the one we propose in this work and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained task-specific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model. [Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences $s_i$ are always observed. Note that there are no factors connecting the document node, $y^d$, with the input nodes, $\mathbf{s}$, so that the sentence-level variables, $\mathbf{y}^s$, in effect form a bottleneck between the document sentiment and the input sentences.] (Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers, pages 569–574, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics.) 1.1 Preliminaries Let $d$ be a document consisting of $n$ sentences, $\mathbf{s} = (s_i)_{i=1}^{n}$, with a document–sentence-sequence pair denoted $\mathbf{d} = (d, \mathbf{s})$.
Let yd = (yd, ys) denote random variables1 the document level sentiment, yd, and the sequence of sentence level sentiment, = (ysi)in=1 . – ys 1We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments. 570 In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, DF = {(dj, and a large set of ydj)}jm=f1, coarsely labeled instances DC = {(dj, yjd)}jm=fm+fm+c1. Furthermore, we assume that yd and all yis take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization pθ(yd,ys|s) = expnhφ(yd,ys,s),θi − Aθ(s)o
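To make the exponential-family parametrization above concrete, the following is a minimal sketch that normalizes the model by brute-force enumeration over all label assignments. The feature templates (`word`, `doc-sent`, `trans`) and the function names are hypothetical choices made for illustration only; the excerpt does not specify the feature function φ, and a real implementation would use dynamic programming rather than enumeration. In line with the figure caption, no feature connects the document label directly to the input sentences.

```python
import itertools
import math

LABELS = ("POS", "NEG", "NEU")

def score(theta, y_doc, y_sents, sents):
    """Inner product <phi(y_d, y_s, s), theta> for hypothetical indicator features.

    theta is a dict mapping feature tuples to weights; missing features weigh 0.
    """
    total = 0.0
    for i, (label, sent) in enumerate(zip(y_sents, sents)):
        # Sentence label paired with each input word (assumed feature template).
        for word in sent.split():
            total += theta.get(("word", word, label), 0.0)
        # Document label paired with the sentence label (no direct doc-input factor).
        total += theta.get(("doc-sent", y_doc, label), 0.0)
        # Transition between adjacent sentence labels (assumed feature template).
        if i > 0:
            total += theta.get(("trans", y_sents[i - 1], label), 0.0)
    return total

def log_partition(theta, sents):
    """A_theta(s): log-sum-exp of the score over every (y_d, y_s) assignment."""
    scores = [
        score(theta, y_doc, y_sents, sents)
        for y_doc in LABELS
        for y_sents in itertools.product(LABELS, repeat=len(sents))
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in scores))

def prob(theta, y_doc, y_sents, sents):
    """p_theta(y_d, y_s | s) = exp(<phi, theta> - A_theta(s))."""
    return math.exp(score(theta, y_doc, y_sents, sents) - log_partition(theta, sents))

def doc_marginal(theta, y_doc, sents):
    """p_theta(y_d | s): sum over the sentence labels, treated as latent."""
    return sum(
        prob(theta, y_doc, y_sents, sents)
        for y_sents in itertools.product(LABELS, repeat=len(sents))
    )
```

Enumeration costs 3^(n+1) score evaluations for n sentences, so this sketch is only practical for very short documents; it is meant to show what the normalizer sums over, not how to compute it efficiently. `doc_marginal` illustrates the quantity that would be needed when only the document label is observed, as in the coarsely labeled set.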

5 0.51945859 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

Author: Cheng-Te Li ; Chien-Yuan Wang ; Chien-Lin Tseng ; Shou-De Lin

Abstract: Micro-blogging services provide platforms for users to share their feelings and ideas on the move. In this paper, we present a search-based demonstration system, called MemeTube, to summarize the sentiments of microblog messages in an audiovisual manner. MemeTube provides three main functions: (1) recognizing the sentiments of messages, (2) automatically generating a music melody based on the detected sentiments, and (3) producing an animation of real-time piano playing for audiovisual display. Our MemeTube system can be accessed via: http://mslab.csie.ntu.edu.tw/memetube/ .

6 0.51094884 204 acl-2011-Learning Word Vectors for Sentiment Analysis

7 0.49599028 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

8 0.47837803 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

9 0.45872113 292 acl-2011-Target-dependent Twitter Sentiment Classification

10 0.44224191 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

11 0.42744562 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

12 0.41607058 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

13 0.39176807 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

14 0.34463596 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

15 0.34151506 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

16 0.31941345 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations

17 0.31651178 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic

18 0.28284797 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

19 0.25333616 82 acl-2011-Content Models with Attitude

20 0.2377896 99 acl-2011-Discrete vs. Continuous Rating Scales for Language Evaluation in NLP

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.029), (17, 0.036), (26, 0.367), (37, 0.089), (39, 0.022), (40, 0.104), (41, 0.026), (53, 0.013), (55, 0.014), (59, 0.05), (72, 0.02), (91, 0.023), (96, 0.124), (97, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91286492 105 acl-2011-Dr Sentiment Knows Everything!

Author: Amitava Das ; Sivaji Bandyopadhyay

2 0.85647082 115 acl-2011-Engkoo: Mining the Web for Language Learning

Author: Matthew R. Scott ; Xiaohua Liu ; Ming Zhou ; Microsoft Engkoo Team

Abstract: This paper presents Engkoo, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages, using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however, the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an application platform that supports a multitude of NLP technologies such as cross-language retrieval, alignment, sentence classification, and statistical machine translation. The data set that supports this system is primarily built by mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, then extracted and formulated into clear term definitions and sample sentences. This approach allows us to build perhaps the world’s largest lexicon linking Chinese and English together, while at the same time covering the most up-to-date terms as captured by the net.

same-paper 3 0.85380274 253 acl-2011-PsychoSentiWordNet

Author: Amitava Das

4 0.81566274 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents

Author: Emmanuel Prochasson ; Pascale Fung

Abstract: We present the first known result of high-precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents, in a machine learning approach. We test our hypothesis on different pairs of languages and corpora. We obtain a very high F-measure, between 80% and 98%, for recognizing and extracting correct translations for rare terms (from 1 to 5 occurrences). Moreover, we show that our system can be trained on one pair of languages and tested on a different pair of languages, obtaining an F-measure of 77% for the classification of Chinese-English translations using a training corpus of Spanish-French. Our method is therefore potentially applicable even to low-resource languages without training data.

5 0.77152365 333 acl-2011-Web-Scale Features for Full-Scale Parsing

Author: Mohit Bansal ; Dan Klein

Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities and paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.

6 0.68142593 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

7 0.65518349 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction

8 0.60542905 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

9 0.60291338 258 acl-2011-Ranking Class Labels Using Query Sessions

10 0.56132537 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

11 0.55351174 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

12 0.55224454 182 acl-2011-Joint Annotation of Search Queries

13 0.55056548 256 acl-2011-Query Weighting for Ranking Model Adaptation

14 0.54571742 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

15 0.53135884 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

16 0.52242744 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment

17 0.5188694 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

18 0.51722324 193 acl-2011-Language-independent compound splitting with morphological operations

19 0.51186502 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation

20 0.51088578 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis