acl acl2011 acl2011-105 knowledge-graph by maker-knowledge-mining

105 acl-2011-Dr Sentiment Knows Everything!

Source: pdf

Author: Amitava Das ; Sivaji Bandyopadhyay

Abstract: Sentiment analysis has been one of the most active research areas of the last few decades. Although a formidable amount of research has been done, the existing reported solutions and available systems are still far from perfect and do not meet the satisfaction level of end users. The main issue is that various conceptual rules govern sentiment, and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization by a human being. Human psychology directly relates to these unrevealed clues and governs our realization of sentiment. Human psychology relates to many things, such as social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. In the present paper we propose a template-based online interactive gaming technology, called Dr Sentiment, to automatically create the PsychoSentiWordNet by involving the internet population. The PsychoSentiWordNet is an extension of SentiWordNet that presently holds human psychological knowledge on a few aspects along with sentiment knowledge.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Amitava Das and Sivaji Bandyopadhyay Department of Computer Science and Engineering Jadavpur University India amitava . [sent-2, score-0.063]

2 Although a formidable amount of research has been done, the existing reported solutions and available systems are still far from perfect and do not meet the satisfaction level of end users. [sent-5, score-0.058]

3 The main issue is that various conceptual rules govern sentiment, and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization by a human being. [sent-6, score-0.656]

4 Human psychology directly relates to these unrevealed clues and governs our realization of sentiment. [sent-7, score-0.708]

5 Human psychology relates to many things, such as social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. [sent-8, score-0.256]

6 Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. [sent-9, score-0.578]

7 In the present paper we propose a template-based online interactive gaming technology, called Dr Sentiment, to automatically create the PsychoSentiWordNet by involving the internet population. [sent-10, score-0.38]

8 The PsychoSentiWordNet is an extension of SentiWordNet that presently holds human psychological knowledge on a few aspects along with sentiment knowledge. [sent-11, score-0.71]

9 , 2010) have already proposed various techniques for making dictionaries for those sentiment words. [sent-15, score-0.464]

10 But polarity assignment of such sentiment lexicons is a hard semantic disambiguation problem. [sent-16, score-0.619]

11 The regulating aspects which govern the lexical level semantic orientation are natural language context (Pang et al. [sent-17, score-0.393]

12 , 2002), language properties (Wiebe and Mihalcea, 2006), domain pragmatic knowledge (Aue and Gamon, 2005), time dimension (Read, 2005), colors and culture (Strapparava and Ozbal, 2010) and many more unrevealed hidden aspects. [sent-18, score-0.107]

13 Therefore it is a challenging and enigmatic research problem. [sent-19, score-0.032]

14 The current trend is to attach prior polarity to each entry at the sentiment lexicon level. [sent-20, score-0.65]

15 Prior polarity is an approximate value, based on heuristic statistics collected from a corpus, and is not exact. [sent-21, score-0.161]

16 The probabilistic fixed-point prior polarity scores do not solve the problem completely; rather, they move the problem to the next level, called contextual polarity classification. [sent-22, score-0.352]

17 We start with the hypothesis that the summation of all the regulating aspects of sentiment orientation is human psychology and thus it is a multifaceted problem (Liu, 2010). [sent-23, score-0.854]

18 More precisely what we mean by human psychology is the union of all known and unknown aspects that directly or indirectly govern the sentiment orientation knowledge of us. [sent-24, score-0.844]

19 The regulating aspects wrapped in the present PsychoSentiWordNet are Gender, Age, City, Country, Language and Profession. [sent-25, score-0.237]

20 The PsychoSentiWordNet is an extension of the existing SentiWordNet 3. [sent-26, score-0.029]

21 , 2010) to hold the possible psychological ingredients and govern the sentiment understandability of us. In order to identify sentiment from a text, lexical analysis plays a crucial role. [sent-29, score-0.514]

22 For example, words like love, hate, good and favorite directly indicate sentiment or opinion. [sent-30, score-0.604]

23 The PsychoSentiWordNet holds variable prior polarity scores that could be fetched depending upon those psychological regulating aspects. [sent-31, score-0.678]

25 This technology has proven itself as an excellent technique to collect psychological sentiment of human society even at multilingual level. [sent-37, score-0.589]

26 Dr Sentiment presently supports 56 languages and therefore we may call it Global PsychoSentiWordNet. [sent-38, score-0.078]

27 In this section we have philosophically argued about the necessity of developing PsychoSentiWordNet. [sent-40, score-0.032]

28 Section 3 explains about some exciting outcomes of PsychoSentiWordNet. [sent-42, score-0.046]

29 The developed PsychoSentiWordNet(s) are expected to help automatic sentiment analysis research in many aspects and other disciplines as well and have been described in section 4. [sent-43, score-0.507]

30 2 Dr Sentiment: Dr Sentiment is a template-based interactive online game that collects a player’s sentiment by asking a set of simple template-based questions and finally reveals the player’s sentimental status. [sent-46, score-0.686]

31 Dr Sentiment fetches random words from SentiWordNet synsets and asks every player to report his/her understanding of the sentiment polarity of the concept behind each fetched word. [sent-47, score-1.073]

32 There are several motivations behind developing the intuitive game to automatically collect human psycho-sentimental orientation information. [sent-48, score-0.316]

33 In the history of Information Retrieval research there is a milestone when ESP (Ahn et al. [sent-49, score-0.032]

34 , 2004) innovated the concept of a game to automatically label images available in the World Wide Web. [sent-50, score-0.255]

35 It has been identified as the most reliable game2 strategy to automatically annotate the online im- 1 http://www. [sent-51, score-0.078]

36 A number of research endeavors could be found in the literature for creation of Sentiment Lexicon in several languages and domains. [sent-58, score-0.032]

37 These techniques can be broadly categorized into two classes: one follows classical manual annotation techniques (Andreevskaia and Bergler, 2006; Wiebe and Riloff, 2006), while the other follows various automatic techniques (Mohammad et al. [sent-59, score-0.137]

38 Manual annotation techniques are undoubtedly trustworthy, but they generally take time. [sent-62, score-0.067]

39 Automatic techniques demand manual validations and are dependent on the corpus availability in the respective domain. [sent-63, score-0.067]

40 Manual annotation techniques require a large number of annotators to balance one’s sentimentality in order to reach agreement. [sent-64, score-0.13]

41 Sentiment is a property of human intelligence and is not entirely based on the features of a language. [sent-66, score-0.04]

42 Thus people’s involvement is required to capture the sentiment of the human society. [sent-67, score-0.527]

43 We have developed an online game to attract internet population for the creation of PsychoSentiWordNet automatically. [sent-68, score-0.382]

44 Involvement of Internet population is an effective approach as the population is very high in number and ever growing (approx. [sent-69, score-0.184]

45 Internet population consists of people with various languages, cultures, age etc and thus not biased towards any domain, language or particular society. [sent-71, score-0.142]

46 Detailed statistics on Internet usage and population are reported in Table 2. [sent-72, score-0.092]

47 The lexicons tagged by this system are credible as they are tagged by human beings. [sent-73, score-0.36]

48 It is not a static sentiment lexicon set [polarity changes with time (Read, 2005)] as it is updated regularly. [sent-74, score-0.459]

49 Around 10-20 players each day are playing it throughout the world in different languages. [sent-75, score-0.144]

50 The Sign Up form of the “Dr Sentiment” game asks the player to provide personal information such as Sex, Age, City, Country, Language and Profession. [sent-78, score-0.552]

51 These collected personal details of a player are kept as a log record in the database. [sent-79, score-0.349]

52 The gaming interface has four types of question templates. [sent-80, score-0.264]

53 The question templates are named as Q1, Q2, Q3 and Q4. [sent-81, score-0.041]

54 To make the gaming interface more interesting, images have been added. [sent-85, score-0.308]

55 These images are retrieved through the Google image search API; to avoid bias, one is picked at random from the first ten images returned by Google. [sent-86, score-0.379]

56 1 Gaming Strategy Dr Sentiment asks 30 questions to each player. [sent-88, score-0.101]

57 There are predefined distributions of each question type as 11 for Q1, 11 for Q2, 4 for Q3 and 4 for Q4. [sent-89, score-0.082]

58 The questions are randomly asked to keep the game more interesting. [sent-91, score-0.25]
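
The question mix described above (30 questions per game: 11 Q1, 11 Q2, 4 Q3 and 4 Q4, asked in random order) can be sketched as follows. This is only an illustrative sketch: the names QUESTION_MIX and build_question_schedule are hypothetical, and the source does not say how the actual game shuffles its questions.

```python
import random

# Per-game question mix reported in the text: 11 + 11 + 4 + 4 = 30.
QUESTION_MIX = {"Q1": 11, "Q2": 11, "Q3": 4, "Q4": 4}


def build_question_schedule(rng=None):
    """Return a shuffled list of 30 question-type labels for one game.

    Assumption: a plain uniform shuffle; the real game may order
    questions differently.
    """
    rng = rng or random.Random()
    schedule = [label for label, count in QUESTION_MIX.items() for _ in range(count)]
    rng.shuffle(schedule)
    return schedule
```

Passing a seeded random.Random instance makes a schedule reproducible, which may be convenient when testing.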

59 For word-based translation, the Google translation service has been used. [sent-92, score-0.034]

60 At each Question (Q) level, the translation service is used to display the sentiment word in the player’s own language. [sent-93, score-0.463]

61 The Google image search API is fired with the word as a query. [sent-97, score-0.067]

62 An image along with the word itself is shown in the Q1 page of the game. [sent-98, score-0.067]

63 Players press the different emoticons (Figure 1) to express their sentimentality. [sent-103, score-0.042]

64 3 Q2 This question type is specially designed for relative scoring technique. [sent-106, score-0.082]

65 For example, good and better are both positive, but we need to know which one is more positive than the other. [sent-107, score-0.154]

66 With the present gaming technology, a relative polarity score is assigned to each n-n word pair combination. [sent-109, score-0.336]

67 n words (presently 2-4) are chosen at random from the source SentiWordNet synsets, along with their images as retrieved by the Google API. [sent-110, score-0.181]

68 Each player is then asked to select one of them that he/she likes most. [sent-111, score-0.331]

69 The relative score is calculated and stored in the corresponding log table. [sent-112, score-0.036]
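
The extract does not give the formula behind the relative score, so the following is only one plausible way to log Q2 answers, not the authors' method: each time a player picks one word out of those shown, the chosen word gains a pairwise "win" over every other displayed word, and a pair's relative preference is the share of wins. The function names and the win-share definition are assumptions made for illustration.

```python
from collections import Counter


def record_choice(pair_wins, shown_words, chosen):
    """Log one Q2 answer: `chosen` is preferred over every other shown word."""
    if chosen not in shown_words:
        raise ValueError("chosen word must be one of the shown words")
    for other in shown_words:
        if other != chosen:
            pair_wins[(chosen, other)] += 1


def relative_preference(pair_wins, a, b):
    """Share of a-vs-b comparisons won by `a`, or None if never compared."""
    wins_a = pair_wins[(a, b)]
    wins_b = pair_wins[(b, a)]
    total = wins_a + wins_b
    return None if total == 0 else wins_a / total
```

A Counter is used for the log so that pairs never shown together simply read as zero wins.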

70 4 Q3 The player is asked for any positive word in his/her mind. [sent-116, score-0.408]

71 This technique helps to increase the coverage of existing SentiWordNet. [sent-117, score-0.061]

72 The word is then added to the existing PsychoSentiWordNet and further used in Q1 to other users to note their sentimentality about the particular word. [sent-118, score-0.124]

73 5 Q4 A player is asked by Dr Sentiment about any negative word. [sent-120, score-0.412]

74 The word is then added to the existing PsychoSentiWordNet and further used in Q1 to other users to note their sentimentality about the particular word. [sent-121, score-0.124]

75 6 Comment Architecture There are three types of Comments, Comment type 1 (CMNT1), Comment type 2 (CMNT2) and the final comment as Dr Sentiment’s prescription. [sent-123, score-0.243]

76 CMNT1 type and CMNT2 type comments are associated with question types Q1 and Q2 respectively. [sent-124, score-0.167]

77 1 CMNT1 Comment type 1 has 5 variations as shown in the Comment table in Table 4. [sent-127, score-0.041]

78 Comments are randomly retrieved from the comment type table according to their category: positive word tagged as negative (PN); positive word tagged as positive (PP); negative word tagged as positive (NP); negative word tagged as negative (NN); neutral. [sent-128, score-1.097]

79 2 CMNT2 The strategy here is the same as for CMNT1. [sent-131, score-0.048]

80 The categories are: positive word tagged as negative (PN); negative word tagged as positive (NP). [sent-133, score-0.42]
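
The comment categories can be read as a lookup keyed on the word's existing polarity and the player's tag. The sketch below assumes that reading, and also assumes that the neutral category applies whenever the player's tag is neutral, which the extract does not state. CATEGORY_CODES, comment_category, pick_comment and the layout of the comment table are all hypothetical names.

```python
import random

# Category codes as listed in the text; "NEUTRAL" is our own label.
CATEGORY_CODES = {
    ("positive", "negative"): "PN",
    ("positive", "positive"): "PP",
    ("negative", "positive"): "NP",
    ("negative", "negative"): "NN",
}


def comment_category(prior, tagged):
    """Map (prior polarity of the word, player's tag) to a category code."""
    if tagged == "neutral":  # assumption: a neutral tag selects the neutral category
        return "NEUTRAL"
    return CATEGORY_CODES[(prior, tagged)]


def pick_comment(comment_table, category, rng=None):
    """Pick one comment at random from the list stored for `category`."""
    rng = rng or random.Random()
    return rng.choice(comment_table[category])
```

For CMNT2, the same lookup could be restricted to the PN and NP keys.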

81 7 Dr Sentiment’s Prescription The final prescription depends on various factors such as total number of positive, negative or neutral comments and the total time taken by any player. [sent-134, score-0.259]

82 The final prescription also depends on the range of the accumulated values of all the above factors. [sent-135, score-0.088]

83 The motivating message for players is that Dr Sentiment can reveal their sentimental status: whether they are extreme negative or positive or very much neutral or diplomatic etc. [sent-137, score-0.447]

84 It is not claimed that the revealed status of a player by Dr Sentiment is exact or ideal. [sent-138, score-0.345]

85 It serves only to keep the players motivated, but the outcomes of the game effectively help to store human sentimental psychology in the form of a computational lexicon. [sent-139, score-0.608]

86 A word previously tagged by a player is avoided by the tracking system during subsequent turns by the same player. [sent-140, score-0.414]

87 The intention is to tag more and more words by involving the Internet population. [sent-141, score-0.032]

88 We observe that the strategy helps to keep the game interesting as a large number of players return to play the game after this strategy was implemented. [sent-142, score-0.612]
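
The tracking behaviour described above (a word already tagged by a player is not shown to that player again) could look roughly like this. It is a minimal sketch under assumptions: next_word is a hypothetical helper, and the real system's storage and sampling are not described in the extract.

```python
import random


def next_word(candidates, already_tagged, rng=None):
    """Return a random candidate the player has not tagged yet, else None.

    `already_tagged` is the set of words this player tagged in earlier turns.
    """
    rng = rng or random.Random()
    unseen = [word for word in candidates if word not in already_tagged]
    return rng.choice(unseen) if unseen else None
```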

89 3 Senti-Mentality PsychoSentiWordNet gives a good sketch to understand the psycho-sentimental behavior of the human society depending upon proposed psychological dimensions. [sent-143, score-0.125]

90 The PsychoSentiWordNet is basically the log records of every player’s tagged words. [sent-144, score-0.204]

91 1 Concept-Culture-Wise Analysis The word “blue” gets tagged by different players around the world. [sent-146, score-0.275]

92 But surprisingly it has been tagged as positive from one part of the world and negative from another part of the world. [sent-147, score-0.289]

93 The observation is that most of the negative tags are coming from the middle-east and especially from the Islamic countries. [sent-149, score-0.081]
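
Since the PsychoSentiWordNet is described as log records of tagged words plus player details, a variable prior polarity could in principle be obtained by filtering those records on a psychological aspect and averaging. The sketch below assumes a record layout (dictionaries with "word", "polarity" and aspect keys) and a +1/-1 polarity encoding that are not given in the source; the two regions "A" and "B" in the usage note are placeholder toy data, not results from the paper.

```python
from statistics import mean


def prior_polarity(records, word, **aspects):
    """Mean polarity of `word` over the records matching every given aspect.

    Returns None when no record matches.
    """
    scores = [
        record["polarity"]
        for record in records
        if record["word"] == word
        and all(record.get(key) == value for key, value in aspects.items())
    ]
    return mean(scores) if scores else None
```

With toy records such as two positive tags for "blue" from region "A" and one negative tag from region "B", prior_polarity(records, "blue", country="A") and prior_polarity(records, "blue", country="B") return different values, which is the kind of culture-dependent polarity the text describes.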

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sentiment', 0.429), ('psychosentiwordnet', 0.348), ('dr', 0.31), ('player', 0.283), ('sentiwordnet', 0.192), ('gaming', 0.175), ('game', 0.17), ('comment', 0.161), ('polarity', 0.161), ('players', 0.144), ('tagged', 0.131), ('regulating', 0.127), ('govern', 0.117), ('psychology', 0.109), ('sentimentality', 0.095), ('population', 0.092), ('internet', 0.09), ('prescription', 0.088), ('psychological', 0.085), ('images', 0.085), ('negative', 0.081), ('aspects', 0.078), ('presently', 0.078), ('positive', 0.077), ('orientation', 0.071), ('asks', 0.069), ('sentimental', 0.067), ('image', 0.067), ('amitava', 0.063), ('baccianella', 0.063), ('unrevealed', 0.063), ('fetched', 0.058), ('involvement', 0.058), ('retrieved', 0.055), ('api', 0.052), ('age', 0.05), ('interface', 0.048), ('strategy', 0.048), ('asked', 0.048), ('google', 0.047), ('outcomes', 0.046), ('neutral', 0.046), ('wiebe', 0.045), ('comments', 0.044), ('culture', 0.044), ('template', 0.043), ('emoticons', 0.042), ('interactive', 0.042), ('synsets', 0.041), ('question', 0.041), ('type', 0.041), ('human', 0.04), ('realization', 0.038), ('country', 0.038), ('records', 0.037), ('relates', 0.037), ('log', 0.036), ('pn', 0.036), ('techniques', 0.035), ('collect', 0.035), ('service', 0.034), ('mihalcea', 0.033), ('status', 0.033), ('questions', 0.032), ('manual', 0.032), ('helps', 0.032), ('clues', 0.032), ('bandyopadhyay', 0.032), ('antu', 0.032), ('biasness', 0.032), ('cmnt', 0.032), ('diplomatic', 0.032), ('endeavors', 0.032), ('endless', 0.032), ('enigmatic', 0.032), ('esp', 0.032), ('fetches', 0.032), ('intension', 0.032), ('jadavpur', 0.032), ('milestone', 0.032), ('negativ', 0.032), ('philosophically', 0.032), ('profession', 0.032), ('sivaji', 0.032), ('trustable', 0.032), ('wrapped', 0.032), ('online', 0.03), ('pang', 0.03), ('lexicon', 0.03), ('personal', 0.03), ('prior', 0.03), ('existing', 0.029), ('lexicons', 0.029), ('islamic', 0.029), ('understandability', 0.029), ('claimed', 0.029), ('businessman', 
0.029), ('credible', 0.029), ('favorite', 0.029), ('formidable', 0.029)]
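
For readers unfamiliar with the weights above, the following is a minimal sketch of one common tf-idf variant (raw term count times log inverse document frequency, L2-normalised) together with cosine similarity between documents. The exact weighting, tokenisation and library used to produce this page's numbers are not stated, so this should not be read as the page's actual pipeline; tfidf_vectors and cosine are hypothetical names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists -> list of {term: weight} dicts (L2-normalised)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors; equals cosine if both are normalised."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())
```

Under this scheme, a term that occurs in every document gets weight zero, while terms concentrated in one paper receive the highest weights.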

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 105 acl-2011-Dr Sentiment Knows Everything!

Author: Amitava Das ; Sivaji Bandyopadhyay

2 0.92360246 253 acl-2011-PsychoSentiWordNet

Author: Amitava Das

Abstract: Sentiment analysis has been one of the most active research areas of the last few decades. Although a formidable amount of research has been done, the existing reported solutions and available systems are still far from perfect and do not meet the satisfaction level of end users. The main issue may be that there are many conceptual rules that govern sentiment, and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization by a human being. Human psychology directly relates to these unrevealed clues and governs our realization of sentiment. Human psychology relates to many things, such as social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. PsychoSentiWordNet is an extension of SentiWordNet that holds human psychological knowledge and sentiment knowledge simultaneously.

3 0.28594747 204 acl-2011-Learning Word Vectors for Sentiment Analysis

Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts

Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it outperforms several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.

4 0.28039604 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

Author: Danushka Bollegala ; David Weir ; John Carroll

Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.

5 0.26246464 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

Author: Awais Athar

Abstract: Sentiment analysis of citations in scientific papers and articles is a new and interesting problem due to the many linguistic differences between scientific texts and other genres. In this paper, we focus on the problem of automatic identification of positive and negative sentiment polarity in citations to scientific papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-specific lexical features, dependency relations, sentence splitting and negation features. Our results show that 3-grams and dependencies perform best in this task; they outperform the sentence splitting, science lexicon and negation based features.

6 0.25552779 292 acl-2011-Target-dependent Twitter Sentiment Classification

7 0.23260619 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

8 0.2180097 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

9 0.20699483 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

10 0.19397521 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

11 0.18771864 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

12 0.18427984 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework

13 0.17938274 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

14 0.12666713 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

15 0.11185513 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

16 0.109029 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic

17 0.10853101 159 acl-2011-Identifying Noun Product Features that Imply Opinions

18 0.10480613 82 acl-2011-Content Models with Attitude

19 0.097604826 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models

20 0.080774158 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.182), (1, 0.378), (2, 0.376), (3, -0.114), (4, 0.105), (5, 0.079), (6, -0.02), (7, -0.077), (8, 0.015), (9, -0.1), (10, 0.128), (11, -0.062), (12, -0.065), (13, 0.04), (14, 0.034), (15, 0.077), (16, 0.141), (17, 0.067), (18, -0.063), (19, 0.11), (20, 0.203), (21, 0.183), (22, 0.019), (23, -0.084), (24, -0.037), (25, -0.197), (26, 0.008), (27, 0.056), (28, -0.064), (29, -0.146), (30, -0.178), (31, -0.165), (32, 0.287), (33, -0.058), (34, 0.048), (35, 0.023), (36, 0.157), (37, 0.061), (38, -0.022), (39, -0.075), (40, 0.002), (41, 0.112), (42, -0.057), (43, -0.078), (44, 0.038), (45, -0.042), (46, 0.052), (47, 0.066), (48, 0.021), (49, -0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97197211 105 acl-2011-Dr Sentiment Knows Everything!

Author: Amitava Das ; Sivaji Bandyopadhyay

2 0.96291786 253 acl-2011-PsychoSentiWordNet

Author: Amitava Das

3 0.63416207 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework

Author: S.R.K Branavan ; David Silver ; Regina Barzilay

Abstract: This paper presents a novel approach for leveraging automatically extracted textual knowledge to improve the performance of control applications such as games. Our ultimate goal is to enrich a stochastic player with high-level guidance expressed in text. Our model jointly learns to identify text that is relevant to a given game state in addition to learning game strategies guided by the selected text. Our method operates in the Monte-Carlo search framework, and learns both text analysis and game strategies based only on environment feedback. We apply our approach to the complex strategy game Civilization II using the official game manual as the text guide. Our results show that a linguistically-informed game-playing agent significantly outperforms its language-unaware counterpart, yielding a 27% absolute improvement and winning over 78% of games when playing against the built-in AI of Civilization II.

4 0.5460006 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

Author: Oscar Tackstrom ; Ryan McDonald

Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines.

1 Sentence-level sentiment analysis

In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis, an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exist naturally, and thus requires labor intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative compared to fine-grained supervision, however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account; and models that use latent variables to learn unobserved phenomena from that which can be observed.

Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarse-grained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Täckström and McDonald (2011) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines with quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to only predict the two dominant fine-grained sentiment categories well for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model. Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models.

Contrary to (generative) topic models (Mei et al., 2007; Titov and McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient based estimation. The former models are largely orthogonal to the one we propose in this work and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained task-specific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model.

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers, pages 569–574, Portland, Oregon, June 19-24, 2011. ©2011 Association for Computational Linguistics

Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences $s_i$ are always observed. Note that there are no factors connecting the document node, $y^d$, with the input nodes, $\mathbf{s}$, so that the sentence-level variables, $\mathbf{y}^s$, in effect form a bottleneck between the document sentiment and the input sentences.

1.1 Preliminaries

Let $d$ be a document consisting of $n$ sentences, $\mathbf{s} = (s_i)_{i=1}^{n}$, with a document–sentence-sequence pair denoted $\mathbf{d} = (d, \mathbf{s})$. Let $\mathbf{y}^{\mathbf{d}} = (y^d, \mathbf{y}^s)$ denote random variables: the document level sentiment, $y^d$, and the sequence of sentence level sentiment, $\mathbf{y}^s = (y^s_i)_{i=1}^{n}$. (Footnote 1: We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments.) In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, $D_F = \{(\mathbf{d}_j, \mathbf{y}^{\mathbf{d}}_j)\}_{j=1}^{m_f}$, and a large set of coarsely labeled instances, $D_C = \{(\mathbf{d}_j, y^d_j)\}_{j=m_f+1}^{m_f+m_c}$. Furthermore, we assume that $y^d$ and all $y^s_i$ take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization $p_\theta(y^d, \mathbf{y}^s \mid \mathbf{s}) = \exp\{\langle \phi(y^d, \mathbf{y}^s, \mathbf{s}), \theta \rangle - A_\theta(\mathbf{s})\}$

5 0.50638258 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

Author: Cheng-Te Li ; Chien-Yuan Wang ; Chien-Lin Tseng ; Shou-De Lin

Abstract: Micro-blogging services provide platforms for users to share their feelings and ideas on the move. In this paper, we present a search-based demonstration system, called MemeTube, to summarize the sentiments of microblog messages in an audiovisual manner. MemeTube provides three main functions: (1) recognizing the sentiments of messages (2) generating music melody automatically based on detected sentiments, and (3) produce an animation of real-time piano playing for audiovisual display. Our MemeTube system can be accessed via: http://mslab.csie.ntu.edu.tw/memetube/ .

6 0.50375056 204 acl-2011-Learning Word Vectors for Sentiment Analysis

7 0.48056117 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

8 0.47271639 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

9 0.45068726 292 acl-2011-Target-dependent Twitter Sentiment Classification

10 0.43258274 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

11 0.4091996 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

12 0.40894708 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

13 0.38589665 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

14 0.33786547 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

15 0.31705168 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

16 0.31119317 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations

17 0.29803613 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic

18 0.2619403 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

19 0.24464607 82 acl-2011-Content Models with Attitude

20 0.23180924 99 acl-2011-Discrete vs. Continuous Rating Scales for Language Evaluation in NLP

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.016), (17, 0.031), (26, 0.508), (37, 0.091), (39, 0.02), (40, 0.022), (41, 0.026), (55, 0.014), (59, 0.044), (72, 0.023), (91, 0.013), (96, 0.109), (97, 0.012)]
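
The list above reads as sparse (topicId, topicWeight) pairs. As a small sketch, the Python below loads those pairs as given and reports the dominant topic and the total listed weight. That topics missing from the list were dropped for having low weight is an assumption; the page does not say how the list was produced.

```python
# (topicId, topicWeight) pairs copied from the listing above.
topic_weights = [
    (5, 0.016), (17, 0.031), (26, 0.508), (37, 0.091), (39, 0.02),
    (40, 0.022), (41, 0.026), (55, 0.014), (59, 0.044), (72, 0.023),
    (91, 0.013), (96, 0.109), (97, 0.012),
]

# Topic with the largest weight.
dominant_topic = max(topic_weights, key=lambda pair: pair[1])

# Total weight carried by the listed topics (about 0.929, so less than 1).
listed_mass = sum(weight for _, weight in topic_weights)

if __name__ == "__main__":
    print(dominant_topic, round(listed_mass, 3))
```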

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89184487 105 acl-2011-Dr Sentiment Knows Everything!

Author: Amitava Das ; Sivaji Bandyopadhyay

2 0.81997603 115 acl-2011-Engkoo: Mining the Web for Language Learning

Author: Matthew R. Scott ; Xiaohua Liu ; Ming Zhou ; Microsoft Engkoo Team

Abstract: This paper presents Engkoo, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages - using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an application platform that supports a multitude of NLP technologies such as cross language retrieval, alignment, sentence classification, and statistical machine translation. The data set that supports this system is primarily built from mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, extracted and formulated into clear term definitions and sample sentences. This approach allows us to build perhaps the world’s largest lexicon linking both Chinese and English together - at the same time covering the most up-to-date terms as captured by the net.

3 0.78930628 253 acl-2011-PsychoSentiWordNet

Author: Amitava Das

4 0.76645863 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents

Author: Emmanuel Prochasson ; Pascale Fung

Abstract: We present a first known result of high-precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents, in a machine learning approach. We test our hypothesis on different pairs of languages and corpora. We obtain a very high F-measure, between 80% and 98%, for recognizing and extracting correct translations for rare terms (from 1 to 5 occurrences). Moreover, we show that our system can be trained on one pair of languages and tested on a different pair, obtaining an F-measure of 77% for the classification of Chinese-English translations using a training corpus of Spanish-French. Our method is therefore potentially applicable even to low-resource languages without training data.

5 0.71591038 333 acl-2011-Web-Scale Features for Full-Scale Parsing

Author: Mohit Bansal ; Dan Klein

Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.

6 0.60462409 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

7 0.58395547 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction

8 0.5234741 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

9 0.5169698 258 acl-2011-Ranking Class Labels Using Query Sessions

10 0.47064722 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

11 0.46486521 256 acl-2011-Query Weighting for Ranking Model Adaptation

12 0.46472275 182 acl-2011-Joint Annotation of Search Queries

13 0.45772922 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

14 0.44836187 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

15 0.43597353 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

16 0.4236474 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment

17 0.42295456 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

18 0.42222893 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis

19 0.41725567 193 acl-2011-Language-independent compound splitting with morphological operations

20 0.41302693 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look