acl acl2010 acl2010-176 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thin Nguyen
Abstract: The emergence of social media brings chances, but also challenges, to linguistic analysis. In this paper we investigate a novel problem of discovering patterns based on emotion and the association of moods and affective lexicon usage in blogosphere, a representative for social media. We propose the use of normative emotional scores for English words in combination with a psychological model of emotion measurement and a nonparametric clustering process for inferring meaningful emotion patterns automatically from data. Our results on a dataset consisting of more than 17 million mood-groundtruthed blogposts have shown interesting evidence of the emotion patterns automatically discovered that match well with the core-affect emotion model theorized by psychologists. We then present a method based on information theory to discover the association of moods and affective lexicon usage in the new media.
Reference: text
sentIndex sentText sentNum sentScore
1 Mood Patterns and Affective Lexicon Access in Weblogs Thin Nguyen Curtin University of Technology Bentley, WA 6102, Australia thin . [sent-1, score-0.025]
2 au Abstract The emergence of social media brings chances, but also challenges, to linguistic analysis. [sent-5, score-0.108]
3 In this paper we investigate a novel problem of discovering patterns based on emotion and the association of moods and affective lexicon usage in blogosphere, a representative for social media. [sent-6, score-1.325]
4 We propose the use of normative emotional scores for English words in combination with a psychological model of emotion measurement and a nonparametric clustering process for inferring meaningful emotion patterns automatically from data. [sent-7, score-0.826]
5 Our results on a dataset consisting of more than 17 million mood-groundtruthed blogposts have shown interesting evidence of the emotion patterns automatically discovered that match well with the core-affect emotion model theorized by psychologists. [sent-8, score-0.98]
6 We then present a method based on information theory to discover the association of moods and affective lexicon usage in the new media. [sent-9, score-0.895]
7 1 Introduction Social media provides communication and interaction channels where users can freely participate in, express their opinions, make their own content, and interact with other users. [sent-10, score-0.054]
8 Users in this new media are more comfortable in expressing their feelings, opinions, and ideas. [sent-11, score-0.054]
9 Thus, the resulting user-generated content tends to be more subjective than other written genres, and thus, is more appealing to be investigated in terms of subjectivity and sentiment analysis. [sent-12, score-0.088]
10 Research in sentiment analysis has recently attracted much attention (Pang and Lee, 2008), but modeling emotion patterns and studying the affective lexicon used in social media have received little attention. [sent-13, score-0.811]
11 Work in sentiment analysis in social media is often limited to finding the sentiment sign in the dipole pattern (negative/positive) for given text. [sent-14, score-0.318]
12 Extensions to this task include the three-class classification (adding neutral to the polarity) and locating the value of emotion the text carries across a spectrum of valence scores. [sent-15, score-0.462]
13 On the other hand, it is well appreciated by psychologists that sentiment has much richer structures than the aforementioned simplified polarity. [sent-16, score-0.113]
14 For example, emotion, a form of expressive sentiment, was suggested by psychologists to be measured in terms of valence and arousal (Russell, 2009). [sent-17, score-0.64]
15 Thus, we are motivated to analyze the sentiment in blogosphere in a more fine-grained fashion. [sent-18, score-0.137]
16 In this paper we study the grouping behaviors of the emotion, or emotion patterns, expressed in the blogposts. [sent-19, score-0.33]
17 We are inspired to get insights into the question of whether these structures can be discovered directly from data without the cost of involving human participants as in traditional psychological studies. [sent-20, score-0.07]
18 Next, we aim to study the relationship between the data-driven emotion structures discovered and those proposed by psychologists. [sent-21, score-0.365]
19 Work on the analysis of the effects of sentiment on lexical access is extensive from a psychological perspective. [sent-22, score-0.141]
20 However, to our knowledge, limited work exists to examine the same tasks in a social media context. [sent-23, score-0.108]
21 To our understanding, we study a novel problem of emotion-based pattern discovery in blogosphere. [sent-25, score-0.034]
22 We provide an initial solution for the matter using a combination of psychological models, affective norm scores for English words, a novel feature representation scheme, and a nonparametric clustering to automatically group moods into meaningful emotion patterns. [sent-26, score-1.17]
23 We believe that we are the first to consider the matter of data-driven emotion pattern discovery at the scale presented in this paper. [Page 43 footer: Proceedings of the ACL 20…, Uppsala, Sweden, 13 July 2010.] [sent-27, score-0.364]
24 Secondly, we explore a novel problem of detecting the correlation between mood and affective lexicon usage in the new media, and propose a novel use of a term-goodness criterion to discover this association between sentiment and language. [sent-30, score-0.793]
25 2 Related Work Much work in sentiment analysis measures the value of emotion the text conveys in a continuous range of valence (Pang and Lee, 2008). [sent-31, score-0.517]
26 Emotion patterns have often been used in sentiment analysis limited to this one-dimensional formulation. [sent-32, score-0.156]
27 In the former, emotion states are conceptualized as combinations of some factors like valence and arousal. [sent-34, score-0.429]
28 In contrast, the latter style argues that each emotion has a unique coincidence of experience, psychology and behavior (Mauss and Robinson, 2009). [sent-35, score-0.363]
29 Our work utilizes the dimensional representation, and in particular, the core-affect model (Russell, 2009), which encodes emotion states along the valence and arousal dimensions. [sent-36, score-0.527]
30 The sentiment scoring for emotion bearing words is available in a lexicon known as Affective Norms for English Words (ANEW) (Bradley and Lang, 1999). [sent-37, score-0.455]
31 Related work making use of ANEW includes (Dodds and Danforth, 2009) for estimating happiness levels in three types of data: song lyrics, blogs, and the State of the Union addresses. [sent-38, score-0.056]
32 From a psychological perspective, for estimating mood effects in lexicon decisions, (Chastain et al. [sent-39, score-0.399]
33 , 1995) investigates the influence of moods on the access of affective words. [sent-40, score-0.762]
34 For learning affect in blogosphere, (Leshed and Kaye, 2006) utilizes Support Vector Machines (SVM) to predict moods for coming blog posts and detect mood synonymy. [sent-41, score-1.139]
35 1 Mood Pattern Detection Livejournal provides a comprehensive set of 132 moods for users to tag their moods when blogging. [sent-43, score-1.124]
36 The provided moods range diversely in the emotion spectrum but typically are observed to fall into soft clusters such as happiness (cheerful or grateful) or sadness (discontent or uncomfortable). [sent-44, score-1.027]
37 We call each cluster of these moods an emotion pattern and aim to detect them in this paper. [sent-45, score-0.926]
38 We observe that the blogposts tagged with moods in the same emotion pattern have similar proportions in the usage of ANEW. [sent-46, score-1.139]
39 Figure 1: ANEW usage proportion in the posts tagged with happy/cheerful and angry/p*ssed off (x-axis: ANEW words and their arousal values). [sent-57, score-0.643]
40 For example, in Figure 1 a plot of the usage of ANEW having arousal in the range of 7. [sent-58, score-0.211]
41 2 in the blogposts we could see that the ANEW usage patterns of happy/cheerful and angry/p*ssed off are well separated. [sent-60, score-0.288]
42 Anger, enraged, and rage will be most likely found in the angry/p*ssed off tagged posts and least likely found in the happy/cheerful ones. [sent-61, score-0.282]
43 In contrast, ANEW words such as romantic or surprised are not commonly used in the posts tagged with angry/p*ssed off but are most popularly used in the happy/cheerful ones, suggesting that the similarity between ANEW usage patterns can be used as a basis to study the structure of mood space. [sent-62, score-0.75]
44 Let us denote by $B$ the corpus of all blogposts and by $M = \{\text{sad}, \text{happy}, \ldots\}$ the set of moods, $|M| = 132$. [sent-63, score-0.126]
45 Each blogpost $b \in B$ in the corpus is labeled with a mood $l_b \in M$. [sent-70, score-0.042]
46 Let $x_m = [x_{1m}, \ldots, x_{nm}]$ be the vector representing the usage of ANEW by the mood $m$. [sent-79, score-0.421]
47 Thus, $x_{im} = \sum_{b \in B,\, l_b = m} c_{ib}$, where $c_{ib}$ is the count of occurrences of the $i$-th ANEW word in the blogpost $b$ tagged with the mood $m$. [sent-80, score-0.518]
48 The usage vector is normalized so that $\sum_{i=1}^{n} x_{im} = 1$ for all $m \in M$. [sent-81, score-0.128]
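The construction of the usage vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `usage_vectors`, the `(mood, tokens)` input format, and the four-word toy lexicon are hypothetical and are not taken from the paper, which uses the full ANEW list and Livejournal mood tags.

```python
def usage_vectors(posts, lexicon):
    """Build one normalized ANEW usage vector per mood.

    posts:   iterable of (mood, tokens) pairs (hypothetical input format)
    lexicon: ordered list of affective words (stand-in for ANEW)
    """
    index = {word: i for i, word in enumerate(lexicon)}
    counts = {}
    for mood, tokens in posts:
        vec = counts.setdefault(mood, [0] * len(lexicon))
        for tok in tokens:
            i = index.get(tok)
            if i is not None:
                vec[i] += 1  # accumulate c_ib into x_im
    vectors = {}
    for mood, vec in counts.items():
        total = sum(vec)
        # Normalize so each mood's vector sums to 1 (all zeros if no hits).
        vectors[mood] = [c / total for c in vec] if total else [0.0] * len(vec)
    return vectors


toy_posts = [
    ("happy", ["love", "joy", "the", "love"]),
    ("angry", ["rage", "hate"]),
    ("happy", ["joy"]),
]
toy_lexicon = ["love", "joy", "rage", "hate"]
vectors = usage_vectors(toy_posts, toy_lexicon)
```

Tokens outside the lexicon are ignored, so the vector reflects only the relative use of affective words within each mood.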
49 To discover the grouping of the moods based on the usage vectors we use a nonparametric clustering algorithm known as Affinity Propagation (AP) (Frey and Dueck, 2007). [sent-82, score-0.179]
50 AP is desirable here because it automatically discovers the number of clusters as well as the cluster exemplars. [sent-83, score-0.019]
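To make the clustering step concrete, here is a minimal pure-Python sketch of the Affinity Propagation message-passing updates. Several choices are assumptions, not details confirmed by the text: the similarity is the negative squared Euclidean distance, the preference (self-similarity) is passed in by the caller, and the damping factor and iteration count are illustrative defaults. A production run would more likely use an existing library implementation.

```python
def affinity_propagation(points, preference, damping=0.5, iterations=200):
    """Return, for each point, the index of its chosen exemplar."""
    n = len(points)
    # Assumed similarity: negative squared Euclidean distance.
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    for k in range(n):
        s[k][k] = preference  # lower preference tends to give fewer clusters
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate.
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availability: accumulated evidence that k is a good exemplar.
        for k in range(n):
            positive = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, r[k][k] + total - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]


# Toy one-dimensional data with two well-separated groups.
toy_points = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
labels = affinity_propagation(toy_points, preference=-10.0)
```

The number of clusters is not given as input; it emerges from the preference value and the similarities, which is the property the text highlights.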
51 To map the emotion patterns detected to their psychological meaning, we proceed to measure the sentiment scores of those $|M|$ mood words. [sent-85, score-0.868]
52 In particular, we use ANEW (Bradley and Lang, 1999), which is a set of 1034 sentiment-conveying English words. [sent-86, score-0.122]
53 The valence and arousal of moods are assigned by those of the same words in the ANEW lexicon. [sent-87, score-0.759]
54 For those moods which are not in ANEW, their values are assigned by those of the nearest parent words in the mood hierarchical tree, where those moods conveying the same meaning, to some extent, are in the same level of the tree. [sent-88, score-1.485]
55 Thus, each member of the mood clusters can be placed onto a 2D representation along the valence and arousal dimensions, making it feasible to compare with the core-affect model (Russell, 2009) theorized by psychologists. [sent-89, score-0.571]
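The fallback to an ancestor in the mood hierarchy can be sketched as a simple lookup. The dictionaries below are hypothetical stand-ins: the `(valence, arousal)` numbers are illustrative placeholders rather than actual ANEW ratings, and the parent links are invented for the example rather than copied from the Livejournal mood tree.

```python
def mood_affect(mood, anew, parent):
    """Return (valence, arousal) for a mood, walking up the hierarchy if needed.

    anew:   dict mapping word -> (valence, arousal)   (placeholder scores)
    parent: dict mapping mood -> parent mood          (hypothetical tree)
    """
    seen = set()
    current = mood
    while current is not None and current not in seen:
        if current in anew:
            return anew[current]
        seen.add(current)  # guard against accidental cycles
        current = parent.get(current)
    return None  # no scored ancestor found


toy_anew = {"happy": (8.0, 6.0), "sad": (2.0, 4.0)}
toy_parent = {"cheerful": "happy", "chipper": "cheerful", "gloomy": "sad"}
score = mood_affect("chipper", toy_anew, toy_parent)
```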
56 2 Mood and ANEW Usage Association To study the statistical strength of an ANEW word with respect to a particular mood, the information gain measure (Mitchell, 1997) is adopted. [sent-91, score-0.018]
57 Given a collection of blog posts B consisting of those tagged or not tagged with a target class attribute mood m. [sent-92, score-0.688]
58 are the proportions of the posts tagged and not tagged with m respectively. [sent-97, score-0.314]
59 ) where B⊕ is the subset of B for which attribute A is present in the corpus and B⊖ is the subset for which it is absent. [sent-102, score-0.036]
60 in classifying the collection with respect to the target class attribute mood m, IG(m, A), is the reduction in entropy caused by partitioning the examples according to the attribute A. [sent-105, score-0.363]
61 Thus, IG(m, A) = H(B) − H(B|A) With respect to a given mood m, those ANEW having high information gain are considered likely to be associated with the mood. [sent-106, score-0.367]
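As a worked illustration of IG(m, A) = H(B) − H(B|A), the sketch below assumes the standard binary entropy from Mitchell (1997) and treats each post as a set of words. The function names and the toy posts are hypothetical; the paper's exact preprocessing is not shown in the extracted text.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a split with pos/neg counts."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(posts, mood, word):
    """IG of the presence of `word` with respect to the tag `mood`.

    posts: list of (mood, set_of_words) pairs (hypothetical input format)
    """
    def subset_entropy(subset):
        pos = sum(1 for m, _ in subset if m == mood)
        return entropy(pos, len(subset) - pos)

    present = [p for p in posts if word in p[1]]
    absent = [p for p in posts if word not in p[1]]
    n = len(posts)
    # H(B|A): entropy of each partition weighted by its share of the corpus.
    conditional = sum(len(part) / n * subset_entropy(part)
                      for part in (present, absent) if part)
    return subset_entropy(posts) - conditional


toy_posts = [
    ("angry", {"rage"}),
    ("angry", {"rage", "hate"}),
    ("happy", {"love"}),
    ("happy", {"joy"}),
]
gain = information_gain(toy_posts, "angry", "rage")
```

In this toy corpus the word "rage" splits the posts perfectly by the tag "angry", so its gain equals the full entropy of the tag; a word that never occurs yields zero gain.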
62 1 Mood Patterns We use a large Livejournal blogpost dataset, which contains more than 17 million blogposts tagged with the predefined moods. [sent-112, score-0.255]
63 The ANEW usage vectors of all moods are subjected to a clustering to learn emotion patterns. [sent-114, score-1.02]
64 After running the Affinity Propagation algorithm, 16 patterns of moods are clustered as below (the moods in upper case are the exemplars). [sent-115, score-1.124]
65 EXANIMATE, intimidated, predatory, embarrassed, restless, shocked, nostalgic, indifferent, listless, apathetic, blank. Generally, the patterns 1–7 contain moods high in valence (pleasure) and the patterns 8–16 include moods low in valence (displeasure). [sent-132, score-1.223]
66 We learn that nearly all members in the same patterns express a common affect concept. [sent-134, score-0.155]
67 Those moods in the patterns with cheerful, pensive, and rejuvenated as the exemplars are mostly located in the first quarter of the affect circle (0°–90°), which should contain moods being high in both pleasure and activation measures. [sent-135, score-1.446]
68 Meanwhile, many members of the angry and aggravated patterns are found in the second quarter (90°–180°), which roughly means that those moods express the feeling of sadness with high activation. [sent-136, score-0.783]
69 The patterns with the exemplars nauseated and tired contain a majority of moods found in the third quarter (180°–270°), which could be representative of moods of sadness and deactivation. [sent-137, score-1.116]
70 In addition, the grateful group could be a representative for moods which are both low in pleasure and in the degree of activation (270°–360° of the affect circle). [sent-138, score-0.7]
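The quarters of the affect circle referred to above can be computed from a (valence, arousal) pair. The sketch assumes the scores are centered at the midpoint 5 of the 9-point ANEW rating scale, with valence on the horizontal axis and arousal on the vertical axis; the centering choice and the function names are assumptions made for illustration.

```python
import math


def affect_angle(valence, arousal, midpoint=5.0):
    """Angle in degrees, in [0, 360), of a point on the affect plane."""
    angle = math.degrees(math.atan2(arousal - midpoint, valence - midpoint))
    return angle % 360.0


def quarter(angle):
    """Quarter 1..4 of the affect circle for an angle in degrees."""
    # min() guards against a rounding result of exactly 360.0.
    return min(int(angle // 90), 3) + 1


# Illustrative placeholder scores, not actual ANEW ratings.
q = quarter(affect_angle(8.0, 7.0))
```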
71 Thus, the clustering process based on the ANEW usage could separate moods having similar affect scores into corresponding segments in the circle proposed in (Russell, 2009). [sent-139, score-0.821]
72 To visualize mood patterns that have been detected, we plot these emotion modes on the affect circle plane in Figure 4. [sent-140, score-0.894]
73 For each pattern, the valence and arousal are computed by averaging the values of those moods in the quarter where most of the members in the pattern are. [sent-141, score-0.855]
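The averaging rule for placing a pattern on the plane can be sketched as follows. The quarter boundaries at the scale midpoint 5 and the tie-breaking behaviour (the first most common quarter wins) are assumptions; the input scores are illustrative placeholders, not values from the paper.

```python
from collections import Counter


def quarter_of(valence, arousal, midpoint=5.0):
    """Quarter 1..4 of the affect plane, split at the scale midpoint."""
    if valence >= midpoint:
        return 1 if arousal >= midpoint else 4
    return 2 if arousal >= midpoint else 3


def pattern_centroid(scores):
    """Average (valence, arousal) over the moods in the majority quarter.

    scores: list of (valence, arousal) pairs for the moods of one pattern
    """
    quarters = [quarter_of(v, a) for v, a in scores]
    majority = Counter(quarters).most_common(1)[0][0]
    kept = [s for s, q in zip(scores, quarters) if q == majority]
    n = len(kept)
    return (sum(v for v, _ in kept) / n, sum(a for _, a in kept) / n)


centroid = pattern_centroid([(8.0, 7.0), (7.0, 6.0), (3.0, 6.0)])
```

Members that fall outside the majority quarter are left out of the average, so a few outlying moods do not pull the pattern toward the center of the circle.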
74 Figure 2 and Figure 3 show views of the distance between moods, based on the Euclidean measure of their corresponding ANEW usage, using MDS and hierarchical clustering respectively. [sent-143, score-0.034]
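The pairwise Euclidean distances that feed the MDS and hierarchical clustering views can be computed directly from the usage vectors. The helper names and the two toy vectors below are hypothetical; the MDS and dendrogram steps themselves are not shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length usage vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Return the sorted mood names and their pairwise distance matrix."""
    moods = sorted(vectors)
    matrix = [[euclidean(vectors[p], vectors[q]) for q in moods] for p in moods]
    return moods, matrix


toy_vectors = {"happy": [0.5, 0.5, 0.0, 0.0], "angry": [0.0, 0.0, 0.5, 0.5]}
moods, dist = distance_matrix(toy_vectors)
```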
75 2 Mood and ANEW Association Based on the IG values between moods and ANEW, we learn the correlation of moods and the affective lexicon. [sent-145, score-1.331]
76 With respect to a given mood, those ANEW having high information gain are most likely to be found in the blogposts tagged with the mood. [sent-146, score-0.253]
77 The ANEW most likely to occur in the blogposts tagged with a given mood are shown in Table 1a; the most likely moods for the blog posts containing a given ANEW are shown in Table 1b. [sent-147, score-1.333]
78 The ANEW used in the blog posts tagged with moods in the same pattern are more similar than those in the posts tagged with moods in different patterns. [sent-148, score-1.642]
79 For a given mood, a majority of the ANEW used in the blog posts tagged with the mood is similar in valence to the mood. [sent-150, score-0.7]
80 The occurrence of some ANEW having valence very different from the tagging mood, e.g. [sent-151, score-0.099]
81 the ANEW hate in the posts tagged with cheerful or happy moods, might be the result of a negation construction used in the text or of other context. [sent-153, score-0.325]
82 For a given ANEW, the most likely moods tagged to the blog posts containing the word are similar to the word in their affective scores. [sent-154, score-1.038]
83 In addition, the least likely moods are very different from the ANEW in the affect measure. [sent-155, score-0.647]
84 A plot of top ANEWs used in the blogposts is shown in Figure 5. [sent-156, score-0.145]
85 terrorist or accident, might be a good source for learning opinions from social network towards the things. [sent-161, score-0.101]
86 In the corpus, the posts containing the ANEW terrorist are most likely tagged with angry or cynical moods. [sent-162, score-0.313]
87 Also, the posts containing the ANEW accident are most likely tagged with bored and sore moods. [sent-163, score-0.32]
88 5 Conclusion and Future Work We have investigated the problems of emotion-based pattern discovery and of detecting the correlation between mood and affective lexicon usage in blogosphere. [sent-164, score-0.699]
89 We presented a method for feature representation based on the affective norms of English scores usage. [sent-165, score-0.207]
90 We then presented an unsupervised approach using Affinity Propagation, a nonparametric clustering algorithm that does not require the number of clusters a priori, for detecting emotion patterns in blogosphere. [sent-166, score-0.48]
91 The results show that those automatically discovered patterns match well with the core-affect model for emotion, which is independently formulated in the psychology literature. [sent-167, score-0.136]
92 In addition, we proposed a novel use of a term-goodness criterion to discover mood–lexicon correlation in blogosphere, giving hints on predicting moods based on the affective lexicon usage, and vice versa, in social media. [sent-168, score-0.994]
93 Our results could also have potential uses in sentiment-aware social media applications. [sent-169, score-0.108]
94 Future work will take into account the temporal dimension to trace changes in mood patterns over time in blogosphere. [sent-170, score-0.395]
95 Another direction is to integrate negation information to learn more cohesive association in affect scores between moods and affective words. [sent-171, score-0.805]
96 In addition, a new affective lexicon could be automatically detected based on learning correlation of the blog text and the moods tagged. [sent-172, score-0.89]
97 Affective norms for English words (ANEW): Stimuli, instruction manual and affective ratings. [sent-187, score-0.207]
98 Mood and lexical access of positive, negative, and neutral words. [sent-196, score-0.035]
99 Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. [sent-204, score-0.056]
100 Understanding how bloggers feel: recognizing affect in blog posts. [sent-218, score-0.127]
wordName wordTfidf (topN-words)
[('moods', 0.562), ('anew', 0.48), ('emotion', 0.33), ('mood', 0.327), ('affective', 0.18), ('blogposts', 0.126), ('posts', 0.123), ('valence', 0.099), ('arousal', 0.098), ('usage', 0.094), ('sentiment', 0.088), ('tagged', 0.087), ('ssed', 0.084), ('cheerful', 0.07), ('patterns', 0.068), ('circle', 0.068), ('blog', 0.064), ('affect', 0.063), ('russell', 0.056), ('happiness', 0.056), ('social', 0.054), ('media', 0.054), ('angry', 0.049), ('blogosphere', 0.049), ('happy', 0.045), ('blogpost', 0.042), ('bored', 0.042), ('bradley', 0.042), ('enraged', 0.042), ('sadness', 0.042), ('quarter', 0.038), ('lexicon', 0.037), ('multidimensional', 0.037), ('pleasure', 0.037), ('psychological', 0.035), ('discovered', 0.035), ('clustering', 0.034), ('xim', 0.034), ('conveying', 0.034), ('pattern', 0.034), ('psychology', 0.033), ('terrorist', 0.032), ('nonparametric', 0.029), ('affinity', 0.029), ('anger', 0.028), ('chastain', 0.028), ('cib', 0.028), ('discontent', 0.028), ('dodds', 0.028), ('leshed', 0.028), ('livejournal', 0.028), ('mauss', 0.028), ('nauseated', 0.028), ('rage', 0.028), ('rejuvenated', 0.028), ('romantic', 0.028), ('sore', 0.028), ('theorized', 0.028), ('tired', 0.028), ('uncomfortable', 0.028), ('correlation', 0.027), ('norms', 0.027), ('ig', 0.026), ('mds', 0.025), ('psychologists', 0.025), ('sad', 0.025), ('frey', 0.025), ('pensive', 0.025), ('borg', 0.025), ('thin', 0.025), ('pang', 0.024), ('members', 0.024), ('exemplars', 0.023), ('surprised', 0.023), ('discover', 0.022), ('propagation', 0.022), ('likely', 0.022), ('scaling', 0.021), ('ap', 0.02), ('access', 0.02), ('activation', 0.02), ('bb', 0.02), ('detected', 0.02), ('visualize', 0.019), ('plot', 0.019), ('clusters', 0.019), ('gain', 0.018), ('accident', 0.018), ('spectrum', 0.018), ('grateful', 0.018), ('attribute', 0.018), ('criterion', 0.018), ('iws', 0.018), ('euclidean', 0.017), ('proportions', 0.017), ('lang', 0.017), ('nguyen', 0.016), ('blogs', 0.016), ('opinions', 0.015), ('neutral', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs
Author: Thin Nguyen
Abstract: The emergence of social media brings chances, but also challenges, to linguistic analysis. In this paper we investigate a novel problem of discovering patterns based on emotion and the association of moods and affective lexicon usage in blogosphere, a representative for social media. We propose the use of normative emotional scores for English words in combination with a psychological model of emotion measurement and a nonparametric clustering process for inferring meaningful emotion patterns automatically from data. Our results on a dataset consisting of more than 17 million mood-groundtruthed blogposts have shown interesting evidence of the emotion patterns automatically discovered that match well with the core-affect emotion model theorized by psychologists. We then present a method based on information theory to discover the association of moods and affective lexicon usage in the new media.
2 0.094693467 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
Author: Valentin Jijkoun ; Maarten de Rijke ; Wouter Weerkamp
Abstract: We present a method for automatically generating focused and accurate topic-specific subjectivity lexicons from a general purpose polarity lexicon that allow users to pin-point subjective on-topic information in a set of relevant documents. We motivate the need for such lexicons in the field of media analysis, describe a bootstrapping method for generating a topic-specific lexicon from a general purpose polarity lexicon, and evaluate the quality of the generated lexicons both manually and using a TREC Blog track test set for opinionated blog post retrieval. Although the generated lexicons can be an order of magnitude more selective than the general purpose lexicon, they maintain, or even improve, the performance of an opinion retrieval system.
3 0.071936809 85 acl-2010-Detecting Experiences from Weblogs
Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng
Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification performance and shows that our proposed method outperforms the baseline significantly.
4 0.063611142 210 acl-2010-Sentiment Translation through Lexicon Induction
Author: Christian Scheible
Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.
5 0.05418542 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
Author: Wei Wei ; Jon Atle Gulla
Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a human-labeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HL-SOT approach is easily generalized to labeling a mix of reviews of more than one products.
6 0.043894 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis
7 0.041576032 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
8 0.040563501 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
9 0.039635032 204 acl-2010-Recommendation in Internet Forums and Blogs
10 0.037860528 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization
11 0.035776712 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification
12 0.032949559 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
13 0.032502525 112 acl-2010-Extracting Social Networks from Literary Fiction
14 0.030784111 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
15 0.026638843 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification
16 0.026384275 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
17 0.024152655 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning
18 0.023711596 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
19 0.02287296 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction
20 0.022654578 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
topicId topicWeight
[(0, -0.062), (1, 0.049), (2, -0.061), (3, 0.052), (4, -0.023), (5, -0.007), (6, -0.009), (7, 0.022), (8, 0.017), (9, -0.003), (10, -0.004), (11, 0.021), (12, -0.012), (13, -0.025), (14, 0.006), (15, 0.03), (16, 0.069), (17, -0.006), (18, 0.02), (19, 0.004), (20, 0.014), (21, 0.015), (22, 0.018), (23, -0.054), (24, 0.015), (25, 0.009), (26, -0.038), (27, -0.04), (28, -0.033), (29, -0.028), (30, 0.038), (31, 0.015), (32, 0.054), (33, 0.085), (34, -0.026), (35, -0.132), (36, -0.024), (37, -0.004), (38, -0.102), (39, -0.066), (40, -0.108), (41, 0.063), (42, -0.0), (43, 0.073), (44, 0.03), (45, 0.03), (46, -0.055), (47, -0.044), (48, 0.034), (49, -0.062)]
simIndex simValue paperId paperTitle
same-paper 1 0.94021291 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs
Author: Thin Nguyen
Abstract: The emergence of social media brings chances, but also challenges, to linguistic analysis. In this paper we investigate a novel problem of discovering patterns based on emotion and the association of moods and affective lexicon usage in blogosphere, a representative for social media. We propose the use of normative emotional scores for English words in combination with a psychological model of emotion measurement and a nonparametric clustering process for inferring meaningful emotion patterns automatically from data. Our results on a dataset consisting of more than 17 million mood-groundtruthed blogposts have shown interesting evidence of the emotion patterns automatically discovered that match well with the core-affect emotion model theorized by psychologists. We then present a method based on information theory to discover the association of moods and affective lexicon usage in the new media.
2 0.50697106 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification
Author: Ainur Yessenalina ; Yejin Choi ; Claire Cardie
Abstract: One of the central challenges in sentiment-based text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007). We explore methods to automatically generate annotator rationales for document-level sentiment classification. Rather unexpectedly, we find the automatically generated rationales just as helpful as human rationales.
3 0.47578421 210 acl-2010-Sentiment Translation through Lexicon Induction
Author: Christian Scheible
Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.
4 0.45334157 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis
Author: Georgios Paltoglou ; Mike Thelwall
Abstract: Most sentiment analysis approaches use as baseline a support vector machines (SVM) classifier with binary unigram weights. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classification accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing. The techniques are tested on a wide selection of data sets and produce the best accuracy to our knowledge.
5 0.45248497 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
Author: Wei Wei ; Jon Atle Gulla
Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a human-labeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HL-SOT approach is easily generalized to labeling a mix of reviews of more than one products.
6 0.42898071 204 acl-2010-Recommendation in Internet Forums and Blogs
7 0.41122648 112 acl-2010-Extracting Social Networks from Literary Fiction
8 0.40426964 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
9 0.40392083 85 acl-2010-Detecting Experiences from Weblogs
10 0.37443364 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization
11 0.35747802 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
12 0.33228341 224 acl-2010-Talking NPCs in a Virtual Game World
13 0.33071178 64 acl-2010-Complexity Assumptions in Ontology Verbalisation
14 0.32736689 141 acl-2010-Identifying Text Polarity Using Random Walks
15 0.32125005 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification
16 0.30276054 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text
17 0.28635836 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies
18 0.28582194 178 acl-2010-Non-Cooperation in Dialogue
19 0.28551191 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet
20 0.28398517 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
topicId topicWeight
[(14, 0.017), (23, 0.01), (25, 0.028), (30, 0.454), (42, 0.069), (44, 0.016), (59, 0.043), (71, 0.011), (73, 0.023), (76, 0.013), (78, 0.02), (80, 0.02), (83, 0.058), (84, 0.024), (97, 0.011), (98, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.81099892 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs
Author: Thin Nguyen
Abstract: The emergence of social media brings chances, but also challenges, to linguistic analysis. In this paper we investigate a novel problem of discovering patterns based on emotion and the association of moods and affective lexicon usage in blogosphere, a representative for social media. We propose the use of normative emotional scores for English words in combination with a psychological model of emotion measurement and a nonparametric clustering process for inferring meaningful emotion patterns automatically from data. Our results on a dataset consisting of more than 17 million mood-groundtruthed blogposts have shown interesting evidence of the emotion patterns automatically discovered that match well with the core-affect emotion model theorized by psychologists. We then present a method based on information theory to discover the association of moods and affective lexicon usage in the new media.
2 0.39033768 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
Author: Stephen Wu ; Asaf Bachrach ; Carlos Cardenas ; William Schuler
Abstract: Hierarchical HMM (HHMM) parsers make promising cognitive models: while they use a bounded model of working memory and pursue incremental hypotheses in parallel, they still achieve parsing accuracies competitive with chart-based techniques. This paper aims to validate that a right-corner HHMM parser is also able to produce complexity metrics, which quantify a reader’s incremental difficulty in understanding a sentence. Besides defining standard metrics in the HHMM framework, a new metric, embedding difference, is also proposed, which tests the hypothesis that HHMM store elements represent syntactic working memory. Results show that HHMM surprisal outperforms all other evaluated metrics in predicting reading times, and that embedding difference makes a significant, independent contribution.
3 0.30721799 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
4 0.2641618 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
Author: Valentin Jijkoun ; Maarten de Rijke ; Wouter Weerkamp
Abstract: We present a method for automatically generating focused and accurate topic-specific subjectivity lexicons from a general purpose polarity lexicon that allow users to pin-point subjective on-topic information in a set of relevant documents. We motivate the need for such lexicons in the field of media analysis, describe a bootstrapping method for generating a topic-specific lexicon from a general purpose polarity lexicon, and evaluate the quality of the generated lexicons both manually and using a TREC Blog track test set for opinionated blog post retrieval. Although the generated lexicons can be an order of magnitude more selective than the general purpose lexicon, they maintain, or even improve, the performance of an opinion retrieval system.
5 0.2631 214 acl-2010-Sparsity in Dependency Grammar Induction
Author: Jennifer Gillenwater ; Kuzman Ganchev ; Joao Graca ; Fernando Pereira ; Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
6 0.26283765 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
7 0.26215133 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
8 0.25947309 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
9 0.25631776 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
10 0.25369295 178 acl-2010-Non-Cooperation in Dialogue
11 0.25158674 231 acl-2010-The Prevalence of Descriptive Referring Expressions in News and Narrative
12 0.25000703 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
13 0.24860172 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
14 0.2416971 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
15 0.24141221 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification
16 0.24110425 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
17 0.23866168 112 acl-2010-Extracting Social Networks from Literary Fiction
18 0.23804319 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
19 0.23780712 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
20 0.23694509 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."