64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs


Source: pdf

Author: Aditya Joshi ; Balamurali AR ; Pushpak Bhattacharyya ; Rajat Mohanty

Abstract: Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. We present a qualitative evaluation of this system based on a human-annotated tweet corpus.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. [sent-8, score-0.095]

2 This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. [sent-9, score-1.448]

3 We present a qualitative evaluation of this system based on a human-annotated tweet corpus. [sent-10, score-0.721]

4 The content has been a by-product of a class of Internet-based applications that allow users to interact with each other on the web. [sent-13, score-0.06]

5 These applications which are highly accessible and scalable represent a class of media called social media. [sent-14, score-0.094]

6 Some of the currently popular social media sites are Facebook (www. [sent-15, score-0.094]

7 User-generated content on the social media represents the views of the users and hence, may be opinion-bearing. [sent-22, score-0.154]

8 C-Feel-It is a web-based system which predicts sentiment in micro-blogs on Twitter (called tweets). [sent-25, score-0.368]

9 C-Feel-It uses a rule-based system to classify tweets as positive, negative or objective using inputs from four sentiment-based knowledge repositories. [sent-28, score-0.486]

10 A weighted-majority voting principle is used to predict the sentiment of a tweet. [sent-29, score-0.368]
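
The weighted-majority voting idea can be illustrated with a minimal Python sketch. The resource names, the weights and the tie-breaking order below are assumptions made for illustration only; the source does not give the actual weights or say how ties are resolved.

```python
from collections import defaultdict

# Fixed label order; used only to break ties deterministically (an assumption).
LABELS = ("positive", "negative", "objective")


def weighted_majority_vote(predictions, weights):
    """Combine per-resource labels for one tweet into a single label.

    predictions: dict mapping resource name -> predicted label
    weights:     dict mapping resource name -> non-negative weight (hypothetical)
    """
    totals = defaultdict(float)
    for resource, label in predictions.items():
        # A resource without a weight contributes nothing to the vote.
        totals[label] += weights.get(resource, 0.0)
    # max() returns the first label in LABELS that reaches the highest total.
    return max(LABELS, key=lambda label: totals[label])


# Hypothetical per-resource predictions and weights for one tweet.
example_predictions = {
    "sentiwordnet": "negative",
    "inquirer": "positive",
    "taboada": "positive",
    "fourth_resource": "objective",
}
example_weights = {
    "sentiwordnet": 0.4,
    "inquirer": 0.25,
    "taboada": 0.25,
    "fourth_resource": 0.1,
}
```

With these illustrative numbers the two positive votes together outweigh the single, more heavily weighted negative vote.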

11 An overall sentiment score for the search string is assigned based on the results of predictions for the tweets fetched. [sent-30, score-0.757]

12 This score which is represented as a percentage value gives a live snapshot of the sentiment of users about the topic. [sent-31, score-0.515]

13 The rest of the paper is organized as follows: Section 2 gives a background study of Twitter and related work in the context of sentiment analysis for Twitter. [sent-32, score-0.397]

14 2 Background study Twitter is a micro-blogging website and ranks second among the present social media websites (Prelovac, 2010). [sent-36, score-0.118]

15 A micro-blog allows users to exchange small elements of content such as short sentences, individual pages, or video links (Kaplan and Haenlein, 2010). [sent-37, score-0.06]

16 In Twitter, a micro-blogging post is called a tweet, which can be up to 140 characters in length. [sent-39, score-0.672]

17 Since the length is constrained, the language used in tweets is highly unstructured. [sent-40, score-0.294]

18 [Page footer: System Demonstrations, pages 127–132. ©2011 Association for Computational Linguistics] Twitter makes it a source for getting a live snapshot of the things happening on the web. [sent-50, score-0.089]

19 In the context of sentiment classification of tweets, Alec et al. [sent-51, score-0.662]

20 (2009a) describe a distant supervision-based approach for sentiment classification. [sent-52, score-0.368]

21 The training data for this purpose is created following a semi-supervised approach that exploits emoticons in tweets. [sent-53, score-0.075]

22 (2009b) additionally use hashtags in tweets to create training data. [sent-55, score-0.294]

23 Hence, we follow a rule-based approach for predicting the sentiment of a tweet. [sent-59, score-0.368]

24 An approach like ours provides a generic way of solving sentiment classification problems in microblogs. [sent-60, score-0.368]

25 C-Feel-It offers two implementations of a rule-based sentiment prediction system. [sent-64, score-0.441]

26 4, we explain how four lexical resources are mapped to the desired output labels. [sent-73, score-0.055]

27 Input to C-Feel-It is a search string and a version number. [sent-76, score-0.097]

28 The versions are described in detail in subsection 3. [sent-77, score-0.072]

29 Output given by C-Feel-It is two-level: tweet-wise prediction and overall prediction. [sent-79, score-0.073]

30 For tweet-wise prediction, sentiment prediction by each of the resources is returned. [sent-80, score-0.471]

31 On the other hand, overall prediction combines sentiment from all tweets to return the percentage of positive, negative and objective content retrieved for the search string. [sent-81, score-0.981]

32 1 Tweet Fetcher Tweet fetcher obtains tweets pertaining to a search string entered by a user. [sent-83, score-0.458]

33 The parameters passed to the API ensure that the system receives the latest 50 tweets about the keyword in English. [sent-85, score-0.359]

34 2 Tweet Sentiment Predictor Tweet sentiment predictor predicts sentiment for a single tweet. [sent-88, score-0.921]

35 The first two blocks are the same for both versions of C-Feel-It. [sent-90, score-0.065]

36 Preprocessor: The noisy nature of tweets is a classic challenge that any system working on tweets has to deal with. [sent-92, score-0.588]

37 However, the preprocessor handles extensions and contractions found in tweets as follows. [sent-95, score-0.447]

38 This gives a higher weight to the extended word and retains its contribution to the sentiment of the tweet. [sent-104, score-0.397]

39 Chat lingo normalization: Words used in chat/Internet language that are common in tweets are not present in the lexical resources. [sent-105, score-0.294]
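
The two preprocessing steps mentioned above (handling extended words and normalizing chat lingo) might look roughly like the following Python sketch. The chat-lingo table is hypothetical, and emitting a collapsed word twice to give it a higher weight is an assumption about how the extra weight is realized; the extracted text does not spell this out.

```python
import re

# Hypothetical chat-lingo dictionary; the real system's table is not shown here.
CHAT_LINGO = {"gr8": "great", "luv": "love", "u": "you"}

# Three or more repetitions of the same letter, e.g. the "yyyy" in "happyyyy".
_REPEATED_LETTER = re.compile(r"([a-z])\1{2,}")


def normalize_token(token):
    """Return the normalized word(s) for a single whitespace-separated token."""
    lowered = token.lower()
    if lowered in CHAT_LINGO:
        return [CHAT_LINGO[lowered]]
    collapsed = _REPEATED_LETTER.sub(r"\1", lowered)
    if collapsed != lowered:
        # Assumption: repeat the collapsed word so it carries a higher weight.
        return [collapsed, collapsed]
    return [lowered]


def preprocess(tweet):
    """Normalize every token of a tweet and return the flat word list."""
    words = []
    for token in tweet.split():
        words.extend(normalize_token(token))
    return words
```

Collapsing a run to a single letter is a simplification: a word such as "goood" would become "god", so a dictionary check would be needed in practice.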

40 Emoticon-based Sentiment Predictor Emoticons are visual representations of emotions frequently used in the user-generated content on the Internet. [sent-110, score-0.054]

41 We observe that in most cases, emoticons pinpoint the sentiment of a tweet. [sent-111, score-0.419]

42 An emoticon is mapped to an output label: positive or negative. [sent-116, score-0.17]

43 A tweet containing one of these emoticons can be mapped to the desired output labels directly. [sent-117, score-0.748]
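
A minimal sketch of an emoticon-based predictor is given below. The emoticon table is a small hypothetical sample, and returning no prediction when a tweet mixes positive and negative emoticons is an assumption rather than documented behaviour.

```python
# Hypothetical emoticon table; the system's actual list is not reproduced here.
EMOTICON_LABELS = {
    ":)": "positive",
    ":-)": "positive",
    ":D": "positive",
    ":(": "negative",
    ":-(": "negative",
}


def emoticon_prediction(tweet):
    """Return 'positive' or 'negative' if the emoticons agree, else None."""
    labels = {
        EMOTICON_LABELS[token]
        for token in tweet.split()
        if token in EMOTICON_LABELS
    }
    if len(labels) == 1:
        return labels.pop()
    # No emoticon found, or conflicting emoticons: leave the decision
    # to the lexicon-based predictor (an assumption).
    return None
```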

44 Lexicon-based Sentiment Predictor For a tweet, the Lexicon-based Sentiment Predictor gives a prediction each for four resources. [sent-119, score-0.102]

45 We remove stop words from the tweet and stem the words using the Lovins stemmer (Lovins, 1968). [sent-121, score-0.672]

46 Negation in tweets is handled by inverting sentiment of words after a negating word. [sent-122, score-0.72]

47 The words ‘no’, ‘never’ and ‘not’ are considered negating words, and a context window of three words after a negating word is considered for inversion. [sent-123, score-0.149]
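
The negation rule can be sketched in Python as follows, assuming each word already carries a label from a lexical resource. Leaving objective words unchanged inside the window is an assumption, as is restarting the window when a second negating word appears.

```python
NEGATORS = {"no", "never", "not"}

# Objective words are left untouched by inversion (an assumption).
FLIP = {"positive": "negative", "negative": "positive", "objective": "objective"}


def apply_negation(tokens, labels, window=3):
    """Invert the labels of up to `window` words following a negating word.

    tokens: list of lower-cased words
    labels: list of per-word labels, same length as tokens
    """
    result = list(labels)
    remaining = 0  # how many upcoming words are still inside the window
    for index, token in enumerate(tokens):
        if token in NEGATORS:
            remaining = window
            continue
        if remaining > 0:
            result[index] = FLIP[result[index]]
            remaining -= 1
    return result
```

For the words "this is not a good movie but great cast", only "good" falls inside the window after "not" and is flipped; "great" lies outside it and keeps its label.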

48 For each word in the tweet, it gets the prediction from a lexical resource. [sent-126, score-0.073]

49 We use the intuition that a positive tweet has positive words outnumbering other words, a negative tweet has negative words outnumbering other words and an objective tweet has objective words outnumbering other words. [sent-127, score-2.799]
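
The counting intuition above can be written as a short Python sketch. Falling back to "objective" when no class strictly outnumbers the others (including for an empty tweet) is an assumption; the source does not state how ties are handled.

```python
from collections import Counter

LABELS = ("positive", "negative", "objective")


def tweet_label(word_labels):
    """Pick the class whose words outnumber the others for one resource."""
    counts = Counter(word_labels)
    ranked = sorted(LABELS, key=lambda label: counts[label], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if counts[best] == counts[runner_up]:
        # No strict majority: default to objective (an assumption).
        return "objective"
    return best
```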

50 As opposed to the earlier version, version 2 gets prediction from the lexical resource for some words in the tweet. [sent-129, score-0.19]

51 This is because certain parts-of-speech have been found to be better indicators of sentiment (Pang and Lee, 2004). [sent-130, score-0.368]

52 A tweet is annotated with parts-of-speech tags and the POS bi-tags (i. [sent-131, score-0.672]

53 The prediction for a tweet uses a majority vote-based approach, as in version 1. [sent-135, score-0.786]

54 3 Tweet Sentiment Collaborator Based on predictions for individual tweets, the Tweet Sentiment Collaborator gives an overall prediction with respect to a keyword in the form of percentages of positive, negative and objective content. [sent-144, score-0.398]

55 This is based on the predictions of each resource, weighted according to the resource's accuracy. [sent-145, score-0.115]

56 These weights have been assigned to each resource based on experimental results. [sent-146, score-0.076]
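
One way to picture the aggregation is the Python sketch below: every (tweet, resource) prediction adds that resource's weight for the predicted class to a running score, and the three scores are then normalized to percentages. The exact formula is not recoverable from the extracted text, so the data layout, the weight values and the normalization step are all assumptions.

```python
LABELS = ("positive", "negative", "objective")


def overall_sentiment(tweet_predictions, class_weights):
    """Aggregate per-tweet, per-resource labels into class percentages.

    tweet_predictions: list of dicts, one per tweet, resource -> label
    class_weights:     dict, resource -> {label -> weight} (hypothetical values)
    """
    scores = dict.fromkeys(LABELS, 0.0)
    for per_resource in tweet_predictions:
        for resource, label in per_resource.items():
            scores[label] += class_weights[resource][label]
    total = sum(scores.values())
    if total == 0:
        # No tweets or only zero weights: report an empty snapshot.
        return dict.fromkeys(LABELS, 0.0)
    return {label: 100.0 * score / total for label, score in scores.items()}


# Two hypothetical resources with uniform weights, and two tweets.
uniform = {"positive": 1.0, "negative": 1.0, "objective": 1.0}
example_weights = {"resource_a": uniform, "resource_b": uniform}
example_tweets = [
    {"resource_a": "positive", "resource_b": "positive"},
    {"resource_a": "negative", "resource_b": "objective"},
]
```

With uniform weights the result is simply the share of predictions per class; accuracy-derived weights would shift the percentages toward the more reliable resources.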

57 SentiWordNet (Esuli and Sebastiani, 2006) assigns three scores to synsets of WordNet: positive score, negative score and objective score. [sent-156, score-0.331]

58 When a word is looked up, the label corresponding to maximum of the three scores is returned. [sent-157, score-0.087]

59 For multiple synsets of a word, the output label returned by majority of the synsets becomes the prediction of the resource. [sent-158, score-0.212]
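
The lookup rule described for SentiWordNet (take the label with the maximum score per synset, then the majority label across synsets) can be sketched without the resource itself. The score triples below are made up for illustration and are not real SentiWordNet entries; the fallback for unknown words and the handling of ties are assumptions.

```python
from collections import Counter


def synset_label(pos_score, neg_score, obj_score):
    """Return the label whose score is the maximum of the three."""
    scores = {"positive": pos_score, "negative": neg_score, "objective": obj_score}
    # On ties, max() returns the first label in insertion order.
    return max(scores, key=scores.get)


def word_label(synset_scores):
    """Majority label over all synsets of a word.

    synset_scores: list of (positive, negative, objective) score triples
    """
    if not synset_scores:
        # Word not covered by the resource: treat as objective (an assumption).
        return "objective"
    votes = Counter(synset_label(*triple) for triple in synset_scores)
    return votes.most_common(1)[0][0]


# Illustrative triples for a word with three synsets.
example_synsets = [(0.75, 0.0, 0.25), (0.5, 0.125, 0.375), (0.0, 0.0, 1.0)]
```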

60 , 2004) is a resource that annotates words with tags like parts-of-speech, prior polarity, magnitude of prior polarity (weak/strong), etc. [sent-161, score-0.174]

61 The prior polarity can be positive, negative or neutral. [sent-162, score-0.185]

62 For prediction using this resource, we use this prior polarity. [sent-163, score-0.1]

63 , 1966) is a list of words marked as positive, negative and neutral. [sent-166, score-0.114]

64 We use these labels when predicting with the Inquirer resource. [sent-167, score-0.076]

65 Taboada (Taboada and Grieve, 2004) is a word-list that gives a count of collocations with positive and negative seed words. [sent-169, score-0.245]

66 A word closer to a positive seed word is predicted to be positive and vice versa. [sent-170, score-0.204]

67 For the purpose of tweet annotation, an internal interface was written in PHP 5 with MySQL 5. [sent-176, score-0.696]

68 1 Evaluation Data For the purpose of evaluation, a total of 7000 tweets were downloaded by using popular trending topics of 20 domains (like books, movies, electronic gadget, etc. [sent-181, score-0.318]

69 [Equation legend, partially recovered: ... predicted respectively using resource i; wpi, wni, woi = weights for the respective classes derived for each resource i.] Human annotators assigned each tweet one of 4 classes: positive, negative, objective and objective-spam. [sent-184, score-0.902]

70 A tweet is assigned to the objective-spam category if it contains promotional links or incoherent text which was possibly not created by a human user. [sent-189, score-0.71]

71 Apart from these nominal class labels, we also assigned the positive/negative tweets scores ranging from +2 to -2 with +2 being the most positive and -2 being the most negative score respectively. [sent-190, score-0.51]

72 If the tweet belongs to the objective category, a value of zero is assigned as the score. [sent-191, score-0.75]

73 The spam category has been included in the annotation as a future goal of modeling a spam detection layer prior to the sentiment detection. [sent-192, score-0.515]

74 However, the current version of C-Feel-It does not have a spam detection module and hence for evaluation purpose, we use only the data belonging to classes other than objective-spam. [sent-193, score-0.131]

75 1 Sarcastic Tweets Tweet: Hoge, Jaws, and Palantonio are brilliant together talking X’s and O ’s on ESPN right now. [sent-201, score-0.064]

76 Label by C-Feel-It: Positive Label by human annotator: Negative The sarcasm in the above tweet lies in the use of a positive word ’brilliant’ followed by a rather trivial action of ’talking Xs and Os’. [sent-202, score-0.774]

77 The positive word leads C-Feel-It to predict positive, whereas the human annotator considers the tweet negative. [sent-203, score-0.961]

78 2 Lack of Sense Understanding Tweet: If your tooth hurts drink some pain killers and place a warm/hot tea bag like chamomile on your tooth and hold it. [sent-206, score-0.199]

79 it will relieve the pain Label by C-Feel-It: Negative This tweet is objective in nature. [sent-207, score-0.825]

80 in the tweet give an indication to C-Feel-It that the tweet is negative. [sent-209, score-1.344]

81 This misguided implication is because of multiple senses of these words (for example, ’pain’ can also be used in the sentence ’symptoms of the disease are body pain and irritation in the throat’ where it is non-sentiment-bearing). [sent-210, score-0.075]

82 The system finds a positive word ’good’ and marks the tweet as positive. [sent-215, score-0.774]

83 Label by SentiWordNet: Negative Label by Taboada/Inquirer: Objective Label by human annotator: Negative On manual verification, it was observed that an entry for the emotion-bearing word ’bullshit’ is present in SentiWordNet while Inquirer and Taboada resource do not have them. [sent-221, score-0.076]

84 This shows that the coverage of the lexical resource affects the performance of a system and may introduce errors. [sent-222, score-0.076]

85 Lol Entity: Close encounters of the third kind Label by C-Feel-It: Positive The words comprising the name of the film ’Close encounters of the third kind’ are also looked up. [sent-226, score-0.162]

86 This world knowledge is important for a system that aims to handle tweets like these. [sent-232, score-0.294]

87 but if i had one chance, i’d do it all over again Label by C-Feel-It: Positive The tweet contains emotions of positive as well as negative variety and it would in fact be difficult for a human as well to identify the polarity. [sent-236, score-0.911]

88 The mixed nature of the tweet leads to this error by the system. [sent-237, score-0.672]

89 8 Lack of Context Tweet: I’ll have to say it’s a tie between Little Women or To kill a Mockingbird Label by C-Feel-It: Negative Label by human user: Positive The tweet has a sentiment which will possibly be clear in the context of the conversation. [sent-240, score-1.075]

90 Going by the tweet alone, while one understands that a comparative opinion is being expressed, it is not possible to tag it as positive or negative. [sent-241, score-0.813]

91 Label by C-Feel-It: Negative The tweet has a hashtag containing concatenated words ’goodbook’ which get overlooked as out-of-dictionary words and hence are not used for sentiment prediction. [sent-245, score-1.062]

92 ’ have to be handled as a special case because ’more’ is an intensification of the negative sentiment expressed by the word ’disgusted’ . [sent-264, score-0.505]

93 5 Summary & Future Work In this paper, we described a system which categorizes live tweets related to a keyword as positive, negative or objective based on the predictions of four sentiment-based resources. [sent-265, score-0.674]

94 A sentiment analyzer of this kind can be tuned to take inputs from different sources on the internet (for example, wall posts on Facebook). [sent-267, score-0.419]

95 In order to improve the quality of sentiment prediction, we propose two additions. [sent-268, score-0.368]

96 Secondly, a spam detection module that eliminates promotional tweets before performing sentiment detection may be added to the current system. [sent-270, score-0.76]

97 Our goal with respect to this system is to deploy it for predicting share market values of firms based on sentiment on social networks with respect to related entities. [sent-271, score-0.458]

98 SentiWordNet: A publicly available lexical resource for opinion mining. [sent-282, score-0.115]

99 A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. [sent-296, score-0.368]

100 Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. [sent-301, score-0.368]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tweet', 0.672), ('sentiment', 0.368), ('tweets', 0.294), ('predictor', 0.185), ('twitter', 0.117), ('negative', 0.114), ('positive', 0.102), ('taboada', 0.086), ('objective', 0.078), ('alec', 0.076), ('casablanca', 0.076), ('preprocessor', 0.076), ('resource', 0.076), ('pain', 0.075), ('prediction', 0.073), ('inquirer', 0.07), ('sentiwordnet', 0.066), ('label', 0.065), ('keyword', 0.065), ('disgusted', 0.065), ('fetcher', 0.065), ('outnumbering', 0.065), ('spam', 0.06), ('bombay', 0.057), ('iit', 0.052), ('social', 0.052), ('emoticons', 0.051), ('live', 0.049), ('qualitative', 0.049), ('extensions', 0.046), ('encounters', 0.045), ('polarity', 0.044), ('collaborator', 0.043), ('emoticon', 0.043), ('lovins', 0.043), ('mockingbird', 0.043), ('negscore', 0.043), ('posscore', 0.043), ('tooth', 0.043), ('xxm', 0.043), ('pertaining', 0.043), ('media', 0.042), ('version', 0.041), ('snapshot', 0.04), ('predictions', 0.039), ('facebook', 0.039), ('opinion', 0.039), ('architecture', 0.038), ('brilliant', 0.038), ('deploy', 0.038), ('interjection', 0.038), ('killers', 0.038), ('mumbai', 0.038), ('promotional', 0.038), ('sarcastic', 0.038), ('synsets', 0.037), ('versions', 0.036), ('ix', 0.036), ('subsection', 0.036), ('pang', 0.036), ('kill', 0.035), ('categorizes', 0.035), ('audience', 0.035), ('bhayani', 0.035), ('comparatives', 0.035), ('negating', 0.035), ('string', 0.033), ('content', 0.031), ('esuli', 0.031), ('contractions', 0.031), ('api', 0.031), ('resources', 0.03), ('annotator', 0.03), ('hence', 0.03), ('gives', 0.029), ('blocks', 0.029), ('users', 0.029), ('letter', 0.029), ('pos', 0.029), ('stone', 0.029), ('kaplan', 0.029), ('prior', 0.027), ('misspellings', 0.026), ('talking', 0.026), ('kind', 0.026), ('india', 0.025), ('chat', 0.025), ('mapped', 0.025), ('posts', 0.025), ('purpose', 0.024), ('comprising', 0.024), ('website', 0.024), ('search', 0.023), ('business', 0.023), ('emotions', 0.023), ('lack', 0.023), ('replace', 0.023), ('handled', 0.023), 
('concatenated', 0.022), ('looked', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

2 0.6708855 292 acl-2011-Target-dependent Twitter Sentiment Classification

Author: Long Jiang ; Mo Yu ; Ming Zhou ; Xiaohua Liu ; Tiejun Zhao

Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.

3 0.29880935 261 acl-2011-Recognizing Named Entities in Tweets

Author: Xiaohua LIU ; Shaodian ZHANG ; Furu WEI ; Ming ZHOU

Abstract: The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN-based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.

4 0.2849755 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

Author: Kevin Gimpel ; Nathan Schneider ; Brendan O'Connor ; Dipanjan Das ; Daniel Mills ; Jacob Eisenstein ; Michael Heilman ; Dani Yogatama ; Jeffrey Flanigan ; Noah A. Smith

Abstract: We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

5 0.26890332 204 acl-2011-Learning Word Vectors for Sentiment Analysis

Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts

Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it outperforms several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.

6 0.26251912 177 acl-2011-Interactive Group Suggesting for Twitter

7 0.25941074 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look

8 0.24941604 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

9 0.22439945 253 acl-2011-PsychoSentiWordNet

10 0.2180097 105 acl-2011-Dr Sentiment Knows Everything!

11 0.21671182 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

12 0.20137414 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

13 0.19494125 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

14 0.17929822 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

15 0.16027078 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

16 0.14361593 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

17 0.12759113 305 acl-2011-Topical Keyphrase Extraction from Twitter

18 0.12497229 159 acl-2011-Identifying Noun Product Features that Imply Opinions

19 0.10606055 194 acl-2011-Language Use: What can it tell us?

20 0.10403083 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.185), (1, 0.382), (2, 0.336), (3, -0.108), (4, 0.08), (5, 0.019), (6, 0.013), (7, -0.218), (8, -0.064), (9, 0.084), (10, -0.21), (11, 0.176), (12, 0.163), (13, -0.111), (14, -0.129), (15, -0.072), (16, -0.02), (17, 0.039), (18, -0.039), (19, 0.053), (20, -0.084), (21, -0.014), (22, -0.015), (23, -0.011), (24, -0.007), (25, 0.053), (26, -0.042), (27, -0.089), (28, 0.045), (29, 0.027), (30, -0.028), (31, 0.025), (32, 0.023), (33, -0.027), (34, -0.015), (35, 0.008), (36, 0.083), (37, -0.095), (38, 0.006), (39, 0.002), (40, -0.039), (41, -0.076), (42, 0.007), (43, -0.013), (44, -0.022), (45, 0.003), (46, -0.027), (47, -0.061), (48, 0.004), (49, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96478754 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

2 0.92749971 292 acl-2011-Target-dependent Twitter Sentiment Classification

Author: Long Jiang ; Mo Yu ; Ming Zhou ; Xiaohua Liu ; Tiejun Zhao

Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.

3 0.86946309 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look

Author: Roberto Gonzalez-Ibanez ; Smaranda Muresan ; Nina Wacholder

Abstract: Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine learning effectiveness for identifying sarcastic utterances and we compare the performance of machine learning techniques and human judges on this task. Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well.

4 0.68097478 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

Author: Kevin Gimpel ; Nathan Schneider ; Brendan O'Connor ; Dipanjan Das ; Daniel Mills ; Jacob Eisenstein ; Michael Heilman ; Dani Yogatama ; Jeffrey Flanigan ; Noah A. Smith

Abstract: We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

5 0.65542221 261 acl-2011-Recognizing Named Entities in Tweets

Author: Xiaohua LIU ; Shaodian ZHANG ; Furu WEI ; Ming ZHOU

Abstract: The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN-based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.

6 0.62889862 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

7 0.57917303 177 acl-2011-Interactive Group Suggesting for Twitter

8 0.52369291 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis

9 0.50062788 204 acl-2011-Learning Word Vectors for Sentiment Analysis

10 0.50026995 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

11 0.46440101 253 acl-2011-PsychoSentiWordNet

12 0.45906007 105 acl-2011-Dr Sentiment Knows Everything!

13 0.45781624 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

14 0.42874733 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

15 0.42016762 305 acl-2011-Topical Keyphrase Extraction from Twitter

16 0.41168359 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

17 0.37634695 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

18 0.35525021 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

19 0.35130119 208 acl-2011-Lexical Normalisation of Short Text Messages: Makn Sens a #twitter

20 0.34822819 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.028), (13, 0.012), (17, 0.024), (26, 0.043), (37, 0.097), (39, 0.063), (41, 0.051), (53, 0.012), (55, 0.024), (59, 0.029), (72, 0.094), (88, 0.017), (90, 0.23), (91, 0.037), (96, 0.131), (97, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77545488 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis

Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens

Abstract: We explore a semi-supervised approach for improving the portability of time expression recognition to non-newswire domains: we generate additional training examples by substituting temporal expression words with potential synonyms. We explore using synonyms both from WordNet and from the Latent Words Language Model (LWLM), which predicts synonyms in context using an unsupervised approach. We evaluate a state-of-the-art time expression recognition system trained both with and without the additional training examples using data from TempEval 2010, Reuters and Wikipedia. We find that the LWLM provides substantial improvements on the Reuters corpus, and smaller improvements on the Wikipedia corpus. We find that WordNet alone never improves performance, though intersecting the examples from the LWLM and WordNet provides more stable results for Wikipedia.

2 0.7742728 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution

Author: Hugo Jair Escalante ; Thamar Solorio ; Manuel Montes-y-Gomez

Abstract: This paper proposes the use of local histograms (LH) over character n-grams for authorship attribution (AA). LHs are enriched histogram representations that preserve sequential information in documents; they have been successfully used for text categorization and document visualization using word histograms. In this work we explore the suitability of LHs over n-grams at the character level for AA. We show that LHs are particularly helpful for AA, because they provide useful information for uncovering, to some extent, the writing style of authors. We report experimental results on AA data sets that confirm that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state-of-the-art approaches. We found that LHs are even more advantageous in challenging conditions, such as having imbalanced and small training sets. Our results motivate further research on the use of LHs for modeling the writing style of authors for related tasks, such as authorship verification and plagiarism detection.

same-paper 3 0.76944983 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

Author: Aditya Joshi ; Balamurali AR ; Pushpak Bhattacharyya ; Rajat Mohanty

Abstract: Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. We present a qualitative evaluation of this system based on a human-annotated tweet corpus.

4 0.73880351 258 acl-2011-Ranking Class Labels Using Query Sessions

Author: Marius Pasca

Abstract: The role of search queries, as available within query sessions or in isolation from one another, is examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using document-based counts.

5 0.70979333 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

6 0.66477549 261 acl-2011-Recognizing Named Entities in Tweets

7 0.6609422 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus

8 0.64632332 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

9 0.64466554 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

10 0.6441403 252 acl-2011-Prototyping virtual instructors from human-human corpora

11 0.64305735 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks

12 0.64015913 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning

13 0.6387794 292 acl-2011-Target-dependent Twitter Sentiment Classification

14 0.6368345 147 acl-2011-Grammatical Error Correction with Alternating Structure Optimization

15 0.63347447 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

16 0.63284916 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

17 0.63222754 182 acl-2011-Joint Annotation of Search Queries

18 0.62995207 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

19 0.62986767 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

20 0.62960398 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment