acl acl2011 acl2011-160 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Roberto Gonzalez-Ibanez ; Smaranda Muresan ; Nina Wacholder
Abstract: Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine learning effectiveness for identifying sarcastic utterances and we compare the performance of machine learning techniques and human judges on this task. Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well.
Reference: text
sentIndex sentText sentNum sentScore
1 We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. [sent-2, score-1.098]
2 We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. [sent-3, score-0.996]
3 We investigate the impact of lexical and pragmatic factors on machine learning effectiveness for identifying sarcastic utterances and we compare the performance of machine learning techniques and human judges on this task. [sent-4, score-1.09]
4 Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well. [sent-5, score-0.188]
5 1 Introduction Automatic detection of sarcasm is still in its infancy. [sent-6, score-0.397]
6 One reason for the lack of computational models has been the absence of accurately-labeled naturally occurring utterances that can be used to train machine learning systems. [sent-7, score-0.103]
7 Microblogging platforms such as Twitter, which allow users to communicate feelings, opinions and ideas in short messages and to assign labels to their own messages, have been recently exploited in sentiment and opinion analysis (Pak and Paroubek, 2010; Davidov et al. [sent-8, score-0.171]
8-9 In Twitter, messages can be annotated with hashtags such as #bicycling, #happy and #sarcasm. [sent-10, score-0.104; sent-11, score-0.085]
10 We use these hashtags to build a labeled corpus of naturally occurring sarcastic, positive and negative tweets. [sent-12, score-0.253]
11 In this paper, we report on an empirical study on the use of lexical and pragmatic factors to distinguish sarcasm from positive and negative sentiments expressed in Twitter messages. [sent-13, score-0.806]
12 Our results suggest that lexical features alone are not sufficient for identifying sarcasm and that pragmatic and contextual features merit further study. [sent-15, score-0.63]
13 2 Related Work Sarcasm and irony are well-studied phenomena in linguistics, psychology and cognitive science (Gibbs, 1986; Gibbs and Colston 2007; Kreuz and Glucksberg, 1989; Utsumi, 2002). [sent-16, score-0.101]
14 But in the text mining literature, automatic detection of sarcasm is considered a difficult problem (Nigam & Hurst, 2006 and Pang & Lee, 2008 for an overview) and has been addressed in only a few studies. [sent-17, score-0.397]
15 In the context of spoken dialogues, automatic detection of sarcasm has relied primarily on speech-related cues such as laughter and prosody (Tepperman et al. [sent-18, score-0.43]
16 (2010), whose objective was to identify sarcastic and non-sarcastic utterances in Twitter and in Amazon product reviews. [sent-21, score-0.678]
17 [Page footer: 2011 Association for Computational Linguistics: shortpapers, pages 581–586] … of distinguishing sarcastic tweets from nonsarcastic tweets that directly convey positive and negative attitudes (we do not consider neutral utterances at all). [sent-24, score-1.873]
18 Our approach of looking at lexical features for identification of sarcasm was inspired by the work of Kreuz and Caucci (2007). [sent-25, score-0.444]
19 In addition, we also look at pragmatic features, such as establishing common ground between speaker and hearer (Clark and Gerring, 1984), and emoticons. [sent-26, score-0.188]
20 3 Data In Twitter, people (tweeters) post messages of up to 140 characters (tweets). [sent-27, score-0.104]
21 Apart from plain text, a tweet can contain references to other users (@ ), URLs, and hashtags (#hashtag) which are tags assigned by the user to identify topic (#teaparty, #worldcup) or sentiment (#angry, #happy, #sarcasm). [sent-28, score-0.169]
22 An example of a tweet is: “@ UserName1 check out the twitter feed on @ UserName2 for a few ideas :) http://xxxxxx. [sent-29, score-0.178]
23 To build our corpus of sarcastic (S), positive (P) and negative (N) tweets, we relied on the annotations that tweeters assign to their own tweets using hashtags. [sent-31, score-1.323]
24 Our assumption is that the best judge of whether a tweet is intended to be sarcastic is the author of the tweet. [sent-32, score-0.656]
25 As shown in the following sections, human judges other than the tweets’ authors achieve low levels of accuracy when trying to classify sarcastic tweets; we therefore argue that using the tweets labeled by their authors with hashtags produces a better-quality gold standard. [sent-33, score-1.341]
26 We used a Twitter API to collect tweets that include hashtags that express sarcasm (#sarcasm, #sarcastic), direct positive sentiment (e. [sent-34, score-1.084]
27 We applied automatic filtering to remove retweets, duplicates, quotes, spam, tweets written in languages other than English, and tweets with URLs. [sent-39, score-0.912]
28 (2010) that tweets with #hashtags are noisy, we automatically filtered all tweets where the hashtags of interest were not located at the very end of the message. [sent-41, score-1.014]
29 We then performed a manual review of the filtered tweets to double check that the remaining end hashtags were not part of the message. [sent-42, score-0.541]
30 We thus eliminated messages about sarcasm such as “I really love #sarcasm” and kept only messages that express sarcasm, such as “lol thanks. [sent-43, score-0.66]
31 Our final corpus consists of 900 tweets in each of the three categories, sarcastic, positive and negative. [sent-45, score-0.539]
32 Examples of tweets in our corpus that are labeled with the #sarcasm hashtag include the following: 1) @ UserName That must suck. [sent-46, score-0.497]
33 2) I can't express how much I love shopping on black Friday. [sent-47, score-0.055]
34 4) @ UserName im just loving the positive vibes out of that! [sent-50, score-0.083]
35 , messages that sound positive but are intended to convey a negative attitude) as in Examples 2-4, but there are also some positive messages (messages that sound negative but are apparently intended to be understood as positive), as in Example 1. [sent-53, score-0.661]
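The end-of-message hashtag filter described in this section can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `label_tweet`, the regular expressions, and the contents of `LABEL_TAGS` are assumptions (the text names #sarcasm and #sarcastic and gives #happy and #angry as sentiment examples, but the full hashtag lists are not reproduced here). Language, duplicate and spam filtering are left out.

```python
import re

# Hypothetical label hashtags; only a few examples are named in the text.
LABEL_TAGS = {
    "S": {"#sarcasm", "#sarcastic"},
    "P": {"#happy"},
    "N": {"#angry"},
}
ALL_TAGS = set().union(*LABEL_TAGS.values())

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
TRAILING_TAG_RE = re.compile(r"\s*(#\w+)\s*$")


def label_tweet(text):
    """Return (label, body) when the tweet ends in a label hashtag, else None."""
    # Drop retweets and tweets containing URLs.
    if text.startswith("RT ") or URL_RE.search(text):
        return None
    match = TRAILING_TAG_RE.search(text)
    if match is None:
        return None  # no hashtag at the very end of the message
    tag = match.group(1).lower()
    body = text[: match.start()].strip()
    if not body:
        return None
    # Reject tweets that also use a label hashtag inside the message body.
    if any(t.lower() in ALL_TAGS for t in HASHTAG_RE.findall(body)):
        return None
    for label, tags in LABEL_TAGS.items():
        if tag in tags:
            return label, body
    return None
```

Note that a tweet such as "I really love #sarcasm" still passes this automatic filter, because the hashtag sits at the end even though it is part of the message; this is the kind of case the manual review step is described as catching.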
36 4 Lexical and Pragmatic Features In this section we address the question of whether it is possible to empirically identify lexical and pragmatic factors that distinguish sarcastic, positive and negative utterances. [sent-54, score-0.409]
37 We used two kinds of lexical features: unigrams and dictionary-based. [sent-56, score-0.098]
38 The token overlap between the words in combined dictionary and the words in the tweets was 85%. [sent-74, score-0.456]
39 This demonstrates that lexical coverage is good, even though tweets are well – 1 http://www. [sent-75, score-0.479]
40 We used three pragmatic features: i) positive emoticons such as smileys; ii) negative emoticons such as frowning faces; and iii) ToUser, which marks whether a tweet is a reply to another tweet (signaled by <@user> ). [sent-79, score-0.979]
41 To measure the impact of features on discriminating among the three categories, we used two standard measures: presence and frequency of the factors in each tweet. [sent-81, score-0.111]
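As an illustration of the two measures, the sketch below computes presence (0/1) and frequency (count) for the three pragmatic features. The emoticon tuples and the mention pattern are assumptions made for this example, not the lists used in the paper, and ToUser is approximated here by counting @-mentions.

```python
import re

# Illustrative emoticon inventories (assumed, not taken from the paper).
POSITIVE_EMOTICONS = (":)", ":-)", ":D", ";)")
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")
MENTION_RE = re.compile(r"@\w+")


def pragmatic_features(text):
    """Return (presence, frequency) dicts for Smiley, Frown and ToUser."""
    frequency = {
        "Smiley": sum(text.count(e) for e in POSITIVE_EMOTICONS),
        "Frown": sum(text.count(e) for e in NEGATIVE_EMOTICONS),
        "ToUser": len(MENTION_RE.findall(text)),
    }
    # Presence collapses each count to a binary indicator.
    presence = {name: int(count > 0) for name, count in frequency.items()}
    return presence, frequency
```

The same presence/frequency split would apply to dictionary-based lexical categories, given a word-to-category lookup.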
42 We did a 3-way comparison of Sarcastic (S), Positive (P), and Negative (N) messages (S-P-N), as well as 2-way comparisons of i) Sarcastic and Non-Sarcastic (S-NS); ii) Sarcastic and Positive (S-P); and iii) Sarcastic and Negative (S-N). [sent-82, score-0.104]
43 The NS tweets were obtained by merging 450 randomly selected positive and 450 negative tweets from our corpus. [sent-83, score-1.08]
44 We ran a test to identify the features that were most useful in discriminating categories. [sent-84, score-0.054]
45 Table 1 shows the top 10 features based on presence of all dictionary-based lexical factors plus the pragmatic factors. [sent-85, score-0.266]
46 [Table 1 caption fragment: χ2 for each task] In all of the tasks, negative emotion (Negemo), positive emotion (Posemo), negation (Negate), emoticons (Smiley, Frown), auxiliary verbs (AuxVb), and punctuation marks are in the top 10 features. [sent-87, score-0.314]
47 Table 1 also shows that the pragmatic factor ToUser is important in sarcasm detection. [sent-89, score-0.559]
48 This is an indication of the possible importance of features that indicate common ground in sarcasm identification. [sent-90, score-0.465]
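A χ2 ranking over feature presence, of the kind Table 1 reports, could be computed along these lines. This is a sketch of the standard Pearson χ2 statistic on a presence-by-class contingency table; the function name and input layout are assumptions, and the paper's exact tooling is not stated in this text.

```python
def chi_square(presence, labels):
    """Pearson chi-square between a binary feature and the class label.

    presence: list of 0/1 values, one per tweet.
    labels:   list of class labels (e.g. "S", "P", "N"), same length.
    """
    classes = sorted(set(labels))
    n = len(labels)
    # Observed counts for every (feature value, class) cell.
    observed = {(v, c): 0 for v in (0, 1) for c in classes}
    for v, c in zip(presence, labels):
        observed[(v, c)] += 1
    row_total = {v: sum(observed[(v, c)] for c in classes) for v in (0, 1)}
    col_total = {c: sum(observed[(v, c)] for v in (0, 1)) for c in classes}
    statistic = 0.0
    for (v, c), count in observed.items():
        expected = row_total[v] * col_total[c] / n
        if expected > 0:  # skip empty rows (feature always or never present)
            statistic += (count - expected) ** 2 / expected
    return statistic
```

Ranking features then amounts to computing this value per feature and sorting in descending order; a larger value means presence of the feature is less independent of the class.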
49 5 Classification Experiments In this section we investigate the usefulness of lexical and pragmatic features in machine learning to classify sarcastic, positive and negative Tweets. [sent-91, score-0.399]
50 We used two standard classifiers often employed in sentiment classification: support vector machine with sequential minimal optimization (SMO) and logistic regression (LogR). [sent-92, score-0.07]
51 For features we used: 1) unigrams; 2) presence of dictionary-based lexical and pragmatic factors (LIWC+_P); and 3) frequency of dictionary-based lexical and pragmatic factors (LIWC+_F). [sent-93, score-0.49]
52 We also trained our models with bigrams and trigrams; however, these features did not yield better results than unigrams and LIWC+. [sent-94, score-0.075]
53 In Table 2, shaded cells indicate the best accuracies for each class, while bolded values indicate the best accuracies per row. [sent-96, score-0.074]
54 In the three-way classification (S-P-N), SMO with unigrams as features outperformed SMO with LIWC+_P and LIWC+_F as features. [sent-97, score-0.155]
55 The best accuracy of 57% is an indication of the difficulty of the task. [sent-99, score-0.095]
56 Table 2: Classifier accuracies using 5-fold cross-validation, in percent. [sent-100, score-0.037]
57-58 For the S-NS classification the best results were again obtained using SMO with unigrams as features (65. [Table 3 caption fragment: Classifiers' accuracies against humans' …] [sent-102, score-0.053; sent-111, score-0.138]
59 For S-P and S-N the best accuracies were close to 70%. [sent-113, score-0.037]
60 It is intriguing that the machine learning systems have roughly equal difficulty in separating sarcastic tweets from positive tweets and from negative tweets. [sent-116, score-1.729]
61 These results indicate that the lexical and pragmatic features considered in this paper do not provide sufficient information to accurately differentiate sarcastic from positive and negative tweets. [sent-117, score-0.97]
62 This may be due to the inherent difficulty of distinguishing short utterances in isolation, without use of contextual evidence. [sent-118, score-0.186]
63 In the next section we explore the inherent difficulty of identifying sarcastic utterances by comparing human performance and classifier performance. [sent-119, score-0.744]
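The 5-fold cross-validation used to report the accuracies in this section can be sketched in plain Python as below. The stratified splitting and the majority-class placeholder learner are assumptions for illustration; the paper's SMO and logistic regression classifiers are not reimplemented here, and any `train`/`predict` pair with the same shape could be plugged in.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split item indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_val_accuracy(texts, labels, train, predict, k=5):
    """Accuracy in percent, pooled over the k held-out folds."""
    correct = 0
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train([texts[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, texts[i]) == labels[i] for i in held_out)
    return 100.0 * correct / len(labels)


# Placeholder learner (hypothetical): always predicts the majority class.
def train_majority(texts, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, text):
    return model
```

On a balanced two-class set such a majority baseline scores 50%, which gives a reference point for reading the S-NS, S-P and S-N accuracies.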
64 6 Comparison against Human Performance To get a better sense of how difficult the task of sarcasm identification really is, we conducted three studies with human judges (not the authors of this paper). [sent-120, score-0.585]
65 In the first study, we asked three judges to classify 10% of our S-P-N dataset (90 randomly selected tweets per category) into sarcastic, positive and negative. [sent-121, score-0.747]
66 In addition, they were able to indicate if they were unsure to which category tweets belonged and to add comments about the difficulty of the task. [sent-122, score-0.502]
67 When we considered only the 135 of 270 tweets on which all three judges agreed, the accuracy, computed over the entire gold standard test set, fell to 43. We used the accuracy when the judges … [sent-131, score-0.874]
68 [Footnote fragment] … set they agreed on (135 out of 270 … [Table 3 caption fragment] … accuracies in three classification tasks. [sent-135, score-0.147]
69 The highest value in the established HBI achieved a slightly higher accuracy; however, when compared to the bottom value of the same interval, our best result significantly outperformed it. [sent-144, score-0.046]
70 It is intriguing that the difficulty of distinguishing sarcastic utterances from positive ones and from negative ones was quite similar. [sent-145, score-0.957]
71 In the second study, we investigated how well human judges performed on the two-way classification task of labeling sarcastic and non-sarcastic tweets. [sent-146, score-0.816]
72 We asked three other judges to classify 10% of our S-NS dataset (i. [sent-147, score-0.208]
73 67% among the three judges with a Fleiss’ Kappa value of 0. [sent-150, score-0.168]
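For reference, Fleiss' Kappa for a fixed number of judges per item can be computed as in the sketch below. The input layout (one row of per-category counts per tweet) and the function name are assumptions; the kappa values reported in the paper are truncated in this text and are not reproduced.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa.

    ratings: list of rows, one per item; row[j] is the number of judges
    who assigned the item to category j. Every row must sum to the same
    number of judges (at least 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Share of all assignments that went to each category.
    p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
             for j in range(n_cats)]
    # Per-item agreement: fraction of judge pairs that agree.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in ratings]
    p_bar = sum(p_item) / n_items        # mean observed agreement
    p_exp = sum(p * p for p in p_cat)    # agreement expected by chance
    # Undefined (division by zero) if every judgment falls in one category.
    return (p_bar - p_exp) / (1 - p_exp)
```

For the three-judge study, each row would hold counts over the categories S, P and N and sum to 3.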
74 When we considered only cases where all three judges agreed, the accuracy, again computed over the entire gold standard test set, fell to 59. [sent-158, score-0.219]
75 As shown in Table 3 (S-NS: 10% tweets), the HBI was outperformed by the automatic classification using unigrams (68. [sent-160, score-0.131]
76 Based on recent results which show that nonlinguistic cues such as emoticons are helpful in interpreting non-literal meaning such as sarcasm and irony in user generated content (Derks et al. [sent-163, score-0.541]
77 , 2009), we explored how much emoticons help humans to distinguish sarcastic from positive and negative tweets. [sent-165, score-0.861]
78 For this test, we created a new dataset using only tweets with emoticons. [sent-166, score-0.474]
79-80 This dataset consisted of 50 sarcastic tweets and 50 non-sarcastic tweets (25 P and 25 N). [Footnote 3: The accuracy on the set they agreed on (129 out of 180 tweets) was 82.] [sent-167, score-0.681; sent-169, score-0.912]
81 Two human judges classified the tweets using the same procedure as above. [sent-170, score-0.644]
82 For this task judges achieved an overall agreement of 89% with Cohen’s Kappa value of 0. [sent-171, score-0.187]
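With only two judges, agreement is measured with Cohen's Kappa, which corrects the raw agreement rate for the agreement expected from each judge's label distribution. The sketch below shows the standard computation; the function name and the toy labels in the usage note are illustrative only.

```python
from collections import Counter


def cohen_kappa(judge_a, judge_b):
    """Cohen's kappa for two judges labelling the same items."""
    n = len(judge_a)
    # Raw agreement rate.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    count_a, count_b = Counter(judge_a), Counter(judge_b)
    # Chance agreement from the two judges' marginal label frequencies.
    expected = sum(count_a[c] * count_b[c]
                   for c in set(judge_a) | set(judge_b)) / (n * n)
    # Undefined (division by zero) if both judges use a single identical label.
    return (observed - expected) / (1 - expected)
```

For example, judges who agree on 3 of 4 tweets with labels ["S", "S", "NS", "NS"] and ["S", "NS", "NS", "NS"] have a raw agreement of 75% but a kappa of 0.5.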
83 The results show that emoticons play an important role in helping people distinguish sarcastic from nonsarcastic tweets. [sent-174, score-0.703]
84 The overall accuracy for both judges was 73% (1. [sent-175, score-0.199]
85 When all judges agreed, the accuracy was 70% when computed relative to the entire gold standard set. Using our trained model for S-NS from the previous section, we also tested our classifiers on this new dataset. [sent-178, score-0.256]
86 Table 3 (S-NS: 100 tweets) shows that our best result (71%) was achieved by SMO using unigrams as features. [sent-179, score-0.07]
87 These three studies show that humans do not perform significantly better than the simple automatic classification methods discussed in this paper. [sent-181, score-0.079]
88 Some judges reported that the classification task was hard. [sent-182, score-0.221]
89 The main issues judges identified were the lack of context and the brevity of the messages. [sent-183, score-0.168]
90 This suggests that accurate automatic identification of sarcasm on Twitter requires information about interaction between the tweeters such as common ground and world knowledge. [sent-185, score-0.514]
91 7 Conclusion In this paper we have taken a closer look at the problem of automatically detecting sarcasm in Twitter messages. [sent-186, score-0.397]
92 We used a corpus annotated by the tweeters themselves as our gold standard; we relied on the judgments of tweeters because of the relatively poor performance of human coders at this task. [sent-187, score-0.318]
93 [Footnote fragment: … set they agreed on (83 out of 100 …] We also compared the performance of automatic and human classification in three different studies. [sent-192, score-0.13]
94 We found that automatic classification can be as good as human classification; however, the accuracy is still low. [sent-193, score-0.104]
95 Our results demonstrate the difficulty of sarcasm classification for both humans and machine learning methods. [sent-194, score-0.522]
96 The length of tweets as well as the lack of explicit context makes this classification task quite difficult. [sent-195, score-0.509]
97 Finally, the low performance of human coders in the classification task of sarcastic tweets suggests that gold standards built by using labels given by human coders other than tweets’ authors may not be reliable. [sent-197, score-1.262]
98 In this sense we believe that our approach to create the gold standard of sarcastic tweets is more suitable in the context of Twitter messages. [sent-198, score-1.059]
99 Acknowledgments We thank all those who participated as coders in our human classification task. [sent-199, score-0.128]
100 Verbal irony as implicit display of ironic environment: Distinguishing ironic utterances from nonirony. [sent-313, score-0.228]
wordName wordTfidf (topN-words)
[('sarcastic', 0.575), ('tweets', 0.456), ('sarcasm', 0.397), ('judges', 0.168), ('pragmatic', 0.162), ('liwc', 0.152), ('twitter', 0.135), ('smo', 0.11), ('messages', 0.104), ('utterances', 0.103), ('kreuz', 0.091), ('tweeters', 0.091), ('negative', 0.085), ('hashtags', 0.085), ('positive', 0.083), ('emoticons', 0.075), ('irony', 0.069), ('davidov', 0.064), ('agreed', 0.057), ('coders', 0.055), ('caucci', 0.054), ('hbi', 0.054), ('touser', 0.054), ('username', 0.054), ('classification', 0.053), ('happy', 0.052), ('unigrams', 0.051), ('difficulty', 0.046), ('tweet', 0.043), ('hashtag', 0.041), ('sentiment', 0.041), ('factors', 0.039), ('distinguishing', 0.037), ('accuracies', 0.037), ('colston', 0.036), ('derks', 0.036), ('frown', 0.036), ('nonsarcastic', 0.036), ('posemo', 0.036), ('smiley', 0.036), ('tepperman', 0.036), ('utsumi', 0.036), ('relied', 0.033), ('love', 0.033), ('glucksberg', 0.032), ('hurst', 0.032), ('logr', 0.032), ('pak', 0.032), ('paroubek', 0.032), ('gibbs', 0.032), ('psychology', 0.032), ('accuracy', 0.031), ('discriminating', 0.03), ('yeah', 0.029), ('angry', 0.029), ('valitutti', 0.029), ('classifiers', 0.029), ('gold', 0.028), ('intriguing', 0.028), ('negate', 0.028), ('carvalho', 0.028), ('ironic', 0.028), ('nigam', 0.028), ('emotion', 0.027), ('outperformed', 0.027), ('ground', 0.026), ('kappa', 0.026), ('opinion', 0.026), ('humans', 0.026), ('fleiss', 0.025), ('attitudes', 0.025), ('features', 0.024), ('pennebaker', 0.024), ('lexical', 0.023), ('strapparava', 0.023), ('ii', 0.023), ('fell', 0.023), ('express', 0.022), ('message', 0.022), ('classify', 0.022), ('apparently', 0.022), ('francis', 0.021), ('attitude', 0.021), ('intended', 0.021), ('human', 0.02), ('achieved', 0.019), ('interval', 0.019), ('sound', 0.018), ('dataset', 0.018), ('indication', 0.018), ('publishers', 0.018), ('differentiate', 0.018), ('presence', 0.018), ('distinguish', 0.017), ('negation', 0.017), ('convey', 0.017), ('judge', 0.017), ('proceeding', 0.017), 
('concerns', 0.017), ('located', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look
2 0.33669412 292 acl-2011-Target-dependent Twitter Sentiment Classification
Author: Long Jiang ; Mo Yu ; Ming Zhou ; Xiaohua Liu ; Tiejun Zhao
Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.
3 0.25941074 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs
Author: Aditya Joshi ; Balamurali AR ; Pushpak Bhattacharyya ; Rajat Mohanty
Abstract: Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. We present a qualitative evaluation of this system based on a human-annotated tweet corpus.
4 0.21934442 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
Author: Kevin Gimpel ; Nathan Schneider ; Brendan O'Connor ; Dipanjan Das ; Daniel Mills ; Jacob Eisenstein ; Michael Heilman ; Dani Yogatama ; Jeffrey Flanigan ; Noah A. Smith
Abstract: We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.
5 0.18398952 177 acl-2011-Interactive Group Suggesting for Twitter
Author: Zhonghua Qu ; Yang Liu
Abstract: The number of users on Twitter has drastically increased in the past years. However, Twitter does not have an effective user grouping mechanism. Therefore tweets from other users can quickly overrun and become inconvenient to read. In this paper, we propose methods to help users group the people they follow using their provided seeding users. Two sources of information are used to build sub-systems: textural information captured by the tweets sent by users, and social connections among users. We also propose a measure of fitness to determine which subsystem best represents the seed users and use it for target user ranking. Our experiments show that our proposed framework works well and that adaptively choosing the appropriate sub-system for group suggestion results in increased accuracy.
6 0.17497832 261 acl-2011-Recognizing Named Entities in Tweets
7 0.1139081 208 acl-2011-Lexical Normalisation of Short Text Messages: Makn Sens a #twitter
8 0.087968685 305 acl-2011-Topical Keyphrase Extraction from Twitter
9 0.082992934 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages
10 0.079732478 121 acl-2011-Event Discovery in Social Media Feeds
11 0.071333125 172 acl-2011-Insertion, Deletion, or Substitution? Normalizing Text Messages without Pre-categorization nor Supervision
12 0.066176526 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports
13 0.061778381 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification
14 0.0604579 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination
15 0.058789227 253 acl-2011-PsychoSentiWordNet
16 0.057271536 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates
17 0.057045024 285 acl-2011-Simple supervised document geolocation with geodesic grids
18 0.055173296 105 acl-2011-Dr Sentiment Knows Everything!
19 0.054971647 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
20 0.054233514 159 acl-2011-Identifying Noun Product Features that Imply Opinions
topicId topicWeight
[(0, 0.109), (1, 0.171), (2, 0.124), (3, -0.042), (4, -0.02), (5, 0.05), (6, 0.028), (7, -0.163), (8, -0.045), (9, 0.084), (10, -0.231), (11, 0.172), (12, 0.166), (13, -0.105), (14, -0.155), (15, -0.08), (16, -0.035), (17, 0.019), (18, -0.041), (19, 0.004), (20, -0.08), (21, -0.006), (22, -0.054), (23, 0.016), (24, -0.016), (25, 0.042), (26, -0.012), (27, -0.073), (28, 0.018), (29, 0.034), (30, 0.006), (31, -0.035), (32, 0.002), (33, 0.021), (34, -0.001), (35, 0.076), (36, 0.079), (37, -0.057), (38, 0.046), (39, -0.03), (40, -0.033), (41, -0.031), (42, 0.005), (43, -0.041), (44, -0.018), (45, -0.022), (46, -0.034), (47, 0.009), (48, 0.034), (49, 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.94711387 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look
2 0.77850395 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
Author: Kevin Gimpel ; Nathan Schneider ; Brendan O'Connor ; Dipanjan Das ; Daniel Mills ; Jacob Eisenstein ; Michael Heilman ; Dani Yogatama ; Jeffrey Flanigan ; Noah A. Smith
Abstract: We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.
3 0.77499831 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs
Author: Aditya Joshi ; Balamurali AR ; Pushpak Bhattacharyya ; Rajat Mohanty
Abstract: Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. We present a qualitative evaluation of this system based on a human-annotated tweet corpus.
4 0.70985162 261 acl-2011-Recognizing Named Entities in Tweets
Author: Xiaohua LIU ; Shaodian ZHANG ; Furu WEI ; Ming ZHOU
Abstract: The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.
5 0.7097804 292 acl-2011-Target-dependent Twitter Sentiment Classification
Author: Long Jiang ; Mo Yu ; Ming Zhou ; Xiaohua Liu ; Tiejun Zhao
Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.
6 0.63761896 177 acl-2011-Interactive Group Suggesting for Twitter
7 0.48840463 305 acl-2011-Topical Keyphrase Extraction from Twitter
8 0.4690727 208 acl-2011-Lexical Normalisation of Short Text Messages: Makn Sens a #twitter
10 0.39988476 121 acl-2011-Event Discovery in Social Media Feeds
11 0.33057189 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages
12 0.28195941 194 acl-2011-Language Use: What can it tell us?
13 0.24129257 285 acl-2011-Simple supervised document geolocation with geodesic grids
14 0.23815507 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates
15 0.23471823 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications
16 0.23077574 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination
17 0.22095801 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
18 0.21927184 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations
19 0.21614952 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis
20 0.20489193 297 acl-2011-That's What She Said: Double Entendre Identification
topicId topicWeight
[(5, 0.036), (17, 0.022), (26, 0.053), (31, 0.026), (35, 0.019), (37, 0.064), (39, 0.035), (41, 0.066), (55, 0.021), (59, 0.021), (72, 0.098), (73, 0.303), (88, 0.014), (91, 0.03), (96, 0.092), (97, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.70188123 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look
Author: Roberto Gonzalez-Ibanez ; Smaranda Muresan ; Nina Wacholder
Abstract: Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine learning effectiveness for identifying sarcastic utterances and we compare the performance of machine learning techniques and human judges on this task. Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well.
2 0.56483412 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach
Author: Preslav Nakov ; Hwee Tou Ng
Abstract: We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over rivaling approaches based on five automatic evaluation measures (for 320,000 sentence pairs; 9.5 million English word tokens).
3 0.56280959 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
4 0.54845047 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques
Author: Donald Metzler ; Eduard Hovy ; Chunliang Zhang
Abstract: Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.
5 0.48942739 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
Author: Yanjun Ma ; Yifan He ; Andy Way ; Josef van Genabith
Abstract: We present a discriminative learning method to improve the consistency of translations in phrase-based Statistical Machine Translation (SMT) systems. Our method is inspired by Translation Memory (TM) systems which are widely used by human translators in industrial settings. We constrain the translation of an input sentence using the most similar ‘translation example’ retrieved from the TM. Differently from previous research which used simple fuzzy match thresholds, these constraints are imposed using discriminative learning to optimise the translation performance. We observe that using this method can benefit the SMT system by not only producing consistent translations, but also improved translation outputs. We report a 0.9 point improvement in terms of BLEU score on English–Chinese technical documents.
6 0.48038328 261 acl-2011-Recognizing Named Entities in Tweets
7 0.47998029 252 acl-2011-Prototyping virtual instructors from human-human corpora
8 0.47647494 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
9 0.47535503 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
10 0.47116536 130 acl-2011-Extracting Comparative Entities and Predicates from Texts Using Comparative Type Classification
11 0.46092549 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
12 0.45680964 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
13 0.45576525 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
14 0.45302308 302 acl-2011-They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems
15 0.45299459 142 acl-2011-Generalized Interpolation in Decision Tree LM
16 0.45263597 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
17 0.45191383 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
18 0.45047861 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
19 0.449406 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks
20 0.44788194 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?