acl acl2013 acl2013-117 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ankit Ramteke ; Akshat Malu ; Pushpak Bhattacharyya ; J. Saketha Nath
Abstract: Thwarting and sarcasm are two uncharted territories in sentiment analysis, the former because of the lack of training corpora and the latter because of the enormous amount of world knowledge it demands. In this paper, we propose a working definition of thwarting amenable to machine learning and create a system that detects if the document is thwarted or not. We focus on identifying thwarting in product reviews, especially in the camera domain. An ontology of the camera domain is created. Thwarting is looked upon as the phenomenon of polarity reversal at a higher level of ontology compared to the polarity expressed at the lower level. This notion of thwarting defined with respect to an ontology is novel, to the best of our knowledge. A rule based implementation building upon this idea forms our baseline. We show that machine learning with annotated corpora (thwarted/non-thwarted) is more effective than the rule based system. Because of the skewed distribution of thwarting, we adopt the Area-under-the-Curve measure of performance. To the best of our knowledge, this is the first attempt at the difficult problem of thwarting detection, which we hope will at least provide a baseline system to compare against.
Affiliation: Akshat Malu and J. Saketha Nath, Dept. of Computer Science & Engg., Indian Institute of Technology Bombay, Mumbai, India (akshatmalu@cse.iitb.ac.in, saketh@cse.iitb.ac.in).
Credits: The authors thank the lexicographers at Center for Indian Language Technology (CFILT) at IIT Bombay for their support for this work.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Thwarting and sarcasm are two uncharted territories in sentiment analysis, the former because of the lack of training corpora and the latter because of the enormous amount of world knowledge it demands. [sent-13, score-0.267]
2 In this paper, we propose a working definition of thwarting amenable to machine learning and create a system that detects if the document is thwarted or not. [sent-14, score-1.092]
3 We focus on identifying thwarting in product reviews, especially in the camera domain. [sent-15, score-0.999]
4 Thwarting is looked upon as the phenomenon of polarity reversal at a higher level of ontology compared to the polarity expressed at the lower level. [sent-17, score-0.923]
5 This notion of thwarting defined with respect to an ontology is novel, to the best of our knowledge. [sent-18, score-1.046]
6 A rule based implementation building upon this idea forms our baseline. [sent-19, score-0.068]
7 We show that machine learning with annotated corpora (thwarted/nonthwarted) is more effective than the rule based system. [sent-20, score-0.038]
8 To the best of our knowledge, this is the first attempt at the difficult problem of thwarting detection, which we hope will at least provide a baseline system to compare against. [sent-22, score-0.77]
9 2 Introduction Although much research has been done in the field of sentiment analysis (Liu et al. [sent-37, score-0.184]
10 , 2012), thwarting and sarcasm are not addressed, to the best of our knowledge. [sent-38, score-0.815]
11 Thwarting has been identified as a common phenomenon in sentiment analysis (Pang et al. [sent-39, score-0.235]
12 The definition of an opinion as specified in Liu (2012) is “An opinion is a quintuple, $(e_i, a_{ij}, s_{ijkl}, h_k, t_l)$, where $e_i$ is the name of an entity, $a_{ij}$ is an aspect of $e_i$, $s_{ijkl}$ is the sentiment on aspect $a_{ij}$ of entity $e_i$, $h_k$ is the opinion holder, and $t_l$ is the time when the opinion is expressed by $h_k$.” [sent-43, score-0.523]
13 If the sentiment towards the entity or one of its important attributes contradicts the sentiment towards all other attributes, we can say that the document is thwarted. [sent-44, score-0.462]
14 A domain ontology is an ontology of various features pertaining to a domain, arranged in a hierarchy. (Page footer: © 2013 Association for Computational Linguistics, pages 860–865.) [sent-47, score-0.634]
15 Domain ontology has been used by various works in NLP (Saggion et al. [sent-49, score-0.276]
16 We look upon thwarting as the phenomenon of reversal of polarity from the lower level of the ontology to the higher level. [sent-53, score-1.471]
17 At the higher level of ontology the entities mentioned are the whole product or a large critical part of the product. [sent-54, score-0.401]
18 So while statements about entities at the lower level of the ontology are on “details”, statements about entities at higher levels are on the “big picture”. [sent-55, score-0.364]
19 Polarity reversal from details to the big picture is at the heart of thwarting. [sent-56, score-0.136]
20 The motivation for our study on thwarting is twofold: a) thwarting is a challenging NLP problem, and b) special ML machinery is needed because the training data is so skewed. [sent-57, score-0.77]
21 Additionally, a large amount of world and domain knowledge may be called for to solve the problem. [sent-58, score-0.056]
22 In spite of the relatively infrequent occurrence of the thwarting phenomenon, the problem poses an intellectually stimulating exercise. [sent-59, score-0.821]
23 We may also say that in the limit, thwarting approaches the very difficult problem of sarcasm detection (Tsur et al. [sent-60, score-0.847]
24 We start by defining and understanding the problem of thwarting in section 2. [sent-62, score-0.77]
25 In section 3, we describe a method to create the domain ontology. [sent-63, score-0.056]
26 In section 4, we propose a naïve rule based approach to detect thwarting. [sent-64, score-0.038]
27 In section 5 we discuss a machine learning based approach which could be used to identify whether a document is thwarted or not. [sent-65, score-0.316]
28 , (2008) as follows: “Thwarted expectations basically refer to the phenomenon wherein the author of the text first builds up certain expectations for the topic, only to produce a deliberate contrast to the earlier discussion.” [sent-69, score-0.168]
29 For our computational purposes, we define thwarting as: “The phenomenon wherein the overall polarity of the document is in contrast with the polarity of the majority of the document.” [sent-70, score-1.442]
30 This definition emphasizes thwarting as piggybacking on sentiment analysis to improve the latter’s performance. [sent-71, score-0.996]
31 The current work however only addresses the problem of whether a document is thwarted or not and does not output the sentiment of the document. [sent-72, score-0.483]
32 [Figure 1: block diagram of the system.] An example of a thwarted document is: “I love the sleek design. [sent-74, score-0.366]
33 The pictures look good but, somehow this camera disappoints me. [sent-76, score-0.121]
34 While thwarting occurs in various forms of sentiment bearing texts, it is not a very frequent one. [sent-78, score-0.954]
35 Thus, it becomes hard to find a sufficient number of examples of thwarting to train a classifier. [sent-80, score-0.77]
36 Since thwarting is a complex natural language phenomenon, we require basic NLP tools and resources, whose accuracy in turn can affect the overall performance of a thwarting detection system. [sent-81, score-1.646]
37 4 Building domain ontology A domain ontology comprises features and entities from the domain and the relationships between them. [sent-82, score-0.688]
38 We decided to use a combination of review corpora mining and manual means for identifying key features. [sent-85, score-0.104]
39 Our approach to building the domain ontology is as follows: Step 1: We use Latent Dirichlet Allocation (LDA) (Blei et al. [sent-86, score-0.332]
40 , 2003) on a corpus containing reviews of a particular product (camera, in our case) to identify key features from the domain. [sent-87, score-0.171]
41 Some additional features are added by a human annotator to increase the coverage of the ontology. [sent-89, score-0.023]
42 For example, in the camera domain, the corpus may include words like memory, card, gb, etc. [sent-90, score-0.153]
43 Step 2: The features thus obtained are arranged in the form of a hierarchy by a human annotator. [sent-93, score-0.026]
44 5 A rule based approach to thwarting recognition As per the definition of thwarting, most of the thwarted document carries a single sentiment; however, a small but critical portion of the text, carrying the contrary sentiment, actually decides the overall polarity. [sent-94, score-1.198]
45 The critical statement, thus, should be strongly polar (either positive or negative), and it should be on some critical feature of the product. [sent-95, score-0.09]
46 Based on these observations we propose the following naïve approach to thwarting detection: For each sentence in a review to be tested 1. [sent-97, score-0.814]
47 It makes explicit the adjective-noun dependencies, which in turn uncovers the sentiment on a specific part or feature of the product. [sent-100, score-0.201]
48 Identify the polarities towards all nouns, using the dependency parse and sentiment lexicons. [sent-102, score-0.242]
49 If a domain feature, identified using the domain ontology, exists in the sentence, annotate/update the ontology node, containing the feature, using the polarity obtained. [sent-104, score-0.628]
50 Once the entire review is processed, we obtain the domain ontology, with polarity marking on nodes, for the corresponding review. [sent-105, score-0.37]
51 The given review is thwarted if there is a contradiction of sentiment among different levels of the domain ontology with polarity marking on nodes. [sent-106, score-1.082]
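The final check in the rule based approach can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it assumes polarities are already marked on the ontology nodes as +1, -1 or 0, and it reads "contradiction among levels" as the root's polarity opposing the summed polarity of all nodes below it. The `Node` class, the `is_thwarted` helper and the small camera hierarchy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Ontology node with a polarity mark (assumed encoding: +1, -1, or 0)."""
    name: str
    polarity: int = 0
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node below `node` in the hierarchy."""
    for child in node.children:
        yield child
        yield from descendants(child)


def is_thwarted(root):
    """Assumed rule: the root's polarity opposes the net polarity below it."""
    lower = sum(d.polarity for d in descendants(root))
    return root.polarity * lower < 0


# Hypothetical hierarchy: details are positive, the overall product is negative.
camera = Node("camera", -1, [
    Node("body", +1, [Node("design", +1)]),
    Node("display", +1),
    Node("picture", +1),
])
print(is_thwarted(camera))
```

Treating every lower node equally is exactly the simplification that the weighting discussion later in the text argues against.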
52 The sentiment lexicons used are SentiWordNet (Esuli et al. [sent-107, score-0.184]
53 The pictures look good but, somehow this camera disappoints me. [sent-115, score-0.274]
54 A part of the ontology, with polarity marking on nodes, for this example is shown in figure 3. [sent-117, score-0.27]
55 [Figure 3: Ontology with polarity marking on nodes: example.] Based on this ontology we see that there is an opposition of sentiment between the root (“camera”) and the lower nodes. [sent-118, score-0.46]
56 However, since the nodes, within the same level, might have different weighting based upon the product under consideration, this method fails to perform well. [sent-120, score-0.086]
57 For example, the body and video capability might be subjective whereas any fault in the lens or the battery will render the camera useless, hence they are more critical. [sent-121, score-0.24]
58 6 A Machine Learning based approach Manual fixing of relative weightages for the features of the product is possible, but that would be ad hoc. [sent-123, score-0.056]
59 We now propose a machine learning based approach to detect thwarting in documents. [sent-124, score-0.77]
60 It uses the domain ontology to identify key features related to the domain. [sent-125, score-0.367]
61 The approach involves two major steps namely learning the weights and building a model that classifies the reviews using the learnt weights. [sent-126, score-0.142]
62 1 Learning Weights The weights are learnt using the loss-regularization framework. [sent-128, score-0.062]
63 The key idea is that the overall polarity of the document is determined by the polarities of individual words in the document. [sent-129, score-0.403]
64 Since we need to find the weights for the nodes in the domain ontology, we consider only the words belonging to the ontology for further processing. [sent-130, score-0.379]
65 Thus, if $P$ is the polarity of the review and $p_i$ is the polarity associated with word $i$, then $P = \sum_i w_i p_i$ gives the linear model. [sent-131, score-0.524]
66 The word $i$ should belong to the ontology as well as the review. [sent-132, score-0.276]
67 where $w$ is the weight vector and $x$ is the feature vector consisting of the word polarities $p_i$. Based on the intuition that every word contributes some polarity to its parent node in the domain ontology, we also learnt weights on the ontology by percolating polarities towards the root. [sent-134, score-0.692]
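One way to picture weight learning in a loss-regularization framework is the sketch below. The excerpt does not state which loss or regularizer the authors used, so the squared loss, the L2 penalty, the plain gradient descent solver, and the toy data are all assumptions made for illustration.

```python
def predict(w, x):
    """Linear model: review polarity as a weighted sum of word polarities."""
    return sum(wi * xi for wi, xi in zip(w, x))


def objective(w, X, y, lam):
    """Mean squared loss plus an L2 regularizer (both are assumed choices)."""
    loss = sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(X)
    return loss + lam * sum(wi * wi for wi in w)


def learn_weights(X, y, lam=0.01, lr=0.1, steps=500):
    """Minimize the objective with plain gradient descent."""
    n = len(X)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [2 * lam * wi for wi in w]          # gradient of the regularizer
        for x, t in zip(X, y):
            err = predict(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / n        # gradient of the loss
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy data: columns are polarities of hypothetical ontology words
# (camera, lens, body); y is the overall review polarity.
X = [[-1, -1, 1], [1, 1, -1], [-1, 1, 1], [1, -1, 1]]
y = [-1, 1, -1, 1]
w = learn_weights(X, y)
print([round(wi, 3) for wi in w])
```

In this toy set the review polarity always follows the first word, so the learnt weight on that word dominates the other two.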
68 We experimented with complete percolation, wherein the polarity at a node is its polarity in the document summed with the polarities of all its descendants. [sent-135, score-0.656]
69 We also define controlled percolation, wherein the value added for a particular descendant is a function of its distance from the node. [sent-136, score-0.076]
70 We halved the polarity value percolated, for each edge between the two nodes. [sent-137, score-0.24]
71 Thus, for the example in figure 2, the polarity value of camera would be the sum of the polarities of the words in {camera, body, display, design, picture}, each halved once for every edge separating the word from the camera node. [sent-138, score-1.026]
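Both percolation schemes can be sketched with one recursive function. The halving per edge follows the text; the hierarchy and the polarity values below are hypothetical, since figure 2 is not reproduced here, and the function name `percolate` is illustrative.

```python
def percolate(node, children, polarity, decay=0.5):
    """Polarity of `node` plus descendant polarities scaled by decay per edge.

    decay=0.5 gives controlled percolation (halving per edge);
    decay=1.0 gives complete percolation (plain sum over descendants).
    """
    total = polarity.get(node, 0.0)
    for child in children.get(node, []):
        total += decay * percolate(child, children, polarity, decay)
    return total


# Hypothetical hierarchy and word polarities for one review.
children = {"camera": ["body", "display", "picture"], "body": ["design"]}
polarity = {"camera": -1.0, "body": 0.5, "display": 0.5,
            "design": 1.0, "picture": 1.0}

controlled = percolate("camera", children, polarity)         # halved per edge
complete = percolate("camera", children, polarity, decay=1)  # plain sum
print(controlled, complete)
```

Because the recursion multiplies by `decay` once per edge, a word two edges below the node contributes a quarter of its polarity under controlled percolation.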
72 We first create a vector of weighted polarity values for each review. [sent-141, score-0.263]
73 This is constructed by generating a value for each word in the domain ontology encountered while reading the review sequentially. [sent-142, score-0.376]
74 1), with the polarity of the word as determined from the sentence. [sent-144, score-0.24]
75 These features are selected based on our understanding of the problem and the fact that thwarting is a function of the change of polarity values and also the position of change. [sent-146, score-1.033]
76 7 Results Experiments were performed on a dataset obtained by crawling product reviews from Amazon 1. [sent-150, score-0.136]
77 The reviews crawled were given to three different annotators. [sent-156, score-0.102]
78 Read the entire review and try to form a mental picture of how sentiment in the document is distributed. [sent-158, score-0.342]
79 Ignore anything that is not the opinion of the writer. [sent-159, score-0.079]
80 Try to determine the overall polarity of the document. [sent-161, score-0.263]
81 The star rating of the document can be used for this purpose. [sent-162, score-0.064]
82 If the overall polarity of the document is negative but most of the words in the document indicate positive sentiment, or vice versa, then consider the document as thwarted. [sent-164, score-0.475]
83 Since identifying thwarting is a difficult task even for humans, we calculated Cohen’s kappa score (Cohen, 1960) to determine the inter-annotator agreement. [sent-165, score-0.813]
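For reference, Cohen's kappa for a pair of annotators can be computed as in the sketch below. The study used three annotators and the excerpt does not say how their judgements were combined, so the pairwise setting and the labels here are purely illustrative.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length.

    Undefined (division by zero) when chance agreement is exactly 1,
    i.e. when both annotators use a single identical label throughout.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    labels = set(a) | set(b)
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Illustrative labels: T = thwarted, N = non-thwarted.
ann1 = ["T", "N", "N", "N", "T", "N", "N", "N", "N", "N"]
ann2 = ["T", "N", "N", "N", "N", "N", "N", "N", "N", "N"]
print(round(cohen_kappa(ann1, ann2), 3))
```

With a rare class, raw agreement (0.9 here) overstates reliability; kappa discounts the agreement expected by chance (0.74 here).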
84 The annotators showed high agreement (98%) in the non-thwarted class whereas they agreed on 70% of the thwarted documents. [sent-170, score-0.259]
85 Out of the 1196 reviews, exactly 21 were thwarted documents, agreed upon by all annotators. [sent-171, score-0.289]
86 The AUC for a random baseline is expected to be 50%, and the rule based approach is close to the baseline (56. [sent-178, score-0.038]
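The AUC can be read as the probability that a randomly chosen thwarted review receives a higher score than a randomly chosen non-thwarted one, which is why an uninformative scorer lands at 50%. A small sketch of that pairwise definition follows; the labels and scores are invented for illustration and the quadratic loop is chosen for clarity, not efficiency.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 1 = thwarted, 0 = non-thwarted.
labels = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.1, 0.4, 0.8, 0.7, 0.2]
print(auc(labels, scores))

# A scorer that cannot separate the classes sits at the random baseline.
print(auc(labels, [0.5] * 6))
```

Unlike accuracy, this measure does not reward a classifier for always predicting the majority class, which suits a skewed label distribution.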
87 We used the CVX library in Matlab to solve the optimization problem for learning weights and the LIBSVM library to implement the SVM classifier. [sent-181, score-0.091]
88 In order to account for the data skew, we assign a class weight of 50 (determined empirically) to the thwarted instances and 1 for non-thwarted instances in the classifier. [sent-182, score-0.235]
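The effect of the class weights can be illustrated with a class-weighted hinge loss, in which a margin violation on a thwarted instance costs 50 times as much as one on a non-thwarted instance. This sketch covers only the loss term of a linear SVM, not the LIBSVM solver; the function name, the weight vector and the data points are hypothetical.

```python
def weighted_hinge_loss(w, b, X, y, class_weight):
    """Sum of hinge losses, each scaled by the weight of the true class."""
    total = 0.0
    for x, label in zip(X, y):
        margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        total += class_weight[label] * max(0.0, 1.0 - margin)
    return total


# +1 = thwarted, -1 = non-thwarted; weights follow the 50:1 setting in the text.
class_weight = {+1: 50.0, -1: 1.0}

# Two hypothetical one-dimensional instances under a hypothetical model.
X = [[0.5], [0.5]]
y = [+1, -1]
print(weighted_hinge_loss([1.0], 0.0, X, y, class_weight))
```

Here the thwarted instance violates the margin by 0.5 and contributes 25, while the non-thwarted one violates it by 1.5 yet contributes only 1.5, so the optimizer is pushed to get the rare class right first.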
89 The system outperforms both the random baseline as well as the rule based system. [sent-186, score-0.038]
90 The basic objective for creating a thwarting detection system was to include such a module in the general sentiment analysis framework. [sent-188, score-0.986]
91 Thus, using document polarity as a feature contradicts the objective of sentiment analysis, which is to find the document polarity. [sent-189, score-0.582]
92 document polarity feature, the values drop by 10%, which is not acceptable. [sent-198, score-0.263]
93 8 Conclusions and Future Work We have described a system for detecting thwarting, based on polarity reversal between opinion on most parts of the product and opinion on the overall product or a critical part of the product. [sent-199, score-0.664]
94 The parts of the product are related to one another through an ontology. [sent-200, score-0.056]
95 This ontology guides a rule based approach to thwarting detection, and also provides features for an SVM based learning system. [sent-201, score-1.084]
96 The ML based system scores over the rule based system. [sent-202, score-0.038]
97 Future work consists of trying out the approach across products and across domains, harnessing the ontology better from the reviews, and investigating distributions and learning algorithms more suitable for the problem. [sent-203, score-0.373]
98 Sentiwordnet: A publicly available lexical resource for opinion mining. [sent-234, score-0.079]
99 A survey of opinion mining and sentiment analysis. [sent-258, score-0.285]
100 ICWSM–A great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. [sent-309, score-0.09]
wordName wordTfidf (topN-words)
[('thwarting', 0.77), ('ontology', 0.276), ('polarity', 0.24), ('thwarted', 0.235), ('sentiment', 0.184), ('camera', 0.153), ('reversal', 0.086), ('reviews', 0.08), ('opinion', 0.079), ('bombay', 0.068), ('document', 0.064), ('itb', 0.062), ('cse', 0.062), ('indian', 0.058), ('polarities', 0.058), ('percolation', 0.057), ('domain', 0.056), ('product', 0.056), ('wherein', 0.054), ('lens', 0.052), ('phenomenon', 0.051), ('picture', 0.05), ('taboada', 0.049), ('auc', 0.049), ('sarcasm', 0.045), ('critical', 0.045), ('review', 0.044), ('disappoints', 0.043), ('lcsp', 0.043), ('ohana', 0.043), ('polpinij', 0.043), ('sleek', 0.043), ('mumbai', 0.039), ('rule', 0.038), ('pang', 0.036), ('learnt', 0.035), ('tsur', 0.035), ('detection', 0.032), ('pictures', 0.031), ('inquirer', 0.031), ('contradicts', 0.03), ('upon', 0.03), ('marking', 0.03), ('somehow', 0.029), ('saggion', 0.028), ('esuli', 0.028), ('sentiwordnet', 0.028), ('weights', 0.027), ('recommend', 0.027), ('stone', 0.027), ('arranged', 0.026), ('subsequence', 0.026), ('aaai', 0.026), ('love', 0.024), ('versa', 0.024), ('agreed', 0.024), ('cohen', 0.024), ('entities', 0.024), ('values', 0.023), ('annotator', 0.023), ('overall', 0.023), ('definition', 0.023), ('library', 0.023), ('mining', 0.022), ('expectations', 0.022), ('ling', 0.022), ('crawled', 0.022), ('controlled', 0.022), ('liu', 0.021), ('skewed', 0.021), ('contiguous', 0.021), ('longest', 0.021), ('ml', 0.021), ('negative', 0.02), ('identifying', 0.02), ('statements', 0.02), ('nodes', 0.02), ('curve', 0.019), ('klein', 0.019), ('vaithyanathan', 0.019), ('uncharted', 0.019), ('deliberate', 0.019), ('ocel', 0.019), ('piggybacking', 0.019), ('territories', 0.019), ('body', 0.018), ('look', 0.018), ('svm', 0.018), ('key', 0.018), ('contradiction', 0.017), ('fault', 0.017), ('catchy', 0.017), ('cfilt', 0.017), ('sarcastic', 0.017), ('flips', 0.017), ('investing', 0.017), ('matlab', 0.017), ('tist', 0.017), ('uncovers', 0.017), ('identify', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting
Author: Ankit Ramteke ; Akshat Malu ; Pushpak Bhattacharyya ; J. Saketha Nath
Abstract: Thwarting and sarcasm are two uncharted territories in sentiment analysis, the former because of the lack of training corpora and the latter because of the enormous amount of world knowledge it demands. In this paper, we propose a working definition of thwarting amenable to machine learning and create a system that detects if the document is thwarted or not. We focus on identifying thwarting in product reviews, especially in the camera domain. An ontology of the camera domain is created. Thwarting is looked upon as the phenomenon of polarity reversal at a higher level of ontology compared to the polarity expressed at the lower level. This notion of thwarting defined with respect to an ontology is novel, to the best of our knowledge. A rule based implementation building upon this idea forms our baseline. We show that machine learning with annotated corpora (thwarted/non-thwarted) is more effective than the rule based system. Because of the skewed distribution of thwarting, we adopt the Area-under-the-Curve measure of performance. To the best of our knowledge, this is the first attempt at the difficult problem of thwarting detection, which we hope will at least provide a baseline system to compare against.
2 0.22739033 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li
Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing researches exploit seed words, and lead to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms the state-of-the-art methods with seed words.
3 0.15738352 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
Author: Angeliki Lazaridou ; Ivan Titov ; Caroline Sporleder
Abstract: We propose a joint model for unsupervised induction of sentiment, aspect and discourse information and show that by incorporating a notion of latent discourse relations in the model, we improve the prediction accuracy for aspect and sentiment polarity on the sub-sentential level. We deviate from the traditional view of discourse, as we induce types of discourse relations and associated discourse cues relevant to the considered opinion analysis task; consequently, the induced discourse relations play the role of opinion and aspect shifters. The quantitative analysis that we conducted indicated that the integration of a discourse model increased the prediction accuracy results with respect to the discourse-agnostic approach and the qualitative analysis suggests that the induced representations encode a meaningful discourse structure.
4 0.14764024 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset
Author: Mohamed Aly ; Amir Atiya
Abstract: We introduce LABR, the largest sentiment analysis dataset to-date for the Arabic language. It consists of over 63,000 book reviews, each rated on a scale of 1 to 5 stars. We investigate the properties of the dataset, and present its statistics. We explore using the dataset for two tasks: sentiment polarity classification and rating classification. We provide standard splits of the dataset into training and testing, for both polarity and rating classification, in both balanced and unbalanced settings. We run baseline experiments on the dataset to establish a benchmark.
5 0.1361798 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
Author: Rui Xia ; Tao Wang ; Xuelei Hu ; Shoushan Li ; Chengqing Zong
Abstract: Bag-of-words (BOW) is now the most popular way to model text in machine learning based sentiment classification. However, the performance of such approach sometimes remains rather limited due to some fundamental deficiencies of the BOW model. In this paper, we focus on the polarity shift problem, and propose a novel approach, called dual training and dual prediction (DTDP), to address it. The basic idea of DTDP is to first generate artificial samples that are polarity-opposite to the original samples by polarity reversion, and then leverage both the original and opposite samples for (dual) training and (dual) prediction. Experimental results on four datasets demonstrate the effectiveness of the proposed approach for polarity classification.
6 0.13539889 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
7 0.13194863 318 acl-2013-Sentiment Relevance
8 0.10933316 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
9 0.10369036 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
10 0.10242772 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
11 0.10071521 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
12 0.098626792 244 acl-2013-Mining Opinion Words and Opinion Targets in a Two-Stage Framework
13 0.09266825 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts
14 0.091373682 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning
15 0.089216009 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays
16 0.085888557 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model
17 0.084922664 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction
18 0.083458312 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction
19 0.078749031 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction
20 0.075331055 336 acl-2013-Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews
topicId topicWeight
[(0, 0.124), (1, 0.206), (2, -0.026), (3, 0.175), (4, -0.085), (5, -0.043), (6, -0.015), (7, 0.003), (8, -0.001), (9, 0.054), (10, 0.07), (11, -0.012), (12, -0.031), (13, -0.029), (14, 0.016), (15, 0.049), (16, 0.009), (17, 0.013), (18, -0.016), (19, 0.071), (20, -0.015), (21, 0.041), (22, -0.044), (23, 0.055), (24, -0.001), (25, -0.061), (26, -0.055), (27, 0.001), (28, -0.027), (29, 0.006), (30, -0.002), (31, 0.011), (32, -0.024), (33, -0.039), (34, 0.072), (35, 0.009), (36, -0.023), (37, 0.009), (38, 0.006), (39, -0.048), (40, 0.004), (41, 0.046), (42, 0.001), (43, 0.054), (44, 0.014), (45, 0.018), (46, -0.089), (47, 0.013), (48, -0.074), (49, -0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.93420869 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting
Author: Ankit Ramteke ; Akshat Malu ; Pushpak Bhattacharyya ; J. Saketha Nath
Abstract: Thwarting and sarcasm are two uncharted territories in sentiment analysis, the former because of the lack of training corpora and the latter because of the enormous amount of world knowledge it demands. In this paper, we propose a working definition of thwarting amenable to machine learning and create a system that detects if the document is thwarted or not. We focus on identifying thwarting in product reviews, especially in the camera domain. An ontology of the camera domain is created. Thwarting is looked upon as the phenomenon of polarity reversal at a higher level of ontology compared to the polarity expressed at the lower level. This notion of thwarting defined with respect to an ontology is novel, to the best of our knowledge. A rule based implementation building upon this idea forms our baseline. We show that machine learning with annotated corpora (thwarted/non-thwarted) is more effective than the rule based system. Because of the skewed distribution of thwarting, we adopt the Area-under-the-Curve measure of performance. To the best of our knowledge, this is the first attempt at the difficult problem of thwarting detection, which we hope will at least provide a baseline system to compare against.
2 0.85200375 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li
Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing researches exploit seed words, and lead to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms the state-of-the-art methods with seed words.
3 0.79661554 318 acl-2013-Sentiment Relevance
Author: Christian Scheible ; Hinrich Schutze
Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.
4 0.78526151 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset
Author: Mohamed Aly ; Amir Atiya
Abstract: We introduce LABR, the largest sentiment analysis dataset to-date for the Arabic language. It consists of over 63,000 book reviews, each rated on a scale of 1 to 5 stars. We investigate the properties of the dataset, and present its statistics. We explore using the dataset for two tasks: sentiment polarity classification and rating classification. We provide standard splits of the dataset into training and testing, for both polarity and rating classification, in both balanced and unbalanced settings. We run baseline experiments on the dataset to establish a benchmark.
5 0.75978649 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays
Author: Eric T. Nalisnick ; Henry S. Baird
Abstract: We present an automatic method for analyzing sentiment dynamics between characters in plays. This literary format’s structured dialogue allows us to make assumptions about who is participating in a conversation. Once we have an idea of who a character is speaking to, the sentiment in his or her speech can be attributed accordingly, allowing us to generate lists of a character’s enemies and allies as well as pinpoint scenes critical to a character’s emotional development. Results of experiments on Shakespeare’s plays are presented along with discussion of how this work can be extended to unstructured texts (i.e. novels).
6 0.75298548 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
7 0.73627943 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
8 0.68996996 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning
9 0.61148179 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
10 0.60572535 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
11 0.58415306 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
12 0.5670163 49 acl-2013-An annotated corpus of quoted opinions in news articles
13 0.5631727 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction
14 0.52347362 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions
15 0.51554787 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
16 0.49515107 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews
17 0.49504057 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction
18 0.4889892 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts
19 0.47781596 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model
20 0.45294601 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
topicId topicWeight
[(0, 0.042), (6, 0.025), (11, 0.076), (15, 0.021), (24, 0.05), (26, 0.108), (35, 0.069), (42, 0.04), (48, 0.044), (63, 0.012), (70, 0.03), (88, 0.06), (90, 0.018), (93, 0.216), (95, 0.083)]
simIndex simValue paperId paperTitle
1 0.80547565 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach
Author: Xia Lu
Abstract: In this paper we propose a probabilistic graphical model as an innovative framework for studying typological universals. We view language as a system and linguistic features as its components whose relationships are encoded in a Directed Acyclic Graph (DAG). Taking discovery of the word order universals as a knowledge discovery task we learn the graphical representation of a word order sub-system which reveals a finer structure such as direct and indirect dependencies among word order features. Then probabilistic inference enables us to see the strength of such relationships: given the observed value of one feature (or combination of features), the probabilities of values of other features can be calculated. Our model is not restricted to using only two values of a feature. Using imputation technique and EM algorithm it can handle missing values well. Model averaging technique solves the problem of limited data. In addition the incremental and divide-and-conquer method addresses the areal and genetic effects simultaneously instead of separately as in Daumé III and Campbell (2007).
same-paper 2 0.77886391 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting
Author: Ankit Ramteke ; Akshat Malu ; Pushpak Bhattacharyya ; J. Saketha Nath
Abstract: Thwarting and sarcasm are two uncharted territories in sentiment analysis, the former because of the lack of training corpora and the latter because of the enormous amount of world knowledge it demands. In this paper, we propose a working definition of thwarting amenable to machine learning and create a system that detects if the document is thwarted or not. We focus on identifying thwarting in product reviews, especially in the camera domain. An ontology of the camera domain is created. Thwarting is looked upon as the phenomenon of polarity reversal at a higher level of ontology compared to the polarity expressed at the lower level. This notion of thwarting defined with respect to an ontology is novel, to the best of our knowledge. A rule based implementation building upon this idea forms our baseline. We show that machine learning with annotated corpora (thwarted/nonthwarted) is more effective than the rule based system. Because of the skewed distribution of thwarting, we adopt the Area-under-the-Curve measure of performance. To the best of our knowledge, this is the first attempt at the difficult problem of thwarting detection, which we hope will at least provide a baseline system to compare against.
3 0.7204206 8 acl-2013-A Learner Corpus-based Approach to Verb Suggestion for ESL
Author: Yu Sawai ; Mamoru Komachi ; Yuji Matsumoto
Abstract: We propose a verb suggestion method which uses candidate sets and domain adaptation to incorporate error patterns produced by ESL learners. The candidate sets are constructed from a large scale learner corpus to cover various error patterns made by learners. Furthermore, the model is trained using both a native corpus and the learner corpus via a domain adaptation technique. Experiments on two learner corpora show that the candidate sets increase the coverage of error patterns and domain adaptation improves the performance for verb suggestion.
4 0.65976477 318 acl-2013-Sentiment Relevance
Author: Christian Scheible ; Hinrich Schütze
Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.
Author: Philippe Langlais
Abstract: Analogical learning over strings is a holistic model that has been investigated by a few authors as a means to map forms of a source language to forms of a target language. In this study, we revisit this learning paradigm and apply it to the transliteration task. We show that alone, it performs worse than a statistical phrase-based machine translation engine, but the combination of both approaches outperforms each one taken separately, demonstrating the usefulness of the information captured by a so-called formal analogy.
6 0.65301794 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
7 0.65261126 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
8 0.65014005 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
9 0.64842498 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
10 0.64489216 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study
11 0.64377987 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction
12 0.64137775 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning
13 0.64126331 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
14 0.64095306 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
15 0.63926625 295 acl-2013-Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages
16 0.6379751 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
17 0.63670659 97 acl-2013-Cross-lingual Projections between Languages from Different Families
18 0.63651001 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset
19 0.63592982 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
20 0.63333291 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing