acl acl2013 acl2013-188 knowledge-graph by maker-knowledge-mining

188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words


Source: pdf

Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li

Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing research exploits seed words, which leads to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms state-of-the-art methods that use seed words.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. [sent-6, score-0.515]

2 Most existing research exploits seed words, which leads to low robustness. [sent-7, score-0.335]

3 Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. [sent-9, score-0.917]

4 Several experiments on real datasets show that WEED is effective and outperforms the state-of-the-art methods with seed words. [sent-10, score-0.365]

5 1 Introduction In recent years, sentiment analysis (Pang et al. [sent-11, score-0.515]

6 Sentiment analysis classifies a text span into different sentiment polarities, i.e., positive or negative. [sent-13, score-0.515]

7 Sentiment Word Identification (SWI) is a basic technique in sentiment analysis. [sent-16, score-0.515]

8 The sentence below is a movie review from the IMDB database: • Bored performers and a lackluster plot and script do not make a good action movie. [sent-23, score-0.289]

9 In order to judge the sentence's polarity (and thus learn the preference of this user), one must recognize which words are able to express sentiment. [sent-24, score-0.455]

10 In this sentence, “bored” and “lackluster” are negative while “good” should be positive, yet its polarity is reversed by “not”. [sent-25, score-0.455]

11 By such analysis, we can conclude that this movie review is a negative comment. [sent-26, score-0.21]

12 To achieve this, previous supervised approaches need labeled polarity words, also called seed words, which are usually manually selected. [sent-28, score-0.706]

13 The words to be classified by their sentiment polarities are called candidate words. [sent-29, score-0.781]

14 Prior works study the relations between labeled seed words and unlabeled candidate words, and then obtain sentiment polarities of candidate words by these relations. [sent-30, score-1.209]

15 The authors of (Turney and Littman, 2003) and (Kaji and Kitsuregawa, 2007) use statistical measures, such as pointwise mutual information (PMI), to compute similarities between words or phrases. [sent-32, score-0.052]

16 Kanayama and Nasukawa (2006) assume sentiment words appear successively in the text, so one can find sentiment words in the context of seed words. [sent-33, score-1.521]

17 In (Hassan et al., 2011), a Markov random walk model is applied to a large word-relatedness graph, constructed according to the synonyms and hypernyms in WordNet (Miller, 1995). [sent-35, score-0.034]

18 However, approaches based on seed words have obvious shortcomings. [sent-36, score-0.387]

19 First, the polarities of seed words are not reliable across various domains. [sent-37, score-0.56]

20 As a simple example, “rise” is most often a neutral word, but becomes positive in the stock market. [sent-38, score-0.128]

21 Second, manual selection of seed words can be very subjective even if the application domain is determined. [sent-39, score-0.417]

22 Any missing key word in the set of seed words could lead to poor performance. [sent-41, score-0.421]

23 Therefore, the seed word set of such algorithms demands high completeness (by containing as many common polarity words as possible). [sent-42, score-0.792]

24 Unlike the previous research work, we identify sentiment words without any seed words in this paper. [sent-43, score-0.954]

25 Instead, the document information and their polarity labels are exploited in the identification process. [sent-46, score-0.406]

26 Intuitively, the polarity of a document and the polarities of most of its component sentiment words are the same. [sent-47, score-0.85]

27 Moreover, if a word is found mostly in positive documents, it is very likely a positive word, and vice versa. [sent-49, score-0.25]

28 We first measure the importance of the component words in the labeled documents semantically. [sent-51, score-0.144]

29 Here, the basic assumption is that important words are more related to the document's sentiment than less important ones. [sent-52, score-0.677]

30 Then, we estimate the polarity of each document using its component words’ importance along with their sentiment values, and compare the estimation to the real polarity. [sent-53, score-1.107]

31 After that, we construct an optimization model for the whole corpus to measure the overall estimation error, which is minimized to find the best sentiment values of the candidate words. [sent-54, score-0.646]

32 To the best of our knowledge, this paper is the first work that identifies sentiment words without seed words. [sent-56, score-0.902]

33 1 Preliminary We formulate the sentiment word identification problem as follows. [sent-58, score-0.584]

34 If document di is a positive sample, then li = 1; if di is negative, then li = −1. [sent-66, score-0.456]

35 We use C = {c1, . . . , cV} to represent the candidate word set, and V is the number of candidate words. [sent-70, score-0.062]

36 Each document is formed by consecutive words in C. [sent-71, score-0.162]

37 Our task is to predict the sentiment polarity of each word cj ∈ C. [sent-72, score-1.208]

38 2 Word Importance We assume each document di ∈ D is presented by a bag-of-words feature vectorf⃗i=ffi. [sent-74, score-0.242]

39 iV1, where fij describes the importance of cj to di. [sent-76, score-0.203]

40 A high value of fij indicates that word cj contributes a lot to document di in the semantic view, and vice versa. [sent-77, score-0.778]

41 Note that fij > 0 if cj appears in di, while fij = 0 if not. [sent-78, score-0.586]

42 For simplicity, every f⃗i is normalized to a unit vector, so that the features of different documents are comparable. [sent-79, score-0.069]

43 There are several ways to define the word importance, and we choose normalized TF-IDF (Jones, 1972). [sent-80, score-0.065]

44 Therefore, we have fij ∝ TF-IDF(di, cj), and ∥f⃗i∥ = 1. [sent-81, score-0.149]
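A minimal sketch of this feature construction in Python/numpy, following the definitions above (the whitespace tokenizer and the exact IDF variant are illustrative assumptions, not prescribed by the paper):

import numpy as np

def tfidf_features(docs):
    """Build L2-normalized TF-IDF vectors: fij >= 0, fij = 0 if cj is absent, ||fi|| = 1."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            tf[i, index[w]] += 1.0
    idf = np.log(len(docs) / np.count_nonzero(tf, axis=0))  # document-frequency-based IDF
    F = tf * idf
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    return F / np.where(norms == 0.0, 1.0, norms), vocab    # each row fi is a unit vector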

45 3 Polarity Value In the above description, the sentiment polarity has only two states, positive or negative. [sent-83, score-0.98]

46 We extend both word and document polarities to polarity values in this section. [sent-84, score-0.688]

47 Definition 1 Word Polarity Value: For each word cj ∈ C, we denote its word polarity value as w(cj). [sent-85, score-0.764]

48 w(cj) > 0 indicates cj is a positive word, while w(cj) < 0 indicates cj is a negative word. [sent-86, score-0.754]

49 |w(cj)| indicates the strength of the belief in cj's polarity. [sent-87, score-0.323]

50 Denote w(cj) as wj, and the word polarity value vector as w⃗ = (w1, . . . , wV)T. [sent-88, score-0.071]

51 For example, if w(“bad”) < w(“greedy”) < 0, we can say “bad” is more likely to be a negative word than “greedy”. [sent-91, score-0.118]

52 Definition 2 Document Polarity Value: For each document di, the document polarity value is y(di) = cosine(f⃗i, w⃗) = f⃗iT w⃗ / ∥w⃗∥. [sent-92, score-0.628]

53 Here, we can regard yi as a polarity estimate for di based on w⃗ . [sent-94, score-0.579]
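In code, this estimate is a one-liner; a sketch assuming F stacks the unit feature vectors f⃗i as rows and w holds the word polarity values wj:

import numpy as np

def document_polarity(F, w):
    """y(di) = cosine(fi, w) = fi^T w / ||w||, since each row fi is a unit vector."""
    return F @ w / np.linalg.norm(w)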

54 “MR1”, “MR2” and “MR3” are three movie review documents, and “compelling” and “boring” are polarity words in the vocabulary. [sent-96, score-0.549]

55 We simply use TF to construct the document feature vectors, without normalization. [sent-97, score-0.143]

56 Similarly, we can get w⃗ = (1, −1), indicating “compelling” is a positive word, while “boring” is negative. [sent-99, score-0.126]

57 These inequalities tell us the first two reviews are positive, while the last review is negative. [sent-101, score-0.114]

58 Furthermore, we believe that “MR1” is more positive than “MR2”. [sent-102, score-0.094]

59 Table 1: the rows labeled “MR1”, “MR2” and “MR3” show the feature vectors of the three movie reviews over “compelling” and “boring”, and the last row shows the word polarity value vector w⃗. [sent-103, score-0.554]

60 For simplicity, we use the TF value to represent the word importance feature. [sent-104, score-0.125]
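Table 1's numeric entries did not survive extraction, so the TF counts below are purely hypothetical; this sketch only verifies that such counts can reproduce the ordering described above (MR1 more positive than MR2, MR3 negative) under w⃗ = (1, −1):

import numpy as np

# Hypothetical TF counts over ("compelling", "boring") -- NOT the paper's actual Table 1.
F = np.array([[3.0, 1.0],   # MR1
              [2.0, 1.0],   # MR2
              [1.0, 3.0]])  # MR3
w = np.array([1.0, -1.0])

# Full cosine, since these TF vectors are deliberately left unnormalized.
y = (F @ w) / (np.linalg.norm(F, axis=1) * np.linalg.norm(w))
print(y)  # approx [ 0.447  0.316 -0.447]: MR1 > MR2 > 0 > MR3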

61 4 Optimization Model As mentioned above, we can regard yi as a polarity estimate for document di. [sent-106, score-0.557]

62 A precise prediction makes the positive document’s estimator close to 1, and the negative’s close to -1. [sent-107, score-0.094]

63 We define the polarity estimate error for document di as: ei = |yi − li| = |f⃗iT w⃗ / ∥w⃗∥ − li|. [sent-108, score-0.642]

64 We obtain w⃗ by minimizing the overall estimation error over all document samples, ∑i=1..n ei². [sent-110, score-0.166]

65 Thus, the optimization problem can be described as: minw⃗ ∑i=1..n (f⃗iT w⃗ / ∥w⃗∥ − li)². (3) [sent-111, score-0.034]

66 After solving this problem, we not only obtain the polarity of each word cj according to the sign of wj, but also its polarity belief based on |wj|. [sent-112, score-1.099]

67 5 Model Solution We use the normalized vector x⃗ = w⃗ / ∥w⃗∥ to substitute for w⃗, and derive an equivalent optimization problem: minx⃗ ∑i=1..n (f⃗iT x⃗ − li)², s.t. ∥x⃗∥ = 1. [sent-114, score-0.065]

68 The equality constraint of the above model makes the problem non-convex. [sent-117, score-0.041]

69 We relax the equality constraint to ∥x⃗∥ ≤ 1; then the problem becomes convex. [sent-118, score-0.041]

70 We can then rewrite the objective function in the form of a least-squares regression: E(x⃗) = ∥F·x⃗ − l⃗∥², where F is the feature matrix whose i-th row equals f⃗iT. [sent-119, score-0.062]

71 Now we can solve the problem with convex optimization algorithms (Boyd and Vandenberghe, 2004), such as the gradient descent method. [sent-121, score-0.068]
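A minimal sketch of one such solver: projected gradient descent on the relaxed problem min ∥F·x⃗ − l⃗∥² s.t. ∥x⃗∥ ≤ 1 (the step-size rule and iteration count are illustrative choices, not taken from the paper):

import numpy as np

def solve_word_polarities(F, l, iters=2000):
    """Projected gradient descent for min ||F x - l||^2 subject to ||x|| <= 1."""
    x = np.zeros(F.shape[1])
    # 1/L step, where L = 2 * sigma_max(F)^2 bounds the gradient's Lipschitz constant
    step = 1.0 / (2.0 * np.linalg.norm(F, 2) ** 2)
    for _ in range(iters):
        x = x - step * 2.0 * F.T @ (F @ x - l)  # gradient of the least-squares objective
        norm = np.linalg.norm(x)
        if norm > 1.0:                          # project back onto the unit ball
            x = x / norm
    return x  # sign(xj) gives cj's polarity; |xj| its polarity belief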

72 1 Experimental Setup We leverage two widely used document datasets. [sent-124, score-0.11]

73 The first dataset is the Cornell Movie Review Data, containing 1,000 positive and 1,000 negative processed reviews. [sent-125, score-0.178]

74 The ground-truth is generated with the help of a sentiment lexicon, the MPQA subjectivity lexicon. [sent-128, score-0.578]

75 We randomly select 20% of the polarity words as seed words, and the remaining ones are candidates. [sent-129, score-0.799]

76 Here, the seed words are provided for the baseline methods but not for ours. [sent-130, score-0.387]

77 In order to increase the difficulty of our task, several non-polarity words are added to the candidate word set. [sent-131, score-0.127]

78 Table 2 shows the word distribution of the two datasets. [sent-132, score-0.034]
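A sketch of this split protocol under the stated 20% ratio (the function and variable names are illustrative assumptions):

import random

def split_seed_candidates(polarity_words, non_polarity_words, seed_ratio=0.2, rng_seed=0):
    """Hold out a random fraction of lexicon words as seeds; the rest, plus distractors, are candidates."""
    rng = random.Random(rng_seed)
    words = list(polarity_words)
    rng.shuffle(words)
    k = int(seed_ratio * len(words))
    seeds = words[:k]                                   # given to the baselines, not to WEED
    candidates = words[k:] + list(non_polarity_words)   # non-polarity words added for difficulty
    return seeds, candidates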

79 2 Top-K Test In face of the long lists of recommended polarity words, people are only concerned about the topranked words with the highest sentiment value. [sent-137, score-1.018]

80 In this experiment we consider the accuracy of the top K polarity words. [sent-138, score-0.371]

81 The quality of a polarity word list is measured by p@K = Nright,K / K, where Nright,K is the number of top-K words which are correctly recommended. [sent-139, score-0.457]

82 0%, which shows that the top 10 words in our recommended list are exceptionally reliable. [sent-154, score-0.13]
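A sketch of the p@K metric defined above, assuming the recommended list is sorted by sentiment value and compared against a gold lexicon (names here are illustrative):

def precision_at_k(ranked, gold, k):
    """p@K = N_right,K / K, where ranked holds (word, predicted_polarity) pairs, best first."""
    n_right = sum(1 for word, pol in ranked[:k] if gold.get(word) == pol)
    return n_right / k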

83 This shows that all three approaches rank the most probable polarity words at the front of the word list. [sent-156, score-0.457]

84 Table 3 shows the top-10 positive and negative words for each method, where the bold words are the ones with correct polarities. [sent-160, score-0.282]

85 From the first two columns, we can see that the accuracy of WEED is very high: the positive words are all correct, and the negative word list makes only one mistake, “plot”. [sent-161, score-0.264]

86 The other columns of this table show that the baseline methods both achieve reasonable results but do not perform as well as WEED. [sent-162, score-0.041]

87 Our approach is able to identify frequently used sentiment words, which are vital for applications without prior sentiment lexicons. [sent-163, score-1.03]

88 The sentiment words identified by SO-PMI are not as representative as those of WEED and COM. [sent-164, score-0.567]

89 COM tends to assign wrong polarities to the sentiment words although these words are often used. [sent-166, score-0.792]

90 In the 5th and 6th columns of Table 3, “bad” and “horror” are recognized as positive words, while “pretty” and “fun” are recognized as negative ones. [sent-167, score-0.281]

91 These concrete results show that WEED captures the generality of the sentiment words, and achieves a higher accuracy than the baselines. [sent-168, score-0.515]

92 4 Conclusion and Future Work We propose an effective optimization-based model, WEED, to identify sentiment words from the corpus without seed words. [sent-169, score-0.902]

93 The algorithm exploits the sentiment information provided by the documents. [sent-170, score-0.544]

94 To the best of our knowledge, this paper is the first work that identifies sentiment words without any seed words. [sent-171, score-0.902]

95 Several experiments on real datasets show that WEED outperforms state-of-the-art methods that use seed words. [sent-172, score-0.365]

96 Our work can be considered as the first step of building a domain-specific sentiment lexicon. [sent-173, score-0.515]

97 Once some sentiment words are obtained in a certain domain, our future work is to improve WEED by utilizing these words. [sent-174, score-0.567]

98 Extracting diverse sentiment expressions with target-dependent polarity from Twitter. [sent-191, score-0.886]

99 Probability adjustment naïve Bayes algorithm based on non-domain-specific sentiment and evaluation word for domain-transfer sentiment analysis. [sent-195, score-1.092]

100 Building lexicon for sentiment analysis from massive collection of HTML documents. [sent-223, score-0.548]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sentiment', 0.515), ('polarity', 0.371), ('seed', 0.335), ('cj', 0.288), ('weed', 0.269), ('polarities', 0.173), ('fij', 0.149), ('di', 0.132), ('swi', 0.129), ('document', 0.11), ('positive', 0.094), ('negative', 0.084), ('movie', 0.079), ('bored', 0.073), ('hassan', 0.071), ('kanayama', 0.069), ('lackluster', 0.065), ('wj', 0.057), ('importance', 0.054), ('boyd', 0.053), ('compelling', 0.053), ('maas', 0.053), ('tf', 0.053), ('words', 0.052), ('kaji', 0.051), ('boring', 0.051), ('ku', 0.049), ('imdb', 0.049), ('nasukawa', 0.049), ('cornell', 0.048), ('recommended', 0.048), ('review', 0.047), ('yi', 0.047), ('littman', 0.046), ('turney', 0.044), ('columns', 0.041), ('li', 0.041), ('candidate', 0.041), ('reviews', 0.041), ('equality', 0.041), ('weblogs', 0.041), ('bad', 0.039), ('documents', 0.038), ('convex', 0.038), ('value', 0.037), ('identification', 0.035), ('fan', 0.035), ('belief', 0.035), ('plot', 0.034), ('orientation', 0.034), ('optimization', 0.034), ('word', 0.034), ('lexicon', 0.033), ('com', 0.033), ('vectors', 0.033), ('poe', 0.032), ('tthee', 0.032), ('topranked', 0.032), ('guohui', 0.032), ('nisg', 0.032), ('cean', 0.032), ('hotspot', 0.032), ('performers', 0.032), ('tojudge', 0.032), ('xer', 0.032), ('recognized', 0.031), ('normalized', 0.031), ('pang', 0.031), ('datasets', 0.03), ('subjective', 0.03), ('irs', 0.03), ('teor', 0.03), ('descend', 0.03), ('critics', 0.03), ('toe', 0.03), ('tiv', 0.03), ('exceptionally', 0.03), ('exploits', 0.029), ('estimate', 0.029), ('vice', 0.028), ('weigh', 0.028), ('posi', 0.028), ('kitsuregawa', 0.028), ('nagarajan', 0.028), ('adjustment', 0.028), ('whe', 0.028), ('ror', 0.028), ('estimation', 0.028), ('simplicity', 0.027), ('jha', 0.027), ('cv', 0.027), ('electronics', 0.027), ('daly', 0.027), ('gmai', 0.027), ('chen', 0.026), ('greedy', 0.026), ('fun', 0.026), ('resent', 0.026), ('criticism', 0.026), ('inequalities', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li

Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing research exploits seed words, which leads to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms state-of-the-art methods that use seed words.

2 0.33844483 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

Author: Angeliki Lazaridou ; Ivan Titov ; Caroline Sporleder

Abstract: We propose a joint model for unsupervised induction of sentiment, aspect and discourse information and show that by incorporating a notion of latent discourse relations in the model, we improve the prediction accuracy for aspect and sentiment polarity on the sub-sentential level. We deviate from the traditional view of discourse, as we induce types of discourse relations and associated discourse cues relevant to the considered opinion analysis task; consequently, the induced discourse relations play the role of opinion and aspect shifters. The quantitative analysis that we conducted indicated that the integration of a discourse model increased the prediction accuracy results with respect to the discourse-agnostic approach and the qualitative analysis suggests that the induced representations encode a meaningful discourse structure.

3 0.31803668 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky

Abstract: We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process. Our experiments on English, Spanish and Russian show that the resulting lexicons are effective for sentiment classification for many underexplored languages in social media.

4 0.30769137 318 acl-2013-Sentiment Relevance

Author: Christian Scheible ; Hinrich Schutze

Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.

5 0.27625054 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification

Author: Rui Xia ; Tao Wang ; Xuelei Hu ; Shoushan Li ; Chengqing Zong

Abstract: Bag-of-words (BOW) is now the most popular way to model text in machine learning based sentiment classification. However, the performance of such approach sometimes remains rather limited due to some fundamental deficiencies of the BOW model. In this paper, we focus on the polarity shift problem, and propose a novel approach, called dual training and dual prediction (DTDP), to address it. The basic idea of DTDP is to first generate artificial samples that are polarity-opposite to the original samples by polarity reversion, and then leverage both the original and opposite samples for (dual) training and (dual) prediction. Experimental results on four datasets demonstrate the effectiveness of the proposed approach for polarity classification. 1

6 0.24733558 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset

7 0.22749662 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning

8 0.22739033 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting

9 0.22670618 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

10 0.21869482 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

11 0.21489196 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays

12 0.20999856 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

13 0.19681536 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions

14 0.18576762 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

15 0.1804245 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

16 0.15998404 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts

17 0.15657182 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

18 0.12977719 121 acl-2013-Discovering User Interactions in Ideological Discussions

19 0.12958905 310 acl-2013-Semantic Frames to Predict Stock Price Movement

20 0.11948813 294 acl-2013-Re-embedding words


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.2), (1, 0.361), (2, -0.026), (3, 0.318), (4, -0.098), (5, -0.149), (6, 0.046), (7, 0.057), (8, 0.047), (9, 0.146), (10, 0.241), (11, -0.059), (12, -0.03), (13, -0.066), (14, 0.059), (15, 0.087), (16, 0.052), (17, -0.018), (18, 0.02), (19, 0.123), (20, -0.022), (21, 0.034), (22, -0.034), (23, 0.063), (24, 0.008), (25, -0.036), (26, -0.108), (27, 0.04), (28, -0.051), (29, 0.024), (30, -0.053), (31, -0.036), (32, -0.039), (33, -0.074), (34, 0.039), (35, 0.036), (36, -0.002), (37, -0.003), (38, 0.001), (39, -0.079), (40, -0.016), (41, 0.05), (42, 0.017), (43, -0.021), (44, -0.037), (45, 0.046), (46, 0.002), (47, 0.047), (48, -0.012), (49, -0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9708491 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li

Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing research exploits seed words, which leads to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms state-of-the-art methods that use seed words.

2 0.87297946 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky

Abstract: We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process. Our experiments on English, Spanish and Russian show that the resulting lexicons are effective for sentiment classification for many underexplored languages in social media.

3 0.8625052 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting

Author: Ankit Ramteke ; Akshat Malu ; Pushpak Bhattacharyya ; J. Saketha Nath

Abstract: Thwarting and sarcasm are two uncharted territories in sentiment analysis, the former because of the lack of training corpora and the latter because of the enormous amount of world knowledge it demands. In this paper, we propose a working definition of thwarting amenable to machine learning and create a system that detects if the document is thwarted or not. We focus on identifying thwarting in product reviews, especially in the camera domain. An ontology of the camera domain is created. Thwarting is looked upon as the phenomenon of polarity reversal at a higher level of ontology compared to the polarity expressed at the lower level. This notion of thwarting defined with respect to an ontology is novel, to the best of our knowledge. A rule based implementation building upon this idea forms our baseline. We show that machine learning with annotated corpora (thwarted/non-thwarted) is more effective than the rule based system. Because of the skewed distribution of thwarting, we adopt the Area-under-the-Curve measure of performance. To the best of our knowledge, this is the first attempt at the difficult problem of thwarting detection, which we hope will at least provide a baseline system to compare against.

4 0.86084604 318 acl-2013-Sentiment Relevance

Author: Christian Scheible ; Hinrich Schutze

Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.

5 0.84867245 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays

Author: Eric T. Nalisnick ; Henry S. Baird

Abstract: We present an automatic method for analyzing sentiment dynamics between characters in plays. This literary format’s structured dialogue allows us to make assumptions about who is participating in a conversation. Once we have an idea of who a character is speaking to, the sentiment in his or her speech can be attributed accordingly, allowing us to generate lists of a character’s enemies and allies as well as pinpoint scenes critical to a character’s emotional development. Results of experiments on Shakespeare’s plays are presented along with discussion of how this work can be extended to unstructured texts (i.e. novels).

6 0.83566874 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification

7 0.78377628 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning

8 0.74763632 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset

9 0.66752511 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

10 0.64287055 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

11 0.62746257 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions

12 0.57681775 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

13 0.56063122 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

14 0.56060946 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

15 0.53525722 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction

16 0.5342657 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts

17 0.52557129 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

18 0.50054967 49 acl-2013-An annotated corpus of quoted opinions in news articles

19 0.45555252 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

20 0.43454254 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.042), (4, 0.014), (6, 0.035), (11, 0.1), (15, 0.012), (24, 0.054), (26, 0.097), (28, 0.015), (35, 0.078), (42, 0.027), (48, 0.065), (63, 0.215), (70, 0.037), (88, 0.053), (90, 0.018), (95, 0.066)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8838101 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates

Author: Kazi Saidul Hasan ; Vincent Ng

Abstract: Determining the stance expressed by an author from a post written for a twosided debate in an online debate forum is a relatively new problem. We seek to improve Anand et al.’s (201 1) approach to debate stance classification by modeling two types of soft extra-linguistic constraints on the stance labels of debate posts, user-interaction constraints and ideology constraints. Experimental results on four datasets demonstrate the effectiveness of these inter-post constraints in improving debate stance classification.

same-paper 2 0.81705338 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li

Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing research exploits seed words, which leads to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms state-of-the-art methods that use seed words.

3 0.78274786 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance

Author: Chris Fournier

Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement & performance, leading this work to propose a solution.

4 0.77987576 219 acl-2013-Learning Entity Representation for Entity Disambiguation

Author: Zhengyan He ; Shujie Liu ; Mu Li ; Ming Zhou ; Longkai Zhang ; Houfeng Wang

Abstract: We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. A supervised fine-tuning stage follows to optimize the representation towards the similarity measure. Experiment results show that our method achieves state-of-the-art performance on two public datasets without any manually designed features, even beating complex collective approaches.

5 0.67145944 318 acl-2013-Sentiment Relevance

Author: Christian Scheible ; Hinrich Schutze

Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.

6 0.66106552 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing

7 0.65638083 49 acl-2013-An annotated corpus of quoted opinions in news articles

8 0.6562624 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

9 0.65165091 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting

10 0.64882886 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

11 0.64865065 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification

12 0.64790964 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals

13 0.64780354 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

14 0.64759052 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning

15 0.64521742 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

16 0.64414263 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

17 0.64297289 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction

18 0.6429013 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

19 0.6423068 236 acl-2013-Mapping Source to Target Strings without Alignment by Analogical Learning: A Case Study with Transliteration

20 0.64216799 333 acl-2013-Summarization Through Submodularity and Dispersion