acl acl2012 acl2012-161 knowledge-graph by maker-knowledge-mining

161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

Source: pdf

Author: Eduard Dragut ; Hong Wang ; Clement Yu ; Prasad Sistla ; Weiyi Meng

Abstract: Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. The dictionaries have substantial inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize a fast SAT solver to detect inconsistencies in a sentiment dictionary. We perform experiments on four sentiment dictionaries and WordNet.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. [sent-8, score-0.397]

2 Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases, which cannot be detected by mere manual inspection. [sent-10, score-0.535]

3 We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. [sent-11, score-1.097]

4 We reduce the polarity consistency problem to the satisfiability problem and utilize a fast SAT solver to detect inconsistencies in a sentiment dictionary. [sent-13, score-1.264]

5 We perform experiments on four sentiment dictionaries and WordNet. [sent-14, score-0.397]

6 The general approach is to summarize the semantic polarity (i. [sent-19, score-0.588]

7 There are numerous works that, given a sentiment lexicon, analyze the structure of a sentence/document to infer its orientation, the holder of an opinion, the sentiment of the opinion, etc. [sent-27, score-0.436]

8 Several domain independent sentiment dictionaries have been manually or (semi)-automatically created, e. [sent-30, score-0.397]

9 OF, GI and AL are called sentiment word dictionaries (SWD). [sent-38, score-0.397]

10 The sentiment dictionaries have the following problems: • They exhibit substantial (intra-dictionary) inaccuracies. [sent-40, score-0.397]

11 For example, the synset {Indo-European, Indo-Aryan, Aryan} (of or relating to the former Indo-European people) has a negative polarity in Q-WordNet, while most people would agree that this synset has a neutral polarity instead. [sent-41, score-1.618]

12 • These dictionaries do not address the concept of polarity (in)consistency of words/synsets. [sent-45, score-0.767]

13 We define consistency among the polarities of words/synsets in a dictionary and give methods to check it. [sent-47, score-0.531]

14 Hence, tantalize conveys a positive sentiment when used with this sense. [sent-60, score-0.41]

15 Manual checking of sentiment dictionaries for inconsistency is a difficult endeavor. [sent-62, score-0.561]

16 We aim to unearth these inconsistencies in sentiment dictionaries. [sent-64, score-0.402]

17 The presence of inconsistencies found via polarity analysis is not exclusively attributed to one party, i. [sent-65, score-0.772]

18 Therefore, a by-product of our polarity consistency analysis is that it can also locate some of the likely places where WordNet needs linguists’ attention. [sent-69, score-0.7]

19 We show that the problem of checking whether the polarities of a set of words are consistent is NP-complete. [sent-70, score-0.519]

20 A fast SAT solver is utilized to detect inconsistencies; such solvers are known to determine consistency or detect inconsistencies in practice. [sent-72, score-0.409]

21 … are discovered among words with polarities within and across sentiment dictionaries. [sent-76, score-0.61]

22 This suggests that some remedial work needs to be performed on these sentiment dictionaries as well as on WordNet. [sent-77, score-0.397]

23 The contributions of this paper are: • address the consistency of polarities of words/senses. [sent-78, score-0.468]

24 2 Problem Definition The polarities of the words in a sentiment dictionary may not necessarily be consistent (or correct). [sent-80, score-0.709]

25 In this paper, we focus on the detection of polarity assignment inconsistencies for the words and synsets within and across dictionaries (e. [sent-81, score-1.187]

26 We attempt to pinpoint the words with polarity inconsistencies and classify them (Section 3). [sent-85, score-0.831]

27 We define the polarity of a word to be a discrete probability distribution ⟨P+, P−, P0⟩ with P+ + P− + P0 = 1, where they represent the “likelihoods” that the word is positive, negative or neutral, respectively. [sent-108, score-0.673]

28 For instance, the word cheap has the polarity distribution P+ = 0. [sent-110, score-0.674]

29 The polarity distribution of a word is estimated using the polarities of its underlying synsets. [sent-113, score-0.944]

30 Our view of characterizing the polarity of a word using a polarity distribution is shared with other previous works (Kim and Hovy, 2006; Andreevskaia and Bergler, 2006). [sent-120, score-1.176]

31 We say that a word has a (mostly) positive (negative) polarity if the majority sense of the word is positive (negative). [sent-122, score-0.92]

32 That is, a word has a mostly positive polarity if P+ > P− + P0 and it has a mostly negative polarity if P− > P+ + P0. [sent-123, score-1.364]

33 For example, by majority, cheap conveys positive polarity since P+ = . [sent-125, score-0.802]

34 For example, the verb steal is assigned only negative polarity in GI. [sent-130, score-0.699]

35 The polarity of steal according to these two senses is not mentioned in GI. [sent-132, score-0.693]

36 For example, the verb arrest is mentioned with both negative and positive polarities in GI. [sent-134, score-0.544]

37 For instance, the adjective cheap has positive polarity in GI. [sent-136, score-0.811]

38 In this work we show that this property allows the polarities of words in input sentiment dictionaries to be checked. [sent-139, score-0.789]

39 Each synset in Sw has an associated polarity and a relative frequency with respect to w. [sent-143, score-0.721]

40 w has polarity p, p ∈ {positive, negative}, if there is a subset of synsets S′ ⊆ Sw such that each synset s ∈ S′ has polarity p and ∑s∈S′ freq(w, s)/freq(w) > 0.5. [sent-144, score-1.509]

41 S′ ⊆ Sw is a minimally dominant subset of synsets (MDSs) if the sum of the relative frequencies of the synsets in S′ is larger than 0. [sent-148, score-0.454]

42 The definition does not preclude a word from having a polarity with a majority sense and a different polarity with a minority sense. [sent-151, score-1.341]

43 For example, the definition does not prevent a word from having both positive and negative senses, but it prevents a word from concomitantly having a majority sense of being positive and a majority sense of being negative. [sent-152, score-0.569]

44 We need a formal description of polarity assignments to the words and synsets in WordNet. [sent-159, score-0.824]

45 Formally, a polarity assignment γ for a network N is defined on W ∪ S. [sent-161, score-0.639]

46 Let γ be a polarity assignment for N. [sent-163, score-0.639]

47 1 Input Dictionaries Polarity Inconsistency Input polarity inconsistencies are of two types: intra-dictionary and inter-dictionary inconsistencies. [sent-176, score-0.772]

48 For instance, the verb brag has both positive and negative polarities in OF. [sent-181, score-0.57]

49 For these cases, we look up WordNet and apply Definition 1 to determine the polarity of word w with part of speech pos. [sent-182, score-0.588]

50 The verb brag has negative polarity according to Definition 1. [sent-183, score-0.699]

51 Such cases simply indicate that the team that constructed the dictionary believes the word has multiple polarities, as they do not adopt our dominant sense principle. [sent-184, score-0.519]

52 Q-WordNet, a sentiment sense dictionary, does not have intra-dictionary inconsistencies, as it does not have a synset with multiple polarities. [sent-186, score-0.427]

53 2 Inter-dictionary inconsistency A word belongs to this category if it appears with different polarities in different SWDs. [sent-189, score-0.46]

54 For instance, the adjective joyless has positive polarity in OF and negative polarity in GI. [sent-190, score-1.398]

55 The three dictionaries largely agree on the polarities of the words they pairwise share. [sent-194, score-0.608]

56 Among the three dictionaries there are 181 polarity inconsistent words. [sent-198, score-0.86]

57 These words are manually corrected using Definition 1 before the polarity consistency checking is applied to the union of the three dictionaries. [sent-199, score-0.84]

58 They consist of sets of words and/or synsets whose polarities cannot concomitantly be satisfied. [sent-203, score-0.618]

59 The word has negative polarity in OF and has a single sense in WordNet. [sent-218, score-0.749]

60 The sense is shared with the word nifty, which has positive polarity in OF. [sent-219, score-0.767]

61 The example shows the presence of a discrepancy between WordNet and OF, namely, OF seems to assign polarity to a word according to a sense that is not in WordNet. [sent-225, score-0.695]

62 2 Across Sentiment Dictionaries We provide examples of inconsistencies across sentiment dictionaries here. [sent-228, score-0.581]

63 The adjective comic has negative polarity in AL and the adjective laughable has positive polarity in OF. [sent-230, score-1.458]

64 , by successive applications of Definition 1), the word risible, which is not present in either of the dictionaries, is assigned negative polarity because of comic and is assigned positive polarity because of laughable. [sent-233, score-1.39]

65 On one hand, intoxicate has a negative polarity in GI. [sent-237, score-0.712]

66 This means that P− > P+ + P0. On the other hand, two of its three synsets have positive polarity in Q-WordNet. [sent-238, score-0.891]

67 The problem is that when all the senses of a word have a 0 frequency of use, wrong polarity inference may be produced. [sent-244, score-0.698]

68 This in turn boils down to finding those words with the property that there does not exist any polarity assignment to the synsets, which is consistent with their polarities. [sent-247, score-0.711]

69 It turns out that the problem of assigning polarities to the synsets such that the assignment is consistent with the polarities of the input words, called the Consistent Polarity Assignment problem, is a “hard” problem, as described below. [sent-248, score-1.03]

70 A word has polarity p if it satisfies the hypothesis of Definition 1. [sent-251, score-0.588]

71 The question to be answered is: Given an assignment of polarities to the words, does there exist an assignment of polarities to the synsets that agrees with that of the words? [sent-252, score-1.014]

72 , that given by one of the three SWDs) the problem of finding the polarities of the synsets that agree with this assignment is a “hard” problem. [sent-255, score-0.675]

73 4 Polarity Consistency Checking To “exhaustively” solve the problem of finding the polarity inconsistencies in an SWD, we propose a solution that reduces an instance of the problem to an instance of CNF-SAT. [sent-258, score-0.882]

74 We developed a method of converting an instance of the polarity consistency checking problem into an instance of CNF-SAT, which we will describe next. [sent-268, score-0.839]

75 Since a word has a neutral polarity if it has neither positive nor negative polarities, we have that s0 = ¬s+ ∧ ¬s−. [sent-279, score-0.776]

76 Replacing this expression in the equation above and applying standard Boolean logic formulas, we can reduce it to C(s) = ¬s+ ∨ ¬s− (1). For each word w with polarity p ∈ {−, +, 0} in D we need a clause C(w, p) that states that w has polarity p. [sent-280, score-1.176]

77 … C(w, −) and C(w, +), which correspond to w having negative and positive polarity, respectively. [sent-285, score-0.673]

78 ement in Definition 1: w has polarity p if there exists a polarity dominant subset among its synsets. [sent-288, score-1.2]

79 If at least one of them is a polarity dominant subset then C(w, p) evaluates to True. [sent-290, score-0.612]

80 Let C(w, p, T) denote the clause for an MDS T of w, when w has polarity p ∈ {−, +}. [sent-294, score-0.588]

81 For each MDS T of w, the clause C(w, p, T) is the AND of the variables corresponding to polarity p of the synsets in T. [sent-296, score-0.788]

82 The clauses C(w, +) = s1+ and C(v, −) = s1− cannot both be satisfied together with C(s1) = ¬s1+ ∨ ¬s1−; thus the polarities of cheap and inexpensive are inconsistent. [sent-321, score-0.637]

83 We choose to present the exponential reduction in this paper because it can handle over 97% of the words in WordNet and it is better suited to explain one of the main contributions of the paper: the translation from the polarity consistency problem to SAT. [sent-325, score-0.767]

84 5 Detecting Inconsistencies In this section we describe how we detect the words with polarity inconsistencies using the output of a SAT solver. [sent-335, score-0.808]

85 In our problem a MUC corresponds to a set of polarity inconsistent words. [sent-340, score-0.712]

86 6 Experiments The goal of the experimental study is to show that our techniques can identify considerable inconsistencies in various sentiment dictionaries. [sent-357, score-0.402]

87 EEM finds 240, 14 and 2 polarity inconsistent words in OF, GI and AL, respectively. [sent-370, score-0.717]

88 The union dictionary has 7,794 words and 249 out of them are found to be polarity inconsistent words. [sent-372, score-0.824]

89 So, in effect the three dictionaries have 249 + 181 = 430 polarity inconsistent words. [sent-374, score-0.86]

90 As discussed in the previous section, these may not be all the polarity inconsistencies in UF. [sent-375, score-0.772]

91 Observe that polarities assigned to the words in AL and GI largely agree with the polarities assigned to the synsets in Q-WordNet. [sent-384, score-0.985]

92 The union dictionary and Q-WordNet have substantial inconsistencies: the polarity of 455 words in the union dictionary disagrees with the polarities assigned to their underlying synsets in Q-WordNet. [sent-387, score-1.394]

93 … ⟨w, p⟩ … with polarity p and n times with polarities different from p. [sent-395, score-0.944]

94 For example, the annotators totally agree with the polarities of 55% of the consistent words, whereas they only totally agree with 16% of the polarities of the inconsistent words. [sent-401, score-0.915]

95 The graph suggests that the annotators disagree to some extent (total disagreement + most disagreement + major disagreement) with 40% of the polarities of the inconsistent words, whereas they disagree to some extent with only 5% of the consistent words. [sent-402, score-0.637]

96 There are two lines of work on sentiment polarity lexicon induction: corpora-based (Hatzivassiloglou and McKeown, 1997; Kanayama and Nasukawa, 2006; Qiu et al. [sent-413, score-0.831]

97 To our knowledge, none of the earlier works studied the problem of polarity consistency checking for a sentiment dictionary. [sent-423, score-1.009]

98 8 Conclusion We studied the problem of checking polarity consistency for sentiment word dictionaries. [sent-425, score-1.009]

99 We showed that in practice polarity inconsistencies of words both within a dictionary and across dictionaries can be detected using a SAT solver. [sent-427, score-1.05]

100 We reported experiments on four sentiment dictionaries and their union dictionary. [sent-429, score-0.441]
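Sentences 27 to 43 above define a word's polarity distribution from the relative frequencies of its synsets, and Definition 1 through (minimally) dominant subsets. The Python sketch below illustrates these definitions under stated assumptions: a word is represented by a hypothetical map from synset ids to sense frequencies plus a map from synset ids to '+', '-' or '0'; the 0.5 majority threshold follows from the condition P+ > P− + P0 together with P+ + P− + P0 = 1; all function names and the toy data (synsets s1 to s4 and their frequencies) are illustrative, not taken from the paper. Exact `Fraction` arithmetic avoids rounding errors at the 0.5 boundary.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional

HALF = Fraction(1, 2)


def relative_freqs(freqs: Dict[str, int]) -> Dict[str, Fraction]:
    """freq(w, s) / freq(w) for each synset s of a word w, as exact fractions."""
    total = sum(freqs.values())
    if total == 0:
        # Every sense has frequency 0: the distribution is undefined.
        raise ValueError("all senses have zero frequency")
    return {s: Fraction(f, total) for s, f in freqs.items()}


def polarity_distribution(freqs: Dict[str, int], pols: Dict[str, str]) -> Dict[str, Fraction]:
    """<P+, P-, P0>: the relative-frequency mass of the synsets with each polarity."""
    dist = {"+": Fraction(0), "-": Fraction(0), "0": Fraction(0)}
    for s, r in relative_freqs(freqs).items():
        dist[pols[s]] += r
    return dist


def mostly(dist: Dict[str, Fraction]) -> Optional[str]:
    """'+' if P+ > P- + P0, '-' if P- > P+ + P0, otherwise None."""
    if dist["+"] > dist["-"] + dist["0"]:
        return "+"
    if dist["-"] > dist["+"] + dist["0"]:
        return "-"
    return None


def minimally_dominant_subsets(freqs: Dict[str, int]) -> List[FrozenSet[str]]:
    """Subsets with mass > 1/2 that drop to <= 1/2 if any one synset is removed."""
    rel = relative_freqs(freqs)
    result = []
    for k in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), k):
            mass = sum(rel[s] for s in combo)
            if mass > HALF and all(mass - rel[s] <= HALF for s in combo):
                result.append(frozenset(combo))
    return result


def has_polarity(freqs: Dict[str, int], pols: Dict[str, str], p: str) -> bool:
    """Definition 1 (p in {'+', '-'}): some MDS contains only synsets with polarity p."""
    return any(all(pols[s] == p for s in mds) for mds in minimally_dominant_subsets(freqs))


# Hypothetical toy word with four senses (not data from the paper).
FREQS = {"s1": 1, "s2": 3, "s3": 2, "s4": 2}
POLS = {"s1": "-", "s2": "+", "s3": "+", "s4": "0"}

if __name__ == "__main__":
    dist = polarity_distribution(FREQS, POLS)
    print(dist)
    print(mostly(dist))  # +
    print(minimally_dominant_subsets(FREQS))
    print(has_polarity(FREQS, POLS, "+"), has_polarity(FREQS, POLS, "-"))  # True False
```

With the toy data, P+ = 5/8, so the word is mostly positive, and {s2, s3} is an MDS made only of positive synsets. Enumerating MDSs is exponential in the number of senses; for a fixed synset labeling, checking whether the total mass of the p-synsets exceeds 0.5 gives the same answer, but the per-MDS form is what the clause construction in sentences 80 and 81 builds on.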
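Sentences 47 to 57 above separate intra-dictionary inconsistencies (one dictionary lists a word with several polarities) from inter-dictionary ones (two dictionaries disagree on a shared word). A minimal Python sketch of both checks follows, assuming a hypothetical representation of each SWD as a map from (word, part of speech) to the set of polarities it lists. The entries reuse only examples named in the sentences above (brag in OF; steal, arrest and cheap in GI; joyless in OF and GI), and the function names are illustrative. These checks only catch the "obvious" cases; the hidden inconsistencies require the SAT-based analysis.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

Entry = Tuple[str, str]              # (word, part of speech)
SWD = Dict[Entry, FrozenSet[str]]    # entry -> polarities the dictionary lists


def intra_inconsistent(swd: SWD) -> Set[Entry]:
    """Entries that a single dictionary lists with more than one polarity."""
    return {entry for entry, pols in swd.items() if len(pols) > 1}


def inter_inconsistent(swds: Dict[str, SWD]) -> Dict[Entry, Dict[str, FrozenSet[str]]]:
    """Entries listed by at least two dictionaries whose polarity sets differ."""
    by_entry: Dict[Entry, Dict[str, FrozenSet[str]]] = defaultdict(dict)
    for name, swd in swds.items():
        for entry, pols in swd.items():
            by_entry[entry][name] = pols
    return {
        entry: listed
        for entry, listed in by_entry.items()
        if len(listed) >= 2 and len(set(listed.values())) > 1
    }


POS, NEG = frozenset({"+"}), frozenset({"-"})
BOTH = POS | NEG

# Only examples mentioned in the text; the real SWDs are far larger.
SWDS: Dict[str, SWD] = {
    "OF": {("brag", "verb"): BOTH, ("joyless", "adj"): POS},
    "GI": {
        ("steal", "verb"): NEG,
        ("arrest", "verb"): BOTH,
        ("cheap", "adj"): POS,
        ("joyless", "adj"): NEG,
    },
}

if __name__ == "__main__":
    for name, swd in SWDS.items():
        print(name, sorted(intra_inconsistent(swd)))
    print(inter_inconsistent(SWDS))
```

Here brag (OF) and arrest (GI) are intra-dictionary cases and joyless is the only inter-dictionary case. Under this simple rule, an entry listed with both polarities in one dictionary and a single polarity in another also counts as an inter-dictionary disagreement.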
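Sentences 75 to 85 above outline the reduction. Each synset s gets variables s+ and s− (with s0 = ¬s+ ∧ ¬s−). The clause C(s) = ¬s+ ∨ ¬s− forbids a synset from being both positive and negative. C(w, p) is the OR, over the MDSs T of w, of the AND of the variables s_p for s ∈ T. The Python sketch below evaluates exactly these constraints, but with two substitutions: brute-force enumeration of all truth assignments replaces the SAT solver, which is feasible only for a handful of synsets, and a simple deletion-based search for one minimal inconsistent set of words replaces MUC extraction. It handles only words with polarity '+' or '-'. The sense frequencies, including those chosen so that {s1} is the only MDS of cheap and of inexpensive (as in sentence 82), and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Tuple

# word -> (sense frequency per synset, polarity '+' or '-' from a dictionary)
Words = Dict[str, Tuple[Dict[str, int], str]]
HALF = Fraction(1, 2)


def mdss(freqs: Dict[str, int]) -> List[FrozenSet[str]]:
    """Minimally dominant subsets of a word's synsets (relative mass > 1/2)."""
    total = sum(freqs.values())
    if total == 0:
        raise ValueError("all senses have zero frequency")
    rel = {s: Fraction(f, total) for s, f in freqs.items()}
    out = []
    for k in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), k):
            mass = sum(rel[s] for s in combo)
            if mass > HALF and all(mass - rel[s] <= HALF for s in combo):
                out.append(frozenset(combo))
    return out


def satisfied(words: Words, value: Dict[Tuple[str, str], bool]) -> bool:
    """Evaluate the AND of C(s) for every synset and C(w, p) for every word."""
    synsets = {s for freqs, _ in words.values() for s in freqs}
    # C(s) = ¬s+ ∨ ¬s−: no synset is both positive and negative.
    if any(value[(s, "+")] and value[(s, "-")] for s in synsets):
        return False
    # C(w, p) = OR over MDS T of w of (AND over s in T of s_p).
    return all(
        any(all(value[(s, p)] for s in mds) for mds in mdss(freqs))
        for freqs, p in words.values()
    )


def consistent(words: Words) -> bool:
    """Brute-force satisfiability check; a real SAT solver would replace this loop."""
    variables = sorted({(s, p) for freqs, _ in words.values() for s in freqs for p in "+-"})
    return any(
        satisfied(words, dict(zip(variables, bits)))
        for bits in product([False, True], repeat=len(variables))
    )


def minimal_inconsistent_words(words: Words) -> List[str]:
    """Deletion-based search for one minimal set of words that cannot hold together."""
    if consistent(words):
        return []
    core = dict(words)
    for w in list(core):
        trial = {v: entry for v, entry in core.items() if v != w}
        if not consistent(trial):
            core = trial  # w is not needed for the inconsistency
    return sorted(core)


# Hypothetical data: {s1} is the only MDS of both cheap and inexpensive.
WORDS: Words = {
    "cheap": ({"s1": 3, "s2": 1}, "+"),
    "inexpensive": ({"s1": 1}, "-"),
    "unrelated_word": ({"s3": 1}, "+"),
}

if __name__ == "__main__":
    print(consistent(WORDS))                  # False
    print(minimal_inconsistent_words(WORDS))  # ['cheap', 'inexpensive']
```

Each word contributes an OR of ANDs, so an implementation targeting CNF-SAT, as the paper does, has to convert the formula to CNF before handing it to a solver. The number of MDSs can grow exponentially with the number of senses; sentence 83 reports that the exponential reduction nevertheless handles over 97% of the words in WordNet.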

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('polarity', 0.588), ('polarities', 0.356), ('sentiment', 0.218), ('synsets', 0.2), ('inconsistencies', 0.184), ('dictionaries', 0.179), ('sat', 0.176), ('wordnet', 0.165), ('synset', 0.133), ('cnf', 0.117), ('consistency', 0.112), ('gi', 0.108), ('inconsistency', 0.104), ('positive', 0.103), ('inconsistent', 0.093), ('formula', 0.091), ('cheap', 0.086), ('negative', 0.085), ('swd', 0.079), ('senses', 0.079), ('mdss', 0.078), ('unsatisfiable', 0.078), ('sense', 0.076), ('boolean', 0.07), ('confute', 0.065), ('clauses', 0.065), ('al', 0.064), ('dictionary', 0.063), ('solver', 0.061), ('checking', 0.06), ('muc', 0.058), ('neutral', 0.054), ('inexpensi', 0.052), ('mds', 0.052), ('sprove', 0.052), ('solvers', 0.052), ('assignment', 0.051), ('majority', 0.05), ('disagreement', 0.049), ('eem', 0.045), ('union', 0.044), ('freq', 0.041), ('agerri', 0.039), ('baccianella', 0.039), ('bul', 0.039), ('dershowitz', 0.039), ('dragut', 0.039), ('intoxicate', 0.039), ('mucs', 0.039), ('satisfiability', 0.039), ('ssd', 0.039), ('swds', 0.039), ('definition', 0.039), ('opinion', 0.037), ('agree', 0.037), ('consistent', 0.036), ('words', 0.036), ('connected', 0.035), ('uf', 0.034), ('andreevskaia', 0.034), ('garc', 0.034), ('adjective', 0.034), ('kim', 0.033), ('discrepancy', 0.031), ('takamura', 0.031), ('problem', 0.031), ('frequencies', 0.03), ('esuli', 0.029), ('sentiwordnet', 0.029), ('orientation', 0.029), ('disagree', 0.027), ('babic', 0.026), ('brag', 0.026), ('clement', 0.026), ('comi', 0.026), ('concomitantly', 0.026), ('fty', 0.026), ('kanayama', 0.026), ('picosat', 0.026), ('purdue', 0.026), ('sistla', 0.026), ('slea', 0.026), ('steal', 0.026), ('weiyi', 0.026), ('zy', 0.026), ('lexicon', 0.025), ('conveys', 0.025), ('entries', 0.024), ('dominant', 0.024), ('instance', 0.024), ('si', 0.023), ('oxford', 0.023), ('pinpoint', 0.023), ('enc', 0.023), ('appraisal', 0.023), ('taboada', 0.023), ('inquirer', 0.023), ('bergler', 0.023), ('kamps', 0.023), ('bing', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

Author: Eduard Dragut ; Hong Wang ; Clement Yu ; Prasad Sistla ; Weiyi Meng

2 0.2252848 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models

Author: Lei Fang ; Minlie Huang

Abstract: In this paper, we present a structural learning model for joint sentiment classification and aspect analysis of text at various levels of granularity. Our model aims to identify highly informative sentences that are aspect-specific in online custom reviews. The primary advantages of our model are two-fold: first, it performs document-level and sentence-level sentiment polarity classification jointly; second, it is able to find informative sentences that are closely related to some respects in a review, which may be helpful for aspect-level sentiment analysis such as aspect-oriented summarization. The proposed method was evaluated with 9,000 Chinese restaurant reviews. Preliminary experiments demonstrate that our model obtains promising performance.

3 0.19626708 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

Author: Fangtao Li ; Sinno Jialin Pan ; Ou Jin ; Qiang Yang ; Xiaoyan Zhu

Abstract: Extracting sentiment and topic lexicons is important for opinion mining. Previous works have shown that supervised learning methods are superior for this task. However, the performance of supervised methods highly relies on manually labeled training data. In this paper, we propose a domain adaptation framework for sentiment- and topic- lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain. The framework is twofold. In the first step, we generate a few high-confidence sentiment and topic seeds in the target domain. In the second step, we propose a novel Relational Adaptive bootstraPping (RAP) algorithm to expand the seeds in the target domain by exploiting the labeled source domain data and the relationships between topic and sentiment words. Experimental results show that our domain adaptation framework can extract precise lexicons in the target domain without any annotation.

4 0.16626242 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

Author: Zhaopeng Tu ; Yifan He ; Jennifer Foster ; Josef van Genabith ; Qun Liu ; Shouxun Lin

Abstract: Convolution kernels support the modeling of complex syntactic information in machine learning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of substructures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees guided by a polarity lexicon show 1.45 point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.

5 0.1617204 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification

Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang

Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited coverage of vocabulary in the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.

6 0.15293391 187 acl-2012-Subgroup Detection in Ideological Discussions

7 0.12457293 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

8 0.1240788 188 acl-2012-Subgroup Detector: A System for Detecting Subgroups in Online Discussions

9 0.11949995 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

10 0.11775869 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System

11 0.11611936 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

12 0.11139461 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

13 0.10320035 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

14 0.087956883 56 acl-2012-Computational Approaches to Sentence Completion

15 0.08291965 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

16 0.077879176 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

17 0.074612446 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

18 0.054880071 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

19 0.054338381 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

20 0.053067796 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.15), (1, 0.187), (2, 0.158), (3, -0.207), (4, 0.167), (5, -0.024), (6, -0.028), (7, 0.008), (8, -0.119), (9, -0.057), (10, 0.058), (11, 0.029), (12, 0.019), (13, -0.045), (14, -0.042), (15, -0.152), (16, 0.061), (17, 0.051), (18, -0.049), (19, 0.02), (20, 0.031), (21, -0.076), (22, -0.055), (23, -0.011), (24, 0.025), (25, 0.051), (26, -0.06), (27, 0.008), (28, -0.019), (29, -0.08), (30, 0.104), (31, 0.11), (32, -0.056), (33, 0.023), (34, 0.021), (35, -0.025), (36, 0.04), (37, -0.052), (38, -0.009), (39, 0.008), (40, -0.075), (41, -0.091), (42, 0.011), (43, 0.023), (44, 0.061), (45, 0.017), (46, 0.015), (47, -0.018), (48, -0.088), (49, -0.053)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97856647 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

Author: Eduard Dragut ; Hong Wang ; Clement Yu ; Prasad Sistla ; Weiyi Meng

2 0.6747427 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification

Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang

3 0.67133886 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

Author: Rada Mihalcea ; Carmen Banea ; Janyce Wiebe

Abstract: Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations in natural language. While subjectivity classification labels text as either subjective or objective, sentiment classification adds an additional level of granularity, by further classifying subjective text as either positive, negative or neutral. While much of the research work in this area has been applied to English, research on other languages is growing, including Japanese, Chinese, German, Spanish, Romanian. While most of the researchers in the field are familiar with the methods applied on English, few of them have closely looked at the original research carried out in other languages. For example, in languages such as Chinese, researchers have been looking at the ability of characters to carry sentiment information (Ku et al., 2005; Xiang, 2011). In Romanian, due to markers of politeness and additional verbal modes embedded in the language, experiments have hinted that subjectivity detection may be easier to achieve (Banea et al., 2008). These additional sources of information may not be available across all languages, yet, various articles have pointed out that by investigating a synergistic approach for detecting subjectivity and sentiment in multiple languages at the same time, improvements can be achieved not only in other languages, but in English as well. The development and interest in these methods is also highly motivated by the fact that only 27% of Internet users speak English (www.internetworldstats.com/stats.htm, Oct 11, 2011), and that number diminishes further every year, as more people across the globe gain Internet access. The aim of this tutorial is to familiarize the attendees with the subjectivity and sentiment research carried out on languages other than English in order to enable and promote cross-fertilization. 
Specifically, we will review work along three main directions. First, we will present methods where the resources and tools have been specifically developed for a given target language. In this category, we will also briefly overview the main methods that have been proposed for English, but which can be easily ported to other languages. Second, we will describe cross-lingual approaches, including several methods that have been proposed to leverage on the resources and tools available in English by using cross-lingual projections. Finally, third, we will show how the expression of opinions and polarity pervades language boundaries, and thus methods that holistically explore multiple languages at the same time can be effectively considered. References C. Banea, R. Mihalcea, and J. Wiebe. 2008. A Bootstrapping method for building subjectivity lexicons for languages with scarce resources. In Proceedings of LREC 2008, Marrakech, Morocco. L. W. Ku, T. H. Wu, L. Y. Lee, and H. H. Chen. 2005. Construction of an Evaluation Corpus for Opinion Extraction. In Proceedings of NTCIR-5, Tokyo, Japan. L. Xiang. 2011. Ideogram Based Chinese Sentiment Word Orientation Computation. Computing Research Repository, page 4, October. Jeju, Republic of Korea, 8 July 2012. Tutorial Abstracts of ACL 2012, page 4. © 2012 Association for Computational Linguistics

4 0.65531284 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

Author: Fangtao Li ; Sinno Jialin Pan ; Ou Jin ; Qiang Yang ; Xiaoyan Zhu

5 0.63907188 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

Author: Sida Wang ; Christopher Manning

Abstract: Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/ dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.

6 0.59352565 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models

7 0.53607243 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

8 0.50193799 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

9 0.49942812 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System

10 0.47743908 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

11 0.46613207 187 acl-2012-Subgroup Detection in Ideological Discussions

12 0.45436358 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

13 0.45302543 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

14 0.4431558 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

15 0.40524775 188 acl-2012-Subgroup Detector: A System for Detecting Subgroups in Online Discussions

16 0.34494862 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

17 0.33557314 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

18 0.33490697 120 acl-2012-Information-theoretic Multi-view Domain Adaptation

19 0.33009437 112 acl-2012-Humor as Circuits in Semantic Networks

20 0.32409352 7 acl-2012-A Computational Approach to the Automation of Creative Naming

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.305), (25, 0.046), (26, 0.023), (28, 0.045), (30, 0.041), (37, 0.027), (39, 0.101), (59, 0.011), (74, 0.023), (82, 0.025), (84, 0.026), (85, 0.019), (90, 0.08), (92, 0.05), (94, 0.019), (99, 0.057)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84962416 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars

Author: Andreas Maletti ; Joost Engelfriet

Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. Comput. Linguist., 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. A more powerful model, the simple context-free tree grammar, admits such a normal form. It can be effectively constructed and the maximal rank of the nonterminals only increases by 1. Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves.

same-paper 2 0.76821488 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

Author: Eduard Dragut ; Hong Wang ; Clement Yu ; Prasad Sistla ; Weiyi Meng

3 0.73872614 194 acl-2012-Text Segmentation by Language Using Minimum Description Length

Author: Hiroshi Yamaguchi ; Kumiko Tanaka-Ishii

Abstract: The problem addressed in this paper is to segment a given multilingual document into segments for each language and then identify the language of each segment. The problem was motivated by an attempt to collect a large amount of linguistic data for non-major languages from the web. The problem is formulated in terms of obtaining the minimum description length of a text, and the proposed solution finds the segments and their languages through dynamic programming. Empirical results demonstrating the potential of this approach are presented for experiments using texts taken from the Universal Declaration of Human Rights and Wikipedia, covering more than 200 languages.

4 0.4509778 139 acl-2012-MIX Is Not a Tree-Adjoining Language

Author: Makoto Kanazawa ; Sylvain Salvati

Abstract: The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal number of occurrences of each letter. We prove Joshi’s (1985) conjecture that MIX is not a tree-adjoining language.

5 0.44681901 7 acl-2012-A Computational Approach to the Automation of Creative Naming

Author: Gozde Ozbal ; Carlo Strapparava

Abstract: In this paper, we propose a computational approach to generate neologisms consisting of homophonic puns and metaphors based on the category of the service to be named and the properties to be underlined. We describe all the linguistic resources and natural language processing techniques that we have exploited for this task. Then, we analyze the performance of the system that we have developed. The empirical results show that our approach is generally effective and it constitutes a solid starting point for the automation of the naming process.

6 0.44481054 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs

7 0.44298372 79 acl-2012-Efficient Tree-Based Topic Modeling

8 0.43953812 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

9 0.43750963 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

10 0.43512192 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

11 0.42579201 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

12 0.42389134 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

13 0.42142347 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

14 0.42043209 186 acl-2012-Structuring E-Commerce Inventory

15 0.42014462 191 acl-2012-Temporally Anchored Relation Extraction

16 0.4176518 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

17 0.41715318 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System

18 0.41659704 187 acl-2012-Subgroup Detection in Ideological Discussions

19 0.41426569 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora

20 0.41263464 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model