acl acl2010 acl2010-141 knowledge-graph by maker-knowledge-mining

141 acl-2010-Identifying Text Polarity Using Random Walks


Source: pdf

Author: Ahmed Hassan ; Dragomir Radev

Abstract: Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product reviews, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. The method can be used both in a semi-supervised setting, where a training set of labeled words is used, and in an unsupervised setting, where only a handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms state-of-the-art methods in the semi-supervised setting. The results in the unsupervised setting are comparable to the best reported values. However, the proposed method is faster and does not need a large corpus.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Automatically identifying the polarity of words is a very important task in Natural Language Processing. [sent-2, score-0.378]

2 We propose a method for identifying the polarity of words. [sent-4, score-0.403]

3 We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. [sent-5, score-0.822]

4 A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. [sent-6, score-0.268]

5 The method could be used both in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where a handful of seeds is used to define the two polarity classes. [sent-7, score-0.734]

6 The method is experimentally tested using a manually labeled set of positive and negative words. [sent-8, score-0.285]

7 A list of words with positive/negative polarity is a very valuable resource for such an application. [sent-20, score-0.317]

8 This is used to produce a plot of the number of positive and negative sentiment messages over time. [sent-30, score-0.216]

9 All of those applications could benefit greatly from an automatic way of identifying the semantic orientation of words. [sent-31, score-0.367]

10 In this paper, we study the problem of automatically identifying semantic orientation of any word by analyzing its relations to other words. [sent-32, score-0.399]

11 Automatically classifying words as either positive or negative enables us to automatically identify the polarity of larger pieces of text. [sent-33, score-0.516]

12 We apply a Markov random walk model to a large semantic word graph, producing a polarity estimate for any given word. [sent-35, score-0.789]

13 Previous work on identifying the semantic orientation of words has addressed the problem as both a semi-supervised (Takamura et al. [sent-36, score-0.416]

14 In the unsupervised setting, only a handful of seeds is used to define the two polarity classes. [sent-41, score-0.459]

15 2 Related Work: Hatzivassiloglou and McKeown (1997) proposed a method for identifying the word polarity of adjectives. [sent-51, score-0.435]

16 They extract all conjunctions of adjectives from a given corpus and then classify each conjunctive expression as having either the same orientation, such as “simple and well-received”, or a different orientation, such as “simplistic but well-received”. [sent-52, score-0.688]

17 Turney and Littman (2003) identify word polarity by looking at its statistical association with a set of positive/negative seed words. [sent-61, score-0.399]

18 They present their method as an unsupervised method where a very small number of seed words is used to define the semantic orientation rather than to train the model. [sent-66, score-0.638]

19 Takamura et al. (2005) proposed using spin models for extracting the semantic orientation of words. [sent-70, score-0.669]

20 Each electron has a spin and each spin has a direction taking one of two values: up or down. [sent-73, score-0.726]

21 Two neighboring spins tend to have the same orientation from an energetic point of view. [sent-74, score-0.293]

22 Their hypothesis is that, as neighboring electrons tend to have the same spin direction, neighboring words tend to have similar polarity. [sent-75, score-0.529]

23 Kamps et al. (2004) construct a network based on WordNet synonyms and then use the shortest paths between any given word and the words ’good’ and ’bad’ to determine word polarity. [sent-81, score-0.381]

24 Restricting seed words to only two words affects their accuracy. [sent-88, score-0.197]

25 Adding more seed words could help, but it would make their method extremely costly from a computational point of view. [sent-89, score-0.258]

26 Hu and Liu (2004) use WordNet synonyms and antonyms to predict the polarity of words. [sent-91, score-0.453]

27 For any word whose polarity is unknown, they search WordNet and a list of labeled seed words to predict its polarity. [sent-92, score-0.486]

28 Kim and Hovy (2004) start with two lists of positive and negative seed words. [sent-103, score-0.24]

29 Synonyms of positive words and antonyms of negative words are considered positive, while synonyms of negative words and antonyms of positive words are considered negative. [sent-105, score-0.744]

30 Subjectivity analysis is related to the proposed method because identifying the polarity of text is the natural next step that should follow identifying subjective text. [sent-117, score-0.52]

31 3 Word Polarity: We use a Markov random walk model to identify the polarity of words. [sent-118, score-0.708]

32 Assume that we have a network of words, some of which are labeled as either positive or negative. [sent-119, score-0.234]

33 Now imagine a random surfer walking along the network starting from an unlabeled word w. [sent-124, score-0.411]

34 The random walk continues until the surfer hits a labeled word. [sent-125, score-0.661]

35 If the word w is positive, then the probability that the random walk hits a positive word is higher; if w is negative, then the probability that the random walk hits a negative word is higher. [sent-126, score-1.378]

36 Similarly, if the word w is positive then the average time it takes a random walk starting at w to hit a positive node is less than the average time it takes a random walk starting at w to hit a negative node. [sent-127, score-1.331]

37 The random walk model is described in Section 3. [sent-130, score-0.44]

38 Finally, an algorithm for computing a sign and magnitude for the polarity of any given word is described in Section 3. [sent-134, score-0.3]

39 The resulting graph is G(W, E), where W is the set of word/part-of-speech pairs for all the words in WordNet. [sent-145, score-0.221]

40 Following the method presented in (Hatzivassiloglou and McKeown, 1997), we can connect words if they appear in a conjunctive form in the corpus. [sent-155, score-0.211]
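
A minimal sketch of how such a word relatedness graph could be assembled. The paper's experiments use WordNet synonym and hypernym relations as edges (corpus-based conjunction edges are omitted here); the use of NLTK and networkx below is an assumption for illustration, not part of the original work.

```python
# Sketch: word/part-of-speech relatedness graph from WordNet synonym and
# hypernym relations (corpus-based conjunction edges are omitted).
# Assumes NLTK (with the WordNet data installed) and networkx.
import networkx as nx
from nltk.corpus import wordnet as wn

def build_wordnet_graph():
    G = nx.Graph()
    for synset in wn.all_synsets():
        nodes = [(lemma.name(), synset.pos()) for lemma in synset.lemmas()]
        # synonym edges: connect all word/POS pairs that share a synset
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                G.add_edge(nodes[i], nodes[j])
        # hypernym edges: connect words to the words of their hypernym synsets
        for hyper in synset.hypernyms():
            for u in nodes:
                for lemma in hyper.lemmas():
                    G.add_edge(u, (lemma.name(), hyper.pos()))
    return G
```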

41 2 Random Walk Model: Imagine a random surfer walking along the word relatedness graph G. [sent-162, score-0.432]

42 Starting from a word i with unknown polarity, it moves to a node j with probability Pij after the first step. [sent-163, score-0.3]

43 The walk continues until the surfer hits a word with a known polarity. [sent-164, score-0.506]

44 Seed words with known polarity act as an absorbing boundary for the random walk. [sent-165, score-0.434]

45 If we repeat the random walk N times, the percentage of walks that end at a positive/negative word could be used as an indicator of its positive/negative polarity. [sent-166, score-0.585]
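
A minimal sketch of this sampling view, assuming the graph built above and uniform transition probabilities over a node's neighbors (the paper defines the transition probabilities from the graph's edge weights, so the uniform choice here is a simplification); names such as sample_walk are illustrative.

```python
import random

def sample_walk(G, start, pos_seeds, neg_seeds, max_steps):
    """Run one random walk from `start`; report which absorbing seed set it
    hits first, or None if it is terminated after max_steps unlabeled moves."""
    node = start
    for _ in range(max_steps):
        neighbors = list(G.neighbors(node))
        if not neighbors:
            return None
        node = random.choice(neighbors)  # uniform P_ij over neighbors (simplification)
        if node in pos_seeds:
            return "positive"
        if node in neg_seeds:
            return "negative"
    return None

def absorption_counts(G, w, pos_seeds, neg_seeds, n_walks=1000, max_steps=50):
    """Count how many walks from w are absorbed at positive vs. negative seeds."""
    counts = {"positive": 0, "negative": 0}
    for _ in range(n_walks):
        outcome = sample_walk(G, w, pos_seeds, neg_seeds, max_steps)
        if outcome is not None:
            counts[outcome] += 1
    return counts
```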

46 The average time a random walk starting at w takes to hit the set of positive/negative nodes is also an indicator of its polarity. [sent-167, score-0.574]

47 This view is closely related to the partially labeled classification with random walks approach in (Szummer and Jaakkola, 2002) and the semi-supervised learning using harmonic functions approach in (Zhu et al. [sent-168, score-0.3]

48 Consider a subset of vertices S ⊂ V, and consider a random walk on G starting at a node i ∈ V. [sent-177, score-0.522]

49 4 Word Polarity Calculation: Based on the description of the random walk model and the first-passage (hitting) time above, we now propose our word polarity identification algorithm. [sent-184, score-0.74]

50 We begin by constructing a word relatedness graph and defining a random walk on that graph as described above. [sent-185, score-0.694]

51 Let S+ and S− be two sets of vertices representing seed words that are already labeled as either positive or negative, respectively. [sent-186, score-0.396]

52 For any given word w, we compute the hitting times h(w|S+) and h(w|S−) for the two sets iteratively as described earlier. [sent-187, score-0.349]
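
One way to carry out that iterative computation is the standard first-passage recurrence, h(i|S) = 0 for i in S and h(i|S) = 1 + Σ_j P_ij h(j|S) otherwise; the fixed-point iteration below is a sketch under that reading, again assuming uniform transition probabilities rather than the paper's weighted ones.

```python
def hitting_times(G, S, n_iterations=100):
    """Fixed-point iteration for h(i|S): the expected number of steps a walk
    starting at i takes to first reach the absorbing set S. The iteration
    count also bounds the horizon considered, loosely mirroring the
    maximum-step limit m used in the paper's experiments."""
    h = {node: 0.0 for node in G.nodes()}
    for _ in range(n_iterations):
        new_h = {}
        for i in G.nodes():
            if i in S:
                new_h[i] = 0.0
            else:
                neighbors = list(G.neighbors(i))
                if not neighbors:
                    new_h[i] = float("inf")  # isolated node never reaches S
                    continue
                p = 1.0 / len(neighbors)     # uniform P_ij (simplification)
                new_h[i] = 1.0 + sum(p * h[j] for j in neighbors)
        h = new_h
    return h
```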

53 The ratio between the two hitting times could be used as an indication of how positive/negative the given word is. [sent-190, score-0.349]

54 Computing the hitting time as described earlier may be time-consuming, especially if the graph is large. [sent-193, score-0.387]

55 Algorithm 1 (Word Polarity using Random Walks). Require: a word relatedness graph G. 1: Given a word w in V. 2: Define a random walk on the graph. [sent-196, score-0.656]
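
A compact sketch of the overall procedure, estimating the two hitting times by Monte Carlo sampling (k samples, at most m steps each) and turning them into a polarity sign and magnitude. The exact way the paper combines the two times into a magnitude is not spelled out in this summary, so the hitting-time ratio used below is only one plausible choice.

```python
import random

def word_polarity(G, w, pos_seeds, neg_seeds, k=1000, m=50):
    """Sketch of the word polarity algorithm: Monte Carlo estimates of the
    mean hitting times from w to the positive and negative seed sets."""
    def mean_hitting_time(seeds):
        total, hits = 0.0, 0
        for _ in range(k):
            node, steps = w, 0
            while steps < m and node not in seeds:
                neighbors = list(G.neighbors(node))
                if not neighbors:
                    break
                node = random.choice(neighbors)
                steps += 1
            if node in seeds:          # walks that never reach a seed are ignored
                total += steps
                hits += 1
        return total / hits if hits else float("inf")

    h_pos = mean_hitting_time(pos_seeds)
    h_neg = mean_hitting_time(neg_seeds)
    sign = +1 if h_pos < h_neg else -1   # hit positive seeds sooner => positive
    magnitude = h_neg / h_pos if h_pos > 0 else float("inf")  # illustrative ratio
    return sign, magnitude
```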

56 We use WordNet (Miller, 1995) as a source of synonyms and hypernyms for the word relatedness graph. [sent-203, score-0.256]

57 We also look at the performance of the proposed method for different parts of speech and for different confidence levels. We compare our method to the Semantic Orientation from PMI (SO-PMI) method described in (Turney, 2002), the Spin model (Spin) described in (Takamura et al. [sent-210, score-0.26]

58 1 Comparisons with other methods: This method could be used in a semi-supervised setting where a set of labeled words is used and the system learns from these labeled nodes and from other unlabeled nodes. [sent-214, score-0.338]

59 Under this setting, we compare our method to the spin model described in (Takamura et al. [sent-215, score-0.437]

60 The table shows that the proposed method outperforms the spin model. [sent-218, score-0.437]

61 The spin model approach uses word glosses, WordNet synonym, hypernym, and antonym relations, in addition to co-occurrence statistics extracted from a corpus. [sent-219, score-0.428]

62 They describe this setting as unsupervised (Turney, 2002) because they only use 14 seeds as paradigm words that define the semantic orientation rather than train the model. [sent-223, score-0.549]

63 Following (Turney, 2002), we use our method to predict the semantic orientation of words in the General Inquirer lexicon (Stone et al. [sent-224, score-0.429]

64 The results comparing the SO-PMI method with different dataset sizes, the spin model, and the proposed method using only 14 seeds are shown in Table 2. [sent-228, score-0.662]

65 Table 1: Accuracy for adjectives only for the spin model, the bootstrap method, and the random walk model. [sent-229, score-0.885]

66 We notice that the random walk method outperforms SO-PMI when SO-PMI uses datasets of sizes 1 × 10^7 and 2 × 10^9 words. [sent-234, score-0.514]

67 The performance of SO-PMI and the random walk methods is comparable when SO-PMI uses a very large dataset (1 × 10^11 words). Table 2: Accuracy for SO-PMI with different dataset sizes, the spin model, and the random walks model for 10-fold cross validation and 14 seeds. [sent-235, score-1.091]

68 The performance of the spin model approach is also comparable to the other two methods. [sent-239, score-0.363]

69 The advantages of the random walk method over SO-PMI are that it is faster and that it does not need a very large corpus like the one used by SO-PMI. [sent-240, score-0.514]

70 Another advantage is that the random walk method can be used along with the labeled data from the General Inquirer lexicon (Stone et al. [sent-241, score-0.584]

71 We also compare our method to the bootstrapping method described in (Hu and Liu, 2004), and the shortest path method described in (Kamps et al. [sent-244, score-0.39]

72 The performance of the spin model method, the bootstrapping method, the shortest path method, and the random walk method for only adjectives is shown in Table 1. [sent-249, score-1.127]

73 We notice from the table that the random walk method outperforms the spin model, the bootstrapping method, and the shortest path method for adjectives. [sent-250, score-1.186]

74 The reported accuracy for the shortest path method only considers the words it could assign a non-zero orientation value. [sent-251, score-0.535]

75 Figure 1 shows the accuracy of the random walk method as a function of the maximum number of steps m. [sent-258, score-0.603]

76 We use a network built from WordNet synonyms and hypernyms only. [sent-260, score-0.228]

77 We found that when the number of steps is very large compared to the diameter of the graph, random walks that start at ambiguous words, which are hard to classify, have a chance of moving until they hit a node in the opposite class. [sent-267, score-0.55]

78 That does not happen when the limit on the number of steps is smaller because those walks are then terminated without hitting any labeled nodes and hence ignored. [sent-268, score-0.584]

79 k is the number of samples used by the Monte Carlo algorithm to find an estimate for the hitting time. [sent-271, score-0.364]

80 Figure 2 shows the accuracy of the random walks method as a function of the number of samples k. [sent-272, score-0.39]

81 This shows that the Monte Carlo algorithm for computing the random walks hitting time performs quite well with values of the number of samples as small as 1000. [sent-277, score-0.594]
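
For example, the sensitivity to the sample count could be probed by re-running the word_polarity sketch above with different values of k; the word, the seed lists, and the specific k values here are purely illustrative.

```python
# Illustrative usage of the word_polarity sketch defined earlier.
pos_seeds = {("good", "a"), ("excellent", "a"), ("love", "v")}
neg_seeds = {("bad", "a"), ("poor", "a"), ("hate", "v")}
for k in (10, 100, 1000, 10000):
    sign, magnitude = word_polarity(G, ("happy", "a"), pos_seeds, neg_seeds, k=k, m=50)
    print(f"k={k}: sign={sign:+d}, magnitude={magnitude:.2f}")
```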

82 2 Other Experiments: We now measure the performance of the proposed method when the system is allowed to abstain from classifying the words for which it has low confidence. [sent-286, score-0.226]

83 We regard the ratio between the hitting time to positive words and the hitting time to negative words as a confidence measure, and evaluate the top words with the highest confidence at different threshold values. [sent-287, score-0.998]
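
A minimal sketch of such a confidence-based abstention rule, reusing hitting-time estimates like the ones above; the threshold value and the helper name are assumptions, not the paper's settings.

```python
def classify_or_abstain(h_pos, h_neg, threshold=2.0):
    """Assign a polarity label only when the ratio between the two hitting
    times is decisive enough; otherwise abstain (return None)."""
    if min(h_pos, h_neg) == 0:
        return "positive" if h_pos == 0 else "negative"
    confidence = max(h_pos, h_neg) / min(h_pos, h_neg)
    if confidence < threshold:
        return None  # low confidence: abstain
    return "positive" if h_pos < h_neg else "negative"
```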

84 Figure 4 shows the accuracy for 10-fold cross validation and for using only 14 seeds at different thresholds. [sent-288, score-0.263]

85 The figure shows that the top 60% of words are classified with an accuracy greater than 99% for 10-fold cross validation and 92% with 14 seed words. [sent-290, score-0.294]

86 Figure 3 shows a learning curve displaying how the performance of the proposed method is affected by varying the labeled set size (i. [sent-293, score-0.198]

87 However, when we only use 14 seeds all of which are adjectives, similar to (Turney and Littman, 2003), we notice that the performance on adjectives is much better than other parts of speech. [sent-300, score-0.266]

88 When we use 14 seeds but replace some of the adjectives with verbs and nouns like (love, harm, friend, enemy), the performance for nouns and verbs improves considerably at the cost of losing a little bit of the performance on adjectives. [sent-301, score-0.199]

89 Disambiguating the sense of words given their context before trying to predict their polarity should solve this problem. [sent-305, score-0.317]

90 A possible solution to this might be identifying those words and adding more links to them from glosses or co-occurrence statistics in a corpus. [sent-307, score-0.235]

91 5 Conclusions: Predicting the semantic orientation of words is a very interesting task in Natural Language Processing, and it has a wide variety of applications. [sent-311, score-0.355]

92 We proposed a method for automatically predicting the semantic orientation of words using random walks and hitting time. [sent-312, score-0.976]

93 The proposed method is based on the observation that a random walk starting at a given word is more likely to hit another word with the same semantic orientation before hitting a word with a different semantic orientation. [sent-313, score-1.382]

94 The proposed method can be used in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where only a handful of seeds is used to define the two polarity classes. [sent-314, score-0.734]

95 Mining wordnet for fuzzy sentiment: Sentiment tag extraction from wordnet glosses. [sent-325, score-0.274]

96 A bootstrapping method for building subjectivity lexicons for languages with scarce resources. [sent-329, score-0.198]

97 Determining the semantic orientation of terms through gloss classification. [sent-333, score-0.347]

98 Effects of adjective orientation and gradability on sentence subjectivity. [sent-346, score-0.257]

99 Measuring praise and criticism: Inference of semantic orientation from association. [sent-411, score-0.306]

100 Towards answering opinion questions: separating facts from opinions and identifying the polarity ofopinion sentences. [sent-429, score-0.361]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('spin', 0.363), ('walk', 0.323), ('hitting', 0.317), ('polarity', 0.268), ('orientation', 0.257), ('wordnet', 0.137), ('turney', 0.135), ('takamura', 0.127), ('inquirer', 0.119), ('random', 0.117), ('seeds', 0.117), ('walks', 0.113), ('synonyms', 0.104), ('kamps', 0.099), ('seed', 0.099), ('glosses', 0.092), ('surfer', 0.091), ('network', 0.086), ('hatzivassiloglou', 0.085), ('stone', 0.085), ('relatedness', 0.082), ('adjectives', 0.082), ('antonyms', 0.081), ('positive', 0.078), ('shortest', 0.078), ('sentiment', 0.075), ('method', 0.074), ('littman', 0.073), ('subjectivity', 0.072), ('graph', 0.07), ('wiebe', 0.07), ('labeled', 0.07), ('xj', 0.068), ('morinaga', 0.068), ('ts', 0.068), ('notice', 0.067), ('janyce', 0.065), ('negative', 0.063), ('identifying', 0.061), ('hits', 0.06), ('mining', 0.058), ('classifying', 0.058), ('cross', 0.057), ('subjective', 0.056), ('xt', 0.056), ('hit', 0.055), ('threaded', 0.055), ('nasukawa', 0.055), ('conjunctive', 0.055), ('reputation', 0.055), ('varying', 0.054), ('bootstrapping', 0.052), ('validation', 0.05), ('steps', 0.05), ('words', 0.049), ('semantic', 0.049), ('esuli', 0.048), ('michigan', 0.048), ('samples', 0.047), ('arbor', 0.046), ('abstain', 0.045), ('electrons', 0.045), ('iarpa', 0.045), ('odni', 0.045), ('szummer', 0.045), ('xpij', 0.045), ('starting', 0.045), ('vasileios', 0.044), ('gloss', 0.041), ('hu', 0.041), ('setting', 0.041), ('kanayama', 0.04), ('andreevskaia', 0.04), ('walking', 0.04), ('accuracy', 0.039), ('monte', 0.039), ('confidence', 0.038), ('path', 0.038), ('handful', 0.038), ('carlo', 0.038), ('hypernyms', 0.038), ('classify', 0.037), ('connecting', 0.037), ('pt', 0.037), ('vertices', 0.037), ('unsupervised', 0.036), ('orientations', 0.036), ('tetsuya', 0.036), ('costly', 0.036), ('neighboring', 0.036), ('umi', 0.034), ('banea', 0.034), ('dataset', 0.034), ('nodes', 0.034), ('connect', 0.033), ('statistics', 0.033), ('polarities', 0.032), ('opinions', 0.032), ('word', 0.032), ('product', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999845 141 acl-2010-Identifying Text Polarity Using Random Walks

Author: Ahmed Hassan ; Dragomir Radev

Abstract: Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product reviews, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. The method can be used both in a semi-supervised setting, where a training set of labeled words is used, and in an unsupervised setting, where only a handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms state-of-the-art methods in the semi-supervised setting. The results in the unsupervised setting are comparable to the best reported values. However, the proposed method is faster and does not need a large corpus.

2 0.1740704 210 acl-2010-Sentiment Translation through Lexicon Induction

Author: Christian Scheible

Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.

3 0.15012529 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

Author: Wei Wei ; Jon Atle Gulla

Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a humanlabeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HLSOT approach is easily generalized to labeling a mix of reviews of more than one products.

4 0.14858997 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons

Author: Valentin Jijkoun ; Maarten de Rijke ; Wouter Weerkamp

Abstract: We present a method for automatically generating focused and accurate topicspecific subjectivity lexicons from a general purpose polarity lexicon that allow users to pin-point subjective on-topic information in a set of relevant documents. We motivate the need for such lexicons in the field of media analysis, describe a bootstrapping method for generating a topic-specific lexicon from a general purpose polarity lexicon, and evaluate the quality of the generated lexicons both manually and using a TREC Blog track test set for opinionated blog post retrieval. Although the generated lexicons can be an order of magnitude more selective than the general purpose lexicon, they maintain, or even improve, the performance of an opin- ion retrieval system.

5 0.13725837 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

Author: Cigdem Toprak ; Niklas Jakob ; Iryna Gurevych

Abstract: In this paper, we introduce a corpus of consumer reviews from the rateitall and the eopinions websites annotated with opinion-related information. We present a two-level annotation scheme. In the first stage, the reviews are analyzed at the sentence level for (i) relevancy to a given topic, and (ii) expressing an evaluation about the topic. In the second stage, on-topic sentences containing evaluations about the topic are further investigated at the expression level for pinpointing the properties (semantic orientation, intensity), and the functional components of the evaluations (opinion terms, targets and holders). We discuss the annotation scheme, the inter-annotator agreement for different subtasks and our observations.

6 0.13127705 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

7 0.12222114 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

8 0.11945456 27 acl-2010-An Active Learning Approach to Finding Related Terms

9 0.10633978 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

10 0.103286 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification

11 0.10325527 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

12 0.098842628 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

13 0.092644118 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis

14 0.091103166 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

15 0.088622838 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

16 0.084959351 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

17 0.082236975 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

18 0.078924231 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval

19 0.078281589 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

20 0.075816549 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.206), (1, 0.126), (2, -0.151), (3, 0.142), (4, 0.006), (5, 0.015), (6, 0.018), (7, 0.14), (8, -0.016), (9, 0.014), (10, -0.038), (11, 0.023), (12, -0.035), (13, -0.066), (14, -0.012), (15, 0.046), (16, 0.178), (17, -0.033), (18, -0.048), (19, 0.004), (20, -0.055), (21, 0.053), (22, -0.086), (23, 0.01), (24, -0.022), (25, -0.059), (26, -0.063), (27, 0.018), (28, 0.011), (29, 0.022), (30, 0.032), (31, 0.077), (32, -0.044), (33, -0.011), (34, 0.027), (35, -0.059), (36, 0.076), (37, 0.107), (38, -0.075), (39, -0.097), (40, 0.006), (41, -0.086), (42, 0.011), (43, -0.022), (44, 0.02), (45, -0.055), (46, 0.015), (47, 0.069), (48, -0.006), (49, -0.09)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91808492 141 acl-2010-Identifying Text Polarity Using Random Walks

Author: Ahmed Hassan ; Dragomir Radev

Abstract: Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product reviews, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. The method can be used both in a semi-supervised setting, where a training set of labeled words is used, and in an unsupervised setting, where only a handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms state-of-the-art methods in the semi-supervised setting. The results in the unsupervised setting are comparable to the best reported values. However, the proposed method is faster and does not need a large corpus.

2 0.76024878 210 acl-2010-Sentiment Translation through Lexicon Induction

Author: Christian Scheible

Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.

3 0.66414595 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

Author: Wei Wei ; Jon Atle Gulla

Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a humanlabeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HLSOT approach is easily generalized to labeling a mix of reviews of more than one products.

4 0.63550347 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

Author: Ainur Yessenalina ; Yejin Choi ; Claire Cardie

Abstract: One of the central challenges in sentiment-based text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007). We explore methods to automatically generate annotator rationales for document-level sentiment classification. Rather unexpectedly, we find the automatically generated rationales just as helpful as human rationales.

5 0.59848553 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Author: Jungi Kim ; Jin-Ji Li ; Jong-Hyeok Lee

Abstract: Subjectivity analysis is a rapidly growing field of study. Along with its applications to various NLP tasks, much work have put efforts into multilingual subjectivity learning from existing resources. Multilingual subjectivity analysis requires language-independent criteria for comparable outcomes across languages. This paper proposes to measure the multilanguage-comparability of subjectivity analysis tools, and provides meaningful comparisons of multilingual subjectivity analysis from various points of view.

6 0.58711672 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

7 0.5844422 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

8 0.5807333 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons

9 0.55591047 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.

10 0.5376265 27 acl-2010-An Active Learning Approach to Finding Related Terms

11 0.53166437 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis

12 0.52439618 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

13 0.52275693 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

14 0.50920856 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs

15 0.48283923 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

16 0.46487373 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

17 0.46231002 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning

18 0.4588092 85 acl-2010-Detecting Experiences from Weblogs

19 0.44609746 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

20 0.44422945 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.013), (25, 0.043), (39, 0.012), (42, 0.035), (44, 0.015), (59, 0.078), (72, 0.016), (73, 0.479), (78, 0.023), (83, 0.054), (84, 0.021), (98, 0.128)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92311502 259 acl-2010-WebLicht: Web-Based LRT Services for German

Author: Erhard Hinrichs ; Marie Hinrichs ; Thomas Zastrow

Abstract: This software demonstration presents WebLicht (short for: Web-Based Linguistic Chaining Tool), a web-based service environment for the integration and use of language resources and tools (LRT). WebLicht is being developed as part of the D-SPIN project. WebLicht is implemented as a web application so that there is no need for users to install any software on their own computers or to concern themselves with the technical details involved in building tool chains. The integrated web services are part of a prototypical infrastructure that was developed to facilitate chaining of LRT services. WebLicht allows the integration and use of distributed web services with standardized APIs. The nature of these open and standardized APIs makes it possible to access the web services from nearly any programming language, shell script or workflow engine (UIMA, Gate etc.). Additionally, an application for integration of additional services is available, allowing anyone to contribute his own web service.

2 0.88935316 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars

Author: Sindhu Raghavan ; Adriana Kovashka ; Raymond Mooney

Abstract: In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.

3 0.88108426 68 acl-2010-Conditional Random Fields for Word Hyphenation

Author: Nikolaos Trogkanis ; Charles Elkan

Abstract: Finding allowable places in words to insert hyphens is an important practical problem. The algorithm that is used most often nowadays has remained essentially unchanged for 25 years. This method is the TEX hyphenation algorithm of Knuth and Liang. We present here a hyphenation method that is clearly more accurate. The new method is an application of conditional random fields. We create new training sets for English and Dutch from the CELEX European lexical resource, and achieve error rates for English of less than 0.1% for correctly allowed hyphens, and less than 0.01% for Dutch. Experiments show that both the Knuth/Liang method and a leading current commercial alternative have error rates several times higher for both languages.

4 0.87760007 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures

Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta

Abstract: This work deals with the application of confidence measures within an interactive-predictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows to obtain almost perfect translations while significantly reducing user effort.

same-paper 5 0.8676365 141 acl-2010-Identifying Text Polarity Using Random Walks

Author: Ahmed Hassan ; Dragomir Radev

Abstract: Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product reviews, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. The method can be used both in a semi-supervised setting, where a training set of labeled words is used, and in an unsupervised setting, where only a handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms state-of-the-art methods in the semi-supervised setting. The results in the unsupervised setting are comparable to the best reported values. However, the proposed method is faster and does not need a large corpus.

6 0.8197282 238 acl-2010-Towards Open-Domain Semantic Role Labeling

7 0.79725122 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction

8 0.59447074 121 acl-2010-Generating Entailment Rules from FrameNet

9 0.57962334 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

10 0.56534934 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

11 0.56453586 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images

12 0.56418324 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

13 0.56400567 154 acl-2010-Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm

14 0.56272155 204 acl-2010-Recommendation in Internet Forums and Blogs

15 0.56112772 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

16 0.5520848 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

17 0.54895699 175 acl-2010-Models of Metaphor in NLP

18 0.53639841 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue

19 0.53575057 85 acl-2010-Detecting Experiences from Weblogs

20 0.53535533 111 acl-2010-Extracting Sequences from the Web