emnlp emnlp2013 emnlp2013-144 knowledge-graph by maker-knowledge-mining

144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

Source: pdf

Author: Thomas Scholz ; Stefan Conrad

Abstract: A very valuable piece of information in newspaper articles is the tonality of extracted statements. For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). To this end, we will compare several state-of-the-art approaches for Opinion Mining in newspaper articles in this paper. Furthermore, we will introduce a new technique to extract entropy-based word connections which identifies the word combinations which create a tonality. In the evaluation, we use two different corpora consisting of news articles, by which we show that the new approach achieves better results than the four state-of-the-art methods.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). [sent-2, score-0.679]

2 To this end, we will compare several state-of-the-art approaches for Opinion Mining in newspaper articles in this paper. [sent-3, score-0.138]

3 1 Introduction The Web keeps many potentially valuable opinions in news articles which are partly new online articles or uploaded print media articles. [sent-6, score-0.211]

4 So, an opinion-oriented analysis of news articles is important, because the tonality (Watson and Noble, 2007; Scholz et al. [sent-8, score-0.596]

5 At the same time, Opinion Mining in newspaper articles appears to be difficult, because not all parts of news articles are as subjective (Balahur et al. [sent-15, score-0.315]

6 Therefore, we work with extracted statements of news articles, in which a sequence of consecutive sentences has the same tonality value. [sent-18, score-1.043]

7 At the same time, some approaches focus more on differentiating only between positive and negative news and leave out neutral examples (Taboada et al. [sent-19, score-0.297]

8 Conversely, we have noticed that even if the used words in the news domain are quite similar, the tonality which the words express can be different, especially if neutral examples are involved (cf. [sent-22, score-0.673]

9 We propose this task formulation: Problem definition: Let s ⊆ d be a statement and let the document d represent a newspaper article. [sent-25, score-0.262]

10 Then the task is to determine the tonality y for a given statement s, consisting of k words: t : s = (w1, w2, ..., wk) ↦ y ∈ {positive, neutral, negative} (1) [sent-26, score-0.674]

11 Normally, a statement consists of one up to four sentences. [sent-28, score-0.182]

12 Longer statements are also possible, but they appear less frequently in an MRA. [sent-29, score-0.505]

13 An automated approach (Scholz and Conrad, 2013) for the extraction of statements already exists. [sent-30, score-0.505]

14 So, we concentrate on the tonality classification, which is not provided by [page footer: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1828–1839, Seattle, Washington, USA, 18-21 October 2013. © 2013 Association for Computational Linguistics] [sent-32, score-0.492]

15 the approach for the statements extraction (Scholz and Conrad, 2013). [sent-34, score-0.505]

16 Furthermore, we define the polarity of sentiment as the distinction between positive and negative sentiment and the subjectivity as the distinction between subjective (positive and negative) statements and neutral statements. [sent-35, score-1.221]
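The two distinctions defined in sentence 16 can be written down as label mappings. The following is a minimal sketch; the function names and the string labels are illustrative assumptions and do not come from the paper.

```python
from typing import Optional

# The three tonality values of the task definition.
TONALITIES = ("positive", "neutral", "negative")


def subjectivity_label(tonality: str) -> str:
    """Collapse a tonality into 'subjective' (positive or negative) or 'neutral'."""
    if tonality not in TONALITIES:
        raise ValueError(f"unknown tonality: {tonality!r}")
    return "neutral" if tonality == "neutral" else "subjective"


def polarity_label(tonality: str) -> Optional[str]:
    """Return the polarity of a subjective statement; neutral statements have none."""
    if tonality not in TONALITIES:
        raise ValueError(f"unknown tonality: {tonality!r}")
    return None if tonality == "neutral" else tonality
```

Under this reading, subjectivity is a two-class problem over all statements, while polarity is only defined for the subjective ones.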

17 We are aware of the fact that viewpoints play a significant role in a newspaper, but since we concentrate on the determination of the tonality, the extraction of viewpoints can be solved in a separate step (Scholz and Conrad, 2012). [sent-40, score-0.156]

18 This is possible, because the tonality of a statement can be determined without knowledge of the viewpoint in almost all cases. [sent-41, score-0.705]

19 The only exception is a statement with multiple viewpoints and different tonalities, but these statements are very rare (cf. [sent-42, score-0.765]

20 Our approach learns a graph from an annotated collection of statements, in which nodes and edges model tonality-bearing word connections. [sent-45, score-0.168]

21 For unseen statements, we recognize subgraphs of the learned graph, compare two weighting methods for extracting different tonality features, and classify the statements by a support vector machine. [sent-46, score-1.098]

22 In the third section, we introduce our graph-based and entropy-based approach to calculate the tonality features T. [sent-48, score-0.528]

23 The different contributions reach from applying Opinion Mining in reviews and recommending new multimedia products for individuals (Qumsiyeh and Ng, 2012) to sentiment analyses for different topics in social media (Wang et al. [sent-53, score-0.206]

24 , 2011) or the creation of sentiment dictionaries (Baccianella et al. [sent-54, score-0.164]

25 SO-CAL identifies some special expressions and constructions, which tell the reader, that this text part does not really contain an actual opinion or sentiment. [sent-69, score-0.157]

26 (2008) also work with a dictionary, which even includes context-dependent words (positive, neutral, and negative words) as well as rules to identify the sentiment orientation of words (Opinion Observer). [sent-73, score-0.212]

27 Furthermore, they extract relations between opinion words and corresponding product features. [sent-75, score-0.157]

28 Subsequently, an average subjective measure vector selects the most subjective terms. [sent-80, score-0.146]

29 Unfortunately, since the corpus does not have statements and a statement-based tonality, it is not designed as a MRA. [sent-89, score-0.505]

30 In this way, our approach is able to recognize tonality-indicating structures (subgraphs) which provide precise information about the tonality, even if statements have a very similar bag-of-words representation and at the same time different tonalities. [sent-95, score-0.505]

31 One could also say that we create a graph 1830 instead of a sentiment dictionary from training examples, as other approaches (Kaji and Kitsuregawa, 2007; Du et al. [sent-96, score-0.227]

32 In figure 1, simple examples are shown with a possible graph (the nodes and edges are taken from the given statements; of course, the graphs and weights become larger in practice). [sent-98, score-0.205]

33 Thus, even though the word representation is quite similar, the tonality can be different. [sent-100, score-0.492]

34 Therefore, the vocabulary V is the set of words in lemma for one set of statements S. [sent-106, score-0.505]

35 The edge eij shows the appearance of node υi and υj in combination with tonality y by means of a weight εi,j (the sequence of the values in equation 2 is also used in figure 1 and 2). [sent-109, score-0.516]

36 εij = (yijπ, yijo, yijν) (2) Here, yijπ is the number of co-occurrences of node υi and υj in positive statements within the same sentence. [sent-110, score-0.588]

37 In analogy, yijo belongs to sentences of neutral statements and yijν to sentences of negative statements. [sent-111, score-0.792]
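The edge weights of equation 2 can be collected by counting sentence-level co-occurrences per tonality. The sketch below assumes that statements arrive already lemmatized and split into sentences. The function name `build_graph`, the input layout, and the toy statements are hypothetical and only illustrate the counting.

```python
from collections import defaultdict
from itertools import combinations

# Position of each tonality inside an edge weight: (positive, neutral, negative).
TONALITY_INDEX = {"positive": 0, "neutral": 1, "negative": 2}


def build_graph(statements):
    """Count co-occurrences of lemma pairs per tonality.

    statements: iterable of (sentences, tonality), where each sentence is a
    list of lemmas. Returns a dict that maps an alphabetically sorted lemma
    pair to its [positive, neutral, negative] counts.
    """
    edges = defaultdict(lambda: [0, 0, 0])
    for sentences, tonality in statements:
        slot = TONALITY_INDEX[tonality]
        for lemmas in sentences:
            # Each unordered pair of distinct lemmas counts once per sentence.
            for a, b in combinations(sorted(set(lemmas)), 2):
                edges[(a, b)][slot] += 1
    return dict(edges)


# Hypothetical toy statements, one sentence each.
toy = [
    ([["intensify", "crisis"]], "negative"),
    ([["solve", "crisis", "slowly"]], "neutral"),
    ([["solve", "crisis"]], "positive"),
]
graph = build_graph(toy)
```

Storing each pair in sorted order keeps the graph undirected, so (υi, υj) and (υj, υi) share one weight.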

38 2 Generating Features for Learning From a learned graph, we can combine different edges to calculate tonality features for an unseen statement s. [sent-114, score-0.812]

39 An unseen statement is a statement, which is of course not used to learn the graph. [sent-115, score-0.205]

40 We use all edges of the subgraph Gsl which contains the nodes for every lemma wi in the l-th sentence of s. [sent-116, score-0.143]
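Recognizing the subgraph for one sentence amounts to keeping every learned edge whose two nodes both occur among the sentence's lemmas. A minimal sketch follows, assuming the learned graph is a dict keyed by alphabetically sorted lemma pairs. That layout, the name `sentence_subgraph`, and the example weights are assumptions made for illustration.

```python
from itertools import combinations


def sentence_subgraph(graph, lemmas):
    """Return the edges of `graph` whose two nodes both occur in `lemmas`.

    graph: dict mapping an alphabetically sorted lemma pair to its
    (positive, neutral, negative) weight.
    """
    present = sorted(set(lemmas))
    return {
        pair: graph[pair]
        for pair in combinations(present, 2)
        if pair in graph
    }


# Hypothetical learned graph and an unseen sentence.
learned = {
    ("crisis", "intensify"): (0, 0, 1),
    ("crisis", "solve"): (1, 1, 0),
    ("crisis", "slowly"): (0, 1, 0),
}
sub = sentence_subgraph(learned, ["the", "crisis", "intensify"])
```

Lemmas without any learned edge, such as "the" in this toy input, simply contribute nothing.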

41 Figure 1: An example for different statements and a graph: The weights are based on the three examples and their notation is (positive, neutral, negative). [sent-119, score-0.505]

42 Figure 2: An example of a learned graph: The nodes and edges, which are drawn in solid lines, represent the recognized subgraph Gsl for the sentence “There are structural factors behind the African growth story.” [sent-120, score-0.133]

43 It contains seven nodes and nine edges (also the nodes and edges in dashed lines). [sent-124, score-0.228]

44 If we further assume that an unseen statement is the example of section 1. [sent-125, score-0.205]

45 We could also look for complete or connected graphs in the statement instead of using all edges. [sent-130, score-0.219]

46 Otherwise we take the appearances in statements of the same class. [sent-135, score-0.561]

47 The denominators of the polarity refer only to positive and negative appearances, while the denominators for the subjectivity refer to every tonality. [sent-136, score-0.302]

48 By calculating the vectorial sum, we combine several edges in order to estimate precise tonality scores. [sent-137, score-0.595]
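The vectorial sum in sentence 48 is a component-wise addition of edge weights. The sketch below uses raw (positive, neutral, negative) counts with made-up values. The paper applies its own edge weighting before summing, which is not reproduced here.

```python
def sum_edge_weights(edge_weights):
    """Add (positive, neutral, negative) edge weights component-wise."""
    total = [0, 0, 0]
    for weight in edge_weights:
        for i, value in enumerate(weight):
            total[i] += value
    return tuple(total)


# Hypothetical edges around "crisis" in two different sentences.
crisis_solve_slowly = sum_edge_weights([(1, 1, 0), (0, 1, 0), (0, 1, 0)])
crisis_intensify = sum_edge_weights([(0, 0, 1)])
```

With these toy weights, the first sum leans towards the neutral component and the second towards the negative one, which matches the intuition that the surrounding words decide the tonality of a shared noun.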

49 In this way, we can get the correct tonality score for the noun “crisis”, if a sentence contains also “solve” and “slowly” (→ more neutral) or “intensify” (→ more negative) (cf. [sent-138, score-0.492]

50 And we get the correct tonality score ... [sent-140, score-0.492]

51 Thus, every category gets its own feature and every node only has a tonality value, if it belongs to the category of the feature. [sent-146, score-0.541]

52 One type shows the difference between positive and negative polarity (z = pol), for the other type we replace the positive class by the subjective one (the sum of positive and negative) and the negative by a neutral one in order to differentiate between neutral and non-neutral examples (z = sub). [sent-149, score-0.735]
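The two feature types can be illustrated with plain normalized differences. This is a simplified stand-in and not the paper's entropy-based weighting. It only follows the description that the polarity denominator covers positive and negative appearances, while the subjectivity denominator covers every tonality. Both function names are hypothetical.

```python
def polarity_feature(pos, neu, neg):
    """Difference of positive and negative counts over positive plus negative.

    The neutral count is accepted for a uniform signature but not used.
    """
    denominator = pos + neg
    return 0.0 if denominator == 0 else (pos - neg) / denominator


def subjectivity_feature(pos, neu, neg):
    """Difference of subjective (pos + neg) and neutral counts over all counts."""
    subjective = pos + neg
    denominator = subjective + neu
    return 0.0 if denominator == 0 else (subjective - neu) / denominator
```

Both values lie in [-1, 1]. Returning 0.0 for an empty denominator is a design choice of this sketch, so that nodes without the relevant appearances do not tilt a feature.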

53 For a clearly positive node (appears only in positive statements), e. [sent-162, score-0.142]

54 We use an SVM to classify the statements by the extracted features. [sent-168, score-0.505]

55 We will demonstrate that in section 4, where this method of using all edges as features is denoted as the graph edges method. [sent-176, score-0.212]

56 Up to ten media analysts (professional experts in the field of MRA) annotate the extracted statements with a tonality. [sent-183, score-0.606]

57 So, four analysts annotate the same statements from a small part of the statements. [sent-185, score-0.557]

58 This is not a problem, because the tonality of statements can be estimated without knowledge of the viewpoint in most cases. [sent-192, score-1.028]

59 Nevertheless, a statement can have two different viewpoints in a MRA. [sent-193, score-0.26]

60 and 279 statements of the Finance dataset (approx. [sent-198, score-0.505]

61 One of these examples is the following statement, which is a translated statement of the PDS: • Example: The logical consequence would be a substantial increase of the subsidies, which the SPD fraction has demanded several times. [sent-204, score-0.182]

62 We keep these statements within the dataset, because this case can occur in a MRA. [sent-208, score-0.505]

63 30% of the statements, that is 420 statements (the first 140 positive, neutral, or negative statements) or 2,500 statements (the first 625 positive or negative and the first 1,250 neutral statements) in order to create our graph (the graph has 41,470 or 154,001 edges, resp. [sent-211, score-1.426]

64 Unless otherwise stated, 20% of the remaining statements (220 and 1,200 statements) are the training set for the SVM and the rest is test set. [sent-214, score-0.505]
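The split in sentences 63 and 64 can be checked with a little arithmetic: a fixed number of statements builds the graph, 20% of the remainder trains the SVM, and the rest is the test set. In the sketch below, the total of 8,500 is a hypothetical value chosen only so that 20% of the remainder equals the 1,200 training statements quoted above. The actual corpus size is not stated in this excerpt.

```python
def split_sizes(total, graph_size, train_fraction=0.2):
    """Return (graph, train, test) sizes for a corpus of `total` statements."""
    remaining = total - graph_size
    if remaining < 0:
        raise ValueError("graph_size exceeds the number of statements")
    train = round(remaining * train_fraction)
    test = remaining - train
    return graph_size, train, test


# Hypothetical total; 2,500 graph statements are taken from the text.
sizes = split_sizes(8500, 2500)
```

The graph statements are removed first, so the SVM never trains or tests on a statement that contributed edge counts.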

65 Thus, we use the same statements which we use for the creation of our graphs for the creation of a dictionary as one variant. [sent-221, score-0.662]

66 (2009), all words which appear more often in neutral statements get the prior polarity neutral. [sent-223, score-0.717]

67 For all other words, we calculate the number of appearances in positive statements minus the appearances in negative statements divided by all appearances. [sent-224, score-1.274]
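The dictionary construction in sentences 66 and 67 can be sketched as follows. Two points are assumptions of this sketch, because the excerpt leaves them open: a word counts as neutral when its neutral appearances exceed its positive and negative appearances combined, and "all appearances" includes the neutral ones. The function name is hypothetical.

```python
def prior_polarity(pos, neu, neg):
    """Return 'neutral' or a polarity score in [-1, 1] for one word.

    pos, neu, neg: number of appearances of the word in positive, neutral,
    and negative statements.
    """
    total = pos + neu + neg
    if total == 0:
        raise ValueError("the word never appears in the statements")
    if neu > pos + neg:  # assumed reading of "more often in neutral statements"
        return "neutral"
    return (pos - neg) / total
```

For example, a word seen six times in positive and twice in neutral statements gets a score of 0.75 under these assumptions.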

68 Thus, for a statement classification, we classify the words of the statements and the class of the most frequently used words is the class of the statement (ambiguous statements are classified as the most frequent class). [sent-235, score-1.422]
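The statement-level decision in sentence 68 is a majority vote over word classes with a fallback for ties. In this sketch the fallback is passed in explicitly as the most frequent class of the corpus. That interpretation of "the most frequent class" and the function name are assumptions.

```python
from collections import Counter


def classify_statement(word_classes, fallback):
    """Return the majority class of the words, or `fallback` on a tie.

    word_classes: iterable of class labels, one per classified word.
    fallback: label to use for ambiguous or empty statements.
    """
    counts = Counter(word_classes)
    if not counts:
        return fallback
    ranked = counts.most_common()
    best_count = ranked[0][1]
    winners = [label for label, count in ranked if count == best_count]
    return winners[0] if len(winners) == 1 else fallback
```

A statement whose words split evenly between two classes is treated as ambiguous and receives the fallback label.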

69 According to the authors, we apply the best machine learning techniques for the word classification (BoosTexter for tonality classification and Ripper for Subjectivity Analysis with parameters as in (Wilson et al. [sent-236, score-0.56]

70 , 2008), we also identify neutral words if they appear more often in neutral than in subjective statements and subjective words are positive if they appear more often in positive than in negative statements and vice versa for negative words. [sent-239, score-1.658]

71 In contrast to Opinion Mining in customer reviews, we exchange product features through statements and calculate the orientation of 1834 opinions for all statements with their opinion orientation algorithm. [sent-240, score-1.259]

72 , 2011) needs dictionaries with sentiment values from -5 to +5 with intervals of one. [sent-243, score-0.127]

73 Furthermore, we implement the algorithm of irrealis blocking and translate the list of irrealis markers (modal verbs, conditional markers, negative polarity items, private-state verbs (Taboada et al. [sent-251, score-0.251]

74 For all dictionary-based methods (Wilson, Opinion Observer, SO-CAL), we also evaluate an additional variant which uses a sentiment dictionary and not the statements which we use to construct the graphs on each fold. [sent-253, score-0.715]

75 As the SentiWS has sentiment values between −1 and 1, we apply similar procedures to construct the method-specific dictionaries as described above: For SO-CAL, it is the same procedure by using the SentiWS values, positive words have a score above 0. [sent-256, score-0.186]

76 Therefore, we have also added our SVM in order to classify the statements based on the scores of Opinion Observer and SO-CAL (as shown in tables with (+ SVM)). [sent-272, score-0.505]

77 Table 2 and 3 present the tonality classification (positive, neutral, negative) and table 4 displays the Subjectivity Analysis (subjective, neutral). [sent-275, score-0.526]

78 The weighting of the edges through the Entropy-summand performs better than the Kullback-Leibler weighting on both datasets, so we use the Entropy-summand weighting for all further experiments. [sent-279, score-0.22]
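The two weighting names refer to standard information-theoretic quantities. The sketch below gives only the textbook definitions of one Shannon entropy summand and of the Kullback-Leibler divergence. How the paper derives the probabilities from the edge counts is not shown in this excerpt, so that step is left out.

```python
import math


def entropy_summand(p):
    """One term -p * log2(p) of the Shannon entropy; 0 by convention for p == 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return 0.0 if p == 0.0 else -p * math.log2(p)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The entropy summand peaks for mid-range probabilities and vanishes at 0 and 1, while the KL divergence measures how far one distribution is from a reference distribution.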

79 Furthermore, the variants of the methods, which are expanded by a general sentiment dictionary, perform rather worse. [sent-283, score-0.127]

80 The ’classical’ Opinion Observer performs better with a general sentiment dictionary, while Wilson tends to achieve worse results in this variant. [sent-284, score-0.127]

81 the tonality classification by the most frequent word class seems appropriate for this task and method, because this method achieves better results in the classification of statements than on the word level. [sent-295, score-1.089]

82 This fits in with our assumption that every sentence of a statement is important and that more words lead to more tonality information. [sent-299, score-0.674]

83 The number of word features for RSUMM(100%) is 4,985 features for one statement on PDS and 13,608 features on Finance. [sent-300, score-0.182]

84 As mentioned before, the graph edges alone do not achieve a high accuracy. [sent-304, score-0.133]

85 To evaluate the influence of different input sizes, we performed experiments with 5%, 10%, 40%, and 80% training for machine learning as well as 210 and 840 statements for the creation of dictionaries/graphs on PDS (0. [sent-306, score-0.542]

86 32% training for 840 statements in order to create the same size of training according to the results of 420 statements). [sent-308, score-0.505]

87 4 Statistical Significance of the Features We perform a 10-fold cross validation with our method, Wilson (as the best ’classical’ state-of-the-art method) and SO-CAL (+ SVM) on the pressrelations dataset in order to evaluate the contribution of single tonality features. [sent-328, score-0.59]
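A k-fold cross validation partitions the data into k folds and uses each fold once as the test set. The following is a generic sketch of the index bookkeeping. The interleaved fold assignment and the function name are choices of this example, not details taken from the paper.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 1 < k <= n_items:
        raise ValueError("k must satisfy 1 < k <= n_items")
    # Item i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test


splits = list(k_fold_indices(25, 10))
```

Every item appears in exactly one test fold, so each statement is evaluated once across the ten runs.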

88 In the categories, the nouns and verbs are more significant than adjectives and adverbs (adverbs are a little stronger in the polarity difference). [sent-347, score-0.186]

89 The combination of all tonality features is a significant increase against both baselines, too. [sent-351, score-0.492]

90 The findings show that the word connections in combination with the entropy weighting make it possible to learn the tonality structure of different word combinations accurately, even though the training size is small. [sent-352, score-0.619]

91 So, this approach in combination with an extraction of statements (Scholz and Conrad, 2013) and the determination of viewpoints (Scholz and Conrad, 2012) represents a fully automated solution in order to perform Opinion Mining for a MRA. [sent-354, score-0.583]

92 Using multiple sources to construct a sentiment sensitive thesaurus for cross-domain sentiment classification. [sent-374, score-0.254]

93 Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. [sent-394, score-0.127]

94 Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. [sent-402, score-0.127]

95 Building lexicon for sentiment analysis from massive collection of html documents. [sent-412, score-0.127]

96 Integrating viewpoints into newspaper opinion mining for a media response analysis. [sent-472, score-0.443]

97 Extraction of statements in news for a media response analysis. [sent-478, score-0.631]

98 Opinion mining on a german corpus of a media response analysis. [sent-485, score-0.167]

99 Comparing different methods for opinion mining in newspaper articles. [sent-490, score-0.285]

100 Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. [sent-508, score-0.288]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('statements', 0.505), ('tonality', 0.492), ('rsumm', 0.211), ('statement', 0.182), ('yij', 0.171), ('opinion', 0.157), ('scholz', 0.156), ('pds', 0.155), ('neutral', 0.135), ('sentiment', 0.127), ('wilson', 0.116), ('conrad', 0.1), ('taboada', 0.1), ('pressrelations', 0.098), ('mra', 0.098), ('finance', 0.094), ('observer', 0.089), ('sarvabhotla', 0.084), ('newspaper', 0.08), ('edges', 0.079), ('viewpoints', 0.078), ('polarity', 0.077), ('subjective', 0.073), ('yijo', 0.07), ('negations', 0.062), ('subjectivity', 0.061), ('positive', 0.059), ('articles', 0.058), ('negative', 0.057), ('connections', 0.057), ('appearances', 0.056), ('eijp', 0.056), ('gsl', 0.056), ('sentiws', 0.056), ('graph', 0.054), ('analysts', 0.052), ('ding', 0.052), ('noble', 0.049), ('media', 0.049), ('mining', 0.048), ('adverbs', 0.048), ('weighting', 0.047), ('news', 0.046), ('dictionary', 0.046), ('growth', 0.045), ('boostexter', 0.042), ('cdu', 0.042), ('fpol', 0.042), ('fsub', 0.042), ('irrealis', 0.042), ('remus', 0.042), ('spd', 0.042), ('sub', 0.041), ('svm', 0.04), ('german', 0.039), ('watson', 0.037), ('creation', 0.037), ('graphs', 0.037), ('intensifiers', 0.037), ('calculate', 0.036), ('negation', 0.036), ('signs', 0.036), ('nodes', 0.035), ('classification', 0.034), ('african', 0.033), ('verbs', 0.033), ('viewpoint', 0.031), ('subgraphs', 0.031), ('dkl', 0.031), ('response', 0.031), ('classical', 0.031), ('stefan', 0.03), ('reviews', 0.03), ('subgraph', 0.029), ('neg', 0.029), ('company', 0.029), ('orientation', 0.028), ('entropysummand', 0.028), ('geijspl', 0.028), ('neu', 0.028), ('ponomareva', 0.028), ('qumsiyeh', 0.028), ('ripper', 0.028), ('socal', 0.028), ('songbo', 0.028), ('strongsubj', 0.028), ('tonalities', 0.028), ('ygisjl', 0.028), ('adjectives', 0.028), ('belongs', 0.025), ('nldb', 0.024), ('balahur', 0.024), ('denominators', 0.024), ('xueqi', 0.024), ('vectorial', 0.024), ('class', 0.024), ('node', 0.024), ('structural', 0.024), ('unseen', 0.023), ('combinations', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

Author: Thomas Scholz ; Stefan Conrad

2 0.15880774 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguistically informed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

3 0.13951917 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao

Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.

4 0.12493914 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky

Abstract: Different demographics, e.g., gender or age, can demonstrate substantial variation in their language use, particularly in informal contexts such as social media. In this paper we focus on learning gender differences in the use of subjective language in English, Spanish, and Russian Twitter data, and explore cross-cultural differences in emoticon and hashtag use for male and female users. We show that gender differences in subjective language can effectively be used to improve sentiment analysis, and in particular, polarity classification for Spanish and Russian. Our results show statistically significant relative F-measure improvement over the gender-independent baseline 1.5% and 1% for Russian, 2% and 0.5% for Spanish, and 2.5% and 5% for English for polarity and subjectivity classification.

5 0.11362666 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

Author: Shi Feng ; Le Zhang ; Binyang Li ; Daling Wang ; Ge Yu ; Kam-Fai Wong

Abstract: Extensive experiments have validated the effectiveness of the corpus-based method for classifying the word’s sentiment polarity. However, no work is done for comparing different corpora in the polarity classification task. Nowadays, Twitter has aggregated huge amount of data that are full of people’s sentiments. In this paper, we empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. Experiment results show that the Twitter data can achieve a much better performance than the Google, Web1T and Wikipedia based methods.

6 0.093925245 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

7 0.091407314 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

8 0.084526964 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

9 0.073941119 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation

10 0.063426495 121 emnlp-2013-Learning Topics and Positions from Debatepedia

11 0.06240885 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

12 0.057265196 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

13 0.050227232 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

14 0.049404223 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing

15 0.049300361 41 emnlp-2013-Building Event Threads out of Multiple News Articles

16 0.04858654 61 emnlp-2013-Detecting Promotional Content in Wikipedia

17 0.047566816 182 emnlp-2013-The Topology of Semantic Knowledge

18 0.043166522 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

19 0.042842168 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

20 0.04253326 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.148), (1, 0.058), (2, -0.151), (3, -0.13), (4, 0.095), (5, -0.058), (6, -0.006), (7, -0.106), (8, 0.068), (9, 0.111), (10, 0.001), (11, -0.035), (12, -0.024), (13, 0.023), (14, -0.006), (15, 0.044), (16, -0.048), (17, 0.053), (18, 0.096), (19, 0.033), (20, 0.113), (21, -0.025), (22, -0.066), (23, 0.001), (24, 0.032), (25, -0.007), (26, 0.052), (27, 0.002), (28, 0.052), (29, -0.05), (30, 0.119), (31, -0.058), (32, -0.115), (33, 0.019), (34, 0.02), (35, 0.063), (36, -0.014), (37, -0.017), (38, -0.052), (39, -0.062), (40, 0.107), (41, 0.038), (42, 0.121), (43, 0.022), (44, 0.093), (45, -0.032), (46, -0.014), (47, -0.071), (48, -0.126), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93840832 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

Author: Thomas Scholz ; Stefan Conrad

2 0.69103378 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

3 0.63964856 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

Author: Marco Guerini ; Lorenzo Gatti ; Marco Turchi

Abstract: Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words’ prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.

4 0.59169024 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao

5 0.57130831 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

Author: Shi Feng ; Le Zhang ; Binyang Li ; Daling Wang ; Ge Yu ; Kam-Fai Wong

6 0.55947089 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

7 0.52620941 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions

8 0.50580812 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation

9 0.47413176 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

10 0.45994958 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

11 0.418345 182 emnlp-2013-The Topology of Semantic Knowledge

12 0.36092222 121 emnlp-2013-Learning Topics and Positions from Debatepedia

13 0.34103647 61 emnlp-2013-Detecting Promotional Content in Wikipedia

14 0.33411542 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

15 0.31417149 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition

16 0.31256047 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing

17 0.2736392 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries

18 0.26871693 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

19 0.25842655 178 emnlp-2013-Success with Style: Using Writing Style to Predict the Success of Novels

20 0.25291136 35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.062), (9, 0.018), (18, 0.027), (22, 0.052), (30, 0.076), (37, 0.342), (51, 0.127), (66, 0.039), (71, 0.068), (75, 0.027), (96, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.72672045 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

Author: Thomas Scholz ; Stefan Conrad

2 0.676319 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

Author: Daniel Preotiuc-Pietro ; Trevor Cohn

Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.

3 0.45132098 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

Author: Wang Ling ; Chris Dyer ; Alan W Black ; Isabel Trancoso

Abstract: Compared to the edited genres that have played a central role in NLP research, microblog texts use a more informal register with nonstandard lexical items, abbreviations, and free orthographic variation. When confronted with such input, conventional text analysis tools often perform poorly. Normalization, replacing orthographically or lexically idiosyncratic forms with more standard variants, can improve performance. We propose a method for learning normalization rules from machine translations of a parallel corpus of microblog messages. To validate the utility of our approach, we evaluate extrinsically, showing that normalizing English tweets and then translating improves translation quality (compared to translating unnormalized text) using three standard web translation services as well as a phrase-based translation system trained on parallel microblog data.

4 0.45003378 123 emnlp-2013-Learning to Rank Lexical Substitutions

Author: Gyorgy Szarvas ; Robert Busa-Fekete ; Eyke Hullermeier

Abstract: The problem to replace a word with a synonym that fits well in its sentential context is known as the lexical substitution task. In this paper, we tackle this task as a supervised ranking problem. Given a dataset of target words, their sentential contexts and the potential substitutions for the target words, the goal is to train a model that accurately ranks the candidate substitutions based on their contextual fitness. As a key contribution, we customize and evaluate several learning-to-rank models to the lexical substitution task, including classification-based and regression-based approaches. On two datasets widely used for lexical substitution, our best models significantly advance the state-of-the-art.

5 0.44932488 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

Author: Qi Zhang ; Jin Qian ; Huan Chen ; Jihua Kang ; Xuanjing Huang

Abstract: Explanatory sentences are employed to clarify reasons, details, facts, and so on. High quality online product reviews usually include not only positive or negative opinions, but also a variety of explanations of why these opinions were given. These explanations can help readers get easily comprehensible information of the discussed products and aspects. Moreover, explanatory relations can also benefit sentiment analysis applications. In this work, we focus on the task of identifying subjective text segments and extracting their corresponding explanations from product reviews in discourse level. We propose a novel joint extraction method using first-order logic to model rich linguistic features and long distance constraints. Experimental results demonstrate the effectiveness of the proposed method.

6 0.44655299 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

7 0.44423065 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

8 0.44107649 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

9 0.44092071 143 emnlp-2013-Open Domain Targeted Sentiment

10 0.43864638 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

11 0.43805063 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

12 0.43760359 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach

13 0.43441272 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation

14 0.43367013 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

15 0.43349084 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

16 0.43266216 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

17 0.43234235 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training

18 0.43177751 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

19 0.43169475 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

20 0.4313505 152 emnlp-2013-Predicting the Presence of Discourse Connectives