
126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation


Source: pdf

Author: Yuanbin Wu ; Qi Zhang ; Xuanjing Huang ; Lide Wu

Abstract: Based on analysis of on-line review corpus we observe that most sentences have complicated opinion structures and they cannot be well represented by existing methods, such as frame-based and feature-based ones. In this work, we propose a novel graph-based representation for sentence level sentiment. An integer linear programming-based structural learning method is then introduced to produce the graph representations of input sentences. Experimental evaluations on a manually labeled Chinese corpus demonstrate the effectiveness of the proposed approach.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Based on analysis of on-line review corpus we observe that most sentences have complicated opinion structures and they cannot be well represented by existing methods, such as frame-based and feature-based ones. [sent-3, score-0.649]

2 An integer linear programming-based structural learning method is then introduced to produce the graph representations of input sentences. [sent-5, score-0.228]

3 Previous research on sentiment analysis has tackled the problem at various levels of granularity, including document, sentence, phrase and word (Pang et al. [sent-9, score-0.242]

4 They mainly focused on two directions: sentiment classification which detects the overall polarity of a text; sentiment related information extraction which tries to answer the questions like “who expresses what opinion on which target”. [sent-15, score-1.061]

5 Most of the current studies on the second direction assume that an opinion can be structured as a frame which is composed of a fixed number of slots. [sent-16, score-0.649]

6 Typical slots include opinion holder, opinion expression, and evaluation target. [sent-17, score-1.329]

7 A lot of important information about an opinion may be lost using those representation methods. [sent-24, score-0.69]

8 Based on the definition of opinion unit proposed by Hu and Liu (2004), from the first example, the information we can get is the author’s negative opinion about “interior” using an opinion expression “noisy”. [sent-28, score-2.053]

9 Besides that, an opinion expression may induce other opinions which are not expressed directly. [sent-34, score-0.801]

10 In example 3, the opinion expression is “good” whose 1http://reviews. [sent-35, score-0.727]

11 But the “software” which triggers the opinion expression “good” is also endowed with a positive opinion. [sent-43, score-0.727]

12 In practice, this induced opinion on “software” is actually more informative than its direct counterpart. [sent-44, score-0.649]

13 Mining those opinions may help to form a complete sentiment analysis result. [sent-45, score-0.28]

14 Example 4 is such a case that the whole positive comment of camera is expressed by a transition from a negative opinion to a positive one. [sent-48, score-0.782]

15 In order to address those issues, this paper describes a novel sentiment representation and analysis method. [sent-49, score-0.247]

16 The vertices are evaluation targets, opinion expressions, and modifiers of opinions. [sent-52, score-0.881]

17 Through the graph, various information on opinion expressions which is ignored by current representation methods can be well handled. [sent-55, score-0.766]

18 We propose a supervised structural learning method which takes a sentence as input and the proposed sentiment representation for it as output. [sent-58, score-0.319]

19 The inference algorithm is based on integer linear programming which helps to concisely and uniformly handle various properties of our sentiment representation. [sent-59, score-0.328]

20 In the graph, vertices are text spans in the sentences which are opinion expressions, evaluation targets, conditional clauses etc. [sent-66, score-0.785]

21 Two types of edges are included in the graph: (1) relations among opinion expressions and their modifiers; (2) relations among opinion expressions. [sent-67, score-1.681]

22 The second type of the edges captures the relations among individual opinions. [sent-69, score-0.228]

23 1 Individual Opinion Representation Let r be an opinion expression in a sentence, the representation unit for r is a set of relations {(r, dk)}. [sent-72, score-0.846]

24 For each relation (r, dk), dk is a modifier, which is a span of text specifying the change of r’s meaning. [sent-73, score-0.229]

25 The relations between modifier and opinion expression can be the type of any kind. [sent-74, score-0.873]

26 In this work, we mainly consider two basic types: opinion restriction and opinion expansion. [sent-75, score-0.649]

27 (r, dk) is an opinion expansion if r’s scope expands to dk, r induces another opinion on dk, or the opinion on dk is implicitly expressed by r. [sent-78, score-2.108]

28 Mining the opinion restrictions can help to get accurate meaning of an opinion, and the opinion expansions are useful to cover more indirect opinions. [sent-79, score-1.326]

29 As with previous sentiment representations, we actually consider a third type of modifier, in which dk is the evaluation target of r. [sent-80, score-0.463]

30 In this example, there are three opinion expressions: “good”, “sharp”, “slightly soft”. [sent-82, score-0.649]

31 The modifiers of “good” are “indoors” and “Focus accuracy”, where relation (“good”,“indoors”) is an opinion restriction because “indoors” is the condition under which “Focus accuracy” is good. [sent-83, score-0.821]

32 On the other hand, the relation (“sharp”, “little 3x optical zooms”) is an opinion expansion because the “sharp” opinion on “shot” implies a positive opinion on “little 3x optical zooms”. [sent-84, score-2.068]
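The individual opinion representation described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `OpinionUnit`, its methods, and the label constants are hypothetical names, and treating “Focus accuracy” and “shot” as evaluation targets is our reading of the Figure 1 example rather than something the extracted text states outright.

```python
from dataclasses import dataclass, field

# Relation labels follow the text; the constant names are hypothetical.
RESTRICTION = "restriction"
EXPANSION = "expansion"
TARGET = "target"


@dataclass
class OpinionUnit:
    """Representation unit for one opinion expression r: a set of relations (r, d_k)."""
    expression: str
    relations: set = field(default_factory=set)  # holds (modifier span, label) pairs

    def add(self, modifier: str, label: str) -> None:
        # Record one relation (r, d_k) together with its type.
        self.relations.add((modifier, label))

    def modifiers(self, label: str) -> list:
        # Return all modifiers attached to r with the given relation type.
        return sorted(m for m, l in self.relations if l == label)


# Figure 1 example, as far as it can be read from the sentences above.
good = OpinionUnit("good")
good.add("indoors", RESTRICTION)
good.add("Focus accuracy", TARGET)       # assumed to be the evaluation target

sharp = OpinionUnit("sharp")
sharp.add("shot", TARGET)                # assumed to be the evaluation target
sharp.add("little 3x optical zooms", EXPANSION)

print(good.modifiers(RESTRICTION))
print(sharp.modifiers(EXPANSION))
```

Because a modifier is stored as a plain text span, the same span can be attached to several opinion expressions, which matches the remark that a modifier may relate to more than one opinion expression.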

33 It is worth remarking that: 1) a modifier dk can relate to more than one opinion expression. [sent-85, score-0.878]

34 For example, multiple opinion expressions may share the same condition; 2) dk itself can employ a set of relations, although this case appears only occasionally. [sent-86, score-0.886]

35 The following is an example: Example 5: The camera wisely get rid of many redundant buttons. [sent-87, score-0.234]

36 In the example, “redundant buttons” is the evaluation target of opinion expression “wisely get rid of”, but itself is a relation between “redundant” and “buttons”. [sent-88, score-0.862]

37 Then we define two relations on adjacent pair ri, ri+1: coordination when the polarities of ri and ri+1 are consistent, and transition when they are opposite. [sent-92, score-0.315]

38 Those relations among ri form a set B called opinion thread. [sent-93, score-0.851]

39 In Figure 1, the opinion thread is: {(“good”, “sharp”), (“sharp”, “slightly soft”)}. [sent-94, score-0.873]
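A minimal sketch of how the opinion thread could be derived from adjacent opinion expressions, assuming each expression already carries a polarity. The function name `build_opinion_thread` and the polarity values assigned to the Figure 1 expressions are assumptions made for illustration.

```python
def build_opinion_thread(expressions):
    """Label each adjacent pair (r_i, r_{i+1}) of opinion expressions.

    expressions: list of (text, polarity) in sentence order, polarity in {+1, -1}.
    Returns a list of (r_i, r_{i+1}, label) triples.
    """
    thread = []
    for (left, left_pol), (right, right_pol) in zip(expressions, expressions[1:]):
        # Consistent polarities give "coordination", opposite ones "transition".
        label = "coordination" if left_pol == right_pol else "transition"
        thread.append((left, right, label))
    return thread


# Polarities below are assumed, not taken from the annotated corpus.
figure1 = [("good", +1), ("sharp", +1), ("slightly soft", -1)]
thread = build_opinion_thread(figure1)
print(thread)
```

Under these assumed polarities the pairs match the thread listed for Figure 1, with the first pair labeled as a coordination and the second as a transition.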

40 The whole sentiment representation for a sentence can be organized as a directed graph G = (V, E). [sent-96, score-0.356]

41 Vertex set V includes all opinion expressions and modifiers. [sent-97, score-0.725]

42 Edge set E collects both relations of each individual opinion and relations in opinion thread. [sent-98, score-1.481]

43 Compared with previous works, the advantages of using G as the sentiment representation are: 1) for individual opinions, the modifiers collect more information than the opinion expression alone. [sent-100, score-1.097]

44 (Footnote 3) We do not define any “label” on vertices: if two spans of text satisfy a relation in L, they are chosen to be vertices and an edge with a proper label will appear in E. [sent-101, score-0.25]

45 3 System Description To produce the representation graph G for a sentence, we need to extract candidate vertices and build the relations among them to get a graph structure. [sent-287, score-0.529]

46 In this section, we focus on the second task, and assume the vertices in the graph have already been correctly collected in the following formulation of algorithm. [sent-289, score-0.29]

47 A sentence is denoted by s; x are text spans which will be vertices of graph; xi is the ith vertex in x ordered by their positions in s. [sent-293, score-0.45]

48 For a set of vertices x, y is the graph of its sentiment representation, and e = (xi, xj) ∈ y is the directed edge from xi to xj in y. [sent-294, score-0.771]

49 Following the edge based factorization, the score of a graph is the sum of its edges’ scores, $score(x, y) = \sum_{(x_i, x_j) \in y} score(x_i, x_j) = \sum_{(x_i, x_j) \in y} \alpha^T f(x_i, x_j)$ (1), where f(xi, xj) is a high dimensional feature vector of the edge (xi, xj). [sent-297, score-0.3]
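The edge-factored score in (1) can be illustrated with a few lines of Python. This is a sketch only: the weight vector `alpha`, the three-dimensional `toy_features` function, and the integer vertex ids are all made up for the example and do not correspond to the paper's feature set.

```python
def dot(alpha, features):
    # Inner product alpha^T f for two equal-length sequences.
    return sum(a * f for a, f in zip(alpha, features))


def graph_score(edges, alpha, feature_fn):
    """Edge-factored score: sum of alpha^T f(x_i, x_j) over all edges of the graph."""
    return sum(dot(alpha, feature_fn(xi, xj)) for xi, xj in edges)


def toy_features(xi, xj):
    # Hypothetical features: [bias, 1 if the vertices are adjacent, distance].
    distance = abs(xi - xj)
    return [1.0, 1.0 if distance == 1 else 0.0, float(distance)]


alpha = [0.5, 2.0, -0.25]   # assumed weights
y = [(0, 1), (1, 3)]        # a graph given as a list of directed edges
total = graph_score(y, alpha, toy_features)
print(total)
```

Since the score decomposes over edges, the contribution of each candidate edge can be computed once and reused by whatever inference procedure searches over graphs.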

50 In our case, Y is all possible directed acyclic graphs of the given vertex set, whose number is exponential. [sent-307, score-0.336]

51 From individual opinion representation, each subgraph of G which takes an opinion expression as root is connected and acyclic. [sent-315, score-1.433]

52 Thus connectedness is guaranteed because opinion expressions are connected in the opinion thread; acyclicity is guaranteed by the fact that if a modifier is shared by different opinion expressions, the inedges from them always remain (directed) acyclic. [sent-316, score-2.197]

53 Each vertex can have at most one outedge labeled with coordination or transition. [sent-318, score-0.39]

54 The opinion thread B is a directed path in the graph. [sent-319, score-0.921]

55 In other words, the cases that a modifier connects to more than one opinion expression rarely occur compared with those vertices which have a single parent. [sent-324, score-0.931]

56 2 ILP Formulation Based on the property 3, we divide the inference algorithm into two steps: i) constructing G’s spanning tree (arborescence) with property 1 and 2; ii) finding additional non-tree edges as a post processing task. [sent-328, score-0.302]

57 Following the multicommodity flow formulation of maximum spanning tree (MST) problem in (Magnanti and Wolsey, 1994), the ILP for MST is: max. [sent-335, score-0.295]

58 $\sum_{i,j} y_{ij} = |V| - 1$ (4); $\sum_i f^u_{ij} - \sum_k f^u_{jk} = \delta^u_j, \; 1 \le u, j \le |V|$ (5); $\sum_k f^u_{0k} = 1, \; 1 \le u \le |V|$ (6); $f^u_{ij} \le y_{ij}, \; 1 \le u, j \le |V|, \; 0 \le i \le |V|$ (7); $f^u_{ij} \ge 0, \; 1 \le u, j \le |V|, \; 0 \le i \le |V|$ (8); $y_{ij} \in \{0, 1\}, \; 0 \le i, j \le |V|$ (9). [sent-338, score-0.335]

59 (9) In this formulation, yij is an edge indicator variable that (xi, xj) is a spanning tree edge when yij = 1, (xi, xj) is a non-tree edge when yij = 0. [sent-339, score-0.797]

60 Thus if the edges corresponding to those non-zero yij form a connected sub-graph, y is a well-formed spanning tree. [sent-342, score-0.309]

61 $f^u_{ij}$ indicates whether flow $f^u$ passes through edge (xi, xj). [sent-348, score-0.219]

62 The Kronecker delta $\delta^u_j$ in (5) guarantees that $f^u$ is only consumed by vertex xu, so $f^u$ is a well-formed path from the root to xu. [sent-350, score-0.315]
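The role of the flow variables can be illustrated without an ILP solver: a set of selected edges is a well-formed spanning tree when it has the right number of edges and every vertex can be reached from the root, which is what a path from the root to each xu amounts to. The sketch below is a plain graph check, not the ILP itself; the function name, the convention that nodes are numbered 0 to `num_nodes - 1` with 0 as the root, and the assumption that edges are distinct are ours.

```python
from collections import defaultdict, deque


def is_spanning_arborescence(num_nodes, edges, root=0):
    """Check that directed `edges` form a spanning tree rooted at `root`.

    Assumes nodes are 0..num_nodes-1 and edges are distinct (i, j) pairs.
    The edge-count test mirrors the cardinality constraint; the reachability
    test mirrors the requirement of a root-to-vertex path for every vertex.
    """
    if len(edges) != num_nodes - 1:
        return False
    children = defaultdict(list)
    for i, j in edges:
        children[i].append(j)
    # Breadth-first search from the root over the selected edges.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) == num_nodes


print(is_spanning_arborescence(4, [(0, 1), (1, 2), (1, 3)]))
print(is_spanning_arborescence(4, [(0, 1), (2, 3), (3, 2)]))
```

The second call has the right number of edges but leaves two vertices unreachable from the root, which is the kind of disconnected solution the flow constraints are meant to rule out.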

63 The following are our constraints: c1: Constraint on edges in opinion thread (10)(11). [sent-353, score-0.968]

64 From the definition of opinion thread, we impose a constraint on every vertex’s outedges in the opinion thread, which are labeled with “coordination” or “transition”. (Footnote 4) For simplicity, we overload the symbol y from the graph of the sentiment representation to its MST. [sent-354, score-1.679]

65 Then the following linear inequalities bound the number of outedges in the opinion thread (≤ 1) on each vertex: $q_j \le 1, \; 0 \le j \le |V|$ (11). [sent-358, score-0.979]

66 We also bound the number of evaluation targets for a vertex in a similar way. [sent-360, score-0.259]

67 From graph property 2, the opinion thread should be a directed path. [sent-365, score-1.065]

68 Two sets of additional variables are needed: $\{c_j, 0 \le j \le |V|\}$ and $\{h_j, 0 \le j \le |V|\}$, where $c_j = 1$ if an opinion thread starts at xj and $c_j = 0$ otherwise, and $h_j = \sum_i y_{ij} \cdot I_{ob}((i, j))$. [sent-367, score-0.285]

69 If the sum of cj is no more than 1, the opinion thread of the graph is a directed path. [sent-369, score-1.12]
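The path condition on the opinion thread can be checked directly on a labeled edge list. The sketch below is an illustration, not the paper's ILP constraints: it assumes the input edges already form a tree (so thread edges cannot form a cycle), and it assumes that a thread "starts" at a vertex that has an outgoing thread edge and no incoming one. The names `q`, `h` and `c` echo the variables in the text; the function name is hypothetical.

```python
THREAD_LABELS = {"coordination", "transition"}


def thread_is_directed_path(num_vertices, labeled_edges):
    """Check that the thread edges of an (assumed acyclic) graph form one directed path.

    labeled_edges: iterable of (i, j, label) with vertex ids in 0..num_vertices-1.
    """
    q = [0] * num_vertices  # number of outgoing thread edges per vertex
    h = [0] * num_vertices  # number of incoming thread edges per vertex
    for i, j, label in labeled_edges:
        if label in THREAD_LABELS:
            q[i] += 1
            h[j] += 1
    # At most one outgoing and one incoming thread edge per vertex.
    if any(v > 1 for v in q) or any(v > 1 for v in h):
        return False
    # c[j] = 1 when a thread starts at vertex j (assumed definition).
    c = [1 if q[j] == 1 and h[j] == 0 else 0 for j in range(num_vertices)]
    return sum(c) <= 1


connected = [(0, 1, "coordination"), (1, 2, "transition"), (1, 3, "restriction")]
split = [(0, 1, "coordination"), (2, 3, "transition")]
print(thread_is_directed_path(4, connected))
print(thread_is_directed_path(4, split))
```

The `split` example satisfies the bound of one outgoing thread edge per vertex but has two thread starts, so bounding outedges alone does not make the thread a single path.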

70 Assume solid lines are edges labeled with “coordination” and “transition”, and dotted lines are edges labeled with other types. [sent-395, score-0.25]

71 It shows that the constraints c1 are not sufficient for graph property 2: the edges in the opinion thread may not be connected. [sent-399, score-1.112]

72 [...] spanning tree problem, the integer constraints (9) on yij can be dropped. [sent-405, score-0.264]

73 We examine the case that a modifier attaches to different opinion expressions. [sent-410, score-0.717]

74 That often occurs as the result of the sharing of modifiers among adjacent opinion expressions. [sent-411, score-0.773]

75 We add those edges in the following heuristic way: if a vertex ri in the opinion thread does not have any modifier, we search the modifiers of its adjacent vertices ri+1, ri−1 in the opinion thread, and add edge (ri, d∗) where $d^* = \arg\max_{d \in S} score(r_i, d)$, and S is the set of modifiers of ri−1 and ri+1. [sent-412, score-2.341]
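The post-processing heuristic for shared modifiers can be sketched as follows. The function name, the dictionary layout for modifiers, and the `score` callable are hypothetical; in particular, `score` stands in for the learned edge score, and the toy values used here are invented.

```python
def add_shared_modifier_edges(thread, modifiers, score):
    """Attach a borrowed modifier to thread vertices that have none.

    thread: opinion expressions in thread order (r_1, r_2, ...).
    modifiers: dict mapping an expression to the list of its modifiers.
    score: callable (expression, modifier) -> float, a stand-in for the edge score.
    Returns the list of added non-tree edges (r_i, d*).
    """
    new_edges = []
    for idx, r in enumerate(thread):
        if modifiers.get(r):
            continue  # r already has at least one modifier
        # S: modifiers of the adjacent vertices r_{i-1} and r_{i+1} in the thread.
        candidates = []
        if idx > 0:
            candidates.extend(modifiers.get(thread[idx - 1], []))
        if idx + 1 < len(thread):
            candidates.extend(modifiers.get(thread[idx + 1], []))
        if candidates:
            best = max(candidates, key=lambda d: score(r, d))
            new_edges.append((r, best))
    return new_edges


# Toy data with made-up scores.
toy_thread = ["r1", "r2", "r3"]
toy_modifiers = {"r1": ["d1"], "r3": ["d2", "d3"]}
toy_scores = {("r2", "d1"): 0.2, ("r2", "d2"): 0.9, ("r2", "d3"): 0.4}
added = add_shared_modifier_edges(
    toy_thread, toy_modifiers, lambda r, d: toy_scores.get((r, d), 0.0)
)
print(added)
```

The sketch looks only at the modifiers found by the tree step, so an edge added for one vertex is not propagated to its neighbours; whether the original system does the same is not stated in the extracted text.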

76 Table 1: Feature set. Feature Construction: For each vertex xi in the graph, we use 2 sets of features: inside features, which are extracted inside the text span of xi; outside features, which are extracted outside the text span of xi. [sent-416, score-0.398]

77 A vertex xi is described both as a word sequence (w0, w1, ..., wk) and as a character sequence (c0, c1, ..., cl), since the sentences are in Chinese. [sent-417, score-0.314]

78 In order to involve syntactic information, whether there is certain type of dependency relation between xi and xj is also used as a feature. [sent-420, score-0.301]

79 The annotators started from locating opinion expressions, and for each of them, they annotated other modifiers related to it. [sent-430, score-0.745]

80 Thus only mining the relations between opinion expressions and evaluation target is actually at risk of inaccurate and incomplete results. [sent-437, score-0.871]

81 In feature construction, we use an external Chinese sentiment lexicon which contains 4566 positive opinion words and 4370 negative opinion words. [sent-440, score-1.504]

82 We evaluate the system from the following aspects: 1) whether the structural information helps in mining opinion relations. [sent-448, score-0.761]

83 Those kinds of methods were used in previous opinion mining works (Wu et al. [sent-457, score-0.689]

84 In order to examine whether the complicated sentiment representation would disturb the classifier in finding relations between opinion expressions and its target, we evaluate the system by discarding the modifiers of opinion restriction and expansion from the corpus. [sent-471, score-1.87]

85 In the inference algorithm, we utilized the properties of graph G and adapted the basic multicommodity flow ILP to our specific task. [sent-476, score-0.318]

86 “MST” is the basic multicommodity flow formulation of maximum spanning tree; c1, c2, c3 are groups of constraints from Section 3. [sent-479, score-0.265]

87 By comparing with different constraint combinations, the constraints on opinion thread (c1, c3) are more effective than constraints on evaluation targets (c2). [sent-486, score-0.977]

88 It is because opinion expressions are more important in the entire sentiment representation. [sent-487, score-0.931]

89 The main structure of a graph is clear once the relations between opinion expressions are correctly determined. [sent-488, score-0.912]

90 A possible reason is that the content of a vertex can be very complicated (a vertex even can be a clause), but the features surrounding the vertex are relatively simple and easy to identify (for example, a single preposition can identify a complex condition). [sent-493, score-0.657]

91 “In” represents the result of inside feature set; “In-s” is “In” without the external opinion lexicon feature; “Out” uses the outside feature set; “In+Out” uses both “In” and “Out”, “In+Out+Dep” adds the dependency feature. [sent-500, score-0.72]

92 The reason may be that for a modification on opinion expression (r, dk), we allow dk recursively has its own modifiers (Example 5). [sent-504, score-0.984]

93 Thus an opinion expression can be a modifier which brings difficulties to classifier. [sent-505, score-0.795]

94 Finally we conduct an experiment on vertex extraction using a standard sequential labeling method. [sent-508, score-0.219]

95 We use two criteria: 1) the vertex is correct if it is exactly the same as the ground truth (“E”); 2) the vertex is correct if it overlaps with the ground truth (“O”). [sent-532, score-0.438]
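The two matching criteria can be made concrete with a short sketch. It assumes vertices are represented as `(start, end)` offsets with an exclusive end, and that results are reported as precision and recall; both are assumptions for illustration, and the function names are hypothetical.

```python
def exact_match(pred, gold):
    # Criterion "E": the predicted span is identical to the ground-truth span.
    return pred == gold


def overlap_match(pred, gold):
    # Criterion "O": the two spans share at least one position (end exclusive).
    return pred[0] < gold[1] and gold[0] < pred[1]


def precision_recall(pred_spans, gold_spans, match):
    """Precision and recall of predicted spans under a given matching criterion."""
    matched_pred = sum(any(match(p, g) for g in gold_spans) for p in pred_spans)
    matched_gold = sum(any(match(p, g) for p in pred_spans) for g in gold_spans)
    precision = matched_pred / len(pred_spans) if pred_spans else 0.0
    recall = matched_gold / len(gold_spans) if gold_spans else 0.0
    return precision, recall


gold = [(0, 2), (5, 8)]
pred = [(0, 2), (6, 9), (10, 12)]
print(precision_recall(pred, gold, exact_match))
print(precision_recall(pred, gold, overlap_match))
```

In this toy case the span `(6, 9)` counts as wrong under the exact criterion but correct under the overlap criterion, so the overlap scores are at least as high as the exact ones.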

96 (2007) presented their work on extracting opinion units including: opinion holder, subject, aspect and evaluation. [sent-538, score-1.298]

97 (2009) also applied ILP with flow formulation for maximum spanning tree, besides, they also handled dependency parse trees involving high order features(sibling, grandparent), and with projective constraint. [sent-550, score-0.217]

98 Inspections on corpus show that the information ignored in previous sentiment representation can cause incorrect or incomplete mining results. [sent-552, score-0.287]

99 We consider opinion restriction, opinion expansions, relations between opinion expressions, and represent them with a directed graph. [sent-553, score-2.073]

100 Mining the peanut gallery: opinion extraction and semantic classification of product reviews. [sent-570, score-0.649]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('opinion', 0.649), ('thread', 0.224), ('vertex', 0.219), ('sentiment', 0.206), ('dk', 0.161), ('yij', 0.155), ('xj', 0.144), ('vertices', 0.136), ('graph', 0.109), ('modifiers', 0.096), ('ri', 0.096), ('edges', 0.095), ('xi', 0.095), ('cj', 0.09), ('fiuj', 0.09), ('mst', 0.087), ('flow', 0.084), ('edge', 0.081), ('coordination', 0.08), ('expression', 0.078), ('relations', 0.078), ('ilp', 0.077), ('sharp', 0.077), ('multicommodity', 0.077), ('expressions', 0.076), ('opinions', 0.074), ('structural', 0.072), ('camera', 0.072), ('kobayashi', 0.072), ('qj', 0.07), ('modifier', 0.068), ('transition', 0.061), ('hj', 0.061), ('spanning', 0.059), ('folder', 0.054), ('indoors', 0.054), ('jindal', 0.054), ('narayanan', 0.054), ('wisely', 0.054), ('iob', 0.052), ('inference', 0.048), ('fu', 0.048), ('directed', 0.048), ('integer', 0.047), ('rid', 0.046), ('buttons', 0.046), ('magnanti', 0.046), ('martins', 0.046), ('formulation', 0.045), ('restriction', 0.043), ('inside', 0.042), ('wk', 0.042), ('representation', 0.041), ('acyclic', 0.04), ('targets', 0.04), ('mining', 0.04), ('online', 0.039), ('connectedness', 0.036), ('dik', 0.036), ('freeway', 0.036), ('holder', 0.036), ('lide', 0.036), ('narrows', 0.036), ('outedges', 0.036), ('researches', 0.036), ('wolsey', 0.036), ('xiz', 0.036), ('yjk', 0.036), ('yuanbin', 0.036), ('zooms', 0.036), ('property', 0.035), ('comparative', 0.035), ('redundant', 0.034), ('hu', 0.034), ('wu', 0.033), ('relation', 0.033), ('expansion', 0.032), ('constraints', 0.032), ('slots', 0.031), ('argmy', 0.031), ('shanghai', 0.031), ('interior', 0.031), ('labeled', 0.03), ('connected', 0.03), ('tree', 0.03), ('bing', 0.029), ('dimensional', 0.029), ('graphs', 0.029), ('dependency', 0.029), ('get', 0.028), ('among', 0.028), ('riedel', 0.028), ('optical', 0.028), ('hassan', 0.028), ('dasgupta', 0.028), ('target', 0.028), ('individual', 0.027), ('reviews', 0.027), ('ct', 0.027), ('programming', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation

Author: Yuanbin Wu ; Qi Zhang ; Xuanjing Huang ; Lide Wu

Abstract: Based on analysis of on-line review corpus we observe that most sentences have complicated opinion structures and they cannot be well represented by existing methods, such as frame-based and feature-based ones. In this work, we propose a novel graph-based representation for sentence level sentiment. An integer linear programming-based structural learning method is then introduced to produce the graph representations of input sentences. Experimental evaluations on a manually labeled Chinese corpus demonstrate the effectiveness of the proposed approach.

2 0.16184449 120 emnlp-2011-Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

Author: Richard Socher ; Jeffrey Pennington ; Eric H. Huang ; Andrew Y. Ng ; Christopher D. Manning

Abstract: We introduce a novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions. Our method learns vector space representations for multi-word phrases. In sentiment prediction tasks these representations outperform other state-of-the-art approaches on commonly used datasets, such as movie reviews, without using any pre-defined sentiment lexica or polarity shifting rules. We also evaluate the model’s ability to predict sentiment distributions on a new dataset based on confessions from the experience project. The dataset consists of personal user stories annotated with multiple labels which, when aggregated, form a multinomial distribution that captures emotional reactions. Our algorithm can more accurately predict distributions over such labels compared to several competitive baselines.

3 0.1547185 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis

Author: Ainur Yessenalina ; Claire Cardie

Abstract: We present a general learning-based approach for phrase-level sentiment analysis that adopts an ordinal sentiment scale and is explicitly compositional in nature. Thus, we can model the compositional effects required for accurate assignment of phrase-level sentiment. For example, combining an adverb (e.g., “very”) with a positive polar adjective (e.g., “good”) produces a phrase (“very good”) with increased polarity over the adjective alone. Inspired by recent work on distributional approaches to compositionality, we model each word as a matrix and combine words using iterated matrix multiplication, which allows for the modeling of both additive and multiplicative semantic effects. Although the multiplication-based matrix-space framework has been shown to be a theoretically elegant way to model composition (Rudolph and Giesbrecht, 2010), training such models has to be done carefully: the optimization is nonconvex and requires a good initial starting point. This paper presents the first such algorithm for learning a matrix-space model for semantic composition. In the context of the phrase-level sentiment analysis task, our experimental results show statistically significant improvements in performance over a bagof-words model.

4 0.14505936 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs

Author: Samuel Brody ; Nicholas Diakopoulos

Abstract: We present an automatic method which leverages word lengthening to adapt a sentiment lexicon specifically for Twitter and similar social messaging networks. The contributions of the paper are as follows. First, we call attention to lengthening as a widespread phenomenon in microblogs and social messaging, and demonstrate the importance of handling it correctly. We then show that lengthening is strongly associated with subjectivity and sentiment. Finally, we present an automatic method which leverages this association to detect domain-specific sentiment- and emotionbearing words. We evaluate our method by comparison to human judgments, and analyze its strengths and weaknesses. Our results are of interest to anyone analyzing sentiment in microblogs and social networks, whether for research or commercial purposes.

5 0.14252537 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums

Author: Li Wang ; Marco Lui ; Su Nam Kim ; Joakim Nivre ; Timothy Baldwin

Abstract: Online discussion forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search/browse over historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. The discourse structure of web forum threads, in the form of labelled dependency relationships between posts, has the potential to greatly improve information access over web forum archives. In this paper, we present the task of parsing user forum threads to determine the labelled dependencies between posts. Three methods, including a dependency parsing approach, are proposed to jointly classify the links (relationships) between posts and the dialogue act (type) of each link. The proposed methods significantly surpass an informed baseline. We also experiment with “in situ” classification of evolving threads, and establish that our best methods are able to perform equivalently well over partial threads as complete threads.

6 0.14123198 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

7 0.12055632 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification

8 0.096747711 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms

9 0.092056051 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models

10 0.084741041 142 emnlp-2011-Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities

11 0.077147841 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

12 0.075363614 17 emnlp-2011-Active Learning with Amazon Mechanical Turk

13 0.068980433 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

14 0.068737894 148 emnlp-2011-Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.

15 0.06749481 117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs

16 0.062198836 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search

17 0.058368802 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

18 0.058138147 128 emnlp-2011-Structured Relation Discovery using Generative Models

19 0.058058582 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling

20 0.05626436 45 emnlp-2011-Dual Decomposition with Many Overlapping Components


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.208), (1, -0.136), (2, 0.037), (3, 0.12), (4, 0.296), (5, -0.057), (6, 0.09), (7, -0.049), (8, 0.024), (9, -0.02), (10, 0.018), (11, 0.018), (12, 0.039), (13, 0.022), (14, 0.096), (15, 0.033), (16, -0.058), (17, -0.009), (18, 0.145), (19, -0.08), (20, -0.007), (21, 0.041), (22, -0.21), (23, -0.094), (24, 0.097), (25, 0.056), (26, -0.07), (27, -0.131), (28, 0.009), (29, 0.019), (30, 0.02), (31, 0.094), (32, -0.048), (33, -0.027), (34, -0.014), (35, -0.163), (36, 0.066), (37, 0.059), (38, -0.01), (39, -0.021), (40, -0.062), (41, 0.097), (42, -0.138), (43, -0.079), (44, 0.074), (45, -0.053), (46, -0.006), (47, -0.048), (48, 0.128), (49, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96628499 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation

Author: Yuanbin Wu ; Qi Zhang ; Xuanjing Huang ; Lide Wu

Abstract: Based on analysis of on-line review corpus we observe that most sentences have complicated opinion structures and they cannot be well represented by existing methods, such as frame-based and feature-based ones. In this work, we propose a novel graph-based representation for sentence level sentiment. An integer linear programming-based structural learning method is then introduced to produce the graph representations of input sentences. Experimental evaluations on a manually labeled Chinese corpus demonstrate the effectiveness of the proposed approach.

2 0.55838615 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs

Author: Samuel Brody ; Nicholas Diakopoulos

Abstract: We present an automatic method which leverages word lengthening to adapt a sentiment lexicon specifically for Twitter and similar social messaging networks. The contributions of the paper are as follows. First, we call attention to lengthening as a widespread phenomenon in microblogs and social messaging, and demonstrate the importance of handling it correctly. We then show that lengthening is strongly associated with subjectivity and sentiment. Finally, we present an automatic method which leverages this association to detect domain-specific sentiment- and emotionbearing words. We evaluate our method by comparison to human judgments, and analyze its strengths and weaknesses. Our results are of interest to anyone analyzing sentiment in microblogs and social networks, whether for research or commercial purposes.

3 0.55776709 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms

Author: Song Feng ; Ritwik Bose ; Yejin Choi

Abstract: In this paper, we introduce a connotation lexicon, a new type of lexicon that lists words with connotative polarity, i.e., words with positive connotation (e.g., award, promotion) and words with negative connotation (e.g., cancer, war). Connotation lexicons differ from much studied sentiment lexicons: the latter concerns words that express sentiment, while the former concerns words that evoke or associate with a specific polarity of sentiment. Understanding the connotation of words would seem to require common sense and world knowledge. However, we demonstrate that much of the connotative polarity of words can be inferred from natural language text in a nearly unsupervised manner. The key linguistic insight behind our approach is selectional preference of connotative predicates. We present graphbased algorithms using PageRank and HITS that collectively learn connotation lexicon together with connotative predicates. Our empirical study demonstrates that the resulting connotation lexicon is of great value for sentiment analysis complementing existing sentiment lexicons.

4 0.55548543 120 emnlp-2011-Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

Author: Richard Socher ; Jeffrey Pennington ; Eric H. Huang ; Andrew Y. Ng ; Christopher D. Manning

Abstract: We introduce a novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions. Our method learns vector space representations for multi-word phrases. In sentiment prediction tasks these representations outperform other state-of-the-art approaches on commonly used datasets, such as movie reviews, without using any pre-defined sentiment lexica or polarity shifting rules. We also evaluate the model’s ability to predict sentiment distributions on a new dataset based on confessions from the experience project. The dataset consists of personal user stories annotated with multiple labels which, when aggregated, form a multinomial distribution that captures emotional reactions. Our algorithm can more accurately predict distributions over such labels compared to several competitive baselines.

5 0.52480686 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis

Author: Ainur Yessenalina ; Claire Cardie

Abstract: We present a general learning-based approach for phrase-level sentiment analysis that adopts an ordinal sentiment scale and is explicitly compositional in nature. Thus, we can model the compositional effects required for accurate assignment of phrase-level sentiment. For example, combining an adverb (e.g., “very”) with a positive polar adjective (e.g., “good”) produces a phrase (“very good”) with increased polarity over the adjective alone. Inspired by recent work on distributional approaches to compositionality, we model each word as a matrix and combine words using iterated matrix multiplication, which allows for the modeling of both additive and multiplicative semantic effects. Although the multiplication-based matrix-space framework has been shown to be a theoretically elegant way to model composition (Rudolph and Giesbrecht, 2010), training such models has to be done carefully: the optimization is nonconvex and requires a good initial starting point. This paper presents the first such algorithm for learning a matrix-space model for semantic composition. In the context of the phrase-level sentiment analysis task, our experimental results show statistically significant improvements in performance over a bagof-words model.

6 0.50265002 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums

7 0.43768835 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

8 0.36205295 148 emnlp-2011-Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.

9 0.3433677 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification

10 0.333327 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models

11 0.3197819 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

12 0.30148974 142 emnlp-2011-Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities

13 0.28977251 27 emnlp-2011-Classifying Sentences as Speech Acts in Message Board Posts

14 0.28595817 117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs

15 0.26814255 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization

16 0.25904641 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling

17 0.25824517 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge

18 0.25731644 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search

19 0.25037634 72 emnlp-2011-Improved Transliteration Mining Using Graph Reinforcement

20 0.24903116 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(15, 0.023), (23, 0.131), (26, 0.206), (36, 0.041), (37, 0.03), (45, 0.057), (54, 0.03), (57, 0.013), (62, 0.031), (64, 0.027), (66, 0.017), (69, 0.032), (79, 0.044), (82, 0.033), (87, 0.013), (90, 0.011), (96, 0.082), (97, 0.037), (98, 0.055)]
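
The topic line above is a sparse list of (topicId, topicWeight) pairs. As a minimal sketch of how such a line could be read, assuming it is kept as a plain string (the helper name `parse_topics` is hypothetical and not part of the page's tooling), the pairs can be parsed and ranked by weight:

```python
import ast

# Topic line copied verbatim from the listing: (topicId, topicWeight) pairs.
raw = (
    "[(15, 0.023), (23, 0.131), (26, 0.206), (36, 0.041), (37, 0.03), "
    "(45, 0.057), (54, 0.03), (57, 0.013), (62, 0.031), (64, 0.027), "
    "(66, 0.017), (69, 0.032), (79, 0.044), (82, 0.033), (87, 0.013), "
    "(90, 0.011), (96, 0.082), (97, 0.037), (98, 0.055)]"
)


def parse_topics(text):
    """Parse a '[(topicId, weight), ...]' string into a {topicId: weight} dict.

    Hypothetical helper for illustration only.
    """
    return {int(topic): float(weight) for topic, weight in ast.literal_eval(text)}


topics = parse_topics(raw)

# Rank topics by weight, heaviest first, and keep the top three.
top3 = sorted(topics.items(), key=lambda kv: kv[1], reverse=True)[:3]

# Sum of the listed weights; a value below 1 suggests some topics are not shown.
total = sum(topics.values())

print(top3)
print(round(total, 3))
```

Topic 26 carries the largest listed weight (0.206). The listed weights sum to less than 1, which suggests that low-weight topics are omitted from the line, although the page does not say so. The page also does not state how simValue is derived from these vectors, so the sketch stops at parsing.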

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82536328 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation

Author: Yuanbin Wu ; Qi Zhang ; Xuanjing Huang ; Lide Wu

Abstract: Based on analysis of on-line review corpus we observe that most sentences have complicated opinion structures and they cannot be well represented by existing methods, such as frame-based and feature-based ones. In this work, we propose a novel graph-based representation for sentence level sentiment. An integer linear programming-based structural learning method is then introduced to produce the graph representations of input sentences. Experimental evaluations on a manually labeled Chinese corpus demonstrate the effectiveness of the proposed approach.

2 0.64911819 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

Author: Kevin Gimpel ; Noah A. Smith

Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and Urdu-English translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.

3 0.62854189 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction

Author: Sebastian Riedel ; Andrew McCallum

Abstract: Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches are cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-the-art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011 Infectious Diseases task.

4 0.62609088 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis

Author: Ainur Yessenalina ; Claire Cardie

Abstract: We present a general learning-based approach for phrase-level sentiment analysis that adopts an ordinal sentiment scale and is explicitly compositional in nature. Thus, we can model the compositional effects required for accurate assignment of phrase-level sentiment. For example, combining an adverb (e.g., “very”) with a positive polar adjective (e.g., “good”) produces a phrase (“very good”) with increased polarity over the adjective alone. Inspired by recent work on distributional approaches to compositionality, we model each word as a matrix and combine words using iterated matrix multiplication, which allows for the modeling of both additive and multiplicative semantic effects. Although the multiplication-based matrix-space framework has been shown to be a theoretically elegant way to model composition (Rudolph and Giesbrecht, 2010), training such models has to be done carefully: the optimization is nonconvex and requires a good initial starting point. This paper presents the first such algorithm for learning a matrix-space model for semantic composition. In the context of the phrase-level sentiment analysis task, our experimental results show statistically significant improvements in performance over a bag-of-words model.

5 0.62265217 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

Author: Keith Hall ; Ryan McDonald ; Jason Katz-Brown ; Michael Ringgaard

Abstract: We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation.

6 0.62216341 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

7 0.6192733 136 emnlp-2011-Training a Parser for Machine Translation Reordering

8 0.61739588 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms

9 0.61168963 128 emnlp-2011-Structured Relation Discovery using Generative Models

10 0.61121565 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models

11 0.61025935 70 emnlp-2011-Identifying Relations for Open Information Extraction

12 0.60925043 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests

13 0.60716116 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing

14 0.60669708 120 emnlp-2011-Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

15 0.60621274 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search

16 0.60533148 17 emnlp-2011-Active Learning with Amazon Mechanical Turk

17 0.60456842 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

18 0.60354358 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases

19 0.60328388 116 emnlp-2011-Robust Disambiguation of Named Entities in Text

20 0.60293627 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding