emnlp emnlp2010 emnlp2010-120 knowledge-graph by maker-knowledge-mining

120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions


Source: pdf

Author: Ahmed Hassan ; Vahed Qazvinian ; Dragomir Radev

Abstract: Mining sentiment from user generated content is a very important task in Natural Language Processing. An example of such content is threaded discussions which act as a very important tool for communication and collaboration in the Web. Threaded discussions include e-mails, e-mail lists, bulletin boards, newsgroups, and Internet forums. Most of the work on sentiment analysis has been centered around finding the sentiment toward products or topics. In this work, we present a method to identify the attitude of participants in an online discussion toward one another. This would enable us to build a signed network representation of participant interaction where every edge has a sign that indicates whether the interaction is positive or negative. This is different from most of the research on social networks that has focused almost exclusively on positive links. The method is experimentally tested using a manually labeled set of discussion posts. The results show that the proposed method is capable of identifying attitudinal sentences, and their signs, with high accuracy and that it outperforms several other baselines.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Most of the work on sentiment analysis has been centered around finding the sentiment toward products or topics. [sent-6, score-0.246]

2 In this work, we present a method to identify the attitude of participants in an online discussion toward one another. [sent-7, score-0.955]

3 This would enable us to build a signed network representation of participant interaction where every edge has a sign that indicates whether the interaction is positive or negative. [sent-8, score-0.35]

4 This is different from most of the research on social networks that has focused almost exclusively on positive links. [sent-9, score-0.184]

5 A new application of sentiment mining is to automatically identify attitudes between participants in an online discussion. [sent-16, score-0.425]

6 An automatic tool to identify attitudes will enable us to build a signed network representation of participant interaction in which the interaction between two participants is represented using a positive or a negative edge. [sent-17, score-0.541]

7 Even though using signed edges in social network studies is clearly important, most of the social networks research has focused only on positive links between entities. [sent-18, score-0.391]

8 Although similar, identifying sentences that display an attitude in discussions is different from identifying opinionated sentences. [sent-24, score-0.944]

9 A sentence may display an opinion about some topic (e.g., the price of a camera) and yet have no attitude toward the other participants in the discussion. [sent-27, score-0.81]

10 For instance, in the following discussion, Alice’s sentence expresses her opinion against something, yet no attitude toward the recipient of the sentence, Bob. [sent-28, score-0.899]

11 Alice: “You know what, he turned out to be a great disappointment” Bob: “You are completely unqualified to judge this great person” However, Bob shows a strong attitude toward Alice. [sent-29, score-0.777]

12 In this work, we look at ways to predict whether a sentence displays an attitude toward the text recipient. [sent-30, score-0.8]

13 An attitude is the mental position of one participant with regard to another participant. [sent-31, score-0.708]

14 In Section 2 we review some of the related prior work on identifying polarized words and subjectivity analysis. [sent-39, score-0.461]

15 2 Related Work Identifying the polarity of individual words is a well studied problem. [sent-43, score-0.204]

16 In previous work, Hatzivassiloglou and McKeown (1997) propose a method to identify the polarity of adjectives. [sent-44, score-0.248]

17 Their method can label simple in “simple and well-received” as the same orientation and simplistic in “simplistic but well-received” as the opposite orientation of well-received. [sent-46, score-0.204]

18 Then, from an energy minimization point of view, they propose that, just as neighboring electrons tend to have the same spin direction, neighboring words tend to have the same polarity orientation. [sent-56, score-0.287]

19 Specifically, Hu and Liu (2004) use WordNet synonyms and antonyms to predict the polarity of any given word with unknown polarity. [sent-59, score-0.298]

20 They label each word with the polarity of its synonyms and the opposite polarity of its antonyms. [sent-60, score-0.44]

21 Another approach (Kamps et al., 2004) uses a network of WordNet synonyms to find the shortest path between any given word and the words “good” and “bad”. [sent-63, score-0.206]
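
As a hedged illustration of this shortest-path idea, the sketch below scores a word by comparing its graph distance to “good” and to “bad”. The adjacency list is a toy assumption invented for the example; it is not the WordNet graph used in the cited work.

```python
from collections import deque

# Toy synonym adjacency list (illustrative only, not the real WordNet graph).
SYNONYM_GRAPH = {
    "good": ["fine", "great"],
    "fine": ["good", "decent"],
    "great": ["good", "terrific"],
    "decent": ["fine", "so-so"],
    "terrific": ["great"],
    "bad": ["poor", "awful"],
    "poor": ["bad", "weak"],
    "awful": ["bad", "terrible"],
    "weak": ["poor", "so-so"],
    "terrible": ["awful"],
    "so-so": ["decent", "weak"],
}

def graph_distance(graph, source, target):
    """Breadth-first search distance between two words; None if unreachable."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        word, dist = frontier.popleft()
        if word == target:
            return dist
        for neighbor in graph.get(word, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return None

def kamps_style_polarity(word, graph=SYNONYM_GRAPH):
    """Positive if the word is closer to 'good' than to 'bad', negative if the reverse."""
    d_good = graph_distance(graph, word, "good")
    d_bad = graph_distance(graph, word, "bad")
    if d_good is None or d_bad is None:
        return "unknown"
    if d_good == d_bad:
        return "neutral"
    return "positive" if d_good < d_bad else "negative"

print(kamps_style_polarity("terrific"))  # positive (closer to "good")
print(kamps_style_polarity("terrible"))  # negative (closer to "bad")
```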

22 Kim and Hovy (2004) used WordNet synonyms and antonyms to expand two lists of positive and negative seed words. [sent-64, score-0.248]

23 All the work mentioned above focuses on the task of identifying the polarity of individual words. [sent-67, score-0.254]

24 Our proposed work, in contrast, identifies attitudes in sentences that appear in online discussions. [sent-68, score-0.236]

25 Prior work on subjectivity analysis consists of two main categories: the first category is concerned with identifying the subjectivity of individual phrases and words, regardless of the sentence and context they appear in (Wiebe, 2000; Hatzivassiloglou and Wiebe, 2000; Banea et al. [sent-70, score-0.221]

26 A discussion sentence may display an opinion about some topic yet no attitude. [sent-78, score-0.196]

27 Moreover, extracting attitudes from online discussions is different from targeting subjective expressions (Josef Ruppenhofer and Wiebe, 2008; Kim and Hovy, 2004). [sent-80, score-0.245]

28 A very detailed survey that covers techniques and approaches in sentiment analysis and opinion mining can be found in (Pang and Lee, 2008). [sent-83, score-0.204]

29 Huang et al. (2007) used an SVM classifier to extract (thread-title, reply) pairs as chat knowledge from online discussion forums to support the construction of a chatbot for a certain domain. [sent-87, score-0.199]

30 3 Problem Definition Assume we have a set of sentences exchanged between participants in an online discussion. [sent-91, score-0.184]

31 Our objective is to distinguish sentences that display an attitude from the text writer toward the text recipient from those that do not. [sent-92, score-0.884]

32 An attitude is the mental position of one participant with regard to another participant. [sent-93, score-0.679]

33 An attitude may not be directly observable, but is rather inferred from what participants say to one another. [sent-94, score-0.679]

34 Strategies for showing a positive attitude may include agreement and praise, while strategies for showing a negative attitude may include disagreement, insults, and negative slang. [sent-96, score-1.625]

35 After identifying sentences that display an attitude, we also predict the sign (positive or negative) of that attitude. [sent-97, score-0.234]

36 4 Approach In this section, we describe a model which, given a sentence, predicts whether it carries an attitude from the text writer toward the text recipient or not. [sent-98, score-0.824]

37 Any given piece of text exchanged between two participants in a discussion could carry an attitude toward the text recipient, an attitude towards the topic, or no attitude at all. [sent-99, score-2.206]

38 As we are only interested in attitudes between participants, we limit our study to sentences that use second person pronouns. [sent-100, score-0.233]

39 Second person pronouns are usually used in conversational genres to indicate that the text writer is addressing the text recipient. [sent-101, score-0.213]

40 We examine these fragments to identify the polarity of every word in the sentence. [sent-103, score-0.337]

41 The existence of polarized words in any sentence is an important indicator of whether it carries an attitude or not. [sent-106, score-1.108]

42 1 Word Polarity Identification Identifying the polarity of words is an important step for our method. [sent-116, score-0.204]

43 Let S+ and S− be two sets of vertices representing seed words that are already labeled as either positive or negative, respectively. [sent-126, score-0.182]
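
The extract does not spell out how polarity propagates from these seed sets, so the sketch below shows one common graph-based option only: a simple label-propagation pass in which each unlabeled word takes the majority polarity of its already-labeled neighbors. The word graph and the seed sets are toy assumptions, not the resources or the exact algorithm used in the paper.

```python
# Toy word-relatedness graph and seed sets (illustrative assumptions only).
WORD_GRAPH = {
    "helpful": ["useful", "kind"],
    "useful": ["helpful"],
    "kind": ["helpful", "gentle"],
    "gentle": ["kind"],
    "rude": ["insulting", "harsh"],
    "insulting": ["rude"],
    "harsh": ["rude", "severe"],
    "severe": ["harsh"],
}
S_POS = {"helpful"}   # S+ : seed words already labeled positive
S_NEG = {"rude"}      # S- : seed words already labeled negative

def propagate_polarity(graph, pos_seeds, neg_seeds, iterations=5):
    """Iteratively label each word with the majority polarity of its labeled neighbors."""
    labels = {w: +1 for w in pos_seeds}
    labels.update({w: -1 for w in neg_seeds})
    for _ in range(iterations):
        for word, neighbors in graph.items():
            if word in labels:
                continue
            votes = sum(labels.get(n, 0) for n in neighbors)
            if votes != 0:
                labels[word] = 1 if votes > 0 else -1
    return labels

print(propagate_polarity(WORD_GRAPH, S_POS, S_NEG))
# e.g. {'helpful': 1, 'rude': -1, 'useful': 1, 'kind': 1, 'gentle': 1,
#       'insulting': -1, 'harsh': -1, 'severe': -1}
```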

44 We use the method of (Wilson et al., 2005) to determine the contextual polarity of the identified words. [sent-134, score-0.204]

45 The set of features used to predict contextual polarity includes word, sentence, polarity, structure, and other features. [sent-135, score-0.234]

46 If we closely examine the sentence, we will notice that we are only interested in a part of the sentence that includes the second person pronoun “you”. [sent-141, score-0.249]

47 Examples of such patterns could use lexical items, part-of-speech (POS) tags, word polarity tags, and dependency relations. [sent-152, score-0.345]

48 We use three different patterns to represent each fragment. Lexical patterns: all polarized words are replaced with their corresponding polarity tag, and all other words are left as is. [sent-153, score-0.65]

49 Part-of-speech patterns: polarized words are replaced with their polarity tags and their POS tags. [sent-156, score-0.204]

50 Dependency grammar patterns: the shortest path connecting every second person pronoun to the closest polarized word is extracted. [sent-157, score-0.744]

51 The second person pronoun, the polarized word tag, and the types of the dependency relations along the path connecting them are used as a pattern. [sent-158, score-0.585]

52 Every polarized word is assigned to the closest second person pronoun in the dependency tree. [sent-160, score-0.595]

53 This is only useful for sentences that have polarized words. [sent-161, score-0.386]

54 We use text, part-of-speech tags, polarity tags, and dependency relations. [sent-163, score-0.245]
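
To make the three pattern types above concrete, the sketch below derives lexical, part-of-speech, and dependency-path patterns from one toy fragment. The exact pattern formats, and all of the annotations (POS tags, polarity tags, head indices, relation labels), are hand-written assumptions standing in for the output of a tagger, the polarity module, and a dependency parser.

```python
# Toy analyzed fragment for "you are completely ignorant"
# (annotations are hand-written assumptions, not real tagger/parser output).
TOKENS = [
    {"i": 0, "text": "you",        "pos": "PRP", "polarity": None,  "head": 3,  "rel": "nsubj"},
    {"i": 1, "text": "are",        "pos": "VBP", "polarity": None,  "head": 3,  "rel": "cop"},
    {"i": 2, "text": "completely", "pos": "RB",  "polarity": None,  "head": 3,  "rel": "advmod"},
    {"i": 3, "text": "ignorant",   "pos": "JJ",  "polarity": "NEG", "head": -1, "rel": "root"},
]

def lexical_pattern(tokens):
    # Polarized words become their polarity tag; everything else is kept as is.
    return " ".join(t["polarity"] or t["text"] for t in tokens)

def pos_pattern(tokens):
    # Polarized words become polarity+POS tags; other words become POS tags
    # (the treatment of non-polarized words here is an assumption).
    return " ".join(f'{t["polarity"]}_{t["pos"]}' if t["polarity"] else t["pos"] for t in tokens)

def dependency_pattern(tokens, pronoun="you"):
    # Relations along the head chain from the second person pronoun up to the
    # nearest polarized ancestor; a full implementation would search the whole tree.
    start = next(t for t in tokens if t["text"].lower() == pronoun)
    path, current = [], start
    while current["head"] != -1:
        path.append(current["rel"])
        current = tokens[current["head"]]
        if current["polarity"]:
            return f'{pronoun} -{"-".join(path)}-> {current["polarity"]}'
    return None

print(lexical_pattern(TOKENS))     # you are completely NEG
print(pos_pattern(TOKENS))         # PRP VBP RB NEG_JJ
print(dependency_pattern(TOKENS))  # you -nsubj-> NEG
```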

55 5 Identifying Sentences with Attitude We split our training data into two splits; the first containing all sentences that have an attitude and the second containing all sentences that do not have an attitude. [sent-180, score-0.759]

56 A standard machine learning classifier is then trained using those features to predict whether a given sentence has an attitude or not. [sent-192, score-0.781]
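
The extract names only “a standard machine learning classifier”. As a hedged sketch, the snippet below vectorizes per-sentence pattern counts and trains a linear SVM with scikit-learn; this is one reasonable instantiation, not the authors’ exact setup, and the pattern names and counts are invented for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Each sentence is represented by counts of the patterns it matches
# (pattern names, counts, and labels below are invented toy data).
train_features = [
    {"lex:you are NEG": 1, "pos:PRP VBP NEG_JJ": 1, "dep:you-nsubj->NEG": 1},
    {"lex:you POS point": 1, "pos:PRP POS_JJ NN": 1},
    {"lex:you can find it": 1},
]
train_labels = [1, 1, 0]  # 1 = has attitude, 0 = no attitude

vectorizer = DictVectorizer()
X = vectorizer.fit_transform(train_features)
classifier = LinearSVC()
classifier.fit(X, train_labels)

test = vectorizer.transform([{"dep:you-nsubj->NEG": 1}])
print(classifier.predict(test))  # predicted attitude label for a new sentence, e.g. [1]
```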

57 6 Identifying the Sign of an Attitude To determine the orientation of an attitude sentence, we tried two different methods. [sent-194, score-0.781]

58 The first method assumes that the orientation of an attitude sentence is directly related to the polarity of the words it contains. [sent-195, score-1.026]

59 If the sentence has both positive and negative words, we calculate the summation of the polarity scores of all positive words and that of all negative words. [sent-198, score-0.609]

60 The polarity score of a word indicates how strongly polarized that word is. [sent-199, score-0.287]
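
A minimal sketch of this first sign heuristic, assuming a word-to-score lookup produced by the word-polarity step (the scores below are invented):

```python
# Hypothetical signed polarity scores from the word-polarity step.
POLARITY_SCORES = {"great": 0.8, "unqualified": -0.9, "disappointment": -0.7}

def sign_by_score_sum(sentence_tokens, scores=POLARITY_SCORES):
    """Positive if the summed positive scores outweigh the summed negative scores."""
    pos = sum(scores[t] for t in sentence_tokens if scores.get(t, 0) > 0)
    neg = sum(-scores[t] for t in sentence_tokens if scores.get(t, 0) < 0)
    return "positive" if pos > neg else "negative"

print(sign_by_score_sum("you are completely unqualified to judge this great person".split()))
# negative: |-0.9| outweighs 0.8, even though a positive word is present
```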

61 The problem with this method is that it assumes that all polarized words in a sentence with an attitude target the text recipient. [sent-201, score-1.066]

62 For example, the sentence “You are completely unqualified to judge this great person” has a positive word, “great”, and a negative word, “unqualified”. [sent-203, score-0.271]

63 To solve this problem, we use another method that is based on the paths that connect polarized words to second person pronouns in a dependency parse tree. [sent-205, score-0.546]

64 For every positive word w, we identify the shortest path connecting it to every second person pronoun in the sentence; we then compute the average length of the shortest path connecting each positive word to its closest second person pronoun. [sent-206, score-1.092]

65 The sentence is classified as positive if the average length of the shortest path connecting positive words to the closest second person pronoun is smaller than the corresponding value for negative words. [sent-208, score-0.693]
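
A hedged sketch of this second method, using networkx over an undirected toy dependency graph; the edges and polarity labels are illustrative assumptions, and a real system would build the graph from a parser’s output.

```python
import networkx as nx

# Toy undirected dependency graph for
# "you are completely unqualified to judge this great person" (edges are assumptions).
EDGES = [("unqualified", "you"), ("unqualified", "are"), ("unqualified", "completely"),
         ("unqualified", "judge"), ("judge", "to"), ("judge", "person"),
         ("person", "this"), ("person", "great")]
POSITIVE, NEGATIVE = {"great"}, {"unqualified"}
SECOND_PERSON = {"you"}

graph = nx.Graph(EDGES)

def avg_distance_to_pronoun(words):
    """Average shortest-path length from each polarized word to its closest pronoun."""
    dists = [min(nx.shortest_path_length(graph, w, p) for p in SECOND_PERSON) for w in words]
    return sum(dists) / len(dists)

sign = "positive" if avg_distance_to_pronoun(POSITIVE) < avg_distance_to_pronoun(NEGATIVE) else "negative"
print(sign)  # negative: "unqualified" attaches to "you" more closely than "great" does
```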

66 The reason behind that is that participants usually quote other participants’ text when they reply to them. [sent-215, score-0.199]

67 This restriction allows us to identify the target of every post, and raises the probability that the post will display an attitude from its writer to its target. [sent-216, score-0.899]

68 We explained earlier how second person pronouns are used in discussion genres to indicate that the writer is targeting the text recipient. [sent-220, score-0.271]

69 Given a random sentence selected from some random discussion thread, the probability that the sentence does not have an attitude is significantly larger than the probability that it will have an attitude. [sent-221, score-0.799]

70 Hence, restricting our dataset to posts with quoted text and sentences with second person pronouns is very important to make sure that we will have a considerable amount of attitudinal sentences. [sent-222, score-0.279]

71 1 Annotation Scheme The goal of the annotation scheme is to distinguish sentences that display an attitude from those that do not. [sent-226, score-0.786]

72 Sentences could display either a negative or a positive attitude. [sent-227, score-0.249]

73 The first specifies whether the sentence displays an attitude or not. [sent-235, score-0.72]

74 The existence of an attitude was judged on a three point scale: attitude, unsure, and no-attitude. [sent-236, score-0.721]

75 If an attitude exists, annotators were asked to specify whether the attitude is positive or negative. [sent-238, score-1.455]

76 The number of sentences with an attitude was around 20% of the entire dataset. [sent-249, score-0.719]

77 The class imbalance caused by the small number of attitude sentences may hurt the performance of the learning algorithm (Provost, 2000). [sent-250, score-0.719]

78 To do this we down-sample the majority class by randomly selecting, without replacement, a number of sentences without an attitude that equals the number of sentences with an attitude. [sent-252, score-0.759]
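
A minimal sketch of this down-sampling step, assuming the labeled sentences have already been split by class (the example sentences are invented):

```python
import random

def downsample(attitude_sents, no_attitude_sents, seed=0):
    """Randomly keep as many no-attitude sentences as there are attitude sentences."""
    random.seed(seed)
    sampled = random.sample(no_attitude_sents, k=len(attitude_sents))  # without replacement
    return attitude_sents + sampled

balanced = downsample(["you are wrong"],
                      ["see section 2", "the camera is cheap", "it rained"])
print(len(balanced))  # 2: one attitude sentence plus one sampled no-attitude sentence
```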

79 2 Baselines The first baseline is based on the hypothesis that the existence of polarized words is a strong indicator that the sentence has an attitude. [sent-264, score-0.429]

80 As a result, we use the number of polarized words in the sentence, the percentage of polarized words among all words, and whether the sentence has polarized words with mixed or same sign as features to train an SVM classifier to detect attitude. [sent-265, score-1.156]
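
A hedged sketch of the first baseline’s feature extraction, assuming each sentence arrives with its polarized words already identified (the word-polarity lookup below is invented):

```python
def polarity_baseline_features(tokens, polarity):
    """Count-based features: number and fraction of polarized words, and sign mixture."""
    polarized = [polarity[t] for t in tokens if t in polarity]
    has_pos, has_neg = "+" in polarized, "-" in polarized
    return {
        "num_polarized": len(polarized),
        "pct_polarized": len(polarized) / len(tokens) if tokens else 0.0,
        "mixed_signs": int(has_pos and has_neg),
    }

toy_polarity = {"great": "+", "unqualified": "-"}  # invented word-polarity lookup
print(polarity_baseline_features(
    "you are completely unqualified to judge this great person".split(), toy_polarity))
# {'num_polarized': 2, 'pct_polarized': 0.222..., 'mixed_signs': 1}
```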

81 The second baseline is based on the proximity between the polarized words and the second person pronouns. [sent-266, score-0.456]

82 We assume that every polarized word is associated with the closest second person pronoun. [sent-267, score-0.541]

83 Let w be a polarized word, p(w) be the closest second person pronoun, and surf dist(w, p(w)) be the surface distance between w and p(w). [sent-268, score-0.504]

84 This baseline uses the minimum, maximum, and average of surf dist(w, p(w)) for all polarized words as features to train an SVM classifier to identify sentences with attitude. [sent-269, score-0.509]
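
Similarly, a minimal sketch of the surface-distance features for the second baseline; token positions stand in for surf dist, which is an assumption about how the distance is measured, and the word-polarity lookup is again invented.

```python
def surface_distance_features(tokens, polarity, pronouns={"you", "your"}):
    """Min/max/avg token distance from each polarized word to its closest 2nd-person pronoun."""
    pron_positions = [i for i, t in enumerate(tokens) if t.lower() in pronouns]
    dists = [min(abs(i - p) for p in pron_positions)
             for i, t in enumerate(tokens) if t in polarity]
    if not pron_positions or not dists:
        return {"min_dist": -1, "max_dist": -1, "avg_dist": -1}
    return {"min_dist": min(dists), "max_dist": max(dists),
            "avg_dist": sum(dists) / len(dists)}

toy_polarity = {"great": "+", "unqualified": "-"}  # invented lookup
print(surface_distance_features(
    "you are completely unqualified to judge this great person".split(), toy_polarity))
# {'min_dist': 3, 'max_dist': 7, 'avg_dist': 5.0}
```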

85 We assume that every polarized word is associated to the second person pronoun that is connected to it using the smallest shortest path. [sent-271, score-0.656]

86 The minimum, maximum, and average of this distance for all polarized words are used as features to train an SVM classifier. [sent-273, score-0.346]

87 3 Results and Discussion Figure 2 compares the accuracy, precision, and recall of the proposed method (ML), the polarity based classifier (POL), the surface distance based classifier (Surf Dist), and the dependency distance based classifier (Dep Dist). [sent-277, score-0.338]

88 It turns out that they tend to predict most sentences that have polarized words as sentences with attitude. [sent-282, score-0.456]

89 Dependency patterns perform best in terms of recall, while part-of-speech patterns outperform all others in terms of precision and accuracy. [sent-302, score-0.2]

90 The accuracy of the first model that only uses the count and scores of polarized words was 95%. [sent-313, score-0.346]

91 First, errors in predicting word polarity usually propagate and result in errors in attitude prediction. [sent-318, score-0.883]

92 The reasons behind incorrect word polarity predictions are ambiguity in word senses and infrequent words that have very few connections in the thesaurus. [sent-319, score-0.204]

93 A possible solution to this type of errors is to improve the word polarity identification module by including word sense disambiguation and adding more links to the words graph using glosses or co-occurrence statistics. [sent-320, score-0.234]

94 7 Conclusions We have shown that training a supervised Markov model of text, part-of-speech, and dependency patterns allows us to distinguish sentences with attitudes from sentences without attitude. [sent-398, score-0.339]

95 This model is more accurate than several other baselines that use features based on the existence of polarized words and the proximity between polarized words and second person pronouns, both in the text and in dependency trees. [sent-399, score-0.991]

96 This method allows us to extract signed social networks from multi-party online discussions. [sent-400, score-0.219]

97 It also allows us to study dynamics behind interactions in online discussions, the relation between text and social interactions, and how groups form and break in online discussions. [sent-402, score-0.181]

98 The slashdot zoo: mining a social network with negative edges. [sent-473, score-0.249]

99 Predicting positive and negative links in online social networks. [sent-477, score-0.33]

100 Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. [sent-556, score-0.388]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('attitude', 0.679), ('polarized', 0.346), ('polarity', 0.204), ('person', 0.11), ('orientation', 0.102), ('patterns', 0.1), ('sentiment', 0.098), ('depdist', 0.097), ('positive', 0.097), ('negative', 0.085), ('spin', 0.083), ('attitudes', 0.083), ('participants', 0.081), ('agr', 0.081), ('dist', 0.081), ('shortest', 0.077), ('hatzivassiloglou', 0.075), ('janyce', 0.073), ('wiebe', 0.07), ('signed', 0.069), ('pronoun', 0.068), ('display', 0.067), ('baselines', 0.066), ('subjectivity', 0.065), ('surfdist', 0.065), ('wordnet', 0.064), ('online', 0.063), ('dep', 0.061), ('discussions', 0.058), ('mining', 0.056), ('threaded', 0.055), ('pol', 0.055), ('social', 0.055), ('every', 0.055), ('writer', 0.054), ('network', 0.053), ('ml', 0.053), ('toward', 0.05), ('opinion', 0.05), ('identifying', 0.05), ('pronouns', 0.049), ('cong', 0.048), ('surf', 0.048), ('unqualified', 0.048), ('posts', 0.048), ('operator', 0.048), ('sign', 0.047), ('path', 0.044), ('connecting', 0.044), ('identify', 0.044), ('existence', 0.042), ('ignorant', 0.041), ('recipient', 0.041), ('orientations', 0.041), ('vasileios', 0.041), ('threads', 0.041), ('claims', 0.041), ('ruppenhofer', 0.041), ('txt', 0.041), ('sentence', 0.041), ('dependency', 0.041), ('subjective', 0.041), ('sentences', 0.04), ('discussion', 0.038), ('praise', 0.037), ('reply', 0.037), ('nasukawa', 0.037), ('precision', 0.037), ('forums', 0.035), ('neg', 0.035), ('seed', 0.034), ('opinions', 0.034), ('fragments', 0.034), ('synonyms', 0.032), ('networks', 0.032), ('andreevskaia', 0.032), ('antonyms', 0.032), ('attitudinal', 0.032), ('chatbot', 0.032), ('dependecy', 0.032), ('electron', 0.032), ('insults', 0.032), ('kanayama', 0.032), ('kunegis', 0.032), ('leskovec', 0.032), ('morinaga', 0.032), ('vahed', 0.032), ('somasundaran', 0.032), ('transition', 0.031), ('classifier', 0.031), ('targets', 0.03), ('notice', 0.03), ('closest', 0.03), ('links', 0.03), ('predict', 0.03), ('svm', 0.029), ('participant', 0.029), ('alice', 0.028), ('tetsuya', 0.028), ('kamps', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

Author: Ahmed Hassan ; Vahed Qazvinian ; Dragomir Radev

Abstract: Mining sentiment from user generated content is a very important task in Natural Language Processing. An example of such content is threaded discussions which act as a very important tool for communication and collaboration in the Web. Threaded discussions include e-mails, e-mail lists, bulletin boards, newsgroups, and Internet forums. Most of the work on sentiment analysis has been centered around finding the sentiment toward products or topics. In this work, we present a method to identify the attitude of participants in an online discussion toward one another. This would enable us to build a signed network representation of participant interaction where every edge has a sign that indicates whether the interaction is positive or negative. This is different from most of the research on social networks that has focused almost exclusively on positive links. The method is experimentally tested using a manually labeled set of discussion posts. The results show that the proposed method is capable of identifying attitudinal sentences, and their signs, with high accuracy and that it outperforms several other baselines.

2 0.13548794 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie

Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.

3 0.10420607 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1

4 0.096329242 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

Author: Amit Goyal ; Ellen Riloff ; Hal Daume III

Abstract: In the 1980s, plot units were proposed as a conceptual knowledge structure for representing and summarizing narrative stories. Our research explores whether current NLP technology can be used to automatically produce plot unit representations for narrative text. We create a system called AESOP that exploits a variety of existing resources to identify affect states and applies “projection rules” to map the affect states onto the characters in a story. We also use corpus-based techniques to generate a new type of affect knowledge base: verbs that impart positive or negative states onto their patients (e.g., being eaten is an undesirable state, but being fed is a desirable state). We harvest these “patient polarity verbs” from a Web corpus using two techniques: co-occurrence with Evil/Kind Agent patterns, and bootstrapping over conjunctions of verbs. We evaluate the plot unit representations produced by our system on a small collection of Aesop’s fables.

5 0.090286568 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

Author: Niklas Jakob ; Iryna Gurevych

Abstract: In this paper, we focus on the opinion target extraction as part of the opinion mining task. We model the problem as an information extraction task, which we address based on Conditional Random Fields (CRF). As a baseline we employ the supervised algorithm by Zhuang et al. (2006), which represents the state-of-the-art on the employed data. We evaluate the algorithms comprehensively on datasets from four different domains annotated with individual opinion target instances on a sentence level. Furthermore, we investigate the performance of our CRF-based approach and the baseline in a single- and cross-domain opinion target extraction setting. Our CRF-based approach improves the performance by 0.077, 0.126, 0.071 and 0. 178 regarding F-Measure in the single-domain extraction in the four domains. In the crossdomain setting our approach improves the performance by 0.409, 0.242, 0.294 and 0.343 regarding F-Measure over the baseline.

6 0.087087579 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

7 0.086648703 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

8 0.080534719 112 emnlp-2010-Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

9 0.079159707 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid

10 0.074127302 20 emnlp-2010-Automatic Detection and Classification of Social Events

11 0.062637106 51 emnlp-2010-Function-Based Question Classification for General QA

12 0.058781002 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

13 0.051404346 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

14 0.048458945 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

15 0.047603853 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

16 0.047321841 61 emnlp-2010-Improving Gender Classification of Blog Authors

17 0.045857981 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

18 0.045185708 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

19 0.044279132 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

20 0.042110961 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.175), (1, 0.133), (2, -0.107), (3, 0.019), (4, 0.088), (5, -0.033), (6, 0.171), (7, -0.025), (8, 0.027), (9, -0.012), (10, -0.027), (11, -0.044), (12, 0.068), (13, -0.146), (14, 0.023), (15, 0.081), (16, 0.02), (17, 0.096), (18, -0.136), (19, -0.03), (20, -0.183), (21, -0.004), (22, -0.136), (23, -0.15), (24, -0.001), (25, -0.031), (26, 0.012), (27, 0.065), (28, 0.103), (29, -0.147), (30, -0.035), (31, 0.264), (32, 0.035), (33, 0.029), (34, -0.025), (35, 0.155), (36, 0.12), (37, 0.041), (38, 0.051), (39, 0.08), (40, 0.046), (41, -0.088), (42, -0.054), (43, -0.019), (44, 0.08), (45, 0.004), (46, -0.06), (47, 0.103), (48, -0.026), (49, -0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92625242 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

Author: Ahmed Hassan ; Vahed Qazvinian ; Dragomir Radev

Abstract: Mining sentiment from user generated content is a very important task in Natural Language Processing. An example of such content is threaded discussions which act as a very important tool for communication and collaboration in the Web. Threaded discussions include e-mails, e-mail lists, bulletin boards, newsgroups, and Internet forums. Most of the work on sentiment analysis has been centered around finding the sentiment toward products or topics. In this work, we present a method to identify the attitude of participants in an online discussion toward one another. This would enable us to build a signed network representation of participant interaction where every edge has a sign that indicates whether the interaction is positive or negative. This is different from most of the research on social networks that has focused almost exclusively on positive links. The method is experimentally tested using a manually labeled set of discussion posts. The results show that the proposed method is capable of identifying attitudinal sentences, and their signs, with high accuracy and that it outperforms several other baselines.

2 0.72702473 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie

Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.

3 0.55596238 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

Author: Amit Goyal ; Ellen Riloff ; Hal Daume III

Abstract: In the 1980s, plot units were proposed as a conceptual knowledge structure for representing and summarizing narrative stories. Our research explores whether current NLP technology can be used to automatically produce plot unit representations for narrative text. We create a system called AESOP that exploits a variety of existing resources to identify affect states and applies “projection rules” to map the affect states onto the characters in a story. We also use corpus-based techniques to generate a new type of affect knowledge base: verbs that impart positive or negative states onto their patients (e.g., being eaten is an undesirable state, but being fed is a desirable state). We harvest these “patient polarity verbs” from a Web corpus using two techniques: co-occurrence with Evil/Kind Agent patterns, and bootstrapping over conjunctions of verbs. We evaluate the plot unit representations produced by our system on a small collection of Aesop’s fables.

4 0.39466754 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1

5 0.34823707 112 emnlp-2010-Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

Author: Tara McIntosh

Abstract: Multi-category bootstrapping algorithms were developed to reduce semantic drift. By extracting multiple semantic lexicons simultaneously, a category’s search space may be restricted. The best results have been achieved through reliance on manually crafted negative categories. Unfortunately, identifying these categories is non-trivial, and their use shifts the unsupervised bootstrapping paradigm towards a supervised framework. We present NEG-FINDER, the first approach for discovering negative categories automatically. NEG-FINDER exploits unsupervised term clustering to generate multiple negative categories during bootstrapping. Our algorithm effectively removes the necessity of manual intervention and formulation of negative categories, with performance closely approaching that obtained using negative categories defined by a domain expert.

6 0.33932441 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

7 0.27982539 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

8 0.27621046 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

9 0.2753585 20 emnlp-2010-Automatic Detection and Classification of Social Events

10 0.23599912 51 emnlp-2010-Function-Based Question Classification for General QA

11 0.23515327 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

12 0.23176041 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid

13 0.2233936 91 emnlp-2010-Practical Linguistic Steganography Using Contextual Synonym Substitution and Vertex Colour Coding

14 0.22089231 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

15 0.22040041 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

16 0.21110025 61 emnlp-2010-Improving Gender Classification of Blog Authors

17 0.20808579 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

18 0.191833 108 emnlp-2010-Training Continuous Space Language Models: Some Practical Issues

19 0.19006516 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

20 0.18678595 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.013), (10, 0.018), (12, 0.034), (18, 0.271), (29, 0.064), (30, 0.053), (32, 0.015), (52, 0.031), (56, 0.139), (62, 0.012), (66, 0.11), (72, 0.047), (76, 0.042), (79, 0.018), (82, 0.015), (87, 0.018), (89, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84212494 73 emnlp-2010-Learning Recurrent Event Queries for Web Search

Author: Ruiqiang Zhang ; Yuki Konda ; Anlei Dong ; Pranam Kolari ; Yi Chang ; Zhaohui Zheng

Abstract: Recurrent event queries (REQ) constitute a special class of search queries occurring at regular, predictable time intervals. The freshness of documents ranked for such queries is generally of critical importance. REQ forms a significant volume, as much as 6% of query traffic received by search engines. In this work, we develop an improved REQ classifier that could provide significant improvements in addressing this problem. We analyze REQ queries, and develop novel features from multiple sources, and evaluate them using machine learning techniques. From historical query logs, we develop features utilizing query frequency, click information, and user intent dynamics within a search session. We also develop temporal features by time series analysis from query frequency. Other generated features include word matching with recurrent event seed words and time sensitivity of search result set. We use Naive Bayes, SVM and decision tree based logistic regres- sion model to train REQ classifier. The results on test data show that our models outperformed baseline approach significantly. Experiments on a commercial Web search engine also show significant gains in overall relevance, and thus overall user experience.

same-paper 2 0.73870009 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

Author: Ahmed Hassan ; Vahed Qazvinian ; Dragomir Radev

Abstract: Mining sentiment from user generated content is a very important task in Natural Language Processing. An example of such content is threaded discussions which act as a very important tool for communication and collaboration in the Web. Threaded discussions include e-mails, e-mail lists, bulletin boards, newsgroups, and Internet forums. Most of the work on sentiment analysis has been centered around finding the sentiment toward products or topics. In this work, we present a method to identify the attitude of participants in an online discussion toward one another. This would enable us to build a signed network representation of participant interaction where every edge has a sign that indicates whether the interaction is positive or negative. This is different from most of the research on social networks that has focused almost exclusively on positive links. The method is experimentally tested using a manually labeled set of discussion posts. The results show that the proposed method is capable of identifying attitudinal sentences, and their signs, with high accuracy and that it outperforms several other baselines.

3 0.5821113 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju

Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.

4 0.56427079 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter

Author: Dmitriy Genzel ; Jakob Uszkoreit ; Franz Och

Abstract: As a prerequisite to translation of poetry, we implement the ability to produce translations with meter and rhyme for phrase-based MT, examine whether the hypothesis space of such a system is flexible enough to accomodate such constraints, and investigate the impact of such constraints on translation quality.

5 0.56349689 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1

6 0.56095976 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

7 0.55519682 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

8 0.55400109 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

9 0.55304348 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

10 0.55146444 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

11 0.54733145 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

12 0.5445534 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

13 0.54055917 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

14 0.53313732 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

15 0.53141975 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

16 0.52812099 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

17 0.52597326 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

18 0.52406311 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

19 0.52216285 80 emnlp-2010-Modeling Organization in Student Essays

20 0.52143192 31 emnlp-2010-Constraints Based Taxonomic Relation Classification