emnlp emnlp2010 emnlp2010-83 knowledge-graph by maker-knowledge-mining

83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

Source: pdf

Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie

Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In this paper, we investigate structured models for document-level sentiment classification. [sent-4, score-0.209]

2 When predicting the sentiment of a subjective document (e. [sent-5, score-0.468]

3 This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i. [sent-9, score-0.298]

4 , subjective) sentences and predicts document-level sentiment based on the extracted sentences. [sent-11, score-0.253]

5 Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. [sent-12, score-0.197]

6 Congressional floor debates show improved performance over previous approaches. [sent-15, score-0.475]

7 One of the main challenges for document-level sentiment categorization is that not every part of the document is equally informative for inferring the sentiment of the whole document. [sent-17, score-0.526]

8 Objective statements interleaved with the subjective statements can be confusing for learning methods, and subjective statements with conflicting sentiment further complicate the document categorization task. [sent-18, score-0.619]

9 For example, authors of movie reviews [sent-19, score-0.294]

10 Early research on document-level sentiment classification employed conventional machine learning techniques for text categorization (Pang et al. [sent-27, score-0.265]

11 Second, some solutions for incorporating sentence-level information lack mechanisms for controlling how errors propagate from the subjective sentence identification subtask to the main document classification task (Pang and Lee, 2004). [sent-37, score-0.426]

12 Instead, our training method treats sentence-level labels as hidden variables and jointly learns to predict the document label and those (subjective) sentences that best “explain” it, thus controlling the propagation of incorrect sentence labels. [sent-46, score-0.256]

13 And by directly optimizing for document-level accuracy, our model learns to solve the sentence extraction subtask only to the extent required for accurately classifying document sentiment. [sent-47, score-0.212]

14 For the rest of the paper, we will discuss related work, motivate and describe our model, present an empirical evaluation on movie reviews and U. [sent-49, score-0.294]

15 Congressional floor debates datasets and close with discussion and conclusions. [sent-51, score-0.506]

16 They used a cascaded approach by first filtering out objective sentences and performing subjectivity extractions using a global min-cut inference. [sent-53, score-0.228]

17 Afterward, the subjective extracts were converted into inputs for the document-level sentiment classifier. [sent-54, score-0.36]

18 One advantage of their approach is that it avoids the need for explicit subjectivity annotations. [sent-55, score-0.133]

19 (2006), Mao and Lebanon (2006)), it can be difficult to control how errors propagate from the sentence-level subtask to the main document classification task. [sent-59, score-0.236]

20 Instead of taking a cascaded approach, one can directly modify the training of flat document classifiers using lower level information. [sent-60, score-0.198]

21 (2007) used human annotators to mark the “annotator rationales”, which are text spans that support the document’s sentiment label. [sent-62, score-0.209]

22 These annotator rationales are then used to formulate additional constraints during SVM training to ensure that the resulting document classifier is less confident in classifying a document that does not contain the rationale versus the original document. [sent-63, score-0.497]

23 A natural approach to avoid the pitfalls associated with cascaded methods is to use joint two-level models that simultaneously solve the sentence-level and document-level tasks (e. [sent-68, score-0.196]

24 Similar to our approach, the lower level labels are treated as hidden or latent variables during training. [sent-86, score-0.145]

25 Although the training process is non-trivial (and in particular requires a good initialization of the hidden variables), it avoids the need for human annotations for the lower level subtasks. [sent-87, score-0.245]

26 Some researchers have also recently applied hidden variable models to sentiment analysis, but they were focused on classifying either phrase-level (Choi and Cardie, 2008) or sentence-level polarity (Nakagawa et al. [sent-88, score-0.456]

27 3 Extracting Hidden Explanations In this paper, we take the view that each document has a subset of sentences that best explains its sentiment. [sent-90, score-0.198]

28 Consider the “annotator rationales” generated by human judges for the movie reviews dataset (Zaidan et al. [sent-91, score-0.35]

29 Thus, these rationales can be interpreted as (something close to) a ground truth labeling of the explanatory segments. [sent-94, score-0.267]

30 We are interested in settings where human-extracted explanations such as annotator rationales might not be readily available, or are imperfect. [sent-99, score-0.344]

31 As such, we will formulate the set of extracted sentences as latent or hidden variables in our model. [sent-100, score-0.153]

32 Viewing the extracted sentences as latent variables will pose no new challenges during prediction, since the model is expected to predict all labels at test time. [sent-101, score-0.152]

33 4 Model In this section, we present a two-level document classification model. [sent-103, score-0.164]

34 Although our model makes predictions at both the document and sentence levels, it will be trained (and evaluated) only with respect to document-level performance. [sent-104, score-0.108]

35 Let x denote a document, y = ±1 denote the sentiment (for us, a binary positive or negative polarity) of a document, and s denote a subset of explanatory sentences in x. [sent-107, score-0.211]

36 Let Ψ(x, y, s) denote a joint feature map that outputs features describing the quality of predicting sentiment y using explanation s for document x. [sent-108, score-0.396]

37 (4) and $\psi_{pol}(x_j)$ and $\psi_{subj}(x_j)$ denote the polarity and subjectivity features of sentence $x_j$, respectively. [sent-117, score-0.271]

38 Then we might learn a high weight for the feature corresponding to the word “think” in $\psi_{subj}$ since that word is indicative of the sentence being subjective (but not necessarily indicating positive or negative polarity). [sent-124, score-0.248]

39 Recall from (2) that our hypothesis function predicts the sentiment label that maximizes (3). [sent-127, score-0.209]

40 To do this, we compare the best set of sentences that explains a positive polarity prediction with the best set that explains a negative polarity prediction. [sent-128, score-0.559]

41 2 Training For training, we will use an approach based on latent variable structural SVMs (Yu and Joachims, 2009). [sent-147, score-0.124]

42 Each training example has a corresponding constraint (7), which is quantified over the best possible explanation of the training polarity label. [sent-152, score-0.224]

43 Note that we never observe the true explanation for the training labels; they are the hidden or latent variables. [sent-153, score-0.155]

44 In other words, our goal is to learn to identify the informative (subjective) sentences that best explain the training labels to the extent required for good document classification performance. [sent-159, score-0.244]

45 Yu and Joachims (2009) showed that this alternating procedure for training latent variable structural SVMs is an instance of the CCCP procedure (Yuille and Rangarajan, 2003), and so is guaranteed to converge to a local optimum. [sent-170, score-0.204]

46 For our experiments, we do not train until convergence, but instead use performance on a validation set to choose the halting iteration. [sent-171, score-0.13]

47 Since OP 1 is nonconvex, a good initialization is necessary. [sent-172, score-0.097]

48 To generate the initial explanations, one can use an off-the-shelf sentiment classifier such as OpinionFinder (Wilson et al. [sent-173, score-0.209]

49 can treat either as the ground truth or another (very good) initial guess of the explanatory sentences. [sent-178, score-0.195]

50 Using such a feature representation might allow us to learn which words have high polarity (e. [sent-184, score-0.178]

51 , “great”) and which are indicative of subjective sentences (e. [sent-186, score-0.225]

52 For example, subjective sentences might densely populate the end of a document, or exhibit spatial coherence (so features describing previous sentences might be useful for classifying the current sentence). [sent-191, score-0.271]

53 Such features cannot be compactly incorporated into flat models that ignore the document structure. [sent-192, score-0.145]

54 3, it is possible (and likely) for subjective sentences to exhibit spatial coherence (e. [sent-201, score-0.195]

55 Alternative approaches include explicitly accounting for this structure by treating subjective sentence extraction as a sequence-labeling problem, such as in McDonald et al. [sent-209, score-0.151]

56 Note that the inference procedure in Algorithm 1 is still tractable, since it reduces to comparing the best sequence of subjective/objective sentences that explains a positive sentiment versus the best sequence that explains a negative sentiment. [sent-212, score-0.452]

57 5 Extensions Though our initial model (3) is simple and intuitive, performance can depend heavily on the quality of latent variable initialization and the quality of the feature structure design. [sent-215, score-0.169]

58 Consider the case where the initialization contains only objective sentences that do not convey any sentiment. [sent-216, score-0.179]

59 Then all the features initially available during training are generated from these objective sentences and are thus useless for sentiment classification. [sent-217, score-0.291]

60 (C) The component that models the entire document should influence which sentences are extracted. [sent-222, score-0.152]

61 The second property is desirable since joint training avoids error propagation that can be difficult to control. [sent-224, score-0.104]

62 Using the representation in (4), we propose a training procedure that regularizes $\vec{w}_{pol}$ relative to a prior model. [sent-229, score-0.398]

63 , where $\vec{w}_{doc}$ denotes a weight vector trained to classify the polarity of entire documents. [sent-235, score-0.217]

64 Then one can interpret OP 2 as enforcing that the polarity weights $\vec{w}_{pol}$ not be too far from $\vec{w}_{doc}$. [sent-236, score-0.178]

65 2 Extended Feature Space One simple way to satisfy all three aforementioned properties is to jointly model not only polarity and subjectivity of the extracted sentences, but also polarity of the entire document. [sent-241, score-0.449]

66 Let $\vec{w}_{doc}$ denote the weight vector used to model the polarity of entire document x (so the document polarity score is then $\vec{w}_{doc}^T \psi_{pol}(x)$). [sent-242, score-0.611]

67 We can also incorporate this weight vector into our structured model to compute a smoothed polarity score of each sentence via $\vec{w}_{doc}^T \psi_{pol}(x_j)$. [sent-243, score-0.178]

68 Training this model via OP 1 ensures that $\vec{w}_{doc}$ is (1) used to model the polarity of the entire document, and (2) used to compute a smoothed estimate of the polarity of the extracted sentences. [sent-247, score-0.395]

69 We use the movie reviews dataset from Zaidan et al. [sent-254, score-0.35]

70 This version contains annotated rationales for each review, which we use to generate an additional initialization during training (described below). [sent-256, score-0.264]

71 floor debates in the House of Representatives in 2005. [sent-269, score-0.475]

72 As in previous work, only debates with discussions of “controversial” bills were considered (where the losing side had at least 20% of the speeches). [sent-270, score-0.301]

73 Since the rationale annotations are available for nine out of 10 folds, we used the 10-th fold as the blind test set. [sent-278, score-0.107]

74 Since our training procedure solves a non-convex optimization problem, it requires an initial guess of the explanatory sentences. [sent-324, score-0.235]

75 We use an explanatory set size (5) of 30% of the number of sentences in each document, L = d0. [sent-325, score-0.144]

76 , 2005), which were shown to be a reasonable substitute for human annotations in the Movie Reviews dataset (Yessenalina et al. [sent-329, score-0.127]

77 In the Movie Reviews dataset, we also use sentences containing human-annotator rationales as a final initialization option. [sent-332, score-0.308]

78 Unsurprisingly, initializing using human annotations (in the Movie Reviews dataset) can offer further improvement. [sent-352, score-0.103]

79 Congressional Floor Debates dataset for the speaker-based segment classification task. [sent-366, score-0.112]

80 Congressional Floor Debates dataset we used only the latter setting, since there are no annotations available for this dataset. [sent-376, score-0.127]

81 The training procedure alternates between training a standard structural SVM model and using the subsequent model to re-label the latent variables. [sent-380, score-0.145]

82 We selected the halting iteration of the training procedure using the validation set. [sent-381, score-0.17]

83 Figure 1 shows the per-iteration overlap of extracted sentences from $\text{SVM}^{sle}_{fs}$ models initialized using OpinionFinder and human annotations on the Movie Reviews training set. [sent-385, score-0.115]

84 We can also see that both models iteratively learn to extract sentences that are more similar to each other than their respective initializations (the overlap between the two initializations is 57%). [sent-387, score-0.116]

85 The five least subjective sentences are preceded by circles with numbers denoting the subjectivity order (1 being least subjective according to $\text{SVM}^{sle}_{fs}$). [sent-397, score-0.439]

86 Our proposed methods are not designed to extract interpretable explanations, but examining the extracted explanations might still yield meaningful information. [sent-400, score-0.131]

87 For comparison, Table 4 also shows the five least subjective sentences according to $\text{SVM}^{sle}_{fs}$. [sent-405, score-0.195]

88 Compared to methods that modify the training of flat document classifiers (e. [sent-410, score-0.145]

89 , Pang and Lee (2004)), our approach is more robust to errors in the lower-level subtask due to being a joint model. [sent-416, score-0.105]

90 Introducing latent variables makes the training procedure more flexible by not requiring lower-level labels, but does require a good initialization (i. [sent-417, score-0.209]

91 We believe that the widespread availability of off-the-shelf sentiment lexicons and software, despite being developed for a different domain, makes this issue less of a concern, and in fact creates an opportunity for approaches like ours to have real impact. [sent-420, score-0.209]

92 Our method is not limited to the transductive setting, and instead exploits a different and complementary structure: the latent explanation (i. [sent-436, score-0.118]

93 Another interesting direction is training models to predict not only sentiment polarity, but also whether a document is objective. [sent-447, score-0.317]

94 For example, one can pose a three class problem (“positive”, “negative”, “objective”), where objective documents might not necessarily have a good set of (subjective) explanatory sentences, similar to (Chang et al. [sent-448, score-0.138]

95 7 Conclusion We have presented latent variable structured models for the document sentiment classification task. [sent-450, score-0.445]

96 These models do not rely on sentence-level annotations, and are trained jointly (over both the document and sentence levels) to directly optimize document-level accuracy. [sent-451, score-0.108]

97 Experiments on two standard sentiment analysis datasets showed improved performance over previous results. [sent-452, score-0.24]

98 However, as evidenced by our experiments, proper training does require a reasonable initial guess of the extracted explanations, as well as ways to mitigate the risk of the extraction subtask suppressing too much information (such as via feature smoothing). [sent-454, score-0.167]

99 Learning with compositional semantics as structural inference for subsentential sentiment analysis. [sent-474, score-0.261]

100 Dependency tree-based sentiment classification using CRFs with hidden variables. [sent-502, score-0.302]
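
The inference step described in sentences 36 to 40 of the summary above (and the 30% explanation size mentioned in sentence 75) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary-based feature format, the function names, the choice of exactly `size` sentences per explanation, and the `percent=30` default are all assumptions made for the example. Any positive normalization term that depends only on the document is left out, since it would not change which label wins.

```python
def dot(weights, feats):
    """Sparse dot product over feature-name -> value dictionaries."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def best_explanation(sentences, y, w_pol, w_subj, percent=30):
    """Return (score, indices) of the best explanation for label y in {+1, -1}.

    Each sentence is a dict with hypothetical keys "pol" and "subj" holding
    its polarity and subjectivity features.
    """
    n = len(sentences)
    size = (percent * n + 99) // 100  # integer ceiling of percent% of n
    # Per-sentence score: label-signed polarity term plus subjectivity term.
    scores = [y * dot(w_pol, s["pol"]) + dot(w_subj, s["subj"]) for s in sentences]
    # Scores are additive, so the best fixed-size subset is the top `size` sentences.
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    chosen = sorted(ranked[:size])
    return sum(scores[j] for j in chosen), chosen


def predict(sentences, w_pol, w_subj, percent=30):
    """Compare the best positive explanation with the best negative one."""
    pos_score, pos_set = best_explanation(sentences, +1, w_pol, w_subj, percent)
    neg_score, neg_set = best_explanation(sentences, -1, w_pol, w_subj, percent)
    return (+1, pos_set) if pos_score >= neg_score else (-1, neg_set)


if __name__ == "__main__":
    # Toy document with made-up features and weights.
    doc = [
        {"pol": {"great": 1.0}, "subj": {"think": 1.0}},
        {"pol": {"plot": 1.0}, "subj": {}},
        {"pol": {"awful": 1.0}, "subj": {}},
        {"pol": {"great": 1.0}, "subj": {}},
    ]
    print(predict(doc, {"great": 2.0, "awful": -1.0}, {"think": 0.5}))
```

Because the same subjectivity weights are used for both labels, only the polarity term changes sign between the two candidate explanations; the extracted set can therefore differ between the positive and the negative hypothesis.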
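
Sentences 41 to 46 and 81 to 82 of the summary above describe training that alternates between fitting the model with the latent explanations fixed and re-labeling those explanations, with the halting iteration chosen on a validation set. The loop below sketches only that control flow. The callables `fit`, `impute`, and `evaluate` are hypothetical placeholders for the structural SVM solver, the latent-variable re-labeling step, and validation scoring; they are not APIs from the paper or from any library.

```python
def train_alternating(train, init_explanations, fit, impute, evaluate, max_iters=10):
    """Alternate between fitting weights and re-imputing latent explanations.

    fit(train, explanations) -> model        (latent variables held fixed)
    impute(model, train)     -> explanations (re-label latent variables)
    evaluate(model)          -> validation score, higher is better
    Returns the model from the iteration with the best validation score.
    """
    explanations = init_explanations
    best_model, best_score, best_iter = None, float("-inf"), -1
    for iteration in range(max_iters):
        model = fit(train, explanations)
        score = evaluate(model)
        if score > best_score:
            # Remember the halting iteration chosen by validation performance.
            best_model, best_score, best_iter = model, score, iteration
        explanations = impute(model, train)
    return best_model, best_score, best_iter
```

Keeping the best model seen so far, rather than the last one, mirrors the statement that training is not run to convergence; the quality of `init_explanations` still matters because the underlying problem is non-convex.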
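
Sentences 65 to 68 of the summary above describe the extended feature space, in which a document-level weight vector also contributes a smoothed polarity score for each sentence. The sketch below assumes the sentence-level and document-level polarity scores are simply added; that additive combination, the dictionary feature format, and the function names are illustrative assumptions rather than details confirmed by the text.

```python
def dot(weights, feats):
    """Sparse dot product over feature-name -> value dictionaries."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def document_polarity(doc_feats, w_doc):
    """Polarity score of the whole document under the document-level weights."""
    return dot(w_doc, doc_feats)


def smoothed_sentence_polarity(sent_feats, w_pol, w_doc):
    """Sentence polarity with the document-level weights acting as a smoother.

    Assumption: the two scores are added. A word that never appeared in the
    initially extracted sentences (so has no weight in w_pol) can still
    contribute through w_doc.
    """
    return dot(w_pol, sent_feats) + dot(w_doc, sent_feats)


if __name__ == "__main__":
    w_pol = {"great": 1.5}                    # learned from extracted sentences only
    w_doc = {"great": 0.5, "boring": -1.0}    # learned from whole documents
    print(smoothed_sentence_polarity({"boring": 1.0}, w_pol, w_doc))
```

This illustrates the motivation given in sentences 58 and 59: if the initial explanations miss informative sentences, the document-level component still supplies polarity evidence for them.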

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pol', 0.358), ('debates', 0.268), ('congressional', 0.244), ('xj', 0.227), ('sentiment', 0.209), ('floor', 0.207), ('subj', 0.191), ('polarity', 0.178), ('movie', 0.177), ('rationales', 0.167), ('subjective', 0.151), ('svmsle', 0.151), ('svmfslse', 0.134), ('zaidan', 0.134), ('explanations', 0.131), ('svmsflse', 0.117), ('reviews', 0.117), ('document', 0.108), ('pang', 0.101), ('explanatory', 0.1), ('op', 0.1), ('initialization', 0.097), ('guess', 0.095), ('subjectivity', 0.093), ('jx', 0.089), ('opinionfinder', 0.086), ('joachims', 0.084), ('halting', 0.084), ('latent', 0.072), ('subtask', 0.072), ('annotations', 0.071), ('svms', 0.071), ('ainur', 0.067), ('corne', 0.067), ('ptol', 0.067), ('yessenalina', 0.067), ('lillian', 0.057), ('thomas', 0.057), ('dataset', 0.056), ('classification', 0.056), ('yu', 0.055), ('cascaded', 0.053), ('cardie', 0.053), ('structural', 0.052), ('svm', 0.05), ('argmaxs', 0.05), ('cornell', 0.05), ('ithaca', 0.05), ('mao', 0.05), ('ssvmsolve', 0.05), ('stubj', 0.05), ('xxi', 0.05), ('yuille', 0.05), ('choi', 0.05), ('vote', 0.048), ('validation', 0.046), ('explanation', 0.046), ('annotator', 0.046), ('explains', 0.046), ('sentences', 0.044), ('nakagawa', 0.043), ('mcdonald', 0.041), ('xi', 0.04), ('avoids', 0.04), ('procedure', 0.04), ('sentencelevel', 0.039), ('doc', 0.039), ('objective', 0.038), ('hidden', 0.037), ('flat', 0.037), ('claire', 0.036), ('labels', 0.036), ('yejin', 0.036), ('initializations', 0.036), ('finley', 0.036), ('rationale', 0.036), ('lee', 0.035), ('negative', 0.035), ('bo', 0.034), ('thorsten', 0.034), ('alternates', 0.033), ('asxi', 0.033), ('bansal', 0.033), ('bills', 0.033), ('cccp', 0.033), ('dtoc', 0.033), ('felzenszwalb', 0.033), ('ncxi', 0.033), ('omar', 0.033), ('tdoc', 0.033), ('pf', 0.033), ('joint', 0.033), ('classifying', 0.032), ('positive', 0.032), ('si', 0.032), ('initializing', 0.032), ('propagation', 0.031), ('datasets', 0.031), ('sa', 0.03), ('indicative', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie

2 0.19723919 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.

3 0.13548794 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

Author: Ahmed Hassan ; Vahed Qazvinian ; Dragomir Radev

Abstract: Mining sentiment from user generated content is a very important task in Natural Language Processing. An example of such content is threaded discussions which act as a very important tool for communication and collaboration in the Web. Threaded discussions include e-mails, e-mail lists, bulletin boards, newsgroups, and Internet forums. Most of the work on sentiment analysis has been centered around finding the sentiment toward products or topics. In this work, we present a method to identify the attitude of participants in an online discussion toward one another. This would enable us to build a signed network representation of participant interaction where every edge has a sign that indicates whether the interaction is positive or negative. This is different from most of the research on social networks that has focused almost exclusively on positive links. The method is experimentally tested using a manually labeled set of discussion posts. The results show that the proposed method is capable of identifying attitudinal sentences, and their signs, with high accuracy and that it outperforms several other baselines.

4 0.13001221 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

Author: Jordan Boyd-Graber ; Philip Resnik

Abstract: In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language’s data to inform how the model captures properties of other languages. MLSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MLSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment. Sentiment analysis (Pang and Lee, 2008) offers the promise of automatically discerning how people feel about a product, person, organization, or issue based on what they write online, which is potentially of great value to businesses and other organizations. However, the vast majority of sentiment resources and algorithms are limited to a single language, usually English (Wilson, 2008; Baccianella and Sebastiani, 2010). Since no single language captures a majority of the content online, adopting such a limited approach in an increasingly global community risks missing important details and trends that might only be available when text in multiple languages is taken into account. 
Up to this point, multiple languages have been addressed in sentiment analysis primarily by transferring knowledge from a resource-rich language to a less rich language (Banea et al., 2008), or by ignoring differences in languages via translation into English (Denecke, 2008). These approaches are limited to a view of sentiment that takes place through an English-centric lens, and they ignore the potential to share information between languages. Ideally, learning sentiment cues holistically, across languages, would result in a richer and more globally consistent picture. In this paper, we introduce Multilingual Supervised Latent Dirichlet Allocation (MLSLDA), a model for sentiment analysis on a multilingual corpus. MLSLDA discovers a consistent, unified picture of sentiment across multiple languages by learning “topics,” probabilistic partitions of the vocabulary that are consistent in terms of both meaning and relevance to observed sentiment. Our approach makes few assumptions about available resources, requiring neither parallel corpora nor machine translation. The rest of the paper proceeds as follows. In Section 1, we describe the probabilistic tools that we use to create consistent topics bridging across languages and the MLSLDA model. In Section 2, we present the inference process. We discuss our set of semantic bridges between languages in Section 3, and our experiments in Section 4 demonstrate that this approach functions as an effective multilingual topic model, discovers sentiment-biased topics, and uses multilingual corpora to make better sentiment predictions across languages. Sections 5 and 6 discuss related research and future work, respectively.
1 Predictions from Multilingual Topics As its name suggests, MLSLDA is an extension of Latent Dirichlet allocation (LDA) (Blei et al., 2003), a modeling approach that takes a corpus of unannotated documents as input and produces two outputs, a set of “topics” and assignments of documents to topics. Both the topics and the assignments are probabilistic: a topic is represented as a probability distribution over words in the corpus, and each document is assigned a probability distribution over all the topics. Topic models built on the foundations of LDA are appealing for sentiment analysis because the learned topics can cluster together sentiment-bearing words, and because topic distributions are a parsimonious way to represent a document (see footnote 1). LDA has been used to discover latent structure in text (e.g. for discourse segmentation (Purver et al., 2006) and authorship (Rosen-Zvi et al., 2004)). MLSLDA extends the approach by ensuring that this latent structure (the underlying topics) is consistent across languages. We discuss multilingual topic modeling in Section 1.1, and in Section 1.2 we show how this enables supervised regression regardless of a document’s language. 1.1 Capturing Semantic Correlations Topic models posit a straightforward generative process that creates an observed corpus. For each document d, some distribution $\theta_d$ over unobserved topics is chosen. Then, for each word position in the document, a topic z is selected. Finally, the word for that position is generated by selecting from the topic indexed by z. (Recall that in LDA, a “topic” is a distribution over words.) In monolingual topic models, the topic distribution is usually drawn from a Dirichlet distribution. Using Dirichlet distributions makes it easy to specify sparse priors, and it also simplifies posterior inference because Dirichlet distributions are conjugate to multinomial distributions.
However, drawing topics from Dirichlet distributions will not suffice if our vocabulary includes multiple languages. If we are working with English, German, and Chinese at the same time, a Dirichlet prior has no way to favor distributions z such that p(good|z), p(gut|z), and p(hǎo|z) all tend to be high at the same time, or low at the same time. [Footnote 1: The latter property has also made LDA popular for information retrieval (Wei and Croft, 2006).] More generally, the structure of our model must encourage topics to be consistent across languages, and Dirichlet distributions cannot encode correlations between elements. One possible solution to this problem is to use the multivariate normal distribution, which can produce correlated multinomials (Blei and Lafferty, 2005), in place of the Dirichlet distribution. This has been done successfully in multilingual settings (Cohen and Smith, 2009). However, such models complicate inference by not being conjugate. Instead, we appeal to tree-based extensions of the Dirichlet distribution, which have been used to induce correlation in semantic ontologies (Boyd-Graber et al., 2007) and to encode clustering constraints (Andrzejewski et al., 2009). The key idea in this approach is to assume the vocabularies of all languages are organized according to some shared semantic structure that can be represented as a tree. For concreteness in this section, we will use WordNet (Miller, 1990) as the representation of this multilingual semantic bridge, since it is well known, offers convenient and intuitive terminology, and demonstrates the full flexibility of our approach. However, the model we describe generalizes to any tree-structured representation of multilingual knowledge; we discuss some alternatives in Section 3.
WordNet organizes a vocabulary into a rooted, directed acyclic graph of nodes called synsets, short for “synonym sets.” A synset is a child of another synset if it satisfies a hyponymy relationship; each child “is a” more specific instantiation of its parent concept (thus, hyponymy is often called an “is-a” relationship). For example, a “dog” is a “canine” is an “animal” is a “living thing,” etc. As an approximation, it is not unreasonable to assume that WordNet’s structure of meaning is language independent, i.e. the concept encoded by a synset can be realized using terms in different languages that share the same meaning. In practice, this organization has been used to create many alignments of international WordNets to the original English WordNet (Ordan and Wintner, 2007; Sagot and Fišer, 2008; Isahara et al., 2008). Using the structure of WordNet, we can now describe a generative process that produces a distribution over a multilingual vocabulary, which encourages correlations between words with similar meanings regardless of what language each word is in. For each synset h, we create a multilingual word distribution for that synset as follows: 1. Draw transition probabilities $\beta_h \sim \mathrm{Dir}(\tau_h)$. 2. Draw stop probabilities $\omega_h \sim \mathrm{Dir}(\kappa_h)$. 3. For each language l, draw emission probabilities for that synset $\phi_{h,l} \sim \mathrm{Dir}(\pi_{h,l})$. For conciseness in the rest of the paper, we will refer to this generative process as multilingual Dirichlet hierarchy, or MULTDIRHIER(τ, κ, π) (see footnote 2). Each observed token can be viewed as the end result of a sequence of visited synsets λ. At each node in the tree, the path can end at node i with probability $\omega_{i,1}$, or it can continue to a child synset with probability $\omega_{i,0}$. If the path continues to another child synset, it visits child j with probability $\beta_{i,j}$.
If the path ends at a synset, it generates word $k$ with probability $\phi_{i,l,k}$ (see footnote 3). The probability of a word being emitted from a path with visited synsets $r$ and final synset $h$ in language $l$ is therefore

$$p(w, \lambda = r, h \mid l, \beta, \omega, \phi) = \left[ \prod_{(i,j) \in r} \beta_{i,j}\, \omega_{i,0} \right] (1 - \omega_{h,1})\, \phi_{h,l,w}. \tag{1}$$

Note that the stop probability $\omega_h$ is independent of language, but the emission $\phi_{h,l}$ is dependent on the language. This is done to prevent the following scenario: while synset A is highly probable in a topic and words in language 1 attached to that synset have high probability, words in language 2 have low probability. If this could happen for many synsets in a topic, an entire language would be effectively silenced, which would lead to inconsistent topics (e.g. Topic 1 is about baseball in English and about travel in German). Separating path from emission helps ensure that topics are consistent across languages.

Footnote 2: Variables $\tau_h$, $\pi_{h,l}$, and $\kappa_h$ are hyperparameters. Their mean is fixed, but their magnitude is sampled during inference (i.e. $\tau_{h,i} / \sum_k \tau_{h,k}$ is constant, but $\tau_{h,i}$ is not). For the bushier bridges (e.g. dictionary and flat), their mean is uniform. For GermaNet, we took frequencies from two balanced corpora of German and English: the British National Corpus (University of Oxford, 2006) and the Kern Corpus of the Digitales Wörterbuch der Deutschen Sprache des 20. Jahrhunderts project (Geyken, 2007). We took these frequencies and propagated them through the multilingual hierarchy, following LDAWN’s (Boyd-Graber et al., 2007) formulation of information content (Resnik, 1995) as a Bayesian prior. The variance of the priors was initialized to be 1.0, but could be sampled during inference.

Footnote 3: Note that the language and word are taken as given, but the path through the semantic hierarchy is a latent random variable.
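The hierarchy draw and the path probability described above can be sketched on a toy tree. Everything concrete here is a hypothetical illustration rather than the authors' implementation: the four-synset hierarchy, the word lists, the symmetric hyperparameters, and the rule that a leaf always stops. The stop factor follows the prose definition (a path ends at node $h$ with probability $\omega_{h,1}$); Equation 1 as printed writes that factor as $(1 - \omega_{h,1})$, so treat this choice as an assumption.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


# Hypothetical toy hierarchy (synset -> child synsets); not taken from WordNet.
CHILDREN = {"animal": ["canine", "feline"], "canine": ["dog"], "feline": [], "dog": []}
# Hypothetical words attached to each synset, per language.
WORDS = {
    "animal": {"en": ["animal"], "de": ["Tier"]},
    "canine": {"en": ["canine"], "de": ["Hundeartige"]},
    "feline": {"en": ["cat", "feline"], "de": ["Katze"]},
    "dog": {"en": ["dog", "hound"], "de": ["Hund"]},
}


def draw_multdirhier(rng, tau=1.0, kappa=1.0, pi=1.0):
    """Draw (beta, omega, phi) for one topic using symmetric hyperparameters."""
    beta, omega, phi = {}, {}, {}
    for synset, kids in CHILDREN.items():
        if kids:
            beta[synset] = dict(zip(kids, sample_dirichlet([tau] * len(kids), rng)))
            # omega[synset] = (continue probability, stop probability)
            omega[synset] = tuple(sample_dirichlet([kappa, kappa], rng))
        else:
            # Assumption: a leaf has nowhere to continue, so it always stops.
            beta[synset] = {}
            omega[synset] = (0.0, 1.0)
        # Emissions are drawn separately for every language.
        phi[synset] = {
            lang: dict(zip(vocab, sample_dirichlet([pi] * len(vocab), rng)))
            for lang, vocab in WORDS[synset].items()
        }
    return beta, omega, phi


def path_probability(path, word, lang, beta, omega, phi):
    """Probability of following `path` from the root, stopping, and emitting `word`."""
    prob = 1.0
    for parent, child in zip(path, path[1:]):
        prob *= omega[parent][0] * beta[parent][child]  # continue, then pick child
    final = path[-1]
    return prob * omega[final][1] * phi[final][lang].get(word, 0.0)


def all_paths(node, prefix=()):
    """Enumerate every root-to-node path in the toy hierarchy."""
    path = prefix + (node,)
    yield path
    for child in CHILDREN[node]:
        yield from all_paths(child, path)


rng = random.Random(1)
beta, omega, phi = draw_multdirhier(rng)
total_en = sum(
    path_probability(p, w, "en", beta, omega, phi)
    for p in all_paths("animal")
    for w in WORDS[p[-1]]["en"]
)
total_de = sum(
    path_probability(p, w, "de", beta, omega, phi)
    for p in all_paths("animal")
    for w in WORDS[p[-1]]["de"]
)
print(total_en, total_de)
```

Under these assumptions the path-and-word probabilities sum to one within each language. Both languages share the same transition and stop draws, so a synset that is probable in a topic is probable in every language, and only the emission step differs.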
Having defined topic distributions in a way that can preserve cross-language correspondences, we now use this distribution within a larger model that can discover cross-language patterns of use that predict sentiment.

1.2 The MLSLDA Model

We will view sentiment analysis as a regression problem: given an input document, we want to predict a real-valued observation $y$ that represents the sentiment of a document. Specifically, we build on supervised latent Dirichlet allocation (SLDA; Blei and McAuliffe, 2007), which makes predictions based on the topics expressed in a document; this can be thought of as projecting the words in a document to a low-dimensional space of dimension equal to the number of topics. Blei et al. showed that using this latent topic structure can offer improved predictions over regressions based on words alone, and the approach fits well with our current goals, since word-level cues are unlikely to be identical across languages. In addition to text, SLDA has been successfully applied to other domains such as social networks (Chang and Blei, 2009) and image classification (Wang et al., 2009). The key innovation in this paper is to extend SLDA by creating topics that are globally consistent across languages, using the bridging approach above.

We express our model in the form of a probabilistic generative latent-variable model that generates documents in multiple languages and assigns a real-valued score to each document. The score comes from a normal distribution whose mean is the dot product between a regression parameter $\eta$, which encodes the influence of each topic on the observation, and the document's empirical topic frequencies $\bar{z}$, and whose variance is $\sigma^2$. With this model in hand, we use statistical inference to determine the distribution over latent variables that, given the model, best explains observed data. The generative model is as follows:

1. For each topic $i = 1 \ldots K$, draw a topic distribution $\{\beta_i, \omega_i, \phi_i\}$ from MULTDIRHIER($\tau, \kappa, \pi$).
2. For each document $d = 1 \ldots M$ with language $l_d$:
   (a) Choose a distribution over topics $\theta_d \sim \mathrm{Dir}(\alpha)$.
   (b) For each word in the document $n = 1 \ldots N_d$, choose a topic assignment $z_{d,n} \sim \mathrm{Mult}(\theta_d)$ and a path $\lambda_{d,n}$ ending at word $w_{d,n}$ according to Equation 1 using $\{\beta_{z_{d,n}}, \omega_{z_{d,n}}, \phi_{z_{d,n}}\}$.
3. Choose a response variable from $y \sim \mathrm{Norm}(\eta^{\top} \bar{z}, \sigma^2)$, where $\bar{z}_d \equiv \frac{1}{N} \sum_{n=1}^{N} z_{d,n}$.

Crucially, note that the topics are not independent of the sentiment task; the regression encourages terms with similar effects on the observation $y$ to be in the same topic. The consistency of topics described above allows the same regression to be done for the entire corpus regardless of the language of the underlying document.

2 Inference

Finding the model parameters most likely to explain the data is a problem of statistical inference. We employ stochastic EM (Diebolt and Ip, 1996), using a Gibbs sampler for the E-step to assign words to paths and topics. After randomly initializing the topics, we alternate between sampling the topic and path of a word $(z_{d,n}, \lambda_{d,n})$ and finding the regression parameters $\eta$ that maximize the likelihood. We jointly sample the topic and path conditioning on all of the other path and document assignments in the corpus, selecting a path and topic with probability

$$p(z_n = k, \lambda_n = r \mid z_{-n}, \lambda_{-n}, w_n, \eta, \sigma, \Theta) = p(y_d \mid z, \eta, \sigma)\; p(\lambda_n = r \mid z_n = k, \lambda_{-n}, w_n, \tau, \kappa, \pi)\; p(z_n = k \mid z_{-n}, \alpha). \tag{2}$$

Each of these three terms reflects a different influence on the topics: the vocabulary structure, the document’s topics, and the response variable. In the next paragraphs, we will expand each of them to derive the full conditional topic distribution. As discussed in Section 1.1, the structure of the topic distribution encourages terms with the same meaning to be in the same topic, even across languages.
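The step that finds the regression parameters maximizing the likelihood can be illustrated for fixed topic assignments. With a Gaussian response, maximizing the likelihood in the regression weights reduces to ordinary least squares of the responses on each document's empirical topic frequencies. The sketch below is a minimal illustration under that observation, not the authors' code: the function names, the two-topic toy documents, and the noise-free responses are all hypothetical.

```python
def topic_proportions(assignments, num_topics):
    """Empirical topic frequencies (z-bar) for one document's topic assignments."""
    counts = [0] * num_topics
    for z in assignments:
        counts[z] += 1
    n = len(assignments)
    return [c / n for c in counts]


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_eta(zbars, ys):
    """Least-squares regression weights via the normal equations (Z^T Z) eta = Z^T y."""
    k = len(zbars[0])
    ztz = [[sum(z[i] * z[j] for z in zbars) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * y for z, y in zip(zbars, ys)) for i in range(k)]
    return solve_linear_system(ztz, zty)


# Hypothetical topic assignments for four short documents and two topics.
docs = [[0, 0, 0, 1], [0, 1, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]]
true_eta = [2.0, -1.0]  # hypothetical: topic 0 is positive, topic 1 is negative
zbars = [topic_proportions(d, 2) for d in docs]
# Noise-free responses, so the fit should recover true_eta up to rounding.
ys = [sum(e * z for e, z in zip(true_eta, zb)) for zb in zbars]
eta_hat = fit_eta(zbars, ys)
print(eta_hat)
```

In a full stochastic EM loop this fit would be rerun after each sampling sweep, since the topic frequencies change whenever words are reassigned.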
During inference, we marginalize over possible multinomial distributions $\beta$, $\omega$, and $\phi$, using the observed transitions from $i$ to $j$ in topic $k$, $T_{k,i,j}$; stop counts in synset $i$ in topic $k$, $O_{k,i,0}$; continue counts in synset $i$ in topic $k$, $O_{k,i,1}$; and emission counts in synset $i$ in language $l$ in topic $k$, $F_{k,i,l}$.

[Figure 1: Graphical model representing MLSLDA. Shaded nodes represent observations, plates denote replication, and lines show probabilistic dependencies. Panel labels: Multilingual Topics, Text Documents, Sentiment Prediction.]

The probability of taking a path $r$ is then

$$p(\lambda_n = r \mid z_n = k, \lambda_{-n}) = \underbrace{\prod_{(i,j) \in r} \frac{B_{k,i,j} + \tau_{i,j}}{\sum_{j'} B_{k,i,j'} + \tau_{i,j'}} \cdot \frac{O_{k,i,1} + \omega_i}{\sum_{s \in \{0,1\}} O_{k,i,s} + \omega_{i,s}}}_{\text{Transition}} \; \underbrace{\frac{O_{k,r_{end},0} + \omega_{r_{end}}}{\sum_{s \in \{0,1\}} O_{k,r_{end},s} + \omega_{r_{end},s}} \cdot \frac{F_{k,r_{end},w_n} + \pi_{r_{end},l}}{\sum_{w'} F_{r_{end},w'} + \pi_{r_{end},w'}}}_{\text{Emission}} \tag{3}$$

Equation 3 reflects the multilingual aspect of this model. The conditional topic distribution for SLDA (Blei and McAuliffe, 2007) replaces this term with the standard Multinomial-Dirichlet. However, we believe this is the first published SLDA-style model using MCMC inference, as prior work has used variational inference (Blei and McAuliffe, 2007; Chang and Blei, 2009; Wang et al., 2009).

Because the observed response variable depends on the topic assignments of a document, the conditional topic distribution is shifted toward topics that explain the observed response. Topics that move the predicted response $\hat{y}_d$ toward the true $y_d$ will be favored. We drop terms that are constant across all topics for the effect of the response variable,

$$p(y_d \mid z, \eta, \sigma) \propto \underbrace{\exp\left[ \frac{1}{\sigma^2} \left( y_d - \frac{\sum_{k'} N_{d,k'}\, \eta_{k'}}{\sum_{k'} N_{d,k'}} \right) \frac{\eta_{z_k}}{\sum_{k'} N_{d,k'}} \right]}_{\text{Other words' influence}} \; \exp$$

5 0.08186008 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

Author: Amr Ahmed ; Eric Xing

Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological-bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical-level. In this paper we address the problem of modeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using Collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hastings inference algorithm for a semi-supervised extension with decent results.

6 0.076894201 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

7 0.074065059 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

8 0.072866812 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

9 0.067259625 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

10 0.066512927 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid

11 0.066461071 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

12 0.053366113 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

13 0.047959484 108 emnlp-2010-Training Continuous Space Language Models: Some Practical Issues

14 0.04794826 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

15 0.047886271 109 emnlp-2010-Translingual Document Representations from Discriminative Projections

16 0.045109093 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

17 0.044563111 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

18 0.041399494 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

19 0.040617902 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

20 0.038796406 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.179), (1, 0.136), (2, -0.138), (3, -0.109), (4, 0.079), (5, 0.005), (6, 0.192), (7, -0.055), (8, 0.005), (9, -0.047), (10, -0.054), (11, 0.014), (12, -0.016), (13, -0.079), (14, 0.004), (15, 0.107), (16, 0.009), (17, 0.018), (18, -0.145), (19, 0.025), (20, -0.084), (21, 0.063), (22, -0.085), (23, -0.111), (24, -0.025), (25, -0.112), (26, -0.067), (27, -0.005), (28, -0.085), (29, -0.122), (30, -0.073), (31, 0.248), (32, 0.064), (33, 0.155), (34, -0.065), (35, 0.315), (36, 0.114), (37, 0.168), (38, -0.058), (39, 0.108), (40, -0.113), (41, 0.027), (42, -0.05), (43, 0.022), (44, 0.137), (45, -0.05), (46, -0.088), (47, 0.105), (48, -0.097), (49, -0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94449985 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie

2 0.69110245 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

Author: Ahmed Hassan ; Vahed Qazvinian ; Dragomir Radev

3 0.61543506 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

4 0.43275496 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

Author: Jordan Boyd-Graber ; Philip Resnik

5 0.30537802 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

Author: Amr Ahmed ; Eric Xing

6 0.2976644 108 emnlp-2010-Training Continuous Space Language Models: Some Practical Issues

7 0.25158542 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

8 0.24483724 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

9 0.23667461 109 emnlp-2010-Translingual Document Representations from Discriminative Projections

10 0.22885735 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

11 0.21844327 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

12 0.21624236 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions

13 0.19956544 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

14 0.19949101 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

15 0.19738899 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

16 0.17966375 9 emnlp-2010-A New Approach to Lexical Disambiguation of Arabic Text

17 0.17511727 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models

18 0.17105149 84 emnlp-2010-NLP on Spoken Documents Without ASR

19 0.16307923 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

20 0.15873872 90 emnlp-2010-Positional Language Models for Clinical Information Retrieval

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.011), (12, 0.019), (29, 0.063), (30, 0.016), (32, 0.018), (52, 0.016), (56, 0.631), (62, 0.015), (66, 0.074), (72, 0.029), (76, 0.017), (89, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.93608481 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter

Author: Dmitriy Genzel ; Jakob Uszkoreit ; Franz Och

Abstract: As a prerequisite to translation of poetry, we implement the ability to produce translations with meter and rhyme for phrase-based MT, examine whether the hypothesis space of such a system is flexible enough to accommodate such constraints, and investigate the impact of such constraints on translation quality.

same-paper 2 0.92538261 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie

3 0.90387815 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

4 0.85004216 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju

Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.

5 0.61674541 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

Author: Ahmet Aker ; Trevor Cohn ; Robert Gaizauskas

Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality of the best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.

6 0.59388149 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

7 0.59194207 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

8 0.57792693 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

9 0.54880387 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

10 0.54492319 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

11 0.53787792 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

12 0.52729475 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

13 0.52703822 80 emnlp-2010-Modeling Organization in Student Essays

14 0.50938147 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

15 0.50586385 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

16 0.49349132 94 emnlp-2010-SCFG Decoding Without Binarization

17 0.48709404 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

18 0.48254049 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task

19 0.48001 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

20 0.47903848 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference