emnlp emnlp2010 emnlp2010-64 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1
Reference: text
sentIndex sentText sentNum sentScore
1 We present a framework which combines a supervised text analysis application with the induction of latent content structure. [sent-3, score-0.564]
2 The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. [sent-5, score-0.741]
3 We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context. [sent-6, score-0.478]
4 1 Introduction In this paper, we demonstrate that leveraging document structure significantly benefits text analysis applications. [sent-7, score-0.216]
5 A central challenge in utilizing such information lies in finding a relevant representation of content structure for a specific text analysis task. [sent-12, score-0.517]
6 For instance, when performing single-aspect sentiment analysis, the most relevant aspect of content structure is whether a given sentence is objective or subjective (Pang and Lee, 2004). [sent-20, score-1.03]
7 In a multi-aspect setting, however, information about the sentence topic is required to determine the aspect to which a sentiment-bearing word relates (Snyder and Barzilay, 2007). [sent-21, score-0.298]
8 As we can see from even these closely related applications, the content structure representation should be intimately tied to a specific text analysis task. [sent-22, score-0.517]
9 In this work, we present an approach in which a content model is learned jointly with a text analysis task. [sent-23, score-0.562]
10 We assume complete annotations for the task itself, but we learn the content model from raw, unannotated text. [sent-24, score-0.584]
11 Our approach is implemented in a discriminative framework using latent variables to represent facets of content structure. [sent-25, score-0.585]
12 (e.g., lexical ones) are conjoined with latent variables to enrich the features with global contextual information. [sent-28, score-0.256]
13 [...] the word “pleased” should contribute most strongly to the sentiment of the audio aspect when it is augmented with a relevant topic indicator. [sent-31, score-0.66]
14 The coupling of the content model and the task-specific model allows the two components to mutually influence each other during learning. [sent-32, score-0.524]
15 The content model leverages unannotated data to improve the performance of the task-specific model, while the task-specific model provides feedback to improve the relevance of the content model. [sent-33, score-0.964]
16 Our first task is a multi-aspect sentiment analysis task, where a system predicts the aspect-specific sentiment ratings (Snyder and Barzilay, 2007). [sent-36, score-0.861]
17 Second, we consider a multi-aspect extractive summarization task in which a system extracts key properties for a pre-specified set of aspects. [sent-37, score-0.253]
18 On both tasks, our method for incorporating content structure consistently outperforms structureagnostic counterparts. [sent-38, score-0.478]
19 Moreover, jointly learning content and task parameters yields additional gains over independently learned models. [sent-39, score-0.576]
20 2 Related Work Prior research has demonstrated the usefulness of content models for discourse-level tasks. [sent-40, score-0.433]
21 Since these tasks are inherently tied to document structure, a content model is essential to performing them successfully. [sent-44, score-0.565]
22 Our goal is to augment these models with document-level content information. [sent-46, score-0.433]
23 Several applications in information extraction and sentiment analysis are close in spirit to our work (Pang and Lee, 2004; Patwardhan and Riloff, 2007; McDonald et al. [sent-47, score-0.432]
24 For instance, 378 Pang and Lee (2004) refine the accuracy of sentiment analysis by considering only the subjective sentences of a review as determined by an independent classifier. [sent-51, score-0.446]
25 (2009) uses linguistic resources to create a latent model in a task-specific fashion to improve performance, rather than assuming sentence-level task relevancy. [sent-56, score-0.235]
26 Choi and Cardie (2008) address a sentiment analysis task by using a heuristic decision process based on word-level intermediate variables to represent polarity. [sent-57, score-0.504]
27 (2007) is concerned with recovering the labels at all levels, whereas in this work we are interested in using latent document content structure as a means to benefit task predictions. [sent-63, score-0.809]
28 Our algorithm adjusts the content model dynamically for a given task rather than pre-specifying it. [sent-66, score-0.486]
29 To overcome this drawback, our method induces a content model in an unsupervised fashion and connects it via latent variables to the target model. [sent-68, score-0.62]
30 This design not only eliminates the need for additional annotations, but also allows the algorithm to leverage large quantities of raw data for training the content model. [sent-69, score-0.488]
31 In this work, latent topic variables are used to generate text as well as a supervised sentiment rating for the document. [sent-72, score-0.707]
32 1 Problem Formulation In this section, we describe a model which incorporates content information into a multi-aspect summarization task. [sent-75, score-0.597]
33 Our approach assumes that at training time we have a collection of labeled documents DL, each consisting of the document text s and true task-specific labeling y∗. [sent-76, score-0.269]
34 For the multi-aspect summarization task, y∗ consists of sequence labels (e. [sent-77, score-0.267]
35 [...], sn, and the labelings y∗ consist of corresponding label sequences y1, . . . As is common in related work, we model each yi using a CRF which conditions on the observed document text. [sent-83, score-0.291]
36 In this work, we also assume a content model, which we fix to be the document-level HMM as used in Barzilay and Lee (2004). [sent-84, score-0.482]
37 In this content model, each sentence si is associated with a hidden topic variable Ti which generates the words of the sentence. [sent-85, score-0.687]
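As a concrete illustration of this generative story, here is a minimal sketch of a document-level HMM content model in the style of Barzilay and Lee (2004): a hidden topic sequence is drawn from topic transition distributions, and each sentence's words are drawn from its topic's emission distribution. All sizes, parameter values, and function names below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 4, 50  # number of topics and vocabulary size (illustrative)

# Content-model parameters theta: start/transition distributions over
# topics, plus a per-topic word emission distribution.
start = rng.dirichlet(np.ones(K))
trans = rng.dirichlet(np.ones(K), size=K)   # trans[t] = P(T_{i+1} | T_i = t)
emit = rng.dirichlet(np.ones(V), size=K)    # emit[t]  = P(word | topic t)

def generate_document(n_sentences=3, sentence_len=5):
    """Draw a hidden topic sequence T from the HMM, then draw each
    sentence's words from its topic's emission distribution."""
    topics, sentences = [], []
    t = rng.choice(K, p=start)
    for _ in range(n_sentences):
        topics.append(int(t))
        sentences.append([int(w) for w in rng.choice(V, size=sentence_len, p=emit[t])])
        t = rng.choice(K, p=trans[t])
    return topics, sentences

topics, sentences = generate_document()
```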
38 2 Model Overview Our model, depicted in Figure 1, proceeds as follows: First the document-level HMM generates a hidden content topic sequence T for the sentences of a document. [sent-96, score-0.636]
39 This content component is parametrized by θ and decomposes in the standard way. [sent-97, score-0.59]
40 3Note that each yi is a label sequence across the words in si, rather than an individual label. [sent-99, score-0.208]
41 The Ti variable represents the content model topic for the ith sentence si. [sent-101, score-0.626]
42 Note that each token label has an undirected edge to a factor containing the words of the current sentence, si as well as the topic of the current sentence Ti. [sent-109, score-0.368]
43 Figure 2: A graphical depiction of the generative process for a labeled document at training time (See Section 3); shaded nodes indicate variables which are observed at training time. [sent-113, score-0.312]
44 First the latent underlying content structure T is drawn. [sent-114, score-0.618]
45 Then, the document text s is drawn conditioned on the content structure utilizing content parameters θ. [sent-115, score-1.043]
46 Finally, the observed task labels for the document are modeled given s and T using the task parameters φ. [sent-116, score-0.292]
47 For instance, using the example from Table 1, we could have a feature that indicates the word “pleased” conjoined with the segment topic (see Figure 1). [sent-120, score-0.212]
48 This joint process, depicted graphically in Figure 2, is summarized as: P(T, s, y∗) = Pθ(T, s)Pφ(y∗|s, T) (3) Note that this probability decomposes into a document-level HMM term (the content component) as well as a product of CRF terms (the task component). [sent-122, score-0.564]
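The content term of this decomposition can be scored as a standard HMM log-probability. The sketch below is a hedged illustration (the parameter arrays and function name are my own, not the authors' code); the full joint would add the CRF task score log Pφ(y∗|s, T) to this quantity.

```python
import numpy as np

def hmm_logprob(topics, sentences, start, trans, emit):
    """log P_theta(T, s): score of a topic sequence T and sentences s
    under a document-level HMM (parameter arrays are illustrative)."""
    lp = np.log(start[topics[0]])
    for i, t in enumerate(topics):
        if i > 0:
            lp += np.log(trans[topics[i - 1], t])  # topic transition
        for w in sentences[i]:
            lp += np.log(emit[t, w])               # word emission
    return float(lp)
```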
49 3 Learning During learning, we would like to find the document-level HMM parameters θ and the summarization task CRF parameters φ which maximize the likelihood of the labeled documents. [sent-124, score-0.29]
50 The only observed elements of a labeled document are the document text s and the aspect labels y∗. [sent-125, score-0.496]
51 M-Step We perform separate M-Steps for content and task parameters. [sent-135, score-0.486]
52 The M-Step for the content parameters is identical to the document-level HMM content model: topic emission and transition distributions are updated with expected counts derived from E-Step topic posteriors. [sent-136, score-1.174]
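This M-Step amounts to renormalizing expected counts into distributions. A minimal sketch (the smoothing constant and function name are illustrative, not from the paper):

```python
import numpy as np

def m_step_content(expected_trans, expected_emit, smoothing=1e-3):
    """M-Step for the content HMM: turn expected transition counts (K x K)
    and expected emission counts (K x V), accumulated from the E-Step topic
    posteriors, into normalized distributions."""
    trans = expected_trans + smoothing
    emit = expected_emit + smoothing
    trans /= trans.sum(axis=1, keepdims=True)
    emit /= emit.sum(axis=1, keepdims=True)
    return trans, emit
```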
53 Using the decomposition in Equation (3), it is clear that the only component of the joint labeled document probability which relies upon the task parameters is log Pφ(y∗ |s, T). [sent-139, score-0.333]
54 Each utilizes different content features to explain the sentence sequence labeling. [sent-143, score-0.521]
55 Note that the above formulation differs from the standard CRF due to the latent topic variables. [sent-155, score-0.246]
56 Otherwise, the inference task could be accomplished by directly obtaining posteriors over each yj state using the Forward-Backward algorithm on the sentence CRF. [sent-156, score-0.222]
57 5 Leveraging unannotated data Our model allows us to incorporate unlabeled documents, denoted DU, to improve the learning of the content model. [sent-163, score-0.658]
58 For an unlabeled document, we only observe the document text s and assume it is drawn from the same content model as our labeled documents. [sent-164, score-0.897]
59 This objective corresponds to: L(φ, θ) = LU(θ) + LL(φ, θ). This objective can also be optimized using the EM algorithm, where the E-Step for labeled and unlabeled documents is outlined above. [sent-167, score-0.376]
60 In this task, the target y consists of numeric sentiment ratings (y1, . [sent-172, score-0.417]
61 The task component consists of independent linear regression models for each aspect sentiment rating. [sent-176, score-0.55]
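As a hedged sketch of such a task component, the snippet below fits one independent least-squares regression per aspect rating. The paper's actual estimator may differ (e.g., in regularization), so treat this as illustrative only.

```python
import numpy as np

def fit_aspect_regressions(X, Y):
    """Fit one independent linear regression per aspect rating.

    X: (n_docs, n_features) document feature matrix.
    Y: (n_docs, n_aspects) aspect sentiment ratings.
    Returns W of shape (n_features, n_aspects), one weight column per aspect.
    """
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W
```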
62 For the content model, we associate a topic with each paragraph; T consists of assignments of topics to each document paragraph. [sent-177, score-0.719]
63 For instance, because the task label (aspect sentiment ratings) is not localized to any region of the document, all content model variables influence the target response. [sent-179, score-0.964]
64 Conditioned on the target label, all 382 topic variables become correlated. [sent-180, score-0.214]
65 4 Experimental Set-Up We apply our approach to two text analysis tasks that stand to benefit from modeling content structure: multi-aspect sentiment analysis and multi-aspect review summarization. [sent-183, score-0.918]
66 For all tasks, when using a content model with a task model, we utilize a new set of features which include all the original features as well as a copy of each feature conjoined with the content topic assignment (see Figure 1). [sent-187, score-1.171]
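The feature-conjunction scheme can be sketched as follows: every original feature is kept, and a topic-conjoined copy is added for the sentence's content topic. The feature-naming convention here is hypothetical.

```python
def conjoin_features(base_features, topic):
    """Return the original features plus a copy of each feature conjoined
    with the sentence's content topic (naming scheme is hypothetical)."""
    feats = dict(base_features)
    for name, value in base_features.items():
        feats[f"{name}&topic={topic}"] = value
    return feats
```

With topic 2 and the single feature `word=pleased`, this yields both `word=pleased` and `word=pleased&topic=2`, letting the learner give "pleased" a topic-specific weight.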
67 Multi-Aspect Sentiment Ranking The goal of multi-aspect sentiment classification is to predict a set of numeric ranks that reflects the user satisfaction for each aspect (Snyder and Barzilay, 2007). [sent-189, score-0.457]
68 In each case, the unlabeled texts of both labeled and unlabeled documents are used for training the content model, while only the labeled training corpus is used to train the task model. [sent-197, score-0.95]
69 Note that the entire data set for the multi-aspect sentiment analysis task is labeled. [sent-198, score-0.444]
70 We test our sentiment ranker on a set of DVD reviews from the website IGN. [sent-201, score-0.409]
71 Therefore, we can compare the performance of the algorithm using automatically induced content models against the gold standard structural information. [sent-205, score-0.469]
72 Variants of this task have been considered in review summarization in previous work (Kim and Hovy, 2006; Branavan et al. [sent-213, score-0.272]
73 Since we cannot select high-quality annotators directly, we included a control document which had been previously annotated by a native speaker among the documents assigned to each annotator. [sent-225, score-0.239]
74 The work of any annotator who exhibited low agreement on the control document annotation was excluded from the corpus. [sent-226, score-0.213]
75 2 Baseline Comparison and Evaluation Baselines For all the models, we obtain a baseline system by eliminating content features and only us- ing a task model with the set of features described above. [sent-234, score-0.486]
76 We also compare against a simplified variant of our method wherein a content model is induced in isolation rather than learned jointly in the context of the underlying task. [sent-235, score-0.643]
77 The multi-aspect sentiment corpus has labels per paragraph rather than per sentence. [sent-248, score-0.449]
78 Model (JointCM) setting refers to our full model described in Section 3, where content and task components are learned jointly. [sent-249, score-0.528]
79 Evaluation Metrics For multi-aspect sentiment ranking, we report the average L2 (squared difference) and L1 (absolute difference) between system prediction and true 1-10 sentiment rating across test documents and aspects. [sent-250, score-0.817]
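These two error metrics can be computed as below (a straightforward sketch, not the authors' evaluation code):

```python
def rating_errors(pred, gold):
    """Average L1 (absolute difference) and L2 (squared difference)
    between predicted and true sentiment ratings."""
    n = len(pred)
    l1 = sum(abs(p - g) for p, g in zip(pred, gold)) / n
    l2 = sum((p - g) ** 2 for p, g in zip(pred, gold)) / n
    return l1, l2
```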
80 For the multi-aspect summarization task, we measure average token precision and recall of the label assignments (Multi-label). [sent-251, score-0.278]
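Token-level precision and recall over label assignments can be sketched as follows, assuming a null label "O" for tokens outside any aspect (the label scheme is illustrative, not taken from the paper):

```python
def token_precision_recall(pred, gold, null_label="O"):
    """Token-level precision/recall of label assignments: a token counts
    as correct when its predicted non-null label matches the gold label."""
    pred_hits = [(i, l) for i, l in enumerate(pred) if l != null_label]
    gold_hits = {(i, l) for i, l in enumerate(gold) if l != null_label}
    correct = sum(1 for hit in pred_hits if hit in gold_hits)
    precision = correct / len(pred_hits) if pred_hits else 0.0
    recall = correct / len(gold_hits) if gold_hits else 0.0
    return precision, recall
```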
81 Table 3: The error rate on the multi-aspect sentiment ranking. [sent-255, score-0.352]
82 Baseline Comparisons Adding a content model significantly outperforms the NoCM baseline on both tasks. [sent-273, score-0.433]
83 75%, on multi-aspect summarization on the Amazon corpus and multi-aspect sentiment ranking, respectively. [sent-277, score-0.516]
84 [...] advantages of jointly learning the content model in the context of the underlying task. [sent-293, score-0.529]
85 Comparison with additional context features One alternative to an explicit content model is to simply incorporate additional features into NoCM as a proxy for contextual information. [sent-294, score-0.479]
86 Impact of content model quality on task performance In the multi-aspect sentiment ranking task, we have access to gold-standard document-level content structure annotation. [sent-301, score-1.365]
87 This affords us the ability to compare the ideal content structure, provided by the document authors, with one that is learned automatically. [sent-302, score-0.607]
88 However, the performance of our JointCM model is not far behind the gold standard content structure. [sent-304, score-0.433]
89 The quality of the induced content model is determined by the amount of training data. [sent-305, score-0.469]
90 As Figure 4 shows, the multi-aspect summarizer improves as the amount of raw data available for learning the content model increases. [sent-306, score-0.433]
91 This type of feature is not applicable to our multi-aspect sentiment ranking task, as we already use unigram features from the entire document. [sent-307, score-0.401]
92 Figure 4: Results on the Amazon corpus using the complete annotated set with varying amounts of additional unlabeled data. [sent-308, score-0.254]
93 6 Conclusion In this paper, we demonstrate the benefits of incorporating content models in text analysis tasks. [sent-313, score-0.472]
94 We also introduce a framework to allow the joint learning of an unsupervised latent content model with a supervised task-specific model. [sent-314, score-0.525]
95 On multiple tasks and datasets, our results empirically connect model quality and task performance, suggesting that further improvements in content modeling may yield even further gains. (Footnote 7: Because we append the unlabeled versions of the labeled data to the unlabeled set, even with 0% additional unlabeled data, there is a small data set to train the content model.) [sent-315, score-0.94] [sent-320, score-0.433]
97 Catching the drift: Probabilistic content models, with applications to generation and summarization. [sent-326, score-0.474]
98 Learning with compositional semantics as structural inference for subsentential sentiment analysis. [sent-356, score-0.352]
99 Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization. [sent-383, score-0.415]
100 Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. [sent-417, score-0.401]
wordName wordTfidf (topN-words)
[('content', 0.433), ('sentiment', 0.352), ('summarization', 0.164), ('yelp', 0.164), ('amazon', 0.155), ('topic', 0.154), ('nocm', 0.137), ('document', 0.132), ('yj', 0.13), ('barzilay', 0.129), ('crf', 0.128), ('unlabeled', 0.127), ('ti', 0.11), ('hmm', 0.106), ('aspect', 0.105), ('unannotated', 0.098), ('regina', 0.093), ('yi', 0.093), ('latent', 0.092), ('snyder', 0.091), ('rouge', 0.086), ('dvd', 0.084), ('pang', 0.082), ('decomposes', 0.078), ('logp', 0.078), ('labeled', 0.073), ('pleased', 0.07), ('wy', 0.07), ('label', 0.066), ('ratings', 0.065), ('documents', 0.064), ('patwardhan', 0.063), ('stars', 0.063), ('si', 0.061), ('variables', 0.06), ('xt', 0.06), ('branavan', 0.058), ('conjoined', 0.058), ('reviews', 0.057), ('aspects', 0.056), ('objective', 0.056), ('review', 0.055), ('indepcm', 0.055), ('jointcm', 0.055), ('logxp', 0.055), ('sip', 0.055), ('taskspecific', 0.055), ('somasundaran', 0.055), ('quantities', 0.055), ('labels', 0.054), ('polarity', 0.053), ('task', 0.053), ('iy', 0.052), ('ranking', 0.049), ('fix', 0.049), ('rating', 0.049), ('dl', 0.049), ('choi', 0.049), ('audio', 0.049), ('sequence', 0.049), ('jointly', 0.048), ('token', 0.048), ('underlying', 0.048), ('depiction', 0.047), ('weeds', 0.047), ('lillian', 0.047), ('contextual', 0.046), ('mcdonald', 0.045), ('structure', 0.045), ('posterior', 0.044), ('xp', 0.043), ('du', 0.043), ('paragraph', 0.043), ('control', 0.043), ('jorge', 0.042), ('harr', 0.042), ('jair', 0.042), ('dong', 0.042), ('learned', 0.042), ('applications', 0.041), ('loss', 0.04), ('utilize', 0.04), ('component', 0.04), ('analysis', 0.039), ('parametrized', 0.039), ('sentence', 0.039), ('lee', 0.039), ('agreement', 0.038), ('elsner', 0.036), ('video', 0.036), ('goldberg', 0.036), ('excerpt', 0.036), ('extractive', 0.036), ('isolation', 0.036), ('coupling', 0.036), ('induced', 0.036), ('proceedings', 0.036), ('fashion', 0.035), ('decomposition', 0.035), ('cardie', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1
2 0.26933557 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
Author: Jordan Boyd-Graber ; Philip Resnik
Abstract: In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language’s data to inform how the model captures properties of other languages. MLSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MLSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment. Sentiment analysis (Pang and Lee, 2008) offers the promise of automatically discerning how people feel about a product, person, organization, or issue based on what they write online, which is potentially of great value to businesses and other organizations. However, the vast majority of sentiment resources and algorithms are limited to a single language, usually English (Wilson, 2008; Baccianella and Sebastiani, 2010). Since no single language captures a majority of the content online, adopting such a limited approach in an increasingly global community risks missing important details and trends that might only be available when text in multiple languages is taken into account. 
45 Philip Resnik Department of Linguistics and UMIACS University of Maryland College Park, MD re snik@umd .edu Up to this point, multiple languages have been addressed in sentiment analysis primarily by transferring knowledge from a resource-rich language to a less rich language (Banea et al., 2008), or by ignoring differences in languages via translation into English (Denecke, 2008). These approaches are limited to a view of sentiment that takes place through an English-centric lens, and they ignore the potential to share information between languages. Ideally, learning sentiment cues holistically, across languages, would result in a richer and more globally consistent picture. In this paper, we introduce Multilingual Supervised Latent Dirichlet Allocation (MLSLDA), a model for sentiment analysis on a multilingual corpus. MLSLDA discovers a consistent, unified picture of sentiment across multiple languages by learning “topics,” probabilistic partitions of the vocabulary that are consistent in terms of both meaning and relevance to observed sentiment. Our approach makes few assumptions about available resources, requiring neither parallel corpora nor machine translation. The rest of the paper proceeds as follows. In Section 1, we describe the probabilistic tools that we use to create consistent topics bridging across languages and the MLSLDA model. In Section 2, we present the inference process. We discuss our set of semantic bridges between languages in Section 3, and our experiments in Section 4 demonstrate that this approach functions as an effective multilingual topic model, discovers sentiment-biased topics, and uses multilingual corpora to make better sentiment predictions across languages. Sections 5 and 6 discuss related research and discusses future work, respectively. ProcMe IdTi,n Mgsas ofsa tchehu 2se0t1t0s, C UoSnAfe,r 9e-n1ce1 o Onc Etombepri 2ic0a1l0 M. 
?ec th2o0d1s0 i Ans Nsaotcuiartaioln La fonrg Cuaogmep Purtoatcieosnsainlg L,in pgagueis ti 4c5s–5 , 1 Predictions from Multilingual Topics As its name suggests, MLSLDA is an extension of Latent Dirichlet allocation (LDA) (Blei et al., 2003), a modeling approach that takes a corpus of unannotated documents as input and produces two outputs, a set of “topics” and assignments of documents to topics. Both the topics and the assignments are probabilistic: a topic is represented as a probability distribution over words in the corpus, and each document is assigned a probability distribution over all the topics. Topic models built on the foundations of LDA are appealing for sentiment analysis because the learned topics can cluster together sentimentbearing words, and because topic distributions are a parsimonious way to represent a document.1 LDA has been used to discover latent structure in text (e.g. for discourse segmentation (Purver et al., 2006) and authorship (Rosen-Zvi et al., 2004)). MLSLDA extends the approach by ensuring that this latent structure the underlying topics is consistent across languages. We discuss multilingual topic modeling in Section 1. 1, and in Section 1.2 we show how this enables supervised regression regardless of a document’s language. — — 1.1 Capturing Semantic Correlations Topic models posit a straightforward generative process that creates an observed corpus. For each docu- ment d, some distribution θd over unobserved topics is chosen. Then, for each word position in the document, a topic z is selected. Finally, the word for that position is generated by selecting from the topic indexed by z. (Recall that in LDA, a “topic” is a distribution over words). In monolingual topic models, the topic distribution is usually drawn from a Dirichlet distribution. Using Dirichlet distributions makes it easy to specify sparse priors, and it also simplifies posterior inference because Dirichlet distributions are conjugate to multinomial distributions. 
However, drawing topics from Dirichlet distributions will not suffice if our vocabulary includes multiple languages. If we are working with English, German, and Chinese at the same time, a Dirichlet prior has no way to favor distributions z such that p(good|z), p(gut|z), and 1The latter property has also made LDA popular for information retrieval (Wei and Croft, 2006)). 46 p(h aˇo|z) all tend to be high at the same time, or low at hth ˇaeo same lti tmened. tMoo bree generally, et sheam structure oorf our model must encourage topics to be consistent across languages, and Dirichlet distributions cannot encode correlations between elements. One possible solution to this problem is to use the multivariate normal distribution, which can produce correlated multinomials (Blei and Lafferty, 2005), in place of the Dirichlet distribution. This has been done successfully in multilingual settings (Cohen and Smith, 2009). However, such models complicate inference by not being conjugate. Instead, we appeal to tree-based extensions of the Dirichlet distribution, which has been used to induce correlation in semantic ontologies (Boyd-Graber et al., 2007) and to encode clustering constraints (Andrzejewski et al., 2009). The key idea in this approach is to assume the vocabularies of all languages are organized according to some shared semantic structure that can be represented as a tree. For concreteness in this section, we will use WordNet (Miller, 1990) as the representation of this multilingual semantic bridge, since it is well known, offers convenient and intuitive terminology, and demonstrates the full flexibility of our approach. However, the model we describe generalizes to any tree-structured rep- resentation of multilingual knowledge; we discuss some alternatives in Section 3. 
WordNet organizes a vocabulary into a rooted, directed acyclic graph of nodes called synsets, short for “synonym sets.” A synset is a child of another synset if it satisfies a hyponomy relationship; each child “is a” more specific instantiation of its parent concept (thus, hyponomy is often called an “isa” relationship). For example, a “dog” is a “canine” is an “animal” is a “living thing,” etc. As an approximation, it is not unreasonable to assume that WordNet’s structure of meaning is language independent, i.e. the concept encoded by a synset can be realized using terms in different languages that share the same meaning. In practice, this organization has been used to create many alignments of international WordNets to the original English WordNet (Ordan and Wintner, 2007; Sagot and Fiˇ ser, 2008; Isahara et al., 2008). Using the structure of WordNet, we can now describe a generative process that produces a distribution over a multilingual vocabulary, which encourages correlations between words with similar meanings regardless of what language each word is in. For each synset h, we create a multilingual word distribution for that synset as follows: 1. Draw transition probabilities βh ∼ Dir (τh) 2. Draw stop probabilities ωh ∼ Dir∼ (κ Dhi)r 3. For each language l, draw emission probabilities for that synset φh,l ∼ Dir (πh,l) . For conciseness in the rest of the paper, we will refer to this generative process as multilingual Dirichlet hierarchy, or MULTDIRHIER(τ, κ, π) .2 Each observed token can be viewed as the end result of a sequence of visited synsets λ. At each node in the tree, the path can end at node iwith probability ωi,1, or it can continue to a child synset with probability ωi,0. If the path continues to another child synset, it visits child j with probability βi,j. 
If the path ends at a synset, it generates word k with probability φi,l,k.3 The probability of a word being emitted from a path with visited synsets r and final synset h in language lis therefore p(w, λ = r, h|l, β, ω, φ) = (iY,j)∈rβi,jωi,0(1 − ωh,1)φh,l,w. Note that the stop probability ωh (1) is independent of language, but the emission φh,l is dependent on the language. This is done to prevent the following scenario: while synset A is highly probable in a topic and words in language 1attached to that synset have high probability, words in language 2 have low probability. If this could happen for many synsets in a topic, an entire language would be effectively silenced, which would lead to inconsistent topics (e.g. 2Variables τh, πh,l, and κh are hyperparameters. Their mean is fixed, but their magnitude is sampled during inference (i.e. Pkτhτ,ih,k is constant, but τh,i is not). For the bushier bridges, (Pe.g. dictionary and flat), their mean is uniform. For GermaNet, we took frequencies from two balanced corpora of German and English: the British National Corpus (University of Oxford, 2006) and the Kern Corpus of the Digitales Wo¨rterbuch der Deutschen Sprache des 20. Jahrhunderts project (Geyken, 2007). We took these frequencies and propagated them through the multilingual hierarchy, following LDAWN’s (Boyd-Graber et al., 2007) formulation of information content (Resnik, 1995) as a Bayesian prior. The variance of the priors was initialized to be 1.0, but could be sampled during inference. 3Note that the language and word are taken as given, but the path through the semantic hierarchy is a latent random variable. 47 Topic 1 is about baseball in English and about travel in German). Separating path from emission helps ensure that topics are consistent across languages. 
Having defined topic distributions in a way that can preserve cross-language correspondences, we now use this distribution within a larger model that can discover cross-language patterns of use that predict sentiment. 1.2 The MLSLDA Model We will view sentiment analysis as a regression problem: given an input document, we want to predict a real-valued observation y that represents the sentiment of a document. Specifically, we build on supervised latent Dirichlet allocation (SLDA, (Blei and McAuliffe, 2007)), which makes predictions based on the topics expressed in a document; this can be thought of projecting the words in a document to low dimensional space of dimension equal to the number of topics. Blei et al. showed that using this latent topic structure can offer improved predictions over regressions based on words alone, and the approach fits well with our current goals, since word-level cues are unlikely to be identical across languages. In addition to text, SLDA has been successfully applied to other domains such as social networks (Chang and Blei, 2009) and image classification (Wang et al., 2009). The key innovation in this paper is to extend SLDA by creating topics that are globally consistent across languages, using the bridging approach above. We express our model in the form of a probabilistic generative latent-variable model that generates documents in multiple languages and assigns a realvalued score to each document. The score comes from a normal distribution whose sum is the dot product between a regression parameter η that encodes the influence of each topic on the observation and a variance σ2. With this model in hand, we use statistical inference to determine the distribution over latent variables that, given the model, best explains observed data. The generative model is as follows: 1. For each topic i= 1. . . K, draw a topic distribution {βi, ωi, φi} from MULTDIRHIER(τ, κ, π). 2. {Foβr each do}cuf mroemn tM Md = 1. . . 
M with language ld: (a) CDihro(oαse). a distribution over topics θd ∼ (b) For each word in the document n = 1. . . Nd, choose a topic assignment zd,n ∼ Mult (θd) and a path λd,n ending at word wd,n according to Equation 1using {βzd,n , ωzd,n , φzd,n }. 3. Choose a re?sponse variable from y Norm ?η> z¯, σ2?, where z¯ d ≡ N1 PnN=1 zd,n. ∼ Crucially, note that the topics are not independent of the sentiment task; the regression encourages terms with similar effects on the observation y to be in the same topic. The consistency of topics described above allows the same regression to be done for the entire corpus regardless of the language of the underlying document. 2 Inference Finding the model parameters most likely to explain the data is a problem of statistical inference. We employ stochastic EM (Diebolt and Ip, 1996), using a Gibbs sampler for the E-step to assign words to paths and topics. After randomly initializing the topics, we alternate between sampling the topic and path of a word (zd,n, λd,n) and finding the regression parameters η that maximize the likelihood. We jointly sample the topic and path conditioning on all of the other path and document assignments in the corpus, selecting a path and topic with probability p(zn = k, λn = r|z−n , λ−n, wn , η, σ, Θ) = p(yd|z, η, σ)p(λn = r|zn = k, λ−n, wn, τ, p(zn = k|z−n, α) . κ, π) (2) Each of these three terms reflects a different influence on the topics from the vocabulary structure, the document’s topics, and the response variable. In the next paragraphs, we will expand each of them to derive the full conditional topic distribution. As discussed in Section 1.1, the structure of the topic distribution encourages terms with the same meaning to be in the same topic, even across languages. 
During inference, we marginalize over possible multinomial distributions β, ω, and φ, using the observed transitions from ito j in topic k; Tk,i,j, stop counts in synset iin topic k, Ok,i,0; continue counts in synsets iin topic k, Ok,i,1 ; and emission counts in synset iin language lin topic k, Fk,i,l. The 48 Multilingual Topics Text Documents Sentiment Prediction Figure 1: Graphical model representing MLSLDA. Shaded nodes represent observations, plates denote replication, and lines show probabilistic dependencies. probability of taking a path r is then p(λn = r|zn = k, λ−n) = (iY,j)∈r PBj0Bk,ik,j,i,+j0 τ+i,j τi,jPs∈0O,1k,Oi,1k,+i,s ω+i ωi,s! |(iY,j)∈rP{zP} Tran{szitiPon Ok,rend,0 + ωrend Fk,rend,wn + πrend,}l Ps∈0,1Ok,rend,s+ ωrend,sPw0Frend,w0+ πrend,w0 |PEmi{szsiPon} (3) Equation 3 reflects the multilingual aspect of this model. The conditional topic distribution for SLDA (Blei and McAuliffe, 2007) replaces this term with the standard Multinomial-Dirichlet. However, we believe this is the first published SLDA-style model using MCMC inference, as prior work has used variational inference (Blei and McAuliffe, 2007; Chang and Blei, 2009; Wang et al., 2009). Because the observed response variable depends on the topic assignments of a document, the conditional topic distribution is shifted toward topics that explain the observed response. Topics that move the predicted response yˆd toward the true yd will be favored. We drop terms that are constant across all topics for the effect of the response variable, p(yd|z, η, σ) ∝ exp?σ12?yd−PPk0kN0Nd,dk,0kη0k0?Pkη0Nzkd,k0? |??PP{z?P?} . Other wPord{zs’ influence exp
3 0.19723919 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie
Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.
4 0.17041594 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
Author: Taesun Moon ; Katrin Erk ; Jason Baldridge
Abstract: We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perform poorly in unsupervised and semisupervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.
Author: Amr Ahmed ; Eric Xing
Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical level. In this paper we address the problem of modeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hastings inference algorithm for a semi-supervised extension with decent results.
6 0.16257666 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid
7 0.15701807 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
8 0.14009111 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging
9 0.12767836 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning
10 0.11597866 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
11 0.10581649 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation
12 0.10420607 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
13 0.09778183 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification
14 0.096914172 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning
15 0.096444346 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text
16 0.093208194 104 emnlp-2010-The Necessity of Combining Adaptation Methods
17 0.091356538 109 emnlp-2010-Translingual Document Representations from Discriminative Projections
18 0.088911556 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails
19 0.088735737 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
20 0.087889463 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models
topicId topicWeight
[(0, 0.323), (1, 0.233), (2, -0.219), (3, -0.283), (4, 0.065), (5, 0.008), (6, 0.166), (7, -0.038), (8, 0.016), (9, -0.035), (10, -0.039), (11, 0.016), (12, -0.003), (13, -0.023), (14, -0.019), (15, 0.16), (16, 0.033), (17, -0.062), (18, -0.022), (19, 0.075), (20, -0.024), (21, 0.063), (22, 0.099), (23, -0.095), (24, -0.089), (25, -0.11), (26, -0.071), (27, 0.041), (28, -0.07), (29, 0.015), (30, -0.091), (31, 0.116), (32, 0.056), (33, -0.004), (34, -0.059), (35, 0.104), (36, -0.02), (37, 0.063), (38, -0.053), (39, 0.075), (40, -0.048), (41, 0.129), (42, 0.017), (43, 0.023), (44, 0.107), (45, 0.043), (46, 0.106), (47, -0.014), (48, -0.021), (49, 0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.97667092 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1
2 0.752397 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie
Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.
3 0.70524448 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
Author: Jordan Boyd-Graber ; Philip Resnik
Abstract: In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language’s data to inform how the model captures properties of other languages. MLSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MLSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment. Sentiment analysis (Pang and Lee, 2008) offers the promise of automatically discerning how people feel about a product, person, organization, or issue based on what they write online, which is potentially of great value to businesses and other organizations. However, the vast majority of sentiment resources and algorithms are limited to a single language, usually English (Wilson, 2008; Baccianella and Sebastiani, 2010). Since no single language captures a majority of the content online, adopting such a limited approach in an increasingly global community risks missing important details and trends that might only be available when text in multiple languages is taken into account. 
Philip Resnik, Department of Linguistics and UMIACS, University of Maryland, College Park, MD, resnik@umd.edu

Up to this point, multiple languages have been addressed in sentiment analysis primarily by transferring knowledge from a resource-rich language to a less rich language (Banea et al., 2008), or by ignoring differences in languages via translation into English (Denecke, 2008). These approaches are limited to a view of sentiment that takes place through an English-centric lens, and they ignore the potential to share information between languages. Ideally, learning sentiment cues holistically, across languages, would result in a richer and more globally consistent picture. In this paper, we introduce Multilingual Supervised Latent Dirichlet Allocation (MLSLDA), a model for sentiment analysis on a multilingual corpus. MLSLDA discovers a consistent, unified picture of sentiment across multiple languages by learning "topics," probabilistic partitions of the vocabulary that are consistent in terms of both meaning and relevance to observed sentiment. Our approach makes few assumptions about available resources, requiring neither parallel corpora nor machine translation. The rest of the paper proceeds as follows. In Section 1, we describe the probabilistic tools that we use to create consistent topics bridging across languages and the MLSLDA model. In Section 2, we present the inference process. We discuss our set of semantic bridges between languages in Section 3, and our experiments in Section 4 demonstrate that this approach functions as an effective multilingual topic model, discovers sentiment-biased topics, and uses multilingual corpora to make better sentiment predictions across languages. Sections 5 and 6 discuss related research and future work, respectively.

Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 45-51, MIT, Massachusetts, USA, 9-11 October 2010. Association for Computational Linguistics.

1 Predictions from Multilingual Topics

As its name suggests, MLSLDA is an extension of latent Dirichlet allocation (LDA) (Blei et al., 2003), a modeling approach that takes a corpus of unannotated documents as input and produces two outputs, a set of "topics" and assignments of documents to topics. Both the topics and the assignments are probabilistic: a topic is represented as a probability distribution over words in the corpus, and each document is assigned a probability distribution over all the topics. Topic models built on the foundations of LDA are appealing for sentiment analysis because the learned topics can cluster together sentiment-bearing words, and because topic distributions are a parsimonious way to represent a document.1 LDA has been used to discover latent structure in text (e.g. for discourse segmentation (Purver et al., 2006) and authorship (Rosen-Zvi et al., 2004)). MLSLDA extends the approach by ensuring that this latent structure, the underlying topics, is consistent across languages. We discuss multilingual topic modeling in Section 1.1, and in Section 1.2 we show how this enables supervised regression regardless of a document's language.

1.1 Capturing Semantic Correlations

Topic models posit a straightforward generative process that creates an observed corpus. For each document d, some distribution θd over unobserved topics is chosen. Then, for each word position in the document, a topic z is selected. Finally, the word for that position is generated by selecting from the topic indexed by z. (Recall that in LDA, a "topic" is a distribution over words.) In monolingual topic models, the topic distribution is usually drawn from a Dirichlet distribution. Using Dirichlet distributions makes it easy to specify sparse priors, and it also simplifies posterior inference because Dirichlet distributions are conjugate to multinomial distributions.
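The monolingual generative story just described (draw θd per document, then a topic per word position, then a word from that topic) can be sketched in a few lines of Python. All sizes, priors, and the toy vocabulary here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and model sizes (illustrative only).
vocab = ["good", "bad", "ball", "game", "travel", "hotel"]
n_topics, n_words = 3, 8
alpha = np.full(n_topics, 0.5)     # Dirichlet prior on document-topic mix
beta = np.full(len(vocab), 0.1)    # Dirichlet prior on topic-word dists

# A "topic" is a distribution over words; a document mixes topics.
topics = rng.dirichlet(beta, size=n_topics)   # K x V topic-word matrix
theta = rng.dirichlet(alpha)                  # per-document topic proportions

doc = []
for _ in range(n_words):
    z = rng.choice(n_topics, p=theta)         # pick a topic for this position
    w = rng.choice(len(vocab), p=topics[z])   # emit a word from that topic
    doc.append(vocab[w])

print(doc)
```

Note that nothing in this monolingual sketch ties the topic-word distributions of different languages together, which is exactly the gap the tree-based construction below addresses.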
However, drawing topics from Dirichlet distributions will not suffice if our vocabulary includes multiple languages. If we are working with English, German, and Chinese at the same time, a Dirichlet prior has no way to favor distributions z such that p(good|z), p(gut|z), and p(hǎo|z) all tend to be high at the same time, or low at the same time. More generally, the structure of our model must encourage topics to be consistent across languages, and Dirichlet distributions cannot encode correlations between elements. One possible solution to this problem is to use the multivariate normal distribution, which can produce correlated multinomials (Blei and Lafferty, 2005), in place of the Dirichlet distribution. This has been done successfully in multilingual settings (Cohen and Smith, 2009). However, such models complicate inference by not being conjugate. Instead, we appeal to tree-based extensions of the Dirichlet distribution, which have been used to induce correlation in semantic ontologies (Boyd-Graber et al., 2007) and to encode clustering constraints (Andrzejewski et al., 2009). The key idea in this approach is to assume the vocabularies of all languages are organized according to some shared semantic structure that can be represented as a tree. For concreteness in this section, we will use WordNet (Miller, 1990) as the representation of this multilingual semantic bridge, since it is well known, offers convenient and intuitive terminology, and demonstrates the full flexibility of our approach. However, the model we describe generalizes to any tree-structured representation of multilingual knowledge; we discuss some alternatives in Section 3.

1 The latter property has also made LDA popular for information retrieval (Wei and Croft, 2006).
WordNet organizes a vocabulary into a rooted, directed acyclic graph of nodes called synsets, short for "synonym sets." A synset is a child of another synset if it satisfies a hyponymy relationship; each child "is a" more specific instantiation of its parent concept (thus, hyponymy is often called an "is-a" relationship). For example, a "dog" is a "canine" is an "animal" is a "living thing," etc. As an approximation, it is not unreasonable to assume that WordNet's structure of meaning is language independent, i.e. the concept encoded by a synset can be realized using terms in different languages that share the same meaning. In practice, this organization has been used to create many alignments of international WordNets to the original English WordNet (Ordan and Wintner, 2007; Sagot and Fišer, 2008; Isahara et al., 2008). Using the structure of WordNet, we can now describe a generative process that produces a distribution over a multilingual vocabulary, which encourages correlations between words with similar meanings regardless of what language each word is in. For each synset h, we create a multilingual word distribution for that synset as follows:

1. Draw transition probabilities βh ∼ Dir(τh)
2. Draw stop probabilities ωh ∼ Dir(κh)
3. For each language l, draw emission probabilities for that synset φh,l ∼ Dir(πh,l).

For conciseness in the rest of the paper, we will refer to this generative process as a multilingual Dirichlet hierarchy, or MULTDIRHIER(τ, κ, π).2 Each observed token can be viewed as the end result of a sequence of visited synsets λ. At each node in the tree, the path can end at node i with probability ωi,1, or it can continue to a child synset with probability ωi,0. If the path continues to another child synset, it visits child j with probability βi,j.
If the path ends at a synset, it generates word k with probability φi,l,k.3 The probability of a word being emitted from a path with visited synsets r and final synset h in language l is therefore

p(w, λ = r, h | l, β, ω, φ) = [ ∏(i,j)∈r βi,j ωi,0 ] (1 − ωh,1) φh,l,w.   (1)

Note that the stop probability ωh is independent of language, but the emission φh,l is dependent on the language. This is done to prevent the following scenario: while synset A is highly probable in a topic and words in language 1 attached to that synset have high probability, words in language 2 have low probability. If this could happen for many synsets in a topic, an entire language would be effectively silenced, which would lead to inconsistent topics (e.g. Topic 1 is about baseball in English and about travel in German). Separating path from emission helps ensure that topics are consistent across languages.

2 Variables τh, πh,l, and κh are hyperparameters. Their mean is fixed, but their magnitude is sampled during inference (i.e. τh,i / Σk τh,k is constant, but τh,i is not). For the bushier bridges (e.g. dictionary and flat), their mean is uniform. For GermaNet, we took frequencies from two balanced corpora of German and English: the British National Corpus (University of Oxford, 2006) and the Kern Corpus of the Digitales Wörterbuch der Deutschen Sprache des 20. Jahrhunderts project (Geyken, 2007). We took these frequencies and propagated them through the multilingual hierarchy, following LDAWN's (Boyd-Graber et al., 2007) formulation of information content (Resnik, 1995) as a Bayesian prior. The variance of the priors was initialized to be 1.0, but could be sampled during inference.

3 Note that the language and word are taken as given, but the path through the semantic hierarchy is a latent random variable.
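The path walk described above can be sketched with a toy two-level tree. Everything here is a hypothetical stand-in (a two-synset "hierarchy" with made-up English/German words and flat priors); a real model would use WordNet or GermaNet. The key property the sketch illustrates is that the tree transitions β and stop probabilities ω are shared across languages, while only the emissions φ are language-specific:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "synset" tree (hypothetical data): node -> children, plus
# per-node, per-language attached words.
children = {0: [1, 2], 1: [], 2: []}
words = {
    1: {"en": ["dog", "hound"], "de": ["Hund"]},
    2: {"en": ["cat"], "de": ["Katze"]},
}

# MULTDIRHIER(tau, kappa, pi): per-synset Dirichlet draws, flat priors here.
beta = {h: rng.dirichlet(np.ones(len(c))) for h, c in children.items() if c}
omega = {h: rng.dirichlet(np.ones(2)) for h in children}  # [continue, stop]
phi = {h: {l: rng.dirichlet(np.ones(len(ws))) for l, ws in by_lang.items()}
       for h, by_lang in words.items()}

def sample_word(lang, root=0):
    """Walk down from the root; stop at a synset and emit a word in `lang`."""
    h = root
    while children[h]:
        if h in words and rng.random() < omega[h][1]:
            break                                    # stop at this synset
        h = int(rng.choice(children[h], p=beta[h]))  # descend via beta
    ws = words[h][lang]
    return ws[int(rng.choice(len(ws), p=phi[h][lang]))]

# Because beta and omega are shared and only phi depends on the language,
# English and German words are tied to the same walk over the hierarchy.
print(sample_word("en"), sample_word("de"))
```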
Having defined topic distributions in a way that can preserve cross-language correspondences, we now use this distribution within a larger model that can discover cross-language patterns of use that predict sentiment.

1.2 The MLSLDA Model

We will view sentiment analysis as a regression problem: given an input document, we want to predict a real-valued observation y that represents the sentiment of a document. Specifically, we build on supervised latent Dirichlet allocation (SLDA) (Blei and McAuliffe, 2007), which makes predictions based on the topics expressed in a document; this can be thought of as projecting the words in a document to a low-dimensional space of dimension equal to the number of topics. Blei et al. showed that using this latent topic structure can offer improved predictions over regressions based on words alone, and the approach fits well with our current goals, since word-level cues are unlikely to be identical across languages. In addition to text, SLDA has been successfully applied to other domains such as social networks (Chang and Blei, 2009) and image classification (Wang et al., 2009). The key innovation in this paper is to extend SLDA by creating topics that are globally consistent across languages, using the bridging approach above. We express our model in the form of a probabilistic generative latent-variable model that generates documents in multiple languages and assigns a real-valued score to each document. The score comes from a normal distribution whose mean is the dot product between a regression parameter η, which encodes the influence of each topic on the observation, and the document's empirical topic frequencies; its variance is σ². With this model in hand, we use statistical inference to determine the distribution over latent variables that, given the model, best explains observed data. The generative model is as follows: 1. For each topic i = 1 . . . K, draw a topic distribution {βi, ωi, φi} from MULTDIRHIER(τ, κ, π). 2. For each document d = 1 . . .
M with language ld:
(a) Choose a distribution over topics θd ∼ Dir(α).
(b) For each word in the document n = 1 . . . Nd, choose a topic assignment zd,n ∼ Mult(θd) and a path λd,n ending at word wd,n according to Equation 1, using {βzd,n, ωzd,n, φzd,n}.
3. Choose a response variable y ∼ Norm(η⊤z̄d, σ²), where z̄d ≡ (1/Nd) Σn=1..Nd zd,n.

Crucially, note that the topics are not independent of the sentiment task; the regression encourages terms with similar effects on the observation y to be in the same topic. The consistency of topics described above allows the same regression to be done for the entire corpus regardless of the language of the underlying document.

2 Inference

Finding the model parameters most likely to explain the data is a problem of statistical inference. We employ stochastic EM (Diebolt and Ip, 1996), using a Gibbs sampler for the E-step to assign words to paths and topics. After randomly initializing the topics, we alternate between sampling the topic and path of a word (zd,n, λd,n) and finding the regression parameters η that maximize the likelihood. We jointly sample the topic and path, conditioning on all of the other path and document assignments in the corpus, selecting a path and topic with probability

p(zn = k, λn = r | z−n, λ−n, wn, η, σ, Θ) = p(yd | z, η, σ) · p(λn = r | zn = k, λ−n, wn, τ, κ, π) · p(zn = k | z−n, α).   (2)

Each of these three terms reflects a different influence on the topics: from the response variable, the vocabulary structure, and the document's topics. In the next paragraphs, we will expand each of them to derive the full conditional topic distribution. As discussed in Section 1.1, the structure of the topic distribution encourages terms with the same meaning to be in the same topic, even across languages.
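The generative story in steps 1-3 above, minus the word emissions, can be sketched as follows. The topic count K, document length, prior α, regression weights η, and noise σ are all illustrative assumptions; each zd,n would additionally generate a word via the path model of Equation 1:

```python
import numpy as np

rng = np.random.default_rng(2)

K, N_d = 3, 10                      # topics, words in one document (toy sizes)
alpha = np.full(K, 0.5)             # prior on document-topic proportions
eta = np.array([1.0, 0.0, -1.0])    # hypothetical per-topic regression weights
sigma = 0.1                         # response noise

theta = rng.dirichlet(alpha)              # step 2a: topic proportions
z = rng.choice(K, size=N_d, p=theta)      # step 2b: per-word topic assignments
# (each z[n] would also emit a word w[n] via the tree walk of Equation 1)

zbar = np.bincount(z, minlength=K) / N_d  # empirical topic frequencies z-bar
y = rng.normal(eta @ zbar, sigma)         # step 3: real-valued sentiment

print(zbar, y)
```

The regression couples the two halves of the model: because y depends on z̄d, topic assignments that explain the observed sentiment are preferred during inference, which is exactly the shift described in Section 2.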
During inference, we marginalize over the possible multinomial distributions β, ω, and φ, using the observed transitions from i to j in topic k, Tk,i,j; stop counts in synset i in topic k, Ok,i,0; continue counts in synset i in topic k, Ok,i,1; and emission counts in synset i in language l in topic k, Fk,i,l.

[Figure 1: Graphical model representing MLSLDA, with plates for multilingual topics, text documents, and sentiment prediction. Shaded nodes represent observations, plates denote replication, and lines show probabilistic dependencies.]

The probability of taking a path r is then

p(λn = r | zn = k, λ−n) = [ ∏(i,j)∈r (Tk,i,j + τi,j) / (Σj′ (Tk,i,j′ + τi,j′)) · (Ok,i,1 + ωi,1) / (Σs∈{0,1} (Ok,i,s + ωi,s)) ] · (Ok,rend,0 + ωrend,0) / (Σs∈{0,1} (Ok,rend,s + ωrend,s)) · (Fk,rend,wn + πrend,wn) / (Σw′ (Fk,rend,w′ + πrend,w′)),   (3)

where the bracketed factor is the transition term and the remaining factors form the emission term. Equation 3 reflects the multilingual aspect of this model. The conditional topic distribution for SLDA (Blei and McAuliffe, 2007) replaces this term with the standard Multinomial-Dirichlet. However, we believe this is the first published SLDA-style model using MCMC inference, as prior work has used variational inference (Blei and McAuliffe, 2007; Chang and Blei, 2009; Wang et al., 2009). Because the observed response variable depends on the topic assignments of a document, the conditional topic distribution is shifted toward topics that explain the observed response. Topics that move the predicted response ŷd toward the true yd will be favored. Dropping terms that are constant across all topics, the effect of the response variable is

p(yd | z, η, σ) ∝ exp[ (1/σ²) ( yd − (1/Nd) Σk′ Nd,k′ ηk′ ) (ηk / Nd) ],

where the subtracted sum captures the other words' influence on the predicted response.
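One way to read the transition and emission terms of Equation 3 is as smoothed relative frequencies computed from the count arrays T, O, and F for a single topic k. The sketch below uses hypothetical counts and symmetric hyperparameter means (the paper's hyperparameters are per-edge and per-word); following the count definitions above, index 0 of O holds stop counts and index 1 holds continue counts:

```python
import numpy as np

# Toy sufficient statistics for one topic k (hypothetical counts):
# T[i, j] transition counts, O[i, s] stop (s=0) / continue (s=1) counts,
# F[i, w] emission counts for one language.
T = np.array([[0, 3, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
O = np.array([[0, 4], [2, 1], [1, 0]], dtype=float)
F = np.array([[0, 0], [5, 1], [0, 2]], dtype=float)
tau, omega_p, pi = 0.1, 0.5, 0.1     # symmetric hyperparameter means

def path_prob(path_edges, end, w):
    """Smoothed probability of a path, per Eq. 3's transition/emission terms."""
    p = 1.0
    for i, j in path_edges:
        # transition i -> j, and the decision to continue past node i
        p *= (T[i, j] + tau) / (T[i].sum() + tau * T.shape[1])
        p *= (O[i, 1] + omega_p) / (O[i].sum() + 2 * omega_p)
    # stop at the path's final node, then emit word w there
    p *= (O[end, 0] + omega_p) / (O[end].sum() + 2 * omega_p)
    p *= (F[end, w] + pi) / (F[end].sum() + pi * F.shape[1])
    return p

print(path_prob([(0, 1)], end=1, w=0))
```

In a full sampler, this quantity would be multiplied by the document term p(zn = k | z−n, α) and the response term p(yd | z, η, σ), and the pair (zn, λn) would be drawn from the normalized products over all topics and paths.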
Author: Amr Ahmed ; Eric Xing
Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical level. In this paper we address the problem of modeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hastings inference algorithm for a semi-supervised extension with decent results.
5 0.52707297 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
Author: Taesun Moon ; Katrin Erk ; Jason Baldridge
Abstract: We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perform poorly in unsupervised and semisupervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.
6 0.51295465 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
7 0.49263024 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
8 0.45921463 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation
9 0.42086321 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text
10 0.41806507 45 emnlp-2010-Evaluating Models of Latent Document Semantics in the Presence of OCR Errors
11 0.41216639 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
12 0.40767148 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging
13 0.40348098 13 emnlp-2010-A Simple Domain-Independent Probabilistic Approach to Generation
14 0.40097544 109 emnlp-2010-Translingual Document Representations from Discriminative Projections
15 0.38071609 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails
16 0.37569416 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition
17 0.36446333 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid
18 0.33700323 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields
19 0.33671924 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning
20 0.33376026 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning
topicId topicWeight
[(12, 0.021), (29, 0.106), (30, 0.021), (52, 0.022), (56, 0.568), (62, 0.014), (66, 0.093), (72, 0.036), (76, 0.012), (87, 0.013), (89, 0.011)]
simIndex simValue paperId paperTitle
1 0.95894414 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter
Author: Dmitriy Genzel ; Jakob Uszkoreit ; Franz Och
Abstract: As a prerequisite to translation of poetry, we implement the ability to produce translations with meter and rhyme for phrase-based MT, examine whether the hypothesis space of such a system is flexible enough to accommodate such constraints, and investigate the impact of such constraints on translation quality.
2 0.94950473 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie
Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.
same-paper 3 0.93606341 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1
4 0.88897318 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text
Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju
Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.
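The LexRank-style scoring the abstract above builds on ranks sentences by the stationary distribution of a random walk over a sentence-similarity graph. A minimal power-iteration sketch of plain LexRank; the toy similarity matrix and damping factor are illustrative assumptions, not the paper's Comparative LexRank formulation:

```python
def lexrank_scores(sim, damping=0.85, iters=50):
    """Score sentences by the stationary distribution of a damped
    random walk over a row-normalized sentence-similarity matrix."""
    n = len(sim)
    # Row-normalize similarities into transition probabilities.
    trans = [[v / sum(row) for v in row] for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # new[j] = teleport mass + damped incoming probability mass
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores

# Toy 3-sentence similarity matrix (symmetric, self-similarity 1.0).
sim = [[1.0, 0.5, 0.1],
       [0.5, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
scores = lexrank_scores(sim)
```

Because the transition matrix is row-stochastic, the scores remain a probability distribution; the most central sentence (here, sentence 1, which is similar to both others) receives the highest score.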
5 0.6802727 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning
Author: Ahmet Aker ; Trevor Cohn ; Robert Gaizauskas
Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality ofthe best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.
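The search problem described above (finding the best-scoring extractive summary up to a given length) can be illustrated with a small best-first search over include/skip decisions. This is a simplified sketch with a trivial additive sentence-score model and an optimistic remaining-score heuristic, not the paper's trained discriminative system:

```python
import heapq

def best_summary(sentences, budget):
    """Best-first (A*-style) search over subsets of sentences.
    Each sentence is a (length, score) pair; maximize total score
    subject to total length <= budget. Heuristic h = sum of all
    remaining scores, which is optimistic (admissible) for maximization."""
    n = len(sentences)
    # Min-heap on -(g + h); state: (index, chosen, used length, score g).
    start_h = sum(s for _, s in sentences)
    heap = [(-start_h, 0, (), 0, 0.0)]
    best = ((), 0.0)
    while heap:
        _, i, chosen, used, g = heapq.heappop(heap)
        if g > best[1]:
            best = (chosen, g)
        if i == n:
            continue
        h_rest = sum(s for _, s in sentences[i + 1:])
        length, score = sentences[i]
        # Branch 1: skip sentence i.
        heapq.heappush(heap, (-(g + h_rest), i + 1, chosen, used, g))
        # Branch 2: take sentence i if it fits the budget.
        if used + length <= budget:
            heapq.heappush(heap,
                           (-(g + score + h_rest), i + 1,
                            chosen + (i,), used + length, g + score))
    return best

# Toy sentences as (length, score) pairs, budget of 10 words.
sents = [(6, 3.0), (5, 2.5), (4, 2.4), (3, 1.0)]
summary, score = best_summary(sents, budget=10)
```

On this toy input, the search selects sentences 0 and 2 (total length 10, total score 5.4) rather than greedily taking the two highest-scoring sentences, which would exceed the budget.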
6 0.65813923 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
7 0.650316 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation
8 0.63415366 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
9 0.61349452 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
10 0.60619962 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields
11 0.5914343 80 emnlp-2010-Modeling Organization in Student Essays
12 0.58689892 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective
13 0.58684331 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text
14 0.5752936 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
15 0.57273018 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields
16 0.55797046 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
17 0.55654299 94 emnlp-2010-SCFG Decoding Without Binarization
18 0.55212414 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task
19 0.55049288 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
20 0.54369956 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails