
70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid


Source: pdf

Author: Xin Zhao ; Jing Jiang ; Hongfei Yan ; Xiaoming Li

Abstract: Discovering and summarizing opinions from online reviews is an important and challenging task. A commonly-adopted framework generates structured review summaries with aspects and opinions. Recently topic models have been used to identify meaningful review aspects, but existing topic models do not identify aspect-specific opinion words. In this paper, we propose a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. We show that with a relatively small amount of training data, our model can effectively identify aspect and opinion words simultaneously. We also demonstrate the domain adaptability of our model.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Recently topic models have been used to identify meaningful review aspects, but existing topic models do not identify aspect-specific opinion words. [sent-6, score-1.07]

2 In this paper, we propose a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. [sent-7, score-0.938]

3 We show that with a relatively small amount of training data, our model can effectively identify aspect and opinion words simultaneously. [sent-8, score-1.174]

4 …from sentiment classification (Pang et al., 2002) to fine-grained extraction of opinion expressions and their targets (Wu et al., 2009). [sent-13, score-0.728]

5 For example, aspects of a restaurant may include food, staff, ambience and price, and opinion expressions for staff may include friendly, rude, etc. [sent-23, score-1.258]

6 Different approaches have been proposed to identify aspect words and phrases from reviews. [sent-27, score-0.505]

7 , 2009) have the limitation that they do not group semantically related aspect expressions together. [sent-30, score-0.477]

8 We follow this promising direction and extend existing topic models to jointly identify both aspect and opinion words, especially aspect-specific opinion words. [sent-34, score-1.962]

9 Current topic models for opinion mining, which we will review in detail in Section 2, still lack this ability. [sent-35, score-0.837]

10 But separating aspect and opinion words can be very useful. [sent-36, score-1.129]

11 Aspect-specific opinion words can be used to construct a domain-dependent sentiment lexicon. [sent-37, score-0.69]

12 For example, using more specific opinion words such as cozy and romantic to describe the ambience aspect in a review summary is more meaningful than using generic words such as nice and great. [sent-43, score-1.325]

13 To the best of our knowledge, Brody and Elhadad (2010) are the first to study aspect-specific opinion words, but their opinion word detection is performed outside of topic modeling, and they only consider adjectives as possible opinion words. [sent-44, score-2.153]

14 In this paper, we propose a new topic modeling approach that can automatically separate aspect and opinion words. [sent-45, score-1.262]

15 The MaxEnt component allows us to leverage arbitrary features such as POS tags to help separate aspect and opinion words. [sent-47, score-1.135]

16 Empirical evaluation on large review data sets shows that our model can effectively identify both aspects and aspect-specific opinion words with a small amount of training data. [sent-50, score-0.972]

17 There are usually two major tasks involved, namely, aspect or feature identification and opinion extraction. [sent-52, score-1.151]

18 Hu and Liu (2004) applied frequent itemset mining to identify product features without supervision, and considered adjectives collocated with feature words as opinion words. [sent-53, score-0.869]

19 A common limitation of these methods is that they do not group semantically related aspect expressions together. [sent-57, score-0.477]

20 Topic modeling provides an unsupervised and knowledge-lean approach to opinion mining. [sent-59, score-0.695]

21 However, they do not explicitly separate aspect and opinion words. [sent-63, score-1.135]

22 Mei et al. (2007) propose to separate topic and sentiment words using a positive sentiment model and a negative sentiment model, but both models capture general opinion words only. [sent-66, score-1.05]

23 In contrast, we model aspect-specific opinion words as well as general opinion words. [sent-67, score-1.359]

24 Recently Brody and Elhadad (2010) propose to detect aspect-specific opinion words in an unsupervised manner. [sent-68, score-0.69]

25 They take a two-step approach by first detecting aspect words using topic models and then identifying aspect-specific opinion words using polarity propagation. [sent-69, score-1.291]

26 They only consider adjectives as opinion words, which may potentially miss opinion words with other POS tags. [sent-70, score-1.404]

27 We try to jointly capture both aspect and opinion words within topic models, and we allow non-adjective opinion words. [sent-71, score-1.938]

28 Our MaxEnt-LDA hybrid bears similarity to these recent models but ours is designed for opinion mining. [sent-73, score-0.729]

29 Our model builds on LDA (Blei et al., 2003) but captures both aspect words and opinion words. [sent-75, score-1.129]

30 To model the aspect words, we use a modified version of the multi-grain topic models from (Titov and McDonald, 2008). [sent-76, score-0.54]

31 To understand how we model the opinion words, let us first look at two example review sentences from the restaurant domain: The food was tasty. [sent-80, score-0.975]

32 Besides these aspect-specific opinion words, we also see general opinion words such as great in the sentence “The food was great! [sent-84, score-1.475]

33 ” These general opinion words are shared across aspects, as opposed to aspect-specific opinion words which are used most commonly with their corresponding aspects. [sent-85, score-1.402]

34 We therefore introduce a general opinion model and T aspect-specific opinion models to capture these different opinion words. [sent-86, score-2.029]

35 First, we draw several multinomial word distributions from a symmetric Dirichlet prior with parameter β: a background model φB, a general aspect model φA,g, a general opinion model φO,g, T aspect models {φA,t} (t = 1, …, T) and T aspect-specific opinion models {φO,t} (t = 1, …, T). [sent-89, score-1.671]
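
As a rough illustration of this step, here is a minimal Python/numpy sketch of drawing these multinomial word distributions from a symmetric Dirichlet prior. The variable names and values (V, T, beta) are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, beta = 5000, 10, 0.1  # vocabulary size, number of aspects, Dirichlet prior (assumed values)

def draw_word_dist():
    # One multinomial distribution over the vocabulary, drawn from a symmetric Dirichlet prior.
    return rng.dirichlet(beta * np.ones(V))

phi_B = draw_word_dist()                      # background model phi^B
phi_A_g = draw_word_dist()                    # general aspect model phi^{A,g}
phi_O_g = draw_word_dist()                    # general opinion model phi^{O,g}
phi_A = [draw_word_dist() for _ in range(T)]  # T aspect models phi^{A,t}
phi_O = [draw_word_dist() for _ in range(T)]  # T aspect-specific opinion models phi^{O,t}
```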

36 We first draw θd ∼ Dir(α); then for each sentence s in document d, we draw an aspect assignment zd,s ∼ Multi(θd). [sent-92, score-0.439]

37 Now for each word in sentence s of document d, we have several choices: the word may describe the specific aspect (e.g. [sent-93, score-0.469]

38 waiter for the staff aspect), or a general aspect (e.g. [sent-95, score-0.681]

39 restaurant), or an opinion either specific to the aspect (e.g. [sent-97, score-1.108]

40 The label yd,s,n ∈ {0, 1, 2} determines whether wd,s,n is a background word, an aspect word or an opinion word. [sent-106, score-1.108]
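
Continuing the sketch above, the word-level choice can be illustrated as follows. The Bernoulli switch p_general between general and aspect-specific distributions is an assumed simplification, not necessarily the paper's exact parameterization.

```python
def sample_word(z, pi, p_general=0.3):
    """Emit one word given the sentence's aspect assignment z.

    pi: length-3 probabilities over y in {0: background, 1: aspect, 2: opinion},
        set per word by the MaxEnt component described below.
    p_general: assumed probability of choosing a general rather than an
        aspect-specific distribution.
    """
    y = rng.choice(3, p=pi)
    if y == 0:
        dist = phi_B
    elif y == 1:
        dist = phi_A_g if rng.random() < p_general else phi_A[z]
    else:
        dist = phi_O_g if rng.random() < p_general else phi_O[z]
    return rng.choice(V, p=dist)
```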

41 As in previous work (Mei et al., 2007; Lin and He, 2009), fully unsupervised topic models are unable to identify opinion words well. [sent-116, score-0.836]

42 An important observation we make is that aspect words and opinion words usually play different syntactic roles in a sentence. [sent-117, score-1.15]

43 Aspect words tend to be nouns while opinion words tend to be adjectives. [sent-118, score-0.711]

44 But we do not want to use strict rules to separate aspect and opinion words because there are also exceptions. [sent-120, score-1.156]

45 For example, verbs such as recommend can also be opinion words. [sent-123, score-0.669]

46 In order to use information such as POS tags to help discriminate between aspect and opinion words, we propose a novel idea as follows: We set πd,s,n using a maximum entropy (MaxEnt) model applied to a feature vector xd,s,n associated with wd,s,n. [sent-124, score-1.108]

47 Formally, we have p(yd,s,n = l | xd,s,n) = π^l_{d,s,n} = exp(λl · xd,s,n) / Σ_{l′=0..2} exp(λl′ · xd,s,n), where {λl} (l = 0, 1, 2) denote the MaxEnt model weights, which can be learned from a set of training sentences with labeled background, aspect and opinion words. [sent-128, score-1.139]
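
The equation above is a standard three-way softmax; a minimal sketch of computing π for one word, assuming numpy arrays for the features and weights:

```python
import numpy as np

def maxent_pi(x, lambdas):
    """pi^l = exp(lambda_l . x) / sum_l' exp(lambda_l' . x) over the three labels
    {0: background, 1: aspect, 2: opinion}.

    x: feature vector for word w_{d,s,n} (e.g. POS and lexicon indicators);
    lambdas: array of shape (3, len(x)) holding the learned MaxEnt weights.
    """
    scores = lambdas @ x      # lambda_l . x for each label l
    scores -= scores.max()    # subtract the max for numerical stability
    expd = np.exp(scores)
    return expd / expd.sum()
```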

48 Here c^(d)_{t} is the number of sentences assigned to aspect t in document d, and c^(d)_{·} is the total number of sentences in document d. [sent-139, score-0.529]
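
In a collapsed Gibbs sampler, these sentence counts enter the proposal for zd,s roughly as follows. This is a simplified sketch under stated assumptions: the full conditional combines the (c^(d)_{t} + α) term with the likelihood of the sentence's aspect and opinion words under aspect t.

```python
import numpy as np

def aspect_proposal(c_d, alpha, sent_loglik):
    """Unnormalized p(z_{d,s} = t | rest), proportional to
    (c^(d)_t + alpha) * exp(sent_loglik[t]).

    c_d[t]: sentences of document d currently assigned to aspect t (excluding s);
    sent_loglik[t]: log-likelihood of sentence s's words under aspect t.
    """
    logp = np.log(c_d + alpha) + sent_loglik
    logp -= logp.max()        # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum()
```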

49 c^(A)_{v,t} is the number of times word v is assigned as an aspect word to aspect t, and c^(O)_{v,t} is the number of times word v is assigned as an opinion word to aspect t. [sent-140, score-2.046]

50 c^(A)_{·,t} is the total number of times any word is assigned as an aspect word to aspect t, and c^(O)_{·,t} is the total number of times any word is assigned as an opinion word to aspect t. [sent-141, score-2.046]

51 n^(A)_{v,t} is the number of times word v is assigned as an aspect word to aspect t in sentence s of document d, and similarly, n^(O)_{v,t} is the number of times word v is assigned as an opinion word to aspect t in sentence s of document d. [sent-143, score-2.106]
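
Given these Gibbs counts, the aspect and opinion word distributions can be recovered with the usual Dirichlet-smoothed point estimates; a sketch, assuming (V, T) count matrices c_A and c_O:

```python
import numpy as np

def estimate_phi(c, beta=0.1):
    """phi[v, t] = (c[v, t] + beta) / (c[., t] + V * beta): the standard
    collapsed-Gibbs estimate under a symmetric Dirichlet prior."""
    V = c.shape[0]
    return (c + beta) / (c.sum(axis=0, keepdims=True) + V * beta)

# phi_A_hat = estimate_phi(c_A)  # aspect models, with c_A[v, t] = c^(A)_{v,t}
# phi_O_hat = estimate_phi(c_O)  # opinion models, with c_O[v, t] = c^(O)_{v,t}
```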

52 Experiment Setup: To evaluate our MaxEnt-LDA hybrid model for jointly modeling aspect and opinion words, we used a restaurant review data set previously used by Ganu et al. [sent-146, score-1.445]

53 We found that this unsupervised model could not separate aspect and opinion words well. [sent-170, score-1.156]

54 Note that the words in bold are opinion words which are mixed with aspect words. [sent-179, score-1.15]

55 Because the hotel domain is somewhat similar to the restaurant domain, we also used the labeled training data from the restaurant domain for the hotel data set. [sent-182, score-0.619]

56 From the tables we can see that generally aspect words are quite coherent and meaningful, and opinion words correspond to aspects very well. [sent-183, score-1.345]

57 We can see that ME-LDA and LocLDA give similar aspect words. [sent-185, score-0.439]

58 The major difference between these two models is that ME-LDA can separate aspect words and opinion words, which can be very useful. [sent-186, score-1.129]

59 ME-LDA is also able to separate general opinion words from aspect-specific ones, giving more informative opinion expressions for each aspect. [sent-187, score-1.446]

60 To evaluate the quality of our aspect identification, we chose from the gold standard labels three major aspects, namely Staff, Food and Ambience. [sent-192, score-0.499]

61 We first ran ME-LDA and LocLDA each to get an inferred aspect set T . [sent-200, score-0.478]

62 We then manually mapped each inferred aspect to one of the six gold standard aspects. [sent-203, score-0.516]

63 That is, we create a mapping function f from inferred aspects to gold standard aspects; for each sentence s of document d, we first assign it to an inferred aspect as follows: t* = argmax_{t ∈ T} Σ_{n=1..Nd,s} log P(wd,s,n | t). [sent-208, score-0.478]

64 We then assign the gold standard aspect f(t*) to this sentence. [Table 6: Results of aspect identification on restaurant (Precision, Recall, F-1 for LocLDA and ME-LDA on Staff, Food and Ambience).] [sent-209, score-0.477]
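
A sketch of this sentence-to-aspect mapping step; phi is assumed to be a (V, T) matrix of word probabilities per inferred aspect, and sentence a list of word ids (illustrative names):

```python
import numpy as np

def map_sentence_to_aspect(sentence, phi):
    """t* = argmax_t sum_n log P(w_{d,s,n} | t) over the inferred aspect set T."""
    log_phi = np.log(phi + 1e-12)           # guard against zero probabilities
    scores = log_phi[sentence].sum(axis=0)  # one score per inferred aspect t
    return int(scores.argmax())

# The sentence is then labeled with the gold standard aspect f(t*).
```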

65 Note that ME-LDA is not designed to compete with LocLDA for aspect identification. [sent-218, score-0.439]

66 Evaluation of Opinion Identification: Since the major advantage of ME-LDA is its ability to separate aspect and opinion words, we further quantitatively evaluated the quality of the aspect-specific opinion words identified by ME-LDA. [sent-220, score-1.901]

67 Brody and Elhadad (2010) constructed a gold standard set of aspect-specific opinion words for the restaurant data set. [sent-221, score-0.873]

68 Because their gold standard only includes adjectives, we also manually added more opinion words into the gold standard set. [sent-228, score-0.766]

69 To do so, we took the top 20 opinion words returned by our method and two baseline methods, pooled them together, and manually judged them. [sent-229, score-0.744]

70 Because top words are more important in opinion models, we set n to 5, 10 and 20. [sent-231, score-0.69]
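
Precision at n over a ranked list of candidate opinion words reduces to a one-liner; a minimal sketch:

```python
def precision_at_n(ranked_words, gold_set, n):
    """Fraction of the top-n ranked words that appear in the gold standard set."""
    return sum(w in gold_set for w in ranked_words[:n]) / n

# e.g. averaging precision_at_n(...) over aspects for n in (5, 10, 20)
```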

71 For both ME-LDA and BL-1 below, we again manually mapped each automatically inferred aspect to one of the gold standard aspects. [sent-232, score-0.516]

72 Since LocLDA does not identify aspect-specific opinion words, we consider the following two baseline methods that can identify aspect-specific opinion words: BL-1: In this baseline, we start with all adjectives as candidate opinion words, and use mutual information (MI) to rank these candidates. [sent-233, score-2.142]

73 Specifically, given an aspect t, we rank the candidate words according to the following scoring function: ScoreBL-1(w, t) = Σ_{v ∈ Vt} p(w, v) log [ p(w, v) / (p(w) p(v)) ], where Vt is the set of the top-100 frequent aspect words from φA,t. [sent-234, score-0.899]
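
A sketch of this MI-based scoring function; p_joint and p_word are assumed to be probability estimates from corpus co-occurrence counts, and top_aspect_words[t] holds Vt, the top-100 frequent aspect words of φA,t:

```python
import math

def score_bl1(w, t, p_joint, p_word, top_aspect_words):
    """Score_BL-1(w, t) = sum over v in V_t of p(w, v) * log(p(w, v) / (p(w) * p(v)))."""
    score = 0.0
    for v in top_aspect_words[t]:
        pwv = p_joint(w, v)
        if pwv > 0.0:  # skip pairs that never co-occur
            score += pwv * math.log(pwv / (p_word(w) * p_word(v)))
    return score
```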

74 Finally, for each aspect we rank adjectives by their frequencies in the aspect and treat these as aspect-specific opinion words. [sent-238, score-0.484] [sent-250, score-1.158]

75 [Table 7: Average P@n (P@5, P@10, P@20) of aspect-specific opinion words on restaurant, comparing ME-LDA with the baselines.] [sent-247, score-0.69]

77 A P@5 of 0.825 also indicates that the top opinion words discovered by our model are indeed meaningful. [sent-255, score-0.69]

78 Evaluation of the Association between Opinion Words and Aspects: The evaluation in the previous section shows that our model returns good opinion words for each aspect. [sent-257, score-0.69]

79 It does not, however, directly judge how aspect-specific those opinion words are. [sent-258, score-0.74]

80 This is because the gold standard created by (Brody and Elhadad, 2010) also includes general opinion words. [sent-259, score-0.729]

81 For example, friendly and good may both be judged to be opinion words for the staff aspect, but the former is more specific than the latter. [sent-262, score-0.971]

82 So we further evaluated the association between opinion words and aspects by directly measuring how easily the corresponding aspect can be inferred from an aspect-specific opinion word alone. [sent-264, score-1.968]

83 For each aspect, similar to the pooling strategy in IR, we pooled the top 20 opinion words identified by BL-1, BL-2 and ME-LDA. [sent-267, score-0.716]

84 We can see that ME-LDA outperformed BL-2 by a large margin on the restaurant data set, which conforms to our hypothesis that ME-LDA generates aspect-specific opinion words with stronger association to aspects. [sent-282, score-0.893]

85 Feature Selection: Previous studies have shown that simple POS features and lexical features can be very effective for discovering aspect words and opinion words (Hu and Liu, 2004). [sent-287, score-1.181]

86 [Table 9: Comparison of the average F-1 using different feature sets for aspect identification on restaurant; the LocLDA baseline scores 0.705.] [sent-291, score-0.482]

87 As for POS features, since we observe that aspect words tend to be nouns while opinion words tend to be adjectives (but sometimes also verbs or other parts of speech), we can expect POS features to be quite useful. [sent-295, score-1.22]

88 As for lexical features, words from a sentiment lexicon can also be helpful in discovering opinion words. [sent-296, score-0.784]
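
A sketch of the kind of feature vector xd,s,n these observations suggest, combining POS indicators with a sentiment-lexicon flag; the exact feature set here is illustrative, not necessarily the paper's:

```python
def word_features(words, pos, i, lexicon):
    """Binary features for the i-th word of a sentence.

    words, pos: token and POS-tag lists for the sentence;
    lexicon: a set of known general opinion words (assumed resource).
    """
    feats = {}
    for offset in (-1, 0, 1):  # POS of previous, current and next word
        j = i + offset
        tag = pos[j] if 0 <= j < len(pos) else "NONE"
        feats["pos[%d]=%s" % (offset, tag)] = 1.0
    feats["in_sentiment_lexicon"] = float(words[i].lower() in lexicon)
    return feats
```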

89 We can see that Set B plays the most important part, which conforms to our hypothesis that POS features are very important in opinion mining. [sent-303, score-0.669]

90 Examining the Size of Labeled Data: As we have seen, POS features play the major role in discriminating between aspect and opinion words. [sent-310, score-1.108]

91 [Table 10: Comparison of the average P@n using different feature sets for opinion identification on restaurant.] [sent-323, score-0.712]

92 [Table 14: Average P@n of aspect-specific opinion words for domain adaptation on restaurant.] [sent-358, score-0.797]

93 For opinion identification, we can see that there is no clear difference between using out-of-domain training data and using in-domain training data, which may indicate that our opinion identification component is robust under domain adaptation. [sent-369, score-1.441]

94 Also, we cannot easily tell whether B has an advantage over C for opinion identification. [sent-370, score-0.669]

95 One possible reason may be that those general opinion words are useful across domains, so lexical features may still be useful for domain adaptation. [sent-371, score-0.772]

96 Conclusions: In this paper, we presented a topic modeling approach that can jointly identify aspect and opinion words, using a MaxEnt-LDA hybrid. [sent-372, score-1.319]

97 We showed that by incorporating a supervised, discriminative maximum entropy model into an unsupervised, generative topic model, we could leverage syntactic features to help separate aspect and opinion words. [sent-373, score-1.236]

98 Most importantly, our model was able to identify meaningful opinion words strongly associated with different aspects. [sent-376, score-0.777]

99 A novel lexicalized HMM-based learning framework for web opinion mining. [sent-416, score-0.669]

100 OpinionMiner: A novel machine learning system for web opinion mining and extraction. [sent-421, score-0.7]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('opinion', 0.669), ('aspect', 0.439), ('aspects', 0.17), ('staff', 0.17), ('elhadad', 0.153), ('restaurant', 0.145), ('brody', 0.132), ('loclda', 0.116), ('topic', 0.101), ('posi', 0.099), ('food', 0.094), ('hotel', 0.089), ('maxent', 0.083), ('friendly', 0.083), ('ov', 0.071), ('review', 0.067), ('ambience', 0.066), ('jin', 0.066), ('av', 0.064), ('sentiment', 0.063), ('domain', 0.06), ('hybrid', 0.06), ('titov', 0.051), ('pos', 0.051), ('aspectspecific', 0.05), ('ganu', 0.05), ('methodp', 0.05), ('ndcg', 0.05), ('waiter', 0.05), ('adaption', 0.047), ('reviews', 0.046), ('adjectives', 0.045), ('identify', 0.045), ('hu', 0.043), ('identification', 0.043), ('wu', 0.043), ('meaningful', 0.042), ('polarity', 0.04), ('jointly', 0.039), ('inferred', 0.039), ('gold', 0.038), ('expressions', 0.038), ('draw', 0.036), ('pang', 0.033), ('outperformed', 0.033), ('ambiance', 0.033), ('anecdote', 0.033), ('dishes', 0.033), ('hay', 0.033), ('hung', 0.033), ('itemset', 0.033), ('rateable', 0.033), ('tasty', 0.033), ('xiaoming', 0.033), ('mimno', 0.033), ('ho', 0.033), ('wi', 0.033), ('vt', 0.031), ('mining', 0.031), ('discovering', 0.031), ('lda', 0.031), ('labeled', 0.031), ('document', 0.03), ('opinions', 0.03), ('assigned', 0.03), ('dirichlet', 0.03), ('mei', 0.03), ('service', 0.03), ('ou', 0.028), ('baccianella', 0.028), ('misc', 0.028), ('dcg', 0.028), ('hongfei', 0.028), ('noemie', 0.028), ('vy', 0.028), ('judged', 0.028), ('separate', 0.027), ('supervision', 0.027), ('price', 0.027), ('summarizing', 0.027), ('modeling', 0.026), ('gibbs', 0.026), ('yan', 0.026), ('quantitatively', 0.026), ('pooled', 0.026), ('product', 0.025), ('eight', 0.025), ('quite', 0.025), ('nth', 0.024), ('symmetric', 0.023), ('general', 0.022), ('liu', 0.022), ('xin', 0.022), ('popescu', 0.022), ('namely', 0.022), ('words', 0.021), ('multinomial', 0.021), ('blei', 0.021), ('nt', 0.021), ('targets', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid

Author: Xin Zhao ; Jing Jiang ; Hongfei Yan ; Xiaoming Li

Abstract: Discovering and summarizing opinions from online reviews is an important and challenging task. A commonly-adopted framework generates structured review summaries with aspects and opinions. Recently topic models have been used to identify meaningful review aspects, but existing topic models do not identify aspect-specific opinion words. In this paper, we propose a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. We show that with a relatively small amount of training data, our model can effectively identify aspect and opinion words simultaneously. We also demonstrate the domain adaptability of our model.

2 0.48540705 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

Author: Niklas Jakob ; Iryna Gurevych

Abstract: In this paper, we focus on the opinion target extraction as part of the opinion mining task. We model the problem as an information extraction task, which we address based on Conditional Random Fields (CRF). As a baseline we employ the supervised algorithm by Zhuang et al. (2006), which represents the state-of-the-art on the employed data. We evaluate the algorithms comprehensively on datasets from four different domains annotated with individual opinion target instances on a sentence level. Furthermore, we investigate the performance of our CRF-based approach and the baseline in a single- and cross-domain opinion target extraction setting. Our CRF-based approach improves the performance by 0.077, 0.126, 0.071 and 0.178 regarding F-Measure in the single-domain extraction in the four domains. In the cross-domain setting our approach improves the performance by 0.409, 0.242, 0.294 and 0.343 regarding F-Measure over the baseline.

3 0.16257666 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1

4 0.11563743 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

Author: Jordan Boyd-Graber ; Philip Resnik

Abstract: In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language’s data to inform how the model captures properties of other languages. MLSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MLSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment.

5 0.10641949 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

Author: Amr Ahmed ; Eric Xing

Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological-bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical-level. In this paper we address the problem ofmodeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using Collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hasting inference algorithm for a semi-supervised extension with decent results.

6 0.079159707 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

7 0.075001843 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

8 0.066512927 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

9 0.056667075 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

10 0.050699916 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

11 0.048559137 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

12 0.047272868 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

13 0.046520479 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

14 0.045299698 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

15 0.04361463 104 emnlp-2010-The Necessity of Combining Adaptation Methods

16 0.043540176 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

17 0.042807225 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

18 0.042341545 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

19 0.040139552 61 emnlp-2010-Improving Gender Classification of Blog Authors

20 0.039536022 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.172), (1, 0.171), (2, -0.192), (3, -0.171), (4, 0.128), (5, -0.032), (6, 0.415), (7, -0.076), (8, 0.059), (9, -0.02), (10, -0.049), (11, 0.077), (12, 0.375), (13, -0.265), (14, -0.044), (15, -0.328), (16, -0.071), (17, 0.037), (18, 0.187), (19, -0.052), (20, -0.063), (21, 0.001), (22, 0.037), (23, 0.082), (24, 0.018), (25, 0.057), (26, -0.007), (27, -0.025), (28, 0.007), (29, 0.046), (30, 0.043), (31, -0.101), (32, -0.018), (33, -0.017), (34, 0.04), (35, -0.058), (36, -0.035), (37, 0.001), (38, 0.023), (39, -0.025), (40, 0.029), (41, 0.015), (42, 0.008), (43, 0.031), (44, -0.043), (45, -0.018), (46, -0.009), (47, -0.01), (48, -0.001), (49, -0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98188913 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid

Author: Xin Zhao ; Jing Jiang ; Hongfei Yan ; Xiaoming Li

Abstract: Discovering and summarizing opinions from online reviews is an important and challenging task. A commonly-adopted framework generates structured review summaries with aspects and opinions. Recently topic models have been used to identify meaningful review aspects, but existing topic models do not identify aspect-specific opinion words. In this paper, we propose a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. We show that with a relatively small amount of training data, our model can effectively identify aspect and opinion words simultaneously. We also demonstrate the domain adaptability of our model.

2 0.95077866 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

Author: Niklas Jakob ; Iryna Gurevych

Abstract: In this paper, we focus on the opinion target extraction as part of the opinion mining task. We model the problem as an information extraction task, which we address based on Conditional Random Fields (CRF). As a baseline we employ the supervised algorithm by Zhuang et al. (2006), which represents the state-of-the-art on the employed data. We evaluate the algorithms comprehensively on datasets from four different domains annotated with individual opinion target instances on a sentence level. Furthermore, we investigate the performance of our CRF-based approach and the baseline in a single- and cross-domain opinion target extraction setting. Our CRF-based approach improves the performance by 0.077, 0.126, 0.071 and 0.178 regarding F-Measure in the single-domain extraction in the four domains. In the cross-domain setting our approach improves the performance by 0.409, 0.242, 0.294 and 0.343 regarding F-Measure over the baseline.

3 0.28226766 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.1

4 0.28164107 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

Author: Amr Ahmed ; Eric Xing

Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological-bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical-level. In this paper we address the problem ofmodeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using Collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hasting inference algorithm for a semi-supervised extension with decent results.

5 0.27556008 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

Author: Ahmed Hassan ; Vahed Qazvinian ; Dragomir Radev

Abstract: Mining sentiment from user generated content is a very important task in Natural Language Processing. An example of such content is threaded discussions which act as a very important tool for communication and collaboration in the Web. Threaded discussions include e-mails, e-mail lists, bulletin boards, newsgroups, and Internet forums. Most of the work on sentiment analysis has been centered around finding the sentiment toward products or topics. In this work, we present a method to identify the attitude of participants in an online discussion toward one another. This would enable us to build a signed network representation of participant interaction where every edge has a sign that indicates whether the interaction is positive or negative. This is different from most of the research on social networks that has focused almost exclusively on positive links. The method is experimentally tested using a manually labeled set of discussion posts. The results show that the proposed method is capable of identifying attitudinal sentences, and their signs, with high accuracy and that it outperforms several other baselines.

6 0.2390992 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

7 0.23569945 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

8 0.18977232 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

9 0.18323389 45 emnlp-2010-Evaluating Models of Latent Document Semantics in the Presence of OCR Errors

10 0.16893613 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

11 0.16831757 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition

12 0.15629831 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

13 0.14961304 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

14 0.14563657 77 emnlp-2010-Measuring Distributional Similarity in Context

15 0.14263806 61 emnlp-2010-Improving Gender Classification of Blog Authors

16 0.13948688 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

17 0.13620217 9 emnlp-2010-A New Approach to Lexical Disambiguation of Arabic Text

18 0.13605404 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

19 0.13472104 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

20 0.13320796 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.012), (12, 0.027), (29, 0.078), (30, 0.035), (52, 0.02), (56, 0.091), (66, 0.553), (72, 0.031), (76, 0.014), (79, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99597627 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

Author: Lei Shi ; Rada Mihalcea ; Mingjun Tian

Abstract: In this paper, we introduce a method that automatically builds text classifiers in a new language by training on already labeled data in another language. Our method transfers the classification knowledge across languages by translating the model features and by using an Expectation Maximization (EM) algorithm that naturally takes into account the ambiguity associated with the translation of a word. We further exploit the readily available unlabeled data in the target language via semisupervised learning, and adapt the translated model to better fit the data distribution of the target language.

same-paper 2 0.99329311 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid

Author: Xin Zhao ; Jing Jiang ; Hongfei Yan ; Xiaoming Li

Abstract: Discovering and summarizing opinions from online reviews is an important and challenging task. A commonly-adopted framework generates structured review summaries with aspects and opinions. Recently topic models have been used to identify meaningful review aspects, but existing topic models do not identify aspect-specific opinion words. In this paper, we propose a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. We show that with a relatively small amount of training data, our model can effectively identify aspect and opinion words simultaneously. We also demonstrate the domain adaptability of our model.

3 0.99239737 91 emnlp-2010-Practical Linguistic Steganography Using Contextual Synonym Substitution and Vertex Colour Coding

Author: Ching-Yun Chang ; Stephen Clark

Abstract: Linguistic Steganography is concerned with hiding information in natural language text. One of the major transformations used in Linguistic Steganography is synonym substitution. However, few existing studies have studied the practical application of this approach. In this paper we propose two improvements to the use of synonym substitution for encoding hidden bits of information. First, we use the Web 1T Google n-gram corpus for checking the applicability of a synonym in context, and we evaluate this method using data from the SemEval lexical substitution task. Second, we address the problem that arises from words with more than one sense, which creates a potential ambiguity in terms of which bits are encoded by a particular word. We develop a novel method in which words are the vertices in a graph, synonyms are linked by edges, and the bits assigned to a word are determined by a vertex colouring algorithm. This method ensures that each word encodes a unique sequence of bits, without cutting out large number of synonyms, and thus maintaining a reasonable embedding capacity.

4 0.98758036 10 emnlp-2010-A Probabilistic Morphological Analyzer for Syriac

Author: Peter McClanahan ; George Busby ; Robbie Haertel ; Kristian Heal ; Deryle Lonsdale ; Kevin Seppi ; Eric Ringger

Abstract: We define a probabilistic morphological analyzer using a data-driven approach for Syriac in order to facilitate the creation of an annotated corpus. Syriac is an under-resourced Semitic language for which there are no available language tools such as morphological analyzers. We introduce novel probabilistic models for segmentation, dictionary linkage, and morphological tagging and connect them in a pipeline to create a probabilistic morphological analyzer requiring only labeled data. We explore the performance of models with varying amounts of training data and find that with about 34,500 labeled tokens, we can outperform a reasonable baseline trained on over 99,000 tokens and achieve an accuracy of just over 80%. When trained on all available training data, our joint model achieves 86.47% accuracy, a 29.7% reduction in error rate over the baseline.

5 0.98040873 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman

Abstract: Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make since there is little consensus on evaluation framework, and many papers evaluate against only one or two competitor systems. Here we evaluate seven different POS induction systems spanning nearly 20 years of work, using a variety of measures. We show that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches. Since most of these systems were developed and tested using data from the WSJ corpus, we compare their generalization abilities by testing on both WSJ and the multilingual Multext-East corpus. Finally, we introduce the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner. In most cases, the prototype-driven learner outperforms the unsupervised system used to initialize it, yielding state-of-the-art results on WSJ and improvements on non-English corpora.

6 0.92063415 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

7 0.91811943 104 emnlp-2010-The Necessity of Combining Adaptation Methods

8 0.91068512 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices

9 0.90905535 43 emnlp-2010-Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping

10 0.89771771 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

11 0.88590705 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

12 0.8803587 114 emnlp-2010-Unsupervised Parse Selection for HPSG

13 0.87570435 44 emnlp-2010-Enhancing Mention Detection Using Projection via Aligned Corpora

14 0.8732729 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

15 0.8708477 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

16 0.86563081 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

17 0.85883373 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation

18 0.8581059 9 emnlp-2010-A New Approach to Lexical Disambiguation of Arabic Text

19 0.85698825 92 emnlp-2010-Predicting the Semantic Compositionality of Prefix Verbs

20 0.85119969 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks