acl acl2012 acl2012-62 knowledge-graph by maker-knowledge-mining

62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification


Source: pdf

Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang

Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited coverage of vocabulary in the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The amount of labeled sentiment data in English is much larger than that in other languages. [sent-4, score-0.695]

2 Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. [sent-5, score-1.169]

3 Chinese) using labeled data in the source language (e.g. English). [sent-7, score-0.29]

4 Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. [sent-10, score-0.473]

5 In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. [sent-12, score-0.455]

6 By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. [sent-13, score-1.009]

7 Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available. [sent-14, score-0.72]

8 Sentiment Analysis (also known as opinion mining), which aims to extract the sentiment information from text, has attracted extensive attention in recent years. [sent-15, score-0.507]

9 Sentiment classification, the task of determining the sentiment orientation (positive, negative or neutral) of text, has been the most extensively studied task in sentiment analysis. [sent-16, score-0.947]

10 There is already a large amount of work on sentiment classification of text in various genres and in many languages. [sent-18, score-0.57]

11 Pang et al. (2002) focus on sentiment classification of movie reviews in English, and Zagibalov and Carroll (2008) study the problem of classifying product reviews in Chinese. [sent-20, score-0.626]

12 During the past few years, NTCIR organized several pilot tasks for sentiment classification of news articles written in English, Chinese and Japanese (Seki et al. [sent-21, score-0.604]

13 For English sentiment classification, there are several labeled corpora available (Hu and Liu, 2004; Pang et al. [sent-24, score-0.669]

14 Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages. [sent-28, score-0.837]

15 One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g. [sent-29, score-0.706]

16 Chinese), and then use the translated training data directly for the development of the sentiment classifier in the target language (Wan, 2009; Pan et al. [sent-31, score-0.684]

17 First, the vocabulary covered by the translated labeled data is limited, hence many sentiment-indicative words cannot be learned from the translated labeled data. [sent-34, score-1.115]

18 (2011) show that vocabulary coverage has a strong correlation with sentiment classification accuracy. [sent-43, score-0.662]

19 Second, machine translation may change the sentiment polarity of the original text. [sent-44, score-0.556]

20 In this paper we propose a cross-lingual mixture model (CLMM) for cross-lingual sentiment classification. [sent-48, score-0.532]

21 Instead of relying on unreliable machine-translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language. [sent-49, score-0.647]

22 CLMM is a generative model that treats the source language and target language words in parallel data as generated simultaneously by a set of mixture components. [sent-50, score-0.398]

23 Moreover, CLMM can consistently improve the accuracy of cross-lingual sentiment classification regardless of whether labeled data in the target language are present. [sent-52, score-0.965]

24 We evaluate the model on sentiment classification of Chinese using English labeled data. [sent-53, score-0.787]

25 The experiment results show that CLMM yields 71% accuracy when no Chinese labeled data are used, which significantly improves Chinese sentiment classification and is superior to the SVM and co-training based methods. [sent-54, score-0.85]

26 In this section, we present a brief review of related work on monolingual and cross-lingual sentiment classification. [sent-60, score-1.044]

27 Early work on sentiment classification focuses on English product reviews or movie reviews (Pang et al. [sent-62, score-0.626]

28 Since then, sentiment classification has been investigated in various domains and different languages (Zagibalov and Carroll, 2008; Seki et al. [sent-64, score-0.57]

29 There exist two main approaches to extracting sentiment orientation automatically. [sent-68, score-0.495]

30 The dictionary-based approach (Taboada et al., 2011) aims to aggregate the sentiment orientation of a sentence (or document) from the sentiment orientations of words or phrases found in the sentence (or document), while the corpus-based approach (Pang et al. [sent-70, score-0.969]

31 , 2002) treats sentiment orientation detection as a conventional classification task and focuses on building a classifier from a set of sentences (or documents) labeled with sentiment orientations. [sent-71, score-1.366]

32 Dictionary-based methods involve creating or using sentiment lexicons. [sent-72, score-0.452]

33 Turney (2002) derives sentiment scores for phrases by measuring the mutual information between the given phrase and the words “excellent” and “poor”, and then uses the average scores of the phrases in a document as the sentiment of the document. [sent-73, score-0.904]
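For reference, the semantic orientation score in Turney (2002) takes the following standard form, a reconstruction of the well-known formula that the sentence above paraphrases, with PMI denoting pointwise mutual information:

```latex
% Semantic orientation of a phrase (Turney, 2002): PMI with the positive
% anchor word ``excellent'' minus PMI with the negative anchor ``poor''.
\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{``excellent''})
                    - \mathrm{PMI}(phrase, \text{``poor''}),
\qquad
\mathrm{PMI}(x, y) = \log_2 \frac{p(x, y)}{p(x)\, p(y)}
```

The document-level sentiment is then the average SO of the extracted phrases, positive if the average is above zero.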

34 Interested readers are referred to (Pang and Lee, 2008) for a comprehensive review of sentiment classification. [sent-78, score-0.452]

35 Cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. [sent-80, score-1.169]

36 Chinese) with labeled data in the source language (e.g. English). [sent-82, score-0.29]

37 The basic idea is to explore the abundant labeled sentiment data in the source language to alleviate the shortage of labeled data in the target language. [sent-85, score-1.078]

38 Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. [sent-86, score-0.473]

39 Wan (2009) proposes to use an ensemble method to train a better Chinese sentiment classification model on English labeled data and its Chinese translation. [sent-87, score-0.838]

40 English labeled data are first translated to Chinese, and then two SVM classifiers are trained on the English and Chinese labeled data respectively. [sent-88, score-0.392]

41 After that, the co-training approach (Blum and Mitchell, 1998) is adopted to leverage Chinese unlabeled data and their English translation to improve the SVM classifier for Chinese sentiment classification. [sent-89, score-0.745]

42 Instead of using machine translation engines to translate labeled text, the authors use them to construct a word translation oracle for pivot word translation. [sent-99, score-0.387]

43 Lu et al. (2011) focus on the task of jointly improving the performance of sentiment classification on two languages (e.g. [sent-101, score-0.57]

44 The authors use an unlabeled parallel corpus instead of machine translation engines. [sent-104, score-0.35]

45 They assume parallel sentences in the corpus should have the same sentiment polarity. [sent-105, score-0.645]

46 However, this method requires labeled data in both the source language and the target language, which are not always readily available. [sent-108, score-0.383]

47 In this section we present the cross-lingual mixture model (CLMM) for sentiment classification. [sent-109, score-0.532]

48 Formally, the task we are concerned with is to develop a sentiment classifier for the target language T (e.g. [sent-113, score-0.588]

49 Chinese), given labeled sentiment data DS in the source language S (e.g. [sent-115, score-0.742]

50 English), an unlabeled parallel corpus U of the source language and the target language, and optional labeled data DT in the target language T. [sent-117, score-0.783]

51 Aligning with previous work (Wan, 2008; Wan, 2009), we consider only the binary sentiment classification scheme (positive or negative) in this paper, but the proposed method can be used in other classification schemes with minor modifications. [sent-118, score-0.688]

52 The basic idea underlying CLMM is to enlarge the vocabulary by learning sentiment words from the parallel corpus. [sent-120, score-0.667]

53 The unobserved polarities of the unlabeled parallel corpus are modeled as hidden variables, and the observed words in the parallel corpus are modeled as generated by a set of word generation distributions conditioned on the hidden variables. [sent-126, score-0.482]

54 Given a parallel corpus, we fit the CLMM model by maximizing the likelihood of generating this parallel corpus. [sent-127, score-0.361]

55 By maximizing the likelihood, CLMM can estimate word generation probabilities for words unseen in the labeled data but present in the parallel corpus, and hence expand the vocabulary. [sent-128, score-0.418]
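The extract drops the likelihood equations themselves; a plausible reconstruction of the unlabeled-parallel-data term, assuming the standard mixture-of-multinomials form implied by the generative story below (the paper's exact equation may additionally carry the weighting hyper-parameters λs and λt mentioned later), is:

```latex
% Log-likelihood of the unlabeled parallel corpus U: each sentence pair (s, t)
% is generated by choosing a polarity class c, then emitting the words on both
% sides from the class-conditional multinomials.
L(\theta; U) = \sum_{(s,t) \in U} \log \sum_{c}
    P(c) \prod_{w_s \in s} P(w_s \mid c) \prod_{w_t \in t} P(w_t \mid c)
```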

56 In addition, CLMM can utilize words in both the source language and the target language for determining polarity classes of the parallel sentences. [sent-129, score-0.353]

57 [Figure 1: The generation process of the cross-lingual mixture model.] Figure 1 illustrates the detailed process of generating words in the source language and the target language, respectively, for the parallel corpus U, from the four mixture components in CLMM. [sent-130, score-0.475]

58 Word generation: (a) generate source language words ws from a multinomial distribution P(ws | cs); (b) generate target language words wt from a multinomial distribution P(wt | ct). [sent-136, score-0.706]
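As a concrete illustration of this generative story, here is a minimal Python sketch; the function name, and the idea that the prior and multinomial parameters arrive as plain dictionaries, are illustrative assumptions, not the paper's implementation:

```python
import random

def generate_sentence_pair(class_prior, p_ws_given_c, p_wt_given_c,
                           src_len, tgt_len):
    """Sample one parallel sentence pair from the CLMM generative story:
    choose a polarity class, then emit source and target words from the
    class-conditional multinomials P(ws|c) and P(wt|c)."""
    # 1. Choose a polarity class c (e.g. 'pos' or 'neg') from the prior.
    classes, weights = zip(*class_prior.items())
    c = random.choices(classes, weights=weights)[0]

    # 2a. Generate source-language words ws ~ Multinomial(P(. | c)).
    src_words, src_probs = zip(*p_ws_given_c[c].items())
    source = random.choices(src_words, weights=src_probs, k=src_len)

    # 2b. Generate target-language words wt ~ Multinomial(P(. | c)).
    tgt_words, tgt_probs = zip(*p_wt_given_c[c].items())
    target = random.choices(tgt_words, weights=tgt_probs, k=tgt_len)
    return c, source, target
```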

59 Meanwhile, we have the following log-likelihood function for labeled data in the source language Ds. [sent-146, score-0.29]

60 In addition, when labeled data in the target language are available, we have the following log-likelihood function. [sent-148, score-0.336]
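The summary omits these equations as well; for labeled data the mixture collapses to the familiar Naive-Bayes-style form, so a hedged reconstruction (written for the source-language data Ds; the target-language term over Dt is analogous) is:

```latex
% Log-likelihood of labeled source-language data D_s, where c_d denotes the
% observed polarity label of document d. The target-language term over D_t
% has the same form, and the overall objective sums the labeled and parallel
% terms (in the paper, weighted by the hyper-parameters \lambda_s, \lambda_t).
L(\theta; D_s) = \sum_{d \in D_s} \log \Big( P(c_d) \prod_{w \in d} P(w \mid c_d) \Big)
```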

61 We use the EM algorithm (Dempster et al., 1977) to estimate the conditional probability of words ws and wt given class c, P(ws | c) and P(wt | c) respectively. [sent-172, score-0.622]

62 The posterior probability of the hidden class (i.e. the class label for unlabeled parallel sentences) is computed according to the following equations. [sent-178, score-0.335]

63 When di belongs to the labeled data, P(cj | di) is 1 when its label is cj and 0 otherwise. [sent-183, score-0.515]
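Putting the E-step and M-step together, a minimal EM sketch for this mixture is given below. It is a simplification under stated assumptions: bag-of-words documents, add-one smoothing, each parallel pair treated as one concatenated document, and the paper's λ weighting and alignment details omitted; the function and variable names are hypothetical.

```python
from collections import Counter
import math

def em_clmm(labeled_docs, parallel_pairs, classes=("pos", "neg"), iters=10):
    """labeled_docs: list of (words, label); parallel_pairs: list of
    (source_words, target_words) pairs with unobserved polarity."""
    vocab = {w for doc, _ in labeled_docs for w in doc}
    vocab |= {w for s, t in parallel_pairs for w in s + t}
    # Initialize word counts per class from the labeled data only.
    counts = {c: Counter(w for doc, lab in labeled_docs if lab == c
                         for w in doc) for c in classes}
    prior = {c: 1.0 / len(classes) for c in classes}

    def word_logprob(w, c):  # add-one smoothed log P(w|c)
        return math.log((counts[c][w] + 1.0) /
                        (sum(counts[c].values()) + len(vocab)))

    for _ in range(iters):
        # E-step: posterior P(c|d) for each parallel pair, using the words
        # on BOTH sides of the pair.
        posteriors = []
        for s, t in parallel_pairs:
            logp = {c: math.log(prior[c]) +
                       sum(word_logprob(w, c) for w in s + t)
                    for c in classes}
            z = max(logp.values())
            unnorm = {c: math.exp(lp - z) for c, lp in logp.items()}
            norm = sum(unnorm.values())
            posteriors.append({c: p / norm for c, p in unnorm.items()})

        # M-step: labeled documents contribute with weight 1 to their
        # observed class (P(c|d) is 0/1 there, as in the sentence above);
        # parallel pairs contribute fractionally via P(c|d).
        counts = {c: Counter() for c in classes}
        for doc, lab in labeled_docs:
            counts[lab].update(doc)
        for (s, t), post in zip(parallel_pairs, posteriors):
            for c in classes:
                for w in s + t:
                    counts[c][w] += post[c]
        total = len(labeled_docs) + len(parallel_pairs)
        prior = {c: (sum(1 for _, lab in labeled_docs if lab == c) +
                     sum(post[c] for post in posteriors)) / total
                 for c in classes}
    return counts, prior
```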

64 Experiment setup: We conduct experiments on two common cross-lingual sentiment classification settings. [sent-186, score-0.602]

65 In the first setting, no labeled data in the target language are available. [sent-187, score-0.336]

66 This setting has realistic significance, since in some situations we need to quickly develop a sentiment classifier for languages for which no labeled data are at hand. [sent-188, score-0.738]

67 In this case, we classify text in the target language using only labeled data in the source language. [sent-189, score-0.383]

68 In the second setting, labeled data in the target language are also available. [sent-190, score-0.336]

69 In this case, a more reasonable strategy is to make full use of labeled data in both the source language and the target language to develop the sentiment classifier for the target language. [sent-191, score-0.971]

70 Data sets: For Chinese sentiment classification, we use the same data set described in (Lu et al. [sent-193, score-0.478]

71 The labeled data sets consist of two English data sets and one Chinese data set. [sent-195, score-0.295]

72 The unlabeled parallel sentences are selected from the ISI Chinese-English parallel corpus (Munteanu and Marcu, 2005). [sent-201, score-0.5]

73 Following (Lu et al., 2011), we remove neutral sentences and keep only highly confident positive and negative sentences as predicted by a maximum entropy classifier trained on the labeled data. [sent-203, score-0.342]

74 [Table 1: statistics (positive/negative/total counts) of the MPQA, NTCIR-EN, and NTCIR-CH data sets.] CLMM includes two hyper-parameters (λs and λt) controlling the contribution of unlabeled parallel data. [sent-206, score-0.307]

75 1, since no Chinese labeled data are used and the contribution of the target language to the source language is limited. [sent-210, score-0.383]

76 MT-SVM: We translate the English labeled data to Chinese using Google Translate and use the translation results to train the SVM classifier for Chinese. [sent-217, score-0.366]
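As a concrete sketch of this kind of baseline with scikit-learn, assuming the English labeled data have already been translated to Chinese offline and pre-segmented into space-separated tokens (the translation step itself is not shown, and the function name is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_mt_svm(translated_docs, labels):
    """Train an SVM on machine-translated labeled data (an MT-SVM-style
    baseline): unigram bag-of-words features over pre-segmented text."""
    clf = make_pipeline(
        CountVectorizer(token_pattern=r"\S+"),  # one feature per token
        LinearSVC(),
    )
    clf.fit(translated_docs, labels)
    return clf

# Chinese test documents would then be classified with clf.predict(...).
```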

77 First, two monolingual SVM classifiers are trained on English labeled data and Chinese data translated from English labeled data. [sent-221, score-0.631]

78 Instead of using the corresponding machine translation of Chinese unlabeled sentences, we use the parallel English sentences of the Chinese unlabeled sentences. [sent-227, score-0.546]
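For concreteness, the co-training loop behind the MT-Cotrain and Para-Cotrain baselines can be sketched as follows; this is a simplified rendition of the Blum and Mitchell (1998) procedure, and the growth size p, the round count, and all names are placeholder assumptions:

```python
def cotrain(train_en, train_zh, unlabeled_pairs, fit, p=5, rounds=20):
    """Co-training over two views of each unlabeled (english_text,
    chinese_text) pair: its English side (view 0) and its Chinese side
    (view 1). `fit(docs, labels)` returns a classifier exposing .predict
    and .decision_function; train_en and train_zh are ([docs], [labels])."""
    clf_en = clf_zh = None
    for _ in range(rounds):
        clf_en, clf_zh = fit(*train_en), fit(*train_zh)
        if not unlabeled_pairs:
            break
        consumed = set()
        # Each classifier labels the pairs through its own view and hands
        # its p most confident predictions to the OTHER view's training set.
        for clf, view, target in ((clf_en, 0, train_zh), (clf_zh, 1, train_en)):
            docs = [pair[view] for pair in unlabeled_pairs]
            conf = clf.decision_function(docs)
            labels = clf.predict(docs)
            for i in sorted(range(len(docs)), key=lambda j: -abs(conf[j]))[:p]:
                target[0].append(unlabeled_pairs[i][1 - view])
                target[1].append(labels[i])
                consumed.add(i)
        unlabeled_pairs = [pr for i, pr in enumerate(unlabeled_pairs)
                           if i not in consumed]
    return clf_en, clf_zh
```

The `fit` argument could be the bag-of-words SVM trainer sketched above; the same loop then gives a Para-Cotrain-style variant when `unlabeled_pairs` holds genuine parallel sentences and an MT-Cotrain-style variant when the English side is a machine translation.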

79 The first set of experiments is conducted using only English labeled data to create the sentiment classifier for Chinese. [sent-232, score-0.738]

80 As is shown, sentiment classification does not benefit much from direct machine translation. [sent-236, score-0.57]

81 The underlying reason is that the vocabulary coverage in machine-translated data is low; therefore, the classifier learned from the labeled data is unable to generalize well on the test data. [sent-242, score-0.474]

82 We also observe that using a parallel corpus instead of machine translations can improve classification accuracy. [sent-247, score-0.27]

83 It should be noted that we do not have the result for the Joint-Train model in this setting, since it requires both English labeled data and Chinese labeled data. [sent-248, score-0.46]

84 The second set of experiments is conducted using both English labeled data and Chinese labeled data to develop the Chinese sentiment classifier. [sent-250, score-0.938]

85 We conduct 5-fold cross-validation on the Chinese labeled data. [sent-251, score-0.29]

86 2, but we use natural Chinese sentences instead of translated Chinese sentences as labeled data in MT-Cotrain and Para-Cotrain. [sent-253, score-0.395]

87 One reason is that we use natural Chinese labeled data instead of translated Chinese labeled data. [sent-256, score-0.53]

88 Nevertheless, all three methods that leverage an unlabeled parallel corpus, namely Para-Cotrain, Joint-Train and CLMM, still show big improvements over the SVM baseline. [sent-260, score-0.333]

89 We investigate how the size of the unlabeled parallel data affects sentiment classification in this subsection. [sent-264, score-0.903]

90 We vary the number of sentences in the unlabeled parallel data from 2,000 to 20,000. [sent-265, score-0.348]

91 We use only English labeled data in this experiment, since this more directly reflects the effectiveness of each model in utilizing unlabeled parallel data. [sent-266, score-0.55]

92 The reason is that the two methods use machine-translated labeled data to create the initial Chinese classifiers. [sent-273, score-0.313]

93 In this subsection, we investigate how the size of the Chinese labeled data affects sentiment classification. [sent-277, score-0.695]

94 First, the proposed model can learn previously unseen sentiment words from large unlabeled data, which are not covered by the limited vocabulary of the machine translation of the labeled data. [sent-290, score-0.93]

95 Second, CLMM can effectively utilize unlabeled parallel data regardless of whether labeled data in the target language are used or not. [sent-291, score-0.669]

96 In the future, we will work on leveraging parallel sentences and word alignments for other tasks in sentiment analysis, such as building multilingual sentiment lexicons. [sent-293, score-1.097]

97 A nonnegative matrix tri-factorization approach to sentiment classification with lexical prior knowledge. [sent-330, score-0.613]

98 Text classification from labeled and unlabeled documents using EM. [sent-348, score-0.514]

99 Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. [sent-392, score-0.691]

100 Automatic seed word selection for unsupervised sentiment classification of Chinese text. [sent-405, score-0.742]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('clmm', 0.469), ('sentiment', 0.452), ('wt', 0.287), ('ws', 0.279), ('cj', 0.259), ('labeled', 0.217), ('chinese', 0.172), ('unlabeled', 0.155), ('parallel', 0.152), ('classification', 0.118), ('seki', 0.1), ('target', 0.093), ('page', 0.093), ('svm', 0.084), ('mpqa', 0.082), ('mixture', 0.08), ('translated', 0.07), ('english', 0.063), ('vocabulary', 0.063), ('pang', 0.061), ('polarity', 0.061), ('projection', 0.059), ('ntcir', 0.055), ('usi', 0.055), ('classifiers', 0.053), ('wan', 0.052), ('uti', 0.05), ('source', 0.047), ('zagibalov', 0.047), ('dt', 0.047), ('engines', 0.047), ('ds', 0.044), ('orientation', 0.043), ('translation', 0.043), ('classifier', 0.043), ('bilingual', 0.042), ('sentences', 0.041), ('tour', 0.041), ('validations', 0.041), ('di', 0.039), ('accuracy', 0.037), ('projecting', 0.037), ('thumbs', 0.037), ('translate', 0.037), ('lu', 0.037), ('pilot', 0.034), ('opinion', 0.033), ('ij', 0.033), ('conduct', 0.032), ('pan', 0.031), ('turney', 0.031), ('ccjj', 0.031), ('cmootdraeinlpara', 0.031), ('cotrain', 0.031), ('cotrainsvm', 0.031), ('cusi', 0.031), ('cuti', 0.031), ('davidov', 0.031), ('gclmmjoint', 0.031), ('gclmmmt', 0.031), ('kando', 0.031), ('kirk', 0.031), ('noriko', 0.031), ('prettenhofer', 0.031), ('traminodeplara', 0.031), ('yohei', 0.031), ('likelihood', 0.03), ('nigam', 0.03), ('coverage', 0.029), ('reviews', 0.028), ('probability', 0.028), ('class', 0.028), ('duh', 0.028), ('vt', 0.028), ('masterpiece', 0.027), ('taboada', 0.027), ('generating', 0.027), ('data', 0.026), ('leverage', 0.026), ('ensemble', 0.025), ('wiebe', 0.025), ('alignment', 0.025), ('documents', 0.024), ('generation', 0.023), ('remarkably', 0.023), ('ui', 0.023), ('evans', 0.023), ('unavailable', 0.023), ('xiaojun', 0.023), ('influence', 0.023), ('consistently', 0.022), ('monolingual', 0.022), ('munteanu', 0.022), ('nonnegative', 0.022), ('blum', 0.022), ('aims', 0.022), ('parameters', 0.021), ('matrix', 0.021), ('dempster', 0.021), ('correspondence', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification

Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang

Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited coverage of vocabulary in the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.

2 0.36541304 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

Author: Fangtao Li ; Sinno Jialin Pan ; Ou Jin ; Qiang Yang ; Xiaoyan Zhu

Abstract: Extracting sentiment and topic lexicons is important for opinion mining. Previous work has shown that supervised learning methods are superior for this task. However, the performance of supervised methods highly relies on manually labeled training data. In this paper, we propose a domain adaptation framework for sentiment- and topic-lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain. The framework is twofold. In the first step, we generate a few high-confidence sentiment and topic seeds in the target domain. In the second step, we propose a novel Relational Adaptive bootstraPping (RAP) algorithm to expand the seeds in the target domain by exploiting the labeled source domain data and the relationships between topic and sentiment words. Experimental results show that our domain adaptation framework can extract precise lexicons in the target domain without any annotation.

3 0.25371262 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models

Author: Lei Fang ; Minlie Huang

Abstract: In this paper, we present a structural learning model for joint sentiment classification and aspect analysis of text at various levels of granularity. Our model aims to identify highly informative sentences that are aspect-specific in online customer reviews. The primary advantages of our model are two-fold: first, it performs document-level and sentence-level sentiment polarity classification jointly; second, it is able to find informative sentences that are closely related to some aspects in a review, which may be helpful for aspect-level sentiment analysis such as aspect-oriented summarization. The proposed method was evaluated with 9,000 Chinese restaurant reviews. Preliminary experiments demonstrate that our model obtains promising performance.

4 0.20357373 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

Author: Hao Wang ; Dogan Can ; Abe Kazemzadeh ; Francois Bar ; Shrikanth Narayanan

Abstract: This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events. In addition, sentiment analysis can help explore how these events affect public opinion. While traditional content analysis takes days or weeks to complete, the system demonstrated here analyzes sentiment in the entire Twitter traffic about the election, delivering results instantly and continuously. It offers the public, the media, politicians and scholars a new and timely perspective on the dynamics of the electoral process and public opinion.

5 0.19270436 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

Author: Sida Wang ; Christopher Manning

Abstract: Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/ dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.

6 0.18676554 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

7 0.16944019 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

8 0.1617204 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

9 0.14177676 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

10 0.11814073 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System

11 0.10576659 187 acl-2012-Subgroup Detection in Ideological Discussions

12 0.096959166 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

13 0.089509904 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

14 0.086337201 134 acl-2012-Learning to Find Translations and Transliterations on the Web

15 0.081758559 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation

16 0.0797856 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

17 0.073209234 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation

18 0.073174767 120 acl-2012-Information-theoretic Multi-view Domain Adaptation

19 0.071714893 144 acl-2012-Modeling Review Comments

20 0.070997596 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.225), (1, 0.132), (2, 0.219), (3, -0.255), (4, 0.158), (5, -0.118), (6, 0.068), (7, -0.024), (8, -0.253), (9, -0.091), (10, 0.11), (11, 0.017), (12, -0.107), (13, -0.154), (14, -0.018), (15, -0.06), (16, 0.041), (17, 0.018), (18, 0.013), (19, 0.074), (20, -0.076), (21, 0.057), (22, -0.052), (23, -0.038), (24, 0.021), (25, -0.001), (26, 0.015), (27, 0.041), (28, 0.015), (29, 0.058), (30, 0.033), (31, 0.082), (32, -0.043), (33, -0.016), (34, -0.071), (35, -0.044), (36, 0.056), (37, -0.108), (38, -0.017), (39, -0.008), (40, -0.013), (41, -0.061), (42, -0.118), (43, 0.063), (44, 0.065), (45, -0.006), (46, -0.021), (47, 0.05), (48, -0.025), (49, -0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9642089 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification

Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang

Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited coverage of vocabulary in the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.

2 0.82829082 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

Author: Fangtao Li ; Sinno Jialin Pan ; Ou Jin ; Qiang Yang ; Xiaoyan Zhu

Abstract: Extracting sentiment and topic lexicons is important for opinion mining. Previous work has shown that supervised learning methods are superior for this task. However, the performance of supervised methods highly relies on manually labeled training data. In this paper, we propose a domain adaptation framework for sentiment- and topic-lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain. The framework is twofold. In the first step, we generate a few high-confidence sentiment and topic seeds in the target domain. In the second step, we propose a novel Relational Adaptive bootstraPping (RAP) algorithm to expand the seeds in the target domain by exploiting the labeled source domain data and the relationships between topic and sentiment words. Experimental results show that our domain adaptation framework can extract precise lexicons in the target domain without any annotation.

3 0.81593776 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

Author: Rada Mihalcea ; Carmen Banea ; Janyce Wiebe

Abstract: Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations in natural language. While subjectivity classification labels text as either subjective or objective, sentiment classification adds an additional level of granularity, by further classifying subjective text as either positive, negative or neutral. While much of the research work in this area has been applied to English, research on other languages is growing, including Japanese, Chinese, German, Spanish, Romanian. While most of the researchers in the field are familiar with the methods applied on English, few of them have closely looked at the original research carried out in other languages. For example, in languages such as Chinese, researchers have been looking at the ability of characters to carry sentiment information (Ku et al., 2005; Xiang, 2011). In Romanian, due to markers of politeness and additional verbal modes embedded in the language, experiments have hinted that subjectivity detection may be easier to achieve (Banea et al., 2008). These additional sources of information may not be available across all languages, yet various articles have pointed out that by investigating a synergistic approach for detecting subjectivity and sentiment in multiple languages at the same time, improvements can be achieved not only in other languages, but in English as well. The development and interest in these methods is also highly motivated by the fact that only 27% of Internet users speak English (www.internetworldstats.com/stats.htm, Oct 11, 2011), and that number diminishes further every year, as more people across the globe gain Internet access. The aim of this tutorial is to familiarize the attendees with the subjectivity and sentiment research carried out on languages other than English in order to enable and promote cross-fertilization. Specifically, we will review work along three main directions. First, we will present methods where the resources and tools have been specifically developed for a given target language. In this category, we will also briefly overview the main methods that have been proposed for English, but which can be easily ported to other languages. Second, we will describe cross-lingual approaches, including several methods that have been proposed to leverage on the resources and tools available in English by using cross-lingual projections. Finally, third, we will show how the expression of opinions and polarity pervades language boundaries, and thus methods that holistically explore multiple languages at the same time can be effectively considered. References C. Banea, R. Mihalcea, and J. Wiebe. 2008. A Bootstrapping method for building subjectivity lexicons for languages with scarce resources. In Proceedings of LREC 2008, Marrakech, Morocco. L. W. Ku, T. H. Wu, L. Y. Lee, and H. H. Chen. 2005. Construction of an Evaluation Corpus for Opinion Extraction. In Proceedings of NTCIR-5, Tokyo, Japan. L. Xiang. 2011. Ideogram Based Chinese Sentiment Word Orientation Computation. Computing Research Repository, October.

4 0.80862665 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

Author: Sida Wang ; Christopher Manning

Abstract: Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/ dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.

5 0.73648298 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

Author: Eduard Dragut ; Hong Wang ; Clement Yu ; Prasad Sistla ; Weiyi Meng

Abstract: Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. The dictionaries have substantial inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize a fast SAT solver to detect inconsistencies in a sentiment dictionary. We perform experiments on four sentiment dictionaries and WordNet.

6 0.70029306 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models

7 0.60048085 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

8 0.56765324 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

9 0.54026467 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

10 0.46987715 120 acl-2012-Information-theoretic Multi-view Domain Adaptation

11 0.4194558 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System

12 0.34082168 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

13 0.3396751 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors

14 0.3348695 187 acl-2012-Subgroup Detection in Ideological Discussions

15 0.32243219 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

16 0.30622751 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

17 0.303983 134 acl-2012-Learning to Find Translations and Transliterations on the Web

18 0.29238731 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

19 0.28989565 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

20 0.27482605 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.028), (26, 0.044), (28, 0.068), (30, 0.019), (37, 0.053), (39, 0.09), (52, 0.011), (57, 0.011), (59, 0.011), (74, 0.019), (81, 0.225), (82, 0.028), (84, 0.016), (85, 0.029), (90, 0.106), (92, 0.046), (94, 0.034), (99, 0.077)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75097698 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification

Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang

Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited coverage of vocabulary in the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.

2 0.7178576 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling

Author: Patrick Pantel ; Thomas Lin ; Michael Gamon

Abstract: We predict entity type distributions in Web search queries via probabilistic inference in graphical models that capture how entity-bearing queries are generated. We jointly model the interplay between latent user intents that govern queries and unobserved entity types, leveraging observed signals from query formulations and document clicks. We apply the models to resolve entity types in new queries and to assign prior type distributions over an existing knowledge base. Our models are efficiently trained using maximum likelihood estimation over millions of real-world Web search queries. We show that modeling user intent significantly improves entity type resolution for head queries over the state of the art, on several metrics, without degradation in tail query performance.

3 0.67606348 89 acl-2012-Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation

Author: Qiuye Zhao ; Mitch Marcus

Abstract: We show for both English POS tagging and Chinese word segmentation that with proper representation, a large number of deterministic constraints can be learned from training examples, and these are useful in constraining probabilistic inference. For tagging, learned constraints are directly used to constrain Viterbi decoding. For segmentation, character-based tagging constraints can be learned with the same templates. However, they are better applied to a word-based model, thus an integer linear programming (ILP) formulation is proposed. For both problems, the corresponding constrained solutions have advantages in both efficiency and accuracy.

4 0.60415268 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Author: Gerard de Melo ; Gerhard Weikum

Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.

5 0.59420663 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

Author: Hao Wang ; Dogan Can ; Abe Kazemzadeh ; Francois Bar ; Shrikanth Narayanan

Abstract: This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events. In addition, sentiment analysis can help explore how these events affect public opinion. While traditional content analysis takes days or weeks to complete, the system demonstrated here analyzes sentiment in the entire Twitter traffic about the election, delivering results instantly and continuously. It offers the public, the media, politicians and scholars a new and timely perspective on the dynamics of the electoral process and public opinion.

6 0.59122843 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

7 0.58914864 191 acl-2012-Temporally Anchored Relation Extraction

8 0.58910102 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places

9 0.58796763 187 acl-2012-Subgroup Detection in Ideological Discussions

10 0.58619308 218 acl-2012-You Had Me at Hello: How Phrasing Affects Memorability

11 0.58332795 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

12 0.58211243 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

13 0.58198857 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

14 0.58014441 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

15 0.57958299 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

16 0.57928163 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

17 0.57909769 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

18 0.57771617 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

19 0.57763845 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

20 0.57486272 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents