acl acl2012 acl2012-100 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lei Fang ; Minlie Huang
Abstract: In this paper, we present a structural learning model for joint sentiment classification and aspect analysis of text at various levels of granularity. Our model aims to identify highly informative sentences that are aspect-specific in online custom reviews. The primary advantages of our model are two-fold: first, it performs document-level and sentence-level sentiment polarity classification jointly; second, it is able to find informative sentences that are closely related to some respects in a review, which may be helpful for aspect-level sentiment analysis such as aspect-oriented summarization. The proposed method was evaluated with 9,000 Chinese restaurant reviews. Preliminary experiments demonstrate that our model obtains promising performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we present a structural learning model for joint sentiment classification and aspect analysis of text at various levels of granularity. [sent-7, score-1.15]
2 Our model aims to identify highly informative sentences that are aspect-specific in online custom reviews. [sent-8, score-0.252]
3 The proposed method was evaluated with 9,000 Chinese restaurant reviews. [sent-10, score-0.069]
4 Preliminary experiments demonstrate that our model obtains promising performance. [sent-11, score-0.069]
5 1 Introduction Online reviews have been a major resource from which users may find opinions or comments on the products or services they want to consume. [sent-12, score-0.288]
6 However, users may be overwhelmed and unable to read reviews one by one when facing a large number of them, and they may not be satisfied with only document-level review statistics (that is, the number of reviews with 1-star, 2-star, . [sent-13, score-0.611]
7 Aspect-level review analysis may be an alternative for addressing this issue, as aspect-specific opinions may more clearly, explicitly, and completely describe the quality of a product from different properties. [sent-17, score-0.167]
8 Our goal is to discover informative sentences that are consistent with the overall rating of a review and, simultaneously, to perform sentiment analysis at the aspect level. [sent-18, score-1.391]
9 Note that a review with a high rating (say, 4/5 stars) may contain both negative and positive opinions, and the same holds for a review with a very low rating (say, 1/2 stars). [sent-19, score-0.549]
10 From our point of view, each review has a set of sentences that are informative and coherent to its overall rating. [sent-20, score-0.34]
11 To perform fine granular sentiment analysis, the first step is to discover such coherent content. [sent-21, score-0.674]
12 Many information needs require the systems to perform fine granular sentiment analysis. [sent-22, score-0.601]
13 Aspect-level sentiment analysis may be more useful for users to have a global picture of opinions on the product’s properties. [sent-23, score-0.561]
14 Furthermore, different users may have different preferences on different aspects of a product. [sent-24, score-0.056]
15 In recent years, there has been much work focused on multilevel sentiment classification using structural learning models. [sent-26, score-0.654]
16 Yi (2007) extends the standard conditional random fields to model the local sentiment flow. [sent-27, score-0.427]
17 Ryan (2007) proposed structured models for fine-to-coarse sentiment analysis. [sent-28, score-0.454]
18 Oscar (2011) proposed to discover fine-grained sentiment with hidden-state CRF (Quattoni et al. [sent-29, score-0.463]
19 Yessenalina (2010) deployed the framework of latent structural SVMs (Yu and Joachims. [sent-31, score-0.096]
20 As for aspect level rating, ranking, or summarization, Benjamin (2007) employed [sent-33, score-0.567]
21 the good grief algorithm for multiple aspect ranking, and extensions of the generative topic models were also widely studied, such as (Titov and McDonald. [sent-35, score-0.599]
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 333–337, Jeju, Republic of Korea, 8-14 July 2012. © 2012 Association for Computational Linguistics
22 In this paper, we build a general structural learning model for joint sentiment classification and aspect analysis using a latent discriminative method. [sent-42, score-1.305]
23 Our model is able to predict the sentiment polarity of a document as well as to identify aspect-specific sentences and predict the polarity of such sentences. [sent-43, score-0.934]
24 The proposed method was evaluated with 9,000 Chinese restaurant reviews. [sent-44, score-0.069]
25 Preliminary experiments demonstrate that our model obtains promising performance. [sent-45, score-0.069]
26 1 Document Structure We assume that the polarity of document is closely related to some aspects for the reason that people are writing reviews to praise or criticize certain aspects. [sent-47, score-0.448]
27 Therefore, each informative sentence of the document characterizes one aspect, expressing aspect specific polarity or subjective features. [sent-48, score-1.051]
28 Similar to previous work on aspect analysis (Wang et al. [sent-49, score-0.572]
29 , 2010), we define the aspect as a collection of synonyms. [sent-51, score-0.541]
30 For instance, the word set {“value”, “price”, “cost”, “worth”, “quality”} is a synonym set corresponding to the aspect “price”. [sent-52, score-0.616]
31 For each document, an aspect is described by one or several sentences expressing aspect-specific polarity or subjective information. [sent-53, score-1.378]
32 Let the document be denoted by x, let y ∈ {+1, −1} represent the positive or negative polarity of the document, and let s be the set of informative sentences, in which each sentence is attached with a certain aspect ai ∈ A = {a1, . [sent-54, score-1.13]
33 explains the sentiment of the whole document, while the s here retains this property. [sent-59, score-0.121]
34 Let Ψ(x, y, s) denote the joint feature map that outputs the features describing the quality of predicting sentiment y using the sentence set s. [sent-60, score-0.523]
35 Let xj denote the j-th sentence of document x, and aj is the attached aspect of xj. [sent-61, score-0.993]
36 , 2010), we propose the following formulation of the discriminant function $\vec{w}^T \Psi(x, y, s) = \frac{1}{N(x)} \sum_{j \in s} \left( y \cdot \vec{w}_{pol_{a_j}}^T \psi_{pol}(x_j) + \vec{w}_{subj_{a_j}}^T \psi_{subj}(x_j) \right)$, where $N(x)$ is the normalizing factor, and $\psi_{pol}(x_j)$ and $\psi_{subj}(x_j)$ represent the polarity and subjectivity features of sentence $x_j$, respectively. [sent-63, score-0.596]
37 $\vec{w}_{pol}$ and $\vec{w}_{subj}$ denote the weights for polarity and subjectivity features. [sent-64, score-0.539]
38 To be specific for each aspect, we have $\vec{w}_{pol_a}$ and $\vec{w}_{subj_a}$ representing the vectors of feature weights for aspect a, used to calculate the polarity and subjectivity scores. [sent-65, score-0.786]
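The scoring function above can be illustrated with a small sketch. This is not the authors' code: the function name `discriminant`, the list-based feature layout, and the toy numbers are all assumptions made for illustration, and the normalizer N(x) is simply passed in as `norm`.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def discriminant(y, selected, aspect_of, psi_pol, psi_subj, w_pol, w_subj, norm):
    """Score w^T Psi(x, y, s) for a label y in {+1, -1} and a sentence set s.

    selected  : indices j of the sentences in s (hypothetical layout)
    aspect_of : aspect_of[j] is the aspect index a_j attached to sentence j
    psi_pol   : psi_pol[j] is the polarity feature vector of sentence j
    psi_subj  : psi_subj[j] is the subjectivity feature vector of sentence j
    w_pol     : w_pol[a] is the polarity weight vector for aspect a
    w_subj    : w_subj[a] is the subjectivity weight vector for aspect a
    norm      : the normalizing factor N(x)
    """
    total = 0.0
    for j in selected:
        a = aspect_of[j]
        # Only the polarity term is multiplied by the document label y.
        total += y * dot(w_pol[a], psi_pol[j]) + dot(w_subj[a], psi_subj[j])
    return total / norm


# Toy document with four sentences and two aspects (made-up values).
psi_pol = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
psi_subj = [[1.0], [1.0], [0.0], [0.0]]
w_pol = [[2.0, 0.0], [0.0, -1.0]]
w_subj = [[0.5], [0.5]]
aspect_of = [0, 1, 0, 1]

pos = discriminant(+1, [0, 1], aspect_of, psi_pol, psi_subj, w_pol, w_subj, norm=2.0)
neg = discriminant(-1, [0, 1], aspect_of, psi_pol, psi_subj, w_pol, w_subj, norm=2.0)
print(pos, neg)
```

Because the subjectivity term does not depend on y, flipping the label changes only the sign of the polarity contribution.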
39 To make a prediction, we have the document-level sentiment classifier $h(x; \vec{w}) = \arg\max_{y = \pm 1} \max_{s \in S(x)} \vec{w}^T \Psi(x, y, s)$ = . [sent-68, score-0.427]
40 This formulation of $\vec{w}^T \Psi(x, y, s)$ and the classifier means that, for each sentence, the assigned aspect has the highest score among all aspects. [sent-74, score-0.613]
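A minimal sketch of this prediction rule follows. It assumes that S(x) contains every set of exactly k sentences, each with a free choice of aspect; the source does not spell out S(x), so this is an assumption. Under it, the inner maximization reduces to giving each sentence its best-scoring aspect and keeping the k highest-scoring sentences. The names `best_set` and `classify` are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def best_set(y, psi_pol, psi_subj, w_pol, w_subj, k):
    """For a fixed label y, return (unnormalized score, [(sentence, aspect), ...])."""
    scored = []
    for j in range(len(psi_pol)):
        # Best aspect for sentence j; ties go to the larger aspect index.
        score, a = max(
            (y * dot(w_pol[a], psi_pol[j]) + dot(w_subj[a], psi_subj[j]), a)
            for a in range(len(w_pol))
        )
        scored.append((score, j, a))
    scored.sort(reverse=True)  # highest-scoring sentences first
    top = scored[:k]
    return sum(s for s, _, _ in top), [(j, a) for _, j, a in top]


def classify(psi_pol, psi_subj, w_pol, w_subj, k, norm):
    """Return (predicted label, normalized score, chosen (sentence, aspect) pairs)."""
    results = {y: best_set(y, psi_pol, psi_subj, w_pol, w_subj, k) for y in (+1, -1)}
    y_hat = max(results, key=lambda y: results[y][0])  # ties go to +1
    total, chosen = results[y_hat]
    return y_hat, total / norm, chosen


# Toy document with four sentences and two aspects (made-up values).
psi_pol = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
psi_subj = [[1.0], [1.0], [0.0], [0.0]]
w_pol = [[2.0, 0.0], [0.0, -1.0]]
w_subj = [[0.5], [0.5]]

y_hat, score, chosen = classify(psi_pol, psi_subj, w_pol, w_subj, k=2, norm=2.0)
print(y_hat, score, chosen)
```

The greedy top-k step is exact only because the score is a sum of independent per-sentence terms; a different definition of S(x) would need a different search.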
41 In order to obtain a model that is jointly trained and that satisfies the condition that the overall polarity of the document should influence the sentiment of the extracted informative sentences, we extend the formulation with a document-level term. [sent-77, score-0.844]
42 We denote the weight vector modeling the polarity of the entire document as $\vec{w}_{doc}$, as follows: $\vec{w}^T \Psi(x, y, s) = \frac{y}{N(x)} \sum_{j \in s} \left( \vec{w}_{pol_{a_j}}^T \psi_{pol}(x_j) + \vec{w}_{doc}^T \psi_{pol}(x_j) \right) + \frac{1}{N(x)} \sum_{j \in s} \vec{w}_{subj_{a_j}}^T \psi_{subj}(x_j) + y \cdot \vec{w}_{doc}^T \psi_{pol}(x)$ [sent-78, score-0.263]
43 2.3 Training We trained our model using the latent structural SVMs (Yu and Joachims. [sent-79, score-0.184]
44 $\forall i: \max_{s \in S(x_i)} \vec{w}^T \Psi(x_i, y_i, s) \ge \max_{s' \in S(x_i)} \vec{w}^T \Psi(x_i, -y_i, s') + \Delta(y_i, -y_i, s') - \xi_i$. We define $\Delta(y_i, -y_i, s') = 1$, that is, we view the document-level sentiment classification loss as the loss function. [sent-83, score-0.133]
45 To circumvent the optimization difficulty, we employ the framework of structural SVMs (Tsochantaridis et al. [sent-85, score-0.088]
46 , 2004) with latent variables proposed by Yu (2009) using the CCCP algorithm (Yuille and Rangarajan. [sent-86, score-0.096]
47 In terms of the formulation here, since the true informative sentence set is never observed, it is a hidden or latent variable. [sent-88, score-0.321]
48 Thus, we keep $s_i$ fixed to compute the upper bound for the concave part of each constraint, and rewrite the constraints as $\xi_i \ge \max_{s' \in S(x_i)} \vec{w}^T \Psi(x_i, -y_i, s') - \vec{w}^T \Psi(x_i, y_i, s_i) + 1$. After that, we have $y_i$ completed with the latent variable $s_i$ as if it were observed. [sent-89, score-0.218]
49 For each training example, starting with an initial sentence set in which each sentence carries an aspect label, the training procedure alternates between solving an instance of the structural SVM using the $s_i$ and predicting a new sentence set until the learned weight vector $\vec{w}$ converges. [sent-90, score-0.971]
50 In our work, we use the performance on a validation set to choose the halting iteration, as is similar to Yessenalina (2010). [sent-91, score-0.058]
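The alternation described above can be sketched as a generic loop. This is only a skeleton: `solve_ssvm`, `impute_latent` and `validation_score` are hypothetical callables standing in for the structural SVM solver, the latent-set prediction step, and the validation metric, none of which are specified in detail here. Reading "choose the halting iteration" as "keep the iterate with the best validation score" is also an assumption.

```python
def alternate_training(init_latent, solve_ssvm, impute_latent, validation_score,
                       max_iters=10):
    """Alternate between fitting weights and re-predicting latent sentence sets.

    init_latent      : initial guess of the latent sentence sets
    solve_ssvm       : latent -> weights (convex step with the latent sets fixed)
    impute_latent    : weights -> latent (re-predict the sentence sets)
    validation_score : weights -> float, higher is better
    """
    latent = init_latent
    best_w, best_val = None, float("-inf")
    for _ in range(max_iters):
        w = solve_ssvm(latent)       # solve the SVM with s_i treated as observed
        val = validation_score(w)    # monitor held-out performance
        if val > best_val:
            best_w, best_val = w, val
        latent = impute_latent(w)    # predict a new sentence set under w
    return best_w, best_val


# Toy stand-ins: "weights" are integers and validation peaks at w == 3.
w_star, val_star = alternate_training(
    init_latent=0,
    solve_ssvm=lambda latent: latent + 1,
    impute_latent=lambda w: w,
    validation_score=lambda w: -(w - 3) ** 2,
)
print(w_star, val_star)
```

In a real implementation the convex step would be a cutting-plane or subgradient solver over the rewritten constraints; the loop structure is the only part sketched here.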
51 4 Model Initialization To initialize the informative sentence set, following the experimental results of Yessenalina (2010), we set f(|x|) = 0.3 ∗ |x|; [sent-93, score-0.195]
52 we only select the top f(|x|) of the total sentences as the informative part of the document. [sent-95, score-0.06]
53 The normalizing factor is set as N(x) = √|x|, as Yessenalina (2010) demonstrates that square root normalization can be useful. [sent-96, score-0.028]
54 To analyze the aspect of each sentence, we need to give an initial guess of the aspect and sentiment for each sentence. [sent-97, score-1.559]
55 Sentence level sentiment initialization: To initialize the sentence level sentiment, we employ a rule-based method incorporating positive and negative sentiment terms, with adversative relations considered. [sent-98, score-1.123]
56 Sentence aspect assignment initialization: If a synonym of aspect a occurs in sentence xl, we assign aspect a to xl, and add xl to an aspect-specific sentence set Pa. [sent-99, score-2.525]
57 For a sentence xl without any aspect term, we set a as the aspect label if $a = \arg\max_{a'} \mathrm{similarity}(x_l, P_{a'})$. We select the sentences whose sentiment is consistent with the overall rating of a review as the initial guess of the latent variable. [sent-100, score-2.117]
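The two-pass aspect initialization can be sketched as follows. The similarity function is not defined in the text, so this sketch assumes the maximum Jaccard token overlap between a sentence and the sentences in P_a. The function names, the pre-tokenized input, and the rule that a sentence matching several aspects takes the first one are likewise assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pool_similarity(tokens, pool):
    """Assumed similarity(x_l, P_a): best overlap with any sentence in the pool."""
    return max((jaccard(tokens, other) for other in pool), default=0.0)


def assign_aspects(sentences, synonyms):
    """sentences: list of token lists; synonyms: dict mapping aspect -> set of terms."""
    assigned = {}
    pools = {aspect: [] for aspect in synonyms}
    # Pass 1: sentences containing an aspect synonym get that aspect directly.
    for i, tokens in enumerate(sentences):
        for aspect, terms in synonyms.items():
            if terms & set(tokens):
                assigned[i] = aspect
                pools[aspect].append(tokens)
                break  # first matching aspect wins (assumption)
    # Pass 2: remaining sentences take the aspect whose pool is most similar.
    for i, tokens in enumerate(sentences):
        if i not in assigned:
            assigned[i] = max(synonyms, key=lambda a: pool_similarity(tokens, pools[a]))
    return [assigned[i] for i in range(len(sentences))]


synonyms = {"price": {"price", "cost"}, "taste": {"taste", "flavor"}}
sentences = [
    ["the", "price", "is", "fair"],
    ["great", "flavor", "and", "spice"],
    ["portion", "is", "fair"],
    ["too", "much", "spice"],
]
labels = assign_aspects(sentences, synonyms)
print(labels)
```

Any other sentence-to-set similarity (for example, cosine over bag-of-words vectors) could be substituted for `pool_similarity` without changing the two-pass structure.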
58 3 Experiments In this section, we evaluate our model in terms of document and sentence level sentiment classification; we also analyze the performance of aspect assignment for each sentence. [sent-101, score-1.185]
59 The model is evaluated on the Chinese restaurant reviews crawled from Dianping1 . [sent-102, score-0.254]
60 Each of the reviews has an overall rating ranging from one to five stars. [sent-103, score-0.356]
61 To be specific, we consider a review as positive if its rating is greater than or equal to 4 stars, [sent-104, score-0.274]
62 or negative if it is less than or equal to 2 stars. [sent-106, score-0.044]
Footnote 1: http://www. com/
63 The corpus has 4500 positive and 4500 negative reviews. [sent-107, score-0.087]
64 We train 5 different models by splitting these reviews into 9 folds. [sent-109, score-0.185]
65 Two folds are left out as the testing set, and each model takes 5 folds as training set, 2 folds as validation set, and the performance is averaged. [sent-110, score-0.162]
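One way to read this protocol is sketched below. The text fixes the counts (9 folds, 2 for testing, 5 for training and 2 for validation per model, 5 models) but not how the 5 models differ, so rotating the validation pair across the 7 non-test folds is purely an assumption, as is the helper name `make_splits`.

```python
def make_splits(num_folds=9, num_test=2, num_val=2, num_models=5):
    """Return one {train, val, test} fold assignment per model."""
    folds = list(range(num_folds))
    test = folds[:num_test]   # held out for every model
    rest = folds[num_test:]   # 7 folds shared between train and validation
    splits = []
    for m in range(num_models):
        # Rotate the validation window by one fold per model (assumption).
        val = [rest[(m + i) % len(rest)] for i in range(num_val)]
        train = [f for f in rest if f not in val]
        splits.append({"train": train, "val": val, "test": test})
    return splits


splits = make_splits()
for split in splits:
    print(split)
```

With the default arguments every model sees 5 training folds, 2 validation folds, and the same 2 test folds, which matches the counts stated in the text.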
66 Besides, we also manually label 100 reviews, in which each sentence is labeled as positive or negative corresponding to certain aspect or with no aspect description. [sent-111, score-1.239]
67 5% of the total sentences can be assigned to an aspect by directly matching with aspect terms, which suggests that keyword-based aspect sentiment analysis may fail. [sent-115, score-2.183]
68 For restaurant reviews, we pre-defined 11 aspects, and for each aspect, we select about 5 frequently used terms to describe that aspect. [sent-116, score-0.069]
69 Table 1 shows some examples of the aspect synonym set used in this paper: Table 1: Samples of Aspect Synonym. [sent-117, score-0.588]
70 Document level sentiment classification We compare our method with previous work on sentiment classification using standard SVM(Pang et al. [sent-118, score-1.006]
71 Clearly, our model outperforms the baseline on document level sentiment classification. [sent-123, score-0.532]
72 Sentence level sentiment classification Our method can extract a set of informative sentences that are coherent with the overall rating of a review. [sent-124, score-0.919]
73 The evaluation of sentence-level sentiment classification is based on manual annotation. [sent-125, score-0.49]
74 net/faculty/hml/ sample 100 reviews, and present the extracted 300 sentences to annotators, who were asked to assign positive/negative/non-related labels. [sent-128, score-0.06]
75 Among the sentences, 251 are correctly classified as positive or negative, while 49 are misclassified. [sent-129, score-0.087]
76 Of the 49 misclassified sentences, 38 have mixed opinions or are non-subjective. [sent-130, score-0.167]
77 Aspect Assignment To evaluate the accuracy of aspect assignment, we compare the predicted aspect labels with the ground truth (manual annotation). [sent-131, score-1.082]
78 As some of sentences have explicit aspect terms and can be easily identified, we only consider those sentences without aspect words. [sent-132, score-1.202]
79 Of the 300 extracted sentences, 78 have aspect terms; for the rest, our model assigns correct aspect labels to 44 sentences, while a random guess maps only 21 sentences to the right labels. [sent-133, score-1.312]
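The counts reported in this section can be turned into rates with a few lines of arithmetic. The percentages below are derived here from the stated counts and are not figures quoted by the authors. The comparison with 1/11 assumes the random baseline picks uniformly among the 11 pre-defined aspects, which the text does not state.

```python
extracted = 300            # sentences shown to annotators
correct_sentiment = 251    # correctly classified as positive or negative
misclassified = 49

with_aspect_terms = 78
without_terms = extracted - with_aspect_terms  # sentences needing inferred aspects

model_correct = 44
random_correct = 21
num_aspects = 11

sentiment_acc = correct_sentiment / extracted
model_aspect_acc = model_correct / without_terms
random_aspect_acc = random_correct / without_terms
uniform_chance = 1 / num_aspects  # assumed uniform random baseline

print(f"sentence-level sentiment accuracy: {sentiment_acc:.3f}")
print(f"aspect accuracy (model):  {model_aspect_acc:.3f}")
print(f"aspect accuracy (random): {random_aspect_acc:.3f}")
print(f"uniform chance over {num_aspects} aspects: {uniform_chance:.3f}")
```

On these counts the model is correct on roughly twice as many term-free sentences as the random baseline, although both absolute rates are low.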
80 4 Conclusion and Future Work In this paper, we address the task of multilevel sentiment classification of online custom reviews for fine granular aspect analysis. [sent-134, score-1.533]
81 We present a structural learning model based on struct-SVM with latent variables. [sent-135, score-0.184]
82 The informative sentence set is regarded as a latent variable, in which each sentence is attached with a certain aspect label. [sent-136, score-0.946]
83 The training procedure alternates between solving an instance of the standard structural SVM optimization and predicting a new sentence set until the halting condition is satisfied. [sent-137, score-0.257]
84 Preliminary experiments demonstrate that our model obtains promising performance. [sent-139, score-0.069]
85 For future work, we propose to incorporate prior knowledge of latent variables to the model. [sent-141, score-0.096]
86 One possible way is to reformulate the loss function by taking the predicted aspect of the extracted sentences into consideration. [sent-142, score-0.636]
87 Aspect and sentiment unification model for online review analysis. [sent-154, score-0.546]
88 Discovering fine-grained sentiment with latent variable structured prediction models. [sent-196, score-0.523]
89 A joint model of text and aspect ratings for sentiment summarization. [sent-200, score-0.994]
90 Support vector machine learning for interdependent and structured output spaces. [sent-204, score-0.027]
91 Latent aspect rating analysis on review text data: A rating regression approach. [sent-208, score-0.945]
wordName wordTfidf (topN-words)
[('aspect', 0.541), ('sentiment', 0.427), ('yessenalina', 0.209), ('xj', 0.19), ('reviews', 0.185), ('polarity', 0.184), ('pol', 0.178), ('rating', 0.142), ('informative', 0.125), ('granular', 0.117), ('subj', 0.116), ('latent', 0.096), ('review', 0.089), ('structural', 0.088), ('document', 0.079), ('multilevel', 0.076), ('xl', 0.072), ('sentence', 0.07), ('restaurant', 0.069), ('aj', 0.069), ('msa', 0.065), ('xxi', 0.065), ('classification', 0.063), ('svms', 0.063), ('subjectivity', 0.061), ('sentences', 0.06), ('yi', 0.06), ('initialization', 0.06), ('cccp', 0.058), ('dtoc', 0.058), ('grief', 0.058), ('halting', 0.058), ('ptolaj', 0.058), ('stubjaj', 0.058), ('yuille', 0.058), ('fine', 0.057), ('users', 0.056), ('folds', 0.054), ('guess', 0.05), ('opinions', 0.047), ('synonym', 0.047), ('ryan', 0.047), ('oscar', 0.046), ('price', 0.046), ('inghua', 0.046), ('quattoni', 0.046), ('yu', 0.045), ('negative', 0.044), ('attached', 0.044), ('stars', 0.043), ('tsinghua', 0.043), ('positive', 0.043), ('assignment', 0.042), ('explains', 0.042), ('brody', 0.041), ('tsochantaridis', 0.041), ('alternates', 0.041), ('svm', 0.039), ('coherent', 0.037), ('custom', 0.037), ('obtains', 0.037), ('discover', 0.036), ('lu', 0.035), ('loss', 0.035), ('titov', 0.035), ('discriminate', 0.033), ('yue', 0.032), ('jo', 0.032), ('promising', 0.032), ('benjamin', 0.031), ('si', 0.031), ('analysis', 0.031), ('wang', 0.031), ('thorsten', 0.031), ('claire', 0.031), ('online', 0.03), ('formulation', 0.03), ('chinese', 0.03), ('overall', 0.029), ('normalizing', 0.028), ('ss', 0.028), ('pang', 0.028), ('structured', 0.027), ('preliminary', 0.027), ('subjective', 0.026), ('level', 0.026), ('expressing', 0.026), ('joint', 0.026), ('mao', 0.025), ('neylon', 0.025), ('bes', 0.025), ('hongning', 0.025), ('elicitation', 0.025), ('chenghua', 0.025), ('yulan', 0.025), ('tthh', 0.025), ('sq', 0.025), ('noemie', 0.025), ('ackstr', 0.025), ('bry', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models
2 0.44186279 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
Author: Arjun Mukherjee ; Bing Liu
Abstract: Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean the synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task. For different application purposes, different categorizations may be needed. Some form of user guidance is desired. In this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively.
3 0.32538509 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
Author: Fangtao Li ; Sinno Jialin Pan ; Ou Jin ; Qiang Yang ; Xiaoyan Zhu
Abstract: Extracting sentiment and topic lexicons is important for opinion mining. Previous works have showed that supervised learning methods are superior for this task. However, the performance of supervised methods highly relies on manually labeled training data. In this paper, we propose a domain adaptation framework for sentiment- and topic- lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain. The framework is twofold. In the first step, we generate a few high-confidence sentiment and topic seeds in the target domain. In the second step, we propose a novel Relational Adaptive bootstraPping (RAP) algorithm to expand the seeds in the target domain by exploiting the labeled source domain data and the relationships between topic and sentiment words. Experimental results show that our domain adaptation framework can extract precise lexicons in the target domain without any annotation.
4 0.25371262 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang
Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited coverage of vocabulary in the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.
5 0.2252848 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries
Author: Eduard Dragut ; Hong Wang ; Clement Yu ; Prasad Sistla ; Weiyi Meng
Abstract: Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. The dictionaries have substantial inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize a fast SAT solver to detect inconsistencies in a sentiment dictionary. We perform experiments on four sentiment dictionaries and WordNet.
6 0.19700073 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
7 0.18564692 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis
9 0.18006554 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
10 0.15025146 144 acl-2012-Modeling Review Comments
11 0.14396302 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
12 0.12733978 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
13 0.11198985 187 acl-2012-Subgroup Detection in Ideological Discussions
15 0.086157441 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context
16 0.078513578 188 acl-2012-Subgroup Detector: A System for Detecting Subgroups in Online Discussions
17 0.070811599 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
18 0.069112562 190 acl-2012-Syntactic Stylometry for Deception Detection
19 0.061474696 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
20 0.058563415 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
topicId topicWeight
[(0, -0.202), (1, 0.245), (2, 0.241), (3, -0.291), (4, 0.112), (5, -0.089), (6, 0.068), (7, -0.004), (8, -0.31), (9, -0.114), (10, 0.007), (11, 0.032), (12, -0.165), (13, -0.205), (14, -0.086), (15, -0.1), (16, 0.109), (17, -0.037), (18, -0.052), (19, -0.034), (20, 0.088), (21, -0.02), (22, -0.05), (23, 0.003), (24, -0.004), (25, -0.1), (26, -0.031), (27, -0.037), (28, -0.033), (29, -0.022), (30, -0.04), (31, -0.009), (32, -0.044), (33, 0.05), (34, -0.004), (35, 0.011), (36, -0.083), (37, 0.052), (38, -0.033), (39, 0.054), (40, 0.058), (41, 0.03), (42, 0.084), (43, -0.041), (44, -0.096), (45, -0.04), (46, -0.06), (47, 0.037), (48, -0.006), (49, 0.119)]
simIndex simValue paperId paperTitle
same-paper 1 0.97793138 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models
2 0.90272528 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
Author: Arjun Mukherjee ; Bing Liu
Abstract: Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean the synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task. For different application purposes, different categorizations may be needed. Some form of user guidance is desired. In this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively.
3 0.69815791 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
Author: Sida Wang ; Christopher Manning
Abstract: Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/ dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
4 0.69457209 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
Author: Fangtao Li ; Sinno Jialin Pan ; Ou Jin ; Qiang Yang ; Xiaoyan Zhu
Abstract: Extracting sentiment and topic lexicons is important for opinion mining. Previous works have showed that supervised learning methods are superior for this task. However, the performance of supervised methods highly relies on manually labeled training data. In this paper, we propose a domain adaptation framework for sentiment- and topic- lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain. The framework is twofold. In the first step, we generate a few high-confidence sentiment and topic seeds in the target domain. In the second step, we propose a novel Relational Adaptive bootstraPping (RAP) algorithm to expand the seeds in the target domain by exploiting the labeled source domain data and the relationships between topic and sentiment words. Experimental results show that our domain adaptation framework can extract precise lexicons in the target domain without any annotation.
5 0.64832962 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
Author: Xinfan Meng ; Furu Wei ; Xiaohua Liu ; Ming Zhou ; Ge Xu ; Houfeng Wang
Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited coverage of vocabulary in the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.
6 0.62639695 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries
7 0.62365538 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis
9 0.53598034 144 acl-2012-Modeling Review Comments
11 0.4732644 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
12 0.4041484 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
13 0.33633143 190 acl-2012-Syntactic Stylometry for Deception Detection
15 0.3157551 186 acl-2012-Structuring E-Commerce Inventory
16 0.27972448 173 acl-2012-Self-Disclosure and Relationship Strength in Twitter Conversations
17 0.27626792 187 acl-2012-Subgroup Detection in Ideological Discussions
18 0.2223039 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
19 0.21850568 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context
20 0.21226355 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
topicId topicWeight
[(15, 0.273), (25, 0.023), (26, 0.023), (28, 0.062), (30, 0.054), (37, 0.053), (39, 0.096), (59, 0.015), (74, 0.014), (82, 0.018), (84, 0.024), (85, 0.011), (90, 0.122), (92, 0.065), (94, 0.019), (99, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.795012 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models
2 0.70606971 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
Author: Dave Golland ; John DeNero ; Jakob Uszkoreit
Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.
3 0.652794 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
Author: Zhenghua Li ; Ting Liu ; Wanxiang Che
Abstract: We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5.1 and 6.0) using the Chinese Dependency Treebank as the source treebank. The improvements are respectively 1.37% and 1.10% with automatic part-of-speech tags. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.
4 0.62439901 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
Author: Arjun Mukherjee ; Bing Liu
Abstract: Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean the synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task. For different application purposes, different categorizations may be needed. Some form of user guidance is desired. In this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively.
5 0.54430532 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger
Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.
7 0.5412221 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
8 0.54027593 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
9 0.54025179 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
10 0.53600723 184 acl-2012-String Re-writing Kernel
11 0.53579164 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
12 0.53529793 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
13 0.53513759 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
14 0.53445494 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
15 0.53429002 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models
16 0.53426844 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
17 0.53380048 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
18 0.53343213 191 acl-2012-Temporally Anchored Relation Extraction
19 0.53178209 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
20 0.53158647 167 acl-2012-QuickView: NLP-based Tweet Search