acl acl2012 acl2012-144 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Arjun Mukherjee ; Bing Liu
Abstract: Writing comments about news articles, blogs, or reviews has become a popular activity in social media. In this paper, we analyze reader comments about reviews. Analyzing review comments is important because reviews only tell the experiences and evaluations of reviewers about the reviewed products or services. Comments, on the other hand, are readers’ evaluations of reviews, their questions and concerns. Clearly, the information in comments is valuable for both future readers and brands. This paper proposes two latent variable models to simultaneously model and extract these key pieces of information. The results also enable accurate classification of comments. Experiments using Amazon review comments demonstrate the effectiveness of the proposed models.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Writing comments about news articles, blogs, or reviews has become a popular activity in social media. [sent-2, score-0.344]
2 In this paper, we analyze reader comments about reviews. [sent-3, score-0.226]
3 Analyzing review comments is important because reviews only tell the experiences and evaluations of reviewers about the reviewed products or services. [sent-4, score-0.69]
4 Clearly, the information in comments is valuable for both future readers and brands. [sent-6, score-0.279]
5 The results also enable classification of comments accurately. [sent-8, score-0.226]
6 Experiments using Amazon review comments demonstrate the effectiveness of the proposed models. [sent-9, score-0.572]
7 Introduction Online reviews enable consumers to evaluate the products and services that they have used. [sent-11, score-0.118]
8 These reviews are also used by other consumers and businesses as a valuable source of opinions. [sent-12, score-0.118]
9 Often a reviewer may not be an expert on the product and may misuse the product or make other mistakes. [sent-14, score-0.211]
10 There may also be aspects of the product that the reviewer did not mention but a reader wants to know. [sent-15, score-0.223]
11 Some reviewers may even write fake reviews to promote … [sent-16, score-0.159]
12 To improve the online review system and user experience, some review hosting sites allow readers to write comments about reviews (apart from just providing feedback by clicking whether the review is helpful or not). [sent-19, score-1.467]
13 Review comments mainly contain the following information: Thumbs-up or thumbs-down: Some readers may comment on whether they find the review useful in helping them make a buying decision. [sent-23, score-0.977]
14 Agreement or disagreement: Some readers who comment on a review may be users of the product themselves. [sent-24, score-0.819]
15 They often state whether they agree or disagree with the review. [sent-25, score-0.261]
16 Such comments are valuable as they provide a second opinion, which may even identify fake reviews because a genuine user often can easily spot reviewers who have never used the product. [sent-26, score-0.385]
17 Question and answer: A commenter may ask for clarification or about some aspects of the product that are not covered in the review. [sent-27, score-0.148]
18 In this paper, we use statistical modeling to model review comments. [sent-28, score-0.346]
19 It models topics and different types of expressions, which represent different types of comment posts: 1. [sent-31, score-0.58]
20 Note that we have no expressions for answers to questions, as there are usually no specific phrases indicating that a post answers a question, except that it may start with the name of the person who asked the question. [sent-46, score-0.349]
21 However, there are typical phrases for acknowledging answers, thus answer acknowledgement expressions. [sent-47, score-0.296]
22 For ease of presentation, we call these expressions the comment expressions (or C-expressions). [sent-55, score-0.478]
23 Its generative process separates topics and C-expression types using a switch variable and treats posts as random mixtures over latent topics and C-expression types. [sent-57, score-0.526]
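As described, each term of a post is routed by a per-post switch to either a topic or a C-expression type. The switch-based generative step can be sketched as follows — a minimal sketch with hypothetical names, not the authors' implementation; the draws of ψ, θ, and Φ themselves from their Beta/Dirichlet priors are omitted:

```python
import random

def draw(dist):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def generate_post(length, psi, theta_topic, theta_expr, phi_topic, phi_expr):
    """Sample one comment post: a Bernoulli(psi) switch decides, per term,
    whether a topic or a C-expression type emits the word."""
    words = []
    for _ in range(length):
        if random.random() < psi:          # switch says: topical term
            z = draw(theta_topic)          # pick a topic for this term
            words.append(("topic", z, draw(phi_topic[z])))
        else:                              # switch says: C-expression term
            e = draw(theta_expr)           # pick an expression type
            words.append(("expr", e, draw(phi_expr[e])))
    return words
```

With psi close to 1, posts are dominated by topical terms, mirroring the later observation that topics are emitted more often than expressions.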
24 In short, the two models provide a principled and integrated approach to simultaneously discover topics and C-expressions, which is the goal of this work. [sent-59, score-0.152]
25 Note that topics are usually product aspects in this work. [sent-60, score-0.3]
26 The extracted C-expressions and topics from review comments are very useful in practice. [sent-61, score-0.724]
27 First of all, C-expressions enable us to perform more accurate classification of comments, which can give us a good evaluation of the review quality and credibility. [sent-62, score-0.346]
28 For example, a review with many Disagreeing and Thumbs-down comments is dubious. [sent-63, score-0.572]
29 Second, the extracted C-expressions and topics help identify the key product aspects that people are troubled with in disagreements and in questions. [sent-64, score-0.3]
30 With these pieces of information, comments for a review can be summarized. [sent-66, score-0.61]
31 To the best of our knowledge, there is no reported work on such a fine-grained modeling of review comments. [sent-68, score-0.346]
32 … topic and sentiment modeling, review quality prediction, and review spam detection. [sent-71, score-0.885]
33 The proposed models have been evaluated both qualitatively and quantitatively using a large number of review comments from Amazon. [sent-74, score-0.572]
34 Related Work We believe that this work is the first attempt to model review comments for fine-grained analysis. [sent-79, score-0.572]
35 , 2003) have been used to mine topics in large text collections. [sent-82, score-0.152]
36 These models produce only topics but not multiple types of expressions together with topics. [sent-87, score-0.253]
37 Our labeling is on topical terms and C-expressions with the purpose of obtaining some priors to separate topics and C-expressions. [sent-90, score-0.405]
38 In sentiment analysis, researchers have jointly modeled topics and sentiment words (Lin and He, 2009; Mei et al. [sent-91, score-0.288]
39 , 2010), which used a switch variable trained with Maximum-Entropy to separate topic and sentiment words. [sent-97, score-0.154]
40 However, unlike sentiments and topics in reviews, which are emitted in the same sentence, C-expressions often interleave with topics across sentences and the same comment post may also have multiple types of C-expressions. [sent-99, score-0.85]
41 The TME (Topic and Multi-Expression) model is a hierarchical generative model motivated by the joint occurrence of various types of expressions indicating Thumbs-up, Thumbs-down, Question, Answer acknowledgement, Agreement, and Disagreement and topics in comment posts. [sent-143, score-0.605]
42 A typical comment post mentions a few topics (using semantically related topical terms) and expresses some viewpoints with one or more C-expression types (using semantically related expressions). [sent-145, score-0.755]
43 This observation motivates the generative process of our model where documents (posts) are represented as random mixtures of latent topics and C-expression types. [sent-146, score-0.152]
44 Each topic or C-expression type is characterized by a distribution over terms (words/phrases). [sent-147, score-0.145]
45 Assume the distribution of topics and C-expressions in a document ? [sent-148, score-0.182]
46 We parameterize multinomials over topics using a matrix Θ? [sent-172, score-0.201]
47 The multinomials over terms associated with each topic are parameterized by a matrix Φ? [sent-197, score-0.194]
48 that were assigned to topics and C-expression types respectively. [sent-564, score-0.19]
49 is the (success) probability (of the Bernoulli distribution) of emitting a topical/aspect term in a comment post ? [sent-578, score-0.511]
50 , assumes that both topical and C-expression terms are equally likely to be emitted in a comment post). [sent-613, score-0.558]
51 However, knowing a priori that topics are more likely to be emitted than expressions in a post motivates us to take guidance from asymmetric priors (i.e., … [sent-646, score-0.563]
52 are closer to the actual distribution of topical terms in posts based on some domain knowledge. [sent-653, score-0.238]
53 The Max-Ent parameters can be learned from a small number of labeled topical and C-expression terms (words and phrases) which can serve as good priors. [sent-715, score-0.191]
54 The idea is motivated by the following observation: topical and C-expression terms typically play different syntactic roles in a sentence. [sent-716, score-0.161]
55 ) tend to be nouns and noun phrases, while expression terms (“I refute”, “how can you say”, “great review”) usually contain pronouns, verbs, wh-determiners, adjectives, and modals. [sent-720, score-0.2]
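This syntactic separation can be illustrated with a tiny Max-Ent (logistic-regression) classifier over hand-made POS-shape features — the features, toy data, and training loop below are illustrative assumptions, not the paper's actual feature set:

```python
import math

# Toy POS-shape features per term: [is_noun_phrase, has_pronoun_or_verb, has_wh_or_modal]
# Labels: 1 = topical term, 0 = C-expression term. All data here is made up for illustration.
DATA = [
    ([1, 0, 0], 1),  # e.g. "battery life"
    ([1, 0, 0], 1),  # e.g. "screen"
    ([1, 0, 0], 1),  # e.g. "zoom lens"
    ([0, 1, 0], 0),  # e.g. "I refute"
    ([0, 1, 1], 0),  # e.g. "how can you say"
    ([0, 1, 0], 0),  # e.g. "great review"
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_maxent(data, epochs=500, lr=0.5):
    """Binary Max-Ent (logistic regression) trained by plain gradient ascent."""
    w = [0.0] * (len(data[0][0]) + 1)            # feature weights plus a bias term
    for _ in range(epochs):
        for x, y in data:
            xb = x + [1.0]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            for i, xi in enumerate(xb):
                w[i] += lr * (y - p) * xi        # gradient of the log-likelihood
    return w

def predict(w, x):
    """Probability that a term is topical under the learned model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x + [1.0])))
```

The predicted probabilities for each term occurrence can then serve as the informed priors that guide the switch variable.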
56 Specifically, we evaluate the discovered C-expressions, contentious aspects, and aspects often mentioned in questions. [sent-962, score-0.233]
57 1 Dataset and Experiment Settings We crawled comments of reviews from Amazon. [sent-964, score-0.344]
58 For each comment we extracted its id, the comment author id, the review id on which it commented, and the review author id. [sent-966, score-1.396]
59 Our database consisted of 21,316 authors, 37,548 reviews, and 88,345 comments, with an average of 124 words per comment post. [sent-967, score-0.578]
60 , we estimated the asymmetric Beta priors using the method of moments discussed in Section 3. [sent-971, score-0.153]
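The method-of-moments step itself is standard: match the sample mean and variance of the observed per-post fraction of topical terms to the Beta mean and variance. A sketch (the 0.8/0.01 numbers in the test are illustrative, not the paper's estimates):

```python
def beta_from_moments(mean, var):
    """Method-of-moments estimates for Beta(alpha, beta).

    Solves the Beta moment equations
        mean = alpha / (alpha + beta)
        var  = alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))
    for alpha and beta. Requires 0 < mean < 1 and var < mean * (1 - mean).
    """
    common = mean * (1.0 - mean) / var - 1.0
    alpha = mean * common
    beta = (1.0 - mean) * common
    return alpha, beta
```

A sample mean above 0.5 yields alpha > beta, i.e. an asymmetric prior that favors topical terms over C-expression terms, as the text motivates.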
61 We sampled 1000 random posts and for each post we identified the C-expressions emitted. [sent-972, score-0.188]
62 , we randomly sampled 500 terms from our corpus appearing at least 10 times and labeled them as topical (332) or C-expressions (168) and used the corresponding feature vector of ? [sent-985, score-0.191]
63 each term (in the context of posts where it occurs) to train the Max-Ent model. [sent-986, score-0.125]
64 We set the number of topics T = 100 and the number of C-expression types E = 6 (Thumbs-up, Thumbs-down, Question, Answer acknowledgement, Agreement, and Disagreement), since in review comments we usually find these six dominant expression types. [sent-987, score-0.453]
65 Instead, the expression types became less specific as the expression term space became sparser. [sent-993, score-0.3]
66 Table 1 shows the top terms of all expression types using the TME model. [sent-997, score-0.241]
67 Table 3 reports the precisions @ top 25, 50, 75, and 100 rank positions for all six expression types across both models. [sent-1007, score-0.182]
68 … terms denote possible errors; blue (italics) terms denote those newly discovered by the model; the rest (black) were used in Max-Ent training. [sent-1017, score-0.184]
69 From Table 3, we observe that ME-TME consistently outperforms TME in precisions across all expression types and all rank positions. [sent-1020, score-0.145]
70 This shows that Max-Ent priors are more effective in discovering expressions than Beta priors. [sent-1021, score-0.19]
71 3 Comment Classification Here we show that the discovered C-expressions can help comment classification. [sent-1024, score-1.122]
72 This labeling is a fairly easy task, as one can almost always determine to which type a comment belongs. [sent-1035, score-0.352]
73 For instance, we found many comments belonging to both Thumbs-down and Disagreement, Thumbs-up with Acknowledgement and with Question. [sent-1038, score-0.226]
74 As comments in the Question type mostly use the punctuation “?”, … [sent-1043, score-0.226]
75 We report the average precision (P), recall (R), and F1 score over 100 comments for each particular domain. [sent-1121, score-0.226]
76 We note that the annotation resulted in a new label “Answer” which consists of mostly replies to comments with questions. [sent-1133, score-0.226]
77 Thus, to improve the performance on the Answer type comments, we added three binary features for each comment c on top of C-expression features: i) Is the author of c the review author too? [sent-1135, score-0.735]
78 ii) Is there any comment posted before c by some author a which has been previously classified as a question post? [sent-1137, score-0.431]
79 iii) Is there any comment posted after c by author a that replies to c (using @name) and is an Answer-Acknowledgement comment (which again has been previously classified as such)? [sent-1138, score-0.704]
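The three binary features can be sketched as follows; the comment-record field names are assumptions for illustration, not the authors' schema:

```python
def answer_features(c, comments):
    """Three binary features for classifying a comment c as an Answer.

    comments is the time-ordered list of comment records for one review; each
    record is a dict with (assumed) keys: 'author', 'time', 'review_author',
    'predicted_type', and 'reply_to' (the @name a comment replies to, if any).
    """
    # i) Is the author of c also the review author?
    f1 = int(c["author"] == c["review_author"])

    # ii) Is there an earlier comment already classified as a Question post?
    f2 = int(any(o["time"] < c["time"] and o["predicted_type"] == "question"
                 for o in comments))

    # iii) Is there a later comment replying to c (via @name) already
    #      classified as an Answer-Acknowledgement?
    f3 = int(any(o["time"] > c["time"] and o.get("reply_to") == c["author"]
                 and o["predicted_type"] == "answer_ack"
                 for o in comments))
    return [f1, f2, f3]
```

Features ii) and iii) rely on earlier classifier decisions, so in practice they are computed over comments processed in time order.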
80 4 Contention Points and Questioned Aspects We now turn to the task of discovering points of contention in disagreement comments and aspects (or topics) raised in questions. [sent-1145, score-0.718]
81 By “points”, we mean the topical terms on which some contentions or disagreements have been expressed. [sent-1146, score-0.196]
82 Topics being the product aspects are also indirectly evaluated in this task. [sent-1147, score-0.148]
83 , we first select the top k topics that are mentioned in d according to its topic distribution, ? [sent-1152, score-0.275]
84 , we emit the topical terms (words/phrases) of topics in ? [sent-1181, score-0.359]
85 This is so because the Dirichlet distribution has a smoothing effect which assigns some non-zero probability mass to every term in the vocabulary for each topic ? [sent-1229, score-0.134]
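The smoothing effect described here is the usual posterior estimate of a topic's term distribution under a symmetric Dirichlet prior; a minimal sketch:

```python
def smoothed_topic_term_probs(counts, beta):
    """Smoothed term distribution for one topic:
        phi_hat[w] = (n_w + beta) / (N + V * beta)
    where counts[w] = n_w is how often term w was assigned to the topic,
    V is the vocabulary size, and beta is the Dirichlet hyperparameter.

    Because of the prior, every vocabulary term receives non-zero mass under
    every topic, which is why the raw distribution alone cannot tell which
    terms a post actually emits.
    """
    V = len(counts)
    total = sum(counts) + V * beta
    return [(n + beta) / total for n in counts]
```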
86 = 5, which are reasonable because a post normally does not talk about many topics (? [sent-1253, score-0.263]
87 ), and the contention points (aspect terms) appear quite close to the disagreement expressions. [sent-1254, score-0.377]
88 This baseline is reasonable because topical terms are usually nouns and noun phrases and are near disagreement (question) expressions. [sent-1263, score-0.372]
89 We asked them to judge the precision of the discovered terms for a post (whether they are indeed valid points of contention) and the recall (how many of the actually contentious points in the post were discovered). [sent-1296, score-0.855]
90 In Table 5 (a), we report the average precision and recall for 100 posts in each domain by the two judges J1 and J2 for different methods on the task of discovering points (aspects) of contention. [sent-1297, score-0.214]
91 In Table 5 (b), similar results are reported for the task of discovering questioned aspects in 100 question comments for each product domain. [sent-1298, score-0.557]
92 Conclusion This paper proposed the problem of modeling review comments, and presented two models TME and ME-TME to model and to extract topics (aspects) and various comment expressions. [sent-1304, score-0.85]
93 These expressions enable us to classify comments more accurately, and to find contentious aspects and questioned aspects. [sent-1305, score-0.525]
94 These pieces of information also allow us to produce a simple summary of comments for each review as discussed in Section 1. [sent-1306, score-0.61]
95 To our knowledge, this is the first attempt to analyze comments in such detail. [sent-1307, score-0.226]
96 Identifying agreement and disagreement in conversational speech: Use of Bayesian networks to model pragmatic dependencies. [sent-1353, score-0.22]
97 Designing novel review ranking systems: predicting the usefulness and impact of reviews. [sent-1359, score-0.346]
98 Aspect and sentiment unification model for online review analysis. [sent-1385, score-0.446]
99 ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews. [sent-1457, score-0.18]
100 Latent aspect rating analysis on review text data: a rating regression approach. [sent-1578, score-0.383]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 144 acl-2012-Modeling Review Comments
2 0.18022636 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
Author: Arjun Mukherjee ; Bing Liu
Abstract: Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean the synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task. For different application purposes, different categorizations may be needed. Some form of user guidance is desired. In this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively. 1
3 0.17006302 55 acl-2012-Community Answer Summarization for Multi-Sentence Question with Group L1 Regularization
Author: Wen Chan ; Xiangdong Zhou ; Wei Wang ; Tat-Seng Chua
Abstract: We present a novel answer summarization method for community Question Answering services (cQAs) to address the problem of “incomplete answer”, i.e., the “best answer” of a complex multi-sentence question misses valuable information that is contained in other answers. In order to automatically generate a novel and non-redundant community answer summary, we segment the complex original multi-sentence question into several sub questions and then propose a general Conditional Random Field (CRF) based answer summary method with group L1 regularization. Various textual and non-textual QA features are explored. Specifically, we explore four different types of contextual factors, namely, the information novelty and non-redundancy modeling for local and non-local sentence interactions under question segmentation. To further unleash the potential of the abundant cQA features, we introduce the group L1 regularization for feature learning. Experimental results on a Yahoo! Answers dataset show that our proposed method significantly outperforms state-of-the-art methods on cQA summarization task.
4 0.16194983 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
Author: Gregory Druck ; Bo Pang
Abstract: There are a growing number of popular web sites where users submit and review instructions for completing tasks as varied as building a table and baking a pie. In addition to providing their subjective evaluation, reviewers often provide actionable refinements. These refinements clarify, correct, improve, or provide alternatives to the original instructions. However, identifying and reading all relevant reviews is a daunting task for a user. In this paper, we propose a generative model that jointly identifies user-proposed refinements in instruction reviews at multiple granularities, and aligns them to the appropriate steps in the original instructions. Labeled data is not readily available for these tasks, so we focus on the unsupervised setting. In experiments in the recipe domain, our model provides 90. 1% F1 for predicting refinements at the review level, and 77.0% F1 for predicting refinement segments within reviews.
5 0.16071096 8 acl-2012-A Corpus of Textual Revisions in Second Language Writing
Author: John Lee ; Jonathan Webster
Abstract: This paper describes the creation of the first large-scale corpus containing drafts and final versions of essays written by non-native speakers, with the sentences aligned across different versions. Furthermore, the sentences in the drafts are annotated with comments from teachers. The corpus is intended to support research on textual revision by language learners, and how it is influenced by feedback. This corpus has been converted into an XML format conforming to the standards of the Text Encoding Initiative (TEI).
6 0.15025146 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models
7 0.12376512 98 acl-2012-Finding Bursty Topics from Microblogs
8 0.12070784 190 acl-2012-Syntactic Stylometry for Deception Detection
10 0.11198577 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
11 0.1075419 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
12 0.10413775 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
13 0.10042664 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
14 0.095198683 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
15 0.094956584 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation
16 0.094939202 79 acl-2012-Efficient Tree-Based Topic Modeling
17 0.084831811 31 acl-2012-Authorship Attribution with Author-aware Topic Models
18 0.083170407 187 acl-2012-Subgroup Detection in Ideological Discussions
20 0.073491767 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
[(26, 0.033), (28, 0.037), (29, 0.011), (30, 0.406), (37, 0.025), (39, 0.078), (74, 0.017), (82, 0.014), (84, 0.026), (85, 0.012), (90, 0.119), (92, 0.071), (94, 0.018), (99, 0.046)]
simIndex simValue paperId paperTitle
1 0.94943619 65 acl-2012-Crowdsourcing Inference-Rule Evaluation
Author: Naomi Zeichner ; Jonathan Berant ; Ido Dagan
Abstract: The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. In this paper, we suggest a framework for evaluating inference-rule resources. Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators.
3 0.80238003 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
Author: Matthieu Constant ; Anthony Sigogne ; Patrick Watrin
Abstract: The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested. Firstly, we show that pre-grouping multiword expressions before parsing with a state-of-the-art recognizer improves multiword recognition accuracy and unlabeled attachment score. However, it has no statistically significant impact in terms of F-score, as incorrect multiword expression recognition has important side effects on parsing. Secondly, integrating multiword expressions in the parser grammar, followed by a reranker specific to such expressions, slightly improves all evaluation metrics.
4 0.74938929 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Author: Nan Yang ; Mu Li ; Dongdong Zhang ; Nenghai Yu
Abstract: Long distance word reordering is a major challenge in statistical machine translation research. Previous work has shown using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference. In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. The ranking model is automatically derived from word aligned parallel data with a syntactic parser for source language based on both lexical and syntactical features. We evaluated our approach on largescale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase- based SMT system.
5 0.59854108 83 acl-2012-Error Mining on Dependency Trees
Author: Claire Gardent ; Shashi Narayan
Abstract: In recent years, error mining approaches were developed to help identify the most likely sources of parsing failures in parsing systems using handcrafted grammars and lexicons. However, the techniques they use to enumerate and count n-grams build on the sequential nature of a text corpus and do not easily extend to structured data. In this paper, we propose an algorithm for mining trees and apply it to detect the most likely sources of generation failure. We show that this tree mining algorithm permits identifying not only errors in the generation system (grammar, lexicon) but also mismatches between the structures contained in the input and the input structures expected by our generator, as well as a few idiosyncrasies/errors in the input data.
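The sequential error-mining baseline the abstract contrasts with works by enumerating n-grams and comparing how often each occurs in failing versus successful inputs. A minimal sketch of that counting idea (function names are illustrative, not the paper's code) looks like this:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def suspicion_scores(ok_sents, fail_sents, n=2):
    """Per-n-gram failure rate over a corpus split into successfully
    processed sentences (ok_sents) and failing ones (fail_sents).

    Returns {ngram: fail_count / total_count} for n-grams that occur
    in at least one failing sentence; values near 1.0 flag n-grams
    that almost always co-occur with failure (likely error sources).
    """
    fail = Counter(g for s in fail_sents for g in ngrams(s, n))
    total = Counter(g for s in ok_sents for g in ngrams(s, n)) + fail
    return {g: fail[g] / total[g] for g in fail}

# Example: the bigram ("b", "d") occurs only in a failing sentence.
scores = suspicion_scores([["a", "b", "c"]], [["a", "b", "d"]], n=2)
print(scores[("b", "d")])  # 1.0
```

This flat counting is exactly what breaks down on trees: tree-structured inputs have no single linear order to slide an n-gram window over, which is the gap the paper's tree-mining algorithm addresses.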
6 0.55161953 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
7 0.5494557 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
8 0.54414576 190 acl-2012-Syntactic Stylometry for Deception Detection
9 0.53999716 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
10 0.53265971 139 acl-2012-MIX Is Not a Tree-Adjoining Language
11 0.53250843 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
12 0.5047828 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
13 0.50293458 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
14 0.49684915 108 acl-2012-Hierarchical Chunk-to-String Translation
15 0.49642318 197 acl-2012-Tokenization: Returning to a Long Solved Problem A Survey, Contrastive Experiment, Recommendations, and Toolkit
16 0.49595043 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models
17 0.48912263 34 acl-2012-Automatically Learning Measures of Child Language Development
18 0.48826933 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
19 0.48788249 184 acl-2012-String Re-writing Kernel
20 0.48752835 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction