acl acl2011 acl2011-55 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wenting Xiong ; Diane Litman
Abstract: Identifying peer-review helpfulness is an important task for improving the quality of feedback that students receive from their peers. As a first step towards enhancing existing peer-review systems with new functionality based on helpfulness detection, we examine whether standard product review analysis techniques also apply to our new context of peer reviews. In addition, we investigate the utility of incorporating additional specialized features tailored to peer review. Our preliminary results show that the structural features, review unigrams and meta-data combined are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Identifying peer-review helpfulness is an important task for improving the quality of feedback that students receive from their peers. [sent-3, score-0.722]
2 As a first step towards enhancing existing peer-review systems with new functionality based on helpfulness detection, we examine whether standard product review analysis techniques also apply to our new context of peer reviews. [sent-4, score-1.458]
3 In addition, we investigate the utility of incorporating additional specialized features tailored to peer review. [sent-5, score-0.77]
4 Our preliminary results show that the structural features, review unigrams and meta-data combined are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction. [sent-6, score-2.373]
5 1 Introduction Peer reviewing of student writing has been widely used in various academic fields. [sent-7, score-0.063]
6 While existing web-based peer-review systems largely save instructors’ effort in setting up peer-review assignments and managing document assignment, the quality of peer reviews often remains poor (Nelson and Schunn, 2009). [sent-8, score-0.815]
7 Thus to enhance the effectiveness of existing peer-review systems, we propose to automatically predict the helpfulness of peer reviews. [sent-9, score-1.182]
8 In this paper, we examine prior techniques that have been used to successfully rank helpfulness for product reviews, and adapt them to the peer-review domain. [sent-10, score-0.761]
9 In particular, we use an SVM regression algorithm to predict the helpfulness of peer reviews based on generic linguistic features automatically mined from peer reviews and students’ papers, plus specialized features based on existing knowledge about peer reviews. [sent-11, score-1.465]
11 We not only demonstrate that prior techniques from product reviews can be successfully tailored to peer reviews, but also show the importance of peer-review specific features. [sent-14, score-0.966]
12 However, given some similarity between peer reviews and other review types, we hypothesize that techniques used to predict review helpfulness in other domains can also be applied to peer reviews. [sent-16, score-2.251]
13 Kim et al. (2006) used regression to predict the helpfulness ranking of product reviews based on various classes of linguistic features. [sent-18, score-1.052]
14 Ghose and Ipeirotis (2010) further examined the socio-economic impact of product reviews using a similar approach and suggested the usefulness of subjectivity analysis. [sent-19, score-0.357]
15 A study (2008) of movie reviews showed that helpfulness depends on reviewers’ expertise, their writing style, and the timeliness of the review. [sent-21, score-0.965]
16 Tsur and Rappoport (2009) proposed RevRank to select the most helpful book reviews in an unsupervised fashion based on review lexicons. [sent-22, score-0.454]
17 However, studies of Amazon’s product reviews also show that the perceived helpfulness of a review depends not only on its review content, but also on social effects such as product quality and individual bias in the presence of a mixed opinion distribution (Danescu-Niculescu-Mizil et al., 2009). [sent-23, score-0.357]
18 Table 1: Generic features motivated by related work on product reviews (Kim et al., 2006). [sent-25, score-0.413]
20 Nonetheless, several properties distinguish our corpus of peer reviews from other types of reviews: 1) the helpfulness of our peer reviews is directly rated on a discrete scale from one to five, instead of being defined as a function of binary votes (e.g., Kim et al., 2006); 2) peer reviews frequently refer to the related students’ papers, so review analysis needs to take paper topics into account; 3) within the context of education, peer-review helpfulness often has a writing-specific semantics, e.g., improving the likelihood of revision; 4) in general, peer-review corpora collected from classrooms are much smaller than online product-review corpora. [sent-29, score-2.281]
23 To tailor existing techniques to peer reviews, we thus propose new specialized features to address these issues. [sent-35, score-0.735]
24 3 Data and Features In this study, we use a previously annotated peer-review corpus (Nelson and Schunn, 2009; Patchan et al., 2009), collected using a freely available web-based peer-review system (Cho and Schunn, 2007) in an introductory college history class. [sent-36, score-0.066]
26 The corpus consists of 16 papers (about six pages each) and 267 reviews (varying from twenty words to about two hundred words). [sent-38, score-0.3]
27 Two experts (a writing instructor and a content instructor) (Patchan et al., 2009) were asked to rate the helpfulness of each peer review on a scale from one to five (inter-rater agreement measured by Pearson correlation). [sent-39, score-0.148]
29 For our study, we consider the average ratings given by the two experts (which roughly follow a normal distribution) as the gold standard of review helpfulness. [sent-43, score-0.248]
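To make this gold-standard construction concrete, here is a minimal sketch (assuming toy rating arrays; none of the values below come from the corpus) that checks inter-rater agreement with a Pearson correlation and averages the two experts’ ratings:

import numpy as np
from scipy.stats import pearsonr

# Illustrative 1-5 helpfulness ratings from the two experts (toy values).
writing_expert = np.array([5, 3, 4, 2, 1], dtype=float)
content_expert = np.array([4, 3, 5, 2, 2], dtype=float)

# Inter-rater agreement, reported as a Pearson correlation in the paper.
r, _ = pearsonr(writing_expert, content_expert)

# The gold standard: the average of the two experts' ratings per review.
gold_standard = (writing_expert + content_expert) / 2.0
print(f"inter-rater r = {r:.2f}; gold = {gold_standard}")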
30 Two example rated peer reviews (shown verbatim) follow: A helpful peer review of average-rating 5: The support and explanation of the ideas could use some work. [sent-44, score-1.521]
31 Page 2 says that the 13th amendment ended the war. [sent-47, score-0.029]
32 was there no more fighting or problems once this amendment was added? [sent-49, score-0.029]
33 An unhelpful peer review of average-rating 1: Your paper and its main points are easy to find and to follow. [sent-54, score-0.667]
34 As shown in Table 1, we first mine generic linguistic features from reviews and papers based on the results of syntactic analysis of the texts, aiming to replicate the feature sets used by Kim et al. (2006). [sent-55, score-0.429]
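As a hedged illustration of the generic feature classes in Table 1, the sketch below computes simple structural (STR) and unigram (UGR) style features; the particular counts and helper names are our assumptions, not the paper’s exact definitions:

import re
from collections import Counter

def structural_features(review: str) -> dict:
    # STR-style surface statistics over the review text.
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    tokens = review.split()
    return {
        "num_tokens": len(tokens),
        "num_sentences": len(sentences),
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),
        "num_questions": review.count("?"),
    }

def unigram_features(review: str) -> Counter:
    # UGR-style bag of lowercased word unigrams.
    return Counter(re.findall(r"[a-z']+", review.lower()))

example = "Page 2 says that the 13th amendment ended the war. was there no more fighting?"
print(structural_features(example))
print(unigram_features(example).most_common(3))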
35 Note, however, that peer-review helpfulness is rated for the whole review, which can include multiple idea units. [sent-62, score-0.662]
36 (kappa = .92), the percentage of problems that have problem localization (the presence of information indicating where the problem is located in the related paper; kappa = .69), and the percentage of problems that have a solution (the presence of a solution addressing the problem mentioned in the review). [sent-64, score-0.202]
38 These kappa values (Nelson and Schunn, 2009) were calculated from a subset of the corpus to evaluate the reliability of the human annotations. [sent-67, score-0.032]
39 Consider the example of the helpful review presented in Section 3, which was manually separated into two idea units (each presented in a separate paragraph). [sent-68, score-0.163]
40 As both ideas are coded as problems, each with problem localization and a solution present, the cognitive-science features of this review are praise%=0, problem%=1, summary%=0, localization%=1, and solution%=1. [sent-69, score-0.423]
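A minimal sketch of how these percentages follow from coded idea units; the tuple representation of a unit is our assumption, not the corpus coding format:

def cogs_features(units):
    # Each unit: (feedback_type, has_localization, has_solution).
    n = len(units)
    problems = [u for u in units if u[0] == "problem"]
    n_prob = max(len(problems), 1)
    return {
        "praise%": sum(u[0] == "praise" for u in units) / n,
        "problem%": len(problems) / n,
        "summary%": sum(u[0] == "summary" for u in units) / n,
        "localization%": sum(u[1] for u in problems) / n_prob,
        "solution%": sum(u[2] for u in problems) / n_prob,
    }

# The helpful review above: two idea units, both problems, both localized,
# both with solutions; this reproduces the feature values stated in the text.
units = [("problem", True, True), ("problem", True, True)]
print(cogs_features(units))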
41 Lexical category features (LEX2): Ten categories of keyword lexicons developed for automatically detecting the previously manually annotated feedback types (Xiong et al., 2010). [sent-70, score-0.153]
42 The categories are learned in a semi-supervised way based on syntactic and semantic functions, such as suggestion. Topic words are extracted from students’ papers using topic signature (Lin and Hovy, 2000) software kindly provided by Annie Louis. [sent-72, score-0.072]
43 These annotators are not the same experts who rated the peer-review helpfulness. [sent-79, score-0.082]
44 We first manually created a list of words that were specified as signal words for annotating feedbackType and problem localization in the coding manual; then we supplemented the list with words selected by a decision tree model learned using a Bag-of-Words representation of the peer reviews. [sent-85, score-0.699]
45 These categories will also be helpful for reducing the feature space size as discussed below. [sent-86, score-0.02]
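A hedged sketch of the lexicon-expansion step just described, using scikit-learn’s vectorizer and decision tree as stand-ins (the paper does not name its tooling; the seed words, reviews, and labels are placeholders):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

seed_words = {"where", "page", "unclear"}  # hand-picked signal words (toy)
reviews = ["fix the claim on page 2", "great paper overall",
           "the argument is unclear here", "nice work, well organized"]
labels = [1, 0, 1, 0]  # e.g., contains a problem vs. not (toy labels)

# Bag-of-Words representation of the peer reviews.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# Words the decision tree actually splits on supplement the seed list.
tree = DecisionTreeClassifier(random_state=0).fit(X, labels)
vocab = vectorizer.get_feature_names_out()
tree_words = {vocab[i] for i, w in enumerate(tree.feature_importances_) if w > 0}
print(sorted(seed_words | tree_words))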
46 Localization features (LOC): Five features developed in our prior work (Xiong and Litman, 2010) for automatically identifying the manually coded problem localization tags, such as the percentage of problems in reviews that match a localization pattern (e.g., “on page 5”, “the section about”), the percentage of sentences in which topic words appear between the subject and the object, etc. [sent-87, score-0.813]
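A minimal sketch of the pattern-matching idea behind the LOC features; the two cue phrases are quoted from the text above, while the regular expressions and the percentage computation are our formulation:

import re

LOC_PATTERNS = [
    re.compile(r"\bon page \d+\b", re.IGNORECASE),
    re.compile(r"\bthe section about\b", re.IGNORECASE),
]

def localization_pct(problem_sentences):
    # Fraction of problem sentences matching any localization pattern.
    if not problem_sentences:
        return 0.0
    hits = sum(any(p.search(s) for p in LOC_PATTERNS)
               for s in problem_sentences)
    return hits / len(problem_sentences)

print(localization_pct(["On page 5 the claim is wrong.",
                        "This part is vague."]))  # -> 0.5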
48 Following Kim et al. (2006), we train our helpfulness model using SVM regression with a radial basis function kernel, as provided by SVMlight (Joachims, 1999). [sent-91, score-0.651]
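A sketch of the learner, substituting scikit-learn’s SVR for the SVMlight implementation the paper actually used; the feature matrix, labels, and hyperparameters are illustrative assumptions:

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.rand(267, 20)            # 267 reviews x 20 features (toy matrix)
y = rng.uniform(1, 5, size=267)  # gold helpfulness ratings (toy values)

# SVM regression with a radial basis function kernel.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)
predictions = model.predict(X)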
49 We first evaluate each feature type in isolation to investigate its predictive power for peer-review helpfulness; we then examine the feature types together in various combinations to find the most useful feature set for modeling peer-review helpfulness. [sent-92, score-0.097]
50 Performance is evaluated by 10-fold cross-validation over our 267 peer reviews, both by predicting the absolute helpfulness scores (measured with the Pearson correlation coefficient r) and by predicting the helpfulness ranking (measured with the Spearman rank correlation coefficient rs). [sent-93, score-2.288]
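And a sketch of this evaluation protocol: 10-fold cross-validation with pooled out-of-fold predictions scored by Pearson r (ratings) and Spearman rs (ranking); the shuffling and pooling details are our assumptions:

import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def evaluate(X, y, n_splits=10, seed=0):
    # Pool out-of-fold predictions, then score ratings and rankings.
    predictions = np.zeros(len(y))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        model = SVR(kernel="rbf").fit(X[train_idx], y[train_idx])
        predictions[test_idx] = model.predict(X[test_idx])
    r, _ = pearsonr(y, predictions)    # helpfulness rating
    rs, _ = spearmanr(y, predictions)  # helpfulness ranking
    return r, rs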
51 Although the predicted helpfulness ranking could be used directly to compare the helpfulness of a given set of reviews, predicting the helpfulness rating is desirable in practice: it allows comparing existing reviews with newly written ones without reranking all previously ranked reviews. [sent-94, score-2.922]
52 Results are presented for the generic features and the specialized features, respectively, with 95% confidence bounds. [sent-95, score-0.324]
53 4.1 Performance of Generic Features Evaluation of the generic features is presented in Table 2, showing that all classes except the syntactic (SYN) and meta-data (MET) features are significantly correlated with both helpfulness rating (r) and helpfulness ranking (rs). [sent-97, score-1.523]
54 (.59) (although, within the significant correlations, the differences among the coefficients are insignificant). [sent-100, score-0.03]
55 Note that in isolation, MET (paper ratings) are not significantly correlated with peer-review helpfulness, which differs from prior findings for product reviews (Kim et al., 2006), where product scores are significantly correlated with product-review helpfulness. [sent-101, score-0.413]
57 When comparing the performance of predicting helpfulness ratings versus rankings, we observe r ≈ rs consistently for our peer reviews, unlike Kim et al. (2006), where ranking was favored. We also find, as Kim et al. (2006) did, that simply combining all features does not improve the model’s performance. [sent-104, score-1.336]
59 In sum, our results verify our hypothesis that the effectiveness of generic features can be transferred to our peer-review domain for predicting review helpfulness. [sent-109, score-0.406]
60 Table 2: Performance evaluation of the generic features for predicting peer-review helpfulness. [sent-118, score-0.197]
61 4.2 Analysis of the Specialized Features Evaluation of the specialized features is shown in Table 3, where all features examined are significantly correlated with both helpfulness rating and ranking. [sent-122, score-0.251]
63 When evaluated in isolation, although the specialized features have weaker correlation coefficients (up to 0.51) than the best generic features, these differences are not significant, and the specialized features have the potential advantage of being theory-based. [sent-127, score-0.26]
65 The use of features related to meaningful dimensions of writing has contributed to validity and greater acceptability in the related area of automated essay scoring (Attali and Burstein, 2006). [sent-130, score-0.152]
66 When combined with some generic features, the specialized features improve the model’s performance in terms of both r and rs, compared to the best performance in Section 4.1. [sent-131, score-0.327]
67 Though the improvement is not yet significant, we think it is still interesting to investigate the potential trend, to understand how the specialized features capture additional information about peer-review helpfulness. [sent-133, score-0.195]
68 Semantic features did not help when working with the generic lexical features in Section 4.1 (second-to-last row in Table 2), but they can be successfully combined with the lexical category features to further improve the performance, as indicated here. [sent-137, score-0.185]
70 3) When the cognitive-science and localization features are introduced, the prediction becomes even more accurate, reaching the best Pearson correlation in this study. [sent-139, score-0.251]
71 5 Discussion Despite the differences between peer reviews and other types of reviews discussed in Section 2, our work demonstrates that many generic linguistic features are also effective in predicting peer-review helpfulness. [sent-142, score-1.271]
72 The model’s performance can be alternatively achieved and further improved by adding auxiliary features tailored to peer reviews. [sent-143, score-0.059]
73 Table 3: Evaluation of the model’s performance (all significant) after introducing the specialized features. [sent-151, score-0.139]
75 These specialized features not only introduce domain expertise, but also capture linguistic information at an abstracted level, which can help avoid the risk of over-fitting. [sent-153, score-0.195]
76 Given only 267 peer reviews in our case, compared to more than ten thousand product reviews (Kim et al., 2006), the risk of over-fitting is particularly relevant. [sent-154, score-1.156]
77 Because our results are not directly comparable with those of Kim et al. (2006), we indirectly compared them by analyzing the utility of features in isolation and in combination. [sent-157, score-0.105]
78 While STR+UGR+MET is found to be the best combination of generic features for both types of reviews, the best individual feature type differs (review unigrams work best for product reviews; structural features work best for peer reviews). [sent-158, score-0.83]
79 More importantly, meta-data, which were found to significantly affect the perceived helpfulness of product reviews (Kim et al., 2006), are not significantly correlated with the helpfulness of peer reviews. [sent-159, score-1.005]
80 Perhaps because the paper grades and other helpfulness ratings are not visible to the reviewers, we have less of a social dimension for predicting the helpfulness of peer reviews. [sent-162, score-1.921]
81 We also found that SVM regression does not favor ranking over predicting helpfulness ratings, as it did in (Kim et al., 2006). [sent-163, score-0.748]
82 2) Our qualitative comparison shows that the utility of generic features (e.g., meta-data features) in predicting review helpfulness varies between different review types. [sent-166, score-0.147]
84 3) We further show that prediction performance could be improved by incorporating specialized features that capture helpfulness information specific to peer reviews. [sent-169, score-1.346]
85 In the future, we would like to replace the manually coded peer-review specialized features (cogS) with their automatic predictions, since we have already shown in our prior work that some important cognitive-science constructs can be successfully identified automatically. [sent-170, score-0.33]
86 Also, it is interesting to observe that the average helpfulness ratings assigned by the experts (used as the gold standard in this study) differ from those given by students. [sent-171, score-0.732]
87 Prior work on this corpus has already shown that feedback features of review comments differ not only between students and experts, but also between the writing and the content experts (Patchan et al., 2009). [sent-172, score-0.424]
88 While Patchan et al. (2009) focused on the review comments, we hypothesize that there is also a difference in perceived peer-review helpfulness. [sent-175, score-0.164]
89 Therefore, we plan to investigate the impact of these different helpfulness ratings on the utility of the features used in modeling peer-review helpfulness. [sent-176, score-0.741]
90 Finally, we would like to integrate our helpfulness model into a web-based peer-review system to improve the quality of both peer reviews and paper revisions. [sent-177, score-1.426]
91 Schunn, Janyce Wiebe, Joanna Drummond, and Michael Lipschultz, who kindly gave us valuable feedback while writing this paper. [sent-182, score-0.139]
92 Scaffolded writing and rewriting in the discipline: A web-based reciprocal peer review system. [sent-195, score-0.753]
93 Exploring document clustering techniques for personalized peer assessment in exploratory courses. [sent-209, score-0.524]
94 Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. [sent-214, score-0.627]
95 The automated acquisition of topic signatures for text summarization. [sent-231, score-0.033]
96 The nature of feedback: how different types of peer feedback affect writing performance. [sent-247, score-0.634]
97 A validation study of students’ end comments: Comparing comments by students, a writing instructor, and a content instructor. [sent-253, score-0.1]
98 Detecting key sentences for automatic assistance in peer reviewing research articles in educational sciences. [sent-258, score-0.025]
99 RevRank: A fully unsupervised algorithm for selecting the most helpful book reviews. [sent-262, score-0.036]
100 Assessing reviewers’ performance based on mining problem localization in peer-review data. [sent-273, score-0.178]
wordName wordTfidf (topN-words)
[('helpfulness', 0.627), ('peer', 0.524), ('reviews', 0.275), ('localization', 0.16), ('schunn', 0.149), ('review', 0.143), ('specialized', 0.139), ('patchan', 0.099), ('nelson', 0.094), ('kim', 0.089), ('product', 0.082), ('generic', 0.073), ('xiong', 0.073), ('predicting', 0.068), ('peerreview', 0.066), ('writing', 0.063), ('rs', 0.059), ('ratings', 0.058), ('features', 0.056), ('litman', 0.053), ('christian', 0.052), ('feedbacktype', 0.05), ('students', 0.048), ('feedback', 0.047), ('experts', 0.047), ('wenting', 0.044), ('melissa', 0.044), ('coded', 0.043), ('instructor', 0.038), ('met', 0.036), ('rated', 0.035), ('correlation', 0.035), ('cogs', 0.033), ('featurespearson', 0.033), ('ghose', 0.033), ('kwangsu', 0.033), ('revrank', 0.033), ('rspearman', 0.033), ('ugr', 0.033), ('cho', 0.033), ('tailored', 0.033), ('pearson', 0.032), ('kappa', 0.032), ('diane', 0.032), ('pittsburgh', 0.031), ('isolation', 0.031), ('row', 0.03), ('coefficients', 0.03), ('spearman', 0.03), ('correlated', 0.029), ('attali', 0.029), ('amendment', 0.029), ('alamitos', 0.029), ('kindly', 0.029), ('sandor', 0.029), ('ranking', 0.029), ('prior', 0.027), ('rating', 0.026), ('successfully', 0.025), ('papers', 0.025), ('praise', 0.025), ('tsur', 0.025), ('educational', 0.025), ('constructs', 0.025), ('regression', 0.024), ('expertise', 0.024), ('webbased', 0.023), ('tutoring', 0.022), ('presence', 0.021), ('revision', 0.021), ('perceived', 0.021), ('percentage', 0.021), ('votes', 0.021), ('comments', 0.02), ('unigrams', 0.02), ('volume', 0.02), ('helpful', 0.02), ('detecting', 0.02), ('svm', 0.019), ('history', 0.019), ('paragraph', 0.019), ('education', 0.019), ('structural', 0.019), ('solution', 0.019), ('mining', 0.018), ('topic', 0.018), ('utility', 0.018), ('essay', 0.018), ('tenth', 0.017), ('social', 0.017), ('pa', 0.017), ('assessing', 0.017), ('study', 0.017), ('existing', 0.016), ('stroudsburg', 0.016), ('book', 0.016), ('automated', 0.015), ('predict', 0.015), ('manually', 0.015), ('category', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 55 acl-2011-Automatically Predicting Peer-Review Helpfulness
Author: Wenting Xiong ; Diane Litman
Abstract: Identifying peer-review helpfulness is an important task for improving the quality of feedback that students receive from their peers. As a first step towards enhancing existing peer-review systems with new functionality based on helpfulness detection, we examine whether standard product review analysis techniques also apply to our new context of peer reviews. In addition, we investigate the utility of incorporating additional specialized features tailored to peer review. Our preliminary results show that the structural features, review unigrams and meta-data combined are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction.
2 0.1375186 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua
Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application ofdocumentlevel sentiment classification, and improve the performance significantly.
3 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification
Author: Danushka Bollegala ; David Weir ; John Carroll
Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.
4 0.087319106 204 acl-2011-Learning Word Vectors for Sentiment Analysis
Author: Andrew L. Maas ; Raymond E. Daly ; Peter T. Pham ; Dan Huang ; Andrew Y. Ng ; Christopher Potts
Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it outperforms several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.
5 0.085312746 82 acl-2011-Content Models with Attitude
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. This approach directly enables discovery of highly rated or inconsistent properties of a product. Our model admits an efficient variational mean-field inference algorithm which can be parallelized and run on large snippet collections. We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. We demonstrate that it outperforms applicable baselines by a considerable margin.
6 0.062580891 20 acl-2011-A New Dataset and Method for Automatically Grading ESOL Texts
7 0.055947486 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination
8 0.048913222 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?
9 0.04457346 194 acl-2011-Language Use: What can it tell us?
10 0.042179316 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
11 0.03979554 52 acl-2011-Automatic Labelling of Topic Models
12 0.038851991 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
13 0.03818664 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
14 0.038087282 28 acl-2011-A Statistical Tree Annotator and Its Applications
15 0.036870688 159 acl-2011-Identifying Noun Product Features that Imply Opinions
16 0.036722504 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
17 0.035185557 248 acl-2011-Predicting Clicks in a Vocabulary Learning System
18 0.034553938 292 acl-2011-Target-dependent Twitter Sentiment Classification
19 0.034260493 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment
20 0.033884112 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
topicId topicWeight
[(0, 0.096), (1, 0.078), (2, 0.034), (3, 0.006), (4, -0.015), (5, 0.005), (6, -0.009), (7, 0.02), (8, 0.009), (9, -0.01), (10, -0.0), (11, -0.037), (12, -0.019), (13, 0.015), (14, 0.02), (15, 0.04), (16, -0.045), (17, -0.03), (18, -0.001), (19, -0.027), (20, 0.014), (21, -0.011), (22, -0.015), (23, 0.027), (24, -0.002), (25, -0.009), (26, 0.059), (27, -0.026), (28, 0.001), (29, 0.038), (30, 0.001), (31, 0.026), (32, -0.081), (33, 0.042), (34, 0.005), (35, -0.016), (36, -0.065), (37, -0.009), (38, -0.041), (39, 0.08), (40, 0.045), (41, -0.033), (42, 0.008), (43, 0.009), (44, 0.097), (45, 0.043), (46, 0.052), (47, 0.112), (48, 0.065), (49, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.90599388 55 acl-2011-Automatically Predicting Peer-Review Helpfulness
Author: Wenting Xiong ; Diane Litman
Abstract: Identifying peer-review helpfulness is an important task for improving the quality of feedback that students receive from their peers. As a first step towards enhancing existing peer-review systems with new functionality based on helpfulness detection, we examine whether standard product review analysis techniques also apply to our new context of peer reviews. In addition, we investigate the utility of incorporating additional specialized features tailored to peer review. Our preliminary results show that the structural features, review unigrams and meta-data combined are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction.
2 0.68811578 82 acl-2011-Content Models with Attitude
Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay
Abstract: We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. This approach directly enables discovery of highly rated or inconsistent properties of a product. Our model admits an efficient variational mean-field inference algorithm which can be parallelized and run on large snippet collections. We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. We demonstrate that it outperforms applicable baselines by a considerable margin.
3 0.65730566 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua
Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application ofdocumentlevel sentiment classification, and improve the performance significantly.
4 0.61914104 20 acl-2011-A New Dataset and Method for Automatically Grading ESOL Texts
Author: Helen Yannakoudakis ; Ted Briscoe ; Ben Medlock
Abstract: We demonstrate how supervised discriminative machine learning techniques can be used to automate the assessment of ‘English as a Second or Other Language’ (ESOL) examination scripts. In particular, we use rank preference learning to explicitly model the grade relationships between scripts. A number of different features are extracted and ablation tests are used to investigate their contribution to overall performance. A comparison between regression and rank preference models further supports our method. Experimental results on the first publically available dataset show that our system can achieve levels of performance close to the upper bound for the task, as defined by the agreement between human examiners on the same corpus. Finally, using a set of ‘outlier’ texts, we test the validity of our model and identify cases where the model’s scores diverge from that of a human examiner.
5 0.55559301 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination
Author: Myle Ott ; Yejin Choi ; Claire Cardie ; Jeffrey T. Hancock
Abstract: Consumers increasingly rate, review and research products online (Jansen, 2010; Litvin et al., 2008). Consequently, websites containing consumer reviews are becoming targets of opinion spam. While recent work has focused primarily on manually identifiable instances of opinion spam, in this work we study deceptive opinion spam—fictitious opinions that have been deliberately written to sound authentic. Integrating work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam, and ultimately develop a classifier that is nearly 90% accurate on our gold-standard opinion spam dataset. Based on feature analysis of our learned models, we additionally make several theoretical contributions, including revealing a relationship between deceptive opinions and imaginative writing.
7 0.54137325 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge
8 0.53927541 248 acl-2011-Predicting Clicks in a Vocabulary Learning System
9 0.53043389 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
10 0.49515942 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification
11 0.48030043 125 acl-2011-Exploiting Readymades in Linguistic Creativity: A System Demonstration of the Jigsaw Bard
12 0.4802047 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style
13 0.47132903 204 acl-2011-Learning Word Vectors for Sentiment Analysis
14 0.46202788 133 acl-2011-Extracting Social Power Relationships from Natural Language
15 0.44694471 99 acl-2011-Discrete vs. Continuous Rating Scales for Language Evaluation in NLP
16 0.43353808 159 acl-2011-Identifying Noun Product Features that Imply Opinions
17 0.42712528 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
18 0.42527762 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
19 0.42237854 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
20 0.41947138 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution
topicId topicWeight
[(5, 0.04), (17, 0.033), (26, 0.015), (37, 0.09), (39, 0.045), (41, 0.064), (51, 0.267), (55, 0.023), (59, 0.034), (72, 0.044), (91, 0.044), (96, 0.156), (97, 0.013), (98, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.76964021 55 acl-2011-Automatically Predicting Peer-Review Helpfulness
Author: Wenting Xiong ; Diane Litman
Abstract: Identifying peer-review helpfulness is an important task for improving the quality of feedback that students receive from their peers. As a first step towards enhancing existing peer-review systems with new functionality based on helpfulness detection, we examine whether standard product review analysis techniques also apply to our new context of peer reviews. In addition, we investigate the utility of incorporating additional specialized features tailored to peer review. Our preliminary results show that the structural features, review unigrams and meta-data combined are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction.
2 0.76260376 276 acl-2011-Semi-Supervised SimHash for Efficient Document Similarity Search
Author: Qixia Jiang ; Maosong Sun
Abstract: Searching documents that are similar to a query document is an important component in modern information retrieval. Some existing hashing methods can be used for efficient document similarity search. However, unsupervised hashing methods cannot incorporate prior knowledge for better hashing. Although some supervised hashing methods can derive effective hash functions from prior knowledge, they are either computationally expensive or poorly discriminative. This paper proposes a novel (semi-)supervised hashing method named Semi-Supervised SimHash (S3H) for high-dimensional data similarity search. The basic idea of S3H is to learn the optimal feature weights from prior knowledge to relocate the data such that similar data have similar hash codes. We evaluate our method with several state-of-the-art methods on two large datasets. All the results show that our method gets the best performance.
3 0.71510005 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum
Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
4 0.70456362 141 acl-2011-Gappy Phrasal Alignment By Agreement
Author: Mohit Bansal ; Chris Quirk ; Robert Moore
Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
5 0.61378789 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
6 0.6132406 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
7 0.61300248 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
8 0.61222398 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
9 0.61089289 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
10 0.61073798 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
11 0.61030471 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
12 0.60973954 133 acl-2011-Extracting Social Power Relationships from Natural Language
14 0.60932201 28 acl-2011-A Statistical Tree Annotator and Its Applications
15 0.60930467 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
16 0.60889876 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
17 0.60869449 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
18 0.60859096 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks
19 0.60851383 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
20 0.60830021 257 acl-2011-Question Detection in Spoken Conversations Using Textual Conversations