emnlp emnlp2013 emnlp2013-196 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anders Søgaard ; Hector Martinez ; Jakob Elming ; Anders Johannsen
Abstract: Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not. . . good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
Reference: text
sentIndex sentText sentNum sentScore
1 Using crowdsourcing to get representations based on regular expressions Anders Søgaard and Hector Martinez and Jakob Elming and Anders Johannsen Center for Language Technology University of Copenhagen DK-2300 Copenhagen S {soegaard|alonso|zmk867|ajohannsen}@hum. [sent-1, score-0.579]
2 dk Abstract Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. [sent-3, score-0.137]
3 Most research uses n-gram representations, but relevant features often occur discontinuously, e.g. [sent-4, score-0.059]
4 In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. [sent-10, score-1.018]
5 Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations. [sent-11, score-0.324]
6 1 Introduction Finding good representations of classification problems is often glossed over in the literature. [sent-12, score-0.108]
7 Several authors have emphasized the need to pay more attention to finding such representations (Wagstaff, 2012; Domingos, 2012), but in document classification most research still uses n-gram representations. [sent-13, score-0.108]
8 This paper considers two document classification problems where such representations seem inadequate. [sent-14, score-0.108]
9 We argue that in order to adequately represent such problems we need discontinuous features, i.e. [sent-19, score-0.044]
10 The problem with using regular expressions as features is of course that even with a finite vocabulary we can generate infinitely many regular expressions that match our documents. [sent-22, score-0.846]
11 We suggest using expert knowledge or crowdsourcing in the loop. [sent-23, score-0.148]
12 In particular we present experiments where standard representations are augmented with features from a few hours of manual work, by machine learning experts or by turkers. [sent-24, score-0.382]
13 Somewhat surprisingly, we find that features derived from crowdsourced annotation tasks lead to the best results across the three datasets. [sent-25, score-0.235]
14 While crowdsourcing of annotation tasks has become increasingly popular in NLP, this is, to the best of our knowledge, the first attempt to crowdsource the problem of finding good representations. [sent-26, score-0.201]
15 (2012) design a collaborative two-player game for sentiment annotation and collecting a sentiment lexicon. [sent-29, score-0.224]
16 One player guesses the sentiment of a text and picks a word from it that is representative of its sentiment. [sent-30, score-0.181]
17 The other player also provides a guess observing only this word. [sent-31, score-0.039]
18 If the two guesses agree, both players get a point. [sent-32, score-0.052]
19 The idea of gamifying the problem of finding good representations goes beyond crowdsourcing, but is not considered here. [sent-33, score-0.08]
20 (2012) crowdsource the feature weighting problem, but using standard representations. [sent-35, score-0.103]
21 (2011), who learn a ’crowd kernel’ by asking annotators to rate examples by similarity, providing an embedding that promotes feature combinations deemed relevant when measuring similarity. [sent-37, score-0.142]
22 2 Experiments Data The three datasets used in our experiments come from two sources, namely stackoverflow. [sent-44, score-0.034]
23 The two beer review datasets (TASTE and APPEARANCE) are described in McAuley et al. [sent-47, score-0.121]
24 Each input example is an unstructured review text, and the associated label is the score assigned to taste or appearance by the reviewer. [sent-49, score-0.315]
25 We randomly sample about 152k data points, as well as 500 examples for experiments with experts and turks. [sent-50, score-0.175]
26 We select pairs of answers, where one is ranked higher than the other by stackoverflow. [sent-52, score-0.044]
27 Obviously the answers submitted first have a better chance of being ranked highly, so we also require that the highest ranked answer was submitted last. [sent-54, score-0.285]
28 From this set of answer pairs, we randomly sample 97,519 pairs, as well as 500 examples for our experiments with experts and turks. [sent-55, score-0.232]
29 Our experiments are classification experiments using the same learning algorithm throughout, namely L1-regularized logistic regression. [sent-56, score-0.028]
30 We don’t set any parameters. The only differences between our systems are in the feature sets. [sent-57, score-0.037]
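As an aside, here is a minimal sketch of this kind of setup, assuming a scikit-learn-style toolkit and invented toy data; it illustrates the general technique, not the authors' actual code:

    # Illustrative sketch only: L1-regularized logistic regression over
    # sparse indicator features; the learner is held fixed across systems.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def featurize(text):
        # Placeholder featurizer; each feature set (BoW, HI, Exp, AMT)
        # would define its own indicator features over the document.
        return {"w=" + w: 1 for w in text.lower().split()}

    train_texts = ["could have been more flavorful", "wonderfully rich taste"]
    train_labels = [0, 1]  # e.g., low vs. high taste score

    vec = DictVectorizer()
    X = vec.fit_transform([featurize(t) for t in train_texts])
    clf = LogisticRegression(penalty="l1", solver="liblinear")  # L1 penalty
    clf.fit(X, train_labels)
    print(clf.predict(vec.transform([featurize("rich and flavorful")])))

The important point is that only the feature extraction changes between the systems compared below.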
31 The four feature sets are described below: BoW, HI, Exp and AMT. [sent-59, score-0.037]
32 To motivate the use of regular expressions, consider the following sentence from a review of John Harvard’s Grand Cru: (1) Could have been more flavorful. [sent-60, score-0.26]
33 The only word carrying direct sentiment in this sentence is flavorful, which is positive, but the sentence is a negative evaluation of the Grand Cru’s taste. [sent-61, score-0.09]
34 The trigram been more flavorful seems negative at first, but in the context of negation or in a comparative, it can become positive again. [sent-67, score-0.142]
35 However, note that this trigram may occur discontinuously, e.g. [sent-68, score-0.072]
36 In order to match such occurrences, we need simple regular expressions, e.g., patterns ending in . * flavorful. [sent-71, score-0.23]
37 This is exactly the kind of regular expression we asked experts to submit, and that we derived from the crowdsourced annotation tasks. [sent-75, score-0.887]
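As a small, hedged illustration (ours, not reproduced from the paper) of how such a pattern catches discontinuous occurrences with Python's re module, which is the syntax the experts worked in:

    # Illustrative sketch: a skip pattern used as a binary feature via Python re.
    import re

    # ".*" between the words allows arbitrary intervening material, so the
    # pattern also fires on discontinuous occurrences of the trigram.
    pattern = re.compile(r"been\s.*\bmore\s.*\bflavorful", re.IGNORECASE)

    def feature_fires(text):
        return 1 if pattern.search(text) else 0

    print(feature_fires("Could have been more flavorful."))           # 1
    print(feature_fires("Could have been a little more flavorful."))  # 1
    print(feature_fires("Flavorful and crisp."))                      # 0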
38 Note that the sentence says nothing about the beer’s appearance, so this feature is only relevant in TASTE, not in APPEARANCE. [sent-76, score-0.037]
39 BoW and BoW+HI Our simplest baseline approach is a bag-of-words model of unigram features (BoW). [sent-77, score-0.072]
40 We also introduce a semantically enriched unigram model, BoW+HI, where in addition to representing what words occur in a text, we also represent what Harvard Inquirer (HI) word classes occur in it. [sent-79, score-0.135]
41 The HI classes are used to generate features from the crowdsourced annotation tasks, so the semantically enriched unigram model is an important baseline in our experiments below. [sent-80, score-0.312]
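A rough sketch of what this enrichment might look like; note that HI_CLASSES below is a tiny invented stand-in, not the real Harvard Inquirer lexicon:

    # Illustrative sketch: unigram features plus HI word-class features.
    HI_CLASSES = {
        "flavorful": ["Positiv", "Taste"],   # made-up toy entries
        "watery":    ["Negativ", "Taste"],
        "not":       ["Negate"],
    }

    def bow_hi_features(text):
        feats = {}
        for word in text.lower().split():
            feats["w=" + word] = 1                  # plain unigram feature
            for cls in HI_CLASSES.get(word, []):
                feats["hi=" + cls] = 1              # HI word-class feature
        return feats

    print(bow_hi_features("not flavorful just watery"))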
42 BoW+Exp In order to collect regular expressions from experts, we set up a web interface for querying held-out portions of the datasets with regular expressions that reports how occurrences of the submitted regular expressions correlate with class. [sent-81, score-1.367]
43 We used the Python re syntax for regular expressions after augmenting word forms with POS and semantic classes from the HI. [sent-82, score-0.408]
44 Few of the experts made use of the POS tags, but many regular expressions included references to HI classes. [sent-83, score-0.583]
45 Regular expressions submitted by participants were visible to other participants during the experiment, and participants were allowed to work together. [sent-88, score-0.428]
46 Participants had 15 minutes to familiarize themselves with the syntax used in the experiments. [sent-89, score-0.029]
47 Seven researchers and graduate students spent five effective hours querying the datasets with regular expressions. [sent-91, score-0.521]
48 In particular, they spent three hours on the Stack Exchange dataset, and one hour on each of the two RateBeer datasets. [sent-92, score-0.26]
49 So, in total, we spent 20 person hours on Stack Exchange, and seven person hours on each of the RateBeer datasets. [sent-94, score-0.343]
50 In the five hours, we collected 1,156 regular expressions for the STACKOVERFLOW dataset, and about 650 regular expressions for each of the two RateBeer datasets. [sent-95, score-0.816]
51 In our experiments below we concatenate these with the BoW features to form BoW+Exp. [sent-97, score-0.03]
52 The annotators were presented with each text, a review or an answer, twice: once as running text, once word-by-word with bullets to tick off words. [sent-99, score-0.218]
53 The annotators were instructed to tick off words or phrases that they found predictive of the text’s sentiment or answer quality. [sent-100, score-0.306]
54 We chose this annotation task, because it is relatively easy for annotators to mark spans of text with a particular attribute. [sent-102, score-0.117]
55 The annotators were constrained to tick off at least three words, including one closed class item (closed class items were colored differently). [sent-106, score-0.217]
56 Finally, we only used annotators with a track record of providing high-quality annotations in previous tasks. [sent-107, score-0.073]
57 It was clear from the average time spent by annotators that annotating STACKOVERFLOW was harder than annotating the Ratebeer datasets. [sent-108, score-0.27]
58 The average time spent on a Ratebeer HIT was 44s, while for STACKOVERFLOW it was 3m:8s. [sent-109, score-0.121]
59 Table 2: Results using all features. The average number of words ticked off was between 5.6 and 7, with more words ticked off in STACKOVERFLOW. [sent-124, score-0.03] [sent-125, score-0.132]
61 The maximum number of words ticked off by an annotator was 41. [sent-126, score-0.132]
62 This was supposed to match, roughly, the cost of the experts consulted for BoW+Exp. [sent-129, score-0.204]
63 The features generated from the annotations were constructed as follows: We use a sliding window of size 3 to extract trigrams over the possibly discontinuous words ticked off by the annotators. [sent-130, score-0.285]
64 These trigrams were converted into regular expressions by placing Kleene stars between the words. [sent-131, score-0.441]
65 This gives us a manually selected subset of skip trigrams. [sent-132, score-0.035]
66 For each skip trigram, we add copies with one or more words replaced by one of their HI classes. [sent-133, score-0.035]
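One possible reading of this construction, sketched below with invented helper names (the HI-class substitution step is only hinted at in a comment):

    # Illustrative sketch: turn an annotator's ticked words into skip-trigram
    # regular expressions by joining each window of three with Kleene stars.
    import re

    def skip_trigram_patterns(ticked_words):
        patterns = []
        for i in range(len(ticked_words) - 2):
            window = ticked_words[i:i + 3]          # sliding window of size 3
            patterns.append(".*".join(re.escape(w) for w in window))
            # The paper additionally adds copies of each pattern with one or
            # more words replaced by their HI classes; omitted here.
        return patterns

    ticked = ["could", "been", "more", "flavorful"]  # hypothetical annotation
    print(skip_trigram_patterns(ticked))
    # ['could.*been.*more', 'been.*more.*flavorful']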
67 Feature combinations This subsection introduces some harder baselines for our experiments, considered in Experiment #2. [sent-134, score-0.032]
68 The simplest possible way of combining unigram features is by considering n-gram models. [sent-135, score-0.072]
69 An n-gram model extracts features from a sliding window (of size n) over the text. [sent-136, score-0.076]
70 Our BoW(N = 1) model takes word forms as features, and there are obviously more advanced ways of automatically combining such features. [sent-138, score-0.031]
71 Kernel representations We experimented with applying an approximate feature map for the additive χ2-kernel. [sent-139, score-0.18]
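For concreteness, one way to realize such a map is scikit-learn's additive chi-squared sampler; this is a sketch of the general technique, not necessarily the approximation the authors used:

    # Illustrative sketch: approximate feature map for the additive chi2 kernel.
    from sklearn.kernel_approximation import AdditiveChi2Sampler

    def chi2_mapped(X_counts, sample_steps=2):
        # X_counts must be a non-negative (e.g., count-based BoW) matrix.
        sampler = AdditiveChi2Sampler(sample_steps=sample_steps)
        return sampler.fit_transform(X_counts)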
72 Deep features We also ran denoising autoencoders (Pascal et al. [sent-142, score-0.265]
73 , 2012), with 2N nodes in the middle layer to obtain a deep representation of our datasets from χ2-BoW input. [sent-146, score-0.074]
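A compact sketch of a denoising autoencoder with a 2N-unit hidden layer, written in PyTorch purely for illustration; the framework choice, layer sizes and training loop are our assumptions, not the paper's setup:

    # Illustrative sketch: denoising autoencoder producing a deep representation.
    import torch
    import torch.nn as nn

    class DenoisingAutoencoder(nn.Module):
        def __init__(self, n_features, dropout=0.5):
            super().__init__()
            self.corrupt = nn.Dropout(dropout)       # input corruption (drop-out)
            self.encode = nn.Sequential(
                nn.Linear(n_features, 2 * n_features), nn.Sigmoid())
            self.decode = nn.Linear(2 * n_features, n_features)

        def forward(self, x):
            return self.decode(self.encode(self.corrupt(x)))

    def deep_representation(X, epochs=10, lr=1e-3):
        model = DenoisingAutoencoder(X.shape[1])
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(X), X)   # reconstruct the uncorrupted input
            loss.backward()
            opt.step()
        with torch.no_grad():
            return model.encode(X)        # 2N-dimensional hidden representation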
74 Figure 2: Results using different feature combination techniques (left to right): STACKOVERFLOW, APPEARANCE. [sent-153, score-0.037]
75 One observation is of course that the expert feature set Exp is much smaller than BoW and AMT, but note also that the expert features fire about 150 times more often on average than the BoW features. [sent-162, score-0.181]
76 Exp and AMT We present results using all features, as well as results obtained after selecting k features as ranked by a simple χ2 test. [sent-165, score-0.074]
77 The results using all collected features are presented in Table 2. [sent-166, score-0.03]
78 The error reduction on STACKOVERFLOW when adding crowdsourced features to our baseline model (BoW+AMT) is 24. [sent-167, score-0.191]
79 The BoW+AMT feature set is bigger than those of the other models. [sent-173, score-0.037]
80 We therefore report results using the top-k features as ranked by a simple χ2 test. [sent-174, score-0.074]
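A minimal sketch of such a top-k cut, assuming scikit-learn's chi2 scorer; again an illustration of the technique rather than the authors' code:

    # Illustrative sketch: keep the k features ranked highest by a chi2 test.
    from sklearn.feature_selection import SelectKBest, chi2

    def top_k_features(X_train, y_train, X_test, k=1000):
        selector = SelectKBest(chi2, k=min(k, X_train.shape[1]))
        X_train_k = selector.fit_transform(X_train, y_train)  # rank on training data only
        X_test_k = selector.transform(X_test)
        return X_train_k, X_test_k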
81 The result curves are presented in the three plots in Fig. [sent-175, score-0.066]
82 We also show that bigram features, kernel-based decomposition and deep features do not provide much stronger baselines either. [sent-181, score-0.097]
83 The result curves are presented in the three plots in Fig. [sent-182, score-0.066]
84 Since autoencoders are consistently worse than denoising autoencoders (drop-out 0. [sent-185, score-0.326]
85 4 Conclusion We presented a new method for deriving feature representations from crowdsourced annotation tasks and showed how it leads to 24%-41% error reductions on answer scoring and multi-aspect sentiment analysis problems. [sent-187, score-0.497]
86 We saw no significant improvements using features contributed by experts, kernel representations or learned deep representations. [sent-188, score-0.181]
wordName wordTfidf (topN-words)
[('bow', 0.562), ('stackoverflow', 0.231), ('regular', 0.23), ('amt', 0.21), ('taste', 0.184), ('expressions', 0.178), ('experts', 0.175), ('ratebeer', 0.165), ('crowdsourced', 0.161), ('denoising', 0.144), ('ticked', 0.132), ('hi', 0.125), ('spent', 0.121), ('appearance', 0.101), ('flavorful', 0.099), ('mcauley', 0.099), ('hours', 0.097), ('crowdsourcing', 0.091), ('autoencoders', 0.091), ('sentiment', 0.09), ('exp', 0.086), ('tick', 0.086), ('representations', 0.08), ('annotators', 0.073), ('submitted', 0.07), ('crowdsource', 0.066), ('discontinuously', 0.066), ('grand', 0.066), ('kbdeoarnw', 0.066), ('musat', 0.066), ('ranganath', 0.066), ('tamuz', 0.066), ('uoteln', 0.066), ('vedaldi', 0.066), ('participants', 0.06), ('burstein', 0.057), ('dahlmeier', 0.057), ('cru', 0.057), ('beer', 0.057), ('answer', 0.057), ('expert', 0.057), ('logarithmic', 0.052), ('finin', 0.052), ('guesses', 0.052), ('crowd', 0.049), ('copenhagen', 0.046), ('sliding', 0.046), ('ranked', 0.044), ('annotation', 0.044), ('discontinuous', 0.044), ('trigram', 0.043), ('unigram', 0.042), ('hour', 0.042), ('anders', 0.042), ('deep', 0.04), ('harvard', 0.039), ('exchange', 0.039), ('plots', 0.039), ('querying', 0.039), ('player', 0.039), ('annotating', 0.038), ('feature', 0.037), ('stack', 0.035), ('enriched', 0.035), ('skip', 0.035), ('additive', 0.034), ('datasets', 0.034), ('trigrams', 0.033), ('combinations', 0.032), ('obviously', 0.031), ('kernel', 0.031), ('hit', 0.03), ('review', 0.03), ('features', 0.03), ('socher', 0.029), ('closed', 0.029), ('occur', 0.029), ('familiarize', 0.029), ('susanne', 0.029), ('jill', 0.029), ('kukich', 0.029), ('hum', 0.029), ('brianna', 0.029), ('minmin', 0.029), ('zhixiang', 0.029), ('consulted', 0.029), ('martinez', 0.029), ('bullets', 0.029), ('plying', 0.029), ('strained', 0.029), ('zoom', 0.029), ('watery', 0.029), ('belongie', 0.029), ('classification', 0.028), ('leave', 0.028), ('scoring', 0.028), ('seven', 0.028), ('pascal', 0.027), ('stronger', 0.027), ('curves', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions
Author: Anders Søgaard ; Hector Martinez ; Jakob Elming ; Anders Johannsen
Abstract: Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not. . . good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
2 0.10995425 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo
Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model us- ing a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.
3 0.087461993 143 emnlp-2013-Open Domain Targeted Sentiment
Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme
Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguisticallyinformed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.
4 0.074731745 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer
Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.
5 0.070287898 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Author: Richard Socher ; Alex Perelygin ; Jean Wu ; Jason Chuang ; Christopher D. Manning ; Andrew Ng ; Christopher Potts
Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
6 0.066485241 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction
7 0.057412572 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
8 0.051339097 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
9 0.046767805 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
10 0.044184782 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
11 0.040959299 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation
12 0.03982519 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations
13 0.038710326 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
14 0.038357846 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
15 0.037706055 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition
16 0.03684447 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
17 0.035697963 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
18 0.034937724 24 emnlp-2013-Application of Localized Similarity for Web Documents
19 0.034780122 37 emnlp-2013-Automatically Identifying Pseudepigraphic Texts
20 0.034678537 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic
topicId topicWeight
[(0, -0.129), (1, 0.031), (2, -0.078), (3, -0.062), (4, 0.043), (5, 0.066), (6, -0.006), (7, -0.043), (8, 0.027), (9, 0.027), (10, 0.009), (11, -0.017), (12, -0.021), (13, -0.067), (14, 0.065), (15, 0.034), (16, -0.015), (17, -0.0), (18, 0.023), (19, -0.051), (20, 0.101), (21, -0.049), (22, 0.067), (23, -0.069), (24, 0.08), (25, 0.025), (26, 0.073), (27, -0.097), (28, 0.0), (29, 0.029), (30, 0.134), (31, -0.002), (32, -0.045), (33, -0.004), (34, -0.026), (35, -0.019), (36, 0.06), (37, -0.048), (38, -0.028), (39, -0.035), (40, 0.018), (41, 0.061), (42, 0.076), (43, 0.056), (44, -0.013), (45, 0.016), (46, 0.026), (47, 0.068), (48, 0.01), (49, -0.138)]
simIndex simValue paperId paperTitle
same-paper 1 0.93365145 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions
Author: Anders Søgaard ; Hector Martinez ; Jakob Elming ; Anders Johannsen
Abstract: Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not. . . good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
2 0.51849848 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections
Author: Thomas Scholz ; Stefan Conrad
Abstract: A very valuable piece of information in newspaper articles is the tonality of extracted statements. For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). To this end, we will compare several state-of-the-art approaches for Opinion Mining in newspaper articles in this paper. Furthermore, we will introduce a new technique to extract entropy-based word connections which identifies the word combinations which create a tonality. In the evaluation, we use two different corpora consisting of news articles, by which we show that the new approach achieves better results than the four state-of-the-art methods.
3 0.50197637 143 emnlp-2013-Open Domain Targeted Sentiment
Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme
Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguisticallyinformed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.
4 0.47991401 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
Author: Marco Guerini ; Lorenzo Gatti ; Marco Turchi
Abstract: Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-ofthe-art approach in computing words’ prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.
5 0.47445276 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo
Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model us- ing a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.
6 0.45683813 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
7 0.44886696 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
8 0.44136909 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation
9 0.43017015 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
10 0.3979913 178 emnlp-2013-Success with Style: Using Writing Style to Predict the Success of Novels
11 0.39184701 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition
12 0.38777152 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction
13 0.37478745 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations
14 0.36898899 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
15 0.33437839 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
16 0.32931128 123 emnlp-2013-Learning to Rank Lexical Substitutions
17 0.32620814 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English
18 0.31425121 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning
19 0.31291878 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
topicId topicWeight
[(3, 0.058), (18, 0.04), (22, 0.051), (30, 0.058), (41, 0.383), (51, 0.174), (66, 0.026), (71, 0.044), (75, 0.037), (77, 0.013), (90, 0.019), (96, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.73412353 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions
Author: Anders Søgaard ; Hector Martinez ; Jakob Elming ; Anders Johannsen
Abstract: Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not. . . good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
2 0.70441175 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction
Author: Filipe Mesquita ; Jordan Schmidek ; Denilson Barbosa
Abstract: A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling–SRL). A natural question then is what is the tradeoff between NLP depth (and associated computational cost) versus effectiveness. This paper presents a fair and objective experimental comparison of 8 state-of-the-art approaches over 5 different datasets, and sheds some light on the issue. The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks.
3 0.46941495 152 emnlp-2013-Predicting the Presence of Discourse Connectives
Author: Gary Patterson ; Andrew Kehler
Abstract: We present a classification model that predicts the presence or omission of a lexical connective between two clauses, based upon linguistic features of the clauses and the type of discourse relation holding between them. The model is trained on a set of high frequency relations extracted from the Penn Discourse Treebank and achieves an accuracy of 86.6%. Analysis of the results reveals that the most informative features relate to the discourse dependencies between sequences of coherence relations in the text. We also present results of an experiment that provides insight into the nature and difficulty of the task.
4 0.46601766 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
Author: Yiping Jin ; Min-Yen Kan ; Jun-Ping Ng ; Xiangnan He
Abstract: This paper presents DefMiner, a supervised sequence labeling system that identifies scientific terms and their accompanying definitions. DefMiner achieves 85% F1 on a Wikipedia benchmark corpus, significantly improving the previous state-of-the-art by 8%. We exploit DefMiner to process the ACL Anthology Reference Corpus (ARC) – a large, real-world digital library of scientific articles in computational linguistics. The resulting automatically-acquired glossary represents the terminology defined over several thousand individual research articles. We highlight several interesting observations: more definitions are introduced for conference and workshop papers over the years and that multiword terms account for slightly less than half of all terms. Obtaining a list of popular , defined terms in a corpus ofcomputational linguistics papers, we find that concepts can often be categorized into one of three categories: resources, methodologies and evaluation metrics.
5 0.46533513 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier
Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.
6 0.46524179 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
7 0.46477246 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
8 0.46211183 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
9 0.46209595 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
10 0.46202725 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
11 0.46196952 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
12 0.46152687 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
13 0.46103382 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
14 0.46099499 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
15 0.46087104 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
16 0.4608334 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
18 0.45952511 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
19 0.45909828 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
20 0.45873269 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs