emnlp emnlp2013 emnlp2013-18 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Preotiuc-Pietro ; Trevor Cohn
Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 A temporal model of text periodicities using Gaussian Processes Daniel Preoţiuc-Pietro, Trevor Cohn Department of Computer Science University of Sheffield Regent Court, 211 Portobello Street Sheffield, S1 4DP, United Kingdom {danie l t . [sent-1, score-0.293]
2 In this paper we model periodic distributions of words over time. [sent-7, score-0.501]
3 Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. [sent-8, score-0.735]
4 We use this for regression in order to forecast the volume of a hashtag based on past data. [sent-9, score-0.404]
5 We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. [sent-10, score-0.541]
6 We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. [sent-11, score-0.291]
7 This paper develops a temporal model for classifying microblog posts which explicitly incorporates multimodal periodic behaviours using Gaussian Processes (GPs). [sent-18, score-0.646]
8 Modelling temporal patterns and periodicities can be useful for tasks like text classification. [sent-23, score-0.336]
9 For example a tweet containing ‘music’ is normally attributed to a general hashtag about music like #np (now playing). [sent-24, score-0.291]
10 We demonstrate the approach by modelling frequencies of hashtag occurrences in Twitter. [sent-31, score-0.315]
11 Using the Bayesian evidence we automatically perform model selection to classify temporal patterns. [sent-36, score-0.189]
12 We also introduce a new kernel suitable to model the periodic behaviour we observe in text: periods of low frequency followed by bursts at regular time intervals. [sent-41, score-1.041]
13 To demonstrate the practical importance of our approach, we use our GP prediction as a prior in a Naïve Bayes model for text classification, showing improvements over baselines which do not account for temporal periodicities. [sent-45, score-0.19]
14 To our knowledge, this is the first paper to use GP regression for forecasting and model selection within an NLP task. [sent-50, score-0.235]
15 All the hashtag time series data and the implementation of the PS kernel in the popular open-source Gaussian Processes packages GPML1 and GPy2 are available on the author’s website3. [sent-51, score-0.603]
16 In general, methods that assume certain periodicities at daily or weekly levels were proposed e. [sent-67, score-0.368]
17 Temporal patterns for short, distinctive lexical items such as hashtags and memes were quantitatively studied (Leskovec et al. [sent-74, score-0.267]
18 , 2012) studies the dual role of hashtags, as bookmarks of content and symbols of community membership, in the context of hashtag adoption. [sent-77, score-0.234]
19 , 2011) analyses the patterns of temporal diffusion in Social Media, finding that hashtags also have a persistence factor. [sent-79, score-0.332]
20 For predicting the hashtag given the tweet text, Mazzia and Juett (2011) use the Naïve Bayes classifier with the uniform and empirical prior or TF-IDF weighting. [sent-85, score-0.327]
21 The kernel of the GP defines the covariance in response values as a function of its inputs. [sent-90, score-0.364]
22 In this respect, extrapolation is considered a more difficult task and the covariance kernel which incorporates our prior knowledge plays a major role in the prediction. [sent-94, score-0.412]
23 For choosing the right kernel and its hyperparameters using only the training data, we employ Bayesian model selection, which makes a trade-off between the fit to the training data and model complexity. [sent-96, score-0.372]
24 We now briefly give an overview of GP regression, kernel choice and model selection. [sent-97, score-0.257]
25 1 Gaussian Process Regression Consider a time series regression task where we only have one feature, the value xt at time t. [sent-100, score-0.299]
26 GP regression assumes a latent function f that is drawn from a GP prior f(t) ∼ GP(m, k(t, t0)), which is defined by a mean and a kernel. [sent-104, score-0.192]
27 The mean m is here 0, and the covariance kernel function is k(t, t0). [sent-109, score-0.372]
28 k∗ = [k(t∗, t1), . . . , k(t∗, tn)]T are the kernel evaluations between the test point and all the training points, and K = {k(ti, tj)}i,j=1..n is the Gram matrix. [sent-115, score-0.257]
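The predictive equations above can be written out in a few lines of NumPy. The sketch below only illustrates standard zero-mean GP regression using the quantities k∗ and K defined above; the noise variance term and all variable names are assumptions for illustration, not the authors' implementation.

import numpy as np

def gp_predict(k, train_t, train_x, test_t, noise_var=0.01):
    # Posterior mean and variance of a zero-mean GP at a single test time.
    # k is the covariance kernel k(t, t0); noise_var is an assumed observation noise.
    n = len(train_t)
    K = np.array([[k(ti, tj) for tj in train_t] for ti in train_t])  # Gram matrix
    K += noise_var * np.eye(n)
    k_star = np.array([k(test_t, ti) for ti in train_t])             # k* evaluations
    K_inv = np.linalg.inv(K)
    mean = k_star @ K_inv @ np.asarray(train_x, dtype=float)
    var = k(test_t, test_t) - k_star @ K_inv @ k_star
    return mean, var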
29 2 Kernels The covariance kernel together with its parameters fully define the GP (we assume 0 mean). [sent-127, score-0.332]
30 The kernel induces similarities in the response between pairs of data points. [sent-128, score-0.289]
31 If we want a periodic behaviour, points at period-length intervals should have the highest covariance. [sent-130, score-0.688]
32 We see that both the SE kernel and a periodic kernel (PS, see below) give good results. [sent-136, score-1.015]
33 However, for extrapolation, the choice of the kernel is paramount. [sent-137, score-0.257]
34 The kernel encodes our prior belief about the type of function we wish to learn. [sent-138, score-0.293]
35 To illustrate this, in Figure 3, we show the time series for #goodmorning over 2 weeks and plot the regression for the future week learned by using different kernels. [sent-139, score-0.293]
36 This includes a new kernel inspired by observed word occurrence patterns. [sent-141, score-0.257]
37 The kernels we use are: Constant (C): The constant kernel is kC(t, t0) = c. [sent-142, score-0.35]
38 Squared exponential (SE): The SE kernel or the Radial Basis Function (RBF) is the standard kernel used in most interpolation settings. [sent-145, score-0.514]
39 kSE(t, t0) = s^2 · exp(−(t − t0)^2 / (2l^2)) (3) This gives a smooth transition between neighbouring points and best describes time series with a smooth shape e. [sent-146, score-0.241]
40 Using the SE kernel corresponds to Bayesian linear regression with an infinite number of basis functions (Rasmussen and Williams, 2005). [sent-152, score-0.373]
41 Linear (Lin): The linear kernel describes a linear relationship between outputs. [sent-153, score-0.257]
42 kLin(t, t0) = s^2 + t · t0 (4) This can be obtained from linear regression by having N(0, 1) priors on the corresponding regression weights and a prior of N(0, s^2) on the bias. [sent-154, score-0.268]
43 Periodic (PER): The periodic kernel represents a SE kernel in polar coordinates. [sent-155, score-1.015]
44 s and l are characteristic lengthscales as in the SE kernel and p is the period. [sent-158, score-0.286]
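As a point of reference, these simpler kernels admit one-line implementations. The SE and Lin forms below follow Equations 3 and 4 as reconstructed above; the PER form uses the standard periodic kernel of Rasmussen and Williams (2005), since Equation 5 is not reproduced here, so treat that form as an assumption.

import numpy as np

def k_const(t, t0, c=1.0):                # Constant (C)
    return c

def k_se(t, t0, s=1.0, l=1.0):            # Squared exponential (SE), Eq. 3
    return s**2 * np.exp(-(t - t0)**2 / (2 * l**2))

def k_lin(t, t0, s=1.0):                  # Linear (Lin), Eq. 4
    return s**2 + t * t0

def k_per(t, t0, s=1.0, l=1.0, p=24.0):   # Periodic (PER), standard form (assumed)
    return s**2 * np.exp(-2 * np.sin(np.pi * (t - t0) / p)**2 / l**2)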
45 Periodic spikes (PS): For textual time series, like word frequencies, we identify the following periodic behaviour: abrupt rise in usage, usually with a peak, followed by periods of low occurrence, which can be short (e. [sent-161, score-0.683]
46 (6) The kernel is parameterised by its period p and a shape parameter s. [sent-169, score-0.407]
47 The period indicates the time interval between the peaks of the function, while the shape parameter controls the width of the spike. [sent-170, score-0.224]
48 The behaviour of the kernel is illustrated in Figure 2. [sent-171, score-0.345]
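Since the form of Equation 6 is not reproduced here, the snippet below only illustrates the described behaviour (covariance near 1 when t − t0 is close to a multiple of the period p, with spikes that narrow as the shape parameter s grows) using a hypothetical von Mises-style bump; it is not the authors' Equation 6.

import numpy as np

def k_ps_like(t, t0, p=168.0, s=11.0):
    # Hypothetical spiky periodic covariance: equals 1 when (t - t0) is a
    # multiple of the period p and decays sharply in between; larger s
    # gives narrower spikes, mimicking the PS behaviour shown in Figure 2.
    return np.exp(s * (np.cos(2 * np.pi * (t - t0) / p) - 1))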
49 In Figure 3 we see that the forecast is highly dependent on the kernel choice. [sent-173, score-0.311]
50 We expect that for periodic data the PER and PS kernels will forecast best, maybe with the PS kernel doing a better job because it captures multiple modes of the daily increase in volume. [sent-174, score-0.963]
51 This is because, although a daily pattern exists, the weekly one is stronger, with the day of the week influencing the volume of the hashtag. [sent-176, score-0.301]
52 This is called the marginal likelihood or evidence and is useful for model selection using only the training set: p(x|D, θ, Hi) = ∫_f p(x|D, f, Hi) p(f|θ, Hi) df (7) Our first goal is to fit the kernel by minimizing the negative log marginal likelihood (NLML) with respect to the kernel parameters θ. [sent-189, score-0.629]
53 Conditioned on kernel parameters, the evidence of a GP can be computed analytically. [sent-191, score-0.288]
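For Gaussian noise the evidence has the closed form given in Rasmussen and Williams (2005). A sketch of the NLML objective, assuming the noise variance has already been folded into the Gram matrix:

import numpy as np

def nlml(x, K):
    # Negative log marginal likelihood of observations x under a zero-mean GP
    # whose Gram matrix K already includes the noise variance on its diagonal.
    n = len(x)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, x))    # K^{-1} x
    return (0.5 * np.asarray(x, dtype=float) @ alpha       # data-fit term
            + np.log(np.diag(L)).sum()                     # 0.5 log|K| complexity penalty
            + 0.5 * n * np.log(2 * np.pi))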
54 This is for example the case of the periodic bursts in Figure 3. [sent-196, score-0.552]
55 Although the periodic kernel can fit the data, it will incur a high model complexity penalty. [sent-197, score-0.8]
56 The PS kernel, in this respect, is a simpler model that can still fit the data, and is thus chosen as the right model. [sent-198, score-0.299]
57 For optimising the hyperparameters of the kernel defined in Equation 6, it is important to first identify the right period. [sent-202, score-0.326]
58 Our procedure is not guaranteed to reach a global optimum, but (Figure 4: NLML for #goodmorning on the training set as a function of the 2 kernel parameters.) [sent-209, score-0.257]
59 is a relatively standard technique for fitting periodic kernels (Duvenaud et al. [sent-210, score-0.594]
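A common recipe for this step, and presumably the one intended here, is a coarse grid search over candidate periods that keeps the one with the lowest training NLML before refining the remaining hyperparameters by gradient descent. The candidate grid, the shape values and the build_gram helper below are illustrative assumptions.

def fit_period(x, t, build_gram, candidate_periods=range(12, 24 * 14)):
    # Pick the period whose best NLML on the training data is lowest.
    # build_gram(t, period, shape) is an assumed helper returning the Gram
    # matrix (noise included); nlml is the sketch given earlier.
    best_score, best_period = float('inf'), None
    for p in candidate_periods:
        score = min(nlml(x, build_gram(t, p, s)) for s in (1.0, 10.0, 100.0))
        if score < best_score:
            best_score, best_period = score, p
    return best_period   # then refine s (and p) by gradient-based optimisation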
60 It might seem limiting that we only learn a single period, although we could combine periodic kernels with different periods together. [sent-220, score-0.697]
61 But, as we have seen in the #goodmorning example (with overlapping weekly and daily patterns), if there is a combination of periods the model will select a single period which is the least common multiple. [sent-221, score-0.393]
62 5 Forecasting hashtag frequency We treat our task of forecasting the volume of a Twitter hashtag as a regression problem. [sent-236, score-0.661]
63 Some users use hashtags as regular words that are integral to the tweet text, some hashtags are general and refer to the same thing or emotion (#news, #usa, #fail), others are Twitter games or memes (#2010dissapointments, #musicmonday). [sent-240, score-0.454]
64 Other hashtags refer to events which might be short lived (#worldcup2022), long lived (#25jan) or periodic (#raw, #americanidol). [sent-241, score-0.792]
65 We chose to model hashtags because they group similar tweets (like topics), reflect real world events (some of which are periodic) and present direct means of evaluation. [sent-242, score-0.242]
66 We treat each regression problem independently, learning for each hashtag its specific model and set of parameters. [sent-244, score-0.35]
67 Note that this is the same as using a GP model with a constant kernel (+ noise) with a mean equal to the training set mean. [sent-254, score-0.297]
68 Lag model with GP determined period (Lag+): The prediction is the mean value in the training set of the values at lag ∆, where ∆ is the period determined by our GP model, rounded to the closest integer. [sent-255, score-0.334]
69 Comparing to this model we can see if the GP model can recover the underlying function that described the periodic variation and filter out the noise in the observations. [sent-258, score-0.501]
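The mean and lag baselines reduce to a few lines; the indexing convention below (forecast points counted from the start of the training series) is an assumption made for the sketch.

import numpy as np

def mean_baseline(train_x):
    # Mean model: always predict the training-set mean.
    return np.mean(train_x)

def lag_baseline(train_x, forecast_index, period):
    # Lag+ sketch: average the training values lying whole periods behind the
    # forecast point, with the period taken from the fitted GP and rounded.
    train_x = np.asarray(train_x, dtype=float)
    idx = np.arange(forecast_index - period, -1, -period)
    idx = idx[idx >= 0]
    return train_x[idx].mean() if idx.size else train_x.mean()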
70 GP regression: Gaussian Process regression using only the SE kernel (GP-SE), the periodic kernel (GP-PER), the PS kernel (GP-PS). [sent-260, score-1.131]
71 We will also compare GP regression with the linear kernel (GP-Lin), but we will not use this as a candidate for model selection due to the poor results shown below. [sent-263, score-0.415]
72 For this reason the model that uses the constant kernel performs best, being the simplest one that can describe the data, although the others give similar results in terms of NRMSE on the held-out testing set. [sent-277, score-0.257]
73 While functions learned using this kernel never clearly outperform others on NRMSE on held-out data, this is very useful for interpretation of the time series, separating noisy time series from those that have an underlying periodic behaviour. [sent-278, score-0.911]
74 This is actually the behaviour typical of ‘internet memes’ (this hashtag tags tweets of people posting things they would never tell anyone) as presented in Yang and Leskovec (2011). [sent-281, score-0.391]
75 These cannot be modelled with a constant kernel or a periodic one as shown by the results on held-out data and the time series plot. [sent-282, score-0.899]
76 The periodic kernels will fail in trying to match the large burst with others in the training data and will attribute to noise the lack of a similar peak, thus discovering wrong periods and making bad predictions. [sent-283, score-0.744]
77 The periodic kernel best models hashtags that exhibit an oscillating pattern. [sent-285, score-0.975]
78 Here, the period is chosen to be one week (168) rather than one day (24) because of the weekly effect superimposed on the daily one. [sent-287, score-0.301]
79 The PS kernel introduced in this paper models best hashtags that have a large and short lived burst in usage. [sent-290, score-0.536]
80 First, we choose #breakfast which has a daily and weekly pattern. [sent-292, score-0.191]
81 In the second example, we present a hashtag that is associated with a weekly event: #raw is used to discuss a wrestling show that airs every week for 2 hours on Monday evenings in the U.S. [sent-297, score-0.497]
82 With the exception of these 2 hours and the hour building up to it, the hashtag is rarely used. [sent-300, score-0.27]
83 This behaviour is modelled very well using our kernel, with a very high value for the shape parameter (s = 200) compared to the previous example (s = 11) which captures the abrupt trend in usage. [sent-301, score-0.206]
84 Table 4: Average relative gain over mean (M) prediction for forecasting on the entire month using the different models, broken down by the 4 hashtag categories, and the total number of hashtags in each. [sent-311, score-0.595]
85 We choose this because we consider that NRMSE is not comparable between regression tasks, due to the presence of large peaks in many time series, which distort the NRMSE values. [sent-313, score-0.19]
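For reference, a sketch of both evaluation quantities; the range normalisation used for NRMSE below is one common convention and is an assumption, since the exact normaliser is not spelled out here.

import numpy as np

def nrmse(y_true, y_pred):
    # Root mean squared error normalised by the range of the true series (assumed convention).
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

def relative_gain_over_mean(y_true, y_pred, train_mean):
    # Relative improvement of a model's NRMSE over the mean (M) baseline.
    baseline = np.full(len(y_true), train_mean)
    return 1.0 - nrmse(y_true, y_pred) / nrmse(y_true, baseline)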
86 Although chosen in the model selection phase in only a third of the tasks, it performs consistently well across tasks because of its ability to model well all the periodic hashtags, be they smooth or abrupt. [sent-317, score-0.582]
87 3 Discussion Let us now turn to why the GP model is better for discovering periodicities than classic time series modelling methods. [sent-320, score-0.37]
88 Measuring autocorrelation between points in the time series is used to discover the hidden periodicities in the data and to build AR models. [sent-321, score-0.348]
89 The second case is illustrated in Figure 6 where #confessionhour shows autocorrelation but, as seen in Figure 5, lacks a periodic component. [sent-323, score-0.56]
90 This fails to discover the correct period when dealing with large bursts like those exhibited by the #raw time series as shown in Figure 7. [sent-327, score-0.262]
91 The lowest frequency spike corresponds to the correct period of 168, but other candidate periods are also shown as possible. [sent-328, score-0.202]
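For comparison with the GP approach, both classical diagnostics are easy to compute; the snippet below assumes an hourly series stored as a NumPy array, and the top-5 cut-off is arbitrary.

import numpy as np

def autocorrelation(x, max_lag=200):
    # Sample autocorrelation of a mean-centred series for lags 1..max_lag.
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)])

def periodogram_periods(x, top=5):
    # Candidate periods (in hours) taken from the peaks of the Fourier spectrum.
    x = np.asarray(x, float) - np.mean(x)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0)      # d = 1 hour between samples
    order = np.argsort(spectrum[1:])[::-1] + 1  # skip the zero-frequency bin
    return 1.0 / freqs[order[:top]]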
92 6 Text based prediction In this section we demonstrate the usefulness of our method of modelling in an NLP task: predicting the hashtag of a tweet based on its text. [sent-341, score-0.41]
93 This method provides us with a straightforward way to incorporate our prior knowledge of how frequent a hashtag is in a certain time frame. [sent-345, score-0.311]
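A minimal sketch of how such a forecast can act as a time-dependent class prior inside a Naïve Bayes scorer; the lookup structures word_loglik and gp_forecast, the unknown-word key and the flooring constant are all assumptions for illustration.

import numpy as np

def nb_score(tweet_words, hashtag, word_loglik, gp_forecast, t):
    # Time-dependent prior: the GP forecast of the hashtag's volume at time t,
    # plus the usual Naive Bayes word log-likelihoods.
    log_prior = np.log(max(gp_forecast[hashtag](t), 1e-12))
    unk = word_loglik[hashtag].get('<unk>', -20.0)
    return log_prior + sum(word_loglik[hashtag].get(w, unk) for w in tweet_words)

def predict_hashtag(tweet_words, hashtags, word_loglik, gp_forecast, t):
    # Rank all candidate hashtags and return the best-scoring one.
    return max(hashtags, key=lambda h: nb_score(tweet_words, h, word_loglik, gp_forecast, t))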
94 For comparison we use the Most Frequent (MF) baseline and the Naïve Bayes with empirical prior (NB-E), which doesn’t use any temporal forecasting information. [sent-347, score-0.229]
95 Because there are more than 1000 possible classes, we show the accuracy of the correct hashtag being amongst the top 1, 5 or 50 hashtags, as well as the Mean Reciprocal Rank (MRR). [sent-348, score-0.407]
96 We have presented a framework based on Gaussian Process regression for identifying periodic patterns and their parameters using only training data. [sent-363, score-0.66]
97 We divided the periodic patterns into 2 categories, oscillating and periodic bursts, by performing model selection using Bayesian evidence. [sent-364, score-1.222]
98 Structure discovery in nonparametric regression through compositional kernel search. [sent-392, score-0.373]
99 Learning periodic human behaviour models from sparse data for crowdsourcing aid delivery in developing countries. [sent-429, score-0.589]
100 We know what @you #tag: does the dual role affect hashtag adoption? [sent-486, score-0.234]
wordName wordTfidf (topN-words)
[('periodic', 0.501), ('gp', 0.41), ('kernel', 0.257), ('hashtag', 0.234), ('periodicities', 0.177), ('hashtags', 0.173), ('gaussian', 0.148), ('weekly', 0.133), ('nrmse', 0.118), ('ps', 0.118), ('regression', 0.116), ('temporal', 0.116), ('rasmussen', 0.115), ('periods', 0.103), ('period', 0.099), ('kernels', 0.093), ('behaviour', 0.088), ('modelling', 0.081), ('forecasting', 0.077), ('covariance', 0.075), ('goodmorning', 0.074), ('nlml', 0.074), ('series', 0.071), ('tweets', 0.069), ('week', 0.065), ('gps', 0.064), ('preo', 0.064), ('autocorrelation', 0.059), ('lived', 0.059), ('daily', 0.058), ('lag', 0.058), ('tweet', 0.057), ('processes', 0.056), ('forecast', 0.054), ('se', 0.053), ('bursts', 0.051), ('memes', 0.051), ('shape', 0.051), ('burst', 0.047), ('cohn', 0.046), ('day', 0.045), ('williams', 0.045), ('confessionhour', 0.044), ('cpgosonldst', 0.044), ('durrande', 0.044), ('duvenaud', 0.044), ('extrapolation', 0.044), ('oscillating', 0.044), ('trendminer', 0.044), ('weekends', 0.044), ('patterns', 0.043), ('selection', 0.042), ('fit', 0.042), ('leskovec', 0.041), ('time', 0.041), ('bayesian', 0.04), ('mean', 0.04), ('smooth', 0.039), ('yogatama', 0.038), ('abrupt', 0.038), ('optimising', 0.038), ('social', 0.038), ('trevor', 0.038), ('prediction', 0.038), ('bayes', 0.037), ('hours', 0.036), ('prior', 0.036), ('carl', 0.035), ('breakfast', 0.035), ('month', 0.033), ('peaks', 0.033), ('response', 0.032), ('hyperparameters', 0.031), ('evidence', 0.031), ('ve', 0.031), ('na', 0.03), ('xt', 0.03), ('adams', 0.029), ('americanidol', 0.029), ('autoregressive', 0.029), ('behaviours', 0.029), ('candela', 0.029), ('evenings', 0.029), ('forecasted', 0.029), ('hennig', 0.029), ('hensman', 0.029), ('lengthscales', 0.029), ('mazzia', 0.029), ('mcinerney', 0.029), ('nen', 0.029), ('occam', 0.029), ('periodicity', 0.029), ('periodogram', 0.029), ('sheffield', 0.029), ('zangerle', 0.029), ('zfp', 0.029), ('zongyang', 0.029), ('modelled', 0.029), ('tsur', 0.029), ('uncertainty', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
Author: Daniel Preotiuc-Pietro ; Trevor Cohn
Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
2 0.15528256 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of tree kernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
3 0.11017855 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao
Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.
4 0.099943437 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
Author: Marco Guerini ; Lorenzo Gatti ; Marco Turchi
Abstract: Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words’ prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.
5 0.082183562 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
Author: Qiming Diao ; Jing Jiang
Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant messages. On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.
6 0.080303915 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
7 0.068713196 27 emnlp-2013-Authorship Attribution of Micro-Messages
8 0.063543819 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
10 0.059414912 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation
11 0.057450347 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction
12 0.057190612 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
13 0.056891136 41 emnlp-2013-Building Event Threads out of Multiple News Articles
14 0.054844845 118 emnlp-2013-Learning Biological Processes with Global Constraints
15 0.054570872 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
16 0.051583573 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles
17 0.04812894 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model
18 0.043440677 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization
19 0.039767083 4 emnlp-2013-A Dataset for Research on Short-Text Conversations
20 0.038609661 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression
topicId topicWeight
[(0, -0.148), (1, 0.065), (2, -0.106), (3, 0.01), (4, 0.017), (5, -0.074), (6, -0.019), (7, -0.014), (8, -0.018), (9, 0.038), (10, -0.045), (11, 0.073), (12, -0.031), (13, 0.082), (14, 0.017), (15, -0.015), (16, 0.014), (17, 0.117), (18, -0.024), (19, 0.057), (20, -0.024), (21, -0.106), (22, 0.038), (23, 0.065), (24, -0.129), (25, -0.017), (26, -0.163), (27, -0.032), (28, -0.048), (29, -0.01), (30, 0.083), (31, 0.093), (32, -0.172), (33, -0.117), (34, 0.078), (35, 0.017), (36, 0.162), (37, -0.071), (38, 0.061), (39, -0.019), (40, -0.006), (41, 0.147), (42, -0.099), (43, 0.12), (44, -0.213), (45, 0.1), (46, 0.075), (47, 0.03), (48, 0.009), (49, -0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.94883144 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
Author: Daniel Preotiuc-Pietro ; Trevor Cohn
Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
2 0.70219243 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of tree kernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
3 0.54691362 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
Author: Bowei Zou ; Guodong Zhou ; Qiaoming Zhu
Abstract: Scope detection is a key task in information extraction. This paper proposes a new approach for tree kernel-based scope detection by using the structured syntactic parse information. In addition, we have explored the way of selecting compatible features for different part-of-speech cues. Experiments on the BioScope corpus show that both constituent and dependency structured syntactic parse features have the advantage in capturing the potential relationships between cues and their scopes. Compared with the state of the art scope detection systems, our system achieves substantial improvement.
4 0.51395178 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
Author: Marco Guerini ; Lorenzo Gatti ; Marco Turchi
Abstract: Assigning a positive or negative score to a word out of context (i.e. a word’s prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words’ prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.
Author: Jinpeng Wang ; Wayne Xin Zhao ; Haitian Wei ; Hongfei Yan ; Xiaoming Li
Abstract: Hot trends are likely to bring new business opportunities. For example, “Air Pollution” might lead to a significant increase of the sales of related products, e.g., mouth mask. For ecommerce companies, it is very important to make rapid and correct response to these hot trends in order to improve product sales. In this paper, we take the initiative to study the task of how to identify trend related products. The major novelty of our work is that we automatically learn commercial intents revealed from microblogs. We carefully construct a data collection for this task and present quite a few insightful findings. In order to solve this problem, we further propose a graph based method, which jointly models relevance and associativity. We perform extensive experiments and the results showed that our methods are very effective.
6 0.37794131 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
8 0.33846843 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation
9 0.31766582 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
10 0.30935577 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction
11 0.30866477 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment
12 0.30064082 123 emnlp-2013-Learning to Rank Lexical Substitutions
13 0.28803664 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
14 0.28508639 86 emnlp-2013-Feature Noising for Log-Linear Structured Prediction
15 0.27329087 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
16 0.26444238 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!
17 0.26360789 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles
18 0.26304629 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
19 0.25459158 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization
20 0.25354365 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches
topicId topicWeight
[(3, 0.038), (9, 0.014), (18, 0.033), (22, 0.044), (30, 0.088), (37, 0.32), (47, 0.01), (50, 0.016), (51, 0.132), (66, 0.049), (71, 0.035), (75, 0.043), (77, 0.014), (90, 0.013), (96, 0.065)]
simIndex simValue paperId paperTitle
1 0.74433249 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections
Author: Thomas Scholz ; Stefan Conrad
Abstract: A very valuable piece of information in newspaper articles is the tonality of extracted statements. For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). To this end, we will compare several state-of-the-art approaches for Opinion Mining in newspaper articles in this paper. Furthermore, we will introduce a new technique to extract entropy-based word connections which identifies the word combinations which create a tonality. In the evaluation, we use two different corpora consisting of news articles, by which we show that the new approach achieves better results than the four state-of-the-art methods.
same-paper 2 0.722121 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
Author: Daniel Preotiuc-Pietro ; Trevor Cohn
Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
3 0.50607008 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.
4 0.49402392 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
Author: Qiming Diao ; Jing Jiang
Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant messages. On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.
5 0.49395758 143 emnlp-2013-Open Domain Targeted Sentiment
Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme
Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguisticallyinformed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.
6 0.49131253 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
8 0.489622 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
9 0.48689866 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
10 0.4851141 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
11 0.4844929 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
12 0.48427275 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
13 0.48316729 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
14 0.48267725 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
15 0.48125693 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
16 0.4812333 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media
17 0.48076344 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
18 0.47873491 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
19 0.4786976 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
20 0.47833207 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)