emnlp emnlp2013 emnlp2013-16 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiming Diao ; Jing Jiang
Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant messages. On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.
Reference: text
sentIndex sentText sentNum sentScore
1 On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. [sent-5, score-0.997]
2 In this paper, we try to model topics, events and users on Twitter in a unified way. [sent-7, score-0.578]
3 Our experiments show that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics. [sent-12, score-1.696]
4 The concepts of topics and events are orthogonal in that many events fall under certain topics. [sent-31, score-0.998]
5 Furthermore, being social media, Twitter users play important roles in forming topics and events on Twitter. [sent-33, score-0.784]
6 Whether a user publishes a tweet related to an event also largely depends on whether her topic interests match the nature of the event. [sent-35, score-0.931]
7 Modeling the interplay between topics, events and users can deepen our understanding of Twitter content and potentially aid many prediction and recommendation tasks. [sent-36, score-0.647]
8 In this paper, we aim to construct a unified model of topics, events and users on Twitter. [sent-37, score-0.578]
9 Although there have been a number of recent studies on event detection on Twitter, to the best of our knowledge, ours is the first that links the topic interests of users to their tweeting behaviors on events. [sent-38, score-0.82]
10 Specifically, we propose a probabilistic latent variable model that identifies both topics and events on Twitter. [sent-39, score-0.581]
11 To do so, we first separate tweets into topic tweets and event tweets. [sent-40, score-1.222]
12 The latter are about some major global event interesting to a large group of people, such as a tweet advertising a concert or commenting on an election result. [sent-44, score-0.7]
13 Although considering only topic tweets and event tweets is a much simplified view of the diverse range of tweets, we find it effective in finding meaningful topics and events. [sent-45, score-1.459]
14 It uses the latent topics to explain users’ preferences of events and subsequently infers the association between topics and events. [sent-57, score-0.745]
15 Comparison with our base model and with an existing model for event discovery on Twitter shows that the two modifications are both effective. [sent-60, score-0.552]
16 The duration-based regularization helps find more meaningful events; the event-topic affinity vectors improve an event recommendation task and help produce a meaningful organization of events by topics. [sent-61, score-1.419]
17 2 Related Work Study of topics, events and users on Twitter is related to several branches of work. [sent-62, score-0.578]
18 In comparison, our work focuses on modeling topics, events and users as well as their relation. [sent-72, score-0.578]
19 Our work helps better understand these additional events on Twitter and their relations with users’ topic interests. [sent-75, score-0.561]
20 It has later been combined with LDA to model both topics and events in news streams and social media streams (Ahmed et al. [sent-82, score-0.623]
21 3 Our Model In this section, we present our model for topics, events and users on Twitter. [sent-90, score-0.578]
22 3 we discuss how we model the relation between topics and events using event-topic affinity vectors. [sent-118, score-0.827]
23 Each event is also a multinomial distribution over words, denoted as ψk where k is an event index. [sent-124, score-0.915]
24 As we have discussed, we separate tweets into two categories, topic tweets and event tweets. [sent-129, score-1.222]
25 For event tweets, the event is sampled according to RCRP. [sent-132, score-0.852]
26 It could be an existing event that has at least one related tweet in the previous epoch or the current epoch, or it could be a new event. [sent-139, score-0.751]
27 Let $n_{k,t-1}$ denote the number of tweets related to event k at the end of epoch (t − 1). [sent-140, score-0.891]
28 Let $n_{k,t}^{(i)}$ denote the number of tweets related to event k in epoch t before the i-th tweet comes. [sent-141, score-1.077]
29 Let $N_{t-1}$ denote the total number of event-related tweets in epoch (t − 1) and $N_t^{(i)}$ denote the number of event-related tweets in epoch t before the i-th tweet. [sent-142, score-0.604]
30 Then RCRP assumes that the probability for the i-th tweet to join event k is $\frac{n_{k,t-1} + n_{k,t}^{(i)}}{N_{t-1} + N_t^{(i)} + \alpha}$ and the probability to start a new event is $\frac{\alpha}{N_{t-1} + N_t^{(i)} + \alpha}$, where α is a parameter. [sent-143, score-1.038]
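The RCRP prior described in sentences 27 to 30 can be illustrated with a minimal sketch. It assumes the join probability is the event's tweet count over the previous and current epoch, divided by the total event-tweet count over both epochs plus α, and that the remaining mass α over the same denominator goes to a new event. The function name `rcrp_prior` and the dictionary-based inputs are illustrative choices, not taken from the paper.

```python
def rcrp_prior(prev_counts, curr_counts, alpha):
    """Return ({event_id: join probability}, new-event probability).

    prev_counts: tweets per event at the end of epoch t-1 (n_{k,t-1})
    curr_counts: tweets per event so far in epoch t      (n_{k,t}^{(i)})
    alpha:       positive parameter controlling how often new events start
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Only events seen in the previous or current epoch can be joined.
    events = set(prev_counts) | set(curr_counts)
    # N_{t-1} + N_t^{(i)} + alpha
    denom = sum(prev_counts.values()) + sum(curr_counts.values()) + alpha
    join = {
        k: (prev_counts.get(k, 0) + curr_counts.get(k, 0)) / denom
        for k in events
    }
    return join, alpha / denom
```

The join probabilities and the new-event probability sum to one, so the result can be used directly as a categorical prior when sampling the event of a tweet.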
31 We also show the plate notation in Figure 1, in which the Recurrent Chinese Restaurant Process is represented as an infinite dynamic mixture model (Ahmed and Xing, ) and θrtcrp means the distribution on an infinite number of events in epoch t. [sent-147, score-0.586]
32 Dt is the total number of tweets (both event-related and topic tweets), while Nt represents the number of event-related tweets in epoch t. [sent-148, score-0.935]
33 If yt,i is 1, then we look at all the tweets that belong to event st,i. [sent-159, score-0.752]
34 So we assume that rt,i gets a value of 1 with probability $\exp\left(-\sum_{t'=1,\,|t'-t|>1}^{T} \lambda\,|t-t'|\,n_{s_{t,i},t'}\right)$, where $n_{s_{t,i},t'}$ is the number of tweets in epoch t′ that belong to event $s_{t,i}$. • For each topic a = 1, . [sent-161, score-0.609]
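A small sketch of the duration-based term may help. It assumes the probability is the exponential of minus the sum, over all epochs t′ more than one epoch away from t, of λ·|t − t′| times the number of the event's tweets in t′. The function name `duration_keep_prob` and the dict-of-counts input are hypothetical.

```python
import math


def duration_keep_prob(event_counts, t, lam):
    """Probability that r_{t,i} = 1 for a tweet in epoch t.

    event_counts: dict mapping epoch -> number of tweets of the event there
    t:            epoch of the current tweet
    lam:          non-negative strength of the regularization (lambda)
    """
    penalty = sum(
        lam * abs(t - tp) * n
        for tp, n in event_counts.items()
        if abs(tp - t) > 1  # adjacent epochs are not penalized
    )
    return math.exp(-penalty)
```

Under this reading, an event whose tweets all fall in adjacent epochs incurs no penalty, while tweets far from epoch t shrink the probability exponentially with their distance and count.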
35 , U) - draw θu ∼ Dirichlet(γ), πu ∼ Beta(τ) • For each epoch t and tweet i - draw yt,i ∼ Bernoulli(πut,i) - If yt,i = 0 * draw zt,i ∼ Multinomial(θut,i) * For each j, draw wt,i,j ∼ Multinomial(ϕzt,i) - If yt,i = 1 * draw st,i from RCRP * If st,i is a new event . [sent-167, score-0.773]
36 We can see that when we factor in the generation of these pseudo-observed variables r, we penalize long-term events and favor events whose tweets are concentrated along the timeline. [sent-174, score-1.192]
37 3 Event-Topic Affinity Vectors So far in our model topics and events are not related. [sent-177, score-0.581]
38 One way to do it is to assume that event tweets also have topical words sampled from the event’s topic distribution, something similar to the models by Ahmed et al. [sent-181, score-0.896]
39 The idea is that when a user posts a tweet about an event, we can treat the event as an item and this posting behavior as adoption of the item. [sent-185, score-0.736]
40 ηk0 is a bias term that represents the inner popularity of an event regardless of its affinity to any topic. [sent-191, score-0.834]
41 For event tweets, ct,i is generated by a Gaussian distribution with mean equal to $\eta_{s_{t,i}0} + \eta_{s_{t,i}} \cdot \bar{z}_{u_{t,i}}$, where $\bar{z}_u$ is an A-dimensional vector denoting the empirical topic distribution of user u’s tweets. [sent-194, score-0.724]
42 Let C¯u,a be the number of tweets by user u assigned to topic a, based on the values of the latent variables y and z. [sent-196, score-0.596]
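The two quantities above can be sketched in a few lines. The sketch assumes that the empirical topic distribution is the per-topic tweet counts normalized to sum to one, and that the Gaussian mean is the event's bias plus the dot product of its affinity vector with that distribution. The function names and the uniform fallback for a user with no topic tweets are illustrative assumptions, not part of the paper.

```python
def empirical_topic_distribution(topic_counts):
    """Normalize per-topic tweet counts of one user into a distribution."""
    total = sum(topic_counts)
    if total == 0:
        # Assumption: fall back to uniform when the user has no topic tweets.
        return [1.0 / len(topic_counts)] * len(topic_counts)
    return [c / total for c in topic_counts]


def affinity_mean(eta0, eta, z_bar):
    """Mean of c_{t,i}: bias term plus affinity vector dotted with z_bar."""
    return eta0 + sum(e * z for e, z in zip(eta, z_bar))
```

For example, a user with counts [3, 1, 0] has distribution [0.75, 0.25, 0.0], and an event with bias 1.0 and affinity vector [0.4, -0.2, 0.0] then has mean 1.25 for that user.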
43 In the gradient descent part, we update the event-topic affinity vectors ηk and the bias term ηk0 of each event k by keeping the assignment of the variables yt,i ,zt,i and st,i fixed. [sent-204, score-0.752]
44 Here $n^{(\pi)}_{u,0}$ is the number of topic tweets by user u, while $n^{(\pi)}_{u,1}$ is the number of event tweets by user u. [sent-213, score-1.41]
45 the total number of tweets is the number of tweets assigned to topic a for this user, resulting from integrating out the user’s topic nu(θ,(). [sent-217, score-0.94]
46 ) times word type v is assigned to event k, and is the total number of words assigned to event k. [sent-225, score-0.852]
47 Iu = {t′, i′ | yt′,i′ = 1, ut′,i′ = u}, which is the set of event tweets published by user u, and u represents ut,i for short. [sent-227, score-0.52]
48 Finally, N is a local normalization factor for event tweets, which includes the RCRP, event-topic affinity and regularization on event duration. [sent-229, score-1.165]
49 Given the assignment, we use gradient descent to update the values of the bias term ηk0 and the event-topic affinity vectors ηk for each current existing event k. [sent-231, score-0.77]
50 First, we can get the logarithm of the posterior distribution: $\ln P(y, z, s, r, c \mid w, u, \text{all priors}) = \text{constant} - \sum_{k=1}^{\infty} \left\{ \frac{\iota}{2}\left(\eta_{k0}^{2} + \eta_k \cdot \eta_k\right) + \sum_{u=1}^{U} \frac{n_{u,k}\,\epsilon}{2} \left[1 - \left(\eta_{k0} + \eta_k \cdot \bar{z}_u\right)\right]^{2} \right\}$, where $n_{u,k}$ is the number of event tweets about event k published by user u. [sent-232, score-1.272]
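To make the gradient descent step concrete, here is a minimal sketch for a single event k. It assumes the per-event part of the negative log posterior is (ι/2)(ηk0² + ηk·ηk) + Σu nu,k (ε/2)[1 − (ηk0 + ηk·z̄u)]²; the 1/2 factors are one reading of the extracted formula and should be treated as an assumption. The function names, the learning rate `lr`, and the `(n_uk, z_bar)` pair representation are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def event_loss(eta0, eta, users, iota, eps):
    """Negative log posterior terms for one event (up to a constant).

    users: list of (n_uk, z_bar) pairs, one per user who tweeted on the event
    """
    reg = 0.5 * iota * (eta0 ** 2 + dot(eta, eta))
    fit = sum(0.5 * eps * n * (1.0 - (eta0 + dot(eta, z))) ** 2 for n, z in users)
    return reg + fit


def gradient_step(eta0, eta, users, iota, eps, lr):
    """One gradient descent update of the bias eta0 and affinity vector eta."""
    g0 = iota * eta0
    g = [iota * e for e in eta]
    for n, z in users:
        resid = 1.0 - (eta0 + dot(eta, z))
        g0 -= eps * n * resid
        for a, za in enumerate(z):
            g[a] -= eps * n * resid * za
    return eta0 - lr * g0, [e - lr * ga for e, ga in zip(eta, g)]
```

Because this objective is a convex quadratic in (ηk0, ηk), a sufficiently small fixed step size decreases the loss at every iteration; in practice the step size would be tuned alongside ι and ε.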
51 We therefore would like to evaluate the quality of the identified topics and events as well as the usefulness of the discovered topic distributions of users and event-topic affinity vectors. [sent-250, score-1.228]
52 TimeUserLDA also models topics and events by separating topic tweets from event tweets. [sent-258, score-1.477]
53 However, it groups event tweets into a fixed number of bursty topics and then uses a two-state machine in a postprocessing step to identify events from these bursty topics. [sent-259, score-1.485]
54 We do not compare with other event detection methods because our objective is not online event detection. [sent-262, score-0.885]
55 When a new event k is created, the inner popularity bias term ηk0 is set to 1, and the factors in event-topic affinity vectors ηk are all set to 0. [sent-271, score-1.026]
56 We then judge whether the detected event tweets are indeed related to the corresponding event. [sent-286, score-0.752]
57 For each method, we rank the detected events based on the number of tweets assigned to them and then pick the top-30 events for each method. [sent-306, score-1.189]
58 The judges are given 100 randomly selected tweets for each event (or all tweets if an event contains fewer than 100 tweets). [sent-308, score-1.537]
59 Finally we treat an event as meaningful if both judges have scored it 1. [sent-314, score-0.532]
60 We have the following findings from the results: (1) Our base model performs quite poorly for the top events while Base+Reg and Base+Reg+Aff perform much better. [sent-318, score-0.543]
61 A close examination shows that the base model clusters many general topic tweets as events, such as tweets about transportation and music and even foursquare tweets. [sent-320, score-0.966]
62 (2) TimeUserLDA performs well for the very top events (P@5 and P@10) but its performance drops for lower-ranked events (P@20 and P@30), similar to what was reported by Diao et al. [sent-321, score-0.834]
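The P@K figures mentioned here follow the usual precision-at-K definition. As a hedged sketch, assume events are ranked by the number of tweets assigned to them and each carries a binary judgment (1 if judged meaningful, 0 otherwise); the function names and the tuple layout are illustrative.

```python
def rank_events_by_size(events):
    """events: list of (event_id, num_tweets, is_meaningful) tuples."""
    return sorted(events, key=lambda e: e[1], reverse=True)


def precision_at_k(labels, k):
    """Fraction of the top-k ranked items whose label is 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(labels[:k]) / k
```

A method that places its meaningful events near the top of the ranking will show high P@5 and P@10 but lower P@20 and P@30, which is the pattern described for TimeUserLDA.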
63 A close examination shows that this method is good at finding major events that do not have strong topic association and thus attract most people’s attention, e. [sent-323, score-0.598]
64 This is because this method mixes topics and events first and only detects events from bursty topics in a second stage of postprocessing. [sent-326, score-1.238]
65 (3) The difference between Base+Reg and Base+Reg+Aff is small, suggesting that the event-topic affinity vectors are not crucial for event detection. [sent-328, score-0.864]
66 Precision of Event Tweets Next, we evaluate the relevance of the detected event tweets to each event. [sent-329, score-0.752]
67 We pick 3 out of 5 common events shared by all methods within top-30 events. [Table 3 layout: rows Father’s Day, debate caused by Manda Swaggie, Indonesia tsunami, Super Junior album release; columns TimeUserLDA, Base, Base+Reg, Base+Reg+Aff] [sent-331, score-0.964]
68 Table 3: Precision of the event tweets for the 4 common events. [sent-346, score-0.752]
69 The precision of the 100 tweets for each event and each method is shown in Table 3. [sent-350, score-0.752]
70 For example, for the event “Super Junior album release,” Base finds other music-related tweets surrounding the peak period of the event itself. [sent-353, score-1.241]
71 In summary, our evaluation on event quality shows that (1) Using the non-parametric RCRP model to identify events within the generative model itself is advantageous over TimeUserLDA, which identifies events by postprocessing. [sent-354, score-1.26]
72 3 Event-Topic Association Besides event identification, our model also finds the association between events and topics through the event-topic affinity vectors. [sent-357, score-1.253]
73 Event Recommendation Recall that to discover event-topic association, we treat an event as an item and a tweet about the event as indication of the user’s adoption of the item. [sent-360, score-1.094]
74 Following this analogy with item recommendation, we define an event recommendation task where the goal is to recommend an event to users who have not posted any tweet about the event but may potentially be interested in the event. [sent-361, score-1.694]
75 Intuitively, if a user’s topic distribution is similar to the event-topic affinity vector of the event, then the user is likely to be interested in the event. [sent-362, score-0.514]
76 We then use a random subset of 250 training users and their tweets in June to identify events in June as well as the event-topic affinity vectors of these events. [sent-364, score-1.198]
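The intuition in sentence 75 can be sketched as a ranking of candidate users for one event. The sketch assumes similarity is measured by the dot product between the event's affinity vector and each user's topic distribution; the text does not state the exact similarity measure, so this choice, the function name, and the user identifiers are hypothetical.

```python
def rank_users_for_event(eta_k, user_topics):
    """Rank users by affinity-vector similarity to their topic distribution.

    eta_k:       event-topic affinity vector of the event
    user_topics: dict mapping user id -> topic distribution of that user
    """
    scores = {
        u: sum(e * z for e, z in zip(eta_k, z_bar))
        for u, z_bar in user_topics.items()
    }
    # Highest score first; ties are broken by user id for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

Users at the top of the returned list are those whose interests concentrate on the topics the event has high affinity with, and they would be recommended the event first.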
77 We pick 8 meaningful events that are ranked high by all methods for testing. [sent-365, score-0.519]
78 For each event, we try to find among the remaining 250 users those who may be interested in the event and compare the results with ground truth obtained by human judgment. [sent-366, score-0.587]
79 For each test user and each event, we manually inspect the user’s tweets around the peak days of the event to judge whether she has commented on the event. [sent-368, score-0.846]
80 For the other methods, because we do not have any parameter that directly encodes event-topic association, we cannot rank users based on how similar their topic distributions are to the event’s affinity to topics. [sent-371, score-0.551]
81 In addition, for each test event these methods identify a list of training users who have tweeted on it. [sent-374, score-0.641]
82 We also rank the 8 events in decreasing order of their inner popularity ηk0 learned by our complete model. [sent-421, score-0.579]
83 (1) Our complete method outperforms the other methods for 6 out of the 8 test events, suggesting that with the inferred event-topic affinity vectors we can do better event recommendation. [sent-424, score-0.72]
84 (2) The improvement brought by the event-topic affinity vectors, as reflected in the difference in Average Precision between Base+Reg+Aff and Base (or Base+Reg), is more pronounced for events with lower inner popularity on average. [sent-425, score-0.791]
85 The finding above suggests that the event-topic affinity vectors are especially useful for recommending events that attract only certain people’s attention, such as those related to sports, music, etc. [sent-427, score-0.748]
86 One may wonder for the events with low inner popularity why we could not achieve the same effect by Base or Base+Reg where we consider the topic similarity of test users with training users who have tweeted about the event. [sent-428, score-1.099]
87 Our close examination shows that for these events although Base and Base+Reg may identify relevant event tweets with decent precision, the users they identify who have tweeted about the event may not share similar topic interests. [sent-429, score-1.954]
88 As a result, when we average these users’ topic interests, we cannot obtain a clear skewed topic distribution that explains the event’s affinity to different topics. [sent-430, score-0.564]
89 In contrast, Base+Reg+Aff explicitly models the event-topic affinity vector and prefers to assign a tweet to an event if its author’s topic distribution is similar to the event’s affinity vector. [sent-431, score-1.278]
90 Through the training iterations, the users who have tweeted about an event as identified by Base+Reg+Aff will gradually converge to share similar topic distributions. [sent-432, score-0.785]
91 Grouping Events by Topics Finally, we show that the event-topic affinity vectors can also be used to group events by topics. [sent-433, score-0.711]
92 In Table 5 we show a few highly related events for a few popular topics in our Twitter data set. [sent-435, score-0.581]
93 Specifically, given a topic a, we rank the meaningful events that contain at least 70 tweets based on ηk,a. [sent-436, score-0.96]
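The grouping step in sentence 93 amounts to a filter followed by a sort. A minimal sketch, assuming each event is stored as a dict with hypothetical keys `name`, `num_tweets` and `eta` (its affinity vector), and that only events already judged meaningful are passed in:

```python
def top_events_for_topic(events, topic, min_tweets=70, top_n=3):
    """Return names of the events with the highest affinity to one topic.

    events:     list of dicts with keys "name", "num_tweets", "eta"
    topic:      index a of the topic of interest
    min_tweets: size threshold (70 in the text above)
    top_n:      how many events to return (illustrative default)
    """
    eligible = [e for e in events if e["num_tweets"] >= min_tweets]
    eligible.sort(key=lambda e: e["eta"][topic], reverse=True)
    return [e["name"] for e in eligible[:top_n]]
```

An event with high affinity to two topics appears near the top of both lists, which matches the behavior described for events related to more than one topic.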
94 The event “LionsXII 9-0 Sabah FA” is particularly interesting in that it is highly related to both the topic on Malay and the topic on soccer. [sent-438, score-0.714]
95 ) 5 Conclusion In this paper, we propose a unified model to study topics, events and users jointly. [sent-440, score-0.578]
96 The base of our method is a combination of an LDA-like model and the Recurrent Chinese Restaurant Process, which aims to model users’ longstanding personal topic interests and events over time simultaneously. [sent-441, score-0.798]
97 We further use a time duration-based regularization to capture the fast emergence and disappearance of events on Twitter, which is effective in producing more meaningful events. [sent-443, score-0.557]
98 Finally, we use an inner popularity bias parameter and event-topic affinity vectors to interpret an event’s inherent popularity and its affinity to different topics. [sent-444, score-0.786]
99 Our experiments quantitatively show that our proposed model can effectively identify meaningful events and accurately find relevant tweets for these events. [sent-445, score-0.816]
100 Furthermore, the event-topic association inferred by our model can help an event recommendation task and organize events by topics. [sent-446, score-0.912]
wordName wordTfidf (topN-words)
[('event', 0.426), ('events', 0.417), ('tweets', 0.326), ('affinity', 0.246), ('tweet', 0.186), ('reg', 0.166), ('topics', 0.164), ('users', 0.161), ('topic', 0.144), ('epoch', 0.139), ('twitter', 0.135), ('base', 0.126), ('rcrp', 0.113), ('diao', 0.1), ('super', 0.1), ('aff', 0.099), ('user', 0.094), ('concert', 0.088), ('timeuserlda', 0.088), ('junior', 0.087), ('popularity', 0.084), ('inner', 0.078), ('bursty', 0.076), ('meaningful', 0.073), ('restaurant', 0.071), ('april', 0.07), ('recommendation', 0.069), ('regularization', 0.067), ('album', 0.063), ('lionsxii', 0.063), ('recurrent', 0.059), ('interests', 0.056), ('day', 0.055), ('tweeted', 0.054), ('ahmed', 0.053), ('singapore', 0.05), ('eventtopic', 0.05), ('manda', 0.05), ('sabah', 0.05), ('swaggie', 0.05), ('petrovi', 0.05), ('vectors', 0.048), ('discovered', 0.046), ('music', 0.044), ('june', 0.043), ('social', 0.042), ('duration', 0.041), ('balasubramanyan', 0.038), ('becker', 0.038), ('concerts', 0.038), ('eawac', 0.038), ('fools', 0.038), ('hyuk', 0.038), ('qiming', 0.038), ('tsunami', 0.038), ('attract', 0.037), ('salakhutdinov', 0.037), ('soccer', 0.035), ('weng', 0.035), ('people', 0.034), ('cr', 0.034), ('gaussian', 0.034), ('judges', 0.033), ('draw', 0.033), ('bernoulli', 0.033), ('multinomial', 0.033), ('detection', 0.033), ('drr', 0.033), ('xing', 0.032), ('variables', 0.032), ('collaborative', 0.032), ('stream', 0.03), ('personal', 0.03), ('adoption', 0.03), ('indonesia', 0.03), ('distribution', 0.03), ('pick', 0.029), ('chinese', 0.029), ('weblogs', 0.027), ('fa', 0.027), ('sports', 0.027), ('analytics', 0.026), ('happy', 0.026), ('mother', 0.026), ('discover', 0.026), ('blei', 0.026), ('nt', 0.026), ('bang', 0.025), ('changmin', 0.025), ('encore', 0.025), ('fans', 0.025), ('flda', 0.025), ('fool', 0.025), ('lah', 0.025), ('liangjie', 0.025), ('longstanding', 0.025), ('mothers', 0.025), ('pmf', 0.025), ('publishes', 0.025), ('spontan', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
2 0.31957272 118 emnlp-2013-Learning Biological Processes with Global Constraints
Author: Aju Thalappillil Scaria ; Jonathan Berant ; Mengqiu Wang ; Peter Clark ; Justin Lewis ; Brittany Harding ; Christopher D. Manning
Abstract: Biological processes are complex phenomena involving a series of events that are related to one another through various relationships. Systems that can understand and reason over biological processes would dramatically improve the performance of semantic applications involving inference such as question answering (QA) – specifically “How?” and “Why?” questions. In this paper, we present the task of process extraction, in which events within a process and the relations between the events are automatically extracted from text. We represent processes by graphs whose edges describe a set of temporal, causal and co-reference event-event relations, and characterize the structural properties of these graphs (e.g., the graphs are connected). Then, we present a method for extracting relations between the events, which exploits these structural properties by performing joint inference over the set of extracted relations. On a novel dataset containing 148 descriptions of biological processes (released with this paper), we show significant improvement comparing to baselines that disregard process structure.
3 0.27850765 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes
Author: Zhichao Hu ; Elahe Rahimtoroghi ; Larissa Munishkina ; Reid Swanson ; Marilyn A. Walker
Abstract: Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that result from such reasoning. Researchers in NLP have tackled modeling such expectations from a range of perspectives, including treating it as the inference of the CONTINGENT discourse relation, or as a type of common-sense causal reasoning. Our approach is to model likelihood between events by drawing on several of these lines of previous work. We implement and evaluate different unsupervised methods for learning event pairs that are likely to be CONTINGENT on one another. We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments of contingency. Our results indicate that the use of web search counts increases the average accuracy of our best method to 85.64% over a baseline of 50%, as compared to an average accuracy of 75.15% without web search.
4 0.23622255 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model
Author: Lifu Huang ; Lian'en Huang
Abstract: Recently, much research focuses on event storyline generation, which aims to produce a concise, global and temporal event summary from a collection of articles. Generally, each event contains multiple sub-events and the storyline should be composed by the component summaries of all the sub-events. However, different sub-events have different part-whole relationship with the major event, which is important to correspond to users’ interests but seldom considered in previous work. To distinguish different types of sub-events, we propose a mixture-event-aspect model which models different sub-events into local and global aspects. Combining these local/global aspects with summarization requirements together, we utilize an optimization method to generate the component summaries along the timeline. We develop experimental systems on 6 distinctively different datasets. Evaluation and comparison results indicate the effectiveness of our proposed method.
5 0.18152629 27 emnlp-2013-Authorship Attribution of Micro-Messages
Author: Roy Schwartz ; Oren Tsur ; Ari Rappoport ; Moshe Koppel
Abstract: Work on authorship attribution has traditionally focused on long texts. In this work, we tackle the question of whether the author of a very short text can be successfully identified. We use Twitter as an experimental testbed. We introduce the concept of an author’s unique “signature”, and show that such signatures are typical of many authors when writing very short texts. We also present a new authorship attribution feature (“flexible patterns”) and demonstrate a significant improvement over our baselines. Our results show that the author of a single tweet can be identified with good accuracy in an array of flavors of the authorship attribution task.
6 0.18121621 41 emnlp-2013-Building Event Threads out of Multiple News Articles
7 0.17393439 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
8 0.15324578 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles
9 0.15204462 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
10 0.13389772 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
11 0.13193534 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model
12 0.11982355 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation
13 0.11739421 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
14 0.1069295 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media
15 0.10404337 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
16 0.10365985 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
17 0.097866915 90 emnlp-2013-Generating Coherent Event Schemas at Scale
18 0.095144905 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models
19 0.092292264 121 emnlp-2013-Learning Topics and Positions from Debatepedia
20 0.087203749 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
topicId topicWeight
[(0, -0.219), (1, 0.228), (2, -0.196), (3, 0.225), (4, 0.098), (5, -0.33), (6, -0.188), (7, 0.058), (8, -0.075), (9, 0.059), (10, -0.094), (11, 0.098), (12, 0.059), (13, -0.075), (14, 0.054), (15, -0.114), (16, 0.13), (17, -0.041), (18, -0.162), (19, -0.131), (20, -0.074), (21, -0.002), (22, 0.057), (23, 0.139), (24, -0.07), (25, 0.008), (26, -0.04), (27, 0.022), (28, -0.087), (29, 0.077), (30, -0.019), (31, -0.088), (32, 0.077), (33, -0.028), (34, 0.068), (35, 0.07), (36, -0.007), (37, 0.027), (38, 0.005), (39, 0.077), (40, -0.092), (41, 0.0), (42, -0.023), (43, -0.035), (44, -0.048), (45, -0.044), (46, -0.002), (47, 0.038), (48, 0.096), (49, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.98456204 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
2 0.75039285 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes
3 0.70593166 118 emnlp-2013-Learning Biological Processes with Global Constraints
4 0.62449747 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model
Author: Lifu Huang ; Lian'en Huang
Abstract: Recently, much research has focused on event storyline generation, which aims to produce a concise, global and temporal event summary from a collection of articles. Generally, each event contains multiple sub-events and the storyline should be composed of the component summaries of all the sub-events. However, different sub-events have different part-whole relationships with the major event, which are important for matching users’ interests but seldom considered in previous work. To distinguish different types of sub-events, we propose a mixture-event-aspect model which models different sub-events into local and global aspects. Combining these local/global aspects with summarization requirements, we utilize an optimization method to generate the component summaries along the timeline. We develop experimental systems on 6 distinctly different datasets. Evaluation and comparison results indicate the effectiveness of our proposed method.
5 0.54313314 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles
Author: Tao Ge ; Baobao Chang ; Sujian Li ; Zhifang Sui
Abstract: Since many applications involving temporal analysis, such as timeline summaries and temporal IR, rely on document timestamps, the task of automatic dating of documents has become increasingly important. Instead of using feature-based methods as conventional models do, our method attempts to date documents at the year level by exploiting relative temporal relations between documents and events, which are very effective for dating documents. Based on this intuition, we propose an event-based time label propagation model called confidence boosting in which time label information can be propagated between documents and events on a bipartite graph. The experiments show that our event-based propagation model can predict document timestamps with high accuracy and the model combined with a MaxEnt classifier outperforms the state-of-the-art method for this task, especially when the size of the training set is small.
6 0.54004103 41 emnlp-2013-Building Event Threads out of Multiple News Articles
8 0.4507505 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
9 0.41391921 27 emnlp-2013-Authorship Attribution of Micro-Messages
10 0.37117764 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model
11 0.3635436 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
12 0.35414428 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
13 0.34690538 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems
14 0.34178934 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
15 0.34034148 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation
16 0.33953682 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models
17 0.33409443 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
18 0.31899908 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
19 0.28750348 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
20 0.28696129 90 emnlp-2013-Generating Coherent Event Schemas at Scale
topicId topicWeight
[(3, 0.021), (9, 0.022), (18, 0.019), (22, 0.072), (30, 0.08), (50, 0.015), (51, 0.141), (66, 0.045), (71, 0.019), (75, 0.05), (77, 0.01), (96, 0.401)]
simIndex simValue paperId paperTitle
1 0.95015925 165 emnlp-2013-Scaling to Large³ Data: An Efficient and Effective Method to Compute Distributional Thesauri
Author: Martin Riedl ; Chris Biemann
Abstract: We introduce a new highly scalable approach for computing Distributional Thesauri (DTs). By employing pruning techniques and a distributed framework, we make the computation for very large corpora feasible on comparatively small computational resources. We demonstrate this by releasing a DT for the whole vocabulary of Google Books syntactic n-grams. Evaluating against lexical resources using two measures, we show that our approach produces higher-quality DTs than previous approaches, and is thus preferable in terms of speed and quality for large corpora.
2 0.84884167 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of tree kernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
3 0.82694018 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves
Author: Hannaneh Hajishirzi ; Leila Zilles ; Daniel S. Weld ; Luke Zettlemoyer
Abstract: Many errors in coreference resolution come from semantic mismatches due to inadequate world knowledge. Errors in named-entity linking (NEL), on the other hand, are often caused by superficial modeling of entity context. This paper demonstrates that these two tasks are complementary. We introduce NECO, a new model for named entity linking and coreference resolution, which solves both problems jointly, reducing the errors made on each. NECO extends the Stanford deterministic coreference system by automatically linking mentions to Wikipedia and introducing new NEL-informed mention-merging sieves. Linking improves mention-detection and enables new semantic attributes to be incorporated from Freebase, while coreference provides better context modeling by propagating named-entity links within mention clusters. Experiments show consistent improvements across a number of datasets and experimental conditions, including over 11% reduction in MUC coreference error and nearly 21% reduction in F1 NEL error on ACE 2004 newswire data.
same-paper 4 0.80212742 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
Author: Qiming Diao ; Jing Jiang
Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant messages. On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people’s posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.
Author: Baichuan Li ; Jing Liu ; Chin-Yew Lin ; Irwin King ; Michael R. Lyu
Abstract: Social media like forums and microblogs have accumulated a huge amount of user-generated content (UGC) containing human knowledge. Currently, most UGC is listed as a whole or in pre-defined categories. This “list-based” approach is simple, but hinders users from browsing and learning knowledge of certain topics effectively. To address this problem, we propose a hierarchical entity-based approach for structuralizing UGC in social media. By using a large-scale entity repository, we design a three-step framework to organize UGC in a novel hierarchical structure called “cluster entity tree (CET)”. With Yahoo! Answers as a test case, we conduct experiments and the results show the effectiveness of our framework in constructing CET. We further evaluate the performance of CET on UGC organization in both user and system aspects. From a user aspect, our user study demonstrates that, with a CET-based structure, users perform significantly better in knowledge learning than with the traditional list-based approach. From a system aspect, CET substantially boosts the performance of two information retrieval models (i.e., vector space model and query likelihood language model).
6 0.60161817 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
7 0.59010279 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
8 0.56976569 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
9 0.56666046 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
10 0.55807263 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
11 0.55160886 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
12 0.5471729 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems
13 0.54464787 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
14 0.54095161 1 emnlp-2013-A Constrained Latent Variable Model for Coreference Resolution
15 0.52985948 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
16 0.52224684 160 emnlp-2013-Relational Inference for Wikification
17 0.51594955 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution
18 0.51389301 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
19 0.51326627 143 emnlp-2013-Open Domain Targeted Sentiment
20 0.51280415 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model