acl acl2012 acl2012-205 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Rui Yan ; Mirella Lapata ; Xiaoming Li
Abstract: Twitter enables users to send and read text-based posts of up to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources; however, the proliferation of user-generated content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model outperforms competitive approaches by a large margin.
Reference: text
sentIndex sentText sentNum sentScore
1 As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. [sent-11, score-0.534]
2 In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. [sent-13, score-0.818]
3 Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. [sent-14, score-0.895]
4 Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. [sent-15, score-0.743]
5 We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. [sent-16, score-0.775]
6 Being a follower on Twitter means that the user receives all the tweets from those she follows. [sent-20, score-0.664]
7 Common practice of responding to a tweet has evolved into a well-defined markup culture (e. [sent-21, score-0.5]
8 Over 340 million tweets are being generated daily, amounting to thousands of tweets per second; Twitter’s own search engine handles more than 1. [sent-30, score-1.068]
9 In fact, the peak record is 6,939 tweets per second, reported by http://blog. [sent-37, score-0.534]
10 , by limiting the stream of tweets to those of interest to the user, or by discovering intriguing content outside the user’s following network. [sent-48, score-0.573]
11 The tweet recommendation task is challenging for several reasons. [sent-49, score-0.681]
12 Secondly, the recommendations ought to be of interest to the user and likely to attract user response (e. [sent-52, score-0.332]
13 In this paper we present a graph-theoretic approach to tweet recommendation that attempts to address these challenges. [sent-57, score-0.681]
14 Our recommender operates over a heterogeneous network that connects the users (or authors) and the tweets they produce. [sent-58, score-0.807]
15 The user network represents links among authors based on their following behavior, whereas the tweet network connects tweets based on content similarity. [sent-59, score-1.389]
16 The main intuition behind co-ranking is that there is a mutually reinforcing relationship between authors and tweets that could be reflected in the rankings. [sent-63, score-0.643]
17 Tweets are important if they are related to other important tweets and authored by important users who in turn are related to other important users. [sent-64, score-0.714]
18 The model exploits this mutually reinforcing relationship between tweets and their authors and couples two random walks, one on the tweet graph and one on the author graph, into a combined one. [sent-65, score-1.364]
19 Rather than creating a global ranking over all tweets in a collection, we extend this framework to individual users and produce personalized recommendations. [sent-66, score-0.88]
20 Moreover, we incorporate diversity by allowing the random walk on the tweet graph to be time-variant (Mei et al. [sent-67, score-0.811]
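As a rough illustration of the time-variant (DivRank-style) walk used to encourage diversity, the Python sketch below reinforces transition mass toward nodes that have already accumulated visits. This is one common formulation of Mei et al.'s idea, not necessarily the exact variant used in the paper; all names and defaults are illustrative.

```python
import numpy as np

def diversity_walk(M0, alpha=0.85, iters=50):
    """DivRank-style walk: transition mass is reinforced toward nodes that
    have already accumulated visits, which promotes diverse top-ranked nodes.
    M0: (n, n) row-stochastic base transition matrix over the tweet graph."""
    n = M0.shape[0]
    p = np.ones(n) / n          # current visit distribution
    visits = np.ones(n)         # cumulative visit counts per node
    for _ in range(iters):
        # time-variant transition: reweight columns by accumulated visits
        M = M0 * visits[np.newaxis, :]
        M = M / M.sum(axis=1, keepdims=True)
        p = alpha * p @ M + (1 - alpha) / n
        visits += p
    return p
```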
21 Experimental results on a real-world dataset consisting of 364,287,744 tweets from 9,449,542 users show that the co-ranking approach substantially improves performance over the state of the art. [sent-69, score-0.671]
22 2 Related Work Tweet Search Given the large amount of tweets being posted daily, ranking strategies have become extremely important for retrieving information quickly. [sent-73, score-0.633]
23 State-of-the-art tweet retrieval methods include a linear regression model biased towards text quality with a regularization factor inspired by the hypothesis that documents similar in content may have similar quality (Huang et al. [sent-77, score-0.539]
24 (2010) learn a ranking model using SVMs and features based on tweet content, the relations among users, and tweet-specific characteristics (e. [sent-80, score-1.099]
25 Tweet Recommendation Previous work has also focused on tweet recommendation systems, assuming no explicit query is provided by the users. [sent-83, score-0.681]
26 Collaborative filtering is perhaps the most obvious method for recommending tweets (Hannon et al. [sent-84, score-0.569]
27 (2009) propose a diffusion-based recommendation framework especially for tweets representing critical events by constructing a diffusion graph. [sent-94, score-0.754]
28 (2011) recommend tweets based on popularity-related features. [sent-96, score-0.648]
29 (2010) investigate which topics users are interested in, following a Labeled-LDA approach, by deciding whether a user is in the followee list of a given user or not. [sent-98, score-0.397]
30 Uysal and Croft (2011) estimate the likelihood of a tweet being reposted from a user-centric perspective. [sent-99, score-0.559]
31 Our model exploits the information provided by the tweets and the underlying social networks in a unified co-ranking framework. [sent-101, score-0.571]
32 However, the adaptation of this framework to the tweet recommendation task is novel to our knowledge. [sent-106, score-0.72]
33 GM = (VM, EM) is a weighted undirected graph representing the tweets and their relationships. [sent-110, score-0.67]
34 Let VM = {mi | mi ∈ VM} denote a collection of |VM| tweets and EM the set of links representing relationships between them. [sent-111, score-0.534]
35 The latter are established by measuring how semantically similar any two tweets are (see Section 3. [sent-112, score-0.534]
36 The graph consists of nodes VMU = VM ∪ VU and edges EMU connecting each tweet with all of its authors. [sent-118, score-0.604]
37 Typically, a tweet m is written by only one author u. [sent-119, score-0.559]
38 However, because of retweeting we treat all users involved in reposting a tweet as “co-authors”. [sent-120, score-0.71]
39 A bipartite graph (whose edges are shown with dashed lines) ties the tweet and author networks together. [sent-128, score-0.754]
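The heterogeneous network thus has three pieces: a tweet-tweet similarity graph GM, a user-user following graph GU, and the bipartite tweet-author graph GMU. The sketch below shows one plausible way to assemble the three adjacency matrices; the input structures (tweets, follows, authorship, sim) are hypothetical placeholders, not the paper's data format.

```python
import numpy as np

def build_networks(tweets, follows, authorship, sim):
    """Assemble the three networks used by the recommender.
    tweets:     list of tweet ids
    follows:    set of (user_a, user_b) following links
    authorship: dict tweet_id -> set of users (author plus retweeters)
    sim:        function(tweet_i, tweet_j) -> content similarity score
    Returns adjacency matrices for GM (tweet-tweet), GU (user-user)
    and the bipartite tweet-user graph GMU."""
    users = sorted({u for us in authorship.values() for u in us})
    t_idx = {t: i for i, t in enumerate(tweets)}
    u_idx = {u: i for i, u in enumerate(users)}

    GM = np.zeros((len(tweets), len(tweets)))
    for i, ti in enumerate(tweets):
        for j, tj in enumerate(tweets):
            if i != j:
                GM[i, j] = sim(ti, tj)

    GU = np.zeros((len(users), len(users)))
    for a, b in follows:
        if a in u_idx and b in u_idx:
            GU[u_idx[a], u_idx[b]] = 1.0   # a follows b

    GMU = np.zeros((len(tweets), len(users)))
    for t, authors in authorship.items():
        for u in authors:
            GMU[t_idx[t], u_idx[u]] = 1.0  # tweet t "co-authored" by u
    return GM, GU, GMU
```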
40 The framework couples the two random walks on GM and GU that rank tweets and their authors in isolation. [sent-130, score-0.764]
41 1 Ranking the Tweet Graph Popularity We rank the tweet network following the PageRank paradigm (Brin and Page, 1998). [sent-134, score-0.559]
42 It produces a global ranking over all tweets in the collection without taking specific users into account. [sent-143, score-0.77]
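A standard PageRank iteration over the tweet transition matrix, as referenced above, can be sketched as follows; the damping factor and convergence threshold are illustrative defaults rather than the paper's settings.

```python
import numpy as np

def pagerank(M, damping=0.85, iters=100, tol=1e-8):
    """Standard PageRank over a tweet adjacency matrix M.
    M[i, j] holds the (content-similarity) weight of the link i -> j."""
    n = M.shape[0]
    rowsum = M.sum(axis=1, keepdims=True)   # row-normalize into a transition matrix
    rowsum[rowsum == 0] = 1.0
    P = M / rowsum
    r = np.ones(n) / n
    for _ in range(iters):
        r_new = damping * r @ P + (1 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r  # global importance score for every tweet
```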
43 As there are billions of tweets available on Twitter covering many diverse topics, it is reasonable to assume that an average user will only be interested in a small subset (Qiu and Cho, 2006). [sent-144, score-0.664]
44 Note that user preferences can be also defined at the tweet (rather than topic) level. [sent-151, score-0.63]
45 Although tweets can illustrate user interests more directly, in most cases a user will only respond to a small fraction of tweets. [sent-152, score-0.821]
46 This means that most tweets will not provide any information relating to a user’s interests. [sent-153, score-0.534]
47 The topic preference vector allows us to propagate such information (based on whether a tweet has been reposted or not) to other tweets within the same topic cluster. [sent-154, score-1.255]
48 Let Dij denote the probability that tweet mi belongs to topic tj. [sent-157, score-0.573]
49 Consider a user with a topic preference vector t and topic distribution matrix D. [sent-158, score-0.369]
50 We calculate the response probability r for all tweets for this user as: r = tD^T (2) where r = [r1, r2, . [sent-159, score-0.698]
51 . . , r|VM|] is a 1 × |VM| vector representing the response probabilities, and ri is the probability for a user to respond to tweet mi. [sent-162, score-0.722]
52 Given observed responses [r1, r2, . . . , rw] (a 1 × w vector) where w < |VM| for a given user, and the topic distribution matrix D, our task is to estimate the topic preference vector t. [sent-167, score-0.369]
53 Assuming all responses are independent, the probability for w tweets r1, r2, . [sent-171, score-0.534]
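A minimal sketch of this step: compute r = tD^T, and estimate t from the observed binary responses by maximizing the independent-Bernoulli likelihood with projected gradient ascent. The optimization procedure here is an illustrative stand-in; the paper's actual estimator may be derived differently.

```python
import numpy as np

def response_probabilities(t, D):
    """r = t D^T : probability that the user responds to each tweet.
    t: (k,) topic preference vector, D: (n_tweets, k) topic distribution."""
    return t @ D.T

def estimate_topic_preference(D_obs, responses, k, lr=0.1, iters=200):
    """Rough maximum-likelihood sketch for the topic preference vector t,
    assuming independent Bernoulli responses (1 = reposted, 0 = not).
    D_obs: (w, k) topic distribution rows for the w observed tweets."""
    responses = np.asarray(responses, dtype=float)
    t = np.full(k, 1.0 / k)
    for _ in range(iters):
        r = np.clip(D_obs @ t, 1e-6, 1 - 1e-6)
        # gradient of sum_i [y_i log r_i + (1 - y_i) log(1 - r_i)]
        grad = D_obs.T @ (responses / r - (1 - responses) / (1 - r))
        t = np.clip(t + lr * grad, 0, None)
        if t.sum() > 0:
            t = t / t.sum()   # keep t a distribution over topics
    return t
```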
54 The preference factor for author u toward another author ui is defined as: piu = (#tweets from ui) / (#tweets of u) (7), which represents the proportion of tweets inherited from user ui. [sent-205, score-0.998]
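A direct reading of Eq. (7) can be coded as below; `retweet_sources` is a hypothetical list recording, for each tweet of user u, the user it was inherited from.

```python
from collections import Counter

def author_preference(retweet_sources):
    """piu = (#tweets inherited from ui) / (#tweets of u), Eq. (7) sketch.
    retweet_sources: source user for every tweet that u posted or reposted."""
    counts = Counter(retweet_sources)
    total = len(retweet_sources)
    return {ui: c / total for ui, c in counts.items()} if total else {}
```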
55 3 The Co-Ranking Algorithm So far we have described how we rank the network of tweets GM and their authors GU independently following the PageRank paradigm. [sent-210, score-0.661]
56 The latter is a bipartite graph representing which tweets are authored by which users. [sent-212, score-0.733]
57 The two intra-class random walks are coupled using the inter-class random walk on the bipartite graph. [sent-219, score-0.313]
58 This amounts to separately ranking tweets and authors by PageRank. [sent-222, score-0.701]
59 In general, λ represents the extent to which the ranking of tweets and their authors depend on each other. [sent-223, score-0.701]
60 There are two intuitions behind the co-ranking algorithm: (1) a tweet is important if it is related to other important tweets and is authored by important users, and (2) a user is important if they are related to other important users and they write important tweets. [sent-224, score-0.81]
61 Note that the tweet transition matrix M is dynamic due to the computation of diversity while the author transition matrix U is static. [sent-227, score-0.856]
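The coupled iteration can be sketched as follows: each intra-class walk (on M for tweets, U for authors) is mixed with an inter-class jump through the bipartite authorship matrix, with λ controlling how strongly the two rankings depend on each other. Normalization details and the dynamic (diversity-aware) update of M are simplified here and may differ from the paper.

```python
import numpy as np

def co_rank(M, U, MU, lam=0.5, iters=100):
    """Coupled random-walk sketch over the tweet graph (M), author graph (U)
    and bipartite authorship graph MU of shape (n_tweets, n_users).
    M and U are assumed row-stochastic."""
    n_m, n_u = MU.shape
    m = np.ones(n_m) / n_m   # tweet ranking
    u = np.ones(n_u) / n_u   # author ranking
    # column-normalized: propagate tweet scores to their authors
    MU_t2u = MU / np.clip(MU.sum(axis=0, keepdims=True), 1e-12, None)
    # row-normalized: propagate author scores back to their tweets
    MU_u2t = MU / np.clip(MU.sum(axis=1, keepdims=True), 1e-12, None)
    for _ in range(iters):
        m_new = (1 - lam) * (m @ M) + lam * (u @ MU_u2t.T)
        u_new = (1 - lam) * (u @ U) + lam * (m @ MU_t2u)
        m, u = m_new / m_new.sum(), u_new / u_new.sum()
    return m, u
```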
62 The tweet graph is an undirected weighted graph, where an edge between two tweets mi and mj represents their cosine similarity. [sent-233, score-1.2]
63 An adjacency matrix M describes the tweet graph where each entry corresponds to the weight of a link in the graph: Mij = F(mi, mj) / Σk F(mi, mk), where F(mi, mj) = (m⃗i · m⃗j) / (||m⃗i|| ||m⃗j||) (10) [sent-234, score-0.681]
64 Here F(·) is the cosine similarity and m⃗ is a term vector corresponding to tweet m. [sent-235, score-0.531]
65 We treat a tweet as a short document and weight each term with tf. [sent-236, score-0.5]
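Eq. (10) amounts to row-normalized cosine similarity between term vectors; a small sketch using plain term-frequency weighting (the exact weighting scheme is truncated in the text above):

```python
import numpy as np
from collections import Counter

def tweet_adjacency(tweet_tokens):
    """Builds the tweet adjacency matrix of Eq. (10): cosine similarity
    between term-frequency vectors, with each row normalized to sum to 1.
    tweet_tokens: list of token lists, one per tweet."""
    vocab = sorted({w for toks in tweet_tokens for w in toks})
    w_idx = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(tweet_tokens), len(vocab)))
    for i, toks in enumerate(tweet_tokens):
        for w, c in Counter(toks).items():
            X[i, w_idx[w]] = c            # tf weighting
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    F = (X / norms) @ (X / norms).T       # pairwise cosine similarities
    np.fill_diagonal(F, 0.0)
    M = F / np.clip(F.sum(axis=1, keepdims=True), 1e-12, None)
    return M
```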
66 In addition, we collected the data of their followees and followers by traversing the following edges, and exploring all newly included users in the same way until no new users were added. [sent-242, score-0.301]
67 , the tweet graph, the author graph, and the tweet-author graph), the dataset was preprocessed as follows. [sent-248, score-0.559]
68 We removed tweets of low linguistic quality and subsequently discarded users without any linkage to the remaining tweets. [sent-249, score-0.698]
69 Small λ values place little emphasis on the tweet graph, whereas larger values rely more heavily on the author graph. [sent-263, score-0.559]
70 This suggests that both sources of information, the content of the tweets and their authors, are important for the recommendation task. [sent-267, score-0.822]
71 Our second baseline ranks tweets according to token length: longer tweets are ranked higher (Length). [sent-273, score-1.149]
72 The third baseline ranks tweets by the number of times they are reposted assuming that more reposting is better (RTnum). [sent-274, score-0.674]
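The Length and RTnum baselines are trivial to reproduce, as in the sketch below; the field names (`tokens`, `retweet_count`) are hypothetical.

```python
def rank_by_length(tweets):
    """Length baseline: longer tweets are ranked higher."""
    return sorted(tweets, key=lambda t: len(t["tokens"]), reverse=True)

def rank_by_reposts(tweets):
    """RTnum baseline: more frequently reposted tweets are ranked higher."""
    return sorted(tweets, key=lambda t: t["retweet_count"], reverse=True)
```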
73 Their model (RSVM) ranks tweets based on tweet content features and tweet authority features using the RankSVM algorithm (Joachims, 1999). [sent-277, score-1.613]
74 Our fifth comparison system (DTC) was Uysal and Croft (2011) who use a decision tree classifier to judge how likely it is for a tweet to be reposted by a specific user. [sent-278, score-0.559]
75 This scenario is similar to ours when ranking tweets by retweet likelihood. [sent-279, score-0.68]
76 (2011) who use weighted linear combination (WLC) to grade the relevance of a tweet given a query. [sent-281, score-0.531]
77 We implemented their model without any query-related features as in our setting we do not discriminate tweets depending on their relevance to specific queries. [sent-282, score-0.565]
78 Specifically, we assume that if a tweet is retweeted it is relevant and is thus ranked higher over tweets that have not been reposted. [sent-286, score-1.184]
79 We used our algorithm to predict a ranking for the tweets in the test data which we then compared against a gold-standard ranking based on whether a tweet has been retweeted or not. [sent-287, score-1.377]
80 , 1: retweeted, 0: not retweeted) for the i-th tweet in the ranking list for user u. [sent-291, score-0.729]
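NDCG with binary gains can be computed as below; the log2 discount is the common formulation and may differ slightly from the paper's exact DCG definition.

```python
import numpy as np

def ndcg_at_k(relevance, k):
    """NDCG@k with binary gains (1 = retweeted, 0 = not retweeted).
    relevance: gains ordered by the system's ranking for one user."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0
```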
81 For instance, it is possible for the algorithm to recommend tweets to users with no linkage to their publishers. [sent-295, score-0.74]
82 Such tweets may be of potential interest; however, our gold-standard data can only provide information for tweets and users with following links. [sent-296, score-1.241]
83 We therefore asked the 23 users whose Twitter data formed the basis of our corpus to judge the tweets ranked by our algorithm and comparison systems. [sent-309, score-0.712]
84 The users were asked to read the systems’ recommendations and decide for every tweet presented to them whether they would retweet it or not, under the assumption that retweeting takes place when users find the tweet interesting. [sent-310, score-1.391]
85 In both automatic and human-based evaluations we ranked all tweets in the test data. [sent-311, score-0.575]
86 This is hardly surprising as it recommends tweets without any notion of their importance or user interest. [sent-318, score-0.664]
87 This might be due to the fact that informativeness is related to tweet length. [sent-332, score-0.5]
88 Using merely the number of retweets does not seem to capture the tweet importance as well as Length. [sent-333, score-0.5]
89 For example, in our data, the most frequently reposted tweet is a commercial advertisement calling for reposting! [sent-335, score-0.559]
90 Our co-ranking algorithm models user interest with respect to the content of the tweets and their publishers. [sent-344, score-0.703]
91 This indicates that users are interested in tweets that fall outside the scope of their followers and that recommendation can improve user experience. [sent-348, score-0.982]
92 We further examined the contribution of the individual components of our system to the tweet recommendation task. [sent-349, score-0.681]
93 Tables 3 and 4 show how the performance of our co-ranking algorithm varies when considering only tweet popularity using the standard PageRank algorithm, personalization (PersRank), and diversity (DivRank). [sent-350, score-0.727]
94 Note that DivRank is only applied to the tweet graph. [sent-351, score-0.5]
95 Intuitively, users are more likely to repost tweets from their followees, or tweets closely related to those retweeted previously. [sent-354, score-1.314]
96 6 Conclusions We presented a co-ranking framework for a tweet recommendation system that takes popularity, personalization and diversity into account. [sent-355, score-0.875]
97 Central to our approach is the representation of tweets and their users in a heterogeneous network and the ability to produce a global ranking that takes both information sources into account. [sent-356, score-0.863]
98 Semantic enrichment of twitter posts for user profile construction on the social web. [sent-378, score-0.342]
99 Recommending twitter users to follow using content and collaborative filtering approaches. [sent-413, score-0.321]
100 User oriented tweet ranking: a filtering approach to microblogs. [sent-471, score-0.5]
wordName wordTfidf (topN-words)
[('tweets', 0.534), ('tweet', 0.5), ('recommendation', 0.181), ('ndcg', 0.173), ('twitter', 0.145), ('users', 0.137), ('user', 0.13), ('vm', 0.13), ('gm', 0.119), ('retweeted', 0.109), ('vu', 0.108), ('graph', 0.104), ('ranking', 0.099), ('pagerank', 0.091), ('personalization', 0.086), ('gu', 0.082), ('divrank', 0.081), ('ui', 0.081), ('walk', 0.08), ('matrix', 0.077), ('popularity', 0.072), ('personalized', 0.071), ('diversity', 0.069), ('emu', 0.068), ('gmu', 0.068), ('authors', 0.068), ('walks', 0.065), ('reposted', 0.059), ('network', 0.059), ('author', 0.059), ('random', 0.058), ('dtc', 0.054), ('systemndcg', 0.054), ('bipartite', 0.052), ('tdt', 0.05), ('abel', 0.047), ('retweet', 0.047), ('preference', 0.045), ('vertex', 0.043), ('authored', 0.043), ('recommender', 0.043), ('topic', 0.043), ('recommend', 0.042), ('ranked', 0.041), ('houben', 0.041), ('piu', 0.041), ('reinforcing', 0.041), ('reposting', 0.041), ('rsvm', 0.041), ('uysal', 0.041), ('wlc', 0.041), ('diag', 0.04), ('mu', 0.04), ('ranks', 0.04), ('framework', 0.039), ('ties', 0.039), ('content', 0.039), ('map', 0.038), ('recommendations', 0.038), ('yan', 0.038), ('social', 0.037), ('transition', 0.037), ('vertices', 0.036), ('goldstandard', 0.036), ('recency', 0.035), ('recommending', 0.035), ('crawler', 0.035), ('affinity', 0.035), ('heterogeneous', 0.034), ('response', 0.034), ('um', 0.033), ('undirected', 0.032), ('uj', 0.032), ('retweeting', 0.032), ('mei', 0.032), ('relevance', 0.031), ('vector', 0.031), ('cho', 0.03), ('matrices', 0.03), ('wan', 0.03), ('posts', 0.03), ('mi', 0.03), ('fabian', 0.029), ('rui', 0.029), ('linkage', 0.027), ('duan', 0.027), ('respond', 0.027), ('arvelin', 0.027), ('carbonell', 0.027), ('damping', 0.027), ('dceaoigvrrs', 0.027), ('dcg', 0.027), ('elicited', 0.027), ('followees', 0.027), ('hannon', 0.027), ('hongyuan', 0.027), ('jianguo', 0.027), ('junghoo', 0.027), ('kek', 0.027), ('mtm', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999863 205 acl-2012-Tweet Recommendation with Graph Co-Ranking
Author: Rui Yan ; Mirella Lapata ; Xiaoming Li
Abstract: Mirella Lapata‡ Xiaoming Li†, \ ‡Institute for Language, \State Key Laboratory of Software Cognition and Computation, Development Environment, University of Edinburgh, Beihang University, Edinburgh EH8 9AB, UK Beijing 100083, China mlap@ inf .ed .ac .uk lxm@pku .edu .cn 2012.1 Twitter enables users to send and read textbased posts ofup to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources, however the proliferation of user-generation content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model out- performs competitive approaches by a large margin.
2 0.5297963 167 acl-2012-QuickView: NLP-based Tweet Search
Author: Xiaohua Liu ; Furu Wei ; Ming Zhou ; QuickView Team Microsoft
Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i.e., named entities, events, opinions, etc., from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in.
3 0.39162192 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
Author: Hao Wang ; Dogan Can ; Abe Kazemzadeh ; Francois Bar ; Shrikanth Narayanan
Abstract: This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events. In addition, sentiment analysis can help explore how these events affect public opinion. While traditional content analysis takes days or weeks to complete, the system demonstrated here analyzes sentiment in the entire Twitter traffic about the election, delivering results instantly and continuously. It offers the public, the media, politicians and scholars a new and timely perspective on the dynamics of the electoral process and public opinion. 1
4 0.2986708 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets
Author: Xiaohua Liu ; Ming Zhou ; Xiangyang Zhou ; Zhongyang Fu ; Furu Wei
Abstract: Tweets represent a critical source of fresh information, in which named entities occur frequently with rich variations. We study the problem of named entity normalization (NEN) for tweets. Two main challenges are the errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. We propose a novel graphical model to simultaneously conduct NER and NEN on multiple tweets to address these challenges. Particularly, our model introduces a binary random variable for each pair of words with the same lemma across similar tweets, whose value indicates whether the two related words are mentions of the same entity. We evaluate our method on a manually annotated data set, and show that our method outperforms the baseline that handles these two tasks separately, boosting the F1 from 80.2% to 83.6% for NER, and the Accuracy from 79.4% to 82.6% for NEN, respectively.
5 0.24262391 91 acl-2012-Extracting and modeling durations for habits and events from Twitter
Author: Jennifer Williams ; Graham Katz
Abstract: We seek to automatically estimate typical durations for events and habits described in Twitter tweets. A corpus of more than 14 million tweets containing temporal duration information was collected. These tweets were classified as to their habituality status using a bootstrapped, decision tree. For each verb lemma, associated duration information was collected for episodic and habitual uses of the verb. Summary statistics for 483 verb lemmas and their typical habit and episode durations has been compiled and made available. This automatically generated duration information is broadly comparable to hand-annotation. 1
6 0.12794209 173 acl-2012-Self-Disclosure and Relationship Strength in Twitter Conversations
7 0.11120788 98 acl-2012-Finding Bursty Topics from Microblogs
8 0.09980455 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
9 0.078983687 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
10 0.070616715 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
11 0.069947414 160 acl-2012-Personalized Normalization for a Multilingual Chat System
12 0.066777498 145 acl-2012-Modeling Sentences in the Latent Space
13 0.061221186 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
14 0.060564525 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
15 0.060259875 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
16 0.05391176 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
17 0.051691219 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
18 0.051354449 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
19 0.049741417 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
20 0.047115251 31 acl-2012-Authorship Attribution with Author-aware Topic Models
topicId topicWeight
[(0, -0.171), (1, 0.187), (2, 0.107), (3, 0.107), (4, 0.048), (5, -0.148), (6, 0.569), (7, 0.116), (8, 0.18), (9, 0.28), (10, 0.101), (11, -0.041), (12, 0.081), (13, 0.05), (14, 0.097), (15, 0.055), (16, -0.035), (17, -0.044), (18, 0.074), (19, 0.007), (20, 0.044), (21, -0.078), (22, 0.009), (23, 0.018), (24, 0.008), (25, -0.039), (26, -0.056), (27, -0.018), (28, 0.0), (29, 0.014), (30, -0.078), (31, 0.012), (32, 0.021), (33, -0.061), (34, -0.043), (35, -0.034), (36, -0.018), (37, -0.009), (38, -0.006), (39, -0.01), (40, 0.031), (41, 0.025), (42, 0.031), (43, 0.013), (44, 0.016), (45, 0.001), (46, 0.033), (47, -0.031), (48, -0.012), (49, -0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.96471667 205 acl-2012-Tweet Recommendation with Graph Co-Ranking
Author: Rui Yan ; Mirella Lapata ; Xiaoming Li
Abstract: Mirella Lapata‡ Xiaoming Li†, \ ‡Institute for Language, \State Key Laboratory of Software Cognition and Computation, Development Environment, University of Edinburgh, Beihang University, Edinburgh EH8 9AB, UK Beijing 100083, China mlap@ inf .ed .ac .uk lxm@pku .edu .cn 2012.1 Twitter enables users to send and read textbased posts ofup to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources, however the proliferation of user-generation content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model out- performs competitive approaches by a large margin.
2 0.92188936 167 acl-2012-QuickView: NLP-based Tweet Search
Author: Xiaohua Liu ; Furu Wei ; Ming Zhou ; QuickView Team Microsoft
Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i.e., named entities, events, opinions, etc., from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in.
3 0.75514334 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets
Author: Xiaohua Liu ; Ming Zhou ; Xiangyang Zhou ; Zhongyang Fu ; Furu Wei
Abstract: Tweets represent a critical source of fresh information, in which named entities occur frequently with rich variations. We study the problem of named entity normalization (NEN) for tweets. Two main challenges are the errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. We propose a novel graphical model to simultaneously conduct NER and NEN on multiple tweets to address these challenges. Particularly, our model introduces a binary random variable for each pair of words with the same lemma across similar tweets, whose value indicates whether the two related words are mentions of the same entity. We evaluate our method on a manually annotated data set, and show that our method outperforms the baseline that handles these two tasks separately, boosting the F1 from 80.2% to 83.6% for NER, and the Accuracy from 79.4% to 82.6% for NEN, respectively.
4 0.70387018 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
Author: Hao Wang ; Dogan Can ; Abe Kazemzadeh ; Francois Bar ; Shrikanth Narayanan
Abstract: This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events. In addition, sentiment analysis can help explore how these events affect public opinion. While traditional content analysis takes days or weeks to complete, the system demonstrated here analyzes sentiment in the entire Twitter traffic about the election, delivering results instantly and continuously. It offers the public, the media, politicians and scholars a new and timely perspective on the dynamics of the electoral process and public opinion. 1
5 0.64915454 91 acl-2012-Extracting and modeling durations for habits and events from Twitter
Author: Jennifer Williams ; Graham Katz
Abstract: We seek to automatically estimate typical durations for events and habits described in Twitter tweets. A corpus of more than 14 million tweets containing temporal duration information was collected. These tweets were classified as to their habituality status using a bootstrapped, decision tree. For each verb lemma, associated duration information was collected for episodic and habitual uses of the verb. Summary statistics for 483 verb lemmas and their typical habit and episode durations has been compiled and made available. This automatically generated duration information is broadly comparable to hand-annotation. 1
6 0.49711347 173 acl-2012-Self-Disclosure and Relationship Strength in Twitter Conversations
7 0.33259726 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
8 0.21053542 70 acl-2012-Demonstration of IlluMe: Creating Ambient According to Instant Message Logs
9 0.20743974 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
10 0.20733979 98 acl-2012-Finding Bursty Topics from Microblogs
11 0.20082669 160 acl-2012-Personalized Normalization for a Multilingual Chat System
12 0.1939732 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
13 0.19343339 42 acl-2012-Bootstrapping via Graph Propagation
14 0.17960954 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
15 0.17744686 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords
16 0.17375411 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
17 0.17014024 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
18 0.16840215 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
19 0.16451222 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
20 0.15854552 153 acl-2012-Named Entity Disambiguation in Streaming Data
topicId topicWeight
[(25, 0.011), (26, 0.027), (28, 0.027), (30, 0.024), (37, 0.04), (39, 0.056), (52, 0.013), (74, 0.02), (82, 0.028), (84, 0.016), (85, 0.022), (90, 0.097), (92, 0.483), (94, 0.019), (99, 0.039)]
simIndex simValue paperId paperTitle
1 0.95692652 78 acl-2012-Efficient Search for Transformation-based Inference
Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner
Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task is attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a locallookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.
2 0.94433707 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
Author: Tsung-Ting Kuo ; San-Chuan Hung ; Wei-Shih Lin ; Nanyun Peng ; Shou-De Lin ; Wei-Fen Lin
Abstract: This paper brings a marriage of two seemly unrelated topics, natural language processing (NLP) and social network analysis (SNA). We propose a new task in SNA which is to predict the diffusion of a new topic, and design a learning-based framework to solve this problem. We exploit the latent semantic information among users, topics, and social connections as features for prediction. Our framework is evaluated on real data collected from public domain. The experiments show 16% AUC improvement over baseline methods. The source code and dataset are available at http://www.csie.ntu.edu.tw/~d97944007/dif fusion/ 1 Background The diffusion of information on social networks has been studied for decades. Generally, the proposed strategies can be categorized into two categories, model-driven and data-driven. The model-driven strategies, such as independent cascade model (Kempe et al., 2003), rely on certain manually crafted, usually intuitive, models to fit the diffusion data without using diffusion history. The data-driven strategies usually utilize learning-based approaches to predict the future propagation given historical records of prediction (Fei et al., 2011; Galuba et al., 2010; Petrovic et al., 2011). Data-driven strategies usually perform better than model-driven approaches because the past diffusion behavior is used during learning (Galuba et al., 2010). Recently, researchers started to exploit content information in data-driven diffusion models (Fei et al., 2011; Petrovic et al., 2011; Zhu et al., 2011). 344 However, most of the data-driven approaches assume that in order to train a model and predict the future diffusion of a topic, it is required to obtain historical records about how this topic has propagated in a social network (Petrovic et al., 2011; Zhu et al., 2011). We argue that such assumption does not always hold in the real-world scenario, and being able to forecast the propagation of novel or unseen topics is more valuable in practice. For example, a company would like to know which users are more likely to be the source of ‘viva voce’ of a newly released product for advertising purpose. A political party might want to estimate the potential degree of responses of a half-baked policy before deciding to bring it up to public. To achieve such goal, it is required to predict the future propagation behavior of a topic even before any actual diffusion happens on this topic (i.e., no historical propagation data of this topic are available). Lin et al. also propose an idea aiming at predicting the inference of implicit diffusions for novel topics (Lin et al., 2011). The main difference between their work and ours is that they focus on implicit diffusions, whose data are usually not available. Consequently, they need to rely on a model-driven approach instead of a datadriven approach. On the other hand, our work focuses on the prediction of explicit diffusion behaviors. Despite the fact that no diffusion data of novel topics is available, we can still design a data- driven approach taking advantage of some explicit diffusion data of known topics. Our experiments show that being able to utilize such information is critical for diffusion prediction. 2 The Novel-Topic Diffusion Model We start by assuming an existing social network G = (V, E), where V is the set of nodes (or user) v, and E is the set of link e. 
The set of topics is Proce dJienjgus, R ofep thueb 5lic0t hof A Knonruea ,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fso rc Ciatoiomnp fuotart Cio nmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi3c 4s4–348, denoted as T. Among them, some are considered as novel topics (denoted as N), while the rest (R) are used as the training records. We are also given a set of diffusion records D = {d | d = (src, dest, t) }, where src is the source node (or diffusion source), dest is the destination node, and t is the topic of the diffusion that belongs to R but not N. We assume that diffusions cannot occur between nodes without direct social connection; any diffusion pair implies the existence of a link e = (src, dest) ∈ E. Finally, we assume there are sets of keywords or tags that relevant to each topic (including existing and novel topics). Note that the set of keywords for novel topics should be seen in that of existing topics. From these sets of keywords, we construct a topicword matrix TW = (P(wordj | topici))i,j of which the elements stand for the conditional probabilities that a word appears in the text of a certain topic. Similarly, we also construct a user-word matrix UW= (P(wordj | useri))i,j from these sets of keywords. Given the above information, the goal is to predict whether a given link is active (i.e., belongs to a diffusion link) for topics in N. 2.1 The Framework The main challenge of this problem lays in that the past diffusion behaviors of new topics are missing. To address this challenge, we propose a supervised diffusion discovery framework that exploits the latent semantic information among users, topics, and their explicit / implicit interactions. Intuitively, four kinds of information are useful for prediction: • Topic information: Intuitively, knowing the signatures of a topic (e.g., is it about politics?) is critical to the success of the prediction. • User information: The information of a user such as the personality (e.g., whether this user is aggressive or passive) is generally useful. • User-topic interaction: Understanding the users' preference on certain topics can improve the quality of prediction. • Global information: We include some global features (e.g., topology info) of social network. Below we will describe how these four kinds of information can be modeled in our framework. 2.2 Topic Information We extract hidden topic category information to model topic signature. In particular, we exploit the 345 Latent Dirichlet Allocation (LDA) method (Blei et al., 2003), which is a widely used topic modeling technique, to decompose the topic-word matrix TW into hidden topic categories: TW = TH * HW , where TH is a topic-hidden matrix, HW is hiddenword matrix, and h is the manually-chosen parameter to determine the size of hidden topic categories. TH indicates the distribution of each topic to hidden topic categories, and HW indicates the distribution of each lexical term to hidden topic categories. Note that TW and TH include both existing and novel topics. We utilize THt,*, the row vector of the topic-hidden matrix TH for a topic t, as a feature set. In brief, we apply LDA to extract the topic-hidden vector THt,* to model topic signature (TG) for both existing and novel topics. Topic information can be further exploited. To predict whether a novel topic will be propagated through a link, we can first enumerate the existing topics that have been propagated through this link. 
For each such topic, we can calculate its similarity with the new topic based on the hidden vectors generated above (e.g., using cosine similarity between feature vectors). Then, we sum up the similarity values as a new feature: topic similarity (TS). For example, a link has previously propagated two topics for a total of three times {ACL, KDD, ACL}, and we would like to know whether a new topic, EMNLP, will propagate through this link. We can use the topic-hidden vector to generate the similarity values between EMNLP and the other topics (e.g., {0.6, 0.4, 0.6}), and then sum them up (1.6) as the value of TS. 2.3 User Information Similar to topic information, we extract latent personal information to model user signature (the users are anonymized already). We apply LDA on the user-word matrix UW: UW = UM * MW , where UM is the user-hidden matrix, MW is the hidden-word matrix, and m is the manually-chosen size of hidden user categories. UM indicates the distribution of each user to the hidden user categories (e.g., age). We then use UMu,*, the row vector of UM for the user u, as a feature set. In brief, we apply LDA to extract the user-hidden vector UMu,* for both source and destination nodes of a link to model user signature (UG). 2.4 User-Topic Interaction Modeling user-topic interaction turns out to be non-trivial. It is not useful to exploit latent semantic analysis directly on the user-topic matrix UR = UQ * QR , where UR represents how many times each user is diffused for existing topic R (R ∈ T), because UR does not contain information of novel topics, and neither do UQ and QR. Given no propagation record about novel topics, we propose a method that allows us to still extract implicit user-topic information. First, we extract from the matrix TH (described in Section 2.2) a subset RH that contains only information about existing topics. Next we apply left division to derive another userhidden matrix UH: UH = (RH \ URT)T = ((RHT RH )-1 RHT URT)T Using left division, we generate the UH matrix using existing topic information. Finally, we exploit UHu,*, the row vector of the user-hidden matrix UH for the user u, as a feature set. Note that novel topics were included in the process of learning the hidden topic categories on RH; therefore the features learned here do implicitly utilize some latent information of novel topics, which is not the case for UM. Experiments confirm the superiority of our approach. Furthermore, our approach ensures that the hidden categories in topic-hidden and user-hidden matrices are identical. Intuitively, our method directly models the user’s preference to topics’ signature (e.g., how capable is this user to propagate topics in politics category?). In contrast, the UM mentioned in Section 2.3 represents the users’ signature (e.g., aggressiveness) and has nothing to do with their opinions on a topic. In short, we obtain the user-hidden probability vector UHu,* as a feature set, which models user preferences to latent categories (UPLC). 2.5 Global Features Given a candidate link, we can extract global social features such as in-degree (ID) and outdegree (OD). We tried other features such as PageRank values but found them not useful. Moreover, we extract the number of distinct topics (NDT) for a link as a feature. The intuition behind this is that the more distinct topics a user has diffused to another, the more likely the diffusion will happen for novel topics. 
346 2.6 Complexity Analysis The complexity to produce each feature is as below: (1) Topic information: O(I * |T| * h * Bt) for LDA using Gibbs sampling, where Iis # of the iterations in sampling, |T| is # of topics, and Bt is the average # of tokens in a topic. (2) User information: O(I * |V| * m * Bu) , where |V| is # of users, and Bu is the average # of tokens for a user. (3) User-topic interaction: the time complexity is O(h3 + h2 * |T| + h * |T| * |V|). (4) Global features: O(|D|), where |D| is # of diffusions. 3 Experiments For evaluation, we try to use the diffusion records of old topics to predict whether a diffusion link exists between two nodes given a new topic. 3.1 Dataset and Evaluation Metric We first identify 100 most popular topic (e.g., earthquake) from the Plurk micro-blog site between 01/201 1 and 05/201 1. Plurk is a popular micro-blog service in Asia with more than 5 million users (Kuo et al., 2011). We manually separate the 100 topics into 7 groups. We use topic-wise 4-fold cross validation to evaluate our method, because there are only 100 available topics. For each group, we select 3/4 of the topics as training and 1/4 as validation. The positive diffusion records are generated based on the post-response behavior. That is, if a person x posts a message containing one of the selected topic t, and later there is a person y responding to this message, we consider a diffusion of t has occurred from x to y (i.e., (x, y, t) is a positive instance). Our dataset contains a total of 1,642,894 positive instances out of 100 distinct topics; the largest and smallest topic contains 303,424 and 2,166 diffusions, respectively. Also, the same amount of negative instances for each topic (totally 1,642,894) is sampled for binary classification (similar to the setup in KDD Cup 2011 Track 2). The negative links of a topic t are sampled randomly based on the absence of responses for that given topic. The underlying social network is created using the post-response behavior as well. We assume there is an acquaintance link between x and y if and only if x has responded to y (or vice versa) on at least one topic. Eventually we generated a social network of 163,034 nodes and 382,878 links. Furthermore, the sets of keywords for each topic are required to create the TW and UW matrices for latent topic analysis; we simply extract the content of posts and responses for each topic to create both matrices. We set the hidden category number h = m = 7, which is equal to the number of topic groups. We use area under ROC curve (AUC) to evaluate our proposed framework (Davis and Goadrich, 2006); we rank the testing instances based on their likelihood of being positive, and compare it with the ground truth to compute AUC. 3.2 Implementation and Baseline After trying many classifiers and obtaining similar results for all of them, we report only results from LIBLINEAR with c=0.0001 (Fan et al., 2008) due to space limitation. We remove stop-words, use SCWS (Hightman, 2012) for tokenization, and MALLET (McCallum, 2002) and GibbsLDA++ (Phan and Nguyen, 2007) for LDA. There are three baseline models we compare the result with. First, we simply use the total number of existing diffusions among all topics between two nodes as the single feature for prediction. Second, we exploit the independent cascading model (Kempe et al., 2003), and utilize the normalized total number of diffusions as the propagation probability of each link. 
Third, we try the heat diffusion model (Ma et al., 2008), set initial heat proportional to out-degree, and tune the diffusion time parameter until the best results are obtained. Note that we did not compare with any data-driven approaches, as we have not identified one that can predict diffusion of novel topics. 3.3 Results The result of each model is shown in Table 1. All except two features outperform the baseline. The best single feature is TS. Note that UPLC performs better than UG, which verifies our hypothesis that maintaining the same hidden features across different LDA models is better. We further conduct experiments to evaluate different combinations of features (Table 2), and found that the best one (TS + ID + NDT) results in about 16% improvement over the baseline, and outperforms the combination of all features. As stated in (Witten et al., 2011), 347 adding useless features may cause the performance of classifiers to deteriorate. Intuitively, TS captures both latent topic and historical diffusion information, while ID and NDT provide complementary social characteristics of users. 4 Conclusions The main contributions of this paper are as below: 1. We propose a novel task of predicting the diffusion of unseen topics, which has wide applications in real-world. 2. Compared to the traditional model-driven or content-independent data-driven works on diffusion analysis, our solution demonstrates how one can bring together ideas from two different but promising areas, NLP and SNA, to solve a challenging problem. 3. Promising experiment result (74% in AUC) not only demonstrates the usefulness of the proposed models, but also indicates that predicting diffusion of unseen topics without historical diffusion data is feasible. Acknowledgments This work was also supported by National Science Council, National Taiwan University and Intel Corporation under Grants NSC 100-291 1-I-002-001, and 101R7501. References David M. Blei, Andrew Y. Ng & Michael I. Jordan. 2003. Latent dirichlet allocation. J. Mach. Learn. Res., 3.993-1022. Jesse Davis & Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd international conference on Machine learning, Pittsburgh, Pennsylvania. Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, XiangRui Wang & Chih-Jen Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res., 9.1871-74. Hongliang Fei, Ruoyi Jiang, Yuhao Yang, Bo Luo & Jun Huan. 2011. Content based social behavior prediction: a multi-task learning approach. Proceedings of the 20th ACM international conference on Information and knowledge management, Glasgow, Scotland, UK. Wojciech Galuba, Karl Aberer, Dipanjan Chakraborty, Zoran Despotovic & Wolfgang Kellerer. 2010. Outtweeting the twitterers - predicting information cascades in microblogs. Proceedings of the 3rd conference on Online social networks, Boston, MA. Hightman. 2012. Simple Chinese Words Segmentation (SCWS). David Kempe, Jon Kleinberg & Eva Tardos. 2003. Maximizing the spread of influence through a social network. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, Washington, D.C. Tsung-Ting Kuo, San-Chuan Hung, Wei-Shih Lin, Shou-De Lin, Ting-Chun Peng & Chia-Chun Shih. 2011. Assessing the Quality of Diffusion Models Using Real-World Social Network Data. Conference on Technologies and Applications of Artificial Intelligence, 2011. C.X. Lin, Q.Z. Mei, Y.L. Jiang, J.W. Han & S.X. Qi. 2011. 
Inferring the Diffusion and Evolution of Topics in Social Communities. Proceedings of the IEEE International Conference on Data Mining, 2011. Hao Ma, Haixuan Yang, Michael R. Lyu & Irwin King. 2008. Mining social networks using heat diffusion processes for marketing candidates selection. Proceeding of the 17th ACM conference on Information and knowledge management, Napa Valley, California, USA. Andrew Kachites McCallum. 2002. MALLET: A Machine Learning for Language Toolkit. Sasa Petrovic, Miles Osborne & Victor Lavrenko. 2011. RT to Win! Predicting Message Propagation in Twitter. International AAAI Conference on Weblogs and Social Media, 2011. 348 Xuan-Hieu Phan & Cam-Tu Nguyen. 2007. GibbsLDA++: A C/C++ implementation of latent Dirichlet allocation (LDA). Ian H. Witten, Eibe Frank & Mark A. Hall. 2011. Data Mining: Practical machine learning tools and techniques. San Francisco: Morgan Kaufmann Publishers Inc. Jiang Zhu, Fei Xiong, Dongzhen Piao, Yun Liu & Ying Zhang. 2011. Statistically Modeling the Effectiveness of Disaster Information in Social Media. Proceedings of the 2011 IEEE Global Humanitarian Technology Conference.
3 0.93168133 154 acl-2012-Native Language Detection with Tree Substitution Grammars
Author: Benjamin Swanson ; Eugene Charniak
Abstract: We investigate the potential of Tree Substitution Grammars as a source of features for native language detection, the task of inferring an author’s native language from text in a different language. We compare two state of the art methods for Tree Substitution Grammar induction and show that features from both methods outperform previous state of the art results at native language detection. Furthermore, we contrast these two induction algorithms and show that the Bayesian approach produces superior classification results with a smaller feature set.
same-paper 4 0.92177629 205 acl-2012-Tweet Recommendation with Graph Co-Ranking
Author: Rui Yan ; Mirella Lapata ; Xiaoming Li
Abstract: Mirella Lapata‡ Xiaoming Li†, \ ‡Institute for Language, \State Key Laboratory of Software Cognition and Computation, Development Environment, University of Edinburgh, Beihang University, Edinburgh EH8 9AB, UK Beijing 100083, China mlap@ inf .ed .ac .uk lxm@pku .edu .cn 2012.1 Twitter enables users to send and read textbased posts ofup to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources, however the proliferation of user-generation content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model out- performs competitive approaches by a large margin.
5 0.90528697 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum
Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.
6 0.69990915 31 acl-2012-Authorship Attribution with Author-aware Topic Models
7 0.699844 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
8 0.69821161 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
9 0.68770069 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
10 0.67293262 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
11 0.63428938 98 acl-2012-Finding Bursty Topics from Microblogs
12 0.62607747 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
13 0.59788173 167 acl-2012-QuickView: NLP-based Tweet Search
14 0.59541565 79 acl-2012-Efficient Tree-Based Topic Modeling
15 0.59485561 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
16 0.59259868 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
19 0.56932473 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
20 0.55924356 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars