acl acl2012 acl2012-167 knowledge-graph by maker-knowledge-mining

167 acl-2012-QuickView: NLP-based Tweet Search


Source: pdf

Author: Xiaohua Liu ; Furu Wei ; Ming Zhou ; QuickView Team Microsoft

Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get the information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, and tweet classification, to extract useful information (i.e., named entities, events, opinions, etc.) from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or the fine-grained information they are interested in.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. [sent-3, score-0.71]

2 We present QuickView, an NLP-based tweet search platform to tackle this issue. [sent-4, score-0.509]

3 Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i. [sent-5, score-0.978]

4 Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i. [sent-9, score-0.139]

5 , categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in. [sent-11, score-0.762]

6 However, users often have difficulty finding information they are interested in from tweets, because of the huge number of tweets as well as their noisy and informal nature. [sent-13, score-0.674]

7 , Twitter, is a kind of service aiming to tackle this issue. [sent-16, score-0.027]

8 For example, in Twitter, only a simple keyword-based search is supported. [sent-18, score-0.065]

9 This demonstration introduces QuickView, which employs a series of NLP technologies to extract useful information from a large volume of tweets. [sent-20, score-0.046]

10 Specifically, for each tweet, it first conducts normalization, followed by named entity recognition (NER). [sent-21, score-0.125]

11 Then it conducts semantic role labeling (SRL) to get predicate-argument structures, which are further converted into events, i. [sent-22, score-0.146]

12 Finally, non-noisy tweets together with the mined information are indexed. [sent-29, score-0.577]

13 On top of the index, QuickView enables two brand new scenarios, allowing users to effectively access the tweets or fine-grained information mined from tweets. [sent-30, score-0.679]

14 As illustrated in Figure 1(a), QuickView shows recent popular tweets, entities, events, opinions and so on, which are organized by categories. [sent-32, score-0.083]

15 It also extracts and classifies URL links in tweets and allows users to check out popular links in a categorized way. [sent-33, score-0.642]

16 , search tweets containing any positive/negative opinion about “Obama” or any event involving “Obama”. [sent-40, score-0.636]

17 The implementation of QuickView requires adapting existing NLP components trained on formal texts, which often perform poorly on tweets. [sent-41, score-0.108]

18 However, the adaptation of those components is challenging, owing to the lack of annotated tweets and the inadequate signals provided by a noisy and short tweet. [sent-49, score-0.755]

19 Our general strategy is to leverage existing resources as well as unsupervised or semi-supervised learning methods to reduce the labeling effort, and to aggregate as much evidence as possible from a broader context to compensate for the lack of information in a tweet. [sent-50, score-0.128]

20 This strategy is embodied by various components we have developed. [sent-51, score-0.071]

21 For example, our NER component combines a k-nearest neighbors (KNN) classifier, which collects global information across recently labeled tweets, with a Conditional Random Fields (CRF) labeler, which exploits information from a single tweet and the gazetteers. [sent-52, score-1.083]

22 Both the KNN classifier and the CRF labeler are repeatedly retrained using the results that they have confidently labeled. [sent-53, score-0.207]

23 The SRL component caches and clusters recently labeled tweets, and aggregates information from the cluster containing the tweet. [sent-54, score-0.15]

24 Similarly, the classifier considers not only the current tweet but also its neighbors in a tweet graph, where two tweets are connected if they are similar in content or have a tweet/retweet relationship. [sent-55, score-1.423]

25 Experimental results on a human-annotated dataset also indicate the effectiveness of our adaptation strategy. [sent-57, score-0.071]

26 We demonstrate QuickView, an NLP-based tweet search platform. [sent-60, score-0.417]

27 Unlike existing methods, it exploits a series of NLP technologies to extract useful information from a large volume of tweets, and enables categorized browsing and advanced search scenarios, allowing users to efficiently access the information they are interested in from tweets. [sent-61, score-0.392]

28 We present core components of QuickView, focusing on how to leverage existing resources and technologies as well as how to make up for the limited information in a short and often noisy tweet by aggregating information from a broader context. [sent-63, score-0.708]

29 However, unlike existing IE systems, such as Evita (Saurí et al. [sent-71, score-0.037]

30 , 2005), a robust event recognizer for QA systems, and SRES (Rozenfeld and Feldman, 2008), a self-supervised relation extractor for the web, it targets tweets, a new genre of text that is short and informal, and its focus is on adapting existing IE components to tweets. [sent-72, score-0.159]

31 A couple of tweet search services exist, including Twitter, Bing social search, and Google social search. [sent-74, score-0.637]

32 Most of them provide only keyword-based search interfaces, i. [sent-75, score-0.065]

33 , returning a list of tweets related to a given word/phrase. [sent-77, score-0.543]

34 In contrast, our system extracts fine-grained information from tweets and allows a new end-to-end search experience beyond keyword search, such as clustering of search results and search with events/opinions. [sent-78, score-0.738]

35 At the heart of our system is the re-use of existing resources, methodologies as [sent-83, score-0.068]

36 (a) A screenshot of the categorized browsing scenario. [sent-87, score-0.142]

37 (b) A screenshot of the advanced search scenario. [sent-88, score-0.126]

38 well as components, and their adaptation to tweets. [sent-90, score-0.048]

39 3 System Description. We first give an overview of our system, and then present NER and SRL in more detail as two representative core components to illustrate the adaptation process. [sent-92, score-0.099]

40 QuickView can be divided into four parts, as illustrated in Figure 2. [sent-95, score-0.027]

41 The first part includes a crawler and a buffer of raw tweets. [sent-96, score-0.134]

42 The crawler repeatedly downloads tweets using the Twitter APIs, and then pre-filters noisy tweets using some heuristic rules, e. [sent-97, score-1.194]

43 , removing a tweet if it is too short, say, less than 3 words, or if it contains any predefined banned word. [sent-99, score-0.417]

44 At the moment, we focus on English tweets, so non-English tweets are filtered out as well. [sent-100, score-0.543]

45 The second part consists of several tweet extraction pipelines. [sent-102, score-0.44]

46 Each pipeline has the same configuration: it constantly fetches a tweet from the raw tweet buffer and runs the following processes sequentially: 1) normalization; 2) parsing, including part-of-speech (POS) tagging, chunking, and dependency parsing; 3) NER; 4) SRL; 5) SA; and 6) classification. [sent-103, score-0.861]

47 In the future, the parsing model will be retrained on annotated tweets. [sent-109, score-0.023]

48 The SA component is implemented according to Jiang et al. [sent-110, score-0.043]

49 (2011), which incorporates target-dependent features and considers related tweets by utilizing a graph-based optimization. [sent-111, score-0.543]

50 The classification model is a KNN-based classifier that caches confidently labeled results to retrain itself; it also recognizes and drops noisy tweets. [sent-112, score-0.206]

51 Each processed tweet, if not identified as noise, is put into a shared buffer for indexing. [sent-119, score-0.125]

52 The third part is responsible for indexing and querying. [sent-120, score-0.023]

53 It constantly takes from the indexing buffer a processed tweet, which is then indexed with various entries including words, phrases, metadata (e. [sent-121, score-0.222]

54 , source, publish time, and account), named entities, events, and opinions. [sent-123, score-0.035]

55 On top of this, it answers search requests and returns a list of matched results, each containing both the original tweet and the information extracted from it. [sent-124, score-0.513]

56 This part also maintains a cache of recent processed tweets, from which the following information is extracted and indexed: 1) top tweets; 2) top entities/events/opinions in tweets; and 3) top accounts. [sent-126, score-0.12]

57 Whether a tweet, entity, event, or opinion ranks among the top depends on how many times it is retweeted or mentioned, as well as on its publisher; whether an account ranks among the top depends on its number of followers and tweets. [sent-127, score-0.062]

58 The fourth part is a web application that returns related information to end users according to their browsing or search request. [sent-128, score-0.163]

59 The web application follows the model-view-controller pattern so that other kinds of user interfaces, e. [sent-129, score-0.029]

60 QuickView is deployed on 5 workstations, including 2 processing pipelines, as illustrated in Table 1. [sent-133, score-0.054]

61 01 seconds to process each tweet, and in total about 10 million tweets are indexed every day. [sent-136, score-0.59]

62 2 Core Components. Because of limited space, we discuss only two core components of QuickView: NER and SRL. [sent-139, score-0.122]

63 33GHz, 4 GB of RAM, OS: Windows Server 2003 Enterprise X64 version. Table 1: Current deployment of QuickView. [sent-147, score-0.036]

64 Table 1 (workstation: hosted components):
#1: Crawler, raw tweet buffer
#2, #3: Processing pipeline
#4: Indexing buffer, indexer/querier
#5: Web application
the rule-based (Krupka and Hausman, 1998); 2) the machine-learning-based (Finkel and Manning, 2009; Singh et al. [sent-148, score-0.586]

65 With the availability of annotated corpora such as ACE05, Enron, and CoNLL03, data-driven methods have become dominant. [sent-150, score-0.023]

66 Firstly, it defines those recently labeled tweets that are similar to the current tweet as its recognition context, under which a KNN-based classifier is used to conduct word-level classification. [sent-153, score-1.015]

67 Following the two-stage prediction aggregation method (Krishnan and Manning, 2006), such pre-labeled results, together with other conventional features used by state-of-the-art NER systems, are fed into a linear CRF model, which conducts fine-grained tweet-level NER. [sent-154, score-0.482]

68 Secondly, the KNN and CRF models are repeatedly retrained with an incrementally augmented training set, into which tweets labeled with high confidence are added. [sent-155, score-0.716]

69 Finally, following Lev Ratinov and Dan Roth (2009), 30 gazetteers are used, which cover common names, countries, locations, temporal expressions, etc. [sent-156, score-0.033]

70 These gazetteers represent general knowledge across domains, and help to make up for the lack of training data. [sent-157, score-0.033]

71 Given a sentence, the SRL component identifies every predicate, and for each predicate further identifies its arguments. [sent-159, score-0.154]

72 This task has been extensively studied on well-written corpora like news, and a couple of solutions exist. [sent-160, score-0.025]

73 , dividing the task into several successive components such as argument identification, argument classification, global inference, etc. [sent-163, score-0.131]

74 , 2005); 2) the sequential-labeling-based approach (Màrquez et al. [sent-165, score-0.08]

75 , labeling the words according to their positions relative to an argument (i. [sent-168, score-0.075]

76 Unsurprisingly, the performance of the state-of-the-art SRL system (Meza-Ruiz and Riedel, 2009) drops sharply when applied to tweets. [sent-173, score-0.03]

77 The SRL component of QuickView is based on CRF, and uses the recently labeled tweets that are similar to the current tweet as the broader context. [sent-174, score-1.081]

78 To prepare the initial clusters required by the SRL component as its input, we adopt the predicate-argument mapping method (Liu et al. [sent-176, score-0.043]

79 , 2010) to get some automatically labeled tweets, which (plus the manually labeled tweets) are then organized into groups using a bottom-up clustering procedure. [sent-177, score-0.093]

80 We manually labeled the POS, (Algorithm 1: SRL of QuickView.) [sent-186, score-0.032]

81 2: while (pop a tweet t from i) and t ≠ null do; 3: put t into a cluster c: c = cluster(cl, t) [sent-189, score-0.417]

82 NER, SRL and SA information for about 10,000 tweets, based on which the NER and SRL components are evaluated. [sent-196, score-0.071]

83 Experimental results show that: 1) our NER component achieves an average F1 of 80. [sent-197, score-0.043]

84 4% of the baseline, which is a CRF-based system similar to Ratinov and Roth’s (2009) but re-trained on annotated tweets; and 2) our SRL component gets an F1 of 59. [sent-199, score-0.066]

85 3%), which is trained on automatically annotated news tweets (tweets reporting news). [sent-203, score-0.566]

86 5 Conclusions and Future Work. We have described the motivation, scenarios, architecture, deployment, and implementation of QuickView, an NLP-based tweet search platform. [sent-204, score-0.453]

87 At the heart of QuickView is the adaptation of existing NLP technologies, e. [sent-205, score-0.116]

88 We have illustrated our strategy to tackle this challenging task, i. [sent-208, score-0.054]

89 , leveraging existing resources and aggregating as much information as possible from a broader context, using NER and SRL as case studies. [sent-210, score-0.112]

90 Preliminary positive feedback suggests the usefulness of QuickView and its advantages over existing tweet search services. [sent-211, score-0.519]

91 Experimental results on a human-annotated dataset indicate the effectiveness of our adaptation strategy. [sent-212, score-0.071]

92 We are improving the quality of the core components of QuickView by labeling more tweets and exploring alternative models. [sent-213, score-0.639]

93 Incorporating non-local information into information extraction systems by Gibbs sampling. [sent-223, score-0.023]

94 An effective two-stage model for exploiting non-local dependencies in named entity recognition. [sent-241, score-0.06]

95 IsoQuest: Description of the NetOwl™ extractor system as used in MUC-7. [sent-246, score-0.023]
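Sentences 42 to 44 of the summary describe the crawler's pre-filtering heuristics: drop a tweet if it has fewer than 3 words, contains a predefined banned word, or is not in English. The following is a minimal Python sketch of such a filter. The banned-word list is hypothetical, and the ASCII-ratio check is a crude stand-in for language identification that is assumed here, not QuickView's actual method.

```python
# Hypothetical banned-word list: the summary only says such a list is predefined.
BANNED_WORDS = {"spamword", "buynow"}
MIN_WORDS = 3  # "too short, say, less than 3 words" (sentence 43)


def looks_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Crude stand-in for language identification (an assumption, not QuickView's method)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def keep_tweet(text: str) -> bool:
    """Return True if the tweet survives the pre-filtering heuristics."""
    words = text.split()
    if len(words) < MIN_WORDS:
        return False
    normalized = {w.lower().strip(".,!?:;\"'") for w in words}
    if normalized & BANNED_WORDS:
        return False
    return looks_english(text)


if __name__ == "__main__":
    raw = [
        "lol",
        "Obama speaks in Chicago tonight",
        "buynow cheap followers here",
        "Обама выступает сегодня вечером",
    ]
    print([t for t in raw if keep_tweet(t)])  # ['Obama speaks in Chicago tonight']
```

A production system would replace `looks_english` with a proper language identifier; the point here is only the order of cheap checks applied before any NLP processing.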
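Sentence 46 of the summary lists the fixed stage order of each processing pipeline, and sentences 50 and 51 say that tweets classified as noise are dropped while the rest go to a shared indexing buffer. A minimal single-threaded Python sketch of that flow follows. Every stage is a placeholder, and the noise rule (a tweet made only of URLs and @mentions) is a hypothetical assumption; the real pipelines run in parallel on separate workstations.

```python
import queue
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]


def make_stage(name: str) -> Stage:
    """Placeholder for a real component (normalizer, parser, NER, ...): it only logs its name."""
    def stage(record: Record) -> Record:
        record.setdefault("stages", []).append(name)
        return record
    return stage


def classify(record: Record) -> Record:
    """Placeholder classifier: flags a tweet as noise if it is only URLs and @mentions (assumed rule)."""
    record.setdefault("stages", []).append("classification")
    tokens = str(record["text"]).split()
    record["noise"] = all(t.startswith(("http", "@")) for t in tokens)
    return record


# Stage order from sentence 46: normalization, parsing, NER, SRL, SA, classification.
PIPELINE: List[Stage] = [
    make_stage(name) for name in ("normalization", "parsing", "ner", "srl", "sa")
] + [classify]


def run_pipeline(raw: queue.Queue, index_buffer: queue.Queue) -> None:
    """Drain the raw tweet buffer, run each stage in order, forward non-noise tweets."""
    while True:
        try:
            record = raw.get_nowait()
        except queue.Empty:
            return
        for stage in PIPELINE:
            record = stage(record)
        if not record["noise"]:
            index_buffer.put(record)
```

Keeping stages as plain callables in a list makes the "same configuration for every pipeline" property explicit: each worker simply runs the same `PIPELINE`.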
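Sentences 21 and 66 describe the first NER stage: recently labeled tweets similar to the current tweet serve as context, and a KNN classifier pre-labels each word before the CRF runs. The following Python sketch shows one plausible reading, assuming bag-of-words cosine similarity to pick the k nearest tweets and similarity-weighted label voting per word. The function names and the BIO-style labels are hypothetical, and the CRF stage that would consume these pre-labels as features is not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

LabeledTweet = Tuple[List[str], List[str]]  # (tokens, per-token labels)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_prelabel(tokens: List[str], recent: List[LabeledTweet], k: int = 3) -> List[Optional[str]]:
    """Word-level KNN pre-labels for `tokens`, using similar recently labeled tweets.

    Returns one label per token, or None when no neighbor contains that word
    (a downstream CRF would then rely on its other features).
    """
    query = Counter(w.lower() for w in tokens)
    scored = sorted(
        ((cosine(query, Counter(w.lower() for w in toks)), toks, labels) for toks, labels in recent),
        key=lambda item: item[0],
        reverse=True,
    )
    neighbors = [item for item in scored[:k] if item[0] > 0]

    # Similarity-weighted votes: word -> label -> accumulated weight.
    votes: Dict[str, Counter] = defaultdict(Counter)
    for sim, toks, labels in neighbors:
        for word, label in zip(toks, labels):
            votes[word.lower()][label] += sim

    return [votes[w.lower()].most_common(1)[0][0] if votes[w.lower()] else None for w in tokens]


if __name__ == "__main__":
    recent = [
        (["Obama", "visits", "Chicago"], ["B-PER", "O", "B-LOC"]),
        (["Go", "Bulls", "in", "Chicago"], ["O", "B-ORG", "O", "B-LOC"]),
        (["I", "love", "pizza"], ["O", "O", "O"]),
    ]
    print(knn_prelabel(["Obama", "in", "Chicago", "today"], recent, k=2))
    # ['B-PER', 'O', 'B-LOC', None]
```

Returning `None` for unseen words, rather than guessing `O`, keeps the KNN evidence honest: the later tweet-level model can tell "no global evidence" apart from "evidence says not an entity".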
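Sentences 22, 50, and 68 state that the KNN classifier, the CRF labeler, and the tweet classifier are repeatedly retrained on results they labeled confidently. A minimal self-training pool in Python might look like the sketch below. The class name, the confidence threshold, the capacity cap, and the retraining interval are all hypothetical parameters; the summary does not give QuickView's actual values or retraining schedule.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class SelfTrainingPool:
    """Hypothetical helper: keeps confidently labeled tweets for periodic retraining."""

    threshold: float = 0.9     # assumed confidence cut-off
    capacity: int = 10000      # assumed cap, so only recent examples are kept
    retrain_every: int = 100   # assumed number of new examples between retrains

    def __post_init__(self) -> None:
        self.pool: Deque[Tuple[List[str], List[str]]] = deque(maxlen=self.capacity)
        self._since_retrain = 0

    def offer(self, tokens: List[str], labels: List[str], confidence: float) -> bool:
        """Add a labeled tweet if confident enough; return True when a retrain is due."""
        if confidence < self.threshold:
            return False
        self.pool.append((tokens, labels))
        self._since_retrain += 1
        if self._since_retrain >= self.retrain_every:
            self._since_retrain = 0
            return True
        return False
```

The bounded `deque` drops the oldest examples first, which matches the emphasis on *recently* labeled tweets; whether QuickView bounds its training set this way is an assumption.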
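Sentences 23, 77 to 79, and the legible lines of Algorithm 1 (sentence 81) describe the SRL component: each incoming tweet is put into a cluster of recently labeled tweets, and the other members of that cluster serve as its broader context. The Python sketch below uses simple online "leader" clustering, which is a swapped-in technique: the paper builds its initial clusters with a bottom-up procedure, and the similarity threshold here is a hypothetical parameter.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TweetClusters:
    """Online clustering of tweets (assumed leader-style, not the paper's exact procedure)."""

    def __init__(self, threshold: float = 0.3) -> None:
        self.threshold = threshold          # hypothetical minimum similarity to join a cluster
        self.centroids: List[Counter] = []  # summed word counts per cluster
        self.members: List[List[str]] = []

    def assign(self, tweet: str) -> int:
        """Put the tweet into its most similar cluster, or start a new one; return the index."""
        vec = Counter(tweet.lower().split())
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best < 0 or best_sim < self.threshold:
            self.centroids.append(Counter(vec))
            self.members.append([tweet])
            return len(self.centroids) - 1
        self.centroids[best].update(vec)
        self.members[best].append(tweet)
        return best

    def context(self, index: int, tweet: str) -> List[str]:
        """Other tweets in the same cluster, usable as broader context for labeling."""
        return [t for t in self.members[index] if t != tweet]
```

For example, after assigning "obama wins the debate" and "obama debate was great", both land in cluster 0 (cosine 0.5), and `context(0, "obama debate was great")` returns the first tweet as extra evidence for the SRL labeler.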
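Sentence 24 says the classifier considers a tweet's neighbors in a tweet graph, where two tweets are connected if they are similar in content or have a tweet/retweet relationship. The Python sketch below builds such a graph with Jaccard word overlap and then blends each tweet's own classifier score with its neighbors' mean. The Jaccard threshold and the blending weight `alpha` are assumptions, and this one-step smoothing is a simplified stand-in for whatever graph-based inference the authors actually use.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(tweets: Dict[str, str], retweets: List[Tuple[str, str]],
                threshold: float = 0.5) -> Dict[str, Set[str]]:
    """Undirected graph: edge if word-level Jaccard >= threshold or a retweet link exists."""
    tokens = {tid: set(text.lower().split()) for tid, text in tweets.items()}
    graph: Dict[str, Set[str]] = {tid: set() for tid in tweets}
    ids = list(tweets)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(tokens[a], tokens[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    for src, dst in retweets:
        if src in graph and dst in graph:
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def smooth_scores(scores: Dict[str, float], graph: Dict[str, Set[str]],
                  alpha: float = 0.7) -> Dict[str, float]:
    """Blend each tweet's own score with the mean score of its graph neighbors."""
    out: Dict[str, float] = {}
    for tid, own in scores.items():
        nbrs = graph.get(tid, set())
        if nbrs:
            out[tid] = alpha * own + (1 - alpha) * sum(scores[n] for n in nbrs) / len(nbrs)
        else:
            out[tid] = own
    return out
```

Isolated tweets keep their own score, so the graph only helps where related tweets actually exist, which is the intended remedy for the limited information in a single short tweet.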
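Sentences 56 and 57 say that top tweets, entities, events, and opinions are ranked by retweet/mention counts together with their publisher, and top accounts by followers and tweet counts. The summary gives no formulas, so the Python sketch below uses hypothetical scoring functions: each mention is weighted by `1 + log10(1 + followers)` of its publisher, and accounts are scored by `log(1 + followers) + log(1 + tweets)`.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def publisher_weight(followers: int) -> float:
    """Assumed weighting: a mention by a widely followed account counts for more."""
    return 1.0 + math.log10(1 + followers)


def top_items(mentions: Iterable[Tuple[str, str]], followers: Dict[str, int],
              n: int = 3) -> List[Tuple[str, float]]:
    """mentions: (item, publisher) pairs, one per mention or retweet in the recent cache."""
    scores: Dict[str, float] = defaultdict(float)
    for item, publisher in mentions:
        scores[item] += publisher_weight(followers.get(publisher, 0))
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])


def top_accounts(stats: Dict[str, Tuple[int, int]], n: int = 3) -> List[str]:
    """stats: account -> (followers, tweets); assumed score = log1p(followers) + log1p(tweets)."""
    def score(account: str) -> float:
        n_followers, n_tweets = stats[account]
        return math.log1p(n_followers) + math.log1p(n_tweets)
    return heapq.nlargest(n, stats, key=score)


if __name__ == "__main__":
    mentions = [("Obama", "alice"), ("Obama", "bob"), ("iPhone", "celeb")]
    followers = {"alice": 9, "bob": 99, "celeb": 999999}
    print(top_items(mentions, followers, n=2))
```

Logarithmic weights keep a single celebrity mention from completely swamping many ordinary ones; with the sample data, one mention by `celeb` (weight about 7) still outranks two ordinary mentions (weights about 2 and 3).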


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tweets', 0.543), ('quickview', 0.493), ('tweet', 0.417), ('srl', 0.247), ('ner', 0.132), ('buffer', 0.098), ('components', 0.071), ('labeler', 0.065), ('conducts', 0.065), ('search', 0.065), ('knn', 0.062), ('categorized', 0.059), ('browsing', 0.058), ('crf', 0.056), ('confidently', 0.054), ('predicate', 0.053), ('ratinov', 0.053), ('core', 0.051), ('cf', 0.051), ('twitter', 0.05), ('adaptation', 0.048), ('indexed', 0.047), ('xiaohua', 0.046), ('broader', 0.046), ('labeling', 0.045), ('component', 0.043), ('scenarios', 0.043), ('sa', 0.042), ('cluster', 0.042), ('evita', 0.041), ('koomen', 0.041), ('krishnan', 0.041), ('mezaruiz', 0.041), ('rozenfeld', 0.041), ('riedel', 0.04), ('users', 0.04), ('finkel', 0.039), ('repeatedly', 0.038), ('existing', 0.037), ('role', 0.036), ('advanced', 0.036), ('jansche', 0.036), ('krupka', 0.036), ('owing', 0.036), ('crawler', 0.036), ('deployment', 0.036), ('arquez', 0.035), ('named', 0.035), ('events', 0.035), ('normalization', 0.034), ('mined', 0.034), ('noisy', 0.034), ('roth', 0.034), ('gazetteers', 0.033), ('saur', 0.033), ('caches', 0.033), ('labeled', 0.032), ('informal', 0.031), ('ming', 0.031), ('brand', 0.031), ('heart', 0.031), ('top', 0.031), ('drops', 0.03), ('argument', 0.03), ('identifies', 0.029), ('organized', 0.029), ('aggregating', 0.029), ('event', 0.028), ('harbin', 0.027), ('retrained', 0.027), ('constantly', 0.027), ('deployed', 0.027), ('processed', 0.027), ('illustrated', 0.027), ('tackle', 0.027), ('opinions', 0.027), ('interested', 0.026), ('exploits', 0.025), ('singh', 0.025), ('screenshot', 0.025), ('couple', 0.025), ('entity', 0.025), ('jiang', 0.024), ('lev', 0.024), ('classifier', 0.023), ('extraction', 0.023), ('conll', 0.023), ('series', 0.023), ('ie', 0.023), ('extractor', 0.023), ('interfaces', 0.023), ('obama', 0.023), ('indexing', 0.023), ('neighbors', 0.023), ('nlp', 0.023), ('technologies', 0.023), ('zhou', 0.023), ('annotated', 0.023), ('incrementally', 0.022), ('qa', 0.022)]
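The page does not state how these word weights are computed. A common scheme, assumed here, is raw term frequency times `log(N / df)`, L2-normalized per document. The Python sketch below produces a ranked top-N list in the same (word, weight) shape. The tie-breaking by word is an arbitrary choice, and the sample documents are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_top_words(docs: Dict[str, List[str]], doc_id: str, top_n: int = 5) -> List[Tuple[str, float]]:
    """Assumed scheme: tf x log(N / df), L2-normalized within the document."""
    n_docs = len(docs)
    df: Counter = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency counts each word once per document
    tf = Counter(docs[doc_id])
    weights = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    ranked = sorted(((w, v / norm) for w, v in weights.items()), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    docs = {
        "quickview": "tweets tweets srl ner search".split(),
        "recommend": "tweets graph ranking".split(),
        "mt": "translation search".split(),
    }
    print(tfidf_top_words(docs, "quickview", top_n=3))
```

A word that occurs in every document gets weight 0 under this scheme, which is why generic words rarely appear in such lists.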

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 167 acl-2012-QuickView: NLP-based Tweet Search

2 0.5297963 205 acl-2012-Tweet Recommendation with Graph Co-Ranking

Author: Rui Yan ; Mirella Lapata ; Xiaoming Li

Abstract: Twitter enables users to send and read text-based posts of up to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources; however, the proliferation of user-generated content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model outperforms competitive approaches by a large margin.

3 0.34889501 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets

Author: Xiaohua Liu ; Ming Zhou ; Xiangyang Zhou ; Zhongyang Fu ; Furu Wei

Abstract: Tweets represent a critical source of fresh information, in which named entities occur frequently with rich variations. We study the problem of named entity normalization (NEN) for tweets. Two main challenges are the errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. We propose a novel graphical model to simultaneously conduct NER and NEN on multiple tweets to address these challenges. Particularly, our model introduces a binary random variable for each pair of words with the same lemma across similar tweets, whose value indicates whether the two related words are mentions of the same entity. We evaluate our method on a manually annotated data set, and show that our method outperforms the baseline that handles these two tasks separately, boosting the F1 from 80.2% to 83.6% for NER, and the Accuracy from 79.4% to 82.6% for NEN, respectively.

4 0.34818071 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

Author: Hao Wang ; Dogan Can ; Abe Kazemzadeh ; Francois Bar ; Shrikanth Narayanan

Abstract: This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events. In addition, sentiment analysis can help explore how these events affect public opinion. While traditional content analysis takes days or weeks to complete, the system demonstrated here analyzes sentiment in the entire Twitter traffic about the election, delivering results instantly and continuously. It offers the public, the media, politicians and scholars a new and timely perspective on the dynamics of the electoral process and public opinion.

5 0.23474282 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

Author: Jennifer Williams ; Graham Katz

Abstract: We seek to automatically estimate typical durations for events and habits described in Twitter tweets. A corpus of more than 14 million tweets containing temporal duration information was collected. These tweets were classified as to their habituality status using a bootstrapped decision tree. For each verb lemma, associated duration information was collected for episodic and habitual uses of the verb. Summary statistics for 483 verb lemmas and their typical habit and episode durations have been compiled and made available. This automatically generated duration information is broadly comparable to hand-annotation.

6 0.10028513 64 acl-2012-Crosslingual Induction of Semantic Roles

7 0.088663951 173 acl-2012-Self-Disclosure and Relationship Strength in Twitter Conversations

8 0.076181658 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

9 0.070018895 176 acl-2012-Sentence Compression with Semantic Role Constraints

10 0.068683237 209 acl-2012-Unsupervised Semantic Role Induction with Global Role Ordering

11 0.064803898 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

12 0.057774659 98 acl-2012-Finding Bursty Topics from Microblogs

13 0.053996332 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

14 0.052630488 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench

15 0.051725041 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

16 0.048291847 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive

17 0.048164468 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

18 0.046764638 153 acl-2012-Named Entity Disambiguation in Streaming Data

19 0.045673531 73 acl-2012-Discriminative Learning for Joint Template Filling

20 0.041870885 191 acl-2012-Temporally Anchored Relation Extraction
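The page does not say how `simValue` is computed for this tf-idf list. A common choice, assumed here, is cosine similarity between sparse tf-idf vectors, which is at least consistent with the query paper scoring exactly 1.0 against itself ("same-paper"). The vectors in the usage example reuse a few weights from the table above, while the other papers' vectors are invented.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_papers(query_id: str, vectors: Dict[str, Vector], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank all papers (including the query itself) by cosine similarity to the query."""
    scored = [(pid, cosine(vectors[query_id], vec)) for pid, vec in vectors.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    vectors = {
        "167": {"tweets": 0.543, "quickview": 0.493, "srl": 0.247},
        "205": {"tweets": 0.6, "graph": 0.3},
        "64": {"srl": 0.5, "roles": 0.4},
    }
    for pid, sim in similar_papers("167", vectors):
        print(pid, round(sim, 3))
```

Because the query vector is compared with itself, it always ranks first; a listing page would label that row "same-paper" as above.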


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.157), (1, 0.184), (2, 0.053), (3, 0.105), (4, 0.121), (5, -0.17), (6, 0.546), (7, 0.127), (8, 0.192), (9, 0.268), (10, 0.108), (11, -0.084), (12, 0.107), (13, -0.002), (14, 0.042), (15, 0.089), (16, -0.004), (17, -0.084), (18, 0.051), (19, 0.047), (20, 0.022), (21, -0.035), (22, -0.012), (23, 0.007), (24, 0.069), (25, -0.068), (26, -0.011), (27, -0.021), (28, -0.035), (29, -0.003), (30, -0.035), (31, 0.025), (32, -0.028), (33, -0.093), (34, -0.076), (35, -0.005), (36, -0.031), (37, 0.007), (38, 0.053), (39, -0.068), (40, 0.021), (41, 0.02), (42, -0.014), (43, -0.016), (44, -0.062), (45, 0.034), (46, 0.034), (47, -0.076), (48, -0.019), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97076601 167 acl-2012-QuickView: NLP-based Tweet Search

2 0.89011598 205 acl-2012-Tweet Recommendation with Graph Co-Ranking

3 0.76673359 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets

4 0.67710274 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

5 0.65488333 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

6 0.36570793 173 acl-2012-Self-Disclosure and Relationship Strength in Twitter Conversations

7 0.28185746 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

8 0.21375896 209 acl-2012-Unsupervised Semantic Role Induction with Global Role Ordering

9 0.19933148 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench

10 0.1965922 64 acl-2012-Crosslingual Induction of Semantic Roles

11 0.15688364 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

12 0.15179215 176 acl-2012-Sentence Compression with Semantic Role Constraints

13 0.13490491 73 acl-2012-Discriminative Learning for Joint Template Filling

14 0.13389575 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs

15 0.13020082 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling

16 0.1279752 120 acl-2012-Information-theoretic Multi-view Domain Adaptation

17 0.12668096 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

18 0.12159272 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

19 0.12049384 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis

20 0.11973407 153 acl-2012-Named Entity Disambiguation in Streaming Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.023), (26, 0.059), (28, 0.041), (30, 0.024), (37, 0.048), (39, 0.053), (59, 0.013), (74, 0.029), (82, 0.022), (84, 0.023), (85, 0.031), (90, 0.125), (92, 0.147), (94, 0.022), (98, 0.196), (99, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95931423 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems

Author: Srinivasan Janarthanam ; Oliver Lemon ; Xingkun Liu

Abstract: We demonstrate a web-based environment for development and testing of different pedestrian route instruction-giving systems. The environment contains a City Model, a TTS interface, a game-world, and a user GUI including a simulated street-view. We describe the environment and components, the metrics that can be used for the evaluation of pedestrian route instruction-giving systems, and the shared challenge which is being organised using this environment.

2 0.92388082 113 acl-2012-INPRO_iSS: A Component for Just-In-Time Incremental Speech Synthesis

Author: Timo Baumann ; David Schlangen

Abstract: We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. This component can be used to increase the responsivity and naturalness of spoken interactive systems. While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise unchanged systems may profit from its capabilities.

3 0.83054489 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving the state-of-the-art.

same-paper 4 0.82741147 167 acl-2012-QuickView: NLP-based Tweet Search

Author: Xiaohua Liu ; Furu Wei ; Ming Zhou ; QuickView Team Microsoft

Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i.e., named entities, events, opinions, etc., from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in.

5 0.72866154 205 acl-2012-Tweet Recommendation with Graph Co-Ranking

Author: Rui Yan ; Mirella Lapata ; Xiaoming Li

Abstract: Twitter enables users to send and read text-based posts of up to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources; however, the proliferation of user-generated content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model outperforms competitive approaches by a large margin.

6 0.72350979 154 acl-2012-Native Language Detection with Tree Substitution Grammars

7 0.71878093 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

8 0.71745908 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

9 0.71511686 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

10 0.71446407 31 acl-2012-Authorship Attribution with Author-aware Topic Models

11 0.71119201 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks

12 0.70461202 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

13 0.70344305 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

14 0.70194238 78 acl-2012-Efficient Search for Transformation-based Inference

15 0.69579875 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

16 0.6814338 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

17 0.67927599 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

18 0.67677116 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

19 0.67366791 98 acl-2012-Finding Bursty Topics from Microblogs

20 0.67122102 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization