acl acl2011 acl2011-177 knowledge-graph by maker-knowledge-mining

177 acl-2011-Interactive Group Suggesting for Twitter


Source: pdf

Author: Zhonghua Qu ; Yang Liu

Abstract: The number of users on Twitter has drastically increased in the past years. However, Twitter does not have an effective user grouping mechanism. Therefore tweets from other users can quickly overrun and become inconvenient to read. In this paper, we propose methods to help users group the people they follow using their provided seeding users. Two sources of information are used to build sub-systems: textual information captured by the tweets sent by users, and social connections among users. We also propose a measure of fitness to determine which subsystem best represents the seed users and use it for target user ranking. Our experiments show that our proposed framework works well and that adaptively choosing the appropriate sub-system for group suggestion results in increased accuracy.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The number of users on Twitter has drastically increased in the past years. [sent-3, score-0.427]

2 However, Twitter does not have an effective user grouping mechanism. [sent-4, score-0.29]

3 Therefore tweets from other users can quickly overrun and become inconvenient to read. [sent-5, score-0.634]

4 In this paper, we propose methods to help users group the people they follow using their provided seeding users. [sent-6, score-0.745]

5 Two sources of information are used to build sub-systems: textual information captured by the tweets sent by users, and social connections among users. [sent-7, score-0.42]

6 We also propose a measure of fitness to determine which subsystem best represents the seed users and use it for target user ranking. [sent-8, score-1.324]

7 Our experiments show that our proposed framework works well and that adaptively choosing the appropriate sub-system for group suggestion results in increased accuracy. [sent-9, score-0.425]

8 1 Introduction Twitter is a well-known social network service that allows users to post short, 140-character status updates called “Tweets”. [sent-10, score-0.624]

9 A twitter user can “follow” other users to get their latest updates. [sent-11, score-1.063]

10 It works well when the number of tweets the user receives is not very large. [sent-15, score-0.497]

11 However, the flat timeline becomes tedious to read even for average users with less than 80 friends. [sent-16, score-0.492]

12 When Bob wants to read the latest news from his “Colleagues”, because of lacking effective ways to group users, he has to scroll through all “Tweets” from other users. [sent-20, score-0.328]

13 There have been suggestions from many Twitter users that a grouping feature could be very useful. [sent-21, score-0.457]

14 Yet, the only way to create groups is to create “lists” of users in Twitter manually by selecting each individual user. [sent-22, score-0.501]

15 This process is tedious and could be sometimes formidable when a user is following many people. [sent-23, score-0.347]

16 In this paper, we propose an interactive group creating system for Twitter. [sent-24, score-0.399]

17 A user creates a group by first providing a small number of seeding users, then the system ranks the friend list according to how likely a user belongs to the group indicated by the seeds. [sent-25, score-1.492]

18 We know in the real world, users like to group their “follows” in many ways. [sent-26, score-0.655]

19 For example, some may create groups containing all the “computer scientists”, others might create groups containing their real-life friends. [sent-27, score-0.212]

20 A system using “social information” to find friend groups may work well in the latter case, but might not effectively suggest correct group members in the former case. [sent-28, score-0.692]

21 On the other hand, a system using “textual information” may be effective in the first case, but is probably weak in finding friends in the second case. [sent-29, score-0.197]

22 Therefore in this paper, we propose to use multiple information sources for group member suggestions, and use a cross-validation approach to find the best-fit sub-system (Proceedings of the 49th Annual Meeting of the Asso., Portland, Oregon, June 19-24, 2011). [sent-30, score-0.294]

23 Our results show that automatic group suggestion is feasible and that selecting the appropriate sub-system yields additional gain over using individual systems. [sent-33, score-0.319]

24 2 Related Work There is no previous research on interactive suggestion of friend groups on Twitter to our knowledge; however, some prior work is related and can help our task. [sent-34, score-0.463]

25 , 2010) uses implicit social graphs to help suggest email addresses a person is likely to send to based on the addresses already entered. [sent-36, score-0.232]

26 Also, using the social network information, hidden community detection algorithms such as (Palla et al. [sent-37, score-0.255]

27 Besides the social information, what a user tweets is also a good indicator to group users. [sent-39, score-0.904]

28 , 2010) used semi-supervised topic modeling to map each user’s tweets into four characteristic dimensions. [sent-41, score-0.312]

29 3 Interactive Group Creation Creating groups manually is a tedious process. [sent-42, score-0.169]

30 However, creating groups in an entirely unsupervised fashion could result in unwanted results. [sent-43, score-0.166]

31 In our system, a user first indicates a small number of users that belong to a group, called “seeds”, then the system suggests other users that might belong to this group. [sent-44, score-1.149]

32 [Figure 1: Overview of the system architecture] As mentioned earlier, we use different information sources to determine user/group similarity, including textual information and social connections. [sent-46, score-0.238]

33 A module is designed for each information source to rank users based on their similarity to the provided seeds. [sent-47, score-0.573]

34 In our approach, the system first tries to detect what sub-system can best fit the seed group. [sent-48, score-0.328]

35 Then, the corresponding system is used to generate the final ranked list of users according to the likelihood of belonging to the group. [sent-49, score-0.6]

36 After the rank list is given, the user can adjust the size of the group to best fit his/her needs. [sent-50, score-0.647]

37 In addition, a user can correct the system by specifically indicating someone as a “negative seed”, which should not be on the top of the list. [sent-51, score-0.319]

38 In this paper, we only consider creating one group at a time with only “positive seed” and do not consider the relationships between different groups. [sent-52, score-0.295]

39 Since determining the best fitting sub-system or the group type from the seeds needs the use of the two sub-systems, we describe them first. [sent-53, score-0.42]

40 Each subsystem takes a group of seed users and unlabeled target users as the input, and provides a ranked list of the target users belonging to the group indicated by the seeds. [sent-54, score-2.401]

41 1 Tweet Based Sub-system In this sub-system, user groups are modeled using the textual information contained in their tweets. [sent-56, score-0.392]

42 We collected all the tweets from a user and grouped them together. [sent-57, score-0.497]

43 To represent the tweets information, we could use a bag-of-word model for each user. [sent-58, score-0.239]

44 They can reduce the dimension and group words with similar semantics, and are often more robust in face of data sparsity or noisy data. [sent-62, score-0.312]

45 Because tweet messages are very short and hard to infer topics directly from them, we merge all the tweets from a user to form a larger document. [sent-63, score-0.71]

46 Then LDA is applied to the collection of documents from all the users to derive the topics. [sent-64, score-0.395]

47 Each user’s tweets can then be represented using a bag-of-topics model, where the ith component is the proportion of the ith topic appearing in the user’s tweet. [sent-65, score-0.334]

48 Given a group of seed users, we want to find target users that are similar to the seeds in terms of their tweet content. [sent-66, score-1.343]

49 To take multiple seed instances into consideration, we use two schemes to calculate the similarity between one target user and a seed group. [sent-67, score-1.089]

50 • centroid: we calculate the centroid of seeds, then use the similarity between the centroid of seeds and the target user as the final similarity value. [sent-68, score-0.732]

51 • average: we calculate the similarity between the target and each individual seed user, then take the average as the final similarity value. [sent-69, score-0.414]

52 In this paper, we explore using two different similarity functions between two vectors (ui and vi), cosine similarity and inverse Euclidean distance, shown below respectively. [sent-70, score-0.346]

53 $d_{\mathrm{cosine}}(u,v) = \frac{1}{|u|\,|v|} \sum_{i=1}^{n} u_i \times v_i$ (1) $d_{\mathrm{euclidean}}(u,v) = \frac{1}{\sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}}$ (2) After calculating similarity for all the target users, this tweet-based sub-system gives the ranking accordingly. [sent-71, score-0.169]
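
The two similarity functions and the two seed-combination schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the plain-list vector representation, and the handling of zero vectors and zero distances are all hypothetical choices made for the example.

```python
import math

def cosine_similarity(u, v):
    """Equation (1): dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

def inverse_euclidean(u, v):
    """Equation (2): reciprocal of the Euclidean distance."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    # assumption: identical vectors get infinite similarity
    return math.inf if dist == 0.0 else 1.0 / dist

def centroid_score(seeds, target, sim):
    """'centroid' scheme: compare the target with the mean seed vector."""
    n = len(seeds)
    centroid = [sum(column) / n for column in zip(*seeds)]
    return sim(centroid, target)

def average_score(seeds, target, sim):
    """'average' scheme: mean of the similarities to each individual seed."""
    return sum(sim(seed, target) for seed in seeds) / len(seeds)

def rank_targets(seeds, targets, sim=cosine_similarity, scheme=average_score):
    """targets maps a user name to a topic vector; returns names, most similar first."""
    return sorted(targets, key=lambda name: scheme(seeds, targets[name], sim),
                  reverse=True)
```

Calling `rank_targets(seed_vectors, target_vectors)` with the defaults corresponds to the cosine-plus-average combination; passing `sim=inverse_euclidean` or `scheme=centroid_score` switches to the other variants.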

54 2 Friend Based Sub-system As an initial study, we use a simple method to model friend relationship in user groups. [sent-73, score-0.481]

55 In this sub-system, we model people using their social information. [sent-75, score-0.19]

56 Unlike other social networks like “Facebook” or “Myspace”, a “following” relation in Twitter is directed. [sent-77, score-0.173]

57 In Twitter, a “mention” happens when someone refers to another Twitter user in their tweets. [sent-78, score-0.318]

58 Because this sub-system models the real-life friend groups, we only consider bi-directional following relation between people. [sent-80, score-0.223]

59 That is, we only consider an edge between users when both of them follow each other. [sent-81, score-0.395]

60 Our task is however different in that we know the seed of the target group and the output needs to be a ranking. [sent-84, score-0.635]

61 Here, we use the count of bi-directional friends and mentions between a target user and the seed group as the score for ranking. [sent-85, score-1.1]

62 The intuition is that the social graph between real life friends tends to be very dense, and people who belong to the clique should have more edges to the seeds than others. [sent-86, score-0.58]
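
As a rough illustration of this scoring rule, the sketch below counts mutual-follow edges and mentions between a target user and the seeds. The data layout is hypothetical: `follows` maps each user to the set of users they follow, and `mentions` maps an ordered `(author, mentioned)` pair to a count. Counting mentions in both directions is an assumption of this example; the text does not specify the direction.

```python
def mutual_edges(follows):
    """Keep only bi-directional 'following' pairs, stored as unordered pairs."""
    return {frozenset((a, b))
            for a, followed in follows.items()
            for b in followed
            if a != b and a in follows.get(b, set())}

def friend_score(target, seeds, mutual, mentions):
    """Count mutual-follow edges and mentions between the target and the seeds."""
    score = 0
    for seed in seeds:
        if frozenset((target, seed)) in mutual:
            score += 1
        # assumption: mentions in either direction count toward the score
        score += mentions.get((target, seed), 0) + mentions.get((seed, target), 0)
    return score

def rank_by_friends(targets, seeds, follows, mentions):
    """Return the targets ordered by descending friend score."""
    mutual = mutual_edges(follows)
    return sorted(targets,
                  key=lambda t: friend_score(t, seeds, mutual, mentions),
                  reverse=True)
```

Because only reciprocal follows are kept, a one-way follow from a target to a seed adds nothing to that target's score.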

63 3 Group Type Detection The first component in our system is to determine which sub-system to use to suggest user groups. [sent-88, score-0.317]

64 We propose to evaluate the fitness of each sub-system based on the seeds provided using a cross-validation approach. [sent-89, score-0.359]

65 The assumption is that if a sub-system (information source used to form the group) is a good match, then it will rank the users in the seed group higher than others not in the seed. [sent-90, score-1.012]

66 The procedure of calculating the fitness score of each sub-system is shown in Algorithm 1. [sent-91, score-0.199]

67 In the input, S is the seed users (with more than one user), U is the target users to be ranked, and subrank is a ranking sub-system (two systems described above, each taking seed users and target users as input, and producing the ranking of the target users). [sent-92, score-2.504]

68 Each time, it takes one seed user Si out and puts it together with other target users. [sent-94, score-0.633]

69 Then it calls the sub-system to rank the new list and finds out the resulting rank for Si. [sent-95, score-0.187]

70 The final fitness score is the sum of all the ranks for the seed instances. [sent-96, score-0.586]

71 The system with the highest score is then selected and used to rank the original target users. [sent-97, score-0.163]
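
The leave-one-out procedure described above might look like the following sketch. Algorithm 1 itself is not reproduced in this summary, so several details are assumptions: `subrank(seeds, candidates)` is a hypothetical callable returning the candidates ordered best first, and the "rank" of a held-out seed is taken here as the number of candidates placed below it, so that a larger sum means a better fit and the highest score wins.

```python
def fitness(seeds, targets, subrank):
    """Leave-one-out fitness of one ranking sub-system for a seed group."""
    total = 0
    for held_out in seeds:
        remaining = [s for s in seeds if s != held_out]
        # the held-out seed is hidden among the target users
        candidates = list(targets) + [held_out]
        ranking = subrank(remaining, candidates)
        position = ranking.index(held_out)        # 0 means ranked first
        # assumption: rank value = number of candidates ranked below the seed
        total += len(ranking) - 1 - position
    return total

def choose_subsystem(seeds, targets, subsystems):
    """subsystems maps a name to a subrank callable; returns the best-fitting name."""
    return max(subsystems,
               key=lambda name: fitness(seeds, targets, subsystems[name]))
```

The selected sub-system would then be run once more with all seeds to rank the original target users.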

72 Because Twitter does not provide direct functions to group friends, we use lists created by Twitter users as the reference friend groups in testing and evaluation. [sent-99, score-1.92]

73 We exclude users that have less than 20 or more than 150 friends; that do not have a qualified list (more than 20 and less than 200 list members); and that do not use English in their tweets. [sent-100, score-0.584]

74 For these qualified users, information on their 1,383 friends is retrieved, again using the Twitter API. [sent-102, score-0.215]

75 For the friends that are retrieved, their 180,296 tweets and 584,339 friend-of-friend information are also retrieved. [sent-103, score-0.407]

76 5 Experiment In our experiment, we evaluate the performance of each sub-system and then use group type detection algorithm to adaptively combine the systems. [sent-105, score-0.371]

77 We use the Twitter lists we collected as the reference user groups for evaluation. [sent-106, score-0.404]

78 For each user group, we randomly take out 6 users from the list and use them as seed candidates. [sent-107, score-1.023]

79 The target users consist of the rest of the list members and other “friends” that the list creator has. [sent-108, score-0.549]

80 From the ranked list for the target users, we calculate the mean average precision (MAP) score with the rank position of the list members. [sent-109, score-0.423]
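
For reference, a common way to compute average precision and MAP from a ranked list is sketched below. It is a generic formulation, not necessarily the exact variant used in the paper; in particular, dividing by the total number of reference members (so that members missing from the ranking count as zero) is an assumption of this example.

```python
def average_precision(ranking, relevant):
    """Mean of precision@k over the positions k where a reference member appears."""
    relevant = set(relevant)
    hits = 0
    precision_sum = 0.0
    for position, user in enumerate(ranking, start=1):
        if user in relevant:
            hits += 1
            precision_sum += hits / position
    # assumption: normalise by all reference members, found or not
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs is a list of (ranking, relevant) pairs, one per reference group."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Here `ranking` would be the ordered target users returned by a sub-system and `relevant` the remaining members of the reference Twitter list.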

81 In order to evaluate the effect of the seed size on the final performance, we vary the number of seeds from 2 to 6 using the 6 taken-out list members. [sent-112, score-0.557]

82 In the tweet based sub-system, we optimize its hyper parameter automatically based on the data. [sent-113, score-0.179]

83 As a stronger baseline (BOW baseline), we used cosine similarity between users’ tweets as the similarity measure. [sent-125, score-0.545]

84 Each user’s tweet content is represented using a bag-of-words vector using this vocabulary. [sent-127, score-0.153]

85 The ranking of this baseline is calculated using the average similarity with the seeds. [sent-128, score-0.203]
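
A minimal sketch of such a bag-of-words baseline follows. How the vocabulary is built is not shown in this summary, so the example simply assumes lowercased whitespace tokens and an unrestricted vocabulary; the function names and the input layout (one merged text string per user) are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector; assumes lowercased whitespace tokenisation."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def bow_baseline_rank(seed_docs, target_docs):
    """Rank targets by their average cosine similarity to the seed documents."""
    seed_vecs = [bow(doc) for doc in seed_docs]

    def score(name):
        vec = bow(target_docs[name])
        return sum(cosine(s, vec) for s in seed_vecs) / len(seed_vecs)

    return sorted(target_docs, key=score, reverse=True)
```

Each entry of `seed_docs` and each value of `target_docs` stands for all of one user's tweets merged into a single string.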

86 In the tweet-based sub-system, “Cos” and “Euc” mean cosine similarity and inverse Euclidean distance respectively as the similarity measure. [sent-129, score-0.375]

87 “Cent” and “Avg” mean using centroid vector and average similarity respectively to measure the similarities between a target user and the seed group. [sent-130, score-0.907]

88 From the results, we can see that in general using a larger seed group improves performance since more information can be obtained from the group. [sent-131, score-0.559]

89 The “CosAvg” scheme (which uses cosine similarity with average similarity measure) achieves the best result. [sent-132, score-0.34]

90 Using cosine similarity measure gives better performance than inverse Euclidean distance. [sent-133, score-0.252]

91 This is not surprising since cosine similarity has been widely adopted as an appropriate similarity measure in the vector space model for text processing. [sent-134, score-0.332]

92 In the adaptive system, we also used “CosAvg” scheme in the tweet based sub-system. [sent-137, score-0.182]

93 This indicates that users form lists based on different factors and thus always using one single system is not the best solution. [sent-139, score-0.464]

94 It also demonstrates that our proposed fitness measure using cross-validation works well, and that the two information sources used to build sub-systems can appropriately capture the group characteristics. [sent-140, score-0.519]

95 6 Conclusion In this paper, we have proposed an interactive group creation system for Twitter users to organize their “followings”. [sent-141, score-0.759]

96 The system takes friend seeds provided by users and generates a ranked list according to the likelihood of a test user being in the group. [sent-142, score-1.183]

97 We introduced two sub-systems, based on tweet text and social information respectively. [sent-143, score-0.3]

98 We also proposed a group type detection procedure that is able to use the most appropriate system for group user ranking. [sent-144, score-0.297]

99 Our experiments show that by using different systems adaptively, better performance can be achieved compared to using any single system, suggesting this framework works well. [sent-145, score-0.583]

100 Furthermore, we will incorporate negative seeds into the process of interactive suggestion. [sent-147, score-0.235]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('users', 0.395), ('twitter', 0.371), ('seed', 0.299), ('group', 0.26), ('user', 0.258), ('tweets', 0.239), ('friend', 0.223), ('fitness', 0.199), ('friends', 0.168), ('seeds', 0.16), ('tweet', 0.153), ('social', 0.147), ('similarity', 0.12), ('groups', 0.106), ('palla', 0.098), ('target', 0.076), ('interactive', 0.075), ('adaptively', 0.074), ('list', 0.071), ('subsystem', 0.071), ('cosine', 0.066), ('lda', 0.066), ('cosavg', 0.065), ('centroid', 0.065), ('tedious', 0.063), ('ranks', 0.061), ('suggestion', 0.059), ('rank', 0.058), ('euclidean', 0.058), ('bob', 0.053), ('ranking', 0.049), ('ranked', 0.047), ('qualified', 0.047), ('seeding', 0.047), ('service', 0.046), ('vi', 0.044), ('members', 0.044), ('people', 0.043), ('ramage', 0.043), ('topic', 0.043), ('inverse', 0.04), ('lists', 0.04), ('latest', 0.039), ('mentions', 0.039), ('colleagues', 0.038), ('retrieved', 0.038), ('calculate', 0.037), ('detection', 0.037), ('suggesting', 0.036), ('network', 0.036), ('belong', 0.036), ('creating', 0.035), ('community', 0.035), ('average', 0.034), ('sources', 0.034), ('someone', 0.032), ('grouping', 0.032), ('si', 0.032), ('increased', 0.032), ('messages', 0.031), ('belonging', 0.031), ('blei', 0.031), ('suggestions', 0.03), ('suggest', 0.03), ('map', 0.03), ('topics', 0.029), ('adaptive', 0.029), ('system', 0.029), ('cent', 0.029), ('creator', 0.029), ('euc', 0.029), ('send', 0.029), ('maayan', 0.029), ('scroll', 0.029), ('cne', 0.029), ('ilan', 0.029), ('tweeting', 0.029), ('uncovering', 0.029), ('mean', 0.029), ('textual', 0.028), ('happens', 0.028), ('dimension', 0.027), ('final', 0.027), ('measure', 0.026), ('graphs', 0.026), ('networks', 0.026), ('clique', 0.026), ('myspace', 0.026), ('domly', 0.026), ('formidable', 0.026), ('hyper', 0.026), ('imre', 0.026), ('roth', 0.026), ('ith', 0.026), ('indicated', 0.025), ('noisy', 0.025), ('guy', 0.025), ('unwanted', 0.025), ('replies', 0.025), ('ron', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 177 acl-2011-Interactive Group Suggesting for Twitter


2 0.29551211 292 acl-2011-Target-dependent Twitter Sentiment Classification

Author: Long Jiang ; Mo Yu ; Ming Zhou ; Xiaohua Liu ; Tiejun Zhao

Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.

3 0.29471698 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

Author: Kevin Gimpel ; Nathan Schneider ; Brendan O'Connor ; Dipanjan Das ; Daniel Mills ; Jacob Eisenstein ; Michael Heilman ; Dani Yogatama ; Jeffrey Flanigan ; Noah A. Smith

Abstract: We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

4 0.26251912 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

Author: Aditya Joshi ; Balamurali AR ; Pushpak Bhattacharyya ; Rajat Mohanty

Abstract: Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. We present a qualitative evaluation of this system based on a human-annotated tweet corpus.

5 0.19560905 208 acl-2011-Lexical Normalisation of Short Text Messages: Makn Sens a #twitter

Author: Bo Han ; Timothy Baldwin

Abstract: Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this paper, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalising ill-formed words. Our method uses a classifier to detect ill-formed words, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for the word. The proposed method doesn’t require any annotations, and achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.

6 0.18398952 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look

7 0.16546153 305 acl-2011-Topical Keyphrase Extraction from Twitter

8 0.14835617 261 acl-2011-Recognizing Named Entities in Tweets

9 0.14399457 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping

10 0.12942626 285 acl-2011-Simple supervised document geolocation with geodesic grids

11 0.12539537 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD

12 0.12274256 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

13 0.1189286 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

14 0.11557937 218 acl-2011-MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages

15 0.11074123 178 acl-2011-Interactive Topic Modeling

16 0.10846673 313 acl-2011-Two Easy Improvements to Lexical Weighting

17 0.10461446 174 acl-2011-Insights from Network Structure for Text Mining

18 0.10417611 172 acl-2011-Insertion, Deletion, or Substitution? Normalizing Text Messages without Pre-categorization nor Supervision

19 0.098959938 117 acl-2011-Entity Set Expansion using Topic information

20 0.097412206 169 acl-2011-Improving Question Recommendation by Exploiting Information Need


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.185), (1, 0.187), (2, 0.043), (3, 0.059), (4, -0.049), (5, -0.029), (6, 0.007), (7, -0.176), (8, -0.065), (9, 0.125), (10, -0.306), (11, 0.157), (12, 0.27), (13, -0.133), (14, -0.135), (15, -0.144), (16, -0.023), (17, -0.032), (18, -0.018), (19, -0.005), (20, -0.051), (21, 0.041), (22, 0.002), (23, 0.057), (24, -0.01), (25, -0.049), (26, 0.003), (27, 0.101), (28, 0.053), (29, -0.083), (30, 0.014), (31, 0.022), (32, -0.06), (33, -0.028), (34, 0.06), (35, -0.061), (36, 0.026), (37, -0.021), (38, 0.01), (39, 0.025), (40, -0.006), (41, -0.052), (42, -0.023), (43, 0.048), (44, -0.007), (45, 0.063), (46, 0.04), (47, 0.006), (48, -0.028), (49, -0.053)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96517688 177 acl-2011-Interactive Group Suggesting for Twitter


2 0.73668027 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look

Author: Roberto Gonzalez-Ibanez ; Smaranda Muresan ; Nina Wacholder

Abstract: Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine learning effectiveness for identifying sarcastic utterances and we compare the performance of machine learning techniques and human judges on this task. Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well.

3 0.71083462 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

Author: Kevin Gimpel ; Nathan Schneider ; Brendan O'Connor ; Dipanjan Das ; Daniel Mills ; Jacob Eisenstein ; Michael Heilman ; Dani Yogatama ; Jeffrey Flanigan ; Noah A. Smith

Abstract: We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

4 0.64620769 305 acl-2011-Topical Keyphrase Extraction from Twitter

Author: Xin Zhao ; Jing Jiang ; Jing He ; Yang Song ; Palakorn Achanauparp ; Ee-Peng Lim ; Xiaoming Li

Abstract: Summarizing and analyzing Twitter content is an important and challenging task. In this paper, we propose to extract topical keyphrases as one way to summarize Twitter. We propose a context-sensitive topical PageRank method for keyword ranking and a probabilistic scoring function that considers both relevance and interestingness of keyphrases for keyphrase ranking. We evaluate our proposed methods on a large Twitter data set. Experiments show that these methods are very effective for topical keyphrase extraction.

5 0.61433947 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

Author: Aditya Joshi ; Balamurali AR ; Pushpak Bhattacharyya ; Rajat Mohanty

Abstract: Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. We present a qualitative evaluation of this system based on a human-annotated tweet corpus.

6 0.57544458 261 acl-2011-Recognizing Named Entities in Tweets

7 0.54057348 292 acl-2011-Target-dependent Twitter Sentiment Classification

8 0.53453016 208 acl-2011-Lexical Normalisation of Short Text Messages: Makn Sens a #twitter

9 0.50356525 172 acl-2011-Insertion, Deletion, or Substitution? Normalizing Text Messages without Pre-categorization nor Supervision

10 0.43283081 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis

11 0.42642602 121 acl-2011-Event Discovery in Social Media Feeds

12 0.42379534 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping

13 0.4235734 285 acl-2011-Simple supervised document geolocation with geodesic grids

14 0.41765773 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

15 0.40377715 174 acl-2011-Insights from Network Structure for Text Mining

16 0.38350081 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis

17 0.3565315 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

18 0.35356566 115 acl-2011-Engkoo: Mining the Web for Language Learning

19 0.35092452 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

20 0.35038325 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.022), (17, 0.064), (26, 0.065), (37, 0.063), (39, 0.04), (41, 0.08), (55, 0.018), (59, 0.037), (64, 0.141), (72, 0.054), (91, 0.121), (96, 0.195)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95469582 106 acl-2011-Dual Decomposition for Natural Language Processing

Author: Alexander M. Rush ; Michael Collins

Abstract: unkown-abstract

same-paper 2 0.89351797 177 acl-2011-Interactive Group Suggesting for Twitter


3 0.86480439 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

Author: Bing Zhao ; Young-Suk Lee ; Xiaoqiang Luo ; Liu Li

Abstract: We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST08 evaluations by 1.3 absolute BLEU, which is statistically significant.

4 0.86427289 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

Author: Chung-chi Huang ; Mei-hua Chen ; Shih-ting Huang ; Jason S. Chang

Abstract: We introduce a new method for learning to detect grammatical errors in learner’s writing and provide suggestions. The method involves parsing a reference corpus and inferring grammar patterns in the form of a sequence of content words, function words, and parts-of-speech (e.g., “play ~ role in Ving” and “look forward to Ving”). At runtime, the given passage submitted by the learner is matched using an extended Levenshtein algorithm against the set of pattern rules in order to detect errors and provide suggestions. We present a prototype implementation of the proposed method, EdIt, that can handle a broad range of errors. Promising results are illustrated with three common types of errors in nonnative writing.

5 0.86382169 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

Author: Alexander M. Rush ; Michael Collins

Abstract: We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.

6 0.84816647 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

7 0.84254903 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters

8 0.83295727 313 acl-2011-Two Easy Improvements to Lexical Weighting

9 0.8297565 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

10 0.82652295 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation

11 0.82626379 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping

12 0.82498741 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

13 0.82470495 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

14 0.82421708 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing

15 0.82077718 140 acl-2011-Fully Unsupervised Word Segmentation with BVE and MDL

16 0.81955379 117 acl-2011-Entity Set Expansion using Topic information

17 0.81951678 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

18 0.81929779 254 acl-2011-Putting it Simply: a Context-Aware Approach to Lexical Simplification

19 0.81862664 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

20 0.81842625 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework