emnlp emnlp2011 emnlp2011-41 knowledge-graph by maker-knowledge-mining

41 emnlp-2011-Discriminating Gender on Twitter


Source: pdf

Author: John D. Burger ; John Henderson ; George Kim ; Guido Zarrella

Abstract: Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show the degree to which classifier accuracy varies based on tweet volumes as well as when various kinds of profile metadata are included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly out-perform both baseline models and almost all humans on the same task.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. [sent-3, score-0.504]

2 We show the degree to which classifier accuracy varies based on tweet volumes as well as when various kinds of profile metadata are included in the models. [sent-5, score-0.704]

3 This in turn has sparked a great deal of research interest in aspects of social media, including automatically identifying latent demographic features of online users. [sent-9, score-0.156]

4 Many latent features have been explored, but gender and age have generated great interest (Schler et al. [sent-10, score-0.492]

5 In this work, we investigate the development of high-performance classifiers for identifying the gender of Twitter users. [sent-15, score-0.516]

6 We cast gender identification as the obvious binary classification problem, and explore the use of a number of text-based features. [sent-16, score-0.451]

7 2 Data Twitter is a social networking and micro-blogging platform whose users publish short messages or tweets. [sent-23, score-0.248]

8 In late 2010, it was estimated that Twitter had 175 million registered users worldwide, producing 65 million tweets per day (Miller, 2010). [sent-24, score-0.537]

9 In April 2009, we began sampling data from Twitter using their API at a rate of approximately 400,000 tweets per day. [sent-29, score-0.278]

10 This decrease is because we sample roughly the same number of tweets every day while Twitter’s overall volume has increased markedly. [sent-31, score-0.245]

11 Our corpus thus far contains approximately 213 million tweets from 18. [sent-32, score-0.281]

12 In addition to the tweets that they produce, each Twitter user has a profile with the following free-text fields: • Screen name (e.g. [sent-34, score-0.697]

13 , Retired accountant and grandfather). All of these except screen name are completely optional, and all may be changed at any time. [sent-45, score-0.223]

14 Thus, the existing profile elements are not directly useful when we wish to apply supervised learning approaches to classify tweets for these target attributes. [sent-49, score-0.4]

15 Rao et al. (2010) use a focused search methodology followed by manual annotation to produce a dataset of 500 English users labeled with gender. [sent-52, score-0.247]

16 Previous research into gender variation in online discourse (Herring et al. [sent-54, score-0.486]

17 , 2004; Huffaker, 2004) has found it convenient to examine blogs, in part because blog sites often have rich profile pages, with explicit entries for gender and other attributes of interest. [sent-55, score-0.831]

18 Many Twitter users use the URL field in their profile to link to another facet of their online presence. [sent-56, score-0.418]

19 A significant number of users link to blogging websites, and many of these have well-structured profile pages indicating our target attributes. [sent-57, score-0.349]

20 Users on these sites must select gender and other attributes from dropdown menus in order to populate their profile information. [sent-59, score-0.711]

21 Accordingly, we automatically followed the Twitter URL links to several of the most represented blog sites in our dataset, and sampled the corresponding profiles. [sent-60, score-0.162]

22 By attributing this blogger profile information to the associated Twitter account, we created a corpus of approximately 184,000 Twitter users labeled with gender. [sent-61, score-0.388]
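
The paper does not publish its harvesting code; the following is a minimal sketch of the attribution join described here, assuming the crawl has already produced a mapping from blog-profile URL to the gender selected there (all names are illustrative):

    # Minimal sketch: attribute blog-profile gender to the linked Twitter account.
    def attribute_gender(url_by_user, gender_by_blog_url):
        labels = {}
        for user_id, url in url_by_user.items():
            gender = gender_by_blog_url.get(url)   # join on the profile URL
            if gender in ("male", "female"):
                labels[user_id] = gender
        return labels

    attribute_gender({"u1": "http://example.blogspot.com/profile"},
                     {"http://example.blogspot.com/profile": "female"})
    # -> {"u1": "female"}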

23 We partitioned our dataset by user into three distinct subsets, training, development, and test, with sizes as indicated in Figure 1. [sent-62, score-0.254]

24 That is, all the tweets from each user are in a single one of the three subsets. [sent-63, score-0.446]
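
One way to realize such a by-user partition is a deterministic hash of the user ID; the paper states the property, not the mechanism, and the 80/10/10 proportions below are illustrative (its Figure 1 gives the actual sizes):

    import hashlib

    def split_for(user_id, dev_pct=10, test_pct=10):
        # Hashing the user ID makes the assignment deterministic, so all of a
        # user's tweets land in exactly one of train/dev/test.
        bucket = int(hashlib.md5(user_id.encode("utf8")).hexdigest(), 16) % 100
        if bucket < dev_pct:
            return "dev"
        if bucket < dev_pct + test_pct:
            return "test"
        return "train"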

25 This method of gleaning supervised labels for our Twitter data is only useful if the blog profiles are in turn accurate. [sent-65, score-0.144]

26 We randomly selected 1000 Twitter users from our training set and manually examined the description field for obvious indicators of gender, e.g. [sent-67, score-0.276]

27 Only 150 descriptions (15% of the sample) had such an explicit gender cue. [sent-70, score-0.451]

28 136 of these also had a blog profile with the gender selected, and in all of these the gender cue from the user’s Twitter description agreed with the corresponding blog profile. [sent-71, score-1.37]

29 This may only indicate that people who misrepresent their gender are simply consistent across different aspects of their online presence. [sent-72, score-0.486]

30 Figure 2 shows several statistics broken down by gender, including the Twitter users who did not indicate their gender on their blog profile. [sent-76, score-0.765]

31 In our dataset, females tweet at a higher rate than males, and in general users who provide their gender on their blog profile produce more tweets than users who do not. [sent-77, score-1.872]

32 Additionally, of the 150 users who provided a gender cue in their Twitter user description, 105 were female (70%). [sent-78, score-0.938]

33 Thus, females appear more likely to provide explicit indicators about their gender in our corpus. [sent-79, score-0.501]

34 The average number of tweets per user is 22 and is fairly consistent across our train/dev/test splits. [sent-80, score-0.446]

35 There is wide variance, however, with some users represented by only a single tweet, while the most prolific user in our sample has nearly 4000 tweets. [sent-81, score-0.423]

36 It is worth noting that many Twitter users do not tweet in English. [sent-82, score-0.57]

37 We ran automatic language ID on the concatenated tweet texts of each user in the training set. [sent-84, score-0.649]
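
A hedged sketch of this step using the off-the-shelf langid.py package; the choice of identifier is an assumption, since the paper does not name its language-ID tool:

    import langid  # pip install langid

    def user_language(tweets):
        text = " ".join(tweets)               # concatenate all of the user's tweet texts
        lang, score = langid.classify(text)   # e.g. ('en', -54.4)
        return lang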

38 The subset of Twitter users who also use a blog site may be different from the Twitter population as a whole, and may also be different from the users tweeting during the three days of Wauters’s study. [sent-87, score-0.546]

39 There are also possible longitudinal differences: English was the dominant language on Twitter when the online service began in 2006, and this was still the case when we began sampling tweets in 2009, but the proportion of English tweets had steadily dropped to about 50% in late 2010. [sent-88, score-0.642]

40 However, by sampling only Twitter users with blogs, we have largely filtered out spammers from our dataset. [sent-92, score-0.194]

41 Informal inspection of a few thousand tweets revealed a negligible number of commercial tweets. [sent-93, score-0.268]

42 3 Features Tweets are tagged with many sources of potentially discriminative metadata, including timestamps and user color preferences. [Figure 2 (table): user count and percentage, tweet count and percentage, and mean tweets per user, broken down by Female, Male, and Not provided.] [sent-94, score-0.647]

43 We use the content of the tweet text as well as three fields from the Twitter user profile described in Section 2: full name, screen name, and description. [sent-99, score-0.99]

44 For each user in our dataset, a field is in general a set of text strings. [sent-100, score-0.235]

45 This is obviously true for tweet texts but is also the case for the profile-based fields since a Twitter user may change any part of their profile at any time. [sent-101, score-0.906]

46 Because our sample spans points in time where users have changed their screen name, full name or description, we include all of the different values for those fields as a set. [sent-102, score-0.548]

47 In addition, a user may leave their description and full name blank, which corresponds to the empty set. [sent-103, score-0.374]

48 4 Experiments We formulate gender labeling as the obvious binary classification problem. [sent-115, score-0.451]

49 To speed experimentation and reduce the memory footprint, we perform a one-time feature generation preprocessing step in which we convert each feature pattern (such as “caseful screen name character trigram: Joh”) to an integer codeword. [sent-123, score-0.246]
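
A minimal sketch of the codeword step, assuming a simple in-memory interner (the paper does not describe its implementation):

    feature_ids = {}  # feature pattern string -> integer codeword

    def codeword(pattern):
        # e.g. codeword("caseful screen name character trigram: Joh") -> 0 on
        # first sight, and the same integer on every later occurrence.
        if pattern not in feature_ids:
            feature_ids[pattern] = len(feature_ids)
        return feature_ids[pattern]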

50 We compress the data further by concatenating all of a user’s features into a single vector that represents the union of every tweet produced by that user. [sent-125, score-0.376]

51 These initial experiments were based only on caseful word unigram features from tweet texts, which represent less than 3% of the total feature space but still include large numbers of irrelevant features. [sent-129, score-0.443]
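
A hedged sketch of this feature extraction: caseful word unigrams from tweet texts plus screen-name character trigrams, unioned over everything a user has produced. The feature-name prefixes are illustrative, not the paper's exact patterns:

    from collections import Counter

    def char_ngrams(s, n=3):
        return [s[i:i + n] for i in range(max(len(s) - n + 1, 0))]

    def user_features(tweets, screen_names):
        feats = Counter()
        for tweet in tweets:                     # union over every tweet by this user
            for word in tweet.split():
                feats["tweet_uni=" + word] += 1  # caseful: no lowercasing
        for name in screen_names:                # a profile field is a set of strings over time
            for gram in char_ngrams(name):
                feats["screen_tri=" + gram] += 1
        return feats  # keys would then be mapped to integers by the interner above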

52 A lower learning rate (0.03) was effective when using only one type of input feature (such as only screen name features, or only tweet text features), and a higher learning rate (0. [sent-140, score-0.599]
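
The experiments here revolve around a Winnow-style learner; the following is a hedged sketch of Balanced Winnow with a tunable learning rate eta, where the threshold, the weight initialization, and the handling of feature counts are assumptions rather than the authors' exact configuration:

    class BalancedWinnow:
        # Sketch only: two multiplicative weight halves, promoted and demoted
        # by (1 +/- eta) whenever the current prediction is wrong.
        def __init__(self, eta=0.03, theta=1.0):
            self.eta, self.theta = eta, theta
            self.pos, self.neg = {}, {}  # weights default to 1.0

        def score(self, feats):  # feats: dict of feature -> count
            return sum((self.pos.get(f, 1.0) - self.neg.get(f, 1.0)) * v
                       for f, v in feats.items())

        def update(self, feats, label):  # label: +1 or -1
            if label * (self.score(feats) - self.theta) > 0:
                return  # correct: leave the weights alone
            up, down = 1.0 + self.eta, 1.0 - self.eta
            if label < 0:
                up, down = down, up
            for f in feats:  # counts ignored in the update, for simplicity
                self.pos[f] = self.pos.get(f, 1.0) * up
                self.neg[f] = self.neg.get(f, 1.0) * down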

53 All gender prediction models were trained using data from the training set and evaluated on data from the development set. [sent-144, score-0.485]

54 We trained it on the training set and evaluated on the development set for each of the four user fields in isolation, as well as various combinations, in order to simulate different use cases for systems that perform gender prediction from social media sources. [sent-148, score-0.891]

55 In some cases we may have all of the metadata fields available above, while in other cases we may only have a sample of a user’s tweet content or perhaps just one tweet. [sent-149, score-0.518]

56 We simulated the latter condition by randomly selecting a single tweet for each dev and test user; this tweet was used for all evaluations of that user under the single-tweet condition. [sent-150, score-0.989]

57 Note, however, that for training the single tweet classifier, we do not concatenate all of a user’s tweets as described above. [sent-151, score-0.621]

58 Instead, we pair each user in the training set with each of their tweets in turn, in order to take advantage of all the training data. [sent-152, score-0.446]

59 This amounted to over 3 million training instances for the single tweet condition. [sent-153, score-0.412]
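
A minimal sketch of the pairing step that produces those instances (names illustrative):

    def single_tweet_instances(labeled_users):
        # labeled_users: iterable of (gender, [tweet, ...]) pairs from the training set.
        for gender, tweets in labeled_users:
            for tweet in tweets:
                yield gender, tweet  # every (user, tweet) pair is its own example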

60 Note that for all experiments, the evaluation includes some users who have left their full name or description fields blank in their profile. [sent-156, score-0.503]

61 In all cases, we compare results to a maximum likelihood baseline that simply labels all users female. [sent-157, score-0.218]
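
As a worked illustration, this baseline's accuracy is simply the proportion of female users in the evaluation set:

    def majority_baseline_accuracy(gold_labels):
        # Predict "female" for everyone; accuracy equals the female proportion.
        return sum(1 for g in gold_labels if g == "female") / len(gold_labels)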

62 AMT and other crowd-sourcing platforms allow simple tasks to be posted online for large numbers of anonymous workers to complete. [sent-161, score-0.211]

63 We used AMT to measure human performance on gender determination for the all tweets condition. [sent-162, score-0.696]

64 Each AMT worker was presented with all of the tweet texts from a single Twitter user in our development set and asked whether the author was male or female. [sent-163, score-0.823]

65 We redundantly assigned five workers to each Twitter user, for a total of 91,900 responses from 794 different workers. [sent-164, score-0.214]
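
One natural way to aggregate these redundant judgments, and the ensemble treatment the paper returns to below, is a per-user majority vote; a minimal sketch:

    from collections import Counter

    def majority_vote(answers):
        # answers: the five workers' labels for one Twitter user.
        return Counter(answers).most_common(1)[0][0]

    majority_vote(["female", "male", "female", "female", "male"])  # -> 'female'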

66 1 Field combinations Figure 5 shows development set performance on various combinations of the user fields, all of which outperform the maximum likelihood baseline that classifies all users as female. [sent-174, score-0.473]

67 [Figure 6: Accuracy on the training, development and test sets.] The single field most informative as to gender is the user’s full name, which provides an accuracy of 89. [sent-184, score-0.554]

68 Screen name is often a derivative of full name, and it too is informative (77. [sent-186, score-0.153]

69 Using only tweet texts performs better than using only the user description (75. [sent-189, score-0.697]

70 It appears that the tweet texts convey more about a Twitter user’s gender than their own self-descriptions. [sent-194, score-0.899]

71 Even a single (randomly selected) tweet text contains some gender-indicative information (67. [sent-195, score-0.376]

72 7% accuracy on gender from tweet texts alone using an ngram-only model, rising to 72. [sent-200, score-0.939]

73 Test set differences aside, this is comparable with the “All tweet texts” line in Figure 5, where we achieve an accuracy of 75. [sent-202, score-0.416]

74 The combination of tweet texts and a screen name represents a use case common to many different social media sites, such as chat rooms and news article comment streams. [sent-205, score-0.8]

75 As we have observed, full name is the single most informative field. [sent-208, score-0.153]

76 Finally, the classifier that has access to features from all four fields is able to achieve an accuracy of 92. [sent-211, score-0.235]

77 Underscores are spaces, $ matches the end of the tweet text. [sent-222, score-0.376]

78 There are features in the user name and user screen name fields that make the data trivially separable. [sent-226, score-0.823]

79 The tweet texts, however, present more ambiguity for the learners. [sent-227, score-0.376]

80 As discussed in Section 2, there is wide variance in the number of tweets available from different users. [sent-231, score-0.245]

81 In Figure 9 we show how the tweet text classifier’s accuracy increases as the number of tweets from the user increases. [sent-232, score-0.862]

82 Each point is the average classifier accuracy for the user cohort with exactly that many tweets in our dev set. [sent-233, score-0.654]

83 Performance increases given more tweets, although the averages get noisy for the larger tweet sets, due to successively smaller cohort sizes. [sent-234, score-0.415]
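
A minimal sketch of the cohort averaging behind that figure (names illustrative):

    from collections import defaultdict

    def accuracy_by_tweet_count(per_user_results):
        # per_user_results: iterable of (num_tweets, was_correct) per dev user.
        cohorts = defaultdict(list)
        for n, correct in per_user_results:
            cohorts[n].append(correct)
        return {n: sum(c) / len(c) for n, c in sorted(cohorts.items())}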

84 Some of the most informative features from tweet texts are shown in Figure 7, ordered by mutual information with gender. [sent-235, score-0.476]
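
A hedged sketch of the ranking criterion: the mutual information between a binary feature-presence indicator and the gender label, computed from a 2x2 contingency table (the paper does not spell out its exact formulation):

    import math

    def mutual_information(n11, n10, n01, n00):
        # n11: female users with the feature, n10: male users with it,
        # n01: female users without it, n00: male users without it.
        n = n11 + n10 + n01 + n00
        cells = ((n11, n11 + n10, n11 + n01),  # (cell, feature marginal, gender marginal)
                 (n10, n11 + n10, n10 + n00),
                 (n01, n01 + n00, n11 + n01),
                 (n00, n01 + n00, n10 + n00))
        return sum((a / n) * math.log2(a * n / (row * col))
                   for a, row, col in cells if a)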

85 The presence of http as a strong male feature might be taken to indicate that men include links in their tweet texts far more often than women, but a cursory examination seems to show instead that women are simply more likely to include “bare” links, e.g. [sent-243, score-0.612]

86 This would seem to indicate that there were a few poor workers who did many annotations, and in fact when we limit the performance average to those workers who produced 100 or more responses, we do see a degradation to 62. [sent-255, score-0.292]

87 The problem of poor-quality workers is endemic to anonymous crowdsourcing platforms like Mechanical Turk. [sent-257, score-0.215]

88 [Figure 10: Comparing with humans on the all tweet texts task.] We treated the five workers who responded to each item as an ensemble. [sent-264, score-0.617]

89 In this case, the first is an AMT worker’s capability and the second is the distribution of gender labels for each Twitter user. [sent-270, score-0.475]

90 As Figure 11 indicates, most workers perform below 80% accuracy, and less than 5% of the prolific workers out-perform the automatic classifier. [sent-278, score-0.32]
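
A minimal sketch of the per-worker scoring behind Figure 11, assuming accuracy is measured against the blog-derived gold labels with the 100-response threshold mentioned above:

    from collections import defaultdict

    def worker_accuracies(responses, gold, min_responses=100):
        # responses: iterable of (worker_id, user_id, answer); gold: user_id -> gender.
        tally = defaultdict(lambda: [0, 0])  # worker -> [correct, total]
        for worker, user, answer in responses:
            tally[worker][0] += int(answer == gold[user])
            tally[worker][1] += 1
        return {w: c / t for w, (c, t) in tally.items() if t >= min_responses}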

91 These high-scoring workers may indeed be good at the task, or they may have simply been assigned a lessdifficult subset of the data. [sent-279, score-0.146]

92 Figure 12 illustrates this by showing aligned worker performance and classifier performance on the precise set of items that each worker performed on. [sent-280, score-0.279]

93 Here we see that, with few exceptions, the automatic classifier performs as well or better than the AMT workers on their subset. [sent-281, score-0.239]

94 As described above, the all-fields classifier achieves an accuracy of 92% on the development set when trained on the full training set. [sent-285, score-0.196]

95 [Figure 8: Performance increases when training with more users. Figure 9: Performance increases with more tweets from target user. Figure 11: Human accuracy in rank order (100 responses or more), with classifier performance (line). Figure 12: Classifier vs. human accuracy on the same subsets (100 responses or more).] [sent-292, score-0.841]

96 6 Conclusion In this paper, we have presented several configurations of a language-independent classifier for predicting the gender of Twitter users. [sent-293, score-0.652]

97 The large dataset used for construction and evaluation of these classifiers was drawn from Twitter users who also completed blog profile pages. [sent-294, score-0.553]

98 These classifiers were tested on the largest set of gender-tagged tweets to date that we are aware of. [sent-295, score-0.276]

99 The best classifier performed at 92% accuracy, and the classifier relying only on tweet texts performed at 76% accuracy. [sent-296, score-0.69]

100 In future work, we will explore how well such models carry over to gender identification in other informal online genres such as chat and forum comments. [sent-298, score-0.551]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gender', 0.451), ('twitter', 0.432), ('tweet', 0.376), ('tweets', 0.245), ('user', 0.201), ('users', 0.194), ('profile', 0.155), ('workers', 0.146), ('screen', 0.127), ('blog', 0.12), ('fields', 0.102), ('name', 0.096), ('classifier', 0.093), ('amt', 0.088), ('winnow', 0.084), ('worker', 0.079), ('texts', 0.072), ('mechanical', 0.072), ('responses', 0.068), ('female', 0.067), ('demographic', 0.067), ('attributes', 0.063), ('male', 0.061), ('argamon', 0.059), ('dawid', 0.059), ('social', 0.054), ('amazon', 0.054), ('rao', 0.053), ('dataset', 0.053), ('females', 0.05), ('burger', 0.05), ('media', 0.049), ('description', 0.048), ('ngrams', 0.042), ('sites', 0.042), ('age', 0.041), ('accuracy', 0.04), ('women', 0.04), ('metadata', 0.04), ('informal', 0.039), ('blogger', 0.039), ('caseful', 0.039), ('cohort', 0.039), ('endemic', 0.039), ('heil', 0.039), ('mitre', 0.039), ('petrovic', 0.039), ('skene', 0.039), ('wauters', 0.039), ('url', 0.038), ('population', 0.038), ('million', 0.036), ('ensemble', 0.036), ('dev', 0.036), ('online', 0.035), ('field', 0.034), ('development', 0.034), ('men', 0.034), ('blogosphere', 0.034), ('males', 0.034), ('ipeirotis', 0.034), ('blank', 0.034), ('herring', 0.034), ('mukherjee', 0.034), ('schler', 0.034), ('shlomo', 0.034), ('began', 0.033), ('classifiers', 0.031), ('turk', 0.03), ('platforms', 0.03), ('gigabytes', 0.03), ('koppel', 0.03), ('marketing', 0.03), ('balanced', 0.029), ('aaai', 0.029), ('http', 0.029), ('full', 0.029), ('half', 0.028), ('libsvm', 0.028), ('personalization', 0.028), ('prolific', 0.028), ('performed', 0.028), ('informative', 0.028), ('irrelevant', 0.028), ('late', 0.026), ('chat', 0.026), ('facebook', 0.026), ('moshe', 0.026), ('cue', 0.025), ('henderson', 0.025), ('steadily', 0.025), ('labels', 0.024), ('legal', 0.024), ('ib', 0.024), ('exploration', 0.024), ('character', 0.023), ('humans', 0.023), ('revealed', 0.023), ('combinations', 0.022), ('eisenstein', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 41 emnlp-2011-Discriminating Gender on Twitter

Author: John D. Burger ; John Henderson ; George Kim ; Guido Zarrella

Abstract: Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show the degree to which classifier accuracy varies based on tweet volumes as well as when various kinds of profile metadata are included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly out-perform both baseline models and almost all humans on the same task.

2 0.3259593 89 emnlp-2011-Linguistic Redundancy in Twitter

Author: Fabio Massimo Zanzotto ; Marco Pennaccchiotti ; Kostas Tsioutsiouliklis

Abstract: In the last few years, the interest of the research community in micro-blogs and social media services, such as Twitter, is growing exponentially. Yet, so far not much attention has been paid on a key characteristic of microblogs: the high level of information redundancy. The aim of this paper is to systematically approach this problem by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence on the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, improving over baseline systems with statistical significance.

3 0.29287112 117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs

Author: Vahed Qazvinian ; Emily Rosengren ; Dragomir R. Radev ; Qiaozhu Mei

Abstract: A rumor is commonly defined as a statement whose true value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) on a network of people. Identifying rumors is crucial in online social media where large amounts of information are easily spread across a large network by sources with unverified authority. In this paper, we address the problem of rumor detection in microblogs and explore the effectiveness of 3 categories of features: content-based, network-based, and microblog-specific memes for correctly identifying rumors. Moreover, we show how these features are also effective in identifying disinformers, users who endorse a rumor and further help it to spread. We perform our experiments on more than 10,000 manually annotated tweets collected from Twitter and show how our retrieval model achieves more than 0.95 in Mean Average Precision (MAP). Finally, we believe that our dataset is the first large-scale dataset on rumor detection. It can open new dimensions in analyzing online misinformation and other aspects of microblog conversations.

4 0.28341806 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

Author: Alan Ritter ; Sam Clark ; Mausam ; Oren Etzioni

Abstract: People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms cotraining, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp

5 0.25689891 71 emnlp-2011-Identifying and Following Expert Investors in Stock Microblogs

Author: Roy Bar-Haim ; Elad Dinur ; Ronen Feldman ; Moshe Fresko ; Guy Goldstein

Abstract: Information published in online stock investment message boards, and more recently in stock microblogs, is considered highly valuable by many investors. Previous work focused on aggregation of sentiment from all users. However, in this work we show that it is beneficial to distinguish expert users from non-experts. We propose a general framework for identifying expert investors, and use it as a basis for several models that predict stock rise from stock microblogging messages (stock tweets). In particular, we present two methods that combine expert identification and per-user unsupervised learning. These methods were shown to achieve relatively high precision in predicting stock rise, and significantly outperform our baseline. In addition, our work provides an in-depth analysis of the content and potential usefulness of stock tweets.

6 0.19212721 133 emnlp-2011-The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources

7 0.17437564 139 emnlp-2011-Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter

8 0.14940712 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

9 0.13871807 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs

10 0.13295922 38 emnlp-2011-Data-Driven Response Generation in Social Media

11 0.1178434 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models

12 0.10540193 17 emnlp-2011-Active Learning with Amazon Mechanical Turk

13 0.080206901 130 emnlp-2011-Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization

14 0.073417552 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations

15 0.054722164 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums

16 0.049902994 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion

17 0.048247054 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

18 0.047208201 42 emnlp-2011-Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora

19 0.046155479 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

20 0.042511914 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.185), (1, -0.359), (2, 0.453), (3, 0.031), (4, -0.282), (5, 0.034), (6, -0.076), (7, -0.01), (8, 0.023), (9, -0.039), (10, -0.039), (11, -0.033), (12, 0.012), (13, -0.074), (14, -0.027), (15, 0.012), (16, 0.016), (17, 0.004), (18, -0.028), (19, 0.034), (20, 0.084), (21, 0.01), (22, 0.015), (23, 0.043), (24, -0.012), (25, -0.021), (26, 0.008), (27, -0.005), (28, 0.03), (29, 0.036), (30, 0.008), (31, 0.044), (32, 0.011), (33, -0.021), (34, 0.006), (35, 0.042), (36, -0.053), (37, -0.032), (38, 0.019), (39, -0.014), (40, -0.024), (41, 0.058), (42, -0.051), (43, -0.036), (44, -0.023), (45, -0.063), (46, 0.027), (47, 0.016), (48, 0.056), (49, -0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96568978 41 emnlp-2011-Discriminating Gender on Twitter

Author: John D. Burger ; John Henderson ; George Kim ; Guido Zarrella

Abstract: Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show the degree to which classifier accuracy varies based on tweet volumes as well as when various kinds of profile metadata are included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly out-perform both baseline models and almost all humans on the same task.

2 0.86913526 117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs

Author: Vahed Qazvinian ; Emily Rosengren ; Dragomir R. Radev ; Qiaozhu Mei

Abstract: A rumor is commonly defined as a statement whose true value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) on a network of people. Identifying rumors is crucial in online social media where large amounts of information are easily spread across a large network by sources with unverified authority. In this paper, we address the problem of rumor detection in microblogs and explore the effectiveness of 3 categories of features: content-based, network-based, and microblog-specific memes for correctly identifying rumors. Moreover, we show how these features are also effective in identifying disinformers, users who endorse a rumor and further help it to spread. We perform our experiments on more than 10,000 manually annotated tweets collected from Twitter and show how our retrieval model achieves more than 0.95 in Mean Average Precision (MAP). Fi- nally, we believe that our dataset is the first large-scale dataset on rumor detection. It can open new dimensions in analyzing online misinformation and other aspects of microblog conversations.

3 0.82561874 89 emnlp-2011-Linguistic Redundancy in Twitter

Author: Fabio Massimo Zanzotto ; Marco Pennaccchiotti ; Kostas Tsioutsiouliklis

Abstract: In the last few years, the interest of the research community in micro-blogs and social media services, such as Twitter, is growing exponentially. Yet, so far not much attention has been paid on a key characteristic of microblogs: the high level of information redundancy. The aim of this paper is to systematically approach this problem by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence on the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, improving over baseline systems with statistical significance.

4 0.81857491 71 emnlp-2011-Identifying and Following Expert Investors in Stock Microblogs

Author: Roy Bar-Haim ; Elad Dinur ; Ronen Feldman ; Moshe Fresko ; Guy Goldstein

Abstract: Information published in online stock investment message boards, and more recently in stock microblogs, is considered highly valuable by many investors. Previous work focused on aggregation of sentiment from all users. However, in this work we show that it is beneficial to distinguish expert users from non-experts. We propose a general framework for identifying expert investors, and use it as a basis for several models that predict stock rise from stock microblogging messages (stock tweets). In particular, we present two methods that combine expert identification and per-user unsupervised learning. These methods were shown to achieve relatively high precision in predicting stock rise, and significantly outperform our baseline. In addition, our work provides an in-depth analysis of the content and potential usefulness of stock tweets.

5 0.76532054 139 emnlp-2011-Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter

Author: Eiji ARAMAKI ; Sachiko MASKAWA ; Mizuki MORITA

Abstract: Twitter posts more than 5.5 million messages (tweets) every day (reported by Twitter.com in March 2011). With the recent rise in popularity and scale of social media, a growing need exists for systems that can extract useful information from huge amounts of data. We address the issue of detecting influenza epidemics. First, the proposed system extracts influenza related tweets using Twitter API. Then, only tweets that mention actual influenza patients are extracted by the support vector machine (SVM) based classifier. The experiment results demonstrate the feasibility of the proposed approach (0.89 correlation to the gold standard). Especially at the outbreak and early spread (early epidemic stage), the proposed method shows high correlation (0.97 correlation), which outperforms the state-of-the-art methods. This paper describes that Twitter texts reflect the real world, and that NLP techniques can be applied to extract only tweets that contain useful information.

6 0.59209687 133 emnlp-2011-The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources

7 0.53376019 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

8 0.42227376 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models

9 0.38368309 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

10 0.35249072 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs

11 0.33699179 38 emnlp-2011-Data-Driven Response Generation in Social Media

12 0.28080794 17 emnlp-2011-Active Learning with Amazon Mechanical Turk

13 0.27128592 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations

14 0.23752971 42 emnlp-2011-Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora

15 0.23335296 130 emnlp-2011-Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization

16 0.19606532 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion

17 0.19514264 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction

18 0.18112332 12 emnlp-2011-A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents

19 0.17554972 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums

20 0.16364647 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.011), (23, 0.145), (32, 0.01), (36, 0.019), (37, 0.021), (45, 0.085), (52, 0.344), (53, 0.018), (54, 0.023), (57, 0.015), (62, 0.02), (64, 0.026), (66, 0.029), (69, 0.012), (79, 0.049), (82, 0.017), (90, 0.011), (96, 0.034), (98, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80106318 41 emnlp-2011-Discriminating Gender on Twitter

Author: John D. Burger ; John Henderson ; George Kim ; Guido Zarrella

Abstract: Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show the degree to which classifier accuracy varies based on tweet volumes as well as when various kinds of profile metadata are included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly out-perform both baseline models and almost all humans on the same task.

2 0.73021799 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as e.g. hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.

3 0.70033216 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.

4 0.56871188 117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs

Author: Vahed Qazvinian ; Emily Rosengren ; Dragomir R. Radev ; Qiaozhu Mei

Abstract: A rumor is commonly defined as a statement whose true value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) on a network of people. Identifying rumors is crucial in online social media where large amounts of information are easily spread across a large network by sources with unverified authority. In this paper, we address the problem of rumor detection in microblogs and explore the effectiveness of 3 categories of features: content-based, network-based, and microblog-specific memes for correctly identifying rumors. Moreover, we show how these features are also effective in identifying disinformers, users who endorse a rumor and further help it to spread. We perform our experiments on more than 10,000 manually annotated tweets collected from Twitter and show how our retrieval model achieves more than 0.95 in Mean Average Precision (MAP). Finally, we believe that our dataset is the first large-scale dataset on rumor detection. It can open new dimensions in analyzing online misinformation and other aspects of microblog conversations.

5 0.557836 71 emnlp-2011-Identifying and Following Expert Investors in Stock Microblogs

Author: Roy Bar-Haim ; Elad Dinur ; Ronen Feldman ; Moshe Fresko ; Guy Goldstein

Abstract: Information published in online stock investment message boards, and more recently in stock microblogs, is considered highly valuable by many investors. Previous work focused on aggregation of sentiment from all users. However, in this work we show that it is beneficial to distinguish expert users from non-experts. We propose a general framework for identifying expert investors, and use it as a basis for several models that predict stock rise from stock microblogging messages (stock tweets). In particular, we present two methods that combine expert identification and per-user unsupervised learning. These methods were shown to achieve relatively high precision in predicting stock rise, and significantly outperform our baseline. In addition, our work provides an in-depth analysis of the content and potential usefulness of stock tweets.

6 0.55007845 139 emnlp-2011-Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter

7 0.54971838 89 emnlp-2011-Linguistic Redundancy in Twitter

8 0.54751021 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

9 0.51283026 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs

10 0.49828753 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction

11 0.48322272 38 emnlp-2011-Data-Driven Response Generation in Social Media

12 0.47347471 133 emnlp-2011-The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources

13 0.46997237 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

14 0.4691177 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

15 0.46887586 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

16 0.46814603 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases

17 0.46738523 136 emnlp-2011-Training a Parser for Machine Translation Reordering

18 0.465431 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

19 0.46534735 17 emnlp-2011-Active Learning with Amazon Mechanical Turk

20 0.46475184 62 emnlp-2011-Generating Subsequent Reference in Shared Visual Scenes: Computation vs Re-Use