
Rumor has it: Identifying Misinformation in Microblogs (EMNLP 2011, entry 117)

Source: pdf

Authors: Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev, Qiaozhu Mei

Abstract: A rumor is commonly defined as a statement whose truth value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) on a network of people. Identifying rumors is crucial in online social media, where large amounts of information are easily spread across a large network by sources with unverified authority. In this paper, we address the problem of rumor detection in microblogs and explore the effectiveness of three categories of features for correctly identifying rumors: content-based, network-based, and microblog-specific memes. Moreover, we show that these features are also effective in identifying disinformers, users who endorse a rumor and further help it to spread. We perform our experiments on more than 10,000 manually annotated tweets collected from Twitter and show that our retrieval model achieves a Mean Average Precision (MAP) above 0.95. Finally, we believe that our dataset is the first large-scale dataset on rumor detection. It can open new dimensions in analyzing online misinformation and other aspects of microblog conversations.
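Mean Average Precision, the metric quoted above, averages the per-query average precision of several ranked lists. The following is a minimal Python sketch of the standard textbook definition, not the authors' evaluation code. It assumes binary relevance labels (1 = relevant tweet, 0 = not relevant), treats each ranked list as one query, and assumes every relevant item appears somewhere in its ranking. The function names are hypothetical.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision for one ranked list of binary relevance labels.

    Assumes every relevant item appears in the ranking, so the number of
    relevant items is simply the count of 1s in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            # Precision at this cut-off, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision over several ranked lists."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical queries, labels listed in ranked order.
    example = [[1, 1, 0, 1, 0], [0, 1, 1, 0]]
    print(round(mean_average_precision(example), 4))  # → 0.75
```

In the example, the first list scores (1 + 1 + 3/4) / 3 = 11/12 and the second (1/2 + 2/3) / 2 = 7/12, so their mean is 0.75. Because each relevant item contributes the precision at its own rank, a MAP close to 1 indicates that relevant tweets are concentrated at the top of each ranking.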
