350 acl-2013-TopicSpam: a Topic-Model based approach for spam detection

Author: Jiwei Li; Claire Cardie; Sujian Li

Abstract: Product reviews are now widely used by individuals and organizations for decision making (Litvin et al., 2008; Jansen, 2010). And because of the profits at stake, people have been known to try to game the system by writing fake reviews to promote target products. As a result, the task of deceptive review detection has been gaining increasing attention. In this paper, we propose a generative LDA-based topic modeling approach for fake review detection. Our model can aptly detect the subtle differences between deceptive reviews and truthful ones and achieves about 95% accuracy on review spam datasets, outperforming existing baselines by a large margin.

reference text

David Blei, Andrew Ng and Michael Jordan. 2003. Latent Dirichlet allocation. In Journal of Machine Learning Research.
Carlos Castillo, Debora Donato, Luca Becchetti, Paolo Boldi, Stefano Leonardi, Massimo Santini, and Sebastiano Vigna. 2006. A reference collection for web spam. In Proceedings of the annual international ACM SIGIR conference on Research and development in information retrieval.
Chaitanya Chemudugunta, Padhraic Smyth and Mark Steyvers. Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model. In Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference.
Paul-Alexandru Chirita, Jorg Diederich, and Wolfgang Nejdl. 2005. MailRank: using ranking for spam detection. In Proceedings of the ACM international conference on Information and knowledge management.
Harris Drucker, Donghui Wu, and Vladimir Vapnik. 2002. Support vector machines for spam categorization. In Neural Networks.
Qiming Diao, Jing Jiang, Feida Zhu and Ee-Peng Lim. 2012. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics.
Thorsten Joachims. 1999. Making large-scale support vector machine learning practical. In Advances in kernel methods.
Jack Jansen. 2010. Online product research. In Pew Internet and American Life Project Report.
Nitin Jindal and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of the international conference on Web search and web data mining.
Nitin Jindal, Bing Liu, and Ee-Peng Lim. 2010. Finding Unusual Review Patterns Using Unexpected Rules. In Proceedings of the 19th ACM international conference on Information and knowledge management.
Pranam Kolari, Akshay Java, Tim Finin, Tim Oates and Anupam Joshi. 2006. Detecting Spam Blogs: A Machine Learning Approach. In Proceedings of Association for the Advancement of Artificial Intelligence.
Peng Li, Jing Jiang and Yinglin Wang. 2010. Generating templates of entity summaries with an entity-aspect model and pattern mining. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
Fangtao Li, Minlie Huang, Yi Yang, and Xiaoyan Zhu. 2011. Learning to identify review spam. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence.
Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady Wirawan Lauw. 2010. Detecting Product Review Spammers Using Rating Behavior. In Proceedings of the 19th ACM international conference on Information and knowledge management.
Stephen Litvin, Ronald Goldsmith and Bing Pan. 2008. Electronic word-of-mouth in hospitality and tourism management. Tourism Management, 29(3):458-468.
Juan Martinez-Romo and Lourdes Araujo. 2009. Web Spam Identification Through Language Model Analysis. In AIRWeb.
Arjun Mukherjee, Bing Liu and Natalie Glance. 2012. Spotting Fake Reviewer Groups in Consumer Reviews. In Proceedings of the 18th international conference on World wide web.
Alexandros Ntoulas, Marc Najork, Mark Manasse and Dennis Fetterly. 2006. Detecting Spam Web Pages through Content Analysis. In Proceedings of international conference on World Wide Web.
Myle Ott, Yejin Choi, Claire Cardie and Jeffrey Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. In Found. Trends Inf. Retr.
Daniel Ramage, David Hall, Ramesh Nallapati and Christopher D. Manning. 2009. Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing.
Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers and Padhraic Smyth. The author-topic model for authors and documents. In Proceedings of the 20th conference on Uncertainty in artificial intelligence.
Guan Wang, Sihong Xie, Bing Liu and Philip Yu. 2011. Review Graph based Online Store Review Spammer Detection. In Proceedings of 11th International Conference of Data Mining.
Baoning Wu, Vinay Goel and Brian Davison. 2006. Topical TrustRank: using topicality to combat Web spam. In Proceedings of international conference on World Wide Web.
Kyang Yoo and Ulrike Gretzel. 2009. Comparison of Deceptive and Truthful Travel Reviews. In Information and Communication Technologies in Tourism 2009.