
ACL 2013, paper 107: Deceptive Answer Prediction with User Preference Graph

Source: pdf

Authors: Fangtao Li; Yang Gao; Shuchang Zhou; Xiance Si; Decheng Dai

Abstract: In community question answering (QA) sites, malicious users may post deceptive answers to promote their products or services, so it is important to identify and filter out these answers. In this paper, we first address the problem with traditional supervised learning methods, investigating two kinds of features: textual and contextual. We then propose to exploit user relationships to identify deceptive answers, based on the hypothesis that similar users behave similarly when posting deceptive or authentic answers. To measure user similarity, we propose a new user preference graph built from the answer preferences that users express, such as “helpful” votes and “best answer” selections. The user preference graph is incorporated into the traditional supervised learning framework through graph regularization. Experimental results demonstrate that the user preference graph can indeed improve the performance of deceptive answer prediction.
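The abstract does not spell out the regularized objective, so the following is only a hedged, hypothetical sketch of the idea. It builds a weighted user graph from “helpful” votes and “best answer” selections, then adds a graph-regularization penalty to logistic regression in the spirit of Zhang et al. (2006) from the reference list. Assumptions not taken from the paper: each expressed preference adds weight 1 to an undirected edge between two users; two answers are tied by the edge weight between their authors; the penalty is the squared difference of linear scores; and the names `build_preference_graph`, `score`, `train`, `predict` and the hyperparameters `alpha`, `beta`, `lr`, `epochs` are illustrative inventions.

```python
import math
from collections import defaultdict


def build_preference_graph(votes, best_answers):
    """Hypothetical user preference graph.

    votes:        (voter, answer_author) pairs for "helpful" votes
    best_answers: (asker, answer_author) pairs for "best answer" picks
    Assumption: each preference adds weight 1.0 to an undirected edge;
    a user expressing a preference for their own answer is ignored.
    """
    graph = defaultdict(float)
    for src, dst in list(votes) + list(best_answers):
        if src != dst:
            graph[frozenset((src, dst))] += 1.0
    return dict(graph)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Linear score w . x
    return sum(wk * xk for wk, xk in zip(w, x))


def train(X, y, authors, graph, alpha=0.01, beta=0.1, lr=0.1, epochs=500):
    """Graph-regularized logistic regression by batch gradient descent.

    Minimizes   sum_i logloss(y_i, sigmoid(w.x_i))
              + alpha * ||w||^2
              + beta * sum_{i<j} W_ij * (w.x_i - w.x_j)^2,
    where W_ij is the graph weight between the authors of answers i and j
    (assumption) and y_i = 1 marks a deceptive answer.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    # Answer pairs whose authors are linked in the preference graph.
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            wij = graph.get(frozenset((authors[i], authors[j])), 0.0)
            if wij > 0:
                pairs.append((i, j, wij))
    for _ in range(epochs):
        s = [score(w, x) for x in X]
        grad = [2 * alpha * wk for wk in w]  # L2 term
        for i in range(n):  # log-loss term
            err = sigmoid(s[i]) - y[i]
            for k in range(d):
                grad[k] += err * X[i][k]
        for i, j, wij in pairs:  # graph-regularization term
            diff = s[i] - s[j]
            for k in range(d):
                grad[k] += 2 * beta * wij * diff * (X[i][k] - X[j][k])
        # Step of size lr / n: gradient descent on the objective averaged over answers.
        w = [wk - lr * gk / n for wk, gk in zip(w, grad)]
    return w


def predict(w, x):
    """Estimated probability that answer x is deceptive."""
    return sigmoid(score(w, x))
```

Setting `beta=0` recovers plain L2-regularized logistic regression; a larger `beta` pulls the scores of answers written by linked users toward each other, which is one way to encode the hypothesis that similar users behave similarly. The all-pairs loop is quadratic in the number of answers, so this sketch only suits toy data.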

reference text

Lada A. Adamic, Jun Zhang, Eytan Bakshy, and Mark S. Ackerman. 2008. Knowledge sharing and Yahoo Answers: everyone knows something. In Proceedings of the 17th International Conference on World Wide Web, WWW ’08, pages 665–674, New York, NY, USA. ACM.

Jiang Bian, Yandong Liu, Ding Zhou, Eugene Agichtein, and Hongyuan Zha. 2009. Learning to recognize reliable users and content in social media with coupled mutual reinforcement. In Proceedings of the 18th International Conference on World Wide Web, WWW ’09, pages 51–60, New York, NY, USA. ACM.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, March.

Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. 1990. A statistical approach to machine translation. Comput. Linguist., 16:79–85, June.

S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407.

A. Figueroa and J. Atkinson. 2011. Maximum entropy context models for ranking biographical answers to open-domain definition questions. In Twenty-Fifth AAAI Conference on Artificial Intelligence.

F. Maxwell Harper, Daphne Raban, Sheizaf Rafaeli, and Joseph A. Konstan. 2008. Predictors of answer quality in online Q&A sites. In Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI ’08, pages 865–874, New York, NY, USA. ACM.

Daisuke Ishikawa, Tetsuya Sakai, and Noriko Kando. 2010. Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task, pages 421–432. Number Part I.

Jiwoon Jeon, W. Bruce Croft, and Joon Ho Lee. 2005. Finding similar questions in large question and answer archives. In Proceedings of the 14th ACM CIKM Conference, CIKM ’05, pages 84–90, New York, NY, USA. ACM.

J. Jeon, W.B. Croft, J.H. Lee, and S. Park. 2006. A framework to predict the quality of answers with non-textual features. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 228–235. ACM.

P. Jurczyk and E. Agichtein. 2007. Discovering authorities in question answer communities by using link analysis. In Proceedings of the Sixteenth ACM CIKM Conference, pages 919–922. ACM.

H. Kim, P. Howland, and H. Park. 2006. Dimension reduction in text classification with support vector machines. Journal of Machine Learning Research, 6(1):37.

Fangtao Li, Minlie Huang, Yi Yang, and Xiaoyan Zhu. 2011. Learning to identify review spam. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three, pages 2488–2493. AAAI Press.

Yuanjie Liu, Shasha Li, Yunbo Cao, Chin-Yew Lin, Dingyi Han, and Yong Yu. 2008. Understanding and summarizing answers in community-based question answering services. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING ’08, pages 497–504, Stroudsburg, PA, USA. Association for Computational Linguistics.

Jing Liu, Young-In Song, and Chin-Yew Lin. 2011. Competition-based user expertise score estimation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 425–434. ACM.

Yue Lu, Panayiotis Tsaparas, Alexandros Ntoulas, and Livia Polanyi. 2010. Exploiting social context for review quality prediction. In Proceedings of the 19th International Conference on World Wide Web, pages 691–700. ACM.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Comput. Linguist., 29:19–51, March.

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, November. SIDL-WP-1999-0120.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pages 311–318, Stroudsburg, PA, USA. ACL.

Gerard Salton and Michael J. McGill. 1986. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA.

Chirag Shah and Jeffrey Pomerantz. 2010. Evaluating and predicting answer quality in community QA. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, pages 411–418, New York, NY, USA. ACM.

X. Si, Z. Gyöngyi, and E. Y. Chang. 2010a. Scalable mining of topic-dependent user reputation for improving user generated content search quality. In Google Technical Report.

Xiance Si, Edward Y. Chang, Zoltán Gyöngyi, and Maosong Sun. 2010b. Confucius and its intelligent disciples: integrating social with search. Proc. VLDB Endow., 3:1505–1516, September.

Young-In Song, Chin-Yew Lin, Yunbo Cao, and Hae-Chang Rim. 2008. Question utility: a novel static ranking of question search. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, AAAI ’08, pages 1231–1236. AAAI Press.

Y.I. Song, J. Liu, T. Sakai, X.J. Wang, G. Feng, Y. Cao, H. Suzuki, and C.Y. Lin. 2010. Microsoft Research Asia with Redmond at the NTCIR-8 Community QA Pilot Task. In Proceedings of NTCIR.

Wei Wei, Gao Cong, Xiaoli Li, See-Kiong Ng, and Guohui Li. 2011. Integrating community question and answer archives. In AAAI.

Y. Yang and J.O. Pedersen. 1997. A comparative study on feature selection in text categorization. In Machine Learning: International Workshop then Conference, pages 412–420. Morgan Kaufmann Publishers.

Tong Zhang, Alexandrin Popescul, and Byron Dom. 2006. Linear prediction models with graph regularization for web-page categorization. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 821–826. ACM.