
Identifying Multiple Userids of the Same Author (EMNLP 2013)


Authors: Tieyun Qian; Bing Liu

Abstract: This paper studies the problem of identifying users who use multiple userids to post in social media. Since multiple userids may belong to the same author, it is hard to directly apply supervised learning to solve the problem. This paper proposes a new method, which still uses supervised learning but does not require training documents from the involved userids. Instead, it uses documents from other userids for classifier building. The classifier can be applied to documents of the involved userids. This is possible because we transform the document space to a similarity space and learning is performed in this new space. Our evaluation is done in the online review domain. The experimental results using a large number of userids and their reviews show that the proposed method is highly effective.
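The idea of learning in a similarity space can be illustrated with a minimal sketch. It is not the authors' implementation. The three similarity features, the nearest-centroid classifier (a simple stand-in for whatever supervised learner the paper uses), the toy review texts, and all function names are assumptions made for illustration. Each pair of documents is mapped to a vector of similarity scores, a classifier is fit on pairs drawn from other userids, and the same classifier is then applied to pairs from userids it has never seen.

```python
import math
from collections import Counter


def word_counts(text):
    """Lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())


def char_ngrams(text, n=3):
    """Character n-gram counts (n=3 is an arbitrary illustrative choice)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Jaccard overlap between the key sets of two count vectors."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def similarity_vector(doc_a, doc_b):
    """Map a document pair to a point in similarity space (hypothetical features)."""
    wa, wb = word_counts(doc_a), word_counts(doc_b)
    ca, cb = char_ngrams(doc_a), char_ngrams(doc_b)
    return [cosine(wa, wb), jaccard(wa, wb), cosine(ca, cb)]


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(pairs):
    """pairs: (doc_a, doc_b, label) triples, label 1 = same author, 0 = different."""
    pos = [similarity_vector(a, b) for a, b, y in pairs if y == 1]
    neg = [similarity_vector(a, b) for a, b, y in pairs if y == 0]
    return centroid(pos), centroid(neg)


def predict(model, doc_a, doc_b):
    """Return 1 if the pair lies closer to the same-author centroid, else 0."""
    pos_c, neg_c = model
    v = similarity_vector(doc_a, doc_b)
    d_pos = sum((x - c) ** 2 for x, c in zip(v, pos_c))
    d_neg = sum((x - c) ** 2 for x, c in zip(v, neg_c))
    return 1 if d_pos < d_neg else 0


# Toy training pairs built only from "other" userids (invented example texts).
U1 = ["great phone, honestly love the battery",
      "honestly love this camera, great value"]
U2 = ["the item arrived broken and late",
      "the box arrived late and damaged"]
TRAIN_PAIRS = [
    (U1[0], U1[1], 1), (U2[0], U2[1], 1),  # same author
    (U1[0], U2[0], 0), (U1[1], U2[1], 0),  # different authors
]
MODEL = train(TRAIN_PAIRS)

# Apply the model to documents from userids that were never used in training.
print(predict(MODEL, "honestly love these headphones", "great bass, honestly love it"))
```

Because the classifier only ever sees similarity scores, never the documents' own vocabulary as features, nothing in it is tied to a particular userid. This is what allows training on one set of userids and testing on another.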

References

Shlomo Argamon and Shlomo Levitan. 2004. Measuring the usefulness of function words for authorship attribution. Literary and Linguistic Computing 1-3.
Shlomo Argamon, Casey Whitelaw, Paul Chase, Sobhan Raj Hota, Navendu Garg, and Shlomo Levitan. 2007. Stylistic text classification using functional lexical features: Research articles. J. Am. Soc. Inf. Sci. Technol. 58:802-822.
John F. Burrows. 1992. Not unless you ask nicely: The interpretative nexus between analysis and information. Literary and Linguistic Computing 7:91-109.
Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, and Hsiao-Wuen Hon. 2006. Adapting ranking SVM to document retrieval. Proc. of SIGIR, pages 186-193.
Hung-Ching Chen, Mark K. Goldberg, and Malik Magdon-Ismail. 2004. Identifying multi-ID users in open forums. Intelligence and Security Informatics, pages 176-186.
Joachim Diederich, Jörg Kindermann, Edda Leopold, and Gerhard Paass. 2000. Authorship attribution with support vector machines. Applied Intelligence 19:109-123.
Hugo Jair Escalante, Thamar Solorio, and Manuel Montes-y-Gómez. 2011. Local histograms of character n-grams for authorship attribution. Proc. of ACL-HLT, Volume 1:288-298.
Song Feng, Longfei Xing, Anupam Gogar, and Yejin Choi. 2012. Distributional Footprints of Deceptive Product Reviews. Proc. of ICWSM.
Michael Gamon. 2004. Linguistic correlates of style: authorship classification with deep linguistic analysis features. Proc. of Coling.
Neil Graham, Graeme Hirst, and Bhaskara Marthi. 2005. Segmenting documents by stylistic character. Natural Language Engineering 11:397-415.
Jack Grieve. 2007. Quantitative authorship attribution: An evaluation of techniques. Literary and Linguistic Computing 22:251-270.
Hans van Halteren, Fiona Tweedie, and Harald Baayen. 1996. Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Literary and Linguistic Computing 11:121-132.
Steffen Hedegaard and Jakob Grue Simonsen. 2011. Lost in translation: authorship attribution using frame semantics. Proc. of ACL-HLT, short papers - Volume 2, 65-70.
Graeme Hirst and Ol'ga Feiguina. 2007. Bigrams of syntactic labels for authorship discrimination of short texts. Literary and Linguistic Computing 22:405-417.
David I. Holmes and R. S. Forsyth. 1995. The Federalist Revisited: New Directions in Authorship Attribution. Literary and Linguistic Computing 10(2):111-127.
David L. Hoover. 2001. Statistical stylistics and authorship attribution: an empirical investigation. Literary and Linguistic Computing 16:421-424.
Nitin Jindal and Bing Liu. 2008. Opinion Spam and Analysis. Proc. of WSDM, California, USA.
Thorsten Joachims. 2006. Training linear SVMs in linear time. Proc. of KDD.
Sangkyum Kim, Hyungsul Kim, Tim Weninger, Jiawei Han, and Hyun Duk Kim. 2011. Authorship classification: a discriminative syntactic tree mining approach. Proc. of SIGIR, pages 455-464.
Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. Proc. of ACL, pages 423-430.
Moshe Koppel and Jonathan Schler. 2004. Authorship verification as a one-class classification problem. Proc. of ICML.
Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2011. Authorship attribution in the wild. Lang Resources & Evaluation 45:83-94.
Fangtao Li, Minlie Huang, Yi Yang, and Xiaoyan Zhu. 2011. Learning to identify review Spam. Proc. of IJCAI.
Hang Li. 2011. Learning to Rank for Information Retrieval and Natural Language Processing. Morgan & Claypool publishers.
Jiexun Li, Rong Zheng, and Hsinchun Chen. 2006. From fingerprint to writeprint. Communications of the ACM 49:76-82.
Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady W. Lauw. 2010. Detecting product review spammers using rating behaviors. Proc. of CIKM.
Bing Liu. 2012. Sentiment Analysis and Opinion Mining. Morgan & Claypool publishers.
Tie-Yan Liu. 2011. Learning to Rank for Information Retrieval. Springer.
Kim Luyckx and Walter Daelemans. 2008. Authorship Attribution and Verification with Many Authors and Limited Data. Proc. of Coling, pages 513-520.
David Madigan, Alexander Genkin, David D. Lewis, Shlomo Argamon, Dmitriy Fradkin, and Li Ye. 2005. Author Identification on the Large Scale. Proc. of CSNA.
Donald Metzler, Yaniv Bernstein, W. Bruce Croft, Alistair Moffat, and Justin Zobel. 2005. Similarity measures for tracking information flow. Proc. of CIKM, pages 517-524.
Frederick Mosteller and David Lee Wallace. 1964. Inference and disputed authorship: The Federalist. Addison-Wesley.
Arjun Mukherjee, Bing Liu, and Natalie Glance. 2012. Spotting Fake Reviewer Groups in Consumer Reviews. Proc. of WWW, pages 191-200.
Arvind Narayanan, Hristo Paskov, Neil Zhenqiang Gong, et al. 2012. On the feasibility of internet-scale author identification. Proceedings of the 2012 IEEE Symposium on Security and Privacy, pages 300-314.
Jasmine Novak, Prabhakar Raghavan, and Andrew Tomkins. 2004. Anti-aliasing on the web. Proc. of WWW, pages 30-39.
Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. Proc. of ACL.
Myle Ott, Claire Cardie, and Jeffrey T. Hancock. 2012. Estimating the prevalence of deception in online review communities. Proc. of WWW.
Fuchun Peng, Dale Schuurmans, Shaojun Wang, and Vlado Keselj. 2003. Language independent authorship attribution using character level language models. Proc. of EACL, pages 267-274.
Conrad Sanderson and Simon Guenter. 2006. Short text authorship attribution via sequence kernels, markov chains and author unmasking: an investigation. Proc. of EMNLP, pages 482-491.
Yanir Seroussi, Fabian Bohnert, and Ingrid Zukerman. 2012. Authorship Attribution with Author-aware Topic Models. Proc. of ACL, 2:264-269.
Thamar Solorio, Sangita Pillay, Sindhu Raghavan, and Manuel Montes y Gómez. 2011. Modality Specific Meta Features for Authorship Attribution in Web Forum Posts. Proc. of IJCNLP, pages 156-164.
Efstathios Stamatatos. 2009. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology 60(3):538-556, Wiley.
Efstathios Stamatatos, George Kokkinakis, and Nikos Fakotakis. 2000. Automatic text categorization in terms of genre and author. Comput. Linguist. 26:471-495.
Özlem Uzuner and Boris Katz. 2005. A comparative study of language models for book and author recognition. Proc. of IJCNLP, pages 969-980.
Vladimir N. Vapnik. 1998. Statistical Learning Theory. Wiley-Interscience, NY.
O. de Vel, A. Anderson, M. Corney, and G. Mohay. 2001. Mining Email Content for Author Identification Forensics. Sigmod Record 30:55-64.
Kyung-Hyan Yoo and Ulrike Gretzel. 2009. Comparison of Deceptive and Truthful Travel Reviews. Information and Communication Technologies in Tourism, pages 37-47.
George Udny Yule. 1944. The statistical study of literary vocabulary. Cambridge University Press.
Guan Wang, Sihong Xie, Bing Liu, and Philip S. Yu. 2011. Review Graph based Online Store Review Spammer Detection. Proc. of ICDM.
Ying Zhao and Justin Zobel. 2005. Effective and scalable authorship attribution using function words. Proceeding of Information Retrieval Technology, pages 174-189.
Rong Zheng, Jiexun Li, Hsinchun Chen, and Zan Huang. 2006. A framework for authorship identification of online messages: Writing style features and classification techniques. Journal of the American Society of Information Science and Technology 57:378-393.