34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars


Authors: Sindhu Raghavan; Adriana Kovashka; Raymond Mooney

Abstract: In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.
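The idea in the abstract (one grammar per author, used as a language model to score an unseen document) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes parse trees are already available as nested tuples, estimates rule probabilities by relative frequency with additive smoothing, and scores the given trees directly instead of re-parsing the test sentences with each author's grammar. The names `productions`, `train_pcfg`, `log_score`, and `classify`, the smoothing constants `alpha` and `n_rules`, and the toy trees are all hypothetical.

```python
import math
from collections import Counter

# A parse tree is a nested tuple (label, child, ...); a leaf is a plain string.


def productions(tree):
    """Yield every (lhs, rhs) rule used in a nested-tuple parse tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def train_pcfg(trees):
    """Count rules and left-hand sides over one author's parse trees."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in trees:
        for lhs, rhs in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return rule_counts, lhs_counts


def log_score(tree, model, alpha=0.1, n_rules=1000):
    """Log-probability of a tree's rules under one author's grammar.

    Additive smoothing (alpha, n_rules) is an assumption of this sketch;
    it keeps unseen rules from producing log(0).
    """
    rule_counts, lhs_counts = model
    total = 0.0
    for lhs, rhs in productions(tree):
        p = (rule_counts[(lhs, rhs)] + alpha) / (lhs_counts[lhs] + alpha * n_rules)
        total += math.log(p)
    return total


def classify(doc_trees, models):
    """Return the author whose grammar gives the document the highest score."""
    scores = {
        author: sum(log_score(t, model) for t in doc_trees)
        for author, model in models.items()
    }
    return max(scores, key=scores.get)


# Toy training data: author A prefers bare noun phrases, author B adds adjectives.
trees_a = [("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBZ", "barks")))]
trees_b = [("S", ("NP", ("DT", "the"), ("JJ", "old"), ("NN", "dog")),
            ("VP", ("VBZ", "barks")))]
models = {"A": train_pcfg(trees_a), "B": train_pcfg(trees_b)}

# Unseen documents, each a list of parsed sentences.
doc_with_adjective = [("S", ("NP", ("DT", "a"), ("JJ", "big"), ("NN", "cat")),
                       ("VP", ("VBZ", "sleeps")))]
doc_without_adjective = [("S", ("NP", ("DT", "a"), ("NN", "cat")),
                          ("VP", ("VBZ", "sleeps")))]

print(classify(doc_with_adjective, models))
print(classify(doc_without_adjective, models))
```

In this toy setup the only signal is the noun-phrase rule each author favors. A fuller system would obtain the trees from a statistical parser and would likely treat lexical rules and smoothing more carefully.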


reference text

H. Baayen, H. van Halteren, and F. Tweedie. 1996. Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Literary and Linguistic Computing, 11(3):121–132, September.
Binongo and Smith. 1999. A Study of Oscar Wilde’s Writings. Journal of Applied Statistics, 26:781.
J. Burrows. 1987. Word-patterns and Story-shapes: The Statistical Analysis of Narrative Style.
Joachim Diederich, Jörg Kindermann, Edda Leopold, and Gerhard Paass. 2000. Authorship Attribution with Support Vector Machines. Applied Intelligence, 19:2003.
D. I. Holmes and R. S. Forsyth. 1995. The Federalist Revisited: New Directions in Authorship Attribution. Literary and Linguistic Computing, 10:111–127.
Thorsten Joachims. 1998. Text categorization with Support Vector Machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning (ECML), pages 137–142, Berlin, Heidelberg. Springer-Verlag.
Patrick Juola and John Sofko. 2004. Proving and Improving Authorship Attribution Technologies. In Proceedings of Canadian Symposium for Text Analysis (CaSTA).
Dan Klein and Christopher D. Manning. 2003a. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (ACL), pages 423–430, Morristown, NJ, USA. Association for Computational Linguistics.
Dan Klein and Christopher D. Manning. 2003b. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems 15 (NIPS), pages 3–10. MIT Press.
Kim Luyckx and Walter Daelemans. 2008. Authorship Attribution and Verification with Many Authors and Limited Data. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING), pages 513–520, August.
Andrew Kachites McCallum. 2002. MALLET: A Machine Learning for Language Toolkit. http://mallet.cs.umass.edu.
Frederick Mosteller and David L. Wallace. 1984. Applied Bayesian and Classical Inference: The Case of the Federalist Papers. Springer-Verlag.
Andrew Y. Ng and Michael I. Jordan. 2001. On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems 14 (NIPS), pages 841–848.
Fuchun Peng, Dale Schuurmans, Vlado Keselj, and Shaojun Wang. 2003. Language Independent Authorship Attribution using Character Level Language Models. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL).
Bernard Rosner. 2005. Fundamentals of Biostatistics. Duxbury Press.
E. Stamatatos, N. Fakotakis, and G. Kokkinakis. 1999. Automatic Authorship Attribution. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 158–164, Morristown, NJ, USA. Association for Computational Linguistics.
E. Stamatatos. 2009. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, 60(3):538–556.
Rong Zheng, Yi Qin, Zan Huang, and Hsinchun Chen. 2009. Authorship Analysis in Cybercrime Investigation. Lecture Notes in Computer Science, 2665/2009:959.