
319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components

Source: pdf

Author: Moshe Koppel ; Navot Akiva ; Idan Dershowitz ; Nachum Dershowitz

Abstract: We propose a novel unsupervised method for separating out distinct authorial components of a document. In particular, we show that, given a book artificially “munged” from two thematically similar biblical books, we can separate out the two constituent books almost perfectly. This allows us to automatically recapitulate many conclusions reached by Bible scholars over centuries of research. One of the key elements of our method is exploitation of differences in synonym choice by different authors.

reference text

J. Astruc. 1753. Conjectures sur les mémoires originaux dont il paroit que Moyse s’est servi pour composer le livre de la Genèse. Brussels.
R. E. Bee. 1971. Statistical methods in the study of the Masoretic text of the Old Testament. J. of the Royal Statistical Society, 134(1):611-622.
M. J. Berryman, A. Allison, and D. Abbott. 2003. Statistical techniques for text classification based on word recurrence intervals. Fluctuation and Noise Letters, 3(1):L1-L10.
J. E. Carpenter and G. Hartford-Battersby. 1900. The Hexateuch: According to the Revised Version. London.
J. Clark and C. Hannon. 2007. A classifier system for author recognition using synonym-based features. Proc. Sixth Mexican International Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, vol. 4827, pp. 839-849.
I. S. Dhillon, Y. Guan, and B. Kulis. 2004. Kernel k-means: spectral clustering and normalized cuts. Proc. ACM International Conference on Knowledge Discovery and Data Mining (KDD), pp. 551-556.
S. R. Driver. 1909. An Introduction to the Literature of the Old Testament (8th ed.). Clark, Edinburgh.
N. Graham, G. Hirst, and B. Marthi. 2005. Segmenting documents by stylistic character. Natural Language Engineering, 11(4):397-415.
D. Guthrie, L. Guthrie, and Y. Wilks. 2008. An unsupervised probabilistic approach for the detection of outliers in corpora. Proc. Sixth International Language Resources and Evaluation (LREC'08), pp. 2830.
D. Holmes. 1994. Authorship attribution. Computers and the Humanities, 28(2):87-106.
P. Juola. 2008. Author Attribution. Series title: Foundations and Trends in Information Retrieval. Now Publishing, Delft.
M. Koppel, N. Akiva, and I. Dagan. 2006. Feature instability as a criterion for selecting potential style markers. J. of the American Society for Information Science and Technology, 57(11):1519-1525.
M. Koppel, J. Schler, and S. Argamon. 2009. Computational methods in authorship attribution. J. of the American Society for Information Science and Technology, 60(1):9-26.
D. L. Mealand. 1995. Correspondence analysis of Luke. Lit. Linguist Computing, 10(3):171-182.
S. Meyer zu Eisen and B. Stein. 2006. Intrinsic plagiarism detection. Proc. European Conference on Information Retrieval (ECIR 2006), Lecture Notes in Computer Science, vol. 3936, pp. 565-569.
K. Nigam, A. K. McCallum, S. Thrun, and T. M. Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2/3):103-134.
M. H. Pope. 1965. Job (The Anchor Bible, Vol. XV). Doubleday, New York, NY.
M. H. Pope. 1952. Isaiah 34 in relation to Isaiah 35, 40-66. Journal of Biblical Literature, 71(4):235-243.
Y. Radday. 1970. Isaiah and the computer: A preliminary report. Computers and the Humanities, 5(2):65-73.
E. Stamatatos. 2009. A survey of modern authorship attribution methods. J. of the American Society for Information Science and Technology, 60(3):538-556.
J. Strong. 1890. The Exhaustive Concordance of the Bible. Nashville, TN. (Online edition: http://www.htmlbible.com/sacrednamebiblecom/kjvstrongs/STRINDEX.htm; accessed 14 November 2010.)