27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure

Source: pdf

Author: Song Feng ; Ritwik Banerjee ; Yejin Choi

Abstract: Many of the writing styles recognized in rhetorical and composition theories involve deep syntactic elements. However, most previous research on computational stylometric analysis has relied on shallow lexico-syntactic patterns. Some very recent work has shown that PCFG models can detect distributional differences in syntactic styles, but without offering much insight into exactly what constitutes the salient stylistic elements in sentence structure that characterize each author. In this paper, we present a comprehensive exploration of syntactic elements in writing styles, with particular emphasis on interpretable characterization of stylistic elements. We present analytic insights with respect to the authorship attribution task in two different domains.

reference text

Shlomo Argamon and Shlomo Levitan. 2004. Measuring the usefulness of function words for authorship attribution. Literary and Linguistic Computing, pages 1–3.
Shlomo Argamon, Casey Whitelaw, Paul Chase, Sobhan Raj Hota, Navendu Garg, and Shlomo Levitan. 2007. Stylistic text classification using functional lexical features: Research articles. J. Am. Soc. Inf. Sci. Technol., 58(6):802–822.
H. Baayen, H. Van Halteren, and F. Tweedie. 1996. Outside the cave of shadows: Using syntactic annotation to enhance authorship attribution. Literary and Linguistic Computing, 11(3):121.
H. Baayen, H. van Halteren, A. Neijt, and F. Tweedie. 2002. An experiment in authorship attribution. In 6th JADT. Citeseer.
A. Bain. 1887. English Composition and Rhetoric: Intellectual elements of style. D. Appleton and company.
S. Bird, R. Dale, B.J. Dorr, B. Gibson, M.T. Joseph, M.Y. Kan, D. Lee, B. Powley, D.R. Radev, and Y.F. Tan. 2008. The ACL Anthology Reference Corpus: A reference dataset for bibliographic research in computational linguistics. In Proc. of the 6th International Conference on Language Resources and Evaluation Conference (LREC08), pages 1755–1759.
J. Burrows. 2002. Delta: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3):267–287.
Samuel W. K. Chan, Lawrence Y. L. Cheung, and Mickey W. C. Chong. 2010. Tree topological features for unlexicalized parsing. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING ’10, pages 117–125, Stroudsburg, PA, USA. Association for Computational Linguistics.
Michael Collins and Nigel Duffy. 2002. New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pages 263–270, Stroudsburg, PA, USA. Association for Computational Linguistics.
J. Diederich, J. Kindermann, E. Leopold, and G. Paass. 2003. Authorship attribution with support vector machines. Applied Intelligence, 19(1):109–123.
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.
Antonio Miranda Garcia and Javier Calle Martin. 2007. Function words in authorship attribution studies. Literary and Linguistic Computing, 22(1):49–66.
Graeme Hirst and Olga Feiguina. 2007. Bigrams of syntactic labels for authorship discrimination of short texts. Literary and Linguistic Computing, 22(4):405–417.
D. L. Hoover. 2004. Testing Burrows’s Delta. Literary and Linguistic Computing, 19(4):453–475.
J. Houvardas and E. Stamatatos. 2006. N-gram feature selection for author identification. In Proc. of the 12th International Conference on Artificial Intelligence: Methodology, Systems and Applications, volume 4183 of LNCS, pages 77–86, Varna, Bulgaria. Springer.
Aravind K. Joshi. 1992. Statistical language modeling. In Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992. Association for Computational Linguistics.
S. Kemper. 1987. Life-span changes in syntactic complexity. Journal of Gerontology, 42(3):323.
Vlado Keselj, Fuchun Peng, Nick Cercone, and Calvin Thomas. 2003. N-gram-based author profiles for authorship attribution. In Proc. of the Pacific Association for Computational Linguistics, pages 255–264.
M. Koppel and J. Schler. 2003. Exploiting stylistic idiosyncrasies for authorship attribution. In Proceedings of IJCAI, volume 3, pages 69–72. Citeseer.
D. Lin. 1995. University of Manitoba: description of the PIE system used for MUC-6. In Proceedings of the 6th conference on Message understanding, pages 113–126. Association for Computational Linguistics.
D. Lin. 1997. Using syntactic dependency as local context to resolve word sense ambiguity. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, pages 64–71. Association for Computational Linguistics.
Kim Luyckx and Walter Daelemans. 2008. Authorship attribution and verification with many authors and limited data. In COLING ’08, pages 513–520.
T.C. Mendenhall. 1887. The characteristic curves of composition. Science, ns-9(214S):237–246.
Alessandro Moschitti. 2008. Kernel methods, syntax and semantics for relational text categorization. In Proceedings of the 17th ACM conference on Information and knowledge management, CIKM ’08, pages 253–262, New York, NY, USA. ACM.
Frederick Mosteller and David L. Wallace. 1984. Applied Bayesian and Classical Inference: The Case of the Federalist Papers. Springer-Verlag.
Fuchun Peng, Dale Schuurmans, Shaojun Wang, and Vlado Keselj. 2003. Language independent authorship attribution using character level language models. In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1, EACL ’03, pages 267–274, Stroudsburg, PA, USA. Association for Computational Linguistics.
S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In Proceedings of NAACL HLT 2007, pages 404–411.
Daniele Pighin and Alessandro Moschitti. 2009. Reverse engineering of tree kernel feature spaces. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, EMNLP ’09, pages 111–120, Stroudsburg, PA, USA. Association for Computational Linguistics.
Emily Pitler and Ani Nenkova. 2008. Revisiting readability: a unified framework for predicting text quality. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 186–195, Stroudsburg, PA, USA. Association for Computational Linguistics.
Arthur Quinn. 1995. Figures of Speech: 60 Ways To Turn A Phrase. Routledge.
Sindhu Raghavan, Adriana Kovashka, and Raymond Mooney. 2010. Authorship attribution using probabilistic context-free grammars. In Proceedings of the ACL 2010 Conference Short Papers, pages 38–42, Uppsala, Sweden. Association for Computational Linguistics.
Ruchita Sarawgi, Kailash Gajulapalli, and Yejin Choi. 2011. Gender attribution: tracing stylometric evidence beyond topic and genre. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL ’11, pages 78–86, Stroudsburg, PA, USA. Association for Computational Linguistics.
K.T. Shao. 1990. Tree balance. Systematic Biology, 39(3):266.
Efstathios Stamatatos, George Kokkinakis, and Nikos Fakotakis. 2000. Automatic text categorization in terms of genre and author. Comput. Linguist., 26(4):471–495.
E. Stamatatos, N. Fakotakis, and G. Kokkinakis. 2001. Computer-based authorship attribution without lexical measures. Computers and the Humanities, 35(2):193–214.
E. Stamatatos. 2006. Ensemble-based author identification using character n-grams. ReCALL, page 4146.
W. Strunk and E.B. White. 2008. The elements of style. Penguin Group USA.
Sze-Meng Jojo Wong and Mark Dras. 2011. Exploiting parse structures for native language identification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 1600–1610, Stroudsburg, PA, USA. Association for Computational Linguistics.
Ying Zhao and Justin Zobel. 2005. Effective and scalable authorship attribution using function words. In Proceedings of the Second Asia conference on Asia Information Retrieval Technology, AIRS’05, pages 174–189, Berlin, Heidelberg. Springer-Verlag.
Y. Zhao and J. Zobel. 2007. Searching with style: Authorship attribution in classic literature. In Proceedings of the thirtieth Australasian conference on Computer science - Volume 62, pages 59–68. Australian Computer Society, Inc.
Rong Zheng, Jiexun Li, Hsinchun Chen, and Zan Huang. 2006. A framework for authorship identification of online messages: Writing-style features and classification techniques. J. Am. Soc. Inf. Sci. Technol., 57(3):378–393.