acl acl2011 acl2011-56 acl2011-56-reference knowledge-graph by maker-knowledge-mining

56 acl-2011-Bayesian Inference for Zodiac and Other Homophonic Ciphers

Source: pdf

Author: Sujith Ravi ; Kevin Knight

Abstract: We introduce a novel Bayesian approach for deciphering complex substitution ciphers. Our method uses a decipherment model which combines information from letter n-gram language models as well as word dictionaries. Bayesian inference is performed on our model using an efficient sampling technique. We evaluate the quality of the Bayesian decipherment output on simple and homophonic letter substitution ciphers and show that unlike a previous approach, our method consistently produces almost 100% accurate decipherments. The new method can be applied on more complex substitution ciphers and we demonstrate its utility by cracking the famous Zodiac-408 cipher in a fully automated fashion, which has never been done before.

reference text

Phil Blunsom, Trevor Cohn, Chris Dyer, and Miles Osborne. 2009. A Gibbs sampler for phrasal synchronous grammar induction. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP), pages 782–790. David Chiang, Jonathan Graehl, Kevin Knight, Adam Pauls, and Sujith Ravi. 2010. Bayesian inference for finite-state transducers. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL/HLT), pages 447–455. Eric Corlett and Gerald Penn. 2010. An exact A* method for deciphering letter-substitution ciphers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1040–1047. Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1–38. Persi Diaconis. 2008. The Markov Chain Monte Carlo revolution. Bulletin of the American Mathematical Society, 46(2): 179–205. Jenny Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 363– 370. Ravi Ganesan and Alan T. Sherman. 1993. Statistical techniques for language recognition: An introduction and guide for cryptanalysts. Cryptologia, 17(4):321– 366. Stuart Geman and Donald Geman. 1984. Stochastic re- laxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741. Sharon Goldwater and Thomas Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings ofthe 45thAnnual Meeting ofthe Association of Computational Linguistics, pages 744– 751. Thomas Jakobsen. 1995. A fast method for cryptanalysis of substitution ciphers. Cryptologia, 19(3):265–274. Kevin Knight, Anish Nair, Nishit Rathod, and Kenji Yamada. 2006. Unsupervised analysis for decipherment problems. In Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics, pages 499–506. Percy Liang, Michael I. Jordan, and Dan Klein. 2010. Type-based MCMC. In Proceedings ofthe Conference on Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 573– 581. Edwin Olson. 2007. Robust dictionary attack of short simple substitution ciphers. Cryptologia, 31(4):332– 342. David Oranchak. 2008. Evolutionary algorithm for decryption of monoalphabetic homophonic substitution ciphers encoded as constraint satisfaction problems. In Proceedings ofthe 10thAnnual Conference on Genetic and Evolutionary Computation, pages 1717–1718. 247 Shmuel Peleg and Azriel Rosenfeld. 1979. Breaking substitution ciphers using a relaxation algorithm. Comm. ACM, 22(1 1):598–605. Sujith Ravi and Kevin Knight. 2008. Attacking decipherment problems optimally with low-order n-gram models. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP), pages 812– 819. Sujith Ravi and Kevin Knight. 2009. Probabilistic methods for a Japanese syllable cipher. In Proceedings of the International Conference on the Computer Processing of Oriental Languages (ICCPOL), pages 270– 281. Benjamin Snyder, Regina Barzilay, and Kevin Knight. 2010. A statistical model for lost language decipherment. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1048–1057.