acl acl2013 acl2013-66 acl2013-66-reference knowledge-graph by maker-knowledge-mining

66 acl-2013-Beam Search for Solving Substitution Ciphers

Source: pdf

Author: Malte Nuhn ; Julian Schamper ; Hermann Ney

Abstract: In this paper we address the problem of solving substitution ciphers using a beam search approach. We present a conceptually consistent and easy to implement method that improves the current state of the art for decipherment of substitution ciphers and is able to use high order n-gram language models. We show experiments with 1:1 substitution ciphers in which the guaranteed optimal solution for 3-gram language models has 38.6% decipherment error, while our approach achieves 4.13% decipherment error in a fraction of time by using a 6-gram language model. We also apply our approach to the famous Zodiac-408 cipher and obtain slightly bet- ter (and near to optimal) results than previously published. Unlike the previous state-of-the-art approach that uses additional word lists to evaluate possible decipherments, our approach only uses a letterbased 6-gram language model. Furthermore we use our algorithm to solve large vocabulary substitution ciphers and improve the best published decipherment error rate based on the Gigaword corpus of 7.8% to 6.0% error rate.

reference text

Andrew J. Clark. 1998. Optimisation heuristics for cryptology. Ph.D. thesis, Faculty of Information Technology, Queensland University of Technology. Eric Corlett and Gerald Penn. 2010. An exact A* method for deciphering letter-substitution ciphers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1040–1047, Uppsala, Sweden, July. The Association for Computer Linguistics. Qing Dou and Kevin Knight. 2012. Large scale decipherment for out-of-domain machine translation. In Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 266–275, Jeju Island, Korea, July. Association for Computational Linguistics. David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2007. English Gigaword Third Edition. Linguistic Data Consortium, Philadelphia. George W. Hart. 1994. To decode short cryptograms. Communications of the Association for Computing Machinery (CACM), 37(9): 102–108, September. Malte Nuhn, Arne Mauser, and Hermann Ney. 2012. Deciphering foreign language by combining language models and context vectors. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), pages 156–164, Jeju, Republic of Korea, July. Association for Computational Linguistics. Edwin Olson. 2007. Robust dictionary attack of short simple substitution ciphers. Cryptologia, 31(4):332–342, October. 1575 Sujith Ravi and Kevin Knight. 2008. Attacking decipherment problems optimally with low-order ngram models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 812–819, Honolulu, Hawaii. Association for Computational Linguistics. Sujith Ravi and Kevin Knight. 2011a. Bayesian inference for Zodiac and other homophonic ciphers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 239–247, Portland, Oregon, June. Association for Computational Linguistics. Sujith Ravi and Kevin Knight. 2011b. Deciphering foreign language. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACLHLT), pages 12–21, Portland, Oregon, USA, June. Association for Computational Linguistics. Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Toma zˇ Erjavec, and Dan Tufi ¸s. 2006. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), pages 2142–2147. European Language Resources Association. 1576