nips nips2010 nips2010-255 nips2010-255-reference knowledge-graph by maker-knowledge-mining

255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs


Source: pdf

Author: Nikos Karampatziakis

Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1


reference text

[1] F. B. Wrixon Codes, Ciphers, Secrets and Cryptic Communication. page 490, Black Dog & Leventhal Publishers, 2005.

[2] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML ’01: Proceedings of the Eighteenth International Conference on Machine Learning, pages 282–289, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.

[3] S. Sarawagi and W.W. Cohen. Semi-markov conditional random fields for information extraction. Advances in Neural Information Processing Systems, 17:1185–1192, 2005.

[4] Q. Shi, Y. Altun, A. Smola, and SVN Vishwanathan. Semi-Markov Models for Sequence Segmentation. In Proceedings of the 2007 EMNLP-CoNLL.

[5] S. Sarawagi. Efficient inference on sequence segmentation models. In Proceedings of the 23rd international conference on Machine learning, page 800. ACM, 2006.

[6] C. Kruegel, W. Robertson, F. Valeur, and G. Vigna. Static disassembly of obfuscated binaries. In Proceedings of the 13th conference on USENIX Security Symposium-Volume 13, page 18. USENIX Association, 2004.

[7] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(2):1453, 2006.

[8] T. Joachims, T. Finley, and C-N. Yu. Cutting-Plane Training of Structural SVMs. Machine Learning, 77(1):27, 2009.

[9] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL, pages 213–220, 2003.

[10] H. Theiling. Extracting safe and precise control flow from binaries. In Seventh International Conference on Real-Time Computing Systems and Applications, pages 23–30, 2000.

[11] C. Cifuentes and M. Van Emmerik. UQBT: Adaptable binary translation at low cost. Computer, 33(3):60–66, 2000.

[12] N. Rosenblum, X. Zhu, B. Miller, and K. Hunt. Learning to analyze binary computer code. In Conference on Artificial Intelligence (AAAI 2008), Chicago, Illinois, 2008. 9