jmlr jmlr2012 jmlr2012-104 jmlr2012-104-reference knowledge-graph by maker-knowledge-mining

104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection

Source: pdf

Author: Marius Kloft, Pavel Laskov

Abstract: Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms may have to cope with manipulated data aimed at hampering decision making. Although some previous work addressed the issue of handling malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution,1 we analyze the performance of a particular method—online centroid anomaly detection—in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efﬁciency and limitations. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: attacker’s full or limited control over the trafﬁc and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difﬁcult (a strict upper bound on the attacker’s gain) if external constraints are properly used. Our experimental evaluation, carried out on real traces of HTTP and exploit trafﬁc, conﬁrms the tightness of our theoretical bounds and the practicality of our protection mechanisms. Keywords: anomaly detection, adversarial, security analysis, support vector data description, computer security, network intrusion detection

reference text

D. Angluin and P. Laird. Learning from noisy examples. Machine Learning, 2(4):434–470, 1988. P. Auer. Learning nested differences in the presence of malicious noise. Theoretical Computer Science, 185(1):159–175, 1997. M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario. Automated classiﬁcation and analysis of internet malware. In Recent Adances in Intrusion Detection (RAID), pages 178–197, 2007. M. Barreno, B. Nelson, R. Sears, A. Joseph, and J. Tygar. Can machine learning be secure? In ACM Symposium on Information, Computer and Communication Security, pages 16–25, 2006. M. Barreno, P. L. Bartlett, F. J. Chi, A. D. Joseph, B. Nelson, B. I. Rubinstein, U. Saini, and J. D. Tygar. Open problems in the security of learning. In AISec ’08: Proceedings of the 1st ACM workshop on Workshop on AISec, pages 19–26, New York, NY, USA, 2008. ACM. M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar. The security of machine learning. Machine Learning, 81(2):121–148, 2010. B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classiﬁers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992. M. L. Braun, J. Buhmann, and K.-R. M¨ ller. On relevant dimensions in kernel feature spaces. u Journal of Machine Learning Research, 9:1875–1908, Aug 2008. N. H. Bschouty, N. Eiron, and E. Kushilevitz. PAC learning with nasty noise. In Algorithmic Learning Theory (ALT 1999), pages 206–218, 1999. 3720 S ECURITY A NALYSIS OF O NLINE C ENTROID A NOMALY D ETECTION G. Cretu, A. Stavrou, M. Locasto, S. Stolfo, and A. Keromytis. Casting out demons: Sanitizing training data for anomaly sensors. In Proc. of IEEE Symposium on Security and Privacy, pages 81–95, 2008. G. F. Cretu-Ciocarlie, A. Stavrou, M. E. Locasto, and S. J. Stolfo. Adaptive anomaly detection via self-calibration and dynamic updating. In Recent Adances in Intrusion Detection (RAID), pages 41–60, 2009. N. Dalvi, P. Domingos, M. Sumit, and S. D. Verma. Adversarial classiﬁcation. In In KDD, pages 99–108. ACM Press, 2004. O. Dekel and O. Shamir. Learning to classify with missing and corrupted features. In International Conference on Machine Learning (ICML), pages 216–223, 2008. O. Dekel, O. Shamir, and L. Xiao. Learning to classify with missing and corrupted features. Machine Learning, 2009. P. Fogla and W. Lee. Evading network anomaly detection systems: formal reasoning and practical techniques. In ACM Conference on Computer and Communications Security, pages 59–68, 2006. P. Fogla, M. Sharif, R. Perdisci, O. Kolesnikov, and W. Lee. Polymorphic blending attacks. In Proc. of USENIX Security Symposium, pages 241–256, 2006. S. Forrest, S. Hofmeyr, A. Somayaji, and T. Longstaff. A sense of self for unix processes. In Proc. of IEEE Symposium on Security and Privacy, pages 120–128, Oakland, CA, USA, 1996. A. Globerson and S. Roweis. Nightmare at test time: Robust learning by feature deletion. In International Conference on Machine Learning (ICML), pages 353–360, 2006. S. Hofmeyr, S. Forrest, and A. Somayaji. Intrusion detection using sequences of system calls. Journal of Computer Security, 6(3):151–180, 1998. M. Kearns and M. Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, 1993. P. Laskov and M. Kloft. A framework for quantitative security analysis of machine learning. In D. Balfanz and J. Staddon, editors, AISec, pages 1–4. ACM, 2009. ISBN 978-1-60558-781-3. P. Laskov, C. Sch¨ fer, and I. Kotenko. Intrusion detection in unlabeled data with quarter-sphere a support vector machines. In Detection of Intrusions and Malware, and Vulnerability Assessment, Proc. of DIMVA Conference, pages 71–82, 2004a. P. Laskov, C. Sch¨ fer, I. Kotenko, and K.-R. M¨ ller. Intrusion detection in unlabeled data with a u quarter-sphere support vector machines (extended version). Praxis der Informationsverarbeitung und Kommunikation, 27:228–236, 2004b. P. Laskov, C. Gehl, S. Kr¨ ger, and K. R. M¨ ller. Incremental support vector learning: Analysis, u u implementation and applications. Journal of Machine Learning Research, 7:1909–1936, Sept. 2006. 3721 K LOFT AND L ASKOV A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur, and J. Srivastava. A comparative study of anomaly detection schemes in network intrusion detection. In Proc. of SIAM International Conference on Data Mining (SDM), 2003. C. Leslie, E. Eskin, and W. Noble. The spectrum kernel: A string kernel for SVM protein classiﬁcation. In Proc. Paciﬁc Symp. Biocomputing, pages 564–575, 2002. Z. Li, M. Sandhi, Y. Chen, M.-Y. Kao, and B. Chavez. Hamsa: fast signature generation for zeroday polymorphic worms with provable attack resilience. In IEEE Security and Privacy, pages 32–47, 2006. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear threshold algorithm. Machine Learning, 2:285–318, 1988. D. Lowd and C. Meek. Good word attacks on statistical spam ﬁlters. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 641–647, 2005a. D. Lowd and C. Meek. Adversarial learning. In Conference on Email and Anti-Spam, 2005b. M. Markou and S. Singh. Novelty detection: a review – part 1: statistical approaches. Signal Processing, 83:2481–2497, 2003a. M. Markou and S. Singh. Novelty detection: a review – part 2: neural network based approaches. Signal Processing, 83:2499–2521, 2003b. L. Martein and S. Schaible. On solving a linear program with one quadratic constraint. Decisions in Economics and Finance, 10:75–90, 2005. Microsoft. Microsoft security intelligence report: January to June 2008. Microsoft Corporation, 2008. M. Mohri and A. Rostamizadeh. Stability bounds for stationary φ-mixing and β-mixing processes. J. Mach. Learn. Res., 11:789–814, March 2010. ISSN 1532-4435. URL http: //portal.acm.org/citation.cfm?id=1756006.1756032. D. Moore, C. Shannon, and J. Brown. Code-Red: a case study on the spread and victims of an internet worm. In Proc. of Internet Measurement Workshop (IMW), pages 273–284, 2002. K.-R. M¨ ller, S. Mika, G. R¨ tsch, K. Tsuda, and B. Sch¨ lkopf. An introduction to kernel-based u a o learning algorithms. IEEE Neural Networks, 12(2):181–201, May 2001. A. Nairac, T. N., R. Carr, S. King, P. Cowley, and L. Tarassenko. A system for the analysis fo jet vibration data. Integrated Computer-Aided Engineering, 1999. B. Nelson and A. D. Joseph. Bounding an attack’s complexity for a simple learning model. In Proc. of the First Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), Saint-Malo, France, 2006. 3722 S ECURITY A NALYSIS OF O NLINE C ENTROID A NOMALY D ETECTION B. Nelson, M. Barreno, F. Chi, A. Joseph, B. Rubinstein, U. Saini, C. Sutton, J. Tygar, and K. Xia. Exploiting machine learning to subvert your spam ﬁlter. In Proceedings of the First USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET’08), 2008. J. Newsome, B. Karp, and D. Song. Paragraph: Thwarting signature learning by training maliciously. In Recent Adances in Intrusion Detection (RAID), pages 81–105, 2006. E. Parzen. On estimation of probability density function and mode. Annals of Mathematical Statistics, 33:1065–1076, 1962. R. Perdisci, D. Dagon, W. Lee, P. Fogla, and M. Sharif. Misleading worm signature generators using deliberate noise injection. In Proc. of IEEE Symposium on Security and Privacy, pages 17–31, 2006. W. Polonik. Measuring mass concentration and estimating density contour clusters – an excess mass approach. Annals of Statistics, 23:855–881, 1995. S. Rajasegarar, C. Leckie, M. Palaniswami, and J. Bezdek. Quarter sphere based distributed anomaly detection in wireless sensor networks. In IEEE International Conference on Communications (ICC), pages 3864–3869, 2007. K. Rieck and P. Laskov. Detecting unknown network attacks using language models. In Detection of Intrusions and Malware, and Vulnerability Assessment, Proc. of 3rd DIMVA Conference, LNCS, pages 74–90, July 2006. K. Rieck and P. Laskov. Language models for detection of unknown attacks in network trafﬁc. Journal in Computer Virology, 2(4):243–256, 2007. K. Rieck and P. Laskov. Linear-time computation of similarity measures for sequential data. Journal of Machine Learning Research, 9(Jan):23–48, 2008. K. Rieck, T. Holz, C. Willems, P. D¨ ssel, and P. Laskov. Learning and classiﬁcation of malware u behavior. In Detection of Intrusions and Malware, and Vulnerability Assessment, Proc. of 5th DIMVA Conference, LNCS, pages 108–125, 2008. B. Sch¨ lkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002. o B. Sch¨ lkopf, A. Smola, and K.-R. M¨ ller. Nonlinear component analysis as a kernel eigenvalue o u problem. Neural Computation, 10:1299–1319, 1998. B. Sch¨ lkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a o high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001. C. Shannon and D. Moore. The spread of the Witty worm. IEEE Security and Privacy, 2(4):46–50, 2004. J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004. 3723 K LOFT AND L ASKOV Y. Song, M. Locasto, A. Stavrou, A. Keromytis, and S. Stolfo. On the infeasibility of modeling polymorphic shellcode. In Conference on Computer and Communications Security (CCS), pages 541–551, 2007. I. Steinwart, D. Hush, and C. Scovel. A classiﬁcation framework for anomaly detection. Journal of Machine Learning Research, 6:211–232, 2005. I. Steinwart, D. Hush, and C. Scovel. Learning from dependent observations. J. Multivar. Anal., 100:175–194, January 2009. ISSN 0047-259X. M. Sugiyama, M. Krauledat, and K.-R. M¨ ller. Covariate shift adaptation by importance weighted u cross validation. Journal of Machine Learning Research, 8:1027–1061, 2007. Symantec. Symantex report on the underground economy: July 07 to June 08. Symantec Corporation, 2008. D. Tax and R. Duin. Data domain description by support vectors. In M. Verleysen, editor, Proc. ESANN, pages 251–256, Brussels, 1999a. D. Facto Press. D. Tax and R. Duin. Support vector domain description. Pattern Recognition Letters, 20(11–13): 1191–1199, 1999b. A. Tsybakov. On nonparametric estimation of density level sets. Annals of Statistics, 25:948–969, 1997. C. van de Panne. Programming with a quadratic constraint. Management Science, 12:798–815, 1966. V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998. R. Vert and J.-P. Vert. Consistency and convergence rates of one-class svms and related algorithms. J. Mach. Learn. Res., 7:789–814, December 2006. ISSN 1532-4435. K. Wang and S. Stolfo. Anomalous payload-based network intrusion detection. In Recent Adances in Intrusion Detection (RAID), pages 203–222, 2004. K. Wang, G. Cretu, and S. Stolfo. Anomalous payload-based worm detection and signature generation. In Recent Adances in Intrusion Detection (RAID), 2005. K. Wang, J. Parekh, and S. Stolfo. Anagram: A content anomaly detector resistant to mimicry attack. In Recent Adances in Intrusion Detection (RAID), pages 226–248, 2006. C. Warrender, S. Forrest, and B. Pearlmutter. Detecting intrusions using system calls: alternative data methods. In Proc. of IEEE Symposium on Security and Privacy, pages 133–145, 1999. D.-Y. Yeung and C. Chow. Parzen-window network intrusion detectors. In Sixteenth International Conference on Pattern Recognition (ICPR), pages 385–388, 2002. 3724