Venue: JMLR, 2005
Source: pdf
Author: Ingo Steinwart, Don Hush, Clint Scovel
Abstract: One way to describe anomalies is by saying that anomalies are not concentrated. This leads to the problem of finding level sets for the data generating density. We interpret this learning problem as a binary classification problem and compare the corresponding classification risk with the standard performance measure for the density level problem. In particular it turns out that the empirical classification risk can serve as an empirical performance measure for the anomaly detection problem. This allows us to compare different anomaly detection algorithms empirically, i.e. with the help of a test set. Furthermore, by the above interpretation we can give a strong justification for the well-known heuristic of artificially sampling “labeled” samples, provided that the sampling plan is well chosen. In particular this enables us to propose a support vector machine (SVM) for anomaly detection for which we can easily establish universal consistency. Finally, we report some experiments which compare our SVM to other commonly used methods including the standard one-class SVM.
Keywords: unsupervised learning, anomaly detection, density levels, classification, SVMs
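The classification view described in the abstract can be illustrated with a small evaluation sketch. The following are assumptions made for illustration, not details taken from the paper. The reference distribution μ is uniform on a hand-chosen box, normal data are labeled +1 and artificial samples −1, and the two classes are weighted by s = 1/(1 + ρ) and 1 − s. Under this weighting the Bayes classifier predicts +1 exactly where s·h(x) > 1 − s, i.e. where h(x) > (1 − s)/s = ρ, with h the density of the data with respect to μ. The helper names `sample_reference`, `weighted_risk` and `ball_detector` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# A detector maps a point to +1 ("normal") or -1 ("anomaly").
Detector = Callable[[Point], int]


def sample_reference(bounds: Sequence[Tuple[float, float]], m: int,
                     rng: random.Random) -> List[Point]:
    """Draw m artificial points uniformly from an axis-aligned box.

    Assumption: the reference distribution mu is uniform on this box.
    """
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(m)]


def weighted_risk(detector: Detector, normal: Sequence[Point],
                  reference: Sequence[Point], rho: float) -> float:
    """Empirical risk for 'data (+1) vs. reference samples (-1)'.

    The classes are weighted by s = 1/(1 + rho) and 1 - s, so the Bayes
    classifier of this problem predicts +1 exactly where h(x) > rho,
    with h the density of the data with respect to mu.
    """
    s = 1.0 / (1.0 + rho)
    # Fraction of normal points wrongly flagged as anomalies.
    miss = sum(detector(x) != 1 for x in normal) / len(normal)
    # Fraction of artificial points wrongly accepted as normal.
    false_alarm = sum(detector(x) != -1 for x in reference) / len(reference)
    return s * miss + (1.0 - s) * false_alarm


def ball_detector(radius: float) -> Detector:
    """Hypothetical detector: 'normal' inside a centred ball of given radius."""
    return lambda x: 1 if sum(c * c for c in x) <= radius ** 2 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    # Held-out normal data (a standard 2-D Gaussian, chosen only for illustration).
    normal_test = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]
    # Held-out artificial samples from the assumed uniform reference distribution.
    reference_test = sample_reference([(-4.0, 4.0), (-4.0, 4.0)], 2000, rng)
    for r in (0.5, 1.5, 3.0):
        risk = weighted_risk(ball_detector(r), normal_test, reference_test, rho=0.5)
        print(f"radius={r}: weighted risk={risk:.3f}")
```

A lower weighted risk on held-out samples suggests a detector closer to the level set {h > ρ}. The ball detectors are placeholders for any trained detector, for instance a classifier trained on data versus artificial samples, and ρ = 0.5 is an arbitrary choice.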
References cited in A Classification Framework for Anomaly Detection:
S. Ben-David and M. Lindenbaum. Learning distributions by their density levels: a paradigm for learning without a teacher. J. Comput. System Sci., 55:171–182, 1997.
C. Campbell and K. P. Bennett. A linear programming approach to novelty detection. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 395–401. MIT Press, 2001.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2004.
A. Cuevas, M. Febrero, and R. Fraiman. Cluster analysis: a further approach based on density estimation. Computat. Statist. Data Anal., 36:441–459, 2001.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. Ser. B (methodology), 39:1–38, 1977.
M. J. Desforges, P. J. Jacob, and J. E. Cooper. Applications of probability density estimation to the detection of abnormal conditions in engineering. Proceedings of the Institution of Mechanical Engineers, Part C: Mechanical engineering science, 212:687–703, 1998.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, New York, 2000.
R. M. Dudley. Real Analysis and Probability. Cambridge University Press, 2002.
W. Fan, M. Miller, S. J. Stolfo, W. Lee, and P. K. Chan. Using artificial anomalies to detect unknown and known network intrusions. In IEEE International Conference on Data Mining (ICDM’01), pages 123–130. IEEE Computer Society, 2001.
F. González and D. Dasgupta. Anomaly detection using real-valued negative selection. Genetic Programming and Evolvable Machines, 4:383–403, 2003.
J. A. Hartigan. Clustering Algorithms. Wiley, New York, 1975.
J. A. Hartigan. Estimation of a convex density contour in 2 dimensions. J. Amer. Statist. Assoc., 82:267–270, 1987.
P. Hayton, B. Schölkopf, L. Tarassenko, and P. Anuzis. Support vector novelty detection applied to jet engine vibration spectra. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 946–952. MIT Press, 2001.
S. P. King, D. M. King, P. Anuzis, K. Astley, L. Tarassenko, P. Hayton, and S. Utete. The use of novelty detection techniques for monitoring high-integrity plant. In IEEE International Conference on Control Applications, pages 221–226. IEEE Computer Society, 2002.
E. Mammen and A. Tsybakov. Smooth discrimination analysis. Ann. Statist., 27:1808–1829, 1999.
C. Manikopoulos and S. Papavassiliou. Network intrusion and fault detection: a statistical anomaly approach. IEEE Communications Magazine, 40:76–82, 2002.
M. Markou and S. Singh. Novelty detection: a review, Part 1: statistical approaches. Signal Processing, 83:2481–2497, 2003a.
M. Markou and S. Singh. Novelty detection: a review, Part 2: neural network based approaches. Signal Processing, 83:2499–2521, 2003b.
D. W. Müller and G. Sawitzki. Excess mass estimates and tests for multimodality. J. Amer. Statist. Assoc., 86:738–746, 1991.
A. Nairac, N. Townsend, R. Carr, S. King, P. Cowley, and L. Tarassenko. A system for the analysis of jet engine vibration data. Integrated Computer-Aided Engineering, 6:53–56, 1999.
W. Polonik. Measuring mass concentrations and estimating density contour clusters: an excess mass approach. Ann. Statist., 23:855–881, 1995.
B. D. Ripley. Pattern recognition and neural networks. Cambridge University Press, 1996.
G. Sawitzki. The excess mass approach and the analysis of multi-modality. In W. Gaul and D. Pfeifer, editors, From data to knowledge: Theoretical and practical aspects of classification, data analysis and knowledge organization, Proc. 18th Annual Conference of the GfKl, pages 203–211. Springer, 1996.
B. Schölkopf, J. C. Platt, J. Shawe-Taylor, and A. J. Smola. Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443–1471, 2001.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
I. Steinwart. On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res., 2:67–93, 2001.
I. Steinwart. Consistency of support vector machines and other regularized kernel machines. IEEE Trans. Inform. Theory, 51:128–142, 2005.
I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Ann. Statist., submitted, 2004. http://www.c3.lanl.gov/~ingo/publications/ann-04a.pdf.
L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady. Novelty detection for the identification of masses in mammograms. In 4th International Conference on Artificial Neural Networks, pages 442–447, 1995.
J. Theiler and D. M. Cai. Resampling approach for anomaly detection in multispectral images. In Proceedings of the SPIE 5093, pages 230–240, 2003.
A. B. Tsybakov. On nonparametric estimation of density level sets. Ann. Statist., 25:948–969, 1997.
A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Ann. Statist., 32:135–166, 2004.
D. Y. Yeung and C. Chow. Parzen-window network intrusion detectors. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02) Vol. 4, pages 385–388. IEEE Computer Society, 2002.
H. Yu, J. Han, and K. C. Chang. PEBL: Web page classification without negative examples. IEEE Trans. on Knowledge and Data Engineering, 16:70–81, 2004.