nips nips2011 nips2011-81 nips2011-81-reference knowledge-graph by maker-knowledge-mining

81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs


Source: pdf

Author: Kumar Sricharan, Alfred O. Hero

Abstract: Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable characteristics, suffers from high computation complexity, and in [4] a simpler heuristic approximation, the leave-one-out kNNG (L1O-kNNG) was proposed. In this paper, we propose a novel bipartite k-nearest neighbor graph (BPkNNG) anomaly detection scheme for estimating minimum volume sets. Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than the K-kNNG and the surrogate L1OkNNG detectors. We show that BP-kNNG is asymptotically consistent in recovering the p-value of each test point. Experimental results are given that illustrate the superior performance of BP-kNNG as compared to the L1O-kNNG and other state of the art anomaly detection schemes.


reference text

[1] A. Asuncion and D.J. Newman. UCI machine learning repository, 2007.

[2] S. D. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’03, pages 29–38, New York, NY, USA, 2003. ACM.

[3] M. M. Breunig, H. Kriegel, R. T. Ng, and J. Sander. Lof: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, SIGMOD ’00, pages 93–104, New York, NY, USA, 2000. ACM.

[4] A. O. Hero. Geometric entropy minimization (gem) for anomaly detection and localization. In Proc. Advances in Neural Information Processing Systems (NIPS, pages 585–592. MIT Press, 2006.

[5] F. T. Liu, K. M. Ting, and Z. Zhou. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 413–422, Washington, DC, USA, 2008. IEEE Computer Society.

[6] C. Park, J. Z. Huang, and Y. Ding. A computable plug-in estimator of minimum volume sets for novelty detection. Operations Research, 58(5):1469–1480, 2010.

[7] S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. SIGMOD Rec., 29:427–438, May 2000.

[8] D. M. Rocke and D. L. Woodruff. Identification of Outliers in Multivariate Data. Journal of the American Statistical Association, 91(435):1047–1061, 1996.

[9] B. Sch¨ lkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J.Platt. Support Vector Method o for Novelty Detection. volume 12, 2000.

[10] C. Scott and R. Nowak. Learning minimum volume sets. J. Machine Learning Res, 7:665–704, 2006.

[11] K. Sricharan, R. Raich, and A. O. Hero. Empirical estimation of entropy functionals with confidence. ArXiv e-prints, December 2010.

[12] K. M. Ting, G. Zhou, T. F. Liu, and J. S. C. Tan. Mass estimation and its applications. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pages 989–998, New York, NY, USA, 2010. ACM.

[13] M. Zhao and V. Saligrama. Anomaly detection with score functions based on nearest neighbor graphs. Computing Research Repository, abs/0910.5461, 2009. 9