nips nips2012 nips2012-223 nips2012-223-reference knowledge-graph by maker-knowledge-mining

223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis


Source: pdf

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.
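The abstract describes the core idea at a high level: evaluate several dissimilarity criteria between pairs of samples, peel Pareto fronts over those multi-criteria dissimilarities, and score a sample as anomalous when even its nearest neighbors fall on deep fronts. The snippet below is a minimal illustrative sketch of that idea on toy data, not the authors' implementation; the two coordinate-wise dissimilarities, the neighborhood size `k`, and the mean-depth scoring rule are simplifying assumptions made here for illustration.

```python
# Sketch: Pareto-depth scoring over pairwise multi-criteria dissimilarities.
# Assumptions (not from the paper): toy 2-D data, two coordinate-wise
# dissimilarities, k nearest neighbors per criterion, mean front depth as score.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
X[-1] = [4.0, 4.0]                      # a planted, obvious anomaly

# Two dissimilarity criteria: distance along each coordinate.
def d1(a, b): return abs(a[0] - b[0])
def d2(a, b): return abs(a[1] - b[1])

pairs = list(combinations(range(len(X)), 2))
dyads = np.array([[d1(X[i], X[j]), d2(X[i], X[j])] for i, j in pairs])

def pareto_depths(points):
    """Peel Pareto fronts; depth of a point = index of the front it lies on."""
    depths = np.full(len(points), -1)
    remaining = np.arange(len(points))
    depth = 0
    while len(remaining):
        pts = points[remaining]
        nondom = []
        for idx, p in zip(remaining, pts):
            # p is dominated if some other point is <= in all criteria
            # and strictly < in at least one.
            dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
            if not dominated:
                nondom.append(idx)
        depths[nondom] = depth
        remaining = np.setdiff1d(remaining, nondom)
        depth += 1
    return depths

depths = pareto_depths(dyads)

# Score each sample by the mean front depth of the pairs linking it to its
# k nearest neighbors under each criterion; deep fronts suggest anomalies.
k = 5
pair_index = {frozenset(p): m for m, p in enumerate(pairs)}
scores = np.zeros(len(X))
for i in range(len(X)):
    neigh_depths = []
    for d in (d1, d2):
        dists = sorted((d(X[i], X[j]), j) for j in range(len(X)) if j != i)
        for _, j in dists[:k]:
            neigh_depths.append(depths[pair_index[frozenset((i, j))]])
    scores[i] = np.mean(neigh_depths)

print("highest-scoring samples:", np.argsort(scores)[-3:])  # planted point should rank high
```

The sketch is quadratic in the number of samples because it enumerates all pairs and peels fronts naively; it is meant only to make the Pareto-depth notion concrete, whereas the paper's method is designed to scale linearly in the number of criteria.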


reference text

[1] V. J. Hodge and J. Austin (2004). A survey of outlier detection methodologies. Artificial Intelligence Review 22(2):85–126.

[2] V. Chandola, A. Banerjee, and V. Kumar (2009). Anomaly detection: A survey. ACM Computing Surveys 41(3):1–58.

[3] Y. Jin and B. Sendhoff (2008). Pareto-based multiobjective machine learning: An overview and case studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 38(3):397–415.

[4] A. O. Hero III and G. Fleury (2004). Pareto-optimal methods for gene ranking. The Journal of VLSI Signal Processing 38(3):259–275.

[5] A. Blum and T. Mitchell (1998). Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory.

[6] V. Sindhwani, P. Niyogi, and M. Belkin (2005). A co-regularization approach to semi-supervised learning with multiple views. In Proceedings of the Workshop on Learning with Multiple Views, 22nd International Conference on Machine Learning.

[7] C. M. Christoudias, R. Urtasun, and T. Darrell (2008). Multi-view learning in the presence of view disagreement. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[8] M. Gönen and E. Alpaydın (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research 12(Jul):2211–2268.

[9] S. Byers and A. E. Raftery (1998). Nearest-neighbor clutter removal for estimating features in spatial point processes. Journal of the American Statistical Association 93(442):577–584.

[10] F. Angiulli and C. Pizzuti (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery.

[11] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo (2002). A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In Applications of Data Mining in Computer Security. Kluwer: Norwell, MA.

[12] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander (2000). LOF: Identifying density-based local outliers. In Proceedings of the ACM SIGMOD International Conference on Management of Data.

[13] A. O. Hero III (2006). Geometric entropy minimization (GEM) for anomaly detection and localization. In Advances in Neural Information Processing Systems 19.

[14] K. Sricharan and A. O. Hero III (2011). Efficient anomaly detection using bipartite k-NN graphs. In Advances in Neural Information Processing Systems 24.

[15] M. Zhao and V. Saligrama (2009). Anomaly detection with score functions based on nearest neighbor graphs. In Advances in Neural Information Processing Systems 22.

[16] M. Ehrgott (2000). Multicriteria optimization. Lecture Notes in Economics and Mathematical Systems 491. Springer-Verlag.

[17] O. Barndorff-Nielsen and M. Sobel (1966). On the distribution of the number of admissible points in a vector random sample. Theory of Probability and its Applications 11(2):249–269.

[18] Z.-D. Bai, L. Devroye, H.-K. Hwang, and T.-H. Tsai (2005). Maxima in hypercubes. Random Structures & Algorithms 27(3):290–309.

[19] Y. Baryshnikov and J. E. Yukich (2005). Maximal points and Gaussian fields. Unpublished. URL http://www.math.illinois.edu/~ymb/ps/by4.pdf.

[20] B. Majecka (2009). Statistical models of pedestrian behaviour in the Forum. Master’s thesis, University of Edinburgh.

[21] R. R. Sillito and R. B. Fisher (2008). Semi-supervised learning for anomalous trajectory detection. In Proceedings of the 19th British Machine Vision Conference.