
Group Anomaly Detection using Flexible Genre Models (NIPS 2011)



Authors: Liang Xiong, Barnabás Póczos, Jeff G. Schneider

Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets, including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies.
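The distinction between point-level and group-level anomalies can be made concrete with a toy example. The sketch below is not FGM. It is a hypothetical setup: two 1-D unit-variance Gaussian "topics", a point-level score taken as the density of a group's least likely point, and a nearest-neighbour distance on per-group topic proportions standing in for a group-level model. All names and numbers are illustrative assumptions, chosen only to show a group whose points are individually typical while its overall mixture is not.

```python
import math

# Hypothetical toy setup (not from the paper): two point-level "topics",
# 1-D unit-variance Gaussians centred at -3 and +3.
CENTERS = (-3.0, 3.0)


def point_density(x):
    """Density of x under an equal-weight mixture of the two topics."""
    return sum(
        0.5 * math.exp(-0.5 * (x - c) ** 2) / math.sqrt(2 * math.pi)
        for c in CENTERS
    )


def point_score(group):
    """Point-level view: density of the group's least likely point (higher = more normal)."""
    return min(point_density(x) for x in group)


def topic_proportion(group):
    """Fraction of points nearer the first topic centre (a crude group summary)."""
    return sum(abs(x - CENTERS[0]) < abs(x - CENTERS[1]) for x in group) / len(group)


def group_score(group, reference_groups):
    """Group-level view: distance from this group's topic proportion to the
    nearest reference group's proportion (higher = more anomalous)."""
    p = topic_proportion(group)
    return min(abs(p - topic_proportion(r)) for r in reference_groups)


def make_group(n_first, n_second):
    """Deterministic group whose points all lie within 0.3 of a topic centre."""
    jitter = (-0.3, 0.0, 0.3)
    first = [CENTERS[0] + jitter[i % 3] for i in range(n_first)]
    second = [CENTERS[1] + jitter[i % 3] for i in range(n_second)]
    return first + second


# Hypothetical normal groups come in two "genres": mostly topic 1 or mostly topic 2.
reference = [make_group(9, 1), make_group(8, 2), make_group(2, 8), make_group(1, 9)]
normal = make_group(9, 1)
anomalous = make_group(5, 5)  # every point is typical; only the 50/50 mix is unusual

print("point-level:", point_score(normal), point_score(anomalous))
print("group-level:", group_score(normal, reference), group_score(anomalous, reference))
```

In this toy case the two groups receive the same point-level score, so any detector that only inspects individual points cannot separate them; the group-level summary flags the 50/50 group because no reference group has a similar mixture.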


References

[1] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3), 2009.

[2] Geoffrey G. Hazel. Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection. IEEE Trans. Geoscience and Remote Sensing, 38(3):1199–1211, 2000.

[3] Kaustav Das, Jeff Schneider, and Daniel Neill. Anomaly pattern detection in categorical datasets. In Knowledge Discovery and Data Mining (KDD), 2008.

[4] Kaustav Das, Jeff Schneider, and Daniel Neill. Detecting anomalous groups in categorical datasets. Technical Report 09-104, CMU-ML, 2009.

[5] Philip K. Chan and Matthew V. Mahoney. Modeling multiple time series for anomaly detection. In IEEE International Conference on Data Mining, 2005.

[6] Eamonn Keogh, Jessica Lin, and Ada Fu. HOT SAX: Efficiently finding the most unusual time series subsequence. In IEEE International Conference on Data Mining, 2005.

[7] G. Mark Voit. Tracing cosmic evolution with clusters of galaxies. Reviews of Modern Physics, 77(1):207–258, 2005.

[8] B. de Finetti. Funzione caratteristica di un fenomeno aleatorio. Atti della R. Accademia Nazionale dei Lincei, Serie 6. Memorie, Classe di Scienze Fisiche, Matematiche e Naturali, 4, 1931.

[9] Thomas Hofmann. Unsupervised learning with probabilistic latent semantic analysis. Machine Learning Journal, 2001.

[10] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.

[11] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI, 6:721–741, 1984.

[12] Gilles Celeux, Didier Chauveau, and Jean Diebolt. Stochastic version of the EM algorithm: An experimental study in the mixture case. J. of Statistical Computation and Simulation, 55, 1996.

[13] Liang Xiong, Barnabás Póczos, and Jeff Schneider. Hierarchical probabilistic models for group anomaly detection. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

[14] Mikaela Keller and Samy Bengio. Theme-topic mixture model for document representation. In Learning Methods for Text Understanding and Mining, 2004.

[15] Li Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. IEEE Conf. CVPR, pages 524–531, 2005.

[16] Gabriel Doyle and Charles Elkan. Accounting for burstiness in topic models. In International Conference on Machine Learning, 2009.

[17] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 2003.

[18] Thomas P. Minka. Estimating a Dirichlet distribution. http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet, 2009.

[19] Gideon E. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978.

[20] David G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.

[21] Manqi Zhao. Anomaly detection with score functions based on nearest neighbor graphs. In NIPS, 2009.

[22] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing (SC), 2007.

[23] Charles Meneveau. Lagrangian dynamics and models of the velocity gradient tensor in turbulent flows. Annual Review of Fluid Mechanics, 43:219–245, 2011.

[24] Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566–1581, 2006.