NIPS 2012, paper 294: reference knowledge graph by maker-knowledge-mining
Source: pdf
Author: Francesca Petralia, Vinayak Rao, David B. Dunson
Abstract: Discrete mixtures are used routinely in broad-sweeping applications ranging from unsupervised settings to fully supervised multi-task learning. Indeed, finite mixtures and infinite mixtures, relying on Dirichlet processes and modifications, have become a standard tool. One important issue that arises in using discrete mixtures is low separation of the components; in particular, different components can be introduced that are very similar and hence redundant. Such redundancy leads to too many clusters that are too similar, degrading performance in unsupervised learning and leading to computational problems and an unnecessarily complex model in supervised settings. Redundancy can arise, in the absence of a penalty on components placed close together, even when a Bayesian approach is used to learn the number of components. To solve this problem, we propose a novel prior that generates components from a repulsive process, automatically penalizing redundant components. We characterize this repulsive prior theoretically and propose a Markov chain Monte Carlo sampling algorithm for posterior computation. The methods are illustrated using synthetic examples and an iris data set. Key Words: Bayesian nonparametrics; Dirichlet process; Gaussian mixture model; Model-based clustering; Repulsive point process; Well-separated mixture.
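The repulsion mechanism the abstract describes can be sketched concretely. The snippet below is a minimal illustration, not the paper's exact specification: it places an independent Gaussian base prior on each component location and multiplies in a pairwise repulsion term exp(-tau/d) that drives the prior density to zero as any two locations coincide. The function name log_repulsive_prior, the parameters tau and prior_sd, and the exp(-tau/d) form of the repulsion are all illustrative assumptions.

    import numpy as np

    def log_repulsive_prior(locations, tau=1.0, prior_sd=10.0):
        # Unnormalized log density of a repulsive prior on k component
        # locations (rows of `locations`). Assumed form: independent
        # N(0, prior_sd^2) base priors times pairwise repulsion terms
        # exp(-tau / d_ij), which vanish as two components coincide.
        locations = np.atleast_2d(np.asarray(locations, dtype=float))
        log_base = -0.5 * np.sum(locations ** 2) / prior_sd ** 2
        k = locations.shape[0]
        log_rep = 0.0
        for i in range(k):
            for j in range(i + 1, k):
                d = np.linalg.norm(locations[i] - locations[j])
                log_rep += -tau / max(d, 1e-12)  # guard against d == 0
        return log_base + log_rep

    # Well-separated locations receive far more prior mass than
    # near-duplicates: this is the redundancy penalty in action.
    print(log_repulsive_prior([[0.0], [5.0]]))    # mild penalty
    print(log_repulsive_prior([[0.0], [0.01]]))   # heavy penalty

Combining such a log prior with the mixture likelihood inside a standard Metropolis-Hastings update for the component locations gives a sampler in the spirit of the MCMC algorithm the abstract mentions. Any repulsion function that decays to zero as pairwise distances shrink would serve the same purpose; exp(-tau/d) is used here purely for concreteness.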
[1] J. Rousseau and K. Mengersen. Asymptotic Behaviour of the Posterior Distribution in Over-Fitted Models. Journal of the Royal Statistical Society B, 73:689–710, 2011.
[2] S. Dasgupta. Learning Mixtures of Gaussians. Proceedings of the 40th Annual Symposium on Foundations of Computer Science, pages 633–644, 1999.
[3] S. Dasgupta and L. Schulman. A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians. The Journal of Machine Learning Research, 8:203–226, 2007.
[4] M. Stephens. Bayesian Analysis of Mixture Models with an Unknown Number of Components - An Alternative to Reversible Jump Methods. The Annals of Statistics, 28:40–74, 2000.
[5] H. Ishwaran and M. Zarepour. Dirichlet Prior Sieves in Finite Normal Mixtures. Statistica Sinica, 12:941–963, 2002.
[6] J. Sethuraman. A Constructive Definition of Dirichlet Priors. Statistica Sinica, 4:639–650, 1994.
[7] H. Ishwaran and L. F. James. Gibbs Sampling Methods for Stick-Breaking Priors. Journal of the American Statistical Association, 96:161–173, 2001.
[8] M. L. Huber and R. L. Wolpert. Likelihood-Based Inference for Matérn Type-III Repulsive Point Processes. Advances in Applied Probability, 41:958–977, 2009.
[9] A. Lawson and A. Clark. Spatial Cluster Modeling. Chapman & Hall/CRC, London, UK, 2002.
[10] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer, 2008.
[11] C. Scricciolo. Posterior Rates of Convergence for Dirichlet Mixtures of Exponential Power Densities. Electronic Journal of Statistics, 5:270–308, 2011.
[12] H. Ishwaran, L. F. James, and J. Sun. Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions. Journal of the American Statistical Association, 96:1316–1332, 2001.
[13] P. Damien, J. Wakefield, and S. Walker. Gibbs Sampling for Bayesian Non-Conjugate and Hierarchical Models by Using Auxiliary Variables. Journal of the Royal Statistical Society B, 61:331–344, 1999.
[14] M. Stephens. Dealing with Label Switching in Mixture Models. Journal of the Royal Statistical Society B, 62:795–810, 2000.
[15] H. Locarek-Junge and C. Weihs. Classification as a Tool for Research. Springer, 2009.
[16] C. Sugar and G. James. Finding the Number of Clusters in a Data Set: An Information Theoretic Approach. Journal of the American Statistical Association, 98:750–763, 2003.
[17] J. Wang. Consistent Selection of the Number of Clusters via Crossvalidation. Biometrika, 97:893–904, 2010.