nips nips2006 nips2006-19 nips2006-19-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kenichi Kurihara, Max Welling, Nikos A. Vlassis
Abstract: Dirichlet Process (DP) mixture models are promising candidates for clustering applications where the number of clusters is unknown a priori. Due to computational considerations these models are unfortunately unsuitable for large scale data-mining applications. We propose a class of deterministic accelerated DP mixture models that can routinely handle millions of data-cases. The speedup is achieved by incorporating kd-trees into a variational Bayesian algorithm for DP mixtures in the stick-breaking representation, similar to that of Blei and Jordan (2005). Our algorithm differs in the use of kd-trees and in the way we handle truncation: we only assume that the variational distributions are fixed at their priors after a certain level. Experiments show that speedups relative to the standard variational algorithm can be significant. 1
[1] T. Ferguson. A Bayesian analysis of some nonparametric problems. Ann. Statist., 1:209–230, 1973.
[2] C. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist., 2(6):1152–1174, 1974. ´
[3] D. Aldous. Exchangeability and related topics. In Ecole d’ et´ de Probabilit´ de Saint-Flour XIII, 1983. e ´e
[4] J. Sethuraman. A constructive definition of Dirichlet priors. Statist. Sinica, 4:639–650, 1994.
[5] C.E. Rasmussen. The infinite Gaussian mixture model. In NIPS 12. MIT Press, 2000.
[6] H. Ishwaran and M. Zarepour. Exact and approximate sum-representations for the Dirichlet process. Can. J. Statist., 30:269–283, 2002.
[7] D.M. Blei and M.I. Jordan. Variational inference for Dirichlet process mixtures. Journal of Bayesian Analysis, 1(1):121–144, 2005.
[8] A.W. Moore. Very fast EM-based mixture model clustering using multiresolution kd-trees. In NIPS 11. MIT Press, 1999.
[9] J.J. Verbeek, J.R.J. Nunnink, and N. Vlassis. Accelerated EM-based clustering of large data sets. Data Mining and Knowledge Discovery, 13(3):291–307, 2006.
[10] J.L. Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, 1975.
[11] S. Dasgupta. Learning mixtures of Gaussians. In IEEE Symp. on Foundations of Computer Science, 1999.
[12] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.