
158 nips-2006-PG-means: learning the number of clusters in data


Source: pdf

Author: Yu Feng, Greg Hamerly

Abstract: We present a novel algorithm called PG-means which is able to learn the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we are applying a statistical test for the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods.
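
As a rough illustration of the test described in the abstract (not the authors' code), the Python sketch below projects both the data and a fitted Gaussian mixture onto random one-dimensional directions and runs a Kolmogorov-Smirnov test [7] of the projected data against the projected mixture CDF; a Gaussian with mean mu and covariance Sigma projects along a unit vector p to a univariate Gaussian with mean p·mu and variance p'Sigma p. The function names, the significance level, the number of projections, and the use of scipy and scikit-learn are assumptions of this sketch; the paper itself adjusts the test's critical values for parameters estimated from the data (cf. Lilliefors [2, 6]) and is more careful about how EM is re-initialized when a cluster is added.

import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

def projected_model_test(X, weights, means, covs,
                         alpha=0.001, n_projections=12, rng=None):
    # Hedged sketch of a projected-model test: accept the mixture only if
    # the projected data pass a KS test on every random 1-D projection.
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    for _ in range(n_projections):
        p = rng.normal(size=d)
        p /= np.linalg.norm(p)        # random unit projection direction
        x1 = X @ p                    # data projected to one dimension
        mu1 = np.array([p @ m for m in means])    # projected component means
        sd1 = np.sqrt([p @ S @ p for S in covs])  # projected component std devs
        def mix_cdf(t):               # CDF of the projected 1-D mixture
            t = np.atleast_1d(t)[:, None]
            return (weights * stats.norm.cdf(t, loc=mu1, scale=sd1)).sum(axis=1)
        _, pval = stats.kstest(x1, mix_cdf)
        if pval < alpha:
            return False              # model rejected on this projection
    return True                       # model accepted on all projections

def pg_means_sketch(X, alpha=0.001, max_k=20):
    # Illustrative outer loop: grow k until a fitted mixture passes the test.
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, covariance_type="full",
                             n_init=5).fit(X)
        if projected_model_test(X, gm.weights_, gm.means_,
                                gm.covariances_, alpha):
            return k, gm
    return max_k, gm

Given data X of shape (n, d), pg_means_sketch returns a candidate number of clusters and the fitted mixture; the published PG-means algorithm differs in how it corrects the significance level across multiple projections and in its EM restarts.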


reference text

[1] Hirotugu Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.

[2] Gerard E. Dallal and Leland Wilkinson. An analytic approximation to the distribution of Lilliefors’ test for normality. The American Statistician, 40:294–296, 1986.

[3] Sanjoy Dasgupta. Experiments with random projection. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-2000), pages 143–151. Morgan Kaufmann Publishers, 2000.

[4] Greg Hamerly and Charles Elkan. Learning the k in k-means. In Proceedings of the Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), pages 281–288, 2003.

[5] Robert E. Kass and Larry Wasserman. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90(431):928–934, 1995.

[6] Hubert W. Lilliefors. On the Kolmogorov-Smirnov test of normality with mean and variance unknown. Journal of the American Statistical Association, 62(318):399–402, 1967.

[7] Frank J. Massey, Jr. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68–78, 1951.

[8] Marina Meila. Comparing clusterings by the variation of information. In Conference on Learning Theory (COLT), pages 173–187, 2003.

[9] Dan Pelleg and Andrew Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the 17th International Conference on Machine Learning, pages 727–734. Morgan Kaufmann, 2000.

[10] Jorma Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978.

[11] Peter Sand and Andrew W. Moore. Repairing faulty mixture models using density estimation. In Proceedings of the 18th International Conference on Machine Learning, pages 457–464, 2001.

[12] Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, 1978.

[13] Robert Tibshirani, Guenther Walther, and Trevor Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, Series B, 63:411–423, 2001.

[14] Naonori Ueda and Ryohei Nakano. Deterministic annealing EM algorithm. Neural Networks, 11(2):271–282, 1998.

[15] Max Welling and Kenichi Kurihara. Bayesian k-means as a 'maximization-expectation' algorithm. In SIAM Conference on Data Mining (SDM), 2006.