
126 nips-2012-FastEx: Hash Clustering with Exponential Families


Source: pdf

Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy

Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution that uses random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters.
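The proposal mechanism sketched in the abstract builds on signed random projections in the style of Charikar [12]: hashing vectors to sign bits makes inner products between a data point and many cluster parameter vectors cheap to approximate, which is what keeps proposal generation fast when the number of clusters is large. Below is a minimal Python sketch of that idea, not the paper's implementation; the projection width num_bits, the Gaussian cluster parameters, and the softmax proposal are all illustrative assumptions.

import numpy as np

def simhash_signature(x, R):
    # Signed random projection (SimHash): R is a (num_bits, dim)
    # Gaussian matrix; the signature is the vector of sign bits.
    return (R @ x) >= 0

def estimated_inner_product(sig_a, sig_b, norm_a, norm_b):
    # Charikar's result: two vectors' sign bits agree with probability
    # 1 - theta/pi, where theta is the angle between them. Invert that
    # to estimate <a, b> = |a| |b| cos(theta).
    agreement = np.mean(sig_a == sig_b)
    theta = np.pi * (1.0 - agreement)
    return norm_a * norm_b * np.cos(theta)

rng = np.random.default_rng(0)
dim, num_bits, num_clusters = 1000, 64, 50  # illustrative sizes
R = rng.standard_normal((num_bits, dim))

# Hypothetical cluster parameter vectors and one data point.
clusters = rng.standard_normal((num_clusters, dim))
x = rng.standard_normal(dim)

# Signatures are computed once per cluster and reused across points.
cluster_sigs = [simhash_signature(c, R) for c in clusters]
x_sig = simhash_signature(x, R)

# Approximate per-cluster scores from bit agreement, then normalize
# into a proposal distribution over cluster assignments.
x_norm = np.linalg.norm(x)
scores = np.array([
    estimated_inner_product(x_sig, s, x_norm, np.linalg.norm(c))
    for s, c in zip(cluster_sigs, clusters)
])
proposal = np.exp(scores - scores.max())
proposal /= proposal.sum()
print("proposal over first 5 clusters:", proposal[:5])

Because comparing signatures costs num_bits bit operations instead of a dense dim-dimensional dot product, proposing a cluster scales with the number of hash bits rather than the data dimension; in a sampler, a Metropolis-Hastings correction can then account for the approximation error.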


reference text

[1] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[2] D. Agarwal and S. Merugu. Predictive discrete latent factor models for large scale dyadic data. In Conference on Knowledge Discovery and Data Mining, pages 26–35. ACM, 2007.

[3] A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In Conference on World Wide Web, pages 271–280. ACM, 2007.

[4] D. Emanuel and A. Fiat. Correlation clustering: minimizing disagreements on arbitrary weighted graphs. In Algorithms – ESA 2003, 11th Annual European Symposium, volume 2832 of Lecture Notes in Computer Science, pages 208–220. Springer, 2003.

[5] J. MacQueen. Some methods of classification and analysis of multivariate observations. In L. M. LeCam and J. Neyman, editors, Proc. 5th Berkeley Symposium on Math., Stat., and Prob., page 281. U. California Press, Berkeley, CA, 1967.

[6] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. CoRR, abs/1010.2731, 2010.

[7] V. Vapnik and A. Chervonenkis. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3):283–305, 1991.

[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1):1–22, 1977.

[9] C. E. Rasmussen. The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems 12, pages 554–560, 2000.

[10] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.

[11] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228–5235, 2004.

[12] M. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 380–388, 2002.

[13] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In International Conference on Machine Learning, 2006.

[14] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, Proceedings of the 25th VLDB Conference, pages 518–529, Edinburgh, Scotland, 1999. Morgan Kaufmann.

[15] Y. Shen, A. Ng, and M. Seeger. Fast Gaussian process regression using kd-trees. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1227–1234, Cambridge, MA, 2005. MIT Press.

[16] R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In Proceedings of the 16th International Conference on World Wide Web, pages 131–140. ACM, 2007.

[17] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6), 1995.

[18] M. Meila. Comparing clusterings by the variation of information. In COLT, 2003.

[19] A. J. Smola and S. Narayanamurthy. An architecture for parallel topic models. In Very Large Databases (VLDB), 2010.

[20] A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. In Web Search and Data Mining (WSDM), 2012.

[21] A. Ahmed, Q. Ho, C. H. Teo, J. Eisenstein, A. J. Smola, and E. P. Xing. Online inference for the infinite cluster-topic model: Storylines from streaming text. In AISTATS, 2011.

[22] A. Ahmed, Q. Ho, J. Eisenstein, E. P. Xing, A. J. Smola, and C. H. Teo. Unified analysis of streaming news. In WWW, 2011.

[23] D. Mimno, M. Hoffman, and D. Blei. Sparse stochastic inference for latent Dirichlet allocation. In International Conference on Machine Learning, 2012.