
238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning


Source: pdf

Authors: Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael Jordan

Abstract: Research on distributed machine learning algorithms has focused primarily on one of two extremes: algorithms that obey strict concurrency constraints, or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and, if conflicts do arise, invoke a conflict-resolution protocol. We view this “optimistic concurrency control” paradigm as especially appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning, and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.
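
To make the “optimistic concurrency control” pattern concrete, below is a minimal Python sketch in the spirit of the paradigm, applied to DP-means-style clustering (cf. [6]): workers scan their data shards against an unlocked snapshot of the shared cluster centers and optimistically propose new centers, and a short serial validation pass then detects and resolves conflicting proposals. The function names (propose, validate), the distance threshold lam, and the simulated single-process workers are illustrative assumptions, not the paper's implementation.

# Minimal sketch of optimistic concurrency control (OCC) for
# DP-means-style clustering; illustrative only.
import numpy as np

def propose(points, centers, lam):
    """Optimistic phase: a worker scans its shard against a snapshot of
    the shared centers, taking no locks. Any point farther than lam from
    every known center (and from the worker's own earlier proposals) is
    proposed as a new cluster center."""
    proposals = []
    for x in points:
        known = centers + proposals
        if not known or min(np.linalg.norm(x - c) for c in known) > lam:
            proposals.append(x)
    return proposals

def validate(all_proposals, centers, lam):
    """Serial validation phase: re-check each proposal against the
    centers accepted so far. A proposal now within lam of an accepted
    center is a conflict between workers and is dropped (its point will
    be served by the nearby center); otherwise it is committed."""
    for x in all_proposals:
        if not centers or min(np.linalg.norm(x - c) for c in centers) > lam:
            centers.append(x)
    return centers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 2))
    centers, lam = [], 2.0
    # One epoch: shard the data across four simulated workers, gather
    # their optimistic proposals, then resolve conflicts serially.
    proposals = []
    for shard in np.array_split(data, 4):
        proposals.extend(propose(shard, centers, lam))
    centers = validate(proposals, centers, lam)
    print(f"accepted {len(centers)} centers from {len(proposals)} proposals")

The common case here is lock-free: a conflict arises only when two workers propose centers within lam of each other in the same epoch, which becomes rare once the accepted centers cover the data, so the serial validation pass stays cheap relative to the parallel proposal phase.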


Reference text

[1] J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs sampling: From colored fields to thin junction trees. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 324–332, 2011.

[2] Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB), Istanbul, 2012.

[3] Benjamin Recht, Christopher Ré, Stephen J. Wright, and Feng Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems (NIPS) 24, pages 693–701, Granada, 2011.

[4] Amr Ahmed, Mohamed Aly, Joseph Gonzalez, Shravan Narayanamurthy, and Alexander J. Smola. Scalable inference in latent variable models. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM), 2012.

[5] Hsiang-Tsung Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.

[6] Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, 2012.

[7] Tamara Broderick, Brian Kulis, and Michael I. Jordan. MAD-Bayes: MAP-based asymptotic derivations from Bayes. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

[8] Leslie G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103–111, 1990.

[9] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010.

[10] A. Meyerson. Online facility location. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS), Las Vegas, 2001.

[11] A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.

[12] N. Ailon, R. Jaiswal, and C. Monteleoni. Streaming k-means approximation. In Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, 2009.

[13] John Paisley, David Blei, and Michael I. Jordan. Stick-breaking Beta processes and the Poisson process. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

[14] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for Latent Dirichlet Allocation. In Advances in Neural Information Processing Systems (NIPS) 20, Vancouver, 2007.

[15] D. Lovell, J. Malmaud, R. P. Adams, and V. K. Mansinghka. ClusterCluster: Parallel Markov chain Monte Carlo for Dirichlet process mixtures. ArXiv e-prints, April 2013.

[16] F. Doshi-Velez, D. Knowles, S. Mohamed, and Z. Ghahramani. Large scale nonparametric Bayesian inference: Data parallelisation in the Indian Buffet process. In Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, 2009.

[17] Tianbing Xu and Alexander Ihler. Multicore Gibbs sampling in dense, unstructured graphs. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

[18] I. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Workshop on Large-Scale Parallel KDD Systems, 2000.

[19] A. Das, M. Datar, A. Garg, and S. Rajaram. Google News personalization: Scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web (WWW), Banff, 2007.

[20] A. Ene, S. Im, and B. Moseley. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Diego, 2011.

[21] M. Shindler, A. Wong, and A. Meyerson. Fast and accurate k-means for large datasets. In Advances in Neural Information Processing Systems (NIPS) 24, Granada, 2011.

[22] Moses Charikar, Liadan O’Callaghan, and Rina Panigrahy. Better streaming algorithms for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC), 2003.

[23] Mihai Bădoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.

[24] D. Feldman, A. Krause, and M. Faulkner. Scalable training of mixture models via coresets. In Advances in Neural Information Processing Systems (NIPS) 24, Granada, 2011.

[25] B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. Scalable k-means++. In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB), Istanbul, 2012.