
146 nips-2013-Large Scale Distributed Sparse Precision Estimation


Source: pdf

Author: Huahua Wang, Arindam Banerjee, Cho-Jui Hsieh, Pradeep Ravikumar, Inderjit Dhillon

Abstract: We consider the problem of sparse precision matrix estimation in high dimensions using the CLIME estimator, which has several desirable theoretical properties. We present an inexact alternating direction method of multipliers (ADMM) algorithm for CLIME, and establish rates of convergence for both the objective and the optimality conditions. Further, we develop a large-scale distributed framework for the computations, which scales to millions of dimensions and trillions of parameters using hundreds of cores. The proposed framework solves CLIME in column blocks and involves only elementwise operations and parallel matrix multiplications. We evaluate our algorithm on both shared-memory and distributed-memory architectures, which can use block-cyclic distribution of data and parameters to achieve load balance and improve the efficiency in the use of memory hierarchies. Experimental results show that our algorithm is substantially more scalable than state-of-the-art methods and scales almost linearly with the number of cores.
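For context, the column-block scheme in the abstract rests on the fact that the CLIME estimator (Cai, Liu, and Luo, reference [3] below) is column-separable. A minimal sketch of that standard formulation, in our own notation rather than the paper's: with sample covariance S, precision matrix estimate Omega, and tuning parameter lambda,

\[
\widehat{\Omega} = \arg\min_{\Omega} \; \|\Omega\|_1
\quad \text{subject to} \quad \|S\Omega - I\|_{\infty} \le \lambda .
\]

Both the objective and the constraint decouple over the columns of Omega, so the i-th column can be estimated independently as

\[
\widehat{\omega}_i = \arg\min_{\omega \in \mathbb{R}^p} \; \|\omega\|_1
\quad \text{subject to} \quad \|S\omega - e_i\|_{\infty} \le \lambda ,
\]

where e_i is the i-th standard basis vector. Grouping columns into blocks yields the column-block subproblems the abstract refers to, each of which can be handled by an (inexact) ADMM scheme whose steps reduce to elementwise operations and matrix multiplications.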


reference text

[1] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR, 9:485–516, 2008.

[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 2011.

[3] T. Cai, W. Liu, and X. Luo. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106:594–607, 2011.

[4] T. Cai, W. Liu, and H. Zhou. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Preprint, 2012.

[5] J. Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers. In High Performance Computing on the Information Superhighway, 1997.

[6] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, 2012.

[7] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 2008.

[8] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9:432–441, 2008.

[9] Q. Fu, H. Wang, and A. Banerjee. Bethe-ADMM for tree decomposition based parallel MAP inference. In UAI, 2013.

[10] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, and C. D. Bloomfield. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.

[11] K. Goto and R. Van De Geijn. High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software, 35:1–14, 2008.

[12] B. He and X. Yuan. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Preprint, 2012.

[13] C. Hsieh, I. Dhillon, P. Ravikumar, and A. Banerjee. A divide-and-conquer method for sparse inverse covariance estimation. In NIPS, 2012.

[14] C. Hsieh, M. Sustik, I. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In NIPS, 2011.

[15] L. Blackford, J. Choi, A. Cleary, J. Demmel, I. S. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users’ Guide. SIAM, 1997.

[16] M. Lam, E. Rothberg, and M. Wolf. The cache performance and optimization of blocked algorithms. In Architectural Support for Programming Languages and Operating Systems, 1991.

[17] L. Li and K.-C. Toh. An inexact interior point method for L1-regularized sparse covariance selection. Mathematical Programming Computation, 2:291–315, 2010.

[18] X. Li, T. Zhao, X. Yuan, and H. Liu. An R package flare for high dimensional linear regression and precision matrix estimation. http://cran.r-project.org/web/packages/flare, 2013.

[19] H. Liu and L. Wang. TIGER: A tuning-insensitive approach for optimally estimating Gaussian graphical models. Preprint, 2012.

[20] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In VLDB, 2012.

[21] R. Mazumder and T. Hastie. Exact covariance thresholding into connected components for large-scale graphical lasso. JMLR, 13:723–736, 2012.

[22] F. Niu, B. Recht, C. Ré, and S. J. Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.

[23] N. Parikh and S. Boyd. Graph projection block splitting for distributed optimization. Preprint, 2012.

[24] R. Raina, A. Madhavan, and A. Y. Ng. Large-scale deep unsupervised learning using graphics processors. In ICML, 2009.

[25] P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935–980, 2011.

[26] H. Wang and A. Banerjee. Online alternating direction method. In ICML, 2012.

[27] J. Yang and Y. Zhang. Alternating direction algorithms for L1-problems in compressive sensing. ArXiv, 2009.

[28] M. Yuan. Sparse inverse covariance matrix estimation via linear programming. JMLR, 11, 2010.

[29] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In NIPS, 2010.