
262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation


Source: pdf

Author: Cho-jui Hsieh, Inderjit S. Dhillon, Pradeep K. Ravikumar, Mátyás A. Sustik

Abstract: The ℓ1-regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program. In contrast to other state-of-the-art methods that largely use first order gradient information, our algorithm is based on Newton’s method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and also present experimental results using synthetic and real application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
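
For context, the "regularized log-determinant program" referred to in the abstract is min over positive definite X of -log det(X) + tr(S X) + lam * ||X||_1, where S is the empirical covariance matrix and the ℓ1 penalty induces sparsity in the estimated inverse covariance X. The sketch below is a minimal illustration of this objective, not the authors' Newton-based algorithm: it assumes a Python environment with NumPy and scikit-learn, uses scikit-learn's first-order GraphicalLasso solver as a stand-in baseline, and the helper name penalized_negloglik, the synthetic data, and the value lam=0.1 are all illustrative choices.

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    def penalized_negloglik(X, S, lam):
        # f(X) = -log det(X) + tr(S X) + lam * ||X||_1, with the l1 norm
        # taken over all entries; note scikit-learn's solver penalizes only
        # off-diagonal entries, so the correspondence is approximate.
        sign, logdet = np.linalg.slogdet(X)
        if sign <= 0:
            raise ValueError("X must be positive definite")
        return -logdet + np.trace(S @ X) + lam * np.abs(X).sum()

    rng = np.random.default_rng(0)
    data = rng.standard_normal((200, 10))        # 200 samples, 10 variables (synthetic)
    S = np.cov(data, rowvar=False)               # empirical covariance matrix
    model = GraphicalLasso(alpha=0.1).fit(data)  # first-order baseline solver
    X_hat = model.precision_                     # estimated sparse inverse covariance
    print(penalized_negloglik(X_hat, S, lam=0.1))

The paper's algorithm differs from such first-order baselines by taking Newton steps on a quadratic approximation of the smooth part of this objective, which is what yields the superlinear convergence claimed in the abstract.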


reference text

[1] A. Agarwal, S. Negahban, and M. Wainwright. Convergence rates of gradient methods for high-dimensional statistical recovery. In NIPS, 2010.

[2] O. Banerjee, L. El Ghaoui, and A. d’Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516, 2008.

[3] D. Bertsekas. Nonlinear programming. Athena Scientific, Belmont, MA, 1995.

[4] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 7th printing edition, 2009.

[5] J. Duchi, S. Gould, and D. Koller. Projected subgradient methods for learning sparse Gaussians. In UAI, 2008.

[6] J. Dunn. Newton’s method and the Goldstein step-length rule for constrained minimization problems. SIAM Journal on Control and Optimization, 18(6):659–674, 1980.

[7] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1(2):302–332, 2007.

[8] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, July 2008.

[9] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.

[10] E. S. Levitin and B. T. Polyak. Constrained minimization methods. U.S.S.R. Computational Math. and Math. Phys., 6:1–50, 1966.

[11] L. Li and K.-C. Toh. An inexact interior point method for l1-regularized sparse covariance selection. Mathematical Programming Computation, 2:291–315, 2010.

[12] L. Meier, S. Van de Geer, and P. Bühlmann. The group lasso for logistic regression. Journal of the Royal Statistical Society, Series B, 70:53–71, 2008.

[13] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.

[14] K. Scheinberg, S. Ma, and D. Goldfarb. Sparse inverse covariance selection via alternating linearization methods. In NIPS, 2010.

[15] K. Scheinberg and I. Rish. Learning sparse Gaussian Markov networks using a greedy coordinate ascent approach. In J. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, editors, Machine Learning and Knowledge Discovery in Databases, volume 6323 of Lecture Notes in Computer Science, pages 196–212. Springer Berlin / Heidelberg, 2010.

[16] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

[17] P. Tseng and S. Yun. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117:387–423, 2007.

[18] T. T. Wu and K. Lange. Coordinate descent algorithms for lasso penalized regression. The Annals of Applied Statistics, 2(1):224–244, 2008.

[19] G.-X. Yuan, K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. A comparison of optimization methods and software for large-scale l1-regularized linear classification. Journal of Machine Learning Research, 11:3183–3234, 2010.

[20] M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94:19–35, 2007.

[21] S. Yun and K.-C. Toh. A coordinate gradient descent method for l1-regularized convex minimization. Computational Optimization and Applications, 48(2):273–307, 2011.