Author: Figen Oztoprak, Jorge Nocedal, Steven Rennie, Peder A. Olsen
Abstract: We propose two classes of second-order optimization methods for solving the sparse inverse covariance estimation problem. The first approach, which we call the Newton-LASSO method, minimizes a piecewise quadratic model of the objective function at every iteration to generate a step. We employ the fast iterative shrinkage thresholding algorithm (FISTA) to solve this subproblem. The second approach, which we call the Orthant-Based Newton method, is a two-phase algorithm that first identifies an orthant face and then minimizes a smooth quadratic approximation of the objective function using the conjugate gradient method. These methods exploit the structure of the Hessian to efficiently compute the search direction and to avoid explicitly storing the Hessian. We also propose a limited memory BFGS variant of the orthant-based Newton method. Numerical results, including comparisons with the method implemented in the QUIC software [1], suggest that all the techniques described in this paper constitute useful tools for the solution of the sparse inverse covariance estimation problem.
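To make the setting concrete: the sparse inverse covariance (graphical lasso) problem referred to above is the minimization, over positive definite matrices X, of -log det(X) + tr(SX) + lambda*||X||_1, where S is the sample covariance [7, 12]. The sketch below is a minimal NumPy illustration of the soft-thresholding (proximal) operator that FISTA [21] builds on, used here in a plain proximal-gradient (ISTA) loop applied directly to that objective. Note the hedge: the paper's Newton-LASSO method applies FISTA to a piecewise quadratic model of the objective at each outer iteration, not to the objective itself, and the function names (soft_threshold, prox_grad_glasso), the fixed step size, and the iteration count are illustrative choices of ours, not part of the paper.

    import numpy as np

    def soft_threshold(A, tau):
        # Entrywise soft-thresholding: the proximal operator of tau * ||.||_1.
        return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

    def prox_grad_glasso(S, lam, step=1e-2, iters=200):
        # Illustrative ISTA loop for  min_X  -log det(X) + tr(S X) + lam*||X||_1.
        # The smooth part of the objective has gradient S - inv(X).
        p = S.shape[0]
        X = np.eye(p)                                # start from the identity
        for _ in range(iters):
            grad = S - np.linalg.inv(X)              # gradient of the smooth term
            X = soft_threshold(X - step * grad, step * lam)
            X = 0.5 * (X + X.T)                      # keep the iterate symmetric
        return X

With a small enough step size and a positive definite S, the iterates remain usable for illustration; a practical solver would add a line search or a positive-definiteness safeguard, which is part of what the second-order machinery described in the abstract provides.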
[1] C. J. Hsieh, M. A. Sustik, P. Ravikumar, and I. S. Dhillon. Sparse inverse covariance matrix estimation using quadratic approximation. Advances in Neural Information Processing Systems (NIPS), 24, 2011.
[2] A. P. Dempster. Covariance selection. Biometrics, 28:157–175, 1972.
[3] J. D. Picka. Gaussian Markov random fields: theory and applications. Technometrics, 48(1):146–147, 2006.
[4] O. Banerjee, L. El Ghaoui, A. d’Aspremont, and G. Natsoulis. Convex optimization techniques for fitting sparse Gaussian graphical models. In ICML, pages 89–96. ACM, 2006.
[5] A.J. Rothman, E. Levina, and J. Zhu. Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 19(4):947–962, 2010.
[6] T. Tsiligkaridis and A. O. Hero III. Sparse covariance estimation under Kronecker product structure. In ICASSP 2012 Proceedings, pages 3633–3636, Kyoto, Japan, 2012.
[7] O. Banerjee, L. El Ghaoui, and A. d’Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485–516, 2008.
[8] A. d’Aspremont, O. Banerjee, and L. El Ghaoui. First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and Applications, 30(1):56–66, 2008.
[9] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. 1999.
[10] I. Rish and G. Grabarnik. ELEN E6898 Sparse Signal Modeling (Spring 2011): Lecture 7, Beyond LASSO: Other Losses (Likelihoods). https://sites.google.com/site/eecs6898sparse2011/, 2011.
[11] S. Sra, S. Nowozin, and S. J. Wright. Optimization for Machine Learning. MIT Press, 2011.
[12] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical LASSO. Biostatistics, 9(3):432–441, 2008.
[13] K. Scheinberg and I. Rish. SINCO: a greedy coordinate ascent method for the sparse inverse covariance selection problem. Technical report, IBM RC24837, 2009.
[14] J. Duchi, S. Gould, and D. Koller. Projected subgradient methods for learning sparse Gaussians. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2008.
[15] Z. Lu. Smooth optimization approach for sparse covariance selection. arXiv preprint arXiv:0904.0687, 2009.
[16] K. Scheinberg, S. Ma, and D. Goldfarb. Sparse inverse covariance selection via alternating linearization methods. arXiv preprint arXiv:1011.0097, 2010.
[17] L. Li and K. C. Toh. An inexact interior point method for L1-regularized sparse covariance selection. Mathematical Programming Computation, 2(3):291–315, 2010.
[18] R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society B, 58(1):267–288, 1996.
[19] B. T. Polyak. The conjugate gradient method in extremal problems. U.S.S.R. Computational Mathematics and Mathematical Physics, 9:94–112, 1969.
[20] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004.
[21] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[22] P. Tseng and S. Yun. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117(1):387–423, 2009.
[23] G. Andrew and J. Gao. Scalable training of L1-regularized log-linear models. In ICML, pages 33–40. ACM, 2007.
[24] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.
[25] J. Dahl, V. Roychowdhury, and L. Vandenberghe. Maximum likelihood estimation of Gaussian graphical models: numerical implementation and topology selection. UCLA Preprint, 2005.
[26] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. MATLAB scripts for alternating direction method of multipliers. Technical report, http://www.stanford.edu/~boyd/papers/admm/, 2012.