45 nips-2010-CUR from a Sparse Optimization Viewpoint

Source: pdf

Author: Jacob Bien, Ya Xu, Michael W. Mahoney

Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.
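The abstract's notion of an approximation that "lies in the span of only a few columns of X" can be illustrated with a minimal sketch. It assumes the leverage-score column selection commonly described in the CUR literature (for example, reference [12] below); the abstract itself does not spell the procedure out. The function names `column_select` and `project_onto_columns`, and the parameters `k` (target rank) and `c` (a sampling budget, so that at most `c` columns are kept in expectation), are illustrative choices and not the authors' code.

```python
import numpy as np


def column_select(X, k, c, rng):
    """Randomly pick columns of X using rank-k leverage scores (illustrative)."""
    # Rows of Vt are the right singular vectors of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    # Normalized leverage score of column j: average squared entry of the
    # top-k right singular vectors at coordinate j. The scores sum to 1.
    leverage = np.sum(Vt[:k, :] ** 2, axis=0) / k
    # Keep column j independently with probability min(1, c * leverage_j).
    keep = rng.random(X.shape[1]) < np.minimum(1.0, c * leverage)
    return np.flatnonzero(keep), leverage


def project_onto_columns(X, cols):
    """Project X onto the span of the selected columns: C C^+ X."""
    C = X[:, cols]
    return C @ np.linalg.pinv(C) @ X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 20))
    cols, leverage = column_select(X, k=3, c=10, rng=rng)
    X_hat = project_onto_columns(X, cols)
    print("selected columns:", cols)
    print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

Every column of `X_hat` is a linear combination of the selected columns only, which is the column-sparsity the abstract refers to. The selected columns themselves are reproduced exactly, up to floating-point error.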

reference text

[1] M.-A. Belabbas and P.J. Wolfe. Fast low-rank approximation for covariance matrices. In Second IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pages 293–296, 2007.

[2] A. d’Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434–448, 2007.

[3] P. Drineas, R. Kannan, and M.W. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM Journal on Computing, 36:184–206, 2006.

[4] P. Drineas, M.W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

[5] S.A. Goreinov and E.E. Tyrtyshnikov. The maximum-volume concept in approximation by low-rank matrices. Contemporary Mathematics, 280:47–51, 2001.

[6] T. Hastie, R. Tibshirani, and J. Friedman. Applications of the lasso and grouped lasso to the estimation of sparse graphical models. Manuscript. Submitted. 2010.

[7] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, New York, 2003.

[8] D.C. Hoaglin and R.E. Welsch. The hat matrix in regression and ANOVA. The American Statistician, 32(1):17–22, 1978.

[9] R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. Technical report. Preprint: arXiv:0909.1440 (2009).

[10] I. T. Jolliffe, N. T. Trendafilov, and M. Uddin. A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12(3):531–547, 2003.

[11] S. Kumar, M. Mohri, and A. Talwalkar. Ensemble Nyström method. In Annual Advances in Neural Information Processing Systems 22: Proceedings of the 2009 Conference, 2009.

[12] M.W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA, 106:697–702, 2009.

[13] T. Nielsen, R.B. West, S.C. Linn, O. Alter, M.A. Knowling, J. O’Connell, S. Zhu, M. Fero, G. Sherlock, J.R. Pollack, P.O. Brown, D. Botstein, and M. van de Rijn. Molecular characterisation of soft tissue tumours: a gene expression study. Lancet, 359(9314):1301–1307, 2002.

[14] P. Paschou, E. Ziv, E.G. Burchard, S. Choudhry, W. Rodriguez-Cintron, M.W. Mahoney, and P. Drineas. PCA-correlated SNPs for structure identification in worldwide human populations. PLoS Genetics, 3:1672–1686, 2007.

[15] J. Peng, P. Wang, N. Zhou, and J. Zhu. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104:735–746, 2009.

[16] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, and J. P. Mesirov. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA, 102(43):15545–15550, 2005.

[17] J. Sun, Y. Xie, H. Zhang, and C. Faloutsos. Less is more: Compact matrix decomposition for large sparse graphs. In Proceedings of the 7th SIAM International Conference on Data Mining, 2007.

[18] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267–288, 1996.

[19] D. M. Witten, R. Tibshirani, and T. Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515–534, 2009.

[20] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49–67, 2006.

[21] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):262–286, 2006.