jmlr jmlr2010 jmlr2010-43 jmlr2010-43-reference knowledge-graph by maker-knowledge-mining

43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis


Source: pdf

Author: Michel Journée, Yurii Nesterov, Peter Richtárik, Rodolphe Sepulchre

Abstract: In this paper we develop a new approach to sparse principal component analysis (sparse PCA). We propose two single-unit and two block optimization formulations of the sparse PCA problem, aimed at extracting a single sparse dominant principal component of a data matrix, or more components at once, respectively. While the initial formulations involve nonconvex functions, and are therefore computationally intractable, we rewrite them into the form of an optimization program involving maximization of a convex function on a compact set. The dimension of the search space is decreased enormously if the data matrix has many more columns (variables) than rows. We then propose and analyze a simple gradient method suited for the task. It appears that our algorithm has best convergence properties in the case when either the objective function or the feasible set are strongly convex, which is the case with our single-unit formulations and can be enforced in the block case. Finally, we demonstrate numerically on a set of random and gene expression test problems that our approach outperforms existing algorithms both in quality of the obtained solution and in computational speed. Keywords: sparse PCA, power method, gradient ascent, strongly convex sets, block algorithms


reference text

P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, January 2008. O. Alter, P. O. Brown, and D. Botstein. Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. Proc Natl Acad Sci USA, 100(6):3351–3356, 2003. R. W. Brockett. Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems. Linear Algebra Appl., 146:79–91, 1991. J. P. Brunet, P. Tamayo, T. R. Golub, and J. P. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA, 101(12):4164–4169, 2004. J. Cadima and I. T. Jolliffe. Loadings and correlations in the interpretation of principal components. Journal of Applied Statistics, 22:203–214, 1995. A. d’Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. Siam Review, 49:434–448, 2007. A. d’Aspremont, F. R. Bach, and L. El Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9:1269–1294, 2008. C. Fraikin, Yu. Nesterov, and P. Van Dooren. A gradient-type algorithm optimizing the coupling between matrices. Linear Algebra and its Applications, 429(5-6):1229–1242, 2008. G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996. R. A. Horn and C. A. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1985. I. T. Jolliffe. Rotation of principal components: choice of normalization constraints. Journal of Applied Statistics, 22:29–35, 1995. I. T. Jolliffe, N. T. Trendafilov, and M. Uddin. A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12(3):531–547, 2003. W. Liebermeister. Linear modes of gene expression determined by independent component analysis. Bioinformatics, 18(1):51–60, 2002. 552 G ENERALIZED P OWER M ETHOD FOR S PARSE PCA L. Mackey. Deflation methods for sparse PCA. In Advances in Neural Information Processing Systems (NIPS), pages 1017–1024, 2008. B. Moghaddam, Y. Weiss, and S. Avidan. Spectral bounds for sparse PCA: Exact and greedy algorithms. In Y. Weiss, B. Sch¨ lkopf, and J. Platt, editors, Advances in Neural Information o Processing Systems 18, pages 915–922. MIT Press, Cambridge, MA, 2006. A. Naderi, A. E. Teschendorff, N. L. Barbosa-Morais, S. E. Pinder, A. R. Green, D. G. Powe, J. F. R. Robertson, S. Aparicio, I. O. Ellis, J. D. Brenton, and C. Caldas. A gene expression signature to predict survival in breast cancer across independent data sets. Oncogene, 26:1507–1516, 2007. B. N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall Inc., Englewood Cliffs, N.J., 1980. ISBN 0-13-880047-2. Prentice-Hall Series in Computational Mathematics. A. Riva, A.-S. Carpentier, B. Torr´ sani, and A. H´ naut. Comments on selected fundamental aspects e e of microarray analysis. Computational Biology and Chemistry, 29(5):319–336, 2005. H. Shen and J. Z. Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6):1015–1034, 2008. C. Sotiriou, P. Wirapati, S. Loi, A. Harris, S. Fox, J. Smeds, H. Nordgren, P. Farmer, V. Praz, B. Haibe-Kains, C. Desmedt, D. Larsimont, F. Cardoso, H. Peterse, D. Nuyten, M. Buyse, M. J. Van de Vijver, J. Bergh, M. Piccart, and M. Delorenzi. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst, 98(4):262–272, 2006. A. Teschendorff, M. Journ´ e, P.-A. Absil, R. Sepulchre, and C. Caldas. Elucidating the altered e transcriptional programs in breast cancer using independent component analysis. PLoS Computational Biology, 3(8):1539–1554, 2007. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(2):267–288, 1996. M. J. van de Vijver, Y. D. He, L. J. van’t Veer, H. Dai, A. A. Hart, D. W. Voskuil, G. J. Schreiber, J. L. Peterse, C. Roberts, M. J. Marton, M. Parrish, D. Atsma, A. Witteveen, A. Glas, L. Delahaye, T. van der Velde, H. Bartelink, S. Rodenhuis, E. T. Rutgers, S. H. Friend, and R. Bernards. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med, 347(25): 1999–2009, 2002. Y. Wang, J. G. Klijn, Y. Zhang, A. M. Sieuwerts, M. P. Look, F. Yang, D. Talantov, M. Timmermans, M. E. Meijer-van Gelder, J. Yu, T. Jatkoe, E. M. Berns, D. Atkins, and J. A. Foekens. Geneexpression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet, 365(9460):671–679, 2005. H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265–286, 2006. 553