jmlr jmlr2008 jmlr2008-75 jmlr2008-75-reference knowledge-graph by maker-knowledge-mining

75 jmlr-2008-Optimal Solutions for Sparse Principal Component Analysis

Source: pdf

Author: Alexandre d'Aspremont, Francis Bach, Laurent El Ghaoui

Abstract: Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefﬁcients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semideﬁnite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of non zero coefﬁcients, with total complexity O(n3 ), where n is the number of variables. We then use the same relaxation to derive sufﬁcient conditions for global optimality of a solution, which can be tested in O(n3 ) per pattern. We discuss applications in subset selection and sparse recovery and show on artiﬁcial examples and biological data that our algorithm does provide globally optimal solutions in many cases. Keywords: PCA, subset selection, sparse eigenvalues, sparse recovery, lasso

reference text

A. Alizadeh, M. Eisen, R. Davis, C. Ma, I. Lossos, and A. Rosenwald. Distinct types of diffuse large b-cell lymphoma identiﬁed by gene expression proﬁling. Nature, 403:503–511, 2000. A. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Cell Biology, 96:6745–6750, 1999. A. Ben-Tal and A. Nemirovski. On tractable approximations of uncertain linear matrix inequalities affected by interval uncertainty. SIAM Journal on Optimization, 12(3):811–833, 2002. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. J. Cadima and I. T. Jolliffe. Loadings and correlations in the interpretation of principal components. Journal of Applied Statistics, 22:203–214, 1995. E. J. Cand` s and T. Tao. Decoding by linear programming. Information Theory, IEEE Transactions e on, 51(12):4203–4215, 2005. C. Couvreur and Y. Bresler. On the optimality of the backward greedy algorithm for the subset selection problem. SIAM J. Matrix Anal. Appl., 21(3):797–808, 2000. A. d’Aspremont, F. R. Bach, and L. El Ghaoui. Full regularization path for sparse principal component analysis. In Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML), 2007a. A. d’Aspremont, L. El Ghaoui, M.I. Jordan, and G. R. G. Lanckriet. A direct formulation for sparse PCA using semideﬁnite programming. SIAM Review, 49(3):434–448, 2007b. D. L. Donoho and J. Tanner. Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proceedings of the National Academy of Sciences, 102(27):9446–9451, 2005. R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 1985. 1293 D ’A SPREMONT, BACH AND E L G HAOUI I. T. Jolliffe. Rotation of principal components: choice of normalization constraints. Journal of Applied Statistics, 22:29–35, 1995. I. T. Jolliffe, N.T. Trendaﬁlov, and M. Uddin. A modiﬁed principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12:531–547, 2003. H.F. Kaiser. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23(3): 187–200, 1958. T. Kato. Perturbation Theory for Linear Operators. Springer-Verlag, 1966. N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for highdimensional data. Technical report, Technical Report, Statistics Department, UC Berkeley, 2006, 2006. B. Moghaddam, Y. Weiss, and S. Avidan. Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in Neural Information Processing Systems, 18, 2006a. B. Moghaddam, Y. Weiss, and S. Avidan. Generalized spectral bounds for sparse LDA. In Proc. ICML, 2006b. B. Moghaddam, Y. Weiss, and S. Avidan. Fast Pixel/Part Selection with Sparse Eigenvectors. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8, 2007. B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Comput., 24(2):227–234, 1995. JO Neuhaus and C. Wrigley. The quartimax method: an analytical approach to orthogonal simple structure. British Journal of Statistical Psychology, 7:81–91, 1954. W. Rudin. Real and Complex Analysis, Third edition. McGraw-Hill, Inc., New York, NY, USA, 1987. ISBN 0070542341. B.K. Sriperumbudur, D.A. Torres, and G.R.G. Lanckriet. Sparse eigen methods by DC programming. Proceedings of the 24th international conference on Machine learning, pages 831–838, 2007. Y. Su, T. M. Murali, V. Pavlovic, M. Schaffer, and S. Kasif. Rankgene: Identiﬁcation of diagnostic genes based on expression data. Bioinformatics, 19:1578–1579, 2003. R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal statistical society, series B, 58(1):267–288, 1996. M. Yuan and Y. Lin. On the non-negative garrotte estimator. Journal of The Royal Statistical Society Series B, 69(2):143–161, 2007. Z. Zhang, H. Zha, and H. Simon. Low rank approximations with sparse factors I: basic algorithms and error analysis. SIAM journal on matrix analysis and its applications, 23(3):706–727, 2002. P. Zhao and B. Yu. On model selection consistency of lasso. Journal of Machine Learning Research, 7:2541–2563, 2006. H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational & Graphical Statistics, 15(2):265–286, 2006. 1294