nips nips2008 nips2008-57 nips2008-57-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lester W. Mackey
Abstract: In analogy to the PCA setting, the sparse PCA problem is often solved by iteratively alternating between two subtasks: cardinality-constrained rank-one variance maximization and matrix deflation. While the former has received a great deal of attention in the literature, the latter is seldom analyzed and is typically borrowed without justification from the PCA context. In this work, we demonstrate that the standard PCA deflation procedure is seldom appropriate for the sparse PCA setting. To rectify the situation, we first develop several deflation alternatives better suited to the cardinality-constrained context. We then reformulate the sparse PCA optimization problem to explicitly reflect the maximum additional variance objective on each round. The result is a generalized deflation procedure that typically outperforms more standard techniques on real-world datasets.
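To make the deflation subtask concrete, here is a minimal NumPy sketch (not taken from the paper) contrasting Hotelling's deflation, the standard PCA update the abstract refers to, with projection deflation, one positive-semidefiniteness-preserving alternative of the kind the paper develops. The sparse_pc placeholder and all names are illustrative, not the paper's method.

    import numpy as np

    def hotelling_deflation(A, x):
        # Standard PCA deflation: A' = A - (x^T A x) x x^T.
        # For a sparse pseudo-eigenvector x (not a true eigenvector of A),
        # the deflated matrix need not remain positive semidefinite.
        x = x / np.linalg.norm(x)
        return A - (x @ A @ x) * np.outer(x, x)

    def projection_deflation(A, x):
        # Projection deflation: A' = (I - x x^T) A (I - x x^T).
        # Remains positive semidefinite for any unit vector x.
        x = x / np.linalg.norm(x)
        P = np.eye(A.shape[0]) - np.outer(x, x)
        return P @ A @ P

    def sparse_pc(A, k):
        # Crude stand-in for a cardinality-constrained rank-one variance
        # maximizer: truncate the leading eigenvector of A to its k
        # largest-magnitude entries and renormalize.
        _, V = np.linalg.eigh(A)
        v = V[:, -1].copy()
        v[np.argsort(np.abs(v))[:-k]] = 0.0
        return v / np.linalg.norm(v)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    A = X.T @ X / 50  # sample covariance matrix
    for t in range(3):
        x = sparse_pc(A, k=4)
        A = projection_deflation(A, x)  # swap in hotelling_deflation to compare
        print(f"round {t}: min eigenvalue = {np.linalg.eigvalsh(A).min():.2e}")

With a sparse x, Hotelling's deflation can drive the minimum eigenvalue of the deflated matrix negative, so it no longer behaves like a covariance matrix on later rounds; projection deflation avoids this, which is one instance of the mismatch the abstract describes.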
[1] A. d’Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet. A Direct Formulation for Sparse PCA using Semidefinite Programming. In Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, December 2004.
[2] A. d’Aspremont, F. R. Bach, and L. El Ghaoui. Full regularization path for sparse principal component analysis. In Proceedings of the 24th International Conference on Machine Learning (ICML ’07), pp. 177-184. ACM, New York, NY, 2007.
[3] J. Cadima and I. Jolliffe. Loadings and correlations in the interpretation of principal components. Applied Statistics, 22:203-214, 1995.
[4] C.C. Fowlkes, C.L. Luengo Hendriks, S.V.E. Keränen, G.H. Weber, O. Rübel, M.-Y. Huang, S. Chatoor, A.H. DePace, L. Simirenko, C. Henriquez, et al. A quantitative spatiotemporal atlas of gene expression in the Drosophila blastoderm. Cell, 133:364-374, 2008.
[5] J. Jeffers. Two case studies in the application of principal components. Applied Statistics, 16:225-236, 1967.
[6] I.T. Jolliffe and M. Uddin. A Modified Principal Component Technique based on the Lasso. Journal of Computational and Graphical Statistics, 12:531-547, 2003.
[7] I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[8] I.T. Jolliffe. Rotation of principal components: choice of normalization constraints. Journal of Applied Statistics, 22:29-35, 1995.
[9] B. Moghaddam, Y. Weiss, and S. Avidan. Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in Neural Information Processing Systems, 18, 2006.
[10] B. Moghaddam, Y. Weiss, and S. Avidan. Generalized spectral bounds for sparse LDA. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
[11] Y. Saad. Projection and deflation methods for partial pole assignment in linear state feedback. IEEE Trans. Automat. Contr., vol. 33, pp. 290-297, Mar. 1988.
[12] B.K. Sriperumbudur, D.A. Torres, and G.R.G. Lanckriet. Sparse eigen methods by DC programming. In Proceedings of the 24th International Conference on Machine Learning, pp. 831-838, 2007.
[13] D. Torres, B.K. Sriperumbudur, and G. Lanckriet. Finding Musically Meaningful Words by Sparse CCA. In Neural Information Processing Systems (NIPS) Workshop on Music, the Brain and Cognition, 2007.
[14] P. White. The Computation of Eigenvalues and Eigenvectors of a Matrix. Journal of the Society for Industrial and Applied Mathematics, Vol. 6, No. 4, pp. 393-437, Dec., 1958.
[15] F. Zhang (Ed.). The Schur Complement and Its Applications. Springer, New York, 2005.
[16] Z. Zhang, H. Zha, and H. Simon, Low-rank approximations with sparse factors I: Basic algorithms and error analysis. SIAM J. Matrix Anal. Appl., 23 (2002), pp. 706-727.
[17] Z. Zhang, H. Zha, and H. Simon, Low-rank approximations with sparse factors II: Penalized methods with discrete Newton-like iterations. SIAM J. Matrix Anal. Appl., 25 (2004), pp. 901-920.
[18] H. Zou, T. Hastie, and R. Tibshirani. Sparse Principal Component Analysis. Technical Report, Statistics Department, Stanford University, 2004.