Author: Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail
Abstract: Principal Components Analysis (PCA) is often used as a feature extraction procedure. Given a matrix X ∈ R^{n×d}, whose rows represent n data points with respect to d features, the top k right singular vectors of X (the so-called eigenfeatures) are arbitrary linear combinations of all available features. The eigenfeatures are very useful in data analysis, including the regularization of linear regression. Enforcing sparsity on the eigenfeatures, i.e., forcing them to be linear combinations of only a small number of actual features (as opposed to all available features), can promote better generalization error and improve the interpretability of the eigenfeatures. We present deterministic and randomized algorithms that construct such sparse eigenfeatures while provably achieving in-sample performance comparable to regularized linear regression. Our algorithms are relatively simple and practically efficient, and we demonstrate their performance on several data sets.
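To make the setup in the abstract concrete, the following is a minimal NumPy sketch, not code from the paper: it fits a principal-components (truncated-SVD) regression on the top-k eigenfeatures and, for contrast, a regression restricted to a small subset of actual features chosen by a naive column-norm heuristic. The heuristic merely stands in for the paper's provable column-selection algorithms, and the sizes n, d, k, r and the synthetic data are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, d, k, r = 200, 50, 5, 10          # n points, d features, k eigenfeatures, r selected features (illustrative)

X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Principal-components regression: the eigenfeatures V_k are the top-k right
# singular vectors of X and are dense linear combinations of all d features.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T                        # d x k matrix of eigenfeatures
w_pcr = np.linalg.lstsq(X @ V_k, y, rcond=None)[0]
err_pcr = np.linalg.norm(X @ V_k @ w_pcr - y)

# A naive sparse alternative: keep only r actual features (chosen here by
# column norm, purely for illustration), so the regression is supported on a
# few coordinates. The paper's algorithms choose such columns with provable
# guarantees; this heuristic carries none.
cols = np.argsort(-np.linalg.norm(X, axis=0))[:r]
w_sub = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
err_sub = np.linalg.norm(X[:, cols] @ w_sub - y)

print(f"in-sample residual, PCR (k={k}): {err_pcr:.3f}")
print(f"in-sample residual, {r}-feature subset: {err_sub:.3f}")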
[1] J. Batson, D. Spielman, and N. Srivastava. Twice-Ramanujan sparsifiers. In Proceedings of ACM STOC, pages 255–262, 2009.
[2] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix reconstruction. In Proceedings of IEEE FOCS, 2011.
[3] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Sparse features for PCA-like linear regression. Manuscript, 2011.
[4] C. Boutsidis and M. Magdon-Ismail. Deterministic feature selection for k-means clustering. arXiv:1109.5664v1, 2011.
[5] C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of ACM-SIAM SODA, pages 968–977, 2009.
[6] C. Boutsidis, M. W. Mahoney, and P. Drineas. Unsupervised feature selection for the k-means clustering problem. In Proceedings of NIPS, 2009.
[7] J. Cadima and I. Jolliffe. Loadings and correlations in the interpretation of principal components. Applied Statistics, 22:203–214, 1995.
[8] T. Chan and P. Hansen. Some applications of the rank revealing QR factorization. SIAM Journal on Scientific and Statistical Computing, 13:727–741, 1992.
[9] A. Das and D. Kempe. Algorithms for subset selection in linear regression. In Proceedings of ACM STOC, 2008.
[10] A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. W. Mahoney. Sampling algorithms and coresets for ℓp regression. In Proceedings of ACM-SIAM SODA, 2008.
[11] A. d’Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. In Proceedings of NIPS, 2004.
[12] A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset selection. In Proceedings of IEEE FOCS, 2010.
[13] P. Drineas, R. Kannan, and M. Mahoney. Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication. SIAM Journal on Computing, 36(1):132–157, 2006.
[14] P. Drineas, M. Mahoney, and S. Muthukrishnan. Polynomial time algorithm for column-row based relative-error low-rank matrix approximation. Technical Report 2006-04, DIMACS, March 2006.
[15] P. Drineas, M. Mahoney, and S. Muthukrishnan. Sampling algorithms for ℓ2 regression and applications. In Proceedings of ACM-SIAM SODA, pages 1127–1136, 2006.
[16] G. Golub. Numerical methods for solving linear least squares problems. Numerische Mathematik, 7:206–216, 1965.
[17] G. Golub, P. Hansen, and D. O’Leary. Tikhonov regularization and total least squares. SIAM Journal on Matrix Analysis and Applications, 21(1):185–194, 2000.
[18] M. Gu and S. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing, 17:848–869, 1996.
[19] I. Guyon and A. Elisseeff. Special issue on variable and feature selection. Journal of Machine Learning Research, 3, 2003.
[20] N. Halko, P. Martinsson, and J. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 2011.
[21] P. Hansen. The truncated SVD as a method for regularization. BIT Numerical Mathematics, 27(4):534–553, 1987.
[22] I. Jolliffe. Discarding variables in Principal Component Analysis: artificial data. Applied Statistics, 21(2):160–173, 1972.
[23] R. Larsen. PROPACK: A software package for the symmetric eigenvalue problem and singular value problems based on Lanczos and Lanczos bidiagonalization with partial reorthogonalization. http://soi.stanford.edu/~rmunk/PROPACK/.
[24] B. Moghaddam, Y. Weiss, and S. Avidan. Spectral bounds for sparse PCA: exact and greedy algorithms. In Proceedings of NIPS, 2005.
[25] B. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2):227–234, 1995.
[26] M. Rudelson and R. Vershynin. Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM, 54, 2007.
[27] N. Srivastava and D. Spielman. Graph sparsification by effective resistances. In Proceedings of ACM STOC, pages 563–568, 2008.
[28] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, pages 267–288, 1996.
[29] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
[30] T. Zhang. Adaptive forward-backward greedy algorithm for sparse learning with linear models. In Proceedings of NIPS, 2008.