nips nips2013 nips2013-188 nips2013-188-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain
Abstract: We consider streaming, one-pass principal component analysis (PCA) in the high-dimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p²) memory; meanwhile, no algorithm can do better than O(kp) memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where p-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike), which is the signal to be recovered. It is now well understood that the spike can be recovered when the number of samples, n, scales proportionally with the dimension, p. Yet all algorithms that provably achieve this have memory complexity O(p²). Meanwhile, algorithms with memory complexity O(kp) do not have provable bounds on sample complexity comparable to p. We present an algorithm that achieves both: it uses O(kp) memory (meaning storage of any kind) and is able to compute the k-dimensional spike with O(p log p) sample complexity, the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data.
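The abstract states the model and the resource targets but does not reproduce the update rule, so the following is a minimal NumPy sketch under stated assumptions, not the authors' exact algorithm: it draws samples from the spiked covariance model (population covariance I + AA^T) and estimates the spike with a generic one-pass block power iteration. The dimensions, block size, and rank of the spike are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
p, k = 200, 5               # ambient dimension and spike rank (assumed values)
n_blocks, block = 20, 250   # one pass over n_blocks * block samples

# Spiked covariance model: E[x x^T] = I (white noise) + A A^T (the spike).
A = np.linalg.qr(rng.standard_normal((p, k)))[0]

def draw():
    """One sample x = A z + w with z ~ N(0, I_k), w ~ N(0, I_p)."""
    return A @ rng.standard_normal(k) + rng.standard_normal(p)

Q = np.linalg.qr(rng.standard_normal((p, k)))[0]  # p x k subspace estimate
for _ in range(n_blocks):
    S = np.zeros((p, k))                 # p x k accumulator: O(kp) memory
    for _ in range(block):
        x = draw()                       # each sample is seen exactly once
        S += np.outer(x, x @ Q) / block  # adds x x^T Q without forming x x^T
    Q, _ = np.linalg.qr(S)               # re-orthonormalize the estimate

# sin of the largest principal angle between span(A) and span(Q)
s = np.linalg.svd(A.T @ Q, compute_uv=False)
print(f"subspace error: {np.sqrt(max(0.0, 1.0 - s.min() ** 2)):.3f}")

Only the subspace estimate Q and the accumulator S, each p x k, persist across samples, which is the O(kp) storage budget the abstract refers to; the p x p covariance is never formed.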
Arora, R., Cotter, A., Livescu, K., and Srebro, N. Stochastic optimization for PCA and PLS. In 50th Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2012.
Balzano, L., Nowak, R., and Recht, B. Online identification and tracking of subspaces from highly incomplete information. In 48th Annual Allerton Conference on Communication, Control, and Computing, pp. 704–711, 2010.
Brand, M. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1):20–30, 2006.
Brand, Matthew. Incremental singular value decomposition of uncertain data with missing values. In Computer Vision – ECCV 2002, pp. 707–720, 2002.
Clarkson, Kenneth L. and Woodruff, David P. Numerical linear algebra in the streaming model. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp. 205–214, 2009.
Comon, P. and Golub, G. H. Tracking a few extreme singular values and vectors in signal processing. Proceedings of the IEEE, 78(8):1327–1343, 1990.
Golub, Gene H. and Van Loan, Charles F. Matrix Computations, volume 3. JHU Press, 2012.
Halko, Nathan, Martinsson, Per-Gunnar, and Tropp, Joel A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
He, J., Balzano, L., and Lui, J. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827, 2011.
Herbster, Mark and Warmuth, Manfred K. Tracking the best linear predictor. Journal of Machine Learning Research, 1:281–309, 2001.
Johnstone, Iain M. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29(2):295–327, 2001.
Li, Y. On incremental and robust subspace learning. Pattern Recognition, 37(7):1509–1518, 2004.
Mitliagkas, Ioannis, Caramanis, Constantine, and Jain, Prateek. Memory limited, streaming PCA. arXiv preprint arXiv:1307.0032, 2013.
Nadler, Boaz. Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, pp. 2791–2817, 2008.
Porteous, Ian, Newman, David, Ihler, Alexander, Asuncion, Arthur, Smyth, Padhraic, and Welling, Max. Fast collapsed Gibbs sampling for latent Dirichlet allocation. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 569–577, 2008.
Robbins, Herbert and Monro, Sutton. A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400–407, 1951.
Roweis, Sam. EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems, pp. 626–632, 1998.
Rudelson, Mark and Vershynin, Roman. Smallest singular value of a random rectangular matrix. Communications on Pure and Applied Mathematics, 62(12):1707–1739, 2009.
Tipping, Michael E. and Bishop, Christopher M. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622, 1999.
Vershynin, R. How close is the sample covariance matrix to the actual covariance matrix? Journal of Theoretical Probability, pp. 1–32, 2010a.
Vershynin, Roman. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010b.
Warmuth, Manfred K. and Kuzmin, Dima. Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension. Journal of Machine Learning Research, 9:2287–2320, 2008.