Authors: Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund
Abstract: We consider a situation in which we see samples X_n ∈ R^d drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion: an algorithm that maintains an estimate of the top eigenvector in O(d) space and incrementally adjusts the estimate with each new data point that arrives. Two classical such schemes are due to Krasulina (1969) and Oja (1983). We give finite-sample convergence rates for both.
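The abstract refers to the classical incremental updates of Oja [9] and Krasulina [7] for estimating the top eigenvector from a stream of samples in O(d) space. The following is a minimal sketch of those two standard update rules on synthetic data; the step-size schedule, the synthetic covariance, and the function names (oja_step, krasulina_step) are illustrative assumptions, not the paper's specific algorithm or analysis.

```python
# Sketch of the classical incremental top-eigenvector updates: Oja (1983)
# and Krasulina (1969). Each update touches only an O(d) state vector.
import numpy as np

def oja_step(v, x, gamma):
    """Oja's rule: v <- (v + gamma * x x^T v) / ||v + gamma * x x^T v||."""
    v = v + gamma * x * (x @ v)
    return v / np.linalg.norm(v)

def krasulina_step(v, x, gamma):
    """Krasulina's rule: stochastic gradient step on the Rayleigh quotient;
    the correction term keeps the update orthogonal to the current v."""
    xv = x @ v
    return v + gamma * (x * xv - (xv ** 2 / (v @ v)) * v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 20
    # Synthetic covariance with a clearly dominant top eigenvector e_0 (assumption for the demo).
    A = np.diag(np.linspace(0.1, 1.0, d))
    A[0, 0] = 5.0
    L = np.linalg.cholesky(A)
    top = np.zeros(d)
    top[0] = 1.0

    v_oja = rng.standard_normal(d)
    v_oja /= np.linalg.norm(v_oja)
    v_kra = v_oja.copy()

    for n in range(1, 20001):
        x = L @ rng.standard_normal(d)   # X_n ~ N(0, A), seen one at a time
        gamma = 1.0 / (100 + n)          # illustrative c/(c0 + n) schedule
        v_oja = oja_step(v_oja, x, gamma)
        v_kra = krasulina_step(v_kra, x, gamma)

    # Alignment with the true top eigenvector (closer to 1 is better).
    print("Oja alignment:      ", abs(v_oja @ top))
    print("Krasulina alignment:", abs((v_kra / np.linalg.norm(v_kra)) @ top))
```

Both estimates use O(d) memory per step; Oja's rule renormalizes after every update, while Krasulina's leaves the norm free and only the direction is read off at the end.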
[1] A. Agarwal, O. Chapelle, M. Dudík, and J. Langford. A reliable effective terascale linear learning system. CoRR, abs/1110.4198, 2011.
[2] R. Arora, A. Cotter, K. Livescu, and N. Srebro. Stochastic optimization for PCA and PLS. In 50th Annual Allerton Conference on Communication, Control, and Computing, pages 861–868, 2012.
[3] R. Arora, A. Cotter, and N. Srebro. Stochastic optimization of PCA with capped MSG. In Advances in Neural Information Processing Systems, 2013.
[4] G. Blanchard, O. Bousquet, and L. Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66(2-3):259–294, 2007.
[5] T. T. Cai, Z. Ma, and Y. Wu. Sparse PCA: Optimal rates and adaptive estimation. CoRR, abs/1211.1309, 2012.
[6] R. Durrett. Probability: Theory and Examples. Duxbury, second edition, 1995.
[7] T.P. Krasulina. A method of stochastic approximation for the determination of the least eigenvalue of a symmetrical matrix. USSR Computational Mathematics and Mathematical Physics, 9(6):189–195, 1969.
[8] I. Mitliagkas, C. Caramanis, and P. Jain. Memory limited, streaming PCA. In Advances in Neural Information Processing Systems, 2013.
[9] E. Oja. Subspace Methods of Pattern Recognition. Research Studies Press, 1983.
[10] E. Oja and J. Karhunen. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106:69–84, 1985.
[11] A. Rakhlin, O. Shamir, and K. Sridharan. Making gradient descent optimal for strongly convex stochastic optimization. In International Conference on Machine Learning, 2012.
[12] S. Roweis. EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems, 1997.
[13] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1615–1618, 2003.
[14] V.Q. Vu and J. Lei. Minimax rates of estimation for sparse PCA in high dimensions. Journal of Machine Learning Research - Proceedings Track, 22:1278–1286, 2012.
[15] M.K. Warmuth and D. Kuzmin. Randomized PCA algorithms with regret bounds that are logarithmic in the dimension. In Advances in Neural Information Processing Systems, 2007.
[16] J. Weng, Y. Zhang, and W.-S. Hwang. Candid covariance-free incremental principal component analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):1034–1040, 2003.
[17] L. Zwald and G. Blanchard. On the convergence of eigenspaces in kernel principal component analysis. In Advances in Neural Information Processing Systems, 2005.