nips nips2013 nips2013-232 nips2013-232-reference knowledge-graph by maker-knowledge-mining

232 nips-2013-Online PCA for Contaminated Data


Source: pdf

Author: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

Abstract: We consider the online Principal Component Analysis (PCA) where contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Due to their sensitiveness to outliers, previous online PCA algorithms fail in this case and their results can be arbitrarily skewed by the outliers. Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. Actually, under mild conditions, online RPCA achieves the maximal robustness with a 50% breakdown point. Moreover, online RPCA is shown to be efficient for both storage and computation, since it need not re-explore the previous samples as in traditional robust PCA algorithms. This endows online RPCA with scalability for large scale data. 1


reference text

[1] J.R. Bunch and C.P. Nielsen. Updating the singular value decomposition. Numerische Mathematik, 1978.

[2] E.J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? ArXiv:0912.3599, 2009.

[3] C. Croux and A. Ruiz-Gazen. A fast algorithm for robust principal components based on projection pursuit. In COMPSTAT, 1996.

[4] C. Croux and A. Ruiz-Gazen. High breakdown estimators for principal components: the projection-pursuit approach revisited. Journal of Multivariate Analysis, 2005.

[5] A. d’Aspremont, F. Bach, and L. Ghaoui. Optimal solutions for sparse principal component analysis. JMLR, 2008.

[6] J. Feng, H. Xu, and S. Yan. Robust PCA in high-dimension: A deterministic approach. In ICML, 2012.

[7] David Gross and Vincent Nesme. Note on sampling without replacing from a finite collection of matrices. arXiv preprint arXiv:1001.2738, 2010.

[8] P. Hall, D. Marshall, and R. Martin. Merging and splitting eigenspace models. TPAMI, 2000.

[9] Jun He, Laura Balzano, and John Lui. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827, 2011.

[10] P. Honeine. Online kernel principal component analysis: a reduced-order model. TPAMI, 2012.

[11] P.J. Huber, E. Ronchetti, and MyiLibrary. Robust statistics. John Wiley & Sons, New York, 1981.

[12] M. Hubert, P.J. Rousseeuw, and K.V. Branden. Robpca: a new approach to robust principal component analysis. Technometrics, 2005.

[13] M. Hubert, P.J. Rousseeuw, and S. Verboven. A fast method for robust principal components with applications to chemometrics. Chemometrics and Intelligent Laboratory Systems, 2002.

[14] G. Li and Z. Chen. Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and monte carlo. Journal of the American Statistical Association, 1985.

[15] Y. Li. On incremental and robust subspace learning. Pattern recognition, 2004.

[16] Michael W Mahoney. Randomized algorithms for matrices and data. arXiv preprint arXiv:1104.5557, 2011.

[17] R.A. Maronna. Robust m-estimators of multivariate location and scatter. The annals of statistics, 1976.

[18] S. Ozawa, S. Pang, and N. Kasabov. A modified incremental principal component analysis for on-line learning of feature space and classifier. PRICAI, 2004.

[19] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 1901.

[20] C. Qiu, N. Vaswani, and L. Hogben. Recursive robust pca or recursive sparse recovery in large but structured noise. arXiv preprint arXiv:1211.3754, 2012.

[21] P.J. Rousseeuw. Least median of squares regression. Journal of the American statistical association, 1984.

[22] P.J. Rousseeuw and A.M. Leroy. Robust regression and outlier detection. John Wiley & Sons Inc, 1987.

[23] M.K. Warmuth and D. Kuzmin. Randomized online pca algorithms with regret bounds that are logarithmic in the dimension. JMLR, 2008.

[24] H. Xu, C. Caramanis, and S. Mannor. Principal component analysis with contaminated data: The high dimensional case. In COLT, 2010.

[25] H. Zhao, P.C. Yuen, and J.T. Kwok. A novel incremental principal component analysis and its application for face recognition. TSMC-B, 2006. 9