nips nips2010 nips2010-231 nips2010-231-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
[1] I. T. Jolliffe. Principal Component Analysis. Springer Series in Statistics, Berlin: Springer, 1986.
[2] P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
[3] L. Xu and A. L. Yuille. Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Tran. on Neural Networks, 6(1):131–143, 1995.
[4] E. Cand` s, X. Li, Y. Ma, and J. Wright. Robust pricinpal component analysis? ArXiv:0912.3599, 2009. e
[5] S. J. Devlin, R. Gnanadesikan, and J. R. Kettenring. Robust estimation of dispersion matrices and principal components. Journal of the American Statistical Association, 76(374):354–362, 1981.
[6] T. N. Yang and S. D. Wang. Robust algorithms for principal component analysis. Pattern Recognition Letters, 20(9):927–933, 1999.
[7] C. Croux and G. Hasebroeck. Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika, 87(3):603–618, 2000.
[8] F. De la Torre and M. J. Black. Robust principal component analysis for computer vision. In ICCV’01, pages 362–369, 2001.
[9] F. De la Torre and M. J. Black. A framework for robust subspace learning. International Journal of Computer Vision, 54(1/2/3):117–142, 2003.
[10] C. Croux, P. Filzmoser, and M. Oliveira. Algorithms for Projection−Pursuit robust principal component analysis. Chemometrics and Intelligent Laboratory Systems, 87(2):218–225, 2007.
[11] S. C. Brubaker. Robust PCA and clustering on noisy mixtures. In SODA’09, pages 1078–1087, 2009.
[12] H. Xu, C. Caramanis, and S. Mannor. Principal component analysis with contaminated data: The high dimensional case. In COLT’10, pages 490–502, 2010.
[13] E. J. Cand` s, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from e highly incomplete frequency information. IEEE Tran. on Information Theory, 52(2):489–509, 2006.
[14] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. To appear in SIAM Review, 2010.
[15] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Rank-sparsity incoherence for matrix decomposition. ArXiv:0906.2220, 2009.
[16] E. Cand` s and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational e Mathematics, 9:717–772, 2009.
[17] E. Cand` s and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Tran. on e Information Theory, 56(2053-2080), 2010.
[18] D. L. Donoho. Breakdown properties of multivariate location estimators. Qualifying paper, Harvard University, 1982.
[19] R. Maronna. Robust M-estimators of multivariate location and scatter. The Annals of Statistics, 4:51–67, 1976.
[20] V. Barnett. The ordering of multivariate data. Journal of Royal Statistics Society, A, 138:318–344, 1976.
[21] D. Titterington. Estimation of correlation coefficients by ellipsoidal trimming. Applied Statistics, 27:227– 234, 1978.
[22] V. Barnett and T. Lewis. Outliers in Statistical Data. Wiley, New York, 1978.
[23] A. Dempster and M. Gasko-Green. New tools for residual analysis. The Annals of Statistics, 9(5):945– 959, 1981.
[24] S. J. Devlin, R. Gnanadesikan, and J. R. Kettenring. Robust estimation and outlier detection with correlation coefficients. Biometrika, 62:531–545, 1975.
[25] G. Li and Z. Chen. Projection-pursuit approach to robust dispersion matrices and principal components: Primary theory and monte carlo. Journal of the American Statistical Association, 80(391):759–766, 1985.
[26] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. http://arxiv.org/abs/1010.4237, 2010.
[27] Y. Nesterov. A method of solving a convex programming problem with convergence rate o(1/k2 ). Soviet Mathematics Doklady, 27(372-376), 1983.
[28] J-F. Cai, E. Cand` s, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM e Journal on Optimization, 20:1956–1982, 2008.
[29] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. the MIT Press, 2006. 9