nips nips2012 nips2012-310 nips2012-310-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fang Han, Han Liu
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginal monotone transformations, the distributions are multivariate Gaussian. COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators achieve fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).
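The core estimation idea in the abstract can be sketched in a few lines: compute Spearman's rho on the observed data (which is invariant to the unknown monotone marginal transformations), map it to an estimate of the latent Gaussian correlation matrix via the classical identity Sigma_jk = 2 sin(pi * rho_jk / 6), and take leading eigenvectors of the result. The sketch below is a minimal, dense (non-sparse) simplification of COCA for illustration only; the function name is hypothetical and the paper's actual estimators additionally impose sparsity.

```python
import numpy as np
from scipy.stats import spearmanr

def coca_leading_eigvec(X):
    """Hypothetical dense sketch of COCA's leading-eigenvector estimate.

    X : (n, d) array of observations with arbitrary continuous margins.
    """
    # Spearman's rank correlation matrix of the observed data;
    # invariant under monotone marginal transformations.
    rho, _ = spearmanr(X)
    rho = np.atleast_2d(rho)
    # Map rank correlations to latent Gaussian (Pearson) correlations:
    # Sigma_jk = 2 * sin(pi/6 * rho_jk)
    Sigma = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(Sigma, 1.0)
    # Leading eigenvector of the estimated latent correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    return eigvecs[:, -1]
```

Because Spearman's rho depends only on ranks, applying any strictly increasing transformation to each coordinate (e.g. exponentiating a latent Gaussian) leaves this estimate unchanged, which is exactly the robustness property the abstract claims.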
[1] A.A. Amini and M.J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. In IEEE International Symposium on Information Theory (ISIT 2008), pages 2454–2458. IEEE, 2008.
[2] T.W. Anderson. An introduction to multivariate statistical analysis, volume 2. Wiley, New York, 1958.
[3] A. d’Aspremont, F. Bach, and L. El Ghaoui. Optimal solutions for sparse principal component analysis. The Journal of Machine Learning Research, 9:1269–1294, 2008.
[4] A. d’Aspremont, L. El Ghaoui, M.I. Jordan, and G.R.G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. Computer Science Division, University of California, 2004.
[5] B. Flury. A first course in multivariate statistics. Springer Verlag, 1997.
[6] F. Han and H. Liu. High dimensional semiparametric scale-invariant principal component analysis. Technical Report, 2012.
[7] F. Han and H. Liu. TCA: Transelliptical principal component analysis for high dimensional non-Gaussian data. Technical Report, 2012.
[8] T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, pages 502–516, 1989.
[9] I.M. Johnstone and A.Y. Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486):682–693, 2009.
[10] I.T. Jolliffe. Principal component analysis, volume 2. Wiley Online Library, 2002.
[11] I.T. Jolliffe, N.T. Trendafilov, and M. Uddin. A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12(3):531–547, 2003.
[12] M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre. Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research, 11:517–553, 2010.
[13] J.T. Leek and J.D. Storey. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3(9):e161, 2007.
[14] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. High dimensional semiparametric Gaussian copula graphical models. Annals of Statistics, 2012.
[15] H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10:2295–2328, 2009.
[16] Z. Ma. Sparse principal component analysis and iterative thresholding. arXiv preprint arXiv:1112.2432, 2011.
[17] Matthew McCall, Benjamin Bolstad, and Rafael Irizarry. Frozen robust multiarray analysis (frma). Biostatistics, 11:242–253, 2010.
[18] D. Paul and I.M. Johnstone. Augmented sparse principal component analysis for high dimensional data. arXiv preprint arXiv:1202.1242, 2012.
[19] H. Shen and J.Z. Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6):1015–1034, 2008.
[20] V.Q. Vu and J. Lei. Minimax rates of estimation for sparse PCA in high dimensions. arXiv preprint arXiv:1202.0786, 2012.
[21] D.M. Witten, R. Tibshirani, and T. Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515–534, 2009.
[22] L. Xue and H. Zou. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Annals of Statistics, 2012.
[23] X.T. Yuan and T. Zhang. Truncated power method for sparse eigenvalue problems. arXiv preprint arXiv:1112.2679, 2011.
[24] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.
[25] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265–286, 2006.