Authors: Fang Han, Han Liu
Abstract: We propose a high dimensional semiparametric scale-invariant principal component analysis, named TCA, by utilizing the natural connection between the elliptical distribution family and principal component analysis. The elliptical distribution family includes many well-known multivariate distributions, such as the multivariate Gaussian, t, and logistic distributions, and it was extended to the meta-elliptical family by Fang et al. (2002) using copula techniques. In this paper we extend the meta-elliptical distribution family to an even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s log d/n estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even when the distributions are very heavy-tailed, have infinite second moments, do not have densities, and possess arbitrary continuous marginal distributions. A feature selection result with an explicit rate is also provided. TCA is further evaluated on both numerical simulations and large-scale stock data to illustrate its empirical usefulness. Both theory and experiments confirm that TCA can achieve model flexibility, estimation accuracy, and robustness at almost no cost.
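The abstract describes TCA only at a high level. As a rough illustration of the kind of scale-invariant estimator it summarizes, the minimal Python sketch below builds a rank-based estimate of a latent correlation matrix via the Kendall's tau transform sin(pi/2 * tau_hat), which depends only on ranks and so tolerates heavy tails and arbitrary monotone marginal transformations, and then recovers a sparse leading eigenvector with a simple truncated power iteration in the spirit of [28]. The transform, the sparsity level s, and the helper names latent_correlation and truncated_power are illustrative assumptions for exposition, not details confirmed by this abstract.

```python
# Illustrative sketch only: the Kendall's tau transform and the truncated power
# step are assumptions for exposition, not details taken from the paper.
import numpy as np
from scipy.stats import kendalltau


def latent_correlation(X):
    """Rank-based estimate of a latent correlation matrix: sin(pi/2 * tau_hat)."""
    n, d = X.shape
    R = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(np.pi / 2.0 * tau)
    return R


def truncated_power(R, s, n_iter=200):
    """Leading s-sparse eigenvector of R via truncated power iteration."""
    d = R.shape[0]
    v = np.ones(d) / np.sqrt(d)
    for _ in range(n_iter):
        v = R @ v
        keep = np.argsort(np.abs(v))[-s:]      # indices of the s largest entries
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        v = np.where(mask, v, 0.0)             # zero out the remaining entries
        v /= np.linalg.norm(v)
    return v


# Usage: heavy-tailed data (t with 3 degrees of freedom) whose first 5
# coordinates share a common factor, so the leading direction is 5-sparse.
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(200, 50))
X[:, :5] += 2.0 * rng.standard_t(df=3, size=(200, 1))
v_hat = truncated_power(latent_correlation(X), s=5)
print(np.flatnonzero(v_hat))  # ideally the support {0, 1, 2, 3, 4}
```

Because only ranks enter the correlation estimate, this sketch is scale-invariant by construction, which matches the abstract's emphasis on heavy-tailed data with arbitrary continuous marginals.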
[1] T.W. Anderson. Statistical inference in elliptically contoured and related distributions. Recherche, 67:02, 1990.
[2] M.G. Borgognone, J. Bussi, and G. Hough. Principal component analysis in sensory analysis: covariance or correlation matrix? Food Quality and Preference, 12(5-7):323–326, 2001.
[3] S. Cambanis, S. Huang, and G. Simons. On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11(3):368–385, 1981.
[4] H.B. Fang, K.T. Fang, and S. Kotz. The meta-elliptical distributions with given marginals. Journal of Multivariate Analysis, 82(1):1–16, 2002.
[5] K.T. Fang, S. Kotz, and K.W. Ng. Symmetric multivariate and related distributions. Chapman & Hall, London, 1990.
[6] C. Genest, A.C. Favre, J. Béliveau, and C. Jacques. Metaelliptical copulas and their use in frequency analysis of multivariate hydrological data. Water Resour. Res., 43(9):W09401, 2007.
[7] P.R. Halmos. Measure theory, volume 18. Springer, 1974.
[8] F. Han and H. Liu. TCA: Transelliptical principal component analysis for high dimensional non-Gaussian data. Technical report, 2012.
[9] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, pages 13–30, 1963.
[10] H. Hult and F. Lindskog. Multivariate extremes, aggregation and dependence in elliptical distributions. Advances in Applied Probability, 34(3):587–608, 2002.
[11] D.R. Jensen. The structure of ellipsoidal distributions, II: Principal components. Biometrical Journal, 28(3):363–369, 1986.
[12] D.R. Jensen. Conditioning and concentration of principal components. Australian Journal of Statistics, 39(1):93–104, 1997.
[13] H. Joe. Multivariate models and dependence concepts, volume 73. Chapman & Hall/CRC, 1997.
[14] I.M. Johnstone and A.Y. Lu. Sparse principal components analysis. arXiv preprint arXiv:0901.4392, 2009.
[15] M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre. Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research, 11:517–553, 2010.
[16] K.S. Kelly and R. Krzysztofowicz. A bivariate meta-Gaussian density for use in hydrology. Stochastic Hydrology and Hydraulics, 11(1):17–31, 1997.
[17] W.H. Kruskal. Ordinal measures of association. Journal of the American Statistical Association, pages 814–861, 1958.
[18] D. Kurowicka, J. Misiewicz, and R.M. Cooke. Elliptical copulae. In Proc. of the International Conference on Monte Carlo Simulation - Monte Carlo, pages 209–214, 2000.
[19] H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10:2295–2328, 2009.
[20] Z. Ma. Sparse principal component analysis and iterative thresholding. arXiv preprint arXiv:1112.2432, 2011.
[21] L. Mackey. Deflation methods for sparse PCA. Advances in Neural Information Processing Systems, 21:1017–1024, 2009.
[22] G.P. McCabe. Principal variables. Technometrics, pages 137–144, 1984.
[23] D. Paul and I.M. Johnstone. Augmented sparse principal component analysis for high dimensional data. arXiv preprint arXiv:1202.1242, 2012.
[24] G.Q. Qian, G. Gabor, and R.P. Gupta. Principal components selection by the criterion of the minimum mean difference of complexity. Journal of Multivariate Analysis, 49(1):55–75, 1994.
[25] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8(1):11, 1959.
[26] V.Q. Vu and J. Lei. Minimax rates of estimation for sparse PCA in high dimensions. arXiv preprint arXiv:1202.0786, 2012.
[27] C.M. Waternaux. Principal components in the nonnormal case: The test of equality of q roots. Journal of Multivariate Analysis, 14(3):323–335, 1984.
[28] X.T. Yuan and T. Zhang. Truncated power method for sparse eigenvalue problems. arXiv preprint arXiv:1112.2679, 2011.
[29] Y. Zhang, A. d'Aspremont, and L. El Ghaoui. Sparse PCA: Convex relaxations, algorithms and applications. Handbook on Semidefinite, Conic and Polynomial Optimization, pages 915–940, 2012.