jmlr jmlr2010 jmlr2010-6 jmlr2010-6-reference knowledge-graph by maker-knowledge-mining
Author: Patrick O. Perry, Art B. Owen
Abstract: In multivariate regression models we have the opportunity to look for hidden structure unrelated to the observed predictors. However, when one fits a model involving such latent variables it is important to be able to tell if the structure is real, or just an artifact of correlation in the regression errors. We develop a new statistical test based on random rotations for verifying the existence of latent variables. The rotations are carefully constructed to rotate orthogonally to the column space of the regression model. We find that only non-Gaussian latent variables are detectable, a finding that parallels a well known phenomenon in independent components analysis. We base our test on a measure of non-Gaussianity in the histogram of the principal eigenvector components instead of on the eigenvalue. The method finds and verifies some latent dichotomies in the microarray data from the AGEMAP consortium.
Keywords: independent components analysis, Kronecker covariance, latent variables, projection pursuit, transposable data
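The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of absolute excess kurtosis as the non-Gaussianity measure, and the Monte Carlo p-value are all assumptions made here for illustration. The sketch rotates the regression residuals within the orthogonal complement of the column space of the design matrix X, and compares the non-Gaussianity of the principal left singular vector before and after rotation.

```python
import numpy as np

def complement_basis(X):
    """Orthonormal basis, n x (n - k), for the orthogonal complement of col(X).
    Assumes X (n x k) has full column rank."""
    n, k = X.shape
    q_full, _ = np.linalg.qr(X, mode="complete")
    return q_full[:, k:]

def haar_orthogonal(m, rng):
    """Random m x m orthogonal matrix via QR of a Gaussian matrix.
    The sign correction makes the draw Haar-distributed."""
    g = rng.standard_normal((m, m))
    q, r = np.linalg.qr(g)
    return q * np.sign(np.diag(r))

def abs_excess_kurtosis(v):
    """Illustrative non-Gaussianity measure (an assumption, not the paper's)."""
    z = (v - v.mean()) / v.std()
    return abs(np.mean(z ** 4) - 3.0)

def principal_left_vector(M):
    """Leading left singular vector of M."""
    u, _, _ = np.linalg.svd(M, full_matrices=False)
    return u[:, 0]

def rotation_test(Y, X, n_rot=200, seed=0):
    """Monte Carlo p-value for non-Gaussian structure in the residuals of Y on X."""
    rng = np.random.default_rng(seed)
    Q = complement_basis(X)
    Z = Q.T @ Y  # residuals expressed in complement coordinates
    observed = abs_excess_kurtosis(Q @ principal_left_vector(Z))
    exceed = 0
    for _ in range(n_rot):
        O = haar_orthogonal(Z.shape[0], rng)
        # Rotating Z keeps the rotated residuals orthogonal to col(X).
        rotated = abs_excess_kurtosis(Q @ principal_left_vector(O @ Z))
        exceed += rotated >= observed
    return (1 + exceed) / (n_rot + 1)
```

With a response matrix `Y` (samples by variables) and a design matrix `X`, a call such as `rotation_test(Y, X, n_rot=200)` returns a p-value; a small value suggests the leading residual component is less Gaussian than random rotations would produce. Because rotating the residuals simply rotates their singular vectors, the inner SVD could be avoided; it is kept here so that each step stays explicit.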
References:
J. Baik and J. W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408, 2006.
J. Crossa and P. L. Cornelius. Linear-bilinear models for the analysis of genotype-environment interaction data. In M. S. Kang, editor, Quantitative Genetics, Genomics and Plant Breeding in the 21st Century, an International Symposium, pages 305–322, Wallingford UK, 2002. CAB International.
C. T. dos S Dias and W. J. Krzanowski. Model Selection and Cross Validation in Additive Main Effect and Multiplicative Interaction Models. Crop Science, 43(3):865–873, 2003.
B. Efron. Are a set of microarrays independent of each other? Annals of Applied Statistics, 3(3):922–942, 2009.
N. El Karoui. Tracy-Widom limit for the largest eigenvalue of a large class of complex Wishart matrices. Annals of Probability, 35(2):663–714, 2007.
J. H. Friedman. Exploratory projection pursuit. Journal of the American Statistical Association, 82:249–266, 1987.
K. R. Gabriel. Least squares approximation of matrices by additive and multiplicative models. Journal of the Royal Statistical Society, B, 40(2):186–196, 1978.
H. F. Gollob. A statistical model which combines features of factor analytic and analysis of variance techniques. Psychometrika, 33:73–115, 1968.
R. M. Heiberger. Algorithm AS 127: Generation of random orthogonal matrices. Applied Statistics, 27(2):199–206, 1978.
A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. J. Wiley, New York, 2001.
Ø. Langsrud. Rotation tests. Statistics and Computing, 15:53–60, 2005.
E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, 2005.
J. Mandel. A new analysis of variance model for non-additive data. Technometrics, 13:1–18, 1971.
B. Nadler. Personal communication. Discussions at AIM workshop on Random Matrices and Higher Dimensional Inference, April 2007.
A. Onatski. Asymptotics of the principal components estimator of large factor models with weak factors and i.i.d. Gaussian noise. Available at http://www.columbia.edu/~ao2027/inference45.pdf, August 2007.
A. B. Owen. Variance of the number of false discoveries. Journal of the Royal Statistical Society, Series B, 67:411–426, 2005.
A. B. Owen and P. O. Perry. Bi-cross-validation of the SVD and the non-negative matrix factorization. Annals of Applied Statistics, 3(2):564–594, 2009.
D. Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17(4):1617–1642, 2007a.
D. Paul. Spiked covariance: some extensions. Unpublished technical note, June 2007b.
N. R. Rao and A. Edelman. Sample eigenvalue based detection of high dimensional signals in white noise using relatively few samples. IEEE Transactions on Signal Processing, 56(7 Part 1):2625–2638, 2008.
R. D. Snee. Nonadditivity in a Two-Way Classification: Is It Interaction or Nonhomogeneous Variance? Journal of the American Statistical Association, 77(379):515–519, 1982.
L. K. Southworth, S. K. Kim, and A. B. Owen. Properties of balanced permutations. Journal of Computational Biology, 16(4):625–638, 2009.
R. W. M. Wedderburn. Random rotations and multivariate normal simulation. Technical report, Rothamsted Experimental Station, 1975.
J. M. Zahn, S. Poosala, A. B. Owen, D. K. Ingram, A. Lustig, A. Carter, A. T. Weeraratna, D. D. Taub, M. Gorospe, K. Mazan-Mamczarz, et al. AGEMAP: A Gene Expression Database for Aging in Mice. PLoS Genet, 3(11):e201, 2007.