
27 jmlr-2010-Consistent Nonparametric Tests of Independence


Source: pdf

Authors: Arthur Gretton, László Györfi

Abstract: Three simple and explicit procedures for testing the independence of two multi-dimensional random variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. Two kinds of tests are provided. Distribution-free strong consistent tests are derived on the basis of large deviation bounds on the test statistics: these tests make almost surely no Type I or Type II error after a random sample size. Asymptotically α-level tests are obtained from the limiting distribution of the test statistics. For the latter tests, the Type I error converges to a fixed non-zero value α, and the Type II error drops to zero, for increasing sample size. All tests reject the null hypothesis of independence if the test statistics become large. The performance of the tests is evaluated experimentally on benchmark data.

Keywords: hypothesis test, independence, L1, log-likelihood, kernel methods, distribution-free consistent test
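As a rough illustration of the statistics described in the abstract (a minimal sketch, not the authors' implementation), the snippet below computes two of them for scalar samples: the L1 distance between the joint empirical distribution and the product of its empirical marginals on a finite partition, and a biased estimate of the kernel-based measure (the Hilbert-Schmidt Independence Criterion, HSIC) with Gaussian kernels. The function names, the equal-width grid partition, and the kernel bandwidth are illustrative assumptions rather than choices taken from the paper.

```python
import numpy as np

def l1_statistic(x, y, m_x=4, m_y=4):
    """L1 distance between the joint empirical distribution and the product
    of the empirical marginals, computed on an m_x-by-m_y grid of cells."""
    # Joint cell counts of the scalar samples on an equal-width grid.
    joint, _, _ = np.histogram2d(x, y, bins=(m_x, m_y))
    joint /= len(x)                        # empirical joint probabilities
    px = joint.sum(axis=1, keepdims=True)  # empirical marginal of X
    py = joint.sum(axis=0, keepdims=True)  # empirical marginal of Y
    return np.abs(joint - px * py).sum()

def hsic_biased(x, y, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels: trace(K H L H) / n**2,
    where H is the centring matrix."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    def gram(z):
        # Pairwise squared distances, then the Gaussian kernel matrix.
        sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2.0 * sigma ** 2))
    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n    # centring matrix
    return np.trace(K @ H @ L @ H) / n ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y_indep = rng.normal(size=500)           # independent of x
    y_dep = x + 0.3 * rng.normal(size=500)   # strongly dependent on x
    print(l1_statistic(x, y_indep), l1_statistic(x, y_dep))
    print(hsic_biased(x, y_indep), hsic_biased(x, y_dep))
```

Both statistics should come out close to zero for the independent pair and noticeably larger for the dependent one; the tests described in the paper reject independence when such statistics exceed a threshold obtained either from large deviation bounds or from the limiting distribution.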


reference text

F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.

C. Baker. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186:273–289, 1973.

A. R. Barron. Uniformly powerful goodness of fit tests. The Annals of Statistics, 17:107–124, 1989.

A. R. Barron, L. Györfi, and E. C. van der Meulen. Distribution estimation consistent in total variation and in two types of information divergence. IEEE Transactions on Information Theory, 38:1437–1454, 1992.

M. S. Bartlett. The characteristic function of a conditional statistic. Journal of the London Mathematical Society, 13:62–67, 1938.

J. Beirlant and D. M. Mason. On the asymptotic normality of lp-norms of empirical functionals. Mathematical Methods of Statistics, 4:1–19, 1995.

J. Beirlant, L. Györfi, and G. Lugosi. On the asymptotic normality of the l1- and l2-errors in histogram density estimation. Canadian Journal of Statistics, 22:309–318, 1994.

J. Beirlant, L. Devroye, L. Györfi, and I. Vajda. Large deviations of divergence measures on partitions. Journal of Statistical Planning and Inference, 93:1–16, 2001.

G. Biau and L. Györfi. On the asymptotic properties of a nonparametric l1-test statistic of homogeneity. IEEE Transactions on Information Theory, 51:3965–3973, 2005.

J. R. Blum, J. Kiefer, and M. Rosenblatt. Distribution free tests of independence based on the sample distribution function. The Annals of Mathematical Statistics, 32:485–498, 1961.

D. S. Cotterill and M. Csörgő. On the limiting distribution of and critical values for the Hoeffding, Blum, Kiefer, Rosenblatt independence criterion. Statistics and Decisions, 3:1–48, 1985.

I. Csiszár. Information-type measures of divergence of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2:299–318, 1967.

J. Dauxois and G. M. Nkiet. Nonlinear canonical analysis and independence tests. The Annals of Statistics, 26(4):1254–1278, 1998.

A. Dembo and Y. Peres. A topological criterion for hypothesis testing. The Annals of Statistics, 22:106–117, 1994.

A. Feuerverger. A consistent test for bivariate dependence. International Statistical Review, 61(3):419–433, 1993.

K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73–99, 2004.

K. Fukumizu, F. Bach, and A. Gretton. Statistical consistency of kernel canonical correlation analysis. Journal of Machine Learning Research, 8:361–383, 2007.

K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems 20, pages 489–496, Cambridge, MA, 2008. MIT Press.

A. Gretton and L. Györfi. Nonparametric independence tests: Space partitioning and kernel approaches. In Algorithmic Learning Theory: 19th International Conference, pages 183–198, Berlin, 2008. Springer.

A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In Algorithmic Learning Theory: 16th International Conference, pages 63–78, Berlin, 2005a. Springer.

A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005b.

A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Schölkopf, and A. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585–592, Cambridge, MA, 2008. MIT Press.

L. Györfi and I. Vajda. Asymptotic distributions for goodness of fit statistics in a sequence of multinomial models. Statistics and Probability Letters, 56:57–67, 2002.

L. Györfi and E. C. van der Meulen. A consistent goodness of fit test based on the total variation distance. In G. Roussas, editor, Nonparametric Functional Estimation and Related Topics, pages 631–645. Kluwer, Dordrecht, 1990.

L. Györfi, F. Liese, I. Vajda, and E. C. van der Meulen. Distribution estimates consistent in χ2-divergence. Statistics, 32:31–57, 1998.

P. Hall. Central limit theorem for integrated square error of multivariate nonparametric density estimators. Journal of Multivariate Analysis, 14:1–16, 1984.

W. Hoeffding. A nonparametric test for independence. The Annals of Mathematical Statistics, 19(4):546–557, 1948.

W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.

T. Inglot, T. Jurlewitz, and T. Ledwina. Asymptotics for multinomial goodness of fit tests for a simple hypothesis. Theory of Probability and Its Applications, 35:797–803, 1990.

J. Jacod and P. Protter. Probability Essentials. Springer, New York, 2000.

W. C. M. Kallenberg. On moderate and large deviations in multinomial distributions. The Annals of Statistics, 13:1554–1580, 1985.

A. Kankainen. Consistent Testing of Total Independence Based on the Empirical Characteristic Function. PhD thesis, University of Jyväskylä, 1995.

J. H. B. Kemperman. An optimum rate of transmitting information. The Annals of Mathematical Statistics, 40:2156–2177, 1969.

S. Kullback. A lower bound for discrimination in terms of variation. IEEE Transactions on Information Theory, 13:126–127, 1967.

S. E. Leurgans, R. A. Moyeed, and B. W. Silverman. Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society, Series B (Methodological), 55(3):725–740, 1993.

C. McDiarmid. On the method of bounded differences. In Survey in Combinatorics, pages 148–188. Cambridge University Press, 1989.

C. Morris. Central limit theorems for multinomial sums. The Annals of Statistics, 3:165–188, 1975.

M. P. Quine and J. Robinson. Efficiencies of chi-square and likelihood ratio goodness-of-fit tests. The Annals of Statistics, 13:727–742, 1985.

C. R. Rao. Statistical Inference and its Applications. Wiley, New York, second edition, 1973.

T. Read and N. Cressie. Goodness-of-Fit Statistics for Discrete Multivariate Analysis. Springer-Verlag, New York, 1988.

A. Rényi. On measures of dependence. Acta Mathematica Academiae Scientiarum Hungaricae, 10:441–451, 1959.

M. Rosenblatt. A quadratic measure of deviation of two-dimensional density estimates and a test of independence. The Annals of Statistics, 3(1):1–14, 1975.

R. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, New York, 1980.

B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. Injective Hilbert space embeddings of probability measures. In Proceedings of the 21st Annual Conference on Learning Theory, pages 111–122, 2008.

I. Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2:67–93, 2001.

G. Tusnády. On asymptotically optimal tests. The Annals of Statistics, 5:385–393, 1977.

N. Ushakov. Selected Topics in Characteristic Functions. Modern Probability and Statistics. Walter de Gruyter, Berlin, 1999.