nips nips2009 nips2009-8 nips2009-8-reference knowledge-graph by maker-knowledge-mining

8 nips-2009-A Fast, Consistent Kernel Two-Sample Test


Source: pdf

Author: Arthur Gretton, Kenji Fukumizu, Zaïd Harchaoui, Bharath K. Sriperumbudur

Abstract: A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of χ2 random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text.


reference text

[1] N. Anderson, P. Hall, and D. Titterington. Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. Journal of 8 Multivariate Analysis, 50:41–54, 1994.

[2] F. R. Bach and M. I. Jordan. Kernel independent component analysis. J. Mach. Learn. Res., 3:1–48, 2002.

[3] C. Baker. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186:273–289, 1973.

[4] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer-Verlag, Berlin, 2003.

[5] Rajendra Bhatia and Ludwig Elsner. The Hoffman-Wielandt inequality in infinite dimensions. Proceedings of Indian Academy of Science (Mathematical Sciences), 104(3):483–494, 1994.

[6] G. Biau and L. Gyorfi. On the asymptotic properties of a nonparametric l1 -test statistic of homogeneity. IEEE Transactions on Information Theory, 51(11):3965–3973, 2005.

[7] G. Blanchard, O. Bousquet, and L. Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66:259–294, 2007.

[8] K. M. Borgwardt, A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Sch¨ lkopf, and A. J. Smola. Integrating o structured biological data by kernel maximum mean discrepancy. Bioinformatics (ISMB), 22(14):e49– e57, 2006.

[9] J. Friedman and L. Rafsky. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 7(4):697–717, 1979.

[10] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res., 5:73–99, 2004.

[11] K. Fukumizu, A. Gretton, X. Sun, and B. Sch¨ lkopf. Kernel measures of conditional dependence. In o NIPS 20, pages 489–496, 2008.

[12] K. Fukumizu, B. Sriperumbudur, A. Gretton, and B. Sch¨ lkopf. Characteristic kernels on groups and o semigroups. In NIPS 21, pages 473–480, 2009.

[13] G. Golub and Q. Ye. An inverse free preconditioned krylov subspace method for symmetric generalized eigenvalue problems. SIAM Journal on Scientific Computing, 24:312–334, 2002.

[14] A. Gretton, K. Borgwardt, M. Rasch, B. Sch¨ lkopf, and A. Smola. A kernel method for the two-sampleo problem. In NIPS 19, pages 513–520, 2007.

[15] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Sch¨ lkopf, and A. Smola. A kernel statistical test of o independence. In NIPS 20, pages 585–592, 2008.

[16] G. R. Grimmet and D. R. Stirzaker. Probability and Random Processes. Oxford University Press, Oxford, third edition, 2001.

[17] P. Hall and N. Tajvidi. Permutation tests for equality of distributions in high-dimensional settings. Biometrika, 89(2):359–374, 2002.

[18] Z. Harchaoui, F. Bach, and E. Moulines. Testing for homogeneity with kernel fisher discriminant analysis. In NIPS 20, pages 609–616. 2008. (long version: arXiv:0804.1026v1).

[19] M. Hein and O. Bousquet. Kernels, associated structures, and generalizations. Technical Report 127, Max Planck Institute for Biological Cybernetics, 2004.

[20] N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Volume 1 (Second Edition). John Wiley and Sons, 1994.

[21] E. Lehmann and J. Romano. Testing Statistical Hypothesis (3rd ed.). Wiley, New York, 2005.

[22] C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing, pages 564–575, 2002.

[23] A. S. Markus. The eigen- and singular values of the sum and product of linear operators. Russian Mathematical Surveys, 19(4):93–123, 1964.

[24] M. Rasch, A. Gretton, Y. Murayama, W. Maass, and N. K. Logothetis. Predicting spiking activity from local field potentials. Journal of Neurophysiology, 99:1461–1476, 2008.

[25] B. Sch¨ lkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002. o

[26] B. Sch¨ lkopf, A. J. Smola, and K.-R. M¨ ller. Nonlinear component analysis as a kernel eigenvalue o u problem. Neural Computation, 10:1299–1319, 1998.

[27] J. Shawe-Taylor, C. Williams, N. Cristianini, and J. Kandola. On the eigenspectrum of the Gram matrix and the generalisation error of kernel PCA. IEEE Trans. Inf. Theory, 51(7):2510–2522, 2005.

[28] A. J. Smola, A. Gretton, L. Song, and B. Sch¨ lkopf. A Hilbert space embedding for distributions. In ALT o 18, pages 13–31, 2007.

[29] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Sch¨ lkopf. Injective hilbert space o embeddings of probability measures. In COLT 21, pages 111–122, 2008.

[30] C. H. Teo and S. V. N. Vishwanathan. Fast and space efficient string kernels using suffix arrays. In ICML, pages 929–936, 2006.

[31] J. E. Wilkins. A note on skewness and kurtosis. Ann. Math. Stat., 15(3):333–335, 1944.

[32] G. Zech and B. Aslan. A multivariate two-sample test based on the concept of minimum energy. In PHYSTAT, pages 97–100, 2003. 9