
264 nips-2012-Optimal kernel choice for large-scale two-sample tests


Source: pdf

Author: Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, Bharath K. Sriperumbudur

Abstract: Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
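To make the procedure concrete, the sketch below illustrates the linear-time MMD statistic and the power-maximizing kernel choice described in the abstract. It is a minimal NumPy illustration under stated assumptions, not the authors' released code: the kernel family is taken to be Gaussian kernels indexed by bandwidth, the candidate grid and the small regularizer eps are illustrative, and the bandwidth is selected on held-out data so that the test on fresh samples keeps its level.

import numpy as np
from scipy.stats import norm

def gaussian_kernel(x, y, sigma):
    # RBF kernel evaluated row-wise on paired samples x[i], y[i].
    return np.exp(-np.sum((x - y) ** 2, axis=1) / (2.0 * sigma ** 2))

def linear_mmd2(X, Y, sigma):
    # Linear-time unbiased MMD^2 estimate: average of
    # h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    # over disjoint pairs of points, so the cost is O(n) and the
    # samples never need to be held in memory all at once.
    n = (min(len(X), len(Y)) // 2) * 2
    x1, x2 = X[0:n:2], X[1:n:2]
    y1, y2 = Y[0:n:2], Y[1:n:2]
    h = (gaussian_kernel(x1, x2, sigma) + gaussian_kernel(y1, y2, sigma)
         - gaussian_kernel(x1, y2, sigma) - gaussian_kernel(x2, y1, sigma))
    return h.mean(), h.std(ddof=1) / np.sqrt(len(h))

def select_bandwidth(X, Y, sigmas, eps=1e-8):
    # Grid-search stand-in for the paper's criterion: maximize the
    # ratio of MMD^2 to its standard error, a proxy for test power.
    ratios = []
    for s in sigmas:
        mmd2, se = linear_mmd2(X, Y, s)
        ratios.append(mmd2 / (se + eps))
    return sigmas[int(np.argmax(ratios))]

def linear_two_sample_test(X, Y, sigma, alpha=0.05):
    # The linear-time statistic is asymptotically Gaussian under the
    # null p = q, so the level-alpha threshold is a normal quantile
    # times the estimated standard error.
    mmd2, se = linear_mmd2(X, Y, sigma)
    return mmd2 > norm.ppf(1.0 - alpha) * se

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(4000, 1))  # samples from p
    Y = rng.normal(0.3, 1.0, size=(4000, 1))  # samples from q
    # Choose the kernel on one half, test on the other, so that the
    # selection step does not invalidate the Type I error guarantee.
    sigma = select_bandwidth(X[:2000], Y[:2000], 2.0 ** np.arange(-3, 4))
    print(linear_two_sample_test(X[2000:], Y[2000:], sigma))

Splitting the data before selecting the bandwidth mirrors the paper's train/test protocol: because the kernel is fixed before the test statistic is computed on fresh samples, the Gaussian null threshold remains valid at the chosen level.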


reference text

[1] R. Adler and J. Taylor. Random Fields and Geometry. Springer, 2007.

[2] A. Argyriou, R. Hauser, C. A. Micchelli, and M. Pontil. A DC-programming algorithm for kernel selection. In ICML, pages 41–48, 2006.

[3] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

[4] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, 2004.

[5] The Magnetic Fields. 69 Love Songs. Merge Records, MRG169, 1999.

[6] P. Gehler and S. Nowozin. Infinite kernel learning. Technical Report TR-178, Max Planck Institute for Biological Cybernetics, 2008.

[7] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012.

[8] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola. A kernel method for the two-sample problem. In Advances in Neural Information Processing Systems 19, pages 513–520, Cambridge, MA, 2007. MIT Press.

[9] A. Gretton, K. Fukumizu, Z. Harchaoui, and B. Sriperumbudur. A fast, consistent kernel two-sample test. In Advances in Neural Information Processing Systems 22, Red Hook, NY, 2009. Curran Associates Inc.

[10] Z. Harchaoui, F. Bach, and E. Moulines. Testing for homogeneity with kernel Fisher discriminant analysis. In Advances in Neural Information Processing Systems 20, pages 609–616. MIT Press, Cambridge, MA, 2008.

[11] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1990.

[12] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188. Cambridge University Press, 1989.

[13] R. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, New York, 1980.

[14] A. J. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. In Proceedings of the International Conference on Algorithmic Learning Theory, volume 4754, pages 13–31. Springer, 2007.

[15] B. Sriperumbudur, K. Fukumizu, A. Gretton, G. Lanckriet, and B. Schölkopf. Kernel choice and classifiability for RKHS embeddings of probability distributions. In Advances in Neural Information Processing Systems 22, Red Hook, NY, 2009. Curran Associates Inc.

[16] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010.

[17] M. Sugiyama, T. Suzuki, Y. Itoh, T. Kanamori, and M. Kimura. Least-squares two-sample test. Neural Networks, 24(7):735–751, 2011.

[18] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.