Author: Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, Bernhard Schölkopf
Abstract: We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
Keywords: independence, covariance operator, mutual information, kernel, Parzen window estimate, independent component analysis
© 2005 Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet and Bernhard Schölkopf.
GRETTON, HERBRICH, SMOLA, BOUSQUET AND SCHÖLKOPF
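The abstract describes the constrained covariance as a covariance between functions of the two random variables in RKHSs. The following is a minimal sketch of that idea, not the authors' implementation. It assumes Gaussian kernels on scalar samples with bandwidth `sigma = 1`, and it assumes the empirical statistic takes the form (1/n) sqrt(lambda_max(Kc Lc)), where Kc and Lc are the centered Gram matrices; this is the largest empirical covariance between unit-norm functions spanned by the centered kernel features. All function names are hypothetical, and power iteration is used only to keep the example free of third-party dependencies.

```python
import math
import random


def gaussian_gram(samples, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples (sigma is an assumed bandwidth)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in samples]
            for a in samples]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def constrained_covariance(xs, ys, sigma=1.0, iterations=100, seed=0):
    """Estimate (1/n) * sqrt(lambda_max(Kc Lc)) by power iteration (assumed form)."""
    n = len(xs)
    k_c = center(gaussian_gram(xs, sigma))
    l_c = center(gaussian_gram(ys, sigma))
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = matvec(l_c, v)                # w = Lc v
        denom = dot(v, w)                 # v^T Lc v, non-negative since Lc is PSD
        if denom <= 1e-12:
            break
        u = matvec(k_c, w)                # u = Kc Lc v
        eigenvalue = dot(w, u) / denom    # Rayleigh quotient in the Lc inner product
        norm = math.sqrt(dot(u, u))
        if norm == 0.0:
            break
        v = [a / norm for a in u]         # renormalise for the next step
    return math.sqrt(max(eigenvalue, 0.0)) / n
```

Under these assumptions, a sample with a purely nonlinear relationship (for example `ys = [x * x for x in xs]`, which has little linear correlation with `xs`) should give a noticeably larger value than two independently drawn samples of the same size.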
S. Achard, D.-T. Pham, and C. Jutten. Blind source separation in post-nonlinear mixtures. In 3rd International Conference on ICA and BSS, 2001.
S. Achard, D.-T. Pham, and C. Jutten. Quadratic dependence measure for nonlinear blind source separation. In 4th International Conference on ICA and BSS, 2003.
S. Akaho. A kernel method for canonical correlation analysis. In Proceedings of the International Meeting of the Psychometric Society (IMPS2001), 2001.
S.-I. Amari, A. Cichocki, and H. H. Yang. A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems, volume 8, pages 757–763. MIT Press, 1996.
F. Bach and M. Jordan. Kernel independent component analysis (MATLAB code, version 1.1). http://www.cs.berkeley.edu/~fbach/kernel-ica/index.htm
F. Bach and M. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002a.
F. Bach and M. Jordan. Tree-dependent component analysis. In Uncertainty in Artificial Intelligence, volume 18, 2002b.
C. R. Baker. Mutual information for Gaussian processes. SIAM Journal on Applied Mathematics, 19(2):451–458, 1970.
C. R. Baker. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186:273–289, 1973.
G. Bakır, A. Gretton, M. Franz, and B. Schölkopf. Multivariate regression with Stiefel constraints. Technical Report 101, Max Planck Institute for Biological Cybernetics, 2004.
A. Bell and T. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, 1995.
A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines. A blind source separation technique using second order statistics. IEEE Transactions on Signal Processing, 45(2):434–444, 1997.
A. Belouchrani and M. G. Amin. Blind source separation based on time-frequency signal representations. IEEE Transactions on Signal Processing, 46(11):2888–2897, 1998.
R. N. Bracewell. The Fourier Transform and its Applications. McGraw Hill, New York, 1986.
L. Breiman and J. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80:580–598, 1985.
J.-F. Cardoso. Blind signal separation: statistical principles. Proceedings of the IEEE, 90(8):2009–2026, 1998a.
J.-F. Cardoso. Multidimensional independent component analysis. In ICASSP, 1998b.
A. Chen and P. Bickel. Consistent independent component analysis and prewhitening. Technical report, Berkeley, 2004.
A. Cichocki and S.-I. Amari. Adaptive Blind Signal and Image Processing. John Wiley and Sons, New York, 2002.
P. Comon. Independent component analysis, a new concept? Signal Processing, 36:287–314, 1994.
T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, New York, 1991.
N. Cristianini, J. Shawe-Taylor, and J. Kandola. Spectral kernel methods for clustering. In NIPS, volume 14, Cambridge, MA, 2002. MIT Press.
J. Dauxois and G. M. Nkiet. Nonlinear canonical analysis and independence tests. Annals of Statistics, 26(4):1254–1278, 1998.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, New York, second edition, 2001.
A. Edelman, T. Arias, and S. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, 2(Dec):243–264, 2001.
K. Fukumizu, F. Bach, and A. Gretton. Consistency of kernel canonical correlation analysis. Technical Report 942, Institute of Statistical Mathematics, 2005.
K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73–99, 2004.
M. Greenacre. Theory and Applications of Correspondence Analysis. Academic Press, London, 1984.
A. Gretton. Kernel Methods for Classification and Signal Separation. PhD thesis, Cambridge University Engineering Department, 2003.
A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. Technical Report 140, MPI for Biological Cybernetics, 2005a.
A. Gretton, R. Herbrich, and A. Smola. The kernel mutual information. Technical report, Cambridge University Engineering Department and Max Planck Institute for Biological Cybernetics, 2003a.
A. Gretton, R. Herbrich, and A. Smola. The kernel mutual information. In ICASSP, volume 4, pages 880–883, 2003b.
A. Gretton, A. Smola, O. Bousquet, and R. Herbrich. Behaviour and convergence of the constrained covariance. Technical Report 130, MPI for Biological Cybernetics, 2004.
KERNEL METHODS FOR MEASURING INDEPENDENCE
A. Gretton, A. Smola, O. Bousquet, R. Herbrich, A. Belitski, M. Augath, Y. Murayama, J. Pauls, B. Schölkopf, and N. Logothetis. Kernel constrained covariance for dependence measurement. In AISTATS, volume 10, 2005b.
D. Hardoon, J. Shawe-Taylor, and O. Friman. KCCA for fMRI analysis. In Proceedings of Medical Image Understanding and Analysis, London, 2004.
S. Harmeling, A. Ziehe, M. Kawanabe, and K.-R. Müller. Kernel-based nonlinear blind source separation. Neural Computation, 15(5):1089–1124, 2003.
S. Haykin. Neural Networks: A Comprehensive Foundation. Macmillan, New York, 2nd edition, 1998.
M. Hein and O. Bousquet. Kernels, associated structures, and generalizations. Technical Report 127, Max Planck Institute for Biological Cybernetics, 2004.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
S. Hosseini and C. Jutten. On the separability of nonlinear mixtures of temporally correlated sources. IEEE Signal Processing Letters, 10(2):43–46, 2003.
A. Hyvärinen. One-unit contrast functions for independent component analysis: A statistical analysis. In Proc. IEEE Neural Networks for Signal Processing Workshop, pages 388–397, 1997.
A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley and Sons, New York, 2001.
A. Hyvärinen and P. Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3):429–439, 1999.
A. Hyvärinen and M. Plumbley. Optimization with orthogonality constraints: a modified gradient method. Unpublished note, 2002.
Y. I. Ingster. Asymptotically minimax testing of the hypothesis of independence. Zap. Nauchn. Seminar. LOMI, 153 (1986) pp. 60–72. Translation in J. Soviet. Math., 44:466–476, 1989.
J. Jacod and P. Protter. Probability Essentials. Springer, New York, 2000.
M. Kuss. Kernel multivariate analysis. Master's thesis, Technical University of Berlin, 2001.
P. Lai and C. Fyfe. Kernel and nonlinear canonical correlation analysis. International Journal of Neural Systems, 10(5):365–377, 2000.
E. Learned-Miller and J. Fisher III. ICA using spacings estimates of entropy. JMLR, 4:1271–1295, 2003.
T.-W. Lee, M. Girolami, A. Bell, and T. Sejnowski. A unifying framework for independent component analysis. Computers and Mathematics with Applications, 39:1–21, 2000.
S. E. Leurgans, R. A. Moyeed, and B. W. Silverman. Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society, Series B (Methodological), 55(3):725–740, 1993.
T. Melzer, M. Reiter, and H. Bischof. Kernel canonical correlation analysis. Technical Report PRIP-TR-65, Pattern Recognition and Image Processing Group, TU Wien, 2001.
E. Mourier. Éléments aléatoires dans un espace de Banach. Ann. Inst. H. Poincaré Sect B., 161:161–244, 1953.
L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15:1191–1253, 2003.
A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw Hill, New York, 1991.
B. Pearlmutter. Music samples to illustrate the context-sensitive generalisation of ICA. http://www.cs.unm.edu/~bap/demos.html
D.-T. Pham. Fast algorithms for mutual information based independent component analysis. IEEE Transactions on Signal Processing, 2002. Submitted.
D.-T. Pham and P. Garat. Blind separation of mixture of independent sources through a quasi-maximum likelihood approach. IEEE Transactions on Signal Processing, 45(7):1712–1725, 1997.
A. Rényi. On measures of dependence. Acta Math. Acad. Sci. Hungar., 10:441–451, 1959.
R. Rosipal and L. Trejo. Kernel partial least squares regression in reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 1(2):97–123, 2001.
A. Samarov and A. Tsybakov. Nonparametric independent component analysis. Bernoulli, 10:565–582, 2004.
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK, 2004.
B. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York, 1986.
I. Steinwart. On the influence of the kernel on the consistency of support vector machines. JMLR, 2, 2001.
H. Stögbauer, A. Kraskov, S. A. Astakhov, and P. Grassberger. Least dependent component analysis based on mutual information. Phys. Rev. E, 70(6):066123, 2004.
A. Taleb and C. Jutten. Source separation in post-nonlinear mixtures. IEEE Transactions on Signal Processing, 47(10):2807–2820, 1999.
F. Theis. Blind signal separation into groups of dependent signals using joint block diagonalisation. In ISCAS, pages 5878–5881, 2005.
T. van Gestel, J. Suykens, J. de Brabanter, B. de Moor, and J. Vandewalle. Kernel canonical correlation analysis and least squares support vector machines. In Proceedings of the International Conference on Artificial Neural Networks (ICANN). Springer Verlag, 2001.
X.-L. Zhu and X.-D. Zhang. Adaptive RLS algorithm for blind source separation using a natural gradient. IEEE Signal Processing Letters, 9(12):432–435, 2002.
L. Zwald, O. Bousquet, and G. Blanchard. Statistical properties of kernel principal component analysis. In Proceedings of the 17th Conference on Computational Learning Theory (COLT), 2004.