Author: Kenji Fukumizu, Arthur Gretton, Francis R. Bach
Abstract: While kernel canonical correlation analysis (kernel CCA) has been applied in many problems, the asymptotic convergence of the functions estimated from a finite sample to the true functions has not yet been established. This paper gives a rigorous proof of the statistical convergence of kernel CCA and a related method (NOCCO), which provides a theoretical justification for these methods. The result also gives a sufficient condition on the decay of the regularization coefficient in the methods to ensure convergence.
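For context, the quantity whose convergence is studied can be computed in practice from Gram matrices. Below is a minimal NumPy sketch of the regularized (NOCCO-style) empirical estimate of the first canonical correlation; the Gaussian kernel, the bandwidth values, and the n^(-1/3)-style regularization decay are illustrative assumptions, not prescriptions taken from the paper.

import numpy as np

def centered_gram(X, sigma):
    # Gaussian RBF Gram matrix, centered as G = H K H with H = I - (1/n) 1 1^T.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * sigma ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def first_kernel_canonical_correlation(X, Y, eps_n, sigma_x=1.0, sigma_y=1.0):
    # Largest singular value of the regularized normalized cross-covariance
    # operator, computed via the n x n matrices
    #   R_X = G_X (G_X + n eps_n I)^{-1},  R_Y = G_Y (G_Y + n eps_n I)^{-1};
    # its square equals the top eigenvalue of R_X R_Y.
    n = X.shape[0]
    Gx = centered_gram(X, sigma_x)
    Gy = centered_gram(Y, sigma_y)
    Rx = Gx @ np.linalg.inv(Gx + n * eps_n * np.eye(n))
    Ry = Gy @ np.linalg.inv(Gy + n * eps_n * np.eye(n))
    lam_max = np.max(np.real(np.linalg.eigvals(Rx @ Ry)))
    return np.sqrt(max(lam_max, 0.0))

rng = np.random.default_rng(0)
n = 500
t = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))
X = np.cos(t) + 0.1 * rng.standard_normal((n, 1))
Y = np.sin(t) + 0.1 * rng.standard_normal((n, 1))
# The regularization coefficient eps_n must shrink slowly with n for the
# estimate to converge; the n^(-1/3) rate here is only an illustration.
print(first_kernel_canonical_correlation(X, Y, eps_n=n ** (-1.0 / 3.0)))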
[1] S. Akaho. A kernel method for canonical correlation analysis. Proc. Intern. Meeting on Psychometric Society (IMPS2001), 2001.
[2] N. Aronszajn. Theory of reproducing kernels. Trans. American Mathematical Society, 69(3):337–404, 1950.
[3] F. R. Bach and M. I. Jordan. Kernel independent component analysis. J. Machine Learning Research, 3:1–48, 2002.
[4] C. R. Baker. Joint measures and cross-covariance operators. Trans. American Mathematical Society, 186:273–289, 1973.
[5] N. Dunford and J. T. Schwartz. Linear Operators, Part II. Interscience, 1963.
[6] K. Fukumizu, F. R. Bach, and A. Gretton. Consistency of kernel canonical correlation. Research Memorandum 942, Institute of Statistical Mathematics, 2005.
[7] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Machine Learning Research, 5:73–99, 2004.
[8] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. Tech Report 140, Max-Planck-Institut für biologische Kybernetik, 2005.
[9] A. Gretton, A. Smola, O. Bousquet, R. Herbrich, B. Schölkopf, and N. Logothetis. Behaviour and convergence of the constrained covariance. Tech Report 128, Max-Planck-Institut für biologische Kybernetik, 2004.
[10] S. Leurgans, R. Moyeed, and B. Silverman. Canonical correlation analysis when the data are curves. J. Royal Statistical Society, Series B, 55(3):725–740, 1993.
[11] T. Melzer, M. Reiter, and H. Bischof. Nonlinear feature extraction using generalized canonical correlation analysis. Proc. Intern. Conf. Artificial Neural Networks (ICANN2001), pages 353–360, 2001.

A  Lemmas used in the proofs

We list the lemmas used in Section 4. See [6] for the proofs.

Lemma 7. Suppose A and B are positive self-adjoint operators on a Hilbert space such that 0 ≤ A ≤ λI and 0 ≤ B ≤ λI hold for a positive constant λ. Then ‖A^{3/2} − B^{3/2}‖ ≤ 3λ^{1/2} ‖A − B‖.

Lemma 8. Let H_1 and H_2 be Hilbert spaces, and let H_0 be a dense linear subspace of H_2. Suppose A_n and A are bounded operators on H_2, and B is a compact operator from H_1 to H_2, such that A_n u → A u for all u ∈ H_0 and sup_n ‖A_n‖ ≤ M for some M > 0. Then A_n B converges to A B in norm.

Lemma 9. Let A be a compact positive operator on a Hilbert space H, and let A_n (n ∈ N) be bounded positive operators on H such that A_n converges to A in norm. Assume that the eigenspace of A corresponding to the largest eigenvalue is one-dimensional and spanned by a unit eigenvector φ, and that the maximum of the spectrum of A_n is attained by a unit eigenvector φ_n. Then |⟨φ_n, φ⟩_H| → 1 as n → ∞.
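As a quick plausibility check (not part of the paper; finite-dimensional symmetric matrices stand in for the Hilbert-space operators, and the dimensions, spectra, and perturbations below are arbitrary illustrative choices), Lemmas 7 and 9 can be exercised numerically:

import numpy as np

rng = np.random.default_rng(1)

def random_psd(d, lam):
    # Random symmetric matrix with spectrum contained in [0, lam].
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return (Q * rng.uniform(0.0, lam, d)) @ Q.T

def power_three_halves(A):
    # A^{3/2} for a symmetric positive semidefinite matrix.
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None) ** 1.5) @ V.T

# Lemma 7: ||A^{3/2} - B^{3/2}|| <= 3 lambda^{1/2} ||A - B|| in operator norm.
d, lam = 20, 2.0
A = random_psd(d, lam)
B = random_psd(d, lam)
lhs = np.linalg.norm(power_three_halves(A) - power_three_halves(B), 2)
rhs = 3.0 * np.sqrt(lam) * np.linalg.norm(A - B, 2)
print(f"Lemma 7: {lhs:.4f} <= {rhs:.4f}")

# Lemma 9: if A_n -> A in norm and the top eigenvalue of A is simple, the top
# eigenvectors of A_n align with that of A up to sign: |<phi_n, phi>| -> 1.
w, V = np.linalg.eigh(A)
phi = V[:, -1]                         # unit eigenvector of the largest eigenvalue
for n in (10, 100, 1000, 10000):
    A_n = A + random_psd(d, 1.0) / n   # positive perturbation of size O(1/n)
    w_n, V_n = np.linalg.eigh(A_n)
    phi_n = V_n[:, -1]
    print(n, abs(phi @ phi_n))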