Kernel Dimensionality Reduction for Supervised Learning (NIPS 2003, paper 98)

Source: pdf

Author: Kenji Fukumizu, Francis R. Bach, Michael I. Jordan

Abstract: We propose a novel method of dimensionality reduction for supervised learning. Given a regression or classification problem in which we wish to predict a variable Y from an explanatory vector X, we treat the problem of dimensionality reduction as that of finding a low-dimensional “effective subspace” of X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we characterize the notion of conditional independence using covariance operators on reproducing kernel Hilbert spaces; this allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y.
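
The abstract does not state the contrast function itself, so the following is only a rough sketch of how a kernel-based contrast for a candidate projection might be evaluated. It assumes a regularized trace form, Tr[G_Y (G_U + n·eps·I)^{-1}] with U = XB and centered Gaussian Gram matrices, which is an assumption made here for illustration and may differ from the objective in the paper. The function names (`gaussian_gram`, `centered`, `kdr_contrast`), the kernel width `sigma`, the regularizer `eps`, and the synthetic data are all hypothetical.

```python
import numpy as np


def gaussian_gram(Z, sigma=1.0):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def centered(K):
    """Center a Gram matrix in feature space: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kdr_contrast(B, X, Y, sigma=1.0, eps=1e-3):
    """Assumed contrast: Tr[G_Y (G_U + n*eps*I)^{-1}] for U = X @ B.

    Smaller values suggest that the projection X @ B retains more of
    the dependence between X and Y.
    """
    n = X.shape[0]
    G_u = centered(gaussian_gram(X @ B, sigma))
    G_y = centered(gaussian_gram(Y, sigma))
    # Tr[(G_u + n*eps*I)^{-1} G_y] equals the target trace by cyclicity.
    return np.trace(np.linalg.solve(G_u + n * eps * np.eye(n), G_y))


# Synthetic example: Y depends on X only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = (X[:, 0] + 0.1 * rng.standard_normal(200)).reshape(-1, 1)

B_true = np.array([[1.0], [0.0]])   # spans the informative direction
B_other = np.array([[0.0], [1.0]])  # spans a direction independent of Y

score_true = kdr_contrast(B_true, X, Y)
score_other = kdr_contrast(B_other, X, Y)
```

In this toy setting the informative direction should receive the lower score. A full estimator would additionally minimize the contrast over matrices B with orthonormal columns, which this sketch does not attempt.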

reference text

[1] Friedman, J.H. and Stuetzle, W. Projection pursuit regression. J. Amer. Stat. Assoc., 76:817–823, 1981.

[2] Breiman, L. and Friedman, J.H. Estimating optimal transformations for multiple regression and correlation. J. Amer. Stat. Assoc., 80:580–598, 1985.

[3] Wold, H. Partial least squares. In S. Kotz and N.L. Johnson (Eds.), Encyclopedia of Statistical Sciences, Vol. 6, pp. 581–591. Wiley, New York, 1985.

[4] Li, K.-C. Sliced inverse regression for dimension reduction (with discussion). J. Amer. Stat. Assoc., 86:316–342, 1991.

[5] Li, K.-C. On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Amer. Stat. Assoc., 87:1025–1039, 1992.

[6] Aronszajn, N. Theory of reproducing kernels. Trans. Amer. Math. Soc., 69(3):337–404, 1950.

[7] Schölkopf, B., Burges, C.J.C., and Smola, A. (eds.) Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.

[8] Schölkopf, B., Smola, A., and Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.

[9] Bach, F.R. and Jordan, M.I. Kernel independent component analysis. JMLR, 3:1–48, 2002.

[10] Baker, C.R. Joint measures and cross-covariance operators. Trans. Amer. Math. Soc., 186:273–289, 1973.

[11] Fukumizu, K., Bach, F.R. and Jordan, M.I. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. JMLR, 5:73–99, 2004.

[12] Golub, T.R. et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.