jmlr jmlr2006 jmlr2006-45 jmlr2006-45-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sayan Mukherjee, Ding-Xuan Zhou
Abstract: We introduce an algorithm that learns gradients from samples in the supervised learning framework. An error analysis is given for the convergence of the gradient estimated by the algorithm to the true gradient. The utility of the algorithm for variable selection and for determining variable covariance is illustrated on simulated data and on two gene expression data sets. For the square loss we provide a very efficient implementation with respect to both memory and time. Keywords: Tikhonov regularization, variable selection, reproducing kernel Hilbert space, generalization bounds
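The abstract describes estimating the gradient of the regression function by Tikhonov regularization in a reproducing kernel Hilbert space (RKHS), using a square loss. As a concrete illustration, below is a minimal NumPy sketch of one common formulation of this idea: a vector-valued estimate of the gradient is fit by minimizing a locally weighted first-order Taylor-difference square loss plus an RKHS penalty, and the resulting normal equations are solved directly. The specific objective, the Gaussian kernel and weights, and all names (learn_gradient, s2, lam, kernel_width) are illustrative assumptions, not taken from the paper.

import numpy as np

def learn_gradient(X, y, s2=1.0, lam=1e-3, kernel_width=1.0):
    """Sketch: estimate the gradient at the samples X (n x p), labels y (n,).

    Returns C (n x p): coefficients so the gradient estimate is
    fhat(x) = sum_l K(x, X[l]) * C[l]  (representer-style expansion).
    """
    n, p = X.shape
    # Pairwise squared distances; Gaussian kernel matrix and locality weights.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2.0 * kernel_width ** 2))
    W = np.exp(-sq / (2.0 * s2))  # down-weight distant pairs (i, j)

    # Stack coefficients as c in R^{n*p} and build the normal equations for
    # the quadratic objective (an assumed formulation):
    #   (1/n^2) sum_{ij} W_ij (y_i - y_j + fhat(x_i).(x_j - x_i))^2
    #     + lam * ||fhat||_K^2,
    # where ||fhat||_K^2 = c^T (K kron I_p) c.
    A = np.zeros((n * p, n * p))
    b = np.zeros(n * p)
    for i in range(n):
        for j in range(n):
            d = X[j] - X[i]                       # x_j - x_i, shape (p,)
            a = np.kron(K[i], d)                  # a.dot(c) = fhat(x_i).d
            A += W[i, j] * np.outer(a, a) / n**2
            b -= W[i, j] * (y[i] - y[j]) * a / n**2
    A += lam * np.kron(K, np.eye(p))              # RKHS penalty term
    c = np.linalg.solve(A, b)
    return c.reshape(n, p)

if __name__ == "__main__":
    # Toy check: for y = 3*x1 - 2*x2 the averaged gradient estimate
    # should land roughly near (3, -2, 0, 0).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))
    y = 3 * X[:, 0] - 2 * X[:, 1]
    C = learn_gradient(X, y)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / 2.0)
    print((K @ C).mean(axis=0))  # rows of K @ C are fhat(x_i)

This naive solver forms an (n*p) x (n*p) system, so it only scales to small problems; the paper's efficient square-loss implementation mentioned in the abstract would avoid that cost.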
References:
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.
M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56(1-3):209–239, 2004.
L. Carrel, A. Cottle, K. Goglin, and H. Willard. A first-generation X-inactivation profile of the human X chromosome. Proc. Natl. Acad. Sci. USA, 96:14440–14444, 1999.
O. Chapelle, V. N. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131–159, 2002.
S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1999.
C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
F. Cucker and S. Smale. On the mathematical foundations of learning. Bull. Amer. Math. Soc., 39:1–49, 2001.
C. Disteche, G. Filippova, and K. Tsuchiya. Escape from X inactivation. Cytogenet. Genome Res., 99:35–43, 2002.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13:1–50, 2000.
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.
Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99:67–81, 2004.
M. Liao. Bayesian estimation of gene expression index and Bayesian kernel models. PhD thesis, Duke University, Durham, NC, 2005.
M. Liao, F. Liang, S. Mukherjee, and M. West. Bayesian kernel regression and radial basis function models. Preprint, 2005.
C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17:177–204, 2005.
I. Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab., 22:1679–1706, 1994.
I. Pinelis. Correction: "Optimum bounds for the distributions of martingales in Banach spaces". Ann. Probab., 27:2119, 1999.
T. Poggio and F. Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978–982, 1990.
B. Schoelkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001.
D. K. Slonim, P. Tamayo, J. P. Mesirov, T. R. Golub, and E. S. Lander. Class prediction and discovery using gene expression data. In Proc. of the 4th Annual International Conference on Computational Molecular Biology (RECOMB), pages 263–272, 2000.
S. Smale and D. X. Zhou. Learning theory estimates via integral operators and their approximations. Constr. Approx., 24, 2006a.
S. Smale and D. X. Zhou. Shannon sampling II. Connections to learning theory. Appl. Comput. Harmonic Anal., 19:285–302, 2006b.
S. Smale and D. X. Zhou. Shannon sampling and function reconstruction from point values. Bull. Amer. Math. Soc., 41:279–305, 2004.
S. Smale and D. X. Zhou. Estimating the approximation error in learning theory. Anal. Appl., 1:17–41, 2003.
A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, and J. P. Mesirov. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA, 2005.
A. Sweet-Cordero, S. Mukherjee, A. Subramanian, H. You, J. J. Roix, C. Ladd-Acosta, J. P. Mesirov, T. R. Golub, and T. Jacks. An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nature Genetics, 37:48–55, 2005.
R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. B, 58(1):267–288, 1996.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
E. De Vito, A. Caponnetto, and L. Rosasco. Model selection for regularized least-squares algorithm in learning. Foundat. Comput. Math., 5:59–85, 2005.
G. Wahba and J. Wendelberger. Some new mathematical methods for variational objective analysis using splines and cross-validation. Monthly Weather Rev., 108:1122–1145, 1980.
M. West. Bayesian factor regression models in the "large p, small n" paradigm. In J. M. Bernardo et al., editors, Bayesian Statistics 7, pages 723–732. Oxford, 2003.
Q. Wu and D. X. Zhou. Support vector machine classifiers: linear programming versus quadratic programming. Neural Computation, 17:1160–1187, 2005.
T. Zhang. Leave-one-out bounds for kernel methods. Neural Computation, 15(6):1397–1437, 2003.
D. X. Zhou. Capacity of reproducing kernel spaces in learning theory. IEEE Trans. Inform. Theory, 49:1743–1752, 2003.