jmlr jmlr2006 jmlr2006-45 jmlr2006-45-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sayan Mukherjee, Ding-Xuan Zhou
Abstract: We introduce an algorithm that learns gradients from samples in the supervised learning framework. An error analysis is given for the convergence of the gradient estimated by the algorithm to the true gradient. The utility of the algorithm for variable selection and for determining variable covariance is illustrated on simulated data and on two gene expression data sets. For the square loss we provide a very efficient implementation with respect to both memory and time. Keywords: Tikhonov regularization, variable selection, reproducing kernel Hilbert space, generalization bounds
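The abstract describes estimating the gradient of the regression function by Tikhonov regularization in a reproducing kernel Hilbert space (RKHS), using a square loss. As a concrete illustration, below is a minimal NumPy sketch of one common formulation of this idea: a vector-valued estimate of the gradient is fit by minimizing a locally weighted first-order Taylor-difference square loss plus an RKHS penalty, and the resulting normal equations are solved directly. The specific objective, the Gaussian kernel and weights, and all names (learn_gradient, s2, lam, kernel_width) are illustrative assumptions, not taken from the paper.

import numpy as np

def learn_gradient(X, y, s2=1.0, lam=1e-3, kernel_width=1.0):
    """Sketch: estimate the gradient at the samples X (n x p), labels y (n,).

    Returns C (n x p): coefficients so the gradient estimate is
    fhat(x) = sum_l K(x, X[l]) * C[l]  (representer-style expansion).
    """
    n, p = X.shape
    # Pairwise squared distances; Gaussian kernel matrix and locality weights.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2.0 * kernel_width ** 2))
    W = np.exp(-sq / (2.0 * s2))  # down-weight distant pairs (i, j)

    # Stack coefficients as c in R^{n*p} and build the normal equations for
    # the quadratic objective (an assumed formulation):
    #   (1/n^2) sum_{ij} W_ij (y_i - y_j + fhat(x_i).(x_j - x_i))^2
    #     + lam * ||fhat||_K^2,
    # where ||fhat||_K^2 = c^T (K kron I_p) c.
    A = np.zeros((n * p, n * p))
    b = np.zeros(n * p)
    for i in range(n):
        for j in range(n):
            d = X[j] - X[i]                       # x_j - x_i, shape (p,)
            a = np.kron(K[i], d)                  # a.dot(c) = fhat(x_i).d
            A += W[i, j] * np.outer(a, a) / n**2
            b -= W[i, j] * (y[i] - y[j]) * a / n**2
    A += lam * np.kron(K, np.eye(p))              # RKHS penalty term
    c = np.linalg.solve(A, b)
    return c.reshape(n, p)

if __name__ == "__main__":
    # Toy check: for y = 3*x1 - 2*x2 the averaged gradient estimate
    # should land roughly near (3, -2, 0, 0).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))
    y = 3 * X[:, 0] - 2 * X[:, 1]
    C = learn_gradient(X, y)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / 2.0)
    print((K @ C).mean(axis=0))  # rows of K @ C are fhat(x_i)

This naive solver forms an (n*p) x (n*p) system, so it only scales to small problems; the paper's efficient square-loss implementation mentioned in the abstract would avoid that cost.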
References:
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.
M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56(1-3):209–239, 2004.
L. Carrel, A. Cottle, K. Goglin, and H. Willard. A first-generation X-inactivation profile of the human X chromosome. Proc. Natl. Acad. Sci. USA, 96:14440–14444, 1999.
O. Chapelle, V. N. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131–159, 2002.
S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1999.
C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
F. Cucker and S. Smale. On the mathematical foundations of learning. Bull. Amer. Math. Soc., 39:1–49, 2001.
C. Disteche, G. Filippova, and K. Tsuchiya. Escape from X inactivation. Cytogenet. Genome Res., 99:35–43, 2002.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13:1–50, 2000.
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.
Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99:67–81, 2004.
M. Liao. Bayesian estimation of gene expression index and Bayesian kernel models. PhD thesis, Duke University, Durham, NC, 2005.
M. Liao, F. Liang, S. Mukherjee, and M. West. Bayesian kernel regression and radial basis function models. Preprint, 2005.
C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17:177–204, 2005.
I. Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab., 22:1679–1706, 1994.
I. Pinelis. Correction: "Optimum bounds for the distributions of martingales in Banach spaces". Ann. Probab., 27:2119, 1999.
T. Poggio and F. Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978–982, 1990.
B. Schoelkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001.
D. K. Slonim, P. Tamayo, J. P. Mesirov, T. R. Golub, and E. S. Lander. Class prediction and discovery using gene expression data. In Proc. of the 4th Annual International Conference on Computational Molecular Biology (RECOMB), pages 263–272, 2000.
S. Smale and D. X. Zhou. Learning theory estimates via integral operators and their approximations. Constr. Approx., 24, 2006a.
S. Smale and D. X. Zhou. Shannon sampling II. Connections to learning theory. Appl. Comput. Harmonic Anal., 19:285–302, 2006b.
S. Smale and D. X. Zhou. Shannon sampling and function reconstruction from point values. Bull. Amer. Math. Soc., 41:279–305, 2004.
S. Smale and D. X. Zhou. Estimating the approximation error in learning theory. Anal. Appl., 1:17–41, 2003.
A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, and J. P. Mesirov. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA, 2005.
A. Sweet-Cordero, S. Mukherjee, A. Subramanian, H. You, J. J. Roix, C. Ladd-Acosta, J. P. Mesirov, T. R. Golub, and T. Jacks. An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nature Genetics, 37:48–55, 2005.
R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. B, 58(1):267–288, 1996.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
E. De Vito, A. Caponnetto, and L. Rosasco. Model selection for regularized least-squares algorithm in learning. Foundat. Comput. Math., 5:59–85, 2005.
G. Wahba and J. Wendelberger. Some new mathematical methods for variational objective analysis using splines and cross-validation. Monthly Weather Rev., 108:1122–1145, 1980.
M. West. Bayesian factor regression models in the "large p, small n" paradigm. In J. M. Bernardo et al., editors, Bayesian Statistics 7, pages 723–732. Oxford, 2003.
Q. Wu and D. X. Zhou. Support vector machine classifiers: linear programming versus quadratic programming. Neural Computation, 17:1160–1187, 2005.
T. Zhang. Leave-one-out bounds for kernel methods. Neural Computation, 15(6):1397–1437, 2003.
D. X. Zhou. Capacity of reproducing kernel spaces in learning theory. IEEE Trans. Inform. Theory, 49:1743–1752, 2003.