jmlr jmlr2006 jmlr2006-29 jmlr2006-29-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sayan Mukherjee, Qiang Wu
Abstract: We introduce an algorithm that simultaneously estimates a classification function as well as its gradient in the supervised learning framework. The motivation for the algorithm is to find salient variables and estimate how they covary. An efficient implementation with respect to both memory and time is given. The utility of the algorithm is illustrated on simulated data as well as a gene expression data set. An error analysis is given for the convergence of the estimate of the classification function and its gradient to the true classification function and true gradient. Keywords: Tikhonov regularization, variable selection, reproducing kernel Hilbert space, generalization bounds, classification
N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
O. Chapelle, V. N. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131–159, 2002.
S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1999.
C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio. Image representations for object detection using kernel classifiers. In Proceedings of the Asian Conference on Computer Vision, 2000a.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13:1–50, 2000b.
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.
L. Hermes and J. M. Buhmann. Feature selection for support vector machines. In International Conference on Pattern Recognition, 2000.
V. I. Koltchinskii and D. Panchenko. Rademacher processes and bounding the risk of function learning. In E. Giné, D. Mason, and J. Wellner, editors, High Dimensional Probability II, pages 443–459, 2000.
Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: Theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99:67–81, 2004.
F. Liang, S. Mukherjee, and M. West. Understanding the use of unlabeled data in predictive modeling. Statistical Science, 2006. Accepted.
Y. Lin and H. H. Zhang. Component selection and smoothing in smoothing spline analysis of variance models. Annals of Statistics, 2006. In press.
C. McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, volume 141 of LMS Lecture Note Series, pages 148–188, 1989.
C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17:177–204, 2005.
S. Mukherjee and D. X. Zhou. Learning coordinate covariances via gradients. Journal of Machine Learning Research, 7:519–549, 2006.
S. Mukherjee, Q. Wu, and D. X. Zhou. Learning gradients and feature selection on manifolds. Annals of Statistics, 2006. Submitted.
T. Poggio and F. Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978–982, 1990.
B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2001.
D. K. Slonim, P. Tamayo, J. P. Mesirov, T. R. Golub, and E. S. Lander. Class prediction and discovery using gene expression data. In Proceedings of the 4th Annual International Conference on Computational Molecular Biology (RECOMB), pages 263–272, 2000.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
V. G. Tusher, R. Tibshirani, and G. Chu. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA, 98(9):5116–5121, 2001.
A. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes. Springer-Verlag, New York, 1996.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
G. Wahba. Spline Models for Observational Data. Volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.
Q. Wu and D. X. Zhou. Support vector machine classifiers: Linear programming versus quadratic programming. Neural Computation, 17:1160–1187, 2005.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004.