
Classification with Gaussians and Convex Loss


Source: pdf

Author: Dao-Hong Xiang, Ding-Xuan Zhou

Abstract: This paper considers binary classification algorithms generated from Tikhonov regularization schemes associated with general convex loss functions and varying Gaussian kernels. Our main goal is to provide fast convergence rates for the excess misclassification error. Allowing the Gaussian kernels in the algorithms to vary improves the learning rates, as measured by the regularization error and the sample error. Special structures of Gaussian kernels enable us to construct, by an approximation scheme based on a Fourier analysis technique, uniformly bounded regularizing functions that achieve polynomial decay of the regularization error under a Sobolev smoothness condition. The sample error is estimated by means of a projection operator and a tight bound for the covering numbers of reproducing kernel Hilbert spaces generated by Gaussian kernels. The convexity of the general loss function plays a central role in our analysis.

Keywords: reproducing kernel Hilbert space, binary classification, general convex loss, varying Gaussian kernels, covering number, approximation
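The type of algorithm the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: the least-squares loss stands in for a general convex loss (the paper's analysis covers hinge and other convex losses as well), and the bandwidth `sigma` and regularization parameter `lam` are illustrative placeholders, not values from the paper.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    # K[i, j] = exp(-||x_i - z_j||^2 / (2 * sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit(X, y, sigma=1.0, lam=0.1):
    # Tikhonov regularization in the Gaussian RKHS: by the representer
    # theorem the minimizer is f(x) = sum_i c_i K(x_i, x); with the
    # least-squares loss the coefficients solve (K + lam * n * I) c = y.
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, c, X_new, sigma=1.0):
    # Binary classification: take the sign of the regression function.
    return np.sign(gaussian_kernel(X_new, X_train, sigma) @ c)

# Toy data: two well-separated Gaussian blobs with labels -1 / +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
c = fit(X, y)
acc = (predict(X, c, X) == y).mean()
```

The paper's rates concern how the excess misclassification error decays as the sample size grows when `sigma` and `lam` are chosen appropriately with the sample size; the sketch above fixes both for simplicity.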


reference text

A. Argyriou, R. Hauser, C. A. Micchelli, and M. Pontil. A DC-programming algorithm for kernel selection. In Proceedings of the Twenty-Third International Conference on Machine Learning, 2006.

N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138–156, 2006.

O. Chapelle, V. N. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46:131–159, 2002.

N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

D. R. Chen, Q. Wu, Y. Ying, and D. X. Zhou. Support vector machine soft margin classifiers: error analysis. Journal of Machine Learning Research, 5:1143–1175, 2004.

F. Cucker and D. X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, 2007.

E. De Vito, A. Caponnetto, and L. Rosasco. Model selection for regularized least-squares algorithm in learning theory. Foundations of Computational Mathematics, 5:59–85, 2006.

E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, and A. Verri. Some properties of regularized kernel methods. Journal of Machine Learning Research, 5:1363–1390, 2004.

D. Edmunds and H. Triebel. Function Spaces, Entropy Numbers, Differential Operators. Cambridge University Press, 1996.

T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13:1–50, 2000.

D. Hardin, I. Tsamardinos, and C. F. Aliferis. A theoretical characterization of linear SVM-based feature selection. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Canada, 2004.

S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25:161–193, 2006.

Z. W. Pan, D. H. Xiang, Q. W. Xiao, and D. X. Zhou. Parzen windows for multi-class classification. Journal of Complexity, 24:606–618, 2008.

R. Schaback and J. Werner. Linearly constrained reconstruction of functions by kernels, with applications to machine learning. Advances in Computational Mathematics, 25:237–258, 2006.

S. Smale and D. X. Zhou. Learning theory estimates via integral operators and their approximations. Constructive Approximation, 26:153–172, 2007.

S. Smale and D. X. Zhou. Estimating the approximation error in learning theory. Analysis and Applications, 1:17–41, 2003.

S. Smale and D. X. Zhou. Shannon sampling and function reconstruction from point values. Bulletin of the American Mathematical Society, 41:279–305, 2004.

I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35:575–607, 2007.

I. Steinwart, D. Hush, and C. Scovel. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Transactions on Information Theory, 52:4635–4643, 2006.

A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.

G. Wahba. Spline Models for Observational Data. SIAM, 1990.

Q. Wu, Y. Ying, and D. X. Zhou. Multi-kernel regularized classifiers. Journal of Complexity, 23:108–134, 2007.

Q. Wu and D. X. Zhou. Analysis of support vector machine classification. Journal of Computational Analysis and Applications, 8:99–119, 2006.

Q. Wu and D. X. Zhou. SVM soft margin classifiers: linear programming versus quadratic programming. Neural Computation, 17:1160–1187, 2005.

Y. Yao. On complexity issues of online learning algorithms. IEEE Transactions on Information Theory, to appear.

G. B. Ye and D. X. Zhou. Fully online classification by regularization. Applied and Computational Harmonic Analysis, 23:198–214, 2007.

G. B. Ye and D. X. Zhou. Learning and approximation by Gaussians on Riemannian manifolds. Advances in Computational Mathematics, 29:291–310, 2008.

Y. Ying. Convergence analysis of online algorithms. Advances in Computational Mathematics, 27:273–291, 2007.

Y. Ying and D. X. Zhou. Learnability of Gaussians with flexible variances. Journal of Machine Learning Research, 8:249–276, 2007.

T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004.

D. X. Zhou. The covering number in learning theory. Journal of Complexity, 18:739–767, 2002.

D. X. Zhou. Capacity of reproducing kernel spaces in learning theory. IEEE Transactions on Information Theory, 49:1743–1752, 2003.

D. X. Zhou and K. Jetter. Approximation with polynomial kernels and SVM classifiers. Advances in Computational Mathematics, 25:323–344, 2006.