nips2008-85-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Karthik Sridharan, Shai Shalev-Shwartz, Nathan Srebro
Abstract: We study convergence properties of empirical minimization of a stochastic strongly convex objective, where the stochastic component is linear. We show that the value attained by the empirical minimizer converges to the optimal value with rate 1/n. The result applies, in particular, to the SVM objective. Thus, we obtain a rate of 1/n on the convergence of the SVM objective (with fixed regularization parameter) to its infinite data limit. We demonstrate how this is essential for obtaining certain types of oracle inequalities for SVMs. The results extend also to approximate minimization as well as to strong convexity with respect to an arbitrary norm, and so also to objectives regularized using other ℓp norms.
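To make the abstract's claim concrete, the following is a minimal LaTeX sketch of the regularized objective and the type of 1/n bound being described. The log(1/δ) confidence term, the hidden constants, and the implicit assumption that the loss is Lipschitz with data bounded in norm are our illustrative assumptions; only the 1/(λn)-type rate is stated in the abstract.

% Regularized stochastic objective with a linear stochastic component;
% the SVM objective is the special case where \ell is the hinge loss.
F(\mathbf{w}) = \frac{\lambda}{2}\|\mathbf{w}\|^2
  + \mathbb{E}_{(\mathbf{x},y)}\bigl[\ell(\langle\mathbf{w},\mathbf{x}\rangle, y)\bigr],
\qquad
\hat F_n(\mathbf{w}) = \frac{\lambda}{2}\|\mathbf{w}\|^2
  + \frac{1}{n}\sum_{i=1}^{n}\ell(\langle\mathbf{w},\mathbf{x}_i\rangle, y_i).

% Fast-rate statement of the kind the abstract describes, for the empirical
% minimizer \hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \hat F_n(\mathbf{w}) and the
% population minimizer \mathbf{w}^\ast: with probability at least 1-\delta,
F(\hat{\mathbf{w}}) - F(\mathbf{w}^\ast)
  \le O\!\Bigl(\frac{\log(1/\delta)}{\lambda n}\Bigr).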
[1] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Proceedings of the Nineteenth Annual Conference on Computational Learning Theory, 2006.
[2] S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University, 2007.
[3] T. Zhang. Covering number bounds of certain regularized linear function classes. J. Mach. Learn. Res., 2:527–550, 2002.
[4] I. Steinwart, D. Hush, and C. Scovel. A new concentration result for regularized risk minimizers. High Dimensional Probability IV, IMS Lecture Notes, 51:260–275, 2006.
[5] P. L. Bartlett, O. Bousquet, and S. Mendelson. Localized Rademacher complexities. In COLT ’02: Proceedings of the 15th Annual Conference on Computational Learning Theory, pages 44–58, London, UK, 2002. Springer-Verlag.
[6] O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 169–207. Springer, 2004.
[7] S. M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In NIPS, 2008.
[8] W. S. Lee, P. L. Bartlett, and R. C. Williamson. The importance of convexity in learning with squared loss. In Computational Learning Theory, pages 140–146, 1996.
[9] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. PhD thesis, Ecole Polytechnique, 2002.
[10] A. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.
[11] I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35:575, 2007.
[12] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138–156, March 2006.