Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: We establish an excess risk bound of $\tilde{O}\big(HR_n^2 + \sqrt{HL^*}\,R_n\big)$ for ERM with an $H$-smooth loss function and a hypothesis class with Rademacher complexity $R_n$, where $L^*$ is the best risk achievable by the hypothesis class. For typical hypothesis classes where $R_n = \sqrt{R/n}$, this translates to a learning rate of $\tilde{O}(RH/n)$ in the separable ($L^* = 0$) case and $\tilde{O}\big(RH/n + \sqrt{L^* RH/n}\big)$ more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective.
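To see how the general bound specializes, here is the substitution spelled out as a worked step (using the abstract's notation; this derivation is our illustration, not text from the paper). Plugging $R_n = \sqrt{R/n}$ into $\tilde{O}\big(HR_n^2 + \sqrt{HL^*}\,R_n\big)$ gives

\[
HR_n^2 = H \cdot \frac{R}{n} = \frac{HR}{n},
\qquad
\sqrt{HL^*}\,R_n = \sqrt{HL^*} \cdot \sqrt{\frac{R}{n}} = \sqrt{\frac{L^* HR}{n}},
\]

so the bound becomes $\tilde{O}\big(RH/n + \sqrt{L^* RH/n}\big)$. In the separable case ($L^* = 0$) the second term vanishes, leaving the fast rate $\tilde{O}(RH/n)$; for $L^* > 0$ the $\sqrt{L^* RH/n}$ term dominates as $n$ grows.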
[1] N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. In FOCS, pages 292–301, 1993.
[2] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. JMLR, 3:463–482, 2002.
[3] P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497–1537, 2005.
[4] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31:167–175, 2003.
[5] P.J. Bickel, Y. Ritov, and A.B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 37(4):1705–1732, 2009.
[6] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. PhD thesis, Ecole Polytechnique, 2002.
[7] O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.
[8] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. In NIPS, pages 359–366, 2002.
[9] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[10] C. Chesneau and G. Lecué. Adapting to unknown smoothness by aggregation of thresholded wavelet estimators. 2006.
[11] S.M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In NIPS, 2008.
[12] V. Koltchinskii. Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Statist., 45(1):7–57, 2009.
[13] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30(1):1–50, 2002.
[14] G. Lan. Convex Optimization Under Inexact First-order Information. PhD thesis, Georgia Institute of Technology, 2009.
[15] J. Langford and J. Shawe-Taylor. PAC-Bayes & margins. In NIPS, pages 423–430, 2003.
[16] W. S. Lee, P. L. Bartlett, and R. C. Williamson. The importance of convexity in learning with squared loss. IEEE Trans. on Information Theory, 1998.
[17] P. Liang, F. Bach, G. Bouchard, and M. I. Jordan. Asymptotically optimal regularization in smooth parametric models. In NIPS, 2010.
[18] P. Liang and N. Srebro. On the interaction between norm and dimensionality: Multiple regimes in learning. In ICML, 2010.
[19] D. A. McAllester. Simplified PAC-Bayesian margin bounds. In COLT, pages 203–215, 2003.
[20] S. Mendelson. Rademacher averages and phase transitions in Glivenko-Cantelli classes. IEEE Trans. on Information Theory, 48(1):251–263, 2002.
[21] A. Nemirovski and D. Yudin. Problem complexity and method efficiency in optimization. Nauka Publishers, Moscow, 1978.
[22] D. Panchenko. Some extensions of an inequality of Vapnik and Chervonenkis. Electronic Communications in Probability, 7:55–65, 2002.
[23] D. Pollard. Convergence of Stochastic Processes. Springer-Verlag, 1984.
[24] S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Stochastic convex optimization. In COLT, 2009.
[25] S. Shalev-Shwartz, N. Srebro, and T. Zhang. Trading accuracy for sparsity. Technical report, TTI-C, 2009. Available at ttic.uchicago.edu/~shai.
[26] S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, Hebrew University of Jerusalem, 2007.
[27] I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35(2):575–607, 2007.
[28] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, 58(1):267–288, 1996.
[29] A. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.
[30] S. A. van de Geer. High-dimensional generalized linear models and the lasso. Annals of Statistics, 36(2):614–645, 2008.
[31] C. Zalinescu. Convex analysis in general vector spaces. World Scientific Publishing Co. Inc., River Edge, NJ, 2002.
[32] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.