jmlr jmlr2010 jmlr2010-60 jmlr2010-60-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, Karthik Sridharan
Abstract: The problem of characterizing learnability is the most basic question of statistical learning theory. A fundamental and long-standing answer, at least for the case of supervised classification and regression, is that learnability is equivalent to uniform convergence of the empirical risk to the population risk, and that if a problem is learnable, it is learnable via empirical risk minimization. In this paper, we consider the General Learning Setting (introduced by Vapnik), which includes most statistical learning problems as special cases. We show that in this setting, there are non-trivial learning problems where uniform convergence does not hold, empirical risk minimization fails, and yet they are learnable using alternative mechanisms. Instead of uniform convergence, we identify stability as the key necessary and sufficient condition for learnability. Moreover, we show that the conditions for learnability in the general setting are significantly more complex than in supervised classification and regression.
Keywords: statistical learning theory, learnability, uniform convergence, stability, stochastic convex optimization
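For reference, a minimal sketch (in standard notation assumed here, not taken from the paper itself) of the definitions the abstract contrasts: in the General Learning Setting, a hypothesis h from a class H incurs a loss f(h, z) on an instance z drawn from an unknown distribution D, and the learner sees an i.i.d. sample S = (z_1, ..., z_m).

% Population risk and empirical risk on the sample S
F(h) = \mathbb{E}_{z \sim \mathcal{D}}\bigl[ f(h, z) \bigr],
\qquad
\hat{F}_S(h) = \frac{1}{m} \sum_{i=1}^{m} f(h, z_i).

% Uniform convergence: empirical risks track population risks
% simultaneously over the whole class (in probability over S).
\sup_{h \in \mathcal{H}} \bigl| F(h) - \hat{F}_S(h) \bigr| \;\longrightarrow\; 0 \quad (m \to \infty).

% Learnability: some rule A (for instance ERM, A(S) \in \arg\min_{h \in \mathcal{H}} \hat{F}_S(h))
% returns hypotheses whose risk approaches the best achievable in the class.
F\bigl(A(S)\bigr) - \inf_{h \in \mathcal{H}} F(h) \;\longrightarrow\; 0 \quad (m \to \infty).

The abstract's claim is that the second property can hold while the first fails, and that stability of the learning rule (its output not changing much when one instance in S is replaced), rather than uniform convergence over H, is what characterizes learnability in this setting.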
N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 44(4):615–631, 1997.
M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36(4):929–965, October 1989.
O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002.
L. Breiman. Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, University of California at Berkeley, 1996.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, September 2004.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, 13:49–52, 1902.
D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.
M. Kearns and D. Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11(6):1427–1453, 1999.
M. Kearns, R. Schapire, and L. Sellie. Toward efficient agnostic learning. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 341–352, 1992.
S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In Proceedings of the 18th Conference in Uncertainty in Artificial Intelligence, pages 275–282, 2002.
D. McAllester and R. Schapire. On the convergence rate of Good-Turing estimators. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 1–6, 2000.
S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25(1-3):161–193, 2006.
D. L. Phillips. A technique for the numerical solution of certain integral equations of the first kind. Journal of the ACM, 9(1):84–97, 1962.
S. Rakhlin, S. Mukherjee, and T. Poggio. Stability results in learning theory. Analysis and Applications, 3(4):397–419, 2005.
W. Rogers and T. Wagner. A finite sample distribution-free performance bound for local discrimination rules. Annals of Statistics, 6(3):506–514, 1978.
R. E. Schapire. The strength of weak learnability. In 30th Annual Symposium on Foundations of Computer Science, pages 28–33, October 1989.
S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University, 2007.
S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Stochastic convex optimization. In Proceedings of the 22nd Annual Conference on Computational Learning Theory, 2009a.
S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Learnability and stability in the general learning setting. In Proceedings of the 22nd Annual Conference on Computational Learning Theory, 2009b.
K. Sridharan, N. Srebro, and S. Shalev-Shwartz. Fast rates for regularized objectives. In Advances in Neural Information Processing Systems 22, pages 1545–1552, 2008.
A. N. Tikhonov. On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, pages 928–936, 2003.