jmlr jmlr2010 jmlr2010-60 jmlr2010-60-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, Karthik Sridharan
Abstract: The problem of characterizing learnability is the most basic question of statistical learning theory. A fundamental and long-standing answer, at least for the case of supervised classification and regression, is that learnability is equivalent to uniform convergence of the empirical risk to the population risk, and that if a problem is learnable, it is learnable via empirical risk minimization. In this paper, we consider the General Learning Setting (introduced by Vapnik), which includes most statistical learning problems as special cases. We show that in this setting, there are non-trivial learning problems where uniform convergence does not hold, empirical risk minimization fails, and yet they are learnable using alternative mechanisms. Instead of uniform convergence, we identify stability as the key necessary and sufficient condition for learnability. Moreover, we show that the conditions for learnability in the general setting are significantly more complex than in supervised classification and regression.
Keywords: statistical learning theory, learnability, uniform convergence, stability, stochastic convex optimization
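For reference, a minimal sketch (in standard notation assumed here, not taken from the paper itself) of the definitions the abstract contrasts: in the General Learning Setting, a hypothesis h from a class H incurs a loss f(h, z) on an instance z drawn from an unknown distribution D, and the learner sees an i.i.d. sample S = (z_1, ..., z_m).

% Population risk and empirical risk on the sample S
F(h) = \mathbb{E}_{z \sim \mathcal{D}}\bigl[ f(h, z) \bigr],
\qquad
\hat{F}_S(h) = \frac{1}{m} \sum_{i=1}^{m} f(h, z_i).

% Uniform convergence: empirical risks track population risks
% simultaneously over the whole class (in probability over S).
\sup_{h \in \mathcal{H}} \bigl| F(h) - \hat{F}_S(h) \bigr| \;\longrightarrow\; 0 \quad (m \to \infty).

% Learnability: some rule A (for instance ERM, A(S) \in \arg\min_{h \in \mathcal{H}} \hat{F}_S(h))
% returns hypotheses whose risk approaches the best achievable in the class.
F\bigl(A(S)\bigr) - \inf_{h \in \mathcal{H}} F(h) \;\longrightarrow\; 0 \quad (m \to \infty).

The abstract's claim is that the second property can hold while the first fails, and that stability of the learning rule (its output not changing much when one instance in S is replaced), rather than uniform convergence over H, is what characterizes learnability in this setting.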
N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 44(4):615–631, 1997.
M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36(4):929–965, October 1989.
O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002.
L. Breiman. Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, University of California at Berkeley, 1996.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, September 2004.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, 13:49–52, 1902.
D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.
M. Kearns and D. Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11(6):1427–1453, 1999.
M. Kearns, R. Schapire, and L. Sellie. Toward efficient agnostic learning. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 341–352, 1992.
S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In Proceedings of the 18th Conference in Uncertainty in Artificial Intelligence, pages 275–282, 2002.
D. McAllester and R. Schapire. On the convergence rate of Good-Turing estimators. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 1–6, 2000.
S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25(1-3):161–193, 2006.
D. L. Phillips. A technique for the numerical solution of certain integral equations of the first kind. Journal of the ACM, 9(1):84–97, 1962.
S. Rakhlin, S. Mukherjee, and T. Poggio. Stability results in learning theory. Analysis and Applications, 3(4):397–419, 2005.
W. Rogers and T. Wagner. A finite sample distribution-free performance bound for local discrimination rules. Annals of Statistics, 6(3):506–514, 1978.
R. E. Schapire. The strength of weak learnability. In 30th Annual Symposium on Foundations of Computer Science, pages 28–33, October 1989.
S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University, 2007.
S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Stochastic convex optimization. In Proceedings of the 22nd Annual Conference on Computational Learning Theory, 2009a.
S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Learnability and stability in the general learning setting. In Proceedings of the 22nd Annual Conference on Computational Learning Theory, 2009b.
K. Sridharan, N. Srebro, and S. Shalev-Shwartz. Fast rates for regularized objectives. In Advances in Neural Information Processing Systems 22, pages 1545–1552, 2008.
A. N. Tikhonov. On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, pages 928–936, 2003.