jmlr jmlr2006 jmlr2006-84 jmlr2006-84-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrea Caponnetto, Alexander Rakhlin
Abstract: We study some stability properties of algorithms which minimize (or almost-minimize) empirical error over Donsker classes of functions. We show that, as the number n of samples grows, the L 2 1 diameter of the set of almost-minimizers of empirical error with tolerance ξ(n) = o(n − 2 ) converges to zero in probability. Hence, even in the case of multiple minimizers of expected error, as n increases it becomes less and less likely that adding a sample (or a number of samples) to the training set will result in a large jump to a new hypothesis. Moreover, under some assumptions on the entropy of the class, along with an assumption of Komlos-Major-Tusnady type, we derive a power rate of decay for the diameter of almost-minimizers. This rate, through an application of a uniform ratio limit inequality, is shown to govern the closeness of the expected errors of the almost-minimizers. In fact, under the above assumptions, the expected errors of almost-minimizers become closer with a rate strictly faster than n−1/2 . Keywords: empirical risk minimization, empirical processes, stability, Donsker classes
P. L. Bartlett and S. Mendelson. Empirical minimization. Probability Theory and Related Fields, 135(3):311–334, 2006. P. L. Bartlett, S. Mendelson, and P. Philips. Local complexities for empirical risk minimization. In J. Shawe-Taylor and Y. Singer, editors, Proceedings of the 17th Annual Conference on Learning Theory, pages 270–284. Springer, 2004. S. Ben-David, N. Eiron, and P. M. Long. On the difficulty of approximately maximizing agreements. Journal of Computer and System Sciences, 66(3):496–514, 2003. O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002. L. Devroye, L. Gy¨ rfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Number 31 o in Applications of mathematics. Springer, New York, 1996. L. P. Devroye and T. J. Wagner. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 25(5):601–604, 1979. R. M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, 1999. R. M. Dudley. Real Analysis and Probability. Cambrige Studies in Advaced Mathematics. Cambrigde University Press, 2002. E. Gin´ and J. Zinn. Gaussian characterization of uniform Donsker classes of functions. The Annals e of Probability, 19:758–782, 1991. J. Kim and D. Pollard. Cube root asymptotics. Annals of Statistics, 18:191–219, 1990. V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Annals of Statistics, 2006. To appear. V. Koltchinskii. Koml´ s-Major-Tusn´ dy approximation for the general empirical process and Haar o a expansion of classes of functions. Journal of Theoretical Probability, 7:73–118, 1994. S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In Proceedings of the 18th Annual Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 275–28, San Francisco, CA, 2002. Morgan Kaufmann. W. S. Lee, P. L. Bartlett, and R. C. Williamson. The importance of convexity in learning with squared loss. IEEE Transactions on Information Theory, 44(5):1974–1980, 1998. S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Statistical learning: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25:161–193, 2006. 2582 S TABILITY P ROPERTIES OF E MPIRICAL R ISK M INIMIZATION OVER D ONSKER C LASSES T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi. General conditions for predictivity in learning theory. Nature, 428:419–422, 2004. D. Pollard. Uniform ratio limit theorems for empirical processes. Scandinavian Journal of Statistics, 22(3):271–278, 1995. A. Rakhlin and A. Caponnetto. Stability of k-means clustering. In Proceedings of Neural Information Processing Systems Conference, 2006. To appear. A. Rakhlin, S. Mukherjee, and T. Poggio. Stability results in learning theory. Analysis and Applications, 3(4):397–417, 2005. E. Rio. Strong approximation for set-indexed partial sum processes via KMT constructions I. The Annals of Probability, 21(2):759–790, 1993. M. Rudelson and R. Vershynin. Combinatorics of random processes and sections of convex bodies. Annals of Mathematics. To appear. S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press, 2000. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes with Applications to Statistics. Springer-Verlag, New York, 1996. V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequences of events to their probabilities. Th. Prob. and its Applications, 17(2):264–280, 1971. V .N. Vapnik and A. Ya. Chervonenkis. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3):283–305, 1991. 2583