jmlr jmlr2007 jmlr2007-20 jmlr2007-20-reference knowledge-graph by maker-knowledge-mining

20 jmlr-2007-Combining PAC-Bayesian and Generic Chaining Bounds


Source: pdf

Author: Jean-Yves Audibert, Olivier Bousquet

Abstract: There exist many different generalization error bounds in statistical learning theory. Each of these bounds contains an improvement over the others for certain situations or algorithms. Our goal is, first, to underline the links between these bounds, and second, to combine the different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester (1998), which is interesting for randomized predictions, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand (see Talagrand, 1996), in a way that also takes into account the variance of the combined functions. We also show how this connects to Rademacher based bounds. Keywords: statistical learning theory, PAC-Bayes theorems, generalization error bounds


reference text

J.-Y. Audibert. Aggregated estimators and empirical complexity for least square regression. Ann. Inst. Henri Poincar´ , Probab. Stat., 40(6):685–736, 2004a. e J.-Y. Audibert. A better variance control for PAC-Bayesian classification. Technical report n.905, http://www.proba.jussieu.fr/mathdoc/textes/PMA-905Bis.pdf, Laboratoire de Probabilit´ s et Mod` les Al´ atoires, Universit´ s Paris 6 and Paris 7, 2004b. e e e e P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Ann. Stat., 33(4): 1497–1537, 2005. P. L. Bartlett and S. Mendelson. Empirical minimization. Probability Theory and Related Fields, 135(3):311 – 334, 2006. P. L. Bartlett, S. Mendelson, and P. Philips. Local complexities for empirical risk minimization. In J. Shawe-Taylor, editor, 17th Annual Conference on Learning Theory, COLT 2004, LNCS-3120, berlin, 2004. Springer-Verlag. S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005. S. Boucheron, G. Lugosi, and S. Massart. A sharp concentration inequality with applications. Random Structures and Algorithms, 16:277–292, 2000. O. Catoni. A PAC-Bayesian approach to adaptive classification. Technical report ´ n.840, http://www.proba.jussieu.fr/mathdoc/preprints/, Laboratoire de Probabilit es et Mod` les e Al´ atoires, Universit´ s Paris 6 and Paris 7, 2003. e e A. Dembo and O. Zeitouni. Large Deviation Techniques and Applications. Springer, 1998. L. Devroye and G. Lugosi. Combinatorial Methods in Density Estimation. Springer Series in Statistics. Springer Verlag, New York, 2001. R. M. Dudley. A course on empirical processes. Lecture Notes in Mathematics, 1097:2–142, 1984. E. Gin´ and J. Zinn. Some limit theorems for empirical processes. Ann. Probab., 12(4):929–989, e 1984. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963. V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Annals of Statistics, 34(6), 2006. G. Lugosi. Concentration of measure inequalities. Lecture notes, pages 1–62, 2003. available from http://www.econ.upf.es/ lugosi/anu.ps. P. Massart. Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse, Math. 9(2):245–303, 2000. 888 C OMBINING PAC-BAYESIAN AND G ENERIC C HAINING B OUNDS P. Massart. Concentration Inequalities and Model Selection: Ecole d’´ t´ de Probabilit´ s de Saintee e Flour XXXIII - 2003. Lecture Notes in Mathematics. Springer, 2006. D. A. McAllester. Some PAC-Bayesian theorems. In Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 230–234. ACM Press, 1998. D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the 12th Annual Conference on Computational Learning Theory. ACM Press, 1999. D. Panchenko. Symmetrization approach to concentration inequalities for empirical processes. Annals of Probability, 31(4):2068–2081, 2003. G. Pisier. Probabilistic methods in the geometry of banach spaces. In Probability and analysis, Lect. Sess. C.I.M.E Varena, Italy 1985, Lecture Notes in Mathematics, volume 1206, pages 167–241, Berlin, 1986. Springer. M. Seeger. Bayesian gaussian process models: PAC-bayesian generalisation error bounds and sparse approximations. PhD Thesis, University of Edinburgh, December 2003. M. Talagrand. Majorizing measures: The generic chaining. Annals of Probability, 24(3):1049– 1103, 1996. M. Talagrand. Majorizing measures without measures. Annals of Probability, 29(1):411–417, 2001. M. Talagrand. The Generic Chaining: Upper and Lower Bounds for Stochastic Processes. Springer, 2005. A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Ann. Stat., 32(1):135–166, 2004. S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press, Cambridge, UK, 2000. A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996. V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971. V. Vapnik and A. Chervonenkis. Theory of Pattern Recognition [in Russian]. Nauka, Moscow, 1974. (German Translation: W. Wapnik & A. Tscherwonenkis, Theorie der Zeichenerkennung, Akademie–Verlag, Berlin, 1979). 889