nips nips2013 nips2013-242 nips2013-242-reference knowledge-graph by maker-knowledge-mining

242 nips-2013-PAC-Bayes-Empirical-Bernstein Inequality


Source: pdf

Author: Ilya O. Tolstikhin, Yevgeny Seldin

Abstract: We present a PAC-Bayes-Empirical-Bernstein inequality. The inequality is based on a combination of the PAC-Bayesian bounding technique with an Empirical Bernstein bound. We show that when the empirical variance is significantly smaller than the empirical loss the PAC-Bayes-Empirical-Bernstein inequality is significantly tighter than the PAC-Bayes-kl inequality of Seeger (2002) and otherwise it is comparable. Our theoretical analysis is confirmed empirically on a synthetic example and several UCI datasets. The PAC-Bayes-Empirical-Bernstein inequality is an interesting example of an application of the PAC-Bayesian bounding technique to self-bounding functions. 1


reference text

[1] John Langford and John Shawe-Taylor. PAC-Bayes & margins. In Advances in Neural Information Processing Systems (NIPS), 2002.

[2] David McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51(1), 2003.

[3] John Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 2005.

[4] Yevgeny Seldin and Naftali Tishby. PAC-Bayesian analysis of co-clustering and beyond. Journal of Machine Learning Research, 11, 2010.

[5] Matthew Higgs and John Shawe-Taylor. A PAC-Bayes bound for tailored density estimation. In Proceedings of the International Conference on Algorithmic Learning Theory (ALT), 2010. 8

[6] Yevgeny Seldin, Peter Auer, Francois Laviolette, John Shawe-Taylor, and Ronald Ortner. PAC¸ Bayesian analysis of contextual bandits. In Advances in Neural Information Processing Systems (NIPS), 2011.

[7] Matthias Seeger. PAC-Bayesian generalization error bounds for Gaussian process classification. Journal of Machine Learning Research, 2002.

[8] Yevgeny Seldin, Francois Laviolette, Nicol` Cesa-Bianchi, John Shawe-Taylor, and Peter ¸ o Auer. PAC-Bayesian inequalities for martingales. IEEE Transactions on Information Theory, 58, 2012.

[9] Andreas Maurer and Massimiliano Pontil. Empirical Bernstein bounds and sample variance penalization. In Proceedings of the International Conference on Computational Learning Theory (COLT), 2009.

[10] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.

[11] Andreas Maurer. A note on the PAC-Bayesian theorem. www.arxiv.org, 2004.

[12] David McAllester. Some PAC-Bayesian theorems. Machine Learning, 37, 1999.

[13] A.W. Van Der Vaart. Asymptotic statistics. Cambridge University Press, 1998.

[14] St´ phane Boucheron, G´ bor Lugosi, and Olivier Bousquet. Concentration inequalities. In e a O. Bousquet, U.v. Luxburg, and G. R¨ tsch, editors, Advanced Lectures in Machine Learning. a Springer, 2004.

[15] A. Asuncion and D.J. Newman. UCI machine http://www.ics.uci.edu/∼mlearn/MLRepository.html. 9 learning repository, 2007.