nips nips2004 nips2004-138 nips2004-138-reference knowledge-graph by maker-knowledge-mining

138 nips-2004-Online Bounds for Bayesian Algorithms

Source: pdf

Author: Sham M. Kakade, Andrew Y. Ng

Abstract: We present a competitive analysis of Bayesian learning algorithms in the online learning setting and show that many simple Bayesian algorithms (such as Gaussian linear regression and Bayesian logistic regression) perform favorably when compared, in retrospect, to the single best model in the model class. The analysis does not assume that the Bayesian algorithms’ modeling assumptions are “correct,” and our bounds hold even if the data is adversarially chosen. For Gaussian linear regression (using logloss), our error bounds are comparable to the best bounds in the online learning literature, and we also provide a lower bound showing that Gaussian linear regression is optimal in a certain worst case sense. We also give bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression. 1

reference text

Azoury, K. S. and Warmuth, M. (2001). Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3). Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D., Schapire, R., and Warmuth, M. (1997). How to use expert advice. J. ACM, 44. Cesa-Bianchi, N., Helmbold, D., and Panizza, S. (1998). On Bayes methods for on-line boolean prediction. Algorithmica, 22. Dawid, A. (1984). Statistical theory: The prequential approach. J. Royal Statistical Society. Foster, D. P. (1991). Prediction in the worst case. Annals of Statistics, 19. Freund, Y. and Schapire, R. (1999). Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103. Freund, Y., Schapire, R., Singer, Y., and Warmuth, M. (1997). Using and combining predictors that specialize. In STOC. Grunwald, P. (2005). A tutorial introduction to the minimum description length principle. McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman and Hall. Ng, A. Y. and Jordan, M. (2001). Convergence rates of the voting Gibbs classiﬁer, with application to Bayesian feature selection. In Proceedings of the 18th Int’l Conference on Machine Learning. Vovk, V. (2001). Competitive on-line statistics. International Statistical Review, 69.