
nips2000-3: A Gradient-Based Boosting Algorithm for Regression Problems


Source: pdf

Author: Richard S. Zemel, Toniann Pitassi

Abstract: In adaptive boosting, several weak learners trained sequentially are combined to boost the overall algorithm performance. Recently, adaptive boosting methods for classification problems have been derived as gradient descent algorithms. This formulation justifies key elements and parameters in the methods, all chosen to optimize a single common objective function. We propose an analogous formulation for adaptive boosting of regression problems, utilizing a novel objective function that leads to a simple boosting algorithm. We prove that this method reduces training error, and compare its performance to other regression methods.
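To make the gradient-descent view of boosting concrete, the following is a minimal Python sketch of stage-wise gradient boosting for least-squares regression, in which each weak learner is fit to the negative gradient of the loss with respect to the current ensemble (for squared error, simply the residuals). This illustrates the general technique only, not the authors' novel objective function or update rule; the depth-limited decision tree (scikit-learn's DecisionTreeRegressor) used as the weak learner, the number of stages, and the step size are assumptions made for the example.

    # Sketch of generic stage-wise gradient boosting for regression
    # (illustration of the gradient-descent view of boosting; NOT the
    # authors' specific objective or update rule).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor  # assumed stand-in weak learner

    def gradient_boost_regression(X, y, n_stages=50, learning_rate=0.1):
        """Fit an additive model f(x) = f0 + sum_m lr * h_m(x), where each
        weak learner h_m is trained on the negative gradient of the
        squared-error loss at the current fit (the residuals)."""
        y = np.asarray(y, dtype=float)
        f = np.full(y.shape, y.mean())            # initial constant predictor
        learners = []
        for _ in range(n_stages):
            residuals = y - f                      # -dL/df for L = 0.5*(y - f)^2
            h = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
            f = f + learning_rate * h.predict(X)   # small step along the weak learner
            learners.append(h)

        def predict(X_new):
            out = np.full(np.asarray(X_new).shape[0], y.mean())
            for h in learners:
                out = out + learning_rate * h.predict(X_new)
            return out

        return predict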


reference text

[1] Bordley, R. (1982). A multiplicative formula for aggregating probability assessments. Management Science, 28, 1137-1148.

[2] Breiman, L. (1997). Prediction games and arcing classifiers. TR 504. Statistics Dept., UC Berkeley.

[3] Duffy, N. & Helmbold, D. (2000). Leveraging for regression. In Proceedings of COLT 13.

[4] Freund, Y. & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119-139.

[5] Friedman, J. H. (1999). Greedy function approximation: A gradient boosting machine. TR, Dept. of Statistics, Stanford University.

[6] Friedman, J. H., Hastie, T., & Tibshirani, R. (1999). Additive logistic regression: A statistical view of boosting. Annals of Statistics, To appear.

[7] Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.

[8] Hastie, T. & Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall.

[9] Heskes, T. (1998). Bias-variance decompositions for likelihood-based estimators. Neural Computation, 10, 1425-1433.

[10] Hinton, G. E. (2000). Training products of experts by minimizing contrastive divergence. GCNU TR 2000-004. Gatsby Computational Neuroscience Unit, University College London.

[11] Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3, 79-87.

[12] Karakoulas, G., & Shawe-Taylor, J. (1999). Towards a strategy for boosting regressors. In Advances in Large Margin Classifiers, Smola, Bartlett, Schölkopf & Schuurmans (Eds.).

[13] Krogh, A. & Vedelsby, J. (1995). Neural network ensembles, cross-validation, and active learning. In NIPS 7.

[14] Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Boosting algorithms as gradient descent in function space. In NIPS 11.

[15] Rätsch, G., Mika, S., Onoda, T., Lemm, S. & Müller, K.-R. (2000). Barrier boosting. In Proceedings of COLT 13.

[16] Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5, 197-227.

[17] Schapire, R. E. & Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of COLT 11.