Source: pdf
Author: Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
Abstract: We describe new loss functions for regression problems along with an accompanying algorithmic framework which utilizes these functions. These loss functions are derived by symmetrization of margin-based losses commonly used in boosting algorithms, namely, the logistic loss and the exponential loss. The resulting symmetric logistic loss can be viewed as a smooth approximation to the ε-insensitive hinge loss used in support vector regression. We describe and analyze two parametric families of batch learning algorithms for minimizing these symmetric losses. The first family employs an iterative log-additive update which can be viewed as a regression counterpart to recent boosting algorithms. The second family utilizes an iterative additive update step. We also describe and analyze online gradient descent (GD) and exponentiated gradient (EG) algorithms for the symmetric logistic loss. A byproduct of our work is a new simple form of regularization for boosting-based classification and regression algorithms. Our regression framework also has implications on classification algorithms, namely, a new additive update boosting algorithm for classification. We demonstrate the merits of our algorithms in a series of experiments.
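The abstract's claim that the symmetric logistic loss smoothly approximates the ε-insensitive hinge loss can be illustrated numerically. The sketch below is not taken from this page: it assumes the symmetrized form log(1 + e^(δ − ε)) + log(1 + e^(−δ − ε)), where δ is the discrepancy between prediction and target, and the function names `softplus`, `symmetric_logistic_loss`, and `eps_insensitive_hinge` are hypothetical, chosen only for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def symmetric_logistic_loss(delta, eps):
    """Assumed symmetrized logistic loss on the discrepancy delta.

    Sums a logistic penalty for over-prediction (delta - eps) and one for
    under-prediction (-delta - eps). This form is an assumption made for
    illustration; the page above does not state the formula.
    """
    return softplus(delta - eps) + softplus(-delta - eps)


def eps_insensitive_hinge(delta, eps):
    """The epsilon-insensitive hinge loss used in support vector regression."""
    return max(0.0, abs(delta) - eps)


if __name__ == "__main__":
    eps = 1.0
    for delta in (-5.0, -1.0, 0.0, 1.0, 5.0):
        smooth = symmetric_logistic_loss(delta, eps)
        hinge = eps_insensitive_hinge(delta, eps)
        print(f"delta={delta:+.1f}  symmetric-logistic={smooth:.4f}  hinge={hinge:.4f}")
```

Under this assumed form, the smooth loss is symmetric in δ, always lies at or above the hinge loss, and approaches it as |δ| grows, which is the sense in which it acts as a smooth approximation.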
References:
J. Bi and K.P. Bennett. A geometric approach to support vector regression. In Neurocomputing, special issue on support vector machines, volume 55, pages 79–108, September 2003.
N. Cesa-Bianchi. Analysis of two gradient-based algorithms for on-line regression. Journal of Computer and System Sciences, 59(3):392–411, 1999.
M. Collins, R.E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 47(2/3):253–285, 2002.
N. Duffy and D. Helmbold. Leveraging for regression. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory. ACM, 2000.
Y. Freund and M. Opper. Drifting games and Brownian motion. Journal of Computer and System Sciences, 64:113–132, 2002.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–374, April 2000.
J.H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.
P.J. Huber. Robust Statistics. John Wiley and Sons, New York, 1981.
J. Kivinen and M.K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–64, January 1997.
G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In Advances in Neural Information Processing Systems 14, 2001.
K.W. Penrose, A.G. Nelson, and A.G. Fisher. Generalized body composition prediction equation for men using simple measurement techniques. Medicine and Science in Sports and Exercise, 17(2):189, 1985.
T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1990.
R.E. Schapire, M. Rochery, M. Rahim, and N. Gupta. Incorporating prior knowledge into boosting. In Machine Learning: Proceedings of the Nineteenth International Conference, 2002.
R.E. Schapire. Drifting games. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999.
A. Smola and B. Schölkopf. A tutorial on support vector regression. Technical Report NC2-TR-1998-030, NeuroCOLT2, 1998.
V.N. Vapnik. Statistical Learning Theory. Wiley, 1998.