nips2010-282-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hamed Masnadi-Shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented of how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss and to its margin-enforcing properties. It is shown that, for a class of risks denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These relationships enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes-consistent loss functions of variable margin are derived. These families are then used to design boosting-style algorithms with explicit control of the classification margin. The new algorithms generalize well-established approaches, such as LogitBoost. Experimental results show that the proposed variable-margin losses outperform the fixed-margin counterparts used by existing algorithms. Finally, it is shown that the best performance can be achieved by cross-validating the margin parameter.
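As a brief illustration of the canonical construction the abstract refers to, here is a minimal sketch based on the proper-scoring-rule framework of [3, 4, 6]; the scaled link in the example is an assumption chosen for concreteness, not necessarily the family derived in the paper:

    % loss recovered from a concave minimum conditional risk C^*(\eta)
    % and the inverse f^{-1} of its optimal link, following [3, 4]:
    \phi(v) = C^*\big(f^{-1}(v)\big) + \big(1 - f^{-1}(v)\big)\,(C^*)'\big(f^{-1}(v)\big)
    % canonical form ties the link to the risk:
    f(\eta) = -(C^*)'(\eta)
    % with C^*(\eta) = -\eta\log\eta - (1-\eta)\log(1-\eta) and the sigmoidal
    % inverse link f^{-1}(v) = (1 + e^{-v})^{-1}, this recovers the logistic
    % loss \phi(v) = \log(1 + e^{-v}); rescaling the link to
    % f_a^{-1}(v) = (1 + e^{-av})^{-1} (and C^* to C^*/a, which preserves the
    % canonical relation) gives a one-parameter, Bayes-consistent family
    \phi_a(v) = \frac{1}{a}\,\log\big(1 + e^{-a v}\big)

in which the single parameter a sets how sharply small margins are penalized, consistent with the abstract's claim that one parameter determines the margin properties of the loss.

A hedged sketch of how such a variable-margin loss could drive a boosting-style algorithm follows, in the spirit of LogitBoost [2] and gradient boosting [8]. This is generic functional gradient descent with regression stumps, not the paper's exact algorithm; the synthetic data, learning rate, and grid of margin values are illustrative assumptions rather than the paper's experimental protocol.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def phi(v, a):
        # variable-margin logistic loss (1/a) * log(1 + exp(-a*v)), computed stably
        return np.logaddexp(0.0, -a * v) / a

    def neg_gradient(y, F, a):
        # pseudo-residuals: -d/dF phi(y * F) = y / (1 + exp(a * y * F))
        return y / (1.0 + np.exp(a * y * F))

    def boost(X, y, a, rounds=200, lr=0.1):
        # generic gradient boosting on the margin-a loss, with stump weak learners
        F, trees = np.zeros(len(y)), []
        for _ in range(rounds):
            tree = DecisionTreeRegressor(max_depth=1).fit(X, neg_gradient(y, F, a))
            F += lr * tree.predict(X)
            trees.append(tree)
        return trees

    def predict(trees, X, lr=0.1):
        # sign of the boosted predictor F(x) = lr * sum_t t(x)
        return np.sign(sum(lr * t.predict(X) for t in trees))

    # pick the margin parameter a by held-out accuracy on synthetic data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 2))
    y = np.sign(X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=600))
    Xtr, ytr, Xva, yva = X[:400], y[:400], X[400:], y[400:]
    best = max((np.mean(predict(boost(Xtr, ytr, a), Xva) == yva), a)
               for a in (0.5, 1.0, 2.0, 4.0))
    print("best validation accuracy %.3f at a = %.1f" % best)

Cross-validating the margin, as in the abstract's final point, then amounts to keeping the value of a with the best held-out accuracy.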
[1] V. N. Vapnik, Statistical Learning Theory. John Wiley & Sons, 1998.
[2] J. Friedman, T. Hastie, and R. Tibshirani, “Additive logistic regression: A statistical view of boosting,” The Annals of Statistics, 2000.
[3] H. Masnadi-Shirazi and N. Vasconcelos, “On the design of loss functions for classification: theory, robustness to outliers, and SavageBoost,” in NIPS, 2008, pp. 1049–1056.
[4] L. J. Savage, “The elicitation of personal probabilities and expectations,” Journal of the American Statistical Association, vol. 66, pp. 783–801, 1971.
[5] C. Leistner, A. Saffari, P. M. Roth, and H. Bischof, “On robustness of on-line boosting - a competitive study,” in IEEE ICCV Workshop on On-line Computer Vision, 2009.
[6] A. Buja, W. Stuetzle, and Y. Shen, “Loss functions for binary class probability estimation and classification: Structure and applications,” 2006.
[7] T. Zhang, “Statistical behavior and consistency of classification methods based on convex risk minimization,” The Annals of Statistics, 2004.
[8] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[9] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.