
92 nips-2002-FloatBoost Learning for Classification


Source: pdf

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper bound on the training error that is an exponential function of the margin on the training set [14]. However, the ultimate goal in pattern classification applications is always the minimum error rate. In addition, AdaBoost needs an effective procedure for learning weak classifiers, which is itself difficult, especially for high-dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers that cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and testing. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. FloatBoost and AdaBoost are compared experimentally on a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier that differentiates between face and nonface patterns in a high-dimensional space. The results clearly demonstrate the advantages of FloatBoost over AdaBoost.
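The backtrack mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes decision stumps as weak classifiers, in place of the paper's statistical model for weak classifiers. It also assumes discrete AdaBoost weights and the training error rate as the backtracking criterion. All function names (`float_boost`, `best_stump`, and so on) are hypothetical.

```python
import math

def stump_predict(stump, x):
    """Decision stump (feature index, threshold, polarity) -> label in {-1, +1}."""
    j, thr, pol = stump
    return pol if x[j] > thr else -pol

def score(ensemble, x):
    """Weighted vote H(x) = sum of alpha_m * h_m(x)."""
    return sum(a * stump_predict(s, x) for a, s in ensemble)

def ensemble_error(ensemble, X, y):
    """Training error rate of sign(H(x)); ties are resolved as +1."""
    wrong = sum(1 for xi, yi in zip(X, y)
                if (1 if score(ensemble, xi) >= 0 else -1) != yi)
    return wrong / len(X)

def sample_weights(ensemble, X, y):
    """AdaBoost-style weights w_i proportional to exp(-y_i * H(x_i)).

    They are recomputed from the current ensemble because backtracking
    may have removed weak classifiers since the last round.
    """
    w = [math.exp(-yi * score(ensemble, xi)) for xi, yi in zip(X, y)]
    total = sum(w)
    return [wi / total for wi in w]

def best_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                stump = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best

def float_boost(X, y, max_rounds=10):
    """Assumed simplification: AdaBoost forward step plus floating backtrack."""
    ensemble = []
    best_err = {}  # lowest training error seen so far for each ensemble size
    for _ in range(max_rounds):
        # Forward step: one ordinary AdaBoost iteration.
        w = sample_weights(ensemble, X, y)
        err, stump = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        ensemble.append((0.5 * math.log((1 - err) / err), stump))
        m = len(ensemble)
        best_err[m] = min(ensemble_error(ensemble, X, y),
                          best_err.get(m, float("inf")))
        # Backtrack: drop the least useful weak classifier while doing so
        # beats the best error previously recorded for the smaller size.
        while len(ensemble) > 2:
            reduced = min((ensemble[:k] + ensemble[k + 1:]
                           for k in range(len(ensemble))),
                          key=lambda ens: ensemble_error(ens, X, y))
            e_red = ensemble_error(reduced, X, y)
            if e_red < best_err.get(len(reduced), float("inf")):
                ensemble = reduced
                best_err[len(reduced)] = e_red
            else:
                break
        if ensemble_error(ensemble, X, y) == 0:
            break
    return ensemble

# Toy usage on a linearly separable one-dimensional problem.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [-1, -1, 1, 1]
model = float_boost(X_toy, y_toy)
print(len(model), ensemble_error(model, X_toy, y_toy))
```

Recording the best error per ensemble size, and removing a classifier only when the smaller ensemble improves on that record, follows the floating search idea of [11]. The round limit `max_rounds` is an added safeguard that bounds the run time.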


reference text

[1] L. Breiman. “Arcing classifiers”. The Annals of Statistics, 26(3):801–849, 1998.

[2] P. Bühlmann and B. Yu. “Invited discussion on ‘Additive logistic regression: a statistical view of boosting (Friedman, Hastie and Tibshirani)’”. The Annals of Statistics, 28(2):377–386, April 2000.

[3] Y. Freund and R. Schapire. “A decision-theoretic generalization of on-line learning and an application to boosting”. Journal of Computer and System Sciences, 55(1):119–139, Aug 1997.

[4] J. Friedman. “Greedy function approximation: A gradient boosting machine”. The Annals of Statistics, 29(5), October 2001.

[5] J. Friedman, T. Hastie, and R. Tibshirani. “Additive logistic regression: a statistical view of boosting”. The Annals of Statistics, 28(2):337–374, April 2000.

[6] M. J. Kearns and U. Vazirani. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, 1994.

[7] S. Z. Li, L. Zhu, Z. Q. Zhang, A. Blake, H. Zhang, and H. Shum. “Statistical learning of multi-view face detection”. In Proceedings of the European Conference on Computer Vision, page ???, Copenhagen, Denmark, May 28 - June 2 2002.

[8] L. Mason, J. Baxter, P. Bartlett, and M. Frean. “Functional gradient techniques for combining hypotheses”. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 221–247. MIT Press, Cambridge, MA, 1999.

[9] E. Osuna, R. Freund, and F. Girosi. “Training support vector machines: An application to face detection”. In CVPR, pages 130–136, 1997.

[10] C. P. Papageorgiou, M. Oren, and T. Poggio. “A general framework for object detection”. In Proceedings of IEEE International Conference on Computer Vision, pages 555–562, Bombay, India, 1998.

[11] P. Pudil, J. Novovicova, and J. Kittler. “Floating search methods in feature selection”. Pattern Recognition Letters, (11):1119–1125, 1994.

[12] D. Roth, M. Yang, and N. Ahuja. “A SNoW-based face detector”. In Proceedings of Neural Information Processing Systems, 2000.

[13] H. A. Rowley, S. Baluja, and T. Kanade. “Neural network-based face detection”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–38, 1998.

[14] R. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. “Boosting the margin: A new explanation for the effectiveness of voting methods”. The Annals of Statistics, 26(5):1651–1686, October 1998.

[15] R. E. Schapire and Y. Singer. “Improved boosting algorithms using confidence-rated predictions”. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 80–91, 1998.

[16] K.-K. Sung and T. Poggio. “Example-based learning for view-based human face detection”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):39–51, 1998.

[17] L. Valiant. “A theory of the learnable”. Communications of the ACM, 27(11):1134–1142, 1984.

[18] P. Viola and M. Jones. “Asymmetric AdaBoost and a detector cascade”. In Proceedings of Neural Information Processing Systems, Vancouver, Canada, December 2001.

[19] P. Viola and M. Jones. “Rapid object detection using a boosted cascade of simple features”. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, December 12-14 2001.

[20] P. Viola and M. Jones. “Robust real time object detection”. In IEEE ICCV Workshop on Statistical and Computational Theories of Vision, Vancouver, Canada, July 13 2001.

[21] R. Zemel and T. Pitassi. “A gradient-based boosting algorithm for regression problems”. In Advances in Neural Information Processing Systems, volume 13, Cambridge, MA, 2001. MIT Press.