
Direct 0-1 Loss Minimization and Margin Maximization with Boosting (NIPS 2013, paper 90)


Source: pdf

Author: Shaodan Zhai, Tian Xia, Ming Tan, Shaojun Wang

Abstract: We propose DirectBoost, a boosting method that builds an ensemble of weak classifiers with a greedy coordinate descent algorithm that directly minimizes the empirical classification error over the labeled training examples. Once the training classification error reaches a coordinatewise local minimum, DirectBoost switches to a greedy coordinate ascent algorithm that continues to add weak classifiers so as to maximize any targeted, arbitrarily defined margin, until it reaches a coordinatewise local maximum of that margin in a certain sense. Experimental results on a collection of machine-learning benchmark datasets show that DirectBoost gives better results than AdaBoost, LogitBoost, LPBoost with column generation, and BrownBoost, and that it is noise tolerant when it maximizes an n′th-order bottom sample margin.
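The abstract describes a two-phase coordinate procedure: first descend on the 0-1 training error, then ascend on a targeted margin. As a rough, self-contained illustration only, the Python sketch below implements a toy version of that idea with single-feature decision stumps as the weak-learner pool, a fixed step size, and the minimum normalized sample margin as the target. These choices, together with the names weak_predictions, min_margin, and direct_boost_sketch, are assumptions made for illustration, not the exact procedure from the paper or its technical report [27].

import numpy as np

def weak_predictions(X, thresholds):
    # Assumed weak-learner family: decision stumps on feature 0,
    # h_j(x) = +1 if x[0] > t_j else -1.  Returns an (n_samples, n_stumps) matrix.
    return np.where(X[:, [0]] > thresholds[None, :], 1.0, -1.0)

def min_margin(scores, y, alpha):
    # Normalized bottom margin min_i y_i f(x_i) / ||alpha||_1 -- one assumed
    # margin choice; the paper also considers n'th-order bottom sample margins.
    return np.min(y * scores) / max(np.sum(np.abs(alpha)), 1e-12)

def direct_boost_sketch(X, y, thresholds, n_rounds=50, step=0.1):
    H = weak_predictions(X, thresholds)      # weak-classifier outputs in {-1, +1}
    alpha = np.zeros(H.shape[1])             # ensemble weights (the coordinates)
    scores = H @ alpha

    # Phase 1: greedy coordinate descent on the empirical 0-1 loss.
    for _ in range(n_rounds):
        current = np.mean(np.sign(scores) * y <= 0)
        errs = [np.mean(np.sign(scores + step * H[:, j]) * y <= 0)
                for j in range(H.shape[1])]
        j = int(np.argmin(errs))
        if errs[j] >= current:               # coordinatewise local minimum
            break
        alpha[j] += step
        scores += step * H[:, j]

    # Phase 2: greedy coordinate ascent on the bottom (minimum) margin.
    for _ in range(n_rounds):
        current = min_margin(scores, y, alpha)
        gains = []
        for j in range(len(alpha)):
            a = alpha.copy()
            a[j] += step
            gains.append(min_margin(scores + step * H[:, j], y, a))
        j = int(np.argmax(gains))
        if gains[j] <= current:              # coordinatewise local maximum
            break
        alpha[j] += step
        scores += step * H[:, j]
    return alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + 0.1 * rng.normal(size=200) > 0, 1.0, -1.0)
    alpha = direct_boost_sketch(X, y, thresholds=np.linspace(-2.0, 2.0, 15))
    print("learned stump weights:", np.round(alpha, 2))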


reference text

[1] P. Bartlett and M. Traskin. AdaBoost is consistent. Journal of Machine Learning Research, 8:2347–2368, 2007.

[2] M. Bazaraa, H. Sherali and C. Shetty. Nonlinear Programming: Theory and Algorithms, 3rd Edition. Wiley-Interscience, 2006.

[3] D. P. Bertsekas. A distributed algorithm for the assignment problem. Technical Report, MIT, 1979.

[4] D. Bertsekas. Network Optimization: Continuous and Discrete Models. Athena Scientific, 1998.

[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[6] A. Demiriz, K. Bennett and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46:225–254, 2002.

[7] L. Devroye, L. Györfi and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.

[8] A. Frank and A. Asuncion. UCI Machine Learning Repository. School of Information and Computer Science, University of California at Irvine, 2006.

[9] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[10] Y. Freund. An adaptive version of the boost by majority algorithm. Machine Learning, 43(3):293–318, 2001.

[11] J. Friedman, T. Hastie and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2):337–374, 2000.

[12] K. Glocer. Entropy regularization and soft margin maximization. Ph.D. Dissertation, UCSC, 2009.

[13] K. Hoffgen, H. Simon and K. van Horn. Robust trainability of single neurons. Journal of Computer and System Sciences, 50(1):114–125, 1995.

[14] P. Long and R. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 78:287–304, 2010.

[15] E. Mammen and A. Tsybakov. Smooth discrimination analysis. The Annals of Statistics, 27:1808–1829, 1999.

[16] D. McAllester, T. Hazan and J. Keshet. Direct loss minimization for structured prediction. Neural Information Processing Systems (NIPS), 1594–1602, 2010.

[17] R. Schapire, Y. Freund, P. Bartlett and W. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.

[18] R. Schapire and Y. Freund. Boosting: Foundations and Algorithms. MIT Press, 2012.

[19] S. Shalev-Shwartz and Y. Singer. On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms. Machine Learning, 80(2–3):141–163, 2010.

[20] I. Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.

[21] P. Tseng and D. Bertsekas. Relaxation methods for strictly convex costs and linear constraints. Mathematics of Operations Research, 16:462–481, 1991.

[22] P. Tseng. Convergence of block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.

[23] A. Tsybakov. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135–166, 2004.

[24] V. Vapnik. Statistical Learning Theory. John Wiley, 1998.

[25] M. Warmuth, K. Glocer and G. Ratsch. Boosting algorithms for maximizing the soft margin. Advances in Neural Information Processing Systems (NIPS), 21, 1585–1592, 2007.

[26] M. Warmuth, K. Glocer and S. Vishwanathan. Entropy regularized LPBoost. The 19th International Conference on Algorithmic Learning Theory (ALT), 256–271, 2008.

[27] S. Zhai, T. Xia, M. Tan and S. Wang. Direct 0-1 loss minimization and margin maximization with boosting. Technical Report, 2013.

[28] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56–85, 2004.

[29] T. Zhang and B. Yu. Boosting with early stopping: Convergence and consistency. The Annals of Statistics, 33:1538–1579, 2005.