nips2004-137
Source: pdf
Author: Clayton Scott, Robert Nowak
Abstract: Decision trees are surprisingly adaptive in three important respects: they automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower-dimensional manifolds; (3) reject irrelevant features. In this paper we examine a decision tree based on dyadic splits that adapts to each of these conditions to achieve minimax optimal rates of convergence. The proposed classifier is the first known to achieve these optimal rates while being practical and implementable.
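The abstract's key algorithmic idea is the dyadic split: cells of the unit cube are divided only at coordinate midpoints, and the final tree is chosen by a complexity-penalized criterion (see [9]-[11]). The following is a minimal sketch of that partitioning idea in Python, not the authors' algorithm; the fixed depth limit, the round-robin choice of split coordinate, and the omission of the penalized selection step are all simplifying assumptions made here for illustration.

# Minimal sketch of a dyadic decision tree (illustrative only, not the
# paper's method): every split is at the midpoint of one coordinate of the
# current cell, and each leaf predicts the majority label of the training
# points falling in it. Depth limit and round-robin split coordinate are
# assumptions; the paper instead selects the tree by penalized empirical
# risk minimization, which is omitted here.
import numpy as np

def grow(X, y, cell, depth, max_depth):
    """Recursively build a dyadic tree over the hyperrectangle `cell`.

    cell: list of (lo, hi) intervals, one per feature.
    Returns ('leaf', label) or ('split', dim, mid, left_subtree, right_subtree).
    """
    majority = int(round(y.mean())) if len(y) > 0 else 0
    if depth == max_depth or len(y) == 0 or y.min() == y.max():
        return ('leaf', majority)
    dim = depth % X.shape[1]            # cycle through coordinates (assumption)
    lo, hi = cell[dim]
    mid = 0.5 * (lo + hi)               # dyadic split: always at the midpoint
    left_mask = X[:, dim] <= mid
    left_cell = [c if i != dim else (lo, mid) for i, c in enumerate(cell)]
    right_cell = [c if i != dim else (mid, hi) for i, c in enumerate(cell)]
    return ('split', dim, mid,
            grow(X[left_mask], y[left_mask], left_cell, depth + 1, max_depth),
            grow(X[~left_mask], y[~left_mask], right_cell, depth + 1, max_depth))

def predict(tree, x):
    # Descend from the root, following midpoint comparisons, until a leaf.
    while tree[0] == 'split':
        _, dim, mid, left, right = tree
        tree = left if x[dim] <= mid else right
    return tree[1]

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    X = rng.random((500, 2))
    y = (X[:, 0] + X[:, 1] > 1).astype(int)   # toy linear decision boundary
    tree = grow(X, y, [(0.0, 1.0), (0.0, 1.0)], 0, max_depth=6)
    acc = np.mean([predict(tree, x) == t for x, t in zip(X, y)])
    print(f"training accuracy: {acc:.3f}")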
[1] A. B. Tsybakov, “Optimal aggregation of classifiers in statistical learning,” Ann. Stat., vol. 32, no. 1, pp. 135–166, 2004.
[2] Y. Mansour and D. McAllester, “Generalization bounds for decision trees,” in Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, N. Cesa-Bianchi and S. Goldman, Eds., Palo Alto, CA, 2000, pp. 69–74.
[3] Y. Yang, “Minimax nonparametric classification–Part I: Rates of convergence,” IEEE Trans. Inform. Theory, vol. 45, no. 7, pp. 2271–2284, 1999.
[4] E. Mammen and A. B. Tsybakov, “Smooth discrimination analysis,” Ann. Stat., vol. 27, pp. 1808–1829, 1999.
[5] A. B. Tsybakov and S. A. van de Geer, “Square root penalty: adaptation to the margin in classification and in edge estimation,” 2004, preprint.
[6] P. Bartlett, M. Jordan, and J. McAuliffe, “Convexity, classification, and risk bounds,” Department of Statistics, U.C. Berkeley, Tech. Rep. 638, 2003, to appear in Journal of the American Statistical Association.
[7] G. Blanchard, G. Lugosi, and N. Vayatis, “On the rate of convergence of regularized boosting classifiers,” J. Machine Learning Research, vol. 4, pp. 861–894, 2003.
[8] J. C. Scovel and I. Steinwart, “Fast rates for support vector machines,” Los Alamos National Laboratory, Tech. Rep. LA-UR 03-9117, 2004.
[9] G. Blanchard, C. Schäfer, and Y. Rozenholc, “Oracle bounds and exact algorithm for dyadic classification trees,” in Learning Theory: 17th Annual Conference on Learning Theory, COLT 2004, J. Shawe-Taylor and Y. Singer, Eds. Heidelberg: Springer-Verlag, 2004, pp. 378–392.
[10] C. Scott and R. Nowak, “Near-minimax optimal classification with dyadic classification trees,” in Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds. Cambridge, MA: MIT Press, 2004.
[11] ——, “Minimax optimal classification with dyadic decision trees,” Rice University, Tech. Rep. TREE0403, 2004. [Online]. Available: http://www.stat.rice.edu/~cscott
[12] A. Nobel, “Analysis of a complexity based pruning scheme for classification trees,” IEEE Trans. Inform. Theory, vol. 48, no. 8, pp. 2362–2368, 2002.
[13] J.-Y. Audibert, “PAC-Bayesian statistical learning theory,” Ph.D. dissertation, Université Paris 6, June 2004.