nips nips2003 nips2003-134 nips2003-134-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Clayton Scott, Robert Nowak
Abstract: This paper reports on a family of computationally practical classifiers that converge to the Bayes error at near-minimax optimal rates for a variety of distributions. The classifiers are based on dyadic classification trees (DCTs), which involve adaptively pruned partitions of the feature space. A key aspect of DCTs is their spatial adaptivity, which enables local (rather than global) fitting of the decision boundary. Our risk analysis involves a spatial decomposition of the usual concentration inequalities, leading to a spatially adaptive, data-dependent pruning criterion. For any distribution on (X, Y) whose Bayes decision boundary behaves locally like a Lipschitz smooth function, we show that the DCT error converges to the Bayes error at a rate within a logarithmic factor of the minimax optimal rate. We also study DCTs equipped with polynomial classification rules at each leaf, and show that as the smoothness of the boundary increases their errors converge to the Bayes error at a rate approaching n^{-1/2}, the parametric rate. We are not aware of any other practical classifiers that provide similar rate of convergence guarantees. Fast algorithms for tree pruning are discussed.
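To make the construction concrete, below is a minimal, hypothetical Python sketch of a dyadic classification tree: the feature space [0,1]^d is recursively halved along coordinates in cyclic order, leaves are labeled by majority vote, and the tree is pruned bottom-up against a penalized empirical risk. This is not the authors' implementation; in particular, it uses a simple additive per-leaf penalty as a stand-in for the spatially adaptive, data-dependent pruning criterion the paper derives, and all function names and the penalty constant are illustrative.

```python
import numpy as np

class Node:
    def __init__(self, lo, hi, idx, depth):
        self.lo, self.hi = lo, hi      # cell bounds inside [0,1]^d
        self.idx = idx                 # indices of training points falling in this cell
        self.depth = depth
        self.left = self.right = None
        self.label = None

def grow(X, y, node, max_depth):
    """Grow a complete dyadic tree: halve the cell along dimension (depth mod d)."""
    node.label = int(np.mean(y[node.idx]) >= 0.5) if len(node.idx) else 0
    if node.depth == max_depth or len(node.idx) == 0:
        return
    d = node.depth % X.shape[1]
    mid = 0.5 * (node.lo[d] + node.hi[d])
    left_mask = X[node.idx, d] <= mid
    lo_l, hi_l = node.lo.copy(), node.hi.copy(); hi_l[d] = mid
    lo_r, hi_r = node.lo.copy(), node.hi.copy(); lo_r[d] = mid
    node.left = Node(lo_l, hi_l, node.idx[left_mask], node.depth + 1)
    node.right = Node(lo_r, hi_r, node.idx[~left_mask], node.depth + 1)
    grow(X, y, node.left, max_depth)
    grow(X, y, node.right, max_depth)

def prune(node, y, n, penalty):
    """Bottom-up pruning with a simplified additive per-leaf penalty
    (a surrogate for the paper's spatially adaptive criterion).
    Returns the penalized empirical risk of the pruned subtree."""
    errors_as_leaf = np.sum(y[node.idx] != node.label)
    leaf_cost = errors_as_leaf / n + penalty
    if node.left is None:
        return leaf_cost
    subtree_cost = prune(node.left, y, n, penalty) + prune(node.right, y, n, penalty)
    if leaf_cost <= subtree_cost:
        node.left = node.right = None  # collapsing to a leaf is no worse: prune
        return leaf_cost
    return subtree_cost

def predict(node, x):
    """Descend to the leaf cell containing x and return its majority label."""
    while node.left is not None:
        d = node.depth % len(x)
        mid = 0.5 * (node.lo[d] + node.hi[d])
        node = node.left if x[d] <= mid else node.right
    return node.label

# Illustrative usage: features assumed rescaled to [0,1]^2, labels in {0,1},
# with a Lipschitz-like decision boundary.
rng = np.random.default_rng(0)
X = rng.random((500, 2))
y = (X[:, 0] + 0.3 * np.sin(6 * X[:, 0]) > X[:, 1]).astype(int)
root = Node(np.zeros(2), np.ones(2), np.arange(len(y)), 0)
grow(X, y, root, max_depth=8)
prune(root, y, len(y), penalty=np.sqrt(np.log(len(y)) / len(y)) / 8)
print(predict(root, np.array([0.3, 0.4])))
```

The pruning step follows the usual complexity-regularization recursion: a subtree is retained only if its penalized empirical risk beats that of collapsing it to a single leaf, which is what lets the fitted partition adapt locally to the decision boundary.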
[1] C. Scott and R. Nowak, “Dyadic classification trees via structural risk minimization,” in Advances in Neural Information Processing Systems 14, S. Becker, S. Thrun, and K. Obermayer, Eds., Cambridge, MA, 2002, MIT Press.
[2] E. Mammen and A. B. Tsybakov, “Smooth discrimination analysis,” Annals of Statistics, vol. 27, pp. 1808–1829, 1999.
[3] M. Kearns and Y. Mansour, “A fast, bottom-up decision tree pruning algorithm with near-optimal generalization,” in International Conference on Machine Learning, 1998, pp. 269–277.
[4] Y. Mansour and D. McAllester, “Generalization bounds for decision trees,” in Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, Palo Alto, California, Nicolò Cesa-Bianchi and Sally A. Goldman, Eds., 2000, pp. 69–74.
[5] K. Bennett and J. Blue, “A support vector machine approach to decision trees,” in Proceedings of the IEEE International Joint Conference on Neural Networks, Anchorage, Alaska, 1998, vol. 41, pp. 2396–2401.
[6] K. Bennett, N. Cristianini, J. Shawe-Taylor, and D. Wu, “Enlarging the margins in perceptron decision trees,” Machine Learning, vol. 41, pp. 295–313, 2000.
[7] C. Scott, “Tree pruning using a non-additive penalty,” Tech. Rep. TREE 0301, Rice University, 2003, available at http://www.dsp.rice.edu/~cscott/pubs.html.
[8] A. Barron, “Complexity regularization with application to artificial neural networks,” in Nonparametric functional estimation and related topics, G. Roussas, Ed., pp. 561–576. NATO ASI series, Kluwer Academic Publishers, Dordrecht, 1991.
[9] C. Scott and R. Nowak, “Complexity-regularized dyadic classification trees: Efficient pruning and rates of convergence,” Tech. Rep. TREE0201, Rice University, 2002, available at http://www.dsp.rice.edu/~cscott/pubs.html.
[10] M. Okamoto, “Some inequalities relating to the partial sum of binomial probabilities,” Annals of the Institute of Statistical Mathematics, vol. 10, pp. 29–35, 1958.