nips nips2003 nips2003-134 nips2003-134-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Clayton Scott, Robert Nowak
Abstract: This paper reports on a family of computationally practical classifiers that converge to the Bayes error at near-minimax optimal rates for a variety of distributions. The classifiers are based on dyadic classification trees (DCTs), which involve adaptively pruned partitions of the feature space. A key aspect of DCTs is their spatial adaptivity, which enables local (rather than global) fitting of the decision boundary. Our risk analysis involves a spatial decomposition of the usual concentration inequalities, leading to a spatially adaptive, data-dependent pruning criterion. For any distribution on (X, Y) whose Bayes decision boundary behaves locally like a Lipschitz smooth function, we show that the DCT error converges to the Bayes error at a rate within a logarithmic factor of the minimax optimal rate. We also study DCTs equipped with polynomial classification rules at each leaf, and show that as the smoothness of the boundary increases their errors converge to the Bayes error at a rate approaching n^{-1/2}, the parametric rate. We are not aware of any other practical classifiers that provide similar rate of convergence guarantees. Fast algorithms for tree pruning are discussed.
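To make the construction concrete, below is a minimal, hypothetical Python sketch of a dyadic classification tree: the feature space [0,1]^d is recursively halved along coordinates in cyclic order, leaves are labeled by majority vote, and the tree is pruned bottom-up against a penalized empirical risk. This is not the authors' implementation; in particular, it uses a simple additive per-leaf penalty as a stand-in for the spatially adaptive, data-dependent pruning criterion the paper derives, and all function names and the penalty constant are illustrative.

```python
import numpy as np

class Node:
    def __init__(self, lo, hi, idx, depth):
        self.lo, self.hi = lo, hi      # cell bounds inside [0,1]^d
        self.idx = idx                 # indices of training points falling in this cell
        self.depth = depth
        self.left = self.right = None
        self.label = None

def grow(X, y, node, max_depth):
    """Grow a complete dyadic tree: halve the cell along dimension (depth mod d)."""
    node.label = int(np.mean(y[node.idx]) >= 0.5) if len(node.idx) else 0
    if node.depth == max_depth or len(node.idx) == 0:
        return
    d = node.depth % X.shape[1]
    mid = 0.5 * (node.lo[d] + node.hi[d])
    left_mask = X[node.idx, d] <= mid
    lo_l, hi_l = node.lo.copy(), node.hi.copy(); hi_l[d] = mid
    lo_r, hi_r = node.lo.copy(), node.hi.copy(); lo_r[d] = mid
    node.left = Node(lo_l, hi_l, node.idx[left_mask], node.depth + 1)
    node.right = Node(lo_r, hi_r, node.idx[~left_mask], node.depth + 1)
    grow(X, y, node.left, max_depth)
    grow(X, y, node.right, max_depth)

def prune(node, y, n, penalty):
    """Bottom-up pruning with a simplified additive per-leaf penalty
    (a surrogate for the paper's spatially adaptive criterion).
    Returns the penalized empirical risk of the pruned subtree."""
    errors_as_leaf = np.sum(y[node.idx] != node.label)
    leaf_cost = errors_as_leaf / n + penalty
    if node.left is None:
        return leaf_cost
    subtree_cost = prune(node.left, y, n, penalty) + prune(node.right, y, n, penalty)
    if leaf_cost <= subtree_cost:
        node.left = node.right = None  # collapsing to a leaf is no worse: prune
        return leaf_cost
    return subtree_cost

def predict(node, x):
    """Descend to the leaf cell containing x and return its majority label."""
    while node.left is not None:
        d = node.depth % len(x)
        mid = 0.5 * (node.lo[d] + node.hi[d])
        node = node.left if x[d] <= mid else node.right
    return node.label

# Illustrative usage: features assumed rescaled to [0,1]^2, labels in {0,1},
# with a Lipschitz-like decision boundary.
rng = np.random.default_rng(0)
X = rng.random((500, 2))
y = (X[:, 0] + 0.3 * np.sin(6 * X[:, 0]) > X[:, 1]).astype(int)
root = Node(np.zeros(2), np.ones(2), np.arange(len(y)), 0)
grow(X, y, root, max_depth=8)
prune(root, y, len(y), penalty=np.sqrt(np.log(len(y)) / len(y)) / 8)
print(predict(root, np.array([0.3, 0.4])))
```

The pruning step follows the usual complexity-regularization recursion: a subtree is retained only if its penalized empirical risk beats that of collapsing it to a single leaf, which is what lets the fitted partition adapt locally to the decision boundary.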
[1] C. Scott and R. Nowak, “Dyadic classification trees via structural risk minimization,” in Advances in Neural Information Processing Systems 14, S. Becker, S. Thrun, and K. Obermayer, Eds., Cambridge, MA, 2002, MIT Press.
[2] E. Mammen and A. B. Tsybakov, “Smooth discrimination analysis,” Annals of Statistics, vol. 27, pp. 1808–1829, 1999.
[3] M. Kearns and Y. Mansour, “A fast, bottom-up decision tree pruning algorithm with near-optimal generalization,” in International Conference on Machine Learning, 1998, pp. 269–277.
[4] Y. Mansour and D. McAllester, “Generalization bounds for decision trees,” in Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, Palo Alto, California, Nicolò Cesa-Bianchi and Sally A. Goldman, Eds., 2000, pp. 69–74.
[5] K. Bennett and J. Blue, “A support vector machine approach to decision trees,” in Proceedings of the IEEE International Joint Conference on Neural Networks, Anchorage, Alaska, 1998, vol. 41, pp. 2396–2401.
[6] K. Bennett, N. Cristianini, J. Shawe-Taylor, and D. Wu, “Enlarging the margins in perceptron decision trees,” Machine Learning, vol. 41, pp. 295–313, 2000.
[7] C. Scott, “Tree pruning using a non-additive penalty,” Tech. Rep. TREE 0301, Rice University, 2003, available at http://www.dsp.rice.edu/~cscott/pubs.html.
[8] A. Barron, “Complexity regularization with application to artificial neural networks,” in Nonparametric functional estimation and related topics, G. Roussas, Ed., pp. 561–576. NATO ASI series, Kluwer Academic Publishers, Dordrecht, 1991.
[9] C. Scott and R. Nowak, “Complexity-regularized dyadic classification trees: Efficient pruning and rates of convergence,” Tech. Rep. TREE0201, Rice University, 2002, available at http://www.dsp.rice.edu/~cscott/pubs.html.
[10] M. Okamoto, “Some inequalities relating to the partial sum of binomial probabilities,” Annals of the Institute of Statistical Mathematics, vol. 10, pp. 29–35, 1958.