
82 nips-2013-Decision Jungles: Compact and Rich Models for Classification


Source: pdf

Author: Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi

Abstract: Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.
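
The training criterion named in the abstract can be written out explicitly. As a sketch only (the notation below is assumed here, not quoted from the paper): if S_l denotes the set of training samples reaching leaf l of a DAG with leaves 1, ..., L, then both node splitting and node merging minimize the weighted sum of leaf entropies,

    E(S_1, ..., S_L) = \sum_{l=1}^{L} |S_l| \, H(S_l),   where   H(S) = - \sum_{c} p(c \mid S) \log p(c \mid S)

and p(c | S) is the empirical class distribution over labels c among the samples in S. A candidate split or merge is accepted when it lowers E, which is why the two operations can be driven by exactly the same objective.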


reference text

[1] Y. Amit and D. Geman. Randomized inquiries about shape; an application to handwritten digit recognition. Technical Report 401, Dept. of Statistics, University of Chicago, IL, Nov 1994.

[2] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749, Oct. 2005.

[3] D. Benbouzid, R. Busa-Fekete, and B. Kégl. Fast classification using sparse decision DAGs. In Proc. Intl Conf. on Machine Learning (ICML), New York, NY, USA, 2012. ACM.

[4] K. P. Bennett, N. Cristianini, J. Shawe-Taylor, and D. Wu. Enlarging the margins in perceptron decision trees. Machine Learning, 41(3):295–313, 2000.

[5] L. Breiman. Random forests. Machine Learning, 45(1), 2001.

[6] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Chapman and Hall/CRC, 1984.

[7] H. Chipman, E. I. George, and R. E. McCulloch. Bayesian CART model search. Journal of the American Statistical Association, 93:935–960, 1997.

[8] P. Chou. Optimal partitioning for classification and regression trees. IEEE Trans. PAMI, 13(4), 1991.

[9] A. Criminisi and J. Shotton. Decision Forests for Computer Vision and Medical Image Analysis. Springer, 2013.

[10] T. Elomaa and M. Kääriäinen. On the practice of branching program boosting. In European Conf. on Machine Learning (ECML), 2001.

[11] M. Everingham, L. van Gool, C. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) Challenge. http://www.pascal-network.org/challenges/VOC/.

[12] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In Proc. IEEE ICCV, 2009.

[13] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.

[14] E. B. Hunt, J. Marin, and P. J. Stone. Experiments in Induction. Academic Press, New York, 1966.

[15] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.

[16] B. Kijsirikul, N. Ussivakul, and S. Meknavin. Adaptive directed acyclic graphs for multiclass classification. In Pacific Rim Intl Conference on Artificial Intelligence (PRICAI), 2002.

[17] R. Kohavi and C.-H. Li. Oblivious decision trees, graphs, and top-down pruning. In Intl Joint Conf. on Artificial Intelligence (IJCAI), 1995.

[18] P. Kontschieder, P. Kohli, J. Shotton, and A. Criminisi. GeoF: Geodesic forests for learning coupled predictors. In Proc. IEEE CVPR, 2013.

[19] V. Lepetit and P. Fua. Keypoint recognition using randomized trees. IEEE Trans. PAMI, 2006.

[20] J. Mahoney and R. J. Mooney. Initializing ID5R with a domain theory: some negative results. Technical Report 91-154, Dept. of Computer Science, University of Texas, Austin, TX, 1991.

[21] K. V. S. Murthy and S. L. Salzberg. On growing better decision trees from data. PhD thesis, Johns Hopkins University, 1995.

[22] D. Newman, S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases. Technical Report 28, University of California, Irvine, Department of Information and Computer Science, 1998.

[23] A. L. Oliveira and A. Sangiovanni-Vincentelli. Using the minimum description length principle to infer reduced ordered decision graphs. Machine Learning, 12, 1995.

[24] J. J. Oliver. Decision graphs – an extension of decision trees. Technical Report 92/173, Dept. of Computer Science, Monash University, Victoria, Australia, 1992.

[25] A. H. Peterson and T. R. Martinez. Reducing decision tree ensemble size using parallel decision DAGs. Intl Journ. on Artificial Intelligence Tools, 18(4), 2009.

[26] J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Proc. NIPS, pages 547–553, 2000.

[27] J. R. Quinlan. Induction of decision trees. Machine Learning, 1986.

[28] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.

[29] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, and A. Blake. Efficient human pose estimation from single depth images. IEEE Trans. PAMI, 2013.