jmlr jmlr2007 jmlr2007-11 jmlr2007-11-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Saher Esmeir, Shaul Markovitch
Abstract: The majority of existing algorithms for learning decision trees are greedy: a tree is induced top-down, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few non-greedy learners cannot learn good trees when the concept is difficult. Furthermore, they require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.
Keywords: anytime algorithms, decision tree induction, lookahead, hard concepts, resource-bounded reasoning
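The split-evaluation strategy described in the abstract can be illustrated with a small sketch. The Python below is an illustrative approximation rather than the authors' implementation: the data layout (a list of (feature_dict, label) pairs), all function names, the gain-proportional randomized tree grower used as a stand-in for the paper's biased sampler, and the per-attribute sample size `r` are assumptions made only so the example runs end to end.

```python
import math
import random
from collections import Counter

# Sketch of sampling-based split evaluation: for each candidate attribute,
# grow r randomized consistent trees rooted at that attribute, take the
# smallest as an estimate of the minimal tree size under that attribute,
# and select the attribute with the smallest estimate.

def entropy(examples):
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def split(examples, attr):
    by_value = {}
    for features, label in examples:
        by_value.setdefault(features[attr], []).append((features, label))
    return by_value

def info_gain(examples, attr):
    parts = split(examples, attr)
    remainder = sum(len(p) / len(examples) * entropy(p) for p in parts.values())
    return entropy(examples) - remainder

def sampled_tree_size(examples, attrs, attr, rng):
    """Size of one randomly grown consistent tree rooted at `attr`."""
    size = 1  # count the root split
    remaining = [a for a in attrs if a != attr]
    for part in split(examples, attr).values():
        size += random_tree_size(part, remaining, rng)
    return size

def random_tree_size(examples, attrs, rng):
    if not attrs or entropy(examples) == 0:
        return 1  # leaf
    # Biased sampling: choose the next split with probability proportional
    # to its information gain (an assumed stand-in for the paper's sampler).
    gains = [max(info_gain(examples, a), 1e-9) for a in attrs]
    attr = rng.choices(attrs, weights=gains, k=1)[0]
    return sampled_tree_size(examples, attrs, attr, rng)

def choose_split(examples, attrs, r, rng=random):
    """Contract-style choice: r samples per attribute, smallest estimate wins."""
    best_attr, best_size = None, float("inf")
    for attr in attrs:
        estimate = min(sampled_tree_size(examples, attrs, attr, rng)
                       for _ in range(r))
        if estimate < best_size:
            best_attr, best_size = attr, estimate
    return best_attr

if __name__ == "__main__":
    # Tiny XOR-like "hard concept": greedy gain is uninformative at the root,
    # but the sampled minimal-tree estimate prefers the relevant attributes.
    data = [({"a": x, "b": y, "noise": random.randint(0, 1)}, x ^ y)
            for x in (0, 1) for y in (0, 1) for _ in range(5)]
    print(choose_split(data, ["a", "b", "noise"], r=10))
```

Under this reading, the contract algorithm fixes `r` from the time allocation before induction starts, while the interruptible variant described in the abstract would wrap a loop around such a routine: begin with the greedy tree and repeatedly re-grow selected subtrees with larger sample sizes until interrupted.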