jmlr jmlr2007 jmlr2007-79 jmlr2007-79-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ray J. Hickey
Abstract: To provide good classification accuracy on unseen examples, a decision tree, learned by an algorithm such as ID3, must have sufficient structure and must also identify the correct majority class in each of its leaves. If there are inadequacies in either respect, the tree will have a percentage classification rate below the maximum possible for the domain, namely (100 - Bayes error rate)%. An error decomposition is introduced which enables the relative contributions of deficiencies in structure and of incorrect determination of majority class to be isolated and quantified. A sub-decomposition of majority class error permits separation of the sampling error at the leaves from the possible bias introduced by the attribute selection method of the induction algorithm. It is shown that sampling error can extend to 25% when there are more than two classes. Decompositions are obtained from experiments on several data sets. For ID3, the effect of selection bias is shown to vary from being statistically non-significant to being quite substantial, with the latter appearing to be associated with a simple underlying model. Keywords: decision tree learning, error decomposition, majority classes, sampling error, attribute selection bias
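The sampling-error component described in the abstract can be made concrete with a small simulation. The sketch below is a minimal illustration, not code from the paper: the function name expected_sampling_error, the Monte Carlo trial count, and the example class distributions and leaf sizes are all assumptions chosen for demonstration. It estimates, for a single leaf with a given true class distribution, the expected extra error incurred when the leaf is labelled with the majority class of a sample of n_leaf examples rather than with the true majority class.

```python
import numpy as np

def expected_sampling_error(class_probs, n_leaf, trials=100_000, rng=None):
    """Monte Carlo estimate of the extra error rate at one leaf when its
    label is the majority class of n_leaf sampled examples instead of the
    true majority class. The per-trial loss is p_max - p_chosen: the drop
    in accuracy from predicting the sampled majority class.
    (Illustrative sketch; names and defaults are assumptions.)"""
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.asarray(class_probs, dtype=float)
    p = p / p.sum()                                    # guard against rounding drift
    counts = rng.multinomial(n_leaf, p, size=trials)   # one leaf sample per trial
    noisy = counts + rng.random(counts.shape) * 1e-9   # random tie-breaking
    chosen = noisy.argmax(axis=1)                      # sampled majority class
    return float(np.mean(p.max() - p[chosen]))

if __name__ == "__main__":
    # Two classes: the worst case for a one-example leaf is at probabilities
    # (0.75, 0.25), giving an expected extra error of about 12.5%.
    print(expected_sampling_error([0.75, 0.25], n_leaf=1))           # ~0.125
    # Many classes: one class at 0.5 plus 100 rare classes pushes the extra
    # error towards the 25% figure quoted in the abstract.
    print(expected_sampling_error([0.5] + [0.005] * 100, n_leaf=1))  # ~0.247
```

With n_leaf = 1 the expected loss reduces to p_max minus the sum of the squared class probabilities, which peaks at 12.5% for two classes but approaches 25% as the number of rare runner-up classes grows; this is consistent with the abstract's observation that the largest sampling errors require more than two classes.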
Robert E. Bechhofer, Salah Elmaghraby and Norman Morse. A single sample multiple-decision procedure for selecting the multinomial event which has the highest probability. Annals of Mathematical Statistics, 30:102-119, 1959.
Leo Breiman, Jerome H. Friedman, Richard A. Olshen and Charles J. Stone. Classification and Regression Trees. Wadsworth, Pacific Grove, California, 1984.
Eibe Frank. Pruning Decision Trees and Lists. PhD thesis, University of Waikato, Hamilton, New Zealand, 2000.
Carlos R.P. Hartmann, Pramod K. Varshney, Kishan G. Mehrotra and Carl L. Gerberich. Application of information theory to the construction of efficient decision trees. IEEE Transactions on Information Theory, IT-28:565-577, 1982.
Ray J. Hickey. Artificial universes: towards a systematic approach to evaluating algorithms which learn from examples. In Proceedings of the Ninth International Conference on Machine Learning, pages 196-205, Aberdeen, Scotland, 1992.
Ray J. Hickey. Noise modelling and evaluating learning from examples. Artificial Intelligence, 82(1-2):157-179, 1996.
Gareth M. James. Variance and bias for general loss functions. Machine Learning, 51(2):115-135, 2003.
David D. Jensen and Paul R. Cohen. Multiple comparisons in induction algorithms. Machine Learning, 38(3):309-338, 2000.
Harry Kesten and Norman Morse. A property of the multinomial distribution. Annals of Mathematical Statistics, 30:120-127, 1959.
Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324, 1997.
Wei Zhong Liu and Allan P. White. The importance of attribute selection measures in decision tree induction. Machine Learning, 15(1):25-41, 1994.
Albert W. Marshall and Ingram Olkin. Inequalities: Theory of Majorization and Its Applications. Academic Press, New York, 1979.
Tom M. Mitchell. Machine Learning. McGraw-Hill, New York, 1997.
Tim Oates and David D. Jensen. The effects of training set size on decision tree complexity. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 254-261, Nashville, Tennessee, 1997.
Tim Oates and David D. Jensen. Toward a theoretical understanding of why and when decision tree pruning algorithms fail. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pages 372-378, Orlando, Florida, 1999.
Foster Provost and Pedro Domingos. Tree induction for probability-based ranking. Machine Learning, 52(3):199-215, 2003.
J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81-106, 1986.
Cullen Schaffer. Overfitting avoidance as bias. Machine Learning, 10(2):153-178, 1993.
UCI KDD Archive. http://kdd.ics.uci.edu
Gary M. Weiss and Haym Hirsh. A quantitative study of small disjuncts. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 665-670, Austin, Texas, 2000.