jmlr jmlr2007 jmlr2007-16 jmlr2007-16-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Mease, Abraham J. Wyner, Andreas Buja
Abstract: The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1|x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1|x]. We first examine whether the latter problem, estimation of P[y = 1|x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1|x] even though they perform well as classifiers. A major negative point of the present article is the disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data (“JOUS-Boost”). This algorithm is simple, yet successful, and it preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. The estimates of the class probabilities compare favorably to those obtained by a variety of methods across both simulated and real data sets.
Keywords: boosting algorithms, LogitBoost, AdaBoost, class probability estimation, over-sampling, under-sampling, stratification, data jittering
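To make the resampling idea in the abstract concrete, the following is a minimal sketch of how over/under-sampling toward a target quantile q, jittering of the duplicated points, and an off-the-shelf AdaBoost implementation could be combined, and how a grid of such quantile classifiers yields a rough estimate of P[y = 1|x]. The function names, the integer replication factors, the jitter scale, and the use of scikit-learn's AdaBoostClassifier are illustrative assumptions, not the authors' JOUS-Boost code or tuning.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier


def jous_resample(X, y, q, jitter_scale=0.01, rng=None):
    """Replicate the two classes (labels in {-1, +1}) so that thresholding
    at 1/2 on the resampled data approximates thresholding the original
    P[y = 1|x] at q, and jitter the duplicates to break exact ties."""
    rng = np.random.default_rng(rng)
    pos, neg = X[y == 1], X[y == -1]
    # Replication factors proportional to (1 - q) for the positive class
    # and q for the negative class (rounded to small integers).
    k_pos = max(1, int(round((1 - q) / min(q, 1 - q))))
    k_neg = max(1, int(round(q / min(q, 1 - q))))
    X_new = np.vstack([np.repeat(pos, k_pos, axis=0),
                       np.repeat(neg, k_neg, axis=0)])
    y_new = np.concatenate([np.ones(len(pos) * k_pos),
                            -np.ones(len(neg) * k_neg)])
    # Add small noise so the boosted trees do not see exact duplicates.
    X_new = X_new + jitter_scale * rng.standard_normal(X_new.shape)
    return X_new, y_new


def fit_quantile_classifier(X, y, q, n_estimators=200, rng=None):
    """Off-the-shelf AdaBoost trained on the q-resampled data."""
    Xq, yq = jous_resample(X, y, q, rng=rng)
    return AdaBoostClassifier(n_estimators=n_estimators).fit(Xq, yq)


def estimate_class_probabilities(X, y, X_test, grid=None):
    """Crude estimate of P[y = 1|x]: the fraction of quantile classifiers
    on a uniform grid that still vote +1 at x."""
    if grid is None:
        grid = np.linspace(0.1, 0.9, 9)
    votes = np.column_stack([fit_quantile_classifier(X, y, q).predict(X_test)
                             for q in grid])
    return (votes == 1).mean(axis=1)

The replication factors follow the standard cost-sensitive identity: duplicating the positive class in proportion to (1 - q) and the negative class in proportion to q moves the 1/2 classification boundary of the resampled problem to the q-quantile of the original one, and averaging the +1 votes over a uniform grid of q values then gives a coarse class probability estimate.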
P. J. Bickel, Y. Ritov, and A. Zakai. Some theory for generalized boosting algorithms. Journal of Machine Learning Research, 7:705–732, 2006.
G. Blanchard, G. Lugosi, and N. Vayatis. On the rate of convergence of regularized boosting classifiers. Journal of Machine Learning Research, 4:861–894, 2003.
A. Buja, W. Stuetzle, and Y. Shen. Loss functions for binary class probability estimation and classification: Structure and applications. 2006.
P. Chan and S. Stolfo. Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pages 164–168, 1998.
N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 107–119, 2003.
W. W. Cohen and Y. Singer. A simple, fast, and effective rule learner. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI), pages 335–342, 1999.
M. Collins, R. E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. In Computational Learning Theory, pages 158–169, 2000.
M. Dettling and P. Bühlmann. Boosting for tumor classification with gene expression data. Bioinformatics, 19:1061–1069, 2003.
N. Duffy and D. Helmbold. Potential boosters? In Advances in Neural Information Processing Systems, pages 258–264, 1999.
C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), pages 973–978, 2001.
A. Estabrooks, T. Jo, and N. Japkowicz. A multiple resampling method for learning from imbalanced data sets. Computational Intelligence, 20:18–36, 2004.
W. Fan, S. Stolfo, J. Zhang, and P. Chan. AdaCost: Misclassification cost-sensitive boosting. In Proceedings of the 16th International Conference on Machine Learning, pages 97–105, 1999.
D. P. Foster and R. A. Stine. Variable selection in data mining: Building a predictive model for bankruptcy. Journal of the American Statistical Association, 99:303–313, 2004.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148–156, 1996.
Y. Freund and R. E. Schapire. Discussion of three papers regarding the asymptotic consistency of boosting. Annals of Statistics, 32:113–117, 2004.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28:337–374, 2000.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.
W. Jiang. Process consistency for AdaBoost. Annals of Statistics, 32:13–29, 2004.
W. Jiang. Does boosting overfit: Views from an exact solution. Technical Report 00-03, Department of Statistics, Northwestern University, 2000.
M. Joshi, V. Kumar, and R. Agarwal. Evaluating boosting algorithms to classify rare classes: Comparison and improvements. In Proceedings of the First IEEE International Conference on Data Mining (ICDM), pages 257–264, 2001.
G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In Advances in Neural Information Processing Systems, 2001.
Y. Liu, Y. Yang, and J. Carbonell. Boosting to correct inductive bias in text classification. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 348–355, 2002.
G. Lugosi and N. Vayatis. On the Bayes-risk consistency of regularized boosting methods. Annals of Statistics, 32:30–55, 2004.
D. Mease and A. Wyner. Evidence contrary to the statistical view of boosting. 2007.
S. Rosset, J. Zhu, and T. Hastie. Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5:941–973, 2004.
L. J. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66:783–801, 1971.
K. M. Ting. A comparative study of cost-sensitive boosting algorithms. In Proceedings of the 17th International Conference on Machine Learning, pages 983–990, 2000.
B. Zadrozny and C. Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 609–616, 2001.
T. Zhang and B. Yu. Boosting with early stopping: Convergence and consistency. Annals of Statistics, 33:1538–1579, 2005.