jmlr jmlr2007 jmlr2007-16 jmlr2007-16-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Mease, Abraham J. Wyner, Andreas Buja
Abstract: The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1|x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1|x]. We first examine whether the latter problem, estimation of P[y = 1|x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1|x] even though they perform well as classifiers. A major negative point of the present article is the disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data (“JOUS-Boost”). This algorithm is simple, yet successful, and it preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. The estimates of the class probabilities compare favorably to those obtained by a variety of methods across both simulated and real data sets.
Keywords: boosting algorithms, LogitBoost, AdaBoost, class probability estimation, over-sampling, under-sampling, stratification, data jittering
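To make the resampling idea in the abstract concrete, the following is a minimal sketch of how over/under-sampling toward a target quantile q, jittering of the duplicated points, and an off-the-shelf AdaBoost implementation could be combined, and how a grid of such quantile classifiers yields a rough estimate of P[y = 1|x]. The function names, the integer replication factors, the jitter scale, and the use of scikit-learn's AdaBoostClassifier are illustrative assumptions, not the authors' JOUS-Boost code or tuning.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier


def jous_resample(X, y, q, jitter_scale=0.01, rng=None):
    """Replicate the two classes (labels in {-1, +1}) so that thresholding
    at 1/2 on the resampled data approximates thresholding the original
    P[y = 1|x] at q, and jitter the duplicates to break exact ties."""
    rng = np.random.default_rng(rng)
    pos, neg = X[y == 1], X[y == -1]
    # Replication factors proportional to (1 - q) for the positive class
    # and q for the negative class (rounded to small integers).
    k_pos = max(1, int(round((1 - q) / min(q, 1 - q))))
    k_neg = max(1, int(round(q / min(q, 1 - q))))
    X_new = np.vstack([np.repeat(pos, k_pos, axis=0),
                       np.repeat(neg, k_neg, axis=0)])
    y_new = np.concatenate([np.ones(len(pos) * k_pos),
                            -np.ones(len(neg) * k_neg)])
    # Add small noise so the boosted trees do not see exact duplicates.
    X_new = X_new + jitter_scale * rng.standard_normal(X_new.shape)
    return X_new, y_new


def fit_quantile_classifier(X, y, q, n_estimators=200, rng=None):
    """Off-the-shelf AdaBoost trained on the q-resampled data."""
    Xq, yq = jous_resample(X, y, q, rng=rng)
    return AdaBoostClassifier(n_estimators=n_estimators).fit(Xq, yq)


def estimate_class_probabilities(X, y, X_test, grid=None):
    """Crude estimate of P[y = 1|x]: the fraction of quantile classifiers
    on a uniform grid that still vote +1 at x."""
    if grid is None:
        grid = np.linspace(0.1, 0.9, 9)
    votes = np.column_stack([fit_quantile_classifier(X, y, q).predict(X_test)
                             for q in grid])
    return (votes == 1).mean(axis=1)

The replication factors follow the standard cost-sensitive identity: duplicating the positive class in proportion to (1 - q) and the negative class in proportion to q moves the 1/2 classification boundary of the resampled problem to the q-quantile of the original one, and averaging the +1 votes over a uniform grid of q values then gives a coarse class probability estimate.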
P. J. Bickel, Y. Ritov, and A. Zakai. Some theory for generalized boosting algorithms. Journal of Machine Learning Research, 7:705–732, 2006.
G. Blanchard, G. Lugosi, and N. Vayatis. On the rate of convergence of regularized boosting classifiers. Journal of Machine Learning Research, 4:861–894, 2003.
A. Buja, W. Stuetzle, and Y. Shen. Loss functions for binary class probability estimation and classification: Structure and applications. 2006.
P. Chan and S. Stolfo. Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pages 164–168, 1998.
N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 107–119, 2003.
W. W. Cohen and Y. Singer. A simple, fast, and effective rule learner. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI), pages 335–342, 1999.
M. Collins, R. E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. In Computational Learning Theory, pages 158–169, 2000.
M. Dettling and P. Bühlmann. Boosting for tumor classification with gene expression data. Bioinformatics, 19:1061–1069, 2003.
N. Duffy and D. Helmbold. Potential boosters? In Advances in Neural Information Processing Systems, pages 258–264, 1999.
C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), pages 973–978, 2001.
A. Estabrooks, T. Jo, and N. Japkowicz. A multiple resampling method for learning from imbalanced data sets. Computational Intelligence, 20:18–36, 2004.
W. Fan, S. Stolfo, J. Zhang, and P. Chan. AdaCost: Misclassification cost-sensitive boosting. In Proceedings of the 16th International Conference on Machine Learning, pages 97–105, 1999.
D. P. Foster and R. A. Stine. Variable selection in data mining: Building a predictive model for bankruptcy. Journal of the American Statistical Association, 99:303–313, 2004.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148–156, 1996.
Y. Freund and R. E. Schapire. Discussion of three papers regarding the asymptotic consistency of boosting. Annals of Statistics, 32:113–117, 2004.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28:337–374, 2000.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.
W. Jiang. Process consistency for AdaBoost. Annals of Statistics, 32:13–29, 2004.
W. Jiang. Does boosting overfit: Views from an exact solution. Technical Report 00-03, Department of Statistics, Northwestern University, 2000.
M. Joshi, V. Kumar, and R. Agarwal. Evaluating boosting algorithms to classify rare classes: Comparison and improvements. In Proceedings of the First IEEE International Conference on Data Mining (ICDM), pages 257–264, 2001.
G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In Advances in Neural Information Processing Systems, 2001.
Y. Liu, Y. Yang, and J. Carbonell. Boosting to correct inductive bias in text classification. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 348–355, 2002.
G. Lugosi and N. Vayatis. On the Bayes-risk consistency of regularized boosting methods. Annals of Statistics, 32:30–55, 2004.
D. Mease and A. Wyner. Evidence contrary to the statistical view of boosting. 2007.
S. Rosset, J. Zhu, and T. Hastie. Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5:941–973, 2004.
L. J. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66:783–801, 1971.
K. M. Ting. A comparative study of cost-sensitive boosting algorithms. In Proceedings of the 17th International Conference on Machine Learning, pages 983–990, 2000.
B. Zadrozny and C. Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 609–616, 2001.
T. Zhang and B. Yu. Boosting with early stopping: Convergence and consistency. Annals of Statistics, 33:1538–1579, 2005.