jmlr jmlr2006 jmlr2006-22 jmlr2006-22-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francis R. Bach, David Heckerman, Eric Horvitz
Abstract: Receiver Operating Characteristic (ROC) curves are a standard way to display the performance of a set of binary classifiers for all feasible ratios of the costs associated with false positives and false negatives. For linear classifiers, the set of classifiers is typically obtained by training once, holding the estimated slope constant, and varying the intercept to obtain a parameterized set of classifiers whose performances can be plotted in the ROC plane. We consider the alternative of varying the asymmetry of the cost function used for training. We show that the ROC curve obtained by varying both the intercept and the asymmetry, and hence the slope, always outperforms the ROC curve obtained by varying only the intercept. In addition, we present a path-following algorithm for the support vector machine (SVM) that can efficiently compute the entire ROC curve and that has the same computational complexity as training a single classifier. Finally, we provide a theoretical analysis of the relationship between the asymmetric cost model assumed when training a classifier and the cost model assumed in applying the classifier. In particular, we show that the mismatch between the step function used for testing and the convex upper bounds typically used for training leads to a provable and quantifiable difference around extreme asymmetries. Keywords: support vector machines, receiver operating characteristic (ROC) analysis, linear classification
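To make the two curve-tracing strategies in the abstract concrete, the sketch below contrasts (a) training a linear SVM once and sweeping its intercept with (b) retraining under a grid of cost asymmetries, approximated here with per-class weights. This is a brute-force illustration, not the paper's path-following algorithm; the synthetic dataset, the scikit-learn choices (LinearSVC, class_weight), and the weight grid are all assumptions made for the example.

```python
# Illustrative sketch: two ways to trace an ROC curve for a linear classifier.
# Not the paper's path-following algorithm; dataset and weight grid are
# arbitrary choices for this example.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# (a) Fixed slope: train once, then sweep the decision threshold, which is
# equivalent to varying only the intercept of the learned hyperplane.
clf = LinearSVC(C=1.0, max_iter=20000).fit(X, y)
fpr_fixed, tpr_fixed, _ = roc_curve(y, clf.decision_function(X))

# (b) Varying asymmetry: retrain with different relative costs for errors on
# the two classes (class_weight approximates an asymmetric hinge loss); each
# retrained classifier contributes its own set of ROC points.
curves = []
for w in np.logspace(-2, 2, 9):  # asymmetry grid, chosen arbitrarily
    clf_w = LinearSVC(C=1.0, class_weight={0: 1.0, 1: float(w)},
                      max_iter=20000).fit(X, y)
    fpr, tpr, _ = roc_curve(y, clf_w.decision_function(X))
    curves.append(np.column_stack([fpr, tpr]))

# The paper's claim, in this picture: the upper envelope of the curves in (b)
# dominates the single curve from (a), since (a) is the special case w = 1.
all_points = np.vstack(curves)
```

Taking the upper-left envelope of all_points yields the asymmetry-swept ROC curve that, per the abstract, dominates the intercept-only curve; the paper's contribution is to obtain that curve at the cost of a single training run rather than one run per weight setting.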
References:
D. Avis, D. Bremner, and R. Seidel. How good are convex hull algorithms? Computational Geometry: Theory and Applications, 7, 1997.
F. R. Bach, D. Heckerman, and E. Horvitz. On the path to an ideal ROC curve: considering cost asymmetry in learning classifiers. In Workshop on Artificial Intelligence and Statistics (AISTATS), 2005a.
F. R. Bach, R. Thibaux, and M. I. Jordan. Computing regularization paths for learning multiple kernels. In Advances in Neural Information Processing Systems 17. MIT Press, 2005b.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Large margin classifiers: convex loss, low noise, and convergence rates. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
N. Bleistein and R. A. Handelsman. Asymptotic Expansions of Integrals. Dover, 1986.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2003.
P. A. Flach. The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In International Conference on Machine Learning (ICML), 2003.
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu. The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5:1391–1415, 2005.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
M. Hollander and D. A. Wolfe. Nonparametric Statistical Inference. John Wiley & Sons, 1999.
R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
I. Maros. Computational Techniques of the Simplex Method. Kluwer Academic Publishers, 2002.
A. Y. Ng. Preventing overfitting of cross-validation data. In International Conference on Machine Learning (ICML), 1997.
M. S. Pepe. Receiver operating characteristic methodology. Journal of the American Statistical Association, 95(449):308–311, 2000.
J. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning. MIT Press, 1998.
F. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42(3):203–231, 2001.
K. Scheinberg. An efficient implementation of an active set method for SVM. Journal of Machine Learning Research, to appear, 2006.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
S. Tong and D. Koller. Restricted Bayes optimal classifiers. In American Conference on Artificial Intelligence (AAAI-00), 2000.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004.