jmlr jmlr2006 jmlr2006-22 jmlr2006-22-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francis R. Bach, David Heckerman, Eric Horvitz
Abstract: Receiver Operating Characteristic (ROC) curves are a standard way to display the performance of a set of binary classifiers for all feasible ratios of the costs associated with false positives and false negatives. For linear classifiers, the set of classifiers is typically obtained by training once, holding the estimated slope constant, and varying the intercept to obtain a parameterized set of classifiers whose performances can be plotted in the ROC plane. We consider the alternative of varying the asymmetry of the cost function used for training. We show that the ROC curve obtained by varying both the intercept and the asymmetry, and hence the slope, always outperforms the ROC curve obtained by varying only the intercept. In addition, we present a path-following algorithm for the support vector machine (SVM) that can efficiently compute the entire ROC curve and that has the same computational complexity as training a single classifier. Finally, we provide a theoretical analysis of the relationship between the asymmetric cost model assumed when training a classifier and the cost model assumed in applying the classifier. In particular, we show that the mismatch between the step function used for testing and the convex upper bounds typically used for training leads to a provable and quantifiable difference around extreme asymmetries. Keywords: support vector machines, receiver operating characteristic (ROC) analysis, linear classification
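To make the two curve-tracing strategies in the abstract concrete, the sketch below contrasts (a) training a linear SVM once and sweeping its intercept with (b) retraining under a grid of cost asymmetries, approximated here with per-class weights. This is a brute-force illustration, not the paper's path-following algorithm; the synthetic dataset, the scikit-learn choices (LinearSVC, class_weight), and the weight grid are all assumptions made for the example.

```python
# Illustrative sketch: two ways to trace an ROC curve for a linear classifier.
# Not the paper's path-following algorithm; dataset and weight grid are
# arbitrary choices for this example.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# (a) Fixed slope: train once, then sweep the decision threshold, which is
# equivalent to varying only the intercept of the learned hyperplane.
clf = LinearSVC(C=1.0, max_iter=20000).fit(X, y)
fpr_fixed, tpr_fixed, _ = roc_curve(y, clf.decision_function(X))

# (b) Varying asymmetry: retrain with different relative costs for errors on
# the two classes (class_weight approximates an asymmetric hinge loss); each
# retrained classifier contributes its own set of ROC points.
curves = []
for w in np.logspace(-2, 2, 9):  # asymmetry grid, chosen arbitrarily
    clf_w = LinearSVC(C=1.0, class_weight={0: 1.0, 1: float(w)},
                      max_iter=20000).fit(X, y)
    fpr, tpr, _ = roc_curve(y, clf_w.decision_function(X))
    curves.append(np.column_stack([fpr, tpr]))

# The paper's claim, in this picture: the upper envelope of the curves in (b)
# dominates the single curve from (a), since (a) is the special case w = 1.
all_points = np.vstack(curves)
```

Taking the upper-left envelope of all_points yields the asymmetry-swept ROC curve that, per the abstract, dominates the intercept-only curve; the paper's contribution is to obtain that curve at the cost of a single training run rather than one run per weight setting.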
References:
D. Avis, D. Bremner, and R. Seidel. How good are convex hull algorithms? Computational Geometry: Theory and Applications, 7, 1997.
F. R. Bach, D. Heckerman, and E. Horvitz. On the path to an ideal ROC curve: considering cost asymmetry in learning classifiers. In Workshop on Artificial Intelligence and Statistics (AISTATS), 2005a.
F. R. Bach, R. Thibaux, and M. I. Jordan. Computing regularization paths for learning multiple kernels. In Advances in Neural Information Processing Systems 17. MIT Press, 2005b.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Large margin classifiers: convex loss, low noise, and convergence rates. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
N. Bleistein and R. A. Handelsman. Asymptotic Expansions of Integrals. Dover, 1986.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2003.
P. A. Flach. The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In International Conference on Machine Learning (ICML), 2003.
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu. The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5:1391–1415, 2005.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
M. Hollander and D. A. Wolfe. Nonparametric Statistical Inference. John Wiley & Sons, 1999.
R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
I. Maros. Computational Techniques of the Simplex Method. Kluwer Academic Publishers, 2002.
A. Y. Ng. Preventing overfitting of cross-validation data. In International Conference on Machine Learning (ICML), 1997.
M. S. Pepe. Receiver operating characteristic methodology. Journal of the American Statistical Association, 95(449):308–311, 2000.
J. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning. MIT Press, 1998.
F. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42(3):203–231, 2001.
K. Scheinberg. An efficient implementation of an active set method for SVM. Journal of Machine Learning Research, to appear, 2006.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
S. Tong and D. Koller. Restricted Bayes optimal classifiers. In American Conference on Artificial Intelligence (AAAI-00), 2000.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004.