
42 jmlr-2007-Infinitely Imbalanced Logistic Regression


Source: pdf

Author: Art B. Owen

Abstract: In binary classification problems it is common for the two classes to be imbalanced: one case is very rare compared to the other. In this paper we consider the infinitely imbalanced case where one class has a finite sample size and the other class's sample size grows without bound. For logistic regression, the infinitely imbalanced case often has a useful solution. Under mild conditions, the intercept diverges as expected, but the rest of the coefficient vector approaches a nontrivial and useful limit. That limit can be expressed in terms of exponential tilting and is the minimum of a convex objective function. The limiting form of logistic regression suggests a computational shortcut for fraud detection problems.

Keywords: classification, drug discovery, fraud detection, rare events, unbalanced data
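The abstract's claim about exponential tilting and a convex objective can be illustrated with a small one-dimensional sketch. It assumes, following the paper's main result as best recalled here and not anything stated in the abstract itself, that the limiting slope beta makes the exponentially tilted mean of the common class equal to the sample mean of the rare class, which is the stationarity condition of the convex function log mean(exp((x - xbar) * beta)). The names `tilted_moments`, `limiting_slope`, `xs_common`, and `xbar_rare` are hypothetical, the common class is replaced by a simulated standard normal sample, and Newton's method is one illustrative choice of solver.

```python
import math
import random


def tilted_moments(xs, beta):
    """Mean and variance of xs under weights proportional to exp(beta * x)."""
    # Subtract the largest exponent before exponentiating, for stability.
    shift = max(beta * x for x in xs)
    ws = [math.exp(beta * x - shift) for x in xs]
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    return mean, var


def limiting_slope(xs_common, xbar_rare, tol=1e-10, max_iter=50):
    """Newton's method on the convex function log mean(exp((x - xbar) * beta)).

    Its derivative is (tilted mean - xbar) and its second derivative is the
    tilted variance, so each step is grad / var.
    Assumption: xbar_rare lies strictly inside the range of xs_common.
    """
    beta = 0.0
    for _ in range(max_iter):
        mean, var = tilted_moments(xs_common, beta)
        grad = mean - xbar_rare
        if abs(grad) < tol:
            break
        beta -= grad / var
    return beta


# Simulated stand-in for the common class (assumption: standard normal).
random.seed(0)
xs_common = [random.gauss(0.0, 1.0) for _ in range(20000)]
xbar_rare = 1.0  # hypothetical mean of the rare-class observations

beta_hat = limiting_slope(xs_common, xbar_rare)
print(beta_hat)
```

For a standard normal common class, tilting by exp(beta * x) shifts the mean to beta, so the printed slope should land close to `xbar_rare`. Under the stated assumption, the rare class enters only through its mean, which is one way to read the computational shortcut the abstract mentions.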


reference text

R. J. Bolton and D. J. Hand. Statistical fraud detection: A review. Statistical Science, 17(3):235–255, 2002.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
N. V. Chawla, N. Japkowicz, and A. Kolcz. Proceedings of the ICML'2003 Workshop on Learning from Imbalanced Data Sets. 2003.
N. V. Chawla, N. Japkowicz, and A. Kolcz. Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 6(1):1–6, 2004.
D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.
N. Japkowicz. Learning from Imbalanced Data Sets: Papers from the AAAI Workshop. AAAI, 2000. Technical Report WS-00-05.
G. King and L. Zeng. Logistic regression in rare events data. Political Analysis, 9(2):137–163, 2001.
M. J. Silvapulle. On the existence of maximum likelihood estimates for the binomial response models. Journal of the Royal Statistical Society, Series B, 43:310–313, 1981.
S. Tong. Active learning: Theory and applications. PhD thesis, Stanford University, 2001. URL http://ai.stanford.edu/~stong/research.html/tong thesis.pdf.
M. Zhu, W. Su, and H. A. Chipman. LAGO: A computationally efficient approach for statistical detection. Technometrics, 48:193–205, 2005.