
16 jmlr-2005-Asymptotics in Empirical Risk Minimization

Source: pdf

Author: Leila Mohammadi, Sara van de Geer

Abstract: In this paper, we study a two-category classification problem. We indicate the categories by labels $Y = 1$ and $Y = -1$. We observe a covariate, or feature, $X \in \mathcal{X} \subset \mathbb{R}^d$. Consider a collection $\{h_a\}$ of classifiers indexed by a finite-dimensional parameter $a$, and the classifier $h_{a^*}$ that minimizes the prediction error over this class. The parameter $a^*$ is estimated by the empirical risk minimizer $\hat{a}_n$ over the class, where the empirical risk is calculated on a training sample of size $n$. We apply the Kim Pollard Theorem to show that, under certain differentiability assumptions, $\hat{a}_n$ converges to $a^*$ with rate $n^{-1/3}$, and also present the asymptotic distribution of the renormalized estimator. For example, let $V_0$ denote the set of $x$ on which, given $X = x$, the label $Y = 1$ is more likely than the label $Y = -1$. If $X$ is one-dimensional, the set $V_0$ is the union of disjoint intervals. The problem is then to estimate the thresholds of the intervals. We obtain the asymptotic distribution of the empirical risk minimizer when the classifiers have $K$ thresholds, where $K$ is fixed. We furthermore consider an extension to higher-dimensional $X$, assuming basically that $V_0$ has a smooth boundary in some given parametric class. We also discuss various rates of convergence when the differentiability conditions are possibly violated. Here, we again restrict ourselves to one-dimensional $X$. We show that the rate is $n^{-1}$ in certain cases, and then also obtain the asymptotic distribution for the empirical prediction error.

Keywords: asymptotic distribution, classification theory, estimation error, nonparametric models, threshold-based classifiers
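As a rough illustration of the one-dimensional setting in the abstract, the following sketch computes an empirical risk minimizer for a classifier with a single threshold (K = 1). It is not taken from the paper: the function names `empirical_risk`, `erm_threshold`, and `sample_data` are hypothetical, and the data model (X uniform on [0, 1] with P(Y = 1 | X = x) = x, so that the best threshold is 0.5) is an assumption chosen only to make the example concrete.

```python
import random


def empirical_risk(a, xs, ys):
    """Fraction of sample points misclassified by h_a(x) = +1 if x >= a else -1."""
    errors = sum(1 for x, y in zip(xs, ys) if (1 if x >= a else -1) != y)
    return errors / len(xs)


def erm_threshold(xs, ys):
    """Return (a_hat, risk): a threshold minimizing the empirical risk.

    The empirical risk is piecewise constant in a and can only change at
    observed x values, so checking each distinct x plus one value above
    the maximum covers every possible labeling of the sample.
    Ties are broken in favor of the smallest candidate.
    This brute-force search is O(n^2); it is meant for clarity, not speed.
    """
    candidates = sorted(set(xs)) + [max(xs) + 1.0]
    best = min(candidates, key=lambda a: empirical_risk(a, xs, ys))
    return best, empirical_risk(best, xs, ys)


def sample_data(n, rng):
    """Draw n points from the assumed model: X ~ U[0, 1], P(Y = 1 | X = x) = x."""
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < x else -1 for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 400, 1600):
        xs, ys = sample_data(n, rng)
        a_hat, risk = erm_threshold(xs, ys)
        # Under this assumed model the population minimizer is a* = 0.5.
        print(n, round(a_hat, 3), round(risk, 3))
```

Repeating the simulation many times for each n and looking at how the spread of the estimated threshold around 0.5 shrinks gives an informal feel for the convergence rate discussed in the abstract; the sketch itself makes no claim about reproducing the paper's results.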

