jmlr jmlr2013 jmlr2013-31 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kris De Brabanter, Jos De Brabanter, Bart De Moor, Irène Gijbels
Abstract: We present a fully automated framework to estimate derivatives nonparametrically without estimating the regression function. Derivative estimation plays an important role in the exploration of structures in curves (jump detection and discontinuities), comparison of regression curves, analysis of human growth data, etc. Hence, the study of estimating derivatives is as important as regression estimation itself. Via empirical derivatives we approximate the qth order derivative and create a new data set which can be smoothed by any nonparametric regression estimator. We derive L1 and L2 rates and establish consistency of the estimator. The new data sets created by this technique are no longer independent and identically distributed (i.i.d.) random variables. As a consequence, automated model selection criteria (data-driven procedures) break down. Therefore, we propose a simple factor method, based on bimodal kernels, to effectively deal with correlated data in the local polynomial regression framework. Keywords: nonparametric derivative estimation, model selection, empirical derivative, factor rule
Reference: text
sentIndex sentText sentNum sentScore
1 We present a fully automated framework to estimate derivatives nonparametrically without estimating the regression function. [sent-13, score-0.298]
2 Hence, the study of estimating derivatives is as important as regression estimation itself. [sent-15, score-0.334]
3 Via empirical derivatives we approximate the qth order derivative and create a new data set which can be smoothed by any nonparametric regression estimator. [sent-16, score-0.915]
4 Therefore, we propose a simple factor method, based on bimodal kernels, to effectively deal with correlated data in the local polynomial regression framework. [sent-23, score-0.464]
5 Keywords: nonparametric derivative estimation, model selection, empirical derivative, factor rule [sent-24, score-0.449]
6 1. Introduction. The next section describes previous methods and objectives for nonparametric derivative estimation. [sent-25, score-0.4]
7 Although the importance of regression estimation is indisputable, sometimes the first or higher order derivatives of the regression function can be equally important. [sent-38, score-0.477]
8 Also, estimation of derivatives of the regression function is required for plug-in bandwidth selection strategies (Wand and Jones, 1995) and in the construction of confidence intervals (Eubank and Speckman, 1993). [sent-42, score-0.471]
9 the independent variable to obtain the first order derivative of the regression function. [sent-46, score-0.468]
10 Otherwise, it can lead to wrong derivative estimates when the data is noisy. [sent-48, score-0.325]
11 In the literature there are two main approaches to nonparametric derivative estimation: Regression/smoothing splines and local polynomial regression. [sent-50, score-0.598]
12 In the context of derivative estimation, Stone (1985) has shown that spline derivative estimators can achieve the optimal L2 rate of convergence. [sent-51, score-0.686]
13 (2004) suggested an empirical bias bandwidth criterion to estimate the first derivative via semiparametric penalized splines. [sent-57, score-0.66]
14 Early works discussing kernel-based derivative estimation include Gasser and Müller (1984) and Härdle and Gasser (1985). [sent-58, score-0.514]
15 (1987) and Härdle (1990) proposed a generalized version of the cross-validation technique to estimate the first derivative via kernel smoothing using difference quotients. [sent-60, score-0.492]
16 (1987) also proposed a factor method to estimate a derivative via kernel smoothing. [sent-65, score-0.367]
17 In the case of local polynomial regression (Fan and Gijbels, 1996), the estimation of the qth derivative is straightforward. [sent-67, score-0.739]
18 One can estimate m^(q)(x) via the intercept coefficient of the qth derivative (local slope) of the local polynomial being fitted at x, assuming that the degree p is larger than or equal to q. [sent-68, score-0.594]
19 Note that this estimate of the derivative is, in general, not equal to the qth derivative of the estimated regression function m̂(x). [sent-69, score-0.893]
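To make this route concrete, here is a minimal Python sketch of the standard local polynomial construction m̂^(q)(x0) = q!·β̂_q from a weighted least-squares fit of degree p ≥ q; the Gaussian kernel, the fixed bandwidth and the function name are illustrative choices, not the paper's implementation.

```python
import numpy as np
from math import factorial

def local_poly_derivative(x, y, x0, q=1, p=3, h=0.1):
    """Estimate m^(q)(x0) as q! * beta_q from a degree-p local polynomial
    fitted around x0 by weighted least squares (requires p >= q)."""
    u = x - x0
    w = np.exp(-0.5 * (u / h) ** 2)              # Gaussian kernel weights (illustrative)
    X = np.vander(u, N=p + 1, increasing=True)   # columns 1, u, u^2, ..., u^p
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    return factorial(q) * beta[q]
```

Evaluating this at a grid of points traces out the whole derivative curve.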
20 As mentioned before, two problems inherently present in nonparametric derivative estimation are the unavailability of the data for derivative estimation (only regression data is given) and bandwidth or smoothing selection. [sent-72, score-1.116]
21 In what follows we investigate a new way to compute derivatives of the regression function given the data (x_1, Y_1), ..., (x_n, Y_n). [sent-73, score-0.298]
22 , 2011) and derive a factor method based on bimodal kernels to estimate the derivatives of the unknown regression function. [sent-80, score-0.502]
23 Section 2 illustrates the principle of empirical first order derivatives and their use within the local polynomial regression framework. [sent-83, score-0.516]
24 We derive bias and variance of empirical first order derivatives and establish pointwise consistency. [sent-84, score-0.444]
25 Further, the behavior at the boundaries of empirical first order derivatives is described. [sent-85, score-0.315]
26 Section 3 generalizes the idea of empirical first order derivatives to higher order derivatives. [sent-86, score-0.306]
27 In Section 5 we conduct a Monte Carlo experiment to compare the proposed method with two often used methods for derivative estimation. [sent-88, score-0.325]
28 Suppose that the (p + 1)th derivative of m exists at the point x0. [sent-100, score-0.325]
29 2. Derivative Estimation. In this section we first illustrate the principle of empirical first order derivatives and how they can be used within the local polynomial regression framework to estimate first order derivatives of the unknown regression function. [sent-129, score-0.848]
30 Such a procedure can lead to wrong derivative estimates when the data is noisy and will deteriorate quickly when calculating higher order derivatives. [sent-135, score-0.391]
31 (1987) and Härdle (1990) to estimate first order derivatives via kernel smoothing. [sent-138, score-0.317]
32 Such an approach produces a very noisy estimate of the derivative, whose variance is of order O(n^2), and as a result it will be difficult to estimate the derivative function. [sent-139, score-0.716]
33 In order to reduce the variance we use a variance-reducing linear combination of symmetric (about i) difference quotients
$$Y_i^{(1)} = Y^{(1)}(x_i) = \sum_{j=1}^{k} w_j \cdot \frac{Y_{i+j} - Y_{i-j}}{x_{i+j} - x_{i-j}}, \qquad (4)$$
where the weights w_1, ..., w_k sum up to one. [sent-141, score-0.368]
34 Then, for k + 1 ≤ i ≤ n − k, the weights $w_j = \frac{6 j^{2}}{k(k+1)(2k+1)}$, j = 1, ..., k, minimize the variance of $Y_i^{(1)}$. Proof: see Appendix A. [sent-162, score-0.267]
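A small Python sketch of the empirical first derivative in (4) at interior points, using weights proportional to j² and normalized to sum to one (the closed form follows from the first-order condition solved in Appendix A); the function name and interface are illustrative.

```python
import numpy as np

def empirical_first_derivative(x, y, k):
    """Empirical first derivative (Eq. 4) at interior points k+1 <= i <= n-k
    (0-based: k <= i <= n-k-1); boundary points are left as NaN here.
    Weights w_j = 6 j^2 / (k (k+1) (2k+1)) are proportional to j^2 and sum to one."""
    n = len(x)
    j = np.arange(1, k + 1)
    w = 6.0 * j**2 / (k * (k + 1) * (2 * k + 1))
    d1 = np.full(n, np.nan)
    for i in range(k, n - k):
        d1[i] = np.sum(w * (y[i + j] - y[i - j]) / (x[i + j] - x[i - j]))
    return d1
```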
35 Figure 1a displays the empirical first derivative for k ∈ {2, 5, 7, 12} generated from model (1) with m(x) = x(1 − x) sin((2. [sent-167, score-0.374]
36 Even for a small k, it can be seen that the empirical first order derivatives are noise-corrupted versions of the true derivative m′. [sent-173, score-0.597]
37 In contrast, difference quotients produce an extremely noisy version of the true derivative (Figure 1b). [sent-174, score-0.424]
38 When k is large, empirical first derivatives are biased near local extrema of the true derivative (see Figure 1f). [sent-176, score-0.609]
39 The next two theorems give asymptotic results on the bias and variance and establish pointwise consistency of the empirical first order derivatives. [sent-178, score-0.289]
40 Further, assume that the second order derivative m(2) is finite on X . [sent-180, score-0.359]
41 Then the bias and variance of the empirical first order derivative, with weights assigned by Proposition 1, satisfy $\mathrm{bias}(Y_i^{(1)}) = O(n^{-1}k)$ and $\mathrm{Var}(Y_i^{(1)}) = O(n^{2}k^{-3})$ uniformly for k + 1 ≤ i ≤ n − k. [sent-181, score-0.311]
42 [Figure 1, panels (b) difference quotient and (f) empirical derivative (k = 12); vertical axes show the first derivative.]
43 Figure 1: (a) Simulated data set of size n = 300 equispaced points from model (1) with m(x) = x(1 − x) sin((2. [sent-244, score-0.541]
44 As a reference, the true derivative is also displayed (full line); (c)-(f) empirical first derivatives for k ∈ {2, 5, 7, 12}. [sent-248, score-0.563]
45 According to Theorem 2 and Theorem 3, the bias and variance of the empirical first order derivative tend to zero provided k → ∞ faster than $O(n^{2/3})$ but slower than O(n). [sent-249, score-0.58]
46 The rate at which k → ∞ such that the mean squared error (MSE) of the empirical first order derivatives tends to zero as fast as possible is a direct consequence of Theorem 2. [sent-250, score-0.272]
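As a sketch of that consequence for q = 1 (a calculation balancing the orders in Theorem 2, not a quotation of the paper's own MSE expression):

```latex
\operatorname{MSE}\bigl(Y_i^{(1)}\bigr)
 = O\!\bigl(n^{-2}k^{2}\bigr) + O\!\bigl(n^{2}k^{-3}\bigr),
\qquad
n^{-2}k^{2} \asymp n^{2}k^{-3}
\;\Longrightarrow\;
k \asymp n^{4/5},
\qquad
\operatorname{MSE}\bigl(Y_i^{(1)}\bigr) = O\!\bigl(n^{-2/5}\bigr).
```

This matches the general prescription $k_q = O(n^{2(q+1)/(2q+3)})$ quoted in Section 3 at q = 1.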
47 In order to derive a suitable expression for the MSE, we start from the bias and variance expressions for the empirical derivatives. [sent-259, score-0.255]
48 For the second unknown quantity B one can use a local polynomial regression estimate of order p = 3, leading to the following (rough) estimate of the second derivative: $\hat{m}^{(2)}(x_0) = 2\hat{\beta}_2$ (see also Section 1). [sent-273, score-0.603]
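In terms of the hypothetical local_poly_derivative sketch above, this rule-of-thumb step would read as follows; x, y, x0 and the pilot bandwidth h_pilot are placeholders.

```python
# Rough plug-in estimate of the second derivative via a local cubic fit:
# q = 2, p = 3 gives factorial(2) * beta_2 = 2 * beta_2.
m2_rough = local_poly_derivative(x, y, x0, q=2, p=3, h=h_pilot)
```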
49 2.2 Behavior At The Boundaries. Recall that for the boundary region (2 ≤ i ≤ k and n − k + 1 ≤ i ≤ n − 1) the weights in the derivative (4) and the range of the sum are slightly modified. [sent-279, score-0.423]
50 By noticing that all even orders of the derivative cancel out, the previous result can be written as
$$E\big(Y_i^{(1)}\big) = \Big(\sum_{j} w_j\Big)\, m'(x_i) + \sum_{j} w_j \sum_{l=3,5,\ldots} \Big(\frac{j\, d(\mathcal{X})}{n-1}\Big)^{l-1} \frac{m^{(l)}(x_i)}{l!}. [sent-289, score-0.325]
51 This immediately follows from the definition of the derivative in (4). [sent-299, score-0.325]
52 Then, the bias of the derivative (4) is given by
$$\mathrm{bias}\big(Y_i^{(1)}\big) = (\kappa - 1)\, m'(x_i) + \sum_{l=3,5,\ldots}^{q} \sum_{j} \frac{w_j\, j^{l-1}\, d(\mathcal{X})^{l-1}}{(n-1)^{l-1}\, l!}\, m^{(l)}(x_i) + O\big(n^{-q/5}\big). [sent-302, score-0.443]
53 However, in order to obtain an automatic bias correction at the boundaries, we can make κ = 1 by normalizing the sum, leading to the following estimator
$$Y_i^{(1)} = \sum_{j=1}^{k(i)} \frac{w_j}{\sum_{j'=1}^{k(i)} w_{j'}} \cdot \frac{Y_{i+j} - Y_{i-j}}{x_{i+j} - x_{i-j}} \qquad (8)$$
at the boundaries. [sent-308, score-0.342]
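A sketch of how (8) modifies the interior formula near the edges; the choice k(i) = min(i, n−1−i, k) is my reading of the boundary region described above, and the function name is illustrative.

```python
import numpy as np

def empirical_first_derivative_boundary(x, y, k):
    """Empirical first derivative with the boundary correction of Eq. (8):
    near the edges only k(i) symmetric pairs fit inside the sample, and the
    weights are renormalized so that they sum to one (kappa = 1)."""
    n = len(x)
    d1 = np.full(n, np.nan)
    for i in range(1, n - 1):                    # boundary region 2 <= i <= n-1 (1-based)
        ki = min(i, n - 1 - i, k)                # k(i): number of usable symmetric pairs
        j = np.arange(1, ki + 1)
        w = j.astype(float) ** 2
        w /= w.sum()                             # normalization enforces kappa = 1
        d1[i] = np.sum(w * (y[i + j] - y[i - j]) / (x[i + j] - x[i - j]))
    return d1
```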
54 3. Higher Order Empirical Derivatives. In this section, we generalize the idea of first order empirical derivatives to higher order derivatives. [sent-317, score-0.306]
55 Let q denote the order of the derivative and assume further that q ≥ 2; then higher order empirical derivatives can be defined inductively as
$$Y_i^{(l)} = \sum_{j=1}^{k_l} w_{j,l} \cdot \frac{Y_{i+j}^{(l-1)} - Y_{i-j}^{(l-1)}}{x_{i+j} - x_{i-j}} \qquad (9)$$
with l ∈ {2, ..., q}. [sent-318, score-0.819]
56 As with the first order empirical derivative, a boundary issue arises with expression (9) when $i < \sum_{l=1}^{q} k_l + 1$ or $i > n - \sum_{l=1}^{q} k_l$. [sent-325, score-0.501]
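A recursive Python sketch of (9). The exact weights w_{j,l} for levels l ≥ 2 are not fully recoverable from this extract, so the weight choices below (Proposition 1 weights at level 1, weights proportional to j at higher levels, read off the appendix fragments) should be treated as assumptions; the helper names are illustrative.

```python
import numpy as np

def _weighted_quotients(x, vals, k, w):
    """One level of the inductive rule (9): weighted symmetric quotients."""
    n = len(x)
    j = np.arange(1, k + 1)
    out = np.full(n, np.nan)
    for i in range(k, n - k):
        out[i] = np.sum(w * (vals[i + j] - vals[i - j]) / (x[i + j] - x[i - j]))
    return out

def empirical_derivative(x, y, ks):
    """q-th order empirical derivative, ks = [k_1, ..., k_q]; NaNs mark the
    boundary region, which grows with each level (roughly i <= sum(k_l))."""
    cur = np.asarray(y, dtype=float)
    for level, k in enumerate(ks, start=1):
        j = np.arange(1, k + 1)
        # level 1: Proposition 1 weights (prop. to j^2); levels >= 2: prop. to j
        # (the latter is an assumption based on the appendix fragments)
        w = (j ** 2).astype(float) if level == 1 else j.astype(float)
        w /= w.sum()
        cur = _weighted_quotients(x, cur, k, w)
    return cur
```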
57 Although the qth order derivatives are linear in the weights at level q, they are not linear in the weights at all levels. [sent-327, score-0.469]
58 Assume that there exist λ ∈ (0, 1) and $c_l \in (0, \infty)$ such that $k_l n^{-\lambda} \to c_l$ for n → ∞ and l ∈ {1, 2, ..., q}. [sent-335, score-0.302]
59 Then the asymptotic bias and variance of the empirical qth order derivative are given by $\mathrm{bias}(Y_i^{(q)}) = O(n^{\lambda-1})$ and $\mathrm{Var}(Y_i^{(q)}) = O(n^{2q-2\lambda(q+1/2)})$ uniformly for $\sum_{l=1}^{q} k_l + 1 < i < n - \sum_{l=1}^{q} k_l$. [sent-349, score-0.982]
60 An interesting consequence of Theorem 4 is that the order of the bias of the empirical derivative estimator does not depend on the order of the derivative q. [sent-351, score-0.885]
61 Corollary 5 states that the L2 rate of convergence (and L1 rate) will be slower for increasing orders of derivatives q, that is, higher order derivatives are progressively more difficult to estimate. [sent-353, score-0.412]
62 Corollary 5 suggests that the MSE of the qth order empirical derivative will tend to zero for $\lambda \in \big(\tfrac{2q}{2q+1}, 1\big)$, prescribing, for example, $k_q = O\big(n^{2(q+1)/(2q+3)}\big)$. [sent-354, score-0.659]
63 They showed, under mild conditions on the kernel function and for equispaced design, that by using a kernel satisfying K(0) = 0 the correlation structure is removed without any prior knowledge about its structure. [sent-369, score-0.251]
64 Further, they showed that bimodal kernels introduce extra bias and variance, yielding a slightly wiggly estimate. [sent-370, score-0.376]
65 In what follows we develop a relation between the bandwidth of a unimodal kernel and the bandwidth of a bimodal kernel. [sent-371, score-0.519]
66 Consequently, the estimate based on this bandwidth will be smoother than the one based on a bimodal kernel. [sent-372, score-0.309]
67 Assume the following model for the qth order derivative, $Y^{(q)}(x) = m^{(q)}(x) + \varepsilon$, and assume that m has two continuous derivatives. [sent-373, score-0.493]
68 Since the bandwidth $h_b$ based on a symmetric bimodal kernel $\bar{K}$ has a similar expression to (10) for a unimodal kernel, one can express h as a function of $h_b$, resulting in a factor method. [sent-392, score-0.478]
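A sketch of the resulting factor rule as I read it: select h_b by cross-validation with the bimodal kernel on the (possibly correlated) derivative data, then rescale by the tabulated constant C_p(K, K̄) from Table 1. The helper names, the Gaussian unimodal kernel and the CV scheme are illustrative; the constant itself is passed in rather than invented.

```python
import numpy as np

def local_poly_smooth(x, y, x_eval, h, kernel, p=3):
    """Local polynomial smoother of degree p evaluated at the points x_eval."""
    out = np.empty(len(x_eval))
    for m, x0 in enumerate(x_eval):
        u = x - x0
        w = kernel(u / h)
        X = np.vander(u, N=p + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        out[m] = beta[0]
    return out

def bimodal_kernel(u):
    # Kbar(u) = (2 / sqrt(pi)) * u^2 * exp(-u^2); note Kbar(0) = 0
    return 2.0 / np.sqrt(np.pi) * u ** 2 * np.exp(-u ** 2)

def unimodal_kernel(u):
    return np.exp(-0.5 * u ** 2)                 # Gaussian, as an illustrative unimodal kernel

def factor_rule_bandwidth(x, y, h_grid, factor_cp, p=3):
    """Leave-one-out CV with the bimodal kernel, then h = C_p(K, Kbar) * h_b."""
    idx = np.arange(len(x))
    best_hb, best_err = h_grid[0], np.inf
    for hb in h_grid:
        preds = np.array([local_poly_smooth(x[idx != i], y[idx != i],
                                            np.array([x[i]]), hb, bimodal_kernel, p)[0]
                          for i in idx])
        err = np.mean((y - preds) ** 2)
        if err < best_err:
            best_hb, best_err = hb, err
    return factor_cp * best_hb
```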
69 Table 1: The factor $C_p(K, \bar{K})$ for different unimodal kernels and for various odd orders of polynomials p, with $\bar{K}(u) = (2/\sqrt{\pi})\, u^2 \exp(-u^2)$ as bimodal kernel. [sent-414, score-0.27]
70 5. Simulations. In what follows, we compare the proposed method for derivative estimation with several other methods used in the literature. [sent-416, score-0.361]
71 The corresponding sets of bandwidths of the bimodal kernel hb were {0. [sent-423, score-0.262]
72 To smooth the noisy derivative data we have chosen a local polynomial regression estimate of order p = 3. [sent-434, score-0.635]
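Putting the pieces together, a usage sketch that reuses the hypothetical helpers defined above (empirical_first_derivative_boundary, local_poly_smooth, unimodal_kernel); the test function, noise level, k and bandwidth are placeholders, not the paper's settings (11)-(12).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = np.linspace(0.0, 1.0, n)                         # equispaced design
m = x * (1.0 - x) * np.sin(2.0 * np.pi * x)          # placeholder test function
y = m + 0.05 * rng.standard_normal(n)

k = 7
y1 = empirical_first_derivative_boundary(x, y, k)    # noisy derivative data, Eqs. (4)/(8)
ok = ~np.isnan(y1)
# smooth the derivative data with a local cubic (p = 3) fit;
# h = 0.08 is a placeholder -- in practice it would come from the factor rule
m1_hat = local_poly_smooth(x[ok], y1[ok], x, h=0.08, kernel=unimodal_kernel, p=3)
```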
73 Figure 2: Illustration of the noisy empirical first order derivative (data points), smoothed empirical first order derivative based on a local polynomial regression estimate of order p = 3 (bold line) and true derivative (bold dashed line). [sent-441, score-0.451]
74 (a) First order derivative of regression function (11) with k1 = 7; (b) First order derivative of regression function (12) with k1 = 12. [sent-442, score-0.936]
75 A typical result for the second order derivative (q = 2) of (11) and (12) and [sent-454, score-0.359]
76 Figure 3: Result of the Monte Carlo study for the proposed method and two other well-known methods (locpol, Pspline) for first order derivative estimation. [sent-460, score-0.426]
77 its second order empirical derivative is shown in Figure 4. [sent-461, score-0.408]
78 To smooth the noisy derivative data we have chosen a local polynomial regression estimate of order p = 3. [sent-462, score-0.635]
79 The question that arises is the following: How to tune k1 and k2 for second order derivative estimation? [sent-463, score-0.359]
80 We compare the proposed method for derivative estimation with the local slope in local polynomial regression with p = 5 and with penalized smoothing splines. [sent-485, score-0.755]
81 Figure 4: Illustration of the noisy empirical second order derivative (data points), smoothed empirical second order derivative based on a local polynomial regression estimate of order p = 3 (bold line) and true derivative (bold dashed line). [sent-495, score-1.451]
82 (a) Second order derivative of regression function (11) with k1 = 6 and k2 = 10; (b) Second order derivative of regression function (12) with k1 = 3 and k2 = 25. [sent-496, score-0.936]
83 Figure 5: Result of the Monte Carlo study (MAE_adjusted) for the proposed method and two other well-known methods (locpol, Pspline) for second order derivative estimation. [sent-497, score-0.426]
84 Conclusion. In this paper we proposed a methodology to estimate derivatives nonparametrically without estimating the regression function. [sent-499, score-0.298]
85 Due to the correlated nature of the data, we proposed a simple factor method, based on bimodal kernels, for the local polynomial regression framework. [sent-508, score-0.416]
86 Further, we showed that the order of the bias of the empirical derivative does not depend on the order of the derivative q and that slower rates of convergence are to be expected for increasing orders of derivatives q. [sent-509, score-1.074]
87 Setting the partial derivatives to zero gives
$$2\Big(1 - \sum_{j=2}^{k} w_j\Big) = \frac{2 w_j}{j^2}, \qquad j = 2, \ldots, k.$$ [sent-546, score-0.346]
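Solving this condition (a short completion of the step, consistent with Proposition 1 above):

```latex
% With w_1 = 1 - \sum_{j=2}^{k} w_j, the condition reads w_j = j^2 w_1 for all j.
% Imposing \sum_{j=1}^{k} w_j = 1 then gives
w_1 \sum_{j=1}^{k} j^{2} = 1
\quad\Longrightarrow\quad
w_j \;=\; \frac{j^{2}}{\sum_{l=1}^{k} l^{2}} \;=\; \frac{6\, j^{2}}{k(k+1)(2k+1)},
\qquad j = 1,\ldots,k .
```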
88 Proof Of Theorem 4. The first step is to notice that there exist λ ∈ (0, 1) and $c_1 \in (0, \infty)$ (see Theorem 3) so that the bias and variance of the first order empirical derivative can be written as $\mathrm{bias}(Y_i^{(1)}) = O(n^{\lambda-1})$ and $\mathrm{Var}(Y_i^{(1)}) = O(n^{2-3\lambda})$ uniformly over i, for $k_1 n^{-\lambda} \to c_1$ as n → ∞. [sent-563, score-0.58]
89 The expected value of the first order empirical derivative is given by (see Section 2)
$$E\big(Y_i^{(1)}\big) = m'(x_i) + \sum_{p=3,5,\ldots} \sum_{j=1}^{k_1} \frac{w_{j,1}\, j^{p-1}\, d(\mathcal{X})^{p-1}}{(n-1)^{p-1}\, p!}\, m^{(p)}(x_i) + O\big(n^{q(\lambda-1)}\big). \qquad (14)$$ [sent-572, score-0.408]
90 , q} and $k_l n^{-\lambda} \to c_l$, where $c_l \in (0, \infty)$, as n → ∞:
$$E\big(Y_i^{(l-1)}\big) = m^{(l-1)}(x_i) + \sum_{p=l+1,\,l+3,\ldots} \theta_{p,l-1}\, m^{(p)}(x_i) + O\big(n^{(q-l+2)(\lambda-1)}\big). [sent-581, score-0.302]
91 , k_l results in
$$E\big(Y_i^{(l)}\big) = m^{(l)}(x_i) + \sum_{j=1}^{k_l} \frac{j}{\sum_{i=1}^{k_l} i} \sum_{p} \Big(\frac{j\, d(\mathcal{X})}{n-1}\Big)^{p} \frac{m^{(p+l-1)}(x_i)}{p!} + O\big(n^{(q-l+1)(\lambda-1)}\big). [sent-621, score-0.94]
92 $\theta_{p,l}\, m^{(p)}(x_i)$ for $\theta_{p,l} = O\big(n^{(p-l)(\lambda-1)}\big)$, for $k_l n^{-\lambda} \to c_l$ as n → ∞. [sent-640, score-0.245]
93 The variance of $Y_i^{(l)}$ is given by
$$\mathrm{Var}\big(Y_i^{(l)}\big) = \frac{(n-1)^2}{4\, d(\mathcal{X})^2}\, \mathrm{Var}\!\left(\sum_{j=1}^{k_l} \frac{w_{j,l}}{j}\big(Y_{i+j}^{(l-1)} - Y_{i-j}^{(l-1)}\big)\right) \leq \frac{(n-1)^2}{2\, d(\mathcal{X})^2}\left[\mathrm{Var}\!\left(\sum_{j=1}^{k_l} \frac{w_{j,l}}{j}\, Y_{i+j}^{(l-1)}\right) + \mathrm{Var}\!\left(\sum_{j=1}^{k_l} \frac{w_{j,l}}{j}\, Y_{i-j}^{(l-1)}\right)\right]. [sent-652, score-0.618]
94 , k_l, the variance is upper bounded by
$$\mathrm{Var}\big(Y_i^{(l)}\big) \leq \frac{(n-1)^2}{d(\mathcal{X})^2} \sum_{j=1}^{k_l} \frac{a_j\, w_{j,l}^2}{j^2}\; O\big(n^{2(l-1)-2\lambda(l-1/2)}\big). [sent-656, score-0.43]
95 Then, for $k_l n^{-\lambda} \to c_l$ as n → ∞ and $w_{j,l} = j / \sum_{i=1}^{k_l} i$, it readily follows that $\mathrm{Var}(Y_i^{(l)}) = O\big(n^{2l-2\lambda(l+1/2)}\big)$. References: J. [sent-660, score-0.245]
96 Nonparametric estimation of a regression function and its derivatives under an ergodic hypothesis. [sent-720, score-0.334]
97 Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. [sent-739, score-0.409]
98 Estimating regression functions and their derivatives by the kernel method. [sent-755, score-0.34]
99 Data-driven discontinuity detection in derivatives of a regression function. [sent-764, score-0.298]
100 On robust kernel estimation of derivatives of regression functions. [sent-788, score-0.376]
wordName wordTfidf (topN-words)
[('rabanter', 0.334), ('derivative', 0.325), ('jd', 0.219), ('derivatives', 0.189), ('kl', 0.188), ('bimodal', 0.172), ('equispaced', 0.167), ('ijbels', 0.167), ('olynomial', 0.167), ('yi', 0.165), ('xi', 0.158), ('wj', 0.157), ('brabanter', 0.155), ('erivative', 0.143), ('gijbels', 0.143), ('itting', 0.143), ('leuven', 0.143), ('var', 0.137), ('bandwidth', 0.137), ('qth', 0.134), ('stimation', 0.119), ('ocal', 0.119), ('bias', 0.118), ('kq', 0.117), ('oor', 0.111), ('regression', 0.109), ('polynomial', 0.089), ('nonparametric', 0.075), ('smoothing', 0.073), ('mse', 0.07), ('kris', 0.067), ('kuleuven', 0.067), ('locpol', 0.067), ('maeadjusted', 0.067), ('pspline', 0.067), ('quotients', 0.067), ('wgcv', 0.067), ('kp', 0.063), ('splines', 0.063), ('jos', 0.059), ('ller', 0.059), ('ku', 0.057), ('cl', 0.057), ('weights', 0.056), ('variance', 0.054), ('fan', 0.052), ('rdle', 0.052), ('cp', 0.05), ('charnigo', 0.05), ('gasser', 0.05), ('newell', 0.05), ('ruppert', 0.05), ('sbo', 0.05), ('sizer', 0.05), ('empirical', 0.049), ('correlated', 0.048), ('hb', 0.048), ('local', 0.046), ('carlo', 0.045), ('monte', 0.044), ('boundaries', 0.043), ('wand', 0.043), ('boundary', 0.042), ('kernel', 0.042), ('thumb', 0.042), ('du', 0.041), ('bart', 0.039), ('moor', 0.039), ('ramsay', 0.039), ('spline', 0.036), ('estimation', 0.036), ('kh', 0.036), ('marron', 0.036), ('quotient', 0.036), ('odd', 0.035), ('asymptotic', 0.034), ('order', 0.034), ('correction', 0.033), ('arenberg', 0.033), ('debrabanter', 0.033), ('einbeck', 0.033), ('eubank', 0.033), ('iuap', 0.033), ('jarrow', 0.033), ('kasteelpark', 0.033), ('mpc', 0.033), ('nanoparticles', 0.033), ('opsomer', 0.033), ('optec', 0.033), ('rondonotti', 0.033), ('kernels', 0.032), ('noisy', 0.032), ('supx', 0.031), ('unimodal', 0.031), ('penalized', 0.031), ('taylor', 0.029), ('selectors', 0.029), ('delecroix', 0.029), ('fwo', 0.029), ('katholieke', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 31 jmlr-2013-Derivative Estimation with Local Polynomial Fitting
Author: Kris De Brabanter, Jos De Brabanter, Bart De Moor, Irène Gijbels
Abstract: We present a fully automated framework to estimate derivatives nonparametrically without estimating the regression function. Derivative estimation plays an important role in the exploration of structures in curves (jump detection and discontinuities), comparison of regression curves, analysis of human growth data, etc. Hence, the study of estimating derivatives is equally important as regression estimation itself. Via empirical derivatives we approximate the qth order derivative and create a new data set which can be smoothed by any nonparametric regression estimator. We derive L1 and L2 rates and establish consistency of the estimator. The new data sets created by this technique are no longer independent and identically distributed (i.i.d.) random variables anymore. As a consequence, automated model selection criteria (data-driven procedures) break down. Therefore, we propose a simple factor method, based on bimodal kernels, to effectively deal with correlated data in the local polynomial regression framework. Keywords: nonparametric derivative estimation, model selection, empirical derivative, factor rule
2 0.082261816 96 jmlr-2013-Regularization-Free Principal Curve Estimation
Author: Samuel Gerber, Ross Whitaker
Abstract: Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance, which poses severe challenges for estimation and model selection. This paper demonstrates that the difficulties in model selection associated with the saddle point property of principal curves are intrinsically tied to the minimization of the mean-squared projection distance. We introduce a new objective function, facilitated through a modification of the principal curve estimation approach, for which all critical points are principal curves and minima. Thus, the new formulation removes the fundamental issue for model selection in principal curve estimation. A gradient-descent-based estimator demonstrates the effectiveness of the new formulation for controlling model complexity on numerical experiments with synthetic and real data. Keywords: principal curve, manifold estimation, unsupervised learning, model complexity, model selection
3 0.057174608 76 jmlr-2013-Nonparametric Sparsity and Regularization
Author: Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, Alessandro Verri
Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose a new notion of nonparametric sparsity and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art methods. Keywords: sparsity, nonparametric, variable selection, regularization, proximal methods, RKHS
4 0.053585846 79 jmlr-2013-On the Mutual Nearest Neighbors Estimate in Regression
Author: Arnaud Guyader, Nick Hengartner
Abstract: Motivated by promising experimental results, this paper investigates the theoretical properties of a recently proposed nonparametric estimator, called the Mutual Nearest Neighbors rule, which estimates the regression function m(x) = E[Y |X = x] as follows: first identify the k nearest neighbors of x in the sample Dn , then keep only those for which x is itself one of the k nearest neighbors, and finally take the average over the corresponding response variables. We prove that this estimator is consistent and that its rate of convergence is optimal. Since the estimate with the optimal rate of convergence depends on the unknown distribution of the observations, we also present adaptation results by data-splitting. Keywords: nonparametric estimation, nearest neighbor methods, mathematical statistics
5 0.052919939 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
Author: Edward Challis, David Barber
Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes
6 0.050692745 62 jmlr-2013-Learning Theory Approach to Minimum Error Entropy Criterion
7 0.050310966 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
8 0.047972336 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
9 0.046569157 104 jmlr-2013-Sparse Single-Index Model
10 0.045471057 90 jmlr-2013-Quasi-Newton Method: A New Direction
11 0.044462051 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
12 0.043342084 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
13 0.042338733 120 jmlr-2013-Variational Algorithms for Marginal MAP
14 0.041072428 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability
15 0.039784484 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
16 0.038917352 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
17 0.038511019 68 jmlr-2013-Machine Learning with Operational Costs
18 0.03553139 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
19 0.033607073 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
20 0.032121912 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
topicId topicWeight
[(0, -0.177), (1, 0.014), (2, 0.052), (3, -0.0), (4, 0.037), (5, -0.039), (6, -0.016), (7, 0.055), (8, 0.078), (9, -0.091), (10, -0.113), (11, -0.03), (12, 0.083), (13, -0.085), (14, -0.106), (15, 0.027), (16, -0.105), (17, -0.014), (18, 0.024), (19, 0.209), (20, 0.029), (21, -0.211), (22, 0.019), (23, -0.057), (24, 0.163), (25, -0.047), (26, 0.078), (27, -0.023), (28, -0.085), (29, 0.136), (30, -0.085), (31, -0.038), (32, 0.068), (33, 0.028), (34, 0.075), (35, -0.05), (36, -0.044), (37, 0.002), (38, 0.195), (39, 0.057), (40, -0.009), (41, -0.08), (42, 0.078), (43, 0.061), (44, 0.017), (45, -0.007), (46, -0.017), (47, -0.243), (48, 0.143), (49, 0.126)]
simIndex simValue paperId paperTitle
same-paper 1 0.96082747 31 jmlr-2013-Derivative Estimation with Local Polynomial Fitting
Author: Kris De Brabanter, Jos De Brabanter, Bart De Moor, Irène Gijbels
Abstract: We present a fully automated framework to estimate derivatives nonparametrically without estimating the regression function. Derivative estimation plays an important role in the exploration of structures in curves (jump detection and discontinuities), comparison of regression curves, analysis of human growth data, etc. Hence, the study of estimating derivatives is equally important as regression estimation itself. Via empirical derivatives we approximate the qth order derivative and create a new data set which can be smoothed by any nonparametric regression estimator. We derive L1 and L2 rates and establish consistency of the estimator. The new data sets created by this technique are no longer independent and identically distributed (i.i.d.) random variables anymore. As a consequence, automated model selection criteria (data-driven procedures) break down. Therefore, we propose a simple factor method, based on bimodal kernels, to effectively deal with correlated data in the local polynomial regression framework. Keywords: nonparametric derivative estimation, model selection, empirical derivative, factor rule
2 0.58541316 62 jmlr-2013-Learning Theory Approach to Minimum Error Entropy Criterion
Author: Ting Hu, Jun Fan, Qiang Wu, Ding-Xuan Zhou
Abstract: We consider the minimum error entropy (MEE) criterion and an empirical risk minimization learning algorithm when an approximation of R´ nyi’s entropy (of order 2) by Parzen windowing is e minimized. This learning algorithm involves a Parzen windowing scaling parameter. We present a learning theory approach for this MEE algorithm in a regression setting when the scaling parameter is large. Consistency and explicit convergence rates are provided in terms of the approximation ability and capacity of the involved hypothesis space. Novel analysis is carried out for the generalization error associated with R´ nyi’s entropy and a Parzen windowing function, to overcome e technical difficulties arising from the essential differences between the classical least squares problems and the MEE setting. An involved symmetrized least squares error is introduced and analyzed, which is related to some ranking algorithms. Keywords: minimum error entropy, learning theory, R´ nyi’s entropy, empirical risk minimization, e approximation error
3 0.46935692 96 jmlr-2013-Regularization-Free Principal Curve Estimation
Author: Samuel Gerber, Ross Whitaker
Abstract: Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance, which poses severe challenges for estimation and model selection. This paper demonstrates that the difficulties in model selection associated with the saddle point property of principal curves are intrinsically tied to the minimization of the mean-squared projection distance. We introduce a new objective function, facilitated through a modification of the principal curve estimation approach, for which all critical points are principal curves and minima. Thus, the new formulation removes the fundamental issue for model selection in principal curve estimation. A gradient-descent-based estimator demonstrates the effectiveness of the new formulation for controlling model complexity on numerical experiments with synthetic and real data. Keywords: principal curve, manifold estimation, unsupervised learning, model complexity, model selection
4 0.46147925 76 jmlr-2013-Nonparametric Sparsity and Regularization
Author: Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, Alessandro Verri
Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose a new notion of nonparametric sparsity and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art methods. Keywords: sparsity, nonparametric, variable selection, regularization, proximal methods, RKHS
5 0.42414331 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
Author: Lauren A. Hannah, David B. Dunson
Abstract: We propose a new, nonparametric method for multivariate regression subject to convexity or concavity constraints on the response function. Convexity constraints are common in economics, statistics, operations research, financial engineering and optimization, but there is currently no multivariate method that is stable and computationally feasible for more than a few thousand observations. We introduce convex adaptive partitioning (CAP), which creates a globally convex regression model from locally linear estimates fit on adaptively selected covariate partitions. CAP is a computationally efficient, consistent method for convex regression. We demonstrate empirical performance by comparing the performance of CAP to other shape-constrained and unconstrained regression methods for predicting weekly wages and value function approximation for pricing American basket options. Keywords: adaptive partitioning, convex regression, nonparametric regression, shape constraint, treed linear model
6 0.4022468 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
7 0.37720734 104 jmlr-2013-Sparse Single-Index Model
8 0.37283194 57 jmlr-2013-Kernel Bayes' Rule: Bayesian Inference with Positive Definite Kernels
9 0.36116785 79 jmlr-2013-On the Mutual Nearest Neighbors Estimate in Regression
10 0.35755315 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
11 0.35410613 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
12 0.32276988 68 jmlr-2013-Machine Learning with Operational Costs
13 0.31005108 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
14 0.29688814 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
15 0.29513878 107 jmlr-2013-Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
16 0.2817333 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
17 0.27686158 120 jmlr-2013-Variational Algorithms for Marginal MAP
18 0.25879297 90 jmlr-2013-Quasi-Newton Method: A New Direction
19 0.25541359 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability
20 0.25118855 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
topicId topicWeight
[(0, 0.018), (5, 0.091), (6, 0.036), (10, 0.066), (20, 0.016), (23, 0.042), (39, 0.451), (68, 0.028), (70, 0.044), (75, 0.032), (85, 0.026), (87, 0.034), (95, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.6800493 31 jmlr-2013-Derivative Estimation with Local Polynomial Fitting
Author: Kris De Brabanter, Jos De Brabanter, Bart De Moor, Irène Gijbels
Abstract: We present a fully automated framework to estimate derivatives nonparametrically without estimating the regression function. Derivative estimation plays an important role in the exploration of structures in curves (jump detection and discontinuities), comparison of regression curves, analysis of human growth data, etc. Hence, the study of estimating derivatives is equally important as regression estimation itself. Via empirical derivatives we approximate the qth order derivative and create a new data set which can be smoothed by any nonparametric regression estimator. We derive L1 and L2 rates and establish consistency of the estimator. The new data sets created by this technique are no longer independent and identically distributed (i.i.d.) random variables anymore. As a consequence, automated model selection criteria (data-driven procedures) break down. Therefore, we propose a simple factor method, based on bimodal kernels, to effectively deal with correlated data in the local polynomial regression framework. Keywords: nonparametric derivative estimation, model selection, empirical derivative, factor rule
2 0.28878123 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
Author: Yuchen Zhang, John C. Duchi, Martin J. Wainwright
Abstract: We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves √ mean-squared error (MSE) that decays as O (N −1 + (N/m)−2 ). Whenever m ≤ N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as O (N −1 + (N/m)−3 ), and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as O (N −1 + (N/m)−3/2 ), easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with N ≈ 2.4 × 108 samples and d ≈ 740,000 covariates. Keywords: distributed learning, stochastic optimization, averaging, subsampling
3 0.28379664 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
Author: Sébastien Gerchinovitz
Abstract: We consider the problem of online linear regression on arbitrary deterministic sequences when the ambient dimension d can be much larger than the number of time rounds T . We introduce the notion of sparsity regret bound, which is a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario. We prove such regret bounds for an online-learning algorithm called SeqSEW and based on exponential weighting and data-driven truncation. In a second part we apply a parameter-free version of this algorithm to the stochastic setting (regression model with random design). This yields risk bounds of the same flavor as in Dalalyan and Tsybakov (2012a) but which solve two questions left open therein. In particular our risk bounds are adaptive (up to a logarithmic factor) to the unknown variance of the noise if the latter is Gaussian. We also address the regression model with fixed design. Keywords: sparsity, online linear regression, individual sequences, adaptive regret bounds
Author: Daniil Ryabko, Jérémie Mary
Abstract: A metric between time-series distributions is proposed that can be evaluated using binary classification methods, which were originally developed to work on i.i.d. data. It is shown how this metric can be used for solving statistical problems that are seemingly unrelated to classification and concern highly dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. Universal consistency of the resulting algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. Keywords: time series, reductions, stationary ergodic, clustering, metrics between probability distributions
5 0.28224218 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
Author: Wendelin Böhmer, Steffen Grünewälder, Yun Shen, Marek Musial, Klaus Obermayer
Abstract: Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation spaces should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance. Keywords: reinforcement learning, diffusion distance, proto value functions, slow feature analysis, least-squares policy iteration, visual robot navigation
7 0.28106073 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
8 0.27932021 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
9 0.27927089 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
10 0.27859133 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
11 0.27826554 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
12 0.27801883 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
13 0.27759191 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
14 0.27651745 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
15 0.27530637 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning
16 0.27498162 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
17 0.27439293 59 jmlr-2013-Large-scale SVD and Manifold Learning
18 0.27400017 68 jmlr-2013-Machine Learning with Operational Costs
19 0.2738544 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
20 0.27336174 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut