Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem, and we provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and the competitiveness of our method with existing ones. We discuss several types of extensions, including an approach to solving the quantile crossing problem, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
1 What might be called median regression is subsumed under the term quantile regression. [sent-21, score-0.745]
2 We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. [sent-22, score-1.547]
3 We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. [sent-24, score-0.803]
4 Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints 1. [sent-25, score-0.847]
5 The purpose of our paper is: • To bring the technique of quantile regression to the attention of the machine learning community and show its relation to ν-Support Vector Regression (Schölkopf et al. [sent-44, score-0.818]
6 Likewise, the conditional quantile µτ (x) for a pair of random variables (x, y) ∈ X × R is defined as the function µτ : X → R for which pointwise µτ is the infimum over µ for which Pr {y ≤ µ|x} = τ. [sent-70, score-0.764]
7 3 Examples To illustrate regression analyses with conditional quantile functions, we provide two simple examples here. [sent-72, score-0.832]
8 Since ξ is normally distributed, we know that the τ-th quantile of ξ is given by $\sigma(x)\Phi^{-1}(\tau)$, where Φ is the cumulative distribution function of the normal distribution with unit variance. [sent-81, score-0.725]
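The closed form above can be checked numerically. The following is a minimal sketch, assuming the noise model y = f(x) + ξ with ξ zero-mean Gaussian and standard deviation σ(x); the function names and the numeric values of f(x) and σ(x) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def gaussian_noise_quantile(sigma_x: float, tau: float) -> float:
    """tau-th quantile of zero-mean Gaussian noise with standard deviation sigma_x."""
    # NormalDist() is the standard normal; inv_cdf plays the role of Phi^{-1}.
    return sigma_x * NormalDist().inv_cdf(tau)


def conditional_quantile(f_x: float, sigma_x: float, tau: float) -> float:
    """tau-th conditional quantile of y = f(x) + xi at a point where f(x) = f_x."""
    return f_x + gaussian_noise_quantile(sigma_x, tau)


# Hypothetical values: f(x) = 1.0 and sigma(x) = 0.5 at some fixed x.
median = conditional_quantile(1.0, 0.5, 0.5)
upper = conditional_quantile(1.0, 0.5, 0.9)
lower = conditional_quantile(1.0, 0.5, 0.1)
```

Under these assumptions the median coincides with f(x), and the 0.1 and 0.9 quantiles sit symmetrically around it, at a distance proportional to σ(x).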
9 The τ-th conditional quantile function is obtained by connecting the τ-th quantile of the conditional distribution p(y|x) for all x ∈ X . [sent-94, score-1.528]
10 The error bars of many regression estimates can be viewed as crude quantile regressions. [sent-99, score-0.812]
11 The conditional quantile analysis (b) gives us a more detailed description of these changes. [sent-110, score-0.764]
12 Figure 1: Illustration of conditional quantile functions of a simple artificial system in (1) with f(x) = sinc(x) and σ(x) = 0. [sent-129, score-0.764]
13 In this paper, we are concerned with the problem of estimating these conditional quantile functions from training data. [sent-140, score-0.764]
14 However this approach carries the disadvantage of requiring us to estimate both an upper and lower quantile simultaneously. [sent-143, score-0.745]
15 Following Vapnik’s paradigm of estimating only the relevant parameters directly (Vapnik, 1982) we attack the problem by estimating each quantile separately. [sent-148, score-0.725]
16 For completeness and comparison, we provide a detailed description of a symmetric quantile regression in Appendix A. [sent-149, score-0.793]
17 1 Loss Function The basic strategy behind quantile estimation arises from the observation that minimizing the ℓ1 -loss function for a location estimator yields the median. [sent-151, score-0.759]
18 Koenker and Bassett (1978) generalize this idea to obtain a regression estimate for any quantile by tilting the loss function in a suitable fashion. [sent-154, score-0.853]
19 (2000) do, in fact, suggest that a choice of different upper bounds on the dual problem would lead to estimators which weigh errors for positive and negative excess differently, that is, which would lead to quantile regression estimators. [sent-160, score-0.813]
20 Figure 2: An illustration of (a) conditional mean analysis and (b) conditional quantile analysis for a data set on bone mineral density (BMD) in adolescents (horizontal axis in both panels: Age). [sent-180, score-1.548]
21 In (b) the nine curves are the estimated conditional quantile curves at orders 0. [sent-182, score-0.812]
22 The set of conditional quantile curves provides a more informative description of the relationship among variables, such as non-constant variance or non-normality of the noise (error) distribution. [sent-189, score-0.788]
23 $l_\tau(\xi) = \begin{cases} \tau\xi & \text{if } \xi \ge 0 \\ (\tau - 1)\xi & \text{if } \xi < 0 \end{cases} \quad (2)$ Figure 3: Pinball loss function for quantile estimation. [sent-191, score-0.765]
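The pinball loss in (2) is easy to state in code. The sketch below is illustrative only: the helper names are hypothetical, and the brute-force search over sample points is used purely to show that the constant minimizing the average pinball loss is an empirical τ-quantile (the median for τ = 0.5).

```python
def pinball_loss(xi: float, tau: float) -> float:
    """Pinball loss: tau * xi for xi >= 0, (tau - 1) * xi for xi < 0."""
    return tau * xi if xi >= 0 else (tau - 1.0) * xi


def mean_pinball_loss(ys, c, tau):
    """Average pinball loss of the constant estimate c on the sample ys."""
    return sum(pinball_loss(y - c, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Sample point with the smallest average pinball loss (brute force)."""
    return min(ys, key=lambda c: mean_pinball_loss(ys, c, tau))


ys = [float(v) for v in range(1, 11)]  # 1.0, 2.0, ..., 10.0
q75 = best_constant(ys, 0.75)          # 8th smallest of 10 points
med = best_constant([3.0, 1.0, 10.0], 0.5)
```

Tilting the loss (τ ≠ 0.5) penalizes errors on one side more heavily, which is what shifts the minimizer away from the median.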
24 The idea is to use the same loss function for functions, f (x), rather than just constants in order to obtain quantile estimates conditional on x. [sent-204, score-0.823]
25 2 Optimization Problem Based on $l_\tau(\xi)$ we define the expected quantile risk as $R[f] := \mathbf{E}_{p(x,y)}[l_\tau(y - f(x))]$. [sent-208, score-0.741]
26 (3) By the same reasoning as in Lemma 2 it follows that for f : X → R the minimizer of R[ f ] is the quantile µτ (x). [sent-209, score-0.75]
27 This ensures that the minimizer of (4) will satisfy the quantile property: Lemma 3 (Empirical Conditional Quantile Estimator) Assuming that f contains a scalar unregularized term, the minimizer of (4) satisfies: 1. [sent-213, score-0.775]
28 With respect to b, however, minimizing Rreg amounts to finding the τ quantile in terms of yi − g(xi ). [sent-222, score-0.757]
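To make the role of the offset b concrete, here is a small sketch under stated assumptions: g is held fixed, the residuals r_i = y_i − g(x_i) are given, and the function names are hypothetical. It picks an order statistic of the residuals as b and counts the points on either side, which is the counting form in which the quantile property is usually stated (at most τm points below the estimate and at most (1 − τ)m above).

```python
import math


def optimal_offset(residuals, tau):
    """Return an offset b that minimizes the mean pinball loss of (r - b)."""
    ordered = sorted(residuals)
    m = len(ordered)
    # ceil(tau * m)-th order statistic (1-indexed), clamped to a valid index.
    k = min(m, max(1, math.ceil(tau * m)))
    return ordered[k - 1]


def count_sides(residuals, b):
    """Number of residuals strictly below and strictly above b."""
    below = sum(1 for r in residuals if r < b)
    above = sum(1 for r in residuals if r > b)
    return below, above


residuals = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4, 0.9]  # hypothetical y_i - g(x_i)
tau = 0.5
b = optimal_offset(residuals, tau)
below, above = count_sides(residuals, b)
```

The unregularized offset matters here: because b is not penalized, it is free to move to the τ-quantile of the residuals whatever g looks like.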
29 This is exactly what we want from a quantile estimator: by this procedure errors in one direction have a larger influence than those in the converse direction, which leads to the shifted estimate we expect from QR. [sent-256, score-0.745]
30 While they all satisfy the quantile property having half the points on either side of the regression, some estimates appear to track the observations better. [sent-262, score-0.76]
31 This issue is addressed in Section 5 where we compute quantile regression estimates on a range of data sets. [sent-263, score-0.812]
32 Extensions and Modifications Our optimization framework lends itself naturally to a series of extensions and modifications of the regularized risk minimization framework for quantile regression. [sent-265, score-0.741]
33 While all three variants satisfy the quantile property, the degree of smoothness is controlled by the regularization constant λ. [sent-274, score-0.745]
34 9), two or more estimated conditional quantile functions can cross or overlap. [sent-285, score-0.764]
35 This embarrassing phenomenon, called quantile crossing, occurs because each conditional quantile function is independently estimated (Koenker, 2005; He, 1997). [sent-286, score-1.509]
36 9 conditional quantile functions estimated by the kernel-based estimator described in the previous section. [sent-294, score-0.798]
37 We note quantile crossings at several places, especially outside the training data range (x < 0 and x > 1). [sent-296, score-0.745]
38 3 Figure 5(b) shows a family of conditional quantile functions estimated with the non-crossing constraints. [sent-298, score-0.764]
39 Let us write the model for the $\tau_h$-th conditional quantile function as $f_h(x) = \langle \phi(x), w_h \rangle + b_h$ for h = 1, 2, . [sent-304, score-0.822]
40 (8) Solving (5) or (6) for 1 ≤ h ≤ n with non-crossing constraints (8) allows us to estimate n conditional quantile functions not crossing at l points x1 , . [sent-309, score-0.843]
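Whether a fitted family of quantile functions crosses at a given set of points can be checked directly. The sketch below assumes the estimates are available as Python callables ordered by increasing τ; the function name and the toy linear estimates are hypothetical, and the check only diagnoses crossings rather than enforcing the constraints inside the optimization problem.

```python
def crossing_violations(quantile_fns, points, tol=0.0):
    """List (h, x) pairs where the h-th estimate exceeds the (h+1)-th at x.

    quantile_fns must be ordered by increasing quantile level.
    """
    violations = []
    for h in range(len(quantile_fns) - 1):
        lower, upper = quantile_fns[h], quantile_fns[h + 1]
        for x in points:
            if lower(x) > upper(x) + tol:
                violations.append((h, x))
    return violations


# Two toy "estimates" that cross at x = 1.
f_low = lambda x: x               # meant to be the lower quantile
f_high = lambda x: 0.5 + 0.5 * x  # meant to be the higher quantile
found = crossing_violations([f_low, f_high], [0.0, 0.5, 2.0])
```

Because the constraints are imposed only at finitely many points, a check like this on a denser grid is a cheap way to see whether crossings remain between them.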
41 The model for the $\tau_h$-th conditional quantile function is now represented as $f_h(x) = \sum_{i=1}^{m} \alpha_{hi} k(x, x_i) + \sum_{j=1}^{l} (\theta_{h-1,j} - \theta_{hj}) k(x, x_j) + b_h$. [sent-317, score-1.542]
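Evaluating such an expansion is a pair of weighted kernel sums plus an offset. The following sketch makes several assumptions that are not in the text: a Gaussian RBF kernel on scalar inputs, multipliers in the second sum indexed by the constraint points, and hypothetical argument names. It shows only how a prediction would be computed from given coefficients, not how the coefficients are obtained.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel on scalars (an assumed choice of k)."""
    return math.exp(-gamma * (x - z) ** 2)


def quantile_fn(x, train_x, alpha_h, cons_x, theta_prev, theta_h, b_h,
                kernel=rbf_kernel):
    """Evaluate f_h(x) from the two kernel expansions and the offset b_h."""
    # Expansion over the m training points.
    data_term = sum(a * kernel(x, xi) for a, xi in zip(alpha_h, train_x))
    # Expansion over the l points where non-crossing is enforced.
    cons_term = sum((tp - th) * kernel(x, xj)
                    for tp, th, xj in zip(theta_prev, theta_h, cons_x))
    return data_term + cons_term + b_h


# Hypothetical coefficients for a single training point and one constraint point.
inactive = quantile_fn(0.0, [0.0], [2.0], [1.0], [0.5], [0.5], 0.1)
active = quantile_fn(0.0, [0.0], [2.0], [0.0], [1.0], [0.5], 0.1)
```

When the multipliers of neighbouring quantile levels are equal at every constraint point, the second sum vanishes and the expansion reduces to the unconstrained form.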
42 It is worth noting that, after enforcing the non-crossing constraints, the quantile property as in Lemma 3 may not be guaranteed. [sent-323, score-0.741]
43 This is because the method tries to optimize for both the quantile property and the non-crossing property (in relation to other quantiles). [sent-324, score-0.757]
44 Hence, the final outcome may not empirically satisfy the quantile property. [sent-325, score-0.725]
45 Yet, the non-crossing constraints are very nice because they ensure the semantics of the quantile definition: lower quantile level should not cross the higher quantile level. [sent-326, score-2.217]
46 Figure 5 (panels: (a) without non-crossing constraints, (b) with non-crossing constraints; horizontal axis: age, standardized): An example of the quantile crossing problem in the BMD data set presented in Section 1. [sent-365, score-0.863]
47 The plotted curves in (b) are the conditional quantile functions obtained with non-crossing constraints explained in Section 3. [sent-375, score-0.83]
48 There is no quantile crossing, even outside the training data range. [sent-377, score-0.742]
49 Figure 6 is an example of quantile regression with monotonicity constraints. [sent-385, score-0.854]
50 NONPARAMETRIC QUANTILE ESTIMATION Figure 6: Example plots from quantile regression with and without monotonicity constraints. [sent-389, score-0.854]
51 The thin line represents the nonparametric quantile regression without monotonicity constraints whereas the thick line represents the nonparametric quantile regression with monotonicity constraints. [sent-390, score-1.812]
52 Since the additional constraint does not depend on b it is easy to see that the quantile property still holds. [sent-398, score-0.741]
53 3 Other Function Classes Semiparametric Estimates RKHS expansions may not be the only function classes desired for quantile regression. [sent-413, score-0.741]
54 The former discuss ℓ1 regularization of expansion coefficients whereas the latter discuss an explicit second order smoothing spline method for the purpose of quantile regression. [sent-428, score-0.768]
55 This is the main reason why this paper is called Non-parametric quantile estimation. [sent-441, score-0.725]
56 1 Performance Indicators We first need to discuss how to evaluate the performance of the estimate f versus the true conditional quantile µτ (x). [sent-445, score-0.784]
57 TAKEUCHI, LE, SEARS AND SMOLA Two criteria are important for a good quantile estimator fτ: • fτ needs to satisfy the quantile property as well as possible. [sent-446, score-1.5]
58 Note however, that (19) does not imply having a conditional quantile estimator at all. [sent-449, score-0.798]
59 For instance, the constant function based on the unconditional quantile estimator with respect to Y performs extremely well under this criterion. [sent-450, score-0.778]
60 With the conditions listed above for any sample size m and 0 < δ < 1, every quantile regression estimate fτ satisfies with probability at least (1 − δ) R[ fτ ] − R[ fτ∗ ] ≤ 2 max LR m (F ) + (4 + LB) log 2/δ where L = {τ, 1 − τ} . [sent-471, score-0.813]
61 Since we do not expect R[ f ] to vanish except for pathological applications where quantile regression is inappropriate (that is, cases where we have a deterministic dependency between y and x), the use of localized estimates (Bartlett et al. [sent-482, score-0.812]
62 3 Bounds on the Quantile Property The theorem of the previous section gave us some idea about how far the sample average quantile loss is from its true value under p. [sent-486, score-0.765]
63 We now proceed to stating bounds to which degree fτ satisfies the quantile property, i. [sent-487, score-0.725]
64 Since computing the true conditional quantile is impossible and all approximations of the latter rely on intermediate density estimation, this is the only objective criterion we could find. [sent-510, score-0.764]
65 • Simultaneously we need to ensure that the estimate satisfies the quantile property, that is, we want to ensure that the estimator we obtained does indeed produce numbers fτ (x) which exceed y with probability close to τ. [sent-512, score-0.779]
66 1 MODELS We compare the following four models: • An unconditional quantile estimator. [sent-516, score-0.744]
67 • Nonparametric quantile regression as described in Section 2. [sent-531, score-0.793]
68 Simultaneously we expect that the quantile property becomes less and less maintained, as the function class grows. [sent-535, score-0.741]
69 Data Set (Sample Size): caution (100), ftcollinssnow (93), highway (39), heights (1375), sniffer (125), snowgeese (45), ufc (372), birthwt (189), crabs (200), GAGurine (314), geyser (299), gilgais (365), topo (52), BostonHousing (506), CobarOre (38), engel (235), mcycle (133), BigMac2003 (69), UN3 (125), cpus (209). No. [sent-581, score-0.873]
70 • In terms of ramp loss (quantile property), the performance of our npqr was comparable to that of the other three models for the intermediate quantile (τ = 0. [sent-613, score-1.295]
71 Note that the quantile property, as such, is not an informative measure for conditional quantile estimation. [sent-624, score-1.489]
72 For example, uncond, the constant function based on the unconditional quantile estimator with respect to Y (straightforwardly obtained by sorting $\{y_i\}_{i=1}^m$ without using $\{x_i\}_{i=1}^m$ at all), performed best under this criterion. [sent-626, score-0.778]
73 It is clear that the less flexible model would have the better quantile property, but it does not necessarily mean that those less flexible ones are better for conditional quantile functions. [sent-627, score-1.489]
74 2 Experiments on Nonparametric Quantile Regression with Additional Constraints We empirically investigate the performance of the nonparametric quantile regression estimator with the additional constraints described in Section 3. [sent-636, score-0.931]
75 The performances of npqr and noncross are quite similar, since npqr itself could produce almost non-crossing estimates and the constraints make only small adjustments when violations happen to occur. [sent-666, score-0.88]
76 For example, if we run a quantile estimator with τ = 0. [sent-700, score-0.759]
77 Estimation with constraints We introduce non-crossing and monotonicity constraints in the context of nonparametric quantile regression. [sent-719, score-0.932]
78 Figure 9: Illustration of the relationship between quantile in training and ramp loss. [sent-745, score-0.889]
79 Note, however, that in this situation we want to be able to estimate the regression quantile for a large set of different portfolios. [sent-769, score-0.813]
80 Nonparametric ν-Support Vector Regression In this section we explore an alternative to the quantile regression framework proposed in Section 2. [sent-781, score-0.793]
81 There the authors suggest a method for adapting SV regression and classification estimates such that automatically only a quantile ν lies beyond the desired confidence region. [sent-784, score-0.744]
82 (2000) show that the ν-SV regression estimate does converge to a quantile estimate. [sent-788, score-0.813]
83 Tables 3, 5 and 7 show the ramp loss, a measure of the quantile property. [sent-870, score-0.889]
84 1 NON-CROSSING CONSTRAINTS Table 8 shows the average pinball loss comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-881, score-1.068]
85 Table 9 shows the ramp loss, a measure for quantile property, of npqr and noncross. [sent-884, score-1.255]
86 Table 10 shows the average pinball loss comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-894, score-1.129]
87 Table 11 shows the ramp loss, a measure for quantile property, of npqr and npqrm. [sent-896, score-1.255]
88 data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus uncond 11. [sent-898, score-0.953]
89 1) data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus uncond 11. [sent-1051, score-0.953]
90 1) data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus uncond 38. [sent-1204, score-0.953]
91 5) data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus uncond 52. [sent-1337, score-0.953]
92 17 data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus uncond 23. [sent-1510, score-0.953]
93 9) data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus uncond 90. [sent-1663, score-0.953]
94 1 data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus npqr 9. [sent-1817, score-1.239]
95 18 Table 8: Pinball loss comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-2059, score-0.895]
96 data set caution ftcollinssnow highway heights sniffer snowgeese ufc birthwt crabs GAGurine geyser gilgais topo BostonHousing CobarOre engel mcycle BigMac2003 UN3 cpus τ = 0. [sent-2060, score-0.873]
97 00) Table 9: Ramp loss (quantile property) comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-2303, score-0.895]
98 37 Table 10: Pinball loss comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-2331, score-0.956]
99 00) Table 11: Ramp loss (quantile property) comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-2359, score-0.956]
100 A convergent algorithm for quantile regression with smoothing splines. [sent-2403, score-0.793]
