jmlr jmlr2006 jmlr2006-65 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
Reference: text
sentIndex sentText sentNum sentScore
1 What might be called median regression is subsumed under the term quantile regression. [sent-21, score-0.745]
2 We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. [sent-22, score-1.547]
3 We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. [sent-24, score-0.803]
4 Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints 1. [sent-25, score-0.847]
5 The purpose of our paper is: • To bring the technique of quantile regression to the attention of the machine learning community and show its relation to ν-Support Vector Regression (Schölkopf et al. [sent-44, score-0.818]
6 Likewise, the conditional quantile µτ (x) for a pair of random variables (x, y) ∈ X × R is defined as the function µτ : X → R for which pointwise µτ is the infimum over µ for which Pr {y ≤ µ|x} = τ. [sent-70, score-0.764]
7 3 Examples To illustrate regression analyses with conditional quantile functions, we provide two simple examples here. [sent-72, score-0.832]
8 Since ξ is normally distributed, we know that the τ-th quantile of ξ is given by σ(x)Φ^{−1}(τ), where Φ is the cumulative distribution function of the normal distribution with unit variance. [sent-81, score-0.725]
9 The τ-th conditional quantile function is obtained by connecting the τ-th quantile of the conditional distribution p(y|x) for all x ∈ X . [sent-94, score-1.528]
10 The error bars of many regression estimates can be viewed as crude quantile regressions. [sent-99, score-0.812]
11 The conditional quantile analysis (b) gives us a more detailed description of these changes. [sent-110, score-0.764]
12 Figure 1: Illustration of conditional quantile functions of a simple artificial system in (1) with f (x) = sinc(x) and σ(x) = 0. [sent-129, score-0.764]
13 In this paper, we are concerned with the problem of estimating these conditional quantile functions from training data. [sent-140, score-0.764]
14 However, this approach carries the disadvantage of requiring us to estimate both an upper and lower quantile simultaneously. [sent-143, score-0.745]
15 Following Vapnik’s paradigm of estimating only the relevant parameters directly (Vapnik, 1982) we attack the problem by estimating each quantile separately. [sent-148, score-0.725]
16 For completeness and comparison, we provide a detailed description of a symmetric quantile regression in Appendix A. [sent-149, score-0.793]
17 1 Loss Function The basic strategy behind quantile estimation arises from the observation that minimizing the ℓ1 -loss function for a location estimator yields the median. [sent-151, score-0.759]
18 Koenker and Bassett (1978) generalize this idea to obtain a regression estimate for any quantile by tilting the loss function in a suitable fashion. [sent-154, score-0.853]
19 (2000) does, in fact, suggest that a choice of different upper bounds on the dual problem would lead to estimators which weigh errors for positive and negative excess differently, that is, which would lead to quantile regression estimators. [sent-160, score-0.813]
20 Figure 2: An illustration of (a) conditional mean analysis and (b) conditional quantile analysis for a data set on bone mineral density (BMD) in adolescents (horizontal axis: Age). [sent-180, score-1.548]
21 In (b) the nine curves are the estimated conditional quantile curves at orders 0. [sent-182, score-0.812]
22 The set of conditional quantile curves provides a more informative description of the relationship among variables, such as non-constant variance or non-normality of the noise (error) distribution. [sent-189, score-0.788]
23 lτ (ξ) = τξ if ξ ≥ 0, and lτ (ξ) = (τ − 1)ξ if ξ < 0 (2). Figure 3: Pinball loss function for quantile estimation. [sent-191, score-0.765]
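A minimal sketch of the pinball loss in Python/NumPy, only to make the definition above concrete; the function name and signature are our own choices and do not come from the paper:

```python
import numpy as np

def pinball_loss(xi, tau):
    """Pinball (tilted absolute) loss: tau*xi if xi >= 0, (tau - 1)*xi if xi < 0."""
    xi = np.asarray(xi, dtype=float)
    return np.where(xi >= 0, tau * xi, (tau - 1.0) * xi)

# The loss penalizes under- and over-estimation asymmetrically, e.g. for tau = 0.9:
print(pinball_loss([-1.0, 0.5, 2.0], tau=0.9))  # [0.1  0.45 1.8 ]
```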
24 The idea is to use the same loss function for functions, f (x), rather than just constants in order to obtain quantile estimates conditional on x. [sent-204, score-0.823]
25 2 Optimization Problem Based on lτ (ξ) we define the expected quantile risk as R[ f ] := E_{p(x,y)} [lτ (y − f (x))] (3). [sent-208, score-0.741]
26 By the same reasoning as in Lemma 2 it follows that for f : X → R the minimizer of R[ f ] is the quantile µτ (x). [sent-209, score-0.75]
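A short derivation sketch (ours, assuming a continuous conditional distribution function F(·|x)) of why the pointwise minimizer of the expected pinball loss is the τ-quantile:

```latex
% For fixed x, let F(\mu) = \Pr\{y \le \mu \mid x\}. Then
E\bigl[l_\tau(y-\mu)\,\big|\,x\bigr]
  = \tau \int_{\mu}^{\infty} (y-\mu)\, dF(y)
  + (1-\tau) \int_{-\infty}^{\mu} (\mu-y)\, dF(y),
\qquad
\frac{\partial}{\partial \mu}\, E\bigl[l_\tau(y-\mu)\,\big|\,x\bigr]
  = -\tau\bigl(1-F(\mu)\bigr) + (1-\tau)F(\mu)
  = F(\mu) - \tau .
% Setting the derivative to zero gives F(\mu) = \tau, so the minimizer is \mu_\tau(x).
```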
27 This ensures that the minimizer of (4) will satisfy the quantile property: Lemma 3 (Empirical Conditional Quantile Estimator) Assuming that f contains a scalar unregularized term, the minimizer of (4) satisfies: 1. [sent-213, score-0.775]
28 With respect to b, however, minimizing Rreg amounts to finding the τ quantile in terms of yi − g(xi ). [sent-222, score-0.757]
29 This is exactly what we want from a quantile estimator: by this procedure errors in one direction have a larger influence than those in the converse direction, which leads to the shifted estimate we expect from QR. [sent-256, score-0.745]
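To make the overall estimator tangible, here is a small self-contained sketch of kernel quantile regression that minimizes the regularized pinball loss by subgradient descent over an RBF kernel expansion f(x) = Σ_j α_j k(x, x_j) + b. The kernel, step size, and parameterization are our own illustrative choices; the paper instead solves an equivalent quadratic program in the dual.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_quantile(X, y, tau, lam=0.1, gamma=10.0, lr=0.01, epochs=2000):
    """Minimize (1/m) sum_i l_tau(y_i - f(x_i)) + (lam/2) ||f||_H^2
    over f(x) = sum_j alpha_j k(x, x_j) + b by (sub)gradient descent."""
    m = len(y)
    K = rbf_kernel(X, X, gamma)
    alpha, b = np.zeros(m), 0.0
    for _ in range(epochs):
        xi = y - (K @ alpha + b)
        # Subgradient of the pinball loss with respect to f(x_i).
        g = np.where(xi >= 0, -tau, 1.0 - tau)
        alpha -= lr * (K @ g / m + lam * (K @ alpha))  # ||f||_H^2 = alpha' K alpha
        b -= lr * g.mean()
    return alpha, b, K

# Toy usage on synthetic heteroscedastic data (hypothetical): the 0.9 quantile curve.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sinc(X[:, 0]) + 0.2 * (1 + X[:, 0]) * rng.standard_normal(100)
alpha, b, K = fit_kernel_quantile(X, y, tau=0.9)
print("fraction of y below the estimate:", np.mean(y <= K @ alpha + b))
```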
30 While they all satisfy the quantile property, having half the points on either side of the regression, some estimates appear to track the observations better. [sent-262, score-0.76]
31 This issue is addressed in Section 5 where we compute quantile regression estimates on a range of data sets. [sent-263, score-0.812]
32 Extensions and Modifications Our optimization framework lends itself naturally to a series of extensions and modifications of the regularized risk minimization framework for quantile regression. [sent-265, score-0.741]
33 While all three variants satisfy the quantile property, the degree of smoothness is controlled by the regularization constant λ. [sent-274, score-0.745]
34 9), two or more estimated conditional quantile functions can cross or overlap. [sent-285, score-0.764]
35 This embarrassing phenomenon, called quantile crossing, occurs because each conditional quantile function is independently estimated (Koenker, 2005; He, 1997). [sent-286, score-1.509]
36 9 conditional quantile functions estimated by the kernel-based estimator described in the previous section. [sent-294, score-0.798]
37 We note quantile crossings at several places, especially outside the training data range (x < 0 and 1 < x). [sent-296, score-0.745]
38 3 Figure 5(b) shows a family of conditional quantile functions estimated with the non-crossing constraints. [sent-298, score-0.764]
39 Let us write the model for the τh -th conditional quantile function as fh (x) = ⟨φ(x), wh ⟩ + bh for h = 1, 2, . [sent-304, score-0.822]
40 (8) Solving (5) or (6) for 1 ≤ h ≤ n with non-crossing constraints (8) allows us to estimate n conditional quantile functions not crossing at l points x1 , . [sent-309, score-0.843]
41 The model for the τh -th conditional quantile function is now represented as fh (x) = ∑_{i=1}^{m} αhi k(x, xi ) + ∑_{j=1}^{l} (θh−1,j − θh,j ) k(x, x j ) + bh . [sent-317, score-1.542]
42 It is worth noting that, after enforcing the non-crossing constraints, the quantile property as in Lemma 3 may not be guaranteed. [sent-323, score-0.741]
43 This is because the method tries to optimize for both the quantile property and the non-crossing property (in relation to other quantiles). [sent-324, score-0.757]
44 Hence, the final outcome may not empirically satisfy the quantile property. [sent-325, score-0.725]
45 Yet, the non-crossing constraints are appealing because they ensure the semantics of the quantile definition: a lower quantile level should not cross a higher quantile level. [sent-326, score-2.217]
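As a much simpler illustration of this semantics (not the constrained QP of this section), the sketch below detects crossings among independently estimated quantile curves and applies a crude post-hoc fix: sorting the predicted values across quantile levels at each point; all names are ours.

```python
import numpy as np

def crossing_violations(preds):
    """preds has shape (n_levels, n_points), rows ordered by increasing tau.
    Returns the number of points where a lower quantile estimate exceeds a higher one."""
    diffs = np.diff(preds, axis=0)  # f_{tau_{h+1}}(x) - f_{tau_h}(x)
    return int(np.sum(np.any(diffs < 0, axis=0)))

def rearrange_non_crossing(preds):
    # Sorting along the quantile axis enforces monotonicity in tau at every point.
    return np.sort(preds, axis=0)

# Hypothetical estimates for tau = 0.1, 0.5, 0.9 at three points:
preds = np.array([[0.2, 0.5, 1.1],
                  [0.4, 0.4, 1.0],   # the 0.5 curve dips below the 0.1 curve twice
                  [0.9, 1.2, 1.5]])
print(crossing_violations(preds))    # 2
print(rearrange_non_crossing(preds))
```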
46 Figure 5: An example of quantile crossing problem in BMD data set presented in Section 1; (a) without non-crossing constraints, (b) with non-crossing constraints (horizontal axis: age, standardized). [sent-365, score-0.863]
47 The plotted curves in (b) are the conditional quantile functions obtained with non-crossing constraints explained in Section 3. [sent-375, score-0.83]
48 There is no quantile crossing, even outside the training data range. [sent-377, score-0.742]
49 Figure 6 is an example of quantile regression with monotonicity constraints. [sent-385, score-0.854]
50 Figure 6: Example plots from quantile regression with and without monotonicity constraints. [sent-389, score-0.854]
51 The thin line represents the nonparametric quantile regression without monotonicity constraints, whereas the thick line represents the nonparametric quantile regression with monotonicity constraints. [sent-390, score-1.812]
52 Since the additional constraint does not depend on b, it is easy to see that the quantile property still holds. [sent-398, score-0.741]
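For intuition about what a monotonicity requirement does to a fitted curve, the sketch below projects predictions evaluated on an increasing grid onto non-decreasing sequences using the pool-adjacent-violators algorithm. This is a post-hoc projection for illustration only, not the constraint-based formulation used in the paper; the function name is ours.

```python
import numpy as np

def pav_non_decreasing(values):
    """Pool Adjacent Violators: L2 projection of a sequence onto non-decreasing sequences."""
    blocks = []  # each block holds [mean, count]
    for v in np.asarray(values, dtype=float):
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])

# Hypothetical quantile predictions on an increasing grid of x values:
print(pav_non_decreasing([1.0, 1.4, 1.2, 1.3, 2.0]))  # [1.  1.3 1.3 1.3 2. ]
```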
53 3 Other Function Classes Semiparametric Estimates RKHS expansions may not be the only function classes desired for quantile regression. [sent-413, score-0.741]
54 The former discuss ℓ1 regularization of expansion coefficients whereas the latter discuss an explicit second order smoothing spline method for the purpose of quantile regression. [sent-428, score-0.768]
55 This is the main reason why this paper is called Non-parametric quantile estimation. [sent-441, score-0.725]
56 1 Performance Indicators We first need to discuss how to evaluate the performance of the estimate f versus the true conditional quantile µτ (x). [sent-445, score-0.784]
57 Two criteria are important for a good quantile estimator fτ : • fτ needs to satisfy the quantile property as well as possible. [sent-446, score-1.5]
58 Note, however, that (19) does not imply having a conditional quantile estimator at all. [sent-449, score-0.798]
59 For instance, the constant function based on the unconditional quantile estimator with respect to Y performs extremely well under this criterion. [sent-450, score-0.778]
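The two indicators discussed here are easy to compute on held-out data; a small sketch with our own function names, where the average pinball loss measures accuracy and the "ramp loss" is taken here as the empirical fraction of responses falling below the estimate (which should be close to τ):

```python
import numpy as np

def average_pinball_loss(y, f, tau):
    xi = np.asarray(y) - np.asarray(f)
    return float(np.mean(np.where(xi >= 0, tau * xi, (tau - 1.0) * xi)))

def ramp_loss(y, f):
    # Empirical quantile property: fraction of observations lying below the estimate.
    return float(np.mean(np.asarray(y) <= np.asarray(f)))

# Hypothetical held-out responses and 0.5-quantile predictions:
y_test = np.array([0.1, 0.4, 0.9, 1.3])
f_test = np.array([0.5, 0.5, 1.0, 1.0])
print(average_pinball_loss(y_test, f_test, tau=0.5))  # 0.1125
print(ramp_loss(y_test, f_test))                      # 0.75
```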
60 With the conditions listed above, for any sample size m and 0 < δ < 1, every quantile regression estimate fτ satisfies with probability at least (1 − δ): R[ fτ ] − R[ fτ∗ ] ≤ 2 max(L) Rm (F ) + (4 + LB) √(log(2/δ)/(2m)), where L = {τ, 1 − τ}. [sent-471, score-0.813]
61 Since we do not expect R[ f ] to vanish except for pathological applications where quantile regression is inappropriate (that is, cases where we have a deterministic dependency between y and x), the use of localized estimates (Bartlett et al. [sent-482, score-0.812]
62 3 Bounds on the Quantile Property The theorem of the previous section gave us some idea about how far the sample average quantile loss is from its true value under p. [sent-486, score-0.765]
63 We now proceed to state bounds on the degree to which fτ satisfies the quantile property, i. [sent-487, score-0.725]
64 Since computing the true conditional quantile is impossible and all approximations of the latter rely on intermediate density estimation, this is the only objective criterion we could find. [sent-510, score-0.764]
65 • Simultaneously we need to ensure that the estimate satisfies the quantile property, that is, we want to ensure that the estimator we obtained does indeed produce numbers fτ (x) which exceed y with probability close to τ. [sent-512, score-0.779]
66 1 Models We compare the following four models: • An unconditional quantile estimator. [sent-516, score-0.744]
67 • Nonparametric quantile regression as described in Section 2. [sent-531, score-0.793]
68 Simultaneously we expect that the quantile property is maintained less and less well as the function class grows. [sent-535, score-0.741]
69 Data Set (Sample Size): caution (100), ftcollinssnow (93), highway (39), heights (1375), sniffer (125), snowgeese (45), ufc (372), birthwt (189), crabs (200), GAGurine (314), geyser (299), gilgais (365), topo (52), BostonHousing (506), CobarOre (38), engel (235), mcycle (133), BigMac2003 (69), UN3 (125), cpus (209); No. [sent-581, score-0.873]
70 • In terms of ramp loss (quantile property), the performance of our npqr was comparable to the other three models for the intermediate quantile (τ = 0. [sent-613, score-1.295]
71 Note that the quantile property, as such, is not an informative measure for conditional quantile estimation. [sent-624, score-1.489]
72 For example, uncond, the constant function based on the unconditional quantile estimator with respect to Y (straightforwardly obtained by sorting {yi }_{i=1}^{m} without using {xi }_{i=1}^{m} at all), performed best under this criterion. [sent-626, score-0.778]
73 It is clear that a less flexible model will have a better quantile property, but this does not necessarily mean that less flexible models are better at estimating conditional quantile functions. [sent-627, score-1.489]
74 2 Experiments on Nonparametric Quantile Regression with Additional Constraints We empirically investigate the performance of the nonparametric quantile regression estimator with the additional constraints described in Section 3. [sent-636, score-0.931]
75 The performance of npqr and noncross is quite similar, since npqr itself already produces almost non-crossing estimates and the constraints make only small adjustments when violations occur. [sent-666, score-0.88]
76 For example, if we run a quantile estimator with τ = 0. [sent-700, score-0.759]
77 Estimation with constraints We introduce non-crossing and monotonicity constraints in the context of nonparametric quantile regression. [sent-719, score-0.932]
78 Figure 9: Illustration of the relationship between quantile in training and ramp loss. [sent-745, score-0.889]
79 Note, however, that in this situation we want to be able to estimate the regression quantile for a large set of different portfolios. [sent-769, score-0.813]
80 Nonparametric ν-Support Vector Regression In this section we explore an alternative to the quantile regression framework proposed in Section 2. [sent-781, score-0.793]
81 There the authors suggest a method for adapting SV regression and classification estimates such that automatically only a quantile ν lies beyond the desired confidence region. [sent-784, score-0.744]
82 (2000) show that the ν-SV regression estimate does converge to a quantile estimate. [sent-788, score-0.813]
83 Tables 3, 5 and 7 show the ramp loss, a measure of the quantile property. [sent-870, score-0.889]
84 1 Non-Crossing Constraints Table 8 shows the average pinball loss comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-881, score-1.068]
85 Table 9 shows the ramp loss, a measure of the quantile property, of npqr and noncross. [sent-884, score-1.255]
86 Table 10 shows the average pinball loss comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-894, score-1.129]
87 Table 11 shows the ramp loss, a measure of the quantile property, of npqr and npqrm. [sent-896, score-1.255]
88–94 [Fragments of the detailed results tables (Tables 2–7): each repeats the data set column (caution, ftcollinssnow, highway, heights, sniffer, snowgeese, ufc, birthwt, crabs, GAGurine, geyser, gilgais, topo, BostonHousing, CobarOre, engel, mcycle, BigMac2003, UN3, cpus) with per-method headers such as uncond and npqr; the numeric loss values were lost in extraction.]
95 Table 8: Pinball loss comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-2059, score-0.895]
96 [Fragment of Tables 8–9: the data set column (caution through cpus) with τ = 0.x column headers; the numeric values were lost in extraction.] [sent-2060, score-0.873]
97 Table 9: Ramp loss (quantile property) comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-2303, score-0.895]
98 Table 10: Pinball loss comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-2331, score-0.956]
99 Table 11: Ramp loss (quantile property) comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-2359, score-0.956]
100 A convergent algorithm for quantile regression with smoothing splines. [sent-2403, score-0.793]
wordName wordTfidf (topN-words)
[('quantile', 0.725), ('npqr', 0.366), ('pinball', 0.173), ('ramp', 0.164), ('rqss', 0.133), ('takeuchi', 0.124), ('ears', 0.12), ('quantiles', 0.107), ('onparamteric', 0.107), ('uantile', 0.107), ('mola', 0.091), ('noncross', 0.087), ('na', 0.087), ('koenker', 0.085), ('stimation', 0.081), ('uncond', 0.08), ('qr', 0.076), ('bmd', 0.073), ('regression', 0.068), ('bostonhousing', 0.067), ('cobarore', 0.067), ('npqrm', 0.067), ('nonparametric', 0.062), ('monotonicity', 0.061), ('birthwt', 0.06), ('crabs', 0.06), ('engel', 0.06), ('ftcollinssnow', 0.06), ('gagurine', 0.06), ('geyser', 0.06), ('gilgais', 0.06), ('highway', 0.06), ('mcycle', 0.06), ('sniffer', 0.06), ('snowgeese', 0.06), ('topo', 0.06), ('ufc', 0.06), ('caution', 0.051), ('cpus', 0.051), ('heights', 0.051), ('constraints', 0.042), ('cars', 0.04), ('remp', 0.04), ('rreg', 0.04), ('loss', 0.04), ('trials', 0.039), ('conditional', 0.039), ('age', 0.037), ('bh', 0.035), ('estimator', 0.034), ('onions', 0.033), ('quantil', 0.033), ('standardized', 0.032), ('yi', 0.032), ('rademacher', 0.031), ('lipschitz', 0.029), ('semiparametric', 0.028), ('hi', 0.027), ('heteroscedastic', 0.027), ('kink', 0.027), ('rkhs', 0.026), ('mendelson', 0.026), ('minimizer', 0.025), ('pr', 0.025), ('lkopf', 0.025), ('monotonic', 0.024), ('sch', 0.024), ('curves', 0.024), ('spline', 0.023), ('onstraints', 0.023), ('wh', 0.023), ('median', 0.02), ('binomial', 0.02), ('williamson', 0.02), ('mammen', 0.02), ('height', 0.02), ('regularization', 0.02), ('bone', 0.02), ('crossings', 0.02), ('spinal', 0.02), ('dual', 0.02), ('estimate', 0.02), ('likewise', 0.019), ('statements', 0.019), ('fi', 0.019), ('estimates', 0.019), ('unconditional', 0.019), ('xi', 0.018), ('kernel', 0.018), ('crossing', 0.017), ('iid', 0.017), ('smola', 0.017), ('risk', 0.016), ('expansions', 0.016), ('property', 0.016), ('australia', 0.015), ('circles', 0.015), ('lagrange', 0.015), ('gures', 0.015), ('bivariate', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999863 65 jmlr-2006-Nonparametric Quantile Estimation
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
2 0.17774111 77 jmlr-2006-Quantile Regression Forests
Author: Nicolai Meinshausen
Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression
Author: Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, Alexander J. Smola
Abstract: We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
4 0.048032533 23 jmlr-2006-Consistency and Convergence Rates of One-Class SVMs and Related Algorithms
Author: Régis Vert, Jean-Philippe Vert
Abstract: We determine the asymptotic behaviour of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. Keywords: regularization, Gaussian kernel RKHS, one-class SVM, convex loss functions, kernel density estimation
5 0.04092114 48 jmlr-2006-Learning Minimum Volume Sets
Author: Clayton D. Scott, Robert D. Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency, an oracle inequality, and rates of convergence. The proposed estimators are illustrated with histogram and decision tree set estimation rules. Keywords: minimum volume sets, anomaly detection, statistical learning theory, uniform deviation bounds, sample complexity, universal consistency
6 0.035862055 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
7 0.032541003 86 jmlr-2006-Step Size Adaptation in Reproducing Kernel Hilbert Space
8 0.030551061 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
9 0.030428723 44 jmlr-2006-Large Scale Transductive SVMs
10 0.027118392 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints (Special Topic on Machine Learning and Optimization)
11 0.026452402 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research (Special Topic on Machine Learning and Optimization)
12 0.024762785 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
13 0.023731632 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
14 0.022965305 45 jmlr-2006-Learning Coordinate Covariances via Gradients
15 0.022847967 58 jmlr-2006-Lower Bounds and Aggregation in Density Estimation
16 0.021739107 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
19 0.020463811 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes
20 0.0202117 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
topicId topicWeight
[(0, 0.119), (1, -0.058), (2, 0.021), (3, -0.058), (4, -0.013), (5, 0.051), (6, -0.053), (7, 0.003), (8, 0.041), (9, -0.048), (10, -0.015), (11, 0.039), (12, 0.065), (13, 0.081), (14, 0.098), (15, -0.604), (16, -0.321), (17, 0.075), (18, -0.078), (19, 0.031), (20, 0.019), (21, 0.083), (22, 0.064), (23, 0.129), (24, 0.01), (25, 0.026), (26, 0.023), (27, 0.013), (28, 0.078), (29, 0.044), (30, 0.002), (31, -0.024), (32, -0.013), (33, -0.01), (34, 0.006), (35, -0.002), (36, -0.008), (37, 0.014), (38, 0.034), (39, 0.009), (40, 0.028), (41, 0.064), (42, 0.03), (43, 0.022), (44, -0.014), (45, -0.046), (46, -0.04), (47, -0.024), (48, 0.02), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.95141327 65 jmlr-2006-Nonparametric Quantile Estimation
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
2 0.94107479 77 jmlr-2006-Quantile Regression Forests
Author: Nicolai Meinshausen
Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression
Author: Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, Alexander J. Smola
Abstract: We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
4 0.1903915 48 jmlr-2006-Learning Minimum Volume Sets
Author: Clayton D. Scott, Robert D. Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency, an oracle inequality, and rates of convergence. The proposed estimators are illustrated with histogram and decision tree set estimation rules. Keywords: minimum volume sets, anomaly detection, statistical learning theory, uniform deviation bounds, sample complexity, universal consistency
5 0.17989185 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
Author: Magnus Ekdahl, Timo Koski
Abstract: In many pattern recognition/classification problems the true class conditional model and class probabilities are approximated for reasons of reducing complexity and/or of statistical estimation. The approximated classifier is expected to have worse performance, here measured by the probability of correct classification. We present an analysis valid in general, and easily computable formulas for estimating the degradation in probability of correct classification when compared to the optimal classifier. An example of an approximation is the Naïve Bayes classifier. We show that the performance of the Naïve Bayes depends on the degree of functional dependence between the features and labels. We provide a sufficient condition for zero loss of performance, too. Keywords: Bayesian networks, naïve Bayes, plug-in classifier, Kolmogorov distance of variation, variational learning
6 0.16165903 23 jmlr-2006-Consistency and Convergence Rates of One-Class SVMs and Related Algorithms
8 0.14435993 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
9 0.13819903 44 jmlr-2006-Large Scale Transductive SVMs
10 0.13700618 86 jmlr-2006-Step Size Adaptation in Reproducing Kernel Hilbert Space
11 0.13372052 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
12 0.12288701 45 jmlr-2006-Learning Coordinate Covariances via Gradients
13 0.11390401 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
14 0.11147961 71 jmlr-2006-Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis
15 0.10663354 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
17 0.10229222 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
18 0.10101702 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes
19 0.098539144 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting (Special Topic on Inductive Programming)
20 0.09608601 58 jmlr-2006-Lower Bounds and Aggregation in Density Estimation
topicId topicWeight
[(8, 0.022), (36, 0.063), (45, 0.027), (50, 0.055), (59, 0.441), (63, 0.023), (68, 0.015), (76, 0.011), (78, 0.023), (81, 0.04), (84, 0.046), (90, 0.037), (91, 0.035), (96, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.71822459 65 jmlr-2006-Nonparametric Quantile Estimation
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
2 0.29329628 77 jmlr-2006-Quantile Regression Forests
Author: Nicolai Meinshausen
Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression
Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs
4 0.27203554 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
Author: Mikio L. Braun
Abstract: The eigenvalues of the kernel matrix play an important role in a number of kernel methods, in particular, in kernel principal component analysis. It is well known that the eigenvalues of the kernel matrix converge as the number of samples tends to infinity. We derive probabilistic finite sample size bounds on the approximation error of individual eigenvalues which have the important property that the bounds scale with the eigenvalue under consideration, reflecting the actual behavior of the approximation errors as predicted by asymptotic results and observed in numerical simulations. Such scaling bounds have so far only been known for tail sums of eigenvalues. Asymptotically, the bounds presented here have a slower than stochastic rate, but the number of sample points necessary to make this disadvantage noticeable is often unrealistically large. Therefore, under practical conditions, and for all but the largest few eigenvalues, the bounds presented here form a significant improvement over existing non-scaling bounds. Keywords: kernel matrix, eigenvalues, relative perturbation bounds
5 0.26998052 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
Author: Sayan Mukherjee, Qiang Wu
Abstract: We introduce an algorithm that simultaneously estimates a classification function as well as its gradient in the supervised learning framework. The motivation for the algorithm is to find salient variables and estimate how they covary. An efficient implementation with respect to both memory and time is given. The utility of the algorithm is illustrated on simulated data as well as a gene expression data set. An error analysis is given for the convergence of the estimate of the classification function and its gradient to the true classification function and true gradient. Keywords: Tikhnonov regularization, variable selection, reproducing kernel Hilbert space, generalization bounds, classification
6 0.2677739 70 jmlr-2006-Online Passive-Aggressive Algorithms
7 0.26547912 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
8 0.26474029 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
9 0.26313961 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
10 0.26177582 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
11 0.26138407 66 jmlr-2006-On Model Selection Consistency of Lasso
12 0.2612918 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data
13 0.26048112 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms
14 0.25995547 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
15 0.25906891 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error
16 0.25817269 44 jmlr-2006-Large Scale Transductive SVMs
19 0.2509273 48 jmlr-2006-Learning Minimum Volume Sets
20 0.25091088 53 jmlr-2006-Learning a Hidden Hypergraph