jmlr jmlr2006 jmlr2006-65 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
Reference: text
sentIndex sentText sentNum sentScore
1 What might be called median regression is subsumed under the term quantile regression. [sent-21, score-0.745]
2 We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. [sent-22, score-1.547]
3 We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. [sent-24, score-0.803]
4 Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints 1. [sent-25, score-0.847]
5 The purpose of our paper is: • To bring the technique of quantile regression to the attention of the machine learning community and show its relation to ν-Support Vector Regression (Schölkopf et al. [sent-44, score-0.818]
6 Likewise, the conditional quantile µτ (x) for a pair of random variables (x, y) ∈ X × R is defined as the function µτ : X → R for which pointwise µτ is the infimum over µ for which Pr {y ≤ µ|x} = τ. [sent-70, score-0.764]
7 3 Examples To illustrate regression analyses with conditional quantile functions, we provide two simple examples here. [sent-72, score-0.832]
8 Since ξ is normally distributed, we know that the τ-th quantile of ξ is given by σ(x)Φ^{−1}(τ), where Φ is the cumulative distribution function of the normal distribution with unit variance. [sent-81, score-0.725]
9 The τ-th conditional quantile function is obtained by connecting the τ-th quantile of the conditional distribution p(y|x) for all x ∈ X . [sent-94, score-1.528]
10 The error bars of many regression estimates can be viewed as crude quantile regressions. [sent-99, score-0.812]
11 The conditional quantile analysis (b) gives us a more detailed description of these changes. [sent-110, score-0.764]
12 Figure 1: Illustration of conditional quantile functions of a simple artificial system in (1) with f (x) = sinc(x) and σ(x) = 0. [sent-129, score-0.764]
13 In this paper, we are concerned with the problem of estimating these conditional quantile functions from training data. [sent-140, score-0.764]
14 However, this approach carries the disadvantage of requiring us to estimate both an upper and lower quantile simultaneously. [sent-143, score-0.745]
15 Following Vapnik’s paradigm of estimating only the relevant parameters directly (Vapnik, 1982) we attack the problem by estimating each quantile separately. [sent-148, score-0.725]
16 For completeness and comparison, we provide a detailed description of a symmetric quantile regression in Appendix A. [sent-149, score-0.793]
17 1 Loss Function The basic strategy behind quantile estimation arises from the observation that minimizing the ℓ1 -loss function for a location estimator yields the median. [sent-151, score-0.759]
18 Koenker and Bassett (1978) generalize this idea to obtain a regression estimate for any quantile by tilting the loss function in a suitable fashion. [sent-154, score-0.853]
19 (2000) does, in fact, suggest that a choice of different upper bounds on the dual problem would lead to estimators which weigh errors for positive and negative excess differently, that is, which would lead to quantile regression estimators. [sent-160, score-0.813]
20 Figure 2: An illustration of (a) conditional mean analysis and (b) conditional quantile analysis for a data set on bone mineral density (BMD) in adolescents (horizontal axis: Age). [sent-180, score-1.548]
21 In (b) the nine curves are the estimated conditional quantile curves at orders 0. [sent-182, score-0.812]
22 The set of conditional quantile curves provides a more informative description of the relationship among variables, such as non-constant variance or non-normality of the noise (error) distribution. [sent-189, score-0.788]
23 lτ (ξ) = τξ if ξ ≥ 0, and lτ (ξ) = (τ − 1)ξ if ξ < 0 (2). Figure 3: Pinball loss function for quantile estimation. [sent-191, score-0.765]
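A minimal sketch of the pinball loss in Python/NumPy, only to make the definition above concrete; the function name and signature are our own choices and do not come from the paper:

```python
import numpy as np

def pinball_loss(xi, tau):
    """Pinball (tilted absolute) loss: tau*xi if xi >= 0, (tau - 1)*xi if xi < 0."""
    xi = np.asarray(xi, dtype=float)
    return np.where(xi >= 0, tau * xi, (tau - 1.0) * xi)

# The loss penalizes under- and over-estimation asymmetrically, e.g. for tau = 0.9:
print(pinball_loss([-1.0, 0.5, 2.0], tau=0.9))  # [0.1  0.45 1.8 ]
```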
24 The idea is to use the same loss function for functions, f (x), rather than just constants in order to obtain quantile estimates conditional on x. [sent-204, score-0.823]
25 2 Optimization Problem Based on lτ (ξ) we define the expected quantile risk as R[ f ] := E_{p(x,y)} [lτ (y − f (x))] (3). [sent-208, score-0.741]
26 By the same reasoning as in Lemma 2 it follows that for f : X → R the minimizer of R[ f ] is the quantile µτ (x). [sent-209, score-0.75]
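A short derivation sketch (ours, assuming a continuous conditional distribution function F(·|x)) of why the pointwise minimizer of the expected pinball loss is the τ-quantile:

```latex
% For fixed x, let F(\mu) = \Pr\{y \le \mu \mid x\}. Then
E\bigl[l_\tau(y-\mu)\,\big|\,x\bigr]
  = \tau \int_{\mu}^{\infty} (y-\mu)\, dF(y)
  + (1-\tau) \int_{-\infty}^{\mu} (\mu-y)\, dF(y),
\qquad
\frac{\partial}{\partial \mu}\, E\bigl[l_\tau(y-\mu)\,\big|\,x\bigr]
  = -\tau\bigl(1-F(\mu)\bigr) + (1-\tau)F(\mu)
  = F(\mu) - \tau .
% Setting the derivative to zero gives F(\mu) = \tau, so the minimizer is \mu_\tau(x).
```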
27 This ensures that the minimizer of (4) will satisfy the quantile property: Lemma 3 (Empirical Conditional Quantile Estimator) Assuming that f contains a scalar unregularized term, the minimizer of (4) satisfies: 1. [sent-213, score-0.775]
28 With respect to b, however, minimizing Rreg amounts to finding the τ quantile in terms of yi − g(xi ). [sent-222, score-0.757]
29 This is exactly what we want from a quantile estimator: by this procedure errors in one direction have a larger influence than those in the converse direction, which leads to the shifted estimate we expect from QR. [sent-256, score-0.745]
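To make the overall estimator tangible, here is a small self-contained sketch of kernel quantile regression that minimizes the regularized pinball loss by subgradient descent over an RBF kernel expansion f(x) = Σ_j α_j k(x, x_j) + b. The kernel, step size, and parameterization are our own illustrative choices; the paper instead solves an equivalent quadratic program in the dual.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_quantile(X, y, tau, lam=0.1, gamma=10.0, lr=0.01, epochs=2000):
    """Minimize (1/m) sum_i l_tau(y_i - f(x_i)) + (lam/2) ||f||_H^2
    over f(x) = sum_j alpha_j k(x, x_j) + b by (sub)gradient descent."""
    m = len(y)
    K = rbf_kernel(X, X, gamma)
    alpha, b = np.zeros(m), 0.0
    for _ in range(epochs):
        xi = y - (K @ alpha + b)
        # Subgradient of the pinball loss with respect to f(x_i).
        g = np.where(xi >= 0, -tau, 1.0 - tau)
        alpha -= lr * (K @ g / m + lam * (K @ alpha))  # ||f||_H^2 = alpha' K alpha
        b -= lr * g.mean()
    return alpha, b, K

# Toy usage on synthetic heteroscedastic data (hypothetical): the 0.9 quantile curve.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sinc(X[:, 0]) + 0.2 * (1 + X[:, 0]) * rng.standard_normal(100)
alpha, b, K = fit_kernel_quantile(X, y, tau=0.9)
print("fraction of y below the estimate:", np.mean(y <= K @ alpha + b))
```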
30 While they all satisfy the quantile property, having half the points on either side of the regression, some estimates appear to track the observations better. [sent-262, score-0.76]
31 This issue is addressed in Section 5 where we compute quantile regression estimates on a range of data sets. [sent-263, score-0.812]
32 Extensions and Modifications Our optimization framework lends itself naturally to a series of extensions and modifications of the regularized risk minimization framework for quantile regression. [sent-265, score-0.741]
33 While all three variants satisfy the quantile property, the degree of smoothness is controlled by the regularization constant λ. [sent-274, score-0.745]
34 9), two or more estimated conditional quantile functions can cross or overlap. [sent-285, score-0.764]
35 This embarrassing phenomenon, called quantile crossing, occurs because each conditional quantile function is independently estimated (Koenker, 2005; He, 1997). [sent-286, score-1.509]
36 9 conditional quantile functions estimated by the kernel-based estimator described in the previous section. [sent-294, score-0.798]
37 We note quantile crossings at several places, especially outside the training data range (x < 0 and 1 < x). [sent-296, score-0.745]
38 3 Figure 5(b) shows a family of conditional quantile functions estimated with the non-crossing constraints. [sent-298, score-0.764]
39 Let us write the model for the τh -th conditional quantile function as fh (x) = ⟨φ(x), wh ⟩ + bh for h = 1, 2, . [sent-304, score-0.822]
40 (8) Solving (5) or (6) for 1 ≤ h ≤ n with non-crossing constraints (8) allows us to estimate n conditional quantile functions not crossing at l points x1 , . [sent-309, score-0.843]
41 The model for the τh -th conditional quantile function is now represented as fh (x) = ∑_{i=1}^{m} αhi k(x, xi ) + ∑_{j=1}^{l} (θh−1,j − θh,j ) k(x, x j ) + bh . [sent-317, score-1.542]
42 It is worth noting that, after enforcing the non-crossing constraints, the quantile property as in Lemma 3 may not be guaranteed. [sent-323, score-0.741]
43 This is because the method tries to optimize for both the quantile property and the non-crossing property (in relation to other quantiles). [sent-324, score-0.757]
44 Hence, the final outcome may not empirically satisfy the quantile property. [sent-325, score-0.725]
45 Yet, the non-crossing constraints are appealing because they ensure the semantics of the quantile definition: a lower quantile level should not cross a higher quantile level. [sent-326, score-2.217]
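As a much simpler illustration of this semantics (not the constrained QP of this section), the sketch below detects crossings among independently estimated quantile curves and applies a crude post-hoc fix: sorting the predicted values across quantile levels at each point; all names are ours.

```python
import numpy as np

def crossing_violations(preds):
    """preds has shape (n_levels, n_points), rows ordered by increasing tau.
    Returns the number of points where a lower quantile estimate exceeds a higher one."""
    diffs = np.diff(preds, axis=0)  # f_{tau_{h+1}}(x) - f_{tau_h}(x)
    return int(np.sum(np.any(diffs < 0, axis=0)))

def rearrange_non_crossing(preds):
    # Sorting along the quantile axis enforces monotonicity in tau at every point.
    return np.sort(preds, axis=0)

# Hypothetical estimates for tau = 0.1, 0.5, 0.9 at three points:
preds = np.array([[0.2, 0.5, 1.1],
                  [0.4, 0.4, 1.0],   # the 0.5 curve dips below the 0.1 curve twice
                  [0.9, 1.2, 1.5]])
print(crossing_violations(preds))    # 2
print(rearrange_non_crossing(preds))
```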
46 Figure 5: An example of quantile crossing problem in BMD data set presented in Section 1; (a) without non-crossing constraints, (b) with non-crossing constraints (horizontal axis: age, standardized). [sent-365, score-0.863]
47 The plotted curves in (b) are the conditional quantile functions obtained with non-crossing constraints explained in Section 3. [sent-375, score-0.83]
48 There is no quantile crossing, even outside the training data range. [sent-377, score-0.742]
49 Figure 6 is an example of quantile regression with monotonicity constraints. [sent-385, score-0.854]
50 Figure 6: Example plots from quantile regression with and without monotonicity constraints. [sent-389, score-0.854]
51 The thin line represents the nonparametric quantile regression without monotonicity constraints, whereas the thick line represents the nonparametric quantile regression with monotonicity constraints. [sent-390, score-1.812]
52 Since the additional constraint does not depend on b, it is easy to see that the quantile property still holds. [sent-398, score-0.741]
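For intuition about what a monotonicity requirement does to a fitted curve, the sketch below projects predictions evaluated on an increasing grid onto non-decreasing sequences using the pool-adjacent-violators algorithm. This is a post-hoc projection for illustration only, not the constraint-based formulation used in the paper; the function name is ours.

```python
import numpy as np

def pav_non_decreasing(values):
    """Pool Adjacent Violators: L2 projection of a sequence onto non-decreasing sequences."""
    blocks = []  # each block holds [mean, count]
    for v in np.asarray(values, dtype=float):
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])

# Hypothetical quantile predictions on an increasing grid of x values:
print(pav_non_decreasing([1.0, 1.4, 1.2, 1.3, 2.0]))  # [1.  1.3 1.3 1.3 2. ]
```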
53 3 Other Function Classes Semiparametric Estimates RKHS expansions may not be the only function classes desired for quantile regression. [sent-413, score-0.741]
54 The former discuss ℓ1 regularization of expansion coefficients whereas the latter discuss an explicit second order smoothing spline method for the purpose of quantile regression. [sent-428, score-0.768]
55 This is the main reason why this paper is called Non-parametric quantile estimation. [sent-441, score-0.725]
56 1 Performance Indicators We first need to discuss how to evaluate the performance of the estimate f versus the true conditional quantile µτ (x). [sent-445, score-0.784]
57 Two criteria are important for a good quantile estimator fτ : • fτ needs to satisfy the quantile property as well as possible. [sent-446, score-1.5]
58 Note, however, that (19) does not imply having a conditional quantile estimator at all. [sent-449, score-0.798]
59 For instance, the constant function based on the unconditional quantile estimator with respect to Y performs extremely well under this criterion. [sent-450, score-0.778]
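The two indicators discussed here are easy to compute on held-out data; a small sketch with our own function names, where the average pinball loss measures accuracy and the "ramp loss" is taken here as the empirical fraction of responses falling below the estimate (which should be close to τ):

```python
import numpy as np

def average_pinball_loss(y, f, tau):
    xi = np.asarray(y) - np.asarray(f)
    return float(np.mean(np.where(xi >= 0, tau * xi, (tau - 1.0) * xi)))

def ramp_loss(y, f):
    # Empirical quantile property: fraction of observations lying below the estimate.
    return float(np.mean(np.asarray(y) <= np.asarray(f)))

# Hypothetical held-out responses and 0.5-quantile predictions:
y_test = np.array([0.1, 0.4, 0.9, 1.3])
f_test = np.array([0.5, 0.5, 1.0, 1.0])
print(average_pinball_loss(y_test, f_test, tau=0.5))  # 0.1125
print(ramp_loss(y_test, f_test))                      # 0.75
```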
60 With the conditions listed above, for any sample size m and 0 < δ < 1, every quantile regression estimate fτ satisfies with probability at least (1 − δ): R[ fτ ] − R[ fτ∗ ] ≤ 2 max(L) Rm (F ) + (4 + LB) √(log(2/δ)/(2m)), where L = {τ, 1 − τ}. [sent-471, score-0.813]
61 Since we do not expect R[ f ] to vanish except for pathological applications where quantile regression is inappropriate (that is, cases where we have a deterministic dependency between y and x), the use of localized estimates (Bartlett et al. [sent-482, score-0.812]
62 3 Bounds on the Quantile Property The theorem of the previous section gave us some idea about how far the sample average quantile loss is from its true value under p. [sent-486, score-0.765]
63 We now proceed to state bounds on the degree to which fτ satisfies the quantile property, i. [sent-487, score-0.725]
64 Since computing the true conditional quantile is impossible and all approximations of the latter rely on intermediate density estimation, this is the only objective criterion we could find. [sent-510, score-0.764]
65 • Simultaneously we need to ensure that the estimate satisfies the quantile property, that is, we want to ensure that the estimator we obtained does indeed produce numbers fτ (x) which exceed y with probability close to τ. [sent-512, score-0.779]
66 1 Models We compare the following four models: • An unconditional quantile estimator. [sent-516, score-0.744]
67 • Nonparametric quantile regression as described in Section 2. [sent-531, score-0.793]
68 Simultaneously we expect that the quantile property is maintained less and less well as the function class grows. [sent-535, score-0.741]
69 Data Set (Sample Size): caution (100), ftcollinssnow (93), highway (39), heights (1375), sniffer (125), snowgeese (45), ufc (372), birthwt (189), crabs (200), GAGurine (314), geyser (299), gilgais (365), topo (52), BostonHousing (506), CobarOre (38), engel (235), mcycle (133), BigMac2003 (69), UN3 (125), cpus (209); No. [sent-581, score-0.873]
70 • In terms of ramp loss (quantile property), the performance of our npqr was comparable to the other three models for the intermediate quantile (τ = 0. [sent-613, score-1.295]
71 Note that the quantile property, as such, is not an informative measure for conditional quantile estimation. [sent-624, score-1.489]
72 For example, uncond, the constant function based on the unconditional quantile estimator with respect to Y (straightforwardly obtained by sorting {yi }_{i=1}^{m} without using {xi }_{i=1}^{m} at all), performed best under this criterion. [sent-626, score-0.778]
73 It is clear that a less flexible model will have a better quantile property, but this does not necessarily mean that less flexible models are better at estimating conditional quantile functions. [sent-627, score-1.489]
74 2 Experiments on Nonparametric Quantile Regression with Additional Constraints We empirically investigate the performance of the nonparametric quantile regression estimator with the additional constraints described in Section 3. [sent-636, score-0.931]
75 The performance of npqr and noncross is quite similar, since npqr itself already produces almost non-crossing estimates and the constraints make only small adjustments when violations occur. [sent-666, score-0.88]
76 For example, if we run a quantile estimator with τ = 0. [sent-700, score-0.759]
77 Estimation with constraints We introduce non-crossing and monotonicity constraints in the context of nonparametric quantile regression. [sent-719, score-0.932]
78 Figure 9: Illustration of the relationship between quantile in training and ramp loss. [sent-745, score-0.889]
79 Note, however, that in this situation we want to be able to estimate the regression quantile for a large set of different portfolios. [sent-769, score-0.813]
80 Nonparametric ν-Support Vector Regression In this section we explore an alternative to the quantile regression framework proposed in Section 2. [sent-781, score-0.793]
81 There the authors suggest a method for adapting SV regression and classification estimates such that automatically only a quantile ν lies beyond the desired confidence region. [sent-784, score-0.744]
82 (2000) show that the ν-SV regression estimate does converge to a quantile estimate. [sent-788, score-0.813]
83 Tables 3, 5 and 7 show the ramp loss, a measure of the quantile property. [sent-870, score-0.889]
84 1 Non-Crossing Constraints Table 8 shows the average pinball loss comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-881, score-1.068]
85 Table 9 shows the ramp loss, a measure of the quantile property, of npqr and noncross. [sent-884, score-1.255]
86 Table 10 shows the average pinball loss comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-894, score-1.129]
87 Table 11 shows the ramp loss, a measure of the quantile property, of npqr and npqrm. [sent-896, score-1.255]
88–94 [Fragments of the detailed results tables (Tables 2–7): each repeats the data set column (caution, ftcollinssnow, highway, heights, sniffer, snowgeese, ufc, birthwt, crabs, GAGurine, geyser, gilgais, topo, BostonHousing, CobarOre, engel, mcycle, BigMac2003, UN3, cpus) with per-method headers such as uncond and npqr; the numeric loss values were lost in extraction.]
95 Table 8: Pinball loss comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-2059, score-0.895]
96 [Fragment of Tables 8–9: the data set column (caution through cpus) with τ = 0.x column headers; the numeric values were lost in extraction.] [sent-2060, score-0.873]
97 Table 9: Ramp loss (quantile property) comparison between the nonparametric quantile regression without (npqr) and with (noncross) non-crossing constraints. [sent-2303, score-0.895]
98 Table 10: Pinball loss comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-2331, score-0.956]
99 Table 11: Ramp loss (quantile property) comparison between the nonparametric quantile regression without (npqr) and with (npqrm) monotonicity constraints. [sent-2359, score-0.956]
100 A convergent algorithm for quantile regression with smoothing splines. [sent-2403, score-0.793]
wordName wordTfidf (topN-words)
[('quantile', 0.725), ('npqr', 0.366), ('pinball', 0.173), ('ramp', 0.164), ('rqss', 0.133), ('takeuchi', 0.124), ('ears', 0.12), ('quantiles', 0.107), ('onparamteric', 0.107), ('uantile', 0.107), ('mola', 0.091), ('noncross', 0.087), ('na', 0.087), ('koenker', 0.085), ('stimation', 0.081), ('uncond', 0.08), ('qr', 0.076), ('bmd', 0.073), ('regression', 0.068), ('bostonhousing', 0.067), ('cobarore', 0.067), ('npqrm', 0.067), ('nonparametric', 0.062), ('monotonicity', 0.061), ('birthwt', 0.06), ('crabs', 0.06), ('engel', 0.06), ('ftcollinssnow', 0.06), ('gagurine', 0.06), ('geyser', 0.06), ('gilgais', 0.06), ('highway', 0.06), ('mcycle', 0.06), ('sniffer', 0.06), ('snowgeese', 0.06), ('topo', 0.06), ('ufc', 0.06), ('caution', 0.051), ('cpus', 0.051), ('heights', 0.051), ('constraints', 0.042), ('cars', 0.04), ('remp', 0.04), ('rreg', 0.04), ('loss', 0.04), ('trials', 0.039), ('conditional', 0.039), ('age', 0.037), ('bh', 0.035), ('estimator', 0.034), ('onions', 0.033), ('quantil', 0.033), ('standardized', 0.032), ('yi', 0.032), ('rademacher', 0.031), ('lipschitz', 0.029), ('semiparametric', 0.028), ('hi', 0.027), ('heteroscedastic', 0.027), ('kink', 0.027), ('rkhs', 0.026), ('mendelson', 0.026), ('minimizer', 0.025), ('pr', 0.025), ('lkopf', 0.025), ('monotonic', 0.024), ('sch', 0.024), ('curves', 0.024), ('spline', 0.023), ('onstraints', 0.023), ('wh', 0.023), ('median', 0.02), ('binomial', 0.02), ('williamson', 0.02), ('mammen', 0.02), ('height', 0.02), ('regularization', 0.02), ('bone', 0.02), ('crossings', 0.02), ('spinal', 0.02), ('dual', 0.02), ('estimate', 0.02), ('likewise', 0.019), ('statements', 0.019), ('fi', 0.019), ('estimates', 0.019), ('unconditional', 0.019), ('xi', 0.018), ('kernel', 0.018), ('crossing', 0.017), ('iid', 0.017), ('smola', 0.017), ('risk', 0.016), ('expansions', 0.016), ('property', 0.016), ('australia', 0.015), ('circles', 0.015), ('lagrange', 0.015), ('gures', 0.015), ('bivariate', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999863 65 jmlr-2006-Nonparametric Quantile Estimation
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
2 0.17774111 77 jmlr-2006-Quantile Regression Forests
Author: Nicolai Meinshausen
Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression
Author: Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, Alexander J. Smola
Abstract: We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
4 0.048032533 23 jmlr-2006-Consistency and Convergence Rates of One-Class SVMs and Related Algorithms
Author: Régis Vert, Jean-Philippe Vert
Abstract: We determine the asymptotic behaviour of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error that is shown to converge to the Bayes risk, therefore proving the Bayes-consistency of a variety of methods although the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization can not vanish by construction, and which is shown for the first time to be a consistent density level set estimator. Keywords: regularization, Gaussian kernel RKHS, one-class SVM, convex loss functions, kernel density estimation
5 0.04092114 48 jmlr-2006-Learning Minimum Volume Sets
Author: Clayton D. Scott, Robert D. Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency, an oracle inequality, and rates of convergence. The proposed estimators are illustrated with histogram and decision tree set estimation rules. Keywords: minimum volume sets, anomaly detection, statistical learning theory, uniform deviation bounds, sample complexity, universal consistency
6 0.035862055 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
7 0.032541003 86 jmlr-2006-Step Size Adaptation in Reproducing Kernel Hilbert Space
8 0.030551061 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
9 0.030428723 44 jmlr-2006-Large Scale Transductive SVMs
10 0.027118392 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints (Special Topic on Machine Learning and Optimization)
11 0.026452402 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research (Special Topic on Machine Learning and Optimization)
12 0.024762785 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
13 0.023731632 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
14 0.022965305 45 jmlr-2006-Learning Coordinate Covariances via Gradients
15 0.022847967 58 jmlr-2006-Lower Bounds and Aggregation in Density Estimation
16 0.021739107 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
19 0.020463811 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes
20 0.0202117 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
topicId topicWeight
[(0, 0.119), (1, -0.058), (2, 0.021), (3, -0.058), (4, -0.013), (5, 0.051), (6, -0.053), (7, 0.003), (8, 0.041), (9, -0.048), (10, -0.015), (11, 0.039), (12, 0.065), (13, 0.081), (14, 0.098), (15, -0.604), (16, -0.321), (17, 0.075), (18, -0.078), (19, 0.031), (20, 0.019), (21, 0.083), (22, 0.064), (23, 0.129), (24, 0.01), (25, 0.026), (26, 0.023), (27, 0.013), (28, 0.078), (29, 0.044), (30, 0.002), (31, -0.024), (32, -0.013), (33, -0.01), (34, 0.006), (35, -0.002), (36, -0.008), (37, 0.014), (38, 0.034), (39, 0.009), (40, 0.028), (41, 0.064), (42, 0.03), (43, 0.022), (44, -0.014), (45, -0.046), (46, -0.04), (47, -0.024), (48, 0.02), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.95141327 65 jmlr-2006-Nonparametric Quantile Estimation
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
2 0.94107479 77 jmlr-2006-Quantile Regression Forests
Author: Nicolai Meinshausen
Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression
Author: Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, Alexander J. Smola
Abstract: We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
4 0.1903915 48 jmlr-2006-Learning Minimum Volume Sets
Author: Clayton D. Scott, Robert D. Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency, an oracle inequality, and rates of convergence. The proposed estimators are illustrated with histogram and decision tree set estimation rules. Keywords: minimum volume sets, anomaly detection, statistical learning theory, uniform deviation bounds, sample complexity, universal consistency
5 0.17989185 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
Author: Magnus Ekdahl, Timo Koski
Abstract: In many pattern recognition/classification problems the true class conditional model and class probabilities are approximated for reasons of reducing complexity and/or of statistical estimation. The approximated classifier is expected to have worse performance, here measured by the probability of correct classification. We present an analysis valid in general, and easily computable formulas for estimating the degradation in probability of correct classification when compared to the optimal classifier. An example of an approximation is the Naïve Bayes classifier. We show that the performance of the Naïve Bayes depends on the degree of functional dependence between the features and labels. We provide a sufficient condition for zero loss of performance, too. Keywords: Bayesian networks, naïve Bayes, plug-in classifier, Kolmogorov distance of variation, variational learning
6 0.16165903 23 jmlr-2006-Consistency and Convergence Rates of One-Class SVMs and Related Algorithms
8 0.14435993 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
9 0.13819903 44 jmlr-2006-Large Scale Transductive SVMs
10 0.13700618 86 jmlr-2006-Step Size Adaptation in Reproducing Kernel Hilbert Space
11 0.13372052 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
12 0.12288701 45 jmlr-2006-Learning Coordinate Covariances via Gradients
13 0.11390401 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
14 0.11147961 71 jmlr-2006-Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis
15 0.10663354 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
17 0.10229222 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
18 0.10101702 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes
19 0.098539144 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting (Special Topic on Inductive Programming)
20 0.09608601 58 jmlr-2006-Lower Bounds and Aggregation in Density Estimation
topicId topicWeight
[(8, 0.022), (36, 0.063), (45, 0.027), (50, 0.055), (59, 0.441), (63, 0.023), (68, 0.015), (76, 0.011), (78, 0.023), (81, 0.04), (84, 0.046), (90, 0.037), (91, 0.035), (96, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.71822459 65 jmlr-2006-Nonparametric Quantile Estimation
Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints
2 0.29329628 77 jmlr-2006-Quantile Regression Forests
Author: Nicolai Meinshausen
Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression
Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs
4 0.27203554 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
Author: Mikio L. Braun
Abstract: The eigenvalues of the kernel matrix play an important role in a number of kernel methods, in particular, in kernel principal component analysis. It is well known that the eigenvalues of the kernel matrix converge as the number of samples tends to infinity. We derive probabilistic finite sample size bounds on the approximation error of individual eigenvalues which have the important property that the bounds scale with the eigenvalue under consideration, reflecting the actual behavior of the approximation errors as predicted by asymptotic results and observed in numerical simulations. Such scaling bounds have so far only been known for tail sums of eigenvalues. Asymptotically, the bounds presented here have a slower than stochastic rate, but the number of sample points necessary to make this disadvantage noticeable is often unrealistically large. Therefore, under practical conditions, and for all but the largest few eigenvalues, the bounds presented here form a significant improvement over existing non-scaling bounds. Keywords: kernel matrix, eigenvalues, relative perturbation bounds
5 0.26998052 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
Author: Sayan Mukherjee, Qiang Wu
Abstract: We introduce an algorithm that simultaneously estimates a classification function as well as its gradient in the supervised learning framework. The motivation for the algorithm is to find salient variables and estimate how they covary. An efficient implementation with respect to both memory and time is given. The utility of the algorithm is illustrated on simulated data as well as a gene expression data set. An error analysis is given for the convergence of the estimate of the classification function and its gradient to the true classification function and true gradient. Keywords: Tikhnonov regularization, variable selection, reproducing kernel Hilbert space, generalization bounds, classification
6 0.2677739 70 jmlr-2006-Online Passive-Aggressive Algorithms
7 0.26547912 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
8 0.26474029 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
9 0.26313961 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
10 0.26177582 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
11 0.26138407 66 jmlr-2006-On Model Selection Consistency of Lasso
12 0.2612918 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data
13 0.26048112 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms
14 0.25995547 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
15 0.25906891 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error
16 0.25817269 44 jmlr-2006-Large Scale Transductive SVMs
19 0.2509273 48 jmlr-2006-Learning Minimum Volume Sets
20 0.25091088 53 jmlr-2006-Learning a Hidden Hypergraph