jmlr jmlr2006 jmlr2006-77 knowledge-graph by maker-knowledge-mining

77 jmlr-2006-Quantile Regression Forests


Source: pdf

Author: Nicolai Meinshausen

Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland. Editor: Greg Ridgeway. Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. [sent-4, score-0.37]

2 For regression, random forests give an accurate approximation of the conditional mean of a response variable. [sent-5, score-0.404]

3 It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. [sent-6, score-0.428]

4 Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. [sent-7, score-0.372]

5 Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. [sent-8, score-0.514]

6 Keywords: quantile regression, random forests, adaptive neighborhood regression. [sent-11, score-0.282]

7 Standard regression analysis tries to come up with an estimate µ̂(x) of the conditional mean E(Y | X = x) of the response variable Y, given X = x. [sent-14, score-0.134]

8 The conditional mean minimizes the expected squared error loss, E(Y | X = x) = arg min_z E{(Y − z)^2 | X = x}, and approximation of the conditional mean is typically achieved by minimization of a squared error type loss function over the available data. [sent-15, score-0.125]
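As a tiny numerical check of this characterization (an illustration only, on synthetic data; names and values are not from the paper), the sample mean is the value that minimizes the average squared error loss:

```python
import numpy as np

# Illustration: the sample mean minimizes the average squared loss (y - z)^2 over z.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=2000)

grid = np.linspace(y.min(), y.max(), 401)                    # candidate values z
avg_sq_loss = ((y[None, :] - grid[:, None]) ** 2).mean(axis=1)

z_star = grid[np.argmin(avg_sq_loss)]
print(round(float(z_star), 3), round(float(y.mean()), 3))    # minimizer ~ sample mean
```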

9 Beyond the Conditional Mean: The conditional mean illuminates just one aspect of the conditional distribution of a response variable Y, yet neglects all other features of possible interest. [sent-16, score-0.126]

10 This led to the development of quantile regression; for a good summary see, e.g., Koenker (2005). [sent-17, score-0.217]

11 The α-quantile is defined as Q_α(x) = inf{y : F(y | X = x) ≥ α} (1). The quantiles give more complete information about the distribution of Y as a function of the predictor variable X than the conditional mean alone. [sent-24, score-0.157]
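A small illustration of definition (1): a quantile can be read off a (here plain empirical) distribution function as the smallest y at which the CDF reaches α. This is a sketch on synthetic data; the function and variable names are ours, not the paper's:

```python
import numpy as np

def quantile_from_cdf(y_values, cdf_values, alpha):
    """Q_alpha = inf{ y : F(y) >= alpha } for a CDF tabulated on sorted y_values."""
    idx = np.searchsorted(cdf_values, alpha, side="left")    # first index with F >= alpha
    return y_values[min(idx, len(y_values) - 1)]

# Example with the empirical CDF of a synthetic sample:
y = np.sort(np.random.default_rng(1).exponential(size=500))
F = np.arange(1, len(y) + 1) / len(y)                        # empirical CDF at sorted points
print(quantile_from_cdf(y, F, 0.9))                          # close to np.quantile(y, 0.9)
```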

12 Least-squares regression tries to estimate the conditional mean of ozone levels. [sent-26, score-0.161]

13 It gives little information about the fluctuations of ozone levels around this predicted conditional mean. [sent-27, score-0.104]

14 This can be achieved with quantile regression, as it gives information about the spread of the response variable. [sent-29, score-0.246]

15 (2005), which is to the best of our knowledge the first time that quantile regression is mentioned in the Machine Learning literature. [sent-31, score-0.272]

16 Consider again the prediction of next day ozone levels. [sent-34, score-0.102]

17 Some days, it might be possible to pinpoint next day ozone levels to a higher accuracy than on other days (this can indeed be observed for the ozone data, see the section with numerical results). [sent-35, score-0.133]

18 Quantile regression can be used to build prediction intervals. [sent-38, score-0.089]

19 Indeed, going back to the previous example, next day ozone level can on some days be predicted five times more accurately than on other days. [sent-44, score-0.088]

20 Outlier Detection: Quantile regression can likewise be used for outlier detection (for surveys on outlier detection see e. [sent-47, score-0.099]

21 Estimating Quantiles from Data: Quantile regression aims to estimate the conditional quantiles from data. [sent-60, score-0.174]

22 Quantile regression can be cast as an optimization problem, just as estimation of the conditional mean is achieved by minimizing a squared error loss function. [sent-61, score-0.123]

23 While the conditional mean minimizes the expected squared error loss, conditional quantiles minimize the expected quantile loss E(L_α) of (3), where L_α(y, q) = α |y − q| 1{y > q} + (1 − α) |y − q| 1{y ≤ q}: Q_α(x) = arg min_q E{L_α(Y, q) | X = x}. [sent-63, score-0.187]
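A minimal sketch of this quantile (pinball) loss and of the arg-min characterization above, again on synthetic data (our illustration, not the paper's code; `quantile_loss` and the grid search are hypothetical helpers):

```python
import numpy as np

def quantile_loss(y, q, alpha):
    """Quantile loss L_alpha(y, q): alpha*(y - q) if y > q, else (1 - alpha)*(q - y)."""
    diff = y - q
    return np.where(diff > 0, alpha * diff, (1 - alpha) * (-diff))

# Numerical check: the empirical alpha-quantile minimizes the average quantile loss.
rng = np.random.default_rng(0)
y = rng.lognormal(size=5000)
alpha = 0.9

grid = np.quantile(y, np.linspace(0.01, 0.99, 99))           # candidate values q
avg_loss = [quantile_loss(y, q, alpha).mean() for q in grid]
q_star = grid[int(np.argmin(avg_loss))]
print(round(float(q_star), 3), round(float(np.quantile(y, alpha)), 3))  # should be close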

24 A parametric quantile regression is solved by optimizing the parameters so that the empirical loss is minimal. [sent-64, score-0.283]

25 Non-parametric approaches, in particular quantile Smoothing Splines (He et al. [sent-66, score-0.217]

26 Chaudhuri and Loh (2002) developed an interesting tree-based method for estimation of conditional quantiles which gives good performance and allows for easy interpretation, being in this respect similar to CART (Breiman et al. [sent-69, score-0.119]

27 Rather, the method is based on random forests (Breiman, 2001). [sent-72, score-0.325]

28 Random forests grows an ensemble of trees, employing random node and split point selection, inspired by Amit and Geman (1997). [sent-73, score-0.347]

29 The prediction of random forests can then be seen as an adaptive neighborhood classification and regression procedure (Lin and Jeon, 2002). [sent-74, score-0.414]

30 The prediction of random forests, or estimation of the conditional mean, is equivalent to the weighted mean of the observed response variables. [sent-79, score-0.132]

31 For quantile regression forests, trees are grown as in the standard random forests algorithm. [sent-80, score-0.648]

32 The conditional distribution is then estimated by the weighted distribution of observed response variables, where the weights attached to observations are identical to those in the original random forests algorithm. [sent-81, score-0.438]

33 In Section 2, necessary notation is introduced and the mechanism of random forests is briefly explained, using the interpretation of Lin and Jeon (2002), which views random forests as an adaptive nearest neighbor algorithm, a view that is later supported in Breiman (2004). [sent-82, score-0.65]

34 Using this interpretation, quantile regression forests are introduced in Section 3 as a natural generalisation of random forests. [sent-83, score-0.605]

35 Random Forests: Random forests grows an ensemble of trees, using n independent observations (Y_i, X_i), i = 1, . . . , n. [sent-86, score-0.361]

36 For each tree and each node, random forests employs randomness when selecting a variable to split on. [sent-91, score-0.349]

37 For regression, the prediction of random forests for a new data point X = x is the averaged response of all trees. [sent-96, score-0.388]

38 Yet, with random forests, each tree is grown using the original observations of the response variable, while boosting tries to fit the residuals after taking into account the prediction of previously generated trees (Friedman et al. [sent-100, score-0.186]

39 The prediction of a single tree T(θ) for a new data point X = x is obtained by averaging over the observed values in leaf ℓ(x, θ). [sent-117, score-0.084]

40 Let the weight vector w_i(x, θ) be given by a positive constant if observation X_i is part of leaf ℓ(x, θ) and 0 if it is not. [sent-118, score-0.084]

41 Specifically, w_i(x, θ) = 1{X_i ∈ R_{ℓ(x,θ)}} / #{j : X_j ∈ R_{ℓ(x,θ)}} (4). The prediction of a single tree, given covariate X = x, is then the weighted average ∑_{i=1}^n w_i(x, θ) Y_i of the original observations Y_i, i = 1, . . . , n. [sent-120, score-0.15]

42 Using random forests, the conditional mean E(Y | X = x) is approximated by the averaged prediction of k single trees, each constructed with an i.i.d. vector θ_t, t = 1, . . . , k. [sent-124, score-0.094]

43 Let w_i(x) be the average of w_i(x, θ_t) over this collection of trees, w_i(x) = k^{−1} ∑_{t=1}^k w_i(x, θ_t) (5). [sent-131, score-0.232]

44 The prediction of random forests is then ∑_{i=1}^n w_i(x) Y_i. [sent-132, score-0.417]
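The weight representation in (4)-(5) can be reproduced with an off-the-shelf random forest. The sketch below is our illustration on synthetic data, with scikit-learn's RandomForestRegressor standing in for Breiman's random forests; with bootstrap=False every tree sees all observations, so the weighted average ∑_i w_i(x) Y_i matches forest.predict exactly:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))
y = np.sin(4 * X[:, 0]) + 0.3 * rng.normal(size=500)

forest = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=10, max_features=3,
    bootstrap=False, random_state=0,
).fit(X, y)

def forest_weights(forest, X_train, x_new):
    """Weights w_i(x): average over trees of 1/#{leaf co-members}, as in (4)-(5)."""
    train_leaves = forest.apply(X_train)                 # (n_train, n_trees) leaf indices
    new_leaves = forest.apply(x_new.reshape(1, -1))[0]   # leaf containing x in each tree
    w = np.zeros(len(X_train))
    for t, leaf in enumerate(new_leaves):
        in_leaf = train_leaves[:, t] == leaf             # X_i in the same leaf as x
        w[in_leaf] += 1.0 / in_leaf.sum()                # w_i(x, theta_t), eq. (4)
    return w / forest.n_estimators                       # average over trees, eq. (5)

x_new = np.full(5, 0.5)
w = forest_weights(forest, X, x_new)
print(np.isclose(w @ y, forest.predict(x_new.reshape(1, -1))[0]))  # True
```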

45 Quantile Regression Forests: It was shown above that random forests approximates the conditional mean E(Y | X = x) by a weighted mean over the observations of the response variable Y. [sent-139, score-0.464]

46 One could suspect that the weighted observations deliver not only a good approximation to the conditional mean but also to the full conditional distribution. [sent-140, score-0.134]

47 This approximation is at the heart of the quantile regression forests algorithm. [sent-144, score-0.587]

48 Estimates Q̂_α(x) of the conditional quantiles Q_α(x) are obtained by plugging F̂(y|X = x) instead of F(y|X = x) into (1). [sent-163, score-0.119]

49 Other approaches for estimating quantiles from empirical distribution functions are discussed in Hyndman and Fan (1996). [sent-164, score-0.082]

50 The key difference between quantile regression forests and random forests is as follows: for each node in each tree, random forests keeps only the mean of the observations that fall into this node and neglects all other information. [sent-165, score-1.326]

51 In contrast, quantile regression forests keeps the value of all observations in this node, not just their mean, and assesses the conditional distribution based on this information. [sent-166, score-0.662]
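Given weights w_i(x) such as those from the previous sketch, the conditional distribution estimate F̂(y|x) = ∑_i w_i(x) 1{Y_i ≤ y} and its quantiles can be computed in a few lines. This is again an illustration under the same hypothetical names (w, y, x_new), not the reference implementation:

```python
import numpy as np

def qrf_cdf_and_quantile(w, y_train, alpha):
    """F_hat(y|x) = sum_i w_i(x) 1{Y_i <= y};  Q_hat_alpha(x) = inf{y : F_hat(y|x) >= alpha}."""
    order = np.argsort(y_train)
    y_sorted = y_train[order]
    cdf = np.cumsum(w[order])                        # F_hat evaluated at the sorted Y_i
    idx = np.searchsorted(cdf, alpha, side="left")   # first index with F_hat >= alpha
    return y_sorted[min(idx, len(y_sorted) - 1)]

# With w, y and x_new from the previous sketch:
for a in (0.1, 0.5, 0.9):
    print(a, round(float(qrf_cdf_and_quantile(w, y, a)), 3))
```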

52 Consistency for random forests (when approximating the conditional mean) has been shown for a simplified model of random forests in Breiman (2004), together with an analysis of convergence rates. [sent-171, score-0.687]

53 This assumption is necessary to derive consistency of quantile estimates from the consistency of distribution estimates. [sent-199, score-0.243]

54 Under these assumptions, consistency of quantile regression forests is established in Theorem 1: let Assumptions 1–5 be fulfilled. [sent-201, score-0.6]

55 Quantile regression forests is thus a consistent way of estimating conditional distributions and quantile functions. [sent-204, score-0.624]

56 Let U_i, i = 1, . . . , n, be defined as the quantiles of the observations Y_i, conditional on X = X_i: U_i = F(Y_i | X = X_i). [sent-208, score-0.157]

57 The approximation F̂(y|x) = F̂(y|X = x) can then be written as a sum of two parts, F̂(y|x) = ∑_{i=1}^n w_i(x) 1{Y_i ≤ y} = ∑_{i=1}^n w_i(x) 1{U_i ≤ F(y|X_i)} = ∑_{i=1}^n w_i(x) 1{U_i ≤ F(y|x)} + ∑_{i=1}^n w_i(x) (1{U_i ≤ F(y|X_i)} − 1{U_i ≤ F(y|x)}). [sent-217, score-0.232]

58 The absolute difference between the approximation and the true value is hence bounded by |F̂(y|x) − F(y|x)| ≤ |F(y|x) − ∑_{i=1}^n w_i(x) 1{U_i ≤ F(y|x)}| + |∑_{i=1}^n w_i(x) (1{U_i ≤ F(y|X_i)} − 1{U_i ≤ F(y|x)})|. [sent-218, score-0.116]

59 Taking the supremum over y in the first part leads to sup_{y∈R} |F(y|x) − ∑_{i=1}^n w_i(x) 1{U_i ≤ F(y|x)}| = sup_{z∈[0,1]} |z − ∑_{i=1}^n w_i(x) 1{U_i ≤ z}|. [sent-220, score-0.143]

60 As the weights add to one, ∑_{i=1}^n w_i(x) = 1, and (min_{ℓ,θ} k_θ(ℓ))^{−1} = o(1) by Assumption 2, it follows that, for every x ∈ B, ∑_{i=1}^n w_i(x)^2 → 0 as n → ∞ (7). [sent-226, score-0.116]

61 Hence, for every z ∈ [0, 1] and x ∈ B, |z − ∑_{i=1}^n w_i(x) 1{U_i ≤ z}| = o_p(1) as n → ∞. [sent-227, score-0.093]

62 By straightforward yet tedious calculations, it can be shown that the supremum can be extended not only to such a set of z-values, but also to the whole interval z ∈ [0, 1], so that sup_{z∈[0,1]} |z − ∑_{i=1}^n w_i(x) 1{U_i ≤ z}| = o_p(1) as n → ∞. [sent-229, score-0.119]

63 It remains to show for the second part that |∑_{i=1}^n w_i(x) (1{U_i ≤ F(y|X_i)} − 1{U_i ≤ F(y|x)})| →_p 0; using Assumption 4 about Lipschitz continuity of the distribution function, it thus remains to show that ∑_{i=1}^n w_i(x) ‖x − X_i‖_1 = o_p(1) as n → ∞. [sent-239, score-0.151]

64 The weight w_i(x) is the average over t = 1, . . . , k of w_i(x, θ_t), where w_i(x, θ) is defined as the weight produced by a single tree with (random) parameter θ_t, as in (4). [sent-243, score-0.14]

65 Thus it suffices to show that, for a single tree, ∑_{i=1}^n w_i(x, θ) ‖x − X_i‖_1 = o_p(1) as n → ∞ (8). [sent-244, score-0.093]

66 The rectangular subspace R_{ℓ(x,θ)} ⊆ [0, 1]^p of leaf ℓ(x, θ) of tree T(θ) is defined by the intervals I(x, m, θ) ⊆ [0, 1] for m = 1, . . . , p. [sent-245, score-0.101]

67 As the predictor variables are assumed to be uniform over [0, 1], it holds by a Kolmogorov–Smirnov type argument that sup_{t∈[0,1]} |F_n^{(m)}(t) − t| →_p 0 as n → ∞, and (10) thus implies max_m |I(x, m, θ)| = o_p(1) for n → ∞, which completes the proof. [sent-279, score-0.082]

68 For quantile regression forests (QRF), bagged versions of the training data are used for each of the k = 1000 trees. [sent-289, score-0.597]

69 Linear quantile regression is very similar to standard linear regression and is extensively covered in Koenker (2005). [sent-294, score-0.327]

70 To make linear quantile regression (LQR) more competitive, interaction terms between variables were added for QQR. [sent-295, score-0.272]

71 Quantile regression trees with piecewise polynomial form were introduced in Chaudhuri and Loh (2002). [sent-298, score-0.1]

72 Software for quantile regression trees comes in the form of the very useful software package GUIDE, available from www. [sent-299, score-0.313]

73 This package also makes piecewise constant and piecewise linear quantile regression trees (TRC) available. [sent-303, score-0.333]

74 Evaluation: To measure the quality of the conditional quantile approximations, loss function (3) is used in conjunction with 5-fold cross-validation. [sent-308, score-0.265]

75 The minimum of the loss would be achieved by the true conditional quantile function, as discussed previously. [sent-310, score-0.265]

76 The empirical loss over the test data is computed for all splits of the data sets at quantiles α ∈ {. [sent-311, score-0.093]
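A sketch of this evaluation protocol (our reconstruction, not the paper's code): 5-fold cross-validation of the average quantile loss (3) for a generic quantile estimator `predict_quantile`. The exact quantile levels used in the paper are truncated above, so the levels below are placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold

ALPHAS = (0.05, 0.5, 0.95)   # placeholder levels; the paper's exact set is truncated above

def cv_quantile_loss(X, y, predict_quantile, alphas=ALPHAS, n_splits=5, seed=0):
    """Average quantile loss (3) over 5-fold CV for a generic quantile estimator.

    predict_quantile(X_train, y_train, X_test, alpha) -> array of quantile predictions.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    losses = {a: [] for a in alphas}
    for train, test in kf.split(X):
        for a in alphas:
            q_hat = predict_quantile(X[train], y[train], X[test], a)
            diff = y[test] - q_hat
            pinball = np.where(diff > 0, a * diff, (a - 1) * diff)
            losses[a].append(pinball.mean())
    return {a: float(np.mean(v)) for a, v in losses.items()}
```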

77 In addition to the average loss for each method, one might be interested in whether the difference in performance between quantile regression forests and the other methods is significant. [sent-319, score-0.598]

78 To this end bootstrapping is used, comparing each method against quantile regression forests (QRF). [sent-320, score-0.587]
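One way such a bootstrap comparison can be set up (a generic sketch under our own assumptions, not necessarily the paper's exact procedure) is to resample the paired per-observation losses and form a confidence interval for the mean difference against QRF:

```python
import numpy as np

def bootstrap_ci_loss_difference(loss_other, loss_qrf, n_boot=10_000, level=0.95, seed=0):
    """Bootstrap CI for the mean difference in per-observation quantile loss
    between a competing method and QRF (paired losses on the same test points)."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(loss_other) - np.asarray(loss_qrf)
    boot = np.array([diff[rng.integers(0, len(diff), len(diff))].mean()
                     for _ in range(n_boot)])
    lo, hi = np.quantile(boot, [(1 - level) / 2, 1 - (1 - level) / 2])
    return float(lo), float(hi)   # interval entirely above 0: the method loses to QRF
```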

79 There is not a single data set on which any competing method performs significantly better than quantile regression forests (QRF). [sent-323, score-0.594]

80 This is despite the fact that quantile regression trees have to be grown separately for each quantile α, whereas the same set of trees can be used for all quantiles with QRF. [sent-327, score-0.651]

81 Moreover, monotonicity of the quantile estimates would not be guaranteed any longer. [sent-330, score-0.217]

82 Linear quantile regression with interaction terms works surprisingly well in comparison, especially for moderate quantiles (that is α is not too close to either 0 or 1). [sent-333, score-0.354]

83 However, for more extreme quantiles, quantile regression forests most often delivers a better approximation. [sent-334, score-0.587]

84 The performance of quantile regression forests seems to be robust with respect to inclusion of noise variables. [sent-338, score-0.587]

85 Prediction Intervals: A possible application of quantile regression forests is the construction of prediction intervals, as discussed previously. [sent-339, score-0.621]
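For example, with the weight-based quantile estimates from the earlier sketches (the names w, y and qrf_cdf_and_quantile are our hypothetical helpers), a 95% prediction interval at a new point is simply the span between the 2.5% and 97.5% conditional quantiles:

```python
# 95% prediction interval at x_new from the earlier QRF sketches (hypothetical names):
lower = qrf_cdf_and_quantile(w, y, 0.025)
upper = qrf_cdf_and_quantile(w, y, 0.975)
print(f"95% prediction interval at x_new: [{lower:.2f}, {upper:.2f}]")
```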

86 Figure 1: Average loss for various data sets (from top to bottom) and quantiles (from left to right); the methods compared are LQR, TRM, QQR, TRP, QRF, and TRC. [sent-517, score-0.093]

87 The average loss of quantile regression forests is shown in each plot as the leftmost dot and is indicated as well by a horizontal line for better comparison. [sent-518, score-0.598]

88 The vertical bars indicate the bootstrap confidence intervals for the difference in average loss for each method against quantile regression forests. [sent-520, score-0.326]

89 Note that no parameters have been fine-tuned for quantile regression forests (the default settings are used throughout). [sent-521, score-0.594]

90 For each observation i = 1, . . . , n in the Boston Housing data set (with n = 506), conditional quantiles are estimated with QRF on a test set which does not include the i-th observation (5-fold cross-validation). [sent-688, score-0.119]

91 Moreover, the mean of the upper and lower end of the prediction interval is subtracted from all observations and prediction intervals. [sent-699, score-0.127]

92 [Figure: observed values and prediction intervals (centered) for each data set; the panel labels include BigMac.]

93 Additionally, the percentage of observations that lie above the upper end of their respective prediction intervals (below the lower end) is indicated in the upper left corner (lower left corner). [sent-713, score-0.108]

94 For the Big Mac, Fuel, and Ozone data sets, it is particularly apparent that the lengths of the prediction intervals vary strongly (some values can thus be predicted more accurately than others). [sent-716, score-0.089]

95 With quantile regression forests, it is possible to give a range in which each observation will lie with high probability. [sent-719, score-0.272]

96 Conclusions: Quantile regression forests infer the full conditional distribution of a response variable. [sent-723, score-0.436]

97 The lengths of the prediction intervals thus reflect the variation of new observations around their predicted values. [sent-726, score-0.119]

98 The estimated conditional distribution is thus a useful addition to the commonly inferred conditional mean of a response variable. [sent-729, score-0.116]

99 It was shown that quantile regression forests are, under some reasonable assumptions, consistent for conditional quantile estimation. [sent-730, score-0.841]

100 Nonparametric estimation of conditional quantiles using quantile regression trees. [sent-769, score-0.391]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('qq', 0.794), ('forests', 0.315), ('qqq', 0.234), ('quantile', 0.217), ('qrf', 0.209), ('trc', 0.163), ('lqr', 0.112), ('qqr', 0.107), ('trm', 0.107), ('trp', 0.107), ('quantiles', 0.082), ('wi', 0.058), ('ozone', 0.056), ('qqqq', 0.056), ('regression', 0.055), ('ui', 0.048), ('breiman', 0.04), ('observations', 0.038), ('conditional', 0.037), ('intervals', 0.036), ('qqqqq', 0.036), ('meinshausen', 0.035), ('op', 0.035), ('prediction', 0.034), ('splitpoint', 0.031), ('trees', 0.029), ('response', 0.029), ('leaf', 0.026), ('predictor', 0.025), ('erence', 0.025), ('tree', 0.024), ('grown', 0.022), ('smin', 0.022), ('fuel', 0.022), ('koenker', 0.022), ('jeon', 0.02), ('mtry', 0.02), ('housing', 0.018), ('abalone', 0.017), ('bigmac', 0.017), ('piecewise', 0.016), ('outlier', 0.015), ('chaudhuri', 0.015), ('nicolai', 0.015), ('rectangular', 0.015), ('node', 0.014), ('di', 0.014), ('mean', 0.013), ('boston', 0.013), ('consistency', 0.013), ('maxm', 0.013), ('day', 0.012), ('package', 0.012), ('predicted', 0.011), ('covariate', 0.011), ('ordered', 0.011), ('loss', 0.011), ('random', 0.01), ('centered', 0.01), ('bagged', 0.01), ('barnett', 0.01), ('gasoline', 0.01), ('hodge', 0.01), ('hyndman', 0.01), ('liaw', 0.01), ('markou', 0.01), ('neglects', 0.01), ('erent', 0.009), ('weighted', 0.009), ('sup', 0.009), ('median', 0.009), ('outliers', 0.009), ('supremum', 0.009), ('days', 0.009), ('panel', 0.009), ('amit', 0.009), ('portnoy', 0.009), ('noteworthy', 0.009), ('ensemble', 0.008), ('interval', 0.008), ('yi', 0.008), ('gn', 0.008), ('lin', 0.008), ('vary', 0.008), ('growing', 0.008), ('mac', 0.008), ('dispersion', 0.008), ('income', 0.008), ('loh', 0.008), ('tax', 0.008), ('generalisation', 0.008), ('sn', 0.008), ('friedman', 0.007), ('detection', 0.007), ('competing', 0.007), ('lipschitz', 0.007), ('anomalies', 0.007), ('bars', 0.007), ('default', 0.007), ('squared', 0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 77 jmlr-2006-Quantile Regression Forests

Author: Nicolai Meinshausen

Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression

2 0.17774111 65 jmlr-2006-Nonparametric Quantile Estimation

Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola

Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints

3 0.036145121 76 jmlr-2006-QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines

Author: Don Hush, Patrick Kelly, Clint Scovel, Ingo Steinwart

Abstract: We describe polynomial–time algorithms that produce approximate solutions with guaranteed accuracy for a class of QP problems that are used in the design of support vector machine classifiers. These algorithms employ a two–stage process where the first stage produces an approximate solution to a dual QP problem and the second stage maps this approximate dual solution to an approximate primal solution. For the second stage we describe an O(n log n) algorithm that maps an √ √ approximate dual solution with accuracy (2 2Kn + 8 λ)−2 λε2 to an approximate primal solution p with accuracy ε p where n is the number of data samples, Kn is the maximum kernel value over the data and λ > 0 is the SVM regularization parameter. For the first stage we present new results for decomposition algorithms and describe new decomposition algorithms with guaranteed accuracy and run time. In particular, for τ–rate certifying decomposition algorithms we establish the optimality of τ = 1/(n − 1). In addition we extend the recent τ = 1/(n − 1) algorithm of Simon (2004) to form two new composite algorithms that also achieve the τ = 1/(n − 1) iteration bound of List and Simon (2005), but yield faster run times in practice. We also exploit the τ–rate certifying property of these algorithms to produce new stopping rules that are computationally efficient and that guarantee a specified accuracy for the approximate dual solution. Furthermore, for the dual QP problem corresponding to the standard classification problem we describe operational conditions for which the Simon and composite algorithms possess an upper bound of O(n) on the number of iterations. For this same problem we also describe general conditions for which a matching lower bound exists for any decomposition algorithm that uses working sets of size 2. For the Simon and composite algorithms we also establish an O(n2 ) bound on the overall run time for the first stage. Combining the first and second stages gives an overall run time of O(n2 (c

4 0.01914406 83 jmlr-2006-Sparse Boosting

Author: Peter Bühlmann, Bin Yu

Abstract: We propose Sparse Boosting (the SparseL2 Boost algorithm), a variant on boosting with the squared error loss. SparseL2 Boost yields sparser solutions than the previously proposed L2 Boosting by minimizing some penalized L2 -loss functions, the FPE model selection criteria, through smallstep gradient descent. Although boosting may give already relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, there is sometimes a desire for more sparseness to increase prediction accuracy and ability for better variable selection: such goals can be achieved with SparseL2 Boost. We prove an equivalence of SparseL2 Boost to Breiman’s nonnegative garrote estimator for orthogonal linear models and demonstrate the generic nature of SparseL2 Boost for nonparametric interaction modeling. For an automatic selection of the tuning parameter in SparseL2 Boost we propose to employ the gMDL model selection criterion which can also be used for early stopping of L2 Boosting. Consequently, we can select between SparseL2 Boost and L2 Boosting by comparing their gMDL scores. Keywords: lasso, minimum description length (MDL), model selection, nonnegative garrote, regression

5 0.018382648 45 jmlr-2006-Learning Coordinate Covariances via Gradients

Author: Sayan Mukherjee, Ding-Xuan Zhou

Abstract: We introduce an algorithm that learns gradients from samples in the supervised learning framework. An error analysis is given for the convergence of the gradient estimated by the algorithm to the true gradient. The utility of the algorithm for the problem of variable selection as well as determining variable covariance is illustrated on simulated data as well as two gene expression data sets. For square loss we provide a very efficient implementation with respect to both memory and time. Keywords: Tikhnonov regularization, variable selection, reproducing kernel Hilbert space, generalization bounds

6 0.01815933 79 jmlr-2006-Second Order Cone Programming Approaches for Handling Missing and Uncertain Data     (Special Topic on Machine Learning and Optimization)

7 0.015949897 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition

8 0.015898995 27 jmlr-2006-Ensemble Pruning Via Semi-definite Programming     (Special Topic on Machine Learning and Optimization)

9 0.014927295 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification

10 0.014093159 48 jmlr-2006-Learning Minimum Volume Sets

11 0.012626388 23 jmlr-2006-Consistency and Convergence Rates of One-Class SVMs and Related Algorithms

12 0.012612664 38 jmlr-2006-Incremental Support Vector Learning: Analysis, Implementation and Applications     (Special Topic on Machine Learning and Optimization)

13 0.012417731 88 jmlr-2006-Streamwise Feature Selection

14 0.012279043 80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling

15 0.01097567 51 jmlr-2006-Learning Sparse Representations by Non-Negative Matrix Factorization and Sequential Cone Programming     (Special Topic on Machine Learning and Optimization)

16 0.01085947 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

17 0.010295971 31 jmlr-2006-Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization     (Special Topic on Machine Learning and Optimization)

18 0.010153324 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification

19 0.0098713674 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

20 0.0097750453 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.061), (1, -0.033), (2, 0.005), (3, -0.027), (4, -0.014), (5, 0.03), (6, -0.042), (7, -0.008), (8, 0.069), (9, -0.027), (10, -0.033), (11, 0.022), (12, 0.098), (13, 0.102), (14, 0.099), (15, -0.588), (16, -0.315), (17, 0.085), (18, -0.019), (19, 0.035), (20, 0.011), (21, 0.183), (22, 0.026), (23, 0.174), (24, -0.046), (25, 0.064), (26, 0.062), (27, -0.002), (28, 0.025), (29, 0.077), (30, -0.0), (31, -0.014), (32, 0.017), (33, 0.006), (34, -0.021), (35, -0.007), (36, -0.032), (37, 0.089), (38, 0.06), (39, 0.029), (40, 0.053), (41, 0.054), (42, -0.042), (43, -0.008), (44, -0.036), (45, 0.098), (46, -0.026), (47, 0.047), (48, -0.05), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98877656 77 jmlr-2006-Quantile Regression Forests

Author: Nicolai Meinshausen

Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression

2 0.8463285 65 jmlr-2006-Nonparametric Quantile Estimation

Author: Ichiro Takeuchi, Quoc V. Le, Timothy D. Sears, Alexander J. Smola

Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. Keywords: support vector machines, kernel methods, quantile estimation, nonparametric techniques, estimation with constraints

3 0.12433717 76 jmlr-2006-QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines

Author: Don Hush, Patrick Kelly, Clint Scovel, Ingo Steinwart

Abstract: We describe polynomial–time algorithms that produce approximate solutions with guaranteed accuracy for a class of QP problems that are used in the design of support vector machine classifiers. These algorithms employ a two–stage process where the first stage produces an approximate solution to a dual QP problem and the second stage maps this approximate dual solution to an approximate primal solution. For the second stage we describe an O(n log n) algorithm that maps an √ √ approximate dual solution with accuracy (2 2Kn + 8 λ)−2 λε2 to an approximate primal solution p with accuracy ε p where n is the number of data samples, Kn is the maximum kernel value over the data and λ > 0 is the SVM regularization parameter. For the first stage we present new results for decomposition algorithms and describe new decomposition algorithms with guaranteed accuracy and run time. In particular, for τ–rate certifying decomposition algorithms we establish the optimality of τ = 1/(n − 1). In addition we extend the recent τ = 1/(n − 1) algorithm of Simon (2004) to form two new composite algorithms that also achieve the τ = 1/(n − 1) iteration bound of List and Simon (2005), but yield faster run times in practice. We also exploit the τ–rate certifying property of these algorithms to produce new stopping rules that are computationally efficient and that guarantee a specified accuracy for the approximate dual solution. Furthermore, for the dual QP problem corresponding to the standard classification problem we describe operational conditions for which the Simon and composite algorithms possess an upper bound of O(n) on the number of iterations. For this same problem we also describe general conditions for which a matching lower bound exists for any decomposition algorithm that uses working sets of size 2. For the Simon and composite algorithms we also establish an O(n2 ) bound on the overall run time for the first stage. Combining the first and second stages gives an overall run time of O(n2 (c

4 0.089727506 83 jmlr-2006-Sparse Boosting

Author: Peter Bühlmann, Bin Yu

Abstract: We propose Sparse Boosting (the SparseL2 Boost algorithm), a variant on boosting with the squared error loss. SparseL2 Boost yields sparser solutions than the previously proposed L2 Boosting by minimizing some penalized L2 -loss functions, the FPE model selection criteria, through smallstep gradient descent. Although boosting may give already relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, there is sometimes a desire for more sparseness to increase prediction accuracy and ability for better variable selection: such goals can be achieved with SparseL2 Boost. We prove an equivalence of SparseL2 Boost to Breiman’s nonnegative garrote estimator for orthogonal linear models and demonstrate the generic nature of SparseL2 Boost for nonparametric interaction modeling. For an automatic selection of the tuning parameter in SparseL2 Boost we propose to employ the gMDL model selection criterion which can also be used for early stopping of L2 Boosting. Consequently, we can select between SparseL2 Boost and L2 Boosting by comparing their gMDL scores. Keywords: lasso, minimum description length (MDL), model selection, nonnegative garrote, regression

5 0.081222847 88 jmlr-2006-Streamwise Feature Selection

Author: Jing Zhou, Dean P. Foster, Robert A. Stine, Lyle H. Ungar

Abstract: In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection only evaluates each feature once when it is generated. We describe information-investing and α-investing, two adaptive complexity penalty methods for streamwise feature selection which dynamically adjust the threshold on the error reduction required for adding a new feature. These two methods give false discovery rate style guarantees against overfitting. They differ from standard penalty methods such as AIC, BIC and RIC, which always drastically over- or under-fit in the limit of infinite numbers of non-predictive features. Empirical results show that streamwise regression is competitive with (on small data sets) and superior to (on large data sets) much more compute-intensive feature selection methods such as stepwise regression, and allows feature selection on problems with millions of potential features. Keywords: classification, stepwise regression, multiple regression, feature selection, false discovery rate

6 0.080993444 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

7 0.066424184 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

8 0.064022154 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

9 0.062695108 45 jmlr-2006-Learning Coordinate Covariances via Gradients

10 0.060176328 27 jmlr-2006-Ensemble Pruning Via Semi-definite Programming     (Special Topic on Machine Learning and Optimization)

11 0.05995189 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification

12 0.056643698 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models

13 0.055700362 67 jmlr-2006-On Representing and Generating Kernels by Fuzzy Equivalence Relations

14 0.05369541 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition

15 0.05239391 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data

16 0.052288141 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

17 0.051639248 47 jmlr-2006-Learning Image Components for Object Recognition

18 0.05101281 80 jmlr-2006-Segmental Hidden Markov Models with Random Effects for Waveform Modeling

19 0.050915059 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification

20 0.049520802 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.018), (31, 0.505), (35, 0.014), (36, 0.048), (45, 0.019), (50, 0.033), (61, 0.013), (63, 0.021), (71, 0.016), (78, 0.011), (79, 0.016), (81, 0.034), (84, 0.037), (90, 0.012), (91, 0.014), (96, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78125298 77 jmlr-2006-Quantile Regression Forests

Author: Nicolai Meinshausen

Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power. Keywords: quantile regression, random forests, adaptive neighborhood regression

2 0.16462772 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor

Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs

3 0.16077451 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models

Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky

Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation

4 0.16009556 66 jmlr-2006-On Model Selection Consistency of Lasso

Author: Peng Zhao, Bin Yu

Abstract: Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection. Therefore it is important to study Lasso for model selection purposes. In this paper, we prove that a single condition, which we call the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large. Based on these results, sufficient conditions that are verifiable in practice are given to relate to previous works and help applications of Lasso for feature selection and sparse representation. This Irrepresentable Condition, which depends mainly on the covariance of the predictor variables, states that Lasso selects the true model consistently if and (almost) only if the predictors that are not in the true model are “irrepresentable” (in a sense to be clarified) by predictors that are in the true model. Furthermore, simulations are carried out to provide insights and understanding of this result. Keywords: Lasso, regularization, sparsity, model selection, consistency

5 0.15988047 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization

Author: Ross A. Lippert, Ryan M. Rifkin

Abstract: We consider the problem of Tikhonov regularization with a general convex loss function: this formalism includes support vector machines and regularized least squares. For a family of kernels that includes the Gaussian, parameterized by a “bandwidth” parameter σ, we characterize the limiting ˜ solution as σ → ∞. In particular, we show that if we set the regularization parameter λ = λσ−2p , the regularization term of the Tikhonov problem tends to an indicator function on polynomials of degree ⌊p⌋ (with residual regularization in the case where p ∈ Z). The proof rests on two key ideas: epi-convergence, a notion of functional convergence under which limits of minimizers converge to minimizers of limits, and a value-based formulation of learning, where we work with regularization on the function output values (y) as opposed to the function expansion coefficients in the RKHS. Our result generalizes and unifies previous results in this area. Keywords: Tikhonov regularization, Gaussian kernel, theory, kernel machines

6 0.1592252 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data

7 0.15590566 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification

8 0.15572691 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix

9 0.15411845 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

10 0.15398034 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

11 0.1535864 79 jmlr-2006-Second Order Cone Programming Approaches for Handling Missing and Uncertain Data     (Special Topic on Machine Learning and Optimization)

12 0.15266094 22 jmlr-2006-Considering Cost Asymmetry in Learning Classifiers

13 0.152132 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

14 0.15211332 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation

15 0.1519838 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

16 0.15127605 48 jmlr-2006-Learning Minimum Volume Sets

17 0.15054408 53 jmlr-2006-Learning a Hidden Hypergraph

18 0.1500825 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

19 0.14992085 87 jmlr-2006-Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation

20 0.14930613 44 jmlr-2006-Large Scale Transductive SVMs