
77 jmlr-2006-Quantile Regression Forests


Source: pdf

Author: Nicolai Meinshausen

Abstract: Random forests were introduced as a machine learning tool in Breiman (2001) and have since proven to be very popular and powerful for high-dimensional regression and classification. For regression, random forests give an accurate approximation of the conditional mean of a response variable. It is shown here that random forests provide information about the full conditional distribution of the response variable, not only about the conditional mean. Conditional quantiles can be inferred with quantile regression forests, a generalisation of random forests. Quantile regression forests give a non-parametric and accurate way of estimating conditional quantiles for high-dimensional predictor variables. The algorithm is shown to be consistent. Numerical examples suggest that the algorithm is competitive in terms of predictive power.

Keywords: quantile regression, random forests, adaptive neighborhood regression


reference text

Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9:1545–1588, 1997.

V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley and Sons, 1994.

L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.

L. Breiman. Consistency for a simple model of random forests. Technical Report 670, Department of Statistics, University of California, Berkeley, 2004.

L. Breiman and J. H. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80:580–598, 1985.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984.

P. Chaudhuri and W. Loh. Nonparametric estimation of conditional quantiles using quantile regression trees. Bernoulli, 8:561–576, 2002.

J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28:337–407, 2000.

X. He, P. Ng, and S. Portnoy. Bivariate quantile smoothing splines. Journal of the Royal Statistical Society B, 3:537–550, 1998.

V. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial Intelligence Review, 22:85–126, 2004.

P. Huber. Robust regression: asymptotics, conjectures, and Monte Carlo. Annals of Statistics, 1:799–821, 1973.

R. J. Hyndman and Y. Fan. Sample quantiles in statistical packages. American Statistician, 50:361–365, 1996.

R. Koenker. Quantile Regression. Cambridge University Press, 2005.

R. Koenker, P. Ng, and S. Portnoy. Quantile smoothing splines. Biometrika, 81:673–680, 1994.

Q. V. Le, T. Sears, and A. Smola. Nonparametric quantile regression. Technical report, NICTA, 2005.

A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2:18–22, 2002.

Y. Lin and Y. Jeon. Random forests and adaptive nearest neighbors. Technical Report 1055, Department of Statistics, University of Wisconsin, 2002.

M. Markou and S. Singh. Novelty detection: a review. Signal Processing, 83:2481–2497, 2003.

S. Portnoy and R. Koenker. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimates. Statistical Science, 12:279–300, 1997.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org. ISBN 3-900051-07-0.

R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics, 26:1651–1686, 1998.

I. Steinwart, D. Hush, and C. Scovel. A classification framework for anomaly detection. Journal of Machine Learning Research, 6:211–232, 2005.

S. Weisberg. Applied Linear Regression. Wiley, 2005.