Ilya Lipkovich writes: I read with great interest your 2008 paper [with Aleks Jakulin, Grazia Pittau, and Yu-Sung Su] on weakly informative priors for logistic regression, and I also followed an interesting discussion on your blog. That discussion took place within the Bayesian community and concerned the validity of priors. However, I would like to approach the question from a broader perspective on predictive modeling, bringing in ideas from machine/statistical learning. Actually, you were the first to bring it up, by mentioning in your paper “borrowing ideas from computer science” on cross-validation when comparing the predictive ability of your proposed priors with other choices. However, using cross-validation to compare method performance is not the only, or even the primary, use of CV in machine learning. Most machine learning methods have some “meta” or complexity parameters and use cross-validation to tune them. For example, one of your comparison methods is BBR, which actually resorts to CV for selecting the prior variance (whether you use Laplace or Gaussian priors).

This makes their method essentially equivalent to ridge regression or the lasso with the tuning parameter selected by cross-validation, so there is really not much Bayesian flavor left.
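The equivalence can be written out explicitly. Assuming independent zero-mean priors on the slopes $\beta_1,\dots,\beta_k$ (the intercept is left out here, which is an assumption about how it is handled), with $\ell(\beta)$ the log-likelihood, the posterior mode is a penalized maximum-likelihood estimate:

```latex
\log p(\beta \mid y) = \ell(\beta) + \sum_{j=1}^{k} \log p(\beta_j) + \text{const}

% Gaussian prior, \beta_j \sim \mathrm{N}(0, \sigma^2)  =>  ridge penalty
\hat\beta = \arg\min_{\beta}\Big\{-\ell(\beta) + \lambda \sum_{j=1}^{k} \beta_j^2\Big\},
\qquad \lambda = \frac{1}{2\sigma^2}

% Laplace prior, p(\beta_j) \propto \exp(-|\beta_j|/b)  =>  lasso penalty
\hat\beta = \arg\min_{\beta}\Big\{-\ell(\beta) + \lambda \sum_{j=1}^{k} |\beta_j|\Big\},
\qquad \lambda = \frac{1}{b}
```

So tuning $\sigma^2$ or $b$ by cross-validation is the same as tuning $\lambda$ by cross-validation, and a fixed default prior scale corresponds to a preset $\lambda$.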

From my personal communication with David Madigan, I do not recall whether he ever advocated using default priors; he seemed to like the CV approach for choosing them (and that was the whole point of making the algorithm fast), as most people in the statistical learning community would (e.g., …).

Now, it is unclear from your paper whether, when comparing your automated Cauchy priors with BBR, you let BBR choose the optimal tuning parameter or used its default values.

If you let BBR tune its parameters, then you should have performed a “double cross-validation,” allowing BBR to select a (possibly different) value of the tuning parameter (prior variance) on each fold of your “outer cross-validation,” based on a separate “inner CV” within that fold.

If you used automated (default) priors for BBR, then you might not have done it justice.

But then you might say that it would be unfair to let BBR choose the optimal prior variance via CV when your method uses automated priors.
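The “double cross-validation” described in this letter can be sketched as follows. This is a minimal illustration under stated assumptions, not BBR and not the procedure used in the paper. It fits logistic regression at the posterior mode under independent Gaussian priors on the slopes (a ridge penalty), picks the prior variance from a hypothetical grid by inner CV on each outer training set, and scores that choice on the held-out outer fold. On the same outer folds it also scores a fixed, hypothetical default variance. That fixed Gaussian prior is only a stand-in for an automated prior: the paper's Cauchy prior is not implemented here. All function names, the grid, and the simulated data are illustrative.

```python
import numpy as np


def fit_logistic_gaussian_prior(X, y, prior_var, n_iter=50, tol=1e-8):
    """Posterior mode of logistic regression with independent N(0, prior_var)
    priors on the slopes; the intercept is left unpenalized (an assumption).
    Equivalent to ridge-penalized logistic regression with penalty 1/prior_var."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    penalty = np.full(k + 1, 1.0 / prior_var)
    penalty[0] = 0.0  # flat prior on the intercept
    beta = np.zeros(k + 1)
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -35.0, 35.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Xd.T @ (y - p) - penalty * beta  # gradient of the log posterior
        H = Xd.T @ (Xd * (p * (1 - p))[:, None]) + np.diag(penalty)  # minus Hessian
        step = np.linalg.solve(H, grad)  # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def log_loss(beta, X, y):
    """Mean negative log-likelihood on (X, y), computed stably."""
    eta = beta[0] + X @ beta[1:]
    return float(np.mean(np.logaddexp(0.0, eta) - y * eta))


def kfold(n, k, rng):
    """Random partition of the indices 0..n-1 into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)


def choose_prior_var(X, y, grid, k, rng):
    """Inner CV: return the grid value with the lowest held-out log loss."""
    n = len(y)
    folds = kfold(n, k, rng)  # same folds for every candidate value
    scores = []
    for v in grid:
        losses = []
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            beta = fit_logistic_gaussian_prior(X[train], y[train], v)
            losses.append(log_loss(beta, X[test], y[test]))
        scores.append(np.mean(losses))
    return grid[int(np.argmin(scores))]


def nested_cv(X, y, grid, default_var, k_outer=5, k_inner=5, seed=0):
    """Outer CV scores (a) the prior variance tuned by inner CV on each outer
    training set and (b) a fixed default prior variance, on the same folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    tuned, fixed, chosen = [], [], []
    for test in kfold(n, k_outer, rng):
        train = np.setdiff1d(np.arange(n), test)
        Xtr, ytr = X[train], y[train]
        v = choose_prior_var(Xtr, ytr, grid, k_inner, rng)  # never sees the test fold
        chosen.append(v)
        tuned.append(log_loss(fit_logistic_gaussian_prior(Xtr, ytr, v), X[test], y[test]))
        fixed.append(log_loss(fit_logistic_gaussian_prior(Xtr, ytr, default_var), X[test], y[test]))
    return float(np.mean(tuned)), float(np.mean(fixed)), chosen


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    true_beta = np.array([1.0, -0.5, 0.0, 0.0, 0.25])  # simulated, hypothetical
    y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
    grid = np.array([0.01, 0.1, 1.0, 10.0, 100.0])  # hypothetical grid
    print(nested_cv(X, y, grid, default_var=1.0))
```

The chosen variance can differ from one outer fold to the next, as the letter anticipates. Because both approaches are scored on identical outer folds, the tuned and fixed-prior losses are directly comparable.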

If we leave Bayesian ground and move onto statistical learning (or, in your terms, “computer science”) turf, what is the optimal way to fit a predictive model?

From reading your paper, it seems that you believe in the existence of default priors, which translates into having default complexity parameters in statistical learning.

This seems to contradict what the “authorities” in the statistical learning literature tell us: they dismiss as a popular myth the idea that one can preset complexity parameters in large-scale predictive modeling.

Perhaps the answer is that your approach with automated priors is intended only for problems with just a few predictors?

Or is there a deeper philosophical split here between the Bayesian and statistical learning communities?

It depends on the structure of the problem: the more replication there is, the more it is possible to estimate such tuning parameters internally.
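One way to see why replication matters: in a hierarchical model, the prior variance is shared across groups and can be estimated from the data. The sketch below assumes the simplest normal-normal case (known sampling variance `s2`, prior mean fixed at zero, all values hypothetical). It uses an empirical-Bayes point estimate of the prior variance, a simpler stand-in for full Bayesian inference on that parameter.

```python
import numpy as np


def estimate_prior_var(y, s2):
    """Type-II maximum likelihood for tau^2 in the normal-normal model
    y_j ~ N(theta_j, s2), theta_j ~ N(0, tau^2), so that marginally
    y_j ~ N(0, tau^2 + s2). The MLE is max(0, mean(y^2) - s2)."""
    return max(0.0, float(np.mean(np.asarray(y) ** 2)) - s2)


def shrink(y, s2, tau2):
    """Posterior means E[theta_j | y_j] given the (estimated) prior variance."""
    return tau2 / (tau2 + s2) * np.asarray(y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tau2_true, s2 = 1.0, 1.0  # hypothetical values for the simulation
    for J in (5, 50, 5000):  # number of replicated groups
        theta = rng.normal(0.0, np.sqrt(tau2_true), size=J)
        y = rng.normal(theta, np.sqrt(s2))
        print(J, estimate_prior_var(y, s2))
```

With only a few groups the estimate is noisy and can fall to the boundary at zero. As the number of groups grows, it settles near the true value, and the amount of shrinkage is then determined by the data rather than preset.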

In our paper, we were particularly interested in cases where the number of predictors is small.

If there are general differences between statistics and machine learning here, then, they're not about the philosophy of automated priors or whatever; it's that in statistics we often talk about small problems with only a few predictors (see any statistics textbook, including mine!), whereas machine learning methods tend to be applied to problems with large numbers of predictors.

Andrew Gelman (stats blog), 2013-12-10: “Cross-validation and Bayesian estimation of tuning parameters”
