Introduction: David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space, not the parameter space, because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in the space of their predictions, not in the space of their parameters (the parameters didn’t really “exist” at all for Sam). In that spirit, when we estimate the effectiveness of an MCMC method or tuning (by autocorrelation time or ESJD or anything else), shouldn’t we be looking at the changes in the model predictions over time, rather than the changes in the parameters over time? That is, the autocorrelation time should be the autocorrelation time in what the model (at the walker position) predicts for the data, and the ESJD should be the expected squared jump distance in what the model predicts for the data? This might resolve the concern I expressed a

6 Hogg continues with an example: Imagine you have a three-planet model for some radial velocity data. [sent-7, score-0.204]

7 Sometimes we have a redundant parameterization in which the individual parameters are not identified, but predictions are well-identified. [sent-10, score-0.897]

8 For a simple example, suppose you have a model, y ~ N(a+b, 1), with a uniform prior distribution on (a, b). [sent-11, score-0.1]

9 Then your data don’t tell you anything about a or b, but you can get good inference for a+b and good predictions for new data from the same model. [sent-12, score-0.559]

10 On the other hand, if you want to make a prediction for new data z ~ N(a,1), you’re out of luck. [sent-13, score-0.2]
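A minimal simulation sketch of the y ~ N(a+b, 1) example above, in Python with NumPy. It is not from the original post: the bounded uniform prior on [-10, 10]^2 (used so that the posterior is proper), the random-walk Metropolis sampler, the step size, the simulated data with a+b = 3, and all function names are assumptions made for illustration. It compares the posterior spread of a with that of a+b, and computes the expected squared jump distance (ESJD) both in parameter space and in the space of the predicted mean a+b, as one possible reading of Hogg's suggestion.

```python
import numpy as np


def log_post(a, b, y, bound=10.0):
    """Log posterior (up to a constant) for y ~ N(a + b, 1).

    Assumption: uniform prior on the square [-bound, bound]^2.
    """
    if abs(a) > bound or abs(b) > bound:
        return -np.inf
    return -0.5 * np.sum((y - (a + b)) ** 2)


def metropolis(y, n_iter=20000, step=0.5, seed=0):
    """Random-walk Metropolis over (a, b); returns an (n_iter, 2) array."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_iter, 2))
    a, b = 0.0, 0.0
    lp = log_post(a, b, y)
    for t in range(n_iter):
        a_new, b_new = np.array([a, b]) + step * rng.standard_normal(2)
        lp_new = log_post(a_new, b_new, y)
        # Accept with probability min(1, exp(lp_new - lp)).
        if np.log(rng.uniform()) < lp_new - lp:
            a, b, lp = a_new, b_new, lp_new
        draws[t] = a, b
    return draws


# Simulated data (assumption): 50 observations with true a + b = 3.
data_rng = np.random.default_rng(1)
y = data_rng.normal(3.0, 1.0, size=50)

draws = metropolis(y)
kept = draws[2000:]                 # discard burn-in
a_draws = kept[:, 0]
sum_draws = kept[:, 0] + kept[:, 1]

sd_a = a_draws.std()                # spread of the unidentified parameter
sd_sum = sum_draws.std()            # spread of the identified prediction

# ESJD in parameter space: squared jumps in (a, b).
esjd_param = np.mean(np.sum(np.diff(kept, axis=0) ** 2, axis=1))
# ESJD in prediction space: squared jumps in the predicted mean a + b.
esjd_pred = np.mean(np.diff(sum_draws) ** 2)

print("posterior sd of a:    ", sd_a)
print("posterior sd of a + b:", sd_sum)
print("ESJD, parameter space: ", esjd_param)
print("ESJD, prediction space:", esjd_pred)
```

Under these assumptions, a should wander along the ridge where a+b is constant while a+b stays tightly concentrated, so the parameter-space ESJD is dominated by moves that leave the prediction essentially unchanged.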

11 More generally, one problem I have with the hard-line predictivist stance—the idea that models and parameters are mere fictions whereas predictions are real—is that models and parameters can be thought of as bridges between the data of yesterday and the data of tomorrow. [sent-14, score-1.571]

12 It’s not just part of a prediction for some particular measurement. [sent-16, score-0.094]

13 For a more humble example, consider our discussion of physiologically-based pharmacokinetics models in Section 4. [sent-18, score-0.388]

14 In a Bayesian model, good parameterization can be important, as it is typically through the parameters that we put in prior information. [sent-20, score-0.631]

15 In many ways, the parameterization represents a key source of prior information. [sent-21, score-0.316]

same-blog 1 0.9426356 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

2 0.9354322 479 andrew gelman stats-2010-12-20-WWJD? U can find out!

Introduction: Two positions open in the statistics group at the NYU education school. If you get the job, you get to work with Jennifer Hill! One position is a postdoctoral fellowship, and the other is a visiting professorship. The latter position requires “the demonstrated ability to develop a nationally recognized research program,” which seems like a lot to ask for a visiting professor. Do they expect the visiting prof to develop a nationally recognized research program and then leave it there at NYU after the visit is over? In any case, Jennifer and her colleagues are doing excellent work, both applied and methodological, and this seems like a great opportunity.

3 0.92082185 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: I’m reposting this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. T

4 0.91880953 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long

5 0.90942764 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

Introduction: Arnaud Trolle (no relation) writes: I have a question about the interpretation of (non-)overlapping of 95% credibility intervals. In a Bayesian ANOVA (a within-subjects one), I computed 95% credibility intervals about the main effects of a factor. I’d like to compare two by two the main effects across the different conditions of the factor. Can I directly interpret the (non-)overlapping of these credibility intervals and make the following statements: “As the 95% credibility intervals do not overlap, both conditions have significantly different main effects” or conversely “As the 95% credibility intervals overlap, the main effects of both conditions are not significantly different, i.e. equivalent”? I heard that, in the case of classical confidence intervals, the second statement is false, but what happens when working within a Bayesian framework? My reply: I think it makes more sense to directly look at inference for the difference. Also, your statements about equivalence

6 0.90446442 900 andrew gelman stats-2011-09-11-Symptomatic innumeracy

7 0.90349936 1270 andrew gelman stats-2012-04-19-Demystifying Blup

8 0.90168607 1881 andrew gelman stats-2013-06-03-Boot

9 0.89993083 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

10 0.89845276 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

11 0.89560378 1409 andrew gelman stats-2012-07-08-Is linear regression unethical in that it gives more weight to cases that are far from the average?

12 0.89541113 1792 andrew gelman stats-2013-04-07-X on JLP

13 0.89538872 2316 andrew gelman stats-2014-05-03-“The graph clearly shows that mammography adds virtually nothing to survival and if anything, decreases survival (and increases cost and provides unnecessary treatment)”

14 0.89406723 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

15 0.89295691 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

16 0.89278847 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

17 0.89208686 896 andrew gelman stats-2011-09-09-My homework success

18 0.89103246 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

19 0.89015555 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

20 0.89011419 1465 andrew gelman stats-2012-08-21-D. Buggin