Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests
(Andrew Gelman blog, 2011-08-25)
Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on the design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with (2) a Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long time.
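The Fisher-type permutation test of the sharp null that Imbens describes can be sketched in a few lines. This is a generic illustration, not code from the post; the data and variable names (`y`, `treated`) are made up for the example. Under the sharp null, the treatment has exactly zero effect on every unit, so the observed outcomes are fixed and only the labels are random: we re-randomize the labels and see how extreme the observed difference in means is.

```python
# Sketch of a Fisher-type permutation test of the sharp null hypothesis
# (treatment effect exactly zero for every unit).  Not from the post;
# the data below are invented for illustration.
import numpy as np

def permutation_pvalue(y, treated, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = np.random.default_rng(seed)
    observed = y[treated].mean() - y[~treated].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(treated)          # re-randomize labels
        stat = y[perm].mean() - y[~perm].mean()
        if abs(stat) >= abs(observed):
            count += 1
    return count / n_perm

y = np.array([5.1, 4.9, 6.2, 5.8, 5.0, 4.7, 6.5, 6.0])
treated = np.array([False, False, False, False, True, True, True, True])
print(permutation_pvalue(y, treated))
```

The Neyman-type estimate, by contrast, is just the observed difference in means with a standard error; the point of the post is what goes wrong when you try to turn a test like the above into an interval by inversion.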
This goes back to my Ph.D. thesis, where I tried to fit a model that was proposed in the literature, but it did not fit the data. Thus, the confidence interval that you would get by inverting the hypothesis test was empty. You might say that’s fine: the model didn’t fit, so the confidence interval was empty. But the data could also be just barely consistent with the model. Then you’d get a really tiny confidence interval.

Here’s what was happening: sometimes you can get a reasonable confidence interval by inverting a hypothesis test. But if your hypothesis test can ever reject the model entirely, then you’re in the situation shown above. Once you hit rejection, you suddenly go from a very tiny, precise confidence interval to no interval at all. To put it another way, as your fit gets gradually worse, the inference from your confidence interval becomes more and more precise and then suddenly, discontinuously, has no precision at all. (With an empty interval, you’d say that the model rejects and thus you can say nothing based on the model. You wouldn’t just say your interval is, say, [3. . . .)

So here is some more detail. The idea is that you’re fitting a family of distributions indexed by some parameter theta, and your test is a function T(theta, y) of parameter theta and data y such that, if the model is true, Pr(T(theta, y) = reject | theta) = 0.05. In addition, the test can be used to reject the entire family of distributions, given data y: if T(theta, y) = reject for all theta, then we can say that the test rejects the model. Now, to get back to the graph above, the confidence interval given data y is defined as the set of values theta for which T(theta, y) != reject. As noted above, when you can reject the model, the confidence interval is empty. The bad news is that when you’re close to being able to reject the model, the confidence interval is very small, hence implying precise inferences in the very situation where you’d really rather have less confidence!

This awkward story doesn’t always happen with classical confidence intervals, but it can happen. That’s why I say that inverting hypothesis tests is not a good general principle for obtaining interval estimates. You’re mixing up two ideas: inference within a model and checking the fit of a model.
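The shrink-then-vanish behavior is easy to reproduce in a toy example. The following sketch is mine, not from the post: take the model y_i ~ Normal(theta, 1) and the test statistic T(theta, y) = sum_i (y_i − theta)^2, which has a chi-squared distribution with n degrees of freedom when the model is true; reject theta when T exceeds the 95th percentile. Since T(theta, y) = n(theta − ybar)^2 + SS, where SS is the residual sum of squares, the inverted interval is centered at ybar with half-width sqrt((q − SS)/n): it shrinks to a point as the fit worsens (SS → q) and then becomes empty.

```python
# Sketch (not from the post): inverting a goodness-of-fit test for
# y_i ~ Normal(theta, 1).  As the data get more overdispersed, the
# inverted "confidence interval" shrinks to a point, then vanishes.
import numpy as np
from scipy import stats

def inverted_ci(y, level=0.95):
    """Invert the test: reject theta if sum((y - theta)^2) > chi2_n quantile."""
    n = len(y)
    q = stats.chi2.ppf(level, df=n)
    ybar = y.mean()
    ss = ((y - ybar) ** 2).sum()   # minimum over theta of the test statistic
    if ss > q:                     # every theta is rejected: model rejected
        return None                # the inverted interval is empty
    half = np.sqrt((q - ss) / n)   # half-width of {theta : T(theta,y) <= q}
    return (ybar - half, ybar + half)

rng = np.random.default_rng(0)
z = rng.standard_normal(50)
z = (z - z.mean()) / z.std(ddof=0)     # standardize so SS = n exactly
for scale in [1.0, 1.15, 1.3]:         # inflating the spread worsens the fit
    print(scale, inverted_ci(scale * z))
```

Running this shows exactly the discontinuity described above: a well-fitting sample gives a sensible interval, a barely-fitting sample gives a deceptively tiny one, and a slightly worse fit returns no interval at all.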