andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1913 knowledge-graph by maker-knowledge-mining

1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


meta info for this blog

Source: html

Introduction: I’m reposting this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. . . .
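The “Fisher-type permutation test of the sharp null” that Imbens recommends can be sketched in a few lines. Under the sharp null the treatment has exactly zero effect on every unit, so treatment labels are exchangeable and we can re-randomize them to build a reference distribution for any test statistic. This is a minimal illustrative sketch, not Imbens’s actual analysis; the function name, the difference-in-means statistic, and Monte Carlo resampling (rather than full enumeration of assignments) are my choices.

```python
import random

def permutation_test(y_treat, y_control, n_perm=10000, seed=0):
    """Fisher-type permutation test of the sharp null of zero effect.

    Re-randomizes treatment labels and returns a two-sided Monte Carlo
    p-value for the observed difference in means.
    """
    rng = random.Random(seed)
    pooled = list(y_treat) + list(y_control)
    n_t, n_c = len(y_treat), len(y_control)
    observed = sum(y_treat) / n_t - sum(y_control) / n_c
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one re-randomization of treatment labels
        stat = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_c
        if abs(stat) >= abs(observed):
            count += 1
    return count / n_perm
```

For a completely randomized experiment the Monte Carlo p-value approximates the exact p-value obtained by enumerating all possible assignments.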


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. [sent-10, score-0.46]

2 There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. [sent-11, score-0.553]

3 thesis, where I tried to fit a model that was proposed in the literature but it did not fit the data. [sent-15, score-0.382]

4 Thus, the confidence interval that you would get by inverting the hypothesis test was empty. [sent-16, score-1.373]

5 You might say that’s fine–the model didn’t fit, so the conf interval was empty. [sent-17, score-0.737]

6 Then you’d get a really tiny confidence interval. [sent-19, score-0.484]

7 Sometimes you can get a reasonable confidence interval by inverting a hypothesis test. [sent-22, score-1.182]

8 But if your hypothesis test can ever reject the model entirely, then you’re in the situation shown above. [sent-24, score-0.722]

9 Once you hit rejection, you suddenly go from a very tiny precise confidence interval to no interval at all. [sent-25, score-1.632]

10 To put it another way, as your fit gets gradually worse, the inference from your confidence interval becomes more and more precise and then suddenly, discontinuously has no precision at all. [sent-26, score-1.234]

11 (With an empty interval, you’d say that the model rejects and thus you can say nothing based on the model. [sent-27, score-0.375]

12 You wouldn’t just say your interval is, say, [3. [sent-28, score-0.565]

13 Here is some more detail: The idea is that you’re fitting a family of distributions indexed by some parameter theta, and your test is a function T(theta,y) of parameter theta and data y such that, if the model is true, Pr(T(theta,y)=reject|theta) = 0.05. [sent-39, score-0.874]

14 In addition, the test can be used to reject the entire family of distributions, given data y: if T(theta,y)=reject for all theta, then we can say that the test rejects the model. [sent-42, score-0.871]

15 Now, to get back to the graph above, the confidence interval given data y is defined as the set of values theta for which T(theta,y) != reject. [sent-44, score-1.154]

16 As noted above, when you can reject the model, the confidence interval is empty. [sent-46, score-1.11]

17 The bad news is that when you’re close to being able to reject the model, the confidence interval is very small, hence implying precise inferences in the very situation where you’d really rather have less confidence! [sent-48, score-1.262]

18 This awkward story doesn’t always happen in classical confidence intervals, but it can happen. [sent-49, score-0.511]

19 That’s why I say that inverting hypothesis tests is not a good general principle for obtaining interval estimates. [sent-50, score-0.899]

20 You’re mixing up two ideas: inference within a model and checking the fit of a model. [sent-51, score-0.36]
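The empty-interval pathology described in the extracted sentences above can be reproduced in a toy model; the specific test and numbers here are illustrative assumptions, not anything from the post. Test the family N(theta, 1) by rejecting when sum_i (y_i - theta)^2 exceeds the 95th percentile of the chi-squared distribution with n degrees of freedom, then invert. Since sum_i (y_i - theta)^2 = n*(ybar - theta)^2 + SS, where SS is the within-sample sum of squares, the accepted set is ybar +/- sqrt((crit - SS)/n), which shrinks as the fit worsens and is empty once SS exceeds the critical value.

```python
import math

# 95th percentile of chi-squared with 10 degrees of freedom (standard table value)
CHI2_10_95 = 18.307

def inverted_test_ci(y, crit=CHI2_10_95):
    """CI for theta obtained by inverting the goodness-of-fit test of N(theta, 1).

    Rejects theta when sum((y_i - theta)^2) > crit.  The accepted set is
    ybar +/- sqrt((crit - SS)/n), so it is EMPTY once SS > crit.
    """
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)  # within-sample sum of squares
    if ss > crit:
        return None  # model rejected for every theta: empty interval
    half = math.sqrt((crit - ss) / n)
    return (ybar - half, ybar + half)
```

With tightly clustered data the interval is ordinary; as SS approaches the critical value the interval collapses toward a point, and past it the interval vanishes entirely, which is exactly the discontinuity the post warns about.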


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('interval', 0.488), ('confidence', 0.41), ('imbens', 0.23), ('reject', 0.212), ('theta', 0.209), ('guido', 0.199), ('test', 0.191), ('cyrus', 0.18), ('inverting', 0.145), ('hypothesis', 0.139), ('fit', 0.129), ('model', 0.124), ('fisher', 0.105), ('rejects', 0.097), ('precise', 0.096), ('permutation', 0.094), ('intervals', 0.091), ('neyman', 0.081), ('say', 0.077), ('suddenly', 0.076), ('parameter', 0.074), ('tiny', 0.074), ('referred', 0.071), ('discontinuously', 0.057), ('inverted', 0.057), ('ultraconservative', 0.057), ('family', 0.056), ('situation', 0.056), ('separate', 0.055), ('inference', 0.054), ('bergman', 0.054), ('enterprises', 0.054), ('checking', 0.053), ('classical', 0.053), ('testing', 0.052), ('dire', 0.052), ('distributions', 0.051), ('tests', 0.05), ('worse', 0.05), ('indexed', 0.048), ('conf', 0.048), ('uninteresting', 0.048), ('happen', 0.048), ('methodologically', 0.047), ('data', 0.047), ('design', 0.047), ('proposes', 0.043), ('nonzero', 0.043), ('doesn', 0.043), ('stylized', 0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


2 0.99124348 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long . . .

3 0.49544179 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

Introduction: I’ve become increasingly uncomfortable with the term “confidence interval,” for several reasons: - The well-known difficulties in interpretation (officially the confidence statement can be interpreted only on average, but people typically implicitly give the Bayesian interpretation to each case), - The ambiguity between confidence intervals and predictive intervals. (See the footnote in BDA where we discuss the difference between “inference” and “prediction” in the classical framework.) - The awkwardness of explaining that confidence intervals are big in noisy situations where you have less confidence, and confidence intervals are small when you have more confidence. So here’s my proposal. Let’s use the term “uncertainty interval” instead. The uncertainty interval tells you how much uncertainty you have. That works pretty well, I think. P.S. As of this writing, “confidence interval” outGoogles “uncertainty interval” by the huge margin of 9.5 million to 54000. So we

4 0.3456158 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

Introduction: Philip Jones writes: As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange). I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the “tails” of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question). I received some interesting replies, and the replies were not unanimous, in fact there is some serious disagreement there! After seeing this disagreement, I naturally thought of you, and whether you might be able to clear this up. Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals. My response: First

5 0.26802027 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

Introduction: 11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval. Solution to question 10 from yesterday: 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05].

6 0.25822115 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

7 0.25004473 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

8 0.2085865 899 andrew gelman stats-2011-09-10-The statistical significance filter

9 0.1965611 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

10 0.18363446 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism

11 0.18190747 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

12 0.16401258 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

13 0.15812646 254 andrew gelman stats-2010-09-04-Bayesian inference viewed as a computational approximation to classical calculations

14 0.1562542 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

15 0.14457041 2210 andrew gelman stats-2014-02-13-Stopping rules and Bayesian analysis

16 0.14091301 1941 andrew gelman stats-2013-07-16-Priors

17 0.13952535 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

18 0.13790077 1021 andrew gelman stats-2011-11-21-Don’t judge a book by its title

19 0.13685706 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

20 0.13501824 1355 andrew gelman stats-2012-05-31-Lindley’s paradox
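As an aside, the Agresti-Coull interval quoted in the exam solution above (entry 5, blog 1334) is easy to check numerically. This sketch simply follows the (y+2)/(n+4) recipe from that solution, truncating at the [0, 1] boundary; the function name and the z=2 default are my choices.

```python
import math

def agresti_coull_interval(y, n, z=2.0):
    """Approximate 95% interval for a proportion via the Agresti-Coull
    adjustment: treat the data as (y + 2) successes out of (n + 4) trials."""
    p_hat = (y + 2) / (n + 4)
    se = math.sqrt(p_hat * (1 - p_hat) / (n + 4))
    lo = max(0.0, p_hat - z * se)  # truncate at the boundary of [0, 1]
    hi = min(1.0, p_hat + z * se)
    return lo, hi
```

For y=0 successes out of n=100, this gives p.hat = 2/104 ≈ 0.02, se ≈ 0.013, and interval approximately [0, 0.05], matching the quoted solution.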


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.2), (1, 0.148), (2, 0.058), (3, -0.063), (4, -0.019), (5, -0.077), (6, 0.007), (7, 0.076), (8, 0.095), (9, -0.218), (10, -0.106), (11, 0.068), (12, -0.0), (13, -0.121), (14, -0.08), (15, -0.095), (16, -0.092), (17, -0.108), (18, 0.052), (19, -0.193), (20, 0.241), (21, -0.006), (22, 0.167), (23, -0.088), (24, 0.198), (25, -0.168), (26, -0.127), (27, -0.148), (28, -0.028), (29, 0.14), (30, -0.052), (31, -0.2), (32, -0.09), (33, -0.053), (34, 0.019), (35, 0.089), (36, -0.003), (37, 0.143), (38, 0.088), (39, 0.057), (40, 0.03), (41, 0.028), (42, 0.11), (43, -0.048), (44, -0.007), (45, 0.012), (46, 0.01), (47, -0.064), (48, -0.028), (49, 0.017)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96407074 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


2 0.95237142 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


3 0.91251642 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”


4 0.8458299 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?


5 0.64476711 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

Introduction: After I gave my talk at an econ seminar on Why We (Usually) Don’t Care About Multiple Comparisons, I got the following comment: One question that came up later was whether your argument is really with testing in general, rather than only with testing in multiple comparison settings. My reply: Yes, my argument is with testing in general. But it arises with particular force in multiple comparisons. With a single test, we can just say we dislike testing so we use confidence intervals or Bayesian inference instead, and it’s no problem—really more of a change in emphasis than a change in methods. But with multiple tests, the classical advice is not simply to look at type 1 error rates but more specifically to make a multiplicity adjustment, for example to make confidence intervals wider to account for multiplicity. I don’t want to do this! So here there is a real battle to fight. P.S. Here’s the article (with Jennifer and Masanao), to appear in the Journal of Research on

6 0.6383779 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

7 0.60232174 254 andrew gelman stats-2010-09-04-Bayesian inference viewed as a computational approximation to classical calculations

8 0.5988695 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

9 0.58517599 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

10 0.56346118 1881 andrew gelman stats-2013-06-03-Boot

11 0.56077313 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

12 0.54688621 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

13 0.53212601 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

14 0.52029055 1178 andrew gelman stats-2012-02-21-How many data points do you really have?

15 0.51837969 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

16 0.49889478 2142 andrew gelman stats-2013-12-21-Chasing the noise

17 0.49075326 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys

18 0.48261851 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism

19 0.48171684 1021 andrew gelman stats-2011-11-21-Don’t judge a book by its title

20 0.48066777 1095 andrew gelman stats-2012-01-01-Martin and Liu: Probabilistic inference based on consistency of model with data


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.022), (16, 0.035), (20, 0.05), (21, 0.054), (24, 0.226), (27, 0.026), (48, 0.032), (64, 0.077), (82, 0.023), (86, 0.017), (89, 0.019), (98, 0.02), (99, 0.26)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97873044 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


2 0.97767174 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


3 0.95427966 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrows value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i

4 0.95325649 1653 andrew gelman stats-2013-01-04-Census dotmap

Introduction: Andrew Vande Moere points to this impressive interactive map from Brandon Martin-Anderson showing the locations of all the residents of the United States and Canada. It says, “The map has 341,817,095 dots – one for each person.” Not quite . . . I was hoping to zoom into my building (approximately 10 people live on our floor, I say approximately because two of the apartments are split between two floors and I’m not sure how they would assign the residents), but unfortunately our entire block is just a solid mass of black. Also, they put a few dots in the park and in the river by accident (presumably because the borders of the census blocks were specified only approximately). But, hey, no algorithm is perfect. It’s hard to know what to do about this. The idea of mapping every person is cool, but you’ll always run into trouble displaying densely populated areas. Smaller dots might work, but then that might depend on the screen being used for display.

5 0.95111609 896 andrew gelman stats-2011-09-09-My homework success

Introduction: A friend writes to me: You will be amused to know that students in our Bayesian Inference paper at 4th year found solutions to exercises from your book on-line. The amazing thing was that some of them were dumb enough to copy out solutions verbatim. However, I thought you might like to know you have done well in this class! I’m happy to hear this. I worked hard on those solutions!

6 0.94974601 1240 andrew gelman stats-2012-04-02-Blogads update

7 0.94927102 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

8 0.94909406 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

9 0.94707578 1465 andrew gelman stats-2012-08-21-D. Buggin

10 0.94701195 1792 andrew gelman stats-2013-04-07-X on JLP

11 0.94672352 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

12 0.94645083 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

13 0.94544411 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

14 0.94416392 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

15 0.94395554 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

16 0.94390702 2358 andrew gelman stats-2014-06-03-Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons

17 0.94389784 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

18 0.94362134 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

19 0.94336843 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

20 0.94297689 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06