Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests
(Andrew Gelman blog, 2011-08-25)
Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on the design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with (2) a Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long time.
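The Fisher-type permutation test of the sharp null that Imbens describes can be sketched in a few lines. This is a generic illustration, not code from the post; the data and variable names (`y`, `treated`) are made up for the example. Under the sharp null, the treatment has exactly zero effect on every unit, so the observed outcomes are fixed and only the labels are random: we re-randomize the labels and see how extreme the observed difference in means is.

```python
# Sketch of a Fisher-type permutation test of the sharp null hypothesis
# (treatment effect exactly zero for every unit).  Not from the post;
# the data below are invented for illustration.
import numpy as np

def permutation_pvalue(y, treated, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = np.random.default_rng(seed)
    observed = y[treated].mean() - y[~treated].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(treated)          # re-randomize labels
        stat = y[perm].mean() - y[~perm].mean()
        if abs(stat) >= abs(observed):
            count += 1
    return count / n_perm

y = np.array([5.1, 4.9, 6.2, 5.8, 5.0, 4.7, 6.5, 6.0])
treated = np.array([False, False, False, False, True, True, True, True])
print(permutation_pvalue(y, treated))
```

The Neyman-type estimate, by contrast, is just the observed difference in means with a standard error; the point of the post is what goes wrong when you try to turn a test like the above into an interval by inversion.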
This goes back to my Ph.D. thesis, where I tried to fit a model that was proposed in the literature, but it did not fit the data. Thus, the confidence interval that you would get by inverting the hypothesis test was empty. You might say that’s fine: the model didn’t fit, so the confidence interval was empty. But the data could also be just barely consistent with the model. Then you’d get a really tiny confidence interval.

Here’s what was happening: sometimes you can get a reasonable confidence interval by inverting a hypothesis test. But if your hypothesis test can ever reject the model entirely, then you’re in the situation shown above. Once you hit rejection, you suddenly go from a very tiny, precise confidence interval to no interval at all. To put it another way, as your fit gets gradually worse, the inference from your confidence interval becomes more and more precise and then suddenly, discontinuously, has no precision at all. (With an empty interval, you’d say that the model rejects and thus you can say nothing based on the model. You wouldn’t just say your interval is, say, [3. . . .)

So here is some more detail. The idea is that you’re fitting a family of distributions indexed by some parameter theta, and your test is a function T(theta, y) of parameter theta and data y such that, if the model is true, Pr(T(theta, y) = reject | theta) = 0.05. In addition, the test can be used to reject the entire family of distributions, given data y: if T(theta, y) = reject for all theta, then we can say that the test rejects the model. Now, to get back to the graph above, the confidence interval given data y is defined as the set of values theta for which T(theta, y) != reject. As noted above, when you can reject the model, the confidence interval is empty. The bad news is that when you’re close to being able to reject the model, the confidence interval is very small, hence implying precise inferences in the very situation where you’d really rather have less confidence!

This awkward story doesn’t always happen with classical confidence intervals, but it can happen. That’s why I say that inverting hypothesis tests is not a good general principle for obtaining interval estimates. You’re mixing up two ideas: inference within a model and checking the fit of a model.
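The shrink-then-vanish behavior is easy to reproduce in a toy example. The following sketch is mine, not from the post: take the model y_i ~ Normal(theta, 1) and the test statistic T(theta, y) = sum_i (y_i − theta)^2, which has a chi-squared distribution with n degrees of freedom when the model is true; reject theta when T exceeds the 95th percentile. Since T(theta, y) = n(theta − ybar)^2 + SS, where SS is the residual sum of squares, the inverted interval is centered at ybar with half-width sqrt((q − SS)/n): it shrinks to a point as the fit worsens (SS → q) and then becomes empty.

```python
# Sketch (not from the post): inverting a goodness-of-fit test for
# y_i ~ Normal(theta, 1).  As the data get more overdispersed, the
# inverted "confidence interval" shrinks to a point, then vanishes.
import numpy as np
from scipy import stats

def inverted_ci(y, level=0.95):
    """Invert the test: reject theta if sum((y - theta)^2) > chi2_n quantile."""
    n = len(y)
    q = stats.chi2.ppf(level, df=n)
    ybar = y.mean()
    ss = ((y - ybar) ** 2).sum()   # minimum over theta of the test statistic
    if ss > q:                     # every theta is rejected: model rejected
        return None                # the inverted interval is empty
    half = np.sqrt((q - ss) / n)   # half-width of {theta : T(theta,y) <= q}
    return (ybar - half, ybar + half)

rng = np.random.default_rng(0)
z = rng.standard_normal(50)
z = (z - z.mean()) / z.std(ddof=0)     # standardize so SS = n exactly
for scale in [1.0, 1.15, 1.3]:         # inflating the spread worsens the fit
    print(scale, inverted_ci(scale * z))
```

Running this shows exactly the discontinuity described above: a well-fitting sample gives a sensible interval, a barely-fitting sample gives a deceptively tiny one, and a slightly worse fit returns no interval at all.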