andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2248 knowledge-graph by maker-knowledge-mining

2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals


meta info for this blog

Source: html

Introduction: Rink Hoekstra writes: A couple of months ago, you were visiting the University of Groningen, and after the talk you gave there I spoke briefly with you about a study that I conducted with Richard Morey, Jeff Rouder, and Eric-Jan Wagenmakers. In the study, we found that researchers’ knowledge of how to interpret a confidence interval (CI) was almost as limited as the knowledge of students who had not yet taken an inferential statistics course. Our manuscript was recently accepted for publication in Psychonomic Bulletin & Review, and it’s now available online (see e.g., here). Maybe it’s interesting to discuss on your blog, especially since CIs are often promoted (for example in the new guidelines of Psychological Science), but apparently researchers seem to have little idea how to interpret them. Given that the confidence percentage of a CI tells something about the procedure rather than about the data at hand, this might be understandable, but, according to us, it’s problematic nevertheless.


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Rink Hoekstra writes: A couple of months ago, you were visiting the University of Groningen, and after the talk you gave there I spoke briefly with you about a study that I conducted with Richard Morey, Jeff Rouder and Eric-Jan Wagenmakers. [sent-1, score-0.7]

2 In the study, we found that researchers’ knowledge of how to interpret a confidence interval (CI) was almost as limited as the knowledge of students who had had no inferential statistics course yet. [sent-2, score-1.037]

3 Our manuscript was recently accepted for publication in  Psychonomic Bulletin & Review , and it’s now available online (see e. [sent-3, score-0.362]

4 Maybe it’s interesting to discuss on your blog, especially since CIs are often promoted (for example in the new guidelines of Psychological Science ), but apparently researchers seem to have little idea how to interpret them. [sent-6, score-0.624]

5 Given that the confidence percentage of a CI tells something about the procedure rather than about the data at hand, this might be understandable, but, according to us, it’s problematic nevertheless. [sent-7, score-0.663]

6 I replied that I agree that conf intervals are overrated, a point I think I discussed briefly here . [sent-8, score-0.604]

7 We used to all go around saying that all would be ok if people just ditched their p-values and replaced them with intervals. [sent-9, score-0.107]

8 But, from a Bayesian perspective, the problem is not with the inferential summary (central intervals vs. [sent-10, score-0.477]

9 tail-area probabilities) but with those default flat priors, which are particularly problematic in “Psychological Science”-style research where effect sizes are small and estimates are noisy. [sent-11, score-0.524]
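The point in sentence 5, that the confidence percentage describes the procedure rather than any single computed interval, can be made concrete with a small simulation. This sketch is illustrative and not from the post; the normal model, known sigma, and parameter values are all assumptions chosen for simplicity:

```python
# Simulate repeated sampling to show that "95%" is a long-run property of the
# CI-generating procedure, not the probability that one given interval
# contains the true mean.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, n_sims = 5.0, 2.0, 30, 10_000
z = 1.96  # normal critical value for a 95% interval, sigma assumed known

covered = 0
for _ in range(n_sims):
    sample = rng.normal(mu, sigma, size=n)
    half_width = z * sigma / np.sqrt(n)
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= mu <= hi)  # did this interval capture the truth?

print(f"empirical coverage: {covered / n_sims:.3f}")  # close to 0.95
```

Each individual interval either contains mu or it does not; it is only across repeated applications of the procedure that the 95% figure has meaning, which is exactly the distinction the surveyed researchers tended to miss.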


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ci', 0.259), ('problematic', 0.244), ('inferential', 0.223), ('briefly', 0.199), ('intervals', 0.181), ('interpret', 0.177), ('groningen', 0.171), ('hoekstra', 0.171), ('rink', 0.171), ('rouder', 0.171), ('psychological', 0.167), ('confidence', 0.163), ('cis', 0.154), ('bulletin', 0.154), ('promoted', 0.144), ('morey', 0.144), ('conf', 0.144), ('knowledge', 0.143), ('overrated', 0.14), ('visiting', 0.126), ('understandable', 0.122), ('guidelines', 0.118), ('manuscript', 0.116), ('spoke', 0.11), ('replaced', 0.107), ('flat', 0.104), ('researchers', 0.103), ('noisy', 0.098), ('interval', 0.097), ('study', 0.095), ('accepted', 0.095), ('conducted', 0.095), ('limited', 0.091), ('richard', 0.091), ('default', 0.089), ('tells', 0.088), ('procedure', 0.088), ('jeff', 0.088), ('sizes', 0.087), ('probabilities', 0.086), ('science', 0.083), ('apparently', 0.082), ('percentage', 0.08), ('replied', 0.08), ('priors', 0.08), ('central', 0.077), ('online', 0.076), ('publication', 0.075), ('months', 0.075), ('summary', 0.073)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals


2 0.21496633 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

Introduction: Philip Jones writes: As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange). I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the “tails” of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question). I received some interesting replies, and the replies were not unanimous, in fact there is some serious disagreement there! After seeing this disagreement, I naturally thought of you, and whether you might be able to clear this up. Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals. My response: First

3 0.21234578 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

Introduction: I’ve become increasingly uncomfortable with the term “confidence interval,” for several reasons: - The well-known difficulties in interpretation (officially the confidence statement can be interpreted only on average, but people typically implicitly give the Bayesian interpretation to each case), - The ambiguity between confidence intervals and predictive intervals. (See the footnote in BDA where we discuss the difference between “inference” and “prediction” in the classical framework.) - The awkwardness of explaining that confidence intervals are big in noisy situations where you have less confidence, and confidence intervals are small when you have more confidence. So here’s my proposal. Let’s use the term “uncertainty interval” instead. The uncertainty interval tells you how much uncertainty you have. That works pretty well, I think. P.S. As of this writing, “confidence interval” outGoogles “uncertainty interval” by the huge margin of 9.5 million to 54000. So we

4 0.1562542 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: I’m reposing this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. T

5 0.15232798 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long

6 0.13326749 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

7 0.12279472 1681 andrew gelman stats-2013-01-19-Participate in a short survey about the weight of evidence provided by statistics

8 0.11769186 1021 andrew gelman stats-2011-11-21-Don’t judge a book by its title

9 0.1148697 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research

10 0.11249958 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

11 0.11145091 1478 andrew gelman stats-2012-08-31-Watercolor regression

12 0.11066969 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

13 0.11040746 2278 andrew gelman stats-2014-04-01-Association for Psychological Science announces a new journal

14 0.11023127 2240 andrew gelman stats-2014-03-10-On deck this week: Things people sent me

15 0.10427473 963 andrew gelman stats-2011-10-18-Question on Type M errors

16 0.10197191 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

17 0.10019898 1511 andrew gelman stats-2012-09-26-What do statistical p-values mean when the sample = the population?

18 0.099192582 1174 andrew gelman stats-2012-02-18-Not as ugly as you look

19 0.097578615 2302 andrew gelman stats-2014-04-23-A short questionnaire regarding the subjective assessment of evidence

20 0.09692926 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
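The simValue scores in the list above are plausibly cosine similarities between the posts' tfidf vectors; that method is an assumption here, sketched on a toy corpus:

```python
# Compute cosine similarity between one document's tfidf vector and all
# others, yielding scores like the simValue column above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "problematic interpretations of confidence intervals",
    "how do you think about the values in a confidence interval",
    "instead of confidence interval let us say uncertainty interval",
]
X = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(X[0], X)  # similarity of doc 0 to every doc

print(sims.round(3))  # doc 0 vs itself is 1.0; others fall in [0, 1]
```

This matches the pattern in the list: the same-blog entry scores 1.0 against itself, and related posts score progressively lower.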


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.168), (1, 0.011), (2, -0.027), (3, -0.086), (4, -0.041), (5, -0.012), (6, -0.015), (7, 0.04), (8, -0.062), (9, -0.075), (10, -0.019), (11, 0.025), (12, 0.056), (13, -0.017), (14, 0.001), (15, -0.018), (16, -0.029), (17, -0.009), (18, 0.021), (19, -0.059), (20, 0.035), (21, 0.027), (22, 0.034), (23, -0.017), (24, 0.073), (25, -0.054), (26, -0.022), (27, -0.101), (28, 0.001), (29, 0.035), (30, -0.032), (31, -0.07), (32, -0.055), (33, -0.052), (34, 0.026), (35, 0.055), (36, 0.002), (37, 0.063), (38, 0.017), (39, -0.034), (40, 0.005), (41, 0.002), (42, 0.118), (43, -0.022), (44, 0.039), (45, -0.016), (46, -0.041), (47, 0.019), (48, -0.003), (49, 0.011)]
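The signed (topicId, topicWeight) pairs above are characteristic of LSI: each document's tfidf vector projected onto latent dimensions from a truncated SVD, so weights can be negative. A hedged sketch of that computation (assumed method, toy corpus):

```python
# Project tfidf vectors onto a small number of latent semantic dimensions
# via truncated SVD, producing per-document topic weights as above.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "confidence intervals and their interpretation",
    "bayesian priors and posterior intervals",
    "p-values significance and hypothesis tests",
    "flat priors give noisy interval estimates",
]
X = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)
doc_topics = lsi.fit_transform(X)  # one row of topic weights per document

print(doc_topics[0])  # topic weights for the first document
```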

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95094323 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals


2 0.82820135 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?


3 0.80592543 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”


4 0.67242408 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


5 0.66934633 1662 andrew gelman stats-2013-01-09-The difference between “significant” and “non-significant” is not itself statistically significant

Introduction: Commenter Rahul asked what I thought of this note by Scott Firestone ( link from Tyler Cowen) criticizing a recent discussion by Kevin Drum suggesting that lead exposure causes violent crime. Firestone writes: It turns out there was in fact a prospective study done—but its implications for Drum’s argument are mixed. The study was a cohort study done by researchers at the University of Cincinnati. Between 1979 and 1984, 376 infants were recruited. Their parents consented to have lead levels in their blood tested over time; this was matched with records over subsequent decades of the individuals’ arrest records, and specifically arrest for violent crime. Ultimately, some of these individuals were dropped from the study; by the end, 250 were selected for the results. The researchers found that for each increase of 5 micrograms of lead per deciliter of blood, there was a higher risk for being arrested for a violent crime, but a further look at the numbers shows a more mixe

6 0.66215509 2227 andrew gelman stats-2014-02-27-“What Can we Learn from the Many Labs Replication Project?”

7 0.65504211 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

8 0.65408462 410 andrew gelman stats-2010-11-12-“The Wald method has been the subject of extensive criticism by statisticians for exaggerating results”

9 0.65285295 2142 andrew gelman stats-2013-12-21-Chasing the noise

10 0.64060909 2069 andrew gelman stats-2013-10-19-R package for effect size calculations for psychology researchers

11 0.63851887 1766 andrew gelman stats-2013-03-16-“Nightshifts Linked to Increased Risk for Ovarian Cancer”

12 0.63123649 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

13 0.60662878 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

14 0.60360777 2099 andrew gelman stats-2013-11-13-“What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the science?”

15 0.59860784 2220 andrew gelman stats-2014-02-22-Quickies

16 0.59529233 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

17 0.58219481 1021 andrew gelman stats-2011-11-21-Don’t judge a book by its title

18 0.5717324 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

19 0.5665338 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

20 0.56208503 1881 andrew gelman stats-2013-06-03-Boot


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.053), (16, 0.163), (20, 0.04), (24, 0.152), (36, 0.053), (39, 0.036), (43, 0.017), (46, 0.013), (56, 0.06), (57, 0.019), (88, 0.016), (89, 0.064), (98, 0.014), (99, 0.211)]
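Unlike the LSI weights, the LDA pairs above are sparse and non-negative because each document gets a probability distribution over topics. A minimal sketch of producing such a mixture (assumed method, toy corpus):

```python
# Fit an LDA topic model on raw word counts and read off each document's
# topic mixture, yielding (topicId, topicWeight) pairs as above.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "confidence intervals interpretation researchers",
    "bayesian priors posterior intervals",
    "significance tests p values hypothesis",
    "noisy estimates small effect sizes",
]
counts = CountVectorizer().fit_transform(docs)  # LDA expects counts, not tfidf
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row sums to 1

print(doc_topics[0].round(3))
```

Because the rows are probability distributions, only topics with appreciable mass show up in a sparse listing like the one above.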

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94301617 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals


2 0.91594052 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

Introduction: I just read this article on the treatment of medical volunteers, written by doctor and bioethicist Carl Ellliott. As a statistician who has done a small amount of consulting for pharmaceutical companies, I have a slightly different perspective. As a doctor, Elliott focuses on individual patients, whereas, as a statistician, I’ve been trained to focus on the goal of accurately estimate treatment effects. I’ll go through Elliott’s article and give my reactions. Elliott: In Miami, investigative reporters for Bloomberg Markets magazine discovered that a contract research organisation called SFBC International was testing drugs on undocumented immigrants in a rundown motel; since that report, the motel has been demolished for fire and safety violations. . . . SFBC had recently been named one of the best small businesses in America by Forbes magazine. The Holiday Inn testing facility was the largest in North America, and had been operating for nearly ten years before inspecto

3 0.91513139 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi

Introduction: Michael Gilligan, Eric Mvukiyehe, and Cyrus Samii write : We [Gilligan, Mvukiyehe, and Samii] use original survey data, collected in Burundi in the summer of 2007, to show that a World Bank ex-combatant reintegration program implemented after Burundi’s civil war caused significant economic reintegration for its beneficiaries but that this economic reintegration did not translate into greater political and social reintegration. Previous studies of reintegration programs have found them to be ineffective, but these studies have suffered from selection bias: only ex-combatants who self selected into those programs were studied. We avoid such bias with a quasi-experimental research design made possible by an exogenous bureaucratic failure in the implementation of program. One of the World Bank’s implementing partners delayed implementation by almost a year due to an unforeseen contract dispute. As a result, roughly a third of ex-combatants had their program benefits withheld for reas

4 0.91210628 1093 andrew gelman stats-2011-12-30-Strings Attached: Untangling the Ethics of Incentives

Introduction: Chris Paulse points me to this book by Ruth Grant: Incentives can be found everywhere–in schools, businesses, factories, and government–influencing people’s choices about almost everything, from financial decisions and tobacco use to exercise and child rearing. So long as people have a choice, incentives seem innocuous. But Strings Attached demonstrates that when incentives are viewed as a kind of power rather than as a form of exchange, many ethical questions arise: How do incentives affect character and institutional culture? Can incentives be manipulative or exploitative, even if people are free to refuse them? What are the responsibilities of the powerful in using incentives? Ruth Grant shows that, like all other forms of power, incentives can be subject to abuse, and she identifies their legitimate and illegitimate uses. Grant offers a history of the growth of incentives in early twentieth-century America, identifies standards for judging incentives, and examines incentives

5 0.90762889 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

Introduction: Don Green and Holger Kern write on one of my favorite topics , treatment interactions (see also here ): We [Green and Kern] present a methodology that largely automates the search for systematic treatment effect heterogeneity in large-scale experiments. We introduce a nonparametric estimator developed in statistical learning, Bayesian Additive Regression Trees (BART), to model treatment effects that vary as a function of covariates. BART has several advantages over commonly employed parametric modeling strategies, in particular its ability to automatically detect and model relevant treatment-covariate interactions in a flexible manner. To increase the reliability and credibility of the resulting conditional treatment effect estimates, we suggest the use of a split sample design. The data are randomly divided into two equally-sized parts, with the first part used to explore treatment effect heterogeneity and the second part used to confirm the results. This approach permits a re

6 0.90426701 503 andrew gelman stats-2011-01-04-Clarity on my email policy

7 0.90269101 674 andrew gelman stats-2011-04-21-Handbook of Markov Chain Monte Carlo

8 0.90045547 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

9 0.90000415 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

10 0.89867634 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

11 0.89675117 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

12 0.8961727 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

13 0.89322579 1712 andrew gelman stats-2013-02-07-Philosophy and the practice of Bayesian statistics (with all the discussions!)

14 0.89031571 593 andrew gelman stats-2011-02-27-Heat map

15 0.89014149 609 andrew gelman stats-2011-03-13-Coauthorship norms

16 0.89001101 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

17 0.88917446 481 andrew gelman stats-2010-12-22-The Jumpstart financial literacy survey and the different purposes of tests

18 0.88704765 159 andrew gelman stats-2010-07-23-Popular governor, small state

19 0.88632023 434 andrew gelman stats-2010-11-28-When Small Numbers Lead to Big Errors

20 0.88628125 370 andrew gelman stats-2010-10-25-Who gets wedding announcements in the Times?