andrew_gelman_stats-2010-466 knowledge-graph by maker-knowledge-mining

466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”


meta info for this blog

Source: html

Introduction: Gur Huberman asks what I think of this magazine article by Jonah Lehrer (see also here ). My reply is that it reminds me a bit of what I wrote here . Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I know that Dave Krantz has thought about this issue for a while; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago. My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects. In answ
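The central claim in this excerpt, and the related point about multiple-comparisons corrections in sentence 7 of the summary below, can be illustrated with a short simulation. This is a minimal sketch under assumed numbers (a true effect of 0.1 estimated with standard error 1), not anything from the original post:

```python
# Illustrative simulation of the "statistical significance filter": estimates of
# a small effect that happen to clear a significance threshold are, on average,
# far larger than the true effect. All numbers are assumptions, not from the post.
import numpy as np

rng = np.random.default_rng(0)
theta, se = 0.1, 1.0                        # small true effect, noisy estimate
y = rng.normal(theta, se, size=1_000_000)   # unbiased estimates of theta

for z in (1.96, 2.58, 3.29):                # 5%, 1%, 0.1% two-sided cutoffs
    kept = np.abs(y) > z * se               # "statistically significant" estimates
    print(f"cutoff {z}: mean |estimate| among significant results = "
          f"{np.abs(y[kept]).mean():.2f} (true effect {theta})")
```

Tightening the cutoff, which is roughly what a multiple-comparisons correction does, makes the surviving estimates even larger on average, which is the point of the last summary sentence below.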


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Gur Huberman asks what I think of this magazine article by Jonah Lehrer (see also here ). [sent-1, score-0.171]

2 My reply is that it reminds me a bit of what I wrote here . [sent-2, score-0.166]

3 Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. [sent-3, score-1.002]

4 I know that Dave Krantz has thought about this issue for a while; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago. [sent-4, score-0.175]

5 My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. [sent-6, score-0.559]

6 Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects. [sent-7, score-1.208]

7 And corrections for multiple comparisons will not solve the problem: such adjustments merely shift the threshold without resolving the problem of overestimation of small effects. [sent-9, score-0.987]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lehrer', 0.395), ('overestimate', 0.228), ('huberman', 0.167), ('gur', 0.157), ('krantz', 0.157), ('patternless', 0.157), ('resolving', 0.157), ('effects', 0.156), ('classical', 0.155), ('significance', 0.148), ('accompanied', 0.141), ('tuerlinckx', 0.141), ('power', 0.138), ('powerpoint', 0.134), ('overestimation', 0.132), ('posed', 0.129), ('magnitudes', 0.129), ('retrospective', 0.127), ('francis', 0.125), ('prospective', 0.123), ('method', 0.122), ('adjustments', 0.119), ('corrections', 0.115), ('incorporate', 0.112), ('screen', 0.11), ('dave', 0.106), ('answer', 0.106), ('moderate', 0.104), ('threshold', 0.102), ('tendency', 0.1), ('small', 0.099), ('shift', 0.098), ('pass', 0.095), ('magnitude', 0.093), ('ten', 0.091), ('magazine', 0.09), ('bayesian', 0.087), ('merely', 0.084), ('wrote', 0.084), ('reminds', 0.082), ('approaches', 0.082), ('solve', 0.081), ('asks', 0.081), ('awhile', 0.081), ('defined', 0.079), ('necessarily', 0.078), ('reporting', 0.078), ('described', 0.077), ('informative', 0.076), ('estimating', 0.076)]
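The (word, weight) pairs above could be produced by a standard tfidf pipeline over the blog archive. A hedged sketch follows; the tiny placeholder corpus and the vectorizer settings are assumptions, not the actual maker-knowledge-mining code:

```python
# Sketch of per-post tfidf word weights like the list above; the corpus below is
# a stand-in, not the real archive, and all settings are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [                                    # placeholder posts
    "screening for statistical significance overestimates small effects",
    "bayesian inference with informative priors moderates overestimation",
    "retrospective power analysis and type s errors",
]
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)          # rows: posts, columns: words

row = X[0].toarray().ravel()                     # tfidf weights for one post
terms = vectorizer.get_feature_names_out()
top = sorted(zip(terms, row), key=lambda t: -t[1])[:10]
print([(w, round(s, 3)) for w, s in top])        # topN (word, weight) pairs
```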

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

Introduction: Gur Huberman asks what I think of this magazine article by Jonah Lehrer (see also here ). My reply is that it reminds me a bit of what I wrote here . Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I know that Dave Krantz has thought about this issue for a while; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago. My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects. In answ

2 0.18317087 899 andrew gelman stats-2011-09-10-The statistical significance filter

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N (theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y|>2” is clearly an overestimate of |theta|. First off, if |theta|<2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta|>2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o
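The specific claim in this excerpt (with y ~ N(theta, 1) and only |y| > 2 reported, the reported |y| overestimates |theta| whether theta is below or above 2) can be checked numerically. A small sketch assuming scipy, with illustrative theta values:

```python
# Numerical check of E[|y| given |y| > 2] when y ~ N(theta, 1), the setup in the
# excerpt above; the theta values tried are illustrative.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def conditional_abs(theta, cutoff=2.0):
    """E[|y| | |y| > cutoff] for y ~ N(theta, 1)."""
    f = lambda y: norm.pdf(y, loc=theta)
    num = quad(lambda y: abs(y) * f(y), -np.inf, -cutoff)[0] + \
          quad(lambda y: abs(y) * f(y), cutoff, np.inf)[0]
    prob = norm.cdf(-cutoff, loc=theta) + norm.sf(cutoff, loc=theta)
    return num / prob

for theta in (0.5, 1.0, 2.0, 3.0):
    print(f"theta = {theta}: E[|y| | significant] = {conditional_abs(theta):.2f}")
```

In each case the conditional estimate exceeds theta, and the gap is largest when theta is small.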

3 0.1801351 2057 andrew gelman stats-2013-10-10-Chris Chabris is irritated by Malcolm Gladwell

Introduction: Christopher Chabris reviewed the new book by Malcolm Gladwell: One thing “David and Goliath” shows is that Mr. Gladwell has not changed his own strategy, despite serious criticism of his prior work. What he presents are mostly just intriguing possibilities and musings about human behavior, but what his publisher sells them as, and what his readers may incorrectly take them for, are lawful, causal rules that explain how the world really works. Mr. Gladwell should acknowledge when he is speculating or working with thin evidentiary soup. Yet far from abandoning his hand or even standing pat, Mr. Gladwell has doubled down. This will surely bring more success to a Goliath of nonfiction writing, but not to his readers. Afterward he blogged some further thoughts about the popular popular-science writer. Good stuff . Chabris has a thoughtful explanation of why the “Gladwell is just an entertainer” alibi doesn’t work for him (Chabris). Some of his discussion reminds me of my articl

4 0.1519482 1442 andrew gelman stats-2012-08-03-Double standard? Plagiarizing journos get slammed, plagiarizing profs just shrug it off

Introduction: Dan Kahan writes on what seems to be the topic of the week : In reflecting on Lehrer , I [Kahan] have to wonder why the sanction is so much more severe — basically career “death penalty” subject to parole [I think he means "life imprisonment" --- ed.], I suppose, if he manages decades of “good behavior” — for this science journalist when scholars who stick plagiarized material in their “popular science” writing don’t even get a slap on the wrist — more like a shrug of the shoulders. I do think the behavior is comparable; if anything, it’s probably “less wrong” to make up innocuous filler quotes (the Dylan one is, for sure), than to stick paragraphs of someone else’s writing into a book. But the cause is the same: laziness. (The plagiarism I’m talking about is not the sort done by Wegman; it’s the sort done by scholars who use factory production techniques to write popular press books — teams of research assistants who write memos, which the “author” then knits together & passes off as learne

5 0.15136886 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. ( See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai
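The point in this excerpt that the arbitrary 5% cutoff "makes sense when the effect being studied is a large one" can be made concrete with a rough power calculation; the effect sizes and standard error below are made-up illustrations, not numbers from the article:

```python
# Rough two-sided power calculation: P(|y| > 1.96*se) when y ~ N(theta, se).
# The specific effect sizes are illustrative, not taken from the article.
from scipy.stats import norm

def power(theta, se, z=1.96):
    return norm.sf(z - theta / se) + norm.cdf(-z - theta / se)

print(f"large (Stroop-like) effect: power = {power(1.0, 0.2):.2f}")  # close to 1
print(f"small effect, same noise:   power = {power(0.2, 0.2):.2f}")  # roughly 0.17
```

With a large effect nearly every study clears the cutoff; with a small effect, the rare studies that do clear it are a selected, exaggerated subset.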

6 0.14326142 1883 andrew gelman stats-2013-06-04-Interrogating p-values

7 0.13183868 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

8 0.1308908 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

9 0.13081244 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery

10 0.12872893 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

11 0.12855853 1448 andrew gelman stats-2012-08-07-Scientific fraud, double standards and institutions protecting themselves

12 0.12835543 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

13 0.1270134 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

14 0.11812358 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

15 0.11600367 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

16 0.11186312 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

17 0.11024877 310 andrew gelman stats-2010-10-02-The winner’s curse

18 0.10977906 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo

19 0.10879125 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

20 0.10769289 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.184), (1, 0.074), (2, -0.008), (3, -0.101), (4, -0.045), (5, -0.063), (6, 0.001), (7, 0.027), (8, -0.023), (9, -0.037), (10, -0.031), (11, -0.015), (12, 0.055), (13, -0.023), (14, 0.062), (15, 0.007), (16, -0.045), (17, 0.009), (18, 0.018), (19, 0.036), (20, -0.042), (21, 0.055), (22, 0.012), (23, -0.003), (24, -0.044), (25, -0.042), (26, 0.01), (27, 0.002), (28, -0.017), (29, -0.04), (30, 0.063), (31, 0.005), (32, 0.017), (33, 0.027), (34, 0.033), (35, 0.019), (36, -0.027), (37, -0.016), (38, 0.011), (39, -0.011), (40, 0.023), (41, 0.059), (42, -0.01), (43, 0.08), (44, 0.031), (45, -0.029), (46, 0.025), (47, -0.032), (48, -0.019), (49, -0.037)]
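The (topicId, topicWeight) pairs above are the kind of vector a latent semantic indexing model assigns to one document. A hedged sketch of how such a list, and the simValue scores below it, could be produced with gensim follows; the tiny placeholder corpus, tokenization, and num_topics are assumptions, not the actual maker-knowledge-mining pipeline:

```python
# Sketch of LSI topic weights and similarity scores with gensim; the corpus and
# settings are placeholders, not the pipeline that generated this page.
from gensim import corpora, models, similarities

documents = [                                          # stand-in blog posts
    "screening for statistical significance overestimates small effects",
    "bayesian inference with informative priors moderates overestimation",
    "retrospective power analysis and type s errors",
]
texts = [doc.lower().split() for doc in documents]
dictionary = corpora.Dictionary(texts)
bow = [dictionary.doc2bow(t) for t in texts]

tfidf = models.TfidfModel(bow)
lsi = models.LsiModel(tfidf[bow], id2word=dictionary, num_topics=2)

this_post = lsi[tfidf[bow[0]]]
print(this_post)                                       # (topicId, topicWeight) pairs

index = similarities.MatrixSimilarity(lsi[tfidf[bow]])
print(sorted(enumerate(index[this_post]), key=lambda s: -s[1]))  # (simIndex, simValue)
```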

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97868162 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

Introduction: Gur Huberman asks what I think of this magazine article by Jonah Lehrer (see also here ). My reply is that it reminds me a bit of what I wrote here . Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I know that Dave Krantz has thought about this issue for a while; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago. My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects. In answ

2 0.81928104 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

Introduction: Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would’ve argued with some of the things in the article. In particular, he writes: Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect whe

3 0.80023813 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. ( See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai

4 0.80009943 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads
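The link this excerpt draws between low power, sign errors, and overestimated effect sizes can be seen in a small simulation in the spirit of Type S / Type M calculations; the effect size and standard error below are illustrative assumptions, not numbers from the paper:

```python
# Simulation (illustrative numbers) of an underpowered design: a small true
# effect, a noisy estimate, and attention restricted to significant results.
import numpy as np

rng = np.random.default_rng(1)
theta, se, n_sims = 0.2, 0.5, 1_000_000
y = rng.normal(theta, se, n_sims)
sig = np.abs(y) > 1.96 * se

power = sig.mean()                                   # P(significant result)
type_s = (y[sig] < 0).mean()                         # wrong sign, given significance
exaggeration = np.abs(y[sig]).mean() / theta         # Type M / exaggeration ratio
print(f"power = {power:.2f}, Type S rate = {type_s:.3f}, exaggeration = {exaggeration:.1f}x")
```

With these assumed numbers, power is well under 10%, a nontrivial share of significant estimates have the wrong sign, and the significant estimates overstate the true effect several-fold.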

5 0.78815043 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

Introduction: Chris Masse points me to this response by Daryl Bem and two statisticians (Jessica Utts and Wesley Johnson) to criticisms by Wagenmakers et al. of Bem’s recent ESP study. I have nothing to add but would like to repeat a couple bits of my discussions of last month, of here : Classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects. I think it’s naive when people implicitly assume that the study’s claims are correct, or the study’s statistical methods are weak. Generally, the smaller the effects you’re studying, the better the statistics you need. ESP is a field of small effects and so ESP researchers use high-quality statistics. To put it another way: whatever methodological errors happen to be in the paper in question, probably occur in lots of research papers in “legitimate” psychology research. The difference is that when you’re studying a

6 0.77008677 2090 andrew gelman stats-2013-11-05-How much do we trust a new claim that early childhood stimulation raised earnings by 42%?

7 0.748101 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

8 0.74647832 899 andrew gelman stats-2011-09-10-The statistical significance filter

9 0.74405527 368 andrew gelman stats-2010-10-25-Is instrumental variables analysis particularly susceptible to Type M errors?

10 0.74022573 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as they're allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

11 0.7395584 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well

12 0.7351132 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

13 0.73281032 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

14 0.73217082 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance

15 0.72594041 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

16 0.72157395 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

17 0.7177912 897 andrew gelman stats-2011-09-09-The difference between significant and not significant…

18 0.71444726 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

19 0.70542657 1971 andrew gelman stats-2013-08-07-I doubt they cheated

20 0.70298719 310 andrew gelman stats-2010-10-02-The winner’s curse


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.011), (15, 0.026), (16, 0.061), (21, 0.014), (24, 0.228), (29, 0.109), (30, 0.014), (34, 0.044), (42, 0.013), (48, 0.014), (53, 0.028), (62, 0.021), (65, 0.012), (89, 0.023), (96, 0.014), (99, 0.28)]
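The sparse (topicId, topicWeight) list above is what an LDA topic model typically reports for a single document, keeping only topics above a probability floor. A gensim sketch under the same caveats as the LSI sketch above (placeholder corpus and settings, not the actual pipeline):

```python
# Sketch of per-document LDA topic weights with gensim; corpus, num_topics, and
# preprocessing are placeholders, not the actual maker-knowledge-mining setup.
from gensim import corpora, models

documents = [                                          # stand-in blog posts
    "screening for statistical significance overestimates small effects",
    "bayesian inference with informative priors moderates overestimation",
    "retrospective power analysis and type s errors",
]
texts = [doc.lower().split() for doc in documents]
dictionary = corpora.Dictionary(texts)
bow = [dictionary.doc2bow(t) for t in texts]

lda = models.LdaModel(bow, id2word=dictionary, num_topics=5, passes=10, random_state=0)
print(lda.get_document_topics(bow[0], minimum_probability=0.01))  # (topicId, topicWeight)
```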

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96511948 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

Introduction: Gur Huberman asks what I think of this magazine article by Jonah Lehrer (see also here ). My reply is that it reminds me a bit of what I wrote here . Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I know that Dave Krantz has thought about this issue for a while; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago. My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects. In answ

2 0.96463549 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

Introduction: Mark Blumenthal writes: What do you think about the “random rejection” method used by PPP that was attacked at some length today by a Republican pollster. Our just published post on the debate includes all the details as I know them. The Storify of Martino’s tweets has some additional data tables linked to toward the end. Also, more specifically, setting aside Martino’s suggestion of manipulation (which is also quite possible with post-stratification weights), would the PPP method introduce more potential random error than weighting? From Blumenthal’s blog: B.J. Martino, a senior vice president at the Republican polling firm The Tarrance Group, went on a 30-minute Twitter rant on Tuesday questioning the unorthodox method used by PPP [Public Policy Polling] to select samples and weight data: “Looking at @ppppolls new VA SW. Wondering how many interviews they discarded to get down to 601 completes? Because @ppppolls discards a LOT of interviews. Of 64,811 conducted

3 0.96058828 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads

4 0.9597255 2133 andrew gelman stats-2013-12-13-Flexibility is good

Introduction: If I made a separate post for each interesting blog discussion, we’d get overwhelmed. That’s why I often leave detailed responses in the comments section, even though I’m pretty sure that most readers don’t look in the comments at all. Sometimes, though, I think it’s good to bring such discussions to light. Here’s a recent example. Michael wrote : Poor predictive performance usually indicates that the model isn’t sufficiently flexible to explain the data, and my understanding of the proper Bayesian strategy is to feed that back into your original model and try again until you achieve better performance. Corey replied : It was my impression that — in ML at least — poor predictive performance is more often due to the model being too flexible and fitting noise. And Rahul agreed : Good point. A very flexible model will describe your training data perfectly and then go bonkers when unleashed on wild data. But I wrote : Overfitting comes from a model being flex
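The disagreement in this exchange (is poor prediction a sign of too little or too much flexibility?) is easy to demonstrate on toy data. Here is a minimal sketch, with made-up data, of the overfitting side of it: a very flexible model fits the training set nearly perfectly and then does worse out of sample.

```python
# Toy overfitting demo with made-up data: a flexible polynomial fits training
# noise and predicts new data worse than a rigid straight line.
import numpy as np

rng = np.random.default_rng(2)
x_train, x_test = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
signal = lambda x: 0.5 * x                            # weak true relationship
y_train = signal(x_train) + rng.normal(0, 0.5, 30)
y_test = signal(x_test) + rng.normal(0, 0.5, 30)

for degree in (1, 9):                                 # rigid vs very flexible fit
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```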

5 0.95666254 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves

Introduction: I received the following two emails within fifteen minutes of each other. First, from “Alexa Russell,” subject line “An idea for a blog post: The Role, Importance, and Power of Words”: Hi Andrew, I’m a researcher/writer for a resource covering the importance of English proficiency in today’s workplace. I came across your blog andrewgelman.com as I was conducting research and I’m interested in contributing an article to your blog because I found the topics you cover very engaging. I’m thinking about writing an article that looks at how the Internet has changed the way English is used today; not only has its syntax changed as a result of the Internet Revolution, but the amount of job opportunities has also shifted as a result of this shift. I’d be happy to work with you on the topic if you have any insights. Thanks, and I look forward to hearing from you soon. Best, Alexa Second, From “Maricel Anderson,” subject line “An idea for a blog post: Healthcare Management and Geri

6 0.95548689 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

7 0.95222664 2051 andrew gelman stats-2013-10-04-Scientific communication that accords you “the basic human dignity of allowing you to draw your own conclusions”

8 0.95049393 1687 andrew gelman stats-2013-01-21-Workshop on science communication for graduate students

9 0.95002902 1392 andrew gelman stats-2012-06-26-Occam

10 0.94883174 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?

11 0.94572163 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

12 0.94505763 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

13 0.94477475 1491 andrew gelman stats-2012-09-10-Update on Levitt paper on child car seats

14 0.94242555 1240 andrew gelman stats-2012-04-02-Blogads update

15 0.94034123 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

16 0.93947637 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

17 0.93804526 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

18 0.93760073 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

19 0.93727028 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

20 0.93695128 2358 andrew gelman stats-2014-06-03-Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons