andrew_gelman_stats-2013-1776 knowledge-graph by maker-knowledge-mining

1776 andrew gelman stats-2013-03-25-The harm done by tests of significance


meta info for this blog

Source: html

Introduction: After seeing this recent discussion, Ezra Hauer sent along an article of his from the journal Accident Analysis and Prevention, describing three examples from accident research in which null hypothesis significance testing led researchers astray. Hauer writes: The problem is clear. Researchers obtain real data which, while noisy, time and again point in a certain direction. However, instead of saying: “here is my estimate of the safety effect, here is its precision, and this is how what I found relates to previous findings”, the data is processed by NHST, and the researcher says, correctly but pointlessly: “I cannot be sure that the safety effect is not zero”. Occasionally, the researcher adds, this time incorrectly and unjustifiably, a statement to the effect that: “since the result is not statistically significant, it is best to assume the safety effect to be zero”. In this manner, good data are drained of real content, the direction of empirical conclusions reversed, and ordinary human and scientific reasoning is turned on its head for the sake of a venerable ritual.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 After seeing this recent discussion , Ezra Hauer sent along an article of his from the journal Accident Analysis and Prevention, describing three examples from accident research in which null hypothesis significance testing led researchers astray. [sent-1, score-0.621]

2 Researchers obtain real data which, while noisy, time and again point in a certain direction. [sent-3, score-0.281]

3 However, instead of saying: “here is my estimate of the safety effect, here is its precision, and this is how what I found relates to previous findings”, the data is processed by NHST, and the researcher says, correctly but pointlessly: “I cannot be sure that the safety effect is not zero”. [sent-4, score-1.247]

4 Occasionally, the researcher adds, this time incorrectly and unjustifiably, a statement to the effect that: “since the result is not statistically significant, it is best to assume the safety effect to be zero”. [sent-5, score-0.962]

5 In this manner, good data are drained of real content, the direction of empirical conclusions reversed, and ordinary human and scientific reasoning is turned on its head for the sake of a venerable ritual. [sent-6, score-0.622]

6 As to the habit of subjecting the data from each study to the NHST separately, as if no previous knowledge existed, Edwards (1976, p. [sent-7, score-0.345]

7 180) notes that “it is like trying to sink a battleship by firing lead shot at it for a long time”. [sent-8, score-0.358]

8 Indeed, when I say that a Bayesian wants other researchers to be non-Bayesian, what I mean is that I want people to give me their data or their summary statistics, unpolluted by any prior distributions. [sent-9, score-0.243]

9 But I certainly don’t want them to discard all their numbers in exchange for a simple yes/no statement on statistical significance. [sent-10, score-0.318]

10 P-values as data summaries can be really misleading, and unfortunately this sort of thing is often encouraged (explicitly or implicitly) by standard statistics books. [sent-11, score-0.303]

11 Maybe an even better title would be, “The harm done by tests of significance and by analyzing datasets in isolation.” [sent-14, score-0.379]
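The sentence scores above come from an extractive tfidf summarizer. As a minimal sketch of how such scores can be produced (the actual pipeline behind this page is not shown, so the vectorizer, stop-word handling, and sum-of-weights scoring rule here are all assumptions): score each sentence by the total tfidf weight of its terms and keep the highest-scoring ones.

```python
# Hedged sketch: rank sentences by summed tfidf term weight.
# Assumed, not the original pipeline: sklearn's vectorizer, its English
# stop-word list, and sum-of-weights as the sentence score.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def summarize(sentences, top_k=11):
    """Return (index, score, text) for the top_k highest-scoring sentences."""
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    scores = np.asarray(X.sum(axis=1)).ravel()   # one score per sentence
    top = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i]), sentences[i]) for i in sorted(top)]
```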


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('hauer', 0.31), ('safety', 0.28), ('nhst', 0.254), ('accident', 0.211), ('effect', 0.167), ('pointlessly', 0.141), ('unjustifiably', 0.141), ('firing', 0.141), ('subjecting', 0.141), ('edwards', 0.133), ('drained', 0.133), ('researchers', 0.128), ('sink', 0.127), ('ezra', 0.127), ('significance', 0.125), ('statement', 0.123), ('prevention', 0.123), ('researcher', 0.121), ('reversed', 0.119), ('sake', 0.116), ('processed', 0.116), ('data', 0.115), ('discard', 0.113), ('zero', 0.112), ('incorrectly', 0.104), ('existed', 0.101), ('encouraged', 0.096), ('ordinary', 0.096), ('harm', 0.095), ('summaries', 0.092), ('relates', 0.091), ('shot', 0.09), ('habit', 0.089), ('separately', 0.088), ('manner', 0.088), ('obtain', 0.086), ('precision', 0.086), ('datasets', 0.083), ('adds', 0.083), ('head', 0.082), ('exchange', 0.082), ('noisy', 0.081), ('implicitly', 0.08), ('real', 0.08), ('occasionally', 0.08), ('null', 0.079), ('describing', 0.078), ('correctly', 0.077), ('explicitly', 0.076), ('analyzing', 0.076)]
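The (word, weight) pairs above are this post's top tfidf terms. A sketch of how such a list can be reproduced, assuming an sklearn-style vectorizer fit over the whole blog corpus (the corpus, preprocessing, and top-50 cutoff are assumptions, not the original pipeline's settings):

```python
# Hedged sketch: top tfidf terms for one document in a corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

def top_terms(corpus, doc_index, n=50):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(corpus)                 # (n_docs, n_terms)
    row = X[doc_index].toarray().ravel()          # weights for this document
    terms = vec.get_feature_names_out()
    pairs = sorted(zip(terms, row), key=lambda p: p[1], reverse=True)
    return [(w, round(float(s), 3)) for w, s in pairs[:n] if s > 0]
```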

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance


2 0.16677989 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

Introduction: Chris Chambers and I had an enlightening discussion the other day at the blog of Rolf Zwaan, regarding the Garden of Forking Paths ( go here and scroll down through the comments). Chris sent me the following note: I’m writing a book at the moment about reforming practices in psychological research (focusing on various bad practices such as p-hacking, HARKing, low statistical power, publication bias, lack of data sharing etc. – and posing solutions such as pre-registration, Bayesian hypothesis testing, mandatory data archiving etc.) and I am arriving at rather unsettling conclusion: that null hypothesis significance testing (NHST) simply isn’t valid for observational research. If this is true then most of the psychological literature is statistically flawed. I was wonder what your thoughts were on this, both from a statistical point of view and from your experience working in an observational field. We all know about the dangers of researcher degrees of freedom. We also know

3 0.14419022 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

Introduction: Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would’ve argued with some of the things in the article. In particular, he writes: Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect whe

4 0.1330976 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

Introduction: Robert Bell pointed me to this post by Brad De Long on Bayesian statistics, and then I also noticed this from Noah Smith, who wrote: My impression is that although the Bayesian/Frequentist debate is interesting and intellectually fun, there’s really not much “there” there… despite being so-hip-right-now, Bayesian is not the Statistical Jesus. I’m happy to see the discussion going in this direction. Twenty-five years ago or so, when I got into this biz, there were some serious anti-Bayesian attitudes floating around in mainstream statistics. Discussions in the journals sometimes devolved into debates of the form, “Bayesians: knaves or fools?”. You’d get all sorts of free-floating skepticism about any prior distribution at all, even while people were accepting without question (and doing theory on) logistic regressions, proportional hazards models, and all sorts of strong strong models. (In the subfield of survey sampling, various prominent researchers would refuse to mode

5 0.11777663 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. ( See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai

6 0.1160979 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

7 0.11029269 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

8 0.10870378 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

9 0.10860908 1605 andrew gelman stats-2012-12-04-Write This Book

10 0.10665548 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

11 0.10425093 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

12 0.10267247 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?

13 0.099856608 899 andrew gelman stats-2011-09-10-The statistical significance filter

14 0.098330632 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty

15 0.0968154 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

16 0.095402092 1941 andrew gelman stats-2013-07-16-Priors

17 0.094659925 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

18 0.092108384 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

19 0.091642879 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

20 0.089192115 1963 andrew gelman stats-2013-07-31-Response by Jessica Tracy and Alec Beall to my critique of the methods in their paper, “Women Are More Likely to Wear Red or Pink at Peak Fertility”
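The simValue column is plausibly cosine similarity between document vectors; the same-blog entry scoring ≈1.0 is consistent with that reading, though the pipeline is not shown. A minimal sketch under that assumption:

```python
# Hedged sketch: rank documents by cosine similarity in tfidf space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def most_similar(corpus, doc_index, n=20):
    X = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    sims = cosine_similarity(X[doc_index], X).ravel()
    top = sims.argsort()[::-1][:n + 1]   # includes the document itself
    return [(int(i), float(sims[i])) for i in top]
```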


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.184), (1, 0.027), (2, -0.004), (3, -0.127), (4, -0.031), (5, -0.062), (6, -0.023), (7, 0.052), (8, -0.024), (9, -0.051), (10, -0.06), (11, -0.011), (12, 0.054), (13, -0.063), (14, 0.017), (15, 0.01), (16, -0.033), (17, -0.026), (18, 0.012), (19, -0.031), (20, 0.009), (21, 0.03), (22, -0.004), (23, 0.006), (24, -0.056), (25, -0.009), (26, 0.022), (27, -0.049), (28, 0.006), (29, -0.025), (30, -0.001), (31, -0.01), (32, 0.0), (33, 0.019), (34, -0.011), (35, 0.025), (36, -0.031), (37, -0.048), (38, 0.014), (39, -0.0), (40, -0.01), (41, 0.005), (42, -0.058), (43, 0.014), (44, 0.018), (45, 0.013), (46, -0.017), (47, -0.031), (48, 0.031), (49, -0.015)]
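The 50 (topicId, topicWeight) pairs above are the post's coordinates in a 50-dimensional latent semantic space. LSI is a truncated SVD of the term-document (here tfidf) matrix; this sketch uses sklearn's TruncatedSVD as a stand-in for whatever the original pipeline used, with the component count chosen only to match the 50 weights shown:

```python
# Hedged sketch: LSI document vectors via truncated SVD of the tfidf matrix.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsi_vectors(corpus, n_topics=50):
    X = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    svd = TruncatedSVD(n_components=n_topics, random_state=0)
    return svd.fit_transform(X)   # row i = topicWeight vector for doc i
```

Similarities in this reduced space would again be cosine, which is one way to read the simValue column in the list that follows.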

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9569025 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance


2 0.88436925 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

Introduction: Chris Chambers and I had an enlightening discussion the other day at the blog of Rolf Zwaan, regarding the Garden of Forking Paths ( go here and scroll down through the comments). Chris sent me the following note: I’m writing a book at the moment about reforming practices in psychological research (focusing on various bad practices such as p-hacking, HARKing, low statistical power, publication bias, lack of data sharing etc. – and posing solutions such as pre-registration, Bayesian hypothesis testing, mandatory data archiving etc.) and I am arriving at rather unsettling conclusion: that null hypothesis significance testing (NHST) simply isn’t valid for observational research. If this is true then most of the psychological literature is statistically flawed. I was wonder what your thoughts were on this, both from a statistical point of view and from your experience working in an observational field. We all know about the dangers of researcher degrees of freedom. We also know

3 0.84879076 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write : Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend

4 0.84169775 1883 andrew gelman stats-2013-06-04-Interrogating p-values

Introduction: This article is a discussion of a paper by Greg Francis for a special issue, edited by E. J. Wagenmakers, of the Journal of Mathematical Psychology. Here’s what I wrote: Much of statistical practice is an effort to reduce or deny variation and uncertainty. The reduction is done through standardization, replication, and other practices of experimental design, with the idea being to isolate and stabilize the quantity being estimated and then average over many cases. Even so, however, uncertainty persists, and statistical hypothesis testing is in many ways an endeavor to deny this, by reporting binary accept/reject decisions. Classical statistical methods produce binary statements, but there is no reason to assume that the world works that way. Expressions such as Type 1 error, Type 2 error, false positive, and so on, are based on a model in which the world is divided into real and non-real effects. To put it another way, I understand the general scientific distinction of real vs

5 0.83595687 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads

6 0.82651341 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty

7 0.82017255 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

8 0.81933802 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

9 0.8091619 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

10 0.80448252 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

11 0.80026293 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

12 0.79854584 899 andrew gelman stats-2011-09-10-The statistical significance filter

13 0.79415011 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

14 0.79189831 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

15 0.7910552 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’

16 0.78745884 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

17 0.78191507 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

18 0.77320218 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

19 0.76010972 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

20 0.75981539 1605 andrew gelman stats-2012-12-04-Write This Book


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(16, 0.055), (21, 0.021), (24, 0.239), (41, 0.011), (45, 0.042), (84, 0.273), (99, 0.241)]
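Unlike the dense LSI vector, the LDA weights are sparse (topicId, topicWeight) pairs, which matches the format gensim's LdaModel returns for a single document. A sketch under that assumption (the topic count and tokenization are guesses, not the original settings):

```python
# Hedged sketch: per-document topic mixture from a gensim LDA model.
from gensim import corpora, models

def lda_topics(tokenized_docs, doc_index, num_topics=100):
    dictionary = corpora.Dictionary(tokenized_docs)
    bows = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    lda = models.LdaModel(bows, num_topics=num_topics,
                          id2word=dictionary, random_state=0)
    return lda[bows[doc_index]]   # sparse [(topicId, topicWeight), ...]
```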

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96086872 323 andrew gelman stats-2010-10-06-Sociotropic Voting and the Media

Introduction: Stephen Ansolabehere, Marc Meredith, and Erik Snowberg write : The literature on economic voting notes that voters’ subjective evaluations of the overall state of the economy are correlated with vote choice, whereas personal economic experiences are not. Missing from this literature is a description of how voters acquire information about the general state of the economy, and how that information is used to form perceptions. In order to begin understanding this process, we [Ansolabehere, Meredith, and Snowberg] asked a series of questions on the 2006 ANES Pilot about respondents’ perceptions of the average price of gas and the unemployment rate in their home state. We find that questions about gas prices and unemployment show differences in the sources of information about these two economic variables. Information about unemployment rates come from media sources, and are systematically biased by partisan factors. Information about gas prices, in contrast, comes only from everyday

2 0.9549858 667 andrew gelman stats-2011-04-19-Free $5 gift certificate!

Introduction: I bought something online and got a gift certificate for $5 to use at BustedTees.com. The gift code is TP07zh4q5dc and it expires on 30 Apr. I don’t need a T-shirt so I’ll pass this on to you. I assume it only works once. So the first person who follows up on this gets the discount. Enjoy!

3 0.93110883 1181 andrew gelman stats-2012-02-23-Philosophy: Pointer to Salmon

Introduction: Larry Brownstein writes: I read your article on induction and deduction and your comments on Deborah Mayo’s approach and thought you might find the following useful in this discussion. It is Wesley Salmon’s Reality and Rationality (2005). Here he argues that Bayesian inferential procedures can replace the hypothetical-deductive method aka the Hempel-Oppenheim theory of explanation. He is concerned about the subjectivity problem, so takes a frequentist approach to the use of Bayes in this context. Hardly anyone agrees that the H-D approach accounts for scientific explanation. The problem has been to find a replacement. Salmon thought he had found it. I don’t know this book—but that’s no surprise since I know just about none of the philosophy of science literature that came after Popper, Kuhn, and Lakatos. That’s why I collaborated with Cosma Shalizi. He’s the one who connected me to Deborah Mayo and who put in the recent philosophy references in our articles. Anyway, I’m pa

same-blog 4 0.92114079 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance


5 0.91884696 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five

Introduction: Many years ago, a research psychologist whose judgment I greatly respect told me that the characterization of personality by the so-called Big Five traits (extraversion, etc.) was old-fashioned. So I’m always surprised to see that the Big Five keeps cropping up. I guess not everyone agrees that it’s a bad idea. For example, Hamdan Azhar wrote to me: I was wondering if you’d seen this recent paper (De Young et al. 2010) that finds significant correlations between brain volume in selected regions and personality trait measures (from the Big Five). This is quite a ground-breaking finding and it was covered extensively in the mainstream media. I think readers of your blog would be interested in your thoughts, statistically speaking, on their methodology and findings. My reply: I’d be interested in my thoughts on this too! But I don’t know enough to say anything useful. From the abstract of the paper under discussion: Controlling for age, sex, and whole-brain volume

6 0.90834928 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings

7 0.90033603 360 andrew gelman stats-2010-10-21-Forensic bioinformatics, or, Don’t believe everything you read in the (scientific) papers

8 0.90018618 235 andrew gelman stats-2010-08-25-Term Limits for the Supreme Court?

9 0.86030912 1152 andrew gelman stats-2012-02-03-Web equation

10 0.85986561 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics

11 0.85777575 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

12 0.85643411 42 andrew gelman stats-2010-05-19-Updated solutions to Bayesian Data Analysis homeworks

13 0.84208953 2053 andrew gelman stats-2013-10-06-Ideas that spread fast and slow

14 0.83888751 2162 andrew gelman stats-2014-01-08-Belief aggregation

15 0.83635485 2004 andrew gelman stats-2013-09-01-Post-publication peer review: How it (sometimes) really works

16 0.83449119 1883 andrew gelman stats-2013-06-04-Interrogating p-values

17 0.83386624 1433 andrew gelman stats-2012-07-28-LOL without the CATS

18 0.82909757 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.

19 0.818591 98 andrew gelman stats-2010-06-19-Further thoughts on happiness and life satisfaction research

20 0.8161453 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas