andrew_gelman_stats-2013-1944 knowledge-graph by maker-knowledge-mining

1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies


meta info for this blog

Source: html

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Here, we show that the average statistical power of studies in the neurosciences is very low. [sent-2, score-0.67]

2 The consequences of this include overestimates of effect size and low reproducibility of results. [sent-3, score-0.874]

3 There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. [sent-4, score-0.43]

4 Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. [sent-5, score-0.689]

5 In my terminology, with small sample size, the classical approach of looking for statistical significance leads to a high rate of Type S error. [sent-7, score-0.473]

6 Indeed, this is a theme of my paper with Weakliem (along with much earlier literature in psychology research methods). [sent-8, score-0.099]

7 I’d love this stuff even more if they stopped using the word “power,” which unfortunately is strongly tied to the not-so-useful notion of statistical significance. [sent-9, score-0.575]

8 Also I didn’t notice if they mentioned the statistical significance filter—the problem that statistically-significant results tend to have high Type M errors. [sent-10, score-0.38]

9 In any case, it’s good to see this stuff getting further attention. [sent-11, score-0.12]

10 Also I think it would be useful for them to go further and provide guidance into how to better analyze data from small samples. [sent-12, score-0.206]

11 Saying not to design low-power studies is fine, but once you have the data there’s no point in ignoring what you have. [sent-13, score-0.204]
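The summary's central point, that small samples plus a significance filter yield high Type S (wrong sign) and Type M (exaggeration) error rates, can be checked with a short simulation. The sketch below is illustrative only: the true effect and standard error are hypothetical values chosen by hand, and it is not code from the post or from the Button et al. paper.

```python
# Minimal simulation of Type S / Type M errors under low power (illustrative sketch).
# The true effect is small relative to the standard error, so the "study" is badly underpowered.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.1        # hypothetical small true effect
se = 1.0           # standard error of the estimate
n_sims = 1_000_000

est = rng.normal(theta, se, n_sims)        # one unbiased estimate per simulated study
signif = np.abs(est) > 1.96 * se           # classical two-sided 5% significance filter

power = signif.mean()
type_s = np.mean(np.sign(est[signif]) != np.sign(theta))   # wrong sign, given significance
type_m = np.mean(np.abs(est[signif])) / abs(theta)         # exaggeration ratio, given significance

print(f"power           ~ {power:.3f}")
print(f"Type S rate     ~ {type_s:.3f}")
print(f"Type M (exagg.) ~ {type_m:.1f}x")
```

With these settings the power is only about 5 percent, roughly 40 percent of the statistically significant estimates have the wrong sign, and on average a significant estimate exaggerates the true effect by more than an order of magnitude.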


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('power', 0.258), ('reproducibility', 0.251), ('low', 0.189), ('weakiem', 0.156), ('neurosciences', 0.156), ('emma', 0.156), ('statistical', 0.15), ('claire', 0.141), ('significance', 0.138), ('type', 0.133), ('robinson', 0.132), ('size', 0.126), ('nosek', 0.125), ('priority', 0.123), ('katherine', 0.123), ('unreliable', 0.123), ('overestimates', 0.123), ('marcus', 0.123), ('button', 0.12), ('stuff', 0.12), ('inefficient', 0.118), ('appreciated', 0.118), ('terminology', 0.116), ('neuroscience', 0.116), ('detecting', 0.115), ('ioannidis', 0.115), ('nyhan', 0.113), ('guidance', 0.113), ('brendan', 0.111), ('ignored', 0.11), ('reduces', 0.109), ('notion', 0.107), ('studies', 0.106), ('reflects', 0.106), ('brian', 0.106), ('filter', 0.103), ('theme', 0.099), ('stopped', 0.099), ('tied', 0.099), ('ignoring', 0.098), ('reduced', 0.097), ('true', 0.096), ('ethical', 0.095), ('dimensions', 0.094), ('consequences', 0.093), ('improving', 0.093), ('small', 0.093), ('high', 0.092), ('effect', 0.092), ('methodological', 0.089)]
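For readers curious how numbers like the word weights above and the simValue scores below are produced: a tfidf model represents each post as a vector of term weights and scores similarity as the cosine between vectors. The sketch below is a generic stand-in, assuming scikit-learn and toy post texts; the actual maker-knowledge-mining pipeline is not described here, so the scores it prints will not match the values in this document.

```python
# Illustrative tfidf-similarity sketch (assumed pipeline, not the one behind these scores).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for full post texts.
posts = {
    "1944": "low statistical power type s error classical significance underpowered studies",
    "899":  "statistical significance filter significant findings overestimate magnitude of effects",
    "639":  "bayes radical liberal conservative informative prior distributions",
}

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts.values())     # one tfidf row vector per post

sims = cosine_similarity(X[0], X)[0]             # similarity of post 1944 to every post
for post_id, s in zip(posts, sims):
    print(post_id, round(float(s), 3))
```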

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads

2 0.1594812 899 andrew gelman stats-2011-09-10-The statistical significance filter

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o
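The “|y| conditional on |y| > 2” argument in the excerpt above is easy to verify numerically. A minimal sketch, with illustrative values of theta not taken from the post:

```python
# Numerical check of the significance-filter argument: E[|y| given |y| > 2] exceeds |theta|.
import numpy as np

rng = np.random.default_rng(1)
for theta in (0.5, 1.0, 2.0, 3.0):                 # illustrative true effects
    y = rng.normal(theta, 1.0, 2_000_000)          # y ~ N(theta, 1)
    kept = np.abs(y) > 2                           # keep only "statistically significant" draws
    print(f"theta = {theta:.1f}   E[|y| given |y| > 2] ~ {np.abs(y[kept]).mean():.2f}")
```

In every case the conditional estimate exceeds theta, including when theta is larger than 2, which is the point of the last sentence in the excerpt.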

3 0.15580943 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. ( See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai

4 0.13873591 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

Introduction: Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would’ve argued with some of the things in the article. In particular, he writes: Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect whe

5 0.13596281 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research

Introduction: This seems to be the topic of the week. Yesterday I posted on the sister blog some further thoughts on those “Psychological Science” papers on menstrual cycles, biceps size, and political attitudes, tied to a horrible press release from the journal Psychological Science hyping the biceps and politics study. Then I was pointed to these suggestions from Richard Lucas and M. Brent Donnellan on improving the replicability and reproducibility of research published in the Journal of Research in Personality: It goes without saying that editors of scientific journals strive to publish research that is not only theoretically interesting but also methodologically rigorous. The goal is to select papers that advance the field. Accordingly, editors want to publish findings that can be reproduced and replicated by other scientists. Unfortunately, there has been a recent “crisis in confidence” among psychologists about the quality of psychological research (Pashler & Wagenmakers, 2012)

6 0.13118218 695 andrew gelman stats-2011-05-04-Statistics ethics question

7 0.13021007 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

8 0.1270134 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

9 0.12573883 1959 andrew gelman stats-2013-07-28-50 shades of gray: A research story

10 0.12050655 6 andrew gelman stats-2010-04-27-Jelte Wicherts lays down the stats on IQ

11 0.11729433 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

12 0.11334056 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

13 0.11307579 1883 andrew gelman stats-2013-06-04-Interrogating p-values

14 0.11176042 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

15 0.11064504 897 andrew gelman stats-2011-09-09-The difference between significant and not significant…

16 0.10941631 2040 andrew gelman stats-2013-09-26-Difficulties in making inferences about scientific truth from distributions of published p-values

17 0.10790033 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

18 0.1031859 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

19 0.10109857 2235 andrew gelman stats-2014-03-06-How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?

20 0.098247364 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.183), (1, 0.017), (2, 0.02), (3, -0.15), (4, -0.017), (5, -0.057), (6, -0.049), (7, 0.023), (8, -0.031), (9, -0.044), (10, -0.053), (11, -0.013), (12, 0.046), (13, -0.055), (14, 0.001), (15, -0.014), (16, -0.025), (17, 0.002), (18, 0.005), (19, -0.008), (20, -0.01), (21, -0.016), (22, 0.005), (23, 0.015), (24, -0.054), (25, -0.015), (26, -0.001), (27, 0.014), (28, 0.007), (29, -0.04), (30, 0.04), (31, 0.028), (32, 0.027), (33, -0.011), (34, 0.031), (35, 0.069), (36, -0.035), (37, -0.057), (38, -0.006), (39, 0.012), (40, 0.044), (41, 0.0), (42, -0.056), (43, 0.038), (44, 0.071), (45, -0.058), (46, 0.001), (47, -0.009), (48, -0.02), (49, -0.014)]
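The topic weights above are the post's coordinates in a latent semantic space, typically obtained by a truncated SVD of the tfidf matrix. The sketch below assumes scikit-learn's TruncatedSVD and toy texts as stand-ins; the actual LSI model behind these numbers is not specified here.

```python
# Illustrative LSI-style sketch: SVD of the tfidf matrix yields per-post topic weights
# like those listed above (assumed pipeline, using TruncatedSVD as a stand-in).
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [   # toy stand-ins for full post texts
    "low statistical power type s error classical significance underpowered studies",
    "statistical significance filter significant findings overestimate magnitude of effects",
    "bayes radical liberal conservative informative prior distributions",
]

X = TfidfVectorizer(stop_words="english").fit_transform(posts)
lsi = TruncatedSVD(n_components=2, random_state=0)    # topicId 0 .. n_components-1
topic_weights = lsi.fit_transform(X)                  # one row of topic weights per post

print(topic_weights[0])                               # topic weights for the first post
print(cosine_similarity(topic_weights[:1], topic_weights)[0])   # simValue-style scores
```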

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98253131 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads

2 0.8277508 899 andrew gelman stats-2011-09-10-The statistical significance filter

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o

3 0.81207156 695 andrew gelman stats-2011-05-04-Statistics ethics question

Introduction: A graduate student in public health writes: I have been asked to do the statistical analysis for a medical unit that is delivering a pilot study of a program to [details redacted to prevent identification]. They are using a prospective, nonrandomized, cohort-controlled trial study design. The investigator thinks they can recruit only a small number of treatment and control cases, maybe less than 30 in total. After I told the investigator that I cannot do anything statistically with a sample size that small, he responded that small sample sizes are common in this field, and he sent me an example of an analysis that someone had done on a similar study. So he still wants me to come up with a statistical plan. Is it unethical for me to do anything other than descriptive statistics? I think he should just stick to qualitative research. But the study she mentions above has 40 subjects and apparently had enough power to detect some effects. This is a pilot study after all so the n does n

4 0.80717278 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. ( See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai

5 0.80412656 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend
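The Simmons, Nelson, and Simonsohn point quoted above, that analytic flexibility inflates false-positive rates, can be illustrated with a toy simulation (not the authors' own code): a researcher who measures two independent null outcomes and reports whichever p-value is smaller nearly doubles the nominal 5% error rate.

```python
# Sketch of one form of "researcher degrees of freedom": test two null outcomes,
# report whichever looks better. The false-positive rate is roughly 1 - 0.95**2, not 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_sims, alpha = 20, 20_000, 0.05
false_pos = 0
for _ in range(n_sims):
    outcome_a = rng.normal(size=n)                 # both outcomes are pure noise (null is true)
    outcome_b = rng.normal(size=n)
    p_a = stats.ttest_1samp(outcome_a, 0.0).pvalue
    p_b = stats.ttest_1samp(outcome_b, 0.0).pvalue
    false_pos += min(p_a, p_b) < alpha             # keep whichever outcome "worked"
print(false_pos / n_sims)                          # close to 0.0975, nearly double the nominal 0.05
```

With more forms of flexibility (extra covariates, optional stopping, subgroup choices), the inflation is larger, which is the paper's point.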

6 0.80199933 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance

7 0.79762387 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

8 0.79472488 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty

9 0.78727818 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

10 0.77740455 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

11 0.77702045 2090 andrew gelman stats-2013-11-05-How much do we trust a new claim that early childhood stimulation raised earnings by 42%?

12 0.77580398 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

13 0.76588225 156 andrew gelman stats-2010-07-20-Burglars are local

14 0.76475954 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

15 0.75671721 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

16 0.75661838 963 andrew gelman stats-2011-10-18-Question on Type M errors

17 0.75039786 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’

18 0.74322253 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

19 0.74127012 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

20 0.73627925 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.01), (15, 0.019), (16, 0.06), (24, 0.244), (29, 0.155), (34, 0.013), (45, 0.012), (50, 0.011), (62, 0.014), (72, 0.021), (77, 0.01), (81, 0.014), (84, 0.012), (86, 0.052), (99, 0.264)]
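Unlike tfidf and LSI, an LDA topic model is fit to raw word counts and assigns each post a probability distribution over topics, which is what the topicWeight entries above represent. The sketch below assumes scikit-learn's LatentDirichletAllocation and toy texts; the actual model behind these numbers is not specified here.

```python
# Illustrative LDA sketch (assumed pipeline): fit to word counts, returns per-post topic mixes.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [   # toy stand-ins for full post texts
    "low statistical power type s error classical significance underpowered studies",
    "statistical significance filter significant findings overestimate magnitude of effects",
    "bayes radical liberal conservative informative prior distributions",
]

counts = CountVectorizer(stop_words="english").fit_transform(posts)   # LDA uses raw counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_weights = lda.fit_transform(counts)          # each row sums to 1: per-post topic mix

print(topic_weights[0])                            # e.g. weights on topicId 0 and topicId 1
```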

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94784606 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads

2 0.94603723 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

Introduction: Mark Blumenthal writes: What do you think about the “random rejection” method used by PPP that was attacked at some length today by a Republican pollster. Our just published post on the debate includes all the details as I know them. The Storify of Martino’s tweets has some additional data tables linked to toward the end. Also, more specifically, setting aside Martino’s suggestion of manipulation (which is also quite possible with post-stratification weights), would the PPP method introduce more potential random error than weighting? From Blumenthal’s blog: B.J. Martino, a senior vice president at the Republican polling firm The Tarrance Group, went on a 30-minute Twitter rant on Tuesday questioning the unorthodox method used by PPP [Public Policy Polling] to select samples and weight data: “Looking at @ppppolls new VA SW. Wondering how many interviews they discarded to get down to 601 completes? Because @ppppolls discards a LOT of interviews. Of 64,811 conducted

3 0.93466163 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves

Introduction: I received the following two emails within fifteen minutes of each other. First, from “Alexa Russell,” subject line “An idea for a blog post: The Role, Importance, and Power of Words”: Hi Andrew, I’m a researcher/writer for a resource covering the importance of English proficiency in today’s workplace. I came across your blog andrewgelman.com as I was conducting research and I’m interested in contributing an article to your blog because I found the topics you cover very engaging. I’m thinking about writing an article that looks at how the Internet has changed the way English is used today; not only has its syntax changed as a result of the Internet Revolution, but the amount of job opportunities has also shifted as a result of this shift. I’d be happy to work with you on the topic if you have any insights. Thanks, and I look forward to hearing from you soon. Best, Alexa Second, From “Maricel Anderson,” subject line “An idea for a blog post: Healthcare Management and Geri

4 0.92913663 1687 andrew gelman stats-2013-01-21-Workshop on science communication for graduate students

Introduction: Nathan Sanders writes: Applications are now open for the Communicating Science 2013 workshop (http://workshop.astrobites.com/), to be held in Cambridge, MA on June 13-15th, 2013. Graduate students at US institutions in all fields of science and engineering are encouraged to apply – funding is available for travel expenses and accommodations. The application can be found here: http://workshop.astrobites.org/application Participants will build the communication skills that technical professionals need to express complex ideas to their peers, experts in other fields, and the general public. There will be panel discussions on the following topics: * Engaging Non-Scientific Audiences * Science Writing for a Cause * Communicating Science Through Fiction * Sharing Science with Scientists * The World of Non-Academic Publishing * Communicating using Multimedia and the Web In addition to these discussions, ample time is allotted for interacting with the experts and with att

5 0.92629123 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?

Introduction: Radford writes: The word “conservative” gets used many ways, for various political purposes, but I would take its basic meaning to be someone who thinks there’s a lot of wisdom in traditional ways of doing things, even if we don’t understand exactly why those ways are good, so we should be reluctant to change unless we have a strong argument that some other way is better. This sounds very Bayesian, with a prior reducing the impact of new data. I agree completely, and I think Radford will very much enjoy my article with Aleks Jakulin, “Bayes: radical, liberal, or conservative?” Radford’s comment also fits with my increasing inclination to use informative prior distributions.

6 0.92624354 2133 andrew gelman stats-2013-12-13-Flexibility is good

7 0.923015 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

8 0.92087287 1392 andrew gelman stats-2012-06-26-Occam

9 0.92029953 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

10 0.91233945 2051 andrew gelman stats-2013-10-04-Scientific communication that accords you “the basic human dignity of allowing you to draw your own conclusions”

11 0.91183352 1491 andrew gelman stats-2012-09-10-Update on Levitt paper on child car seats

12 0.91132379 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

13 0.91126776 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

14 0.91069174 846 andrew gelman stats-2011-08-09-Default priors update?

15 0.91034806 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

16 0.90974605 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

17 0.90949309 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

18 0.90893042 1240 andrew gelman stats-2012-04-02-Blogads update

19 0.90871298 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

20 0.9079448 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures