andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-899 knowledge-graph by maker-knowledge-mining

899 andrew gelman stats-2011-09-10-The statistical significance filter


meta info for this blog

Source: html

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion of the statistical significance filter in the context of a dramatic example, see this article or the first part of this presentation.
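A quick way to see the filter in action is to simulate it. Here is a minimal sketch (not from the original post; the model y ~ N(theta, 1) and the cutoff |y| > 2 are as described above, everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def filtered_estimate(theta, n=1_000_000):
    """Average |y| among 'significant' draws (|y| > 2), for y ~ N(theta, 1)."""
    y = rng.normal(theta, 1.0, size=n)
    significant = np.abs(y) > 2
    return np.abs(y[significant]).mean(), significant.mean()

for theta in [0.5, 1.0, 2.0, 3.0]:
    est, power = filtered_estimate(theta)
    print(f"theta = {theta}: E(|y| given |y| > 2) ~ {est:.2f}, Pr(|y| > 2) = {power:.2f}")
```

For every value of theta here, the conditional estimate exceeds |theta|: when |theta| < 2 the selected |y| is above 2 by construction, and even at theta = 3 the selection still pulls the average up.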


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I’ve talked about this a bit but it’s never had its own blog entry (until now). [sent-1, score-0.068]

2 Statistically significant findings tend to overestimate the magnitude of effects. [sent-2, score-0.589]

3 This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. [sent-3, score-0.696]

4 Suppose a true effect of theta is unbiasedly estimated by y ~ N (theta, 1). [sent-5, score-0.509]

5 Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. [sent-6, score-0.638]

6 The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. [sent-7, score-0.538]

7 First off, if |theta|<2, the estimate |y| conditional on statistical significance is not only too high in expectation, it's always too high. [sent-8, score-0.797]

8 (The low-hanging fruit have already been picked, remember? [sent-10, score-0.104]

9 ) But even if |theta|>2, the estimate |y| conditional on statistical significance will still be too high in expectation. [sent-11, score-0.87]

10 For a discussion of the statistical significance filter in the context of a dramatic example, see this article or the first part of this presentation . [sent-12, score-0.648]

11 I call it the statistical significance filter because when you select only the statistically significant results, your “type M” (magnitude) errors become worse. [sent-13, score-1.133]

12 And classical multiple comparisons procedures—which select at an even higher threshold—make the type M problem worse still (even if these corrections solve other problems). [sent-14, score-0.597]

13 This is one of the troubles with using multiple comparisons to attempt to adjust for spurious correlations in neuroscience . [sent-15, score-0.559]

14 Whatever happens to exceed the threshold is almost certainly an overestimate. [sent-16, score-0.238]

15 This might not be a concern in some problems (for example, in identifying candidate genes in a gene-association study) but it arises in any analysis (including just about anything in social or environmental science) where the magnitude of the effect is important. [sent-17, score-0.678]

16 [This is part of a series of posts analyzing the properties of statistical procedures as they are actually done rather than as they might be described in theory. [sent-18, score-0.552]

17 Earlier I wrote about the problems of inverting a family of hypothesis tests to get a confidence interval and how this falls apart given the way that empty intervals are treated in practice. [sent-19, score-0.563]

18 Here I consider the statistical properties of an estimate conditional on it being statistically significant, in contrast to the usual unconditional analysis.] [sent-20, score-1.078]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('theta', 0.438), ('statistically', 0.249), ('significant', 0.226), ('conditional', 0.225), ('significance', 0.213), ('magnitude', 0.199), ('overestimate', 0.164), ('filter', 0.159), ('properties', 0.152), ('estimate', 0.149), ('threshold', 0.147), ('select', 0.147), ('statistical', 0.139), ('procedures', 0.131), ('troubles', 0.113), ('fruit', 0.104), ('type', 0.102), ('comparisons', 0.101), ('inverting', 0.101), ('problems', 0.099), ('spurious', 0.093), ('multiple', 0.091), ('exceed', 0.091), ('neuroscience', 0.09), ('unconditional', 0.09), ('suppose', 0.089), ('genes', 0.087), ('corrections', 0.083), ('empty', 0.081), ('expectation', 0.078), ('identifying', 0.077), ('environmental', 0.076), ('falls', 0.075), ('restrict', 0.075), ('consider', 0.074), ('holds', 0.073), ('even', 0.073), ('dramatic', 0.072), ('adjust', 0.071), ('high', 0.071), ('effect', 0.071), ('apart', 0.071), ('picked', 0.071), ('arises', 0.069), ('interval', 0.068), ('talked', 0.068), ('treated', 0.068), ('results', 0.066), ('analyzing', 0.065), ('part', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 899 andrew gelman stats-2011-09-10-The statistical significance filter


2 0.26110479 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega. I.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using the Bayes factor. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1) p1(t1) d t1 / \int f2(y|t2) p2(t2) d t2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) in H1 (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit
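As a concrete illustration of the reader's example (not from the post; the data, the N(0, 2^2) priors, and the integration limits below are all assumed), the two marginal likelihoods are integrals over spaces of different dimension, and the Bayes factor is their ratio:

```python
import numpy as np
from scipy import integrate, stats

# Illustrative data: two small samples, known sd = 1.
y1 = np.array([0.2, 0.5, -0.1])
y2 = np.array([0.8, 1.1, 0.6])

def lik(y, mu):
    """Normal likelihood of a sample y at mean mu, known sd = 1."""
    return np.prod(stats.norm.pdf(y, loc=mu, scale=1.0))

prior = stats.norm(0, 2).pdf  # assumed prior on each mean

# H0: common mean mu -- a one-dimensional integral.
m0, _ = integrate.quad(lambda mu: lik(y1, mu) * lik(y2, mu) * prior(mu), -10, 10)

# H1: separate means (mu1, mu2) -- a two-dimensional integral.
m1, _ = integrate.dblquad(
    lambda mu2, mu1: lik(y1, mu1) * lik(y2, mu2) * prior(mu1) * prior(mu2),
    -10, 10, lambda _: -10, lambda _: 10)

print("BF01 =", m0 / m1)  # ratio of marginal likelihoods, H0 vs. H1
```

Brute-force quadrature only works in toy dimensions like this; the point of path sampling is to handle such ratios when the integrals are high-dimensional.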

3 0.24066485 310 andrew gelman stats-2010-10-02-The winner’s curse

Introduction: If an estimate is statistically significant, it’s probably an overestimate of the magnitude of your effect. P.S. I think youall know what I mean here. But could someone rephrase it in a more pithy manner? I’d like to include it in our statistical lexicon.

4 0.23707908 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in economics with utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities on all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather, I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

5 0.23366642 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant.” When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc
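The arithmetic behind "their difference has a z of about 1" is worth spelling out. A tiny sketch, assuming two independent estimates each with standard error 1 (so each z-score equals its estimate):

```python
import math

z1, z2 = 2.5, 1.0                 # the two studies' z-scores
se_diff = math.sqrt(1**2 + 1**2)  # se of a difference of independent unit-se estimates
z_diff = (z1 - z2) / se_diff
print(z_diff)  # about 1.06 -- the difference is itself not statistically significant
```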

6 0.21526183 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

7 0.21226282 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

8 0.2085865 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

9 0.20141262 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

10 0.18317087 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

11 0.18306805 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics

12 0.18136197 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

13 0.18072797 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

14 0.17680418 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

15 0.16702154 146 andrew gelman stats-2010-07-14-The statistics and the science

16 0.16401641 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

17 0.1633338 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

18 0.1594812 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

19 0.14971136 1476 andrew gelman stats-2012-08-30-Stan is fast

20 0.14511801 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.193), (1, 0.079), (2, 0.052), (3, -0.163), (4, -0.025), (5, -0.101), (6, 0.002), (7, 0.064), (8, -0.028), (9, -0.147), (10, -0.097), (11, 0.004), (12, 0.066), (13, -0.103), (14, 0.01), (15, -0.02), (16, -0.085), (17, -0.03), (18, 0.015), (19, -0.046), (20, 0.097), (21, 0.041), (22, 0.075), (23, -0.032), (24, 0.033), (25, -0.02), (26, 0.008), (27, -0.054), (28, 0.025), (29, -0.045), (30, 0.033), (31, 0.038), (32, -0.037), (33, -0.008), (34, 0.058), (35, 0.083), (36, -0.054), (37, 0.026), (38, -0.03), (39, -0.003), (40, 0.016), (41, 0.099), (42, -0.132), (43, 0.035), (44, 0.006), (45, -0.102), (46, 0.025), (47, 0.036), (48, -0.05), (49, -0.033)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97339112 899 andrew gelman stats-2011-09-10-The statistical significance filter


2 0.72127551 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06


3 0.7062192 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek, Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads

4 0.70156014 310 andrew gelman stats-2010-10-02-The winner’s curse


5 0.68691552 156 andrew gelman stats-2010-07-20-Burglars are local

Introduction: This makes sense: In the land of fiction, it’s the criminal’s modus operandi – his method of entry, his taste for certain jewellery and so forth – that can be used by detectives to identify his handiwork. The reality according to a new analysis of solved burglaries in the Northamptonshire region of England is that these aspects of criminal behaviour are on their own unreliable as identifying markers, most likely because they are dictated by circumstances rather than the criminal’s taste and style. However, the geographical spread and timing of a burglar’s crimes are distinctive, and could help with police investigations. And, as a bonus, more Tourette’s pride! P.S. On yet another unrelated topic from the same blog, I wonder if the researchers in this study are aware that the difference between “significant” and “not significant” is not itself statistically significant.

6 0.67816514 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance

7 0.66644174 1662 andrew gelman stats-2013-01-09-The difference between “significant” and “non-significant” is not itself statistically significant

8 0.65701938 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

9 0.64372426 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

10 0.63921088 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

11 0.63611573 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

12 0.63228178 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

13 0.63184863 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

14 0.62814373 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models

15 0.62737536 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

16 0.61864519 410 andrew gelman stats-2010-11-12-“The Wald method has been the subject of extensive criticism by statisticians for exaggerating results”

17 0.61367637 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

18 0.61020374 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

19 0.6052708 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as they’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

20 0.60215223 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.01), (6, 0.013), (15, 0.029), (16, 0.068), (20, 0.02), (24, 0.257), (42, 0.02), (53, 0.033), (55, 0.01), (69, 0.011), (84, 0.017), (86, 0.086), (95, 0.018), (99, 0.282)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98333776 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

Introduction: Type S error: when your estimate is the wrong sign, compared to the true value of the parameter. Type M error: when the magnitude of your estimate is far off, compared to the true value of the parameter. More here.
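Under the significance filter, both error types are easy to exhibit by simulation. A minimal sketch, with an assumed small true effect and unit standard error (an underpowered design):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, se, n = 0.3, 1.0, 1_000_000       # assumed small true effect, unit se
y = rng.normal(theta, se, size=n)
significant = np.abs(y / se) > 1.96      # classical significance filter

type_s = np.mean(y[significant] * theta < 0)            # significant, wrong sign
exaggeration = np.mean(np.abs(y[significant])) / theta  # average type M factor
print(f"Pr(significant) = {significant.mean():.3f}")
print(f"Type S rate among significant results = {type_s:.3f}")
print(f"Average exaggeration of |theta| = {exaggeration:.1f}x")
```

With these assumed numbers, roughly a fifth of the significant results come out with the wrong sign, and on average the significant estimates overstate the true effect several-fold.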

2 0.98205572 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q
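For readers who want the curve the question refers to: a minimal sketch of the logistic item-response model with ability, difficulty, and discrimination parameters (illustrative, not the exam's exact specification):

```python
import math

def p_correct(ability, difficulty, discrimination):
    """Logistic item-response model:
    Pr(correct) = inverse-logit(discrimination * (ability - difficulty))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# The curve is monotone in ability for any fixed discrimination, which is
# why an item answered by low- and high-ability students but missed in the
# middle (option a) cannot be captured by a large discrimination parameter.
for ability in (-2, 0, 2):
    print(ability, round(p_correct(ability, difficulty=0.0, discrimination=1.5), 2))
```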

3 0.98020309 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
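A minimal sketch of the boundary problem and the prior-as-regularization fix (the toy data and the gamma prior are assumed; the post itself works with lmer/glmer and full Bayes):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy grouped data: estimates y_j with known standard errors sigma_j.
y = np.array([1.2, -0.8, 0.3, 0.5, -0.2])
sigma = np.ones_like(y)

def neg_marginal_loglik(tau):
    """Negative profile marginal log-likelihood of the group-level sd tau,
    with the overall mean mu profiled out (precision-weighted average)."""
    v = sigma**2 + tau**2
    mu_hat = np.sum(y / v) / np.sum(1.0 / v)
    return 0.5 * np.sum(np.log(v) + (y - mu_hat)**2 / v)

def neg_penalized(tau):
    # Assumed Gamma(2, rate 0.5) prior on tau: log prior = log(tau) - 0.5*tau + const.
    return neg_marginal_loglik(tau) - (np.log(tau) - 0.5 * tau)

mle = minimize_scalar(neg_marginal_loglik, bounds=(1e-8, 10), method="bounded")
pen = minimize_scalar(neg_penalized, bounds=(1e-8, 10), method="bounded")
print("tau-hat, marginal ML:      ", round(mle.x, 3))  # at (or near) the boundary, 0
print("tau-hat, with gamma prior: ", round(pen.x, 3))  # pulled off the boundary
```

Because the assumed prior density vanishes at tau = 0, the penalized mode stays strictly positive even when the marginal likelihood alone would put the estimate exactly on the boundary.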

4 0.97984689 846 andrew gelman stats-2011-08-09-Default priors update?

Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a

same-blog 5 0.97775316 899 andrew gelman stats-2011-09-10-The statistical significance filter


6 0.97693753 1240 andrew gelman stats-2012-04-02-Blogads update

7 0.97683835 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

8 0.97587734 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

9 0.97336805 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

10 0.9727236 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

11 0.97241294 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

12 0.97141922 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

13 0.97105455 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

14 0.97058004 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

15 0.96990299 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

16 0.96961951 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

17 0.96937668 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

18 0.96915317 301 andrew gelman stats-2010-09-28-Correlation, prediction, variation, etc.

19 0.96904123 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

20 0.96835637 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models