andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-643 knowledge-graph by maker-knowledge-mining

643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing


meta info for this blog

Source: html

Introduction: Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would’ve argued with some of the things in the article. In particular, he writes: Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect whe


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. [sent-1, score-0.236]

2 I only wish Carl had talked with me before so hastily posting, though! [sent-2, score-0.136]

3 I would’ve argued with some of the things in the article. [sent-3, score-0.069]

4 suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. [sent-7, score-0.702]

5 Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! [sent-8, score-0.254]

6 My quick response is that the hypothesis of zero effect is almost never true! [sent-9, score-0.594]

7 The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. [sent-10, score-0.659]

8 The real concern is not with zero, it’s with claiming a positive effect when the true effect is negative, or claiming a large effect when the true effect is small, or claiming a precise estimate of an effect when the true effect is highly variable, or . [sent-11, score-2.844]

9 I’ve probably missed a few possibilities here but you get the idea . [sent-14, score-0.233]

10 In addition, none of Carl’s correspondents mentioned the “statistical significance filter”: the idea that, to make the cut of statistical significance, an estimate has to reach some threshold. [sent-15, score-0.616]

11 As a result of this selection bias, statistically significant estimates tend to be overestimates–whether or not a Bayesian method is used, and whether or not there are any problems with fishing through the data. [sent-16, score-0.159]

12 Bayesian inference is great–I’ve written a few books on the topic–but, y’know, garbage in, garbage out. [sent-17, score-0.424]

13 If you start with a model of exactly zero effects, that’s what will pop out. [sent-18, score-0.28]

14 I completely agree with this quote from Susan Ellenberg, reported in the above article: You have to make a lot of assumptions in order to do any statistical test, and all of those are questionable. [sent-19, score-0.268]

15 Steve Stigler is quoted as saying, “I don’t think in science we generally sanction the unequivocal acceptance of significance tests. [sent-24, score-0.699]

16 ” Unfortunately, I have no idea what he means here, given the two completely opposite meanings of the word “sanction” (see the P. [sent-25, score-0.278]
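A minimal simulation sketch of the “statistical significance filter” described in sentences 10-11 above (my own illustration; the true effect of 0.2 and standard error of 1 are arbitrary toy values): among studies that clear the |z| > 1.96 cutoff, the estimates exaggerate the true effect and a nontrivial share even have the wrong sign.

```python
# Sketch of the statistical-significance filter (toy values, not from the post).
import numpy as np

rng = np.random.default_rng(0)
true_effect, se, n_sims = 0.2, 1.0, 200_000           # small true effect, noisy studies

estimates = rng.normal(true_effect, se, size=n_sims)  # unbiased but noisy estimates
significant = np.abs(estimates / se) > 1.96           # the usual two-sided 5% cutoff

print("share of studies reaching significance:", significant.mean())
print("mean |estimate| among significant results:", np.abs(estimates[significant]).mean())
print("share of significant estimates with the wrong sign:", (estimates[significant] < 0).mean())
```

The selected estimates average well above 0.2 whether or not any Bayesian machinery is applied afterwards, which is the selection-bias point made in sentence 11.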


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('carl', 0.287), ('effect', 0.252), ('sanction', 0.219), ('significance', 0.215), ('garbage', 0.212), ('claiming', 0.208), ('zero', 0.194), ('true', 0.188), ('carlin', 0.176), ('brad', 0.172), ('bayesian', 0.158), ('hypothesis', 0.148), ('steve', 0.138), ('ziliak', 0.122), ('reese', 0.122), ('unequivocal', 0.115), ('bialik', 0.115), ('ellenberg', 0.115), ('obsession', 0.115), ('stigler', 0.115), ('quote', 0.108), ('meanings', 0.106), ('correspondents', 0.106), ('knowledge', 0.102), ('tackle', 0.1), ('overestimates', 0.096), ('susan', 0.092), ('completely', 0.09), ('fishing', 0.089), ('slamming', 0.088), ('pop', 0.086), ('prior', 0.083), ('idea', 0.082), ('incorporate', 0.081), ('filter', 0.08), ('possibilities', 0.079), ('acceptance', 0.079), ('estimate', 0.076), ('missed', 0.072), ('quoted', 0.071), ('statistical', 0.07), ('whether', 0.07), ('great', 0.069), ('argued', 0.069), ('talked', 0.069), ('possibility', 0.068), ('precise', 0.068), ('cut', 0.067), ('exact', 0.067), ('wish', 0.067)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

Introduction: Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would’ve argued with some of the things in the article. In particular, he writes: Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect whe

2 0.1844883 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. ( See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai
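For readers who want the 5-percent criterion in that excerpt spelled out, here is a minimal sketch (my own toy numbers, not from the article): a two-group comparison is declared “significant” when its p-value falls below 0.05.

```python
# Toy illustration of the 5% significance criterion (all numbers invented).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
match = rng.normal(0.60, 0.10, size=30)      # e.g., naming times when word and color match
mismatch = rng.normal(0.75, 0.10, size=30)   # naming times when they do not (a large, Stroop-like effect)

t_stat, p_value = stats.ttest_ind(mismatch, match)
print(p_value, "significant" if p_value < 0.05 else "not significant")
```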

3 0.16511899 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

Introduction: A few days ago I discussed the evaluation of somewhat-plausible claims that are somewhat supported by theory and somewhat supported by statistical evidence. One point I raised was that an implausibly large estimate of effect size can be cause for concern: Uri Simonsohn (the author of the recent rebuttal of the name-choice article by Pelham et al.) argued that the implied effects were too large to be believed (just as I was arguing above regarding the July 4th study), which makes more plausible his claims that the results arise from methodological artifacts. That calculation is straight Bayes: the distribution of systematic errors has much longer tails than the distribution of random errors, so the larger the estimated effect, the more likely it is to be a mistake. This little theoretical result is a bit annoying, because it is the larger effects that are the most interesting!” Larry Bartels notes that my reasoning above is a bit incoherent: I [Bartels] strongly agree with

4 0.16401641 899 andrew gelman stats-2011-09-10-The statistical significance filter

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o
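The inequality claimed in that excerpt is easy to check numerically. A sketch (my own code, using the stated setup y ~ N(theta, 1) and the cutoff |y| > 2): compute E[|y| given |y| > 2] and compare it with |theta|.

```python
# Check that E[|y| given |y| > 2] exceeds |theta| when y ~ N(theta, 1), per the excerpt's setup.
import numpy as np
from scipy import stats, integrate

def conditional_abs_estimate(theta, cutoff=2.0):
    dens = lambda y: stats.norm.pdf(y, loc=theta)                        # density of y ~ N(theta, 1)
    left, _ = integrate.quad(lambda y: -y * dens(y), -np.inf, -cutoff)   # |y| = -y on the lower tail
    right, _ = integrate.quad(lambda y: y * dens(y), cutoff, np.inf)
    p_signif = stats.norm.sf(cutoff - theta) + stats.norm.cdf(-cutoff - theta)
    return (left + right) / p_signif

for theta in [0.5, 1.0, 2.0, 3.0]:
    print(theta, round(conditional_abs_estimate(theta), 2))
```

For small theta the conditional estimate sits near 2.3 regardless of the true value, and even for theta > 2 it stays above theta, matching the claims in the excerpt.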

5 0.16175199 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

Introduction: Robert Bell pointed me to this post by Brad De Long on Bayesian statistics, and then I also noticed this from Noah Smith, who wrote: My impression is that although the Bayesian/Frequentist debate is interesting and intellectually fun, there’s really not much “there” there… despite being so-hip-right-now, Bayesian is not the Statistical Jesus. I’m happy to see the discussion going in this direction. Twenty-five years ago or so, when I got into this biz, there were some serious anti-Bayesian attitudes floating around in mainstream statistics. Discussions in the journals sometimes devolved into debates of the form, “Bayesians: knaves or fools?”. You’d get all sorts of free-floating skepticism about any prior distribution at all, even while people were accepting without question (and doing theory on) logistic regressions, proportional hazards models, and all sorts of strong strong models. (In the subfield of survey sampling, various prominent researchers would refuse to mode

6 0.14419022 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance

7 0.13873591 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

8 0.13677417 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

9 0.13643044 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

10 0.1347788 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well

11 0.13455483 1941 andrew gelman stats-2013-07-16-Priors

12 0.13447505 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)

13 0.13289863 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

14 0.13028139 1883 andrew gelman stats-2013-06-04-Interrogating p-values

15 0.1298603 1205 andrew gelman stats-2012-03-09-Coming to agreement on philosophy of statistics

16 0.12602273 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

17 0.12472709 1605 andrew gelman stats-2012-12-04-Write This Book

18 0.12327795 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism

19 0.12169999 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

20 0.12075223 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.22), (1, 0.123), (2, -0.001), (3, -0.099), (4, -0.088), (5, -0.063), (6, 0.004), (7, 0.09), (8, 0.015), (9, -0.088), (10, -0.079), (11, -0.024), (12, 0.083), (13, -0.066), (14, 0.053), (15, 0.014), (16, -0.053), (17, -0.006), (18, -0.007), (19, 0.019), (20, -0.015), (21, 0.017), (22, 0.033), (23, 0.025), (24, -0.038), (25, -0.035), (26, 0.026), (27, 0.003), (28, -0.042), (29, -0.041), (30, 0.037), (31, -0.011), (32, 0.002), (33, 0.007), (34, -0.041), (35, -0.022), (36, -0.003), (37, -0.069), (38, 0.017), (39, -0.015), (40, -0.028), (41, 0.025), (42, -0.063), (43, 0.036), (44, 0.051), (45, 0.011), (46, -0.006), (47, -0.02), (48, -0.018), (49, 0.041)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98806494 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

Introduction: Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would’ve argued with some of the things in the article. In particular, he writes: Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect whe

2 0.86821336 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

Introduction: Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs. Bayesian, namely Lindley’s paradox, on page 12. The text is as follows: So why are we worried about trivial effects? They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. It occurs when: (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0 and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. In his original treatment, Lindley (1957) showed that – under a parti
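A compact numerical sketch of the paradox described above (my own illustration; the 50/50 prior on the hypotheses and the N(0, 1) prior on the effect under the alternative are arbitrary choices): hold the result exactly at the two-sided 5% boundary and watch the posterior probability of the point null grow with the sample size.

```python
# Lindley's paradox sketch: a "just significant" result can favor the point null when n is large.
import numpy as np
from scipy import stats

def posterior_prob_null(n, z=1.96, sigma=1.0, tau=1.0, prior_null=0.5):
    se = sigma / np.sqrt(n)
    xbar = z * se                                                      # exactly at the 5% two-sided boundary
    m0 = stats.norm.pdf(xbar, loc=0.0, scale=se)                       # marginal density under H0: mu = 0
    m1 = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(tau**2 + se**2))  # marginal under H1: mu ~ N(0, tau^2)
    bf01 = m0 / m1
    return prior_null * bf01 / (prior_null * bf01 + (1 - prior_null))

for n in [10, 100, 10_000, 1_000_000]:
    print(n, round(posterior_prob_null(n), 3))
```

As n grows, the same frequentist verdict (“reject at 5%”) coexists with a posterior probability of the null approaching 1, which is condition (i) plus condition (ii) from the quoted passage.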

3 0.83236486 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

Introduction: In response to the discussion of X and me of his recent paper , Val Johnson writes: I would like to thank Andrew for forwarding his comments on uniformly most powerful Bayesian tests (UMPBTs) to me and his invitation to respond to them. I think he (and also Christian Robert) raise a number of interesting points concerning this new class of Bayesian tests, but I think that they may have confounded several issues that might more usefully be examined separately. The first issue involves the choice of the Bayesian evidence threshold, gamma, used in rejecting a null hypothesis in favor of an alternative hypothesis. Andrew objects to the higher values of gamma proposed in my recent PNAS article on grounds that too many important scientific effects would be missed if thresholds of 25-50 were routinely used. These evidence thresholds correspond roughly to p-values of 0.005; Andrew suggests that evidence thresholds around 5 should continue to be used (gamma=5 corresponds approximate
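To make the gamma-to-p-value correspondence in that passage concrete: for the one-sided one-sample z-test, Johnson’s evidence threshold gamma translates into rejecting when z exceeds sqrt(2 log gamma). A sketch of that conversion (my own code; treat the mapping as specific to this simple case):

```python
# Translate UMPBT evidence thresholds gamma into z cutoffs and one-sided p-values (z-test case).
import numpy as np
from scipy import stats

for gamma in [5, 25, 50]:
    z_cutoff = np.sqrt(2 * np.log(gamma))
    p_one_sided = stats.norm.sf(z_cutoff)
    print(f"gamma = {gamma:3d}: z cutoff ~ {z_cutoff:.2f}, one-sided p ~ {p_one_sided:.4f}")
```

This reproduces the rough correspondences quoted above: gamma in the 25-50 range lands near p = 0.005, while gamma = 5 sits near conventional significance.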

4 0.81143159 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

Introduction: Gur Huberman asks what I think of this magazine article by Jonah Lehrer (see also here). My reply is that it reminds me a bit of what I wrote here. Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I know that Dave Krantz has thought about this issue for a while; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago. My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects. In answ
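One concrete reading of “informative Bayesian inferences” in that excerpt: combine the noisy estimate with real prior information and let the posterior shrink it. A minimal conjugate-normal sketch (my own example; all numbers invented):

```python
# Shrinking an overestimated "significant" result with an informative normal prior (toy numbers).
def posterior_normal(y, se, prior_mean, prior_sd):
    """Posterior mean and sd for a normal mean with known se and a normal prior."""
    precision = 1 / se**2 + 1 / prior_sd**2
    post_mean = (y / se**2 + prior_mean / prior_sd**2) / precision
    return post_mean, precision ** -0.5

# A headline estimate of 2.5 (se 1.0) in a literature where effects near 0 +/- 0.5 are typical:
print(posterior_normal(y=2.5, se=1.0, prior_mean=0.0, prior_sd=0.5))   # -> (0.5, ~0.45)
```

The posterior pulls the estimate most of the way back toward the prior, which is the moderating effect the excerpt describes.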

5 0.80177402 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well

Introduction: John Talbott points me to this , which I briefly mocked a couple months ago. I largely agree with the critics of this research, but I want to reiterate my point from earlier that all the statistical sophistication in the world won’t help you if you’re studying a null effect. This is not to say that the actual effect is zero—who am I to say?—just that the comments about the high-quality statistics in the article don’t say much to me. There’s lots of discussion of the lack of science underlying ESP claims. I can’t offer anything useful on that account (not being a psychologist, I could imagine all sorts of stories about brain waves or whatever), but I would like to point out something that usually doesn’t seem to get mentioned in these discussions, which is that lots of people want to believe in ESP. After all, it would be cool to read minds. (It wouldn’t be so cool, maybe, if other people could read your mind and you couldn’t read theirs, but I suspect most people don’t think

6 0.79786766 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)

7 0.78150755 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

8 0.7676304 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

9 0.76276016 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

10 0.75889939 1883 andrew gelman stats-2013-06-04-Interrogating p-values

11 0.75727868 368 andrew gelman stats-2010-10-25-Is instrumental variables analysis particularly susceptible to Type M errors?

12 0.755436 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards

13 0.75289476 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

14 0.75278777 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

15 0.74529803 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance

16 0.74496573 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

17 0.74475437 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

18 0.74440646 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

19 0.7356739 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

20 0.72743553 899 andrew gelman stats-2011-09-10-The statistical significance filter


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.028), (21, 0.021), (24, 0.669), (99, 0.164)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99866021 471 andrew gelman stats-2010-12-17-Attractive models (and data) wanted for statistical art show.

Introduction: I have agreed to do a local art exhibition in February. An excuse to think about form, colour and style for plotting almost individual observation likelihoods – while invoking the artist’s privilege of refusing to give interpretations of their own work. In order to make it possibly less dry I’ll try to use intuitive suggestive captions like in this example TheTyranyof13.pdf thereby sidestepping the technical discussions like here RadfordNealBlog. Suggested models and data sets (or even submissions) would be most appreciated. I’ll likely be sticking to realism i.e. plots that represent ‘statistical reality’ faithfully. K?

2 0.99859655 1437 andrew gelman stats-2012-07-31-Paying survey respondents

Introduction: I agree with Casey Mulligan that participants in government surveys should be paid, and I think it should be part of the code of ethics for commercial pollsters to compensate their respondents also. As Mulligan points out, if a survey is worth doing, it should be worth compensating the participants for their time and effort. P.S. Just to clarify, I do not recommend that Census surveys be made voluntary, I just think that respondents (who can be required to participate) should be paid a small amount. P.P.S. More rant here .

3 0.99777871 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

Introduction: Jouni Kerman did a cool bit of research justifying the Beta (1/3, 1/3) prior as noninformative for binomial data, and the Gamma (1/3, 0) prior for Poisson data. You probably thought that nothing new could be said about noninformative priors in such basic problems, but you were wrong! Here’s the story : The conjugate binomial and Poisson models are commonly used for estimating proportions or rates. However, it is not well known that the conventional noninformative conjugate priors tend to shrink the posterior quantiles toward the boundary or toward the middle of the parameter space, making them thus appear excessively informative. The shrinkage is always largest when the number of observed events is small. This behavior persists for all sample sizes and exposures. The effect of the prior is therefore most conspicuous and potentially controversial when analyzing rare events. As alternative default conjugate priors, I [Jouni] introduce Beta(1/3, 1/3) and Gamma(1/3, 0), which I cal
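A quick way to see the quantile behavior Kerman describes is to compare posterior medians for a small count under a few conjugate defaults. A sketch (my own example; the data, 0 events in 10 trials, is an arbitrary rare-event setting):

```python
# Posterior medians for a binomial proportion under different default Beta priors (toy data).
from scipy.stats import beta

def posterior_median(y, n, a, b):
    return beta.ppf(0.5, a + y, b + n - y)   # median of the Beta(a + y, b + n - y) posterior

y, n = 0, 10                                 # rare events: where the default prior matters most
for a, b, label in [(1, 1, "uniform Beta(1,1)"),
                    (0.5, 0.5, "Jeffreys Beta(1/2,1/2)"),
                    (1/3, 1/3, "Kerman Beta(1/3,1/3)")]:
    print(label, round(posterior_median(y, n, a, b), 4))
```

The medians differ noticeably in this small-count setting, which is the regime the excerpt flags as most sensitive to the choice of default prior.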

4 0.99095243 240 andrew gelman stats-2010-08-29-ARM solutions

Introduction: People sometimes email asking if a solution set is available for the exercises in ARM. The answer, unfortunately, is no. Many years ago, I wrote up 50 solutions for BDA and it was a lot of work–really, it was like writing a small book in itself. The trouble is that, once I started writing them up, I wanted to do it right, to set a good example. That’s a lot more effort than simply scrawling down some quick answers.

5 0.98505253 545 andrew gelman stats-2011-01-30-New innovations in spam

Introduction: I received the following (unsolicited) email today: Hello Andrew, I’m interested in whether you are accepting guest article submissions for your site Statistical Modeling, Causal Inference, and Social Science? I’m the owner of the recently created nonprofit site OnlineEngineeringDegree.org and am interested in writing / submitting an article for your consideration to be published on your site. Is that something you’d be willing to consider, and if so, what specs in terms of topics or length requirements would you be looking for? Thanks you for your time, and if you have any questions or are interested, I’d appreciate you letting me know. Sincerely, Samantha Rhodes Huh? P.S. My vote for most obnoxious spam remains this one , which does its best to dilute whatever remains of the reputation of Wolfram Research. Or maybe that particular bit of spam was written by a particularly awesome cellular automaton that Wolfram discovered? I guess in the world of big-time software

6 0.98386812 59 andrew gelman stats-2010-05-30-Extended Binary Format Support for Mac OS X

same-blog 7 0.98058629 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

8 0.97010136 613 andrew gelman stats-2011-03-15-Gay-married state senator shot down gay marriage

9 0.97010136 712 andrew gelman stats-2011-05-14-The joys of working in the public domain

10 0.97010136 723 andrew gelman stats-2011-05-21-Literary blurb translation guide

11 0.97010136 1242 andrew gelman stats-2012-04-03-Best lottery story ever

12 0.97010136 1252 andrew gelman stats-2012-04-08-Jagdish Bhagwati’s definition of feminist sincerity

13 0.96341765 38 andrew gelman stats-2010-05-18-Breastfeeding, infant hyperbilirubinemia, statistical graphics, and modern medicine

14 0.95196396 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

15 0.94955111 938 andrew gelman stats-2011-10-03-Comparing prediction errors

16 0.94911027 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

17 0.9483794 373 andrew gelman stats-2010-10-27-It’s better than being forwarded the latest works of you-know-who

18 0.94738597 1479 andrew gelman stats-2012-09-01-Mothers and Moms

19 0.94736946 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

20 0.94471562 2229 andrew gelman stats-2014-02-28-God-leaf-tree