andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1713 knowledge-graph by maker-knowledge-mining

1713 andrew gelman stats-2013-02-08-P-values and statistical practice


meta info for this blog

Source: html

Introduction: From my new article in the journal Epidemiology: Sander Greenland and Charles Poole accept that P values are here to stay but recognize that some of their most common interpretations have problems. The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations). A Bayesian interpretation based on a spike-and-slab model makes little sense in applied contexts in epidemiology, political science, and other fields in which true effects are typically nonzero and bounded (thus violating both the “spike” and the “slab” parts of the model). I find Greenland and Poole’s perspective to be valuable: it is important to go beyond criticism and to understand what information is actually contained in a P value.
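For readers unfamiliar with the spike-and-slab model criticized above: it places a point mass (the "spike") at exactly zero and a continuous distribution (the "slab") on nonzero effects. A minimal numerical sketch of how that prior drives the posterior probability of the null; the unit data variance, tau = 1, and the 50/50 prior weight are my own illustrative choices, not from the article:

```python
from math import exp, pi, sqrt

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return exp(-x * x / (2 * sd * sd)) / (sd * sqrt(2 * pi))

def spike_posterior(y, tau=1.0, prior_null=0.5):
    """Posterior P(theta == 0) for one observation y ~ N(theta, 1),
    under a spike-and-slab prior:
      spike: theta = 0 with probability prior_null
      slab:  theta ~ N(0, tau^2) otherwise
    """
    m_spike = normal_pdf(y, 1.0)               # marginal likelihood under the spike
    m_slab = normal_pdf(y, sqrt(1 + tau ** 2)) # marginal likelihood under the slab
    return prior_null * m_spike / (prior_null * m_spike + (1 - prior_null) * m_slab)
```

With y = 0 the spike is favored (posterior probability about 0.59 here), while y = 4 all but rules it out. Gelman's objection is that in fields where true effects are never exactly zero and rarely huge, both components of this prior misrepresent what we know.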


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 From my new article in the journal Epidemiology: Sander Greenland and Charles Poole accept that P values are here to stay but recognize that some of their most common interpretations have problems. [sent-1, score-0.316]

2 The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). [sent-2, score-1.047]

3 The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations). [sent-3, score-0.713]

4 A Bayesian interpretation based on a spike-and-slab model makes little sense in applied contexts in epidemiology, political science, and other fields in which true effects are typically nonzero and bounded (thus violating both the “spike” and the “slab” parts of the model). [sent-4, score-0.313]

5 I find Greenland and Poole’s perspective to be valuable: it is important to go beyond criticism and to understand what information is actually contained in a P value. [sent-5, score-0.063]

6 These authors discuss some connections between P values and Bayesian posterior probabilities. [sent-6, score-0.446]

7 I am not so optimistic about the practical value of these connections. [sent-7, score-0.247]

8 Conditional on the continuing omnipresence of P values in applications, however, these are important results that should be generally understood. [sent-8, score-0.207]

9 First, they describe how P values approximate posterior probabilities under prior distributions that contain little information relative to the data: This misuse [of P values] may be lessened by recognizing correct Bayesian interpretations. [sent-10, score-0.852]

10 For example, under weak priors, 95% confidence intervals approximate 95% posterior probability intervals, one-sided P values approximate directional posterior probabilities, and point estimates approximate posterior medians. [sent-11, score-1.907]

11 I used to think this way, too (see many examples in our books), but in recent years have moved to the position that I do not trust such direct posterior probabilities. [sent-12, score-0.239]

12 Unfortunately, I think we cannot avoid informative priors if we wish to make reasonable unconditional probability statements. [sent-13, score-0.446]

13 To put it another way, I agree with the mathematical truth of the quotation above, but I think it can mislead in practice because of serious problems with apparently noninformative or weak priors. [sent-14, score-0.305]

14 At its center are three examples: “A P value that worked” (to dismiss a hypothesis of fraud in a local election), “A P value that was reasonable but unnecessary” (in our estimates of the effects of redistricting) and “A misleading P value” (from the notorious Daryl Bem). [sent-19, score-0.575]

15 One reason my article came out so well is that, after writing it, I sent it to Greenland, who pointed out a number of places where I’d misunderstood what he’d written. [sent-22, score-0.067]

16 Instead I stuck it out, swallowed my pride, and ended up with something much improved. [sent-25, score-0.083]
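The correspondence quoted in sentence 10 is exact in the simplest normal model: with a flat prior on theta and an estimate theta_hat with standard error se, the one-sided P value equals the posterior probability that the effect has the opposite sign, and the classical 95% interval coincides with the central 95% posterior interval. A self-contained check, with illustrative numbers of my own:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

theta_hat, se = 1.2, 0.5  # illustrative estimate and standard error

# Classical one-sided P value for H0: theta <= 0
p_one_sided = 1 - Phi(theta_hat / se)

# Flat prior => theta | y ~ N(theta_hat, se^2), so the directional
# posterior probability P(theta <= 0 | data) is:
post_prob_nonpositive = Phi((0 - theta_hat) / se)

# Classical 95% interval = central 95% posterior interval under the flat prior:
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
```

The two probabilities agree to machine precision, which is exactly Greenland and Poole's point; Gelman's caveat is that the flat prior making this work is itself rarely believable.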


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('greenland', 0.496), ('poole', 0.332), ('posterior', 0.239), ('values', 0.207), ('approximate', 0.189), ('value', 0.182), ('sander', 0.15), ('epidemiology', 0.126), ('priors', 0.116), ('interpretations', 0.109), ('probability', 0.105), ('null', 0.094), ('weak', 0.091), ('intervals', 0.088), ('reasonable', 0.087), ('truth', 0.084), ('probabilities', 0.083), ('swallowed', 0.083), ('slab', 0.078), ('conditional', 0.078), ('informative', 0.076), ('favoring', 0.075), ('directional', 0.075), ('misuse', 0.068), ('hypothesis', 0.067), ('misunderstood', 0.067), ('quotation', 0.067), ('view', 0.067), ('correct', 0.066), ('spike', 0.065), ('violating', 0.065), ('redistricting', 0.065), ('persists', 0.065), ('optimistic', 0.065), ('bayesian', 0.065), ('pride', 0.064), ('typically', 0.064), ('contained', 0.063), ('mislead', 0.063), ('nonzero', 0.063), ('unconditional', 0.062), ('contexts', 0.061), ('communities', 0.061), ('unnecessary', 0.061), ('town', 0.06), ('bounded', 0.06), ('popularity', 0.057), ('misunderstanding', 0.057), ('daryl', 0.057), ('estimates', 0.057)]
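The exact weighting scheme behind the word list above is not shown; a common tf-idf variant that produces this kind of ranked list looks like the following sketch (the function name and the raw-frequency / log-idf choices are my own assumptions, not the mining pipeline's):

```python
import math
from collections import Counter

def top_tfidf_words(doc_tokens, corpus_tokens, n=5):
    """Rank the words of one document by tf-idf weight against a corpus.
    doc_tokens: list of words; corpus_tokens: list of token lists, one per doc.
    """
    n_docs = len(corpus_tokens)
    # document frequency: in how many documents each word appears
    df = Counter(w for d in corpus_tokens for w in set(d))
    tf = Counter(doc_tokens)
    weights = {
        w: (c / len(doc_tokens)) * math.log(n_docs / df.get(w, 1))
        for w, c in tf.items()
    }
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Words concentrated in one document ('greenland', 'poole') get high weights; words spread across the corpus get low ones, matching the ordering in the list above.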

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 1713 andrew gelman stats-2013-02-08-P-values and statistical practice


2 0.28582358 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

Introduction: The New York Times has a feature in its Tuesday science section, Take a Number, to which I occasionally contribute (see here and here ). Today’s column , by Nicholas Balakar, is in error. The column begins: When medical researchers report their findings, they need to know whether their result is a real effect of what they are testing, or just a random occurrence. To figure this out, they most commonly use the p-value. This is wrong on two counts. First, whatever researchers might feel, this is something they’ll never know. Second, results are a combination of real effects and chance, it’s not either/or. Perhaps the above is a forgivable simplification, but I don’t think so; I think it’s a simplification that destroys the reason for writing the article in the first place. But in any case I think there’s no excuse for this, later on: By convention, a p-value higher than 0.05 usually indicates that the results of the study, however good or bad, were probably due only

3 0.17886806 427 andrew gelman stats-2010-11-23-Bayesian adaptive methods for clinical trials

Introduction: Scott Berry, Brad Carlin, Jack Lee, and Peter Muller recently came out with a book with the above title. The book packs a lot into its 280 pages and is fun to read as well (even if they do use the word “modalities” in their first paragraph, and later on they use the phrase “DIC criterion,” which upsets my tidy, logical mind). The book starts off fast on page 1 and never lets go. Clinical trials are a big part of statistics and it’s cool to see the topic taken seriously and being treated rigorously. (Here I’m not talking about empty mathematical rigor (or, should I say, “rigor”), so-called optimal designs and all that, but rather the rigor of applied statistics, mapping models to reality.) Also I have a few technical suggestions. 1. The authors fit a lot of models in Bugs, which is fine, but they go overboard on the WinBUGS thing. There’s WinBUGS, OpenBUGS, JAGS: they’re all Bugs recommend running Bugs from R using the clunky BRugs interface rather than the smoother bugs(

4 0.16932505 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrows value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i

5 0.15567356 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

Introduction: Dan Lakeland writes: I have some questions about some basic statistical ideas and would like your opinion on them: 1) Parameters that manifestly DON’T exist: It makes good sense to me to think about Bayesian statistics as narrowing in on the value of parameters based on a model and some data. But there are cases where “the parameter” simply doesn’t make sense as an actual thing. Yet, it’s not really a complete fiction, like unicorns either, it’s some kind of “effective” thing maybe. Here’s an example of what I mean. I did a simple toy experiment where we dropped crumpled up balls of paper and timed their fall times. (see here: http://models.street-artists.org/?s=falling+ball ) It was pretty instructive actually, and I did it to figure out how to in a practical way use an ODE to get a likelihood in MCMC procedures. One of the parameters in the model is the radius of the spherical ball of paper. But the ball of paper isn’t a sphere, not even approximately. There’s no single valu

6 0.15391514 1941 andrew gelman stats-2013-07-16-Priors

7 0.15262391 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

8 0.14869916 1829 andrew gelman stats-2013-04-28-Plain old everyday Bayesianism!

9 0.14785916 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

10 0.13748123 1926 andrew gelman stats-2013-07-05-More plain old everyday Bayesianism

11 0.13511716 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

12 0.13372898 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

13 0.13371576 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

14 0.12900294 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

15 0.12541389 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings

16 0.12014499 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

17 0.11976662 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

18 0.11878853 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

19 0.11851262 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

20 0.11828686 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?
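The simValue scores in the list above look like cosine similarities between tf-idf vectors; note the same-blog entry scores 0.99999976, consistent with a document compared against itself up to floating-point error. A sketch of that computation, assuming sparse word -> weight dicts like the one printed earlier:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts: word -> weight)."""
    num = sum(u[w] * v[w] for w in u.keys() & v.keys())
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0
```

Documents sharing heavily weighted words score near 1; documents with disjoint vocabularies score 0.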


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.213), (1, 0.133), (2, -0.019), (3, -0.015), (4, -0.092), (5, -0.071), (6, 0.031), (7, 0.021), (8, -0.065), (9, -0.047), (10, -0.022), (11, 0.024), (12, -0.004), (13, -0.019), (14, -0.021), (15, -0.034), (16, -0.002), (17, -0.028), (18, 0.017), (19, -0.047), (20, 0.037), (21, -0.001), (22, -0.003), (23, -0.004), (24, -0.001), (25, -0.018), (26, -0.001), (27, 0.027), (28, 0.013), (29, -0.012), (30, -0.014), (31, -0.017), (32, -0.016), (33, 0.015), (34, -0.045), (35, -0.015), (36, 0.066), (37, -0.008), (38, 0.047), (39, -0.045), (40, -0.048), (41, -0.035), (42, 0.047), (43, -0.005), (44, 0.013), (45, 0.011), (46, -0.013), (47, 0.027), (48, 0.023), (49, 0.046)]
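The lsi topic weights above are coordinates of this post in a latent semantic space. LSI obtains such coordinates by truncated SVD of the term-document matrix; a minimal numpy sketch on a toy matrix (not the blog corpus):

```python
import numpy as np

def lsi_embed(counts, k):
    """Project a docs-by-terms count matrix into a k-dimensional
    latent space via truncated SVD (the core of LSI)."""
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k] * s[:k]  # document coordinates, scaled by singular values
```

Documents with identical word counts land on identical coordinates, so similarity in this space reflects shared vocabulary patterns rather than exact word overlap.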

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97082782 1713 andrew gelman stats-2013-02-08-P-values and statistical practice


2 0.80145437 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

Introduction: As regular readers of this blog are aware, a few months ago Val Johnson published an article, “Revised standards for statistical evidence,” making a Bayesian argument that researchers and journals should use a p=0.005 publication threshold rather than the usual p=0.05. Christian Robert and I were unconvinced by Val’s reasoning and wrote a response , “Revised evidence for statistical standards,” in which we wrote: Johnson’s minimax prior is not intended to correspond to any distribution of effect sizes; rather, it represents a worst case scenario under some mathematical assumptions. Minimax and tradeoffs do well together, and it is hard for us to see how any worst case procedure can supply much guidance on how to balance between two different losses. . . . We would argue that the appropriate significance level depends on the scenario and that what worked well for agricultural experiments in the 1920s might not be so appropriate for many applications in modern biosciences . . .

3 0.79026431 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

Introduction: In response to the discussion of X and me of his recent paper , Val Johnson writes: I would like to thank Andrew for forwarding his comments on uniformly most powerful Bayesian tests (UMPBTs) to me and his invitation to respond to them. I think he (and also Christian Robert) raise a number of interesting points concerning this new class of Bayesian tests, but I think that they may have confounded several issues that might more usefully be examined separately. The first issue involves the choice of the Bayesian evidence threshold, gamma, used in rejecting a null hypothesis in favor of an alternative hypothesis. Andrew objects to the higher values of gamma proposed in my recent PNAS article on grounds that too many important scientific effects would be missed if thresholds of 25-50 were routinely used. These evidence thresholds correspond roughly to p-values of 0.005; Andrew suggests that evidence thresholds around 5 should continue to be used (gamma=5 corresponds approximate

4 0.76955777 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards

Introduction: X and I heard about this much-publicized recent paper by Val Johnson, who suggests changing the default level of statistical significance from z=2 to z=3 (or, as he puts it, going from p=.05 to p=.005 or .001). Val argues that you need to go out to 3 standard errors to get a Bayes factor of 25 or 50 in favor of the alternative hypothesis. I don’t really buy this, first because Val’s model is a weird (to me) mixture of two point masses, which he creates in order to make a minimax argument, and second because I don’t see why you need a Bayes factor of 25 to 50 in order to make a claim. I’d think that a factor of 5:1, say, provides strong information already—if you really believe those odds. The real issue, as I see it, is that we’re getting Bayes factors and posterior probabilities we don’t believe, because we’re assuming flat priors that don’t really make sense. This is a topic that’s come up over and over in recent months on this blog, for example in this discussion of why I d

5 0.75852787 1518 andrew gelman stats-2012-10-02-Fighting a losing battle

Introduction: Following a recent email exchange regarding path sampling and thermodynamic integration (sadly, I’ve gotten rusty and haven’t thought seriously about these challenges for many years), a correspondent referred to the marginal distribution of the data under a model as “the evidence.” I hate that expression! As we discuss in chapter 6 of BDA, for continuous-parametered models, this quantity can be completely sensitive to aspects of the prior that have essentially no impact on the posterior. In the examples I’ve seen, this marginal probability is not “evidence” in any useful sense of the term. When I told this to my correspondent, he replied, I actually don’t find “the evidence” too bothersome. I don’t have BDA at home where I’m working from at the moment, so I’ll read up on chapter 6 later, but I assume you refer to the problem of the marginal likelihood being strongly sensitive to the prior in a way that the posterior typically isn’t, thereby diminishing the value of the margi

6 0.75469261 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

7 0.74923062 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

8 0.74011624 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

9 0.7293551 2027 andrew gelman stats-2013-09-17-Christian Robert on the Jeffreys-Lindley paradox; more generally, it’s good news when philosophical arguments can be transformed into technical modeling issues

10 0.7267502 1095 andrew gelman stats-2012-01-01-Martin and Liu: Probabilistic inference based on consistency of model with data

11 0.71878982 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

12 0.71430224 1792 andrew gelman stats-2013-04-07-X on JLP

13 0.71322948 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

14 0.70834827 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?

15 0.70628166 331 andrew gelman stats-2010-10-10-Bayes jumps the shark

16 0.70094121 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

17 0.70017737 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

18 0.69384325 317 andrew gelman stats-2010-10-04-Rob Kass on statistical pragmatism, and my reactions

19 0.68603224 1941 andrew gelman stats-2013-07-16-Priors

20 0.68539274 1926 andrew gelman stats-2013-07-05-More plain old everyday Bayesianism


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.094), (9, 0.011), (15, 0.055), (16, 0.079), (21, 0.031), (24, 0.217), (45, 0.016), (47, 0.013), (59, 0.011), (63, 0.015), (65, 0.035), (86, 0.024), (99, 0.273)]
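The lda weights above form a sparse topic mixture for this post, and comparing such mixtures across posts drives the similarity list that this section reports. One standard way to compare discrete distributions is the Hellinger distance; the pipeline's actual metric is not stated, so this is purely illustrative:

```python
import math

def densify(sparse_pairs, n_topics):
    """Expand [(topicId, weight), ...] into a dense, normalized distribution."""
    v = [0.0] * n_topics
    for t, w in sparse_pairs:
        v[t] = w
    total = sum(v)
    return [x / total for x in v] if total else v

def hellinger(p, q):
    """Hellinger distance between discrete distributions: 0 = identical, 1 = disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))
```

Posts whose probability mass sits on the same topics (here, topic 99 and topic 24 dominate) come out close; posts with disjoint topic support come out at distance 1.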

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97714621 1713 andrew gelman stats-2013-02-08-P-values and statistical practice


2 0.95669329 488 andrew gelman stats-2010-12-27-Graph of the year

Introduction: From blogger Matthew Yglesias : There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features: - Clear title - Clearly labeled axes - Vertical axis goes to zero - The cities are in a sensible order (not, for example, alphabetical) - The graphs is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed. What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e

3 0.95607185 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

Introduction: Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. My reaction: I looked at the first 10. I liked 1, 3, and 5, I didn’t like 2, 7, 8, 9, and 10. I have neutral feelings about 4 and 6. I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram to much into a single image. I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. Also, I don’t see the point of the circular display. That makes no sense at all; it’s a misleading feature. That said, the graphs I dislike can still be fine for their purpose. A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on

4 0.95493484 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing

Introduction: From an email I received the other day: Things are going much better now — it’s interesting, it feels like with both of my models, parameters are slow to converge or get “stuck” and have trouble mixing when the model is somehow misspecified. See here for a statement of the folk theorem.

5 0.9544698 1240 andrew gelman stats-2012-04-02-Blogads update

Introduction: A few months ago I reported on someone who wanted to insert text links into the blog. I asked her how much they would pay and got no answer. Yesterday, though, I received this reply: Hello Andrew, I am sorry for the delay in getting back to you. I’d like to make a proposal for your site. Please refer below. We would like to place a simple text link ad on page http://andrewgelman.com/2011/07/super_sam_fuld/ to link to *** with the key phrase ***. We will incorporate the key phrase into a sentence so it would read well. Rest assured it won’t sound obnoxious or advertorial. We will then process the final text link code as soon as you agree to our proposal. We can offer you $200 for this with the assumption that you will keep the link “live” on that page for 12 months or longer if you prefer. Please get back to us with a quick reply on your thoughts on this and include your Paypal ID for payment process. Hoping for a positive response from you. I wrote back: Hi,

6 0.95248312 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?

7 0.95144469 2358 andrew gelman stats-2014-06-03-Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons

8 0.95042276 671 andrew gelman stats-2011-04-20-One more time-use graph

9 0.95017725 1578 andrew gelman stats-2012-11-15-Outta control political incorrectness

10 0.94795024 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

11 0.94757438 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

12 0.94704974 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

13 0.94688183 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

14 0.94679648 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

15 0.94612724 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

16 0.94591719 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

17 0.94536394 687 andrew gelman stats-2011-04-29-Zero is zero

18 0.94519937 2080 andrew gelman stats-2013-10-28-Writing for free

19 0.94499457 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey

20 0.94442308 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update