andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-2029 knowledge-graph by maker-knowledge-mining

2029 andrew gelman stats-2013-09-18-Understanding posterior p-values


meta info for this blog

Source: html

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not in general have uniform distributions under the null hypothesis (except in the special case of pivotal test statistics) but instead tend to have distributions more concentrated near 0.5 . . .
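To make the calibration reading of the quoted statement concrete, here is a minimal simulation sketch of a posterior predictive p-value. This is not code from the paper; the normal model with known sigma, the flat prior on mu, and the sample-variance test statistic are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative data: n observations treated as i.i.d. normal
y = rng.normal(loc=1.0, scale=2.0, size=50)
n = len(y)

# Toy model: y ~ Normal(mu, sigma^2) with sigma known and a flat prior on mu,
# so the posterior for mu is Normal(ybar, sigma^2 / n).
sigma = 2.0
ybar = y.mean()

def T(data):
    """Test statistic: here the sample variance (any discrepancy measure works)."""
    return data.var(ddof=1)

n_sims = 4000
exceed = 0
for _ in range(n_sims):
    mu_draw = rng.normal(ybar, sigma / np.sqrt(n))   # draw mu from its posterior
    y_rep = rng.normal(mu_draw, sigma, size=n)       # replicated data ("tomorrow's" y)
    exceed += T(y_rep) >= T(y)                       # compare T(y_rep) to observed T(y)

p_value = exceed / n_sims
print(f"posterior predictive p-value: {p_value:.2f}")
# A value near 0.4 means: under the model, there is about a 40% chance that a
# replicated dataset's T exceeds the observed T -- the calibration reading above.
```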


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0. [sent-1, score-0.81]

2 4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y). [sent-2, score-0.375]

3 ” This is perfectly understandable to me and represents the idea of calibration. [sent-3, score-0.261]

4 However, I am unsure how this relates to statements about fit. [sent-4, score-0.293]

5 If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. [sent-5, score-0.983]

6 Yet, some literature indicates that high p-values suggest good fit. [sent-6, score-0.174]

7 My reply: I think that “fit” depends on the question being asked. [sent-8, score-0.149]

8 In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. [sent-9, score-0.302]

9 And here’s the abstract of the paper: Posterior predictive p-values do not in general have uniform distributions under the null hypothesis (except in the special case of pivotal test statistics) but instead tend to have distributions more concentrated near 0.5. [sent-10, score-1.317]

10 From different perspectives, such nonuniform distributions have been portrayed as desirable (as reflecting an ability of vague prior distributions to nonetheless yield accurate posterior predictions) or undesirable (as making it more difficult to reject a false model). [sent-12, score-1.871]

11 We explore this tension through two simple normal-distribution examples. [sent-13, score-0.2]

12 In one example, the low power of the posterior predictive check is desirable from a statistical perspective; in the other, the posterior predictive check seems inappropriate. [sent-14, score-1.804]

13 Our conclusion is that the relevance of the p-value depends on the applied context, a point which (ironically) can be seen even in these two toy examples. [sent-15, score-0.412]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('posterior', 0.356), ('predictive', 0.328), ('distributions', 0.257), ('desirable', 0.206), ('indicates', 0.174), ('value', 0.158), ('depends', 0.149), ('tomorrows', 0.144), ('fit', 0.14), ('undesirable', 0.136), ('unsure', 0.13), ('lr', 0.13), ('statement', 0.126), ('pivotal', 0.126), ('tension', 0.122), ('portrayed', 0.122), ('exceeds', 0.116), ('check', 0.115), ('kaplan', 0.114), ('today', 0.113), ('exceed', 0.11), ('toy', 0.11), ('concentrated', 0.108), ('ironically', 0.108), ('chance', 0.107), ('pearson', 0.105), ('understandable', 0.103), ('tomorrow', 0.1), ('reflecting', 0.098), ('nonetheless', 0.096), ('perspectives', 0.094), ('vague', 0.094), ('relates', 0.093), ('reject', 0.089), ('yield', 0.086), ('uniform', 0.085), ('model', 0.085), ('perfectly', 0.082), ('relevance', 0.081), ('null', 0.081), ('explore', 0.078), ('clarify', 0.077), ('fits', 0.077), ('represents', 0.076), ('near', 0.075), ('purpose', 0.075), ('accurate', 0.074), ('conclusion', 0.072), ('statements', 0.07), ('predictions', 0.07)]
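For orientation, here is a rough sketch of how word-level tf-idf weights and blog-to-blog similarities of this kind might be computed. It assumes scikit-learn and a toy corpus; the actual maker-knowledge-mining pipeline, its corpus, and its preprocessing are not specified on this page.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the blog archive (illustrative only)
docs = [
    "posterior predictive p-values and model checking",
    "question about posterior predictive checks",
    "p-values and statistical practice",
    "blog advertising and link requests",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)          # rows: documents, cols: tf-idf word weights

# Top-weighted words for the first document (analogous to the wordName/wordTfidf list)
row = X[0].toarray().ravel()
terms = vectorizer.get_feature_names_out()
print(sorted(zip(terms, row), key=lambda t: -t[1])[:5])

# Cosine similarity of every document to the first one (analogous to simValue)
print(cosine_similarity(X[0], X).ravel())
```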

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values


2 0.30216292 1363 andrew gelman stats-2012-06-03-Question about predictive checks

Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h

3 0.189051 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

Introduction: Deborah Mayo pointed me to this discussion by Christian Hennig of my recent article on Induction and Deduction in Bayesian Data Analysis. A couple days ago I responded to comments by Mayo, Stephen Senn, and Larry Wasserman. I will respond to Hennig by pulling out paragraphs from his discussion and then replying. Hennig: for me the terms “frequentist” and “subjective Bayes” point to interpretations of probability, and not to specific methods of inference. The frequentist one refers to the idea that there is an underlying data generating process that repeatedly throws out data and would approximate the assumed distribution if one could only repeat it infinitely often. Hennig makes the good point that, if this is the way you would define “frequentist” (it’s not how I’d define the term myself, but I’ll use Hennig’s definition here), then it makes sense to be a frequentist in some settings but not others. Dice really can be rolled over and over again; a sample survey of 15

4 0.16932505 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

Introduction: From my new article in the journal Epidemiology: Sander Greenland and Charles Poole accept that P values are here to stay but recognize that some of their most common interpretations have problems. The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations). A Bayesian interpretation based on a spike-and-slab model makes little sense in applied contexts in epidemiology, political science, and other fields in which true effects are typically nonzero and bounded (thus violating both the “spike” and the “slab” parts of the model). I find Greenland and Poole’s perspective t

5 0.1596591 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

Introduction: Konrad Scheffler writes: I was interested by your paper “Induction and deduction in Bayesian data analysis” and was wondering if you would entertain a few questions: – Under the banner of objective Bayesianism, I would posit something like this as a description of Bayesian inference: “Objective Bayesian probability is not a degree of belief (which would necessarily be subjective) but a measure of the plausibility of a hypothesis, conditional on a formally specified information state. One way of specifying a formal information state is to specify a model, which involves specifying both a prior distribution (typically for a set of unobserved variables) and a likelihood function (typically for a set of observed variables, conditioned on the values of the unobserved variables). Bayesian inference involves calculating the objective degree of plausibility of a hypothesis (typically the truth value of the hypothesis is a function of the variables mentioned above) given such a

6 0.1455795 1941 andrew gelman stats-2013-07-16-Priors

7 0.13790451 1465 andrew gelman stats-2012-08-21-D. Buggin

8 0.13068366 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

9 0.12621236 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

10 0.12503666 1886 andrew gelman stats-2013-06-07-Robust logistic regression

11 0.11982876 1648 andrew gelman stats-2013-01-02-A important new survey of Bayesian predictive methods for model assessment, selection and comparison

12 0.11832747 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

13 0.11789849 2133 andrew gelman stats-2013-12-13-Flexibility is good

14 0.11746013 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

15 0.11701953 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

16 0.11435094 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

17 0.11279487 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

18 0.11259694 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

19 0.11219082 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

20 0.11149688 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.177), (1, 0.15), (2, 0.019), (3, 0.011), (4, -0.039), (5, -0.044), (6, 0.029), (7, 0.004), (8, 0.003), (9, -0.018), (10, -0.021), (11, 0.033), (12, -0.058), (13, -0.038), (14, -0.074), (15, -0.039), (16, 0.024), (17, -0.038), (18, 0.003), (19, -0.043), (20, 0.062), (21, -0.04), (22, 0.023), (23, -0.066), (24, 0.002), (25, 0.037), (26, -0.034), (27, 0.049), (28, 0.055), (29, -0.018), (30, -0.015), (31, 0.008), (32, 0.008), (33, 0.023), (34, -0.039), (35, 0.006), (36, 0.053), (37, -0.03), (38, 0.007), (39, 0.005), (40, -0.002), (41, -0.042), (42, 0.018), (43, -0.019), (44, -0.016), (45, -0.045), (46, -0.016), (47, 0.005), (48, 0.054), (49, -0.019)]
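The (topicId, topicWeight) pairs above are the post’s coordinates in a latent semantic indexing (LSI) space. Below is a minimal sketch of how such weights and LSI-based similarities can be produced, assuming gensim and a toy corpus; the pipeline’s actual toolchain and number of topics are not given here.

```python
from gensim import corpora, models, similarities

# Toy corpus standing in for the blog archive (illustrative only)
texts = [
    ["posterior", "predictive", "pvalues", "model", "checking"],
    ["question", "about", "posterior", "predictive", "checks"],
    ["pvalues", "statistical", "practice"],
    ["blog", "advertising", "link", "requests"],
]

dictionary = corpora.Dictionary(texts)
bow = [dictionary.doc2bow(t) for t in texts]

# LSI fitted on top of tf-idf weighted bags of words
tfidf = models.TfidfModel(bow)
lsi = models.LsiModel(tfidf[bow], id2word=dictionary, num_topics=2)

# Topic weights for the first post, in the same (topicId, topicWeight) form as above
print(lsi[tfidf[bow[0]]])

# Cosine similarities to every document in LSI space (analogous to simValue)
index = similarities.MatrixSimilarity(lsi[tfidf[bow]], num_features=2)
print(index[lsi[tfidf[bow[0]]]])
```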

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97646177 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values


2 0.83833194 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

Introduction: William Perkins, Mark Tygert, and Rachel Ward write : If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ2 statistic can involve division by nearly zero. This often leads to serious trouble in practice — even in the absence of round-off errors . . . The problem is not merely that the chi-squared statistic doesn’t have the advertised chi-squared distribution —a reference distribution can always be computed via simulation, either using the posterior predictive distribution or by conditioning on a point estimate of the cell expectations and then making a degrees-of-freedom sort of adjustment. Rather, the problem is that, when there are lots of cells with near-zero expectation, the chi-squared test is mostly noise. And this is not merely a theoretical problem. It comes up in real examples. Here’s one, taken from the classic 1992 genetics paper of Guo and Thomspson: And here are the e

3 0.82317477 1363 andrew gelman stats-2012-06-03-Question about predictive checks

Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h

4 0.81646001 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

Introduction: Following up on our discussion from the other day, Angelika van der Linde sends along this paper from 2012 (link to journal here ). And Aki pulls out this great quote from Geisser and Eddy (1979): This discussion makes clear that in the nested case this method, as Akaike’s, is not consistent; i.e., even if $M_k$ is true, it will be rejected with probability $\alpha$ as $N\to\infty$. This point is also made by Schwarz (1978). However, from the point of view of prediction, this is of no great consequence. For large numbers of observations, a prediction based on the falsely assumed $M_k$, will not differ appreciably from one based on the true $M_k$. For example, if we assert that two normal populations have different means when in fact they have the same mean, then the use of the group mean as opposed to the grand mean for predicting a future observation results in predictors which are asymptotically equivalent and whose predictive variances are $\sigma^2[1 + (1/2n)]$ and $\si

5 0.79275024 1518 andrew gelman stats-2012-10-02-Fighting a losing battle

Introduction: Following a recent email exchange regarding path sampling and thermodynamic integration (sadly, I’ve gotten rusty and haven’t thought seriously about these challenges for many years), a correspondent referred to the marginal distribution of the data under a model as “the evidence.” I hate that expression! As we discuss in chapter 6 of BDA, for continuous-parametered models, this quantity can be completely sensitive to aspects of the prior that have essentially no impact on the posterior. In the examples I’ve seen, this marginal probability is not “evidence” in any useful sense of the term. When I told this to my correspondent, he replied, I actually don’t find “the evidence” too bothersome. I don’t have BDA at home where I’m working from at the moment, so I’ll read up on chapter 6 later, but I assume you refer to the problem of the marginal likelihood being strongly sensitive to the prior in a way that the posterior typically isn’t, thereby diminishing the value of the margi

6 0.78659981 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

7 0.77581042 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

8 0.76377511 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

9 0.76264966 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

10 0.75780475 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings

11 0.75684333 1095 andrew gelman stats-2012-01-01-Martin and Liu: Probabilistic inference based on consistency of model with data

12 0.75619048 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

13 0.73604333 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?

14 0.72308141 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

15 0.71735162 398 andrew gelman stats-2010-11-06-Quote of the day

16 0.71703357 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

17 0.71678138 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

18 0.71642631 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

19 0.7150625 151 andrew gelman stats-2010-07-16-Wanted: Probability distributions for rank orderings

20 0.71474081 82 andrew gelman stats-2010-06-12-UnConMax – uncertainty consideration maxims 7 ± 2


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.028), (16, 0.086), (18, 0.012), (21, 0.099), (24, 0.244), (31, 0.012), (37, 0.023), (48, 0.021), (49, 0.015), (55, 0.013), (68, 0.015), (69, 0.023), (75, 0.016), (86, 0.026), (93, 0.013), (99, 0.265)]
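Likewise, the pairs above can be read as the post’s mixture over LDA topics. Here is a comparable sketch, again assuming gensim and a toy corpus rather than the pipeline’s actual setup.

```python
from gensim import corpora, models

# Toy corpus standing in for the blog archive (illustrative only)
texts = [
    ["posterior", "predictive", "pvalues", "model", "checking"],
    ["question", "about", "posterior", "predictive", "checks"],
    ["pvalues", "statistical", "practice"],
    ["blog", "advertising", "link", "requests"],
]

dictionary = corpora.Dictionary(texts)
bow = [dictionary.doc2bow(t) for t in texts]

# Small LDA model; each document then gets a sparse topic mixture
lda = models.LdaModel(bow, id2word=dictionary, num_topics=3,
                      random_state=0, passes=10)

# Topic weights for the first post, in the same (topicId, topicWeight) form as above
print(lda.get_document_topics(bow[0]))
```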

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98335183 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values


2 0.97098941 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

Introduction: I received the following message from “Patricia Lopez” of “Premium Link Ads”: Hello, I am interested in placing a text link on your page: http://andrewgelman.com/2011/07/super_sam_fuld/. The link would point to a page on a website that is relevant to your page and may be useful to your site visitors. We would be happy to compensate you for your time if it is something we are able to work out. The best way to reach me is through a direct response to this email. This will help me get back to you about the right link request. Please let me know if you are interested, and if not thanks for your time. Thanks. Usually I just ignore these, but after our recent discussion I decided to reply. I wrote: How much do you pay? But no answer. I wonder what’s going on? I mean, why bother sending the email in the first place if you’re not going to follow up?

3 0.96714509 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

Introduction: From a couple years ago but still relevant, I think: To me, the Lindley paradox falls apart because of its noninformative prior distribution on the parameter of interest. If you really think there’s a high probability the parameter is nearly exactly zero, I don’t see the point of the model saying that you have no prior information at all on the parameter. In short: my criticism of so-called Bayesian hypothesis testing is that it’s insufficiently Bayesian. P.S. To clarify (in response to Bill’s comment below): I’m speaking of all the examples I’ve ever worked on in social and environmental science, where in some settings I can imagine a parameter being very close to zero and in other settings I can imagine a parameter taking on just about any value in a wide range, but where I’ve never seen an example where a parameter could be either right at zero or taking on any possible value. But such examples might occur in areas of application that I haven’t worked on.

4 0.96545243 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

Introduction: In the discussion of the fourteen magic words that can increase voter turnout by over 10 percentage points , questions were raised about the methods used to estimate the experimental effects. I sent these on to Chris Bryan, the author of the study, and he gave the following response: We’re happy to address the questions that have come up. It’s always noteworthy when a precise psychological manipulation like this one generates a large effect on a meaningful outcome. Such findings illustrate the power of the underlying psychological process. I’ve provided the contingency tables for the two turnout experiments below. As indicated in the paper, the data are analyzed using logistic regressions. The change in chi-squared statistic represents the significance of the noun vs. verb condition variable in predicting turnout; that is, the change in the model’s significance when the condition variable is added. This is a standard way to analyze dichotomous outcomes. Four outliers were excl

5 0.96545005 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

Introduction: From a recent email exchange: I agree that you should never compare p-values directly. The p-value is a strange nonlinear transformation of data that is only interpretable under the null hypothesis. Once you abandon the null (as we do when we observe something with a very low p-value), the p-value itself becomes irrelevant. To put it another way, the p-value is a measure of evidence, it is not an estimate of effect size (as it is often treated, with the idea that a p=.001 effect is larger than a p=.01 effect, etc). Even conditional on sample size, the p-value is not a measure of effect size.

6 0.96535635 896 andrew gelman stats-2011-09-09-My homework success

7 0.96533203 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

8 0.96450984 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

9 0.96404248 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

10 0.96024799 1792 andrew gelman stats-2013-04-07-X on JLP

11 0.9596619 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

12 0.95898223 1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure

13 0.95834911 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

14 0.95797217 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

15 0.95707881 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

16 0.95703727 1240 andrew gelman stats-2012-04-02-Blogads update

17 0.95671457 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.

18 0.95669115 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

19 0.95607042 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

20 0.95465493 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author