andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1983 knowledge-graph by maker-knowledge-mining

1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc


meta info for this blog

Source: html

Introduction: Following up on our discussion from the other day, Angelika van der Linde sends along this paper from 2012 (link to journal here). And Aki pulls out this great quote from Geisser and Eddy (1979): This discussion makes clear that in the nested case this method, as Akaike’s, is not consistent; i.e., even if $M_k$ is true, it will be rejected with probability $\alpha$ as $N\to\infty$. This point is also made by Schwarz (1978). However, from the point of view of prediction, this is of no great consequence. For large numbers of observations, a prediction based on the falsely assumed $M_k$, will not differ appreciably from one based on the true $M_k$. For example, if we assert that two normal populations have different means when in fact they have the same mean, then the use of the group mean as opposed to the grand mean for predicting a future observation results in predictors which are asymptotically equivalent and whose predictive variances are $\sigma^2[1 + (1/2n)]$ and $\sigma^2[\dots]$…
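The quoted example is easy to check numerically. Below is a minimal simulation sketch (my illustration, not code from the post): two normal groups of $n$ observations each that in fact share a mean, with a future observation from group 1 predicted either by its group mean (the falsely assumed different-means model) or by the grand mean (the true same-mean model). Under this setup the theoretical mean squared prediction errors are $\sigma^2[1 + (1/n)]$ and $\sigma^2[1 + (1/2n)]$, which agree asymptotically.

```python
# A minimal simulation of the Geisser-Eddy example (illustrative, not from
# the post): two normal populations with the same true mean. Predicting a
# new observation with the group mean (the falsely assumed different-means
# model) is asymptotically as good as using the grand mean (the true model).
import numpy as np

rng = np.random.default_rng(1)
sigma, reps = 1.0, 5000

for n in [10, 100, 1000]:
    y1 = rng.normal(0.0, sigma, (reps, n))  # group 1, true mean 0
    y2 = rng.normal(0.0, sigma, (reps, n))  # group 2, same true mean
    y_new = rng.normal(0.0, sigma, reps)    # future observations from group 1
    group_mean = y1.mean(axis=1)                            # different-means predictor
    grand_mean = 0.5 * (y1.mean(axis=1) + y2.mean(axis=1))  # pooled (true-model) predictor
    mse_group = np.mean((y_new - group_mean) ** 2)
    mse_grand = np.mean((y_new - grand_mean) ** 2)
    print(f"n={n:5d}  MSE group = {mse_group:.4f} (theory {sigma**2 * (1 + 1/n):.4f})  "
          f"grand = {mse_grand:.4f} (theory {sigma**2 * (1 + 1/(2*n)):.4f})")
```

Both predictors’ mean squared errors converge to $\sigma^2$, so rejecting the true same-mean model costs essentially nothing for prediction, which is the quote’s point.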


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Following up on our discussion from the other day, Angelika van der Linde sends along this paper from 2012 (link to journal here). [sent-1, score-0.19]

2 And Aki pulls out this great quote from Geisser and Eddy (1979): This discussion makes clear that in the nested case this method, as Akaike’s, is not consistent; i.e., [sent-2, score-0.193]

3 even if $M_k$ is true, it will be rejected with probability $\alpha$ as $N\to\infty$. [sent-4, score-0.077]

4 For large numbers of observations, a prediction based on the falsely assumed $M_k$, will not differ appreciably from one based on the true $M_k$. [sent-7, score-0.393]

5 And I’d like to pull out something from the comments, in which Brendon Brewer wrote, “I’m happy to avoid all these XIC things, as they seem like strange approximations to something I can already accomplish with marginal likelihoods.” [sent-9, score-0.926]

6 I’d be happy to avoid “these XIC things” too—writing our paper took a lot of unpleasant work—but I felt the need to do so because comparing models is a real issue in a lot of applied research. [sent-10, score-0.492]

7 I don’t actually do much model comparison myself—my usual approach is to fit the most complicated model that I can handle, and then get frustrated that I can’t do more—but I recognize that others feel the need for predictive comparisons. [sent-11, score-0.325]

8 And a key point here, which we discuss in chapter 7 of BDA (chapter 6 of the first and second editions), is that marginal likelihoods (also very misleadingly called “evidence”) do not in general solve the problem of predictive model comparison. [sent-12, score-0.934]

9 Out-of-sample prediction error (which is what AIC, DIC, and WAIC are estimating) is not the same as marginal likelihood. [sent-13, score-0.701]

10 Via weak priors, it’s easy to construct models that fit the data well and give accurate predictions but can have marginal likelihoods as low as you want. [sent-14, score-0.911] (See the sketch following this list.)

11 In many settings it can make sense to compare fitted models using estimated out-of-sample prediction error, while it will not make sense to compare them using marginal likelihoods. [sent-15, score-0.955]

12 The problem is that, with continuous-parameter models, the marginal likelihood can depend strongly on aspects of the prior distribution that have essentially no impact on the posterior distributions of the individual models. [sent-16, score-0.39]

13 This issue is well known, but perhaps not well known enough. [sent-17, score-0.322]
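To make the contrast in sentences 9–12 concrete, recall the standard definitions (summarized here for reference, not quoted from the post): AIC and WAIC are estimates of expected out-of-sample prediction error built from within-sample fit, whereas the marginal likelihood integrates the likelihood over the prior,

$$\mathrm{AIC} = -2\log p(y \mid \hat{\theta}_{\mathrm{mle}}) + 2k, \qquad \mathrm{WAIC} = -2\Bigl(\sum_{i=1}^n \log \frac{1}{S}\sum_{s=1}^S p(y_i \mid \theta^{(s)}) - p_{\mathrm{WAIC}}\Bigr), \qquad p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta,$$

so $p(y)$ keeps the scale of the prior even after the posterior has essentially forgotten it. The weak-prior construction of sentence 10 can be worked exactly in a conjugate normal model. The sketch below is illustrative (my setup and variable names, not from the post): it widens the prior on the mean and tracks both the analytic log marginal likelihood and the posterior predictive distribution.

```python
# Conjugate normal model with known sigma: y_i ~ N(theta, sigma^2),
# prior theta ~ N(0, tau^2). As tau grows, the posterior predictive
# stabilizes while the marginal likelihood p(y) keeps falling.
# (Illustrative sketch; the setup is an assumption, not from the post.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, n = 1.0, 20
y = rng.normal(0.3, sigma, n)  # data generated with theta = 0.3

for tau in [1.0, 10.0, 100.0, 1000.0]:
    # Exact marginal likelihood: marginally, y ~ N(0, sigma^2 I + tau^2 11').
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    log_py = stats.multivariate_normal(np.zeros(n), cov).logpdf(y)

    # Posterior for theta and posterior predictive for a new observation.
    post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)
    post_mean = post_var * y.sum() / sigma**2
    pred_sd = np.sqrt(sigma**2 + post_var)

    print(f"tau={tau:7.1f}  log p(y) = {log_py:8.2f}  "
          f"predictive ~ N({post_mean:.3f}, {pred_sd:.3f}^2)")
```

Each tenfold increase in $\tau$ lowers $\log p(y)$ by roughly $\log 10 \approx 2.3$ while the posterior predictive barely moves, which is exactly the sense in which estimated out-of-sample prediction error and marginal likelihood can rank models differently.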


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('marginal', 0.39), ('xic', 0.261), ('prediction', 0.228), ('sigma', 0.187), ('likelihoods', 0.175), ('predictive', 0.162), ('models', 0.119), ('angelika', 0.119), ('geisser', 0.119), ('schwarz', 0.112), ('linde', 0.112), ('assert', 0.112), ('compare', 0.109), ('avoid', 0.109), ('der', 0.107), ('misleadingly', 0.107), ('editions', 0.103), ('mean', 0.101), ('chapter', 0.1), ('accomplish', 0.1), ('waic', 0.1), ('pulls', 0.098), ('dic', 0.098), ('known', 0.096), ('happy', 0.096), ('nested', 0.095), ('akaike', 0.093), ('aic', 0.093), ('falsely', 0.092), ('asymptotically', 0.092), ('unpleasant', 0.09), ('frustrated', 0.086), ('approximations', 0.084), ('error', 0.083), ('van', 0.083), ('alpha', 0.083), ('variances', 0.083), ('aki', 0.083), ('populations', 0.08), ('issue', 0.078), ('rejected', 0.077), ('grand', 0.077), ('fit', 0.077), ('observation', 0.076), ('construct', 0.076), ('pull', 0.075), ('well', 0.074), ('bda', 0.074), ('true', 0.073), ('strange', 0.072)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc


2 0.2501134 1518 andrew gelman stats-2012-10-02-Fighting a losing battle

Introduction: Following a recent email exchange regarding path sampling and thermodynamic integration (sadly, I’ve gotten rusty and haven’t thought seriously about these challenges for many years), a correspondent referred to the marginal distribution of the data under a model as “the evidence.” I hate that expression! As we discuss in chapter 6 of BDA, for continuous-parametered models, this quantity can be completely sensitive to aspects of the prior that have essentially no impact on the posterior. In the examples I’ve seen, this marginal probability is not “evidence” in any useful sense of the term. When I told this to my correspondent, he replied, I actually don’t find “the evidence” too bothersome. I don’t have BDA at home where I’m working from at the moment, so I’ll read up on chapter 6 later, but I assume you refer to the problem of the marginal likelihood being strongly sensitive to the prior in a way that the posterior typically isn’t, thereby diminishing the value of the margi…

3 0.2065317 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis…

4 0.18452223 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

Introduction: In response to this article by Cosma Shalizi and myself on the philosophy of Bayesian statistics, David Hogg writes: I [Hogg] agree–even in physics and astronomy–that the models are not “True” in the God-like sense of being absolute reality (that is, I am not a realist); and I have argued (a philosophically very naive paper, but hey, I was new to all this) that for pretty fundamental reasons we could never arrive at the True (with a capital “T”) model of the Universe. The goal of inference is to find the “best” model, where “best” might have something to do with prediction, or explanation, or message length, or (horror!) our utility. Needless to say, most of my physics friends *are* realists, even in the face of “effective theories” as Newtonian mechanics is an effective theory of GR and GR is an effective theory of “quantum gravity” (this plays to your point, because if you think any theory is possibly an effective theory, how could you ever find Truth?). I also liked the i…

5 0.17880434 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models

Introduction: Jessy, Aki, and I write: We review the Akaike, deviance, and Watanabe-Akaike information criteria from a Bayesian perspective, where the goal is to estimate expected out-of-sample-prediction error using a bias-corrected adjustment of within-sample error. We focus on the choices involved in setting up these measures, and we compare them in three simple examples, one theoretical and two applied. The contribution of this review is to put all these information criteria into a Bayesian predictive context and to better understand, through small examples, how these methods can apply in practice. I like this paper. It came about as a result of preparing Chapter 7 for the new BDA. I had difficulty understanding AIC, DIC, WAIC, etc., but I recognized that these methods served a need. My first plan was to just apply DIC and WAIC on a couple of simple examples (a linear regression and the 8 schools) and leave it at that. But when I did the calculations, I couldn’t understand the resu…

6 0.17728008 1363 andrew gelman stats-2012-06-03-Question about predictive checks

7 0.175777 1377 andrew gelman stats-2012-06-13-A question about AIC

8 0.1684521 2349 andrew gelman stats-2014-05-26-WAIC and cross-validation in Stan!

9 0.16444996 778 andrew gelman stats-2011-06-24-New ideas on DIC from Martyn Plummer and Sumio Watanabe

10 0.16323702 776 andrew gelman stats-2011-06-22-Deviance, DIC, AIC, cross-validation, etc

11 0.15524651 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

12 0.1468364 612 andrew gelman stats-2011-03-14-Uh-oh

13 0.13315231 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

14 0.13245437 1648 andrew gelman stats-2013-01-02-A important new survey of Bayesian predictive methods for model assessment, selection and comparison

15 0.13068366 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

16 0.12961999 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

17 0.12772773 338 andrew gelman stats-2010-10-12-Update on Mankiw’s work incentives

18 0.12714888 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

19 0.1242449 1941 andrew gelman stats-2013-07-16-Priors

20 0.12159077 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.222), (1, 0.153), (2, 0.033), (3, 0.033), (4, -0.008), (5, -0.037), (6, 0.064), (7, -0.043), (8, 0.001), (9, 0.02), (10, 0.01), (11, 0.019), (12, -0.055), (13, 0.006), (14, -0.092), (15, -0.023), (16, 0.039), (17, -0.003), (18, -0.039), (19, 0.012), (20, 0.079), (21, -0.018), (22, 0.058), (23, 0.024), (24, -0.012), (25, 0.039), (26, -0.025), (27, 0.041), (28, 0.038), (29, -0.029), (30, -0.059), (31, 0.054), (32, 0.028), (33, -0.021), (34, 0.021), (35, 0.017), (36, 0.043), (37, -0.048), (38, 0.013), (39, -0.007), (40, -0.032), (41, -0.005), (42, 0.014), (43, -0.03), (44, 0.015), (45, -0.002), (46, -0.049), (47, 0.034), (48, 0.012), (49, -0.027)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96532071 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc


2 0.83438206 1518 andrew gelman stats-2012-10-02-Fighting a losing battle

Introduction: Following a recent email exchange regarding path sampling and thermodynamic integration (sadly, I’ve gotten rusty and haven’t thought seriously about these challenges for many years), a correspondent referred to the marginal distribution of the data under a model as “the evidence.” I hate that expression! As we discuss in chapter 6 of BDA, for continuous-parametered models, this quantity can be completely sensitive to aspects of the prior that have essentially no impact on the posterior. In the examples I’ve seen, this marginal probability is not “evidence” in any useful sense of the term. When I told this to my correspondent, he replied, I actually don’t find “the evidence” too bothersome. I don’t have BDA at home where I’m working from at the moment, so I’ll read up on chapter 6 later, but I assume you refer to the problem of the marginal likelihood being strongly sensitive to the prior in a way that the posterior typically isn’t, thereby diminishing the value of the margi…

3 0.81731677 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i…

4 0.8021813 778 andrew gelman stats-2011-06-24-New ideas on DIC from Martyn Plummer and Sumio Watanabe

Introduction: Martyn Plummer replied to my recent blog on DIC with information that was important enough that I thought it deserved its own blog entry. Martyn wrote: DIC has been around for 10 years now and despite being immensely popular with applied statisticians it has generated very little theoretical interest. In fact, the silence has been deafening. I [Martyn] hope my paper added some clarity. As you say, DIC is (an approximation to) a theoretical out-of-sample predictive error. When I finished the paper I was a little embarrassed to see that I had almost perfectly reconstructed the justification of AIC as approximate cross-validation measure by Stone (1977), with a Bayesian spin of course. But even this insight leaves a lot of choices open. You need to choose the right loss function and also which level of the model you want to replicate from. David Spiegelhalter and colleagues called this the “focus”. In practice the focus is limited to the lowest level of the model. You generall…

5 0.76916784 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models

Introduction: Becky Passonneau and colleagues at the Center for Computational Learning Systems (CCLS) at Columbia have been working on a project for ConEd (New York’s major electric utility) to rank structures based on vulnerability to secondary events (e.g., transformer explosions, cable meltdowns, electrical fires). They’ve been using the R implementation BayesTree of Chipman, George and McCulloch’s Bayesian Additive Regression Trees (BART). BART is a Bayesian non-parametric method that is non-identifiable in two ways. Firstly, it is an additive tree model with a fixed number of trees, the indexes of which aren’t identified (you get the same predictions in a model swapping the order of the trees). This is the same kind of non-identifiability you get with any mixture model (additive or interpolated) with an exchangeable prior on the mixture components. Secondly, the trees themselves have varying structure over samples in terms of number of nodes and their topology (depth, branching, etc…

6 0.76864529 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

7 0.75999707 1363 andrew gelman stats-2012-06-03-Question about predictive checks

8 0.7543323 776 andrew gelman stats-2011-06-22-Deviance, DIC, AIC, cross-validation, etc

9 0.75050396 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

10 0.74429315 1459 andrew gelman stats-2012-08-15-How I think about mixture models

11 0.74281353 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

12 0.73087835 2136 andrew gelman stats-2013-12-16-Whither the “bet on sparsity principle” in a nonsparse world?

13 0.72273427 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

14 0.72227639 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

15 0.71737969 1392 andrew gelman stats-2012-06-26-Occam

16 0.71720028 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

17 0.71622968 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

18 0.71340919 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?

19 0.70884269 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

20 0.70866507 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(15, 0.035), (16, 0.052), (24, 0.188), (57, 0.011), (59, 0.013), (61, 0.026), (86, 0.236), (96, 0.01), (99, 0.298)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98305559 253 andrew gelman stats-2010-09-03-Gladwell vs Pinker

Introduction: I just happened to notice this from last year. Eric Loken writes: Steven Pinker reviewed Malcolm Gladwell’s latest book and criticized him rather harshly for several shortcomings. Gladwell appears to have made things worse for himself in a letter to the editor of the NYT by defending a manifestly weak claim from one of his essays – the claim that NFL quarterback performance is unrelated to the order they were drafted out of college. The reason we [Loken and his colleagues] are implicated is that Pinker identified an earlier blog post of ours as one of three sources he used to challenge Gladwell (yay us!). But Gladwell either misrepresented or misunderstood our post in his response, and admonishes Pinker by saying “we should agree that our differences owe less to what can be found in the scientific literature than they do to what can be found on Google.” Well, here’s what you can find on Google. Follow this link to request the data for NFL quarterbacks drafted between 1980 and…

2 0.97834128 1718 andrew gelman stats-2013-02-11-Toward a framework for automatic model building

Introduction: Patrick Caldon writes: I saw your recent blog post where you discussed in passing an iterative-chain-of models approach to AI. I essentially built such a thing for my PhD thesis – not in a Bayesian context, but in a logic programming context – and proved it had a few properties and showed how you could solve some toy problems. The important bit of my framework was that at various points you also go and get more data in the process – in a statistical context this might be seen as building a little univariate model on a subset of the data, then iteratively extending into a better model with more data and more independent variables – a generalized forward stepwise regression if you like. It wrapped a proper computational framework around E.M. Gold’s identification/learning in the limit based on a logic my advisor (Eric Martin) had invented. What’s not written up in the thesis is a few months of failed struggle trying to shoehorn some simple statistical inference into this…

3 0.97794563 1552 andrew gelman stats-2012-10-29-“Communication is a central task of statistics, and ideally a state-of-the-art data analysis can have state-of-the-art displays to match”

Introduction: The Journal of the Royal Statistical Society publishes papers followed by discussions. Lots of discussions, each can be no more than 400 words. Here’s my most recent discussion: The authors are working on an important applied problem and I have no reason to doubt that their approach is a step forward beyond diagnostic criteria based on point estimation. An attempt at an accurate assessment of variation is important not just for statistical reasons but also because scientists have the duty to convey their uncertainty to the larger world. I am thinking, for example, of discredited claims such as that of the mathematician who claimed to predict divorces with 93% accuracy (Abraham, 2010). Regarding the paper at hand, I thought I would try an experiment in comment-writing. My usual practice is to read the graphs and then go back and clarify any questions through the text. So, very quickly: I would prefer Figure 1 to be displayed in terms of standard deviations, not variances. I…

4 0.9729405 1327 andrew gelman stats-2012-05-18-Comments on “A Bayesian approach to complex clinical diagnoses: a case-study in child abuse”

Introduction: I was given the opportunity to briefly comment on the paper, A Bayesian approach to complex clinical diagnoses: a case-study in child abuse, by Nicky Best, Deborah Ashby, Frank Dunstan, David Foreman, and Neil McIntosh, for the Journal of the Royal Statistical Society. Here is what I wrote: Best et al. are working on an important applied problem and I have no reason to doubt that their approach is a step forward beyond diagnostic criteria based on point estimation. An attempt at an accurate assessment of variation is important not just for statistical reasons but also because scientists have the duty to convey their uncertainty to the larger world. I am thinking, for example, of discredited claims such as that of the mathematician who claimed to predict divorces with 93% accuracy (Abraham, 2010). Regarding the paper at hand, I thought I would try an experiment in comment-writing. My usual practice is to read the graphs and then go back and clarify any questions through the t…

5 0.96959221 436 andrew gelman stats-2010-11-29-Quality control problems at the New York Times

Introduction: I guess there’s a reason they put this stuff in the Opinion section and not in the Science section, huh? P.S. More here.

6 0.96650267 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers

7 0.96453047 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology

8 0.9613657 904 andrew gelman stats-2011-09-13-My wikipedia edit

9 0.96123827 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”

10 0.96113253 1530 andrew gelman stats-2012-10-11-Migrating your blog from Movable Type to WordPress

11 0.959795 76 andrew gelman stats-2010-06-09-Both R and Stata

12 0.95817012 873 andrew gelman stats-2011-08-26-Luck or knowledge?

13 0.95444375 2082 andrew gelman stats-2013-10-30-Berri Gladwell Loken football update

14 0.95097595 1971 andrew gelman stats-2013-08-07-I doubt they cheated

same-blog 15 0.94911909 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

16 0.94581431 276 andrew gelman stats-2010-09-14-Don’t look at just one poll number–unless you really know what you’re doing!

17 0.94460827 1427 andrew gelman stats-2012-07-24-More from the sister blog

18 0.94279337 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”

19 0.93421215 866 andrew gelman stats-2011-08-23-Participate in a research project on combining information for prediction

20 0.93008399 2102 andrew gelman stats-2013-11-15-“Are all significant p-values created equal?”