A question about AIC (andrew gelman stats, 2012-06-13)
Introduction: Jacob Oaknin asks: Akaike ‘s selection criterion is often justified on the basis of the empirical risk of a ML estimate being a biased estimate of the true generalization error of a parametric family, say the family, S_m, of linear regressors on a m-dimensional variable x=(x_1,..,x_m) with gaussian noise independent of x (for instance in “Unifying the derivations for the Akaike and Corrected Akaike information criteria”, by J.E.Cavanaugh, Statistics and Probability Letters, vol. 33, 1997, pp. 201-208). On the other hand, the family S_m is known to have finite VC-dimension (VC = m+1), and this fact should grant that empirical risk minimizer is asymtotically consistent regardless of the underlying probability distribution, and in particular for the assumed gaussian distribution of noise(“An overview of statistical learning theory”, by V.N.Vapnik, IEEE Transactions On Neural Networks, vol. 10, No. 5, 1999, pp. 988-999) What am I missing? My reply: I’m no expert on AIC so
My reply: I'm no expert on AIC, so let's see if I can wing this and give a reasonable response without fully understanding either the question or the answer. Here goes: as the saying goes, asymptotically we're all dead. The AIC correction is a constant; it does not grow with the sample size n. So if you divide by n, so that you're estimating average prediction error rather than total prediction error, the correction is of order O(1/n). Asymptotically, then, the uncorrected estimate of the average prediction error is consistent, just as you'd like: the bias is real at any finite n, but it vanishes in exactly the limit that the VC argument addresses.
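As a quick check on the O(1/n) point, here is a small simulation, my own sketch rather than anything from the post (the variable names are made up). It compares the in-sample average squared error of least squares with the average error on a fresh copy of the data at the same design points; the simulated gap should track 2*sigma^2*k/n and shrink as n grows:

# Sketch: in-sample vs. fresh-data average squared error for least squares.
# With k fitted coefficients and noise sd sigma, the expected gap is
# 2*sigma^2*k/n, so it vanishes as n grows: the O(1/n) correction above.
set.seed(123)
k <- 5        # 4 predictors plus an intercept
sigma <- 1
for (n in c(50, 500, 5000)) {
  gap <- replicate(200, {
    X <- cbind(1, matrix(rnorm(n * (k - 1)), n, k - 1))
    beta <- rnorm(k)
    y     <- drop(X %*% beta) + rnorm(n, sd = sigma)
    y_new <- drop(X %*% beta) + rnorm(n, sd = sigma)  # fresh noise, same X
    fit <- lm.fit(X, y)
    mean((y_new - drop(X %*% fit$coefficients))^2) - mean(fit$residuals^2)
  })
  cat(sprintf("n = %4d:  simulated gap = %.4f,  2*sigma^2*k/n = %.4f\n",
              n, mean(gap), 2 * sigma^2 * k / n))
}

With this setup the printed gap drops by roughly a factor of ten each time n does, matching the O(1/n) rate: the uncorrected average-error estimate converges to the generalization error even though it is biased at every finite n.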