andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-63 knowledge-graph by maker-knowledge-mining

63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters


meta info for this blog

Source: html

Introduction: John Lawson writes: I have been experimenting with Bayesian methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method of moments or REML estimates. In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML. For data sets I have used that arise from a simple one-way random effects model, the Bayesian estimate of the between-groups variance component is usually larger than the method of moments or REML estimates. When I use a uniform prior on the between-groups standard deviation (as you recommended in your 2006 paper) rather than an inverse-gamma prior on the between-groups variance component, the between-groups variance component is usually reduced. However, for the dyestuff data in Davies (1949, p. 74), the opposite appears to be the case. I am worried that the Bayesian estimators of the variance components are too large. ...
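To fix notation for the quantities Lawson is comparing, here is a minimal numpy sketch of the classical method-of-moments (ANOVA) estimates of the within- and between-groups variance components in a balanced one-way random effects model. The data are simulated and the group count, sample sizes, and variances are made up for illustration; they are not Lawson's data or the Davies dyestuff data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated balanced one-way layout: J groups, n observations per group
# (illustrative values only).
J, n = 6, 5
tau_true, sigma_true = 2.0, 3.0            # between- and within-group SDs
theta = rng.normal(0.0, tau_true, J)       # group effects
y = theta[:, None] + rng.normal(0.0, sigma_true, (J, n))

# Classical ANOVA / method-of-moments estimates:
#   E[MSW] = sigma^2,  E[MSB] = sigma^2 + n * tau^2
group_means = y.mean(axis=1)
msw = y.var(axis=1, ddof=1).mean()          # mean square within (pooled)
msb = n * group_means.var(ddof=1)           # mean square between
sigma2_mom = msw
tau2_mom = max((msb - msw) / n, 0.0)        # can be negative; truncate at 0

print(f"method-of-moments within-group variance : {sigma2_mom:.2f}")
print(f"method-of-moments between-group variance: {tau2_mom:.2f}")
```

A Bayesian fit of the same model returns a posterior distribution for the between-groups variance rather than a point estimate, and under a noninformative prior its posterior mean will typically exceed the truncated moment estimate above, which is the pattern Lawson describes.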


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 John Lawson writes: I have been experimenting with Bayesian methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method of moments or REML estimates. [sent-1, score-1.476]

2 In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML. [sent-2, score-1.83]

3 For data sets I have used that arise from a simple one-way random effects model, the Bayesian estimate of the between-groups variance component is usually larger than the method of moments or REML estimates. [sent-3, score-1.681]

4 When I use a uniform prior on the between-groups standard deviation (as you recommended in your 2006 paper) rather than an inverse-gamma prior on the between-groups variance component, the between-groups variance component is usually reduced. [sent-4, score-2.3]

5 However, for the dyestuff data in Davies (1949, p. 74), the opposite appears to be the case. [sent-5, score-0.123]

6 I am worried that the Bayesian estimators of the variance components are too large. [sent-6, score-0.85]

7 Do you have any comments or advice for me on this topic such as what noninformative prior I should use? [sent-7, score-0.509]

8 Any response from you would be greatly appreciated. [sent-8, score-0.085]

9 My reply: In my 2006 paper I recommend a half-Cauchy prior distribution. [sent-9, score-0.384]

10 If you set the scale on this prior to a reasonable value, it should reduce the tendency to overestimate the group-level variance. [sent-10, score-0.658]

11 I’d also note that many statisticians actually like to overestimate the group-level variance, as this results in less shrinkage, and many statisticians are uncomfortable with shrinkage. [sent-11, score-0.639]
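As a rough illustration of the reply above, the sketch below evaluates the marginal posterior of the group-level standard deviation tau on a grid, following the marginal-posterior formula in Gelman (2006), once under a flat prior on tau and once under a half-Cauchy prior. This is a minimal example under assumed inputs: simulated group estimates with known standard errors, a flat prior on the common mean, and an arbitrary half-Cauchy scale of 2.5; none of these choices come from the post. A half-Cauchy prior with a reasonable scale typically pulls the posterior mean of tau down relative to the flat prior, which is the reduced tendency to overestimate referred to in sentence 10.

```python
import numpy as np

rng = np.random.default_rng(1)

# One-way model at the group level: y_j ~ N(theta_j, s_j^2) with known s_j,
# theta_j ~ N(mu, tau^2), flat prior on mu.  Data are simulated for illustration.
J = 8
s = rng.uniform(1.0, 3.0, J)                 # known standard errors
tau_true = 1.5
y = rng.normal(0.0, tau_true, J) + rng.normal(0.0, s)

def log_post_tau(tau, y, s):
    """log p(tau | y) up to a constant, with mu and the theta_j integrated out
    (the marginal-posterior formula in Gelman 2006)."""
    v = s ** 2 + tau ** 2
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)
    return (-0.5 * np.log(np.sum(w))                 # + log sqrt(V_mu)
            - 0.5 * np.sum(np.log(v))
            - 0.5 * np.sum((y - mu_hat) ** 2 / v))

tau_grid = np.linspace(1e-3, 20.0, 4000)
loglik = np.array([log_post_tau(t, y, s) for t in tau_grid])

def post_mean_tau(log_prior):
    logp = loglik + log_prior
    p = np.exp(logp - logp.max())
    p /= np.trapz(p, tau_grid)
    return np.trapz(tau_grid * p, tau_grid)

A = 2.5                                              # half-Cauchy scale (illustrative)
flat_prior = np.zeros_like(tau_grid)                 # uniform prior on tau
half_cauchy = -np.log(1.0 + (tau_grid / A) ** 2)     # log half-Cauchy(0, A), up to a constant

print(f"true tau                                : {tau_true:.2f}")
print(f"posterior mean of tau, flat prior       : {post_mean_tau(flat_prior):.2f}")
print(f"posterior mean of tau, half-Cauchy({A}) : {post_mean_tau(half_cauchy):.2f}")
```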


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('variance', 0.42), ('reml', 0.292), ('moments', 0.278), ('prior', 0.271), ('component', 0.259), ('components', 0.256), ('shrinkage', 0.188), ('noninformative', 0.181), ('overestimate', 0.181), ('sum', 0.176), ('method', 0.145), ('bayesian', 0.138), ('estimates', 0.137), ('lawson', 0.133), ('experimenting', 0.125), ('davies', 0.116), ('statisticians', 0.108), ('larger', 0.101), ('usually', 0.101), ('estimators', 0.099), ('gamma', 0.099), ('inverse', 0.095), ('uncomfortable', 0.089), ('greatly', 0.085), ('obtained', 0.083), ('tendency', 0.08), ('deviation', 0.079), ('uniform', 0.079), ('use', 0.077), ('worried', 0.075), ('recommended', 0.074), ('reduce', 0.069), ('arise', 0.068), ('opposite', 0.066), ('sets', 0.065), ('noticed', 0.062), ('estimated', 0.058), ('recommend', 0.058), ('groups', 0.057), ('appears', 0.057), ('tried', 0.057), ('scale', 0.057), ('advice', 0.057), ('paper', 0.055), ('note', 0.055), ('close', 0.051), ('random', 0.05), ('many', 0.049), ('value', 0.049), ('john', 0.047)]
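The wordName/wordTfidf list above, and the simValue scores in the lists that follow, come from a tfidf bag-of-words representation with cosine similarity. Here is a minimal scikit-learn sketch of that recipe, assuming a tiny made-up corpus; the real pipeline's tokenization, stop-word handling, and weighting options are not documented here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in corpus: one string per blog post (illustrative only).
docs = [
    "Bayesian estimates of variance components with noninformative priors and REML",
    "Avoiding boundary estimates using a prior distribution as regularization",
    "Adding more information can make the variance go up",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)                      # sparse (n_docs, n_terms) tfidf matrix

# Top-weighted terms for the first post (analogue of the wordName/wordTfidf list).
weights = X[0].toarray().ravel()
terms = vec.get_feature_names_out()
print(sorted(zip(terms, weights.round(3)), key=lambda t: -t[1])[:5])

# Cosine similarity of the first post to every post (analogue of simValue).
print(cosine_similarity(X[0], X).ravel())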

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters


2 0.265661 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis

3 0.2206772 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

Introduction: Andy McKenzie writes: In their March 9 “counterpoint” in Nature Biotech to the prospect that we should try to integrate more sources of data in clinical practice (see “point” arguing for this), Isaac Kohane and David Margulies claim that, “Finally, how much better is our new knowledge than older knowledge? When is the incremental benefit of a genomic variant(s) or gene expression profile relative to a family history or classic histopathology insufficient and when does it add rather than subtract variance?” Perhaps I am mistaken (thus this email), but it seems that this claim runs counter to the definition of conditional probability. That is, if you have a hierarchical model, and the family history / classical histopathology already suggests a parameter estimate with some variance, how could the new genomic info possibly increase the variance of that parameter estimate? Surely the question is how much variance the new genomic info reduces and whether it therefore justifies t

4 0.21838839 1465 andrew gelman stats-2012-08-21-D. Buggin

Introduction: Joe Zhao writes: I am trying to fit my data using the scaled inverse Wishart model you mentioned in your book, Data analysis using regression and hierarchical models. Instead of using a uniform prior on the scale parameters, I try to use a log-normal distribution prior. However, I found that the individual coefficients don’t shrink much toward a certain value even when a highly informative prior (with extremely low variance) is considered. The coefficients are just very close to their least-squares estimates. Is it because of the log-normal prior I’m using, or am I wrong somewhere? My reply: If your priors are concentrated enough at zero variance, then yeah, the posterior estimates of the parameters should be pulled (almost) all the way to zero. If this isn’t happening, you got a problem. So as a start I’d try putting in some really strong priors concentrated at 0 (for example, N(0,.1^2)) and checking that you get a sensible answer. If not, you might well have a bug. You can also try

5 0.21744537 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

Introduction: Following up on Christian’s post [link fixed] on the topic, I’d like to offer a few thoughts of my own. In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. So, in that sense, noninformative priors are no big deal, they’re just a way to get started. Just don’t take them too seriously. Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things hap

6 0.21574637 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

7 0.20265974 1941 andrew gelman stats-2013-07-16-Priors

8 0.19755015 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

9 0.18997675 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

10 0.18673417 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

11 0.18358393 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

12 0.18084827 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

13 0.17935622 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

14 0.17922637 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

15 0.17644165 846 andrew gelman stats-2011-08-09-Default priors update?

16 0.17543837 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

17 0.17472078 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

18 0.17364533 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

19 0.17198031 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

20 0.16029687 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.162), (1, 0.232), (2, 0.047), (3, 0.003), (4, -0.032), (5, -0.049), (6, 0.119), (7, 0.016), (8, -0.13), (9, 0.077), (10, 0.064), (11, -0.018), (12, 0.107), (13, 0.048), (14, 0.055), (15, 0.01), (16, -0.06), (17, 0.01), (18, 0.016), (19, 0.028), (20, -0.037), (21, 0.007), (22, 0.038), (23, 0.048), (24, -0.006), (25, -0.038), (26, -0.021), (27, 0.052), (28, 0.019), (29, -0.014), (30, 0.032), (31, 0.016), (32, 0.026), (33, -0.043), (34, 0.033), (35, 0.004), (36, 0.009), (37, 0.011), (38, 0.001), (39, 0.009), (40, 0.016), (41, 0.031), (42, -0.036), (43, 0.055), (44, -0.027), (45, -0.011), (46, 0.017), (47, -0.016), (48, 0.008), (49, 0.019)]
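The topicId/topicWeight pairs above place this post in a latent semantic indexing (LSI) space, i.e. a truncated SVD of the tfidf matrix, and the simValue scores below are cosine similarities in that space. A minimal scikit-learn sketch under toy assumptions: a four-document made-up corpus and 2 components in place of the 50 topics used above.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the full blog archive (illustrative only).
docs = [
    "variance components noninformative prior REML method of moments",
    "half-Cauchy prior for group-level scale parameters",
    "boundary estimates regularized by a prior distribution",
    "survey sampling weighting and poststratification",
]

X = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)   # the page above uses 50 topics
Z = lsi.fit_transform(X)                             # (n_docs, n_topics) topic weights

print(Z[0])                           # analogue of the [(topicId, topicWeight), ...] list
print(cosine_similarity(Z[:1], Z))    # analogue of simValue under the lsi model
```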

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97693366 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters


2 0.90690529 846 andrew gelman stats-2011-08-09-Default priors update?

Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a

3 0.88033748 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization


4 0.86700499 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

Introduction: Nick Polson and James Scott write: We generalize the half-Cauchy prior for a global scale parameter to the wider class of hypergeometric inverted-beta priors. We derive expressions for posterior moments and marginal densities when these priors are used for a top-level normal variance in a Bayesian hierarchical model. Finally, we prove a result that characterizes the frequentist risk of the Bayes estimators under all priors in the class. These arguments provide an alternative, classical justification for the use of the half-Cauchy prior in Bayesian hierarchical models, complementing the arguments in Gelman (2006). This makes me happy, of course. It’s great to be validated. The only thing I didn’t catch is how they set the scale parameter for the half-Cauchy prior. In my 2006 paper I frame it as a weakly informative prior and recommend that the scale be set based on actual prior knowledge. But Polson and Scott are talking about a default choice. I used to think that such a

5 0.83795136 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

Introduction: Ilya Lipkovich writes: I read with great interest your 2008 paper [with Aleks Jakulin, Grazia Pittau, and Yu-Sung Su] on weakly informative priors for logistic regression and also followed an interesting discussion on your blog. This discussion was within the Bayesian community in relation to the validity of priors. However, I would like to approach it from a broader perspective on predictive modeling, bringing in ideas from the machine/statistical learning approach. Actually you were the first to bring it up by mentioning in your paper “borrowing ideas from computer science” on cross-validation when comparing the predictive ability of your proposed priors with other choices. However, using cross-validation for comparing method performance is not the only or primary use of CV in machine learning. Most machine-learning methods have some “meta” or complexity parameters and use cross-validation to tune them. For example, one of your comparison methods is BBR, which actually

6 0.83763307 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

7 0.8114934 1465 andrew gelman stats-2012-08-21-D. Buggin

8 0.80122966 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

9 0.785694 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

10 0.78198034 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

11 0.7721774 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

12 0.76569206 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

13 0.75328094 1941 andrew gelman stats-2013-07-16-Priors

14 0.74790388 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

15 0.73530781 247 andrew gelman stats-2010-09-01-How does Bayes do it?

16 0.73253971 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

17 0.72891426 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

18 0.72702491 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

19 0.72336847 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

20 0.7217769 468 andrew gelman stats-2010-12-15-Weakly informative priors and imprecise probabilities


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.03), (24, 0.283), (38, 0.073), (55, 0.025), (59, 0.016), (60, 0.014), (63, 0.019), (79, 0.035), (86, 0.026), (95, 0.045), (99, 0.318)]
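The weights above are a document-topic distribution from a latent Dirichlet allocation (LDA) model fit to term counts (only a subset of topics is listed). A minimal scikit-learn sketch under the same toy-corpus assumption; the real model's topic count and hyperparameters are unknown.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the full blog archive (illustrative only).
docs = [
    "variance components noninformative prior REML method of moments",
    "half-Cauchy prior for group-level scale parameters",
    "boundary estimates regularized by a prior distribution",
    "survey sampling weighting and poststratification",
]

X = CountVectorizer().fit_transform(docs)            # LDA expects raw term counts
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(X)                         # (n_docs, n_topics) topic proportions

print(theta[0])    # analogue of the [(topicId, topicWeight), ...] list; each row sums to 1
```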

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98313218 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters


2 0.97249293 197 andrew gelman stats-2010-08-10-The last great essayist?

Introduction: I recently read a bizarre article by Janet Malcolm on a murder trial in NYC. What threw me about the article was that the story was utterly commonplace (by the standards of today’s headlines): divorced mom kills ex-husband in a custody dispute over their four-year-old daughter. The only interesting features were (a) the wife was a doctor and the husband was a dentist, the sort of people you’d expect to sue rather than slay, and (b) the wife hired a hitman from within the insular immigrant community that she (and her husband) belonged to. But, really, neither of these was much of a twist. To add to the non-storyness of it all, there were no other suspects, the evidence against the wife and the hitman was overwhelming, and even the high-paid defense lawyers didn’t seem to be making much of an effort to convince anyone of their client’s innocence. (One of the closing arguments was that one aspect of the wife’s story was so ridiculous that it had to be true. In the lawyer’s wo

3 0.97063351 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

Introduction: Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and myself seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004–but I am concerned that our dialogue with the graphic

4 0.96975636 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

Introduction: Pointing to this news article by Megan McArdle discussing a recent study of Medicaid recipients, Jonathan Falk writes: Forget the interpretation for a moment, and the political spin, but haven’t we reached an interesting point when a journalist says things like: When you do an RCT with more than 12,000 people in it, and your defense of your hypothesis is that maybe the study just didn’t have enough power, what you’re actually saying is “the beneficial effects are probably pretty small”. and A good Bayesian—and aren’t most of us supposed to be good Bayesians these days?—should be updating in light of this new information. Given this result, what is the likelihood that Obamacare will have a positive impact on the average health of Americans? Every one of us, for or against, should be revising that probability downwards. I’m not saying that you have to revise it to zero; I certainly haven’t. But however high it was yesterday, it should be somewhat lower today. This

5 0.96845043 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

Introduction: Sharad had a survey sampling question: We’re trying to use mechanical turk to conduct some surveys, and have quickly discovered that turkers tend to be quite young. We’d really like a representative sample of the U.S., or at the least be able to recruit a diverse enough sample from turk that we can post-stratify to adjust the estimates. The approach we ended up taking is to pay turkers a small amount to answer a couple of screening questions (age & sex), and then probabilistically recruit individuals to complete the full survey (for more money) based on the estimated turk population parameters and our desired target distribution. We use rejection sampling, so the end result is that individuals who are invited to take the full survey look as if they came from a representative sample, at least in terms of age and sex. I’m wondering whether this sort of technique—a two step design in which participants are first screened and then probabilistically selected to mimic a target distributio

6 0.96834719 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

7 0.96813583 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

8 0.96806526 1240 andrew gelman stats-2012-04-02-Blogads update

9 0.96805704 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

10 0.96792287 414 andrew gelman stats-2010-11-14-“Like a group of teenagers on a bus, they behave in public as if they were in private”

11 0.96782804 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

12 0.96752441 2247 andrew gelman stats-2014-03-14-The maximal information coefficient

13 0.96716249 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

14 0.9671095 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

15 0.96682179 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

16 0.96557778 1224 andrew gelman stats-2012-03-21-Teaching velocity and acceleration

17 0.96461368 1465 andrew gelman stats-2012-08-21-D. Buggin

18 0.9636693 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

19 0.96352458 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves

20 0.96346891 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes