
2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small


meta info for this blog

Source: html

Introduction: Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). Although I could simply leave all these predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. I am curious – is there still no clear-cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction, as well as the interaction itself? My reply: I’d recommend including these in your model (it’s probably easiest to do so using Stan), but you’ll need informative priors on these hierarchical variance parameters to get everything to work.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. [sent-1, score-1.249]

2 When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). [sent-2, score-0.812]

3-4 Although I could simply leave all these predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. [sent-3, score-0.842; sent-4, score-0.677]

5 I am curious – is there still no clear cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction as well as the interaction itself? [sent-5, score-2.043]

6 My reply: I’d recommend including these in your model (it’s probably easiest to do so using Stan), but you’ll need informative priors on these hierarchical variance parameters to get everything to work. [sent-6, score-0.771]

7 I regret that I have not done much of this in my published work so it’s hard to point you to an example. [sent-8, score-0.178]

8 Che-Castaldo continues: My predicament is not an uncommon one in my field. [sent-9, score-0.251]

9 Ecologists do a lot of experiments where the unmodeled “fixed” factors are few in number, each factor has only a few groups (they are often binary), and the interactions are important (for example the ubiquitous 2×2 ANOVA). [sent-10, score-0.625]

10 Estimating effects is not difficult but using Bayesian ANOVA to compute variance components for these types of models seems a lot harder. [sent-11, score-0.795]

11 As we’ve been discussing a bit on the blog recently, I think it should be possible to use informative priors with scales set based on actual prior information. [sent-12, score-0.426]

12 I do think we will understand this sort of thing better once we have some good examples (and, indeed, my own lack of experience in this area is a big reason why we don’t have such examples in our books). [sent-13, score-0.251]
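The finite-population standard deviation the exchange keeps returning to can be read directly off posterior simulations: within each draw, take the standard deviation of the batch of coefficients actually in the model, then summarize across draws. A minimal sketch with NumPy, using simulated draws as a stand-in for real Stan output (the array shape, scale, and names are illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for posterior draws of the 4 interaction coefficients
# from a fitted hierarchical model (simulated, not real Stan output).
beta_int = rng.normal(loc=0.0, scale=0.5, size=(4000, 4))

# Finite-population SD: within each posterior draw, the SD of the
# batch of coefficients in the model.  This is well defined even
# when the batch has only 2 or 4 members.
s_int = beta_int.std(axis=1, ddof=1)

# Summarize the posterior of the finite-population SD.
med = np.median(s_int)
lo, hi = np.percentile(s_int, [2.5, 97.5])
print(f"finite-population SD of interaction batch: "
      f"{med:.2f} [{lo:.2f}, {hi:.2f}]")
```

Unlike the superpopulation scale parameter tau, this quantity has a proper posterior distribution even when tau itself is barely identified from so few coefficients.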


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('unmodeled', 0.373), ('binary', 0.232), ('interaction', 0.23), ('compute', 0.23), ('anova', 0.197), ('predictors', 0.195), ('variance', 0.187), ('components', 0.177), ('finite', 0.169), ('deviation', 0.164), ('predicament', 0.138), ('fixed', 0.13), ('priors', 0.129), ('effects', 0.128), ('informative', 0.126), ('ecologists', 0.124), ('batch', 0.12), ('hierarchical', 0.119), ('tau', 0.116), ('ubiquitous', 0.116), ('uncommon', 0.113), ('model', 0.109), ('wise', 0.107), ('converge', 0.105), ('scales', 0.101), ('easiest', 0.101), ('variables', 0.101), ('regret', 0.099), ('examples', 0.094), ('standard', 0.087), ('hill', 0.085), ('clear', 0.082), ('action', 0.082), ('number', 0.081), ('hard', 0.079), ('cut', 0.076), ('sentence', 0.074), ('types', 0.073), ('interactions', 0.071), ('define', 0.07), ('leave', 0.07), ('discussing', 0.07), ('coefficients', 0.07), ('curious', 0.069), ('chris', 0.069), ('continues', 0.067), ('included', 0.066), ('factor', 0.065), ('lack', 0.063), ('estimating', 0.063)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small


2 0.26565865 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

Introduction: A research psychologist writes in with a question that’s so long that I’ll put my answer first, then put the question itself below the fold. Here’s my reply: As I wrote in my Anova paper and in my book with Jennifer Hill, I do think that multilevel models can completely replace Anova. At the same time, I think the central idea of Anova should persist in our understanding of these models. To me the central idea of Anova is not F-tests or p-values or sums of squares, but rather the idea of predicting an outcome based on factors with discrete levels, and understanding these factors using variance components. The continuous or categorical response thing doesn’t really matter so much to me. I have no problem using a normal linear model for continuous outcomes (perhaps suitably transformed) and a logistic model for binary outcomes. I don’t want to throw away interactions just because they’re not statistically significant. I’d rather partially pool them toward zero using an inform

3 0.26326576 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

Introduction: Alexander Volfovsky and Peter Hoff write : ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays t

4 0.20145872 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

Introduction: Zoltan Fazekas writes: I am a 2nd year graduate student in political science at the University of Vienna. In my empirical research I often employ multilevel modeling, and recently I came across a situation that kept me wondering for quite a while. As I did not find much on this in the literature and considering the topics that you work on and blog about, I figured I will try to contact you. The situation is as follows: in a linear multilevel model, there are two important individual level predictors (x1 and x2) and a set of controls. Let us assume that there is a theoretically grounded argument suggesting that an interaction between x1 and x2 should be included in the model (x1 * x2). Both x1 and x2 are let to vary randomly across groups. Would this directly imply that the coefficient of the interaction should also be left to vary across country? This is even more burning if there is no specific hypothesis on the variance of the conditional effect across countries. And then i

5 0.19274844 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

Introduction: David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measure parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and I find after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily because this is what I am writing a paper about. However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. For example, using JAGS, and th

6 0.18390471 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

7 0.18084827 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

8 0.17653261 846 andrew gelman stats-2011-08-09-Default priors update?

9 0.17391431 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

10 0.17260015 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

11 0.16455337 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

12 0.16369544 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

13 0.161919 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

14 0.15806051 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

15 0.15434878 1465 andrew gelman stats-2012-08-21-D. Buggin

16 0.15418245 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

17 0.1499079 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

18 0.14029393 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

19 0.13831635 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

20 0.13826185 1941 andrew gelman stats-2013-07-16-Priors


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.215), (1, 0.189), (2, 0.055), (3, 0.004), (4, 0.079), (5, -0.002), (6, 0.088), (7, -0.073), (8, -0.027), (9, 0.145), (10, 0.043), (11, 0.047), (12, 0.058), (13, -0.024), (14, 0.081), (15, 0.012), (16, -0.066), (17, 0.031), (18, -0.01), (19, 0.043), (20, -0.046), (21, 0.01), (22, -0.005), (23, -0.014), (24, -0.038), (25, -0.056), (26, -0.049), (27, 0.043), (28, -0.05), (29, -0.028), (30, -0.001), (31, 0.005), (32, -0.029), (33, -0.01), (34, 0.039), (35, -0.022), (36, -0.005), (37, 0.013), (38, 0.015), (39, -0.022), (40, -0.012), (41, -0.082), (42, 0.064), (43, 0.066), (44, -0.042), (45, 0.005), (46, 0.002), (47, -0.073), (48, 0.029), (49, 0.02)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96647489 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small


2 0.87821651 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions


3 0.84799325 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

Introduction: Jim Thomson writes: I wonder if you could provide some clarification on the correct way to calculate the finite-population standard deviations for interaction terms in your Bayesian approach to ANOVA (as explained in your 2005 paper, and Gelman and Hill 2007). I understand that it is the SD of the constrained batch coefficients that is of interest, but in most WinBUGS examples I have seen, the SDs are all calculated directly as sd.fin<-sd(beta.main[]) for main effects and sd(beta.int[,]) for interaction effects, where beta.main and beta.int are the unconstrained coefficients, e.g. beta.int[i,j]~dnorm(0,tau). For main effects, I can see that it makes no difference, since the constrained value is calculated by subtracting the mean, and sd(B[]) = sd(B[]-mean(B[])). But the conventional sum-to-zero constraint for interaction terms in linear models is more complicated than subtracting the mean (there are only (n1-1)*(n2-1) free coefficients for an interaction b/w factors with n1 a
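Thomson's point can be checked numerically: centering leaves the SD of a main-effect batch unchanged, but the conventional sum-to-zero constraint for a two-way interaction is double-centering, not mean-subtraction, leaving (n1-1)*(n2-1) free coefficients. A sketch with hypothetical unconstrained coefficients (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Unconstrained main-effect coefficients for one posterior draw.
beta_main = rng.normal(size=5)

# For main effects, centering does not change the SD:
# sd(B) == sd(B - mean(B)).
assert np.isclose(beta_main.std(ddof=1),
                  (beta_main - beta_main.mean()).std(ddof=1))

# Unconstrained interaction coefficients for a 3x4 factor pair.
beta_int = rng.normal(size=(3, 4))

# Sum-to-zero constraint: remove column means, row means, and add
# back the grand mean (double-centering).
constrained = (beta_int
               - beta_int.mean(axis=0, keepdims=True)
               - beta_int.mean(axis=1, keepdims=True)
               + beta_int.mean())

# Every row and column of the constrained array now sums to zero.
print(np.allclose(constrained.sum(axis=0), 0),
      np.allclose(constrained.sum(axis=1), 0))

# Unlike the main-effect case, the SD of the constrained interaction
# coefficients generally differs from the SD of the raw ones.
print(beta_int.std(ddof=1), constrained.std(ddof=1))
```

So computing sd(beta.int[,]) on the unconstrained coefficients, as in the WinBUGS examples mentioned, is not in general the same as the SD of the constrained batch.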

4 0.83457285 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models


5 0.82090127 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary


6 0.82005179 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

7 0.79305613 2296 andrew gelman stats-2014-04-19-Index or indicator variables

8 0.77296042 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

9 0.77115154 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

10 0.77093649 846 andrew gelman stats-2011-08-09-Default priors update?

11 0.76784116 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

12 0.75541806 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

13 0.75131547 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

14 0.74562836 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model

15 0.74436682 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

16 0.74349988 1465 andrew gelman stats-2012-08-21-D. Buggin

17 0.73807192 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

18 0.73610115 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

19 0.7352587 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

20 0.72072583 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(3, 0.013), (13, 0.031), (16, 0.013), (21, 0.026), (24, 0.229), (52, 0.013), (62, 0.051), (63, 0.034), (79, 0.097), (80, 0.011), (84, 0.036), (99, 0.351)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98035192 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small


2 0.9650389 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions


3 0.9596945 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

Introduction: John Lawson writes: I have been experimenting using Bayesian Methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method of moments or REML estimates. In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML. For data sets I have used that arise from a simple one-way random effects model, the Bayesian estimates of the between groups variance component is usually larger than the method of moments or REML estimates. When I use a uniform prior on the between standard deviation (as you recommended in your 2006 paper ) rather than an inverse gamma prior on the between variance component, the between variance component is usually reduced. However, for the dyestuff data in Davies(1949, p74), the opposite appears to be the case. I am a worried that the Bayesian estimators of the varian
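The method-of-moments (classical ANOVA) estimates that Lawson compares against can be sketched for a balanced one-way random-effects design (the data here are simulated and all values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Balanced one-way random-effects data: J groups, n per group.
J, n = 6, 10
sigma_between, sigma_within = 1.0, 2.0
group_effects = rng.normal(0, sigma_between, size=J)
y = group_effects[:, None] + rng.normal(0, sigma_within, size=(J, n))

# Mean squares from the one-way ANOVA decomposition.
grand = y.mean()
ms_between = n * ((y.mean(axis=1) - grand) ** 2).sum() / (J - 1)
ms_within = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (J * (n - 1))

# Method-of-moments estimates of the variance components.
var_within_hat = ms_within
var_between_hat = (ms_between - ms_within) / n

# Note var_between_hat can come out negative in small samples,
# whereas Bayesian estimates are constrained to be nonnegative --
# one mechanism pushing posterior summaries above the MoM estimate.
print(var_between_hat, var_within_hat)
```

This is a sketch of the classical estimator only; the REML and Bayesian fits discussed in the post require an actual model-fitting step.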

4 0.95852679 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence

Introduction: Steve Hsu summarizes the research of economic historian Greg Clark and Neil Cummins : Using rare surnames we track the socio-economic status of descendants of a sample of English rich and poor in 1800, until 2011. We measure social status through wealth, education, occupation, and age at death. Our method allows unbiased estimates of mobility rates. Paradoxically, we find two things. Mobility rates are lower than conventionally estimated. There is considerable persistence of status, even after 200 years. But there is convergence with each generation. The 1800 underclass has already attained mediocrity. And the 1800 upper class will eventually dissolve into the mass of society, though perhaps not for another 300 years, or longer. Read more at Steven’s blog. The idea of rare names to perform this analysis is interesting – and has been recently applied to the study of nepotism in Italy . I haven’t looked into the details of the methodology, but rare events

5 0.95742083 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

Introduction: Ilya Lipkovich writes: I read with great interest your 2008 paper [with Aleks Jakulin, Grazia Pittau, and Yu-Sung Su] on weakly informative priors for logistic regression and also followed an interesting discussion on your blog. This discussion was within Bayesian community in relation to the validity of priors. However i would like to approach it rather from a more broad perspective on predictive modeling bringing in the ideas from machine/statistical learning approach”. Actually you were the first to bring it up by mentioning in your paper “borrowing ideas from computer science” on cross-validation when comparing predictive ability of your proposed priors with other choices. However, using cross-validation for comparing method performance is not the only or primary use of CV in machine-learning. Most of machine learning methods have some “meta” or complexity parameters and use cross-validation to tune them up. For example, one of your comparison methods is BBR which actually
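The tuning use of cross-validation Lipkovich describes can be sketched as a grid search over a ridge penalty, scoring each candidate complexity parameter on held-out folds (the data, grid, and penalty form are a hypothetical stand-in, not the BBR method under discussion):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated regression data (all values hypothetical).
n, p = 100, 5
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ true_beta + rng.normal(size=n)

# Fix the fold assignment once so every lambda is scored on the
# same splits.
k = 5
folds = np.array_split(rng.permutation(n), k)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(lam):
    # K-fold cross-validated mean squared prediction error.
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        b = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((y[test_idx] - X[test_idx] @ b) ** 2))
    return float(np.mean(errs))

# Tune the complexity parameter by grid search on CV error.
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(grid, key=cv_error)
print("lambda chosen by cross-validation:", best)
```

From a Bayesian perspective, the ridge penalty plays the role of a prior scale; cross-validation selects it by predictive performance rather than by placing a hyperprior on it, which is the contrast the post sets up.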

6 0.95587921 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

7 0.95584261 1170 andrew gelman stats-2012-02-16-A previous discussion with Charles Murray about liberals, conservatives, and social class

8 0.95522106 1733 andrew gelman stats-2013-02-22-Krugman sets the bar too high

9 0.95474768 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

10 0.95442873 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

11 0.95391715 2283 andrew gelman stats-2014-04-06-An old discussion of food deserts

12 0.95345449 1941 andrew gelman stats-2013-07-16-Priors

13 0.95322299 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

14 0.95219547 1785 andrew gelman stats-2013-04-02-So much artistic talent

15 0.95209086 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

16 0.95191216 247 andrew gelman stats-2010-09-01-How does Bayes do it?

17 0.95175779 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

18 0.95175195 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

19 0.95173109 1414 andrew gelman stats-2012-07-12-Steven Pinker’s unconvincing debunking of group selection

20 0.95169449 970 andrew gelman stats-2011-10-24-Bell Labs