andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1786 knowledge-graph by maker-knowledge-mining

1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions


meta info for this blog

Source: html

Introduction: Alexander Volfovsky and Peter Hoff write : ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays that can adapt to the presence of such interactions.
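The array-variate normal construction the abstract describes can be illustrated in the two-factor (matrix) case. The sketch below is a minimal numpy illustration, not the authors' implementation: the two factor-level covariance matrices `Sigma_row` and `Sigma_col` are invented for the example (in the paper they are estimated), and it only checks the defining Kronecker covariance property.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented covariances over the levels of two factors (illustrative, not estimated)
Sigma_row = np.array([[1.0, 0.8], [0.8, 1.0]])    # factor 1: two levels with similar effects
Sigma_col = np.array([[1.0, -0.5], [-0.5, 1.0]])  # factor 2

A1 = np.linalg.cholesky(Sigma_row)
A2 = np.linalg.cholesky(Sigma_col)

def draw_interaction_matrix():
    """One draw of a matrix-variate normal interaction array:
    vec(B) ~ N(0, kron(Sigma_col, Sigma_row))."""
    Z = rng.standard_normal((2, 2))
    return A1 @ Z @ A2.T

# Check the Kronecker covariance structure empirically
draws = np.stack([draw_interaction_matrix().flatten(order="F") for _ in range(50000)])
emp_cov = np.cov(draws.T)
target = np.kron(Sigma_col, Sigma_row)
```

Because the same per-factor covariance appears in every effect involving that factor, similarity among a factor's levels learned from well-estimated main effects propagates to the higher-order interaction arrays.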


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Alexander Volfovsky and Peter Hoff write : ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. [sent-1, score-0.321]

2 In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. [sent-2, score-0.828]

3 For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. [sent-3, score-1.136]

4 In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. [sent-4, score-0.666]

5 To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays that can adapt to the presence of such interactions. [sent-5, score-1.012]

6 These prior distributions are based on a type of array-variate normal distribution, for which a covariance matrix for each factor is estimated. [sent-6, score-0.396]

7 This prior is able to adapt to potential similarities among the levels of a factor, and incorporate any such information into the estimation of the effects in which the factor appears. [sent-7, score-1.061]

8 In the presence of such similarities, this prior is able to borrow information from well-estimated main effects and lower-order interactions to assist in the estimation of higher-order terms for which data information is limited. [sent-8, score-1.138]

9 I’ll have to look at the model in detail, but at first glance this looks like exactly what I want for partial pooling of deep interactions , going beyond the exchangeable Anova models I’ve written about before. [sent-9, score-0.358]

10 Volfovsky and Hoff report that they fit their model by iterating the Gibbs sampler 200,000 times. [sent-11, score-0.078]

11 And, of course, once we start fitting these models in examples, we’ll probably have thoughts on how to modify them. [sent-13, score-0.149]

12 Econometricians see it as an uninteresting special case of linear regression. [sent-15, score-0.073]

13 Instructors see it as one of the hardest topics in classical statistics to teach, especially in its more elaborate forms such as split-plot analysis. [sent-18, score-0.149]

14 We believe, however, that the ideas of ANOVA are useful in many applications of statistics. [sent-19, score-0.063]

15 For the purpose of this paper, we identify ANOVA with the structuring of parameters into batches—that is, with variance components models. [sent-20, score-0.335]

16 There are more general mathematical formulations of the analysis of variance, but this is the aspect that we believe is most relevant in applied statistics, especially for regression modeling. [sent-21, score-0.156]

17 We shall demonstrate how many of the difficulties in understanding and computing ANOVAs can be resolved using a hierarchical Bayesian framework. [sent-22, score-0.279]

18 Conversely, we illustrate how thinking in terms of variance components can be useful in understanding and displaying hierarchical regressions. [sent-23, score-0.515]

19 With hierarchical (multilevel) models becoming used more and more widely, we view ANOVA as more important than ever in statistical applications. [sent-24, score-0.234]
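Sentence 15's identification of ANOVA with "the structuring of parameters into batches" is concretely a partial-pooling computation: each batch of effects gets its own variance component, and estimates within a batch are shrunk toward their common mean. A minimal sketch with invented numbers (the group means, standard error, and batch-level sd below are all illustrative):

```python
import numpy as np

# Toy batch of effect estimates (numbers invented for illustration)
y = np.array([2.0, -1.0, 0.5, 3.0])   # observed group means in one batch
sigma = 1.0                           # within-group standard error, assumed known
tau = 0.8                             # batch-level sd: the variance component
mu = y.mean()                         # grand mean of the batch

# Partial pooling: precision-weighted compromise between each group mean and
# the grand mean; tau -> 0 gives complete pooling, tau -> infinity gives none.
shrink = (1 / sigma**2) / (1 / sigma**2 + 1 / tau**2)
theta_hat = shrink * y + (1 - shrink) * mu
```

Each shrunken estimate lies between its raw group mean and the grand mean, with the amount of shrinkage governed by the estimated variance component for that batch.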
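Sentence 10 notes that Volfovsky and Hoff fit their model by iterating a Gibbs sampler 200,000 times. The mechanics of Gibbs sampling can be shown on a much smaller conjugate model; this sketch fits only a toy normal mean-and-variance model with synthetic data, not the array-prior model from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=50)   # synthetic data; not the paper's application
n, ybar = len(y), y.mean()

mu, sig2 = 0.0, 1.0                 # arbitrary starting values
draws = []
for t in range(5000):               # far fewer than the paper's 200,000; enough here
    # Update mu | sig2, y  (flat prior on mu)
    mu = rng.normal(ybar, np.sqrt(sig2 / n))
    # Update sig2 | mu, y  (Jeffreys prior => scaled inverse-chi-squared)
    sse = np.sum((y - mu) ** 2)
    sig2 = sse / rng.chisquare(n)
    draws.append((mu, sig2))

mus = np.array([d[0] for d in draws[1000:]])   # discard burn-in
```

Each sweep updates every parameter from its full conditional distribution; after burn-in, the draws approximate the joint posterior. Richer models like the hierarchical array prior need many more sweeps because the parameters are more strongly dependent.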


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('anova', 0.369), ('volfovsky', 0.283), ('hoff', 0.243), ('factor', 0.203), ('variance', 0.155), ('hierarchical', 0.148), ('arrays', 0.145), ('interaction', 0.144), ('decomposition', 0.142), ('similarities', 0.138), ('effects', 0.134), ('interactions', 0.133), ('adapt', 0.133), ('categorical', 0.133), ('levels', 0.124), ('estimation', 0.121), ('prior', 0.117), ('presence', 0.111), ('components', 0.111), ('terms', 0.101), ('main', 0.101), ('information', 0.091), ('coefficients', 0.087), ('models', 0.086), ('borrowing', 0.086), ('anovas', 0.081), ('inflexible', 0.081), ('classical', 0.08), ('mathematical', 0.078), ('formulations', 0.078), ('exhibits', 0.078), ('iterating', 0.078), ('distributions', 0.076), ('exchangeable', 0.075), ('orders', 0.075), ('collections', 0.073), ('alexander', 0.073), ('uninteresting', 0.073), ('vectors', 0.071), ('assist', 0.069), ('structuring', 0.069), ('borrow', 0.069), ('hardest', 0.069), ('shall', 0.068), ('introduces', 0.065), ('heterogeneity', 0.064), ('glance', 0.064), ('econometricians', 0.064), ('modify', 0.063), ('many', 0.063)]
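The word weights above are tfidf scores, and the simValue numbers in the list below are cosine similarities between such weighted word vectors. A minimal sketch of that pipeline, using plain tf-idf with smoothed idf over three toy documents (the document names and texts are invented; the blog pipeline's exact weighting may differ):

```python
import math
from collections import Counter

docs = {
    "anova_priors": "anova hierarchical prior interaction variance",
    "anova_dead":   "anova multilevel variance components interaction",
    "maze_post":    "maze generation algorithms graphics",
}

def tfidf_vectors(corpus):
    """Term frequency times inverse document frequency, one sparse vector per doc."""
    n = len(corpus)
    tf = {name: Counter(text.split()) for name, text in corpus.items()}
    df = Counter(w for counts in tf.values() for w in counts)   # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}            # smoothed idf
    return {name: {w: c * idf[w] for w, c in counts.items()} for name, counts in tf.items()}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vecs = tfidf_vectors(docs)
sims = {name: cosine(vecs["anova_priors"], vecs[name]) for name in docs}
```

A document has similarity 1 with itself (hence the near-1.0 same-blog scores above), posts sharing vocabulary score higher, and posts with no words in common score 0.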

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions


2 0.2638272 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

Introduction: A research psychologist writes in with a question that’s so long that I’ll put my answer first, then put the question itself below the fold. Here’s my reply: As I wrote in my Anova paper and in my book with Jennifer Hill, I do think that multilevel models can completely replace Anova. At the same time, I think the central idea of Anova should persist in our understanding of these models. To me the central idea of Anova is not F-tests or p-values or sums of squares, but rather the idea of predicting an outcome based on factors with discrete levels, and understanding these factors using variance components. The continuous or categorical response thing doesn’t really matter so much to me. I have no problem using a normal linear model for continuous outcomes (perhaps suitably transformed) and a logistic model for binary outcomes. I don’t want to throw away interactions just because they’re not statistically significant. I’d rather partially pool them toward zero using an inform

3 0.26326576 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

Introduction: Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). Although I could simply leave all these as predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. I am curious – is there still no clear cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction as well as the interaction itself? My reply: I’d recommend including these in your model (it’s probably easiest to do so

4 0.16484305 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

Introduction: Someone writes: I am hoping you can give me some advice about when to use fixed and random effects model. I am currently working on a paper that examines the effect of . . . by comparing states . . . It got reviewed . . . by three economists and all suggest that we run a fixed effects model. We ran a hierarchial model in the paper that allow the intercept and slope to vary before and after . . . My question is which is correct? We have ran it both ways and really it makes no difference which model you run, the results are very similar. But for my own learning, I would really like to understand which to use under what circumstances. Is the fact that we use the whole population reason enough to just run a fixed effect model? Perhaps you can suggest a good reference to this question of when to run a fixed vs. random effects model. I’m not always sure what is meant by a “fixed effects model”; see my paper on Anova for discussion of the problems with this terminology: http://w

5 0.16191743 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

Introduction: Avi sent along this old paper from Bryk and Raudenbush, who write: The presence of heterogeneity of variance across groups indicates that the standard statistical model for treatment effects no longer applies. Specifically, the assumption that treatments add a constant to each subject’s development fails. An alternative model is required to represent how treatment effects are distributed across individuals. We develop in this article a simple statistical model to demonstrate the link between heterogeneity of variance and random treatment effects. Next, we illustrate with results from two previously published studies how a failure to recognize the substantive importance of heterogeneity of variance obscured significant results present in these data. The article concludes with a review and synthesis of techniques for modeling variances. Although these methods have been well established in the statistical literature, they are not widely known by social and behavioral scientists. T

6 0.16029687 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

7 0.15806398 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

8 0.15276861 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

9 0.14616747 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

10 0.1448386 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

11 0.13781062 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

12 0.13469632 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

13 0.13319555 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

14 0.13131088 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)

15 0.12963085 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

16 0.12818645 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

17 0.12572163 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings

18 0.12564585 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

19 0.12343918 1941 andrew gelman stats-2013-07-16-Priors

20 0.12341961 851 andrew gelman stats-2011-08-12-year + (1|year)


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.2), (1, 0.178), (2, 0.016), (3, -0.009), (4, 0.044), (5, 0.01), (6, 0.032), (7, -0.046), (8, -0.049), (9, 0.096), (10, -0.001), (11, 0.039), (12, 0.04), (13, -0.025), (14, 0.069), (15, 0.019), (16, -0.074), (17, 0.017), (18, -0.01), (19, 0.016), (20, -0.013), (21, -0.01), (22, 0.0), (23, -0.008), (24, -0.005), (25, -0.032), (26, -0.061), (27, 0.064), (28, -0.014), (29, -0.022), (30, -0.002), (31, 0.016), (32, -0.022), (33, -0.047), (34, 0.046), (35, -0.04), (36, -0.038), (37, 0.014), (38, -0.007), (39, 0.011), (40, -0.025), (41, -0.03), (42, 0.025), (43, 0.032), (44, -0.011), (45, -0.013), (46, -0.005), (47, -0.066), (48, 0.011), (49, -0.065)]
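The topic weights above are a document's coordinates in a latent semantic space: LSI takes a truncated SVD of the document-term matrix and compares documents by cosine similarity in the reduced space. A sketch with an invented 4-document, 6-word count matrix (not the blog corpus):

```python
import numpy as np

# Tiny document-term count matrix; rows are posts, columns are words (invented data)
X = np.array([
    [3, 2, 0, 0, 1, 0],   # anova-flavored post
    [2, 3, 1, 0, 0, 0],   # another anova-flavored post
    [0, 0, 0, 3, 2, 1],   # unrelated post
    [0, 1, 0, 2, 3, 2],   # unrelated post
], dtype=float)

# LSI = truncated SVD: keep k latent topics, project docs into topic space
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_topics = U[:, :k] * s[:k]     # each row: that post's k topic weights

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sim_01 = cos(doc_topics[0], doc_topics[1])   # two similar posts
sim_02 = cos(doc_topics[0], doc_topics[2])   # dissimilar posts
```

Posts that share vocabulary land near each other in the latent space even when they don't share every word, which is why the lsi list above surfaces related multilevel-modeling posts.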

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97735256 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions


2 0.86982691 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small


3 0.79894727 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems, that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here ‘s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything, the standard deviations of the invidual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan

4 0.79571718 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

Introduction: Dean Eckles writes: I remember reading on your blog that you were working on some tools to fit multilevel models that also include “fixed” effects — such as continuous predictors — that are also estimated with shrinkage (for example, an L1 or L2 penalty). Any new developments on this front? I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of “fixed” effects, mainly continuous variables. This makes me wary of overfitting to these predictors, so then I’d want to use some kind of shrinkage. As far as I can tell, the main options for doing this now is by going fully Bayesian and using a Gibbs sampler. With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. However, this is pretty slow, especially with a large data set and because I’d like to select the penalty parameter by cross-validation (which is where this isn’t very Bayesian I guess?). My reply: We allow info

5 0.79175669 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

Introduction: Jim Thomson writes: I wonder if you could provide some clarification on the correct way to calculate the finite-population standard deviations for interaction terms in your Bayesian approach to ANOVA (as explained in your 2005 paper, and Gelman and Hill 2007). I understand that it is the SD of the constrained batch coefficients that is of interest, but in most WinBUGS examples I have seen, the SDs are all calculated directly as sd.fin<-sd(beta.main[]) for main effects and sd(beta.int[,]) for interaction effects, where beta.main and beta.int are the unconstrained coefficients, e.g. beta.int[i,j]~dnorm(0,tau). For main effects, I can see that it makes no difference, since the constrained value is calculated by subtracting the mean, and sd(B[]) = sd(B[]-mean(B[])). But the conventional sum-to-zero constraint for interaction terms in linear models is more complicated than subtracting the mean (there are only (n1-1)*(n2-1) free coefficients for an interaction b/w factors with n1 a

6 0.78542036 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

7 0.78099954 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

8 0.77924204 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

9 0.77641785 846 andrew gelman stats-2011-08-09-Default priors update?

10 0.7719571 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

11 0.75341433 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

12 0.7499103 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model

13 0.741759 501 andrew gelman stats-2011-01-04-A new R package for fititng multilevel models

14 0.74004233 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

15 0.73389322 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

16 0.7285794 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

17 0.7280426 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

18 0.72716516 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

19 0.72711092 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

20 0.72156799 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(6, 0.02), (9, 0.012), (16, 0.046), (21, 0.028), (24, 0.158), (30, 0.01), (45, 0.011), (55, 0.015), (63, 0.033), (79, 0.159), (86, 0.043), (89, 0.032), (99, 0.291)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98167419 1515 andrew gelman stats-2012-09-29-Jost Haidt

Introduction: Research psychologist John Jost reviews the recent book, “The Righteous Mind,” by research psychologist Jonathan Haidt. Some of my thoughts on Haidt’s book are here . And here’s some of Jost’s review: Haidt’s book is creative, interesting, and provocative. . . . The book shines a new light on moral psychology and presents a bold, confrontational message. From a scientific perspective, however, I worry that his theory raises more questions than it answers. Why do some individuals feel that it is morally good (or necessary) to obey authority, favor the ingroup, and maintain purity, whereas others are skeptical? (Perhaps parenting style is relevant after all.) Why do some people think that it is morally acceptable to judge or even mistreat others such as gay or lesbian couples or, only a generation ago, interracial couples because they dislike or feel disgusted by them, whereas others do not? Why does the present generation “care about violence toward many more classes of victims

2 0.9714672 1538 andrew gelman stats-2012-10-17-Rust

Introduction: I happened to be referring to the path sampling paper today and took a look at Appendix A.2: I’m sure I could reconstruct all of this if I had to, but I certainly can’t read this sort of thing cold anymore.

3 0.96287358 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes

Introduction: Interesting article by Jonah Berger and Gael Le Mens: Products, styles, and social movements often catch on and become popular, but little is known about why such identity-relevant cultural tastes and practices die out. We demonstrate that the velocity of adoption may affect abandonment: Analysis of over 100 years of data on first-name adoption in both France and the United States illustrates that cultural tastes that have been adopted quickly die faster (i.e., are less likely to persist). Mirroring this aggregate pattern, at the individual level, expecting parents are more hesitant to adopt names that recently experienced sharper increases in adoption. Further analysis indicate that these effects are driven by concerns about symbolic value: Fads are perceived negatively, so people avoid identity-relevant items with sharply increasing popularity because they believe that they will be short lived. Ancillary analyses also indicate that, in contrast to conventional wisdom, identity-r

4 0.96058816 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?

Introduction: Frank Hansen writes: Columbus Park is on Chicago’s west side, in the Austin neighborhood. The park is a big green area which includes a golf course. Here is the google satellite view. Here is the nytimes page. Go to Chicago, and zoom over to the census tract 2521, which is just north of the horizontal gray line (Eisenhower Expressway, aka I290) and just east of Oak Park. The park is labeled on the nytimes map. The census data have around 50 dots (they say 50 people per dot) in the park which has no residential buildings. Congressional district is Danny Davis, IL7. Here’s a map of the district. So, how do we explain the map showing ~50 dots worth of people living in the park. What’s up with the algorithm to place the dots? I dunno. I leave this one to you, the readers.

same-blog 5 0.96015155 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions


6 0.95785952 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence

7 0.94778413 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)

8 0.94312692 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions

9 0.93874633 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

10 0.93269652 1126 andrew gelman stats-2012-01-18-Bob on Stan

11 0.92774522 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

12 0.92728364 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

13 0.92684847 863 andrew gelman stats-2011-08-21-Bad graph

14 0.92460161 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

15 0.91968566 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

16 0.91856909 1048 andrew gelman stats-2011-12-09-Maze generation algorithms!

17 0.91838431 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

18 0.91573477 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

19 0.90639734 2142 andrew gelman stats-2013-12-21-Chasing the noise

20 0.90336907 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics