andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1966 knowledge-graph by maker-knowledge-mining

1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models


meta info for this blog

Source: html

Introduction: David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and I find after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily because this is what I am writing a paper about. However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. For example, using JAGS, and th
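The discrepancy described above can be reproduced in miniature. The R sketch below is illustrative only (simulated data and placeholder names such as unit_a and unit_b, not David Hsu's actual model or variables): it fits the same cell-level predictors once with lm(), ignoring the grouping, and once with lmer() using two crossed varying intercepts, then prints the coefficient standard errors side by side. Because the predictors are constant within cells and the errors within a cell are correlated, the lm() standard errors come out too small, which is the pattern discussed in the reply summarized below.

```r
# Illustrative sketch only: simulated crossed data (stand-ins for the
# airport/pilot structure); every name here is a placeholder.
library(lme4)

set.seed(1)
dat <- expand.grid(unit_a = factor(1:20), unit_b = factor(1:30))

# cell-specific predictors: one value per unit_a level and per unit_b level
x_a <- rnorm(20)
x_b <- rnorm(30)
dat$x_a <- x_a[as.integer(dat$unit_a)]
dat$x_b <- x_b[as.integer(dat$unit_b)]

# cell effects larger than the predictor effects, as in the question
eff_a <- rnorm(20, sd = 1)
eff_b <- rnorm(30, sd = 1)
dat$y <- 1 + 0.2 * dat$x_a - 0.1 * dat$x_b +
  eff_a[as.integer(dat$unit_a)] + eff_b[as.integer(dat$unit_b)] +
  rnorm(nrow(dat), sd = 1)

# no varying intercepts: treats all rows as independent
fit_lm <- lm(y ~ x_a + x_b, data = dat)

# crossed varying intercepts for the two grouping factors
fit_lmer <- lmer(y ~ x_a + x_b + (1 | unit_a) + (1 | unit_b), data = dat)

# coefficient standard errors side by side
cbind(lm   = coef(summary(fit_lm))[, "Std. Error"],
      lmer = coef(summary(fit_lmer))[, "Std. Error"])
```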


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? [sent-1, score-0.853]

2 The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. [sent-2, score-0.665]

3 The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and I find after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). [sent-3, score-1.963]

4 However, I then went to check subsets of predictors using lm() and lmer(). [sent-5, score-0.617]

5 I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. [sent-6, score-0.406]

6 For example, using JAGS, and then visualizing the predictors relative to zero (i. [sent-7, score-0.611]

7 , the null hypothesis) using a plot similar to your ANOVA graphs (figure 22. [sent-9, score-0.177]

8 3), I would find that if I made the error bars either based on 95% confidence intervals or +/- 2 standard deviations, one would conclude that the predictors are not very significant (since 2. [sent-11, score-0.68]

9 But if I use the lm() function to check the model without any varying intercepts, I get all of the predictors significant. [sent-14, score-0.623]

10 It is based on 12,000 or so observations, so I guess I’d expect the standard errors to be low. [sent-15, score-0.513]

11 But by the same token, I’d expect the standard deviation of the chains for each estimate to be equivalently low and asymptotically approaching the standard errors from the normal OLS. [sent-16, score-1.126]

12 Even weak prior information (for example, half-Cauchy priors that bound the parameters away from unrealistically high values) can be useful in constraining group-level variance parameters (especially when the number of groups is small). [sent-21, score-0.483]

13 Third, if you fit lm(), you’ll tend to get standard errors that are too small because you’re not incorporating the correlations in the unexplained errors. [sent-22, score-0.694]
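Sentence 12 refers to weak priors such as the half-Cauchy on group-level scale parameters. As an illustration only (placeholder names and an arbitrary scale of 5, not the model from the post), the R snippet below writes such a prior in a JAGS model string, using the fact that a half-Cauchy is a t distribution with one degree of freedom truncated to positive values; the string could then be compiled with rjags::jags.model().

```r
# Illustrative JAGS model string with a half-Cauchy prior on a group-level sd.
# All names (y, x, group, sigma_a, ...) are placeholders, not from the post.
model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(a[group[i]] + b * x[i], tau_y)
  }
  for (j in 1:J) {
    a[j] ~ dnorm(mu_a, tau_a)
  }
  b    ~ dnorm(0, 0.0001)
  mu_a ~ dnorm(0, 0.0001)

  # half-Cauchy(scale = 5) prior on the group-level standard deviation:
  # a t distribution with 1 degree of freedom, truncated to (0, Inf)
  sigma_a ~ dt(0, pow(5, -2), 1) T(0,)
  tau_a  <- pow(sigma_a, -2)

  sigma_y ~ dunif(0, 100)
  tau_y  <- pow(sigma_y, -2)
}
"
# e.g. rjags::jags.model(textConnection(model_string), data = ..., n.chains = 3)
```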


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('predictors', 0.332), ('lm', 0.289), ('errors', 0.235), ('airport', 0.221), ('intercepts', 0.188), ('happily', 0.185), ('standard', 0.184), ('lmer', 0.178), ('jags', 0.17), ('pilot', 0.17), ('cell', 0.152), ('varying', 0.129), ('simulator', 0.117), ('unrealistically', 0.11), ('incorporating', 0.106), ('span', 0.106), ('constraining', 0.106), ('uncertainty', 0.103), ('crossed', 0.102), ('parameter', 0.101), ('unexplained', 0.099), ('approaching', 0.099), ('using', 0.098), ('multilevel', 0.097), ('subsets', 0.094), ('parameters', 0.094), ('expect', 0.094), ('zero', 0.093), ('check', 0.093), ('equivalently', 0.092), ('flight', 0.091), ('asymptotically', 0.091), ('figure', 0.09), ('visualizing', 0.088), ('hsu', 0.086), ('find', 0.085), ('anova', 0.084), ('deviations', 0.081), ('estimates', 0.081), ('bound', 0.079), ('bars', 0.079), ('variability', 0.079), ('similar', 0.079), ('example', 0.078), ('chains', 0.077), ('limits', 0.075), ('threshold', 0.072), ('deviation', 0.07), ('small', 0.07), ('model', 0.069)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models


2 0.19274844 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

Introduction: Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). Although I could simply leave all these as predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. I am curious – is there still no clear cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction as well as the interaction itself? My reply: I’d recommend including these in your model (it’s probably easiest to do so

3 0.17743894 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

Introduction: Zoltan Fazekas writes: I am a 2nd year graduate student in political science at the University of Vienna. In my empirical research I often employ multilevel modeling, and recently I came across a situation that kept me wondering for quite a while. As I did not find much on this in the literature and considering the topics that you work on and blog about, I figured I will try to contact you. The situation is as follows: in a linear multilevel model, there are two important individual level predictors (x1 and x2) and a set of controls. Let us assume that there is a theoretically grounded argument suggesting that an interaction between x1 and x2 should be included in the model (x1 * x2). Both x1 and x2 are let to vary randomly across groups. Would this directly imply that the coefficient of the interaction should also be left to vary across country? This is even more burning if there is no specific hypothesis on the variance of the conditional effect across countries. And then i

4 0.16129002 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers) Level two: companys (90 companies) Level three: industry (17 industries) I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

5 0.16038516 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameter of an infinite number of model we have never seen before and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury y of transport etc etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some model have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

6 0.15618484 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

7 0.15579839 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

8 0.15525022 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

9 0.14835873 368 andrew gelman stats-2010-10-25-Is instrumental variables analysis particularly susceptible to Type M errors?

10 0.14369939 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

11 0.14040053 948 andrew gelman stats-2011-10-10-Combining data from many sources

12 0.13653359 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

13 0.13227031 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

14 0.13090475 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

15 0.1290292 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

16 0.12878585 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

17 0.12479214 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

18 0.1238957 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

19 0.12304643 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

20 0.12243699 695 andrew gelman stats-2011-05-04-Statistics ethics question


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.202), (1, 0.187), (2, 0.105), (3, -0.041), (4, 0.105), (5, -0.027), (6, 0.057), (7, -0.061), (8, -0.017), (9, 0.039), (10, 0.018), (11, 0.012), (12, 0.03), (13, -0.022), (14, 0.022), (15, -0.022), (16, -0.077), (17, 0.003), (18, 0.015), (19, -0.024), (20, 0.015), (21, 0.007), (22, 0.046), (23, 0.006), (24, -0.006), (25, -0.081), (26, -0.032), (27, -0.004), (28, -0.046), (29, -0.005), (30, 0.014), (31, 0.001), (32, 0.016), (33, -0.029), (34, 0.013), (35, -0.002), (36, -0.003), (37, 0.004), (38, 0.061), (39, -0.057), (40, -0.029), (41, -0.063), (42, 0.004), (43, 0.05), (44, -0.054), (45, -0.029), (46, -0.022), (47, -0.025), (48, 0.01), (49, -0.005)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97566003 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models


2 0.8177501 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

Introduction: Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). Although I could simply leave all these as predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. I am curious – is there still no clear cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction as well as the interaction itself? My reply: I’d recommend including these in your model (it’s probably easiest to do so

3 0.77989465 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

Introduction: Alex Hoffman points me to this interview by Dylan Matthews of education researcher Thomas Kane, who at one point says, Once you corrected for measurement error, a teacher’s score on their chosen videos and on their unchosen videos were correlated at 1. They were perfectly correlated. Hoffman asks, “What do you think? Do you think that just maybe, perhaps, it’s possible we aught to consider, I’m just throwing out the possibility that it might be that the procedure for correcting measurement error might, you now, be a little too strong?” I don’t know exactly what’s happening here, but it might be something that I’ve seen on occasion when fitting multilevel models using a point estimate for the group-level variance. It goes like this: measurement-error models are multilevel models, they involve the estimation of a distribution of a latent variable. When fitting multilevel models, it is possible to estimate the group-level variance to be zero, even though the group-level varia

4 0.77815998 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

Introduction: Xiaoyu Qian writes: I have a question when I apply the half-Cauchy prior (Gelman, 2006) for the variance parameter in a hierarchical model. The model I used is a three level IRT model equivalent to a Rasch model. The variance parameter I try to estimate is at the third level. The group size ranges from 15 to 44. The data is TIMSS 2007 data. I used the syntax provided by the paper and found that the convergence of the standard deviation term is good (sigma.theta), however, the convergence for the parameter “xi” is not very good. Does it mean the whole model has not converged? Do you have any suggestion for this situation. I also used the uniform prior and correlate the result with the half-Cauchy result for the standard deviation term. The results correlated .99. My reply: It’s not a problem if xi does not converge well. It’s |xi|*sigma that is relevant. And, if the number of groups is large, the prior probably won’t matter so much, which would explain your 99% correlat

5 0.77425039 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

Introduction: Cyrus writes: I [Cyrus] was teaching a class on multilevel modeling, and we were playing around with different method to fit a random effects logit model with 2 random intercepts—one corresponding to “family” and another corresponding to “community” (labeled “mom” and “cluster” in the data, respectively). There are also a few regressors at the individual, family, and community level. We were replicating in part some of the results from the following paper : Improved estimation procedures for multilevel models with binary response: a case-study, by G Rodriguez, N Goldman. (I say “replicating in part” because we didn’t include all the regressors that they use, only a subset.) We were looking at the performance of estimation via glmer in R’s lme4 package, glmmPQL in R’s MASS package, and Stata’s xtmelogit. We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood. I was shocked to discover that glmer

6 0.76550514 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

7 0.76307023 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

8 0.75472999 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

9 0.75371361 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models

10 0.74535614 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

11 0.7436403 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

12 0.74122339 2296 andrew gelman stats-2014-04-19-Index or indicator variables

13 0.73601329 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

14 0.72799641 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

15 0.72651088 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

16 0.71628702 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

17 0.71529108 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

18 0.71082711 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

19 0.70748675 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”

20 0.70147157 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.012), (9, 0.017), (14, 0.01), (16, 0.045), (21, 0.026), (24, 0.235), (35, 0.011), (36, 0.036), (58, 0.094), (72, 0.039), (85, 0.018), (86, 0.027), (96, 0.016), (98, 0.025), (99, 0.262)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96955228 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models


2 0.9633404 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.

Introduction: Jimmy pointed me to this blog by Drew Conway on word clouds. I don’t have much to say about Conway’s specifics–word clouds aren’t really my thing, but I’m glad that people are thinking about how to do them better–but I did notice one phrase of his that I’ll dispute. Conway writes The best data visualizations should stand on their own . . . I disagree. I prefer the saying, “A picture plus 1000 words is better than two pictures or 2000 words.” That is, I see a positive interaction between words and pictures or, to put it another way, diminishing returns for words or pictures on their own. I don’t have any big theory for this, but I think, when expressed as a joint value function, my idea makes sense. Also, I live this suggestion in my own work. I typically accompany my graphs with long captions and I try to accompany my words with pictures (although I’m not doing it here, because with the software I use, it’s much easier to type more words than to find, scale, and insert i

3 0.9588151 815 andrew gelman stats-2011-07-22-Statistical inference based on the minimum description length principle

Introduction: Tom Ball writes: Here’s another query to add to the stats backlog…Minimum Description Length (MDL). I’m attaching a 2002 Psych Rev paper on same. Basically, it’s an approach to model selection that replaces goodness of fit with generalizability or complexity. Would be great to get your response to this approach. My reply: I’ve heard about the minimum description length principle for a long time but have never really understood it. So I have nothing to say! Anyone who has anything useful to say on the topic, feel free to add in the comments. The rest of you might wonder why I posted this. I just thought it would be good for you to have some sense of the boundaries of my knowledge.

4 0.94869041 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

Introduction: Pointing to this news article by Megan McArdle discussing a recent study of Medicaid recipients, Jonathan Falk writes: Forget the interpretation for a moment, and the political spin, but haven’t we reached an interesting point when a journalist says things like: When you do an RCT with more than 12,000 people in it, and your defense of your hypothesis is that maybe the study just didn’t have enough power, what you’re actually saying is “the beneficial effects are probably pretty small”. and A good Bayesian—and aren’t most of us are supposed to be good Bayesians these days?—should be updating in light of this new information. Given this result, what is the likelihood that Obamacare will have a positive impact on the average health of Americans? Every one of us, for or against, should be revising that probability downwards. I’m not saying that you have to revise it to zero; I certainly haven’t. But however high it was yesterday, it should be somewhat lower today. This

5 0.94783658 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

Introduction: Ken Rice writes: In the recent discussion on stopping rules I saw a comment that I wanted to chip in on, but thought it might get a bit lost, in the already long thread. Apologies in advance if I misinterpreted what you wrote, or am trying to tell you things you already know. The comment was: “In Bayesian decision making, there is a utility function and you choose the decision with highest expected utility. Making a decision based on statistical significance does not correspond to any utility function.” … which immediately suggests this little 2010 paper; A Decision-Theoretic Formulation of Fisher’s Approach to Testing, The American Statistician, 64(4) 345-349. It contains utilities that lead to decisions that very closely mimic classical Wald tests, and provides a rationale for why this utility is not totally unconnected from how some scientists think. Some (old) slides discussing it are here . A few notes, on things not in the paper: * I know you don’t like squared-

6 0.94679379 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

7 0.94678438 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

8 0.94672346 727 andrew gelman stats-2011-05-23-My new writing strategy

9 0.94591415 1886 andrew gelman stats-2013-06-07-Robust logistic regression

10 0.94581926 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

11 0.94522655 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

12 0.94515675 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

13 0.94441718 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

14 0.94401991 1240 andrew gelman stats-2012-04-02-Blogads update

15 0.94375539 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

16 0.94357443 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

17 0.94290173 1792 andrew gelman stats-2013-04-07-X on JLP

18 0.94285142 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

19 0.94284463 1465 andrew gelman stats-2012-08-21-D. Buggin

20 0.94269788 197 andrew gelman stats-2010-08-10-The last great essayist?