
269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models


meta info for this blog

Source: html

Introduction: Cyrus writes: I [Cyrus] was teaching a class on multilevel modeling, and we were playing around with different methods to fit a random-effects logit model with 2 random intercepts, one corresponding to “family” and another corresponding to “community” (labeled “mom” and “cluster” in the data, respectively). There are also a few regressors at the individual, family, and community level. We were replicating in part some of the results from the following paper: Improved estimation procedures for multilevel models with binary response: a case-study, by G. Rodriguez and N. Goldman. (I say “replicating in part” because we didn’t include all the regressors that they use, only a subset.) We were looking at the performance of estimation via glmer in R’s lme4 package, glmmPQL in R’s MASS package, and Stata’s xtmelogit. We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood. I was shocked to discover that glmer’s default setting is adaptive Gaussian quadrature with 1 integration point (that is, a Laplace approximation).
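For concreteness, here is a minimal R sketch of the two fits being compared. All names are hypothetical stand-ins (a binary response y, regressors x1 and x2, and a data frame dat in which mom is nested within cluster), not the actual Rodriguez-Goldman variables:

  library(lme4)  # glmer: Laplace / adaptive Gaussian quadrature
  library(MASS)  # glmmPQL: penalized quasi-likelihood

  # glmer's default fit: nAGQ = 1, i.e., the Laplace approximation
  fit_laplace <- glmer(y ~ x1 + x2 + (1 | mom) + (1 | cluster),
                       data = dat, family = binomial)

  # The same model by penalized quasi-likelihood (nested random intercepts)
  fit_pql <- glmmPQL(y ~ x1 + x2, random = ~ 1 | cluster/mom,
                     data = dat, family = binomial)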


Summary: the most important sentences, as selected by the tfidf model

sentIndex sentText sentNum sentScore

1 We were replicating in part some of the results from the following paper : Improved estimation procedures for multilevel models with binary response: a case-study, by G Rodriguez, N Goldman. [sent-3, score-0.294]

2 We were looking at the performance of estimation via glmer in R’s lme4 package, glmmPQL in R’s MASS package, and Stata’s xtmelogit. [sent-5, score-0.599]

3 We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood. [sent-6, score-0.631]

4 I was shocked to discover that glmer’s default setting is adaptive Gaussian quadrature with 1 integration point (that is, a Laplace approximation). [sent-7, score-0.676]

5 In addition, and even more shocking, glmer does not allow more than one integration point to be used for models with more than one random effect. [sent-8, score-0.813]

6 With only one random effect, you can increase the number of integration points. [sent-9, score-0.338]

7 But with multiple random effects (like what we had, which was 2), when you try to specify multiple integration points (with the nAGQ setting), you get an error saying that it can’t do it. [sent-10, score-0.42]
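The restriction is easy to reproduce (same hypothetical names as the sketch above):

  # One random effect: the number of adaptive quadrature points can be raised.
  fit_agq7 <- glmer(y ~ x1 + (1 | mom), data = dat,
                    family = binomial, nAGQ = 7)

  # Two random effects: only the Laplace approximation is available, and
  # lme4 stops with an error to the effect that nAGQ > 1 requires a single,
  # scalar random-effects term.
  try(glmer(y ~ x1 + (1 | mom) + (1 | cluster), data = dat,
            family = binomial, nAGQ = 7))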

8 I made a table showing results from estimates using glmer’s Laplace approximation, and then also results from glmmPQL, which uses penalized quasi-likelihood, and then results from Stata’s xtmelogit using first its own Laplace approximation and then a 7-pt adaptive Gaussian quadrature. [sent-12, score-1.127]

9 Note that in Stata, the 7-pt quadrature fit is the default. [sent-13, score-0.385]

10 Now, when we look at the results for the random effects standard deviation estimates, we see that the Laplace approximations in R (via glmer) and Stata (model 3 in the Table) are identical. [sent-15, score-0.482]
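Continuing the sketch above, the standard deviations being compared can be read straight off the fitted objects:

  VarCorr(fit_laplace)    # lme4: SDs of the mom and cluster intercepts
  nlme::VarCorr(fit_pql)  # glmmPQL returns an lme-style fit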

11 In addition, taking the 7-pt quadrature estimate from xtmelogit to be the best guess of the bunch, the Laplace approximation-based estimates are HORRIBLY biased downward. [sent-16, score-0.717]

12 That is, the estimated random effect variance is way too small. [sent-17, score-0.319]

13 The PQL estimates are also biased downward, but not so badly. [sent-18, score-0.238]

14 This is in line with what Rodriguez and Goldman found, using a 20-point quadrature and an MCMC fit as benchmarks (they produced nearly identical results). [sent-19, score-0.451]

15 The upshot is that glmer’s default is REALLY BAD; and this default is the only option when you have more than one random effect. [sent-20, score-0.47]

16 My suggestion then is to either use Stata, which has a very sensible 7-point quadrature default (although, as a brute-force method, it is slow), or fit the model with MCMC. [sent-21, score-0.661]
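As one illustration of the MCMC route (a sketch only; the post itself points to BUGS, and rstanarm here is just a convenient stand-in), the glmer formula carries over unchanged:

  library(rstanarm)  # assumed; any Bayesian package for GLMMs would do
  fit_mcmc <- stan_glmer(y ~ x1 + x2 + (1 | mom) + (1 | cluster),
                         data = dat, family = binomial)
  print(fit_mcmc)  # posterior summaries include the group-level SDs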

17 If so, have you (1) looked into this or (2) checked what glmer is giving you against xtmelogit or BUGS? [sent-23, score-0.686]

18 I would trust the xtmelogit or BUGS answers more. [sent-24, score-0.211]

19 Sophia has been telling me for a while that the method used by gllamm is better than the Laplace method that is currently used in glmer. [sent-27, score-0.388]

20 (b) Adding a few steps of the Metropolis algorithm to capture some of the uncertainty in the variance parameters and also fix some of that bias you’re talking about. [sent-36, score-0.309]
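A generic sketch of idea (b), with everything hypothetical: a few random-walk Metropolis steps on the log of a group-level standard deviation, given some user-supplied log-posterior log_post():

  # log_post() is assumed to return the log-posterior at a value of log(sigma)
  metropolis_logsigma <- function(log_post, log_sigma,
                                  n_steps = 10, step_sd = 0.2) {
    for (s in seq_len(n_steps)) {
      proposal <- log_sigma + rnorm(1, 0, step_sd)
      # accept with probability min(1, posterior ratio)
      if (log(runif(1)) < log_post(proposal) - log_post(log_sigma))
        log_sigma <- proposal
    }
    log_sigma
  }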


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('glmer', 0.475), ('quadrature', 0.317), ('laplace', 0.237), ('xtmelogit', 0.211), ('stata', 0.2), ('random', 0.2), ('sophia', 0.174), ('gllam', 0.158), ('integration', 0.138), ('estimates', 0.127), ('estimation', 0.124), ('variance', 0.119), ('method', 0.115), ('approximations', 0.112), ('adaptive', 0.111), ('default', 0.11), ('glmmpql', 0.106), ('regressors', 0.106), ('rodriguez', 0.106), ('approximation', 0.101), ('brute', 0.1), ('results', 0.088), ('intercepts', 0.085), ('parameters', 0.085), ('cyrus', 0.083), ('effects', 0.082), ('replicating', 0.082), ('penalized', 0.079), ('metropolis', 0.074), ('fit', 0.068), ('using', 0.066), ('force', 0.066), ('gaussian', 0.064), ('bugs', 0.063), ('biased', 0.062), ('intuition', 0.061), ('corresponding', 0.06), ('marginal', 0.058), ('steps', 0.056), ('package', 0.055), ('adding', 0.055), ('community', 0.053), ('nagq', 0.053), ('table', 0.053), ('family', 0.052), ('integrals', 0.05), ('scalar', 0.05), ('upshot', 0.05), ('addition', 0.049), ('also', 0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models


2 0.23537101 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis

3 0.20160961 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”

Introduction: I received an email with the above title from Daniel Adkins, who elaborates: I [Adkins] am having a tough time with a dataset including 40K obs and 8K subjects. Trying to estimate a 2 level logit with random intercept and age slope and about 13 fixed covariates. I have tried several R packages (lme4, lme4a, glmmPQL, MCMCglmm) and stata xtmelogit and gllamm to no avail. xtmelogit crashes from insufficient memory. The R packages yield false convergences. A simpler model w/ random intercept only gives stable estimates in lme4 with a very large number of quadrature point (nAGQ>220). When i try this (nAGQ=221) with the random age term, it doesn’t make it through a single iteration in 72 hours (have tried both w/ and w/out RE correlation). I am using a power desktop that is top of the line compared to anything other than a cluster. Have tried start values for fixed effects in lme4 and that doesn’t help (couldn’t figure out how to specify RE starts). Do you have any advice. Should I move t

4 0.15192847 501 andrew gelman stats-2011-01-04-A new R package for fititng multilevel models

Introduction: Joscha Legewie points to this article by Lars Ronnegard, Xia Shen, and Moudud Alam, “hglm: A Package for Fitting Hierarchical Generalized Linear Models,” which just appeared in the R journal. This new package has the advantage, compared to lmer(), of allowing non-normal distributions for the varying coefficients. On the downside, they seem to have reverted to the ugly lme-style syntax (for example, “fixed = y ~ week, random = ~ 1|ID” rather than “y ~ week + (1|D)”). The old-style syntax has difficulties handling non-nested grouping factors. They also say they can estimated models with correlated random effects, but isn’t that just the same as varying-intercept, varying-slope models, which lmer (or Stata alternatives such as gllam) can already do? There’s also a bunch of stuff on H-likelihood theory, which seems pretty pointless to me (although probably it won’t do much harm either). In any case, this package might be useful to some of you, hence this note.

5 0.1493336 846 andrew gelman stats-2011-08-09-Default priors update?

Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a

6 0.149331 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

7 0.13054068 76 andrew gelman stats-2010-06-09-Both R and Stata

8 0.13028035 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

9 0.12869848 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

10 0.12586361 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

11 0.12382199 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

12 0.11738978 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

13 0.11064173 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

14 0.10998318 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

15 0.10913169 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

16 0.10392233 869 andrew gelman stats-2011-08-24-Mister P in Stata

17 0.1021712 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

18 0.10182092 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

19 0.10147736 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

20 0.099581115 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.179), (1, 0.131), (2, 0.06), (3, -0.021), (4, 0.09), (5, 0.03), (6, 0.036), (7, -0.035), (8, -0.007), (9, 0.036), (10, 0.022), (11, -0.018), (12, 0.021), (13, -0.018), (14, 0.023), (15, -0.026), (16, -0.059), (17, -0.001), (18, -0.017), (19, -0.004), (20, 0.009), (21, -0.022), (22, 0.055), (23, 0.052), (24, -0.013), (25, -0.057), (26, -0.086), (27, 0.086), (28, 0.051), (29, 0.014), (30, -0.004), (31, 0.004), (32, 0.006), (33, -0.048), (34, 0.015), (35, -0.013), (36, -0.059), (37, -0.001), (38, -0.005), (39, 0.014), (40, 0.009), (41, 0.015), (42, 0.011), (43, 0.027), (44, -0.025), (45, -0.014), (46, -0.026), (47, 0.062), (48, 0.044), (49, -0.027)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9683972 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models


2 0.79609841 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

Introduction: I received the following email from someone who wishes to remain anonymous: My colleague and I are trying to understand the best way to approach a problem involving measuring a group of individuals’ abilities across time, and are hoping you can offer some guidance. We are trying to analyze the combined effect of two distinct groups of people (A and B, with no overlap between A and B) who collaborate to produce a binary outcome, using a mixed logistic regression along the lines of the following. Outcome ~ (1 | A) + (1 | B) + Other variables What we’re interested in testing was whether the observed A random effects in period 1 are predictive of the A random effects in the following period 2. Our idea being create two models, each using a different period’s worth of data, to create two sets of A coefficients, then observe the relationship between the two. If the A’s have a persistent ability across periods, the coefficients should be correlated or show a linear-ish relationshi

3 0.79218924 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

Introduction: Eric McGhee writes: I’m trying to generate county-level estimates from a statewide survey of California using multilevel modeling. I would love to learn the full Bayesian approach, but I’m on a tight schedule and worried about teaching myself something of that complexity in the time available. I’m hoping I can use the classical approach and simulate standard errors using what you and Jennifer Hill call the “informal Bayesian” method. This has raised a few questions: First, what are the costs of using this approach as opposed to full Bayesian? Second, when I use the predictive simulation as described on p. 149 of “Data Analysis” on a binary dependent variable and a sample of 2000, I get a 5%-95% range of simulation results so large as to be effectively useless (on the order of +/- 15 points). This is true even for LA county, which has enough cases by itself (about 500) to get a standard error of about 2 points from simple disaggregation. However, if I simulate only with t

4 0.78135747 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

Introduction: Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. Based on the recommendation given in the book by yourself and Dr. Hill (2007) I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance were more appropriate in the context of a model with fixed district effects? My reply: I agree that th

5 0.78016955 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”


6 0.77508378 501 andrew gelman stats-2011-01-04-A new R package for fititng multilevel models

7 0.76209307 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

8 0.75579524 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

9 0.74813747 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

10 0.74093652 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

11 0.72740692 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

12 0.72727334 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

13 0.72545588 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

14 0.70026737 650 andrew gelman stats-2011-04-05-Monitor the efficiency of your Markov chain sampler using expected squared jumped distance!

15 0.69819272 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model

16 0.69494909 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

17 0.69028562 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories

18 0.6840899 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?

19 0.68349284 851 andrew gelman stats-2011-08-12-year + (1|year)

20 0.68234271 1726 andrew gelman stats-2013-02-18-What to read to catch up on multivariate statistics?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.061), (21, 0.016), (24, 0.143), (55, 0.258), (57, 0.017), (59, 0.012), (81, 0.016), (82, 0.014), (84, 0.019), (86, 0.045), (99, 0.269)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97089028 1463 andrew gelman stats-2012-08-19-It is difficult to convey intonation in typed speech

Introduction: I just wanted to add the above comment to Bob’s notes on language. Spoken (and, to some extent, handwritten) language can be much more expressive than the typed version. I’m not just talking about slang or words such as baaaaad; I’m also talking about pauses that give logical structure to a sentence. For example, sentences such as “The girl who hit the ball where the dog used to be was the one who was climbing the tree when the dog came by” are effortless to understand in speech but can be difficult for a reader to follow. Often when I write, I need to untangle my sentences to keep them readable.

2 0.96469724 1617 andrew gelman stats-2012-12-11-Math Talks :: Action Movies

Introduction: Jonathan Goodman gave the departmental seminar yesterday (10 Dec 2012) and I was amused by an extended analogy he made. After his (very clear) intro, he said that math talks were like action movies. The overall theorem and its applications provide the plot, and the proofs provide the action scenes.

3 0.95281351 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

Introduction: Alberto Cairo tells a fascinating story about John Snow, H. W. Acland, and the Mythmaking Problem: Every human community—nations, ethnic and cultural groups, professional guilds—inevitably raises a few of its members to the status of heroes and weaves myths around them. . . . The visual display of information is no stranger to heroes and myth. In fact, being a set of disciplines with a relatively small amount of practitioners and researchers, it has generated a staggering number of heroes, perhaps as a morale-enhancing mechanism. Most of us have heard of the wonders of William Playfair’s Commercial and Political Atlas, Florence Nightingale’s coxcomb charts, Charles Joseph Minard’s Napoleon’s march diagram, and Henry Beck’s 1933 redesign of the London Underground map. . . . Cairo’s goal, I think, is not to disparage these great pioneers of graphics but rather to put their work in perspective, recognizing the work of their excellent contemporaries. I would like to echo Cairo’

4 0.93240255 1299 andrew gelman stats-2012-05-04-Models, assumptions, and data summaries

Introduction: I saw an analysis recently that I didn’t like. I won’t go into the details, but basically it was a dose-response inference, where a continuous exposure was binned into three broad categories (terciles of the data) and the probability of an adverse event was computed for each tercile. The effect and the sample size was large enough that the terciles were statistically-significantly different from each other in probability of adverse event, with the probabilities increasing from low to mid to high exposure, as one would predict. I didn’t like this analysis because it is equivalent to fitting a step function. There is a tendency for people to interpret the (arbitrary) tercile boundaries as being meaningful thresholds even though the underlying dose-response relation has to be continuous. I’d prefer to start with a linear model and then add nonlinearity from there with a spline or whatever. At this point I stepped back and thought: Hey, the divide-into-three analysis does not lite

same-blog 5 0.91276646 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models


6 0.9100405 688 andrew gelman stats-2011-04-30-Why it’s so relaxing to think about social issues

7 0.9054175 168 andrew gelman stats-2010-07-28-Colorless green, and clueless

8 0.90206707 333 andrew gelman stats-2010-10-10-Psychiatric drugs and the reduction in crime

9 0.90053117 620 andrew gelman stats-2011-03-19-Online James?

10 0.89983761 2019 andrew gelman stats-2013-09-12-Recently in the sister blog

11 0.89670044 1131 andrew gelman stats-2012-01-20-Stan: A (Bayesian) Directed Graphical Model Compiler

12 0.89582437 997 andrew gelman stats-2011-11-07-My contribution to the discussion on “Should voting be mandatory?”

13 0.88710177 1107 andrew gelman stats-2012-01-08-More on essentialism

14 0.88217497 874 andrew gelman stats-2011-08-27-What’s “the definition of a professional career”?

15 0.87985861 50 andrew gelman stats-2010-05-25-Looking for Sister Right

16 0.87557036 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

17 0.84661168 706 andrew gelman stats-2011-05-11-The happiness gene: My bottom line (for now)

18 0.844868 201 andrew gelman stats-2010-08-12-Are all rich people now liberals?

19 0.84256065 1520 andrew gelman stats-2012-10-03-Advice that’s so eminently sensible but so difficult to follow

20 0.84091902 13 andrew gelman stats-2010-04-30-Things I learned from the Mickey Kaus for Senate campaign