Somewhat Bayesian multilevel modeling
Andrew Gelman, 31 August 2010
Eric McGhee writes:

I’m trying to generate county-level estimates from a statewide survey of California using multilevel modeling. I would love to learn the full Bayesian approach, but I’m on a tight schedule and worried about teaching myself something of that complexity in the time available. I’m hoping I can use the classical approach and simulate standard errors using what you and Jennifer Hill call the “informal Bayesian” method. This has raised a few questions.

First, what are the costs of using this approach as opposed to full Bayesian?

Second, when I use the predictive simulation as described on p. 149 of “Data Analysis” on a binary dependent variable and a sample of 2000, I get a 5%-95% range of simulation results so large as to be effectively useless (on the order of +/- 15 points). This is true even for LA county, which has enough cases by itself (about 500) to get a standard error of about 2 points from simple disaggregation. However, if I simulate only with the coefficients and skip the step of random draws from a binomial distribution (i.e., as on p. 148), I get results that are much more sensible (around +/- 5 points). Do the random draws from the binomial distribution only apply to out-of-sample predictions? Or do they apply here as well? If the latter, any idea why I would be getting such a large range of results? Might that be signaling something wrong with the model, or with my R code?
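[To make the two procedures being contrasted here concrete, here is a minimal sketch of how they might look in R. It is not the asker’s actual code: the plain glm() fit, the MASS::mvrnorm() coefficient draws, the predictors x1 and x2, and the data frames survey and la (the LA-county respondents) are assumptions for illustration, and the multilevel structure is ignored for brevity.]

## Hypothetical sketch of the two simulation procedures described above.
## Assumed names: survey (full data), la (LA-county rows), predictors x1, x2.
library(MASS)

fit <- glm(y ~ x1 + x2, family = binomial, data = survey)  # classical fit
n.sims <- 1000
beta.sims <- mvrnorm(n.sims, coef(fit), vcov(fit))         # "informal Bayes" coefficient draws

X.la <- model.matrix(~ x1 + x2, data = la)
n.la <- nrow(X.la)

est.coef.only  <- rep(NA, n.sims)   # simulate the underlying proportion only
est.with.binom <- rep(NA, n.sims)   # also draw new binary outcomes
for (k in 1:n.sims) {
  p <- plogis(drop(X.la %*% beta.sims[k, ]))       # inverse logit of the linear predictor
  est.coef.only[k]  <- mean(p)
  est.with.binom[k] <- mean(rbinom(n.la, 1, p))    # extra binomial sampling noise
}

quantile(est.coef.only,  c(.05, .95))
quantile(est.with.binom, c(.05, .95))   # wider: adds variation from simulated new data

[The second interval is wider because each simulation draws a fresh finite sample on top of the coefficient uncertainty; the sketch is not meant to reproduce the asker’s particular numbers.]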
Finally, when dealing with simulation results, what would most closely correspond to a margin of error? I need a way of summarizing uncertainty using a terminology that is familiar to a policy audience.
My reply: The main benefit of full Bayes over approximate Bayes (of the sort done by lmer(), for example, and used in many of the examples in my book with Jennifer) arises when group-level variances are small. Approximate Bayes gives a point estimate of the variance parameters, which understates uncertainty compared to full Bayes. We are currently working on an add-on to lmer()-like programs to include some of that uncertainty, but we haven’t done it yet, so I don’t have any R package to conveniently offer you here.
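[As a minimal illustration of this point (my sketch, not code from the exchange): fitting the multilevel logit with glmer() yields a single point estimate of the between-county standard deviation, and simulation-based inference then conditions on that estimate, so uncertainty about the variance parameter itself is not carried forward. The model formula and variable names are assumptions.]

## Sketch only: assumed outcome y, predictors x1, x2, grouping variable county.
library(lme4)
library(arm)

m <- glmer(y ~ x1 + x2 + (1 | county), family = binomial, data = survey)
VarCorr(m)                     # one number for the county-level sd, no interval around it
sims <- sim(m, n.sims = 1000)  # coefficient and county-effect draws, conditional on that estimate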
Regarding your simulation question: Yes, if you’re interested in estimating all of California, you don’t want to do that binomial simulation; that’s something you only do when you’re simulating some finite amount of new data.
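[Continuing the hypothetical glmer() sketch above, the county estimate would then be built from the coefficient and county-effect draws alone, with no binomial step. The slot names (@fixef, @ranef) and the indexing of the county effects are assumptions about how arm::sim() exposes its draws; check them against your version of the package.]

## Sketch only: county proportion as a population quantity -- no rbinom() draws.
X.la <- model.matrix(~ x1 + x2, data = la)                     # LA-county respondents (assumed data frame)
j <- which(dimnames(sims@ranef$county)[[2]] == "Los Angeles")  # assumed level name

est.la <- rep(NA, 1000)
for (k in 1:1000) {
  eta <- drop(X.la %*% sims@fixef[k, ]) + sims@ranef$county[k, j, 1]
  est.la[k] <- mean(plogis(eta))   # average over the county's respondents
}
## rbinom() draws would enter only if the goal were to predict outcomes for a
## specific finite batch of new respondents, not to estimate the county proportion.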
For the margin of error, you can just compute sd’s from the simulations and then compute 2*sd. Or you can use the [2.5%, 97.5%] simulation points, but that will be pretty noisy unless you have thousands of simulations.
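[A small sketch of the two summaries suggested here, applied to any vector of simulated estimates, for example est.la from the sketch above:]

me <- 2 * sd(est.la)                        # "margin of error" as 2 simulation sd's
c(mean(est.la) - me, mean(est.la) + me)     # +/- 2 sd interval
quantile(est.la, c(0.025, 0.975))           # percentile interval; noisy unless you
                                            #   have thousands of simulations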