andrew_gelman_stats-2012-1144 knowledge-graph by maker-knowledge-mining

1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?


meta info for this blog

Source: html

Introduction: Stephen Collins writes: I’m reading your Multilevel modeling book and am trying to apply it to my work. I’m concerned with how to estimate a random intercept model if there are hundreds/thousands of levels. In the Gibbs sampling, am I sampling a parameter for each level? Or, just the hyper-parameters? In other words, say I had 500 zipcode intercepts modeled as ~ N(m,s). Would my posterior be two dimensional, sampling for “m” and “s,” or would it have 502 dimensions? My reply: Indeed you will have hundreds or thousands of parameters—or, in classical terms, hundreds or thousands of predictive quantities. But that’s ok. Even if none of those predictions is precise, you’re learning about the model. See page 526 of the book for more discussion of the number of parameters in a multilevel model.
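
To make the reply concrete, here is a minimal sketch of the kind of Gibbs sampler the question describes. It is not code from the book: the data are simulated (500 zipcodes with 20 observations each is an assumption), the data-level standard deviation sigma_y is held fixed to keep the sampler short, and the inverse-gamma prior on s^2 is a placeholder. What it illustrates is exactly the point of the reply: each sweep draws all 500 zipcode intercepts plus the hyperparameters m and s, so every posterior draw is a 502-dimensional vector.

# Minimal sketch (not the book's code): Gibbs sampler for a random-intercept model
# y_ij ~ N(alpha_j, sigma_y^2),  alpha_j ~ N(m, s^2),  j = 1..500 zipcodes.
# sigma_y is held fixed and the data are simulated, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# --- simulated data: 500 zipcodes, 20 observations each (an assumption) ---
J, n_j, sigma_y = 500, 20, 1.0
true_alpha = rng.normal(2.0, 0.5, size=J)
y = true_alpha[:, None] + rng.normal(0, sigma_y, size=(J, n_j))
ybar = y.mean(axis=1)

# --- Gibbs sampler ---
n_iter = 2000
alpha, m, s2 = np.zeros(J), 0.0, 1.0
a0, b0 = 1.0, 1.0                      # weak inverse-gamma prior on s^2 (placeholder)
draws = np.empty((n_iter, J + 2))      # 502 columns: 500 intercepts + m + s

for t in range(n_iter):
    # alpha_j | rest: conjugate normal update, one draw per zipcode
    prec = n_j / sigma_y**2 + 1.0 / s2
    mean = (n_j * ybar / sigma_y**2 + m / s2) / prec
    alpha = rng.normal(mean, np.sqrt(1.0 / prec))

    # m | rest: flat prior on m
    m = rng.normal(alpha.mean(), np.sqrt(s2 / J))

    # s^2 | rest: inverse-gamma full conditional
    shape = a0 + J / 2.0
    scale = b0 + 0.5 * np.sum((alpha - m) ** 2)
    s2 = 1.0 / rng.gamma(shape, 1.0 / scale)

    draws[t] = np.concatenate([alpha, [m, np.sqrt(s2)]])

print(draws.shape)   # (2000, 502): the posterior really does have 502 dimensions

Even though each individual intercept is estimated imprecisely, the draws of m and s summarize what the 500 zipcodes jointly say about the population of intercepts, which is the sense in which "you're learning about the model."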


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Stephen Collins writes: I’m reading your Multilevel modeling book and am trying to apply it to my work. [sent-1, score-0.515]

2 I’m concerned with how to estimate a random intercept model if there are hundreds/thousands of levels. [sent-2, score-0.636]

3 In the Gibbs sampling, am I sampling a parameter for each level? [sent-3, score-0.463]

4 In other words, say I had 500 zipcode intercepts modeled as ~ N(m,s). [sent-5, score-0.436]

5 Would my posterior be two dimensional, sampling for “m” and “s,” or would it have 502 dimensions? [sent-6, score-0.566]

6 My reply: Indeed you will have hundreds or thousands of parameters—or, in classical terms, hundreds or thousands of predictive quantities. [sent-7, score-1.415]

7 Even if none of those predictions is precise, you’re learning about the model. [sent-9, score-0.373]

8 See page 526 of the book for more discussion of the number of parameters in a multilevel model. [sent-10, score-0.821]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sampling', 0.347), ('hundreds', 0.301), ('thousands', 0.284), ('collins', 0.252), ('multilevel', 0.222), ('intercepts', 0.215), ('parameters', 0.215), ('dimensional', 0.21), ('intercept', 0.197), ('gibbs', 0.18), ('modeled', 0.17), ('dimensions', 0.161), ('stephen', 0.155), ('concerned', 0.15), ('precise', 0.15), ('book', 0.14), ('none', 0.13), ('predictions', 0.129), ('classical', 0.124), ('predictive', 0.121), ('apply', 0.118), ('parameter', 0.116), ('learning', 0.114), ('posterior', 0.11), ('model', 0.105), ('terms', 0.105), ('words', 0.102), ('random', 0.101), ('page', 0.1), ('modeling', 0.093), ('level', 0.087), ('indeed', 0.086), ('reading', 0.086), ('estimate', 0.083), ('number', 0.079), ('trying', 0.078), ('reply', 0.078), ('discussion', 0.065), ('would', 0.059), ('say', 0.051), ('two', 0.05), ('re', 0.048), ('even', 0.04), ('writes', 0.04), ('see', 0.035)]
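
The sentence scores above and the word weights just listed come from tfidf weighting. The snippet below is an assumed reconstruction in Python with scikit-learn (the original pipeline and its exact normalization are not specified here), showing the general recipe: vectorize each sentence with tfidf, then rank sentences by their total word weight.

# Assumed reconstruction of tfidf sentence scoring, not the original pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Stephen Collins writes: I'm reading your Multilevel modeling book ...",
    "I'm concerned with how to estimate a random intercept model ...",
    "In the Gibbs sampling, am I sampling a parameter for each level?",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(sentences)            # one tfidf row per sentence

# Score each sentence by the sum of its word weights, then rank.
scores = np.asarray(X.sum(axis=1)).ravel()
for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sent}")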

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?


2 0.20258197 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

Introduction: Rama Ganesan writes: I think I am having an existential crisis. I used to work with animals (rats, mice, gerbils etc.) Then I started to work in marketing research where we did have some kind of random sampling procedure. So up until a few years ago, I was sort of okay. Now I am teaching marketing research, and I feel like there is no real random sampling anymore. I take pains to get students to understand what random means, and then the whole lot of inferential statistics. Then almost anything they do – the sample is not random. They think I am contradicting myself. They use convenience samples at every turn – for their school work, and the enormous amount on online surveying that gets done. Do you have any suggestions for me? Other than say, something like this . My reply: Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model

3 0.19868416 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

Introduction: Lots of good statistical methods make use of two models. For example: - Classical statistics: estimates and standard errors using the likelihood function; tests and p-values using the sampling distribution. (The sampling distribution is not equivalent to the likelihood, as has been much discussed, for example in sequential stopping problems.) - Bayesian data analysis: inference using the posterior distribution; model checking using the predictive distribution (which, again, depends on the data-generating process in a way that the likelihood does not). - Machine learning: estimation using the data; evaluation using cross-validation (which requires some rule for partitioning the data, a rule that stands outside of the data themselves). - Bootstrap, jackknife, etc: estimation using an “estimator” (which, I would argue, is based in some sense on a model for the data), uncertainties using resampling (which, I would argue, is close to the idea of a “sampling distribution” in

4 0.19482379 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers) Level two: companies (90 companies) Level three: industry (17 industries) I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

5 0.17108935 1363 andrew gelman stats-2012-06-03-Question about predictive checks

Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h

6 0.15660001 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

7 0.15464926 77 andrew gelman stats-2010-06-09-Sof[t]

8 0.15420869 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

9 0.14794041 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

10 0.13963613 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

11 0.1381318 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

12 0.1374055 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

13 0.1357789 1898 andrew gelman stats-2013-06-14-Progress! (on the understanding of the role of randomization in Bayesian inference)

14 0.13516052 85 andrew gelman stats-2010-06-14-Prior distribution for design effects

15 0.13474403 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

16 0.13159174 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

17 0.12996925 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

18 0.12831762 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

19 0.12322532 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

20 0.12265173 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.148), (1, 0.159), (2, 0.055), (3, 0.023), (4, 0.072), (5, 0.088), (6, 0.026), (7, -0.047), (8, 0.069), (9, 0.034), (10, 0.055), (11, -0.051), (12, -0.0), (13, 0.027), (14, 0.015), (15, -0.048), (16, -0.078), (17, 0.009), (18, 0.032), (19, -0.053), (20, 0.006), (21, -0.031), (22, 0.026), (23, 0.041), (24, -0.045), (25, -0.039), (26, -0.081), (27, 0.136), (28, 0.066), (29, 0.045), (30, -0.124), (31, -0.019), (32, -0.054), (33, -0.001), (34, -0.069), (35, 0.042), (36, 0.002), (37, -0.072), (38, 0.001), (39, -0.006), (40, 0.004), (41, -0.0), (42, 0.024), (43, -0.04), (44, -0.087), (45, -0.076), (46, 0.038), (47, 0.043), (48, -0.027), (49, -0.037)]
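
For reference, LSI similarity of the kind tabulated here reduces each post's tfidf vector to a small number of latent topics with a truncated SVD and then compares posts by cosine similarity in that topic space. The snippet below is an assumed reconstruction with scikit-learn, not the original pipeline; the listing above uses 50 topics, while the toy example uses 2 topics and made-up documents.

# Assumed reconstruction of LSI-style blog similarity, not the original pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "How many parameters are in a multilevel model?",
    "Analyzing the entire population rather than a sample",
    "Multilevel modeling even when you're not interested in predictions for new groups",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)   # the listing above uses 50 topics
topic_weights = lsi.fit_transform(tfidf)             # one topic-weight vector per blog post

# simValue in the list below corresponds to a cosine similarity in this topic space.
print(cosine_similarity(topic_weights[:1], topic_weights))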

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97838908 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?


2 0.73863453 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

Introduction: Lee Mobley writes: I recently read what you posted on your blog How does statistical analysis differ when analyzing the entire population rather than a sample? What you said in the blog accords with my training in econometrics. However I am concerned about a new wrinkle on this problem that derives from multilevel modeling. We are analyzing multilevel models of the probability of using cancer screening for the entire Medicare population. I argue that every state has different systems in place (politics, cancer control efforts, culture, insurance regulations, etc) so that essentially a different probability generating mechanism is in place for each state. Thus I estimate 50 separate regressions for the populations in each state, and then note and map the variability in the effect estimates (slope parameters) for each covariate. Reviewers argue that I should be using random slopes modeling, pooling all individuals in all states together. I am familiar with this approach

3 0.71229917 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

Introduction: Fred Wu writes: I work at National Prescribing Services in Australia. I have a database representing, say, antidiabetic drug utilisation for the entire Australia in the past few years. I planned to do a longitudinal analysis across GP Division Network (112 divisions in AUS) using mixed-effects models (or as you called in your book varying intercept and varying slope) on this data. The problem here is: as data actually represent the population who use antidiabetic drugs in AUS, should I use 112 fixed dummy variables to capture the random variations or use varying intercept and varying slope for the model? Because someone may argue, like divisions in AUS or states in USA can hardly be considered from a “superpopulation”, then fixed dummies should be used. What I think is the population are those who use the drugs, what will happen when the rest need to use them? In terms of exchangeability, using varying intercept and varying slopes can be justified. Also you provided in y

4 0.69865847 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

Introduction: Cyrus writes: I [Cyrus] was teaching a class on multilevel modeling, and we were playing around with different methods to fit a random effects logit model with 2 random intercepts—one corresponding to “family” and another corresponding to “community” (labeled “mom” and “cluster” in the data, respectively). There are also a few regressors at the individual, family, and community level. We were replicating in part some of the results from the following paper: Improved estimation procedures for multilevel models with binary response: a case-study, by G Rodriguez, N Goldman. (I say “replicating in part” because we didn’t include all the regressors that they use, only a subset.) We were looking at the performance of estimation via glmer in R’s lme4 package, glmmPQL in R’s MASS package, and Stata’s xtmelogit. We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood. I was shocked to discover that glmer

5 0.69092584 85 andrew gelman stats-2010-06-14-Prior distribution for design effects

Introduction: David Shor writes: I’m fitting a state-space model right now that estimates the “design effect” of individual pollsters (ratio of poll variance to that predicted by perfect random sampling). What would be a good prior distribution for that? My quickest suggestion is to start with something simple, such as a uniform from 1 to 10, and then to move to something hierarchical, such as a lognormal on (design.effect – 1), with the hyperparameters estimated from data. My longer suggestion is to take things apart. What exactly do you mean by “design effect”? There are lots of things going on, both in sampling error (the classical “design effect” that comes from cluster sampling, stratification, weighting, etc.) and nonsampling error (nonresponse bias, likely voter screening, bad questions, etc.) It would be best if you could model both pieces.

6 0.68957591 1726 andrew gelman stats-2013-02-18-What to read to catch up on multivariate statistics?

7 0.67223275 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

8 0.64503694 1363 andrew gelman stats-2012-06-03-Question about predictive checks

9 0.63305122 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

10 0.6252597 1682 andrew gelman stats-2013-01-19-R package for Bayes factors

11 0.62385619 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model

12 0.61717701 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

13 0.60929286 1270 andrew gelman stats-2012-04-19-Demystifying Blup

14 0.60918587 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

15 0.60453725 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

16 0.60397547 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”

17 0.60313147 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

18 0.59077591 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

19 0.58548313 674 andrew gelman stats-2011-04-21-Handbook of Markov Chain Monte Carlo

20 0.58516294 107 andrew gelman stats-2010-06-24-PPS in Georgia


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(7, 0.03), (16, 0.035), (21, 0.048), (24, 0.278), (34, 0.25), (99, 0.223)]
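
Similarly, the LDA topic weights above summarize each blog post as a probability distribution over latent topics; similar posts are those with similar topic distributions. The snippet below is an assumed reconstruction with scikit-learn, using toy documents and 3 topics rather than the original corpus or topic ids.

# Assumed reconstruction of LDA topic weights, not the original pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "multilevel model random intercept gibbs sampling posterior parameters",
    "severe weather automobile assembly productivity climate change",
    "tfidf lsi lda topic model similarity blog",
]

counts = CountVectorizer().fit_transform(docs)       # LDA works on raw term counts
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)               # each row sums to 1: topic weights per post

# Each document is summarized by its (topicId, topicWeight) pairs, as in the list above.
for topic_id, weight in enumerate(doc_topics[0]):
    print(topic_id, round(weight, 3))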

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94311786 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

Introduction: Jeff asked me what I thought of this recent AJPS article by Brian Greenhill, Michael Ward, and Audrey Sacks, “The Separation Plot: A New Visual Method for Evaluating the Fit of Binary Models.” It’s similar to a graph of observed vs. predicted values, but using color rather than the y-axis to display the observed values. It seems like it could be useful, also could be applied more generally to discrete-data regressions with more than two categories. When it comes to checking the model fit, I recommend binned residual plots, as discussed in this 2000 article with Yuri Goegebeur, Francis Tuerlinckx, and Iven Van Mechelen.

same-blog 2 0.93544888 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?


3 0.89811873 1501 andrew gelman stats-2012-09-18-More studies on the economic effects of climate change

Introduction: After writing yesterday’s post , I was going through Solomon Hsiang’s blog and found a post pointing to three studies from researchers at business schools: Severe Weather and Automobile Assembly Productivity Gérard P. Cachon, Santiago Gallino and Marcelo Olivares Abstract: It is expected that climate change could lead to an increased frequency of severe weather. In turn, severe weather intuitively should hamper the productivity of work that occurs outside. But what is the effect of rain, snow, fog, heat and wind on work that occurs indoors, such as the production of automobiles? Using weekly production data from 64 automobile plants in the United States over a ten-year period, we find that adverse weather conditions lead to a significant reduction in production. For example, one additional day of high wind advisory by the National Weather Service (i.e., maximum winds generally in excess of 44 miles per hour) reduces production by 26%, which is comparable in order of magnitude t

4 0.87816978 1911 andrew gelman stats-2013-06-23-AI Stats conference on Stan etc.

Introduction: Jaakko Peltonen writes: The Seventeenth International Conference on Artificial Intelligence and Statistics (http://www.aistats.org) will be next April in Reykjavik, Iceland. AISTATS is an interdisciplinary conference at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas. ============================================================================== AISTATS 2014 Call for Papers Seventeenth International Conference on Artificial Intelligence and Statistics April 22 – 25, 2014, Reykjavik, Iceland http://www.aistats.org Colocated with a MLSS Machine Learning Summer School ============================================================================== AISTATS is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas. Since its inception in 1985, the primary goal of AISTATS has been to broaden research in the

5 0.87462795 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

Introduction: Makoto Hanita writes: We have been discussing the following two issues amongst ourselves, then with our methodological consultant for several days. However, we have not been able to arrive at a consensus. Consequently, we decided to seek an opinion from nationally known experts. FYI, we sent a similar inquiry to Larry Hedges and David Rogosa . . . 1)      We are wondering if a post-hoc covariate adjustment is a good practice in the context of RCTs [randomized clinical trials]. We have a situation where we found a significant baseline difference between the treatment and the control groups in 3 variables. Some of us argue that adding those three variables to the original impact analysis model is a good idea, as that would remove the confound from the impact estimate. Others among us, on the other hand, argue that a post-hoc covariate adjustment should never be done, on the ground that those covariates are correlated with the treatment, which makes the analysis model that of quasi

6 0.86506379 135 andrew gelman stats-2010-07-09-Rasmussen sez: “108% of Respondents Say . . .”

7 0.85879111 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010

8 0.85018802 388 andrew gelman stats-2010-11-01-The placebo effect in pharma

9 0.83202124 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

10 0.83038968 1500 andrew gelman stats-2012-09-17-“2% per degree Celsius . . . the magic number for how worker productivity responds to warm-hot temperatures”

11 0.83021712 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story

12 0.82217836 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

13 0.81815469 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?

14 0.81592178 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

15 0.81488824 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

16 0.81407106 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall

17 0.81356275 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

18 0.81334633 1842 andrew gelman stats-2013-05-05-Cleaning up science

19 0.81309187 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

20 0.81244981 1111 andrew gelman stats-2012-01-10-The blog of the Cultural Cognition Project