
1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance


meta infos for this blog

Source: html

Introduction: Steve Miller writes: Much of what I do is cross-national analyses of survey data (largely World Values Survey). . . . My big question pertains to (what I would call) exploratory analysis of multilevel data, especially when the group-level predictors are of theoretical importance. A lot of what I do involves analyzing cross-national survey items of citizen attitudes, typically of political leadership. These survey items are usually yes/no responses, or four-part responses indicating a level of agreement (strongly agree, agree, disagree, strongly disagree) that can be condensed into a binary variable. I believe these can be explained by reference to country-level factors. Much of the group-level variables of interest are count variables with a modal value of 0, which can be quite messy. How would you recommend exploring the variation in the dependent variable as it could be explained by the group-level count variable of interest, before fitting the multilevel model itself? When


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Steve Miller writes: Much of what I do is cross-national analyses of survey data (largely World Values Survey). [sent-1, score-0.219]

2 My big question pertains to (what I would call) exploratory analysis of multilevel data, especially when the group-level predictors are of theoretical importance. [sent-5, score-0.654]

3 A lot of what I do involves analyzing cross-national survey items of citizen attitudes, typically of political leadership. [sent-6, score-0.782]

4 These survey items are usually yes/no responses, or four-part responses indicating a level of agreement (strongly agree, agree, disagree, strongly disagree) that can be condensed into a binary variable. [sent-7, score-1.144]

5 I believe these can be explained by reference to country-level factors. [sent-8, score-0.234]

6 Much of the group-level variables of interest are count variables with a modal value of 0, which can be quite messy. [sent-9, score-0.99]

7 How would you recommend exploring the variation in the dependent variable as it could be explained by the group-level count variable of interest, before fitting the multilevel model itself? [sent-10, score-1.642]

8 When the variables of interest are at the individual-level, I know it’s practical to run a separate regression on each country and plot the results. [sent-11, score-0.837]

9 Further, I’m not sure if doing a simple cross-tab of group-level averages of the dependent variable, as it coincides with different values of the group-level independent variable, is appropriate since the DV itself is at the individual-level. [sent-13, score-0.655]

10 What I’m trying to avoid is wasting time estimating computationally-intensive multilevel models (with often an N between 60,000 and 90,000 and a J of anywhere between 60 and 80) as a means to exploring the data as they pertain to my research question of interest. [sent-14, score-0.886]

11 Actually, that article is pretty much a direct answer to your above questions. [sent-16, score-0.072]
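The cross-tab that sentence 9 asks about can be sketched cheaply before any multilevel model is fit. The following is only an illustrative sketch, not the approach recommended in the reply: it collapses the binary individual-level responses to one proportion per country, then averages those proportions within each value of the group-level count variable. The function names, the toy rows, and the `group_count` mapping are all hypothetical.

```python
from collections import defaultdict

def country_means(rows):
    """rows: iterable of (country, y) with y in {0, 1}.
    Returns {country: (proportion of 1s, number of respondents)}."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for country, y in rows:
        sums[country] += y
        counts[country] += 1
    return {c: (sums[c] / counts[c], counts[c]) for c in counts}

def summarize_by_count(means, group_count):
    """Average the country-level proportions within each value
    of the group-level count variable (unweighted across countries)."""
    buckets = defaultdict(list)
    for country, (p, _n) in means.items():
        buckets[group_count[country]].append(p)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

# Toy data (hypothetical): three countries, four respondents each.
rows = [("A", 1), ("A", 0), ("A", 1), ("A", 1),
        ("B", 0), ("B", 0), ("B", 1), ("B", 0),
        ("C", 1), ("C", 1), ("C", 0), ("C", 0)]
group_count = {"A": 0, "B": 2, "C": 0}  # group-level count, modal value 0

means = country_means(rows)
by_count = summarize_by_count(means, group_count)
print(means)
print(by_count)
```

Because each country contributes a single point, a summary like this ignores differences in sample size and individual-level predictors, so it is a first look rather than a substitute for the model.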


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('multilevel', 0.26), ('variable', 0.255), ('variables', 0.228), ('survey', 0.219), ('interest', 0.214), ('exploring', 0.203), ('dependent', 0.198), ('count', 0.173), ('items', 0.17), ('responses', 0.167), ('coincides', 0.156), ('condensed', 0.156), ('pertains', 0.156), ('explained', 0.155), ('disagree', 0.148), ('modal', 0.147), ('strongly', 0.14), ('citizen', 0.136), ('dv', 0.136), ('values', 0.13), ('commentary', 0.121), ('miller', 0.117), ('indicating', 0.107), ('wasting', 0.104), ('anywhere', 0.102), ('regression', 0.1), ('agreement', 0.097), ('averages', 0.096), ('agree', 0.092), ('exploratory', 0.09), ('steve', 0.089), ('involves', 0.088), ('largely', 0.088), ('binary', 0.088), ('analyzing', 0.085), ('political', 0.084), ('reference', 0.079), ('separate', 0.075), ('independent', 0.075), ('attitudes', 0.075), ('plot', 0.075), ('question', 0.074), ('predictors', 0.074), ('fitting', 0.073), ('practical', 0.073), ('much', 0.072), ('country', 0.072), ('avoid', 0.072), ('estimating', 0.071), ('variation', 0.07)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance


2 0.206047 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers); Level two: company (90 companies); Level three: industry (17 industries). I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

3 0.20282371 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

Introduction: Elena Grewal writes: I am currently using the iterative regression imputation model as implemented in the Stata ICE package. I am using data from a survey of about 90,000 students in 142 schools and my variable of interest is parent level of education. I want only this variable to be imputed with as little bias as possible as I am not using any other variable. So I scoured the survey for every variable I thought could possibly predict parent education. The main variable I found is parent occupation, which explains about 35% of the variance in parent education for the students with complete data on both. I then include the 20 other variables I found in the survey in a regression predicting parent education, which explains about 40% of the variance in parent education for students with complete data on all the variables. My question is this: many of the other variables I found have more missing values than the parent education variable, and also, although statistically significant

4 0.19479619 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

Introduction: Alban Zeber writes: Suppose I have survey data from, say, 10 countries, whereby each country collected the data based on different sampling routines – the result of this being that each country has its own weights for the data that can be used in the analyses. If I analyse the data of each country separately then I can incorporate the survey design in the analyses, e.g., in Stata one can use svyset ….. But what happens when I want to do a pooled analysis of all the data from the 10 countries: Presumably either 1. I analyse the data from each country separately (using multiple or logistic regression, …) accounting for the survey design and then combine the estimates using a meta analysis (fixed or random) OR 2. Assume that the data from each country is a simple random sample from the population, combine the data from the 10 countries and then use multilevel or hierarchical models. My question is which of the methods is likely to give better estimates? Or is the

5 0.16188708 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

Introduction: Vlad Kogan writes: I’ve been using your book on regression and multilevel modeling and have a quick R question for you. Do you happen to know if there is any R package that can estimate a two-stage (instrumental variable) multi-level model? My reply: I don’t know. I’ll post on the blog and maybe there will be a response. You could also try the R help list.

6 0.16058698 14 andrew gelman stats-2010-05-01-Imputing count data

7 0.15840074 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

8 0.1498969 2296 andrew gelman stats-2014-04-19-Index or indicator variables

9 0.14824803 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

10 0.13834187 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

11 0.13709694 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

12 0.13586138 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

13 0.13474531 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

14 0.13432054 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

15 0.13384171 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

16 0.13329366 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

17 0.13328792 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it

18 0.13291049 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

19 0.1317693 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

20 0.13100176 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.193), (1, 0.093), (2, 0.121), (3, -0.057), (4, 0.128), (5, 0.076), (6, -0.048), (7, -0.049), (8, 0.117), (9, 0.107), (10, 0.068), (11, -0.039), (12, 0.014), (13, 0.076), (14, 0.028), (15, 0.018), (16, -0.04), (17, -0.038), (18, 0.021), (19, 0.005), (20, -0.026), (21, 0.04), (22, -0.01), (23, 0.03), (24, -0.07), (25, -0.053), (26, 0.064), (27, -0.088), (28, -0.043), (29, -0.012), (30, 0.041), (31, 0.061), (32, 0.04), (33, 0.055), (34, -0.068), (35, -0.041), (36, 0.06), (37, 0.054), (38, 0.006), (39, 0.015), (40, 0.013), (41, 0.019), (42, 0.058), (43, -0.034), (44, 0.007), (45, 0.028), (46, 0.041), (47, 0.048), (48, -0.04), (49, 0.054)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98219198 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance


2 0.85628355 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

Introduction: Having established that survey weighting is a mess, I should also acknowledge that, by this standard, regression modeling is also a mess, involving many arbitrary choices of variable selection, transformations and modeling of interaction. Nonetheless, regression modeling is a mess with which I am comfortable and, perhaps more relevant to the discussion, can be extended using multilevel models to get inference for small cross-classifications or small areas. We’re working on it.

3 0.84130818 14 andrew gelman stats-2010-05-01-Imputing count data

Introduction: Guy asks: I am analyzing an original survey of farmers in Uganda. I am hoping to use a battery of welfare proxy variables to create a single welfare index using PCA. I have a quick question which I hope you can find time to address: How do you recommend treating count data? (for example # of rooms, # of chickens, # of cows, # of radios)? In my dataset these variables are highly skewed with many responses at zero (which makes taking the natural log problematic). In the case of # of cows or chickens several obs have values in the hundreds. My response: Here’s what we do in our mi package in R. We split a variable into two parts: an indicator for whether it is positive, and the positive part. That is, y = u*v. Then u is binary and can be modeled using logistic regression, and v can be modeled on the log scale. At the end you can round to the nearest integer if you want to avoid fractional values.

4 0.82721311 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

Introduction: I received the following question: Is there a classic paper on instrumenting for survey non-response? some colleagues in public health are going to carry out a survey and I wonder about suggesting that they build in a randomization of response-encouragement (e.g. offering additional $ to a subset of those who don’t respond initially). Can you recommend a basic treatment of this, and why it might or might not make sense compared to IPW using covariates (without an instrument)? My reply: Here’s the best analysis I know of on the effects of incentives for survey response. There have been several survey-experiments on the subject. The short answer is that the effect on nonresponse is small and the outcome is highly variable, hence you can’t very well use it as an instrument in any particular survey. My recommended approach to dealing with nonresponse is to use multilevel regression and poststratification; an example is here. Inverse-probability weighting doesn’t really w

5 0.80757427 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

Introduction: Elena Grewal writes: I am currently using the iterative regression imputation model as implemented in the Stata ICE package. I am using data from a survey of about 90,000 students in 142 schools and my variable of interest is parent level of education. I want only this variable to be imputed with as little bias as possible as I am not using any other variable. So I scoured the survey for every variable I thought could possibly predict parent education. The main variable I found is parent occupation, which explains about 35% of the variance in parent education for the students with complete data on both. I then include the 20 other variables I found in the survey in a regression predicting parent education, which explains about 40% of the variance in parent education for students with complete data on all the variables. My question is this: many of the other variables I found have more missing values than the parent education variable, and also, although statistically significant

6 0.79908609 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

7 0.77799648 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

8 0.76216185 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

9 0.74432778 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

10 0.72566426 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

11 0.7202388 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

12 0.71776974 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

13 0.71640122 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

14 0.70787835 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

15 0.70584649 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

16 0.70356858 2296 andrew gelman stats-2014-04-19-Index or indicator variables

17 0.70130467 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

18 0.69604224 948 andrew gelman stats-2011-10-10-Combining data from many sources

19 0.69319052 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression

20 0.68621492 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model
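The reply quoted in the "Imputing count data" entry above describes splitting a zero-heavy count y into an indicator u (is it positive?) and a positive part v, with y = u*v, modeling u as binary and v on the log scale, then rounding at the end. The sketch below shows only that decomposition and the final recombination in plain Python. It is not the mi package's code, the function names are hypothetical, and the two regression models themselves are left out.

```python
import math

def split_count(y):
    """Split counts into u (1 if positive, else 0) and log(v) for the positive values."""
    u = [1 if yi > 0 else 0 for yi in y]
    log_v = [math.log(yi) for yi in y if yi > 0]
    return u, log_v

def recombine(u, log_v):
    """Rebuild y = u * v from the two parts, rounding v to the nearest integer."""
    positive_parts = iter(log_v)
    return [round(math.exp(next(positive_parts))) if ui else 0 for ui in u]

# Toy counts (hypothetical), e.g. number of cows per household.
y = [0, 3, 0, 120, 1]
u, log_v = split_count(y)
y_back = recombine(u, log_v)
print(u)
print(y_back)
```

In an actual imputation, u and log(v) would each be predicted from other variables (the quoted reply suggests logistic regression for u), and the recombination step would be applied to those predictions.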


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.021), (15, 0.011), (16, 0.072), (21, 0.021), (24, 0.147), (36, 0.115), (46, 0.016), (47, 0.011), (75, 0.012), (86, 0.025), (93, 0.013), (99, 0.441)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99222529 1217 andrew gelman stats-2012-03-17-NSF program “to support analytic and methodological research in support of its surveys”

Introduction: David Hogg points me to this announcement of a program from the National Center for Science and Engineering Statistics of the National Science Foundation: The Center would like to enhance its efforts to support analytic and methodological research in support of its surveys, and to engage in the education and training of researchers in the use of large-scale nationally representative datasets. NCSES welcomes efforts by the research community to use NCSES data for research on the science and technology enterprise. This sounds like a job for Mister P. My first thought when David sent this to me was not to post it, because maybe I’d want to apply myself! But then I thought better of my piggy instincts, so here it is. I think some of the readers of this blog are doing research that would be relevant for this program, so . . . go for it!

2 0.98998117 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”

Introduction: Derek Sonderegger writes: I have just finished my Ph.D. in statistics and am currently working in applied statistics (plant ecology) using Bayesian statistics. As the statistician in the group I only ever get the ‘hard analysis’ problems that don’t readily fit into standard models. As I delve into the computational aspects of Bayesian analysis, I find myself increasingly frustrated with the current set of tools. I was delighted to see JAGS 2.0 just came out and spent yesterday happily playing with it. My question is, where do you see the short-term future of Bayesian computing going and what can we do to steer it in a particular direction? In your book with Dr Hill, you mention that you expect BUGS (or its successor) to become increasingly sophisticated and, for example, re-parameterizations that increase convergence rates would be handled automatically. Just as R has been successful because users can extend it, I think progress here also will be made by input from ‘p

3 0.98864782 1847 andrew gelman stats-2013-05-08-Of parsing and chess

Introduction: Gary Marcus writes, An algorithm that is good at chess won’t help parsing sentences, and one that parses sentences likely won’t be much help playing chess. That is soooo true. I’m excellent at parsing sentences but I’m not so great at chess. And, worse than that, my chess ability seems to be declining from year to year. Which reminds me: I recently read Frank Brady’s much lauded Endgame, a biography of Bobby Fischer. The first few chapters were great, not just the Cinderella story of his steps to the world championship, but also the background on his childhood and the stories of the games and tournaments that he lost along the way. But after Fischer beats Spassky in 1972, the book just dies. Brady has chapter after chapter on Fischer’s life, his paranoia, his girlfriends, his travels. But, really, after the chess is over, it’s just sad and kind of boring. I’d much rather have had twice as much detail on the first part of the life and then had the post-1972 era compr

same-blog 4 0.98860085 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance


5 0.98795259 1666 andrew gelman stats-2013-01-10-They’d rather be rigorous than right

Introduction: Following up on my post responding to his question about that controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy, Kyle Peyton writes: I’m happy to see you’ve articulated similar gripes I had w/ the piece, which makes me feel like I’m not crazy. I remember discussing this with colleagues (I work at a research institute w/ economists) and only a couple of them shared any concern. It seems that by virtue of being published in ‘the AER’ the results are unquestionable. I agree that the idea is interesting and worth pursuing but as you say it’s one thing to go from that to asserting ‘causality’ (I still don’t know what definition of causality they’re using?). All the data torture along the way is just tipping the hat to convention rather than serving any scientific purpose. Some researchers are so uptight about identification that, when they think they have it, all their skepticism dissolves. Even in a case like this where that causal tr

6 0.98443127 415 andrew gelman stats-2010-11-15-The two faces of Erving Goffman: Subtle observer of human interactions, and Smug organzation man

7 0.98335189 1898 andrew gelman stats-2013-06-14-Progress! (on the understanding of the role of randomization in Bayesian inference)

8 0.98157281 551 andrew gelman stats-2011-02-02-Obama and Reagan, sitting in a tree, etc.

9 0.98117512 394 andrew gelman stats-2010-11-05-2010: What happened?

10 0.97612214 1336 andrew gelman stats-2012-05-22-Battle of the Repo Man quotes: Reid Hastie’s turn

11 0.97598094 998 andrew gelman stats-2011-11-08-Bayes-Godel

12 0.97061318 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

13 0.97040069 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

14 0.97002196 370 andrew gelman stats-2010-10-25-Who gets wedding announcements in the Times?

15 0.96984357 2303 andrew gelman stats-2014-04-23-Thinking of doing a list experiment? Here’s a list of reasons why you should think again

16 0.96912098 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

17 0.96832705 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

18 0.96713465 2236 andrew gelman stats-2014-03-07-Selection bias in the reporting of shaky research

19 0.96695524 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

20 0.9667818 1074 andrew gelman stats-2011-12-20-Reading a research paper != agreeing with its claims