andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-704 knowledge-graph by maker-knowledge-mining

704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis


meta information for this blog

Source: html

Introduction: Robert Birkelbach: I am writing my Bachelor Thesis in which I want to assess the reading competencies of German elementary school children using the PIRLS2006 data. My levels are classrooms and the individuals. However, my dependent variable is a multiply imputed (m=5) reading test. The problem I have is that I do not know whether I can just calculate 5 linear multilevel models and then average all the results (the coefficients, standard deviation, BIC, intraclass correlation, R2, t-statistics, p-values etc.) or if I need different formulas for integrating the results of the five models into one because it is a multilevel analysis. Do you think there’s a better way of solving my problem? I would greatly appreciate it if you could help me with a problem regarding my analysis; I am quite a newbie to multilevel modeling and especially to multiple imputation. Also: Is it okay to use frequentist models when the multiple imputation was done Bayesian? Would the different philosophies of sc


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Robert Birkelbach: I am writing my Bachelor Thesis in which I want to assess the reading competencies of German elementary school children using the PIRLS2006 data. [sent-1, score-0.783]

2 However, my dependent variable is a multiply imputed (m=5) reading test. [sent-3, score-0.718]

3 Do you think there’s a better way of solving my problem? [sent-5, score-0.134]

4 I would greatly appreciate it if you could help me with a problem regarding my analysis; I am quite a newbie to multilevel modeling and especially to multiple imputation. [sent-6, score-1.186]

5 Also: Is it okay to use frequentist models when the multiple imputation was done Bayesian? [sent-7, score-0.779]

6 Would the different philosophies of scientific testing contradict each other? [sent-8, score-0.457]

7 My reply: I recommend doing 5 separate analyses, pushing them all the way thru to the end, then combining them using the combining-imputation rules given in the imputation chapter of our book. [sent-9, score-0.777]
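
The combining step in the reply can be sketched in a few lines. This is only a minimal sketch, assuming that the "combining-imputation rules" are the standard Rubin's rules for multiply imputed data: average the point estimates, then add the average within-imputation variance to the inflated between-imputation variance. The function name pool_estimates and the example numbers are hypothetical and do not come from the post or from PIRLS.

```python
from math import sqrt
from statistics import mean, variance


def pool_estimates(estimates, std_errors):
    """Pool one coefficient across m imputed-data analyses (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need matching lists from at least 2 imputations")
    pooled = mean(estimates)                       # average of the m point estimates
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations (divides by m - 1)
    total = within + (1 + 1 / m) * between         # total variance of the pooled estimate
    return pooled, sqrt(total)


# Hypothetical coefficient and standard error from each of the 5 multilevel fits.
coefs = [0.50, 0.52, 0.48, 0.51, 0.49]
ses = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled_coef, pooled_se = pool_estimates(coefs, ses)
print(pooled_coef, pooled_se)
```

Under these rules the pooled standard error is never smaller than the typical single-imputation standard error, because the spread between imputations is added on top. Derived quantities such as t-statistics and p-values would then be computed from the pooled estimate and pooled standard error rather than averaged across the five fits.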


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('multilevel', 0.25), ('multiple', 0.229), ('competencies', 0.2), ('newbie', 0.189), ('integrating', 0.181), ('classrooms', 0.181), ('bachelor', 0.181), ('bic', 0.174), ('thru', 0.165), ('okay', 0.161), ('formulas', 0.158), ('philosophies', 0.155), ('models', 0.15), ('elementary', 0.15), ('imputed', 0.15), ('german', 0.145), ('solving', 0.134), ('contradict', 0.132), ('problem', 0.131), ('pushing', 0.131), ('reading', 0.13), ('calculate', 0.129), ('greatly', 0.128), ('dependent', 0.127), ('imputation', 0.126), ('thesis', 0.12), ('assess', 0.12), ('deviation', 0.119), ('combining', 0.119), ('frequentist', 0.113), ('results', 0.111), ('coefficients', 0.101), ('children', 0.099), ('robert', 0.097), ('separate', 0.097), ('appreciate', 0.096), ('levels', 0.096), ('rules', 0.096), ('correlation', 0.095), ('five', 0.095), ('testing', 0.091), ('linear', 0.089), ('analyses', 0.087), ('analysis', 0.086), ('etc', 0.085), ('chapter', 0.085), ('using', 0.084), ('variable', 0.082), ('different', 0.079), ('regarding', 0.077)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis


2 0.19152057 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

Introduction: Vishnu Ganglani writes: It appears that multiple imputation is the best way to impute missing data because of the more accurate quantification of variance. However, when imputing missing data for income values in national household surveys, would you recommend maintaining the multiple datasets associated with multiple imputations, or would a single imputation method suffice? I have worked on household survey projects (in Scotland) and in the past gone with suggesting single methods for ease of implementation, but with the availability of open source R software I am thinking of performing multiple imputation methodologies, but am a bit apprehensive because of the complexity and also the need to maintain multiple datasets (ease of implementation). My reply: In many applications I’ve just used a single random imputation to avoid the awkwardness of working with multiple datasets. But if there’s any concern, I’d recommend doing parallel analyses on multipl

3 0.17121302 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers) Level two: company (90 companies) Level three: industry (17 industries) I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

4 0.17062077 935 andrew gelman stats-2011-10-01-When should you worry about imputed data?

Introduction: Majid Ezzati writes: My research group is increasingly focusing on a series of problems that involve data that either have missingness or measurements that may have bias/error. We have at times developed our own approaches to imputation (as simple as interpolating a missing unit and as sophisticated as a problem-specific Bayesian hierarchical model) and at other times, other groups impute the data. The outputs are being used to investigate the basic associations between pairs of variables, Xs and Ys, in regressions; we may or may not interpret these as causal. I am contacting colleagues with relevant expertise to suggest good references on whether having imputed X and/or Y in a subsequent regression is correct or if it could somehow lead to biased/spurious associations. Thinking about this, we can have at least the following situations (these could all be Bayesian or not): 1) X and Y both measured (perhaps with error) 2) Y imputed using some data and a model and X measur

5 0.15840074 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

Introduction: Steve Miller writes: Much of what I do is cross-national analyses of survey data (largely World Values Survey). . . . My big question pertains to (what I would call) exploratory analysis of multilevel data, especially when the group-level predictors are of theoretical importance. A lot of what I do involves analyzing cross-national survey items of citizen attitudes, typically of political leadership. These survey items are usually yes/no responses, or four-part responses indicating a level of agreement (strongly agree, agree, disagree, strongly disagree) that can be condensed into a binary variable. I believe these can be explained by reference to country-level factors. Much of the group-level variables of interest are count variables with a modal value of 0, which can be quite messy. How would you recommend exploring the variation in the dependent variable as it could be explained by the group-level count variable of interest, before fitting the multilevel model itself? When

6 0.15520203 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

7 0.14424828 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

8 0.14284311 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

9 0.14197356 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

10 0.13767579 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

11 0.13721129 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

12 0.13118535 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

13 0.12030046 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

14 0.1197759 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

15 0.11864217 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

16 0.1175743 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

17 0.11496706 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

18 0.11347898 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons

19 0.11254824 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

20 0.11093525 1586 andrew gelman stats-2012-11-21-Readings for a two-week segment on Bayesian modeling?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.176), (1, 0.116), (2, 0.019), (3, -0.039), (4, 0.069), (5, 0.055), (6, -0.008), (7, -0.0), (8, 0.088), (9, 0.08), (10, 0.043), (11, 0.03), (12, 0.013), (13, -0.024), (14, 0.096), (15, 0.002), (16, -0.041), (17, -0.03), (18, -0.018), (19, 0.011), (20, -0.01), (21, 0.07), (22, 0.009), (23, 0.012), (24, -0.066), (25, -0.119), (26, -0.001), (27, -0.026), (28, -0.003), (29, -0.009), (30, -0.001), (31, 0.014), (32, 0.059), (33, 0.055), (34, -0.002), (35, -0.035), (36, 0.061), (37, 0.057), (38, 0.051), (39, 0.019), (40, -0.049), (41, 0.021), (42, 0.002), (43, -0.076), (44, -0.049), (45, -0.035), (46, 0.02), (47, 0.06), (48, -0.042), (49, -0.073)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97557831 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis


2 0.77639681 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers) Level two: company (90 companies) Level three: industry (17 industries) I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

3 0.77602249 848 andrew gelman stats-2011-08-11-That xkcd cartoon on multiple comparisons that all of you were sending me a couple months ago

Introduction: John Transue sent it in with the following thoughtful comment: I’d imagine you’ve already received this, but just in case, here’s a cartoon you’d like. At first blush it seems to go against your advice (more nuanced than what I’m about to say by quoting the paper title) to not worry about multiple comparisons. However, if I understand correctly your argument about multiple comparisons in multilevel models, the situation in this comic might have been avoided if shrinkage toward the grand mean (of all colors) had prevented the greens from clearing the .05 threshold. Is that right?

4 0.75049597 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

Introduction: Nelson Villoria writes: I find the multilevel approach very useful for a problem I am dealing with, and I was wondering whether you could point me to some references about poolability tests for multilevel models. I am working with time series of cross sectional data and I want to test whether the data supports cross sectional and/or time pooling. In a standard panel data setting I do this with Chow tests and/or CUSUM. Are these ideas directly transferable to the multilevel setting? My reply: I think you should do partial pooling. Once the question arises, just do it. Other models are just special cases. I don’t see the need for any test. That said, if you do a group-level model, you need to consider including group-level averages of individual predictors (see here ). And if the number of groups is small, there can be real gains from using an informative prior distribution on the hierarchical variance parameters. This is something that Jennifer and I do not discuss in our

5 0.7263431 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

Introduction: There are a few things I want to do: 1. Understand a fitted model using tools such as average predictive comparisons , R-squared, and partial pooling factors . In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to): - Separating the inputs and averaging over all possible values of the input not being altered (for average predictive comparisons); - Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data); - Defining an R-squared for each level of a multilevel model. The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. So now we want to implement these in R and put them into arm along with bglmer etc. 2. Setting up coefplot so it works more generally (that is, so the graphics look nice

6 0.70884526 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

7 0.70778984 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

8 0.70692414 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

9 0.70606565 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

10 0.70540255 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

11 0.7024675 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

12 0.70038348 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

13 0.69936413 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

14 0.69629514 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

15 0.69620496 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

16 0.69264734 2296 andrew gelman stats-2014-04-19-Index or indicator variables

17 0.68883604 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

18 0.68465793 948 andrew gelman stats-2011-10-10-Combining data from many sources

19 0.675713 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

20 0.67251468 608 andrew gelman stats-2011-03-12-Single or multiple imputation?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.03), (16, 0.131), (24, 0.077), (40, 0.032), (44, 0.018), (47, 0.019), (52, 0.015), (62, 0.16), (63, 0.012), (86, 0.055), (89, 0.014), (99, 0.332)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95288998 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis


2 0.92948759 986 andrew gelman stats-2011-11-01-MacKay update: where 12 comes from

Introduction: In reply to my question, David MacKay writes: You said that you can imagine rounding up 9 to 10 – which would be elegant if we worked in base 10. But in the UK we haven’t switched to base 10 yet, we still work in dozens and grosses. (One gross = 12^2 = 144.) So I was taught (by John Skilling, probably) “a dozen samples are plenty”. Probably in an earlier draft of the book in 2001 I said “a dozen”, rather than “12”. Then some feedbacker may have written and said “I don’t know what a dozen is”; so then I sacrificed elegant language and replaced “dozen” by “12”, which leads to your mystification. PS – please send the winner of your competition a free copy of my other book (sewtha) too, from me. PPS I see that Mikkel Schmidt [in your comments] has diligently found the correct answer, which I guessed above. I suggest you award the prizes to him. OK, we’re just giving away books here! P.S. See here for my review of MacKay’s book on sustainable energy.

3 0.92680776 260 andrew gelman stats-2010-09-07-QB2

Introduction: Dave Berri writes: Saw you had a post on the research I did with Rob Simmons on the NFL draft. I have attached the article. This article has not officially been published, so please don’t post this on-line. The post you linked to states the following: “On his blog, Berri says he restricts the analysis to QBs who have played more than 500 downs, or for 5 years. He also looks at per-play statistics, like touchdowns per game, to counter what he considers an opportunity bias.” Two points: First of all, we did not look at touchdowns per game (that is not a per play stat). More importantly — as this post indicates — we did far more than just look at data after five years. We did mention the five year result, but directly below that discussion (and I mean, directly below), the following sentences appear. Our data set runs from 1970 to 2007 (adjustments were made for how performance changed over time). We also looked at career performance after 2, 3, 4, 6, 7, and 8 years

4 0.9257164 715 andrew gelman stats-2011-05-16-“It doesn’t matter if you believe in God. What matters is if God believes in you.”

Introduction: Mark Chaves sent me this great article on religion and religious practice: After reading a book or article in the scientific study of religion, I [Chaves] wonder if you ever find yourself thinking, “I just don’t believe it.” I have this experience uncomfortably often, and I think it’s because of a pervasive problem in the scientific study of religion. I want to describe that problem and how to overcome it. The problem is illustrated in a story told by Meyer Fortes. He once asked a rainmaker in a native culture he was studying to perform the rainmaking ceremony for him. The rainmaker refused, replying: “Don’t be a fool, whoever makes a rain-making ceremony in the dry season?” The problem is illustrated in a different way in a story told by Jay Demerath. He was in Israel, visiting friends for a Sabbath dinner. The man of the house, a conservative rabbi, stopped in the middle of chanting the prayers to say cheerfully: “You know, we don’t believe in any of this. But then in Judai

5 0.90966076 156 andrew gelman stats-2010-07-20-Burglars are local

Introduction: This makes sense: In the land of fiction, it’s the criminal’s modus operandi – his method of entry, his taste for certain jewellery and so forth – that can be used by detectives to identify his handiwork. The reality according to a new analysis of solved burglaries in the Northamptonshire region of England is that these aspects of criminal behaviour are on their own unreliable as identifying markers, most likely because they are dictated by circumstances rather than the criminal’s taste and style. However, the geographical spread and timing of a burglar’s crimes are distinctive, and could help with police investigations. And, as a bonus, more Tourette’s pride! P.S. On yet another unrelated topic from the same blog, I wonder if the researchers in this study are aware that the difference between “significant” and “not significant” is not itself statistically significant .

6 0.90579844 339 andrew gelman stats-2010-10-13-Battle of the NYT opinion-page economists

7 0.90534085 2130 andrew gelman stats-2013-12-11-Multilevel marketing as a way of liquidating participants’ social networks

8 0.90337211 154 andrew gelman stats-2010-07-18-Predictive checks for hierarchical models

9 0.90286744 722 andrew gelman stats-2011-05-20-Why no Wegmania?

10 0.90088201 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”

11 0.89963812 2131 andrew gelman stats-2013-12-12-My talk at Leuven, Sat 14 Dec

12 0.89922881 2280 andrew gelman stats-2014-04-03-As the boldest experiment in journalism history, you admit you made a mistake

13 0.89917946 1832 andrew gelman stats-2013-04-29-The blogroll

14 0.89909679 2082 andrew gelman stats-2013-10-30-Berri Gladwell Loken football update

15 0.89846075 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles

16 0.89824247 859 andrew gelman stats-2011-08-18-Misunderstanding analysis of covariance

17 0.89786208 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis

18 0.89638901 452 andrew gelman stats-2010-12-06-Followup questions

19 0.89631361 2107 andrew gelman stats-2013-11-20-NYT (non)-retraction watch

20 0.89580953 962 andrew gelman stats-2011-10-17-Death!