andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1737 knowledge-graph by maker-knowledge-mining

1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?


meta info for this blog

Source: html

Introduction: Alex Hoffman points me to this interview by Dylan Matthews of education researcher Thomas Kane, who at one point says, Once you corrected for measurement error, a teacher’s score on their chosen videos and on their unchosen videos were correlated at 1. They were perfectly correlated. Hoffman asks, “What do you think? Do you think that just maybe, perhaps, it’s possible we ought to consider, I’m just throwing out the possibility that it might be that the procedure for correcting measurement error might, you know, be a little too strong?” I don’t know exactly what’s happening here, but it might be something that I’ve seen on occasion when fitting multilevel models using a point estimate for the group-level variance. It goes like this: measurement-error models are multilevel models, they involve the estimation of a distribution of a latent variable. When fitting multilevel models, it is possible to estimate the group-level variance to be zero, even though the group-level variance is not zero in real life.
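A minimal numerical sketch of how a "perfect" corrected correlation can appear (hypothetical numbers, not Kane's data or procedure): the classical correction for attenuation divides the observed correlation by the square root of the product of the two reliabilities, and because the reliabilities are themselves estimated with error, the corrected value can land at or above 1 even when the true latent correlation is well below 1.

```python
# Sketch only: made-up sample size, latent correlation, and noise level.
import numpy as np

rng = np.random.default_rng(1)
n, rho, noise_sd, n_sims = 40, 0.8, 1.0, 2000   # true latent correlation is 0.8, not 1
hits = 0
for _ in range(n_sims):
    latent = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    # two independent ratings per teacher in each condition (chosen / unchosen videos)
    chosen   = latent[:, [0, 0]] + rng.normal(0, noise_sd, (n, 2))
    unchosen = latent[:, [1, 1]] + rng.normal(0, noise_sd, (n, 2))
    rel_c = np.corrcoef(chosen[:, 0],   chosen[:, 1])[0, 1]    # estimated reliability of one rating
    rel_u = np.corrcoef(unchosen[:, 0], unchosen[:, 1])[0, 1]
    r_obs = np.corrcoef(chosen[:, 0],  unchosen[:, 0])[0, 1]
    r_corrected = r_obs / np.sqrt(rel_c * rel_u)               # correction for attenuation
    hits += r_corrected >= 1
print(f"true latent correlation = {rho}")
print(f"share of simulations with corrected r >= 1: {hits / n_sims:.2f}")
```

A corrected estimate that comes out at or above 1 is usually reported as 1, which is one way a perfect correlation can show up in an interview quote.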


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Alex Hoffman points me to this interview by Dylan Matthews of education researcher Thomas Kane, who at one point says, Once you corrected for measurement error, a teacher’s score on their chosen videos and on their unchosen videos were correlated at 1. [sent-1, score-1.387]

2 Do you think that just maybe, perhaps, it’s possible we ought to consider, I’m just throwing out the possibility that it might be that the procedure for correcting measurement error might, you know, be a little too strong? [sent-4, score-0.962]

3 I don’t know exactly what’s happening here, but it might be something that I’ve seen on occasion when fitting multilevel models using a point estimate for the group-level variance. [sent-5, score-1.152]

4 It goes like this: measurement-error models are multilevel models, they involve the estimation of a distribution of a latent variable. [sent-6, score-0.749]

5 When fitting multilevel models, it is possible to estimate the group-level variance to be zero, even though the group-level variance is not zero in real life. [sent-7, score-1.518]

6 We have a penalized-likelihood approach to keep the estimate away from zero (see this paper , to appear in Psychometrika) but this is not yet standard in computer packages. [sent-8, score-0.704]

7 The result is that in a multilevel model you can get estimates of zero variance or perfect correlations because the variation in the data is less than its expected value under the noise model. [sent-9, score-1.228]

8 With a full Bayesian approach, you’d find the correlation could take on a range of possible values, it’s not really equal to 1. [sent-10, score-0.416]
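To make the boundary phenomenon in sentences 5-8 concrete, here is a minimal sketch in the simplest hierarchical setting, with made-up group means and the sampling standard deviation of each group mean treated as known. The log(tau) penalty is my reading of the flavor of boundary-avoiding penalty proposed in the Psychometrika paper, not a reproduction of it.

```python
# When the group means vary less than the noise model predicts, the marginal
# MLE of the group-level sd tau is exactly zero; a log(tau) penalty (a
# gamma(2, ~0)-type prior on tau, as I understand the Psychometrika proposal)
# keeps the estimate away from the boundary.  Numbers below are made up.
import numpy as np
from scipy.optimize import minimize_scalar

ybar = np.array([0.2, -0.1, 0.4, 0.1, -0.3, 0.0])  # hypothetical group means
s = 0.5                                             # known sampling sd of each mean

def neg_marginal_loglik(tau):
    v = tau**2 + s**2
    mu_hat = ybar.mean()                            # profile out the grand mean
    return 0.5 * np.sum(np.log(v) + (ybar - mu_hat)**2 / v)

def neg_penalized_loglik(tau):
    return neg_marginal_loglik(tau) - np.log(tau)   # gamma(2, ~0)-type penalty

mle = minimize_scalar(neg_marginal_loglik,  bounds=(1e-6, 5), method="bounded")
pen = minimize_scalar(neg_penalized_loglik, bounds=(1e-6, 5), method="bounded")
print(f"marginal MLE of tau:       {mle.x:.3f}  (stuck at the zero boundary)")
print(f"penalized-likelihood mode: {pen.x:.3f}  (kept away from zero)")
```

With these numbers the spread of the group means is smaller than the sampling noise, so the marginal likelihood is maximized at tau = 0, while the penalized mode sits at a positive value.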


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('multilevel', 0.288), ('videos', 0.285), ('hoffman', 0.285), ('zero', 0.277), ('variance', 0.235), ('models', 0.174), ('kane', 0.173), ('measurement', 0.171), ('psychometrika', 0.164), ('fitting', 0.162), ('estimate', 0.162), ('possible', 0.159), ('dylan', 0.143), ('correcting', 0.126), ('error', 0.122), ('corrected', 0.12), ('occasion', 0.115), ('alex', 0.111), ('approach', 0.11), ('latent', 0.105), ('throwing', 0.103), ('chosen', 0.102), ('teacher', 0.1), ('involve', 0.1), ('perfectly', 0.099), ('possibility', 0.098), ('interview', 0.096), ('noise', 0.096), ('equal', 0.094), ('correlations', 0.093), ('might', 0.093), ('score', 0.092), ('happening', 0.091), ('procedure', 0.09), ('thomas', 0.09), ('correlated', 0.089), ('perfect', 0.086), ('asks', 0.084), ('computer', 0.082), ('correlation', 0.082), ('estimation', 0.082), ('range', 0.081), ('variation', 0.078), ('expected', 0.075), ('researcher', 0.074), ('education', 0.073), ('appear', 0.073), ('values', 0.072), ('exactly', 0.067), ('strong', 0.066)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?


2 0.19827475 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
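As a companion to the penalized-likelihood sketch earlier on this page, here is what the full-Bayes route looks like on the same made-up group means, using a hypothetical half-normal(1) prior on tau (the post does not specify a prior). The only point is that the posterior occupies a range of plausible values instead of collapsing to the boundary point estimate.

```python
# Grid approximation to the posterior of the group-level sd tau (a sketch,
# with the grand mean fixed at its estimate for simplicity).
import numpy as np

ybar = np.array([0.2, -0.1, 0.4, 0.1, -0.3, 0.0])   # same hypothetical group means
s = 0.5                                              # known sampling sd of each mean

def log_marginal(tau):
    v = tau**2 + s**2
    return -0.5 * np.sum(np.log(v) + (ybar - ybar.mean())**2 / v)

tau_grid = np.linspace(1e-4, 3.0, 3000)
dtau = tau_grid[1] - tau_grid[0]
log_post = np.array([log_marginal(t) for t in tau_grid]) - 0.5 * tau_grid**2  # half-normal(1) prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtau                            # normalize on the grid

cdf = np.cumsum(post) * dtau
q25 = tau_grid[np.searchsorted(cdf, 0.25)]
q75 = tau_grid[np.searchsorted(cdf, 0.75)]
print(f"posterior 50% interval for tau: [{q25:.2f}, {q75:.2f}]  (a range, not a point at 0)")
```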

3 0.18252866 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers) Level two: companies (90 companies) Level three: industry (17 industries) I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

4 0.17269067 165 andrew gelman stats-2010-07-27-Nothing is Linear, Nothing is Additive: Bayesian Models for Interactions in Social Science

Introduction: My talks at Cambridge this Wed and Thurs in the department of Machine Learning. Powerpoints are here and here. Also some videos are here (but no videos of the “Nothing is Linear, Nothing is Additive” talk).

5 0.15548328 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

Introduction: James O’Brien writes: How would you explain, to a “classically-trained” hypothesis-tester, that “It’s OK to fit a multilevel model even if some groups have only one observation each”? I [O'Brien] think I understand the logic and the statistical principles at work in this, but I’m having trouble being clear and persuasive. I also feel like I’m contending with some methodological conventional wisdom here. My reply: I’m so used to this idea that I find it difficult to defend it in some sort of general conceptual way. So let me retreat to a more functional defense, which is that multilevel modeling gives good estimates, especially when the number of observations per group is small. One way to see this in any particular example is through cross-validation. Another way is to consider the alternatives. If you try really hard you can come up with a “classical hypothesis testing” approach which will do as well as the multilevel model. It would just take a lot of work. I’d r

6 0.15423618 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

7 0.14984991 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

8 0.14254722 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

9 0.14033557 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

10 0.13878308 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

11 0.13711387 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

12 0.13673051 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations

13 0.13473135 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

14 0.13411824 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

15 0.13333307 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

16 0.13265483 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

17 0.13227174 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

18 0.13118072 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

19 0.12878585 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

20 0.12606829 1392 andrew gelman stats-2012-06-26-Occam


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.194), (1, 0.16), (2, 0.087), (3, -0.043), (4, 0.075), (5, 0.027), (6, 0.037), (7, -0.02), (8, 0.042), (9, 0.086), (10, 0.045), (11, 0.011), (12, 0.008), (13, 0.017), (14, 0.007), (15, -0.018), (16, -0.071), (17, -0.02), (18, 0.007), (19, -0.01), (20, -0.01), (21, -0.015), (22, 0.047), (23, 0.03), (24, -0.018), (25, -0.113), (26, -0.08), (27, 0.06), (28, -0.06), (29, -0.048), (30, 0.0), (31, 0.057), (32, 0.004), (33, -0.098), (34, 0.035), (35, -0.03), (36, 0.024), (37, -0.051), (38, 0.038), (39, -0.043), (40, -0.033), (41, 0.016), (42, -0.038), (43, 0.01), (44, -0.092), (45, 0.031), (46, 0.025), (47, 0.058), (48, -0.083), (49, -0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97879702 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?


2 0.86255294 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

Introduction: Subhadeep Mukhopadhyay writes: I am convinced of the power of hierarchical modeling and the individual parameter pooling concept. I was wondering how multi-level modeling could influence the estimate of the grand mean (NOT individual labels). My reply: Multilevel modeling will affect the estimate of the grand mean in two ways: 1. If the group-level mean is correlated with group size, then the partial pooling will change the estimate of the grand mean (and, indeed, you might want to include group size or some similar variable as a group-level predictor). 2. In any case, the extra error term(s) in a multilevel model will typically affect the standard error of everything, including the estimate of the grand mean.
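A minimal sketch of point 1 above (made-up numbers, with the within-group and group-level standard deviations treated as known rather than estimated): when larger groups have larger means, the raw grand mean, which weights groups by size, and the multilevel estimate, which uses precision weights that are closer to equal across groups, come out noticeably different.

```python
# Hypothetical example: group means rise with group size, so size weighting
# and multilevel precision weighting give different grand means.
import numpy as np

rng = np.random.default_rng(0)
sizes = np.array([5, 10, 20, 40, 80])              # group sizes
true_means = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # larger groups -> higher means
sigma, tau = 1.0, 0.5                              # within-group sd, group-level sd (assumed known)

groups = [rng.normal(m, sigma, n) for m, n in zip(true_means, sizes)]
ybar = np.array([g.mean() for g in groups])

raw_grand_mean = np.concatenate(groups).mean()     # weights groups by size
w = 1.0 / (tau**2 + sigma**2 / sizes)              # multilevel precision weights
ml_grand_mean = np.sum(w * ybar) / np.sum(w)

print(f"raw (size-weighted) grand mean: {raw_grand_mean:.2f}")
print(f"multilevel estimate of mu:      {ml_grand_mean:.2f}")
# They differ because partial pooling gives the large, high-mean groups
# less relative weight than raw size weighting does.
```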

3 0.77044219 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

Introduction: David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily because this is what I am writing a paper about. However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. For example, using JAGS, and th

4 0.75365567 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

Introduction: Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. Based on the recommendation given in the book by yourself and Dr. Hill (2007) I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance would be more appropriate in the context of a model with fixed district effects? My reply: I agree that th

5 0.74516392 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

Introduction: Cyrus writes: I [Cyrus] was teaching a class on multilevel modeling, and we were playing around with different methods to fit a random effects logit model with 2 random intercepts—one corresponding to “family” and another corresponding to “community” (labeled “mom” and “cluster” in the data, respectively). There are also a few regressors at the individual, family, and community level. We were replicating in part some of the results from the following paper: Improved estimation procedures for multilevel models with binary response: a case-study, by G Rodriguez, N Goldman. (I say “replicating in part” because we didn’t include all the regressors that they use, only a subset.) We were looking at the performance of estimation via glmer in R’s lme4 package, glmmPQL in R’s MASS package, and Stata’s xtmelogit. We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood. I was shocked to discover that glmer

6 0.72173375 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

7 0.71324909 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

8 0.71181613 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

9 0.70999968 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

10 0.70086324 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

11 0.69183832 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

12 0.6903581 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

13 0.68141007 948 andrew gelman stats-2011-10-10-Combining data from many sources

14 0.68094754 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

15 0.67989802 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

16 0.67783475 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

17 0.67607498 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

18 0.67285323 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

19 0.67094165 952 andrew gelman stats-2011-10-11-More reason to like Sims besides just his name

20 0.66988575 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(8, 0.033), (16, 0.066), (24, 0.177), (30, 0.015), (52, 0.017), (84, 0.011), (85, 0.014), (86, 0.032), (95, 0.206), (99, 0.318)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98089755 1973 andrew gelman stats-2013-08-08-For chrissake, just make up an analysis already! We have a lab here to run, y’know?

Introduction: Ben Hyde sends along this: Stuck in the middle of the supplemental data, reporting the total workup for their compounds, was this gem: Emma, please insert NMR data here! where are they? and for this compound, just make up an elemental analysis . . . I’m reminded of our recent discussions of coauthorship, where I argued that I see real advantages to having multiple people taking responsibility for the result. Jay Verkuilen responded: “On the flipside of collaboration . . . is diffusion of responsibility, where everybody thinks someone else ‘has that problem’ and thus things don’t get solved.” That’s what seems to have happened (hilariously) here.

2 0.98056459 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

Introduction: [Figure caption: corrected equation] This post is by Phil. In the comments to an earlier post, I mentioned a problem I am struggling with right now. Several people mentioned having (and solving!) similar problems in the past, so this seems like a great way for me and a bunch of other blog readers to learn something. I will describe the problem, one or more of you will tell me how to solve it, and you will win… wait for it… my thanks, and the approval and admiration of your fellow blog readers, and a big thank-you in any publication that includes results from fitting the model. You can’t ask fairer than that! Here’s the problem. The goal is to estimate six parameters that characterize the leakiness (or air-tightness) of a house with an attached garage. We are specifically interested in the parameters that describe the connection between the house and the garage; this is of interest because of the effect on the air quality in the house if there are toxic chemic

3 0.97991395 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

Introduction: Greg Kaplan writes: I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached. Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. The change in imputation procedures — not any actual change in migration behavior — explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010. I haven’t had a chance to give a serious look so could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to bett

4 0.97734416 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

Introduction: Andrew Mack writes: There was a brief commentary from the Benetech folk on the Human Security Report Project’s “The Shrinking Costs of War” report on your blog in January. But the report has since generated a lot of public controversy. Since the report–like the current discussion in your blog on Mike Spagat’s new paper on Iraq–deals with controversies generated by survey-based excess death estimates, we thought your readers might be interested. Our responses to the debate were posted on our website last week. “Shrinking Costs” had discussed the dramatic decline in death tolls from wartime violence since the end of World War II–and its causes. We also argued that deaths from war-exacerbated disease and malnutrition had declined. (The exec. summary is here.) One of the most striking findings was that mortality rates (we used under-five mortality data) decline during most wars. Indeed our latest research indicates that of the total number of years that countries w

5 0.97603756 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America

Introduction: Robin Hanson writes: On the criteria of potential to help people avoid death, this would seem to be among the most important news I’ve ever heard. [In his recent Ph.D. thesis, Ken Lee finds that] death rates depend on job details more than on race, gender, marriage status, rural vs. urban, education, and income combined! Now for the details. The US Department of Labor has described each of 807 occupations with over 200 detailed features on how jobs are done, skills required, etc. Lee looked at seven domains of such features, each containing 16 to 57 features, and for each domain Lee did a factor analysis of those features to find the top 2-4 factors. This gave Lee a total of 22 domain factors. Lee also found four overall factors to describe his total set of 225 job and 9 demographic features. (These four factors explain 32%, 15%, 7%, and 4% of total variance.) Lee then tried to use these 26 job factors, along with his other standard predictors (age, race, gender, m

6 0.97216463 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly

7 0.97073853 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

8 0.96820271 266 andrew gelman stats-2010-09-09-The future of R

9 0.96653605 1308 andrew gelman stats-2012-05-08-chartsnthings !

same-blog 10 0.95498335 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

11 0.95365 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

12 0.95107347 1820 andrew gelman stats-2013-04-23-Foundation for Open Access Statistics

13 0.95010757 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

14 0.94745159 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?

15 0.94426095 1758 andrew gelman stats-2013-03-11-Yes, the decision to try (or not) to have a child can be made rationally

16 0.94220448 1070 andrew gelman stats-2011-12-19-The scope for snooping

17 0.9418633 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)

18 0.93624967 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

19 0.93406832 1595 andrew gelman stats-2012-11-28-Should Harvard start admitting kids at random?

20 0.93394327 1646 andrew gelman stats-2013-01-01-Back when fifty years was a long time ago