andrew_gelman_stats-2012-1644 knowledge-graph by maker-knowledge-mining

1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?


meta info for this blog

Source: html

Introduction: Stuart Buck writes: I have a question about fixed effects vs. random effects. Amongst economists who study teacher value-added, it has become common to see people saying that they estimated teacher fixed effects (via least squares dummy variables, so that there is a parameter for each teacher), but that they then applied empirical Bayes shrinkage so that the teacher effects are brought closer to the mean. (See this paper by Jacob and Lefgren, for example.) Can that really be what they are doing? Why wouldn’t they just run random (modeled) effects in the first place? I feel like there’s something I’m missing. My reply: I don’t know the full story here, but I’m thinking there are two goals, first to get an unbiased estimate of an overall treatment effect (and there the econometricians prefer so-called fixed effects; I disagree with them on this but I know where they’re coming from) and second to estimate individual teacher effects (and there it makes sense to use so-called
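The two-step procedure Stuart describes (least squares with teacher dummies, then empirical Bayes shrinkage toward the mean) is easy to sketch. Below is a minimal illustration on simulated data with a balanced design, where the dummy-variable estimate of each teacher's effect reduces to that teacher's mean outcome; the parameter values are made up, and this is the general idea, not the Jacob and Lefgren implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated balanced design: J teachers, m students per teacher (made-up values).
J, m, tau, sigma = 50, 20, 0.5, 1.0
true_effect = rng.normal(0.0, tau, J)
teacher = np.repeat(np.arange(J), m)
y = true_effect[teacher] + rng.normal(0.0, sigma, J * m)

# Step 1: "fixed effects." With a balanced design and no other predictors,
# the least-squares dummy-variable estimate is each teacher's mean outcome.
fe = np.array([y[teacher == j].mean() for j in range(J)])

# Step 2: empirical Bayes shrinkage toward the grand mean. Estimate each
# estimate's sampling variance (within-teacher variance / m) and the
# between-teacher variance by the method of moments, then shrink by the
# usual reliability ratio tau^2 / (tau^2 + sampling variance).
samp_var = np.array([y[teacher == j].var(ddof=1) / m for j in range(J)])
tau2_hat = max(fe.var(ddof=1) - samp_var.mean(), 0.0)
shrink = tau2_hat / (tau2_hat + samp_var)
eb = fe.mean() + shrink * (fe - fe.mean())

print("sd of raw estimates:", fe.std(), " sd after shrinkage:", eb.std())
```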


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Stuart Buck writes: I have a question about fixed effects vs. [sent-1, score-0.693]

2 Why wouldn’t they just run random (modeled) effects in the first place? [sent-6, score-0.781]


similar blogs computed by the tf-idf model

tf-idf weights for this blog:

wordName wordTfidf (topN-words)

[('teacher', 0.513), ('effects', 0.442), ('fixed', 0.251), ('random', 0.202), ('overall', 0.17), ('dummy', 0.161), ('buck', 0.146), ('amongst', 0.143), ('stuart', 0.133), ('shrink', 0.133), ('econometricians', 0.133), ('jacob', 0.127), ('shrinkage', 0.126), ('unbiased', 0.115), ('modeled', 0.113), ('estimate', 0.111), ('brought', 0.111), ('squares', 0.111), ('closer', 0.098), ('goals', 0.088), ('disagree', 0.084), ('predictors', 0.084), ('empirical', 0.082), ('bayes', 0.078), ('via', 0.078), ('estimated', 0.078), ('parameter', 0.077), ('economists', 0.076), ('treatment', 0.075), ('toward', 0.074), ('become', 0.072), ('prefer', 0.072), ('common', 0.069), ('first', 0.069), ('run', 0.068), ('coming', 0.066), ('individual', 0.065), ('variables', 0.065), ('place', 0.065), ('full', 0.064), ('although', 0.063), ('wouldn', 0.062), ('applied', 0.06), ('regression', 0.057), ('know', 0.057), ('second', 0.056), ('feel', 0.055), ('saying', 0.054), ('thinking', 0.054), ('effect', 0.053)]
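For readers curious how a (word, weight) list like the one above and the simValue scores below are produced, here is a minimal tf-idf sketch using scikit-learn. The three document snippets are stand-ins, and the mining pipeline's actual tokenization and weighting settings are not documented here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "fixed effects followed by bayes shrinkage teacher effects",  # this post
    "so-called fixed and random effects",                         # a neighbor
    "blogads update",                                             # unrelated
]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)              # document-term tf-idf matrix

# Cosine similarity of the first post against all posts (the simValue column)
print(cosine_similarity(X[0], X)[0])

# Top-weighted terms for the first post (the wordName/wordTfidf pairs)
weights = X[0].toarray().ravel()
terms = vec.get_feature_names_out()
print(sorted(zip(terms, weights), key=lambda t: -t[1])[:5])
```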

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?


2 0.28568819 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

Introduction: Someone writes: I am hoping you can give me some advice about when to use a fixed and when to use a random effects model. I am currently working on a paper that examines the effect of . . . by comparing states . . . It got reviewed . . . by three economists and all suggest that we run a fixed effects model. We ran a hierarchical model in the paper that allows the intercept and slope to vary before and after . . . My question is which is correct? We have run it both ways and really it makes no difference which model you run, the results are very similar. But for my own learning, I would really like to understand which to use under what circumstances. Is the fact that we use the whole population reason enough to just run a fixed effect model? Perhaps you can suggest a good reference to this question of when to run a fixed vs. random effects model. I’m not always sure what is meant by a “fixed effects model”; see my paper on Anova for discussion of the problems with this terminology: http://w

3 0.27641758 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

Introduction: Tom Clark writes: Drew Linzer and I [Tom] have been working on a paper about the use of modeled (“random”) and unmodeled (“fixed”) effects. Not directly in response to the paper, but in conversations about the topic over the past few months, several people have said to us things to the effect of “I prefer fixed effects over random effects because I care about identification.” Neither Drew nor I has any idea what this comment is supposed to mean. Have you come across someone saying something like this? Do you have any thoughts about what these people could possibly mean? I want to respond to this concern when people raise it, but I have failed thus far to inquire what is meant and so do not know what to say. My reply: I have a “cultural” reply, which is that so-called fixed effects are thought to make fewer assumptions, and making fewer assumptions is considered a generally good thing that serious people do, and identification is considered a concern of serious people, so they g

4 0.2291704 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?

Introduction: Jacob Hartog writes the following in reaction to my post on the use of value-added modeling for teacher assessment: What I [Hartog] think has been inadequately discussed is the use of individual model specifications to assign these teacher ratings, rather than the zone of agreement across a broad swath of model specifications. For example, the model used by NYCDOE doesn’t just control for a student’s prior year test score (as I think everyone can agree is a good idea). It also assumes that different demographic groups will learn different amounts in a given year, and assigns a school-level random effect. The result is that, as was much ballyhooed at the time of the release of the data, the average teacher rating for a given school is roughly the same, no matter whether the school is performing great or terribly. The headline from this was “excellent teachers spread evenly across the city’s schools,” rather than “the specification of these models assumes that excellent teachers are

5 0.20441678 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

Introduction: Dean Eckles writes: I make extensive use of random effects models in my academic and industry research, as they are very often appropriate. However, with very large data sets, I am not sure what to do. Say I have thousands of levels of a grouping factor, and the number of observations totals in the billions. Despite having lots of observations, I am often either dealing with (a) small effects or (b) trying to fit models with many predictors. So I would really like to use a random effects model to borrow strength across the levels of the grouping factor, but I am not sure how to practically do this. Are you aware of any approaches to fitting random effects models (including approximations) that work for very large data sets? For example, applying a procedure to each group, and then using the results of this to shrink each fit in some appropriate way. Just to clarify, here I am only worried about the non-crossed and in fact single-level case. I don’t see any easy route for cross
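The "fit each group, then shrink each fit" idea in Eckles' question has a cheap closed-form version when only group means are needed: one pass over the data accumulates per-group counts, sums, and sums of squares, and partial pooling is then applied to those sufficient statistics alone. This is a hypothetical sketch assuming a normal model with a common within-group variance, not a general recipe for the crossed or many-predictor cases he also asks about.

```python
import numpy as np

def shrink_group_means(n, s, ss):
    """Partial pooling from one-pass sufficient statistics.

    n, s, ss: per-group count, sum(y), and sum(y^2), accumulated in a
    single streaming pass so the raw data never has to fit in memory.
    """
    ybar = s / n
    within = (ss - s**2 / n).sum() / (n.sum() - len(n))   # pooled sigma^2
    grand = s.sum() / n.sum()
    # Method-of-moments estimate of the between-group variance, floored at 0
    tau2 = max(ybar.var(ddof=1) - (within / n).mean(), 0.0)
    w = tau2 / (tau2 + within / n)          # per-group shrinkage factor
    return grand + w * (ybar - grand)

# Toy usage with simulated grouped data (values are made up)
rng = np.random.default_rng(1)
groups = rng.integers(0, 100, 20_000)
y = 0.3 * rng.normal(size=100)[groups] + rng.normal(size=20_000)
n = np.bincount(groups).astype(float)
s = np.bincount(groups, weights=y)
ss = np.bincount(groups, weights=y**2)
print(shrink_group_means(n, s, ss)[:5])
```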

6 0.20339186 226 andrew gelman stats-2010-08-23-More on those L.A. Times estimates of teacher effectiveness

7 0.19949117 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

8 0.19522546 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

9 0.18490437 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

10 0.18051361 606 andrew gelman stats-2011-03-10-It’s no fun being graded on a curve

11 0.17056434 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

12 0.16674156 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

13 0.15389428 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

14 0.15028784 1274 andrew gelman stats-2012-04-21-Value-added assessment political FAIL

15 0.1502694 1620 andrew gelman stats-2012-12-12-“Teaching effectiveness” as another dimension in cognitive ability

16 0.15020739 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

17 0.14973705 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

18 0.14491449 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

19 0.14370601 342 andrew gelman stats-2010-10-14-Trying to be precise about vagueness

20 0.13657698 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff


similar blogs computed by the LSI model

LSI topic weights for this blog:

topicId topicWeight

[(0, 0.168), (1, 0.083), (2, 0.102), (3, -0.132), (4, 0.078), (5, 0.018), (6, 0.049), (7, 0.004), (8, 0.046), (9, 0.059), (10, -0.024), (11, 0.035), (12, 0.058), (13, -0.057), (14, 0.082), (15, -0.006), (16, -0.092), (17, 0.065), (18, -0.084), (19, 0.107), (20, -0.069), (21, -0.004), (22, 0.037), (23, 0.003), (24, 0.009), (25, -0.065), (26, -0.133), (27, 0.145), (28, -0.092), (29, 0.018), (30, -0.034), (31, 0.042), (32, -0.022), (33, -0.07), (34, 0.039), (35, -0.036), (36, -0.063), (37, -0.007), (38, 0.02), (39, 0.006), (40, 0.072), (41, 0.06), (42, 0.015), (43, 0.041), (44, -0.046), (45, 0.067), (46, 0.027), (47, -0.064), (48, -0.024), (49, 0.005)]
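The (topicId, topicWeight) pairs above come from a latent semantic indexing model with 50 components. A minimal sketch of how such weights are typically computed, via truncated SVD of the tf-idf matrix; the document snippets and component count are stand-ins, since the actual pipeline's settings are unknown.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "fixed effects followed by bayes shrinkage",
    "fixed effects and identification",
    "hierarchical multilevel modeling with big data",
]
X = TfidfVectorizer().fit_transform(docs)

lsi = TruncatedSVD(n_components=2, random_state=0)  # the list above uses 50
Z = lsi.fit_transform(X)   # row j holds the topic weights of document j
print(Z[0])                                # this post's topic weights
print(cosine_similarity(Z[:1], Z)[0])      # simValues in the LSI space
```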

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98534435 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?


2 0.79929304 1241 andrew gelman stats-2012-04-02-Fixed effects and identification


3 0.79696423 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

Introduction: Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. Based on the recommendation given in the book by yourself and Dr. Hill (2007) I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance were more appropriate in the context of a model with fixed district effects? My reply: I agree that th
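The distinction Seppa raises is easy to see numerically: the superpopulation variance is the parameter tau^2 of the assumed normal distribution, while the finite-population variance is the spread of the 21 district effects actually realized. A toy illustration with an arbitrary tau:

```python
import numpy as np

rng = np.random.default_rng(4)
tau = 0.5                           # superpopulation sd (hypothetical value)
alpha = rng.normal(0.0, tau, 21)    # the 21 realized district effects

print("superpopulation variance:", tau**2)
print("finite-population variance:", alpha.var(ddof=1))  # varies by draw
```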

4 0.76939803 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

Introduction: I received the following email from someone who wishes to remain anonymous: My colleague and I are trying to understand the best way to approach a problem involving measuring a group of individuals’ abilities across time, and are hoping you can offer some guidance. We are trying to analyze the combined effect of two distinct groups of people (A and B, with no overlap between A and B) who collaborate to produce a binary outcome, using a mixed logistic regression along the lines of the following: Outcome ~ (1 | A) + (1 | B) + Other variables. What we’re interested in testing is whether the observed A random effects in period 1 are predictive of the A random effects in the following period 2. Our idea is to create two models, each using a different period’s worth of data, to create two sets of A coefficients, then observe the relationship between the two. If the A’s have a persistent ability across periods, the coefficients should be correlated or show a linear-ish relationshi
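The check proposed in the excerpt, fit once per period and correlate the group-A effects across periods, can be illustrated without the full mixed logistic regression. The stand-in below gives each A-group a persistent ability, simulates binary outcomes per period, and correlates continuity-corrected empirical logits; it is a drastically simplified sketch with made-up parameters, not the `Outcome ~ (1 | A) + (1 | B) + ...` model itself.

```python
import numpy as np

rng = np.random.default_rng(2)
J, m = 200, 400                      # A-groups and observations per group
ability = rng.normal(0.0, 0.8, J)    # persistent per-group ability

def period_logits():
    """Per-group empirical log-odds for one period of data."""
    p = 1.0 / (1.0 + np.exp(-(ability + rng.normal(0.0, 0.3, J))))
    successes = rng.binomial(m, p)
    rate = (successes + 0.5) / (m + 1.0)    # continuity correction
    return np.log(rate / (1.0 - rate))

logit1, logit2 = period_logits(), period_logits()
print("cross-period correlation:", np.corrcoef(logit1, logit2)[0, 1])
```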

5 0.76900405 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

Introduction: Medical researchers care about main effects, psychologists care about interactions. In psychology, the main effects are typically obvious, and it’s only the interactions that are worth studying.

6 0.76586699 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

7 0.7486127 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

8 0.72777617 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

9 0.72642595 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

10 0.70929325 1513 andrew gelman stats-2012-09-27-Estimating seasonality with a data set that’s just 52 weeks long

11 0.70585144 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

12 0.69752121 963 andrew gelman stats-2011-10-18-Question on Type M errors

13 0.68761683 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

14 0.67795056 226 andrew gelman stats-2010-08-23-More on those L.A. Times estimates of teacher effectiveness

15 0.67181981 2165 andrew gelman stats-2014-01-09-San Fernando Valley cityscapes: An example of the benefits of fractal devastation?

16 0.66975522 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

17 0.66667104 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

18 0.65834457 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

19 0.64824581 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

20 0.64326966 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects


similar blogs computed by the LDA model

LDA topic weights for this blog:

topicId topicWeight

[(16, 0.098), (21, 0.033), (24, 0.267), (45, 0.042), (54, 0.02), (79, 0.021), (86, 0.04), (89, 0.017), (99, 0.338)]
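The sparse (topicId, topicWeight) pairs above are the kind of per-document output an LDA model produces, reported only where the weight is non-negligible. A minimal sketch with stand-in snippets; the real model's number of topics and priors are unknown.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "fixed effects followed by bayes shrinkage",
    "what is a prior distribution",
    "understanding posterior p-values",
]
counts = CountVectorizer().fit_transform(docs)   # LDA works on raw counts

lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(counts)    # each row sums to 1 across topics

# Keep only topics above a threshold, like the sparse list above
print([(k, round(w, 3)) for k, w in enumerate(theta[0]) if w > 0.05])
```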

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98936141 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?


2 0.98593801 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start

3 0.98589408 807 andrew gelman stats-2011-07-17-Macro causality

Introduction: David Backus writes: This is from my area of work, macroeconomics. The suggestion here is that the economy is growing slowly because consumers aren’t spending money. But how do we know it’s not the reverse: that consumers are spending less because the economy isn’t doing well. As a teacher, I can tell you that it’s almost impossible to get students to understand that the first statement isn’t obviously true. What I’d call the demand-side story (more spending leads to more output) is everywhere, including this piece, from the usually reliable David Leonhardt. This whole situation reminds me of the story of the village whose inhabitants support themselves by taking in each others’ laundry. I guess we’re rich enough in the U.S. that we can stay afloat for a few decades just buying things from each other? Regarding the causal question, I’d like to move away from the idea of “Does A causes B or does B cause A” and toward a more intervention-based framework (Rubin’s model for

4 0.98552573 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?


5 0.98482084 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i
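The calibration statement in the excerpt is straightforward to verify by simulation: draw a parameter from its posterior, simulate a replicated dataset, and record how often T(y_rep) exceeds T(y). Below is a toy normal-model sketch (known sigma = 1, flat prior on mu, an arbitrary test statistic), not the chi-square statistics Kaplan mentions.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, 100)        # stand-in observed data
n, ybar = len(y), y.mean()

def T(z):
    """Test statistic: the largest absolute observation."""
    return np.max(np.abs(z))

S, exceed = 4000, 0
for _ in range(S):
    mu = rng.normal(ybar, 1.0 / np.sqrt(n))   # posterior draw of mu
    y_rep = rng.normal(mu, 1.0, n)            # posterior predictive replicate
    exceed += T(y_rep) >= T(y)

print("posterior predictive p-value:", exceed / S)
```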

6 0.98440295 1240 andrew gelman stats-2012-04-02-Blogads update

7 0.98438776 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

8 0.98410606 1792 andrew gelman stats-2013-04-07-X on JLP

9 0.98301613 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

10 0.98250902 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

11 0.98234832 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

12 0.98219645 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

13 0.98210859 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

14 0.98169196 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

15 0.98082775 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

16 0.98081625 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles

17 0.98054695 846 andrew gelman stats-2011-08-09-Default priors update?

18 0.98002595 1881 andrew gelman stats-2013-06-03-Boot

19 0.98001134 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

20 0.97953796 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?