andrew_gelman_stats-2011-653 knowledge-graph by maker-knowledge-mining

653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects


meta info for this blog

Source: html

Introduction: Dean Eckles writes: I remember reading on your blog that you were working on some tools to fit multilevel models that also include “fixed” effects — such as continuous predictors — that are also estimated with shrinkage (for example, an L1 or L2 penalty). Any new developments on this front? I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of “fixed” effects, mainly continuous variables. This makes me wary of overfitting to these predictors, so then I’d want to use some kind of shrinkage. As far as I can tell, the main options for doing this now is by going fully Bayesian and using a Gibbs sampler. With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. However, this is pretty slow, especially with a large data set and because I’d like to select the penalty parameter by cross-validation (which is where this isn’t very Bayesian I guess?). My reply: We allow informative priors in blmer/bglmer. Unfortunately blmer/bglmer aren’t ready yet but they will be soon, I hope. We’re also working on a bigger project of multilevel models for deep interactions of continuous predictors. But that won’t be ready for awhile; we still have to figure out what we want to do there.
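The penalty-as-prior correspondence in the question is a standard one: an L2 (ridge) penalty on the coefficients is, up to a constant, the negative log of independent mean-zero normal priors, and an L1 (lasso) penalty corresponds to Laplace (double-exponential) priors, with the penalty parameter set by the prior scale. The following is a minimal sketch, not from the post, using simulated data and base R only; it checks that the closed-form ridge estimate equals the posterior mode under normal priors when lambda = sigma^2 / tau^2.

## penalty <-> prior: ridge (L2) estimate = posterior mode under normal(0, tau^2) priors
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, 0.5, 0, 0)
sigma <- 1
y <- drop(X %*% beta + rnorm(n, sd = sigma))

tau <- 0.5                   # prior sd on each coefficient
lambda <- sigma^2 / tau^2    # implied ridge penalty

## closed-form ridge estimate
b_ridge <- drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))

## posterior mode: minimize negative log-likelihood plus negative log-prior
neg_log_post <- function(b)
  sum((y - X %*% b)^2) / (2 * sigma^2) + sum(b^2) / (2 * tau^2)
b_map <- optim(rep(0, p), neg_log_post, method = "BFGS")$par

round(cbind(ridge = b_ridge, map = b_map), 4)  # agree up to optimizer tolerance

An L1 penalty would replace sum(b^2) / (2 * tau^2) with sum(abs(b)) / tau, i.e. a Laplace prior; in BUGS/JAGS the same idea is expressed by putting normal (dnorm) or double-exponential (ddexp) priors on the coefficients, as the question suggests.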


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Dean Eckles writes: I remember reading on your blog that you were working on some tools to fit multilevel models that also include “fixed” effects — such as continuous predictors — that are also estimated with shrinkage (for example, an L1 or L2 penalty). [sent-1, score-1.889]

2 I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of “fixed” effects, mainly continuous variables. [sent-3, score-1.234]

3 This makes me wary of overfitting to these predictors, so then I’d want to use some kind of shrinkage. [sent-4, score-0.365]

4 As far as I can tell, the main options for doing this now is by going fully Bayesian and using a Gibbs sampler. [sent-5, score-0.201]

5 With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. [sent-6, score-0.791]

6 However, this is pretty slow, especially with a large data set and because I’d like to select the penalty parameter by cross-validation (which is where this isn’t very Bayesian I guess?). [sent-7, score-0.444]

7 My reply: We allow informative priors in blmer/bglmer. [sent-9, score-0.258]

8 Unfortunately blmer/bglmer aren’t ready yet but they will be soon, I hope. [sent-10, score-0.231]

9 We’re also working on a bigger project of multilevel models for deep interactions of continuous predictors. [sent-12, score-1.195]

10 But that won’t be ready for awhile; we still have to figure out what we want to do there. [sent-13, score-0.307]
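Sentences 4-6 above describe going fully Bayesian and then choosing the penalty parameter by cross-validation, which is slow when every candidate penalty requires a full Gibbs-sampler run. Below is a minimal self-contained sketch, again not from the post and using base R with the closed-form ridge (normal-prior posterior mode) fit so each refit is cheap, of K-fold cross-validation over a grid of penalties, which maps directly to a grid of prior scales.

## choose the penalty (equivalently, the prior scale) by K-fold cross-validation
set.seed(2)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, 1, rep(0, p - 3))
y <- drop(X %*% beta + rnorm(n))

ridge_fit <- function(X, y, lambda)
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))

K <- 5
folds <- sample(rep(1:K, length.out = n))      # random fold assignment
lambdas <- 10^seq(-2, 3, length.out = 30)      # candidate penalties

cv_mse <- sapply(lambdas, function(lam) {
  mean(sapply(1:K, function(k) {
    test <- folds == k
    b <- ridge_fit(X[!test, , drop = FALSE], y[!test], lam)
    mean((y[test] - X[test, , drop = FALSE] %*% b)^2)  # held-out squared error
  }))
})

best_lambda <- lambdas[which.min(cv_mse)]
best_lambda   # the implied prior sd is sigma / sqrt(best_lambda)

With MCMCglmm or BUGS/JAGS the same loop applies, but each refit is a complete sampler run, which is exactly the cost the question complains about.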
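The reply (sentences 7-8 above) points to informative priors on the “fixed” coefficients in blmer/bglmer, which had not yet been released when this post was written. The sketch below is hypothetical: it assumes the interface of the blme package that later appeared on CRAN, and in particular the fixef.prior argument and its normal() specification are assumptions to be checked against ?blmer.

## hypothetical sketch -- assumes the blme interface (fixef.prior); verify against ?blmer
library(blme)   # provides blmer() / bglmer(), built on lme4
library(lme4)   # for the sleepstudy example data

fit <- blmer(
  Reaction ~ Days + (Days | Subject),
  data = sleepstudy,
  fixef.prior = normal   # normal prior on the "fixed" coefficients: ridge-type shrinkage
  ## e.g. fixef.prior = normal(sd = c(10, 2.5)) to set the prior scales explicitly (assumed syntax)
)
summary(fit)

The appeal over a full Gibbs sampler is that blmer/bglmer compute penalized (posterior-mode) estimates through lme4’s optimizer, so the shrinkage on the “fixed” coefficients should come at roughly the cost of an ordinary lmer fit rather than an MCMC run.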


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('continuous', 0.28), ('fixed', 0.261), ('penalty', 0.25), ('multilevel', 0.231), ('ready', 0.231), ('mcmcglmm', 0.185), ('predictors', 0.175), ('effects', 0.173), ('eckles', 0.161), ('wary', 0.146), ('overfitting', 0.143), ('needing', 0.136), ('include', 0.132), ('shrinkage', 0.131), ('specify', 0.126), ('developments', 0.126), ('dean', 0.126), ('mainly', 0.125), ('gibbs', 0.125), ('fit', 0.12), ('desired', 0.117), ('corresponds', 0.114), ('select', 0.114), ('wanting', 0.113), ('working', 0.111), ('options', 0.11), ('slow', 0.11), ('arm', 0.108), ('bigger', 0.108), ('front', 0.104), ('deep', 0.1), ('soon', 0.1), ('also', 0.097), ('package', 0.097), ('bayesian', 0.096), ('interactions', 0.096), ('models', 0.093), ('tools', 0.092), ('fully', 0.091), ('awhile', 0.089), ('priors', 0.087), ('allow', 0.086), ('informative', 0.085), ('unfortunately', 0.082), ('estimated', 0.081), ('parameter', 0.08), ('project', 0.079), ('aren', 0.077), ('want', 0.076), ('remember', 0.076)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects


2 0.1993551 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks

Introduction: Antti Rasinen writes: I’m a former undergrad machine learning student and a current software engineer with a Bayesian hobby. Today my two worlds collided. I ask for some enlightenment. On your blog you’ve repeatedly advocated continuous distributions with Bayesian models. Today I read this article by Ricky Ho, who writes: The strength of Bayesian network is it is highly scalable and can learn incrementally because all we do is to count the observed variables and update the probability distribution table. Similar to Neural Network, Bayesian network expects all data to be binary, categorical variable will need to be transformed into multiple binary variable as described above. Numeric variable is generally not a good fit for Bayesian network. The last sentence seems to be at odds with what you’ve said. Sadly, I don’t have enough expertise to say which view of the world is correct. During my undergrad years our team wrote an implementation of the Junction Tree algorithm. We r

3 0.19522546 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

Introduction: Stuart Buck writes: I have a question about fixed effects vs. random effects . Amongst economists who study teacher value-added, it has become common to see people saying that they estimated teacher fixed effects (via least squares dummy variables, so that there is a parameter for each teacher), but that they then applied empirical Bayes shrinkage so that the teacher effects are brought closer to the mean. (See this paper by Jacob and Lefgren, for example.) Can that really be what they are doing? Why wouldn’t they just run random (modeled) effects in the first place? I feel like there’s something I’m missing. My reply: I don’t know the full story here, but I’m thinking there are two goals, first to get an unbiased estimate of an overall treatment effect (and there the econometricians prefer so-called fixed effects; I disagree with them on this but I know where they’re coming from) and second to estimate individual teacher effects (and there it makes sense to use so-called

4 0.19095907 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

Introduction: Someone writes: I am hoping you can give me some advice about when to use fixed and random effects model. I am currently working on a paper that examines the effect of . . . by comparing states . . . It got reviewed . . . by three economists and all suggest that we run a fixed effects model. We ran a hierarchical model in the paper that allow the intercept and slope to vary before and after . . . My question is which is correct? We have ran it both ways and really it makes no difference which model you run, the results are very similar. But for my own learning, I would really like to understand which to use under what circumstances. Is the fact that we use the whole population reason enough to just run a fixed effect model? Perhaps you can suggest a good reference to this question of when to run a fixed vs. random effects model. I’m not always sure what is meant by a “fixed effects model”; see my paper on Anova for discussion of the problems with this terminology: http://w

5 0.18045217 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

Introduction: Tom Clark writes: Drew Linzer and I [Tom] have been working on a paper about the use of modeled (“random”) and unmodeled (“fixed”) effects. Not directly in response to the paper, but in conversations about the topic over the past few months, several people have said to us things to the effect of “I prefer fixed effects over random effects because I care about identification.” Neither Drew nor I has any idea what this comment is supposed to mean. Have you come across someone saying something like this? Do you have any thoughts about what these people could possibly mean? I want to respond to this concern when people raise it, but I have failed thus far to inquire what is meant and so do not know what to say. My reply: I have a “cultural” reply, which is that so-called fixed effects are thought to make fewer assumptions, and making fewer assumptions is considered a generally good thing that serious people do, and identification is considered a concern of serious people, so they g

6 0.17089282 962 andrew gelman stats-2011-10-17-Death!

7 0.16563913 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

8 0.16549218 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

9 0.16174182 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

10 0.15852273 1431 andrew gelman stats-2012-07-27-Overfitting

11 0.15806051 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

12 0.15525022 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

13 0.15105774 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

14 0.14347474 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

15 0.14265265 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

16 0.14228885 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

17 0.14212976 342 andrew gelman stats-2010-10-14-Trying to be precise about vagueness

18 0.13628685 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

19 0.13565196 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

20 0.13437 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.191), (1, 0.181), (2, 0.028), (3, 0.029), (4, 0.073), (5, 0.024), (6, 0.045), (7, -0.068), (8, 0.021), (9, 0.115), (10, 0.034), (11, -0.0), (12, 0.066), (13, -0.007), (14, 0.093), (15, 0.034), (16, -0.06), (17, -0.003), (18, -0.035), (19, 0.084), (20, -0.03), (21, 0.035), (22, -0.03), (23, -0.008), (24, -0.077), (25, -0.133), (26, -0.081), (27, 0.057), (28, -0.046), (29, -0.007), (30, -0.043), (31, -0.016), (32, -0.025), (33, -0.052), (34, 0.044), (35, -0.022), (36, -0.013), (37, 0.005), (38, 0.034), (39, -0.01), (40, 0.012), (41, 0.022), (42, 0.028), (43, -0.001), (44, -0.034), (45, 0.013), (46, 0.006), (47, 0.013), (48, -0.059), (49, 0.0)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9791913 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects


2 0.8377583 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

Introduction: Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). Although I could simply leave all these as predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. I am curious – is there still no clear cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction as well as the interaction itself? My reply: I’d recommend including these in your model (it’s probably easiest to do so

3 0.82816637 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

Introduction: Fred Wu writes: I work at National Prescribing Services in Australia. I have a database representing say, antidiabetic drug utilisation for the entire Australia in the past few years. I planned to do a longitudinal analysis across GP Division Network (112 divisions in AUS) using mixed-effects models (or as you called in your book varying intercept and varying slope) on this data. The problem here is: as data actually represent the population who use antidiabetic drugs in AUS, should I use 112 fixed dummy variables to capture the random variations or use varying intercept and varying slope for the model ? Because some one may aruge, like divisions in AUS or states in USA can hardly be considered from a “superpopulation”, then fixed dummies should be used. What I think is the population are those who use the drugs, what will happen when the rest need to use them? In terms of exchangeability, using varying intercept and varying slopes can be justified. Also you provided in y

4 0.82767993 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

Introduction: Ban Chuan Cheah writes: In a previous post, http://andrewgelman.com/2013/07/30/the-roy-causal-model/ you pointed to a paper on Bayesian methods by Heckman. At around the same time I came across another one of his papers, “The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior (2006)” (http://www.nber.org/papers/w12006 or published version http://www.jstor.org/stable/10.1086/504455). In this paper they implement their model as follows: We use Bayesian Markov chain Monte Carlo methods to compute the sample likelihood. Our use of Bayesian methods is only a computational convenience. Our identification analysis is strictly classical. Under our assumptions, the priors we use are asymptotically irrelevant. Some of the authors have also done something similar earlier in: Hansen, Karsten T. & Heckman, James J. & Mullen, K.J.Kathleen J., 2004. “The effect of schooling and ability on achievement test scores,” Journal of Econometrics, Elsevi

5 0.79510552 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model

Introduction: Ramu Sudhagoni writes: I am working on combining three longitudinal studies using Bayesian hierarchical technique. In each study, I have at least 70 subjects follow up on 5 different visit months. My model consists of 10 different covariates including longitudinal and cross-sectional effects. Mixed models are used to fit the three studies individually using Bayesian approach and I noticed that few covariates were significant. When I combined using three level hierarchical approach, all the covariates became non-significant at the population level, and large estimates were found for variance parameters at the population level. I am struggling to understand why I am getting large variances at population level and wider credible intervals. I assumed non-informative normal priors for all my cross sectional and longitudinal effects, and non-informative inverse-gamma priors for variance parameters. I followed the approach explained by Inoue et al. (Title: Combining Longitudinal Studie

6 0.79299951 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

7 0.78011107 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

8 0.77449143 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

9 0.76974869 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

10 0.76866382 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

11 0.76075053 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

12 0.75112748 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

13 0.74543178 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

14 0.73987114 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

15 0.7387591 417 andrew gelman stats-2010-11-17-Clutering and variance components

16 0.73490417 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

17 0.73479593 2296 andrew gelman stats-2014-04-19-Index or indicator variables

18 0.73395616 1513 andrew gelman stats-2012-09-27-Estimating seasonality with a data set that’s just 52 weeks long

19 0.73256695 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

20 0.7264241 472 andrew gelman stats-2010-12-17-So-called fixed and random effects


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.016), (9, 0.014), (15, 0.015), (16, 0.027), (21, 0.021), (24, 0.134), (40, 0.034), (42, 0.065), (52, 0.111), (54, 0.015), (86, 0.054), (99, 0.396)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98251456 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects


2 0.97556823 786 andrew gelman stats-2011-07-04-Questions about quantum computing

Introduction: I read this article by Rivka Galchen on quantum computing. Much of the article was about an eccentric scientist in his fifties named David Deutch. I’m sure the guy is brilliant but I wasn’t particularly interested in his not particularly interesting life story (apparently he’s thin and lives in Oxford). There was a brief description of quantum computing itself, which reminds me of the discussion we had a couple years ago under the heading, The laws of conditional probability are false (and the update here ). I don’t have anything new to say here; I’d just never heard of quantum computing before and it seemed relevant to our discussion. The uncertainty inherent in quantum computing seems closely related to Jouni’s idea of fully Bayesian computing , that uncertainty should be inherent in the computational structure rather than tacked on at the end. P.S. No, I’m not working on July 4th! This post is two months old, we just have a long waiting list of blog entries.

3 0.97107333 104 andrew gelman stats-2010-06-22-Seeking balance

Introduction: I’m trying to temporarily kick the blogging habit as I seem to be addicted. I’m currently on a binge and my plan is to schedule a bunch of already-written entries at one per weekday and not blog anything new for awhile. Yesterday I fell off the wagon and posted 4 items, but maybe now I can show some restraint. P.S. In keeping with the spirit of this blog, I scheduled it to appear on 13 May, even though I wrote it on 15 Apr. Just about everything you’ve been reading on this blog for the past several weeks (and lots of forthcoming items) were written a month ago. The only exceptions are whatever my cobloggers have been posting and various items that were timely enough that I inserted them in the queue afterward. P.P.S I bumped it up to 22 Jun because, as of 14 Apr, I was continuing to write new entries. I hope to slow down soon! P.P.P.S. (20 June) I was going to bump it up again–the horizon’s now in mid-July–but I thought, enough is enough! Right now I think that about ha

4 0.96939087 1369 andrew gelman stats-2012-06-06-Your conclusion is only as good as your data

Introduction: Jay Livingston points to an excellent rant from Peter Moskos, trashing a study about “food deserts” (which I kept reading as “food desserts”) in inner-city neighborhoods. Here’s Moskos: From the Times: There is no relationship between the type of food being sold in a neighborhood and obesity among its children and adolescents. Within a couple of miles of almost any urban neighborhood, “you can get basically any type of food,” said Roland Sturm of the RAND Corporation, lead author of one of the studies. “Maybe we should call it a food swamp rather than a desert,” he said. Sure thing, Sturm. But I suspect you wouldn’t think certain neighborhoods are swamped with good food if you actually got out of your office and went to one of the neighborhoods. After all, what are going to believe: A nice data set or your lying eyes? “Food outlet data … are classifıed using the North American Industry Classifıcation System (NAICS)” (p. 130). Assuming validity and reliability of NAICS

5 0.96928322 82 andrew gelman stats-2010-06-12-UnConMax – uncertainty consideration maxims 7 +/- 2

Introduction: Warning – this blog post is meant to encourage some loose, fuzzy and possibly distracting thoughts about the practice of statistics in research endeavours. There maybe spelling and grammatical errors as well as a lack of proper sentence structure. It may not be understandable to many or even possibly any readers. But somewhat more seriously, its better that “ConUnMax” So far I have five maxims 1. Explicit models of uncertanty are useful but – always wrong and can always be made less wrong 2. If the model is formally a probability model – always use probability calculus (Bayes) 3. Always useful to make the model a formal probability model – no matter what (Bayesianisn) 4. Never use a model that is not empirically motivated and strongly empirically testable (Frequentist – of the anti-Bayesian flavour) 5. Quantitative tools are always just a means to grasp and manipulate models – never an end in itself (i.e. don’t obsess over “baby” mathematics) 6. If one really understood st

6 0.96838856 948 andrew gelman stats-2011-10-10-Combining data from many sources

7 0.96767455 889 andrew gelman stats-2011-09-04-The acupuncture paradox

8 0.96629077 1957 andrew gelman stats-2013-07-26-“The Inside Story Of The Harvard Dissertation That Became Too Racist For Heritage”

9 0.96512121 1020 andrew gelman stats-2011-11-20-No no no no no

10 0.96415329 1531 andrew gelman stats-2012-10-12-Elderpedia

11 0.96289414 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia

12 0.96143931 200 andrew gelman stats-2010-08-11-Separating national and state swings in voting and public opinion, or, How I avoided blogorific embarrassment: An agony in four acts

13 0.96138555 117 andrew gelman stats-2010-06-29-Ya don’t know Bayes, Jack

14 0.96125364 2041 andrew gelman stats-2013-09-27-Setting up Jitts online

15 0.95967567 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man

16 0.95765829 2018 andrew gelman stats-2013-09-12-Do you ever have that I-just-fit-a-model feeling?

17 0.95698798 1726 andrew gelman stats-2013-02-18-What to read to catch up on multivariate statistics?

18 0.95620036 2265 andrew gelman stats-2014-03-24-On deck this week

19 0.9557001 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?

20 0.95469022 1692 andrew gelman stats-2013-01-25-Freakonomics Experiments