andrew_gelman_stats-2012-1267 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Dean Eckles writes: I make extensive use of random effects models in my academic and industry research, as they are very often appropriate. However, with very large data sets, I am not sure what to do. Say I have thousands of levels of a grouping factor, and the number of observations totals in the billions. Despite having lots of observations, I am often either dealing with (a) small effects or (b) trying to fit models with many predictors. So I would really like to use a random effects model to borrow strength across the levels of the grouping factor, but I am not sure how to practically do this. Are you aware of any approaches to fitting random effects models (including approximations) that work for very large data sets? For example, applying a procedure to each group, and then using the results of this to shrink each fit in some appropriate way. Just to clarify, here I am only worried about the non-crossed and in fact single-level case. I don’t see any easy route for crossed random effects …
sentIndex sentText sentNum sentScore
1 Dean Eckles writes: I make extensive use of random effects models in my academic and industry research, as they are very often appropriate. [sent-1, score-0.651]
2 Say I have thousands of levels of a grouping factor, and the number of observations totals in the billions. [sent-3, score-0.587]
3 Despite having lots of observations, I am often either dealing with (a) small effects or (b) trying to fit models with many predictors. [sent-4, score-0.403]
4 So I would really like to use a random effects model to borrow strength across the levels of the grouping factor, but I am not sure how to practically do this. [sent-5, score-1.201]
5 Are you aware of any approaches to fitting random effects models (including approximations) that work for very large data sets? [sent-6, score-0.672]
6 For example, applying a procedure to each group, and then using the results of this to shrink each fit in some appropriate way. [sent-7, score-0.172]
7 I don’t see any easy route for crossed random effects, which is why we have been content to just get reasonable estimates and uncertainty estimates for means, etc. [sent-9, score-0.9]
8 (Some extra details: In one case, I am fitting a propensity score model where there are really more than 2e8 somewhat similar treatments. [sent-13, score-0.328]
9 One approach is to go totally unpooled (your secret weapon), but I think variance will be a problem here since there are so many features. [sent-15, score-0.284]
10 Another approach is to use some other kind of shrinkage, like the lasso or the grouped lasso. [sent-16, score-0.286]
11 My reply: I’ve been thinking about this problem for a while. [sent-18, score-0.081]
12 It seems likely to me that some Gibbs-like and EM-like solutions should be possible. [sent-19, score-0.083]
13 (And if there’s an EM solution, there should be a variational Bayes solution too.) [sent-20, score-0.211]
14 - Speeding things up by analyzing subsets of the data. [sent-22, score-0.099]
15 …, “California”) but not in the sparser groups (“Rhode Island,” etc. [sent-25, score-0.124]
16 This has a bit of the feel of particle filtering. [sent-28, score-0.097]
17 - My guess is that the way to go is to get this working for a particular problem of interest, then we could think about how to implement it efficiently in Stan etc. [sent-29, score-0.167]
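Sentence 6 above (fit each group separately, then shrink the fits) and sentence 12 (EM-like solutions) can be combined in the simple one-way case. Below is a minimal sketch, mine rather than anything from the post: an EM algorithm for the model y_ij = mu + alpha_j + eps_ij that only ever touches three per-group summaries (count, mean, within-group sum of squares), so billions of raw observations can be reduced in a single data pass before any model fitting happens. The function name em_one_way and the simulated check are invented for illustration.

```r
## A minimal sketch (not from the post): EM for the one-way random-intercept
## model y_ij = mu + alpha_j + eps_ij, alpha_j ~ N(0, tau^2), eps ~ N(0, sigma^2),
## run entirely on per-group sufficient statistics.
em_one_way <- function(n, ybar, ss, n_iter = 500, tol = 1e-8) {
  # n    : number of observations per group
  # ybar : group means
  # ss   : within-group sums of squares, sum_i (y_ij - ybar_j)^2
  J <- length(n); N <- sum(n)
  mu     <- weighted.mean(ybar, n)          # crude starting values
  tau2   <- var(ybar)
  sigma2 <- sum(ss) / (N - J)
  for (it in seq_len(n_iter)) {
    ## E-step: posterior mean and variance of each group effect alpha_j
    v <- 1 / (n / sigma2 + 1 / tau2)
    a <- v * n * (ybar - mu) / sigma2
    ## M-step: update mu, tau^2, sigma^2 given the expected group effects
    mu_new     <- sum(n * (ybar - a)) / N
    tau2_new   <- mean(a^2 + v)
    sigma2_new <- (sum(ss) + sum(n * ((ybar - mu_new - a)^2 + v))) / N
    done <- abs(mu_new - mu) + abs(tau2_new - tau2) + abs(sigma2_new - sigma2) < tol
    mu <- mu_new; tau2 <- tau2_new; sigma2 <- sigma2_new
    if (done) break
  }
  ## Partial-pooling estimates of the group means, shrunk toward mu
  v <- 1 / (n / sigma2 + 1 / tau2)
  a <- v * n * (ybar - mu) / sigma2
  list(mu = mu, tau = sqrt(tau2), sigma = sqrt(sigma2),
       group_mean = mu + a, post_sd = sqrt(v))
}

## Toy check on simulated summaries (true mu = 5, tau = 0.3, sigma = 1)
set.seed(1)
J <- 2000
n    <- sample(20:500, J, replace = TRUE)
ybar <- 5 + rnorm(J, 0, 0.3) + rnorm(J, 0, 1 / sqrt(n))
ss   <- rchisq(J, df = n - 1)
fit  <- em_one_way(n, ybar, ss)
round(c(mu = fit$mu, tau = fit$tau, sigma = fit$sigma), 2)
```

The group_mean values are exactly the shrunken per-group estimates Dean asks about, computed without ever holding the raw data in memory.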
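For the subset idea in sentences 14 and 16, here is a rough divide-and-combine sketch, again my own construction: fit the same lmer model on disjoint shards of the data and precision-weight the fixed-effect estimates. It ignores the fact that the variance components are re-estimated within each shard, so treat it only as an approximation in the spirit of consensus Monte Carlo; the names y, x, g, and big are placeholders.

```r
## Hedged sketch: shard the data, fit the same multilevel model per shard with
## lme4, and precision-weight the fixed-effect estimates across shards.
library(lme4)

fit_shard <- function(d) {
  m <- lmer(y ~ x + (1 | g), data = d)
  list(est = fixef(m), var = diag(as.matrix(vcov(m))))
}

combine_shards <- function(fits) {
  w <- sapply(fits, function(f) 1 / f$var)   # precision weights, one column per shard
  b <- sapply(fits, function(f) f$est)
  rowSums(w * b) / rowSums(w)                # precision-weighted fixed effects
}

## Usage, assuming `big` has columns y, x and a grouping factor g:
## shards   <- split(big, sample(rep(1:10, length.out = nrow(big))))
## combined <- combine_shards(lapply(shards, fit_shard))
```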
wordName wordTfidf (topN-words): random, effects, grouping, crossed, em, observations, group, estimates, rhode, sparser, speeding, unpooled, model, sets, levels, factor, grouped, fitting, solution, totals, uncertainty, eckles, … (numeric tf-idf weights omitted)
simIndex simValue blogId blogTitle
same-blog 1 0.9999997 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”
2 0.20441678 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?
Introduction: Stuart Buck writes: I have a question about fixed effects vs. random effects. Amongst economists who study teacher value-added, it has become common to see people saying that they estimated teacher fixed effects (via least squares dummy variables, so that there is a parameter for each teacher), but that they then applied empirical Bayes shrinkage so that the teacher effects are brought closer to the mean. (See this paper by Jacob and Lefgren, for example.) Can that really be what they are doing? Why wouldn’t they just run random (modeled) effects in the first place? I feel like there’s something I’m missing. My reply: I don’t know the full story here, but I’m thinking there are two goals, first to get an unbiased estimate of an overall treatment effect (and there the econometricians prefer so-called fixed effects; I disagree with them on this but I know where they’re coming from) and second to estimate individual teacher effects (and there it makes sense to use so-called
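A small simulated illustration of the two routes being contrasted, my construction rather than the Jacob and Lefgren analysis: least-squares dummy variables followed by empirical Bayes shrinkage toward the mean, next to fitting the random-intercept model directly. In this balanced setting the two sets of teacher estimates come out nearly identical, which is essentially why Buck asks why they don't just fit the random-effects model in the first place.

```r
## Hedged sketch: LSDV teacher effects + empirical Bayes shrinkage vs. a
## random-intercept fit.  All data here are simulated placeholders.
library(lme4)

set.seed(2)
J <- 200; n_per <- 25
teacher  <- factor(rep(seq_len(J), each = n_per))
true_eff <- rnorm(J, 0, 0.3)
score    <- 0.5 + true_eff[as.integer(teacher)] + rnorm(J * n_per, 0, 1)
d <- data.frame(score, teacher)

## Route 1: one dummy per teacher, then shrink toward the grand mean
ls_fit <- lm(score ~ 0 + teacher, data = d)
est    <- coef(ls_fit)
se2    <- diag(vcov(ls_fit))
mu_hat   <- mean(est)
tau2_hat <- max(var(est) - mean(se2), 0)          # method-of-moments between-teacher variance
shrunk   <- mu_hat + tau2_hat / (tau2_hat + se2) * (est - mu_hat)

## Route 2: random (modeled) effects in the first place
re_fit <- lmer(score ~ 1 + (1 | teacher), data = d)
pooled <- fixef(re_fit)[1] + ranef(re_fit)$teacher[, 1]

cor(shrunk, pooled)   # essentially 1 in this balanced example
```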
3 0.17655671 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?
Introduction: I received the following email from someone who wishes to remain anonymous: My colleague and I are trying to understand the best way to approach a problem involving measuring a group of individuals’ abilities across time, and are hoping you can offer some guidance. We are trying to analyze the combined effect of two distinct groups of people (A and B, with no overlap between A and B) who collaborate to produce a binary outcome, using a mixed logistic regression along the lines of the following: Outcome ~ (1 | A) + (1 | B) + Other variables What we’re interested in testing was whether the observed A random effects in period 1 are predictive of the A random effects in the following period 2. Our idea being to create two models, each using a different period’s worth of data, to create two sets of A coefficients, then observe the relationship between the two. If the A’s have a persistent ability across periods, the coefficients should be correlated or show a linear-ish relationship …
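A hedged sketch of the two-model plan described in the email, with placeholder names (dat, period, x1, x2): fit the mixed logit separately in each period and correlate the estimated A-level intercepts for units that appear in both periods. A fuller treatment would model the two periods jointly with correlated effects, but this matches the plan as stated.

```r
## Hedged sketch; dat, period, x1, x2 are placeholder names, A and B are the
## two grouping factors from the email's formula.
library(lme4)

fit_period <- function(d) {
  glmer(Outcome ~ x1 + x2 + (1 | A) + (1 | B), data = d, family = binomial)
}

m1 <- fit_period(subset(dat, period == 1))
m2 <- fit_period(subset(dat, period == 2))

re1 <- ranef(m1)$A   # estimated (shrunken) A intercepts, period 1
re2 <- ranef(m2)$A   # same units, period 2
common <- intersect(rownames(re1), rownames(re2))
cor(re1[common, 1], re2[common, 1])   # persistence of A "ability" across periods
```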
4 0.16174182 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects
Introduction: Dean Eckles writes: I remember reading on your blog that you were working on some tools to fit multilevel models that also include “fixed” effects — such as continuous predictors — that are also estimated with shrinkage (for example, an L1 or L2 penalty). Any new developments on this front? I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of “fixed” effects, mainly continuous variables. This makes me wary of overfitting to these predictors, so then I’d want to use some kind of shrinkage. As far as I can tell, the main option for doing this now is going fully Bayesian and using a Gibbs sampler. With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. However, this is pretty slow, especially with a large data set and because I’d like to select the penalty parameter by cross-validation (which is where this isn’t very Bayesian I guess?). My reply: We allow info
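One present-day way to get both pieces at once, a multilevel structure plus shrinkage on the continuous "fixed" coefficients, is a fully Bayesian fit with shrinkage priors. The sketch below uses rstanarm, which did not exist when this exchange took place, so it is offered only as a hedged pointer; y, x1, x2, x3, group, and d are placeholders, and the normal(0, 1) scale assumes standardized predictors.

```r
## Hedged sketch using rstanarm (not what was available at the time of the post);
## all variable names are placeholders.
library(rstanarm)

fit <- stan_lmer(
  y ~ x1 + x2 + x3 + (1 | group),
  data  = d,
  prior = normal(0, 1),            # ridge-like (L2) shrinkage on the coefficients
  prior_intercept = normal(0, 5),
  chains = 4, iter = 2000
)
## For lasso-like (L1) shrinkage, use prior = laplace(0, 1) instead; either way
## the penalty is just a prior, so no separate cross-validation loop is needed.
```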
5 0.15948689 472 andrew gelman stats-2010-12-17-So-called fixed and random effects
Introduction: Someone writes: I am hoping you can give me some advice about when to use fixed and random effects model. I am currently working on a paper that examines the effect of . . . by comparing states . . . It got reviewed . . . by three economists and all suggest that we run a fixed effects model. We ran a hierarchial model in the paper that allow the intercept and slope to vary before and after . . . My question is which is correct? We have ran it both ways and really it makes no difference which model you run, the results are very similar. But for my own learning, I would really like to understand which to use under what circumstances. Is the fact that we use the whole population reason enough to just run a fixed effect model? Perhaps you can suggest a good reference to this question of when to run a fixed vs. random effects model. I’m not always sure what is meant by a “fixed effects model”; see my paper on Anova for discussion of the problems with this terminology: http://w
6 0.15276861 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions
7 0.14985025 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
8 0.149331 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models
9 0.14538337 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random
10 0.14511204 851 andrew gelman stats-2011-08-12-year + (1|year)
12 0.13347904 501 andrew gelman stats-2011-01-04-A new R package for fititng multilevel models
14 0.13178524 963 andrew gelman stats-2011-10-18-Question on Type M errors
15 0.13034311 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso
16 0.12934241 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
17 0.12831931 2258 andrew gelman stats-2014-03-21-Random matrices in the news
18 0.12653369 1726 andrew gelman stats-2013-02-18-What to read to catch up on multivariate statistics?
19 0.12479214 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models
20 0.12167877 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling
simIndex simValue blogId blogTitle
same-blog 1 0.96274799 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”
2 0.8861807 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?
3 0.87301767 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models
Introduction: Cyrus writes: I [Cyrus] was teaching a class on multilevel modeling, and we were playing around with different methods to fit a random effects logit model with 2 random intercepts—one corresponding to “family” and another corresponding to “community” (labeled “mom” and “cluster” in the data, respectively). There are also a few regressors at the individual, family, and community level. We were replicating in part some of the results from the following paper: Improved estimation procedures for multilevel models with binary response: a case-study, by G Rodriguez, N Goldman. (I say “replicating in part” because we didn’t include all the regressors that they use, only a subset.) We were looking at the performance of estimation via glmer in R’s lme4 package, glmmPQL in R’s MASS package, and Stata’s xtmelogit. We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood. I was shocked to discover that glmer
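A hedged sketch of the two R fits being compared (the Rodriguez-Goldman data are not reproduced; y, x, and d are placeholders). One relevant detail: glmer's adaptive Gauss-Hermite quadrature (nAGQ > 1) is only available for a single scalar random effect, so with both "mom" and "cluster" intercepts it falls back to the Laplace approximation, while glmmPQL's penalized quasi-likelihood is known to pull variance components toward zero for binary outcomes, which is the point of the Rodriguez and Goldman paper.

```r
## Hedged sketch; y, x, d are placeholders for the outcome, regressors, and data.
library(lme4)
library(MASS)

## Laplace approximation (adaptive quadrature would require a single random effect)
m_laplace <- glmer(y ~ x + (1 | cluster) + (1 | mom),
                   data = d, family = binomial)

## Penalized quasi-likelihood, with mom nested within cluster
m_pql <- glmmPQL(y ~ x, random = ~ 1 | cluster/mom,
                 data = d, family = binomial)

VarCorr(m_laplace)        # random-intercept standard deviations, Laplace
nlme::VarCorr(m_pql)      # same quantities under PQL, typically smaller
```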
Introduction: Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). Although I could simply leave all these as predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. I am curious – is there still no clear cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction as well as the interaction itself? My reply: I’d recommend including these in your model (it’s probably easiest to do so
5 0.81926596 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model
Introduction: Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. Based on the recommendation given in the book by yourself and Dr. Hill (2007) I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance were more appropriate in the context of a model with fixed district effects? My reply: I agree that th
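A tiny hedged sketch of the two summaries being contrasted, stated in terms of posterior draws; theta_draws (an S x 21 matrix of district effects) and tau_draws (S draws of the hierarchical sd) are placeholder objects, not the correspondent's actual output.

```r
## Hedged sketch; theta_draws (S x 21) and tau_draws (length S) are placeholders.
finite_sd <- apply(theta_draws, 1, sd)        # sd of the 21 realized district effects, per draw
quantile(finite_sd, c(0.025, 0.5, 0.975))     # finite-population summary: these 21 districts
quantile(tau_draws,  c(0.025, 0.5, 0.975))    # superpopulation sd: hypothetical new districts
```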
6 0.81902272 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”
7 0.81653428 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models
8 0.81583655 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups
9 0.8130073 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects
10 0.80847174 472 andrew gelman stats-2010-12-17-So-called fixed and random effects
11 0.80077815 851 andrew gelman stats-2011-08-12-year + (1|year)
12 0.79610819 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
14 0.78201449 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions
15 0.77624917 948 andrew gelman stats-2011-10-10-Combining data from many sources
16 0.77060032 2296 andrew gelman stats-2014-04-19-Index or indicator variables
17 0.76863533 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?
19 0.76691163 417 andrew gelman stats-2010-11-17-Clutering and variance components
20 0.76328832 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary
simIndex simValue blogId blogTitle
1 0.97791696 406 andrew gelman stats-2010-11-10-Translating into Votes: The Electoral Impact of Spanish-Language Ballots
Introduction: Dan Hopkins sends along this article: [Hopkins] uses regression discontinuity design to estimate the turnout and election impacts of Spanish-language assistance provided under Section 203 of the Voting Rights Act. Analyses of two different data sets – the Latino National Survey and California 1998 primary election returns – show that Spanish-language assistance increased turnout for citizens who speak little English. The California results also demonstrate that election procedures can influence outcomes, as support for ending bilingual education dropped markedly in heavily Spanish-speaking neighborhoods with Spanish-language assistance. The California analyses find hints of backlash among non-Hispanic white precincts, but not with the same size or certainty. Small changes in election procedures can influence who votes as well as what wins. Beyond the direct relevance of these results, I find this paper interesting as an example of research that is fundamentally quantitative. Th
2 0.97123545 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?
Introduction: Geoffrey Sheean writes: I am having trouble thinking Bayesianly about the so-called ‘normal’ or ‘reference’ values that I am supposed to use in some of the tests I perform. These values are obtained from purportedly healthy people. Setting aside concerns about ascertainment bias, non-parametric distributions, and the like, the values are usually obtained by setting the limits at ± 2SD from the mean. In some cases, supposedly because of a non-normal distribution, the third highest and lowest value observed in the healthy group sets the limits, on the assumption that no more than 2 results (out of 20 samples) are allowed to exceed these values: if there are 3 or more, then the test is assumed to be abnormal and the reference range is said to reflect the 90th percentile. The results are binary – normal, abnormal. The relevance to the diseased state is this. People who are known unequivocally to have condition X show Y abnormalities in these tests. Therefore, when people suspected
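For concreteness, a hedged toy version of the two conventions described above; the "healthy" sample is simulated and deliberately skewed so that the two sets of limits disagree.

```r
## Hedged toy example; `healthy` is simulated, not real reference data.
set.seed(3)
healthy <- rlnorm(200, meanlog = 1, sdlog = 0.4)   # right-skewed "healthy" measurements

## Convention 1: mean +/- 2 SD
c(lower = mean(healthy) - 2 * sd(healthy),
  upper = mean(healthy) + 2 * sd(healthy))

## Convention 2: percentile limits (allowing about 2 of 20 results outside
## corresponds roughly to a 90% reference interval)
quantile(healthy, c(0.05, 0.95))
```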
same-blog 3 0.96868396 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”
4 0.9683708 158 andrew gelman stats-2010-07-22-Tenants and landlords
Introduction: Matthew Yglesias and Megan McArdle argue about the economics of landlord/tenant laws in D.C., a topic I know nothing about. But it did remind me of a few stories . . . 1. In grad school, I shared half of a two-family house with three other students. At some point, our landlord (who lived in the other half of the house) decided he wanted to sell the place, so he had a real estate agent coming by occasionally to show the house to people. She was just a flat-out liar (which I guess fits my impression based on screenings of Glengarry Glen Ross). I could never decide, when I was around and she was lying to a prospective buyer, whether to call her on it. Sometimes I did, sometimes I didn’t. 2. A year after I graduated, the landlord actually did sell the place but then, when my friends moved out, he refused to pay back their security deposit. There was some debate about getting the place repainted, I don’t remember the details. So they sued the landlord in Mass. housing court
5 0.96215165 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso
Introduction: Lasso and me For a long time I was wrong about lasso. Lasso (“least absolute shrinkage and selection operator”) is a regularization procedure that shrinks regression coefficients toward zero, and in its basic form is equivalent to maximum penalized likelihood estimation with a penalty function that is proportional to the sum of the absolute values of the regression coefficients. I first heard about lasso from a talk that Trevor Hastie Rob Tibshirani gave at Berkeley in 1994 or 1995. He demonstrated that it shrunk regression coefficients to zero. I wasn’t impressed, first because it seemed like no big deal (if that’s the prior you use, that’s the shrinkage you get) and second because, from a Bayesian perspective, I don’t want to shrink things all the way to zero. In the sorts of social and environmental science problems I’ve worked on, just about nothing is zero. I’d like to control my noisy estimates but there’s nothing special about zero. At the end of the talk I stood
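A hedged sketch of the basic form described above, least squares with a penalty proportional to the sum of absolute coefficients, using glmnet with a cross-validated penalty; the simulated x and y are placeholders. The Bayesian reading is that the penalty corresponds to a double-exponential (Laplace) prior centered at zero, which is exactly the shrink-all-the-way-to-zero behavior at issue.

```r
## Hedged sketch; x, y, and the sparse beta are simulated placeholders.
library(glmnet)

set.seed(4)
n <- 500; p <- 50
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 5), rep(0, p - 5))
y <- drop(x %*% beta) + rnorm(n)

cvfit <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 gives the lasso (L1) penalty
coef(cvfit, s = "lambda.1se")         # many coefficients are exactly zero
```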
6 0.96137863 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update
7 0.96023697 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students
8 0.95777428 89 andrew gelman stats-2010-06-16-A historical perspective on financial bailouts
9 0.95744312 678 andrew gelman stats-2011-04-25-Democrats do better among the most and least educated groups
10 0.95624197 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again
11 0.95347977 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?
13 0.94993848 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!
14 0.94795096 198 andrew gelman stats-2010-08-11-Multilevel modeling in R on a Mac
15 0.94714653 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?
16 0.94494635 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys
17 0.94457567 811 andrew gelman stats-2011-07-20-Kind of Bayesian
18 0.94268227 1658 andrew gelman stats-2013-01-07-Free advice from an academic writing coach!
19 0.94259965 2041 andrew gelman stats-2013-09-27-Setting up Jitts online
20 0.94244552 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis