andrew gelman stats-2010-06-09-Sof[t]
Source: html
Introduction: Joe Fruehwald writes:

I’m working with linguistic data, specifically binomial hits and misses of a certain variable for certain words (specifically whether or not the “t” sound was pronounced at the end of words like “soft”). Word frequency follows a power law, with most words appearing just once and some words being hyperfrequent. I’m not interested in specific word effects, but I am interested in the effect of word frequency. A logistic model fit is going to be heavily influenced by the effect of the hyperfrequent words, which constitute only one type. To control for the item effect, I would fit a multilevel model with a random intercept by word, but, like I said, most of the words appear only once. Is there a principled approach to this problem?

My response: It’s ok to fit a multilevel model even if most groups only have one observation each. You’ll want to throw in some word-level predictors too. Think of the multilevel model not as a substitute for the usual thoughtful steps of statistical modeling but rather as a way to account for unmodeled error at the group level.
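In lme4 notation (the R package commonly used for such models on this blog), the suggested model might look like the sketch below. This is not code from the post: the variable names (t_pronounced, word, freq) and the simulated data are hypothetical stand-ins for Fruehwald’s actual data.

```r
# A minimal sketch, not from the original post: multilevel logistic
# regression with a random intercept per word, plus log word frequency
# as a word-level predictor. All data are simulated placeholders.
library(lme4)

set.seed(123)
n_words <- 100
# Crude stand-in for a power-law frequency distribution:
# most words occur once, a few are hyperfrequent.
freq <- pmax(1, round(rlnorm(n_words, meanlog = 0, sdlog = 1.5)))
d <- data.frame(
  word = rep(paste0("w", seq_len(n_words)), times = freq),
  freq = rep(freq, times = freq)
)
# Simulated outcome: probability of pronouncing the final "t"
# increases with log frequency (coefficients are invented).
d$t_pronounced <- rbinom(nrow(d), 1, plogis(-0.5 + 0.3 * log(d$freq)))

fit <- glmer(t_pronounced ~ log(freq) + (1 | word),
             data = d, family = binomial)
summary(fit)
```

The random intercept absorbs unmodeled word-level error, while word-level predictors such as log frequency carry the effect of interest; words observed only once are simply shrunk strongly toward the population regression line.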
Similar posts:

318 andrew gelman stats-2010-10-04-U-Haul statistics
Introduction: Very freakonomic (and I mean that in the best sense of the word).

1191 andrew gelman stats-2012-03-01-Hoe noem je?
Introduction: Gerrit Storms reports on an interesting linguistic research project in which you can participate! Here’s the description: Over the past few weeks, we have been trying to set up a scientific study that is important for many researchers interested in words, word meaning, semantics, and cognitive science in general. It is a huge word association project, in which people are asked to participate in a small task that doesn’t last longer than 5 minutes. Our goal is to build a global word association network that contains connections between about 40,000 words, the size of the lexicon of an average adult. Setting up such a network might teach us a lot about semantic memory, how it develops, and maybe also about how it can deteriorate (like in Alzheimer’s disease). Most people enjoy doing the task, but we need thousands of participants to succeed. Up till today, we found about 53,000 participants willing to do the little task, but we need more subjects. That is why we address you. . . .

295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations
Introduction: James O’Brien writes: How would you explain, to a “classically-trained” hypothesis-tester, that “It’s OK to fit a multilevel model even if some groups have only one observation each”? I [O'Brien] think I understand the logic and the statistical principles at work in this, but I’m having trouble being clear and persuasive. I also feel like I’m contending with some methodological conventional wisdom here. My reply: I’m so used to this idea that I find it difficult to defend it in some sort of general conceptual way. So let me retreat to a more functional defense, which is that multilevel modeling gives good estimates, especially when the number of observations per group is small. One way to see this in any particular example is through cross-validation. Another way is to consider the alternatives. If you try really hard you can come up with a “classical hypothesis testing” approach which will do as well as the multilevel model. It would just take a lot of work. . . . (A small simulation illustrating this point appears after this list.)

574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.
Introduction: Jimmy pointed me to this blog by Drew Conway on word clouds. I don’t have much to say about Conway’s specifics–word clouds aren’t really my thing, but I’m glad that people are thinking about how to do them better–but I did notice one phrase of his that I’ll dispute. Conway writes The best data visualizations should stand on their own . . . I disagree. I prefer the saying, “A picture plus 1000 words is better than two pictures or 2000 words.” That is, I see a positive interaction between words and pictures or, to put it another way, diminishing returns for words or pictures on their own. I don’t have any big theory for this, but I think, when expressed as a joint value function, my idea makes sense. Also, I live by this suggestion in my own work. I typically accompany my graphs with long captions and I try to accompany my words with pictures (although I’m not doing it here, because with the software I use, it’s much easier to type more words than to find, scale, and insert images).

417 andrew gelman stats-2010-11-17-Clustering and variance components
Introduction: Raymond Lim writes: Do you have any recommendations on clustering and binary models? My particular problem is I’m running a firm fixed-effect logit and want to cluster by industry-year (every combination of industry and year). My control variable of interest is measured by industry-year, and when I cluster by industry-year, the standard errors are 300x larger than when I don’t cluster. Strangely, this problem only occurs when doing logit and not OLS (linear probability). Also, clustering just by field doesn’t blow up the errors. My hunch is it has something to do with the non-nested structure of year, but I don’t understand why this is only problematic under logit and not OLS. My reply: I’d recommend including four multilevel variance parameters, one for firm, one for industry, one for year, and one for industry-year. (In lmer, that’s (1 | firm) + (1 | industry) + (1 | year) + (1 | industry.year).) No need to include (1 | firm.year) since in your data this is the error term. . . . (A runnable sketch of this formula appears after this list.)

823 andrew gelman stats-2011-07-26-Including interactions or not
Introduction: Liz Sanders writes: I viewed your 2005 presentation “Interactions in multilevel models” and was hoping you or one of your students/colleagues could point me to some readings about the issue of using all possible vs. only particular interaction terms in regression models with continuous covariates (I think “functional form validity” is the term I have encountered in the past). In particular, I am trying to understand whether I would be mis-specifying a model if I deleted two of its interaction terms (in favor of using only 2-way treatment interaction terms). The general full model, for example, is: Y = intercept + txt + pre1 + pre2 + txt*pre1 + txt*pre2 + pre1*pre2 + txt*pre1*pre2, where txt is effect coded (1=treatment, -1=control) and pre1 and pre2 are two different pretests that are assumed normally distributed. (The model is actually a multilevel model; the error terms are not listed for brevity.) The truncated model, on the other hand, would only test 2-way treatment interactions. . . .

2283 andrew gelman stats-2014-04-06-An old discussion of food deserts
Introduction: I happened to be reading an old comment thread from 2012 (follow the link from here) and came across this amusing exchange: Perhaps this is the paper Jonathan was talking about? Here’s more from the thread: Anyway, I don’t have anything to add right now, I just thought it was an interesting discussion.

Excerpts from three more similar posts (their title lines were not preserved in the source):

Introduction: A sociologist writes in: Samuel Lucas has just published a paper in Quality and Quantity arguing that anything less than a full probability sample of higher levels in HLMs yields biased and unusable results. If I follow him correctly, he is arguing that not only are the SEs too small, but the parameter estimates themselves are biased and we cannot say in advance whether the bias is positive or negative. Lucas has thrown down a big gauntlet, advising us to throw away our data unless the sample of macro units is right and ignore the published results that fail this standard. Extreme. Is there another conclusion to be drawn? Other advice to be given? A Bayesian path out of the valley? Here’s the abstract to Lucas’s paper: The multilevel model has become a staple of social research. I textually and formally explicate sample design features that, I contend, are required for unbiased estimation of macro-level multilevel model parameters and the use of tools for statistical inference. . . .

Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with (2) a Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long time. . . .

Introduction: Please answer the above question before reading on . . . I’m curious after reading Leif Nelson’s report that, based on research with Minah Jung, approximately 42% of the people they surveyed said they bought laundry detergent on their most recent trip to the store. I’m stunned that the number is so high. 42%??? That’s almost half the time. If we bought laundry detergent half the time we went to the store, our apartment would be stacked so full with the stuff, we wouldn’t be able to enter the door. I think we buy laundry detergent . . . ummm, how often? There are 40 of those little laundry packets in the box, we do laundry once a day, sometimes twice, let’s say 10 times a week, so this means we buy detergent about once every 4 weeks. We go to the store, hmmm, about once a day, let’s say 5 times a week to put our guess on the conservative side. So, 20 trips to the store for each purchase of detergent, that’s 5% of the time. . . .

Other similar posts:

476 andrew gelman stats-2010-12-19-Google’s word count statistics viewer
1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?
1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?
1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?
383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample
753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary
653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects
1241 andrew gelman stats-2012-04-02-Fixed effects and identification
2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .
305 andrew gelman stats-2010-09-29-Decision science vs. social psychology
397 andrew gelman stats-2010-11-06-Multilevel quantile regression
1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups
255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?
1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”
1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables
948 andrew gelman stats-2011-10-10-Combining data from many sources
472 andrew gelman stats-2010-12-17-So-called fixed and random effects
2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?
1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters
63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
1941 andrew gelman stats-2013-07-16-Priors
1170 andrew gelman stats-2012-02-16-A previous discussion with Charles Murray about liberals, conservatives, and social class
1465 andrew gelman stats-2012-08-21-D. Buggin
1240 andrew gelman stats-2012-04-02-Blogads update
1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes
414 andrew gelman stats-2010-11-14-“Like a group of teenagers on a bus, they behave in public as if they were in private”
1713 andrew gelman stats-2013-02-08-P-values and statistical practice
1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals
1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
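For concreteness, the lmer formula quoted in the “Clustering and variance components” excerpt above can be written out as a runnable call. Everything below (the data frame and the variables y, x, firm, industry, year) is a fabricated placeholder, not Lim’s actual data.

```r
# Sketch of the crossed variance-components logit suggested in the
# "Clustering and variance components" excerpt: one intercept term
# each for firm, industry, year, and the industry-year combination.
library(lme4)

set.seed(1)
d <- expand.grid(firm = factor(1:30), year = factor(2000:2005))
d$industry      <- factor(as.integer(d$firm) %% 5)   # assign firms to 5 industries
d$industry.year <- interaction(d$industry, d$year)   # every industry-year combination
d$x <- rnorm(nrow(d))                                # placeholder covariate
d$y <- rbinom(nrow(d), 1, plogis(0.5 * d$x))         # placeholder binary outcome

fit <- glmer(y ~ x + (1 | firm) + (1 | industry) + (1 | year) +
               (1 | industry.year),
             data = d, family = binomial)
summary(fit)
```

As the excerpt notes, a (1 | firm.year) term is omitted because with one observation per firm-year it would coincide with the error term.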
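Finally, a quick way to see the claim in the “Clusters with very small numbers of observations” excerpt, that multilevel fits stay well behaved when most groups contribute only one observation, is a small simulation; all numbers below are invented for illustration.

```r
# Simulate many groups, most observed only once, and fit a
# random-intercept model. Singleton groups don't break the fit;
# their estimated intercepts are shrunk toward the grand mean.
library(lme4)

set.seed(1)
sizes <- c(rep(1, 180), rep(50, 20))     # 180 singleton groups, 20 large ones
g     <- factor(rep(seq_along(sizes), times = sizes))
alpha <- rnorm(length(sizes), 0, 1)      # true group effects
y     <- alpha[as.integer(g)] + rnorm(length(g), 0, 1)

fit <- lmer(y ~ 1 + (1 | g))
# Compare a few singleton groups' raw means with their partially
# pooled estimates (grand mean plus estimated random intercept):
cbind(raw    = tapply(y, g, mean)[1:5],
      pooled = fixef(fit) + ranef(fit)$g[1:5, 1])
```

The singleton groups’ pooled estimates sit much closer to the grand mean than their single raw observations, which is exactly the partial pooling that makes the model in the main post workable.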