andrew_gelman_stats-2010-77 knowledge-graph by maker-knowledge-mining

77 andrew gelman stats-2010-06-09-Sof[t]


meta information for this blog

Source: html

Introduction: Joe Fruehwald writes: I’m working with linguistic data, specifically binomial hits and misses of a certain variable for certain words (specifically whether or not the “t” sound was pronounced at the end of words like “soft”). Word frequency follows a power law, with most words appearing just once, and with some words being hyperfrequent. I’m not interested in specific word effects, but I am interested in the effect of word frequency. A logistic model fit is going to be heavily influenced by the effect of the hyperfrequent words which constitute only one type. To control for the item effect, I would fit a multilevel model with a random intercept by word, but like I said, most of the words appear only once. Is there a principled approach to this problem? My response: It’s ok to fit a multilevel model even if most groups only have one observation each. You’ll want to throw in some word-level predictors too. Think of the multilevel model not as a substitute for the usual thoughtful steps of statistical modeling but rather as a way to account for unmodeled error at the group level.
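
For concreteness, here is a minimal sketch of the recommended model in R with lme4. The data frame `d` and the variable names are assumptions, not from the original post: one row per token, a 0/1 outcome for whether the final “t” was pronounced, log word frequency as a word-level predictor, and a random intercept per word.

```r
# A minimal sketch of the model discussed above (assumed variable names).
library(lme4)

fit <- glmer(t_pronounced ~ log(word_freq) + (1 | word),
             family = binomial, data = d)
summary(fit)
```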


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Joe Fruehwald writes: I’m working with linguistic data, specifically binomial hits and misses of a certain variable for certain words (specifically whether or not the “t” sound was pronounced at the end of words like “soft”). [sent-1, score-2.223]

2 Word frequency follows a power law, with most words appearing just once, and with some words being hyperfrequent. [sent-2, score-1.219]

3 I’m not interested in specific word effects, but I am interested in the effect of word frequency. [sent-3, score-1.112]

4 A logistic model fit is going to be heavily influenced by the effect of the hyperfrequent words which constitute only one type. [sent-4, score-1.377]

5 To control for the item effect, I would fit a multilevel model with a random intercept by word, but like I said, most of the words appear only once. [sent-5, score-1.401]

6 Is there a principled approach to this problem? [sent-6, score-0.157]

7 My response: It’s ok to fit a multilevel model even if most groups only have one observation each. [sent-7, score-0.797]

8 You’ll want to throw in some word-level predictors too. [sent-8, score-0.178]

9 Think of the multilevel model not as a substitute for the usual thoughtful steps of statistical modeling but rather as a way to account for unmodeled error at the group level. [sent-9, score-1.217]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('words', 0.414), ('word', 0.331), ('multilevel', 0.225), ('specifically', 0.184), ('fruehwald', 0.18), ('fit', 0.175), ('pronounced', 0.163), ('unmodeled', 0.163), ('effect', 0.16), ('principled', 0.157), ('constitute', 0.149), ('certain', 0.146), ('model', 0.142), ('linguistic', 0.142), ('misses', 0.139), ('intercept', 0.133), ('substitute', 0.131), ('binomial', 0.127), ('influenced', 0.126), ('soft', 0.126), ('joe', 0.122), ('hits', 0.121), ('appearing', 0.118), ('heavily', 0.118), ('observation', 0.116), ('frequency', 0.113), ('interested', 0.107), ('thoughtful', 0.107), ('steps', 0.096), ('item', 0.095), ('throw', 0.093), ('sound', 0.093), ('logistic', 0.093), ('account', 0.089), ('follows', 0.085), ('predictors', 0.085), ('law', 0.085), ('groups', 0.078), ('specific', 0.076), ('appear', 0.076), ('power', 0.075), ('variable', 0.073), ('control', 0.073), ('usual', 0.073), ('random', 0.068), ('group', 0.065), ('error', 0.063), ('modeling', 0.063), ('ok', 0.061), ('end', 0.061)]
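
As a rough illustration of how a weight list like the one above can be produced, here is a self-contained tfidf cosine-similarity sketch in base R. This is an assumption about the pipeline behind this page, not its actual code; `docs` is a hypothetical character vector of post texts.

```r
# Tfidf-weighted cosine similarity between documents, in base R.
tfidf_sim <- function(docs) {
  tokens <- strsplit(tolower(docs), "[^a-z]+")             # crude tokenizer
  vocab  <- unique(unlist(tokens))
  tf     <- t(sapply(tokens, function(ws) table(factor(ws, levels = vocab))))
  idf    <- log(length(docs) / colSums(tf > 0))            # inverse document frequency
  w      <- sweep(tf, 2, idf, `*`)                         # tfidf weights
  w      <- w / sqrt(rowSums(w^2))                         # L2-normalize each document
  w %*% t(w)                                               # cosine similarity matrix
}
```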

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 77 andrew gelman stats-2010-06-09-Sof[t]


2 0.2344057 318 andrew gelman stats-2010-10-04-U-Haul statistics

Introduction: Very freakonomic (and I mean that in the best sense of the word).

3 0.20809171 1191 andrew gelman stats-2012-03-01-Hoe noem je?

Introduction: Gerrit Storms reports on an interesting linguistic research project in which you can participate! Here’s the description: Over the past few weeks, we have been trying to set up a scientific study that is important for many researchers interested in words, word meaning, semantics, and cognitive science in general. It is a huge word association project, in which people are asked to participate in a small task that doesn’t last longer than 5 minutes. Our goal is to build a global word association network that contains connections between about 40,000 words, the size of the lexicon of an average adult. Setting up such a network might teach us a lot about semantic memory, how it develops, and maybe also about how it can deteriorate (like in Alzheimer’s disease). Most people enjoy doing the task, but we need thousands of participants to succeed. Up till today, we have found about 53,000 participants willing to do the little task, but we need more subjects. That is why we address you. Would

4 0.1949133 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

Introduction: James O’Brien writes: How would you explain, to a “classically-trained” hypothesis-tester, that “It’s OK to fit a multilevel model even if some groups have only one observation each”? I [O'Brien] think I understand the logic and the statistical principles at work in this, but I’m having trouble being clear and persuasive. I also feel like I’m contending with some methodological conventional wisdom here. My reply: I’m so used to this idea that I find it difficult to defend it in some sort of general conceptual way. So let me retreat to a more functional defense, which is that multilevel modeling gives good estimates, especially when the number of observations per group is small. One way to see this in any particular example is through cross-validation. Another way is to consider the alternatives. If you try really hard you can come up with a “classical hypothesis testing” approach which will do as well as the multilevel model. It would just take a lot of work. I’d r
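
The claim that partial pooling still works when many groups are singletons is easy to check by simulation. A small sketch in R with lme4 (mine, not from the post), under the assumption that most groups contribute one observation and a few are large:

```r
# lmer still gives partially pooled estimates in this setting, shrinking
# the singleton-group intercepts toward the grand mean.
library(lme4)

set.seed(1)
n_j   <- c(rep(1, 80), rep(20, 20))          # 80 singleton groups, 20 large ones
gi    <- rep(seq_along(n_j), times = n_j)    # group index per observation
alpha <- rnorm(length(n_j), 0, 1)            # true group effects
d     <- data.frame(y = alpha[gi] + rnorm(length(gi), 0, 1), g = factor(gi))

fit <- lmer(y ~ 1 + (1 | g), data = d)
head(ranef(fit)$g)                           # singleton groups are shrunk hardest
```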

5 0.18913546 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.

Introduction: Jimmy pointed me to this blog by Drew Conway on word clouds. I don’t have much to say about Conway’s specifics–word clouds aren’t really my thing, but I’m glad that people are thinking about how to do them better–but I did notice one phrase of his that I’ll dispute. Conway writes The best data visualizations should stand on their own . . . I disagree. I prefer the saying, “A picture plus 1000 words is better than two pictures or 2000 words.” That is, I see a positive interaction between words and pictures or, to put it another way, diminishing returns for words or pictures on their own. I don’t have any big theory for this, but I think, when expressed as a joint value function, my idea makes sense. Also, I live by this suggestion in my own work. I typically accompany my graphs with long captions and I try to accompany my words with pictures (although I’m not doing it here, because with the software I use, it’s much easier to type more words than to find, scale, and insert images)

6 0.16120338 476 andrew gelman stats-2010-12-19-Google’s word count statistics viewer

7 0.15464926 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?

8 0.14844233 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

9 0.14667092 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

10 0.13418399 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

11 0.12940314 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

12 0.11981864 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

13 0.11822569 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

14 0.11441897 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

15 0.11262141 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

16 0.11254817 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

17 0.11227782 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

18 0.11148158 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology

19 0.10843283 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

20 0.10706417 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.157), (1, 0.107), (2, 0.063), (3, -0.033), (4, 0.075), (5, 0.032), (6, 0.012), (7, -0.046), (8, 0.118), (9, 0.081), (10, 0.006), (11, 0.021), (12, 0.033), (13, -0.02), (14, -0.002), (15, 0.002), (16, -0.027), (17, -0.054), (18, -0.009), (19, 0.008), (20, 0.005), (21, -0.025), (22, 0.008), (23, -0.036), (24, -0.061), (25, -0.092), (26, -0.057), (27, 0.035), (28, -0.048), (29, -0.025), (30, -0.046), (31, 0.011), (32, -0.072), (33, 0.021), (34, 0.005), (35, 0.021), (36, -0.018), (37, -0.034), (38, 0.033), (39, -0.012), (40, 0.04), (41, -0.008), (42, -0.021), (43, -0.061), (44, -0.014), (45, 0.024), (46, 0.027), (47, 0.009), (48, -0.094), (49, 0.062)]
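
The 50 topic weights above are the post’s coordinates in a latent semantic space. A sketch of the usual LSI construction, a truncated SVD of the tfidf matrix (an assumption about this page’s pipeline, reusing the `tfidf_sim` ideas from the earlier sketch):

```r
# LSI-style document embedding: truncated SVD of a tfidf matrix `w`
# (rows = documents), keeping k latent dimensions.
lsi_embed <- function(w, k = 50) {
  s <- svd(w, nu = k, nv = k)
  scores <- s$u %*% diag(s$d[1:k], k, k)     # per-document topic weights
  scores / sqrt(rowSums(scores^2))           # normalize so dot products are cosines
}
```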

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97100168 77 andrew gelman stats-2010-06-09-Sof[t]


2 0.77950025 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

Introduction: A sociologist writes in: Samuel Lucas has just published a paper in Quality and Quantity arguing that anything less than a full probability sample of higher levels in HLMs yields biased and unusable results. If I follow him correctly, he is arguing that not only are the SEs too small, but the parameter estimates themselves are biased and we cannot say in advance whether the bias is positive or negative. Lucas has thrown down a big gauntlet, advising us to throw away our data unless the sample of macro units is right and to ignore the published results that fail this standard. Extreme. Is there another conclusion to be drawn? Other advice to be given? A Bayesian path out of the valley? Here’s the abstract to Lucas’s paper: The multilevel model has become a staple of social research. I textually and formally explicate sample design features that, I contend, are required for unbiased estimation of macro-level multilevel model parameters and the use of tools for statistical infe

3 0.77847689 417 andrew gelman stats-2010-11-17-Clutering and variance components

Introduction: Raymond Lim writes: Do you have any recommendations on clustering and binary models? My particular problem is I’m running a firm fixed effect logit and want to cluster by industry-year (every combination of industry-year). My control variable of interest is measured by industry-year, and when I cluster by industry-year, the standard errors are 300x larger than when I don’t cluster. Strangely, this problem only occurs when doing logit and not OLS (linear probability). Also, clustering just by field doesn’t blow up the errors. My hunch is it has something to do with the non-nested structure of year, but I don’t understand why this is only problematic under logit and not OLS. My reply: I’d recommend including four multilevel variance parameters, one for firm, one for industry, one for year, and one for industry-year. (In lmer, that’s (1 | firm) + (1 | industry) + (1 | year) + (1 | industry.year)). No need to include (1 | firm.year) since in your data this is the error term. Try
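
A runnable version of that recommendation might look as follows. The post gives only the random-effects formula; the data frame `dat`, the outcome `y`, and the predictor `x` are assumed names.

```r
# The four crossed variance components from the reply, written out.
library(lme4)

dat$industry.year <- interaction(dat$industry, dat$year)
fit <- glmer(y ~ x + (1 | firm) + (1 | industry) + (1 | year) + (1 | industry.year),
             family = binomial, data = dat)
```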

4 0.77813911 823 andrew gelman stats-2011-07-26-Including interactions or not

Introduction: Liz Sanders writes: I viewed your 2005 presentation “Interactions in multilevel models” and was hoping you or one of your students/colleagues could point me to some readings about the issue of using all possible vs. only particular interaction terms in regression models with continuous covariates (I think “functional form validity” is the term I have encountered in the past). In particular, I am trying to understand whether I would be mis-specifying a model if I deleted two of its interaction terms (in favor of using only 2-way treatment interaction terms). The general full model, for example, is: Y = intercept + txt + pre1 + pre2 + txt*pre1 + txt*pre2 + pre1*pre2 + txt*pre1*pre2, where txt is effect coded (1=treatment, -1=control) and pre1 and pre2 are two different pretests that are assumed normally distributed. (The model is actually a multilevel model; the error terms are not listed for brevity.) The truncated model, on the other hand, would only test 2-way treatment inte
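
In R formula notation the two models under discussion are as follows (a sketch written with lm for brevity, though the post notes the real model is multilevel; `dat` is an assumed data frame):

```r
# Full model (all interactions up to the 3-way) vs. the truncated model
# keeping only the 2-way treatment interactions.
full    <- lm(Y ~ txt * pre1 * pre2, data = dat)
reduced <- lm(Y ~ txt + pre1 + pre2 + txt:pre1 + txt:pre2, data = dat)
anova(reduced, full)   # tests the dropped pre1:pre2 and txt:pre1:pre2 terms
```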

5 0.77638841 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

Introduction: James O’Brien writes: How would you explain, to a “classically-trained” hypothesis-tester, that “It’s OK to fit a multilevel model even if some groups have only one observation each”? I [O'Brien] think I understand the logic and the statistical principles at work in this, but I’m having trouble being clear and persuasive. I also feel like I’m contending with some methodological conventional wisdom here. My reply: I’m so used to this idea that I find it difficult to defend it in some sort of general conceptual way. So let me retreat to a more functional defense, which is that multilevel modeling gives good estimates, especially when the number of observations per group is small. One way to see this in any particular example is through cross-validation. Another way is to consider the alternatives. If you try really hard you can come up with a “classical hypothesis testing” approach which will do as well as the multilevel model. It would just take a lot of work. I’d r

6 0.76370549 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

7 0.762438 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

8 0.76190186 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

9 0.75674844 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

10 0.74714667 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables

11 0.73972321 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

12 0.73950398 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

13 0.73777312 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

14 0.73709583 948 andrew gelman stats-2011-10-10-Combining data from many sources

15 0.73611617 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

16 0.73050809 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

17 0.72893792 346 andrew gelman stats-2010-10-16-Mandelbrot and Akaike: from taxonomy to smooth runways (pioneering work in fractals and self-similarity)

18 0.72442114 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

19 0.72289342 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

20 0.72283846 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(4, 0.075), (5, 0.042), (7, 0.034), (15, 0.036), (16, 0.012), (20, 0.04), (24, 0.245), (27, 0.016), (59, 0.017), (62, 0.022), (76, 0.016), (86, 0.013), (89, 0.017), (99, 0.307)]
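
For reference, a sketch of how per-document topic weights like those above might be produced with the `topicmodels` package. This is an assumption about the pipeline behind this page, not its actual code; `docs` is a hypothetical character vector of post texts.

```r
# Fit a 100-topic LDA model and inspect one document's topic weights.
library(tm)
library(topicmodels)

dtm <- DocumentTermMatrix(Corpus(VectorSource(docs)))
lda <- LDA(dtm, k = 100, control = list(seed = 1))
round(posterior(lda)$topics[1, ], 3)   # per-document topicId/topicWeight pairs
```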

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98498356 77 andrew gelman stats-2010-06-09-Sof[t]


2 0.9606868 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long
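
A minimal R sketch of the Fisher-type permutation test described above (illustrative only, not Imbens’s code; `y` is the outcome vector and `z` the 0/1 treatment labels):

```r
# Permutation test of the sharp null of no treatment effect.
perm_test <- function(y, z, n_perm = 10000) {
  obs  <- mean(y[z == 1]) - mean(y[z == 0])   # observed difference in means
  null <- replicate(n_perm, {
    zp <- sample(z)                           # re-randomize labels under the sharp null
    mean(y[zp == 1]) - mean(y[zp == 0])
  })
  mean(abs(null) >= abs(obs))                 # two-sided permutation p-value
}
```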

3 0.96036088 2358 andrew gelman stats-2014-06-03-Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons

Introduction: Please answer the above question before reading on . . . I’m curious after reading Leif Nelson’s report that, based on research with Minah Jung, approximately 42% of the people they surveyed said they bought laundry detergent on their most recent trip to the store. I’m stunned that the number is so high. 42%??? That’s almost half the time. If we bought laundry detergent half the time we went to the store, our apartment would be stacked so full with the stuff, we wouldn’t be able to enter the door. I think we buy laundry detergent . . . ummm, how often? There are 40 of those little laundry packets in the box, we do laundry once a day, sometimes twice, let’s say 10 times a week, so this means we buy detergent about once every 4 weeks. We go to the store, hmmm, about once a day, let’s say 5 times a week to put our guess on the conservative side. So, 20 trips to the store for each purchase of detergent, that’s 5% of the time. Compared to us, lots of people must (a) go to
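
The post’s back-of-the-envelope arithmetic, written out step by step (numbers taken directly from the text):

```r
packets_per_box <- 40
loads_per_week  <- 10
weeks_per_box   <- packets_per_box / loads_per_week   # one purchase every 4 weeks
trips_per_week  <- 5                                  # conservative guess
trips_per_box   <- trips_per_week * weeks_per_box     # 20 store trips per purchase
1 / trips_per_box                                     # detergent on ~5% of trips
```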

4 0.9602809 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: I’m reposting this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. T

5 0.95845246 2283 andrew gelman stats-2014-04-06-An old discussion of food deserts

Introduction: I happened to be reading an old comment thread from 2012 (follow the link from here) and came across this amusing exchange: Perhaps this is the paper Jonathan was talking about? Here’s more from the thread: Anyway, I don’t have anything to add right now, I just thought it was an interesting discussion.

6 0.95728981 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

7 0.95657599 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

8 0.95616412 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

9 0.95538068 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

10 0.95524269 1941 andrew gelman stats-2013-07-16-Priors

11 0.95505363 1170 andrew gelman stats-2012-02-16-A previous discussion with Charles Murray about liberals, conservatives, and social class

12 0.95503473 1465 andrew gelman stats-2012-08-21-D. Buggin

13 0.95457804 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

14 0.95403671 1240 andrew gelman stats-2012-04-02-Blogads update

15 0.95379412 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

16 0.95353746 414 andrew gelman stats-2010-11-14-“Like a group of teenagers on a bus, they behave in public as if they were in private”

17 0.95326012 2099 andrew gelman stats-2013-11-13-“What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the science?”

18 0.95296431 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

19 0.9528743 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

20 0.95272934 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence