andrew_gelman_stats-2011-1070: The scope for snooping (2011-12-19)
Source: html
Introduction: Macartan Humphreys sent the following question to David Madigan and me:

I am working on a piece on the registration of research designs (to prevent snooping). As part of it we want to give some estimates for the "scope for snooping" and how this can be affected by different registration requirements. So we want to answer questions of the form: "Say in truth there is no relation between x and y, and you were willing to mess about with models until you found a significant relation between them; what are the chances that you would succeed if:

1. You were free to choose the indicators for x and y.
2. You were free to choose h control variables from some group of k possible controls.
3. You were free to divide up the sample in k ways to examine heterogeneous treatment effects.
4. You were free to select from some set of k reasonable models."

People have thought a lot about the first problem of choosing your indicators; we have done a set of simulations to answer the other questions. The question is: are there analytic results on these things already?
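The post doesn't reproduce those simulations, but scenario 3 (dividing the sample k ways) is easy to sketch. Below is a minimal Python version under assumptions of my own choosing, not from the original (n = 200, k = 20 random half-sample splits, a Fisher-z correlation test at |z| > 1.96): generate x and y with no true relation, then check how often at least one split turns up a "significant" correlation. Under independent tests the chance would be 1 − 0.95^20 ≈ 0.64; because the splits reuse the same data, the tests are positively correlated and the simulated rate comes in somewhat lower.

```python
import numpy as np

rng = np.random.default_rng(0)

def snoop_once(n=200, k=20, z_crit=1.96):
    """One dataset with no true x-y relation; True if any of k
    arbitrary half-sample splits gives a 'significant' correlation."""
    x = rng.normal(size=n)
    y = rng.normal(size=n)                       # independent of x by construction
    for _ in range(k):
        half = rng.random(n) < 0.5               # one arbitrary way to split the sample
        r = np.corrcoef(x[half], y[half])[0, 1]
        z = np.arctanh(r) * np.sqrt(half.sum() - 3)   # Fisher z-transform test
        if abs(z) > z_crit:
            return True
    return False

rate = np.mean([snoop_once() for _ in range(2000)])
print(f"P(at least one 'significant' split of 20): {rate:.2f}")
print(f"Independent-test value: 1 - 0.95**20 = {1 - 0.95**20:.2f}")
```

Scenarios 2 and 4 work the same way, with the loop running over candidate control sets or candidate models instead of sample splits.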
David wrote:

I've been involved in a large-scale drug safety signal detection project for the last two or three years (http://omop. …). … Generally I don't think there is any way to say definitively that any one of these analyses is a priori obviously stupid (although "experts" will happily concoct an attack on any approach that does not produce the result they like!). The medical journals are full of conflicting analyses, and I've come to the belief that, at least in the medical arena, the idea that human experts *know* the *right* analysis for a particular estimand is false. I'm all for registration of observational studies with pre-specified protocols.
Meanwhile, I wrote the following reply to the original question: The short answer is that I think a determined researcher can find all sorts of things. My solution to this snooping problem is not to forbid analyses but rather the opposite: to set up the data so people can do all possible analyses.

… and find no effects; should I infer that my result was spurious? No, not unless I thought that B, C, D, … are just as plausible tests of whatever my claim is. But of course if I did find them just as plausible, then I would have been happy to include them in my initial statement of the test to be run. In other words, the extra analyses that you would admit would only matter to me if they are the ones that I wouldn't have forbidden in the first place. What precommitting then does is just move forward the conversation about what the family of plausible models is, to a point where it is not influenced by results.
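One way to make "do all possible analyses" concrete is to enumerate every specification and report the whole distribution of estimates rather than a single preferred one. Here is a minimal sketch along those lines; the details are my own choices rather than anything from the post (k = 6 candidate controls, hence 2^6 = 64 regressions of y on x; OLS t-statistic for the coefficient on x):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 6                        # 2**6 = 64 possible control sets
x = rng.normal(size=n)
controls = rng.normal(size=(n, k))   # candidate controls, here pure noise
y = rng.normal(size=n)               # no true effect of x on y

def t_stat_on_x(cols):
    """OLS of y on [intercept, x, chosen controls]; t-statistic for x."""
    X = np.column_stack([np.ones(n), x, controls[:, list(cols)]])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])        # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

t_vals = np.array([t_stat_on_x(cols)
                   for r in range(k + 1)
                   for cols in itertools.combinations(range(k), r)])
print(f"{t_vals.size} specifications; "
      f"fraction with |t| > 1.96: {np.mean(np.abs(t_vals) > 1.96):.2f}")
print(f"t-statistic range: [{t_vals.min():.2f}, {t_vals.max():.2f}]")
```

A registered design commits to one row of this table in advance; reporting all 64 makes it visible how much (or how little) the answer depends on the choice of controls.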
It is still the case that for whatever model you settle on (including a multilevel Bayesian model that uses data from all schools), someone can muck about with features of the model to get results they like. But a multilevel model will handle many of the issues of concern.
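As to why a multilevel model helps: partial pooling shrinks noisy subgroup estimates toward the grand mean, so subgroups that stand out only by chance get pulled back. A minimal empirical-Bayes sketch of that shrinkage, where the normal-normal setup and all numbers are my own assumptions (and the 1.96 cutoff is reused crudely on the pooled estimates, just to show the direction of the effect):

```python
import numpy as np

rng = np.random.default_rng(2)

# 30 subgroup estimates whose true effects are all exactly zero
J, s = 30, 1.0
y = rng.normal(0.0, s, size=J)              # raw estimates, standard error s

# Normal-normal model: y_j ~ N(theta_j, s^2), theta_j ~ N(mu, tau^2)
mu_hat = y.mean()
tau2_hat = max(0.0, y.var(ddof=1) - s**2)   # method-of-moments estimate of tau^2
pool = tau2_hat / (tau2_hat + s**2)         # 0 means complete pooling to the mean
theta_hat = mu_hat + pool * (y - mu_hat)    # partially pooled estimates

print(f"raw subgroups with |estimate| > 1.96 s.e.: {np.sum(np.abs(y) > 1.96 * s)}")
print(f"pooled subgroups beyond the same cutoff:   {np.sum(np.abs(theta_hat) > 1.96 * s)}")
```

With no true variation among subgroups, the estimated tau^2 sits near zero and nearly everything is pooled to the mean; when real variation exists, tau^2 grows and the shrinkage backs off on its own.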