andrew_gelman_stats-2013-1702 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Alexis Le Nestour writes: How do you test for no effect? I attended a seminar where the person assumed that a non-significant difference between groups implied an absence of effect. In that case, the researcher needed to show that two groups were similar before being hit by a shock, conditional on some observable variables. The assumption was that the two groups were similar and that the shock was random. What would be a good way to set up a test in that case? I know you’ve been through this before (http://andrewgelman.com/2009/02/not_statistical/) and there are interesting comments, but I wanted to have your opinion. My reply: I think you have to get quantitative here. How similar is similar? Don’t let your standard errors drive your research agenda. Or, to put it another way, what would you do if you had all the data? If your sample size were 1 zillion, then everything would be statistically distinguishable from everything else. And then you’d have to think about what you really care about.
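Gelman’s “1 zillion” point is easy to demonstrate by simulation. The sketch below is mine, not his: a practically negligible true difference between groups (0.05 sd, an invented number) is invisible at n = 100 but overwhelmingly “significant” at n = 1,000,000.

```python
import math
import numpy as np

def two_sample_p(n, true_diff=0.05, seed=0):
    """Two-sided test of a tiny true difference between two groups,
    using a normal approximation. Numbers are invented for illustration."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_diff, 1.0, n)
    se = math.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    z = (b.mean() - a.mean()) / se
    # two-sided p-value from the standard normal
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# The same negligible difference: undetectable at n = 100,
# overwhelmingly "significant" at n = 1,000,000.
print(two_sample_p(100))
print(two_sample_p(1_000_000))
```

The point is not that the large-sample result is wrong; it is that once everything is distinguishable, the p-value stops telling you what you care about.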
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda
2 0.10126675 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?
Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm
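The questioner’s t-test setup can be checked directly. Below is a minimal sketch using made-up measurements for groups A and B (the numbers are invented); it confirms the factor-of-two relationship between the one- and two-tailed p-values when the observed difference is in the hypothesized direction.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (invented numbers), with group A's
# mean higher, as the theory predicts.
a = np.array([5.1, 4.8, 5.4, 5.0, 5.3, 4.9])
b = np.array([4.6, 4.4, 4.9, 4.5, 4.7, 4.3])

t_two, p_two = stats.ttest_ind(a, b)                         # two-tailed
t_one, p_one = stats.ttest_ind(a, b, alternative="greater")  # one-tailed

# When t > 0 (difference in the hypothesized direction),
# the two-tailed p-value is exactly twice the one-tailed one.
print(p_one, p_two)
```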
3 0.098751947 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06
Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant.” When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06 …
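The z=2.5 vs. z=1 arithmetic can be written out in a few lines. This sketch assumes two independent estimates with equal standard errors (set to 1 for convenience), as in the example above:

```python
import math

# Two independent estimates with the same standard error:
# one at z = 2.5 ("almost significant"), one at z = 1 (not at all).
se = 1.0
est1, est2 = 2.5 * se, 1.0 * se

se_diff = math.sqrt(se**2 + se**2)   # se of a difference of independent estimates
z_diff = (est1 - est2) / se_diff

print(round(z_diff, 2))              # about 1: not statistically significant
```

A difference of 1.5 standard errors, divided by the sqrt(2)-inflated standard error of the difference, gives z ≈ 1.06, exactly the “about 1” in the post.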
4 0.094817109 1607 andrew gelman stats-2012-12-05-The p-value is not . . .
Introduction: From a recent email exchange: I agree that you should never compare p-values directly. The p-value is a strange nonlinear transformation of data that is only interpretable under the null hypothesis. Once you abandon the null (as we do when we observe something with a very low p-value), the p-value itself becomes irrelevant. To put it another way, the p-value is a measure of evidence, it is not an estimate of effect size (as it is often treated, with the idea that a p=.001 effect is larger than a p=.01 effect, etc). Even conditional on sample size, the p-value is not a measure of effect size.
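One way to see that the p-value is not an effect-size estimate: a small effect with a large sample can easily produce a smaller p-value than a large effect with a small sample. The numbers below are invented for illustration, using a one-sample z-test with known unit variance.

```python
import math

def two_sided_p(effect, n):
    """p-value for a one-sample z-test of a mean `effect` (in sd units)
    against zero, with known unit variance. Illustrative numbers only."""
    z = effect * math.sqrt(n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

p_tiny_effect = two_sided_p(0.1, 2000)   # small effect, large sample
p_big_effect = two_sided_p(0.5, 30)      # large effect, small sample

# The smaller p-value belongs to the *smaller* effect:
print(p_tiny_effect, p_big_effect)
```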
5 0.09252885 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models
Introduction: David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? The reason I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight-simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily, because this is what I am writing a paper about. However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. For example, using JAGS …
7 0.083728693 1265 andrew gelman stats-2012-04-15-Progress in U.S. education; also, a discussion of what it takes to hit the op-ed pages
8 0.083410926 1605 andrew gelman stats-2012-12-04-Write This Book
10 0.081255123 401 andrew gelman stats-2010-11-08-Silly old chi-square!
11 0.081069909 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
12 0.080046117 899 andrew gelman stats-2011-09-10-The statistical significance filter
13 0.077599555 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper
14 0.077504613 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?
15 0.075972468 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?
16 0.075662106 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?
17 0.075034767 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools
19 0.07336247 1209 andrew gelman stats-2012-03-12-As a Bayesian I want scientists to report their data non-Bayesianly
20 0.072891131 1039 andrew gelman stats-2011-12-02-I just flew in from the econ seminar, and boy are my arms tired
simIndex simValue blogId blogTitle
same-blog 1 0.96425474 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda
2 0.76534861 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals
Introduction: I’m thinking more and more that we have to get rid of statistical significance, 95% intervals, and all the rest, and just come to a more fundamental acceptance of uncertainty. In practice, I think we use confidence intervals and hypothesis tests as a way to avoid acknowledging uncertainty. We set up some rules and then act as if we know what is real and what is not. Even in my own applied work, I’ve often enough presented 95% intervals and gone on from there. But maybe that’s just not right. I was thinking about this after receiving the following email from a psychology student: I [the student] am trying to conceptualize the lessons in your paper with Stern about comparing treatment effects across studies. When trying to understand if a certain intervention works, we must look at what the literature says. However, this can be complicated if the literature has divergent results. There are four situations I am thinking of. For each of these situations, assume the studies are …
Introduction: Maggie Fox writes: Brain scans may be able to predict what you will do better than you can yourself . . . They found a way to interpret “real time” brain images to show whether people who viewed messages about using sunscreen would actually use sunscreen during the following week. The scans were more accurate than the volunteers were, Emily Falk and colleagues at the University of California Los Angeles reported in the Journal of Neuroscience. . . . About half the volunteers had correctly predicted whether they would use sunscreen. The research team analyzed and re-analyzed the MRI scans to see if they could find any brain activity that would do better. Activity in one area of the brain, a particular part of the medial prefrontal cortex, provided the best information. “From this region of the brain, we can predict for about three-quarters of the people whether they will increase their use of sunscreen beyond what they say they will do,” Lieberman said. …
4 0.75708514 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing
Introduction: José Iparraguirre writes: There’s a letter in the latest issue of The Economist (July 31st) signed by Sir Richard Branson (Virgin), Michael Masters (Masters Capital Management) and David Frenk (Better Markets) about an OECD report on speculation and the prices of commodities, which includes the following: “The report uses a Granger causality test to measure the relationship between the level of commodities futures contracts held by swap dealers, and the prices of those commodities. Granger tests, however, are of dubious applicability to extremely volatile variables like commodities prices.” The report says: Granger causality is a standard statistical technique for determining whether one time series is useful in forecasting another. It is important to bear in mind that the term causality is used in a statistical sense, and not in a philosophical one of structural causation. More precisely, a variable A is said to Granger-cause B if knowing the time paths of B and A together …
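For readers unfamiliar with the mechanics, a Granger test is just an F-test of whether lagged values of one series improve a forecast of another beyond its own lags. This is a from-scratch sketch on simulated data (coefficients and sample size invented), not the OECD report’s analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                 # x "Granger-causes" y by construction
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

def rss(X, target):
    """Residual sum of squares from an OLS fit of target on X."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return resid @ resid

ones = np.ones(n - 1)
restricted = np.column_stack([ones, y[:-1]])       # y_t on its own past
full = np.column_stack([ones, y[:-1], x[:-1]])     # ... plus lagged x
target = y[1:]

rss_r, rss_f = rss(restricted, target), rss(full, target)
df = n - 1 - full.shape[1]
F = (rss_r - rss_f) / (rss_f / df)    # F statistic, 1 restriction
print(F)                              # large: lagged x clearly helps predict y
```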
Introduction: Hal Pashler wrote in about a recent paper, “Labor Market Returns to Early Childhood Stimulation: a 20-year Followup to an Experimental Intervention in Jamaica,” by Paul Gertler, James Heckman, Rodrigo Pinto, Arianna Zanolini, Christel Vermeerch, Susan Walker, Susan M. Chang, and Sally Grantham-McGregor. Here’s Pashler: Dan Willingham tweeted: @DTWillingham: RCT from Jamaica: Big effects 20 years later of intervention—teaching parenting/child stimulation to moms in poverty http://t.co/rX6904zxvN Browsing pp. 4 ff, it seems the authors are basically saying “hey, the stats were challenging, the sample size tiny, other problems, but we solved them all—using innovative methods of our own devising!—and lo and behold, big positive results!” So this made me think (and tweet) basically that I hope the topic (which is pretty important) will happen to interest Andy Gelman enough to incline him to give us his take. If you happen to have time and interest… My reply became this article.
7 0.73574251 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper
8 0.73566401 1746 andrew gelman stats-2013-03-02-Fishing for cherries
9 0.73551899 401 andrew gelman stats-2010-11-08-Silly old chi-square!
10 0.72769558 1605 andrew gelman stats-2012-12-04-Write This Book
11 0.7224704 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
12 0.71971321 695 andrew gelman stats-2011-05-04-Statistics ethics question
13 0.71695465 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?
14 0.71013165 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”
15 0.70949405 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five
16 0.70742851 561 andrew gelman stats-2011-02-06-Poverty, educational performance – and can be done about it
17 0.70595378 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?
18 0.70434892 1070 andrew gelman stats-2011-12-19-The scope for snooping
19 0.7041828 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
20 0.70350003 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates
simIndex simValue blogId blogTitle
Introduction: I remember in 4th grade or so, the teacher would give us a list of vocabulary words each week and we’d have to show we learned them by using each in a sentence. We quickly got bored and decided to do the assignment by writing a single sentence using all ten words. (Which the teacher hated, of course.) The above headline is in that spirit, combining blog posts rather than vocabulary words. But that only uses two of the entries. To really do the job, I’d need to throw in bivariate associations, ecological fallacies, high-dimensional feature selection, statistical significance, the suddenly unpopular name Hilary, snotty reviewers, the contagion of obesity, and milk-related spam. Or we could bring in some of the all-time favorites, such as Bayesians, economists, Finland, beautiful parents and their daughters, goofy graphics, red and blue states, essentialism in children’s reasoning, chess running, and zombies. Putting 8 of these in a single sentence (along with Glenn Hubbard) …
2 0.95628625 833 andrew gelman stats-2011-07-31-Untunable Metropolis
Introduction: Michael Margolis writes: What are we to make of it when a Metropolis-Hastings step just won’t tune? That is, the acceptance rate is zero at expected-jump-size X, and way above 1/2 at X-exp(-16) (i.e., machine precision ). I’ve solved my practical problem by writing that I would have liked to include results from a diffuse prior, but couldn’t. But I’m bothered by the poverty of my intuition. And since everything I’ve read says this is an issue of efficiency, rather than accuracy, I wonder if I could solve it just by running massive and heavily thinned chains. My reply: I can’t see how this could happen in a well-specified problem! I suspect it’s a bug. Otherwise try rescaling your variables so that your parameters will have values on the order of magnitude of 1. To which Margolis responded: I hardly wrote any of the code, so I can’t speak to the bug question — it’s binomial kriging from the R package geoRglm. And there are no covariates to scale — just the zero and one
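Margolis’s symptom (acceptance collapsing as the jump size grows) is the generic scale sensitivity of random-walk Metropolis, which a toy sampler makes visible. This sketch targets a standard normal, not the geoRglm binomial-kriging model:

```python
import numpy as np

def acceptance_rate(step, n_iter=5000, seed=1):
    """Random-walk Metropolis on a standard-normal target; returns the
    fraction of proposals accepted. A toy sketch, invented for illustration."""
    rng = np.random.default_rng(seed)
    x, accepted = 0.0, 0
    for _ in range(n_iter):
        prop = x + step * rng.normal()
        # accept with probability min(1, p(prop)/p(x)), on the log scale:
        # log p(prop) - log p(x) = 0.5 * (x^2 - prop^2) for a N(0,1) target
        if np.log(rng.uniform()) < 0.5 * (x * x - prop * prop):
            x, accepted = prop, accepted + 1
    return accepted / n_iter

print(acceptance_rate(0.1))   # small steps: nearly everything accepted
print(acceptance_rate(50.0))  # huge steps: almost nothing accepted
```

When the proposal scale is wildly mismatched to the posterior scale, the acceptance rate swings between these extremes, which is why Gelman’s advice to rescale the parameters toward order-of-magnitude 1 is the standard first remedy.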
3 0.95384181 1160 andrew gelman stats-2012-02-09-Familial Linkage between Neuropsychiatric Disorders and Intellectual Interests
Introduction: When I spoke at Princeton last year, I talked with neuroscientist Sam Wang, who told me about a project he did surveying incoming Princeton freshmen about mental illness in their families. He and his coauthor Benjamin Campbell found some interesting results, which they just published: A link between intellect and temperament has long been the subject of speculation. . . . Studies of the artistically inclined report linkage with familial depression, while among eminent and creative scientists, a lower incidence of affective disorders is found. In the case of developmental disorders, a heightened prevalence of autism spectrum disorders (ASDs) has been found in the families of mathematicians, physicists, and engineers. . . . We surveyed the incoming class of 2014 at Princeton University about their intended academic major, familial incidence of neuropsychiatric disorders, and demographic variables. . . . Consistent with prior findings, we noticed a relation between intended academic major …
4 0.95247078 1756 andrew gelman stats-2013-03-10-He said he was sorry
Introduction: Yes, it can be done : Hereby I contact you to clarify the situation that occurred with the publication of the article entitled *** which was published in Volume 11, Issue 3 of *** and I made the mistake of declaring as an author. This chapter is a plagiarism of . . . I wish to express and acknowledge that I am solely responsible for this . . . I recognize the gravity of the offense committed, since there is no justification for so doing. Therefore, and as a sign of shame and regret I feel in this situation, I will publish this letter, in order to set an example for other researchers do not engage in a similar error. No more, and to please accept my apologies, Sincerely, *** P.S. Since we’re on Retraction Watch already, I’ll point you to this unrelated story featuring a hilarious photo of a fraudster, who in this case was a grad student in psychology who faked his data and “has agreed to submit to a three-year supervisory period for any work involving funding from the
5 0.94800937 407 andrew gelman stats-2010-11-11-Data Visualization vs. Statistical Graphics
Introduction: I have this great talk on the above topic but nowhere to give it. Here’s the story. Several months ago, I was invited to speak at IEEE VisWeek. It sounded like a great opportunity. The organizer told me that there were typically about 700 people in the audience, and these are people in the visualization community whom I’d like to reach but normally wouldn’t have the opportunity to encounter. It sounded great, but I didn’t want to fly most of the way across the country by myself, so I offered to give the talk by videolink. I was surprised to get a No response: I’d think that a visualization conference, of all things, would welcome a video talk. In the meantime, though, I’d thought a lot about what I’d talk about and had started preparing something. Once I found out I wouldn’t be giving the talk, I channeled the efforts into an article which, with the collaboration of Antony Unwin, was completed about a month ago. It would take very little effort to adapt this graph-laden article …
6 0.94568133 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing
same-blog 7 0.94104588 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda
8 0.94074255 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
9 0.93970561 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
10 0.93220037 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys
11 0.93117523 2243 andrew gelman stats-2014-03-11-The myth of the myth of the myth of the hot hand
12 0.92617178 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity
13 0.91387284 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
14 0.91287386 566 andrew gelman stats-2011-02-09-The boxer, the wrestler, and the coin flip, again
15 0.91175884 1580 andrew gelman stats-2012-11-16-Stantastic!
16 0.90805721 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random
18 0.90379548 1473 andrew gelman stats-2012-08-28-Turing chess run update
19 0.90128708 459 andrew gelman stats-2010-12-09-Solve mazes by starting at the exit
20 0.90006149 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices