andrew_gelman_stats-2013-1702 knowledge-graph by maker-knowledge-mining

1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda


meta info for this blog

Source: html

Introduction: Alexis Le Nestour writes: How do you test for no effect? I attended a seminar where the person assumed that a non significant difference between groups implied an absence of effect. In that case, the researcher needed to show that two groups were similar before being hit by a shock conditional on some observable variables. The assumption was that the two groups were similar and that the shock was random. What would be the good way to set up a test in that case? I know you’ve been through that before (http://andrewgelman.com/2009/02/not_statistical/) and there are interesting comments but I wanted to have your opinion on that. My reply: I think you have to get quantitative here. How similar is similar? Don’t let your standard errors drive your research agenda. Or, to put it another way, what would you do if you had all the data? If your sample size were 1 zillion, then everything would be statistically distinguishable from everything else. And then you’d have to think about what you really care about.
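One way to “get quantitative” about similarity, rather than leaning on a non-significant difference, is an equivalence test (two one-sided tests, TOST): pre-specify a margin within which the groups count as similar, and test whether the difference lies inside it. The post does not prescribe this; the sketch below is a minimal hand-rolled version, and the margin `delta` and the simulated data are illustrative assumptions.

```python
# Minimal two-sample TOST sketch: instead of asking "is the difference zero?",
# ask "is the difference inside a margin (-delta, +delta) we'd call negligible?"
# The margin `delta` and the simulated data are illustrative assumptions.
import numpy as np
from scipy import stats

def tost_two_sample(x, y, delta):
    """Return the TOST p-value for equivalence of means within +/- delta."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # pooled standard error of the difference in means
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1.0 / nx + 1.0 / ny))
    df = nx + ny - 2
    p_lower = stats.t.sf((diff + delta) / se, df)   # H0: diff <= -delta
    p_upper = stats.t.cdf((diff - delta) / se, df)  # H0: diff >= +delta
    return max(p_lower, p_upper)  # small => reject both => "similar within delta"

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y = rng.normal(0.1, 1.0, size=200)
print(tost_two_sample(x, y, delta=0.3))
```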


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Alexis Le Nestour writes: How do you test for no effect? [sent-1, score-0.171]

2 I attended a seminar where the person assumed that a non significant difference between groups implied an absence of effect. [sent-2, score-1.487]

3 In that case, the researcher needed to show that two groups were similar before being hit by a shock conditional on some observable variables. [sent-3, score-1.765]

4 The assumption was that the two groups were similar and that the shock was random. [sent-4, score-1.171]

5 What would be the good way to set up a test in that case? [sent-5, score-0.387]

6 I know you’ve been through that before (http://andrewgelman.com/2009/02/not_statistical/) and there are interesting comments but I wanted to have your opinion on that. [sent-7, score-0.316]

7 My reply: I think you have to get quantitative here. [sent-8, score-0.162]

8 Don’t let your standard errors drive your research agenda. [sent-10, score-0.417]

9 Or, to put it another way, what would you do if you had all the data? [sent-11, score-0.19]

10 If your sample size were 1 zillion, then everything would be statistically distinguishable from everything else. [sent-12, score-0.922]

11 And then you’d have to think about what you really care about. [sent-13, score-0.141]
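The sentScore values above presumably aggregate the tf-idf weights of each sentence's words; the pipeline's actual code is not shown, but a minimal sketch under that assumption, using scikit-learn, looks like this:

```python
# One plausible reading of sentScore: sum of the tf-idf weights of a
# sentence's words -- the original pipeline's scoring rule is unknown.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "How do you test for no effect?",
    "I attended a seminar where the person assumed that a non significant "
    "difference between groups implied an absence of effect.",
    "Don't let your standard errors drive your research agenda.",
]
vec = TfidfVectorizer()
X = vec.fit_transform(sentences)   # one tf-idf row per sentence
scores = X.sum(axis=1).A1          # sentence score = sum of its word weights
for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sent}")
```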


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('shock', 0.358), ('similar', 0.31), ('groups', 0.3), ('distinguishable', 0.231), ('alexis', 0.231), ('le', 0.195), ('observable', 0.186), ('everything', 0.173), ('non', 0.173), ('test', 0.171), ('attended', 0.168), ('zillion', 0.166), ('absence', 0.156), ('seminar', 0.155), ('implied', 0.155), ('drive', 0.14), ('assumed', 0.132), ('hit', 0.127), ('assumption', 0.116), ('quantitative', 0.112), ('needed', 0.11), ('conditional', 0.108), ('http', 0.104), ('case', 0.099), ('researcher', 0.099), ('statistically', 0.096), ('size', 0.094), ('opinion', 0.094), ('errors', 0.093), ('care', 0.091), ('wanted', 0.09), ('significant', 0.087), ('two', 0.087), ('person', 0.081), ('show', 0.08), ('difference', 0.08), ('sample', 0.078), ('would', 0.077), ('comments', 0.073), ('standard', 0.073), ('way', 0.073), ('effect', 0.068), ('reply', 0.067), ('set', 0.066), ('let', 0.065), ('interesting', 0.059), ('put', 0.058), ('another', 0.055), ('think', 0.05), ('research', 0.046)]
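The (word, weight) pairs above can be reproduced in spirit by fitting tf-idf over a corpus of posts and sorting this post's nonzero weights; the corpus and vectorizer settings below are illustrative, not the pipeline's actual configuration.

```python
# Sketch of producing (word, weight) pairs like those above: fit tf-idf over
# a corpus of posts, then sort this post's nonzero weights.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "how do you test for no effect don't let your standard errors drive "
    "your research agenda",                      # this post (abridged)
    "one tailed or two tailed t test p value",   # other posts in the collection
    "the p value is not an estimate of effect size",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(corpus)
weights = X[0].toarray().ravel()                 # weights for the first post
terms = vec.get_feature_names_out()
top = sorted(zip(terms, weights), key=lambda tw: -tw[1])[:10]
print([(w, round(s, 3)) for w, s in top])
```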

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda


2 0.10126675 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm
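The mechanical relationship the questioner asks about is easy to check: when the observed difference points in the hypothesized direction, the two-tailed p-value is exactly twice the one-tailed one. A quick illustration with simulated groups, using scipy's ttest_ind and its alternative argument:

```python
# Two-sided vs one-sided t-test: with the difference in the hypothesized
# direction, the two-sided p is twice the one-sided p. Simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.3, 1.0, 50)   # group A, hypothesized to have the higher mean
b = rng.normal(0.0, 1.0, 50)   # group B

t_two = stats.ttest_ind(a, b)                        # default: two-sided
t_one = stats.ttest_ind(a, b, alternative="greater")  # one-sided: mean(A) > mean(B)
print(t_two.pvalue, t_one.pvalue)   # two-sided p is twice the one-sided p here
```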

3 0.098751947 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, The difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc
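The arithmetic behind “their difference has a z of about 1” follows from the standard error of a difference of independent estimates; a few lines make it concrete (assuming both estimates have standard error 1, as in the example):

```python
# z = 2.5 and z = 1 look very different, but the z-score of their
# *difference* (independent estimates, both with s.e. = 1) is unremarkable.
import math

z1, z2 = 2.5, 1.0                   # two studies' z-scores (estimate / s.e.)
se_diff = math.sqrt(1**2 + 1**2)    # s.e. of the difference of the estimates
z_diff = (z1 - z2) / se_diff
print(round(z_diff, 2))             # ~1.06: not close to significant
```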

4 0.094817109 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

Introduction: From a recent email exchange: I agree that you should never compare p-values directly. The p-value is a strange nonlinear transformation of data that is only interpretable under the null hypothesis. Once you abandon the null (as we do when we observe something with a very low p-value), the p-value itself becomes irrelevant. To put it another way, the p-value is a measure of evidence, not an estimate of effect size (as it is often treated, with the idea that a p=.001 effect is larger than a p=.01 effect, etc.). Even conditional on sample size, the p-value is not a measure of effect size.
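A small simulation makes the point concrete: the same true effect yields wildly different p-values at different sample sizes, so a smaller p cannot be read as a larger effect. The data below are simulated for illustration.

```python
# Same underlying effect, different sample sizes: the p-values differ wildly,
# which is why a smaller p does not mean a larger effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
effect = 0.2   # identical true effect in both "studies"
for n in (20, 2000):
    x = rng.normal(effect, 1.0, n)
    t = stats.ttest_1samp(x, 0.0)
    print(f"n={n:5d}  estimate={x.mean():+.3f}  p={t.pvalue:.4f}")
```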

5 0.09252885 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

Introduction: David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and I find after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily because this is what I am writing a paper about. However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. For example, using JAGS, and th

6 0.08815828 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

7 0.083728693 1265 andrew gelman stats-2012-04-15-Progress in U.S. education; also, a discussion of what it takes to hit the op-ed pages

8 0.083410926 1605 andrew gelman stats-2012-12-04-Write This Book

9 0.082280718 351 andrew gelman stats-2010-10-18-“I was finding the test so irritating and boring that I just started to click through as fast as I could”

10 0.081255123 401 andrew gelman stats-2010-11-08-Silly old chi-square!

11 0.081069909 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

12 0.080046117 899 andrew gelman stats-2011-09-10-The statistical significance filter

13 0.077599555 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

14 0.077504613 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?

15 0.075972468 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

16 0.075662106 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?

17 0.075034767 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

18 0.07450562 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

19 0.07336247 1209 andrew gelman stats-2012-03-12-As a Bayesian I want scientists to report their data non-Bayesianly

20 0.072891131 1039 andrew gelman stats-2011-12-02-I just flew in from the econ seminar, and boy are my arms tired
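The simValue numbers in the list above are presumably cosine similarities between tf-idf vectors; the same-blog entry scoring ≈1.0 is consistent with that reading. A sketch, not the pipeline's actual code:

```python
# Ranking posts by cosine similarity of their tf-idf vectors; the toy corpus
# is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "don't let your standard errors drive your research agenda",
    "one tailed or two tailed suppose i have two groups of people",
    "the difference between significant and not significant",
]
X = TfidfVectorizer().fit_transform(posts)
sims = cosine_similarity(X[0], X)   # similarity of post 0 to every post
print(sims.ravel())                  # first entry is ~1.0 (the same post)
```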


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.14), (1, 0.006), (2, 0.037), (3, -0.074), (4, 0.028), (5, -0.008), (6, 0.009), (7, 0.022), (8, 0.012), (9, -0.005), (10, -0.03), (11, -0.008), (12, 0.055), (13, -0.069), (14, 0.013), (15, 0.012), (16, -0.024), (17, -0.005), (18, 0.002), (19, 0.004), (20, 0.01), (21, 0.0), (22, 0.013), (23, -0.024), (24, -0.007), (25, -0.03), (26, 0.016), (27, -0.013), (28, -0.026), (29, 0.011), (30, 0.03), (31, -0.012), (32, 0.026), (33, 0.038), (34, 0.044), (35, 0.034), (36, -0.018), (37, 0.01), (38, 0.028), (39, 0.005), (40, 0.02), (41, -0.019), (42, -0.004), (43, -0.002), (44, -0.009), (45, -0.04), (46, 0.025), (47, -0.054), (48, 0.03), (49, -0.027)]
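The topicId/topicWeight pairs above look like this document's coordinates in a 50-dimensional LSI space, i.e., a truncated SVD of the tf-idf matrix. A minimal sketch with scikit-learn (component count reduced to fit the toy corpus; the listing above uses 50):

```python
# Sketch: LSI topic weights as a truncated SVD of the tf-idf matrix.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "standard errors significance test effect groups shock",
    "one tailed two tailed t test p value frequentist",
    "visualization graphics talk conference article",
]
X = TfidfVectorizer().fit_transform(posts)
lsi = TruncatedSVD(n_components=2, random_state=0)   # 50 in the listing above
Z = lsi.fit_transform(X)   # rows: documents, columns: topicWeight values
print(list(enumerate(Z[0].round(3))))   # (topicId, topicWeight) pairs
```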

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96425474 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda


2 0.76534861 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

Introduction: I’m thinking more and more that we have to get rid of statistical significance, 95% intervals, and all the rest, and just come to a more fundamental acceptance of uncertainty. In practice, I think we use confidence intervals and hypothesis tests as a way to avoid acknowledging uncertainty. We set up some rules and then act as if we know what is real and what is not. Even in my own applied work, I’ve often enough presented 95% intervals and gone on from there. But maybe that’s just not right. I was thinking about this after receiving the following email from a psychology student: I [the student] am trying to conceptualize the lessons in your paper with Stern with comparing treatment effects across studies. When trying to understand if a certain intervention works, we must look at what the literature says. However this can be complicated if the literature has divergent results. There are four situations I am thinking of. For each of these situations, assume the studies are r

3 0.76433599 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as they’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

Introduction: Maggie Fox writes: Brain scans may be able to predict what you will do better than you can yourself . . . They found a way to interpret “real time” brain images to show whether people who viewed messages about using sunscreen would actually use sunscreen during the following week. The scans were more accurate than the volunteers were, Emily Falk and colleagues at the University of California Los Angeles reported in the Journal of Neuroscience. . . . About half the volunteers had correctly predicted whether they would use sunscreen. The research team analyzed and re-analyzed the MRI scans to see if they could find any brain activity that would do better. Activity in one area of the brain, a particular part of the medial prefrontal cortex, provided the best information. “From this region of the brain, we can predict for about three-quarters of the people whether they will increase their use of sunscreen beyond what they say they will do,” Lieberman said. “It is the one re

4 0.75708514 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing

Introduction: José Iparraguirre writes: There’s a letter in the latest issue of The Economist (July 31st) signed by Sir Richard Branson (Virgin), Michael Masters (Masters Capital Management) and David Frenk (Better Markets) about an OECD report on speculation and the prices of commodities, which includes the following: “The report uses a Granger causality test to measure the relationship between the level of commodities futures contracts held by swap dealers, and the prices of those commodities. Granger tests, however, are of dubious applicability to extremely volatile variables like commodities prices.” The report says: Granger causality is a standard statistical technique for determining whether one time series is useful in forecasting another. It is important to bear in mind that the term causality is used in a statistical sense, and not in a philosophical one of structural causation. More precisely, a variable A is said to Granger cause B if knowing the time paths of B and A toge
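Granger causality, in the predictive sense the report describes, asks whether lagged values of one series improve forecasts of another. A minimal sketch with statsmodels on simulated series where x leads y by one step (illustrative data, not commodities prices):

```python
# Granger causality in the statistical/predictive sense: do lags of x improve
# forecasts of y? Simulated series where x leads y by one step.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * x[t - 1] + rng.normal()

# first column is the series being predicted; second is the candidate cause
res = grangercausalitytests(np.column_stack([y, x]), maxlag=2)
print(res[1][0]["ssr_ftest"])   # (F statistic, p-value, df_denom, num lags)
```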

5 0.75193352 2090 andrew gelman stats-2013-11-05-How much do we trust a new claim that early childhood stimulation raised earnings by 42%?

Introduction: Hal Pashler wrote in about a recent paper, “Labor Market Returns to Early Childhood Stimulation: a 20-year Followup to an Experimental Intervention in Jamaica,” by Paul Gertler, James Heckman, Rodrigo Pinto, Arianna Zanolini, Christel Vermeerch, Susan Walker, Susan M. Chang, and Sally Grantham-McGregor. Here’s Pashler: Dan Willingham tweeted: @DTWillingham: RCT from Jamaica: Big effects 20 years later of intervention—teaching parenting/child stimulation to moms in poverty http://t.co/rX6904zxvN Browsing pp. 4 ff, it seems the authors are basically saying “hey the stats were challenging, the sample size tiny, other problems, but we solved them all—using innovative methods of our own devising!—and lo and behold, big positive results!”. So this made me think (and tweet) basically that I hope the topic (which is pretty important) will happen to interest Andy Gelman enough to incline him to give us his take. If you happen to have time and interest… My reply became this artic

6 0.74727488 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

7 0.73574251 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

8 0.73566401 1746 andrew gelman stats-2013-03-02-Fishing for cherries

9 0.73551899 401 andrew gelman stats-2010-11-08-Silly old chi-square!

10 0.72769558 1605 andrew gelman stats-2012-12-04-Write This Book

11 0.7224704 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

12 0.71971321 695 andrew gelman stats-2011-05-04-Statistics ethics question

13 0.71695465 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?

14 0.71013165 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”

15 0.70949405 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five

16 0.70742851 561 andrew gelman stats-2011-02-06-Poverty, educational performance – and what can be done about it

17 0.70595378 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

18 0.70434892 1070 andrew gelman stats-2011-12-19-The scope for snooping

19 0.7041828 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

20 0.70350003 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.024), (16, 0.091), (21, 0.067), (24, 0.183), (49, 0.028), (61, 0.037), (82, 0.022), (89, 0.17), (99, 0.256)]
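The lda weights above are consistent with a per-document topic distribution from latent Dirichlet allocation fit to word counts; a minimal sketch with scikit-learn (the original model's vocabulary and settings are unknown):

```python
# Sketch: per-document topic weights from LDA over raw word counts.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "standard errors significance test effect groups",
    "metropolis hastings acceptance rate chains prior",
    "visualization graphics talk conference article",
]
counts = CountVectorizer().fit_transform(posts)   # LDA expects raw counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)   # rows sum to 1: per-doc topic weights
print(theta[0])                     # this document's (topicId, topicWeight) row
```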

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96391678 1708 andrew gelman stats-2013-02-05-Wouldn’t it be cool if Glenn Hubbard were consulting for Herbalife and I were on the other side?

Introduction: I remember in 4th grade or so, the teacher would give us a list of vocabulary words each week and we’d have to show we learned them by using each in a sentence. We quickly got bored and decided to do the assignment by writing a single sentence using all ten words. (Which the teacher hated, of course.) The above headline is in that spirit, combining blog posts rather than vocabulary words. But that only uses two of the entries. To really do the job, I’d need to throw in bivariate associations, ecological fallacies, high-dimensional feature selection, statistical significance, the suddenly unpopular name Hilary, snotty reviewers, the contagion of obesity, and milk-related spam. Or we could bring in some of the all-time favorites, such as Bayesians, economists, Finland, beautiful parents and their daughters, goofy graphics, red and blue states, essentialism in children’s reasoning, chess running, and zombies. Putting 8 of these in a single sentence (along with Glenn Hubbard

2 0.95628625 833 andrew gelman stats-2011-07-31-Untunable Metropolis

Introduction: Michael Margolis writes: What are we to make of it when a Metropolis-Hastings step just won’t tune? That is, the acceptance rate is zero at expected-jump-size X, and way above 1/2 at X-exp(-16) (i.e., machine precision). I’ve solved my practical problem by writing that I would have liked to include results from a diffuse prior, but couldn’t. But I’m bothered by the poverty of my intuition. And since everything I’ve read says this is an issue of efficiency, rather than accuracy, I wonder if I could solve it just by running massive and heavily thinned chains. My reply: I can’t see how this could happen in a well-specified problem! I suspect it’s a bug. Otherwise try rescaling your variables so that your parameters will have values on the order of magnitude of 1. To which Margolis responded: I hardly wrote any of the code, so I can’t speak to the bug question — it’s binomial kriging from the R package geoRglm. And there are no covariates to scale — just the zero and one
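For contrast with the cliff Margolis describes (acceptance dropping from above 1/2 to zero over a machine-precision change in jump size), a well-specified random-walk Metropolis sampler shows acceptance decaying smoothly as the step size grows, which is why Gelman suspects a bug. A toy sketch on a standard normal target (not geoRglm):

```python
# Acceptance rate of random-walk Metropolis on a N(0, 1) target as a
# function of step size: it decays smoothly, not in a machine-precision cliff.
import numpy as np

def acceptance_rate(step, n=20000, seed=0):
    rng = np.random.default_rng(seed)
    x, accepted = 0.0, 0
    for _ in range(n):
        prop = x + step * rng.normal()
        # log acceptance ratio for a standard normal target density
        if np.log(rng.uniform()) < 0.5 * (x**2 - prop**2):
            x, accepted = prop, accepted + 1
    return accepted / n

for step in (0.1, 1.0, 10.0, 100.0):
    print(step, acceptance_rate(step))
```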

3 0.95384181 1160 andrew gelman stats-2012-02-09-Familial Linkage between Neuropsychiatric Disorders and Intellectual Interests

Introduction: When I spoke at Princeton last year, I talked with neuroscientist Sam Wang, who told me about a project he did surveying incoming Princeton freshmen about mental illness in their families. He and his coauthor Benjamin Campbell found some interesting results, which they just published: A link between intellect and temperament has long been the subject of speculation. . . . Studies of the artistically inclined report linkage with familial depression, while among eminent and creative scientists, a lower incidence of affective disorders is found. In the case of developmental disorders, a heightened prevalence of autism spectrum disorders (ASDs) has been found in the families of mathematicians, physicists, and engineers. . . . We surveyed the incoming class of 2014 at Princeton University about their intended academic major, familial incidence of neuropsychiatric disorders, and demographic variables. . . . Consistent with prior findings, we noticed a relation between intended academ

4 0.95247078 1756 andrew gelman stats-2013-03-10-He said he was sorry

Introduction: Yes, it can be done: Hereby I contact you to clarify the situation that occurred with the publication of the article entitled *** which was published in Volume 11, Issue 3 of *** and I made the mistake of declaring as an author. This chapter is a plagiarism of . . . I wish to express and acknowledge that I am solely responsible for this . . . I recognize the gravity of the offense committed, since there is no justification for so doing. Therefore, and as a sign of shame and regret I feel in this situation, I will publish this letter, in order to set an example for other researchers do not engage in a similar error. No more, and to please accept my apologies, Sincerely, *** P.S. Since we’re on Retraction Watch already, I’ll point you to this unrelated story featuring a hilarious photo of a fraudster, who in this case was a grad student in psychology who faked his data and “has agreed to submit to a three-year supervisory period for any work involving funding from the

5 0.94800937 407 andrew gelman stats-2010-11-11-Data Visualization vs. Statistical Graphics

Introduction: I have this great talk on the above topic but nowhere to give it. Here’s the story. Several months ago, I was invited to speak at IEEE VisWeek. It sounded like a great opportunity. The organizer told me that there were typically about 700 people in the audience, and these are people in the visualization community whom I’d like to reach but normally wouldn’t have the opportunity to encounter. It sounded great, but I didn’t want to fly most of the way across the country by myself, so I offered to give the talk by videolink. I was surprised to get a No response: I’d think that a visualization conference, of all things, would welcome a video talk. In the meantime, though, I’d thought a lot about what I’d talk about and had started preparing something. Once I found out I wouldn’t be giving the talk, I channeled the efforts into an article which, with the collaboration of Antony Unwin, was completed about a month ago. It would take very little effort to adapt this graph-laden a

6 0.94568133 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing

same-blog 7 0.94104588 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda

8 0.94074255 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

9 0.93970561 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)

10 0.93220037 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

11 0.93117523 2243 andrew gelman stats-2014-03-11-The myth of the myth of the myth of the hot hand

12 0.92617178 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity

13 0.91387284 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders

14 0.91287386 566 andrew gelman stats-2011-02-09-The boxer, the wrestler, and the coin flip, again

15 0.91175884 1580 andrew gelman stats-2012-11-16-Stantastic!

16 0.90805721 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

17 0.90588307 1390 andrew gelman stats-2012-06-23-Traditionalist claims that modern art could just as well be replaced by a “paint-throwing chimp”

18 0.90379548 1473 andrew gelman stats-2012-08-28-Turing chess run update

19 0.90128708 459 andrew gelman stats-2010-12-09-Solve mazes by starting at the exit

20 0.90006149 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices