knowledge-graph by maker-knowledge-mining

1746 andrew gelman stats-2013-03-02-Fishing for cherries


meta info for this blog

Source: html

Introduction: Someone writes: I’m currently trying to make sense of the Army’s preliminary figures on their Comprehensive Soldier Fitness programme, which I found here. That report (see for example table 4 on p.15) has only a few very small “effect sizes” with p<.01 on some of the subscales and nothing significant on the rest. It looks to me like it’s not much different from random noise, which I suspect might be caused by the large N (and there’s more to come, because N for the whole programme will be in excess of 1 million). While googling on the subject of large N, I came across this entry in your blog. My question is, does that imply that when one has a large N – and, thus, presumably, large statistical power – one should systematically reduce alpha as well? Is there any literature on this? Does one always/sometimes/never need to take Lindley’s “paradox” into account? And a supplementary question: can it ever be legitimate to quote a result as significant for one DV (“Social fi
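The large-N worry is easy to make concrete. Below is a minimal simulation (mine, not from the post or the Army report; the effect size, N, and seed are invented) showing why a fixed alpha such as .01 stops discriminating at N near one million: a practically negligible true effect gets flagged as “significant” almost every time. This is also the setting of the Lindley’s-paradox question above: the same data that reject a point null by a frequentist test can still leave high posterior probability on that null.

    # Toy simulation -- effect size, N, and seed are invented, not from
    # the report: with N ~ 1e6, a negligible effect clears alpha = .01.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n = 1_000_000                        # N on the order of the full programme
    true_effect = 0.005                  # 0.005 sd: trivial by any standard
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)

    t, p = stats.ttest_ind(treated, control)
    d = treated.mean() - control.mean()  # effect in sd units (sd = 1 here)
    print(f"effect d = {d:.4f}, t = {t:.2f}, p = {p:.2e}")
    # Typically prints p well below .01: "significance" without substance.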


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Someone writes: I’m currently trying to make sense of the Army’s preliminary figures on their Comprehensive Soldier Fitness programme, which I found here. [sent-1, score-0.09]

2 That report (see for example table 4 on p.15) has only a few very small “effect sizes” with p<.01 on some of the subscales and nothing significant on the rest. [sent-4, score-0.089]

3 It looks to me like it’s not much different from random noise, which I suspect might be caused by the large N (and there’s more to come, because N for the whole programme will be in excess of 1 million). [sent-5, score-0.719]

4 While googling on the subject of large N, I came across this entry in your blog. [sent-6, score-0.392]

5 My question is, does that imply that when one has a large N – and, thus, presumably, large statistical power – one should systematically reduce alpha as well? [sent-7, score-0.364]

6–7 And a supplementary question: can it ever be legitimate to quote a result as significant for one DV (“Social fitness” in table 4) when it is simply (cf. p. 10) an amalgam of the four other DVs listed immediately underneath it, of which one (“Friendliness”) has a significance of <.01? [sent-10, score-0.387; sent-12, score-0.111]

8 CSF is a $140 million programme that has been controversial for all sorts of reasons. [sent-17, score-0.434]

9 There’s a whole bunch of other stuff about this process, such as their use of MANOVA at T1 and “ANOVA with blocking” at T2, that makes me think they are on a fishing expedition for cherries to pick. [sent-18, score-0.394]

10 For example, the means in some of the tables are “estimated marginal means” (MANOVA output), the SD values are in fact SEMs, and I have no idea why they are expressing effect sizes as partial eta squared when they only have one independent variable. [sent-19, score-0.482]

11 But I’m a complete newbie to stats, so I’m probably missing a lot of stuff. [sent-20, score-0.111]

12 That report is almost a parody of military bureaucracy! [sent-22, score-0.183]

13 The people doing this research have real problems for which there are no easy solutions. [sent-24, score-0.1]

14 In short: none of the effects is zero and there’s gotta be a lot of variation across people and across subgroups of people. [sent-25, score-0.633]

15 It’s a classic multiple comparisons situation, but the null hypothesis of zero effects (which is standard in multiple-comparisons analyses) is clearly inappropriate. [sent-27, score-0.339]

16 Multilevel modeling seems like a good idea but it requires real modeling and real thought, not simply plugging the data into an 8-schools program. [sent-28, score-0.556] (A minimal partial-pooling sketch follows this list.)

17 We have seen the same issues arising in education research, another area with multiple outcomes, treatments varying across predictors, and small aggregate effects. [sent-29, score-0.468]
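To make items 14–17 concrete, here is a minimal partial-pooling sketch (a toy with invented numbers, not the 8-schools program itself and not anything from the report): each subgroup estimate is pulled toward a pooled mean in proportion to its noise, instead of being tested one at a time against a zero null.

    # Toy empirical-Bayes partial pooling; y, se, and tau are invented.
    import numpy as np

    def partial_pool(y, se, tau):
        """Posterior means for group effects under y_j ~ N(theta_j, se_j^2),
        theta_j ~ N(mu, tau^2), treating the between-group sd tau as known."""
        w = 1.0 / (se**2 + tau**2)
        mu = np.sum(w * y) / np.sum(w)        # precision-weighted grand mean
        shrink = tau**2 / (tau**2 + se**2)    # weight kept on each raw estimate
        return shrink * y + (1.0 - shrink) * mu

    y = np.array([0.10, -0.02, 0.04, 0.25])   # hypothetical subscale effects
    se = np.array([0.05, 0.05, 0.05, 0.15])   # hypothetical standard errors
    print(partial_pool(y, se, tau=0.05))
    # The noisy 0.25 (se = 0.15) is shrunk hardest toward the pooled mean.

The point of items 15–16 is that this kind of shrinkage, not a zero-null multiple-comparisons correction, is the appropriate response when none of the effects is exactly zero.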


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('programme', 0.32), ('manova', 0.259), ('fitness', 0.19), ('across', 0.167), ('large', 0.141), ('multiple', 0.135), ('sizes', 0.121), ('bureaucracy', 0.118), ('dvs', 0.118), ('ns', 0.118), ('ps', 0.118), ('sems', 0.118), ('blocking', 0.118), ('cherries', 0.118), ('table', 0.118), ('million', 0.114), ('soldier', 0.111), ('newbie', 0.111), ('underneath', 0.111), ('eta', 0.111), ('effects', 0.11), ('plugging', 0.107), ('dv', 0.103), ('expedition', 0.103), ('real', 0.1), ('army', 0.1), ('parody', 0.1), ('subgroups', 0.095), ('supplementary', 0.095), ('zero', 0.094), ('comprehensive', 0.091), ('preliminary', 0.09), ('significant', 0.089), ('fishing', 0.087), ('squared', 0.087), ('whole', 0.086), ('looks', 0.086), ('excess', 0.086), ('means', 0.085), ('simply', 0.085), ('googling', 0.084), ('anova', 0.084), ('issues', 0.084), ('lindley', 0.083), ('report', 0.083), ('alpha', 0.082), ('arising', 0.082), ('modeling', 0.082), ('sd', 0.081), ('expressing', 0.078)]
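For context on the word weights above: a minimal tf-idf sketch using scikit-learn and an invented three-document corpus (the pipeline’s actual tokenizer, corpus, and settings are not shown on this page). Terms frequent in this post but rare across the corpus get the highest weights, which is why corpus-rare words such as ‘programme’ and ‘manova’ top the list.

    # Toy tf-idf weighting; the corpus strings are invented stand-ins.
    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = [
        "large effect sizes programme manova fishing cherries",  # this post
        "bayesian hypothesis testing lindley paradox priors",    # another post
        "anova multilevel models variance components effect",    # another post
    ]
    vec = TfidfVectorizer()
    X = vec.fit_transform(corpus)              # rows: docs, cols: vocabulary

    row = X[0].toarray().ravel()               # weights for the first post
    terms = vec.get_feature_names_out()
    top = sorted(zip(terms, row), key=lambda t: -t[1])[:5]
    print([(w, round(s, 3)) for w, s in top])  # top-weighted words, as above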

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1746 andrew gelman stats-2013-03-02-Fishing for cherries


2 0.12869617 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

Introduction: Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs bayesian, namely Lindley’s paradox, on page 12. The text is as follows: So why are we worried about trivial effects? They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. It occurs when: (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0, and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. In his original treatment, Lindley (1957) showed that – under a parti

3 0.12858832 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

Introduction: A research psychologist writes in with a question that’s so long that I’ll put my answer first, then put the question itself below the fold. Here’s my reply: As I wrote in my Anova paper and in my book with Jennifer Hill, I do think that multilevel models can completely replace Anova. At the same time, I think the central idea of Anova should persist in our understanding of these models. To me the central idea of Anova is not F-tests or p-values or sums of squares, but rather the idea of predicting an outcome based on factors with discrete levels, and understanding these factors using variance components. The continuous or categorical response thing doesn’t really matter so much to me. I have no problem using a normal linear model for continuous outcomes (perhaps suitably transformed) and a logistic model for binary outcomes. I don’t want to throw away interactions just because they’re not statistically significant. I’d rather partially pool them toward zero using an inform

4 0.12828301 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

Introduction: Joe Northrup writes: I have a question about correcting for multiple comparisons in a Bayesian regression model. I believe I understand the argument in your 2012 paper in Journal of Research on Educational Effectiveness that when you have a hierarchical model there is shrinkage of estimates towards the group-level mean and thus there is no need to add any additional penalty to correct for multiple comparisons. In my case I do not have hierarchically structured data—i.e. I have only 1 observation per group but have a categorical variable with a large number of categories. Thus, I am fitting a simple multiple regression in a Bayesian framework. Would putting a strong, mean 0, multivariate normal prior on the betas in this model accomplish the same sort of shrinkage (it seems to me that it would) and do you believe this is a valid way to address criticism of multiple comparisons in this setting? My reply: Yes, I think this makes sense. One way to address concerns of multiple com

5 0.12070241 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

Introduction: Dean Eckles writes: Thought you might be interested in an example that touches on a couple recurring topics: 1. The difference between a statistically significant finding and one that is non-significant need not be itself statistically significant (thus highlighting the problems of using NHST to declare whether an effect exists or not). 2. Continued issues with the credibility of high-profile studies of “social contagion”, especially by Christakis and Fowler. A new paper in Archives of Sexual Behavior produces observational estimates of peer effects in sexual behavior and same-sex attraction. In the text, the authors (who include C&F) make repeated comparisons of the results for peer effects in sexual intercourse and those for peer effects in same-sex attraction. However, the 95% CI for the latter actually includes the point estimate for the former! This is most clear in Figure 2, as highlighted by Real Clear Science’s blog post about the study. (Now because there is som

6 0.1177446 963 andrew gelman stats-2011-10-18-Question on Type M errors

7 0.11705479 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

8 0.11268045 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

9 0.11111432 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

10 0.11031353 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

11 0.10881237 1605 andrew gelman stats-2012-12-04-Write This Book

12 0.10362396 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

13 0.10162812 1691 andrew gelman stats-2013-01-25-Extreem p-values!

14 0.10147966 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

15 0.098753385 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

16 0.096504055 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

17 0.09519238 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

18 0.095077425 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

19 0.093506761 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

20 0.091799244 1241 andrew gelman stats-2012-04-02-Fixed effects and identification
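The simValue column above is a similarity score against this post. A standard recipe, sketched here with invented post texts, is cosine similarity between tf-idf vectors; under L2-normalized tf-idf a document’s similarity to itself is exactly 1.0, matching the same-blog row.

    # Toy similarity ranking; the post texts are invented stand-ins.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    posts = {
        "1746": "large effect sizes programme fishing cherries manova",
        "1355": "lindley paradox bayesian frequentist hypothesis testing",
        "888":  "anova multilevel models variance components psychology",
    }
    ids = list(posts)
    X = TfidfVectorizer().fit_transform(posts.values())

    sims = cosine_similarity(X[0], X).ravel()   # this post vs. every post
    for blog_id, s in sorted(zip(ids, sims), key=lambda t: -t[1]):
        print(f"{s:.8f}  {blog_id}")            # self ranks first, at 1.0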


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.206), (1, 0.019), (2, 0.064), (3, -0.11), (4, 0.047), (5, -0.029), (6, 0.008), (7, 0.009), (8, 0.017), (9, 0.009), (10, -0.036), (11, -0.019), (12, 0.078), (13, -0.047), (14, 0.048), (15, 0.016), (16, -0.048), (17, -0.007), (18, -0.001), (19, 0.003), (20, 0.014), (21, 0.011), (22, -0.003), (23, 0.022), (24, -0.062), (25, -0.073), (26, -0.026), (27, 0.05), (28, -0.034), (29, -0.015), (30, 0.045), (31, -0.01), (32, 0.015), (33, -0.025), (34, -0.003), (35, -0.021), (36, 0.052), (37, -0.032), (38, 0.004), (39, 0.004), (40, -0.041), (41, -0.013), (42, -0.009), (43, -0.017), (44, 0.01), (45, 0.008), (46, -0.008), (47, -0.01), (48, -0.013), (49, -0.021)]
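The (topicId, topicWeight) pairs above are a latent semantic indexing vector: the post projected onto latent dimensions from a truncated SVD of the weighted term-document matrix. A gensim-style sketch with an invented toy corpus and num_topics=2 (the model above evidently used 50 topics; its real corpus and settings are not shown here):

    # Toy LSI projection; the corpus and topic count are invented.
    from gensim import corpora, models

    texts = [
        ["large", "effect", "sizes", "programme", "fishing", "cherries"],
        ["lindley", "paradox", "bayesian", "hypothesis", "testing"],
        ["anova", "multilevel", "models", "variance", "effect"],
    ]
    dictionary = corpora.Dictionary(texts)
    bow = [dictionary.doc2bow(t) for t in texts]
    tfidf = models.TfidfModel(bow)
    lsi = models.LsiModel(tfidf[bow], id2word=dictionary, num_topics=2)

    print(lsi[tfidf[bow[0]]])   # [(topicId, topicWeight), ...] as listed above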

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97237122 1746 andrew gelman stats-2013-03-02-Fishing for cherries


2 0.79679251 963 andrew gelman stats-2011-10-18-Question on Type M errors

Introduction: Inti Pedroso writes: Today during the group meeting at my new job we were revising a paper whose main conclusions were sustained by an ANOVA. One of the first observations is that the experiment had a small sample size. Interestingly (maybe not so), some of the reported effects (most of them interactions) were quite large. One of the experienced group members said that “there is a common wisdom that one should not believe effects from small sample sizes but [he thinks] if they [the effects] are large enough to be picked up on a small study they must be real large effects”. I argued that if the sample size is small one could incur an M-type error in which the magnitude of the effect is being over-estimated and that if larger samples are evaluated the magnitude may become smaller and also the confidence intervals. The concept of M-type error is completely new to all other members of the group (of which I am in my second week) and I was given the job of finding a suitable ref to explain

3 0.78437132 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

Introduction: Josef Fruehwald writes: In the past few years, the empirical foundations of the social sciences, especially Psychology, have been coming under increased scrutiny and criticism. For example, there was the New Yorker piece from 2010 called “The Truth Wears Off” about the “decline effect,” or how the effect size of a phenomenon appears to decrease over time. . . . I [Fruehwald] am a linguist. Do the problems facing psychology face me? To really answer that, I first have to decide which explanation for the decline effect I think is most likely, and I think Andrew Gelman’s proposal is a good candidate: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I’ve put together some R code to demonstrate this point. Let’s say I’m looking at two populations, and unknown to me as a researcher, there is a small difference between the two, even though they

4 0.77973783 1691 andrew gelman stats-2013-01-25-Extreem p-values!

Introduction: Joshua Vogelstein writes: I know you’ve discussed this on your blog in the past, but I don’t know exactly how you’d answer the following query: Suppose you run an analysis and obtain a p-value of 10^-300. What would you actually report? I’m fairly confident that I’m not that confident :) I’m guessing: “p-value \approx 0.” One possibility is to determine the accuracy with which one *could* in theory know, by virtue of the sample size, and say that the p-value is less than or equal to that? For example, if I used a Monte Carlo approach to generate the null distribution with 10,000 samples, and I found that the observed value was more extreme than all of the sample values, then I might say that p is less than or equal to 1/10,000. My reply: Mosteller and Wallace talked a bit about this in their book, the idea that there are various other 1-in-a-million possibilities (for example, the data were faked somewhere before they got to you) so p-values such as 10^-6 don’t really mean an

5 0.77714658 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. (See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai

6 0.77015096 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

7 0.76123917 2243 andrew gelman stats-2014-03-11-The myth of the myth of the myth of the hot hand

8 0.75917476 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

9 0.75457251 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

10 0.75042701 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

11 0.74745744 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

12 0.74431723 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing

13 0.73470038 2090 andrew gelman stats-2013-11-05-How much do we trust a new claim that early childhood stimulation raised earnings by 42%?

14 0.7331354 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda

15 0.72861063 1883 andrew gelman stats-2013-06-04-Interrogating p-values

16 0.72852248 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

17 0.71997744 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

18 0.71822119 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five

19 0.71553099 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

20 0.71135634 2040 andrew gelman stats-2013-09-26-Difficulties in making inferences about scientific truth from distributions of published p-values


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.033), (16, 0.049), (21, 0.036), (24, 0.169), (36, 0.017), (57, 0.037), (62, 0.025), (63, 0.015), (86, 0.054), (94, 0.077), (98, 0.014), (99, 0.305)]
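The LDA vector above lists only 12 topics, against the LSI vector’s full 50, because LDA returns a probability distribution over topics and implementations typically omit topics below a small threshold. A companion sketch under the same toy-corpus assumptions:

    # Toy LDA topic mixture; corpus, topic count, and seed are invented.
    from gensim import corpora, models

    texts = [
        ["large", "effect", "sizes", "programme", "fishing", "cherries"],
        ["lindley", "paradox", "bayesian", "hypothesis", "testing"],
        ["anova", "multilevel", "models", "variance", "effect"],
    ]
    dictionary = corpora.Dictionary(texts)
    bow = [dictionary.doc2bow(t) for t in texts]
    lda = models.LdaModel(bow, id2word=dictionary, num_topics=2,
                          random_state=0, passes=10)

    print(lda[bow[0]])  # sparse: topics under ~0.01 probability are omitted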

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97794473 1746 andrew gelman stats-2013-03-02-Fishing for cherries


2 0.97216314 1211 andrew gelman stats-2012-03-13-A personal bit of spam, just for me!

Introduction: Hi Andrew, I came across your site while searching for blogs and posts around American obesity and wanted to reach out to get your readership’s feedback on an infographic my team built which focuses on the obesity of America and where we could end up at the going rate. If you’re interested, let’s connect. Have a great weekend! Thanks. *** I have to say, that’s pretty pitiful, to wish someone a “great weekend” on a Tuesday! This guy’s gotta ratchet up his sophistication a few notches if he ever wants to get a job as a spammer for a major software company, for example.

3 0.9701584 582 andrew gelman stats-2011-02-20-Statisticians vs. everybody else

Introduction: Statisticians are literalists. When someone says that the U.K. boundary commission’s delay in redistricting gave the Tories an advantage equivalent to 10 percent of the vote, we’re the kind of person who looks it up and claims that the effect is less than 0.7 percent. When someone says, “Since 1968, with the single exception of the election of George W. Bush in 2000, Americans have chosen Republican presidents in times of perceived danger and Democrats in times of relative calm,” we’re like, Hey, really? And we go look that one up too. And when someone says that engineers have more sons and nurses have more daughters . . . well, let’s not go there. So, when I was pointed to this blog by Michael O’Hare making the following claim, in the context of K-12 education in the United States: My [O'Hare's] favorite examples of this junk [educational content with no workplace value] are spelling and pencil-and-paper algorithm arithmetic. These are absolutely critical for a clerk

4 0.96802175 1510 andrew gelman stats-2012-09-25-Incoherence of Bayesian data analysis

Introduction: Hogg writes: At the end of this article you wonder about consistency. Have you ever considered the possibility that utility might resolve some of the problems? I have no idea if it would—I am not advocating that position—I just get some kind of intuition from phrases like “Judgment is required to decide…”. Perhaps there is a coherent and objective description of what is—or could be—done under a coherent “utility” model (like a utility that could be objectively agreed upon and computed). Utilities are usually subjective—true—but priors are usually subjective too. My reply: I’m happy to think about utility, for some particular problem or class of problems going to the effort of assigning costs and benefits to different outcomes. I agree that a utility analysis, even if (necessarily) imperfect, can usefully focus discussion. For example, if a statistical method for selecting variables is justified on the basis of cost, I like the idea of attempting to quantify the costs of ga

5 0.96461821 1523 andrew gelman stats-2012-10-06-Comparing people from two surveys, one of which is a simple random sample and one of which is not

Introduction: Juli writes: I’m helping a professor out with an analysis, and I was hoping that you might be able to point me to some relevant literature… She has two studies that have been completed already (so we can’t go back to the planning stage in terms of sampling, unfortunately). Both studies are based around the population of adults in LA who attended LA public high schools at some point, so that is the same for both studies. Study #1 uses random digit dialing, so I consider that one to be SRS. Study #2, however, is a convenience sample in which all participants were involved with one of eight community-based organizations (CBOs). Of course, both studies can be analyzed independently, but she was hoping for there to be some way to combine/compare the two studies. Specifically, I am working on looking at the civic engagement of the adults in both studies. In study #1, this means looking at factors such as involvement in student government. In study #2, this means looking at involv

6 0.9638834 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

7 0.96236986 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

8 0.96133208 35 andrew gelman stats-2010-05-16-Another update on the spam email study

9 0.95847279 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

10 0.95830232 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

11 0.95807266 1950 andrew gelman stats-2013-07-22-My talks that were scheduled for Tues at the Data Skeptics meetup and Wed at the Open Statistical Programming meetup

12 0.95800334 1502 andrew gelman stats-2012-09-19-Scalability in education

13 0.9567889 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

14 0.95673537 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

15 0.95661771 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

16 0.95648289 107 andrew gelman stats-2010-06-24-PPS in Georgia

17 0.95596147 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

18 0.95574296 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey

19 0.95559615 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

20 0.95542532 1605 andrew gelman stats-2012-12-04-Write This Book