andrew_gelman_stats-2011-963: Question on Type M errors (2011-10-18)
knowledge-graph by maker-knowledge-mining — source: html
Inti Pedroso writes:

Today during the group meeting at my new job we were reviewing a paper whose main conclusions rested on an ANOVA. One of the first observations was that the experiment had a small sample size. Interestingly (or maybe not so), some of the reported effects (most of them interactions) were quite large. One of the experienced group members said that “there is a common wisdom that one should not believe effects from small sample sizes, but [he thinks] if they [the effects] are large enough to be picked up in a small study, they must be real, large effects.” I argued that if the sample size is small, one could commit a Type M error, in which the magnitude of the effect is over-estimated, and that if larger samples were evaluated, the magnitude might become smaller, along with the confidence intervals. The concept of a Type M error is completely new to all the other members of the group (which I am in my second week of), and I was given the job of finding a suitable reference to explain it. They acknowledge that the confidence interval would narrow with a larger sample, but not necessarily that the mean effect itself could be wrongly estimated. The group is made up of biologists, some of whom have good statistical knowledge, but there are also several undergrads who are just starting out. I was wondering if you know of an article describing Type M errors and how large they can be with small sample sizes?

My reply: In increasing order of mathematical sophistication, see this blog post, this semi-popular article, and this scholarly article. I think there’s room for more research on the topic.
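The Type M (magnitude) error Pedroso describes can be demonstrated with a short simulation. The sketch below is illustrative, not code from the post, and the true effect size, sample size, and significance cutoff are arbitrary choices for the example: when a small true effect is studied with a small sample and only “statistically significant” estimates are reported, the significant estimates systematically exaggerate the true effect.

```python
import numpy as np

rng = np.random.default_rng(42)

true_effect = 0.2      # hypothetical small true effect, in sd units
n = 20                 # small per-study sample size
n_sims = 100_000       # number of simulated studies

# Each study's estimate is a sample mean with standard error 1/sqrt(n)
se = 1 / np.sqrt(n)
estimates = rng.normal(true_effect, se, size=n_sims)

# Keep only estimates that reach two-sided p < .05 significance
significant = np.abs(estimates) > 1.96 * se
sig_estimates = estimates[significant]

# Type M error: among significant results, how exaggerated is the
# average absolute estimate relative to the true effect?
exaggeration = np.mean(np.abs(sig_estimates)) / true_effect
print(f"power: {significant.mean():.2f}")
print(f"expected exaggeration ratio: {exaggeration:.1f}x")
```

With these (made-up) settings the study is badly underpowered, and the significant estimates average several times the true effect — exactly the situation in which a large reported effect from a small study should increase, not decrease, one’s suspicion.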