andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-803 knowledge-graph by maker-knowledge-mining

803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims


meta info for this blog

Source: html

Introduction: A few days ago I discussed the evaluation of somewhat-plausible claims that are somewhat supported by theory and somewhat supported by statistical evidence. One point I raised was that an implausibly large estimate of effect size can be cause for concern: Uri Simonsohn (the author of the recent rebuttal of the name-choice article by Pelham et al.) argued that the implied effects were too large to be believed (just as I was arguing above regarding the July 4th study), which makes more plausible his claims that the results arise from methodological artifacts. That calculation is straight Bayes: the distribution of systematic errors has much longer tails than the distribution of random errors, so the larger the estimated effect, the more likely it is to be a mistake. This little theoretical result is a bit annoying, because it is the larger effects that are the most interesting! Larry Bartels notes that my reasoning above is a bit incoherent: I [Bartels] strongly agree with


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A few days ago I discussed the evaluation of somewhat-plausible claims that are somewhat supported by theory and somewhat supported by statistical evidence. [sent-1, score-0.494]

2 One point I raised was that an implausibly large estimate of effect size can be cause for concern: Uri Simonsohn (the author of the recent rebuttal of the name-choice article by Pelham et al.) [sent-2, score-0.926]

3 argued that the implied effects were too large to be believed (just as I was arguing above regarding the July 4th study), which makes more plausible his claims that the results arise from methodological artifacts. [sent-3, score-0.485]

4 That calculation is straight Bayes: the distribution of systematic errors has much longer tails than the distribution of random errors, so the larger the estimated effect, the more likely it is to be a mistake. [sent-4, score-1.017]

5 This little theoretical result is a bit annoying, because it is the larger effects that are the most interesting! [sent-5, score-0.244]

6 Larry Bartels notes that my reasoning above is a bit incoherent: I [Bartels] strongly agree with your bottom line that our main aim should be “understanding effect sizes on a real scale.” [sent-6, score-0.456]

7 However, your paradoxical conclusion (“the larger the estimated effect, the more likely it is to be a mistake”) seems to distract attention from the effect size of primary interest, the magnitude of the “true” (causal) effect. [sent-7, score-1.403]

8 But the more important fact would seem to be that your posterior belief regarding the magnitude of the “true” (causal) effect, E(c|b), is also increasing in b (at least for plausible-seeming distributional assumptions). [sent-9, score-0.815]

9 Focusing on whether a surprising empirical result is “a mistake” (whatever that means) seems to concede too much to the simple-minded is-there-an-effect-or-isn’t-there perspective, while obscuring your more fundamental interest in “understanding [true] effect sizes on a real scale.” [sent-12, score-0.775]

10 Maybe a more correct statement would be that, given reasonable models for x, d, and e, if the estimate gets implausibly large, the estimate for x does not increase proportionally. [sent-15, score-0.546]

11 I actually think there will be some (non-Gaussian) models for which, as y gets larger, E(x|y) can actually go back toward zero. [sent-16, score-0.073]

12 But this will depend on the distributional form. [sent-17, score-0.148]

13 I agree that “how likely is it to be a mistake” is the wrong way to look at things. [sent-18, score-0.09]

14 No analysis is perfect, so the “mistake” framing is generally not so helpful. [sent-20, score-0.065]
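The claim in sentence 4 (and in the introduction) is easy to check by simulation. The sketch below is not from the post; the distributions and scales are illustrative assumptions, chosen only so that the systematic errors (t-distributed) have much heavier tails than the true effects and the random errors. Conditional on a large observed estimate, the estimate should then usually be dominated by error.

```python
# Monte Carlo sketch (assumed toy model, not the post's): observed estimate
# y = x + d + e, where the systematic error d is heavy-tailed relative to
# the true effect x and the random error e.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(0.0, 0.2, n)        # true effects: small on a real scale
d = 0.1 * rng.standard_t(2, n)     # systematic errors: heavy-tailed (t with 2 df)
e = rng.normal(0.0, 0.1, n)        # random errors: light-tailed
y = x + d + e                      # the estimate we actually observe

# Among estimates of a given magnitude, how often does the error term
# outweigh the true effect?
for lo, hi in [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0), (1.0, np.inf)]:
    in_bin = (np.abs(y) >= lo) & (np.abs(y) < hi)
    p_error_dominates = np.mean(np.abs(d[in_bin] + e[in_bin]) > np.abs(x[in_bin]))
    print(f"|y| in [{lo}, {hi}): share dominated by error = {p_error_dominates:.2f}")
```

Under these assumptions the printed share climbs toward 1 in the larger bins, which is exactly the "the larger the estimated effect, the more likely it is to be a mistake" pattern.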
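Sentences 8 and 10-12 can be illustrated the same way. The toy model below is again an assumption, not the post's: y = x + d + e with a normal prior on the true effect x, and the posterior mean E(x | y) computed by grid integration under two alternative error distributions. With Gaussian errors, E(x | y) grows proportionally in y (Bartels's point that E(c | b) is increasing in b under plausible-seeming distributional assumptions); with heavy-tailed (Cauchy) errors, E(x | y) stops growing and drifts back toward zero for implausibly large y, the non-Gaussian behavior described in sentences 10-12.

```python
# Posterior mean E(x | y) under two assumed error models (toy sketch).
import numpy as np
from scipy import stats

x_grid = np.linspace(-5, 5, 2001)
prior_x = stats.norm(0, 0.5).pdf(x_grid)    # prior: true effects are small

def posterior_mean_x(y, error_dist):
    """E(x | y) when the combined error d + e has density error_dist."""
    likelihood = error_dist.pdf(y - x_grid)  # p(y | x) under y = x + error
    posterior = likelihood * prior_x
    return float(np.sum(x_grid * posterior) / np.sum(posterior))

gaussian_error = stats.norm(0, 0.5)   # light-tailed errors
heavy_error = stats.cauchy(0, 0.5)    # heavy-tailed systematic errors

for y in [0.5, 1.0, 2.0, 4.0]:
    print(f"y = {y}: E(x|y), Gaussian errors = "
          f"{posterior_mean_x(y, gaussian_error):.2f}; "
          f"heavy-tailed errors = {posterior_mean_x(y, heavy_error):.2f}")
```

The reversal is driven by the tail comparison: once the error distribution has heavier tails than the prior on x, an implausibly large y is more cheaply explained by a large error than by a large true effect, so the estimate for x does not increase proportionally and can fall back toward the prior.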


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('effect', 0.36), ('estimated', 0.247), ('magnitude', 0.208), ('attribute', 0.184), ('july', 0.181), ('larger', 0.178), ('mistake', 0.171), ('implausibly', 0.158), ('systematic', 0.154), ('causal', 0.15), ('bartels', 0.148), ('distributional', 0.148), ('true', 0.145), ('regarding', 0.145), ('large', 0.14), ('estimate', 0.117), ('posterior', 0.116), ('supported', 0.106), ('larry', 0.106), ('claims', 0.104), ('error', 0.099), ('belief', 0.099), ('increasing', 0.099), ('plausible', 0.096), ('sizes', 0.096), ('concede', 0.094), ('likely', 0.09), ('somewhat', 0.089), ('warranted', 0.088), ('elicited', 0.085), ('distract', 0.085), ('paradoxical', 0.082), ('obscuring', 0.082), ('increase', 0.081), ('pelham', 0.079), ('study', 0.078), ('seems', 0.077), ('size', 0.076), ('rebuttal', 0.075), ('errors', 0.075), ('gets', 0.073), ('incoherent', 0.072), ('tails', 0.072), ('wacky', 0.071), ('random', 0.071), ('understanding', 0.068), ('nevertheless', 0.067), ('result', 0.066), ('distribution', 0.065), ('framing', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999952 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims


2 0.36803645 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

Introduction: Around these parts we see a continuing flow of unusual claims supported by some statistical evidence. The claims are varyingly plausible a priori. Some examples (I won’t bother to supply the links; regular readers will remember these examples and newcomers can find them by searching): - Obesity is contagious - People’s names affect where they live, what jobs they take, etc. - Beautiful people are more likely to have girl babies - More attractive instructors have higher teaching evaluations - In a basketball game, it’s better to be behind by a point at halftime than to be ahead by a point - Praying for someone without their knowledge improves their recovery from heart attacks - A variety of claims about ESP How should we think about these claims? The usual approach is to evaluate the statistical evidence–in particular, to look for reasons that the claimed results are not really statistically significant. If nobody can shoot down a claim, it survives. The other part of th

3 0.20431682 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

Introduction: The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the sizes of effects (I actually prefer the term “comparisons” so as to avoid the causal baggage) rather than just their signs. I agree , and I wanted to elaborate a bit on a point that comes up in Funder’s discussion. He quotes an (unnamed) prominent social psychologist as writing: The key to our research . . . [is not] to accurately estimate effect size. If I were testing an advertisement for a marketing research firm and wanted to be sure that the cost of the ad would produce enough sales to make it worthwhile, effect size would be crucial. But when I am testing a theory about whether, say, positive mood reduces information processing in comparison with negative mood, I am worried abou

4 0.20275521 963 andrew gelman stats-2011-10-18-Question on Type M errors

Introduction: Inti Pedroso writes: Today during the group meeting at my new job we were revising a paper whose main conclusions were sustained by an ANOVA. One of the first observations is that the experiment had a small sample size. Interestingly (maybe not so), some of the reported effects (most of them interactions) were quite large. One of the experienced group members said that “there is a common wisdom that one should not believe effects from small sample sizes but [he thinks] if they [the effects] are large enough to be picked on a small study they must be real large effects”. I argued that if the sample size is small one could incur an M-type error in which the magnitude of the effect is being over-estimated and that if larger samples are evaluated the magnitude may become smaller and also the confidence intervals. The concept of M-type error is completely new to all other members of the group (on which I am in my second week) and I was given the job of finding a suitable ref to explain

5 0.19504096 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

Introduction: From a recent email exchange: I agree that you should never compare p-values directly. The p-value is a strange nonlinear transformation of data that is only interpretable under the null hypothesis. Once you abandon the null (as we do when we observe something with a very low p-value), the p-value itself becomes irrelevant. To put it another way, the p-value is a measure of evidence, it is not an estimate of effect size (as it is often treated, with the idea that a p=.001 effect is larger than a p=.01 effect, etc). Even conditional on sample size, the p-value is not a measure of effect size.

6 0.16940995 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

7 0.16511899 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

8 0.16450399 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

9 0.15794986 1941 andrew gelman stats-2013-07-16-Priors

10 0.15759669 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

11 0.14929563 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

12 0.14899649 808 andrew gelman stats-2011-07-18-The estimated effect size is implausibly large. Under what models is this a piece of evidence that the true effect is small?

13 0.14715935 85 andrew gelman stats-2010-06-14-Prior distribution for design effects

14 0.14321111 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?

15 0.14110182 388 andrew gelman stats-2010-11-01-The placebo effect in pharma

16 0.13640732 1939 andrew gelman stats-2013-07-15-Forward causal reasoning statements are about estimation; reverse causal questions are about model checking and hypothesis generation

17 0.13536428 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

18 0.13522655 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

19 0.13224457 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

20 0.13211839 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.229), (1, 0.076), (2, 0.118), (3, -0.178), (4, -0.039), (5, -0.079), (6, 0.02), (7, 0.034), (8, 0.013), (9, -0.052), (10, -0.105), (11, 0.019), (12, 0.053), (13, -0.042), (14, 0.056), (15, 0.018), (16, -0.054), (17, 0.044), (18, -0.048), (19, 0.058), (20, -0.063), (21, -0.058), (22, 0.049), (23, 0.013), (24, 0.034), (25, 0.099), (26, -0.037), (27, 0.083), (28, -0.021), (29, -0.036), (30, -0.033), (31, 0.034), (32, -0.082), (33, -0.047), (34, -0.009), (35, 0.017), (36, -0.066), (37, -0.117), (38, -0.019), (39, -0.062), (40, -0.018), (41, 0.011), (42, -0.092), (43, -0.013), (44, 0.031), (45, 0.016), (46, -0.035), (47, 0.042), (48, 0.02), (49, 0.034)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99336594 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims


2 0.84117436 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?

Introduction: Let's say you are repeatedly going to receive unselected sets of well done RCTs on various, say, medical treatments. One reasonable assumption with all of these treatments is that they are monotonic – either helpful or harmful for all. The treatment effect will (as always) vary for subgroups in the population – these will not be explicitly identified in the studies – but each study very likely will enroll different percentages of the various patient subgroups. Being all randomized studies these subgroups will be balanced in the treatment versus control arms – but each study will (as always) be estimating a different – but exchangeable – treatment effect (Exchangeable due to the ignorance about the subgroup memberships of the enrolled patients.) That reasonable assumption – monotonicity – will be to some extent (as always) wrong, but given that it is a risk believed well worth taking – if the average effect in any population is positive (versus negative) the average effect in any other

3 0.83497715 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

Introduction: Around these parts we see a continuing flow of unusual claims supported by some statistical evidence. The claims are varyingly plausible a priori. Some examples (I won’t bother to supply the links; regular readers will remember these examples and newcomers can find them by searching): - Obesity is contagious - People’s names affect where they live, what jobs they take, etc. - Beautiful people are more likely to have girl babies - More attractive instructors have higher teaching evaluations - In a basketball game, it’s better to be behind by a point at halftime than to be ahead by a point - Praying for someone without their knowledge improves their recovery from heart attacks - A variety of claims about ESP How should we think about these claims? The usual approach is to evaluate the statistical evidence–in particular, to look for reasons that the claimed results are not really statistically significant. If nobody can shoot down a claim, it survives. The other part of th

4 0.81482935 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

Introduction: The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the sizes of effects (I actually prefer the term “comparisons” so as to avoid the causal baggage) rather than just their signs. I agree , and I wanted to elaborate a bit on a point that comes up in Funder’s discussion. He quotes an (unnamed) prominent social psychologist as writing: The key to our research . . . [is not] to accurately estimate effect size. If I were testing an advertisement for a marketing research firm and wanted to be sure that the cost of the ad would produce enough sales to make it worthwhile, effect size would be crucial. But when I am testing a theory about whether, say, positive mood reduces information processing in comparison with negative mood, I am worried abou

5 0.77762616 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

Introduction: In my discussion of the dentists-named-Dennis study, I referred to my back-of-the-envelope calculation that the effect (if it indeed exists) corresponds to an approximate 1% aggregate chance that you’ll pick a profession based on your first name. Even if there are nearly twice as many dentist Dennises as would be expected from chance alone, the base rate is so low that a shift of 1% of all Dennises would be enough to do this. My point was that (a) even a small effect could show up when looking at low-frequency events such as the choice to pick a particular career or live in a particular city, and (b) any small effects will inherently be difficult to detect in any direct way. Uri Simonsohn (the author of the recent rebuttal of the original name-choice article by Brett Pelham et al.) wrote: In terms of the effect size. I [Simonsohn] think of it differently and see it as too big to be believable. I don’t find it plausible that I can double the odds that my daughter will marry an

6 0.75743335 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

7 0.75391269 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

8 0.73293257 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?

9 0.73254287 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

10 0.73112214 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.

11 0.71823049 963 andrew gelman stats-2011-10-18-Question on Type M errors

12 0.70609289 2040 andrew gelman stats-2013-09-26-Difficulties in making inferences about scientific truth from distributions of published p-values

13 0.7029016 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

14 0.7004137 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

15 0.69628054 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

16 0.69358927 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

17 0.69299281 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

18 0.68950987 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

19 0.68650091 2165 andrew gelman stats-2014-01-09-San Fernando Valley cityscapes: An example of the benefits of fractal devastation?

20 0.67921281 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.026), (15, 0.172), (16, 0.03), (21, 0.039), (24, 0.209), (40, 0.016), (43, 0.023), (48, 0.017), (53, 0.011), (55, 0.011), (99, 0.336)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9871242 945 andrew gelman stats-2011-10-06-W’man < W’pedia, again

Introduction: Blogger Deep Climate looks at another paper by the 2002 recipient of the American Statistical Association’s Founders award. This time it’s not funny, it’s just sad. Here’s Wikipedia on simulated annealing: By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random “nearby” solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly “downhill” as T goes to zero. The allowance for “uphill” moves saves the method from becoming stuck at local minima—which are the bane of greedier methods. And here’s Wegman: During each step of the algorithm, the variable that will eventually represent the minimum is replaced by a random solution that is chosen according to a temperature

2 0.98467314 133 andrew gelman stats-2010-07-08-Gratuitous use of “Bayesian Statistics,” a branding issue?

Introduction: I’m on an island in Maine for a few weeks (big shout out for North Haven!) This morning I picked up a copy of “Working Waterfront,” a newspaper that focuses on issues of coastal fishing communities. I came across an article about modeling “fish” populations — actually lobsters, I guess they’re considered “fish” for regulatory purposes. When I read it, I thought “wow, this article is really well-written, not dumbed down like articles in most newspapers.” I think it’s great that a small coastal newspaper carries reporting like this. (The online version has a few things that I don’t recall in the print version, too, so it’s even better). But in addition to being struck by finding such a good article in a small newspaper, I was struck by this: According to [University of Maine scientist Yong] Chen, there are four main areas where his model improved on the prior version. “We included the inshore trawl data from Maine and other state surveys, in addition to federal survey data; we h

3 0.98460668 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

Introduction: Chris Masse points me to this response by Daryl Bem and two statisticians (Jessica Utts and Wesley Johnson) to criticisms by Wagenmakers et al. of Bem’s recent ESP study. I have nothing to add but would like to repeat a couple bits of my discussions of last month, here: Classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects. I think it’s naive when people implicitly assume that the study’s claims are correct, or the study’s statistical methods are weak. Generally, the smaller the effects you’re studying, the better the statistics you need. ESP is a field of small effects and so ESP researchers use high-quality statistics. To put it another way: whatever methodological errors happen to be in the paper in question, probably occur in lots of researcher papers in “legitimate” psychology research. The difference is that when you’re studying a

same-blog 4 0.98113412 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims


5 0.97914433 1081 andrew gelman stats-2011-12-24-Statistical ethics violation

Introduction: A colleague writes: When I was in NYC I went to this party by group of Japanese bio-scientists. There, one guy told me about how the biggest pharmaceutical company in Japan did their statistics. They ran 100 different tests and reported the most significant one. (This was in 2006 and he said they stopped doing this few years back so they were doing this until pretty recently…) I’m not sure if this was 100 multiple comparison or 100 different kinds of test but I’m sure they wouldn’t want to disclose their data… Ouch!

6 0.97799182 1794 andrew gelman stats-2013-04-09-My talks in DC and Baltimore this week

7 0.9779827 1541 andrew gelman stats-2012-10-19-Statistical discrimination again

8 0.96871859 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression

9 0.96844459 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”

10 0.96768701 2188 andrew gelman stats-2014-01-27-“Disappointed with your results? Boost your scientific paper”

11 0.96343571 1800 andrew gelman stats-2013-04-12-Too tired to mock

12 0.96224999 274 andrew gelman stats-2010-09-14-Battle of the Americans: Writer at the American Enterprise Institute disparages the American Political Science Association

13 0.96185887 329 andrew gelman stats-2010-10-08-More on those dudes who will pay your professor $8000 to assign a book to your class, and related stories about small-time sleazoids

14 0.9600879 902 andrew gelman stats-2011-09-12-The importance of style in academic writing

15 0.95907831 2353 andrew gelman stats-2014-05-30-I posted this as a comment on a sociology blog

16 0.95858347 762 andrew gelman stats-2011-06-13-How should journals handle replication studies?

17 0.95845807 981 andrew gelman stats-2011-10-30-rms2

18 0.95343053 1683 andrew gelman stats-2013-01-19-“Confirmation, on the other hand, is not sexy”

19 0.95339572 1393 andrew gelman stats-2012-06-26-The reverse-journal-submission system

20 0.95307708 1833 andrew gelman stats-2013-04-30-“Tragedy of the science-communication commons”