andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1557 knowledge-graph by maker-knowledge-mining

1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’


meta info for this blog

Source: html

Introduction: False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant [I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both? It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant [I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. [sent-1, score-0.252]

2 The culprit is a construct we refer to as researcher degrees of freedom. [sent-2, score-0.48]

3 In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? [sent-3, score-0.657]

4 Which conditions should be combined and which ones compared? [sent-5, score-0.404]

5 Should specific measures be combined or transformed or both? [sent-7, score-0.45]

6 It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. [sent-8, score-0.336]

7 Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” [sent-9, score-0.793]

8 The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%. [sent-10, score-0.448]

9 Other choice quotes: “Everything reported here actually happened” and “Author order is alphabetical, controlling for father’s age (reverse-coded)”. [sent-12, score-0.246]

10 I [Malecki] would rank author guidelines №s 5 & 6 higher in the order. [sent-13, score-0.409]
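Sentence 8’s claim, that running many analyses at the 5% level gives more than a 5% chance of at least one falsely positive finding, follows from a one-line calculation. A minimal sketch in Python, under the simplifying assumption of k independent tests (the quoted passage itself does not assume independence):

```python
# Chance of at least one false positive among k independent tests,
# each run at level alpha when every null hypothesis is true.
alpha = 0.05
for k in (1, 3, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} analyses: P(at least one false positive) = {p_any:.3f}")
# 1 analysis gives 0.050; 10 analyses give 0.401; 20 give 0.642.
```

Dependence among the analyses changes the exact numbers, but any flexibility beyond a single pre-specified test pushes the rate above the nominal 5%.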


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('combined', 0.227), ('afshar', 0.181), ('yalda', 0.181), ('decisions', 0.172), ('unacceptably', 0.17), ('researchers', 0.164), ('undisclosed', 0.163), ('impractical', 0.163), ('excluded', 0.163), ('culprit', 0.157), ('author', 0.155), ('alphabetical', 0.153), ('malecki', 0.145), ('falsely', 0.14), ('order', 0.14), ('significant', 0.136), ('transformed', 0.133), ('flexibility', 0.133), ('analytic', 0.131), ('rank', 0.129), ('father', 0.129), ('guidelines', 0.125), ('yields', 0.125), ('producing', 0.122), ('collecting', 0.12), ('alternatives', 0.116), ('construct', 0.115), ('collected', 0.107), ('controlling', 0.106), ('degrees', 0.105), ('refer', 0.103), ('course', 0.103), ('accepted', 0.101), ('greater', 0.101), ('quotes', 0.1), ('collection', 0.1), ('presenting', 0.1), ('observations', 0.099), ('explore', 0.098), ('analyzing', 0.098), ('allows', 0.098), ('rare', 0.096), ('combination', 0.093), ('measures', 0.09), ('conditions', 0.089), ('ones', 0.088), ('necessarily', 0.085), ('consistent', 0.082), ('excellent', 0.082), ('search', 0.081)]
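The (word, weight) pairs above are tf-idf scores. The dataset’s actual pipeline is not published, so the sketch below is an assumed reconstruction using scikit-learn, producing a topN-word list like the one above and the simValue-style similarity scores used in the lists that follow:

```python
# Illustrative tf-idf pipeline: top-weighted words for one post and
# cosine similarities from that post to every post in the corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "unacceptably easy to publish statistically significant evidence",
    "flexibility in data collection analysis and reporting",
    # ... one string per blog post in the corpus
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)                   # posts x vocabulary, sparse

row = X[0].toarray().ravel()                  # tf-idf weights for post 0
top = sorted(zip(vec.get_feature_names_out(), row), key=lambda t: -t[1])[:50]

sims = cosine_similarity(X[0], X).ravel()     # simValue-style scores
```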

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’

Introduction: False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant [I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both? It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.

2 0.13455766 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend
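The computer simulations Simmons, Nelson, and Simonsohn mention are not reproduced in this excerpt. As a stand-in, here is a minimal Monte Carlo sketch of one researcher degree of freedom they study, optional stopping: test at n = 20 per group, and if the result is not significant, add 10 more observations per group and test again, all under a true null (the specific sample sizes are illustrative):

```python
# False-positive rate under optional stopping when there is no effect.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_sims, hits = 10_000, 0

for _ in range(n_sims):
    a, b = rng.normal(size=20), rng.normal(size=20)   # no true difference
    if ttest_ind(a, b).pvalue < 0.05:
        hits += 1
        continue
    a = np.concatenate([a, rng.normal(size=10)])      # not significant yet:
    b = np.concatenate([b, rng.normal(size=10)])      # collect more data
    if ttest_ind(a, b).pvalue < 0.05:
        hits += 1

print(f"false-positive rate: {hits / n_sims:.3f}")    # noticeably above .05
```

Giving yourself even one extra look at the data inflates the nominal 5% rate, which is the paper’s point.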

3 0.10534349 1963 andrew gelman stats-2013-07-31-Response by Jessica Tracy and Alec Beall to my critique of the methods in their paper, “Women Are More Likely to Wear Red or Pink at Peak Fertility”

Introduction: Last week I published in Slate a critique of a paper that appeared in the journal Psychological Science. That paper, by Alec Beall and Jessica Tracy, found that women who were at peak fertility were three times more likely to wear red or pink shirts, compared to women at other points in their menstrual cycles. The study was based on 100 participants on the internet and 24 college students. In my critique, I argued that we had no reason to believe the results generalized to the larger population, because (1) the samples were not representative, (2) the measurements were noisy, (3) the researchers did not use the correct dates of peak fertility, and (4) there were many different comparisons that could have been reported in the data, so there was nothing special about a particular comparison being statistically significant. I likened their paper to other work which I considered flawed for multiple comparisons (too many researcher degrees of freedom), including a claimed relation bet

4 0.10452018 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, The difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc
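The arithmetic behind that example, assuming (as the example implies) that both estimates share a common standard error s: the estimates are 2.5s and 1.0s, their difference is 1.5s, and the standard error of the difference is sqrt(s^2 + s^2) ≈ 1.41s, so the difference has z = 1.5/1.41 ≈ 1.06, far from statistical significance.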

5 0.095566578 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

Introduction: Fabio Rojas writes: In much of the social sciences outside economics, it’s very common for people to take a regression course or two in graduate school and then stop their statistical education. This creates a situation where you have a large pool of people who have some knowledge, but not a lot of knowledge. As a result, you have a pretty big gap between people like yourself, who are heavily invested in the cutting edge of applied statistics, and other folks. So here is the question: What are the major lessons about good statistical practice that “rank and file” social scientists should know? Sure, most people can recite “Correlation is not causation” or “statistical significance is not substantive significance.” But what are the other big lessons? This question comes from my own experience. I have a math degree and took regression analysis in graduate school, but I definitely do not have the level of knowledge of a statistician. I also do mixed method research, and field wor

6 0.095427088 593 andrew gelman stats-2011-02-27-Heat map

7 0.0944378 1860 andrew gelman stats-2013-05-17-How can statisticians help psychologists do their research better?

8 0.092485495 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

9 0.090030968 1971 andrew gelman stats-2013-08-07-I doubt they cheated

10 0.089208603 899 andrew gelman stats-2011-09-10-The statistical significance filter

11 0.085173279 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

12 0.084952466 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

13 0.084097236 1070 andrew gelman stats-2011-12-19-The scope for snooping

14 0.082885116 1291 andrew gelman stats-2012-04-30-Systematic review of publication bias in studies on publication bias

15 0.081140593 2091 andrew gelman stats-2013-11-06-“Marginally significant”

16 0.079959065 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

17 0.078996405 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

18 0.078719854 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

19 0.076970182 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

20 0.076806396 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.141), (1, 0.005), (2, 0.004), (3, -0.102), (4, 0.0), (5, -0.028), (6, -0.023), (7, 0.005), (8, -0.007), (9, -0.001), (10, -0.016), (11, 0.001), (12, 0.022), (13, -0.031), (14, 0.026), (15, 0.024), (16, 0.014), (17, -0.007), (18, 0.023), (19, -0.038), (20, 0.008), (21, 0.046), (22, -0.004), (23, -0.022), (24, -0.04), (25, 0.008), (26, 0.058), (27, -0.056), (28, 0.043), (29, -0.036), (30, -0.008), (31, 0.023), (32, 0.037), (33, 0.019), (34, 0.037), (35, 0.071), (36, -0.028), (37, -0.028), (38, -0.032), (39, -0.004), (40, 0.011), (41, -0.03), (42, 0.014), (43, 0.055), (44, 0.05), (45, -0.027), (46, -0.008), (47, 0.004), (48, 0.036), (49, -0.022)]
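The 50 (topicId, topicWeight) pairs above place this post in a latent semantic space. The dataset’s pipeline is not documented here, but a standard way to produce such coordinates is truncated SVD over a tf-idf matrix (LSI/LSA); a minimal sketch under that assumption:

```python
# LSI: project the tf-idf matrix onto its top singular directions and
# compare posts by cosine similarity in the 50-dimensional topic space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in corpus; in the real pipeline, one string per blog post.
docs = [f"topic{i % 17} topic{i % 5} filler text number {i}" for i in range(200)]

X = TfidfVectorizer().fit_transform(docs)
Z = TruncatedSVD(n_components=50, random_state=0).fit_transform(X)

topic_weights = Z[0]                          # the 50 LSI weights for one post
sims = cosine_similarity(Z[:1], Z).ravel()    # simValue-style scores
```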

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97106618 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’

Introduction: False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant [I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both? It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.

2 0.85291022 156 andrew gelman stats-2010-07-20-Burglars are local

Introduction: This makes sense: In the land of fiction, it’s the criminal’s modus operandi – his method of entry, his taste for certain jewellery and so forth – that can be used by detectives to identify his handiwork. The reality according to a new analysis of solved burglaries in the Northamptonshire region of England is that these aspects of criminal behaviour are on their own unreliable as identifying markers, most likely because they are dictated by circumstances rather than the criminal’s taste and style. However, the geographical spread and timing of a burglar’s crimes are distinctive, and could help with police investigations. And, as a bonus, more Tourette’s pride! P.S. On yet another unrelated topic from the same blog, I wonder if the researchers in this study are aware that the difference between “significant” and “not significant” is not itself statistically significant .

3 0.80792582 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, The difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc

4 0.78185785 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend

5 0.76624137 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty

Introduction: In a discussion of some of the recent controversy over promiscuously statistically-significant science, Rafael Irizarry points out that there is a tradeoff between stringency and discovery and suggests that raising the bar of statistical significance (for example, to the .01 or .001 level instead of the conventional .05) will reduce the noise level but will also reduce the rate of identification of actual discoveries. I agree. But I should clarify that when I criticize a claim of statistical significance, arguing that the claimed “p less than .05” could easily occur under the null hypothesis, given that the hypothesis test that is chosen is contingent on the data (see examples here of clothing and menstrual cycle, arm circumference and political attitudes, and ESP), I am not recommending a switch to a more stringent p-value threshold. Rather, I would prefer p-values not to be used as a threshold for publication at all. Here’s my point: The question is not whether

6 0.75997996 410 andrew gelman stats-2010-11-12-The Wald method has been the subject of extensive criticism by statisticians for exaggerating results”

7 0.75392109 1449 andrew gelman stats-2012-08-08-Gregor Mendel’s suspicious data

8 0.74586171 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as they’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

9 0.74385852 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance

10 0.74355638 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

11 0.74269557 1971 andrew gelman stats-2013-08-07-I doubt they cheated

12 0.74165797 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”

13 0.72477669 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking

14 0.72013509 908 andrew gelman stats-2011-09-14-Type M errors in the lab

15 0.7166577 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

16 0.71180987 2114 andrew gelman stats-2013-11-26-“Please make fun of this claim”

17 0.70872474 1023 andrew gelman stats-2011-11-22-Going Beyond the Book: Towards Critical Reading in Statistics Teaching

18 0.70720273 1671 andrew gelman stats-2013-01-13-Preregistration of Studies and Mock Reports

19 0.69322443 758 andrew gelman stats-2011-06-11-Hey, good news! Your p-value just passed the 0.05 threshold!

20 0.6908263 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.014), (15, 0.044), (16, 0.137), (17, 0.202), (19, 0.014), (21, 0.01), (24, 0.14), (30, 0.014), (42, 0.013), (65, 0.013), (86, 0.017), (88, 0.014), (99, 0.278)]
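The LDA weights above are sparser than the LSI ones: a document’s topic distribution concentrates on a few topics, and only those appear in the list. A minimal sketch of producing such a distribution with scikit-learn (the topic count of 100 is inferred from the topic ids above and is an assumption, as is the rest of the pipeline):

```python
# LDA: fit a topic model on raw term counts, then read off one post's
# topic distribution, keeping only non-negligible weights.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Stand-in corpus; in the real pipeline, one string per blog post.
docs = [f"topic{i % 17} topic{i % 5} filler text number {i}" for i in range(200)]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=100, random_state=0)
theta = lda.fit_transform(counts)       # posts x topics; rows sum to 1

post = theta[0]
print([(t, round(w, 3)) for t, w in enumerate(post) if w >= 0.01])
```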

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93421936 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’

Introduction: False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant [I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both? It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.

2 0.91524285 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures

Introduction: Malka Gorfine, Ruth Heller, and Yair Heller write a comment on the paper of Reshef et al. that we discussed a few months ago. Just to remind you what’s going on here, here’s my quick summary from December: Reshef et al. propose a new nonlinear R-squared-like measure. Unlike R-squared, this new method depends on a tuning parameter that controls the level of discretization, in a “How long is the coast of Britain” sort of way. The dependence on scale is inevitable for such a general method. Just consider: if you sample 1000 points from the unit bivariate normal distribution, (x,y) ~ N(0,I), you’ll be able to fit them perfectly by a 999-degree polynomial fit to the data. So the scale of the fit matters. The clever idea of the paper is that, instead of going for an absolute measure (which, as we’ve seen, will be scale-dependent), they focus on the problem of summarizing the grid of pairwise dependences in a large set of variables. As they put it: “Imagine a data set with hundreds
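The polynomial point in that summary is easy to check numerically. Here is a minimal sketch, scaled down to 10 points and a degree-9 fit since a 999-degree fit is numerically hopeless, but the logic is the same:

```python
# n points in general position are interpolated exactly by a
# degree-(n-1) polynomial, so a "perfect" fit proves nothing.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
y = rng.standard_normal(10)               # independent of x

coefs = np.polyfit(x, y, deg=9)           # degree n-1 through n points
resid = y - np.polyval(coefs, x)
print(np.max(np.abs(resid)))              # ~0: pure noise, fit "perfectly"
```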

3 0.91184103 2314 andrew gelman stats-2014-05-01-Heller, Heller, and Gorfine on univariate and multivariate information measures

Introduction: Malka Gorfine writes: We noticed that the important topic of association measures and tests came up again in your blog, and we have a few comments in this regard. It is useful to distinguish between the univariate and multivariate methods. A consistent multivariate method can recognise dependence between two vectors of random variables, while a univariate method can only loop over pairs of components and check for dependency between them. There are very few consistent multivariate methods. To the best of our knowledge there are three practical methods: 1) HSIC by Gretton et al. (http://www.gatsby.ucl.ac.uk/~gretton/papers/GreBouSmoSch05.pdf) 2) dcov by Szekely et al. (http://projecteuclid.org/euclid.aoas/1267453933) 3) A method we introduced in Heller et al (Biometrika, 2013, 503-510, http://biomet.oxfordjournals.org/content/early/2012/12/04/biomet.ass070.full.pdf+html, and an R package, HHG, is available as well http://cran.r-project.org/web/packages/HHG/index.html). A
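Of the three methods Gorfine lists, the distance covariance (dcov) of Szekely et al. is the easiest to sketch from its definition. The following minimal NumPy version handles univariate x and y only; the cited R packages are the reference implementations, this is just the statistic spelled out:

```python
# Distance correlation: double-center the pairwise distance matrices
# of x and y, then correlate them; zero iff x and y are independent
# (in the population version of the statistic).
import numpy as np

def dcor(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])              # pairwise |xi - xj|
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
print(dcor(x, x ** 2))                    # clearly positive: nonlinear link
print(dcor(x, rng.standard_normal(500)))  # near zero: independent
```

Unlike Pearson correlation, dcor(x, x**2) is far from zero even though the linear correlation between x and x² is essentially nil for symmetric x.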

4 0.90537822 309 andrew gelman stats-2010-10-01-Why Development Economics Needs Theory?

Introduction: Robert Neumann writes: in the JEP 24(3), page 18, Daron Acemoglu states: Why Development Economics Needs Theory There is no general agreement on how much we should rely on economic theory in motivating empirical work and whether we should try to formulate and estimate “structural parameters.” I (Acemoglu) argue that the answer is largely “yes” because otherwise econometric estimates would lack external validity, in which case they can neither inform us about whether a particular model or theory is a useful approximation to reality, nor would they be useful in providing us guidance on what the effects of similar shocks and policies would be in different circumstances or if implemented in different scales. I therefore define “structural parameters” as those that provide external validity and would thus be useful in testing theories or in policy analysis beyond the specific environment and sample from which they are derived. External validity becomes a particularly challenging t

5 0.90295285 1136 andrew gelman stats-2012-01-23-Fight! (also a bit of reminiscence at the end)

Introduction: Martin Lindquist and Michael Sobel published a fun little article in Neuroimage on models and assumptions for causal inference with intermediate outcomes. As their subtitle indicates (“A response to the comments on our comment”), this is a topic of some controversy. Lindquist and Sobel write: Our original comment (Lindquist and Sobel, 2011) made explicit the types of assumptions neuroimaging researchers are making when directed graphical models (DGMs), which include certain types of structural equation models (SEMs), are used to estimate causal effects. When these assumptions, which many researchers are not aware of, are not met, parameters of these models should not be interpreted as effects. . . . [Judea] Pearl does not disagree with anything we stated. However, he takes exception to our use of potential outcomes notation, which is the standard notation used in the statistical literature on causal inference, and his comment is devoted to promoting his alternative conventions. [C

6 0.90165949 2324 andrew gelman stats-2014-05-07-Once more on nonparametric measures of mutual information

7 0.89575458 1616 andrew gelman stats-2012-12-10-John McAfee is a Heinlein hero

8 0.88560647 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions

9 0.88434714 1093 andrew gelman stats-2011-12-30-Strings Attached: Untangling the Ethics of Incentives

10 0.87990129 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting

11 0.87382996 1362 andrew gelman stats-2012-06-03-Question 24 of my final exam for Design and Analysis of Sample Surveys

12 0.87124467 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

13 0.86162817 1591 andrew gelman stats-2012-11-26-Politics as an escape hatch

14 0.85756689 1076 andrew gelman stats-2011-12-21-Derman, Rodrik and the nature of statistical models

15 0.85035658 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

16 0.84962445 503 andrew gelman stats-2011-01-04-Clarity on my email policy

17 0.84844106 32 andrew gelman stats-2010-05-14-Causal inference in economics

18 0.84794652 1712 andrew gelman stats-2013-02-07-Philosophy and the practice of Bayesian statistics (with all the discussions!)

19 0.84777635 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

20 0.84765178 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life