
2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems


meta info for this blog

Source: html

Introduction: A recent discussion between commenters Question and Fernando captured one of the recurrent themes here from the past year. Question: The problem is simple, the researchers are disproving always false null hypotheses and taking this disproof as near proof that their theory is correct. Fernando: Whereas it is probably true that researchers misuse NHT, the problem with tabloid science is broader and deeper. It is systemic. Question: I do not see how anything can be deeper than replacing careful description, prediction, falsification, and independent replication with dynamite plots, p-values, affirming the consequent, and peer review. From my own experience I am confident in saying that confusion caused by NHST is at the root of this problem. Fernando: Incentives? Impact factors? Publish or die? “Interesting” and “new” above quality and reliability, or actually answering a research question, and a silly and unbecoming obsession with being quoted in NYT, etc. . . . Given the incentives something silly is bound to happen.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A recent discussion between commenters Question and Fernando captured one of the recurrent themes here from the past year. [sent-1, score-0.156]

2 Question: The problem is simple, the researchers are disproving always false null hypotheses and taking this disproof as near proof that their theory is correct. [sent-2, score-1.153]
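To make the commenter's point concrete, here is a minimal simulation (an editorial illustration in Python, not from the post): the point null theta = 0 is false by a negligible amount, so a conventional z-test rejects it almost surely once n is large, and that rejection says nothing about any substantive theory.

```python
# Editorial illustration: a point null that is false by a hair is
# rejected almost surely at large n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta, sigma = 0.01, 1.0   # tiny true effect: H0 (theta = 0) is false, but barely

for n in [100, 10_000, 1_000_000]:
    y = rng.normal(theta, sigma, size=n)
    z = y.mean() / (sigma / np.sqrt(n))      # z-test of H0: theta = 0
    p = 2 * stats.norm.sf(abs(z))
    print(f"n = {n:>9,}   p = {p:.4f}")
# As n grows, p -> 0: "disproving" theta = 0 is guaranteed, and it is no
# proof that the researcher's theory is correct.
```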

3 Fernando: Whereas it is probably true that researchers misuse NHT, the problem with tabloid science is broader and deeper. [sent-3, score-0.148]

4 Question: I do not see how anything can be deeper than replacing careful description, prediction, falsification, and independent replication with dynamite plots, p-values, affirming the consequent, and peer review. [sent-5, score-0.15]

5 From my own experience I am confident in saying that confusion caused by NHST is at the root of this problem. [sent-6, score-0.172]

6 “Interesting” and “new” above quality and reliability, or actually answering a research question, and a silly and unbecoming obsession with being quoted in NYT, etc. [sent-10, score-0.231]

7 Given the incentives something silly is bound to happen. [sent-14, score-0.185]

8 At this point I was going to respond in the comments, but I decided to make this a separate post (at the cost of pre-empting yet another scheduled item on the queue), for two reasons: 1. [sent-16, score-0.134]

9 I think Bayesian methods are great, don’t get me wrong, but the discussion here has little to do with Bayes. [sent-22, score-0.066]

10 Null hypothesis significance testing can be done in a non-Bayesian way (of course, just see all sorts of theoretical-statistics textbooks) but some Bayesians like to do it too, using Bayes factors and all the rest of that crap to decide whether to accept models of the theta=0 variety. [sent-23, score-0.588]
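For concreteness, here is a minimal sketch of the kind of point-null Bayes-factor test being criticized, under an assumed normal-normal setup of my choosing (an illustration of the procedure, not code from the post): H0: theta = 0 against H1: theta ~ N(0, tau^2), decided by thresholding BF01 just as one would threshold a p-value.

```python
# Sketch of a point-null Bayes factor test (assumed normal-normal setup;
# an editorial illustration of the accept/reject logic, not the post's code).
import numpy as np
from scipy import stats

def bf01(ybar, n, sigma=1.0, tau=1.0):
    """BF for H0: theta = 0 vs H1: theta ~ N(0, tau^2), with ybar ~ N(theta, sigma^2/n)."""
    se = sigma / np.sqrt(n)
    m0 = stats.norm.pdf(ybar, loc=0.0, scale=se)                       # marginal under H0
    m1 = stats.norm.pdf(ybar, loc=0.0, scale=np.sqrt(tau**2 + se**2))  # marginal under H1
    return m0 / m1

rng = np.random.default_rng(1)
y = rng.normal(0.2, 1.0, size=50)
bf = bf01(y.mean(), n=50)
print(f"BF01 = {bf:.3f} ->", "favors theta = 0" if bf > 1 else "favors the alternative")
# Threshold a p-value or threshold BF01: either way it is a significance
# test aimed at accepting or rejecting the theta = 0 model.
```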

11 Do it using p-values or Bayes factors, either way it’s significance testing with the goal of rejecting models. [sent-24, score-0.355]

12 [NHST] as an enabler: I agree with the now-conventional wisdom expressed by the original commenter, that null hypothesis significance testing is generally inappropriate. [sent-29, score-0.728]

13 But I also agree with Fernando’s comment that the pressures of publication would be leading to the aggressive dissemination of noise, in any case. [sent-30, score-0.456]

14 This relates to my recent discussion with Steven Pinker (not published on blog yet, it’s on the queue, you’ll see it in a month or so). [sent-37, score-0.066]

15 To say it another way, the reason why I go on and on about multiple comparisons is not that I think it’s so important to get correct p-values, but rather that these p-values are being used as the statistical justification for otherwise laughable claims. [sent-38, score-0.155]
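A quick editorial arithmetic check of why the multiple-comparisons point matters: with 20 independent comparisons on pure noise, the chance of at least one p < 0.05 is 1 - 0.95^20 ≈ 0.64, which a short simulation confirms.

```python
# Editorial illustration: uncorrected multiple comparisons on pure noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n_tests, n_obs = 10_000, 20, 30
hits = 0
for _ in range(n_sims):
    x = rng.normal(size=(n_tests, n_obs))                  # 20 noise-only "effects"
    t = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n_obs))
    p = 2 * stats.t.sf(np.abs(t), df=n_obs - 1)
    hits += (p < 0.05).any()
print("P(at least one p < 0.05) =", hits / n_sims)         # ~0.64 = 1 - 0.95**20
```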

16 [Without p-values], some other tool would be used to give the stamp of approval on data-based speculations. [sent-43, score-0.134]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('fernando', 0.426), ('null', 0.254), ('hypotheses', 0.207), ('disproving', 0.17), ('testing', 0.164), ('dissemination', 0.163), ('disproof', 0.163), ('bayes', 0.159), ('noise', 0.15), ('queue', 0.149), ('aggressive', 0.128), ('factors', 0.123), ('significance', 0.12), ('proof', 0.117), ('hypothesis', 0.11), ('confusion', 0.106), ('smart', 0.105), ('incentives', 0.104), ('commenter', 0.097), ('notorious', 0.094), ('near', 0.094), ('recurrent', 0.09), ('laughable', 0.09), ('question', 0.085), ('pressures', 0.085), ('obsession', 0.085), ('dynamite', 0.085), ('overarching', 0.085), ('false', 0.083), ('nhst', 0.081), ('nht', 0.081), ('silly', 0.081), ('agree', 0.08), ('tabloid', 0.074), ('misuse', 0.074), ('skeptics', 0.071), ('wary', 0.071), ('rejecting', 0.071), ('sorts', 0.071), ('stamp', 0.069), ('yet', 0.068), ('scheduled', 0.066), ('root', 0.066), ('discussion', 0.066), ('falsification', 0.065), ('pinker', 0.065), ('used', 0.065), ('taking', 0.065), ('answering', 0.065), ('replacing', 0.065)]
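A minimal sketch of how per-word tfidf weights like those above, and the simValue similarities listed next, could be computed (a plausible reconstruction assuming scikit-learn and raw post texts, not the mining pipeline's actual code):

```python
# Sketch: tfidf weights per post and cosine similarity between posts
# (a plausible reconstruction, not the pipeline's actual code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = ["text of post 2281 ...", "text of post 2272 ...", "text of post 256 ..."]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(posts)                  # rows: posts, columns: vocabulary

# top-weighted words for the first post, like ('fernando', 0.426) above
weights = X[0].toarray().ravel()
vocab = vec.get_feature_names_out()
print(sorted(zip(vocab, weights), key=lambda t: -t[1])[:5])

# pairwise similarities, like the simValue column in the list below
print(cosine_similarity(X[0], X))
```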

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems


2 0.39817345 2272 andrew gelman stats-2014-03-29-I agree with this comment

Introduction: The anonymous commenter puts it well : The problem is simple, the researchers are disproving always false null hypotheses and taking this disproof as near proof that their theory is correct.

3 0.20960732 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Introduction: Masanao sends this one in, under the heading, “another incident of misunderstood p-value”: Warren Davies, a positive psychology MSc student at UEL, provides the latest in our ongoing series of guest features for students. Warren has just released a Psychology Study Guide, which covers information on statistics, research methods and study skills for psychology students. Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you’re testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke – but still, there is always a certain probability that it was. Statistical significance testing gives you an idea of what this probability is. In science we’re always testing hypotheses. We never conduct a study to ‘see what happens’, because there’s always at least one way to make any useless set of data look important. We take

4 0.20325148 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

Introduction: Robert Bloomfield writes: Most of the people in my field (accounting, which is basically applied economics and finance, leavened with psychology and organizational behavior) use ‘positive research methods’, which are typically described as coming to the data with a predefined theory, and using hypothesis testing to accept or reject the theory’s predictions. But a substantial minority use ‘interpretive research methods’ (sometimes called qualitative methods, for those that call positive research ‘quantitative’). No one seems entirely happy with the definition of this method, but I’ve found it useful to think of it as an attempt to see the world through the eyes of your subjects, much as Jane Goodall lived with gorillas and tried to see the world through their eyes.) Interpretive researchers often criticize positive researchers by noting that the latter don’t make the best use of their data, because they come to the data with a predetermined theory, and only test a narrow set of h

5 0.17467009 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

Introduction: Erin Jonaitis points us to this article by Christopher Ferguson and Moritz Heene, who write: Publication bias remains a controversial issue in psychological science. . . . that the field often constructs arguments to block the publication and interpretation of null results and that null results may be further extinguished through questionable researcher practices. Given that science is dependent on the process of falsification, we argue that these problems reduce psychological science’s capability to have a proper mechanism for theory falsification, thus resulting in the promulgation of numerous “undead” theories that are ideologically popular but have little basis in fact. They mention the infamous Daryl Bem article. It is pretty much only because Bem’s claims are (presumably) false that they got published in a major research journal. Had the claims been true—that is, had Bem run identical experiments, analyzed his data more carefully and objectively, and reported that the r

6 0.17400081 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

7 0.16472875 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

8 0.14959571 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

9 0.14460802 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

10 0.14410596 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

11 0.14199632 1272 andrew gelman stats-2012-04-20-More proposals to reform the peer-review system

12 0.13933481 2326 andrew gelman stats-2014-05-08-Discussion with Steven Pinker on research that is attached to data that are so noisy as to be essentially uninformative

13 0.13477957 114 andrew gelman stats-2010-06-28-More on Bayesian deduction-induction

14 0.13465583 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

15 0.13201401 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well

16 0.12953053 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

17 0.12753037 1873 andrew gelman stats-2013-05-28-Escalatingly uncomfortable

18 0.12668926 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons

19 0.12402948 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

20 0.12074599 423 andrew gelman stats-2010-11-20-How to schedule projects in an introductory statistics course?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.204), (1, 0.021), (2, -0.045), (3, -0.081), (4, -0.093), (5, -0.069), (6, -0.014), (7, 0.034), (8, 0.051), (9, -0.109), (10, -0.068), (11, 0.028), (12, 0.036), (13, -0.078), (14, 0.015), (15, -0.005), (16, -0.046), (17, -0.066), (18, -0.025), (19, -0.042), (20, 0.051), (21, -0.003), (22, -0.062), (23, 0.004), (24, -0.092), (25, -0.067), (26, 0.078), (27, 0.034), (28, 0.034), (29, -0.012), (30, 0.027), (31, 0.002), (32, 0.077), (33, 0.001), (34, -0.101), (35, -0.056), (36, 0.081), (37, -0.028), (38, 0.035), (39, -0.017), (40, -0.08), (41, 0.031), (42, 0.013), (43, 0.006), (44, 0.008), (45, 0.022), (46, 0.003), (47, -0.051), (48, 0.031), (49, -0.034)]
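LSI projects the tfidf matrix onto a low-rank "topic" space and compares posts there; a minimal sketch under the same assumptions as above (scikit-learn, raw texts; the topic weights listed here evidently come from a 50-topic model):

```python
# Sketch: LSI topic weights via truncated SVD of the tfidf matrix
# (a plausible reconstruction, not the pipeline's actual code).
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = ["text of post 2281 ...", "text of post 2272 ...", "text of post 1355 ..."]
X = TfidfVectorizer(stop_words="english").fit_transform(posts)

lsi = TruncatedSVD(n_components=2, random_state=0)  # the list above uses 50 topics
Z = lsi.fit_transform(X)                            # per-post topic weights

print(Z[0])                         # weights like (0, 0.204), (1, 0.021), ...
print(cosine_similarity(Z[:1], Z))  # simValue analogues for the list below
```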

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95342743 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems


2 0.89923137 2272 andrew gelman stats-2014-03-29-I agree with this comment

Introduction: The anonymous commenter puts it well : The problem is simple, the researchers are disproving always false null hypotheses and taking this disproof as near proof that their theory is correct.

3 0.84129077 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

Introduction: Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs bayesian, namely Lindley’s paradox, on page 12. The text is as follows: So why are we worried about trivial effects? They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. It occurs when; (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0 and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. In his original treatment, Lindley (1957) showed that – under a parti

4 0.83668691 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

Introduction: Xian, Judith, and I read this line in a book by statistician Murray Aitkin in which he considered the following hypothetical example: A survey of 100 individuals expressing support (Yes/No) for the president, before and after a presidential address . . . The question of interest is whether there has been a change in support between the surveys . . . We want to assess the evidence for the hypothesis of equality H1 against the alternative hypothesis H2 of a change. Here is our response : Based on our experience in public opinion research, this is not a real question. Support for any political position is always changing. The real question is how much the support has changed, or perhaps how this change is distributed across the population. A defender of Aitkin (and of classical hypothesis testing) might respond at this point that, yes, everybody knows that changes are never exactly zero and that we should take a more “grown-up” view of the null hypothesis, not that the change

5 0.82041794 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

Introduction: In response to the discussion of X and me of his recent paper , Val Johnson writes: I would like to thank Andrew for forwarding his comments on uniformly most powerful Bayesian tests (UMPBTs) to me and his invitation to respond to them. I think he (and also Christian Robert) raise a number of interesting points concerning this new class of Bayesian tests, but I think that they may have confounded several issues that might more usefully be examined separately. The first issue involves the choice of the Bayesian evidence threshold, gamma, used in rejecting a null hypothesis in favor of an alternative hypothesis. Andrew objects to the higher values of gamma proposed in my recent PNAS article on grounds that too many important scientific effects would be missed if thresholds of 25-50 were routinely used. These evidence thresholds correspond roughly to p-values of 0.005; Andrew suggests that evidence thresholds around 5 should continue to be used (gamma=5 corresponds approximate

6 0.81972909 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

7 0.81737244 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

8 0.80941427 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

9 0.79901534 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

10 0.76541746 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

11 0.76209652 1883 andrew gelman stats-2013-06-04-Interrogating p-values

12 0.75864673 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

13 0.75150353 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

14 0.74520272 2102 andrew gelman stats-2013-11-15-“Are all significant p-values created equal?”

15 0.72872347 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

16 0.72664583 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

17 0.72247702 2127 andrew gelman stats-2013-12-08-The never-ending (and often productive) race between theory and practice

18 0.71658528 331 andrew gelman stats-2010-10-10-Bayes jumps the shark

19 0.70458651 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well

20 0.68937367 1861 andrew gelman stats-2013-05-17-Where do theories come from?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.025), (9, 0.031), (16, 0.114), (18, 0.012), (21, 0.098), (24, 0.164), (43, 0.027), (45, 0.02), (63, 0.025), (84, 0.011), (85, 0.052), (86, 0.06), (96, 0.011), (99, 0.228)]
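LDA instead fits a mixed-membership topic model on raw word counts; a minimal sketch under the same assumptions (the sparse weights above list only the topics with non-negligible mass):

```python
# Sketch: LDA topic distributions per post and similarities between them
# (a plausible reconstruction, not the pipeline's actual code).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = ["text of post 2281 ...", "text of post 1826 ...", "text of post 1374 ..."]
X = CountVectorizer(stop_words="english").fit_transform(posts)  # LDA wants counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)        # per-post topic distributions (rows sum to 1)

# keep only topics with non-negligible weight, like (99, 0.228) above
print([(k, round(w, 3)) for k, w in enumerate(theta[0]) if w > 0.02])
print(cosine_similarity(theta[:1], theta))
```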

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95074743 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems


2 0.93798196 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

Introduction: Erin Jonaitis points us to this article by Christopher Ferguson and Moritz Heene, who write: Publication bias remains a controversial issue in psychological science. . . . that the field often constructs arguments to block the publication and interpretation of null results and that null results may be further extinguished through questionable researcher practices. Given that science is dependent on the process of falsification, we argue that these problems reduce psychological science’s capability to have a proper mechanism for theory falsification, thus resulting in the promulgation of numerous “undead” theories that are ideologically popular but have little basis in fact. They mention the infamous Daryl Bem article. It is pretty much only because Bem’s claims are (presumably) false that they got published in a major research journal. Had the claims been true—that is, had Bem run identical experiments, analyzed his data more carefully and objectively, and reported that the r

3 0.93588299 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models

Introduction: Becky Passonneau and colleagues at the Center for Computational Learning Systems (CCLS) at Columbia have been working on a project for ConEd (New York’s major electric utility) to rank structures based on vulnerability to secondary events (e.g., transformer explosions, cable meltdowns, electrical fires). They’ve been using the R implementation BayesTree of Chipman, George and McCulloch’s Bayesian Additive Regression Trees (BART). BART is a Bayesian non-parametric method that is non-identifiable in two ways. Firstly, it is an additive tree model with a fixed number of trees, the indexes of which aren’t identified (you get the same predictions in a model swapping the order of the trees). This is the same kind of non-identifiability you get with any mixture model (additive or interpolated) with an exchangeable prior on the mixture components. Secondly, the trees themselves have varying structure over samples in terms of number of nodes and their topology (depth, branching, etc

4 0.93492788 2306 andrew gelman stats-2014-04-26-Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu

Introduction: Some asshole who has a bug up his ass about compressed sensing is spamming our comments with a bunch of sock puppets. All from the same IP address: “George Stoneriver,” Scott Wolfe,” and just plain “Paul,” all saying pretty much the same thing in the same sort of broken English (except for Paul, whose post was too short to do a dialect analysis). “Scott Wolfe” is a generic sort of name, but a quick google search reveals nothing related to this topic. “George Stoneriver” seems to have no internet presence at all (besides the comments at this blog). As for “Paul,” I don’t know, maybe the spammer was too lazy to invent a last name? Our spammer spends about half his time slamming the field of compressed sensing and the other half pumping up the work of someone named Xiteng Liu. There’s no excuse for this behavior. It’s horrible, a true abuse of our scholarly community. If Scott Adams wants to use a sock puppet, fine, the guy’s an artist and we should cut him some slack. If tha

5 0.93293208 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

Introduction: I came across a document [updated link here ], “Applying structured analogies to the global warming alarm movement,” by Kesten Green and Scott Armstrong. The general approach is appealing to me, but the execution seemed disturbingly flawed. Here’s how they introduce the project: The structured analogies procedure we [Green and Armstrong] used for this study was as follows: 1. Identify possible analogies by searching the literature and by asking experts with different viewpoints to nominate analogies to the target situation: alarm over dangerous manmade global warming. 2. Screen the possible analogies to ensure they meet the stated criteria and that the outcomes are known. 3. Code the relevant characteristics of the analogous situations. 4. Forecast target situation outcomes by using a predetermined mechanical rule to select the outcomes of the analogies. Here is how we posed the question to the experts: The Intergovernmental Panel on Climate Change and other organizat

6 0.9316209 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles

7 0.93125808 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling

8 0.92830515 1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure

9 0.92760062 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

10 0.92682397 1959 andrew gelman stats-2013-07-28-50 shades of gray: A research story

11 0.9267056 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

12 0.92623889 2037 andrew gelman stats-2013-09-25-Classical probability does not apply to quantum systems (causal inference edition)

13 0.92408538 1824 andrew gelman stats-2013-04-25-Fascinating graphs from facebook data

14 0.92394841 514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?

15 0.9231689 537 andrew gelman stats-2011-01-25-Postdoc Position #1: Missing-Data Imputation, Diagnostics, and Applications

16 0.92310083 789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time

17 0.92224991 807 andrew gelman stats-2011-07-17-Macro causality

18 0.92168254 659 andrew gelman stats-2011-04-13-Jim Campbell argues that Larry Bartels’s “Unequal Democracy” findings are not robust

19 0.92165291 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

20 0.92085356 488 andrew gelman stats-2010-12-27-Graph of the year