andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1883 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This article is a discussion of a paper by Greg Francis for a special issue, edited by E. J. Wagenmakers, of the Journal of Mathematical Psychology. Here’s what I wrote: Much of statistical practice is an effort to reduce or deny variation and uncertainty. The reduction is done through standardization, replication, and other practices of experimental design, with the idea being to isolate and stabilize the quantity being estimated and then average over many cases. Even so, however, uncertainty persists, and statistical hypothesis testing is in many ways an endeavor to deny this, by reporting binary accept/reject decisions. Classical statistical methods produce binary statements, but there is no reason to assume that the world works that way. Expressions such as Type 1 error, Type 2 error, false positive, and so on, are based on a model in which the world is divided into real and non-real effects. To put it another way, I understand the general scientific distinction of real vs
sentIndex sentText sentNum sentScore
1 Here’s what I wrote: Much of statistical practice is an effort to reduce or deny variation and uncertainty. [sent-4, score-0.119]
2 The reduction is done through standardization, replication, and other practices of experimental design, with the idea being to isolate and stabilize the quantity being estimated and then average over many cases. [sent-5, score-0.156]
3 Even so, however, uncertainty persists, and statistical hypothesis testing is in many ways an endeavor to deny this, by reporting binary accept/reject decisions. [sent-6, score-0.456]
4 Classical statistical methods produce binary statements, but there is no reason to assume that the world works that way. [sent-7, score-0.095]
5 Expressions such as Type 1 error, Type 2 error, false positive, and so on, are based on a model in which the world is divided into real and non-real effects. [sent-8, score-0.189]
6 To put it another way, I understand the general scientific distinction of real vs. [sent-9, score-0.286]
7 non-real effects but I do not think this maps well into the mathematical distinction of θ=0 vs. [sent-10, score-0.31]
8 Yes, there are some unambiguously true effects and some that are arguably zero, but I would guess that the challenge in most current research in psychology is not that effects are zero but that they vary from person to person and in different contexts. [sent-12, score-0.581]
9 But if we do not want to characterize science as the search for true positives, how should we statistically model the process of scientific publication and discovery? [sent-13, score-0.262]
10 An empirical approach is to identify scientific truth with replicability; hence, the goal of an experimental or observational scientist is to discover effects that replicate in future studies. [sent-14, score-0.324]
11 The replicability standard seems to be reasonable. [sent-15, score-0.144]
12 As a student many years ago, I heard about opportunistic stopping rules, the file drawer problem, and other reasons why nominal p-values do not actually represent the true probability that observed data are more extreme than what would be expected by chance. [sent-20, score-0.452]
13 My impression was that these problems represented a minor adjustment and not a major reappraisal of the scientific process. [sent-21, score-0.113]
14 After all, given what we know about scientists’ desire to communicate their efforts, it was hard to imagine that there were file drawers bulging with unpublished results. [sent-22, score-0.102]
15 More recently, though, there has been a growing sense that psychology, biomedicine, and other fields are being overwhelmed with errors (consider, for example, the generally positive reaction to the paper of Ioannidis, 2005). [sent-23, score-0.303]
16 I disagree with the following statement from that article: For both confirmatory and exploratory research, a hypothesis test is appropriate if the outcome drives a specific course of action. [sent-30, score-0.266]
17 Hypothesis tests provide a way to make a decision based on data, and such decisions are useful for choosing an action. [sent-31, score-0.293]
18 If a doctor has to determine whether to treat a patient with drugs or surgery, a hypothesis test might provide useful information to guide the action. [sent-32, score-0.725]
19 Here I speak not of the cost of hypothetical false positives or false negatives but of the direct costs and benefits of the decision. [sent-36, score-0.448]
20 An observed difference can be relevant to a decision whether or not that difference is statistically significant. [sent-37, score-0.571]
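Sentence 12 above points to a concrete mechanism behind the mistrust of nominal p-values: if a researcher peeks at accumulating data and stops as soon as p < 0.05, the advertised 5% false-positive rate no longer holds. Here is a minimal simulation sketch of that point; it is not from the original post, and the sample sizes, the five-look schedule, and the one-sample t-test are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def false_positive_rate(n_sims=5000, looks=(20, 40, 60, 80, 100), alpha=0.05):
    """Share of null experiments declared 'significant' at any interim look."""
    rejections = 0
    for _ in range(n_sims):
        data = rng.normal(0.0, 1.0, size=max(looks))  # true effect is exactly zero
        for n in looks:
            if stats.ttest_1samp(data[:n], 0.0).pvalue < alpha:
                rejections += 1  # opportunistic stopping: quit as soon as p < alpha
                break
    return rejections / n_sims

print("single look at n=100:", false_positive_rate(looks=(100,)))  # near the nominal 0.05
print("five interim looks:  ", false_positive_rate())              # well above 0.05
```

The same skeleton can be pointed at the file-drawer problem by keeping only the runs that cross the threshold and asking what the surviving estimates look like.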
wordName wordTfidf (topN-words)
[('francis', 0.381), ('hypothesis', 0.172), ('notification', 0.17), ('surgery', 0.148), ('replicability', 0.144), ('designer', 0.14), ('positives', 0.137), ('effects', 0.132), ('observed', 0.119), ('deny', 0.119), ('false', 0.117), ('scientific', 0.113), ('drugs', 0.112), ('simonsohn', 0.112), ('decision', 0.109), ('provide', 0.103), ('file', 0.102), ('distinction', 0.101), ('psychology', 0.098), ('whether', 0.097), ('binary', 0.095), ('test', 0.094), ('difference', 0.088), ('light', 0.087), ('press', 0.087), ('standardization', 0.085), ('fields', 0.084), ('reaction', 0.084), ('useful', 0.081), ('opportunistic', 0.08), ('experimental', 0.079), ('true', 0.079), ('mathematical', 0.077), ('negatives', 0.077), ('stabilize', 0.077), ('gregory', 0.074), ('type', 0.072), ('real', 0.072), ('unambiguously', 0.072), ('drawer', 0.072), ('statistically', 0.07), ('endeavor', 0.07), ('design', 0.069), ('overwhelmed', 0.068), ('likewise', 0.068), ('zero', 0.068), ('positive', 0.067), ('persists', 0.067), ('doctor', 0.066), ('replicating', 0.066)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 1883 andrew gelman stats-2013-06-04-Interrogating p-values
2 0.16711566 2093 andrew gelman stats-2013-11-07-I’m negative on the expression “false positives”
Introduction: After seeing a document sent to me and others regarding the crisis of spurious, statistically-significant research findings in psychology research, I had the following reaction: I am unhappy with the use in the document of the phrase “false positives.” I feel that this expression is unhelpful as it frames science in terms of “true” and “false” claims, which I don’t think is particularly accurate. In particular, in most of the recent disputed Psych Science type studies (the ESP study excepted, perhaps), there is little doubt that there is _some_ underlying effect. The issue, as I see it, is that the underlying effects are much smaller, and much more variable, than mainstream researchers imagine. So what happens is that Psych Science or Nature or whatever will publish a result that is purported to be some sort of universal truth, but it is actually a pattern specific to one data set, one population, and one experimental condition. In a sense, yes, these journals are publishing
3 0.16643742 1171 andrew gelman stats-2012-02-16-“False-positive psychology”
Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend
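As a rough illustration of the flexibility mechanism described above (this is a minimal sketch with made-up parameters, not the Simmons, Nelson, and Simonsohn simulations): a researcher who measures two correlated outcomes and reports whichever comparison yields the smaller p-value already exceeds the nominal 5% false-positive rate before any other degrees of freedom are used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def flexible_outcome_fpr(n_sims=5000, n=20, alpha=0.05):
    """False-positive rate when two correlated outcomes are measured per study
    and only the smaller of the two p-values is reported; both true effects are zero."""
    cov = [[1.0, 0.5], [0.5, 1.0]]  # assumed correlation of 0.5 between the outcomes
    hits = 0
    for _ in range(n_sims):
        a = rng.multivariate_normal([0.0, 0.0], cov, size=n)  # "treatment" group
        b = rng.multivariate_normal([0.0, 0.0], cov, size=n)  # "control" group
        p1 = stats.ttest_ind(a[:, 0], b[:, 0]).pvalue
        p2 = stats.ttest_ind(a[:, 1], b[:, 1]).pvalue
        if min(p1, p2) < alpha:
            hits += 1
    return hits / n_sims

print(flexible_outcome_fpr())  # noticeably above the nominal 0.05
```

Adding the other forms of flexibility they list (optional stopping, covariate choices, dropping conditions) compounds the inflation.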
Introduction: John Cook writes: When I hear someone say “personalized medicine” I want to ask “as opposed to what?” All medicine is personalized. If you are in an emergency room with a broken leg and the person next to you is lapsing into a diabetic coma, the two of you will be treated differently. The aim of personalized medicine is to increase the degree of personalization, not to introduce personalization. . . . This to me is a statistical way of thinking, to change an “Is it or isn’t it?” question into a “How much?” question. This distinction arises in many settings but particularly in discussions of causal inference, for example here and here, where I use the “statistical thinking” approach of imagining everything as being on some continuous scale, in contrast to computer scientist Elias Bareinboim and psychology researcher Steven Sloman, both of whom prefer what might be called the “civilian” or “common sense” idea that effects are either real or not, or that certain data can
5 0.15716586 1605 andrew gelman stats-2012-12-04-Write This Book
Introduction: This post is by Phil Price. I’ve been preparing a review of a new statistics textbook aimed at students and practitioners in the “physical sciences,” as distinct from the social sciences and also distinct from people who intend to take more statistics courses. I figured that since it’s been years since I looked at an intro stats textbook, I should look at a few others and see how they differ from this one, so in addition to the book I’m reviewing I’ve looked at some other textbooks aimed at similar audiences: Milton and Arnold; Hines, Montgomery, Goldsman, and Borror; and a few others. I also looked at the table of contents of several more. There is a lot of overlap in the coverage of these books — they all have discussions of common discrete and continuous distributions, joint distributions, descriptive statistics, parameter estimation, hypothesis testing, linear regression, ANOVA, factorial experimental design, and a few other topics. I can see how, from a statisti
6 0.15159309 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models
7 0.15042746 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research
9 0.14326142 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”
10 0.14291622 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
12 0.14085753 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards
13 0.14067233 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
15 0.13707438 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
16 0.13341045 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?
18 0.13295126 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions
19 0.13028139 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing
20 0.12768878 1844 andrew gelman stats-2013-05-06-Against optimism about social science
topicId topicWeight
[(0, 0.257), (1, 0.036), (2, 0.0), (3, -0.197), (4, -0.06), (5, -0.08), (6, -0.053), (7, 0.0), (8, 0.01), (9, -0.048), (10, -0.075), (11, 0.036), (12, 0.001), (13, -0.12), (14, 0.003), (15, -0.017), (16, -0.067), (17, -0.043), (18, -0.021), (19, -0.003), (20, 0.033), (21, 0.004), (22, -0.013), (23, 0.003), (24, -0.063), (25, -0.037), (26, 0.023), (27, 0.005), (28, -0.003), (29, -0.036), (30, -0.004), (31, 0.022), (32, 0.009), (33, 0.022), (34, -0.016), (35, -0.031), (36, -0.0), (37, -0.026), (38, 0.014), (39, -0.014), (40, -0.055), (41, -0.011), (42, -0.009), (43, 0.043), (44, -0.017), (45, 0.065), (46, -0.036), (47, -0.05), (48, 0.04), (49, 0.007)]
simIndex simValue blogId blogTitle
same-blog 1 0.98322129 1883 andrew gelman stats-2013-06-04-Interrogating p-values
Introduction: This article is a discussion of a paper by Greg Francis for a special issue, edited by E. J. Wagenmakers, of the Journal of Mathematical Psychology. Here’s what I wrote: Much of statistical practice is an effort to reduce or deny variation and uncertainty. The reduction is done through standardization, replication, and other practices of experimental design, with the idea being to isolate and stabilize the quantity being estimated and then average over many cases. Even so, however, uncertainty persists, and statistical hypothesis testing is in many ways an endeavor to deny this, by reporting binary accept/reject decisions. Classical statistical methods produce binary statements, but there is no reason to assume that the world works that way. Expressions such as Type 1 error, Type 2 error, false positive, and so on, are based on a model in which the world is divided into real and non-real effects. To put it another way, I understand the general scientific distinction of real vs
Introduction: Erin Jonaitis points us to this article by Christopher Ferguson and Moritz Heene, who write: Publication bias remains a controversial issue in psychological science. . . . that the field often constructs arguments to block the publication and interpretation of null results and that null results may be further extinguished through questionable researcher practices. Given that science is dependent on the process of falsification, we argue that these problems reduce psychological science’s capability to have a proper mechanism for theory falsification, thus resulting in the promulgation of numerous “undead” theories that are ideologically popular but have little basis in fact. They mention the infamous Daryl Bem article. It is pretty much only because Bem’s claims are (presumably) false that they got published in a major research journal. Had the claims been true—that is, had Bem run identical experiments, analyzed his data more carefully and objectively, and reported that the r
3 0.86139596 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
Introduction: Chris Chambers and I had an enlightening discussion the other day at the blog of Rolf Zwaan, regarding the Garden of Forking Paths (go here and scroll down through the comments). Chris sent me the following note: I’m writing a book at the moment about reforming practices in psychological research (focusing on various bad practices such as p-hacking, HARKing, low statistical power, publication bias, lack of data sharing etc. – and posing solutions such as pre-registration, Bayesian hypothesis testing, mandatory data archiving etc.) and I am arriving at a rather unsettling conclusion: that null hypothesis significance testing (NHST) simply isn’t valid for observational research. If this is true then most of the psychological literature is statistically flawed. I was wondering what your thoughts were on this, both from a statistical point of view and from your experience working in an observational field. We all know about the dangers of researcher degrees of freedom. We also know
4 0.82923293 1171 andrew gelman stats-2012-02-16-“False-positive psychology”
5 0.82706475 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?
Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm
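A small sketch of the factor-of-two point in this exchange (the data are invented, and the `alternative` argument to `scipy.stats.ttest_ind` assumes a reasonably recent SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.3, 1.0, size=30)  # group A, drawn with a higher mean purely for illustration
y = rng.normal(0.0, 1.0, size=30)  # group B

two_sided = stats.ttest_ind(x, y, alternative='two-sided').pvalue
one_sided = stats.ttest_ind(x, y, alternative='greater').pvalue  # H1: mean(A) > mean(B)

# When the observed difference lies in the hypothesized direction,
# the two-sided p-value is exactly twice the one-sided one.
print(two_sided, one_sided, two_sided / one_sided)
```

The doubling is a mechanical consequence of the symmetric t distribution, which is part of why the reply treats the choice as a minor issue relative to the other problems with p-values in practice.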
6 0.82366055 897 andrew gelman stats-2011-09-09-The difference between significant and not significant…
8 0.81957334 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value
9 0.81417197 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards
10 0.80693448 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update
11 0.79688716 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution
12 0.79286724 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems
14 0.78903341 2093 andrew gelman stats-2013-11-07-I’m negative on the expression “false positives”
15 0.78390223 1355 andrew gelman stats-2012-05-31-Lindley’s paradox
16 0.78360671 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models
17 0.77852005 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance
19 0.7728669 1195 andrew gelman stats-2012-03-04-Multiple comparisons dispute in the tabloids
20 0.76866204 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher
topicId topicWeight
[(2, 0.031), (4, 0.011), (15, 0.021), (16, 0.101), (21, 0.045), (24, 0.192), (34, 0.03), (52, 0.01), (53, 0.017), (84, 0.113), (86, 0.037), (99, 0.265)]
simIndex simValue blogId blogTitle
Introduction: Hadley Wickham sent me this, by Keith Baggerly and Kevin Coombes: In this report we [Baggerly and Coombes] examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. This is horrible! But, in a way, it’s not surprising. I make big mistakes in my applied work all the time. I mean, all the time. Sometimes I scramble the order of the 50 states, or I’m plotting a pure noise variable, or whatever. But usually I don’t drift too far from reality because I have a lot of cross-checks and I (or my
same-blog 2 0.97280657 1883 andrew gelman stats-2013-06-04-Interrogating p-values
3 0.9712851 42 andrew gelman stats-2010-05-19-Updated solutions to Bayesian Data Analysis homeworks
Introduction: Here are solutions to about 50 of the exercises from Bayesian Data Analysis. The solutions themselves haven’t been updated; I just cleaned up the file: some change in Latex had resulted in much of the computer code running off the page, so I went in and cleaned up the files. I wrote most of these in 1996, and I like them a lot. I think several of them would’ve made good journal articles, and in retrospect I wish I’d published them as such. Original material that appears first in a book (or, even worse, in homework solutions) can easily be overlooked.
4 0.96773362 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance
Introduction: After seeing this recent discussion, Ezra Hauer sent along an article of his from the journal Accident Analysis and Prevention, describing three examples from accident research in which null hypothesis significance testing led researchers astray. Hauer writes: The problem is clear. Researchers obtain real data which, while noisy, time and again point in a certain direction. However, instead of saying: “here is my estimate of the safety effect, here is its precision, and this is how what I found relates to previous findings”, the data is processed by NHST, and the researcher says, correctly but pointlessly: “I cannot be sure that the safety effect is not zero”. Occasionally, the researcher adds, this time incorrectly and unjustifiably, a statement to the effect that: “since the result is not statistically significant, it is best to assume the safety effect to be zero”. In this manner, good data are drained of real content, the direction of empirical conclusions reversed, and ord
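Hauer’s complaint is about reporting, and a minimal sketch makes it concrete (the numbers below are invented for illustration, not taken from his paper): the same data that fail a significance test still carry a directional estimate and a standard error worth reporting.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# hypothetical per-site changes in crash rate at 40 treated sites (made-up data)
effect = rng.normal(-0.10, 0.40, size=40)  # small average safety benefit, noisy

est = effect.mean()
se = effect.std(ddof=1) / np.sqrt(len(effect))
p = stats.ttest_1samp(effect, 0.0).pvalue

print(f"estimate = {est:.3f}, SE = {se:.3f}, p = {p:.2f}")
# Reporting the estimate with its standard error keeps direction and magnitude visible;
# reporting only "not significant at the 0.05 level" discards that information and
# invites the unjustified conclusion that the effect is zero.
```

With settings like these the test will often come back nonsignificant even though the data consistently point toward a benefit, which is exactly the pattern Hauer describes.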
5 0.96496487 2004 andrew gelman stats-2013-09-01-Post-publication peer review: How it (sometimes) really works
Introduction: In an ideal world, research articles would be open to criticism and discussion in the same place where they are published, in a sort of non-corrupt version of Yelp. What is happening now is that the occasional paper or research area gets lots of press coverage, and this inspires reactions on science-focused blogs. The trouble here is that it’s easier to give off-the-cuff comments than detailed criticisms. Here’s an example. It starts a couple years ago with this article by Ryota Kanai, Tom Feilden, Colin Firth, and Geraint Rees, on brain size and political orientation: In a large sample of young adults, we related self-reported political attitudes to gray matter volume using structural MRI. We found that greater liberalism was associated with increased gray matter volume in the anterior cingulate cortex, whereas greater conservatism was associated with increased volume of the right amygdala. These results were replicated in an independent sample of additional participants. Ou
6 0.96252048 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings
7 0.95987761 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics
8 0.95915067 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas
9 0.95627189 1181 andrew gelman stats-2012-02-23-Philosophy: Pointer to Salmon
10 0.95477188 323 andrew gelman stats-2010-10-06-Sociotropic Voting and the Media
11 0.95460993 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five
12 0.95082551 235 andrew gelman stats-2010-08-25-Term Limits for the Supreme Court?
13 0.95056891 2053 andrew gelman stats-2013-10-06-Ideas that spread fast and slow
15 0.94499767 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work
16 0.94361448 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards
18 0.94013619 187 andrew gelman stats-2010-08-05-Update on state size and governors’ popularity
19 0.93799192 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
20 0.93741202 98 andrew gelman stats-2010-06-19-Further thoughts on happiness and life satisfaction research