andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-256 knowledge-graph by maker-knowledge-mining
Source: html
Post: andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Introduction: Masanao sends this one in, under the heading, “another incident of misunderstood p-value”:

“Warren Davies, a positive psychology MSc student at UEL, provides the latest in our ongoing series of guest features for students. Warren has just released a Psychology Study Guide, which covers information on statistics, research methods and study skills for psychology students.

“Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you’re testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke – but still, there is always a certain probability that it was. Statistical significance testing gives you an idea of what this probability is.

“In science we’re always testing hypotheses. We never conduct a study to ‘see what happens’, because there’s always at least one way to make any useless set of data look important. We take a risk; we put our idea on the line and expose it to potential refutation. Therefore, all statistical tests in psychology test the possibility that the hypothesis is correct, versus the possibility that it isn’t.”
I don’t blame Warren Davies–it’s all-too-common for someone teaching statistics to (a) make a mistake and (b) not realize it. But I do blame the editors of the website for getting a non-expert to emit wrong information. One thing that any research psychologist should know is that statistics is tricky. I hate to see this sort of mistake (saying that statistical significance is a measure of the probability that the null hypothesis is true) being given the official endorsement of the British Psychological Society.
To any confused readers out there: The p-value is the probability of seeing something as extreme as the data or more so, if the null hypothesis were true.
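To make the definition concrete, here is a minimal simulation sketch (added for illustration, not from the original post; it assumes a one-sample t-test on simulated normal data): when the null hypothesis really is true, the p-value is just a tail probability of the test statistic, so “significant” results still turn up at the nominal rate.

```python
# Illustrative sketch (my addition, not from the post). Under a true null
# (mean exactly 0), the p-value is the probability of a test statistic as
# extreme as the one observed or more so, so about 5% of null datasets
# give p < 0.05. It is NOT the probability that the null is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_obs = 10_000, 30

pvals = np.array([
    stats.ttest_1samp(rng.normal(loc=0.0, scale=1.0, size=n_obs), 0.0).pvalue
    for _ in range(n_sims)
])

print(f"share of p < 0.05 under a true null: {np.mean(pvals < 0.05):.3f}")  # ~0.05
```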
In social science (and I think in psychology as well), the null hypothesis is almost certainly false, false, false, and you don’t need a p-value to tell you this. The p-value tells you the extent to which a certain aspect of your data are consistent with the null hypothesis.
A lack of rejection doesn’t tell you that the null hypothesis is likely true; rather, it tells you that you don’t have enough data to reject the null hypothesis.
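The “not enough data” point shows up clearly in a quick power simulation (again my sketch, under an assumed small-but-real effect of 0.1 standard deviations, so the null is false by construction): the same false null is rarely rejected at n = 20 but almost always rejected at n = 2000, so non-rejection mostly reflects sample size, not the truth of the null.

```python
# Illustrative sketch (my addition). The null is false by construction
# (true effect = 0.1 sd), yet with a small sample it is rarely rejected:
# non-rejection reflects low power, not evidence that the null is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

for n_obs in (20, 2000):
    reject_rate = np.mean([
        stats.ttest_1samp(rng.normal(loc=0.1, scale=1.0, size=n_obs), 0.0).pvalue < 0.05
        for _ in range(2_000)
    ])
    print(f"n = {n_obs:>4}: rejection rate = {reject_rate:.2f}")
# Roughly 0.07 at n = 20 versus ~0.99 at n = 2000, for the same false null.
```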
For more on this, see for example this paper with David Weakliem, which was written for a nontechnical audience.