andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1024 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Xian, Judith, and I read this line in a book by statistician Murray Aitkin in which he considered the following hypothetical example: A survey of 100 individuals expressing support (Yes/No) for the president, before and after a presidential address . . . The question of interest is whether there has been a change in support between the surveys . . . We want to assess the evidence for the hypothesis of equality H1 against the alternative hypothesis H2 of a change. Here is our response: Based on our experience in public opinion research, this is not a real question. Support for any political position is always changing. The real question is how much the support has changed, or perhaps how this change is distributed across the population. A defender of Aitkin (and of classical hypothesis testing) might respond at this point that, yes, everybody knows that changes are never exactly zero and that we should take a more “grown-up” view of the null hypothesis, not that the change is zero but that it is nearly zero. Unfortunately, the metaphorical interpretation of hypothesis tests has problems similar to the theological doctrines of the Unitarian church. Once you have abandoned literal belief in the Bible, the question soon arises: why follow it at all? I like the line about the Unitarian Church, also the idea of hypothesis testing as a religion (since people are always describing Bayesianism as a religious doctrine).
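To make the estimation alternative concrete, here is a minimal sketch in Python of answering “how much did support change?” rather than testing whether it changed. The data are simulated, not Aitkin’s, and the model is deliberately simple: independent Beta(1, 1) priors on the before and after proportions, ignoring the pairing of respondents (a fuller analysis would model the 2x2 table of switchers).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired survey: 100 respondents, 1 = supports the president.
before = rng.binomial(1, 0.55, size=100)
after = rng.binomial(1, 0.50, size=100)

# Posterior draws for each proportion under independent Beta(1, 1) priors;
# the quantity of interest is the change, not a yes/no test of equality.
draws = 10_000
p_before = rng.beta(1 + before.sum(), 1 + 100 - before.sum(), draws)
p_after = rng.beta(1 + after.sum(), 1 + 100 - after.sum(), draws)
change = p_after - p_before

print(f"posterior mean change: {change.mean():+.3f}")
print(f"95% interval: ({np.quantile(change, 0.025):+.3f}, "
      f"{np.quantile(change, 0.975):+.3f})")
```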
sentIndex sentText sentNum sentScore
1 Xian, Judith, and I read this line in a book by statistician Murray Aitkin in which he considered the following hypothetical example: A survey of 100 individuals expressing support (Yes/No) for the president, before and after a presidential address . [sent-1, score-0.705]
2 The question of interest is whether there has been a change in support between the surveys . [sent-4, score-0.577]
3 We want to assess the evidence for the hypothesis of equality H1 against the alternative hypothesis H2 of a change. [sent-7, score-0.993]
4 Here is our response: Based on our experience in public opinion research, this is not a real question. [sent-8, score-0.193]
5 The real question is how much the support has changed, or perhaps how this change is distributed across the population. [sent-10, score-0.675]
6 A defender of Aitkin (and of classical hypothesis testing) might respond at this point that, yes, everybody knows that changes are never exactly zero and that we should take a more “grown-up” view of the null hypothesis, not that the change is zero but that it is nearly zero. [sent-11, score-1.111]
7 Unfortunately, the metaphorical interpretation of hypothesis tests has problems similar to the theological doctrines of the Unitarian church. [sent-12, score-0.754]
8 Once you have abandoned literal belief in the Bible, the question soon arises: why follow it at all? [sent-13, score-0.486]
9 I like the line about the Unitarian Church, also the idea of hypothesis testing as a religion (since people are always describing Bayesianism as a religious doctrine). [sent-15, score-0.941]
wordName wordTfidf (topN-words)
[('hypothesis', 0.397), ('aitkin', 0.308), ('unitarian', 0.308), ('support', 0.211), ('change', 0.196), ('null', 0.158), ('rehabilitate', 0.14), ('theological', 0.14), ('inappropriateness', 0.14), ('metaphorical', 0.14), ('judith', 0.132), ('testing', 0.127), ('metaphor', 0.118), ('abandoned', 0.118), ('literal', 0.118), ('equality', 0.115), ('opinion', 0.114), ('doctrine', 0.113), ('zero', 0.112), ('recognizes', 0.108), ('bible', 0.106), ('xian', 0.105), ('bayesianism', 0.102), ('line', 0.101), ('church', 0.1), ('question', 0.1), ('murray', 0.098), ('expressing', 0.093), ('distributed', 0.089), ('religion', 0.086), ('attack', 0.084), ('assess', 0.084), ('performing', 0.084), ('hypothetical', 0.082), ('arises', 0.081), ('religious', 0.08), ('real', 0.079), ('presidential', 0.078), ('describing', 0.077), ('treat', 0.077), ('problems', 0.077), ('soon', 0.076), ('belief', 0.074), ('president', 0.074), ('always', 0.073), ('address', 0.073), ('surveys', 0.07), ('respond', 0.069), ('everybody', 0.067), ('individuals', 0.067)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians
2 0.23149183 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?
Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm
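The mechanics behind the question are easy to verify directly. This is a minimal sketch with simulated data, using scipy’s ttest_ind and its alternative= argument to select the tail; when the observed t statistic points in the hypothesized direction, the one-tailed p-value is exactly half the two-tailed one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.3, 1.0, size=50)  # group A, hypothesized to have the higher mean
b = rng.normal(0.0, 1.0, size=50)  # group B

two_sided = stats.ttest_ind(a, b, alternative="two-sided")
one_sided = stats.ttest_ind(a, b, alternative="greater")
print(f"two-sided p = {two_sided.pvalue:.4f}")
print(f"one-sided p = {one_sided.pvalue:.4f}")  # half the two-sided p when t > 0
```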
3 0.19836456 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher
Introduction: As a data analyst and a scientist, Fisher > Neyman, no question. But as a theorist, Fisher came up with ideas that worked just fine in his applications but can fall apart when people try to apply them too generally. Here’s an example that recently came up. Deborah Mayo pointed me to a comment by Stephen Senn on the so-called Fisher and Neyman null hypotheses. In an experiment with n participants (or, as we used to say, subjects or experimental units), the Fisher null hypothesis is that the treatment effect is exactly 0 for every one of the n units, while the Neyman null hypothesis is that the individual treatment effects can be negative or positive but have an average of zero. Senn explains why Neyman’s hypothesis in general makes no sense—the short story is that Fisher’s hypothesis seems relevant in some problems (sometimes we really are studying effects that are zero or close enough for all practical purposes), whereas Neyman’s hypothesis just seems weird (it’s implausible
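A small simulation makes the two nulls concrete. This sketch uses illustrative numbers (not from Senn’s comment): unit-level treatment effects are exactly zero under the Fisher null and merely average to zero under the Neyman null, so both have zero mean effect but only the Neyman null shows unit-level heterogeneity.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

nulls = {
    "Fisher": np.zeros(n),              # effect exactly 0 for every unit
    "Neyman": rng.normal(0.0, 1.0, n),  # effects vary but average to zero
}
for name, effects in nulls.items():
    control = rng.normal(0.0, 1.0, n)   # potential outcome under control
    treated = control + effects         # potential outcome under treatment
    unit_effects = treated - control
    print(f"{name} null: mean effect = {unit_effects.mean():+.3f}, "
          f"sd of unit-level effects = {unit_effects.std():.3f}")
```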
4 0.1954392 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models
Introduction: Robert Bloomfield writes: Most of the people in my field (accounting, which is basically applied economics and finance, leavened with psychology and organizational behavior) use ‘positive research methods’, which are typically described as coming to the data with a predefined theory, and using hypothesis testing to accept or reject the theory’s predictions. But a substantial minority use ‘interpretive research methods’ (sometimes called qualitative methods, for those that call positive research ‘quantitative’). No one seems entirely happy with the definition of this method, but I’ve found it useful to think of it as an attempt to see the world through the eyes of your subjects, much as Jane Goodall lived with gorillas and tried to see the world through their eyes. Interpretive researchers often criticize positive researchers by noting that the latter don’t make the best use of their data, because they come to the data with a predetermined theory, and only test a narrow set of h
5 0.18760931 423 andrew gelman stats-2010-11-20-How to schedule projects in an introductory statistics course?
Introduction: John Haubrick writes: Next semester I want to center my statistics class around independent projects that they will present at the end of the semester. My question is, by centering around a project and teaching for the different parts that they need at the time, should topics such as hypothesis testing be moved toward the beginning of the course? Or should I only discuss setting up a research hypothesis and discuss the actual testing later after they have the data? My reply: I’m not sure. There always is a difficulty of what can be covered in a project. My quick thought is that a project will perhaps work better if it is focused on data collection or exploratory data analysis rather than on estimation and hypothesis testing, which are topics that get covered pretty well in the course as a whole.
7 0.18000025 1355 andrew gelman stats-2012-05-31-Lindley’s paradox
8 0.17522375 1605 andrew gelman stats-2012-12-04-Write This Book
10 0.13995962 1272 andrew gelman stats-2012-04-20-More proposals to reform the peer-review system
11 0.13891146 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
12 0.13730413 1095 andrew gelman stats-2012-01-01-Martin and Liu: Probabilistic inference based on consistency of model with data
13 0.13597848 331 andrew gelman stats-2010-10-10-Bayes jumps the shark
14 0.13326366 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards
15 0.12953053 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems
16 0.12374375 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one
17 0.12018865 1883 andrew gelman stats-2013-06-04-Interrogating p-values
18 0.11513796 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing
19 0.11138076 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
20 0.11029357 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism
topicId topicWeight
[(0, 0.151), (1, 0.014), (2, 0.042), (3, -0.056), (4, -0.065), (5, -0.01), (6, -0.044), (7, 0.061), (8, 0.021), (9, -0.103), (10, -0.082), (11, -0.015), (12, 0.027), (13, -0.062), (14, 0.033), (15, -0.039), (16, -0.056), (17, -0.087), (18, 0.03), (19, -0.08), (20, 0.051), (21, -0.006), (22, -0.012), (23, 0.003), (24, -0.057), (25, -0.106), (26, 0.075), (27, -0.021), (28, -0.007), (29, 0.002), (30, 0.047), (31, -0.056), (32, 0.049), (33, 0.038), (34, -0.124), (35, -0.083), (36, 0.09), (37, -0.071), (38, 0.092), (39, 0.063), (40, -0.064), (41, 0.008), (42, -0.019), (43, -0.022), (44, -0.02), (45, 0.076), (46, -0.0), (47, -0.059), (48, 0.048), (49, 0.004)]
simIndex simValue blogId blogTitle
same-blog 1 0.98672503 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians
2 0.80411398 1355 andrew gelman stats-2012-05-31-Lindley’s paradox
Introduction: Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs bayesian, namely Lindley’s paradox, on page 12. The text is as follows: So why are we worried about trivial effects? They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. It occurs when: (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0 and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. In his original treatment, Lindley (1957) showed that – under a parti
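A short numeric sketch of the paradox, under the standard textbook setup rather than Lindley’s original treatment: data are N(theta, 1), H0: theta = 0, H1: theta ~ N(0, 1), and the z statistic is held fixed at 1.96 while n grows. The p-value stays at 0.05 throughout, yet the Bayes factor and the posterior probability swing toward H0.

```python
import numpy as np
from scipy import stats

z, tau2 = 1.96, 1.0  # z fixed at the 5% boundary; theta ~ N(0, tau2) under H1
for n in [10, 100, 1_000, 10_000, 100_000]:
    xbar = z / np.sqrt(n)
    m0 = stats.norm.pdf(xbar, 0, np.sqrt(1 / n))         # marginal of xbar under H0
    m1 = stats.norm.pdf(xbar, 0, np.sqrt(tau2 + 1 / n))  # marginal of xbar under H1
    bf01 = m0 / m1
    post0 = bf01 / (1 + bf01)  # P(H0 | data) with prior odds 1
    print(f"n = {n:>6}: p = {2 * stats.norm.sf(z):.3f}, "
          f"BF01 = {bf01:8.2f}, P(H0 | data) = {post0:.3f}")
```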
3 0.7913928 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?
4 0.7681393 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher
5 0.73460346 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems
Introduction: A recent discussion between commenters Question and Fernando captured one of the recurrent themes here from the past year. Question: The problem is simple, the researchers are disproving always false null hypotheses and taking this disproof as near proof that their theory is correct. Fernando: Whereas it is probably true that researchers misuse NHT, the problem with tabloid science is broader and deeper. It is systemic. Question: I do not see how anything can be deeper than replacing careful description, prediction, falsification, and independent replication with dynamite plots, p-values, affirming the consequent, and peer review. From my own experience I am confident in saying that confusion caused by NHST is at the root of this problem. Fernando: Incentives? Impact factors? Publish or die? “Interesting” and “new” above quality and reliability, or actually answering a research question, and a silly and unbecoming obsession with being quoted in NYT, etc. . . . Giv
7 0.71578276 2272 andrew gelman stats-2014-03-29-I agree with this comment
8 0.68624103 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards
10 0.68475765 2102 andrew gelman stats-2013-11-15-“Are all significant p-values created equal?”
11 0.68192255 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models
12 0.6459614 423 andrew gelman stats-2010-11-20-How to schedule projects in an introductory statistics course?
13 0.64588141 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
14 0.64284307 331 andrew gelman stats-2010-10-10-Bayes jumps the shark
16 0.59301966 1605 andrew gelman stats-2012-12-04-Write This Book
17 0.59129924 1883 andrew gelman stats-2013-06-04-Interrogating p-values
18 0.58763355 1095 andrew gelman stats-2012-01-01-Martin and Liu: Probabilistic inference based on consistency of model with data
19 0.58476996 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one
20 0.55640811 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing
topicId topicWeight
[(2, 0.033), (16, 0.036), (21, 0.04), (24, 0.18), (29, 0.15), (31, 0.022), (53, 0.019), (63, 0.02), (65, 0.037), (81, 0.013), (82, 0.013), (87, 0.014), (89, 0.012), (91, 0.025), (98, 0.058), (99, 0.237)]
simIndex simValue blogId blogTitle
same-blog 1 0.93330389 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians
2 0.92420912 1940 andrew gelman stats-2013-07-16-A poll that throws away data???
Introduction: Mark Blumenthal writes: What do you think about the “random rejection” method used by PPP that was attacked at some length today by a Republican pollster. Our just published post on the debate includes all the details as I know them. The Storify of Martino’s tweets has some additional data tables linked to toward the end. Also, more specifically, setting aside Martino’s suggestion of manipulation (which is also quite possible with post-stratification weights), would the PPP method introduce more potential random error than weighting? From Blumenthal’s blog: B.J. Martino, a senior vice president at the Republican polling firm The Tarrance Group, went on a 30-minute Twitter rant on Tuesday questioning the unorthodox method used by PPP [Public Policy Polling] to select samples and weight data: “Looking at @ppppolls new VA SW. Wondering how many interviews they discarded to get down to 601 completes? Because @ppppolls discards a LOT of interviews. Of 64,811 conducted
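On the random-error question, here is a toy simulation of the general idea (an assumed setup, not PPP’s actual procedure): one binary demographic, oversampled in the raw interviews, with the population mean estimated either by post-stratification weighting or by randomly discarding interviews until each group hits its target count. Comparing the spread of the two estimators across replications speaks to Blumenthal’s question; in this particular setup, weighting the full sample is the more precise of the two.

```python
import numpy as np

rng = np.random.default_rng(7)
pop_share, n_raw, keep_each, sims = 0.5, 1000, 200, 2000
weighted, rejected = [], []

for _ in range(sims):
    group = rng.binomial(1, 0.7, n_raw)                  # group 1 oversampled
    y = rng.binomial(1, np.where(group == 1, 0.6, 0.4))  # outcome differs by group

    # (a) post-stratification: weight each group to its population share
    w = np.where(group == 1, pop_share / group.mean(),
                 (1 - pop_share) / (1 - group.mean()))
    weighted.append(np.average(y, weights=w))

    # (b) random rejection: discard interviews until each group hits its target
    keep = np.concatenate([
        rng.choice(np.where(group == 1)[0], keep_each, replace=False),
        rng.choice(np.where(group == 0)[0], keep_each, replace=False),
    ])
    rejected.append(y[keep].mean())

for name, est in [("weighted", weighted), ("random rejection", rejected)]:
    est = np.asarray(est)
    print(f"{name}: mean = {est.mean():.3f}, sd = {est.std():.4f}")
```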
3 0.91717899 1687 andrew gelman stats-2013-01-21-Workshop on science communication for graduate students
Introduction: Nathan Sanders writes: Applications are now open for the Communicating Science 2013 workshop (http://workshop.astrobites.com/), to be held in Cambridge, MA on June 13-15th, 2013. Graduate students at US institutions in all fields of science and engineering are encouraged to apply – funding is available for travel expenses and accommodations. The application can be found here: http://workshop.astrobites.org/application Participants will build the communication skills that technical professionals need to express complex ideas to their peers, experts in other fields, and the general public. There will be panel discussions on the following topics: * Engaging Non-Scientific Audiences * Science Writing for a Cause * Communicating Science Through Fiction * Sharing Science with Scientists * The World of Non-Academic Publishing * Communicating using Multimedia and the Web In addition to these discussions, ample time is allotted for interacting with the experts and with att
4 0.90359592 2133 andrew gelman stats-2013-12-13-Flexibility is good
Introduction: If I made a separate post for each interesting blog discussion, we’d get overwhelmed. That’s why I often leave detailed responses in the comments section, even though I’m pretty sure that most readers don’t look in the comments at all. Sometimes, though, I think it’s good to bring such discussions to light. Here’s a recent example. Michael wrote: Poor predictive performance usually indicates that the model isn’t sufficiently flexible to explain the data, and my understanding of the proper Bayesian strategy is to feed that back into your original model and try again until you achieve better performance. Corey replied: It was my impression that — in ML at least — poor predictive performance is more often due to the model being too flexible and fitting noise. And Rahul agreed: Good point. A very flexible model will describe your training data perfectly and then go bonkers when unleashed on wild data. But I wrote: Overfitting comes from a model being flex
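The flexibility-versus-overfitting point is easy to demonstrate. This sketch uses toy data (not from the thread): polynomials of increasing degree fit to a noisy sine curve, with training error falling as the fit gets more flexible while test error typically worsens for the most flexible fit.

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(3 * x)  # true curve; observations add noise
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = f(x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.sort(rng.uniform(-1, 1, 200))
y_test = f(x_test) + rng.normal(0, 0.3, x_test.size)

for degree in [1, 3, 9]:
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.3f}, "
          f"test MSE = {test_mse:.3f}")
```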
Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek , Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads
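The mechanism is simple enough to simulate. This sketch uses assumed numbers, not figures from the Button et al. paper: estimates of a small true effect are drawn from a low-powered design, and the estimates that clear the 5% significance threshold overestimate the effect on average (what Gelman elsewhere calls a Type M error).

```python
import numpy as np

rng = np.random.default_rng(11)
true_effect, sd, n, sims = 0.2, 1.0, 25, 50_000
se = sd / np.sqrt(n)
est = rng.normal(true_effect, se, sims)  # sampling distribution of the estimate
significant = np.abs(est / se) > 1.96    # which studies clear p < 0.05

print(f"power = {significant.mean():.2f}")
print(f"mean |estimate| among significant results = "
      f"{np.abs(est[significant]).mean():.2f}  (true effect = {true_effect})")
```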
7 0.89399701 1392 andrew gelman stats-2012-06-26-Occam
8 0.89370507 651 andrew gelman stats-2011-04-06-My talk at Northwestern University tomorrow (Thursday)
9 0.89150167 1491 andrew gelman stats-2012-09-10-Update on Levitt paper on child car seats
10 0.89030313 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”
11 0.88967186 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves
12 0.88921285 1034 andrew gelman stats-2011-11-29-World Class Speakers and Entertainers
13 0.88019335 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys
14 0.87967992 1539 andrew gelman stats-2012-10-18-IRB nightmares
15 0.87792909 868 andrew gelman stats-2011-08-24-Blogs vs. real journalism
16 0.877617 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?
17 0.87701583 2160 andrew gelman stats-2014-01-06-Spam names
18 0.8653602 1915 andrew gelman stats-2013-06-27-Huh?
19 0.8641665 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
20 0.86404091 65 andrew gelman stats-2010-06-03-How best to learn R?