andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2322 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Biostatistician Jeff Leek writes: Think about this headline: “Hospital checklist cut infections, saved lives.” I [Leek] am a pretty skeptical person, so I’m a little surprised that a checklist could really save lives. I say the odds of this being true are 1 in 4. I’m actually surprised that he’s surprised, since over the years I’ve heard about the benefits of checklists in various arenas, including hospital care. In particular, there was this article by Atul Gawande from a few years back. I mean, sure, I could imagine that checklists might hurt: after all, it takes some time and effort to put together the checklist and to use it, and perhaps the very existence of the checklist could give hospital staff a false feeling of security, which would ultimately cost lives. But my first guess would be that people still don’t do enough checklisting, and that the probability is greater than 1/4 that a checklist in a hospital will save lives. Later on, Leek writes: Let’s try ano
sentIndex sentText sentNum sentScore
1 Biostatistician Jeff Leek writes: Think about this headline: “Hospital checklist cut infections, saved lives.” [sent-1, score-0.572]
2 I [Leek] am a pretty skeptical person, so I’m a little surprised that a checklist could really save lives. [sent-2, score-0.836]
3 I’m actually surprised that he’s surprised, since over the years I’ve heard about the benefits of checklists in various arenas, including hospital care. [sent-4, score-0.638]
4 I mean, sure, I could imagine that checklists might hurt: after all, it takes some time and effort to put together the checklist and to use it, and perhaps the very existence of the checklist could give hospital staff a false feeling of security, which would ultimately cost lives. [sent-6, score-1.854]
5 But my first guess would be that people still don’t do enough checklisting, and that the probability is greater than 1/4 that a checklist in a hospital will save lives. [sent-7, score-1.188]
6 Later on, Leek writes: Let’s try another headline: “How using Facebook could increase your risk of cancer.” [sent-8, score-0.123]
7 To my mind, the odds that this is right may be something like 1 in 10. [sent-10, score-0.095]
8 He’s saying that his prior probability of this happening is as high as 1/10? [sent-12, score-0.284]
9 5 his prior probability that a checklist will save lives. [sent-14, score-0.968]
10 Here we can see one of the problems with subjective priors. [sent-15, score-0.061]
11 I’m reminded of what George Orwell wrote about book reviewing: if you review 10 books a week, and if your scale is such that Hamlet is a good play and Great Expectations is a good read, how do you calibrate all the material of varying quality that you are sent to review? [sent-17, score-0.295]
12 The only answer is that books are reviewed relative to expectations, and you can’t say that the latest bestseller is crap just cos it doesn’t live up to the standards of Shakespeare. [sent-18, score-0.293]
13 Similarly, I have a feeling that Leek is setting his priors relative to expectations. [sent-19, score-0.212]
14 In his first example, sure, we have a general belief that checklists are important, but Leek compresses his scale by invoking a general skepticism. [sent-20, score-0.748]
15 So, instead of saying that checklists probably work, he dials down his probability to 1/4. [sent-21, score-0.606]
16 Indeed, even thinking about the question implies that the probability is nonzero, and then we get to thinking: hmm, you use Facebook and you stay indoors more, then maybe you don’t get enough exercise or not enough vitamin C . [sent-25, score-0.481]
17 But this 1/10 is not on the same scale as the earlier 1/4. [sent-30, score-0.127]
18 The 1/4 referred to the probability that a checklist really works to save lives, whereas the 1/10 is the probability that there’s something, somewhere associated with Facebook use that is also associated with cancer risk in some small way (as there’s no realistic way this effect could be large). [sent-31, score-1.515]
19 This illustrates a general problem, not just with priors and Bayesian statistics but with scientific measurement in general. [sent-32, score-0.145]
20 My point is not to pick on him but rather to bring some attention to the general problems of probability assignment. [sent-36, score-0.371]
wordName wordTfidf (topN-words)
[('checklist', 0.518), ('leek', 0.343), ('checklists', 0.325), ('probability', 0.234), ('hospital', 0.215), ('facebook', 0.207), ('save', 0.166), ('scale', 0.127), ('surprised', 0.098), ('odds', 0.095), ('expectations', 0.093), ('headline', 0.09), ('general', 0.076), ('probabilities', 0.074), ('compresses', 0.074), ('infections', 0.074), ('indoors', 0.074), ('arenas', 0.074), ('hamlet', 0.074), ('feeling', 0.072), ('relative', 0.071), ('invoking', 0.07), ('atul', 0.07), ('gawande', 0.07), ('risk', 0.069), ('priors', 0.069), ('associated', 0.066), ('biostatistician', 0.064), ('vitamin', 0.063), ('bestseller', 0.061), ('problems', 0.061), ('works', 0.06), ('books', 0.059), ('orwell', 0.056), ('hmm', 0.056), ('nonzero', 0.056), ('review', 0.055), ('enough', 0.055), ('could', 0.054), ('calibrate', 0.054), ('saved', 0.054), ('cos', 0.054), ('numerical', 0.05), ('prior', 0.05), ('existence', 0.05), ('staff', 0.048), ('crap', 0.048), ('hurt', 0.048), ('realistic', 0.048), ('probably', 0.047)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 2322 andrew gelman stats-2014-05-06-Priors I don’t believe
Introduction: Jeff Leek just posted the discussions of his paper (with Leah Jager), “An estimate of the science-wise false discovery rate and application to the top medical literature,” along with some further comments of his own. Here are my original thoughts on an earlier version of their article. Keith O’Rourke and I expanded these thoughts into a formal comment for the journal. We’re pretty much in agreement with John Ioannidis (you can find his discussion in the top link above). In quick summary, I agree with Jager and Leek that this is an important topic. I think there are two key places where Keith and I disagree with them: 1. They take published p-values at face value whereas we consider them as the result of a complicated process of selection. This is something I didn’t use to think much about, but now I’ve become increasingly convinced that the problem with published p-values is not a simple file-drawer effect or the case of a few p=0.051 values nudged toward p=0.049, bu
3 0.13472472 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili
4 0.12917793 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; Surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start
5 0.12725206 1844 andrew gelman stats-2013-05-06-Against optimism about social science
Introduction: Social science research has been getting pretty bad press recently, what with the Excel buccaneers who didn’t know how to handle data with different numbers of observations per country, and the psychologist who published dozens of papers based on fabricated data, and the Evilicious guy who wouldn’t let people review his data tapes, etc etc. And that’s not even considering Dr. Anil Potti. On the other hand, the revelation of all these problems can be taken as evidence that things are getting better. Psychology researcher Gary Marcus writes : There is something positive that has come out of the crisis of replicability—something vitally important for all experimental sciences. For years, it was extremely difficult to publish a direct replication, or a failure to replicate an experiment, in a good journal. . . . Now, happily, the scientific culture has changed. . . . The Reproducibility Project, from the Center for Open Science is now underway . . . And sociologist Fabio Rojas
6 0.12202805 1951 andrew gelman stats-2013-07-22-Top 5 stat papers since 2000?
8 0.11104716 1941 andrew gelman stats-2013-07-16-Priors
9 0.104306 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates
10 0.09634418 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles
11 0.094862759 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”
12 0.094585374 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities
13 0.089839458 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
14 0.088609837 760 andrew gelman stats-2011-06-12-How To Party Your Way Into a Multi-Million Dollar Facebook Job
15 0.086063862 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?
16 0.084034279 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
17 0.083759591 2028 andrew gelman stats-2013-09-17-Online conference for young statistics researchers
19 0.080884129 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
20 0.079848133 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
topicId topicWeight
[(0, 0.18), (1, 0.015), (2, -0.012), (3, 0.01), (4, -0.024), (5, -0.046), (6, 0.092), (7, 0.027), (8, -0.022), (9, -0.038), (10, -0.014), (11, -0.006), (12, 0.033), (13, -0.03), (14, -0.007), (15, 0.002), (16, 0.028), (17, 0.024), (18, 0.006), (19, -0.019), (20, -0.019), (21, 0.025), (22, -0.034), (23, 0.025), (24, -0.026), (25, 0.04), (26, 0.003), (27, 0.01), (28, -0.027), (29, -0.02), (30, -0.028), (31, -0.001), (32, -0.023), (33, 0.017), (34, -0.027), (35, -0.042), (36, 0.035), (37, -0.001), (38, -0.025), (39, -0.048), (40, -0.001), (41, -0.024), (42, 0.033), (43, -0.031), (44, 0.016), (45, 0.013), (46, 0.002), (47, 0.045), (48, -0.04), (49, -0.029)]
simIndex simValue blogId blogTitle
same-blog 1 0.97165793 2322 andrew gelman stats-2014-05-06-Priors I don’t believe
2 0.86246067 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates
Introduction: Suppose you and I agree on a probability estimate…perhaps we both agree there is a 2/3 chance Spain will beat Netherlands in tomorrow’s World Cup. In this case, we could agree on a wager: if Spain beats Netherlands, I pay you $x. If Netherlands beats Spain, you pay me $2x. It is easy to see that my expected loss (or win) is $0, and that the same is true for you. Either of us should be indifferent to taking this bet, and to which side of the bet we are on. We might make this bet just to increase our interest in watching the game, but neither of us would see a money-making opportunity here. By the way, the relationship between “odds” and the event probability — a 1/3 chance of winning turning into a bet at 2:1 odds — is that if the event probability is p, then a fair bet has odds of (1/p – 1):1. More interesting, and more relevant to many real-world situations, is the case that we disagree on the probability of an event. If we disagree on the probability, then there should be
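The odds relationship quoted in this excerpt, fair odds of (1/p – 1):1 for an event with probability p, can be checked with a few lines of Python. This is only an illustrative sketch: the function names `fair_odds` and `expected_gain` are hypothetical and do not come from the post, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def fair_odds(p):
    """Return k such that a bet at k:1 odds is fair for an event of probability p.

    Implements the relationship stated in the excerpt: k = 1/p - 1.
    """
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1 / p - 1


def expected_gain(p_win, win_amount, loss_amount):
    """Expected gain for a bettor who wins win_amount with probability p_win
    and otherwise loses loss_amount."""
    p_win = Fraction(p_win)
    return p_win * win_amount - (1 - p_win) * loss_amount


# A 1/3 chance of winning corresponds to a fair bet at 2:1 odds.
odds_for_one_third = fair_odds(Fraction(1, 3))

# The wager from the excerpt, with x = 1: "you" win 1 with probability 2/3
# (Spain wins) and lose 2 with probability 1/3 (Netherlands wins).
gain_for_you = expected_gain(Fraction(2, 3), 1, 2)
```

Under these definitions `odds_for_one_third` equals 2 and `gain_for_you` equals 0, matching the excerpt's statement that neither side has a money-making opportunity when both agree on the probability.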
3 0.84433335 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles
Introduction: Jason Kottke posts this puzzle from Gary Foshee that reportedly impressed people at a puzzle-designers’ convention: I have two children. One is a boy born on a Tuesday. What is the probability I have two boys? The first thing you think is “What has Tuesday got to do with it?” Well, it has everything to do with it. I thought I should really figure this one out myself before reading any further, and I decided this was a good time to apply my general principle that it’s always best to solve such problems from scratch rather than trying to guess at the answer. So I laid out all the 4 x 49 possibilities. The 4 is bb, bg, gb, gg, and the 49 are all possible pairs of days of the week. Then I ruled out all the possibilities that were inconsistent with the data: this leaves the following: bb with all pairs of days that include a Tuesday. That’s 13 possibilities (Mon/Tues, Tues/Tues, Wed/Tues, …, Tues/Mon, …, Sun/Tues, remembering not to count Tues/Tues twice). bg with all
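The brute-force enumeration described in this excerpt (4 sex combinations times 49 day pairs) can be sketched in Python as follows. The sketch assumes that sexes and weekdays are equally likely and independent, and that the statement is read as "at least one child is a boy born on a Tuesday"; the variable names and the choice of which integer stands for Tuesday are arbitrary illustrations, not part of the post.

```python
from fractions import Fraction
from itertools import product

SEXES = ("b", "g")
DAYS = range(7)   # seven weekdays, labeled 0..6
TUESDAY = 1       # arbitrary label; any fixed day gives the same count

# Each child is a (sex, day) pair; a family is an ordered pair of children.
children = list(product(SEXES, DAYS))          # 14 possible children
families = list(product(children, repeat=2))   # 4 x 49 = 196 equally likely families


def has_tuesday_boy(family):
    """True if at least one child in the family is a boy born on Tuesday."""
    return any(child == ("b", TUESDAY) for child in family)


# Rule out every family that is inconsistent with the stated information.
consistent = [f for f in families if has_tuesday_boy(f)]

# Among the remaining families, count those with two boys.
two_boys = [f for f in consistent if all(sex == "b" for sex, _ in f)]

prob_two_boys = Fraction(len(two_boys), len(consistent))
```

Under these assumptions 27 families remain, 13 of them have two boys (agreeing with the 13 possibilities counted in the excerpt), and the resulting probability is 13/27.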
4 0.80650473 731 andrew gelman stats-2011-05-26-Lottery probability update
Introduction: It was reported last year that the national lottery of Israel featured the exact same 6 numbers (out of 45) twice in the same month, and statistics professor Isaac Meilijson of Tel Aviv University was quoted as saying that “the incident of six numbers repeating themselves within a month is an event of once in 10,000 years.” I shouldn’t mock when it comes to mathematics–after all, I proved a false theorem once! (Or, to be precise, my collaborator and I published a false claim which we thought we’d proved, thus we thought was a theorem.) So let me retract the mockery and move, first to the mathematics and then to the statistics. First, how many possibilities are there in pick 6 out of 45? It’s (45*44*43*42*41*40)/6! = 8,145,060. Let’s call this number N. Second, what’s the probability that the same numbers repeat in a single calendar month? I’ve been told that the Israeli lottery has 2 draws per week. That’s 104/12=8.67 draws per month. Or maybe they skip some holiday
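The arithmetic in this excerpt is easy to verify. The short Python sketch below recomputes the number of possible tickets N both with `math.comb` (available in Python 3.8 and later) and with the explicit product from the text, and then the draws-per-month figure. It assumes exactly 2 draws per week with no skipped holidays, which the excerpt itself flags as uncertain; the variable names are illustrative only.

```python
import math

# Number of ways to pick 6 numbers out of 45, ignoring order.
N = math.comb(45, 6)

# The same count written out as in the text: (45*44*43*42*41*40) / 6!
N_explicit = (45 * 44 * 43 * 42 * 41 * 40) // math.factorial(6)

# Assumed schedule: 2 draws per week, 52 weeks, spread over 12 months.
draws_per_year = 2 * 52
draws_per_month = draws_per_year / 12

print(f"N = {N:,}")
print(f"draws per month = {draws_per_month:.2f}")
```

Both ways of computing N give 8,145,060, and the assumed schedule gives about 8.67 draws per month, as stated in the excerpt.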
5 0.76295936 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities
Introduction: I had the following email exchange with a reader of Bayesian Data Analysis. My correspondent wrote: Exercise 1(b) involves evaluating the normal pdf at a single point. But p(Y=y|mu,sigma) = 0 (and is not simply N(y|mu,sigma)), since the normal distribution is continuous. So it seems that part (b) of the exercise is inappropriate. The solution does actually evaluate the probability as the value of the pdf at the single point, which is wrong. The probabilities should all be 0, so the answer to (b) is undefined. I replied: The pdf is the probability density function, which for a continuous distribution is defined as the derivative of the cumulative distribution function. The notation in BDA is rigorous but we do not spell out all the details, so I can see how confusion is possible. My correspondent: I agree that the pdf is the derivative of the cdf. But to compute P(a .lt. Y .lt. b) for a continuous distribution (with support in the real line) requires integrating over t
6 0.75420201 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery
7 0.74153936 1158 andrew gelman stats-2012-02-07-The more likely it is to be X, the more likely it is to be Not X?
8 0.74144274 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards
9 0.74115777 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability
10 0.73673475 721 andrew gelman stats-2011-05-20-Non-statistical thinking in the US foreign policy establishment
12 0.71521115 1713 andrew gelman stats-2013-02-08-P-values and statistical practice
13 0.71273512 1724 andrew gelman stats-2013-02-16-Zero Dark Thirty and Bayes’ theorem
14 0.70480931 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense
15 0.69617921 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value
17 0.69208032 331 andrew gelman stats-2010-10-10-Bayes jumps the shark
18 0.69193673 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?
19 0.69161302 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
topicId topicWeight
[(4, 0.044), (14, 0.013), (16, 0.05), (21, 0.024), (24, 0.146), (42, 0.046), (47, 0.117), (53, 0.035), (60, 0.015), (63, 0.029), (69, 0.011), (80, 0.01), (92, 0.022), (99, 0.316)]
simIndex simValue blogId blogTitle
Introduction: Remember How to Lie With Statistics? It turns out that the author worked for the cigarette companies. John Mashey points to this, from Robert Proctor’s book, “Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition”: Darrell Huff, author of the wildly popular (and aptly named) How to Lie With Statistics, was paid to testify before Congress in the 1950s and then again in the 1960s, with the assigned task of ridiculing any notion of a cigarette-disease link. On March 22, 1965, Huff testified at hearings on cigarette labeling and advertising, accusing the recent Surgeon General’s report of myriad failures and “fallacies.” Huff peppered his attack with amusing asides and anecdotes, lampooning spurious correlations like that between the size of Dutch families and the number of storks nesting on rooftops–which proves not that storks bring babies but rather that people with large families tend to have larger houses (which therefore attract more storks).
2 0.96547991 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?
Introduction: Phil Plait writes : Earth May Have Been Hit by a Cosmic Blast 1200 Years Ago . . . this is nothing to panic about. If it happened at all, it was a long time ago, and unlikely to happen again for hundreds of thousands of years. This left me confused. If it really did happen 1200 years ago, basic statistics would suggest it would occur approximately once every 1200 years or so (within half an order of magnitude). So where does “hundreds of thousands of years” come from? I emailed astronomer David Hogg to see if I was missing something here, and he replied: Yeah, if we think this hit us 1200 years ago, we should imagine that this happens every few thousand years at least. Now that said, if there are *other* reasons for thinking it is exceedingly rare, then that would be a strong a priori argument against believing in the result. So you should either believe that it didn’t happen 1200 years ago, or else you should believe it will happen again in the next few thousan
3 0.96192348 1261 andrew gelman stats-2012-04-12-The Naval Research Lab
Introduction: I worked at the U.S. Naval Research Laboratory for four summers during high school and college. I spent much of my time writing a computer program to do thermal analysis for an experiment that we put on the space shuttle. The facility I developed with the finite-element method came in handy in my job at Bell Labs the following summers. I was working for C. H. Tsao and Jim Adams in the Laboratory for Cosmic Ray Physics. We were estimating the distribution of isotopes in cosmic rays using a pile of track detectors. To get accurate measurements, you want these plastic disks to be as close as possible to a constant temperature, so we designed an elaborate wrapping of thermal blankets. My program computed the temperature of the detectors during the year that the Long Duration Exposure Facility (including our experiment and a bunch of others) was scheduled to be in orbit. The input is the heat from solar radiation (easy enough to compute given the trajectory). On the computer I tr
same-blog 4 0.96042049 2322 andrew gelman stats-2014-05-06-Priors I don’t believe
5 0.95984089 1050 andrew gelman stats-2011-12-10-Presenting at the econ seminar
Introduction: Jim Savage saw this and pointed me to this video. I didn’t actually look at it, but given that it is labeled, “For new econ Ph.D.’s about to look for a job . . . what you might expect when you give your first talk presenting your research,” I can pretty much guess what it’ll look like.
6 0.95969558 1668 andrew gelman stats-2013-01-11-My talk at the NY data visualization meetup this Monday!
7 0.95667303 1055 andrew gelman stats-2011-12-13-Data sharing update
8 0.95555538 2275 andrew gelman stats-2014-03-31-Just gave a talk
9 0.95444435 1730 andrew gelman stats-2013-02-20-Unz on Unz
10 0.95367265 95 andrew gelman stats-2010-06-17-“Rewarding Strivers: Helping Low-Income Students Succeed in College”
11 0.95030141 2270 andrew gelman stats-2014-03-28-Creating a Lenin-style democracy
12 0.94935012 1143 andrew gelman stats-2012-01-29-G+ > Skype
13 0.94786727 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
14 0.94755089 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.
15 0.94596314 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation
16 0.94501418 1450 andrew gelman stats-2012-08-08-My upcoming talk for the data visualization meetup
17 0.94097525 275 andrew gelman stats-2010-09-14-Data visualization at the American Evaluation Association
18 0.94002658 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys
20 0.93750054 907 andrew gelman stats-2011-09-14-Reproducibility in Practice