Stopping rules and Bayesian analysis (andrew gelman stats, 2014-02-13)
Introduction: I happened to receive two questions about stopping rules on the same day. First, from Tom Cunningham: I’ve been arguing with my colleagues about whether the stopping rule is relevant (a presenter disclosed that he went out to collect more data because the first experiment didn’t get significant results) — and I believe you have some qualifications to the Bayesian irrelevance argument but I don’t properly understand them. Then, from Benjamin Kay: I have a question that may be of interest for your blog. I was reading about the early history of AIDS and learned that the trial of AZT was ended early because it was so effective: The trial, reported in the New England Journal of Medicine, had produced a dramatic result. Before the planned 24-week duration of the study, after a mean period of participation of about 120 days, nineteen participants receiving placebo had died while there was only a single death among those receiving AZT. This appeared to be a momentous breakthrough and accordingly there was no restraint at all in reporting the result; prominent researchers triumphantly proclaimed the drug to be “a ray of hope” and “a light at the end of the tunnel”.
Because of this dramatic effect, the placebo arm of the study was discontinued and all participants offered 1500mg of AZT daily.

If the treatment is much, much better than the control, it is considered unethical to continue the planned study and they end it early.

However, I know that it isn’t kosher to keep adding time or sample to an experiment until you find a result, and isn’t this a bit like that?
So here goes: First, we discuss stopping rules in section 6.

The short answer is that the stopping rule enters Bayesian data analysis in two places: inference and model checking.

1. For inference, the key is that the stopping rule is only ignorable if time is included in the model. To put it another way, treatment effects (or whatever it is that you’re measuring) can vary over time, and that possibility should be allowed for in your model if you’re using a data-dependent stopping rule. To put it yet another way, if you use a data-dependent stopping rule and don’t allow for possible time trends in your outcome, then your analysis will not be robust to failures of that assumption. (The first sketch after these two points simulates this.)
2. For model checking, the key is that if you’re comparing observed data to hypothetical replications under the model (for example, using a p-value), these hypothetical replications depend on the design of your data collection. If you use a data-dependent stopping rule, this should be included in your data model; otherwise your p-value isn’t what it claims to be. (The second sketch below demonstrates this.)
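To illustrate the first point, here is a minimal simulation sketch, not from the original post: the effect drifts over time, data arrive in batches, and collection stops once the running mean looks “significant.” The helper run_trial and all of its settings (base effect, drift, batch size) are made-up assumptions for illustration.

```python
# Illustrative sketch (assumed settings, not the post's code): a data-dependent
# stopping rule combined with a treatment effect that varies over time.
import numpy as np

rng = np.random.default_rng(0)

def run_trial(base=0.1, drift=0.5, sigma=1.0, batch=20, max_batches=50):
    """Collect data in batches; stop once the running mean is 'significant'."""
    n_max = batch * max_batches
    y = np.empty(0)
    for _ in range(max_batches):
        t = np.arange(len(y), len(y) + batch)   # "time" = observation index
        theta_t = base + drift * t / n_max      # effect drifts upward over time
        y = np.concatenate([y, rng.normal(theta_t, sigma)])
        if abs(y.mean()) > 1.96 * sigma / np.sqrt(len(y)):  # stopping rule
            break
    return y

pooled, trend_adjusted = [], []
for _ in range(1000):
    y = run_trial()
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)      # a model that allows a time trend
    pooled.append(y.mean())                     # a model that ignores time
    trend_adjusted.append(intercept + slope * (len(y) - 1))  # effect at stop time

print("pooled estimate (no time in model):  ", np.mean(pooled))
print("trend-adjusted estimate at stop time:", np.mean(trend_adjusted))
# The pooled mean averages over a window whose length the data chose, so it
# answers a different question than "what is the effect now"; allowing for a
# time trend in the model is what addresses the stopping rule here.
```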
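And to illustrate the second point, here is a standard demonstration, again only a sketch under assumed settings (the helper rejects_with_peeking and its batch/looks values are invented): with a true null effect, “peeking” after every batch and stopping as soon as |z| > 1.96 makes a nominal 5% test reject far more than 5% of the time, because the hypothetical replications no longer match the actual sequential design.

```python
# Illustrative sketch: type-I error inflation under optional stopping.
import numpy as np

rng = np.random.default_rng(1)

def rejects_with_peeking(batch=20, looks=20):
    """Test after every batch; stop and 'reject' as soon as |z| > 1.96."""
    y = np.empty(0)
    for _ in range(looks):
        y = np.concatenate([y, rng.normal(0.0, 1.0, batch)])  # true effect is zero
        z = y.mean() * np.sqrt(len(y))                        # known sigma = 1
        if abs(z) > 1.96:
            return True
    return False

rate = np.mean([rejects_with_peeking() for _ in range(10_000)])
print(f"rejection rate under the null: {rate:.3f} (nominal 0.05)")
# With 20 looks this comes out around 0.3, not 0.05: the p-value isn't what it
# claims to be unless the stopping rule is built into the replication design.
```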
Next, my response to Benjamin Kay’s question about AZT: For the Bayesian analysis, it is actually kosher “to keep adding time or sample to an experiment until you find a result.”

I know that some people are bothered by the idea that you can keep adding time or sample to an experiment until you find a result.

But if you do a very careful study (so as to minimize variation) or a very large study (to get that magic 1/sqrt(n)), you’ll get a small enough confidence interval to have high certainty about the sign of the effect. So, in going from high sigma and low n to low sigma and high n, you’ve “added time or sample to an experiment” and you’ve “found a result.” OK, this particular plan (measure carefully and get a huge sample size) is chosen ahead of time; it doesn’t involve waiting until the confidence interval excludes zero.
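The 1/sqrt(n) arithmetic behind that paragraph can be made concrete; this little computation is my illustration, not the post’s, and the effect and sigma values are assumed. The 95% interval’s half-width, 1.96 * sigma / sqrt(n), shrinks past any fixed nonzero effect once the pre-chosen n is large enough.

```python
# Illustrative arithmetic: a fixed design with large enough n yields an
# interval that excludes zero for any given nonzero effect.
import numpy as np

effect, sigma = 0.1, 1.0   # assumed true effect and noise scale
for n in [100, 1_000, 10_000, 100_000]:
    half_width = 1.96 * sigma / np.sqrt(n)
    verdict = "excludes zero" if half_width < effect else "includes zero"
    print(f"n = {n:>6}: 95% CI = {effect:.2f} +/- {half_width:.3f}  ({verdict})")
```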
It seems to me that problems with data-based stopping and Bayesian analysis (other than the two issues I noted above) arise only because people are mixing Bayesian inference with non-Bayesian decision making. Which is fair enough—people apply these sorts of mixed methods all the time—but in that case I prefer to see the problem as arising from the non-Bayesian decision rule, not from the stopping rule or the Bayesian inference.