andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1897 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Phil Plait writes : Earth May Have Been Hit by a Cosmic Blast 1200 Years Ago . . . this is nothing to panic about. If it happened at all, it was a long time ago, and unlikely to happen again for hundreds of thousands of years. This left me confused. If it really did happen 1200 years ago, basic statistics would suggest it would occur approximately once every 1200 years or so (within half an order of magnitude). So where does “hundreds of thousands of years” come from? I emailed astronomer David Hogg to see if I was missing something here, and he replied: Yeah, if we think this hit us 1200 years ago, we should imagine that this happens every few thousand years at least. Now that said, if there are *other* reasons for thinking it is exceedingly rare, then that would be a strong a priori argument against believing in the result. So you should either believe that it didn’t happen 1200 years ago, or else you should believe it will happen again in the next few thousan
sentIndex sentText sentNum sentScore
1 If it happened at all, it was a long time ago, and unlikely to happen again for hundreds of thousands of years. [sent-5, score-0.593]
2 If it really did happen 1200 years ago, basic statistics would suggest it would occur approximately once every 1200 years or so (within half an order of magnitude). [sent-7, score-1.118]
3 So where does “hundreds of thousands of years” come from? [sent-8, score-0.141]
4 I emailed astronomer David Hogg to see if I was missing something here, and he replied: Yeah, if we think this hit us 1200 years ago, we should imagine that this happens every few thousand years at least. [sent-9, score-1.04]
5 Now that said, if there are *other* reasons for thinking it is exceedingly rare, then that would be a strong a priori argument against believing in the result. [sent-10, score-0.49]
6 So you should either believe that it didn’t happen 1200 years ago, or else you should believe it will happen again in the next few thousand. [sent-11, score-1.034]
7 So, from a Bayesian standpoint, our prior guess is that this event has very low frequency, thus conditional on the assumed data (1200 years since the most recent event), the estimated frequency should be something a bit less than 1 per 1200 years. [sent-12, score-1.293]
8 But it’s hard to see how we’d get 1 per 300,000 years. [sent-13, score-0.163]
9 That would require a prior that’s so strong that it would be contradicted by the data. [sent-14, score-0.497]
10 (Or, again, perhaps the data are being misinterpreted, but the above analysis is conditional on that interpretation.) [sent-15, score-0.124]
11 David supplied an update: Here is the paper and the authors of the paper say the rate is one per 375,000 yr to 3,750,000 yr. [sent-16, score-0.55]
12 (You will enjoy the precision they use in their numbers, given their uncertainties.) [sent-17, score-0.081]
13 Then they say that this is consistent with one event in the last 3000 yr within 2. [sent-18, score-0.671]
14 (They figure the census for such events is complete back to 3000 years, and there is one event found in that census.) [sent-20, score-0.368]
15 I guess it all depends on how strongly you believe your prior and how strongly you believe the data. [sent-21, score-0.788]
16 I wrote this post in January and then put it on the queue, confident that there was no rush. [sent-24, score-0.091]
17 After all, the probability of a major gamma ray burst in any given year is so small! [sent-25, score-0.331]
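The prior-versus-data argument in sentences 7 to 15 can be made concrete with a small sketch. It is not from the post or the paper: it assumes events arrive as a Poisson process, puts a conjugate Gamma prior on the rate, and takes the data to be one event in a 3000-year census (the figure quoted in sentence 14). The prior mean of one event per 300,000 years echoes sentence 8; the three prior shapes and all function names are hypothetical choices for illustration.

```python
import math


def posterior_rate(prior_shape, prior_rate, n_events, exposure_years):
    """Conjugate update: Gamma(shape, rate) prior on a Poisson rate,
    after seeing n_events in exposure_years, gives another Gamma."""
    return prior_shape + n_events, prior_rate + exposure_years


def years_per_event(shape, rate):
    """Reciprocal of the Gamma mean (shape / rate), in years per event."""
    return rate / shape


def prob_at_least_one(rate_per_year, years):
    """Poisson probability of one or more events in the given window."""
    return 1.0 - math.exp(-rate_per_year * years)


# Assumed prior mean: one event per 300,000 years (illustrative).
PRIOR_MEAN_YEARS = 300_000

# A larger shape means a more strongly held prior with the same mean.
for label, shape in [("weak", 0.01), ("moderate", 1.0), ("strong", 10.0)]:
    rate = shape * PRIOR_MEAN_YEARS
    post = posterior_rate(shape, rate, n_events=1, exposure_years=3000)
    print(f"{label:>8} prior: posterior mean is one event per "
          f"{years_per_event(*post):,.0f} years")

# How surprising is one event in 3000 years under the paper's quoted rates?
for one_in in (375_000, 3_750_000):
    p = prob_at_least_one(1.0 / one_in, 3000)
    print(f"rate 1 per {one_in:,} yr: P(at least one event in 3000 yr) = {p:.5f}")
```

Under these assumptions the weak prior is swamped by the single observed event (roughly one per 5,900 years), while only the strongly held prior keeps the posterior in the hundreds of thousands of years (about one per 273,000). That is the point of sentence 15: the answer depends on how strongly you believe the prior versus the data.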
wordName wordTfidf (topN-words)
[('yr', 0.291), ('event', 0.287), ('years', 0.258), ('happen', 0.222), ('ago', 0.189), ('frequency', 0.166), ('believe', 0.166), ('per', 0.163), ('hundreds', 0.149), ('hit', 0.145), ('thousands', 0.141), ('prior', 0.135), ('blast', 0.132), ('burst', 0.125), ('panic', 0.125), ('astronomer', 0.125), ('exceedingly', 0.125), ('conditional', 0.124), ('cosmic', 0.119), ('strongly', 0.118), ('contradicted', 0.115), ('queue', 0.109), ('misinterpreted', 0.109), ('ray', 0.107), ('strong', 0.101), ('gamma', 0.099), ('priori', 0.097), ('david', 0.097), ('supplied', 0.096), ('hogg', 0.095), ('believing', 0.094), ('within', 0.093), ('earth', 0.091), ('confident', 0.091), ('emailed', 0.089), ('january', 0.086), ('guess', 0.085), ('standpoint', 0.084), ('thousand', 0.083), ('every', 0.082), ('census', 0.081), ('precision', 0.081), ('unlikely', 0.081), ('occur', 0.079), ('assumed', 0.075), ('magnitude', 0.073), ('would', 0.073), ('approximately', 0.073), ('yeah', 0.072), ('update', 0.072)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?
2 0.16180409 1399 andrew gelman stats-2012-06-28-Life imitates blog
Introduction: I just noticed this from a couple years ago!
Introduction: From 2.5 years ago . Read all the comments; the discussion is helpful.
4 0.14039697 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly I have read that you have suggested using Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate it if you would have the time to point me to this publication of yours. The reason is that I have used the prior distribution (Gamma(1.4, 0.4)) in a study which we are now revising for publication, and where a reviewer questions the choice of the distribution (claiming that it is too informative!). I am well aware that you in recent publications (Prior distributions for variance parameters in hierarchical models. Bayesian Analysis; Data Analysis using regression and multilevel/hierarchical models) suggest to model the precision as pow(standard deviatio
5 0.13984077 731 andrew gelman stats-2011-05-26-Lottery probability update
Introduction: It was reported last year that the national lottery of Israel featured the exact same 6 numbers (out of 45) twice in the same month, and statistics professor Isaac Meilijson of Tel Aviv University was quoted as saying that “the incident of six numbers repeating themselves within a month is an event of once in 10,000 years.” I shouldn’t mock when it comes to mathematics–after all, I proved a false theorem once! (Or, to be precise, my collaborator and I published a false claim which we thought we’d proved, thus we thought was a theorem.) So let me retract the mockery and move, first to the mathematics and then to the statistics. First, how many possibilities are there in pick 6 out of 45? It’s (45*44*43*42*41*40)/6! = 8,145,060. Let’s call this number N. Second, what’s the probability that the same numbers repeat in a single calendar month? I’ve been told that the Israeli lottery has 2 draws per week. That’s 104/12=8.67 draws per month. Or maybe they skip some holiday
6 0.13576056 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
7 0.12763989 1941 andrew gelman stats-2013-07-16-Priors
8 0.12763265 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
9 0.12697764 563 andrew gelman stats-2011-02-07-Evaluating predictions of political events
10 0.11402156 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics
11 0.1135839 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
12 0.10250594 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
13 0.099155433 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
14 0.098906413 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates
15 0.098424748 477 andrew gelman stats-2010-12-20-Costless false beliefs
16 0.096715108 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo
17 0.096520141 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
18 0.0944186 1924 andrew gelman stats-2013-07-03-Kuhn, 1-f noise, and the fractal nature of scientific revolutions
19 0.093844585 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?
20 0.091778092 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
topicId topicWeight
[(0, 0.187), (1, 0.015), (2, 0.014), (3, 0.033), (4, -0.045), (5, -0.047), (6, 0.095), (7, 0.019), (8, -0.038), (9, -0.025), (10, -0.007), (11, -0.017), (12, 0.049), (13, 0.022), (14, 0.011), (15, 0.063), (16, 0.037), (17, 0.001), (18, 0.019), (19, 0.007), (20, -0.022), (21, 0.077), (22, -0.037), (23, 0.019), (24, -0.011), (25, 0.037), (26, -0.076), (27, -0.009), (28, 0.035), (29, 0.007), (30, -0.007), (31, 0.012), (32, -0.044), (33, -0.034), (34, -0.052), (35, -0.033), (36, 0.014), (37, 0.04), (38, -0.009), (39, -0.042), (40, -0.026), (41, 0.028), (42, -0.025), (43, -0.068), (44, -0.007), (45, -0.056), (46, -0.009), (47, -0.046), (48, -0.021), (49, 0.002)]
simIndex simValue blogId blogTitle
same-blog 1 0.97981018 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?
2 0.76381844 1905 andrew gelman stats-2013-06-18-There are no fat sprinters
Introduction: This post is by Phil. A little over three years ago I wrote a post about exercise and weight loss in which I described losing a fair amount of weight due to (I believe) an exercise regime, with no effort to change my diet; this contradicted the prediction of studies that had recently been released. The comment thread on that post is quite interesting: a lot of people had had similar experiences — losing weight, or keeping it off, with an exercise program that includes very short periods of exercise at maximal intensity — while other people expressed some skepticism about my claims. Some commenters said that I risked injury; others said it was too early to judge anything because my weight loss might not last. The people who predicted injury were right: running the curve during a 200m sprint a month or two after that post, I strained my Achilles tendon. Nothing really serious, but it did keep me off the track for a couple of months, and rather than go back to sprinting I switched t
3 0.74149913 731 andrew gelman stats-2011-05-26-Lottery probability update
Introduction: It was reported last year that the national lottery of Israel featured the exact same 6 numbers (out of 45) twice in the same month, and statistics professor Isaac Meilijson of Tel Aviv University was quoted as saying that “the incident of six numbers repeating themselves within a month is an event of once in 10,000 years.” I shouldn’t mock when it comes to mathematics–after all, I proved a false theorem once! (Or, to be precise, my collaborator and I published a false claim which we thought we’d proved, thus we thought was a theorem.) So let me retract the mockery and move, first to the mathematics and then to the statistics. First, how many possibilities are there in pick 6 out of 45? It’s (45*44*43*42*41*40)/6! = 8,145,060. Let’s call this number N. Second, what’s the probability that the same numbers repeat in a single calendar month? I’ve been told that the Israeli lottery has 2 draws per week. That’s 104/12=8.67 draws per month. Or maybe they skip some holiday
4 0.70861894 2230 andrew gelman stats-2014-03-02-What is it with Americans in Olympic ski teams from tropical countries?
Introduction: Every time I hear this sort of story: Morrone—listed at 48 years old, which would have made her the oldest Olympic cross-country skier of all time by seven years—didn’t even show up for the 10K women’s classic on Feb. 13, claiming injury. (She was the only one of the race’s 76 entrants who didn’t start.) A day later, in the 15K men’s classic, di Silvestri, 47, made it out of the starting gate but gave up just a few hundred meters later, claiming illness. He was reportedly the only starter who failed to make even the first checkpoint. I think about my cousin Bill . At least when Bill skied in the winter olympics a bunch of years ago as part of the Puerto Rican team, he finished. OK, he finished last, but somebody had to finish last. At least he skied down the whole damn slope. P.S. I have no idea how Puerto Rico was allowed to have an olympic team. As you might have heard, it’s not actually a country. P.P.S. My cousin is not Puerto Rican. But I think he’s gone the
5 0.70658088 2322 andrew gelman stats-2014-05-06-Priors I don’t believe
Introduction: Biostatistician Jeff Leek writes : Think about this headline: “Hospital checklist cut infections, saved lives.” I [Leek] am a pretty skeptical person, so I’m a little surprised that a checklist could really save lives. I say the odds of this being true are 1 in 4. I’m actually surprised that he’s surprised, since over the years I’ve heard about the benefits of checklists in various arenas, including hospital care. In particular, there was this article by Atul Gawande from a few years back. I mean, sure, I could imagine that checklists might hurt: after all, it takes some time and effort to put together the checklist and to use it, and perhaps the very existence of the checklist could give hospital staff a false feeling of security, which would ultimately cost lives. But my first guess would be that people still don’t do enough checklisting, and that the probability is greater than 1/4 that a checklist in a hospital will save lives. Later on, Leek writes: Let’s try ano
6 0.70554852 1829 andrew gelman stats-2013-04-28-Plain old everyday Bayesianism!
8 0.67889124 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!
9 0.67620635 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense
11 0.66687453 873 andrew gelman stats-2011-08-26-Luck or knowledge?
12 0.66541672 477 andrew gelman stats-2010-12-20-Costless false beliefs
13 0.66154742 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?
14 0.65467989 983 andrew gelman stats-2011-10-31-Skepticism about skepticism of global warming skepticism skepticism
15 0.65451527 164 andrew gelman stats-2010-07-26-A very short story
16 0.65195179 180 andrew gelman stats-2010-08-03-Climate Change News
17 0.65178931 786 andrew gelman stats-2011-07-04-Questions about quantum computing
18 0.6493752 1399 andrew gelman stats-2012-06-28-Life imitates blog
19 0.64826596 1623 andrew gelman stats-2012-12-14-GiveWell charity recommendations
20 0.64813113 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story
topicId topicWeight
[(2, 0.022), (16, 0.054), (21, 0.024), (24, 0.18), (47, 0.235), (89, 0.038), (99, 0.333)]
simIndex simValue blogId blogTitle
1 0.97961521 275 andrew gelman stats-2010-09-14-Data visualization at the American Evaluation Association
Introduction: Stephanie Evergreen writes: Media, web design, and marketing have all created an environment where stakeholders – clients, program participants, funders – all expect high quality graphics and reporting that effectively conveys the valuable insights from evaluation work. Some in statistics and mathematics have used data visualization strategies to support more useful reporting of complex ideas. Global growing interest in improving communications has begun to take root in the evaluation field as well. But as anyone who has sat through a day’s worth of a conference or had to endure a dissertation-worthy evaluation report knows, evaluators still have a long way to go. To support the development of researchers and evaluators, some members of the American Evaluation Association are proposing a new TIG (Topical Interest Group) on Data Visualization and Reporting. If you are a member of AEA (or want to be) and you are interested in joining this TIG, contact Stephanie Evergreen.
2 0.95271289 1055 andrew gelman stats-2011-12-13-Data sharing update
Introduction: Fred Oswald reports that Sian Beilock sent him sufficient amounts of raw data from her research study to allow him to answer his questions about the large effects that were observed. This sort of collegiality is central to the collective scientific enterprise. The bad news is that IRBs are still getting in the way. Beilock was very helpful but she had to work within the constraints of her IRB, which apparently advised her not to share data—even if de-identified—without getting lots more permissions. Oswald writes: It is a little concerning that the IRB bars the sharing of de-identified data, particularly in light of the specific guidelines of the journal Science, which appears to say that when you submit a study to the journal for publication, you are allowing for the sharing of de-identified data — unless you expressly say otherwise at the point that you submit the paper for consideration. Again, I don’t blame Beilock and Ramirez—they appear to have been as helpful as
Introduction: Remember How to Lie With Statistics? It turns out that the author worked for the cigarette companies. John Mashey points to this, from Robert Proctor’s book, “Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition”: Darrell Huff, author of the wildly popular (and aptly named) How to Lie With Statistics, was paid to testify before Congress in the 1950s and then again in the 1960s, with the assigned task of ridiculing any notion of a cigarette-disease link. On March 22, 1965, Huff testified at hearings on cigarette labeling and advertising, accusing the recent Surgeon General’s report of myriad failures and “fallacies.” Huff peppered his attack with amusing asides and anecdotes, lampooning spurious correlations like that between the size of Dutch families and the number of storks nesting on rooftops–which proves not that storks bring babies but rather that people with large families tend to have larger houses (which therefore attract more storks).
4 0.93878734 95 andrew gelman stats-2010-06-17-“Rewarding Strivers: Helping Low-Income Students Succeed in College”
Introduction: Several years ago, I heard about a project at the Educational Testing Service to identify “strivers”: students from disadvantaged backgrounds who did unexpectedly well on the SAT (the college admissions exam formerly known as the “Scholastic Aptitude Test” but apparently now just “the SAT,” in the same way that Exxon is just “Exxon” and that Harry Truman’s middle name is just “S”), at least 200 points above a predicted score based on demographic and neighborhood information. My ETS colleague and I agreed that this was a silly idea: From a statistical point of view, if student A is expected ahead of time to do better than student B, and then they get identical test scores, then you’d expect student A (the non-”striver”) to do better than student B (the “striver”) later on. Just basic statistics: if a student does much better than expected, then probably some of that improvement is noise. The idea of identifying these “strivers” seemed misguided and not the best use of the SAT.
5 0.93401819 1668 andrew gelman stats-2013-01-11-My talk at the NY data visualization meetup this Monday!
Introduction: It’s in midtown at 7pm (on Mon 14 Jan 2013). Last time I talked for this group, I spoke on Infovis vs. Statistical Graphics . This time I plan to just go thru the choices involved in a few zillion graphs I’ve published over the years, to give a sense of the options and choices involved in graphical communication. For this talk there will be no single theme (except, perhaps, my usual “Graphs as comparisons,” “All of statistics as comparisons,” and “Exploratory data analysis as hypothesis testing”), just a bunch of open discussion about what I tried, why I tried it, what worked and what didn’t work, etc. I’ve discussed these sorts of decisions on occasion (and am now writing a paper with Yair about some of this for our voting models), but I’ve never tried to make a talk out of it before. Could be fun.
same-blog 6 0.93357086 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?
7 0.93340153 1050 andrew gelman stats-2011-12-10-Presenting at the econ seminar
8 0.93168527 1261 andrew gelman stats-2012-04-12-The Naval Research Lab
9 0.92804819 2275 andrew gelman stats-2014-03-31-Just gave a talk
11 0.91587442 1143 andrew gelman stats-2012-01-29-G+ > Skype
12 0.91010785 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.
13 0.90710157 2290 andrew gelman stats-2014-04-14-On deck this week
14 0.90388608 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
15 0.90095818 548 andrew gelman stats-2011-02-01-What goes around . . .
16 0.89857382 1730 andrew gelman stats-2013-02-20-Unz on Unz
17 0.89403963 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation
18 0.89144754 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
19 0.89140332 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys
20 0.89050674 1450 andrew gelman stats-2012-08-08-My upcoming talk for the data visualization meetup