andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-731 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: It was reported last year that the national lottery of Israel featured the exact same 6 numbers (out of 45) twice in the same month, and statistics professor Isaac Meilijson of Tel Aviv University was quoted as saying that “the incident of six numbers repeating themselves within a month is an event of once in 10,000 years.” I shouldn’t mock when it comes to mathematics–after all, I proved a false theorem once! (Or, to be precise, my collaborator and I published a false claim which we thought we’d proved, thus we thought was a theorem.) So let me retract the mockery and move, first to the mathematics and then to the statistics. First, how many possibilities are there in pick 6 out of 45? It’s (45*44*43*42*41*40)/6! = 8,145,060. Let’s call this number N. Second, what’s the probability that the same numbers repeat in a single calendar month? I’ve been told that the Israeli lottery has 2 draws per week. That’s 104/12=8.67 draws per month. Or maybe they skip some holiday
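As a rough check on the count above, here is a minimal Python sketch. The variable names are illustrative and not from the post, and it assumes 2 draws per week over a 52-week year, as the text does.

```python
import math

# Number of ways to pick 6 numbers out of 45, ignoring order.
N = math.comb(45, 6)

# The same count written as the falling-factorial form used in the text.
N_check = (45 * 44 * 43 * 42 * 41 * 40) // math.factorial(6)

# Assumption: 2 draws per week, 52 weeks per year, 12 calendar months.
draws_per_month = 2 * 52 / 12

print(N, N_check, round(draws_per_month, 2))
```

Both forms of the count should agree with the 8,145,060 quoted in the text, and the draws-per-month figure should round to 8.67.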
sentIndex sentText sentNum sentScore
1 Second, what’s the probability that the same numbers repeat in a single calendar month? [sent-9, score-0.577]
2 I’ve been told that the Israeli lottery has 2 draws per week. That’s 104/12=8.67 draws per month. [sent-10, score-0.502]
3 Or maybe they’re using the Jewish calendar, in which a month has approximately 28 days. [sent-15, score-0.533]
4 If the probability of winning is 1/N, and there are 8 draws in a month, the probability of no repeats is ((N-1)/N)*((N-2)/N)*…*((N-7)/N). [sent-17, score-0.746]
5 So the probability of at least one repeat is 1 minus this. [sent-18, score-0.306]
6 Another way to do this “birthday problem” computation is to realize that, with 8 possibilities, there are 8*7/2=28 possible pairs, thus the probability of a repeat is approximately 28/N, which again comes to 3. [sent-21, score-0.404]
7 A year has 12 or 13 Jewish-calendar months–I vaguely recall they have the extra month 7 years out of every 19? [sent-24, score-0.508]
8 We could have a match (two identical sets of lottery numbers) less than a month apart, but not in the same month? [sent-30, score-0.577]
9 I think I’m undercounting here because it looks like several countries have multiple “Pick m out of n” lotteries but I’m counting each country only once. [sent-42, score-0.409]
10 I think a safe approximate guess is 100 major lotteries worldwide. [sent-45, score-0.335]
11 These lotteries have different rules–some are more frequent than twice a week, some less frequent, some are easier to win than “pick 6 out of 45,” some are harder to win. [sent-46, score-0.515]
12 But a quick calculation is that if the Israeli lottery will have a repeat in a single month once in 10,000 years, then, if there are 100 lotteries out there, you’ll see “the incident of six numbers repeating themselves within a month” roughly once in 100 years. [sent-47, score-1.328]
13 To me, the 1 in 10,000 makes the event seem more rare than it is, given that there are so many lotteries out there. [sent-49, score-0.525]
14 paper, I think the 1 in 100 number makes more sense, and also fits better with our intuition that rare things happen but that extremely rare things are extremely rare–unless there are a lot of chances for them to occur. [sent-52, score-0.42]
15 I quarrel not with the mathematics behind the “1 in 10,000 years” claim but with the implicit choice of reference set that includes only the Israel lottery and nothing else. [sent-53, score-0.484]
16 There’s more here from Christian Robert, who reports that the 6 repeated numbers were actually drawn from 1 to 37, so that N is only 2,324,784. That gives us another factor of 3. [sent-57, score-0.346]
17 Christian also points out that there’s no particular reason why we should care about repeats within a month. [sent-59, score-0.805]
18 Consecutive repeats or repeats within a year or repeats ever, maybe, but there’s no particular reason to care about the month except that this is what happened to occur. [sent-60, score-1.499]
19 One might as well have said something like “The probability of a repeat within the same week is only 1 in a zillion, and . [sent-61, score-0.511]
20 There is often confusion on the basic facts (were the repeats 6 numbers out of 45 or 6 out of 37? [sent-66, score-0.544]
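The birthday-problem reasoning in sentences 4 to 6, the 100-lottery scaling in sentence 12, and the 6-of-37 correction in sentence 16 can be sketched in Python as follows. This is an illustrative sketch, not the author's code: the function names are hypothetical, it takes 8 draws per month as in sentence 4, and it treats the 100 lotteries as identical and independent, which the post itself notes is only a rough approximation.

```python
import math


def prob_repeat_exact(n_outcomes: int, n_draws: int) -> float:
    """Exact probability that at least two of n_draws equally likely draws match."""
    p_no_repeat = 1.0
    for k in range(1, n_draws):
        # The (k+1)-th draw must avoid the k distinct outcomes already seen.
        p_no_repeat *= (n_outcomes - k) / n_outcomes
    return 1.0 - p_no_repeat


def prob_repeat_pairs(n_outcomes: int, n_draws: int) -> float:
    """Pair-counting approximation: (number of pairs) / (number of outcomes)."""
    return math.comb(n_draws, 2) / n_outcomes


N45 = math.comb(45, 6)  # pick 6 out of 45
N37 = math.comb(37, 6)  # pick 6 out of 37, per Christian Robert's report

exact = prob_repeat_exact(N45, 8)
approx = prob_repeat_pairs(N45, 8)  # 28 / N45

# Rough scaling: one lottery repeats once per 10,000 years, so 100
# similar, independent lotteries give a repeat about once per 100 years.
years_single = 10_000
n_lotteries = 100
years_any = years_single / n_lotteries

# Factor by which the repeat probability grows when N shrinks from N45 to N37.
factor = N45 / N37

print(exact, approx, years_any, factor)
```

With N this large the exact and pair-counting answers agree to many digits, which is why the 28/N shortcut is adequate here.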
wordName wordTfidf (topN-words)
[('month', 0.37), ('lotteries', 0.335), ('repeats', 0.31), ('israeli', 0.26), ('lottery', 0.207), ('repeat', 0.175), ('draws', 0.174), ('numbers', 0.164), ('probability', 0.131), ('rare', 0.126), ('within', 0.125), ('per', 0.121), ('reference', 0.12), ('factor', 0.112), ('israel', 0.107), ('calendar', 0.107), ('mathematics', 0.098), ('repeating', 0.098), ('approximately', 0.098), ('twice', 0.096), ('incident', 0.093), ('zillion', 0.085), ('jewish', 0.084), ('frequent', 0.084), ('pick', 0.083), ('week', 0.08), ('possibilities', 0.077), ('proved', 0.075), ('counting', 0.074), ('year', 0.074), ('christian', 0.072), ('states', 0.07), ('gives', 0.07), ('confusion', 0.07), ('calculation', 0.069), ('let', 0.067), ('maybe', 0.065), ('event', 0.064), ('years', 0.064), ('six', 0.062), ('tel', 0.059), ('quarrel', 0.059), ('extremely', 0.058), ('consecutive', 0.056), ('aviv', 0.056), ('false', 0.055), ('testing', 0.054), ('degrade', 0.053), ('plugging', 0.053), ('number', 0.052)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 731 andrew gelman stats-2011-05-26-Lottery probability update
2 0.14903028 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery
Introduction: Christian points me to this amusing story by Jonah Lehrer about Mohan Srivastava (perhaps the same person as R. Mohan Srivastava, coauthor of a book called Applied Geostatistics), who discovered a flaw in a scratch-off game in which he could figure out which tickets were likely to win based on partial information visible on the ticket. It appears that scratch-off lotteries elsewhere have similar flaws in their design. The obvious question is, why doesn’t the lottery create the patterns on the tickets (including which “teaser” numbers to reveal) completely at random? It shouldn’t be hard to design this so that zero information is supplied from the outside, in which case Srivastava’s trick would be impossible. So why not put down the numbers randomly? Lehrer quotes Srivastava as saying: The tickets are clearly mass-produced, which means there must be some computer program that lays down the numbers. Of course, it would be really nice if the computer could just spit out random
3 0.13984077 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?
Introduction: Phil Plait writes: Earth May Have Been Hit by a Cosmic Blast 1200 Years Ago . . . this is nothing to panic about. If it happened at all, it was a long time ago, and unlikely to happen again for hundreds of thousands of years. This left me confused. If it really did happen 1200 years ago, basic statistics would suggest it would occur approximately once every 1200 years or so (within half an order of magnitude). So where does “hundreds of thousands of years” come from? I emailed astronomer David Hogg to see if I was missing something here, and he replied: Yeah, if we think this hit us 1200 years ago, we should imagine that this happens every few thousand years at least. Now that said, if there are *other* reasons for thinking it is exceedingly rare, then that would be a strong a priori argument against believing in the result. So you should either believe that it didn’t happen 1200 years ago, or else you should believe it will happen again in the next few thousan
4 0.11021169 826 andrew gelman stats-2011-07-27-The Statistics Forum!
Introduction: We’re having a fun discussion this week on infovis vs. statistical graphics. Michael Lavine has contributed a couple of posts. Next week will be our special Joint Statistical Meeting edition: we’ll be having several guest-bloggers post on the interesting and amusing encounters they’ve had each day. Then after that we’ll be moving to monthly theme issues: Each month we’ll solicit several different posts on a particular topic.
5 0.10784468 1730 andrew gelman stats-2013-02-20-Unz on Unz
Introduction: Last week I posted skeptical remarks about Ron Unz’s claim that Harvard admissions discriminate in favor of Jews. The comment thread was getting long enough there that I thought it most fair to give Unz a chance to present his thoughts here as a new post. I’ve done that before in cases where I’ve disagreed with someone and he wanted to make his views clear. I will post Unz’s email and my brief response. This is what Unz wrote to me: Since there’s been a great deal of dispute over the numerator and the denominator, it might be useful for each of us to provide our own estimate-range of what we believe are the true figures, and the justification. Perhaps if our ranges actually overlap substantially, then we don’t really disagree much after all. I’d think if you’ve been reading most of the endless comments and refreshing your memory about my claims, you’ve probably now developed your own mental model about the likely reality of the values whereas initially you may have simpl
6 0.093805999 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles
7 0.091998674 771 andrew gelman stats-2011-06-16-30 days of statistics
8 0.090093523 1007 andrew gelman stats-2011-11-13-At last, treated with the disrespect that I deserve
9 0.089765951 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?
10 0.08770556 1242 andrew gelman stats-2012-04-03-Best lottery story ever
13 0.083388664 2297 andrew gelman stats-2014-04-20-Fooled by randomness
14 0.082740843 386 andrew gelman stats-2010-11-01-Classic probability mistake, this time in the (virtual) pages of the New York Times
15 0.082314931 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
16 0.082126617 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!
17 0.081514739 2255 andrew gelman stats-2014-03-19-How Americans vote
18 0.080368541 1078 andrew gelman stats-2011-12-22-Tables as graphs: The Ramanujan principle
19 0.080329403 1567 andrew gelman stats-2012-11-07-Election reports
20 0.079307064 1743 andrew gelman stats-2013-02-28-Different modes of discourse
topicId topicWeight
[(0, 0.172), (1, -0.04), (2, 0.026), (3, 0.013), (4, 0.009), (5, -0.027), (6, 0.035), (7, 0.018), (8, -0.003), (9, -0.076), (10, -0.018), (11, 0.006), (12, -0.02), (13, 0.001), (14, -0.034), (15, 0.07), (16, -0.006), (17, 0.009), (18, 0.026), (19, -0.009), (20, -0.014), (21, 0.064), (22, -0.011), (23, 0.02), (24, -0.012), (25, 0.04), (26, -0.044), (27, 0.035), (28, 0.015), (29, -0.007), (30, -0.016), (31, -0.03), (32, -0.006), (33, 0.019), (34, -0.015), (35, -0.064), (36, 0.05), (37, -0.007), (38, -0.024), (39, -0.016), (40, -0.051), (41, -0.033), (42, 0.007), (43, -0.045), (44, 0.028), (45, -0.008), (46, -0.015), (47, 0.001), (48, -0.058), (49, -0.057)]
simIndex simValue blogId blogTitle
same-blog 1 0.96977305 731 andrew gelman stats-2011-05-26-Lottery probability update
2 0.74065852 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery
3 0.73089612 2322 andrew gelman stats-2014-05-06-Priors I don’t believe
Introduction: Biostatistician Jeff Leek writes: Think about this headline: “Hospital checklist cut infections, saved lives.” I [Leek] am a pretty skeptical person, so I’m a little surprised that a checklist could really save lives. I say the odds of this being true are 1 in 4. I’m actually surprised that he’s surprised, since over the years I’ve heard about the benefits of checklists in various arenas, including hospital care. In particular, there was this article by Atul Gawande from a few years back. I mean, sure, I could imagine that checklists might hurt: after all, it takes some time and effort to put together the checklist and to use it, and perhaps the very existence of the checklist could give hospital staff a false feeling of security, which would ultimately cost lives. But my first guess would be that people still don’t do enough checklisting, and that the probability is greater than 1/4 that a checklist in a hospital will save lives. Later on, Leek writes: Let’s try ano
4 0.72499359 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?
Introduction: I was reading this news article by famed business reporter James Stewart: Measured by market capitalization, Apple is the world’s biggest public company. . . . Sales for the quarter that ended Dec. 31 . . . totaled $46.33 billion, up 73 percent from the year before. Earnings more than doubled. . . . Here is the rub: Apple is so big, it’s running up against the law of large numbers. Huh? At this point I sat up, curious. Stewart continued: Also known as the golden theorem, with a proof attributed to the 17th-century Swiss mathematician Jacob Bernoulli, the law states that a variable will revert to a mean over a large sample of results. In the case of the largest companies, it suggests that high earnings growth and a rapid rise in share price will slow as those companies grow ever larger. If Apple’s share price grew even 20 percent a year for the next decade, which is far below its current blistering pace, its $500 billion market capitalization would be more than $3 tri
Introduction: Alexander at GiveWell writes: The Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation . . . provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. [I think they mean to say $300 -- ed.] We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of
6 0.72202462 1905 andrew gelman stats-2013-06-18-There are no fat sprinters
7 0.72006279 873 andrew gelman stats-2011-08-26-Luck or knowledge?
8 0.71751124 1724 andrew gelman stats-2013-02-16-Zero Dark Thirty and Bayes’ theorem
9 0.71392214 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates
10 0.71155256 628 andrew gelman stats-2011-03-25-100-year floods
11 0.71041095 68 andrew gelman stats-2010-06-03-…pretty soon you’re talking real money.
13 0.7036851 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles
14 0.69444573 2022 andrew gelman stats-2013-09-13-You heard it here first: Intense exercise can suppress appetite
15 0.69222528 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability
16 0.69116706 2328 andrew gelman stats-2014-05-10-What property is important in a risk prediction model? Discrimination or calibration?
17 0.68938059 1058 andrew gelman stats-2011-12-14-Higgs bozos: Rosencrantz and Guildenstern are spinning in their graves
18 0.67688608 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!
19 0.67458773 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?
20 0.67181408 179 andrew gelman stats-2010-08-03-An Olympic size swimming pool full of lithium water
topicId topicWeight
[(15, 0.016), (16, 0.062), (21, 0.049), (23, 0.083), (24, 0.146), (31, 0.011), (36, 0.014), (47, 0.01), (53, 0.017), (77, 0.035), (86, 0.055), (95, 0.014), (99, 0.331)]
simIndex simValue blogId blogTitle
1 0.98113763 1410 andrew gelman stats-2012-07-09-Experimental work on market-based or non-market-based incentives
Introduction: Mark Patterson writes: I found a discussion at the Boston Review that I thought you’d be interested in, given your posts on the potentially dubious foundations of many neoclassical economics models. Michael Sandel cites a few examples of markets crowding out moral behavior. His longest discussion regards Frey and Oberholzer-Gee’s work demonstrating Swiss citizens’ willingness to admit a nuclear waste facility to town decreasing when offered monetary incentives. It seems like this is a situation that really demands a discussion of the available empirical evidence (Uri Gneezy and Aldo Rustichini have two papers, “Pay Enough or Don’t Pay At All” and “A Fine is a Price” that seem especially relevant.) While the essay has sparked the usual sort of libertarian response, I’m struck by the fact that most people aren’t talking about the experimental work that’s actually available—it seems like this is the best way forward. My reply: I don’t have much to add here, but this sort
2 0.97944003 203 andrew gelman stats-2010-08-12-John McPhee, the Anti-Malcolm
Introduction: This blog is threatening to turn into Statistical Modeling, Causal Inference, Social Science, and Literature Criticism, but I’m just going to go with the conversational flow, so here’s another post about an essayist. I’m not a big fan of Janet Malcolm’s essays — and I don’t mean I don’t like her attitude or her pro-murderer attitude, I mean I don’t like them all that much as writing. They’re fine, I read them, they don’t bore me, but I certainly don’t think she’s “our” best essayist. But that’s not a debate I want to have right now, and if I did I’m quite sure most of you wouldn’t want to read it anyway. So instead, I’ll just say something about John McPhee. As all right-thinking people agree, in McPhee’s long career he has written two kinds of books: good, short books, and bad, long books. (He has also written many New Yorker essays, and perhaps other essays for other magazines too; most of these are good, although I haven’t seen any really good recent work from him, and so
same-blog 3 0.97680759 731 andrew gelman stats-2011-05-26-Lottery probability update
4 0.97462994 532 andrew gelman stats-2011-01-23-My Wall Street Journal story
Introduction: I was talking with someone the other day about the book by that Yale law professor who called her kids “garbage” and didn’t let them go to the bathroom when they were studying piano . . . apparently it wasn’t so bad as all that, she was misrepresented by the Wall Street Journal excerpt: “I was very surprised,” she says. “The Journal basically strung together the most controversial sections of the book. And I had no idea they’d put that kind of a title on it. . . . “And while it’s ultimately my responsibility — my strict Chinese mom told me ‘never blame other people for your problems!’ — the one-sided nature of the excerpt has really led to some major misconceptions about what the book says, and about what I really believe.” I don’t completely follow her reasoning here: just because, many years ago, her mother told her a slogan about not blaming other people, therefore she can say, “it’s ultimately my responsibility”? You can see the illogic of this by flipping it around. Wha
5 0.97087717 1513 andrew gelman stats-2012-09-27-Estimating seasonality with a data set that’s just 52 weeks long
Introduction: Kaiser asks: Trying to figure out what are some keywords to research for this problem I’m trying to solve. I need to estimate seasonality but without historical data. What I have are multiple time series of correlated metrics (think department store sales, movie receipts, etc.) but all of them for 52 weeks only. I’m thinking that if these metrics are all subject to some underlying seasonality, I should be able to estimate that without needing prior years data. My reply: Can I blog this and see if the hive mind responds? I’m not an expert on this one. My first thought is to fit an additive model including date effects, with some sort of spline on the date effects along with day-of-week effects, idiosyncratic date effects (July 4th, Christmas, etc.), and possible interactions. Actually, I’d love to fit something like that in Stan, just to see how it turns out. It could be a tangled mess but it could end up working really well!
7 0.96779794 578 andrew gelman stats-2011-02-17-Credentialism, elite employment, and career aspirations
8 0.967076 2216 andrew gelman stats-2014-02-18-Florida backlash
10 0.96565223 308 andrew gelman stats-2010-09-30-Nano-project qualifying exam process: An intensified dialogue between students and faculty
11 0.96364641 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models
12 0.96304154 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation
13 0.96296895 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
14 0.96292198 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes
15 0.9625569 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper
16 0.96233588 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems
18 0.9621855 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population
19 0.96207285 1630 andrew gelman stats-2012-12-18-Postdoc positions at Microsoft Research – NYC
20 0.96193856 2337 andrew gelman stats-2014-05-18-Never back down: The culture of poverty and the culture of journalism