54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles


Meta information for this blog

Source: html

Introduction: Jason Kottke posts this puzzle from Gary Foshee that reportedly impressed people at a puzzle-designers’ convention: I have two children. One is a boy born on a Tuesday. What is the probability I have two boys? The first thing you think is “What has Tuesday got to do with it?” Well, it has everything to do with it. I thought I should really figure this one out myself before reading any further, and I decided this was a good time to apply my general principle that it’s always best to solve such problems from scratch rather than trying to guess at the answer. So I laid out all the 4 x 49 possibilities. The 4 is bb, bg, gb, gg, and the 49 are all possible pairs of days of the week. Then I ruled out all the possibilities that were inconsistent with the data: this leaves the following: bb with all pairs of days that include a Tuesday. That’s 13 possibilities (Mon/Tues, Tues/Tues, Wed/Tues, …, Tues/Mon, …, Sun/Tues, remembering not to count Tues/Tues twice). bg with all


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Jason Kottke posts this puzzle from Gary Foshee that reportedly impressed people at a puzzle-designers’ convention: I have two children. [sent-1, score-0.114]

2 I thought I should really figure this one out myself before reading any further, and I decided this was a good time to apply my general principle that it’s always best to solve such problems from scratch rather than trying to guess at the answer. [sent-6, score-0.46]

3 The 4 is bb, bg, gb, gg, and the 49 are all possible pairs of days of the week. [sent-8, score-0.26]

4 Then I ruled out all the possibilities that were inconsistent with the data: this leaves the following: bb with all pairs of days that include a Tuesday. [sent-9, score-0.993]

5 That’s 13 possibilities (Mon/Tues, Tues/Tues, Wed/Tues, …, Tues/Mon, …, Sun/Tues, remembering not to count Tues/Tues twice). [sent-10, score-0.208]

6 bg with all the Tues/x pairs: that’s 7 more possibilities. [sent-11, score-0.148]

7 I decided I’d keep it simple: - Ignore multiple births. [sent-14, score-0.073]

8 - Pretend that boys and girls are equally likely. [sent-15, score-0.192]

9 - Pretend that births are equally likely on every day. [sent-18, score-0.148]

10 But then I thought, Hey, he said, “One is a boy born on a Tuesday. [sent-21, score-0.436]

11 So I’ll toss out bb Tues/Tues, which leaves us with 26 possibilities and a conditional probability for bb of 12/26. [sent-23, score-1.404]

12 So I guess when Foshee said “One,” he meant, “At least one. [sent-25, score-0.071]

13 If specifying a logically irrelevant detail changes the probability calculation, doesn’t that tell us that probability thinking is a relatively useless tool in situations like this? [sent-28, score-0.915]

14 It is implicit that everyone is born on a particular day, if specifying something we already knew changes the calculation, isn’t the calculation unreliable for decision making, for this class of situations? [sent-29, score-0.659]

15 The interesting question, I think, is how often do these sorts of tricky conditional probability problems arise in real life. [sent-32, score-0.56]

16 (That is, I’m not trying to raise a rhetorical question and claim that these problems don’t arise in real life. [sent-34, score-0.258]

17 Bellos’s article was fine, but I wish he’d remarked that these conditional probability examples are textbook problems in introductory probability courses. [sent-38, score-0.798]

18 I agree with the many commenters who point out that, really, the information to condition on is not “Foshee has two children. [sent-42, score-0.056]

19 One is a boy born on a Tuesday,” but, rather, “Foshee says, ‘I have two children. [sent-43, score-0.492]

20 That’s one reason I’m not a big fan of this sort of trick probability question: some of the most important parts of the problem are hidden, and the answer is typically explained in a way that avoids making clear the assumptions that are needed to get there. [sent-46, score-0.388]
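The counting argument in the sentences above can be checked by brute force. What follows is only a minimal sketch under the post's own simplifying assumptions (no multiple births, boys and girls equally likely, births equally likely on every day). The function names are made up for illustration. It enumerates the 4 x 49 = 196 families and conditions first on "at least one boy born on a Tuesday" and then on "exactly one."

```python
from fractions import Fraction
from itertools import product

SEXES = ("b", "g")
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# A child is a (sex, day) pair; a family is an ordered pair of children.
children = list(product(SEXES, DAYS))         # 14 kinds of child
families = list(product(children, children))  # 196 = 4 x 49 equally likely families


def tuesday_boys(family):
    """Number of children in the family who are boys born on a Tuesday."""
    return sum(1 for child in family if child == ("b", "Tue"))


def is_two_boys(family):
    return all(sex == "b" for sex, _day in family)


def prob_two_boys(condition):
    """Return (two-boy families kept, families kept, conditional probability)."""
    kept = [f for f in families if condition(f)]
    hits = [f for f in kept if is_two_boys(f)]
    return len(hits), len(kept), Fraction(len(hits), len(kept))


# Reading "One" as "At least one": 13 bb families out of 27.
at_least_one = prob_two_boys(lambda f: tuesday_boys(f) >= 1)

# Reading "One" as "Exactly one": bb Tues/Tues is tossed out, leaving 12 of 26.
exactly_one = prob_two_boys(lambda f: tuesday_boys(f) == 1)

print(at_least_one)
print(exactly_one)
```

The two readings of "One" give different answers, 13/27 versus 12/26, which is the point about hidden assumptions made in sentence 20.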


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('foshee', 0.406), ('bb', 0.37), ('probability', 0.258), ('born', 0.232), ('boy', 0.204), ('pairs', 0.196), ('bellos', 0.163), ('gb', 0.163), ('bg', 0.148), ('possibilities', 0.144), ('pretend', 0.14), ('calculation', 0.13), ('problems', 0.122), ('specifying', 0.111), ('conditional', 0.104), ('tuesday', 0.102), ('boys', 0.1), ('leaves', 0.097), ('pr', 0.097), ('equally', 0.092), ('situations', 0.084), ('tool', 0.076), ('arise', 0.076), ('regular', 0.075), ('decided', 0.073), ('solve', 0.072), ('guess', 0.071), ('stark', 0.07), ('ruled', 0.067), ('todd', 0.067), ('one', 0.066), ('changes', 0.065), ('avoids', 0.064), ('remembering', 0.064), ('days', 0.064), ('decision', 0.063), ('logically', 0.063), ('toss', 0.061), ('rhetorical', 0.06), ('curiosity', 0.06), ('reportedly', 0.058), ('unreliable', 0.058), ('scratch', 0.056), ('births', 0.056), ('jason', 0.056), ('remarked', 0.056), ('two', 0.056), ('inconsistent', 0.055), ('convention', 0.054), ('laid', 0.053)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles

2 0.19014268 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

Introduction: Yesterday we had a spirited discussion of the following conditional probability puzzle: “I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?” This reminded me of the principle, familiar from statistics instruction and the cognitive psychology literature, that the best way to teach these sorts of examples is through integers rather than fractions. For example, consider this classic problem: “10% of persons have disease X. You are tested for the disease and test positive, and the test has 80% accuracy. What is the probability that you have the disease?” This can be solved directly using conditional probability but it appears to be clearer to do it using integers: Start with 100 people. 10 will have the disease and 90 will not. Of the 10 with the disease, 8 will test positive and 2 will test negative. Of the 90 without the disease, 18 will test positive and 72 will test negative. (72 = 0.8*90.) So, out of the origin

3 0.12702912 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili

4 0.12595175 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

Introduction: I had the following email exchange with a reader of Bayesian Data Analysis. My correspondent wrote: Exercise 1(b) involves evaluating the normal pdf at a single point. But p(Y=y|mu,sigma) = 0 (and is not simply N(y|mu,sigma)), since the normal distribution is continuous. So it seems that part (b) of the exercise is inappropriate. The solution does actually evaluate the probability as the value of the pdf at the single point, which is wrong. The probabilities should all be 0, so the answer to (b) is undefined. I replied: The pdf is the probability density function, which for a continuous distribution is defined as the derivative of the cumulative density function. The notation in BDA is rigorous but we do not spell out all the details, so I can see how confusion is possible. My correspondent: I agree that the pdf is the derivative of the cdf. But to compute P(a < Y < b) for a continuous distribution (with support in the real line) requires integrating over t

5 0.10564516 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability

Introduction: Adam Gurri writes: Any chance you could do a post explaining Popper’s propensity theory of probability? I have never understood it. My reply: I’m a big fan of Popper (search this blog for details), especially as interpreted by Lakatos, but as far as I can tell, Popper’s theory of probability is hopeless. We’ve made a lot of progress on probability in the past 75 years, and I don’t see any real need to go back to the bad old days.

6 0.099536955 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

7 0.09634418 2322 andrew gelman stats-2014-05-06-Priors I don’t believe

8 0.093805999 731 andrew gelman stats-2011-05-26-Lottery probability update

9 0.093178682 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

10 0.08611396 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis

11 0.086033605 2132 andrew gelman stats-2013-12-13-And now, here’s something that would make Ed Tufte spin in his . . . ummm, Tufte’s still around, actually, so let’s just say I don’t think he’d like it!

12 0.086023793 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

13 0.085916571 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”

14 0.085044973 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

15 0.084746286 2134 andrew gelman stats-2013-12-14-Oswald evidence

16 0.082933858 1941 andrew gelman stats-2013-07-16-Priors

17 0.082433082 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates

18 0.08153934 2141 andrew gelman stats-2013-12-20-Don’t douthat, man! Please give this fallacy a name.

19 0.081418455 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

20 0.08112254 317 andrew gelman stats-2010-10-04-Rob Kass on statistical pragmatism, and my reactions
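The integer-counting approach quoted in entry 2 of the list above (post 56) can be written out as a short calculation. This is a sketch only. It assumes that "80% accuracy" applies equally to people with and without the disease, which is how the quoted counts (8 and 2, 18 and 72) read. The variable names are illustrative.

```python
from fractions import Fraction

population = 100
prevalence = Fraction(10, 100)  # "10% of persons have disease X"
accuracy = Fraction(80, 100)    # assumed to hold for sick and healthy people alike

sick = population * prevalence          # 10 people with the disease
healthy = population - sick             # 90 people without it

true_positives = sick * accuracy        # 8 sick people test positive
false_negatives = sick - true_positives # 2 sick people test negative
true_negatives = healthy * accuracy     # 72 healthy people test negative
false_positives = healthy - true_negatives  # 18 healthy people test positive

# Among everyone who tests positive, the share who actually have the disease.
p_disease_given_positive = true_positives / (true_positives + false_positives)

print(p_disease_given_positive)
```

Working with exact fractions keeps the whole-number counts visible at every step, which is the teaching point of the quoted passage.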


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.174), (1, 0.017), (2, 0.001), (3, 0.015), (4, 0.002), (5, -0.021), (6, 0.05), (7, 0.04), (8, 0.032), (9, -0.061), (10, -0.016), (11, 0.009), (12, -0.006), (13, -0.046), (14, -0.052), (15, 0.002), (16, 0.035), (17, 0.001), (18, -0.005), (19, -0.016), (20, 0.005), (21, -0.01), (22, -0.018), (23, -0.002), (24, 0.002), (25, 0.041), (26, 0.005), (27, 0.045), (28, 0.002), (29, -0.05), (30, -0.016), (31, 0.016), (32, -0.021), (33, 0.028), (34, -0.034), (35, -0.055), (36, 0.016), (37, 0.017), (38, -0.039), (39, -0.006), (40, 0.008), (41, -0.019), (42, 0.049), (43, -0.061), (44, 0.016), (45, 0.065), (46, 0.009), (47, 0.048), (48, -0.017), (49, -0.011)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96971321 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles

2 0.84827912 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

Introduction: Yesterday we had a spirited discussion of the following conditional probability puzzle: “I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?” This reminded me of the principle, familiar from statistics instruction and the cognitive psychology literature, that the best way to teach these sorts of examples is through integers rather than fractions. For example, consider this classic problem: “10% of persons have disease X. You are tested for the disease and test positive, and the test has 80% accuracy. What is the probability that you have the disease?” This can be solved directly using conditional probability but it appears to be clearer to do it using integers: Start with 100 people. 10 will have the disease and 90 will not. Of the 10 with the disease, 8 will test positive and 2 will test negative. Of the 90 without the disease, 18 will test positive and 72 will test negative. (72 = 0.8*90.) So, out of the origin

3 0.83325744 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates

Introduction: Suppose you and I agree on a probability estimate…perhaps we both agree there is a 2/3 chance Spain will beat Netherlands in tomorrow’s World Cup. In this case, we could agree on a wager: if Spain beats Netherlands, I pay you $x. If Netherlands beats Spain, you pay me $2x. It is easy to see that my expected loss (or win) is $0, and that the same is true for you. Either of us should be indifferent to taking this bet, and to which side of the bet we are on. We might make this bet just to increase our interest in watching the game, but neither of us would see a money-making opportunity here. By the way, the relationship between “odds” and the event probability — a 1/3 chance of winning turning into a bet at 2:1 odds — is that if the event probability is p, then a fair bet has odds of (1/p – 1):1. More interesting, and more relevant to many real-world situations, is the case that we disagree on the probability of an event. If we disagree on the probability, then there should be

4 0.83220059 2322 andrew gelman stats-2014-05-06-Priors I don’t believe

Introduction: Biostatistician Jeff Leek writes: Think about this headline: “Hospital checklist cut infections, saved lives.” I [Leek] am a pretty skeptical person, so I’m a little surprised that a checklist could really save lives. I say the odds of this being true are 1 in 4. I’m actually surprised that he’s surprised, since over the years I’ve heard about the benefits of checklists in various arenas, including hospital care. In particular, there was this article by Atul Gawande from a few years back. I mean, sure, I could imagine that checklists might hurt: after all, it takes some time and effort to put together the checklist and to use it, and perhaps the very existence of the checklist could give hospital staff a false feeling of security, which would ultimately cost lives. But my first guess would be that people still don’t do enough checklisting, and that the probability is greater than 1/4 that a checklist in a hospital will save lives. Later on, Leek writes: Let’s try ano

5 0.82214415 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

Introduction: I had the following email exchange with a reader of Bayesian Data Analysis. My correspondent wrote: Exercise 1(b) involves evaluating the normal pdf at a single point. But p(Y=y|mu,sigma) = 0 (and is not simply N(y|mu,sigma)), since the normal distribution is continuous. So it seems that part (b) of the exercise is inappropriate. The solution does actually evaluate the probability as the value of the pdf at the single point, which is wrong. The probabilities should all be 0, so the answer to (b) is undefined. I replied: The pdf is the probability density function, which for a continuous distribution is defined as the derivative of the cumulative density function. The notation in BDA is rigorous but we do not spell out all the details, so I can see how confusion is possible. My correspondent: I agree that the pdf is the derivative of the cdf. But to compute P(a < Y < b) for a continuous distribution (with support in the real line) requires integrating over t

6 0.77985108 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability

7 0.75263143 1387 andrew gelman stats-2012-06-21-Will Tiger Woods catch Jack Nicklaus? And a discussion of the virtues of using continuous data even if your goal is discrete prediction

8 0.74064231 731 andrew gelman stats-2011-05-26-Lottery probability update

9 0.73822623 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery

10 0.72180414 808 andrew gelman stats-2011-07-18-The estimated effect size is implausibly large. Under what models is this a piece of evidence that the true effect is small?

11 0.71560496 171 andrew gelman stats-2010-07-30-Silly baseball example illustrates a couple of key ideas they don’t usually teach you in statistics class

12 0.70909274 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

13 0.70033848 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”

14 0.68884647 1857 andrew gelman stats-2013-05-15-Does quantum uncertainty have a place in everyday applied statistics?

15 0.68761528 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

16 0.68602228 2132 andrew gelman stats-2013-12-13-And now, here’s something that would make Ed Tufte spin in his . . . ummm, Tufte’s still around, actually, so let’s just say I don’t think he’d like it!

17 0.68565899 1319 andrew gelman stats-2012-05-14-I hate to get all Gerd Gigerenzer on you here, but . . .

18 0.67764705 2258 andrew gelman stats-2014-03-21-Random matrices in the news

19 0.67280006 29 andrew gelman stats-2010-05-12-Probability of successive wins in baseball

20 0.66974658 1518 andrew gelman stats-2012-10-02-Fighting a losing battle
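The fair-bet arithmetic quoted in entry 3 of the list above (post 138) is easy to verify. The sketch below uses hypothetical helper names, `fair_odds` and `expected_gain`, and sets the stake x to 1. It checks that the Spain/Netherlands wager has zero expected gain and that a 1/3 chance corresponds to 2:1 odds under the (1/p – 1):1 rule.

```python
from fractions import Fraction


def fair_odds(p):
    """Odds against an event of probability p, i.e. the first term of (1/p - 1):1."""
    return 1 / Fraction(p) - 1


def expected_gain(p_win, gain_if_win, loss_if_lose):
    """Expected gain of a bet that pays gain_if_win with probability p_win."""
    return p_win * gain_if_win - (1 - p_win) * loss_if_lose


p_spain = Fraction(2, 3)  # agreed probability that Spain beats Netherlands
x = 1                     # stake; any positive amount gives the same conclusion

# "I" receive 2x if Netherlands wins (probability 1/3) and pay x if Spain wins.
my_gain = expected_gain(1 - p_spain, 2 * x, x)

# "You" receive x if Spain wins (probability 2/3) and pay 2x otherwise.
your_gain = expected_gain(p_spain, x, 2 * x)

print(my_gain, your_gain, fair_odds(Fraction(1, 3)))
```

Both expected gains come out to zero, so neither side has a reason to prefer one end of the bet when the two parties agree on the probability.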


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.297), (24, 0.079), (40, 0.031), (53, 0.036), (68, 0.01), (95, 0.02), (96, 0.014), (99, 0.376)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99191332 321 andrew gelman stats-2010-10-05-Racism!

Introduction: Last night I spoke at the Columbia Club of New York, along with some of my political science colleagues, in a panel about politics, the economy, and the forthcoming election. The discussion was fine . . . until one guy in the audience accused us of bias based on what he imputed as our ethnicity. One of the panelists replied by asking the questioner what of all the things we had said was biased, and the questioner couldn’t actually supply any examples. It makes sense that the questioner couldn’t come up with a single example of bias on our part, considering that we were actually presenting facts. At some level, the questioner’s imputation of our ethnicity and accusation of bias isn’t so horrible. When talking with my friends, I engage in casual ethnic stereotyping all the time–hey, it’s a free country!–and one can certainly make the statistical argument that you can guess people’s ethnicities from their names, appearance, and speech patterns, and in turn you can infer a lot

2 0.99153143 1022 andrew gelman stats-2011-11-21-Progress for the Poor

Introduction: Lane Kenworthy writes: The book is full of graphs that support the above claims. One thing I like about Kenworthy’s approach is that he performs a separate analysis to examine each of his hypotheses. A lot of social scientists seem to think that the ideal analysis will conclude with a big regression where each coefficient tells a story and you can address all your hypotheses by looking at which predictors and interactions have statistically significant coefficients. Really, though, I think you need a separate analysis for each causal question (see chapters 9 and 10 of my book with Jennifer, follow this link). Kenworthy’s overall recommendation is to increase transfer payments to low-income families and to increase overall government spending on social services, and to fund this through general tax increases. What will it take for this to happen? After a review of the evidence from economic trends and opinion polls, Kenworthy writes, “Americans are potentially recepti

3 0.99042475 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples

Introduction: This is one of my favorite papers: In applications, statistical models are often restricted to what produces reasonable estimates based on the data at hand. In many cases, however, the principles that allow a model to be restricted can be derived theoretically, in the absence of any data and with minimal applied context. We illustrate this point with three well-known theoretical examples from spatial statistics and time series. First, we show that an autoregressive model for local averages violates a principle of invariance under scaling. Second, we show how the Bayesian estimate of a strictly-increasing time series, using a uniform prior distribution, depends on the scale of estimation. Third, we interpret local smoothing of spatial lattice data as Bayesian estimation and show why uniform local smoothing does not make sense. In various forms, the results presented here have been derived in previous work; our contribution is to draw out some principles that can be derived theoretic

4 0.9898054 1928 andrew gelman stats-2013-07-06-How to think about papers published in low-grade journals?

Introduction: We’ve had lots of lively discussions of fatally-flawed papers that have been published in top, top journals such as the American Economic Review or the Journal of Personality and Social Psychology or the American Sociological Review or the tabloids. And we also know about mistakes that make their way into mid-ranking outlets such as the Journal of Theoretical Biology. But what about results that appear in the lower tier of legitimate journals? I was thinking about this after reading a post by Dan Kahan slamming a paper that recently appeared in PLOS-One. I won’t discuss the paper itself here because that’s not my point. Rather, I had some thoughts regarding Kahan’s annoyance that a paper with fatal errors was published at all. I commented as follows: Read between the lines. The paper originally was released in 2009 and was published in 2013 in PLOS-One, which is one step above appearing on Arxiv. PLOS-One publishes some good things (so does Arxiv) but it’s the place

5 0.98879206 1495 andrew gelman stats-2012-09-13-Win $5000 in the Economist’s data visualization competition

Introduction: Michael Nelson points me to this. OK, $5,000 isn’t a lot of money (I’m not expecting Niall Ferguson in the competition), but I’m still glad to see this, given that the Economist is known for its excellent graphics.

6 0.9871223 609 andrew gelman stats-2011-03-13-Coauthorship norms

7 0.98690814 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!

8 0.98644495 1025 andrew gelman stats-2011-11-24-Always check your evidence

9 0.98626626 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research

10 0.98156083 159 andrew gelman stats-2010-07-23-Popular governor, small state

11 0.98115742 564 andrew gelman stats-2011-02-08-Different attitudes about parenting, possibly deriving from different attitudes about self

12 0.98111516 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

13 0.98031008 387 andrew gelman stats-2010-11-01-Do you own anything that was manufactured in the 1950s and still is in regular, active use in your life?

14 0.98012376 1168 andrew gelman stats-2012-02-14-The tabloids strike again

15 0.97858655 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

16 0.97506511 1487 andrew gelman stats-2012-09-08-Animated drought maps

17 0.97343326 377 andrew gelman stats-2010-10-28-The incoming moderate Republican congressmembers

18 0.9692086 1712 andrew gelman stats-2013-02-07-Philosophy and the practice of Bayesian statistics (with all the discussions!)

19 0.96720648 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

20 0.96178573 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician