171 andrew gelman stats-2010-07-30-Silly baseball example illustrates a couple of key ideas they don’t usually teach you in statistics class


meta info for this blog

Source: html

Introduction: From a commenter on the web, 21 May 2010: Tampa Bay: Playing .732 ball in the toughest division in baseball, wiped their feet on NY twice. If they sweep Houston, which seems pretty likely, they will be at .750, which I [the commenter] have never heard of. At the time of that posting, the Rays were 30-11. Quick calculation: if a team is good enough to be expected to win 100 games, that is, Pr(win) = 100/162 = .617, then there’s a 5% chance that they’ll have won at least 30 of their first 41 games. That’s a calculation based on simple probability theory of independent events, which isn’t quite right here but will get you close and is a good way to train one’s intuition, I think. Having a .732 record after 41 games is not unheard-of. The Detroit Tigers won 35 of their first 40 games in 1984: that’s .875. (I happen to remember that fast start, having been an Orioles fan at the time.) Now on to the key ideas The passage quoted above illustrates three statistical fa
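
The quick calculation in the excerpt can be reproduced with a short sketch. It assumes independent games with a constant win probability, which is the simplification the post itself flags as "not quite right." The helper name `binom_upper_tail` is illustrative and does not come from the post. Both the inclusive tail ("at least 30") and the strict tail ("more than 30") are printed, because a figure like the quoted 5% is sensitive to which boundary is used.

```python
from math import comb


def binom_upper_tail(n: int, k: int, p: float) -> float:
    """Exact Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_win = 100 / 162      # a team expected to win 100 of 162 games, about .617
games, wins = 41, 30   # the Rays' record at the time of the comment: 30-11

print(f"win pct after {games} games: {wins / games:.3f}")
print(f"Pr(at least {wins} wins):  {binom_upper_tail(games, wins, p_win):.3f}")
print(f"Pr(more than {wins} wins): {binom_upper_tail(games, wins + 1, p_win):.3f}")
```

The same function covers the other records mentioned, for example `binom_upper_tail(40, 35, p_win)` for a 35-5 start under the same assumed win probability.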


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 From a commenter on the web, 21 May 2010: Tampa Bay: Playing . [sent-1, score-0.273]

2 732 ball in the toughest division in baseball, wiped their feet on NY twice. [sent-2, score-0.742]

3 If they sweep Houston, which seems pretty likely, they will be at . [sent-3, score-0.253]

4 Quick calculation: if a team is good enough to be expected to win 100 games, that is, Pr(win) = 100/162 = . [sent-6, score-0.128]

5 617, then there’s a 5% chance that they’ll have won at least 30 of their first 41 games. [sent-7, score-0.166]

6 That’s a calculation based on simple probability theory of independent events, which isn’t quite right here but will get you close and is a good way to train one’s intuition, I think. [sent-8, score-0.312]

7 The Detroit Tigers won 35 of their first 40 games in 1984: that’s . [sent-11, score-0.395]

8 (I happen to remember that fast start, having been an Orioles fan at the time. [sent-13, score-0.243]

9 ) Now on to the key ideas The passage quoted above illustrates three statistical fallacies which I believe are common but are not often discussed: 1. [sent-14, score-0.342]

10 ” There’s no particular reason the commenter should’ve heard of the 1984 Tigers; my point here is that past data aren’t always as you remember them. [sent-33, score-0.537]

11 I don’t mean to pick on the above commenter, who I’m sure was just posting some idle thoughts. [sent-36, score-0.26]

12 In some ways, though, perhaps these low-priority remarks are the best windows into our implicit thinking. [sent-37, score-0.229]

13 Yes, I realize this is out of date–the perils of lagged blog posting. [sent-41, score-0.119]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('commenter', 0.273), ('sweep', 0.253), ('wiped', 0.253), ('houston', 0.239), ('rays', 0.239), ('tigers', 0.239), ('games', 0.229), ('feet', 0.199), ('ny', 0.183), ('heard', 0.16), ('calculation', 0.148), ('posting', 0.133), ('win', 0.128), ('idle', 0.127), ('orioles', 0.127), ('tampa', 0.127), ('toughest', 0.127), ('playing', 0.126), ('lagged', 0.119), ('fallacies', 0.11), ('beating', 0.107), ('detroit', 0.107), ('yankees', 0.104), ('remember', 0.104), ('remembered', 0.1), ('bay', 0.096), ('won', 0.092), ('train', 0.091), ('windows', 0.088), ('conditioning', 0.086), ('ball', 0.085), ('pr', 0.083), ('passage', 0.079), ('counting', 0.079), ('illustrates', 0.079), ('date', 0.078), ('division', 0.078), ('baseball', 0.076), ('quoted', 0.074), ('first', 0.074), ('intuition', 0.073), ('remarks', 0.073), ('fan', 0.071), ('never', 0.069), ('twice', 0.069), ('valid', 0.068), ('implicit', 0.068), ('fast', 0.068), ('historical', 0.067), ('events', 0.067)]
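
A list of (word, weight) pairs like the one above could be produced along the following lines. This is a minimal sketch, not the tool's actual pipeline: the page does not say which tf-idf variant, tokenizer, or normalization was used, so the smoothed idf formula, whitespace tokenization, and L2 normalization here are all assumptions, and `tfidf_top_words` and the toy `docs` are hypothetical.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Assumed weighting: raw term count times a smoothed idf,
    ln((1 + N) / (1 + df)) + 1, followed by L2 normalization.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    tf = Counter(tokenized[doc_index])
    weights = {
        w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w, c in tf.items()
    }
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(
        ((w, v / norm) for w, v in weights.items()), key=lambda x: (-x[1], x[0])
    )
    return [(w, round(v, 3)) for w, v in ranked[:top_n]]


# Toy corpus (made up for illustration).
docs = [
    "commenter says tigers won games",
    "team won games",
    "chess games board",
]
print(tfidf_top_words(docs, 0, top_n=3))
```

Words that occur in every document, such as "games" in the toy corpus, end up at the bottom of the ranking, which matches the way rare words like "commenter" and "sweep" lead the list above.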

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 171 andrew gelman stats-2010-07-30-Silly baseball example illustrates a couple of key ideas they don’t usually teach you in statistics class

Introduction: From a commenter on the web, 21 May 2010: Tampa Bay: Playing .732 ball in the toughest division in baseball, wiped their feet on NY twice. If they sweep Houston, which seems pretty likely, they will be at .750, which I [the commenter] have never heard of. At the time of that posting, the Rays were 30-11. Quick calculation: if a team is good enough to be expected to win 100 games, that is, Pr(win) = 100/162 = .617, then there’s a 5% chance that they’ll have won at least 30 of their first 41 games. That’s a calculation based on simple probability theory of independent events, which isn’t quite right here but will get you close and is a good way to train one’s intuition, I think. Having a .732 record after 41 games is not unheard-of. The Detroit Tigers won 35 of their first 40 games in 1984: that’s .875. (I happen to remember that fast start, having been an Orioles fan at the time.) Now on to the key ideas The passage quoted above illustrates three statistical fa

2 0.13210121 29 andrew gelman stats-2010-05-12-Probability of successive wins in baseball

Introduction: Dan Goldstein did an informal study asking people the following question: When two baseball teams play each other on two consecutive days, what is the probability that the winner of the first game will be the winner of the second game? You can make your own guess and then continue reading below. Dan writes: We asked two colleagues knowledgeable in baseball and the mathematics of forecasting. The answers came in between 65% and 70%. The true answer [based on Dan's analysis of a database of baseball games]: 51.3%, a little better than a coin toss. I have to say, I’m surprised his colleagues gave such extreme guesses. I was guessing something like 50%, myself, based on the following very crude reasoning: Suppose two unequal teams are playing, and the chance of team A beating team B is 55%. (This seems like a reasonable average of all matchups, which will include some more extreme disparities but also many more equal contests.) Then the chance of the same team

3 0.12186199 1168 andrew gelman stats-2012-02-14-The tabloids strike again

Introduction: See comments #2, 3, 4 here. I guess that’s why Science and Nature are known as “the tabloids.” As the commenter writes, “you can’t have people look at too many images of maggot-infested wounds.”

4 0.11340243 218 andrew gelman stats-2010-08-20-I think you knew this already

Introduction: I was playing out a chess game from the newspaper and was reminded how the best players use the entire board in their game. In my own games (I’m not very good, I’m guessing my “rating” would be something like 1500?), the action always gets concentrated on one part of the board. Grandmaster games do get focused on particular squares of the board, of course, but, meanwhile, there are implications in other places and the action can suddenly shift.

5 0.11320458 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

Introduction: Someone who wants to remain anonymous writes: I am working to create a more accurate in-game win probability model for basketball games. My idea is for each timestep in a game (a second, 5 seconds, etc), use the Vegas line, the current score differential, who has the ball, and the number of possessions played already (to account for differences in pace) to create a point estimate probability of the home team winning. This problem would seem to fit a multi-level model structure well. It seems silly to estimate 2,000 regressions (one for each timestep), but the coefficients should vary at each timestep. Do you have suggestions for what type of model this could/would be? Additionally, I believe this needs to be some form of logit/probit given the binary dependent variable (win or loss). Finally, do you have suggestions for what package could accomplish this in Stata or R? To answer the questions in reverse order: 3. I’d hope this could be done in Stan (which can be run from R)

6 0.1120773 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event

7 0.10629331 1847 andrew gelman stats-2013-05-08-Of parsing and chess

8 0.091612071 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

9 0.087440565 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

10 0.081796765 2272 andrew gelman stats-2014-03-29-I agree with this comment

11 0.079375915 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”

12 0.079029813 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

13 0.076703459 2226 andrew gelman stats-2014-02-26-Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

14 0.07453128 652 andrew gelman stats-2011-04-07-Minor-league Stats Predict Major-league Performance, Sarah Palin, and Some Differences Between Baseball and Politics

15 0.07434015 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers

16 0.074039504 976 andrew gelman stats-2011-10-27-Geophysicist Discovers Modeling Error (in Economics)

17 0.072575383 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders

18 0.068883397 2036 andrew gelman stats-2013-09-24-“Instead of the intended message that being poor is hard, the takeaway is that rich people aren’t very good with money.”

19 0.065216832 2014 andrew gelman stats-2013-09-09-False memories and statistical analysis

20 0.065202951 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.122), (1, -0.029), (2, -0.001), (3, 0.036), (4, -0.004), (5, -0.007), (6, 0.028), (7, 0.003), (8, 0.035), (9, -0.047), (10, -0.005), (11, 0.013), (12, 0.007), (13, -0.05), (14, -0.059), (15, 0.013), (16, 0.011), (17, -0.023), (18, 0.027), (19, -0.01), (20, -0.01), (21, 0.032), (22, -0.012), (23, 0.032), (24, 0.002), (25, 0.044), (26, 0.019), (27, 0.051), (28, -0.02), (29, -0.101), (30, 0.027), (31, -0.031), (32, 0.014), (33, 0.022), (34, 0.003), (35, 0.006), (36, 0.028), (37, 0.033), (38, -0.015), (39, 0.001), (40, 0.002), (41, 0.014), (42, 0.003), (43, -0.028), (44, 0.012), (45, 0.007), (46, -0.023), (47, -0.012), (48, -0.028), (49, -0.032)]
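
Topic-weight vectors like the one above are typically compared with cosine similarity. The page does not state how its simValue column is computed, so treating it as cosine similarity is an assumption, and `cosine_similarity` and the `other_post` values below are hypothetical. The sketch takes vectors as (topicId, weight) pairs, so it also works for sparse lists that omit zero-weight topics.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors given as (topicId, weight) pairs.

    Topics missing from a vector are treated as weight 0.
    Returns 0.0 if either vector is empty or all zeros.
    """
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# First four LSI weights listed for this post.
this_post = [(0, 0.122), (1, -0.029), (2, -0.001), (3, 0.036)]
# Made-up weights for a neighbouring post (illustration only).
other_post = [(0, 0.110), (1, -0.010), (3, 0.050)]

print(round(cosine_similarity(this_post, this_post), 6))
print(round(cosine_similarity(this_post, other_post), 3))
```

Under exact cosine similarity a vector compared with itself scores 1, so a same-blog value below 1 in the list that follows suggests the tool uses an approximate index or a different measure.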

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9406985 171 andrew gelman stats-2010-07-30-Silly baseball example illustrates a couple of key ideas they don’t usually teach you in statistics class

Introduction: From a commenter on the web, 21 May 2010: Tampa Bay: Playing .732 ball in the toughest division in baseball, wiped their feet on NY twice. If they sweep Houston, which seems pretty likely, they will be at .750, which I [the commenter] have never heard of. At the time of that posting, the Rays were 30-11. Quick calculation: if a team is good enough to be expected to win 100 games, that is, Pr(win) = 100/162 = .617, then there’s a 5% chance that they’ll have won at least 30 of their first 41 games. That’s a calculation based on simple probability theory of independent events, which isn’t quite right here but will get you close and is a good way to train one’s intuition, I think. Having a .732 record after 41 games is not unheard-of. The Detroit Tigers won 35 of their first 40 games in 1984: that’s .875. (I happen to remember that fast start, having been an Orioles fan at the time.) Now on to the key ideas The passage quoted above illustrates three statistical fa

2 0.80726516 29 andrew gelman stats-2010-05-12-Probability of successive wins in baseball

Introduction: Dan Goldstein did an informal study asking people the following question: When two baseball teams play each other on two consecutive days, what is the probability that the winner of the first game will be the winner of the second game? You can make your own guess and then continue reading below. Dan writes: We asked two colleagues knowledgeable in baseball and the mathematics of forecasting. The answers came in between 65% and 70%. The true answer [based on Dan's analysis of a database of baseball games]: 51.3%, a little better than a coin toss. I have to say, I’m surprised his colleagues gave such extreme guesses. I was guessing something like 50%, myself, based on the following very crude reasoning: Suppose two unequal teams are playing, and the chance of team A beating team B is 55%. (This seems like a reasonable average of all matchups, which will include some more extreme disparities but also many more equal contests.) Then the chance of the same team

3 0.75201404 559 andrew gelman stats-2011-02-06-Bidding for the kickoff

Introduction: Steven Brams and James Jorasch propose a system for reducing the advantage that comes from winning the coin flip in overtime: Dispensing with a coin toss, the teams would bid on where the ball is kicked from by the kicking team. In the NFL, it’s now the 30-yard line. Under Brams and Jorasch’s rule, the kicking team would be the team that bids the lower number, because it is willing to put itself at a disadvantage by kicking from farther back. However, it would not kick from the number it bids, but from the average of the two bids. To illustrate, assume team A bids to kick from the 38-yard line, while team B bids its 32-yard line. Team B would win the bidding and, therefore, be designated as the kick-off team. But B wouldn’t kick from 32, but instead from the average of 38 and 32–its 35-yard line. This is better for B by 3 yards than the 32-yard line that it proposed, because it’s closer to the end zone it is kicking towards. It’s also better for A by 3 yards to have B kick fr

4 0.74546629 2105 andrew gelman stats-2013-11-18-What’s my Kasparov number?

Introduction: A colleague writes: Personally my Kasparov number is two: I beat ** in a regular tournament game, and ** beat Kasparov! That’s pretty impressive, especially given that I didn’t know this guy played chess at all! Anyway, this got me thinking, what’s my Kasparov number? OK, that’s easy. I beat Magnus Carlsen the other day when he was passing through town on vacation, Carlsen beat Anand, . . . OK, just kidding. What is my Kasparov number, though? Note that the definition, unlike that of the Erdos or Bacon numbers, is asymmetric: it has to be that I had a victory over person 1, and person 1 had a victory over person 2, etc., and ultimately person N-1 had a victory over Kasparov. The games don’t have to be in time order, they just all have to be victories. And we’ll further require that the games all be played after childhood and before senility (i.e., it doesn’t count if I happened to play someone who happens to be a cousin of some grandmaster whom he beat when they were b

5 0.74356949 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event

Introduction: Todd Schneider writes: Apropos of your recent blog post about modeling score differential of basketball games, I thought you might enjoy a site I built, gambletron2000.com, that gathers real-time win probabilities from betting markets for most major sports (including NBA and college basketball). My original goal was to use the variance of changes in win probabilities to quantify which games were the most exciting, but I got a bit carried away and ended up pursuing a bunch of other ideas, which you can read about in the full writeup here. This particular passage from the anonymous someone in your post: My idea is for each timestep in a game (a second, 5 seconds, etc), use the Vegas line, the current score differential, who has the ball, and the number of possessions played already (to account for differences in pace) to create a point estimate probability of the home team winning. reminded me of a graph I made, which shows the mean-reverting tendency of N

6 0.73681915 1387 andrew gelman stats-2012-06-21-Will Tiger Woods catch Jack Nicklaus? And a discussion of the virtues of using continuous data even if your goal is discrete prediction

7 0.73047465 1113 andrew gelman stats-2012-01-11-Toshiro Kageyama on professionalism

8 0.7199347 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery

9 0.71241343 1467 andrew gelman stats-2012-08-23-The pinch-hitter syndrome again

10 0.70618123 2267 andrew gelman stats-2014-03-26-Is a steal really worth 9 points?

11 0.70524967 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles

12 0.70148253 813 andrew gelman stats-2011-07-21-Scrabble!

13 0.69642329 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders

14 0.68423635 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

15 0.68327135 218 andrew gelman stats-2010-08-20-I think you knew this already

16 0.67387128 731 andrew gelman stats-2011-05-26-Lottery probability update

17 0.66713762 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!

18 0.66707426 1804 andrew gelman stats-2013-04-15-How effective are football coaches?

19 0.66610312 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability

20 0.65505391 942 andrew gelman stats-2011-10-04-45% hitting, 25% fielding, 25% pitching, and 100% not telling us how they did it


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.049), (15, 0.023), (16, 0.048), (24, 0.11), (25, 0.244), (57, 0.02), (61, 0.011), (66, 0.012), (72, 0.012), (85, 0.011), (86, 0.03), (89, 0.032), (95, 0.012), (99, 0.262)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.91012305 1741 andrew gelman stats-2013-02-27-Thin scientists say it’s unhealthy to be fat

Introduction: “Even as you get near the upper reaches of the normal weight range, you begin to see increases in chronic diseases,” said JoAnn Manson, chief of the Division of Preventive Medicine, Brigham and Women’s Hospital, HMS Michael and Lee Bell Professor of Women’s Health, and HSPH professor of epidemiology. “It’s a clear gradient of increase.” Yeah, she would say that. Thin people. And then there’s Frank Hu, professor of nutrition at Harvard: The studies that Flegal [the author of the original study finding a negative correlation between body mass index and mortality] did use included many samples of people who were chronically ill, current smokers and elderly, according to Hu. These factors are associated with weight loss and increased mortality. In other words, people are not dying because they are slim, he said. They are slim because they are dying—of cancer or old age, for example. By doing a meta-analysis of studies that did not properly control for this bias, Flegal amplif

same-blog 2 0.89455485 171 andrew gelman stats-2010-07-30-Silly baseball example illustrates a couple of key ideas they don’t usually teach you in statistics class

Introduction: From a commenter on the web, 21 May 2010: Tampa Bay: Playing .732 ball in the toughest division in baseball, wiped their feet on NY twice. If they sweep Houston, which seems pretty likely, they will be at .750, which I [the commenter] have never heard of. At the time of that posting, the Rays were 30-11. Quick calculation: if a team is good enough to be expected to win 100 games, that is, Pr(win) = 100/162 = .617, then there’s a 5% chance that they’ll have won at least 30 of their first 41 games. That’s a calculation based on simple probability theory of independent events, which isn’t quite right here but will get you close and is a good way to train one’s intuition, I think. Having a .732 record after 41 games is not unheard-of. The Detroit Tigers won 35 of their first 40 games in 1984: that’s .875. (I happen to remember that fast start, having been an Orioles fan at the time.) Now on to the key ideas The passage quoted above illustrates three statistical fa

3 0.89297515 821 andrew gelman stats-2011-07-25-See me talk in the Upper West Side (without graphs) today

Introduction: At Picnic Cafe, Broadway at 101 St, 6-7pm today. Should we vote even though it probably won’t make a difference? Why is the question “Are we better off now than four years ago?” not an appeal to selfishness? Are Americans as polarized as we think? Come explore these and other questions about voting in America today. It’s the usual stuff but close-up so lots of opportunity to argue and heckle. No slides or graphs. My plan is to hand out 30-50 index cards, each with a phrase (for example, “Moderation in the pursuit of moderation is no vice” or “Gerrymandering is good for you” or “How to predict elections”), then participants can call out topics and I’ll yap on them (with discussion) till we run out of time. It’ll be weird to talk without graphs. We’ll see how it goes.

4 0.87910938 353 andrew gelman stats-2010-10-19-The violent crime rate was about 75% higher in Detroit than in Minneapolis in 2009

Introduction: Christopher Uggen reports. I’m surprised the difference is so small. I would’ve thought the crime rate was something like 5 times higher in Detroit than in Minneapolis. I guess Minneapolis must have some rough neighborhoods. Or maybe it’s just that I don’t have a good framework for thinking about crime statistics.

5 0.83673167 1296 andrew gelman stats-2012-05-03-Google Translate for code, and an R help-list bot

Introduction: What we did in our Stan meeting yesterday: Some discussion of revision of the Nuts paper, some conversations about parameterizations of categorical-data models, plans for the R interface, blah blah blah. But also, I had two exciting new ideas! Google Translate for code Wouldn’t it be great if Google Translate could work on computer languages? I suggested this and somebody said that it might be a problem because code isn’t always translatable. But that doesn’t worry me so much. Google Translate for human languages isn’t perfect either but it’s a useful guide. If I want to write a message to someone in French or Spanish or Dutch, I wouldn’t just write it in English and run it through Translate. What I do is try my best to write it in the desired language, but I can try out some tricky words or phrases in the translator. Or, if I start by translating, I go back and forth to make sure it all makes sense. An R help-list bot We were talking about how to build a Stan commun

6 0.8188715 1151 andrew gelman stats-2012-02-03-Philosophy of Bayesian statistics: my reactions to Senn

7 0.81824899 2213 andrew gelman stats-2014-02-16-There’s no need for you to read this one

8 0.80685282 217 andrew gelman stats-2010-08-19-The “either-or” fallacy of believing in discrete models: an example of folk statistics

9 0.80005682 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

10 0.79396415 1577 andrew gelman stats-2012-11-14-Richer people continue to vote Republican

11 0.7886591 570 andrew gelman stats-2011-02-12-Software request

12 0.78208435 1039 andrew gelman stats-2011-12-02-I just flew in from the econ seminar, and boy are my arms tired

13 0.77583337 859 andrew gelman stats-2011-08-18-Misunderstanding analysis of covariance

14 0.76866227 1682 andrew gelman stats-2013-01-19-R package for Bayes factors

15 0.76791036 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

16 0.76618004 342 andrew gelman stats-2010-10-14-Trying to be precise about vagueness

17 0.76547152 902 andrew gelman stats-2011-09-12-The importance of style in academic writing

18 0.75906718 1009 andrew gelman stats-2011-11-14-Wickham R short course

19 0.75747854 167 andrew gelman stats-2010-07-27-Why don’t more medical discoveries become cures?

20 0.75744897 1939 andrew gelman stats-2013-07-15-Forward causal reasoning statements are about estimation; reverse causal questions are about model checking and hypothesis generation