andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-623 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Someone just stopped by and dropped off a copy of the book Wizardry: Baseball’s All-time Greatest Fielders Revealed, by Michael Humphreys. I don’t have much to say about the topic–I did see Brooks Robinson play, but I don’t remember any fancy plays. I must have seen Mark Belanger but I don’t really recall. Ozzie Smith was cool but I saw him only on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he still was able to do it. But that wasn’t fielding either. Anyway, Humphreys was nice enough to give me a copy of his book, and since I can’t say much (I didn’t have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away. (Note: Humphreys replies to some of these questions in a comment.) 1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I’ve always b
sentIndex sentText sentNum sentScore
1 Anyway, Humphreys was nice enough to give me a copy of his book, and since I can’t say much (I didn’t have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away. [sent-8, score-0.299]
2 Near the beginning, Humphreys says that 10 runs are worth about 1 win. [sent-11, score-0.217]
3 If a team scores 700 runs in 162 games, then an extra 10 runs is 710, and Bill James’s prediction is Games. [sent-13, score-0.654]
4 Winning 1 extra game gives you an 82-80 record, for a ratio of 82/80=1. [sent-17, score-0.147]
5 As I understand it, Humphreys is proposing two methods to evaluate fielders: - The full approach, given knowledge of where all the balls are hit when a player is in the field. [sent-24, score-0.125]
6 For example, Bill James has his A*B/C formula for evaluating offensive effectiveness. [sent-27, score-0.153]
7 But there’s also on-base percentage and slugging average, both of which give a pretty good sense of what’s going on and serve as a bridge between the basic statistics (1B, 2B, 3B, BB, etc) and the ultimate goal of runs scored. [sent-28, score-0.337]
8 Similarly, I think Humphreys would make many a baseball fan happy if he could give a sense of the meaning of some basic fielding statistics–not just fielding average but also #assists, #double plays, etc. [sent-29, score-0.909]
9 Humphreys makes the case that fielding is more important, as a contribution to winning, than we’ve thought. [sent-34, score-0.406]
10 Are there other aspects of strong (or weak) fielding not captured in the data? [sent-36, score-0.479]
11 For example, suppose you have a team such as the ’80s Cardinals with a fast infield, a fast outfield, and a pitching staff that throws a lot of low pitches leading to ground balls. [sent-37, score-0.319]
12 In this case, the fielders are getting more chances because the manager trusts them enough to get ground-ball pitchers. [sent-39, score-0.427]
13 Conversely, a team with bad fielders perhaps will adjust their pitching accordingly, taking more chances with the BB and HR. [sent-40, score-0.579]
14 Perhaps start with something simple like some graphs of (estimated) offensive ability vs. [sent-50, score-0.138]
15 Then some time series of fielding statistics, both the raw data of putouts, chances, assists, etc. [sent-52, score-0.406]
16 Humphreys talks a lot about different eras of baseball and argues persuasively that players are much better now than in the old days. [sent-59, score-0.403]
17 This motivates some adjustment for the years in which a player was active, just as with statistics for offense and pitching. [sent-60, score-0.2]
18 The one thing I’m worried about in the comparison of players from different eras is that I assume that fielding as a whole has been more important in some periods (e. [sent-61, score-0.707]
19 If you’re fielding in an era where fielding matters more, you can actually save more runs and win more games through fielding. [sent-64, score-1.273]
20 Basically, in comparing fielders in different eras, you have a choice between evaluating what they did or what they could do. [sent-66, score-0.332]
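The runs-to-wins arithmetic in sentences 2 to 4 can be checked with a short sketch. It assumes the classic form of Bill James's Pythagorean prediction, in which the ratio of wins to losses is roughly the square of the ratio of runs scored to runs allowed. The function names and the 700 runs allowed for the baseline team are illustrative assumptions, not figures from the post.

```python
def pythagorean_win_fraction(runs_scored, runs_allowed, exponent=2.0):
    """Predicted fraction of games won, so that W/L = (RS/RA)**exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Predicted number of wins over a season of `games` games."""
    return games * pythagorean_win_fraction(runs_scored, runs_allowed, exponent)


# Assumption (not in the source): the baseline team allows 700 runs,
# the same number it scores, so it is predicted to win half its games.
baseline_wins = expected_wins(700, 700)
wins_with_extra_runs = expected_wins(710, 700)

# Extra wins bought by the extra 10 runs, under the assumptions above.
extra_wins = wins_with_extra_runs - baseline_wins

# Implied ratio of wins to losses, comparable to the 82/80 record in the text.
win_loss_ratio = wins_with_extra_runs / (162 - wins_with_extra_runs)

print(f"baseline wins: {baseline_wins:.2f}")
print(f"wins with 10 extra runs: {wins_with_extra_runs:.2f}")
print(f"extra wins: {extra_wins:.2f}")
print(f"implied win/loss ratio: {win_loss_ratio:.4f}")
```

Under these assumptions the extra 10 runs come out to a little over one additional win, in line with the rule of thumb in sentence 2. Changing `exponent` or the runs-allowed figure shows how sensitive that rule is to the scoring environment.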
wordName wordTfidf (topN-words)
[('humphreys', 0.483), ('fielding', 0.406), ('fielders', 0.267), ('runs', 0.217), ('eras', 0.16), ('extra', 0.147), ('pitching', 0.132), ('james', 0.126), ('chances', 0.107), ('bb', 0.107), ('assists', 0.107), ('pythagorean', 0.107), ('bill', 0.098), ('baseball', 0.097), ('games', 0.097), ('players', 0.093), ('offensive', 0.088), ('win', 0.081), ('team', 0.073), ('captured', 0.073), ('player', 0.068), ('statistics', 0.067), ('era', 0.066), ('adjustment', 0.065), ('evaluating', 0.065), ('intuition', 0.062), ('say', 0.061), ('winning', 0.061), ('copy', 0.059), ('relationship', 0.058), ('evaluate', 0.057), ('fast', 0.057), ('see', 0.056), ('persuasively', 0.053), ('triples', 0.053), ('ozzie', 0.053), ('slugging', 0.053), ('trusts', 0.053), ('fundamental', 0.052), ('doubles', 0.05), ('batting', 0.05), ('derivation', 0.05), ('henderson', 0.05), ('rickey', 0.05), ('ability', 0.05), ('basically', 0.05), ('accordingly', 0.048), ('trajectories', 0.048), ('thing', 0.048), ('saw', 0.048)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999982 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
2 0.24538311 942 andrew gelman stats-2011-10-04-45% hitting, 25% fielding, 25% pitching, and 100% not telling us how they did it
Introduction: A University of Delaware press release reports: This month, the Journal of Quantitative Analysis in Sports will feature the article “An Estimate of How Hitting, Pitching, Fielding, and Base-stealing Impact Team Winning Percentages in Baseball.” In it, University of Delaware Prof. Charles Pavitt of the Department of Communication defines the perfect “formula” for Major League Baseball (MLB) teams to use to build the ultimate winning team. Pavitt found hitting accounts for more than 45 percent of teams’ winning records, fielding for 25 percent and pitching for 25 percent. And that the impact of stolen bases is greatly overestimated. He crunched hitting, pitching, fielding and base-stealing records for every MLB team over a 48-year period from 1951-1998 with a method no other researcher has used in this area. In statistical parlance, he used a conceptual decomposition of offense and defense into its component parts and then analyzed recombinations of the parts in intuitively mea
3 0.22063567 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James
Introduction: Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about. Here’s how it begins: I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that
Introduction: During our discussion of estimates of teacher performance, Steve Sailer wrote: I suspect we’re going to take years to work the kinks out of overall rating systems. By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr. James held off until 1999 before unveiling his win share model for overall rankings. I remember looking at Pete Palmer’s book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good–it’s presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples a
Introduction: In politics, as in baseball, hot prospects from the minors can have trouble handling big-league pitching. Right after Sarah Palin was chosen as the Republican nominee for vice president in 2008, my friend Ubs, who grew up in Alaska and follows politics closely, wrote the following: Palin would probably be a pretty good president. . . . She is fantastically popular. Her percentage approval ratings have reached the 90s. Even now, with a minor nepotism scandal going on, she’s still about 80%. . . . How does one do that? You might get 60% or 70% who are rabidly enthusiastic in their love and support, but you’re also going to get a solid core of opposition who hate you with nearly as much passion. The way you get to 90% is by being boringly competent while remaining inoffensive to people all across the political spectrum. Ubs gives a long discussion of Alaska’s unique politics and then writes: Palin’s magic formula for success has been simply to ignore partisan crap and get
6 0.12402916 1381 andrew gelman stats-2012-06-16-The Art of Fielding
10 0.099247865 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy
11 0.094890229 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points
12 0.094680198 29 andrew gelman stats-2010-05-12-Probability of successive wins in baseball
13 0.094005883 611 andrew gelman stats-2011-03-14-As the saying goes, when they argue that you’re taking over, that’s when you know you’ve won
14 0.088956185 2267 andrew gelman stats-2014-03-26-Is a steal really worth 9 points?
15 0.083612718 1903 andrew gelman stats-2013-06-17-Weak identification provides partial information
16 0.082986854 173 andrew gelman stats-2010-07-31-Editing and clutch hitting
17 0.080429725 1847 andrew gelman stats-2013-05-08-Of parsing and chess
18 0.079889432 1070 andrew gelman stats-2011-12-19-The scope for snooping
19 0.079813391 440 andrew gelman stats-2010-12-01-In defense of jargon
20 0.07924138 1671 andrew gelman stats-2013-01-13-Preregistration of Studies and Mock Reports
topicId topicWeight
[(0, 0.179), (1, -0.026), (2, -0.003), (3, 0.048), (4, 0.023), (5, 0.008), (6, 0.02), (7, 0.025), (8, 0.053), (9, 0.002), (10, 0.002), (11, 0.01), (12, -0.013), (13, -0.032), (14, -0.022), (15, 0.009), (16, 0.018), (17, 0.007), (18, 0.059), (19, -0.051), (20, -0.03), (21, 0.027), (22, 0.005), (23, 0.078), (24, 0.029), (25, 0.07), (26, -0.039), (27, 0.023), (28, -0.029), (29, -0.124), (30, 0.012), (31, -0.022), (32, 0.047), (33, -0.007), (34, -0.039), (35, 0.059), (36, 0.031), (37, 0.021), (38, -0.009), (39, 0.022), (40, 0.119), (41, 0.062), (42, -0.019), (43, -0.026), (44, -0.011), (45, 0.007), (46, 0.006), (47, -0.016), (48, -0.034), (49, 0.012)]
simIndex simValue blogId blogTitle
same-blog 1 0.92511535 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
2 0.86858398 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James
Introduction: Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it ! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about. Here’s how it begins: I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that
Introduction: Eric Tassone writes: Probably not blog-worthy/blog-appropriate, but have you heard Bill James discussing the Sandusky & Paterno stuff? I think you discussed once his stance on the Dowd Report, and this seems to be from the same part of his personality—which goes beyond contrarian . . . I have in fact blogged on James (many times) and on Paterno, so yes I think this is blogworthy. On the other hand, most readers of this blog probably don’t care about baseball, football, or William James, so I’ll put the rest below the fold. What is legendary baseball statistician Bill James doing, defending the crime-coverups of legendary coach Joe Paterno? As I wrote in my earlier blog on Paterno, it isn’t always easy to do the right thing, and I have no idea if I’d behave any better if I were in such a situation. The characteristics of a good coach do not necessarily provide what it takes to make good decisions off the field. In this sense even more of the blame should go
4 0.85857129 1113 andrew gelman stats-2012-01-11-Toshiro Kageyama on professionalism
Introduction: Following up on our discussion of professionalism (in which Jonathan Chait argued that “the definition of a professional career track” requires pay differentials and the chance to get fired, and I argued the opposite, that a lot of people go into professional careers specifically because of the job security), Austin Frakt pointed me to this description of professionalism from Go master Toshiro Kageyama. This in turn reminds me of a remark of Bill James when he explained lack of surprise that clutch hitting does not show up in the data. He wrote that the underlying idea of clutch hitting is that a player will play particularly well in an important situation where the game or the season is on the line. But, James pointed out, these guys are pros, and the true sign of a professional is that he can always stay concentrated. This argument applies particularly for hitting, maybe less so for pitching, where a pitcher can’t necessarily throw his hardest for 100 pitches in a game.
5 0.85086459 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy
Introduction: I was recently rereading and enjoying Bill James’s Historical Baseball Abstract (the second edition, from 2001). But even the Master is not perfect. Here he is, in the context of the all-time 20th-greatest shortstop (in his reckoning): Are athletes special people? In general, no, but occasionally, yes. Johnny Pesky at 75 was trim, youthful, optimistic, and practically exploding with energy. You rarely meet anybody like that who isn’t an ex-athlete–and that makes athletes seem special. [italics in the original] Hey, I’ve met 75-year-olds like that–and none of them are ex-athletes! That’s probably because I don’t know a lot of ex-athletes. But Bill James . . . he knows a lot of athletes. He went to the bathroom with Tim Raines once! The most I can say is that I saw Rickey Henderson steal a couple bases when he was playing against the Orioles once. Cognitive psychologists talk about the base-rate fallacy, which is the mistake of estimating probabilities without accou
7 0.77655393 942 andrew gelman stats-2011-10-04-45% hitting, 25% fielding, 25% pitching, and 100% not telling us how they did it
8 0.76821393 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!
9 0.76209623 29 andrew gelman stats-2010-05-12-Probability of successive wins in baseball
10 0.74962157 173 andrew gelman stats-2010-07-31-Editing and clutch hitting
13 0.73686546 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician
14 0.72952849 440 andrew gelman stats-2010-12-01-In defense of jargon
15 0.71800709 2267 andrew gelman stats-2014-03-26-Is a steal really worth 9 points?
16 0.69932687 1219 andrew gelman stats-2012-03-18-Tips on “great design” from . . . Microsoft!
17 0.6973331 813 andrew gelman stats-2011-07-21-Scrabble!
18 0.68926471 949 andrew gelman stats-2011-10-10-Grrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
19 0.68746293 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery
20 0.68178028 802 andrew gelman stats-2011-07-13-Super Sam Fuld Needs Your Help (with Foul Ball stats)
topicId topicWeight
[(9, 0.046), (13, 0.015), (15, 0.019), (16, 0.1), (21, 0.021), (24, 0.101), (27, 0.016), (30, 0.017), (35, 0.02), (36, 0.013), (55, 0.011), (86, 0.02), (89, 0.204), (95, 0.014), (99, 0.245)]
simIndex simValue blogId blogTitle
1 0.97539991 1160 andrew gelman stats-2012-02-09-Familial Linkage between Neuropsychiatric Disorders and Intellectual Interests
Introduction: When I spoke at Princeton last year, I talked with neuroscientist Sam Wang, who told me about a project he did surveying incoming Princeton freshmen about mental illness in their families. He and his coauthor Benjamin Campbell found some interesting results, which they just published: A link between intellect and temperament has long been the subject of speculation. . . . Studies of the artistically inclined report linkage with familial depression, while among eminent and creative scientists, a lower incidence of affective disorders is found. In the case of developmental disorders, a heightened prevalence of autism spectrum disorders (ASDs) has been found in the families of mathematicians, physicists, and engineers. . . . We surveyed the incoming class of 2014 at Princeton University about their intended academic major, familial incidence of neuropsychiatric disorders, and demographic variables. . . . Consistent with prior findings, we noticed a relation between intended academ
2 0.97120881 1756 andrew gelman stats-2013-03-10-He said he was sorry
Introduction: Yes, it can be done : Hereby I contact you to clarify the situation that occurred with the publication of the article entitled *** which was published in Volume 11, Issue 3 of *** and I made the mistake of declaring as an author. This chapter is a plagiarism of . . . I wish to express and acknowledge that I am solely responsible for this . . . I recognize the gravity of the offense committed, since there is no justification for so doing. Therefore, and as a sign of shame and regret I feel in this situation, I will publish this letter, in order to set an example for other researchers do not engage in a similar error. No more, and to please accept my apologies, Sincerely, *** P.S. Since we’re on Retraction Watch already, I’ll point you to this unrelated story featuring a hilarious photo of a fraudster, who in this case was a grad student in psychology who faked his data and “has agreed to submit to a three-year supervisory period for any work involving funding from the
3 0.95299661 833 andrew gelman stats-2011-07-31-Untunable Metropolis
Introduction: Michael Margolis writes: What are we to make of it when a Metropolis-Hastings step just won’t tune? That is, the acceptance rate is zero at expected-jump-size X, and way above 1/2 at X-exp(-16) (i.e., machine precision). I’ve solved my practical problem by writing that I would have liked to include results from a diffuse prior, but couldn’t. But I’m bothered by the poverty of my intuition. And since everything I’ve read says this is an issue of efficiency, rather than accuracy, I wonder if I could solve it just by running massive and heavily thinned chains. My reply: I can’t see how this could happen in a well-specified problem! I suspect it’s a bug. Otherwise try rescaling your variables so that your parameters will have values on the order of magnitude of 1. To which Margolis responded: I hardly wrote any of the code, so I can’t speak to the bug question — it’s binomial kriging from the R package geoRglm. And there are no covariates to scale — just the zero and one
Introduction: I remember in 4th grade or so, the teacher would give us a list of vocabulary words each week and we’d have to show we learned them by using each in a sentence. We quickly got bored and decided to do the assignment by writing a single sentence using all ten words. (Which the teacher hated, of course.) The above headline is in that spirit, combining blog posts rather than vocabulary words. But that only uses two of the entries. To really do the job, I’d need to throw in bivariate associations, ecological fallacies, high-dimensional feature selection, statistical significance, the suddenly unpopular name Hilary, snotty reviewers, the contagion of obesity, and milk-related spam. Or we could bring in some of the all-time favorites, such as Bayesians, economists, Finland, beautiful parents and their daughters, goofy graphics, red and blue states, essentialism in children’s reasoning, chess running, and zombies. Putting 8 of these in a single sentence (along with Glenn Hubbard
5 0.94920206 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing
Introduction: Gur Yaari writes: Anyone who has ever watched a sports competition is familiar with expressions like “on fire”, “in the zone”, “on a roll”, “momentum” and so on. But what do these expressions really mean? In 1985 when Thomas Gilovich, Robert Vallone and Amos Tversky studied this phenomenon for the first time, they defined it as: “. . . these phrases express a belief that the performance of a player during a particular period is significantly better than expected on the basis of the player’s overall record”. Their conclusion was that what people tend to perceive as a “hot hand” is essentially a cognitive illusion caused by a misperception of random sequences. Until recently there was little, if any, evidence to rule out their conclusion. Increased computing power and new data availability from various sports now provide surprising evidence of this phenomenon, thus reigniting the debate. Yaari goes on to some studies that have found time dependence in basketball, baseball, voll
6 0.9460938 459 andrew gelman stats-2010-12-09-Solve mazes by starting at the exit
7 0.92391622 1855 andrew gelman stats-2013-05-13-Stan!
8 0.9226082 2243 andrew gelman stats-2014-03-11-The myth of the myth of the myth of the hot hand
same-blog 9 0.91994125 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
10 0.91753328 407 andrew gelman stats-2010-11-11-Data Visualization vs. Statistical Graphics
11 0.91028631 1685 andrew gelman stats-2013-01-21-Class on computational social science this semester, Fridays, 1:00-3:40pm
12 0.90028572 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
13 0.89856851 566 andrew gelman stats-2011-02-09-The boxer, the wrestler, and the coin flip, again
14 0.89812356 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys
15 0.89757943 1839 andrew gelman stats-2013-05-04-Jesus historian Niall Ferguson and the improving standards of public discourse
16 0.89731914 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity
17 0.89469498 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices
18 0.88978392 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
19 0.88884652 1783 andrew gelman stats-2013-03-31-He’s getting ready to write a book
20 0.87002975 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random