andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-697 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about. Here’s how it begins: I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that
sentIndex sentText sentNum sentScore
1 I first sent him this item on the differences between baseball and politics but he said it was too political for them. [sent-2, score-0.415]
2 I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. [sent-3, score-0.188]
3 Then I sent him some reflections on the great Bill James and he published it! [sent-4, score-0.101]
4 If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about. [sent-5, score-0.087]
5 Here’s how it begins: I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. [sent-6, score-0.37]
6 ” Unfortunately, John McNamara didn’t hear us, and the rest was history. [sent-8, score-0.081]
7 In statistics, I like to say that each substantive hypothesis deserves its own analysis: it’s generally hopeless to expect that you can run a single regression and pull off the answers to each of your research questions, one coefficient at a time. [sent-11, score-0.171]
8 - Controlled comparisons: Instead of comparing simple aggregates, be more careful and make comparisons on pairs or groups of similar players or teams. [sent-12, score-0.183]
9 As economists Rajeev Dehejia and Sadek Wahba demonstrated in a pair of influential articles (they have been cited over 2400 times since their publication a decade ago), these comparisons work only when you are controlling for appropriate characteristics. [sent-13, score-0.266]
10 In the case of Bill James’s analysis, player age is typically a key comparison variable. [sent-14, score-0.111]
11 From the standpoint of applied statistics, controlled comparisons combine the averaging that you get from having a moderate or large sample size with the insight that comes from understanding individual cases. [sent-15, score-0.299]
12 - Conceptual models used as guides to comparisons: James has written many times that he does not study statistical questions, he studies baseball questions. [sent-16, score-0.564]
13 A conceptual model such as the defensive spectrum, or the narrowing of abilities, or the contribution of speed to both offense and defense, drives the direction of the study and motivates many of the details of the analysis. [sent-18, score-0.297]
14 I have tried to follow these principles in my own work. [sent-19, score-0.162]
15 One central method of statistics that Bill James does not draw upon very often (if at all) is fitting parametric models. [sent-20, score-0.172]
16 For example, James found that the power two in the Pythagorean prediction for wins worked pretty well. [sent-21, score-0.072]
17 He didn’t try to estimate the power from data, nor did he, for example, try to come up with a conclusion such as, “each additional run is worth 0. [sent-22, score-0.357]
18 ” On the rare occasions that he did estimate a parameter (for example, the relative values of stolen bases and times caught stealing), he buried his methodology and had no interest in making a big deal about the estimation. [sent-24, score-0.613]
19 Why didn’t Bill James follow the example of Pete Palmer and others and try to estimate the relative values of walks, singles, doubles, and other outcomes? [sent-26, score-0.338]
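The Pythagorean prediction mentioned in sentences 16 and 17 can be made concrete. The sketch below is a minimal illustration, not James's own procedure: it computes the predicted winning fraction RS^p / (RS^p + RA^p) with the fixed power p = 2, and then shows what "estimating the power from data" could look like using a simple grid search over p. The function names, the grid bounds, and the team totals are all hypothetical and chosen only for illustration.

```python
import math


def pythagorean_win_pct(runs_scored, runs_allowed, power=2.0):
    """Predicted winning fraction: RS^p / (RS^p + RA^p)."""
    rs = runs_scored ** power
    ra = runs_allowed ** power
    return rs / (rs + ra)


def estimate_power(teams, lo=0.5, hi=4.0, steps=351):
    """Grid-search the exponent p that minimizes squared error in win fraction.

    teams: iterable of (runs_scored, runs_allowed, wins, losses).
    The grid bounds and resolution are arbitrary illustrative choices.
    """
    teams = list(teams)
    best_p, best_sse = None, math.inf
    for i in range(steps):
        p = lo + (hi - lo) * i / (steps - 1)
        sse = sum(
            (pythagorean_win_pct(rs, ra, p) - w / (w + l)) ** 2
            for rs, ra, w, l in teams
        )
        if sse < best_sse:
            best_p, best_sse = p, sse
    return best_p


# Made-up season totals (runs scored, runs allowed, wins, losses); not real data.
example_teams = [
    (850, 700, 95, 67),
    (720, 740, 79, 83),
    (650, 800, 66, 96),
]

print(pythagorean_win_pct(850, 700))   # fixed power of two
print(estimate_power(example_teams))   # power fitted to the made-up data
```

The contrast is the one the article draws: the first function takes the exponent as given, while the second treats it as a parameter to be fitted, which James generally did not do.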
wordName wordTfidf (topN-words)
[('james', 0.37), ('baseball', 0.314), ('bill', 0.24), ('comparisons', 0.183), ('abilities', 0.122), ('controlled', 0.116), ('conceptual', 0.113), ('player', 0.111), ('try', 0.102), ('sent', 0.101), ('study', 0.097), ('single', 0.096), ('statistics', 0.091), ('graduate', 0.091), ('principles', 0.088), ('questions', 0.087), ('singles', 0.087), ('dehejia', 0.087), ('fielders', 0.087), ('pythagorean', 0.087), ('narrowing', 0.087), ('prospectus', 0.087), ('relative', 0.083), ('times', 0.083), ('wahba', 0.082), ('aggregates', 0.082), ('doubles', 0.082), ('pluralism', 0.082), ('rajeev', 0.082), ('fitting', 0.081), ('estimate', 0.081), ('rest', 0.081), ('palmer', 0.078), ('buried', 0.078), ('opportunity', 0.078), ('screaming', 0.075), ('basement', 0.075), ('hopeless', 0.075), ('occasions', 0.075), ('tried', 0.074), ('walks', 0.073), ('power', 0.072), ('values', 0.072), ('pete', 0.071), ('stealing', 0.071), ('stolen', 0.071), ('didn', 0.07), ('bases', 0.07), ('guides', 0.07), ('statistician', 0.068)]
simIndex simValue blogId blogTitle
same-blog 1 0.9999994 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James
Introduction: During our discussion of estimates of teacher performance, Steve Sailer wrote: I suspect we’re going to take years to work the kinks out of overall rating systems. By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr. James held off until 1999 before unveiling his win share model for overall rankings. I remember looking at Pete Palmer’s book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good–it’s presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples a
Introduction: Eric Tassone writes: Probably not blog-worthy/blog-appropriate, but have you heard Bill James discussing the Sandusky & Paterno stuff? I think you discussed once his stance on the Dowd Report, and this seems to be from the same part of his personality, which goes beyond contrarian . . . I have in fact blogged on James (many times) and on Paterno, so yes I think this is blogworthy. On the other hand, most readers of this blog probably don’t care about baseball, football, or William James, so I’ll put the rest below the fold. What is legendary baseball statistician Bill James doing, defending the crime-coverups of legendary coach Joe Paterno? As I wrote in my earlier blog on Paterno, it isn’t always easy to do the right thing, and I have no idea if I’d behave any better if I were in such a situation. The characteristics of a good coach do not necessarily provide what it takes to make good decisions off the field. In this sense even more of the blame should go
4 0.22063567 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
Introduction: Someone just stopped by and dropped off a copy of the book Wizardry: Baseball’s All-time Greatest Fielders Revealed, by Michael Humphreys. I don’t have much to say about the topic–I did see Brooks Robinson play, but I don’t remember any fancy plays. I must have seen Mark Belanger but I don’t really recall. Ozzie Smith was cool but I saw him only on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he still was able to do it. But that wasn’t fielding either. Anyway, Humphreys was nice enough to give me a copy of his book, and since I can’t say much (I didn’t have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away. (Note: Humphreys replies to some of these questions in a comment.) 1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I’ve always b
5 0.21659857 440 andrew gelman stats-2010-12-01-In defense of jargon
Introduction: Daniel Drezner takes on Bill James.
6 0.2077992 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy
9 0.19364335 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!
10 0.18392868 2116 andrew gelman stats-2013-11-28-“Statistics is what people think math is”
12 0.15313503 499 andrew gelman stats-2011-01-03-5 books
13 0.13801382 173 andrew gelman stats-2010-07-31-Editing and clutch hitting
14 0.13760506 367 andrew gelman stats-2010-10-25-In today’s economy, the rich get richer
15 0.13758206 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery
16 0.13527273 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations
17 0.12594542 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points
18 0.12255508 355 andrew gelman stats-2010-10-20-Andy vs. the Ideal Point Model of Voting
19 0.11009754 1903 andrew gelman stats-2013-06-17-Weak identification provides partial information
20 0.10806577 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model
topicId topicWeight
[(0, 0.235), (1, -0.016), (2, -0.013), (3, -0.012), (4, 0.033), (5, 0.038), (6, -0.005), (7, 0.056), (8, 0.04), (9, 0.031), (10, 0.035), (11, 0.021), (12, 0.021), (13, -0.036), (14, 0.003), (15, -0.003), (16, -0.015), (17, -0.001), (18, 0.074), (19, -0.094), (20, -0.034), (21, -0.013), (22, 0.033), (23, 0.085), (24, 0.044), (25, 0.087), (26, -0.06), (27, -0.05), (28, -0.015), (29, -0.238), (30, -0.024), (31, 0.023), (32, 0.103), (33, 0.004), (34, -0.088), (35, 0.094), (36, 0.069), (37, 0.012), (38, 0.001), (39, -0.049), (40, 0.182), (41, 0.143), (42, -0.094), (43, -0.006), (44, -0.006), (45, 0.001), (46, -0.059), (47, -0.05), (48, -0.06), (49, -0.015)]
simIndex simValue blogId blogTitle
same-blog 1 0.97467983 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James
Introduction: Eric Tassone writes: Probably not blog-worthy/blog-appropriate, but have you heard Bill James discussing the Sandusky & Paterno stuff? I think you discussed once his stance on the Dowd Report, and this seems to be from the same part of his personality, which goes beyond contrarian . . . I have in fact blogged on James (many times) and on Paterno, so yes I think this is blogworthy. On the other hand, most readers of this blog probably don’t care about baseball, football, or William James, so I’ll put the rest below the fold. What is legendary baseball statistician Bill James doing, defending the crime-coverups of legendary coach Joe Paterno? As I wrote in my earlier blog on Paterno, it isn’t always easy to do the right thing, and I have no idea if I’d behave any better if I were in such a situation. The characteristics of a good coach do not necessarily provide what it takes to make good decisions off the field. In this sense even more of the blame should go
3 0.86213464 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy
Introduction: I was recently rereading and enjoying Bill James’s Historical Baseball Abstract (the second edition, from 2001). But even the Master is not perfect. Here he is, in the context of the all-time 20th-greatest shortstop (in his reckoning): Are athletes special people? In general, no, but occasionally, yes. Johnny Pesky at 75 was trim, youthful, optimistic, and practically exploding with energy. You rarely meet anybody like that who isn’t an ex-athlete–and that makes athletes seem special. [italics in the original] Hey, I’ve met 75-year-olds like that–and none of them are ex-athletes! That’s probably because I don’t know a lot of ex-athletes. But Bill James . . . he knows a lot of athletes. He went to the bathroom with Tim Raines once! The most I can say is that I saw Rickey Henderson steal a couple bases when he was playing against the Orioles once. Cognitive psychologists talk about the base-rate fallacy, which is the mistake of estimating probabilities without accou
Introduction: During our discussion of estimates of teacher performance, Steve Sailer wrote: I suspect we’re going to take years to work the kinks out of overall rating systems. By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr. James held off until 1999 before unveiling his win share model for overall rankings. I remember looking at Pete Palmer’s book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good–it’s presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples a
5 0.85233688 440 andrew gelman stats-2010-12-01-In defense of jargon
Introduction: Daniel Drezner takes on Bill James.
6 0.77279681 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
7 0.77119392 1113 andrew gelman stats-2012-01-11-Toshiro Kageyama on professionalism
8 0.76260209 367 andrew gelman stats-2010-10-25-In today’s economy, the rich get richer
9 0.76104558 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!
10 0.75726986 173 andrew gelman stats-2010-07-31-Editing and clutch hitting
11 0.70479864 942 andrew gelman stats-2011-10-04-45% hitting, 25% fielding, 25% pitching, and 100% not telling us how they did it
12 0.67320454 499 andrew gelman stats-2011-01-03-5 books
13 0.65918881 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery
14 0.65724075 2116 andrew gelman stats-2013-11-28-“Statistics is what people think math is”
15 0.64626801 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician
16 0.64224935 1219 andrew gelman stats-2012-03-18-Tips on “great design” from . . . Microsoft!
19 0.62327915 355 andrew gelman stats-2010-10-20-Andy vs. the Ideal Point Model of Voting
20 0.61987811 949 andrew gelman stats-2011-10-10-Grrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
topicId topicWeight
[(1, 0.121), (9, 0.01), (16, 0.101), (21, 0.013), (24, 0.117), (50, 0.013), (54, 0.024), (55, 0.014), (63, 0.019), (82, 0.024), (86, 0.028), (89, 0.043), (99, 0.348)]
simIndex simValue blogId blogTitle
same-blog 1 0.9802407 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James
2 0.97449565 973 andrew gelman stats-2011-10-26-Antman again courts controversy
Introduction: Commenter Zbicyclist links to a fun article by Howard French on biologist E. O. Wilson: Wilson announced that his new book may be his last. It is not limited to the discussion of evolutionary biology, but ranges provocatively through the humanities, as well. . . . Generation after generation of students have suffered trying to “puzzle out” what great thinkers like Socrates, Plato, and Descartes had to say on the great questions of man’s nature, Wilson said, but this was of little use, because philosophy has been based on “failed models of the brain.” This reminds me of my recent remarks on the use of crude folk-psychology models as microfoundations for social sciences. The article also discusses Wilson’s recent crusade against selfish-gene-style simplifications of human and animal nature. I’m with Wilson 100% on this one. “Two brothers or eight cousins” is a cute line but it doesn’t seem to come close to describing how species or societies work, and it’s always seemed a
3 0.97423887 525 andrew gelman stats-2011-01-19-Thiel update
Introduction: A year or so ago I discussed the reasoning of zillionaire financier Peter Thiel, who seems to believe his own hype and, worse, seems to be able to convince reporters of his infallibility as well. Apparently he “possesses a preternatural ability to spot patterns that others miss.” More recently, Felix Salmon commented on Thiel’s financial misadventures: Peter Thiel’s hedge fund, Clarium Capital, ain’t doing so well. Its assets under management are down 90% from their peak, and total returns from the high point are -65%. Thiel is smart, successful, rich, well-connected, and on top of all that his calls have actually been right . . . None of that, clearly, was enough for Clarium to make money on its trades: the fund was undone by volatility and weakness in risk management. There are a few lessons to learn here. Firstly, just because someone is a Silicon Valley gazillionaire, or any kind of successful entrepreneur for that matter, doesn’t mean they should be trusted with oth
4 0.97036099 1154 andrew gelman stats-2012-02-04-“Turn a Boring Bar Graph into a 3D Masterpiece”
Introduction: Jimmy sends in this. Steps include “Make whimsical sparkles by drawing an ellipse using the Ellipse Tool,” “Rotate the sparkles . . . Give some sparkles less Opacity by using the Transparency Palette,” and “Add a haze around each sparkle by drawing a white ellipse using the Ellipse Tool.” The punchline: Now, the next time you need to include a boring graph in one of your designs you’ll be able to add some extra emphasis and get people to really pay attention to those numbers! P.S. to all the commenters: Yeah, yeah, do your contrarian best and tell me why chartjunk is actually a good thing, how I’m just a snob, etc etc.
5 0.96997821 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead
Introduction: Christian Robert posts these thoughts: I [Ross Ihaka] have been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too. Some of these were inherited from S and some are peculiar to R. One of the worst problems is scoping. Consider the following little gem. f = function() { if (runif(1) > .5) x = 10; x } The x being returned by this function is randomly local or global. There are other examples where variables alternate between local and non-local throughout the body of a function. No sensible language would allow this. It’s ugly and it makes optimisation really difficult. This isn’t the only problem, even weirder things happen because of interactions between scoping and lazy evaluation. In light of this, I [Ihaka] have come to the c
8 0.95950252 906 andrew gelman stats-2011-09-14-Another day, another stats postdoc
9 0.95497751 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope
10 0.95248705 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models
12 0.94883132 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999
14 0.94785613 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy
15 0.94363356 2354 andrew gelman stats-2014-05-30-Mmm, statistical significance . . . Evilicious!
16 0.94308943 148 andrew gelman stats-2010-07-15-“Gender Bias Still Exists in Modern Children’s Literature, Say Centre Researchers”
17 0.94287843 1917 andrew gelman stats-2013-06-28-Econ coauthorship update
18 0.94125414 738 andrew gelman stats-2011-05-30-Works well versus well understood
19 0.94121361 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients