697 andrew gelman stats-2011-05-05-A statistician rereads Bill James


Meta information for this blog post

Source: html

Introduction: Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about. Here’s how it begins: I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I first sent him this item on the differences between baseball and politics but he said it was too political for them. [sent-2, score-0.415]

2 I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. [sent-3, score-0.188]

3 Then I sent him some reflections on the great Bill James and he published it! [sent-4, score-0.101]

4 If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about. [sent-5, score-0.087]

5 Here’s how it begins: I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. [sent-6, score-0.37]

6 ” Unfortunately, John McNamara didn’t hear us, and the rest was history. [sent-8, score-0.081]

7 In statistics, I like to say that each substantive hypothesis deserves its own analysis: it’s generally hopeless to expect that you can run a single regression and pull off the answers to each of your research questions, one coefficient at a time. [sent-11, score-0.171]

8 - Controlled comparisons: Instead of comparing simple aggregates, be more careful and make comparisons on pairs or groups of similar players or teams. [sent-12, score-0.183]

9 As economists Rajeev Dehejia and Sadek Wahba demonstrated in a pair of influential articles (they have been cited over 2400 times since their publication a decade ago), these comparisons work only when you are controlling for appropriate characteristics. [sent-13, score-0.266]

10 In the case of Bill James’s analysis, player age is typically a key comparison variable. [sent-14, score-0.111]

11 From the standpoint of applied statistics, controlled comparisons combine the averaging that you get from having a moderate or large sample size with the insight that comes from understanding individual cases. [sent-15, score-0.299]

12 - Conceptual models used as guides to comparisons: James has written many times that he does not study statistical questions, he studies baseball questions. [sent-16, score-0.564]

13 A conceptual model such as the defensive spectrum, or the narrowing of abilities, or the contribution of speed to both offense and defense, drives the direction of the study and motivates many of the details of the analysis. [sent-18, score-0.297]

14 I have tried to follow these principles in my own work. [sent-19, score-0.162]

15 One central method of statistics that Bill James does not draw upon very often (if at all) is fitting parametric models. [sent-20, score-0.172]

16 For example, James found that the power two in the Pythagorean prediction for wins worked pretty well. [sent-21, score-0.072]

17 He didn’t try to estimate the power from data, nor did he, for example, try to come up with a conclusion such as, “each additional run is worth 0. [sent-22, score-0.357]

18 ” On the rare occasions that he did estimate a parameter (for example, the relative values of stolen bases and times caught stealing), he buried his methodology and had no interest in making a big deal about the estimation. [sent-24, score-0.613]

19 Why didn’t Bill James follow the example of Pete Palmer and others and try to estimate the relative values of walks, singles, doubles, and other outcomes? [sent-26, score-0.338]
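The "power two in the Pythagorean prediction for wins" mentioned above can be made concrete with a short sketch. It assumes the commonly cited form of the formula, in which a team's expected winning fraction is RS^p / (RS^p + RA^p) with p = 2. The function name, the parameter names, and the run totals below are hypothetical and do not come from the post.

```python
def pythagorean_win_fraction(runs_scored, runs_allowed, power=2.0):
    """Expected winning fraction under the Pythagorean formula.

    Assumed form (not quoted from the post):
        RS**power / (RS**power + RA**power)
    with power=2 as the fixed exponent rather than one estimated from data.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    scored = runs_scored ** power
    allowed = runs_allowed ** power
    if scored + allowed == 0:
        raise ValueError("at least one run total must be positive")
    return scored / (scored + allowed)


if __name__ == "__main__":
    # Hypothetical season totals, chosen only for illustration.
    fraction = pythagorean_win_fraction(800, 700)
    print(f"expected winning fraction: {fraction:.3f}")
    print(f"expected wins over 162 games: {162 * fraction:.1f}")
```

Leaving `power` as an argument makes the contrast in the text visible: the exponent is simply set to two here, whereas a parametric approach would treat it as a quantity to estimate from season data.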


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('james', 0.37), ('baseball', 0.314), ('bill', 0.24), ('comparisons', 0.183), ('abilities', 0.122), ('controlled', 0.116), ('conceptual', 0.113), ('player', 0.111), ('try', 0.102), ('sent', 0.101), ('study', 0.097), ('single', 0.096), ('statistics', 0.091), ('graduate', 0.091), ('principles', 0.088), ('questions', 0.087), ('singles', 0.087), ('dehejia', 0.087), ('fielders', 0.087), ('pythagorean', 0.087), ('narrowing', 0.087), ('prospectus', 0.087), ('relative', 0.083), ('times', 0.083), ('wahba', 0.082), ('aggregates', 0.082), ('doubles', 0.082), ('pluralism', 0.082), ('rajeev', 0.082), ('fitting', 0.081), ('estimate', 0.081), ('rest', 0.081), ('palmer', 0.078), ('buried', 0.078), ('opportunity', 0.078), ('screaming', 0.075), ('basement', 0.075), ('hopeless', 0.075), ('occasions', 0.075), ('tried', 0.074), ('walks', 0.073), ('power', 0.072), ('values', 0.072), ('pete', 0.071), ('stealing', 0.071), ('stolen', 0.071), ('didn', 0.07), ('bases', 0.07), ('guides', 0.07), ('statistician', 0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999994 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James

2 0.37125444 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

Introduction: During our discussion of estimates of teacher performance, Steve Sailer wrote : I suspect we’re going to take years to work the kinks out of overall rating systems. By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr.. James held off until 1999 before unveiling his win share model for overall rankings. I remember looking at Pete Palmer’s book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good–it’s presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples a

3 0.29236126 1419 andrew gelman stats-2012-07-17-“Faith means belief in something concerning which doubt is theoretically possible.” — William James

Introduction: Eric Tassone writes: Probably not blog-worthy/blog-appropriate, but have you heard Bill James discussing the Sandusky & Paterno stuff? I think you discussed once his stance on the Dowd Report, and this seems to be from the same part of his personality—which goes beyond contrarian . . . I have in fact blogged on James ( many times ) and on Paterno , so yes I think this is blogworthy. On the other hand, most readers of this blog probably don’t care about baseball, football, or William James, so I’ll put the rest below the fold. What is legendary baseball statistician Bill James doing, defending the crime-coverups of legendary coach Joe Paterno? As I wrote in my earlier blog on Paterno, it isn’t always easy to do the right thing, and I have no idea if I’d behave any better if I were in such a situation. The characteristics of a good coach do not necessarily provide what it takes to make good decisions off the field. In this sense even more of the blame should go

4 0.22063567 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders

Introduction: Someone just stopped by and dropped off a copy of the book Wizardry: Baseball’s All-time Greatest Fielders Revealed, by Michael Humphreys. I don’t have much to say about the topic–I did see Brooks Robinson play, but I don’t remember any fancy plays. I must have seen Mark Belanger but I don’t really recall. Ozzie Smith was cool but I saw only him on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he still was able to do it. But that wasn’t fielding either. Anyway, Humphreys was nice enough to give me a copy of his book, and since I can’t say much (I didn’t have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away. (Note: Humphreys replies to some of these questions in a comment .) 1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I’ve always b

5 0.21659857 440 andrew gelman stats-2010-12-01-In defense of jargon

Introduction: Daniel Drezner takes on Bill James.

6 0.2077992 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy

7 0.19488072 611 andrew gelman stats-2011-03-14-As the saying goes, when they argue that you’re taking over, that’s when you know you’ve won

8 0.19380066 652 andrew gelman stats-2011-04-07-Minor-league Stats Predict Major-league Performance, Sarah Palin, and Some Differences Between Baseball and Politics

9 0.19364335 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!

10 0.18392868 2116 andrew gelman stats-2013-11-28-“Statistics is what people think math is”

11 0.15556006 2226 andrew gelman stats-2014-02-26-Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

12 0.15313503 499 andrew gelman stats-2011-01-03-5 books

13 0.13801382 173 andrew gelman stats-2010-07-31-Editing and clutch hitting

14 0.13760506 367 andrew gelman stats-2010-10-25-In today’s economy, the rich get richer

15 0.13758206 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery

16 0.13527273 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

17 0.12594542 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

18 0.12255508 355 andrew gelman stats-2010-10-20-Andy vs. the Ideal Point Model of Voting

19 0.11009754 1903 andrew gelman stats-2013-06-17-Weak identification provides partial information

20 0.10806577 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.235), (1, -0.016), (2, -0.013), (3, -0.012), (4, 0.033), (5, 0.038), (6, -0.005), (7, 0.056), (8, 0.04), (9, 0.031), (10, 0.035), (11, 0.021), (12, 0.021), (13, -0.036), (14, 0.003), (15, -0.003), (16, -0.015), (17, -0.001), (18, 0.074), (19, -0.094), (20, -0.034), (21, -0.013), (22, 0.033), (23, 0.085), (24, 0.044), (25, 0.087), (26, -0.06), (27, -0.05), (28, -0.015), (29, -0.238), (30, -0.024), (31, 0.023), (32, 0.103), (33, 0.004), (34, -0.088), (35, 0.094), (36, 0.069), (37, 0.012), (38, 0.001), (39, -0.049), (40, 0.182), (41, 0.143), (42, -0.094), (43, -0.006), (44, -0.006), (45, 0.001), (46, -0.059), (47, -0.05), (48, -0.06), (49, -0.015)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97467983 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James

2 0.87240499 1419 andrew gelman stats-2012-07-17-“Faith means belief in something concerning which doubt is theoretically possible.” — William James

Introduction: Eric Tassone writes: Probably not blog-worthy/blog-appropriate, but have you heard Bill James discussing the Sandusky & Paterno stuff? I think you discussed once his stance on the Dowd Report, and this seems to be from the same part of his personality—which goes beyond contrarian . . . I have in fact blogged on James ( many times ) and on Paterno , so yes I think this is blogworthy. On the other hand, most readers of this blog probably don’t care about baseball, football, or William James, so I’ll put the rest below the fold. What is legendary baseball statistician Bill James doing, defending the crime-coverups of legendary coach Joe Paterno? As I wrote in my earlier blog on Paterno, it isn’t always easy to do the right thing, and I have no idea if I’d behave any better if I were in such a situation. The characteristics of a good coach do not necessarily provide what it takes to make good decisions off the field. In this sense even more of the blame should go

3 0.86213464 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy

Introduction: I was recently rereading and enjoying Bill James’s Historical Baseball Abstract (the second edition, from 2001). But even the Master is not perfect. Here he is, in the context of the all-time 20th-greatest shortstop (in his reckoning): Are athletes special people? In general, no, but occasionally, yes. Johnny Pesky at 75 was trim, youthful, optimistic, and practically exploding with energy. You rarely meet anybody like that who isn’t an ex-athlete–and that makes athletes seem special. [italics in the original] Hey, I’ve met 75-year-olds like that–and none of them are ex-athletes! That’s probably because I don’t know a lot of ex-athletes. But Bill James . . . he knows a lot of athletes. He went to the bathroom with Tim Raines once! The most I can say is that I saw Rickey Henderson steal a couple bases when he was playing against the Orioles once. Cognitive psychologists talk about the base-rate fallacy , which is the mistake of estimating probabilities without accou

4 0.8612116 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

Introduction: During our discussion of estimates of teacher performance, Steve Sailer wrote : I suspect we’re going to take years to work the kinks out of overall rating systems. By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr.. James held off until 1999 before unveiling his win share model for overall rankings. I remember looking at Pete Palmer’s book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good–it’s presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples a

5 0.85233688 440 andrew gelman stats-2010-12-01-In defense of jargon

Introduction: Daniel Drezner takes on Bill James.

6 0.77279681 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders

7 0.77119392 1113 andrew gelman stats-2012-01-11-Toshiro Kageyama on professionalism

8 0.76260209 367 andrew gelman stats-2010-10-25-In today’s economy, the rich get richer

9 0.76104558 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!

10 0.75726986 173 andrew gelman stats-2010-07-31-Editing and clutch hitting

11 0.70479864 942 andrew gelman stats-2011-10-04-45% hitting, 25% fielding, 25% pitching, and 100% not telling us how they did it

12 0.67320454 499 andrew gelman stats-2011-01-03-5 books

13 0.65918881 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery

14 0.65724075 2116 andrew gelman stats-2013-11-28-“Statistics is what people think math is”

15 0.64626801 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

16 0.64224935 1219 andrew gelman stats-2012-03-18-Tips on “great design” from . . . Microsoft!

17 0.63259995 611 andrew gelman stats-2011-03-14-As the saying goes, when they argue that you’re taking over, that’s when you know you’ve won

18 0.62936878 652 andrew gelman stats-2011-04-07-Minor-league Stats Predict Major-league Performance, Sarah Palin, and Some Differences Between Baseball and Politics

19 0.62327915 355 andrew gelman stats-2010-10-20-Andy vs. the Ideal Point Model of Voting

20 0.61987811 949 andrew gelman stats-2011-10-10-Grrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.121), (9, 0.01), (16, 0.101), (21, 0.013), (24, 0.117), (50, 0.013), (54, 0.024), (55, 0.014), (63, 0.019), (82, 0.024), (86, 0.028), (89, 0.043), (99, 0.348)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9802407 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James

2 0.97449565 973 andrew gelman stats-2011-10-26-Antman again courts controversy

Introduction: Commenter Zbicyclist links to a fun article by Howard French on biologist E. O. Wilson: Wilson announced that his new book may be his last. It is not limited to the discussion of evolutionary biology, but ranges provocatively through the humanities, as well. . . . Generation after generation of students have suffered trying to “puzzle out” what great thinkers like Socrates, Plato, and Descartes had to say on the great questions of man’s nature, Wilson said, but this was of little use, because philosophy has been based on “failed models of the brain.” This reminds me of my recent remarks on the use of crude folk-psychology models as microfoundations for social sciences. The article also discusses Wilson’s recent crusade against selfish-gene-style simplifications of human and animal nature. I’m with Wilson 100% on this one. “Two brothers or eight cousins” is a cute line but it doesn’t seem to come close to describing how species or societies work, and it’s always seemed a

3 0.97423887 525 andrew gelman stats-2011-01-19-Thiel update

Introduction: A year or so ago I discussed the reasoning of zillionaire financier Peter Thiel, who seems to believe his own hype and, worse, seems to be able to convince reporters of his infallibility as well. Apparently he “possesses a preternatural ability to spot patterns that others miss.” More recently, Felix Salmon commented on Thiel’s financial misadventures: Peter Thiel’s hedge fund, Clarium Capital, ain’t doing so well. Its assets under management are down 90% from their peak, and total returns from the high point are -65%. Thiel is smart, successful, rich, well-connected, and on top of all that his calls have actually been right . . . None of that, clearly, was enough for Clarium to make money on its trades: the fund was undone by volatility and weakness in risk management. There are a few lessons to learn here. Firstly, just because someone is a Silicon Valley gazillionaire, or any kind of successful entrepreneur for that matter, doesn’t mean they should be trusted with oth

4 0.97036099 1154 andrew gelman stats-2012-02-04-“Turn a Boring Bar Graph into a 3D Masterpiece”

Introduction: Jimmy sends in this . Steps include “Make whimsical sparkles by drawing an ellipse using the Ellipse Tool,” “Rotate the sparkles . . . Give some sparkles less Opacity by using the Transparency Palette,” and “Add a haze around each sparkle by drawing a white ellipse using the Ellipse Tool.” The punchline: Now, the next time you need to include a boring graph in one of your designs you’ll be able to add some extra emphasis and get people to really pay attention to those numbers! P.S. to all the commenters: Yeah, yeah, do your contrarian best and tell me why chartjunk is actually a good thing, how I’m just a snob, etc etc.

5 0.96997821 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead

Introduction: Christian Robert posts these thoughts : I [Ross Ihaka] have been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too. Some of these were inherited from S and some are peculiar to R. One of the worst problems is scoping. Consider the following little gem. f =function() { if (runif(1) > .5) x = 10 x } The x being returned by this function is randomly local or global. There are other examples where variables alternate between local and non-local throughout the body of a function. No sensible language would allow this. It’s ugly and it makes optimisation really difficult. This isn’t the only problem, even weirder things happen because of interactions between scoping and lazy evaluation. In light of this, I [Ihaka] have come to the c

6 0.96500504 664 andrew gelman stats-2011-04-16-Dilbert update: cartooning can give you the strength to open jars with your bare hands

7 0.96145171 1665 andrew gelman stats-2013-01-10-That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy

8 0.95950252 906 andrew gelman stats-2011-09-14-Another day, another stats postdoc

9 0.95497751 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

10 0.95248705 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

11 0.95015037 1419 andrew gelman stats-2012-07-17-“Faith means belief in something concerning which doubt is theoretically possible.” — William James

12 0.94883132 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

13 0.94811696 2030 andrew gelman stats-2013-09-19-Is coffee a killer? I don’t think the effect is as high as was estimated from the highest number that came out of a noisy study

14 0.94785613 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy

15 0.94363356 2354 andrew gelman stats-2014-05-30-Mmm, statistical significance . . . Evilicious!

16 0.94308943 148 andrew gelman stats-2010-07-15-“Gender Bias Still Exists in Modern Children’s Literature, Say Centre Researchers”

17 0.94287843 1917 andrew gelman stats-2013-06-28-Econ coauthorship update

18 0.94125414 738 andrew gelman stats-2011-05-30-Works well versus well understood

19 0.94121361 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

20 0.94095683 657 andrew gelman stats-2011-04-11-Note to Dilbert: The difference between Charlie Sheen and Superman is that the Man of Steel protected Lois Lane, he didn’t bruise her