andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-137 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Freakonomics reports: A reader in Norway named Christian Sørensen examined the height statistics for all players in the 2010 World Cup and found an interesting anomaly: there seemed to be unnaturally few players listed at 169, 179, and 189 centimeters and an apparent surplus of players who were 170, 180, and 190 centimeters tall (roughly 5-foot-7 inches, 5-foot-11 inches, and 6-foot-3 inches, respectively). Here’s the data: It’s not costless to communicate numbers. When we compare “eighty” (6 characters) vs “seventy-nine” (12 characters) – how much information are we gaining by twice the number of characters? Do people really care about height at ±0.5 cm or is ±1 cm enough? It’s harder to communicate odd numbers (“three” vs “four” or “two”, “seven” vs “six” or “eight”, “nine” vs “ten”) than even ones. As language tends to follow our behaviors, people have been doing it for a long time. We remember the shorter description of a quantity. This is my theory why we end up wi
sentIndex sentText sentNum sentScore
1 Here’s the data: It’s not costless to communicate numbers. [sent-2, score-0.283]
2 When we compare “eighty” (6 characters) vs “seventy-nine” (12 characters) – how much information are we gaining by twice the number of characters? [sent-3, score-0.457]
3 It’s harder to communicate odd numbers (“three” vs “four” or “two”, “seven” vs “six” or “eight”, “nine” vs “ten”) than even ones. [sent-6, score-1.098]
4 As language tends to follow our behaviors, people have been doing it for a long time. [sent-7, score-0.067]
5 This is my theory why we end up with more rounded numbers. [sent-9, score-0.094]
6 This is also partially why Benford’s law holds: we change the scales and measurement units as to enable us to store the numbers in our minds more economically. [sent-10, score-0.487]
7 Compare “ninety-nine” (11 characters) with “hundred” (7c), or “nine hundred ninety-nine” (24) with “thousand” (8c). [sent-11, score-0.136]
8 The fact that I said 100 implies that there is a certain amount of uncertainty in my estimate. [sent-14, score-0.113]
9 I could have written it as 1e2, implying that the real quantity is somewhere between 50 and 150. [sent-15, score-0.567]
10 If I said 102, I’d be implying that the real quantity is between 101 and 103. [sent-16, score-0.68]
11 If I said 103, I’d be implying that the real quantity is between 102. [sent-17, score-0.68]
12 If I said 50, the real quantity is probably between 40 and 60. [sent-20, score-0.474]
13 This way, by rounding up, I have been both economical in my expression but also been able to honestly communicate my standard error. [sent-21, score-0.553]
14 Eventually, increased accuracy is not always worth the increased cost of communication and memorization. [sent-22, score-0.206]
15 So, do you still think World Cup players are being self-aggrandizing, or are they perhaps just economical or even conscious of standard errors? [sent-23, score-0.537]
16 [D+1: Hal Varian points to number clustering in asset markets. [sent-24, score-0.167]
17 Also thanks to Janne, who helped improve the above presentation. [sent-25, score-0.065]
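The sentences above argue that a rounded number carries its own error bar: "102" suggests 101 to 103, while "50" suggests 40 to 60. A minimal Python sketch of that reading follows. It assumes the implied uncertainty is one unit of the last nonzero digit's place; the function names and the `width` parameter are hypothetical and do not come from the post. The post's "1e2" example (50 to 150) corresponds to half a unit instead, so `width` is left adjustable.

```python
def last_significant_place(n: int) -> int:
    """Return the place value (1, 10, 100, ...) of the last nonzero digit of n."""
    n = abs(n)
    if n == 0:
        return 1  # assumption: treat 0 as exact to the units place
    place = 1
    while n % 10 == 0:
        n //= 10
        place *= 10
    return place


def implied_interval(n: int, width: float = 1.0) -> tuple:
    """Interval a reader might infer from how n is rounded.

    width is the number of last-significant-digit units on each side
    (an illustrative assumption, not a rule stated in the post).
    """
    step = width * last_significant_place(n)
    return (n - step, n + step)


print(implied_interval(102))       # one unit of the ones place
print(implied_interval(50))        # one unit of the tens place
print(implied_interval(100, 0.5))  # half a unit of the hundreds place
```

Under these assumptions the three calls reproduce the intervals quoted in the post: 101 to 103, 40 to 60, and 50 to 150.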
wordName wordTfidf (topN-words)
[('vs', 0.282), ('characters', 0.261), ('inches', 0.253), ('quantity', 0.248), ('players', 0.231), ('centimeters', 0.219), ('economical', 0.219), ('implying', 0.206), ('cm', 0.188), ('communicate', 0.183), ('cup', 0.164), ('nine', 0.143), ('hundred', 0.136), ('height', 0.127), ('said', 0.113), ('real', 0.113), ('increased', 0.103), ('costless', 0.1), ('janne', 0.1), ('surplus', 0.1), ('benford', 0.094), ('rounded', 0.094), ('compare', 0.091), ('asset', 0.09), ('varian', 0.09), ('norway', 0.087), ('conscious', 0.087), ('gaining', 0.084), ('anomaly', 0.084), ('honestly', 0.077), ('clustering', 0.077), ('respectively', 0.076), ('rounding', 0.074), ('seven', 0.073), ('scales', 0.073), ('tall', 0.073), ('enable', 0.072), ('hal', 0.072), ('behaviors', 0.072), ('eight', 0.072), ('shorter', 0.072), ('minds', 0.07), ('numbers', 0.069), ('partially', 0.069), ('store', 0.068), ('examined', 0.067), ('tends', 0.067), ('units', 0.066), ('helped', 0.065), ('apparent', 0.064)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
2 0.10959528 646 andrew gelman stats-2011-04-04-Graphical insights into the safety of cycling.
Introduction: This article by Thomas Crag, at Copenhagenize, is marred by reliance on old data, but it’s so full of informative graphical displays — most of them not made by the author, I think — that it’s hard to pick just one. But here ya go. This figure shows fatalities (among cyclists) versus distance cycled, with a point for each year…unfortunately ending way back in 1998, but still: This is a good alternative to the more common choice for this sort of plot, which would be overlaying curves of fatalities vs time and distance cycled vs time. The article also explicitly discusses the fact, previously discussed on this blog, that it’s misleading, to the point of being wrong in most contexts, to compare the safety of walking vs cycling vs driving by looking at the casualty or fatality rate per kilometer. Often, as in this article, the question of interest is something like, if more people switched from driving to cycling, how many more or fewer people would die? Obviously, if peo
3 0.102428 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox
Introduction: XL sent me this paper , “A Fruitful Resolution to Simpson’s Paradox via Multi-Resolution Inference.” I told Keli and Xiao-Li that I wasn’t sure I fully understood the paper—as usual, XL is subtle and sophisticated, also I only get about half of his jokes—but I sent along these thoughts: 1. I do not think counterfactuals or potential outcomes are necessary for Simpson’s paradox. I say this because one can set up Simpson’s paradox with variables that cannot be manipulated, or for which manipulations are not directly of interest. 2. Simpson’s paradox is part of a more general issue that regression coefs change if you add more predictors, the flipping of sign is not really necessary. Here’s an example that I use in my teaching that illustrates both points: I can run a regression predicting income from sex and height. I find that the coef of sex is $10,000 (i.e., comparing a man and woman of the same height, on average the man will make $10,000 more) and the coefficient of h
4 0.1020122 1381 andrew gelman stats-2012-06-16-The Art of Fielding
Introduction: I liked it; the reviews were well-deserved. It indeed is a cross between The Mysteries of Pittsburgh and The Universal Baseball Association, J. Henry Waugh, Prop. What struck me most, though, was the contrast with Indecision, the novel by Harbach’s associate, Benjamin Kunkel. As I noted a few years ago , Indecision was notable in that all the characters had agency. That is, each character had his or her own ideas and seemed to act on his or her own ideas, rather than merely carrying the plot along or providing scenery. In contrast, the most gripping drama in The Art of Fielding seem to be characters’ struggling with their plot-determined roles (hence the connection with Coover’s God-soaked baseball classic). Also notable to me was that the college-aged characters not being particularly obsessed with sex—I guess this is that easy-going hook-up culture I keep reading about—while at the same time, just about all the characters seem to be involved in serious drug addiction. I’ve re
Introduction: I just want to share with you the best comment we’ve ever had in the nearly ten-year history of this blog. Also it has statistical content! Here’s the story. After seeing an amusing article by Tom Scocca relating how reporter Jon Lee Anderson called someone a “little twerp” on twitter: I conjectured that Anderson suffered from “tall person syndrome,” that problem that some people of above-average height have, that they think they’re more important than other people because they literally look down on them. But I had no idea of Anderson’s actual height. Commenter Gary responded with this impressive bit of investigative reporting: Based on this picture: he appears to be fairly tall. But the perspective makes it hard to judge. Based on this picture: he appears to be about 9-10 inches taller than Catalina Garcia. But how tall is Catalina Garcia? Not that tall – she’s shorter than the high-wire artist Phillipe Petit: And he doesn’t appear
6 0.087353826 1945 andrew gelman stats-2013-07-18-“How big is your chance of dying in an ordinary play?”
7 0.084012225 2250 andrew gelman stats-2014-03-16-“I have no idea who Catalina Garcia is, but she makes a decent ruler”
9 0.079038218 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics
10 0.07786122 148 andrew gelman stats-2010-07-15-“Gender Bias Still Exists in Modern Children’s Literature, Say Centre Researchers”
11 0.074005775 1895 andrew gelman stats-2013-06-12-Peter Thiel is writing another book!
12 0.073669016 174 andrew gelman stats-2010-08-01-Literature and life
13 0.073142946 2251 andrew gelman stats-2014-03-17-In the best alternative histories, the real world is what’s ultimately real
14 0.071750171 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?
15 0.06957005 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances
17 0.061102837 1467 andrew gelman stats-2012-08-23-The pinch-hitter syndrome again
18 0.060349148 954 andrew gelman stats-2011-10-12-Benford’s Law suggests lots of financial fraud
19 0.059559416 424 andrew gelman stats-2010-11-21-Data cleaning tool!
20 0.058490939 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery
topicId topicWeight
[(0, 0.102), (1, -0.028), (2, 0.011), (3, -0.003), (4, 0.015), (5, -0.025), (6, 0.024), (7, 0.013), (8, 0.017), (9, -0.012), (10, -0.025), (11, -0.024), (12, 0.003), (13, -0.009), (14, -0.021), (15, 0.017), (16, -0.002), (17, -0.007), (18, 0.04), (19, -0.012), (20, -0.011), (21, 0.008), (22, 0.007), (23, 0.042), (24, 0.007), (25, 0.003), (26, -0.017), (27, 0.01), (28, -0.006), (29, -0.016), (30, 0.014), (31, 0.026), (32, -0.01), (33, -0.0), (34, 0.017), (35, 0.005), (36, -0.008), (37, 0.028), (38, 0.015), (39, -0.027), (40, 0.01), (41, -0.019), (42, -0.021), (43, 0.009), (44, -0.034), (45, -0.003), (46, -0.021), (47, -0.015), (48, 0.019), (49, -0.019)]
simIndex simValue blogId blogTitle
same-blog 1 0.9616226 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
2 0.74717206 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics
Introduction: I was at a talk awhile ago where the speaker presented tables with 4, 5, 6, even 8 significant digits even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit of rounding would seem to be required. I mentioned this to a colleague, who responded: I don’t know how to stop this practice. Logic doesn’t work. Maybe ridicule? Best hope is the departure from field who do it. (Theories don’t die, but the people who follow those theories retire.) Another possibility, I think, is helpful software defaults. If we can get to the people who write the software, maybe we could have some impact. Once the software is written, however, it’s probably too late. I’m not far from the center of the R universe, but I don’t know if I’ll ever succeed in my goals of increasing the default number of histogram bars or reducing the default number of decimal places in regression
3 0.73478156 168 andrew gelman stats-2010-07-28-Colorless green, and clueless
Introduction: Faithful readers will know that my ideal alternative career is to be an editor in the Max Perkins mold. If not that, I think I’d enjoy being a literary essayist, someone like Alfred Kazin or Edmund Wilson or Louis Menand, who could write about my favorite authors and books in a forum where others would read and discuss what I wrote. I could occasionally collect my articles into books, and so on. On the other hand, if I actually had such a career, I wouldn’t have much of an option to do statistical research in my spare time, so I think for my own broader goals, I’ve gotten hold of the right side of the stick. As it is, I enjoy writing about literary matters but it never quite seems worth spending the time to do it right. (And, stepping outside myself, I realize that I have a lot more to offer the world as a statistician than literary critic. Criticism is like musicianship–it can be hard to do, and it’s impressive when done well, but a lot of people can do it. Literary criticism
4 0.72863013 563 andrew gelman stats-2011-02-07-Evaluating predictions of political events
Introduction: Mike Cohen writes: The recent events in Egypt raise an interesting statistical question. It is of course common for news stations like CNN to interview various officials and policy experts to find out what is likely to happen next. The obvious response of people like us is why ask such people when they didn’t foresee a month ago that these dynamic events were about to happen. One would instead like to hear from those experts that did predict that something was about to happen in Tunisia, and Egypt, and Jordan, and maybe Yemen, etc. Well, are there such people? My friend Bob Burton says that of course one can find such people in the sense that they made such predictions, but that is like finding counties that have voted for the President in the last five elections, big deal, or psychics that predicted the last assassination, again big deal. There is a good deal of truth in that. However, it seems like we do a little better. There are two points to make. First, there is an i
5 0.72002828 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense
Introduction: This post is by Phil Price. An Oregon legislator, Mitch Greenlick, has proposed to make it illegal in Oregon to carry a child under six years old on one’s bike (including in a child seat) or in a bike trailer. The guy says “”We’ve just done a study showing that 30 percent of riders biking to work at least three days a week have some sort of crash that leads to an injury… When that’s going on out there, what happens when you have a four year old on the back of a bike?” The study is from Oregon Health Sciences University, at which the legislator is a professor. Greenlick also says “”If it’s true that it’s unsafe, we have an obligation to protect people. If I thought a law would save one child’s life, I would step in and do it. Wouldn’t you?” There are two statistical issues here. The first is in the category of “lies, damn lies, and statistics,” and involves the statement about how many riders have injuries. As quoted on a blog , the author of the study in question says th
6 0.71992147 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?
8 0.71776652 582 andrew gelman stats-2011-02-20-Statisticians vs. everybody else
10 0.71465373 949 andrew gelman stats-2011-10-10-Grrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
11 0.71302229 2352 andrew gelman stats-2014-05-29-When you believe in things that you don’t understand
12 0.70990872 2341 andrew gelman stats-2014-05-20-plus ça change, plus c’est la même chose
14 0.70960397 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?
16 0.70054394 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.
17 0.70052832 1707 andrew gelman stats-2013-02-05-Glenn Hubbard and I were on opposite sides of a court case and I didn’t even know it!
18 0.69833636 1759 andrew gelman stats-2013-03-12-How tall is Jon Lee Anderson?
19 0.69569653 1083 andrew gelman stats-2011-12-26-The quals and the quants
20 0.69546902 487 andrew gelman stats-2010-12-27-Alfred Kahn
topicId topicWeight
[(6, 0.015), (9, 0.016), (13, 0.01), (15, 0.017), (16, 0.037), (21, 0.014), (24, 0.089), (44, 0.011), (55, 0.023), (67, 0.029), (79, 0.037), (80, 0.143), (81, 0.073), (82, 0.013), (89, 0.031), (95, 0.04), (96, 0.029), (99, 0.262)]
simIndex simValue blogId blogTitle
same-blog 1 0.94052011 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
2 0.92837256 730 andrew gelman stats-2011-05-25-Rechecking the census
Introduction: Sam Roberts writes : The Census Bureau [reported] that though New York City’s population reached a record high of 8,175,133 in 2010, the gain of 2 percent, or 166,855 people, since 2000 fell about 200,000 short of what the bureau itself had estimated. Public officials were incredulous that a city that lures tens of thousands of immigrants each year and where a forest of new buildings has sprouted could really have recorded such a puny increase. How, they wondered, could Queens have grown by only one-tenth of 1 percent since 2000? How, even with a surge in foreclosures, could the number of vacant apartments have soared by nearly 60 percent in Queens and by 66 percent in Brooklyn? That does seem a bit suspicious. So the newspaper did its own survey: Now, a house-to-house New York Times survey of three representative square blocks where the Census Bureau said vacancies had increased and the population had declined since 2000 suggests that the city’s outrage is somewhat ju
3 0.91645157 1029 andrew gelman stats-2011-11-26-“To Rethink Sprawl, Start With Offices”
Introduction: According to this op-ed by Louise Mozingo, the fashion for suburban corporate parks is seventy years old: In 1942 the AT&T Bell Telephone Laboratories moved from its offices in Lower Manhattan to a new, custom-designed facility on 213 acres outside Summit, N.J. The location provided space for laboratories and quiet for acoustical research, and new features: parking lots that allowed scientists and engineers to drive from their nearby suburban homes, a spacious cafeteria and lounge and, most surprisingly, views from every window of a carefully tended pastoral landscape designed by the Olmsted brothers, sons of the designer of Central Park. Corporate management never saw the city center in the same way again. Bell Labs initiated a tide of migration of white-collar workers, especially as state and federal governments conveniently extended highways into the rural edge. Just to throw some Richard Florida in the mix: Back in 1990, I turned down a job offer from Bell Labs, larg
4 0.91099524 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates
Introduction: Suppose you and I agree on a probability estimate…perhaps we both agree there is a 2/3 chance Spain will beat Netherlands in tomorrow’s World Cup. In this case, we could agree on a wager: if Spain beats Netherlands, I pay you $x. If Netherlands beats Spain, you pay me $2x. It is easy to see that my expected loss (or win) is $0, and that the same is true for you. Either of us should be indifferent to taking this bet, and to which side of the bet we are on. We might make this bet just to increase our interest in watching the game, but neither of us would see a money-making opportunity here. By the way, the relationship between “odds” and the event probability — a 1/3 chance of winning turning into a bet at 2:1 odds — is that if the event probability is p, then a fair bet has odds of (1/p – 1):1. More interesting, and more relevant to many real-world situations, is the case that we disagree on the probability of an event. If we disagree on the probability, then there should be
Introduction: The title of this blog post quotes the second line of the abstract of Goldstein et al.’s much ballyhooed 2008 tech report, Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings . The first sentence of the abstract is Individuals who are unaware of the price do not derive more enjoyment from more expensive wine. Perhaps not surprisingly, given the easy target wine snobs make, the popular press has picked up on the first sentence of the tech report. For example, the Freakonomics blog/radio entry of the same name quotes the first line, ignores the qualification, then concludes Wishing you the happiest of holiday seasons, and urging you to spend $15 instead of $50 on your next bottle of wine. Go ahead, take the money you save and blow it on the lottery. In case you’re wondering about whether to buy me a cheap or expensive bottle of wine, keep in mind I’ve had classical “wine training”. After ten minutes of training with some side by
6 0.90076786 1747 andrew gelman stats-2013-03-03-More research on the role of puzzles in processing data graphics
7 0.89052248 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency
8 0.88899618 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend
9 0.88227797 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy
11 0.86713195 937 andrew gelman stats-2011-10-02-That advice not to work so hard
12 0.86447001 1033 andrew gelman stats-2011-11-28-Greece to head statistician: Tell the truth, go to jail
13 0.86363256 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe
14 0.86315173 129 andrew gelman stats-2010-07-05-Unrelated to all else
15 0.86072093 1222 andrew gelman stats-2012-03-20-5 books book
16 0.85949683 1962 andrew gelman stats-2013-07-30-The Roy causal model?
17 0.8593328 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?
18 0.85893506 461 andrew gelman stats-2010-12-09-“‘Why work?’”
19 0.85849392 556 andrew gelman stats-2011-02-04-Patterns
20 0.85646796 484 andrew gelman stats-2010-12-24-Foreign language skills as an intrinsic good; also, beware the tyranny of measurement