andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1025 knowledge-graph by maker-knowledge-mining

1025 andrew gelman stats-2011-11-24-Always check your evidence


meta info for this blog

Source: html

Introduction: Logical reasoning typically takes the following form: 1. I know that A is true. 2. I know that A implies B. 3. Therefore, I can conclude that B is true. I, like Lewis Carroll, have problems with this process sometimes, but it’s pretty standard. There is also a statistical version in which the above statements are replaced by averages (“A usually happens,” etc.). But in all these stories, the argument can fall down if you get the facts wrong. Perhaps that’s one reason that statisticians can be obsessed with detail. For example, David Brooks wrote the following, in a column called “Living with Mistakes”: The historian Leslie Hannah identified the ten largest American companies in 1912. None of those companies ranked in the top 100 companies by 1990. Huh? Could that really be? I googled “ten largest american companies 1912” and found this, from Leslie Hannah: No big deal: two still in the top 10 rather than zero in the top 100, but Brooks’s general
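The three-step argument in the introduction is modus ponens. As a minimal illustration (not part of the original post), here is the same step in Lean 4, where a proof of A → B is a function that, applied to a proof of A, yields a proof of B:

```lean
-- Modus ponens: from a proof of A and a proof of A → B, obtain a proof of B.
example (A B : Prop) (hA : A) (hAB : A → B) : B := hAB hA
```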


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Logical reasoning typically takes the following form: 1. [sent-1, score-0.108]

2 There is also a statistical version in which the above statements are replaced by averages (“A usually happens,” etc.). [sent-8, score-0.158]

3 But in all these stories, the argument can fall down if you get the facts wrong. [sent-10, score-0.07]

4 Perhaps that’s one reason that statisticians can be obsessed with detail. [sent-11, score-0.11]

5 For example, David Brooks wrote the following, in a column called “Living with Mistakes”: The historian Leslie Hannah identified the ten largest American companies in 1912. [sent-12, score-0.88]

6 None of those companies ranked in the top 100 companies by 1990. [sent-13, score-0.828]

7 I googled “ten largest american companies 1912” and found this, from Leslie Hannah: No big deal: two still in the top 10 rather than zero in the top 100, but Brooks’s general point still holds. [sent-16, score-1.095]

8 This is more a comment on how a statistician such as myself will see a number and immediately feel the urge to check it. [sent-18, score-0.283]

9 If you don’t have that instinct—that feeling that numbers should directly correspond to reality—then I think you’re missing part of what it takes to really do statistics. [sent-19, score-0.391]

10 A statistician who doesn’t care about the numbers can be helpful and even make major contributions, but I still think something is missing. [sent-20, score-0.319]

11 The analogy might be a physicist who doesn’t like to tinker with machines or a chemist who doesn’t like to play around in the lab or a psychologist who has no curiosity about human motivations or an artist who doesn’t like to doodle. [sent-21, score-0.835]

12 Again, this is no criticism of Brooks—as a journalist, he’s of course more interested in good stories than in getting the details right (recall the notorious $20 dinner at Red Lobster). [sent-22, score-0.219]

13 There also might be some important part of the story that I’m missing. [sent-27, score-0.069]

14 Brooks’s column doesn’t supply a link to his data source but I’m willing to be corrected if there’s something else going on. [sent-28, score-0.282]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('brooks', 0.395), ('companies', 0.293), ('hannah', 0.254), ('leslie', 0.239), ('largest', 0.166), ('doesn', 0.16), ('top', 0.142), ('ten', 0.138), ('numbers', 0.132), ('column', 0.127), ('lobster', 0.127), ('tinker', 0.12), ('instinct', 0.114), ('carroll', 0.114), ('urge', 0.114), ('stories', 0.112), ('chemist', 0.11), ('obsessed', 0.11), ('takes', 0.108), ('dinner', 0.107), ('lewis', 0.104), ('curiosity', 0.102), ('statistician', 0.1), ('artist', 0.1), ('ranked', 0.1), ('storytelling', 0.095), ('american', 0.092), ('machines', 0.091), ('historian', 0.087), ('corrected', 0.087), ('still', 0.087), ('googled', 0.086), ('motivations', 0.083), ('correspond', 0.082), ('physicist', 0.081), ('replaced', 0.08), ('averages', 0.078), ('psychologist', 0.078), ('logical', 0.076), ('journalist', 0.074), ('implies', 0.072), ('conclude', 0.072), ('lab', 0.07), ('facts', 0.07), ('identified', 0.069), ('immediately', 0.069), ('part', 0.069), ('supply', 0.068), ('reality', 0.068), ('contributions', 0.067)]
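The page does not say how the wordTfidf weights above were computed. A minimal sketch, assuming the common tf × log(N/df) weighting with L2 normalization and naive whitespace tokenization; `tfidf_vectors` and `top_n` are illustrative helper names, not part of the tool that produced this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized {word: weight} dict per document.

    Assumes the common tf * log(N / df) weighting; the exact variant behind
    the wordTfidf column is not stated in the source.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        if norm > 0:
            weights = {w: v / norm for w, v in weights.items()}
        vectors.append(weights)
    return vectors


def top_n(vector, n):
    """Highest-weighted words first, like a (topN-words) listing."""
    return sorted(vector.items(), key=lambda item: item[1], reverse=True)[:n]


docs = ["brooks wrote a column", "a column on statistics", "a statistician checks numbers"]
print(top_n(tfidf_vectors(docs)[0], 2))
```

Under this weighting a word that appears in every document gets weight 0, so only words that distinguish a post from the rest of the collection rise to the top of its list.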

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 1025 andrew gelman stats-2011-11-24-Always check your evidence

2 0.33375466 1729 andrew gelman stats-2013-02-20-My beef with Brooks: the alternative to “good statistics” is not “no statistics,” it’s “bad statistics”

Introduction: I was thinking more about David Brooks’s anti-data column from yesterday, and I realized what is really bothering me. Brooks expresses skepticism about numbers, about the limitations of raw data, about the importance of human thinking. Fine, I agree with all of this, to some extent. But then Brooks turns around and uses numbers unquestioningly and uncritically (OK, not completely uncritically; see P.S. below). In a notorious recent case, Brooks wrote, in the context of college admissions: You’re going to want to argue with Unz’s article all the way along, especially for its narrow, math-test-driven view of merit. But it’s potentially ground-shifting. Unz’s other big point is that Jews are vastly overrepresented at elite universities and that Jewish achievement has collapsed. In the 1970s, for example, 40 percent of top scorers in the Math Olympiad had Jewish names. Now 2.5 percent do. But these numbers are incorrect, as I learned from a professor of oncology at the Univ

3 0.22074045 2280 andrew gelman stats-2014-04-03-As the boldest experiment in journalism history, you admit you made a mistake

Introduction: The pre-NYT David Brooks liked to make fun of the NYT. Here’s one from 1997: I’m not sure I’d like to be one of the people featured on the New York Times wedding page, but I know I’d like to be the father of one of them. Imagine how happy Stanley J. Kogan must have been, for example, when his daughter Jamie got into Yale. Then imagine his pride when Jamie made Phi Beta Kappa and graduated summa cum laude. . . . he must have enjoyed a gloat or two when his daughter put on that cap and gown. And things only got better. Jamie breezed through Stanford Law School. And then she met a man—Thomas Arena—who appears to be exactly the sort of son-in-law that pediatric urologists dream about. . . . These two awesome resumes collided at a wedding ceremony . . . It must have been one of the happiest days in Stanley J. Kogan’s life. The rest of us got to read about it on the New York Times wedding page. Brooks is reputed to be Jewish himself so I think it’s ok for him to mock Jewish peop

4 0.20266575 1458 andrew gelman stats-2012-08-14-1.5 million people were told that extreme conservatives are happier than political moderates. Approximately .0001 million Americans learned that the opposite is true.

Introduction: A Brooks op-ed in the New York Times (circulation approximately 1.5 million): People at the extremes are happier than political moderates. . . . none, it seems, are happier than the Tea Partiers . . . Jay Livingston on his blog (circulation approximately 0 (rounding to the nearest million)), giving data from the 2009-2010 General Social Survey, which is the usual place people turn to for population data on happiness of Americans: The GSS does not offer “bitter” or “Tea Party” as choices, but extreme conservatives are nearly three times as likely as others to be “not too happy.” Livingston reports that the sample size for “Extremely Conservative” here is 80. Thus the standard error for that green bar on the right is approx sqrt(0.3*0.7/80)=0.05. So how could Brooks have made such a mistake? I can think of two possibilities: 1. Brooks has some other data source that directly addresses the happiness of supporters of the Tea Party movement. 2. Brooks looked a

5 0.17548421 1271 andrew gelman stats-2012-04-20-Education could use some systematic evaluation

Introduction: David Brooks writes: There’s an atmosphere of grand fragility hanging over America’s colleges. The grandeur comes from the surging application rates, the international renown, the fancy new dining and athletic facilities. The fragility comes from the fact that colleges are charging more money, but it’s not clear how much actual benefit they are providing. . . . This is an unstable situation. At some point, parents are going to decide that $160,000 is too high a price if all you get is an empty credential and a fancy car-window sticker. One part of the solution is found in three little words: value-added assessments. Colleges have to test more to find out how they’re doing. I agree with that last paragraph. Eric Loken and I said as much in the context of statistics teaching, but the principle of measuring outcomes makes sense more generally. (Issues of measurement and evaluation are particularly salient to statisticians, given that we strongly recommend formal quanti

6 0.14958882 1587 andrew gelman stats-2012-11-21-Red state blue state, or, states and counties are not persons

7 0.14520516 2107 andrew gelman stats-2013-11-20-NYT (non)-retraction watch

8 0.12154154 1727 andrew gelman stats-2013-02-19-Beef with data

9 0.11238787 2269 andrew gelman stats-2014-03-27-Beyond the Valley of the Trolls

10 0.10846456 2337 andrew gelman stats-2014-05-18-Never back down: The culture of poverty and the culture of journalism

11 0.09548033 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

12 0.088888943 1768 andrew gelman stats-2013-03-18-Mertz’s reply to Unz’s response to Mertz’s comments on Unz’s article

13 0.088829488 434 andrew gelman stats-2010-11-28-When Small Numbers Lead to Big Errors

14 0.08659102 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?

15 0.085104145 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts

16 0.083003089 1347 andrew gelman stats-2012-05-27-Macromuddle

17 0.082520895 1844 andrew gelman stats-2013-05-06-Against optimism about social science

18 0.081481978 2284 andrew gelman stats-2014-04-07-How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.

19 0.081086919 1533 andrew gelman stats-2012-10-14-If x is correlated with y, then y is correlated with x

20 0.080209315 108 andrew gelman stats-2010-06-24-Sometimes the raw numbers are better than a percentage
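Entry 4 in the list above (blog 1458) uses the usual standard error of a sample proportion, sqrt(p(1 - p)/n), with p ≈ 0.3 and n = 80. A short check of that arithmetic; `proportion_se` is an illustrative helper name.

```python
import math


def proportion_se(p, n):
    """Standard error of a sample proportion p estimated from n respondents."""
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return math.sqrt(p * (1 - p) / n)


se = proportion_se(0.3, 80)
print(round(se, 3))  # 0.051
```

Doubling this gives a rough 95% interval of about 0.3 ± 0.1 for that bar, which is the scale against which the comparison in that post should be read.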


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.162), (1, -0.076), (2, -0.012), (3, 0.021), (4, -0.016), (5, -0.024), (6, 0.047), (7, 0.014), (8, 0.004), (9, -0.012), (10, -0.032), (11, -0.0), (12, -0.097), (13, 0.036), (14, 0.001), (15, 0.074), (16, -0.042), (17, -0.019), (18, 0.055), (19, -0.022), (20, -0.005), (21, -0.002), (22, 0.035), (23, 0.016), (24, -0.015), (25, 0.044), (26, -0.017), (27, 0.02), (28, -0.009), (29, 0.018), (30, 0.022), (31, 0.022), (32, 0.032), (33, -0.018), (34, 0.007), (35, -0.018), (36, -0.008), (37, -0.007), (38, -0.019), (39, 0.011), (40, 0.039), (41, 0.045), (42, 0.008), (43, -0.006), (44, -0.029), (45, 0.066), (46, 0.033), (47, -0.01), (48, 0.02), (49, 0.008)]
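Each similar-blogs list ranks posts by a similarity score between vectors like the LSI weights above. The page does not name the measure; cosine similarity is a common choice and is sketched here over (dimension, weight) pairs. Because the same-blog rows in the LSI and LDA lists score below 1, the tool's score is probably not plain cosine similarity of these stored vectors, so treat this as an illustration only. `cosine_similarity` is an illustrative name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (dimension, weight) pairs.

    Dimensions missing from one vector are treated as weight 0.
    """
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(k, 0.0) for k, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# First few LSI dimensions for this post, copied from the list above.
this_post = [(0, 0.162), (1, -0.076), (2, -0.012), (3, 0.021), (4, -0.016)]
print(round(cosine_similarity(this_post, this_post), 6))  # 1.0
```

Unlike the LDA weights, LSI weights can be negative, so cosine similarity between two posts can range from -1 to 1.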

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94468433 1025 andrew gelman stats-2011-11-24-Always check your evidence

2 0.83906406 2280 andrew gelman stats-2014-04-03-As the boldest experiment in journalism history, you admit you made a mistake

3 0.83823884 1729 andrew gelman stats-2013-02-20-My beef with Brooks: the alternative to “good statistics” is not “no statistics,” it’s “bad statistics”

4 0.82051939 2107 andrew gelman stats-2013-11-20-NYT (non)-retraction watch

Introduction: Mark Palko is irritated by the Times’s refusal to retract a recounting of a hoax regarding Dickens and Dostoevsky. All I can say is, the Times refuses to retract mistakes of fact that are far more current than that! See here for two examples that particularly annoyed me, to the extent that I contacted various people at the Times but ran into refusals to retract. I guess a daily newspaper publishes so much material that they can’t be expected to run a retraction every time they publish something false, even when such things are brought to their attention. Speaking of corrections, I wonder if later editions of the Samuelson economics textbook discussed their notorious graph predicting Soviet economic performance. The easiest thing would be just to remove the graph, but I think it would be a better economics lesson to discuss the error! Similarly, I think the NYT would do well to run an article on their Dickens-Dostoevsky mistake, along with a column by Arthur Brooks on how

5 0.7874763 1743 andrew gelman stats-2013-02-28-Different modes of discourse

Introduction: Political/business negotiation vs. scholarly communication. In a negotiation you hold back, you only make concessions if you have to or in exchange for something else. In scholarly communication you look for your own mistakes, you volunteer information to others, and if someone points out a mistake, you learn from it. (Just a couple days ago, in fact, someone sent me an email showing a problem with bayesglm. I ran and altered his code, and it turned out we had a problem. Based on this information, Yu-Sung found and fixed the code. I was grateful to be informed of the problem.) Not all scholarly exchange goes like this, but that’s the ideal. In contrast, openness and transparency are not ideals in politics and business; in many cases they’re not even desired. If Barack Obama and John Boehner are negotiating on the budget, would it be appropriate for one of them to just start off the negotiations by making a bunch of concessions for free? No, of course not. Negotiation doesn

6 0.74196523 1768 andrew gelman stats-2013-03-18-Mertz’s reply to Unz’s response to Mertz’s comments on Unz’s article

7 0.73578107 1458 andrew gelman stats-2012-08-14-1.5 million people were told that extreme conservatives are happier than political moderates. Approximately .0001 million Americans learned that the opposite is true.

8 0.73215431 135 andrew gelman stats-2010-07-09-Rasmussen sez: “108% of Respondents Say . . .”

9 0.73197526 1271 andrew gelman stats-2012-04-20-Education could use some systematic evaluation

10 0.72836947 1730 andrew gelman stats-2013-02-20-Unz on Unz

11 0.71670371 189 andrew gelman stats-2010-08-06-Proposal for a moratorium on the use of the words “fashionable” and “trendy”

12 0.70569503 1751 andrew gelman stats-2013-03-06-Janet Mertz’s response to “The Myth of American Meritocracy”

13 0.69356388 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?

14 0.69278485 2073 andrew gelman stats-2013-10-22-Ivy Jew update

15 0.68817031 1553 andrew gelman stats-2012-10-30-Real rothko, fake rothko

16 0.68671858 2337 andrew gelman stats-2014-05-18-Never back down: The culture of poverty and the culture of journalism

17 0.6761086 69 andrew gelman stats-2010-06-04-A Wikipedia whitewash

18 0.67362756 335 andrew gelman stats-2010-10-11-How to think about Lou Dobbs

19 0.6719408 707 andrew gelman stats-2011-05-12-Human nature can’t be changed (except when it can)

20 0.66652215 1591 andrew gelman stats-2012-11-26-Politics as an escape hatch


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.017), (10, 0.011), (16, 0.498), (21, 0.011), (24, 0.084), (39, 0.011), (72, 0.012), (76, 0.015), (99, 0.236)]
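The LDA weights above sum to about 0.895 rather than 1, which suggests that topics below some probability cutoff were left out of the listing (an assumption; no cutoff is stated). For topic distributions like this, one common similarity is 1 minus the Hellinger distance. The sketch below is a hypothetical illustration, not necessarily the measure behind the lda list; `hellinger_similarity` is an illustrative name.

```python
import math

# LDA topic weights for this post, copied from the list above.
lda_weights = [(9, 0.017), (10, 0.011), (16, 0.498), (21, 0.011), (24, 0.084),
               (39, 0.011), (72, 0.012), (76, 0.015), (99, 0.236)]

listed_mass = sum(w for _, w in lda_weights)
print(round(listed_mass, 3))  # 0.895


def hellinger_similarity(p, q):
    """1 minus the Hellinger distance between two sparse topic distributions.

    p and q are (topic_id, probability) pairs; absent topics count as 0.
    """
    dp, dq = dict(p), dict(q)
    topics = set(dp) | set(dq)
    sq = sum((math.sqrt(dp.get(t, 0.0)) - math.sqrt(dq.get(t, 0.0))) ** 2 for t in topics)
    return 1.0 - math.sqrt(sq / 2.0)
```

For full distributions that each sum to 1, this score runs from 0 (no shared topics) to 1 (identical distributions); with truncated listings like the one above it is only approximate.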

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99466932 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate

Introduction: Malecki asks: Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to. To connect to some of our recent themes, I agree this is a pretty horrible data display. But it’s not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren’t so bad. One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!), which is often the opposite of what we want in a clear graph. It’s Martin Amis vs. George Orwell all over again.

2 0.98831409 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?

Introduction: Jonathan Cantor points to this poll estimating rifle-armed QB Tim Tebow as America’s favorite pro athlete: In an ESPN survey of 1,502 Americans age 12 or older, three percent identified Tebow as their favorite professional athlete. Tebow finished in front of Kobe Bryant (2 percent), Aaron Rodgers (1.9 percent), Peyton Manning (1.8 percent), and Tom Brady (1.5 percent). Amusing. What this survey says to me is that there are no super-popular athletes who are active in America today. Which actually sounds about right. No Tiger Woods, no Magic Johnson, Muhammad Ali, John Elway, Pete Rose, Billie Jean King, etc etc. Tebow is an amusing choice, people might as well pick him now while he’s still on top. As a sports celeb, he’s like Bill Lee or the Refrigerator: colorful and a solid pro athlete, but no superstar. When you think about all the colorful superstar athletes of times gone by, it’s perhaps surprising that there’s nobody out there right now to play the role. I supp

3 0.98669207 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data

Introduction: Cathy O’Neil organized this visualization project with NYPD stop-and-frisk data. It’s part of the Data Without Borders project. Unfortunately, because of legal restrictions I couldn’t send them the data Jeff, Alex, and I used in our project several years ago.

4 0.98237228 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

Introduction: This is somebody’s dream job, I’m sure . . . ESPN is looking for a statistician to join the HR department as a Research Analyst. The job will consist of analytical research and producing statistics about the people that work at ESPN. Topics of interest will include productivity, efficiency, and retention of employees, among other items. In addition to data mining and producing reports, we also field surveys and analyze results. The position is located at the headquarters in Bristol, Connecticut, the same campus where nearly all ESPN shows are produced. ESPN is a Disney company, so discounts and free admission to Disney parks are available for employees. Flexible work arrangements are available, along with working in the New York City office part-time if desired. The role is a relatively new function and will have a high impact very quickly on helping the business function. Statistical software, textbooks, and any other resource needed to get the job done will be provided. T

5 0.98224759 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street

Introduction: Tyler Cowen links a blog by Samuel Arbesman mocking people who are so lazy that they take the elevator from 1 to 2. This reminds me of my own annoyance about a guy who worked in my building and did not take the elevator. (For the full story, go here and search on “elevator.”)

6 0.98098737 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

7 0.9754647 1659 andrew gelman stats-2013-01-07-Some silly things you (didn’t) miss by not reading the sister blog

8 0.97194016 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram

9 0.96858132 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”

10 0.96407956 398 andrew gelman stats-2010-11-06-Quote of the day

11 0.96169734 1487 andrew gelman stats-2012-09-08-Animated drought maps

12 0.95074725 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

same-blog 13 0.94332504 1025 andrew gelman stats-2011-11-24-Always check your evidence

14 0.94237435 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!

15 0.94207108 1026 andrew gelman stats-2011-11-25-Bayes wikipedia update

16 0.94057095 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

17 0.92921221 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research

18 0.91817111 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples

19 0.90324736 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

20 0.90217876 609 andrew gelman stats-2011-03-13-Coauthorship norms