andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1253 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Dan Kahan sends along this awesome graph (click on the image to see the whole thing): and writes: I [Kahan] saw it at http://www.theatlantic.com/technology/archive/2012/04/the-100-year-march-of-technology-in-1-graph/255573/, which misidentified the source (not “visual economics”; visualizingeconomics.com, which attributes it to Nicholas Felton, who apparently condensed this version, which I worry could cause a stroke). But it did have a good write-up that (I’m glad) caught my attention. It made me [Kahan] start to wonder about what sorts of qualities of a technology will influence its dissemination & also about the availability of benchmarks for proliferation of various sorts of things (e.g., fads & trends, health-promoting behaviors, knowledge of a scientific discovery) that one could use to gauge how meaningful the apparent increase in rates of proliferation of these technologies has been over time. That in turn made me wonder whether — indeed, suspect th
sentIndex sentText sentNum sentScore
1 Dan Kahan sends along this awesome graph (click on the image to see the whole thing): and writes: I [Kahan] saw it at http://www. [sent-1, score-0.29]
2 com, which attributes it to Nicholas Felton, who apparently condensed this version, which I worry could cause a stroke). [sent-4, score-0.325]
3 It made me [Kahan] start to wonder about what sorts of qualities of a technology will influence its dissemination & also about the availability of benchmarks for proliferation of various sorts of things (e. [sent-6, score-1.03]
4 g, fads & trends, health-promoting behaviors, knowledge of a scientific discovery) that one could use to gauge how meaningful the apparent increase in rates of proliferation of these technologies has been over time. [sent-7, score-0.875]
5 That in turn made me wonder whether — indeed, suspect that — some smart historian or economist has already addressed these points; I’ll have to poke around to see! [sent-8, score-0.547]
6 It’s not what anyone would or should use to report or illustrate data analysis, but a graph that puts you in the mood to wonder & conjecture — without negligently pointing you in a misleading direction — is a nice species of graph. [sent-9, score-0.846]
7 My brief response: I think the graph could be cleaned up in some way but basically I like it. [sent-13, score-0.458]
8 It has a directness that is to my taste, unlike the Nightingale graph, whose twists distract from the data message. [sent-14, score-0.381]
wordName wordTfidf (topN-words)
[('kahan', 0.272), ('proliferation', 0.25), ('conjecture', 0.218), ('graph', 0.191), ('wonder', 0.155), ('provoke', 0.139), ('fads', 0.139), ('condensed', 0.139), ('twists', 0.131), ('stroke', 0.125), ('directness', 0.125), ('combat', 0.125), ('distract', 0.125), ('dissemination', 0.125), ('soldiers', 0.121), ('technologies', 0.117), ('poke', 0.117), ('nightingale', 0.117), ('benchmarks', 0.114), ('gauge', 0.114), ('visually', 0.114), ('qualities', 0.109), ('sorts', 0.109), ('mood', 0.107), ('nicholas', 0.105), ('cleaned', 0.105), ('imagining', 0.105), ('attributes', 0.105), ('rose', 0.104), ('availability', 0.101), ('behaviors', 0.101), ('awesome', 0.099), ('killed', 0.098), ('addressed', 0.098), ('species', 0.096), ('historian', 0.096), ('deaths', 0.092), ('apparent', 0.089), ('taste', 0.086), ('meaningful', 0.085), ('technology', 0.083), ('convey', 0.083), ('glad', 0.082), ('brief', 0.081), ('discovery', 0.081), ('smart', 0.081), ('desire', 0.081), ('could', 0.081), ('failed', 0.079), ('illustrate', 0.079)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1253 andrew gelman stats-2012-04-08-Technology speedup graph
Introduction: Dan Kahan sends along this awesome graph (click on the image to see the whole thing): and writes: I [Kahan] saw it at http://www.theatlantic.com/technology/archive/2012/04/the-100-year-march-of-technology-in-1-graph/255573/, which misidentified the source (not “visual economics”; visualizingeconomics.com, which attributes it to Nicholas Felton, who apparently condensed this version, which I worry could cause a stroke). But it did have a good write-up that (I’m glad) caught my attention. It made me [Kahan] start to wonder about what sorts of qualities of a technology will influence its dissemination & also about the availability of benchmarks for proliferation of various sorts of things (e.g., fads & trends, health-promoting behaviors, knowledge of a scientific discovery) that one could use to gauge how meaningful the apparent increase in rates of proliferation of these technologies has been over time. That in turn made me wonder whether — indeed, suspect th
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
3 0.11149628 1633 andrew gelman stats-2012-12-21-Kahan on Pinker on politics
Introduction: Reacting to my recent post on Steven Pinker’s too-broad (in my opinion) speculations on red and blue states, Dan “cultural cognition” Kahan writes: Pinker is clearly right to note that mass political opinions on seemingly diverse issues cohere, and Andrew, I think, is way too quick to challenge this. I [Kahan] could cite to billions of interesting papers, but I’ll just show you what I mean instead. A recent CCP data collection involving a nationally representative on-line sample of 1750 subjects included a module that asked the subjects to indicate on a six-point scale “how strongly . . . you support or oppose” a collection of policies: policy_gun Stricter gun control laws in the United States. policy_healthcare Universal health care. policy_taxcut Raising income taxes for persons in the highest-income tax bracket. policy_affirmative action Affirmative action for minorities. policy_warming Stricter carbon emission standards to reduce global warming. Positions c
4 0.10139747 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
5 0.10013615 2167 andrew gelman stats-2014-01-10-Do you believe that “humans and other living things have evolved over time”?
Introduction: The other day on the sister blog we discussed a recent Pew Research survey that seemed to show that Republicans are becoming more partisan about evolution (or, as Paul Krugman put it, “So what happened after 2009 that might be driving Republican views? . . . Republicans are being driven to identify in all ways with their tribe — and the tribal belief system is dominated by anti-science fundamentalists”). We presented some discussion and evidence from Dan Kahan suggesting that the evidence for such a change was not so clear at all. Kahan drew his conclusions from a more detailed analysis of the much-discussed Pew data, along with a comparison to a recent Gallup poll. Also following up on this is sociologist David Weakliem, who pulls some more data into the discussion: Although the Pew report mentions only the 2009 survey, the question has been asked a number of times since 2005. Here are the results—the numbers represent the percent saying “evolved” minus the percent sayin
6 0.098074839 2006 andrew gelman stats-2013-09-03-Evaluating evidence from published research
7 0.093001463 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
8 0.092537209 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
9 0.092172801 1932 andrew gelman stats-2013-07-10-Don’t trust the Turk
11 0.091254465 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet
12 0.090662032 1833 andrew gelman stats-2013-04-30-“Tragedy of the science-communication commons”
13 0.087939933 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization
14 0.087379321 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture
15 0.08675079 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
16 0.083178058 2004 andrew gelman stats-2013-09-01-Post-publication peer review: How it (sometimes) really works
18 0.081452154 1201 andrew gelman stats-2012-03-07-Inference = data + model
19 0.080316424 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
20 0.080186188 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
topicId topicWeight
[(0, 0.152), (1, -0.043), (2, -0.016), (3, 0.015), (4, 0.066), (5, -0.098), (6, -0.047), (7, 0.018), (8, 0.011), (9, 0.013), (10, -0.017), (11, -0.011), (12, 0.002), (13, 0.005), (14, 0.027), (15, 0.006), (16, 0.044), (17, 0.006), (18, -0.015), (19, -0.003), (20, -0.022), (21, 0.006), (22, -0.037), (23, -0.018), (24, 0.006), (25, -0.0), (26, 0.043), (27, -0.016), (28, -0.022), (29, -0.005), (30, 0.011), (31, 0.008), (32, -0.069), (33, -0.042), (34, -0.013), (35, -0.009), (36, -0.027), (37, -0.067), (38, 0.016), (39, 0.046), (40, 0.002), (41, -0.014), (42, 0.035), (43, 0.007), (44, -0.041), (45, 0.024), (46, 0.041), (47, 0.03), (48, -0.014), (49, -0.005)]
simIndex simValue blogId blogTitle
same-blog 1 0.95626479 1253 andrew gelman stats-2012-04-08-Technology speedup graph
2 0.85855007 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
Introduction: David Afshartous writes: I thought this graph [from Ed Easterling] might be good for your blog. The 71 outlined squares show the main story, and the regions of the graph present the information nicely. Looks like the bins for the color coding are not of equal size and of course the end bins are unbounded. Might be interesting to graph the distribution of the actual data for the 71 outlined squares. In addition, I assume that each period begins on Jan 1 so data size could be naturally increased by looking at intervals that start on June 1 as well (where the limit of this process would be to have it at the granularity of one day; while it most likely wouldn’t make much difference, I’ve seen some graphs before where 1 year returns can be quite sensitive to starting date, etc). I agree that (a) the graph could be improved in small ways–in particular, adding half-year data seems like a great idea–and (b) it’s a wonderful, wonderful graph as is. And the NYT graphics people ad
3 0.8513056 671 andrew gelman stats-2011-04-20-One more time-use graph
Introduction: Evan Hensleigh sends me this redesign of the cross-national time use graph: Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important–good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w
4 0.83167189 443 andrew gelman stats-2010-12-02-Automating my graphics advice
Introduction: After seeing this graph: I have the following message for Sharad: Rotate the graph 90 degrees so you can see the words. Also you can ditch the lines. Then what you have is a dotplot, following the principles of Cleveland (1985). You can lay out a few on one page to see some interactions with demographics. The real challenge here . . . is to automate this sort of advice. Or maybe we just need a really nice dotplot() function and enough examples, and people will start doing it? P.S. Often a lineplot is better. See here for a discussion of another Sharad example.
5 0.83097917 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
Introduction: Dan Goldstein asks what I think of this: My reply: It’s hard for me to imagine a compelling reason for anyone to go to London, Ontario–but, hey, I guess there’s all kinds of people in this world! More seriously, I see the appeal of the graph but it’s a bit busy for my taste. Over the years I’ve moved toward small multiples rather than single busy graphs. That’s one reason why I prefer Tufte’s second book to his first book. The Napoleon-in-Russia graph is a bad model, in that it inspires people to try to cram lots of variables on a single graph. Dan wrote back: I [Dan] like it as a travel planning graph, it gives you what you want to know (how hot will the days be, how cold will the nights be, will it rain) but is a bit easier on the brain than a table of highs and lows. Also makes it easy to see the trend. I agree the 2nd axis doesn’t help.
6 0.82995147 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
7 0.82759118 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
9 0.82217926 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
12 0.80800515 2091 andrew gelman stats-2013-11-06-“Marginally significant”
13 0.80682147 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
14 0.80185217 1613 andrew gelman stats-2012-12-09-Hey—here’s a photo of me making fun of a silly infographic (from last year)
16 0.79407901 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
17 0.78620493 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect
18 0.78287852 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
19 0.78241855 2203 andrew gelman stats-2014-02-08-“Guys who do more housework get less sex”
20 0.78022248 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
topicId topicWeight
[(2, 0.018), (5, 0.018), (8, 0.011), (15, 0.078), (16, 0.059), (24, 0.12), (43, 0.179), (45, 0.02), (54, 0.01), (55, 0.018), (63, 0.018), (76, 0.04), (86, 0.023), (95, 0.029), (99, 0.261)]
simIndex simValue blogId blogTitle
Introduction: Neil Malhotra writes: I just wanted to alert you to this completely misinformed Politico article by Roger Simon, equating sampling theory with “magic.” Normally, I wouldn’t send you this, but I sent him a helpful email and he was a complete jerk about it. Wow—this is really bad. It’s so bad I refuse to link to it. I don’t know who this dude is, but it’s pitiful. Andy Rooney could do better. And I don’t mean Andy Rooney in his prime, I mean Andy Rooney right now. The piece appears to be an attempt at jocularity, but it’s about 10 million times worse than whatever the worst thing is that Dave Barry has ever written. My question to Neil Malhotra is . . . what made you click on this in the first place? P.S. John Sides piles on with some Gallup quotes.
2 0.94656014 314 andrew gelman stats-2010-10-03-Disconnect between drug and medical device approval
Introduction: Sanjay Kaul writes: By statute (“the least burdensome” pathway), the approval standard for devices by the US FDA is lower than for drugs. Before a new drug can be marketed, the sponsor must show “substantial evidence of effectiveness” as based on two or more well-controlled clinical studies (which literally means 2 trials, each with a p value of <0.05, or 1 large trial with a robust p value <0.00125). In contrast, the sponsor of a new device, especially one designated as a high-risk (Class III) device, need only demonstrate "substantial equivalence" to an FDA-approved device via the 510(k) exemption or a "reasonable assurance of safety and effectiveness", evaluated through a pre-market approval and typically based on a single study. What does “reasonable assurance” or “substantial equivalence” imply to you as a Bayesian? These are obviously qualitative constructs, but if one were to quantify them, how would you go about addressing it? The regulatory definitions for
3 0.93411803 1754 andrew gelman stats-2013-03-08-Cool GSS training video! And cumulative file 1972-2012!
Introduction: Felipe Osorio made the above video to help people use the General Social Survey and R to answer research questions in social science. Go for it! Meanwhile, Tom Smith reports: The initial release of the General Social Survey (GSS), cumulative file for 1972-2012 is now on our website. Codebooks and copies of questionnaires will be posted shortly. Later additional files including the GSS reinterview panels and additional variables in the cumulative file will be added. P.S. R scripts are here.
Introduction: Responding to a proposal to move the journal Political Analysis from double-blind to single-blind reviewing (that is, authors would not know who is reviewing their papers but reviewers would know the authors’ names), Tom Palfrey writes: I agree with the editors’ recommendation. I have served on quite a few editorial boards of journals with different blinding policies, and have seen no evidence that double blind procedures are a useful way to improve the quality of articles published in a journal. Aside from the obvious administrative nuisance and the fact that authorship anonymity is a thing of the past in our discipline, the theoretical and empirical arguments in both directions lead to an ambiguous conclusion. Also keep in mind that the editors know the identity of the authors (they need to know for practical reasons), their identity is not hidden from authors, and ultimately it is they who make the accept/reject decision, and also lobby their friends and colleagues to submit “the
Introduction: Matt Taibbi writes: Glenn Hubbard, Leading Academic and Mitt Romney Advisor, Took $1200 an Hour to Be Countrywide’s Expert Witness . . . Hidden among the reams of material recently filed in connection with the lawsuit of monoline insurer MBIA against Bank of America and Countrywide is a deposition of none other than Columbia University’s Glenn Hubbard. . . . Hubbard testified on behalf of Countrywide in the MBIA suit. He conducted an “analysis” that essentially concluded that Countrywide’s loans weren’t any worse than the loans produced by other mortgage originators, and that therefore the monstrous losses that investors in those loans suffered were due to other factors related to the economic crisis – and not caused by the serial misrepresentations and fraud in Countrywide’s underwriting. That’s interesting, because I worked on the other side of this case! I was hired by MBIA’s lawyers. It wouldn’t be polite of me to reveal my consulting rate, and I never actually got depose
same-blog 6 0.93024683 1253 andrew gelman stats-2012-04-08-Technology speedup graph
7 0.92406797 857 andrew gelman stats-2011-08-17-Bayes pays
8 0.92344254 1347 andrew gelman stats-2012-05-27-Macromuddle
9 0.90073341 75 andrew gelman stats-2010-06-08-“Is the cyber mob a threat to freedom?”
10 0.88894635 1860 andrew gelman stats-2013-05-17-How can statisticians help psychologists do their research better?
11 0.88859278 70 andrew gelman stats-2010-06-07-Mister P goes on a date
12 0.88711202 481 andrew gelman stats-2010-12-22-The Jumpstart financial literacy survey and the different purposes of tests
13 0.88511246 1920 andrew gelman stats-2013-06-30-“Non-statistical” statistics tools
14 0.8842274 2330 andrew gelman stats-2014-05-12-Historical Arc of Universities
15 0.87949491 1882 andrew gelman stats-2013-06-03-The statistical properties of smart chains (and referral chains more generally)
16 0.86748707 531 andrew gelman stats-2011-01-22-Third-party Dream Ticket
17 0.86422479 538 andrew gelman stats-2011-01-25-Postdoc Position #2: Hierarchical Modeling and Statistical Graphics
19 0.86077642 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models