andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-443 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: After seeing this graph: I have the following message for Sharad: Rotate the graph 90 degrees so you can see the words. Also you can ditch the lines. Then what you have is a dotplot, following the principles of Cleveland (1985). You can lay out a few on one page to see some interactions with demographics. The real challenge here . . . is to automate this sort of advice. Or maybe we just need a really nice dotplot() function and enough examples, and people will start doing it? P.S. Often a lineplot is better. See here for a discussion of another Sharad example.
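The post wonders whether "a really nice dotplot() function" would get people drawing these graphs. To make the advice concrete, here is a minimal sketch, assuming plain-text output so it needs no plotting library. It is not the function the post has in mind and not any library's API; the name `text_dotplot`, its parameters, and the example numbers are hypothetical. Labels are written horizontally down the left margin (the "rotate 90 degrees" advice), each category gets a single dot with no bars or connecting lines, and rows are sorted by value, which is an assumed convention here.

```python
def text_dotplot(values, width=40, marker="o"):
    """Render a Cleveland-style dotplot as plain text (hypothetical helper).

    values: dict mapping a category label (str) to a number.
    Each row is one category: the label is written horizontally on the left,
    so it stays readable, and a single marker shows the value. No bars and
    no connecting lines are drawn.
    """
    if not values:
        return ""
    # Sort rows from largest to smallest value (an assumed convention).
    items = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    lo = min(v for _, v in items)
    hi = max(v for _, v in items)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    label_width = max(len(label) for label, _ in items)
    rows = []
    for label, v in items:
        # Scale the value onto columns 0..width-1; the axis need not start at zero.
        pos = round((v - lo) / span * (width - 1))
        track = [" "] * width
        track[pos] = marker
        rows.append(f"{label.rjust(label_width)} |{''.join(track)}| {v:g}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(text_dotplot({"Group A": 0.42, "Group B": 0.17, "Group C": 0.31}))
```

Laying out a few of these on one page, as the post suggests for looking at interactions with demographics, would amount to calling the helper once per group with a shared scale; a real version would more likely wrap a graphics library.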
sentIndex sentText sentNum sentScore
1 After seeing this graph: I have the following message for Sharad: Rotate the graph 90 degrees so you can see the words. [sent-1, score-1.048]
2 Then what you have is a dotplot, following the principles of Cleveland (1985). [sent-3, score-0.286]
3 You can lay out a few on one page to see some interactions with demographics. [sent-4, score-0.579]
4 Or maybe we just need a really nice dotplot() function and enough examples, and people will start doing it? [sent-12, score-0.628]
5 See here for a discussion of another Sharad example. [sent-16, score-0.134]
wordName wordTfidf (topN-words)
[('dotplot', 0.446), ('sharad', 0.437), ('rotate', 0.261), ('automate', 0.25), ('ditch', 0.234), ('lineplot', 0.228), ('cleveland', 0.201), ('lay', 0.196), ('graph', 0.191), ('degrees', 0.161), ('following', 0.145), ('challenge', 0.145), ('interactions', 0.143), ('principles', 0.141), ('nice', 0.13), ('message', 0.129), ('function', 0.122), ('seeing', 0.121), ('see', 0.11), ('page', 0.103), ('examples', 0.095), ('start', 0.092), ('real', 0.079), ('often', 0.073), ('need', 0.07), ('enough', 0.068), ('discussion', 0.068), ('another', 0.066), ('sort', 0.063), ('maybe', 0.061), ('really', 0.047), ('example', 0.046), ('people', 0.038), ('also', 0.036), ('one', 0.027)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 443 andrew gelman stats-2010-12-02-Automating my graphics advice
Introduction: 1. I remarked that Sharad had a good research article with some ugly graphs. 2. Dan posted Sharad’s graph and some unpleasant alternatives, inadvertently associating me with one of the unpleasant alternatives. Dan was comparing barplots with dotplots. 3. I commented on Dan’s site that, in this case, I’d much prefer a well-designed lineplot. I wrote: There’s a principle in decision analysis that the most important step is not the evaluation of the decision tree but the decision of what options to include in the tree in the first place. I think that’s what’s happening here. You’re seriously limiting yourself by considering the above options, which really are all the same graph with just slight differences in format. What you need to do is break outside the box. (Graph 2, which I think you think is the kind of thing that Gelman would like, indeed is the kind of thing that I think the R gurus like, but I don’t like it at all. It looks clean without actually being clea
Introduction: John Kastellec points me to this blog by Ezra Klein criticizing the following graph from a recent Republican Party report: Klein (following Alexander Hart) slams the graph for not going all the way to zero on the y-axis, thus making the projected change seem bigger than it really is. I agree with Klein and Hart that, if you’re gonna do a bar chart, you want the bars to go down to 0. On the other hand, a projected change from 19% to 23% is actually pretty big, and I don’t see the point of using a graphical display that hides it. The solution: Ditch the bar graph entirely and replace it by a lineplot, in particular, a time series with year-by-year data. The time series would have several advantages: 1. Data are placed in context. You’d see every year, instead of discrete averages, and you’d get to see the changes in the context of year-to-year variation. 2. With the time series, you can use whatever y-axis works with the data. No need to go to zero. P.S. I l
4 0.13377646 319 andrew gelman stats-2010-10-04-“Who owns Congress”
Introduction: Curt Yeske pointed me to this. Wow, these graphs are really hard to read! The old me would’ve said that each of these graphs would be better replaced by a dotplot (or, better still, a series of lineplots showing time trends). The new me would still like the dotplots and lineplots, but I’d say it’s fine to have the eye-grabbing but hard-to-read graphs as is, and then to have the more informative statistical graphics underneath, as it were. The idea is, you’d click on the pretty but hard-to-read “infovis” graphs, and this would then reveal informative “full Cleveland” graphs. And then if you click again you’d get a spreadsheet with the raw numbers. That I’d like to see, as a new model for graphical presentation.
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper (this continued through my second year of grad school) and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
6 0.12163604 2048 andrew gelman stats-2013-10-03-A comment on a post at the Monkey Cage
8 0.11091866 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
9 0.10978203 90 andrew gelman stats-2010-06-16-Oil spill and corn production
10 0.10516204 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
11 0.10133503 61 andrew gelman stats-2010-05-31-A data visualization manifesto
12 0.08910495 488 andrew gelman stats-2010-12-27-Graph of the year
13 0.081874333 2091 andrew gelman stats-2013-11-06-“Marginally significant”
15 0.077579297 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
16 0.077576965 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
17 0.076093741 348 andrew gelman stats-2010-10-17-Joanne Gowa scooped me by 22 years in my criticism of Axelrod’s Evolution of Cooperation
18 0.074991904 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
19 0.074735999 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?
20 0.073947906 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
topicId topicWeight
[(0, 0.091), (1, -0.019), (2, -0.009), (3, 0.049), (4, 0.092), (5, -0.103), (6, -0.047), (7, 0.027), (8, -0.005), (9, -0.006), (10, 0.009), (11, -0.009), (12, -0.007), (13, 0.001), (14, 0.026), (15, -0.004), (16, 0.011), (17, 0.007), (18, -0.026), (19, -0.001), (20, 0.028), (21, 0.003), (22, -0.012), (23, -0.029), (24, 0.019), (25, -0.008), (26, 0.039), (27, 0.013), (28, -0.049), (29, 0.0), (30, -0.017), (31, -0.001), (32, -0.069), (33, -0.035), (34, -0.034), (35, -0.045), (36, -0.037), (37, -0.068), (38, -0.023), (39, 0.048), (40, 0.028), (41, -0.003), (42, -0.005), (43, 0.031), (44, 0.008), (45, 0.014), (46, 0.042), (47, -0.003), (48, -0.043), (49, -0.013)]
simIndex simValue blogId blogTitle
same-blog 1 0.9613499 443 andrew gelman stats-2010-12-02-Automating my graphics advice
3 0.85296428 671 andrew gelman stats-2011-04-20-One more time-use graph
Introduction: Evan Hensleigh sends me this redesign of the cross-national time use graph: Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important: good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w
4 0.8287375 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
Introduction: They didn’t have room for all four graphs of the time-series decomposition so they just displayed the date-of-year graph: They rotated it so the graph fit better on the page. The rotation worked for me, but I was a bit bummed that they put the title and heading of the graph (“The birthrate tends to drop on holidays . . .”) on the left in the Mar-Apr slot, leaving no room to label Leap Day and April Fool’s. I suggested to the graphics people that they put the label at the very top and just shrink the rest of the graph by 5 or 10% so as to not take up any more total space. Then there’d be plenty of space to label Leap Day and April Fool’s. But they didn’t do it; maybe they felt that it wouldn’t look good to have the label right at the top, I dunno.
6 0.8209359 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
7 0.81858546 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
8 0.81774217 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
9 0.8141939 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
10 0.81227499 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
11 0.80809051 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
12 0.79898161 1011 andrew gelman stats-2011-11-15-World record running times vs. distance
13 0.79510272 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
14 0.78240335 1253 andrew gelman stats-2012-04-08-Technology speedup graph
15 0.77173847 843 andrew gelman stats-2011-08-07-Non-rant
16 0.77055401 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
18 0.76358873 1613 andrew gelman stats-2012-12-09-Hey—here’s a photo of me making fun of a silly infographic (from last year)
20 0.76096261 488 andrew gelman stats-2010-12-27-Graph of the year
topicId topicWeight
[(5, 0.036), (16, 0.071), (24, 0.127), (39, 0.343), (99, 0.259)]
simIndex simValue blogId blogTitle
1 0.85999489 1343 andrew gelman stats-2012-05-25-And now, here’s something we hope you’ll really like
Introduction: This came in the email: Postdoctoral Researcher (3 years) in State-Space Modeling of Animal Movement and Population Dynamics in Universities of Turku and Helsinki, Finland. We seek a statistician/mathematician with experience in ecological modeling or an ecologist with strong quantitative training to join an interdisciplinary research team focusing on dispersal and dynamics of the Siberian flying squirrel (Pteromys volans). The Postdoctoral Researcher will develop modeling approaches (from individual based models to population level models) to assess the dispersal and population dynamics of the flying squirrel. A key challenge will be the integration of different kinds of data (census data, telemetry data, mark-recapture data, life-history data, and data on environmental covariates such as forest structure) into the modeling framework using Bayesian State-Space models or other such approaches. The project will be supervised by Dr. Vesa Selonen (a flying squirrel specialist;
same-blog 2 0.85004538 443 andrew gelman stats-2010-12-02-Automating my graphics advice
3 0.80965006 31 andrew gelman stats-2010-05-13-Visualization in 1939
Introduction: Willard Cope Brinton’s second book Graphic Presentation (1939) surprised me with the quality of its graphics. Prof. Michael Stoll has some scans at Flickr. For example: The whole book can be downloaded (in a worse resolution) from Archive.Org.
4 0.80360663 844 andrew gelman stats-2011-08-07-Update on the new Handbook of MCMC
Introduction: It’s edited by Steve Brooks, Galin Jones, Xiao-Li Meng, and myself. Here’s the information and some sample chapters (including my own chapter with Ken Shirley on inference and monitoring convergence and Radford’s instant classic on Hamiltonian Monte Carlo). Sorry about the $100 price tag; nobody asked me about that! But if you’re doing these computations as part of your work, I think the book will be well worth it.
5 0.75641984 1622 andrew gelman stats-2012-12-14-Can gambling addicts be identified in gambling venues?
Introduction: Mark Griffiths, a psychologist who apparently is Europe’s only Professor of Gambling Studies, writes: You made the comment about how difficult it is to spot problem gamblers. I and a couple of colleagues [Paul Delfabbro and Daniel King] just published this review of all the research done on spotting problem gamblers in online and offline gaming venues (attached) that I covered in one of my recent blogs.
7 0.7209419 1157 andrew gelman stats-2012-02-07-Philosophy of Bayesian statistics: my reactions to Hendry
8 0.71234989 155 andrew gelman stats-2010-07-19-David Blackwell
9 0.71076107 441 andrew gelman stats-2010-12-01-Mapmaking software
10 0.70477724 1927 andrew gelman stats-2013-07-05-“Numbersense: How to use big data to your advantage”
11 0.69581205 1193 andrew gelman stats-2012-03-03-“Do you guys pay your bills?”
12 0.69212306 867 andrew gelman stats-2011-08-23-The economics of the mac? A paradox of competition
13 0.68631279 674 andrew gelman stats-2011-04-21-Handbook of Markov Chain Monte Carlo
14 0.68190098 686 andrew gelman stats-2011-04-29-What are the open problems in Bayesian statistics??
16 0.6706745 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts
18 0.66745478 1681 andrew gelman stats-2013-01-19-Participate in a short survey about the weight of evidence provided by statistics
19 0.66370887 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals
20 0.6611613 935 andrew gelman stats-2011-10-01-When should you worry about imputed data?