andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-488 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: From blogger Matthew Yglesias: There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features: - Clear title - Clearly labeled axes - Vertical axis goes to zero - The cities are in a sensible order (not, for example, alphabetical) - The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed. What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e
sentIndex sentText sentNum sentScore
1 From blogger Matthew Yglesias: There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). [sent-1, score-0.587]
2 In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. [sent-3, score-0.1]
3 It’s a display of three numbers, with no subtlety or artistry in its presentation. [sent-4, score-0.595]
4 What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. [sent-6, score-0.108]
5 As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and explained in three sentences. [sent-7, score-0.686]
6 Personally, I’d prefer a horizontally-aligned dotplot, which can display the information more compactly and readably. [sent-8, score-0.433]
7 And I’d prefer population per acre rather than per square mile. [sent-9, score-0.785]
8 I find it very hard to visualize 60,000 or even 10,000 people in a square mile. [sent-10, score-0.248]
9 In contrast, 15 people per acre is something I can understand immediately. [sent-11, score-0.393]
10 (One could also compute gimmicks such as the average distance to the closest person, if all the people were laid out in the city, evenly spaced. [sent-12, score-0.383]
11 I think that sort of calculation can aid intuition, but in this case I think it’s a bit trickier than necessary for the points that Yglesias is making.) [sent-13, score-0.191]
12 Similarly, graphical methods have truly arrived when journalists use graphs to make ordinary, unexceptional points in a clearer way. [sent-15, score-0.541]
13 The success of this graph also demolishes naive notions of efficiency of data display. [sent-19, score-0.407]
14 An entire graph is being used to display only three numbers, but there’s nothing chartjunky about it. [sent-20, score-0.651]
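The unit arithmetic behind sentences 7 through 10 can be checked with a short sketch. The two densities (10,000 and 60,000 people per square mile) come from the post; the function names are hypothetical, and the square-grid layout is an assumption, since the post only says "evenly spaced" without naming an arrangement.

```python
import math

ACRES_PER_SQUARE_MILE = 640   # exact by definition
FEET_PER_MILE = 5280          # exact by definition


def per_acre(people_per_square_mile: float) -> float:
    """Convert a density in people per square mile to people per acre."""
    return people_per_square_mile / ACRES_PER_SQUARE_MILE


def grid_spacing_feet(people_per_square_mile: float) -> float:
    """Spacing in feet between neighbours on an evenly spaced square grid.

    Assumption: one person per grid cell, so each person gets
    (1 / density) square miles and the cell side is its square root.
    """
    return FEET_PER_MILE / math.sqrt(people_per_square_mile)


# The two densities mentioned in the post.
for density in (10_000, 60_000):
    print(f"{density:>6} per sq mile = {per_acre(density):.1f} per acre, "
          f"about {grid_spacing_feet(density):.1f} ft between neighbours")
```

Under these assumptions, 10,000 people per square mile works out to 15.625 per acre, which matches the "15 people per acre" figure quoted above, and the same density puts neighbours 52.8 feet apart on the grid.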
wordName wordTfidf (topN-words)
[('yglesias', 0.257), ('acre', 0.252), ('unexceptional', 0.252), ('display', 0.232), ('graph', 0.198), ('numbers', 0.158), ('square', 0.158), ('ordinary', 0.156), ('three', 0.148), ('blogger', 0.141), ('per', 0.141), ('graphs', 0.127), ('trickier', 0.115), ('artistry', 0.115), ('gimmicks', 0.115), ('demolishes', 0.115), ('willie', 0.115), ('geoff', 0.108), ('compactly', 0.108), ('landmark', 0.108), ('evenly', 0.103), ('distances', 0.1), ('singling', 0.1), ('snappy', 0.1), ('subtlety', 0.1), ('alphabetical', 0.097), ('robinson', 0.097), ('joining', 0.094), ('notions', 0.094), ('equality', 0.094), ('prefer', 0.093), ('axes', 0.092), ('dotplot', 0.092), ('visualize', 0.09), ('vertical', 0.09), ('arrived', 0.09), ('closest', 0.083), ('laid', 0.082), ('league', 0.08), ('pictures', 0.078), ('racial', 0.077), ('readable', 0.076), ('aid', 0.076), ('cities', 0.076), ('crap', 0.075), ('nothing', 0.073), ('axis', 0.073), ('clearer', 0.072), ('sensible', 0.072), ('join', 0.072)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 488 andrew gelman stats-2010-12-27-Graph of the year
2 0.17701855 61 andrew gelman stats-2010-05-31-A data visualization manifesto
Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
4 0.16603412 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi
5 0.1598808 312 andrew gelman stats-2010-10-02-“Regression to the mean” is fine. But what’s the “mean”?
Introduction: In the context of a discussion of Democratic party strategies, Matthew Yglesias writes: Given where things stood in January 2009, large House losses were essentially inevitable. The Democratic majority elected in 2008 was totally unsustainable and was doomed by basic regression to the mean. I’d like to push back on this, if for no other reason than that I didn’t foresee all this back in January 2009. Regression to the mean is a fine idea, but what’s the “mean” that you’re regressing to? Here’s a graph I made a couple years ago, showing the time series of Democratic vote share in congressional and presidential elections: Take a look at the House vote in 2006 and 2008. Is this a blip, just begging to be slammed down in 2010 by a regression to the mean? Or does it represent a return to form, back to the 55% level of support that the Democrats had for most of the previous fifty years? It’s not so obvious what to think–at least, not simply from looking at the graph.
6 0.1412552 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
7 0.1394871 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
8 0.13937312 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
10 0.13012363 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
13 0.11823003 536 andrew gelman stats-2011-01-24-Trends in partisanship by state
14 0.11557622 2091 andrew gelman stats-2013-11-06-“Marginally significant”
16 0.11394601 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
17 0.11141466 671 andrew gelman stats-2011-04-20-One more time-use graph
19 0.10554974 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles
20 0.099819392 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers
topicId topicWeight
[(0, 0.154), (1, -0.058), (2, 0.018), (3, 0.087), (4, 0.137), (5, -0.175), (6, -0.084), (7, 0.065), (8, -0.041), (9, -0.005), (10, 0.004), (11, -0.016), (12, -0.055), (13, 0.017), (14, 0.019), (15, 0.028), (16, 0.048), (17, -0.006), (18, 0.015), (19, -0.022), (20, -0.012), (21, 0.028), (22, -0.017), (23, 0.027), (24, 0.027), (25, 0.007), (26, 0.002), (27, 0.033), (28, -0.017), (29, -0.009), (30, 0.04), (31, -0.016), (32, -0.049), (33, -0.034), (34, -0.044), (35, 0.01), (36, 0.014), (37, -0.057), (38, -0.004), (39, 0.008), (40, 0.022), (41, -0.012), (42, -0.046), (43, 0.022), (44, -0.026), (45, 0.022), (46, -0.024), (47, -0.03), (48, -0.038), (49, 0.009)]
simIndex simValue blogId blogTitle
same-blog 1 0.98236638 488 andrew gelman stats-2010-12-27-Graph of the year
2 0.89991581 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can
3 0.8979938 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat
4 0.8808443 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi
5 0.85776424 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying
6 0.85297853 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
7 0.83517969 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
8 0.81677461 671 andrew gelman stats-2011-04-20-One more time-use graph
10 0.8106733 61 andrew gelman stats-2010-05-31-A data visualization manifesto
11 0.80761951 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
14 0.80287135 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
15 0.80203235 1011 andrew gelman stats-2011-11-15-World record running times vs. distance
17 0.79897219 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
18 0.79587984 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
19 0.79561257 863 andrew gelman stats-2011-08-21-Bad graph
20 0.79152185 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph
topicId topicWeight
[(4, 0.016), (5, 0.05), (9, 0.014), (13, 0.016), (16, 0.102), (21, 0.045), (24, 0.156), (43, 0.062), (47, 0.015), (53, 0.035), (59, 0.01), (67, 0.081), (77, 0.034), (86, 0.014), (89, 0.025), (99, 0.185)]
simIndex simValue blogId blogTitle
same-blog 1 0.9534443 488 andrew gelman stats-2010-12-27-Graph of the year
2 0.88098764 1871 andrew gelman stats-2013-05-27-Annals of spam
Introduction: I received the following email, subject line “Want to Buy Text Link from andrewgelman.com”: Dear, I am Mary Taylor. I have started a link building campaign for my growing websites. For this, I need your cooperation. The campaign is quite diverse and large scale and if you take some time to understand it – it will benefit us. First I want to clarify that I do not want “blogroll” ”footer” or any other type of “site wide links”. Secondly I want links from inner pages of site – with good page rank of course. Third links should be within text so that Google may not mark them as spam – not for you and not for me. Hence this link building will cause almost no harm to your site or me. Because content links are fine with Google. Now I should come to the requirements. I will accept links from Page Rank 3 to as high as you have got. Also kindly note that I can buy 1 to 50 links from one site – so you should understand the scale of the project. If you have multiple sites with co
3 0.88077235 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update
Introduction: In the discussion of the fourteen magic words that can increase voter turnout by over 10 percentage points, questions were raised about the methods used to estimate the experimental effects. I sent these on to Chris Bryan, the author of the study, and he gave the following response: We’re happy to address the questions that have come up. It’s always noteworthy when a precise psychological manipulation like this one generates a large effect on a meaningful outcome. Such findings illustrate the power of the underlying psychological process. I’ve provided the contingency tables for the two turnout experiments below. As indicated in the paper, the data are analyzed using logistic regressions. The change in chi-squared statistic represents the significance of the noun vs. verb condition variable in predicting turnout; that is, the change in the model’s significance when the condition variable is added. This is a standard way to analyze dichotomous outcomes. Four outliers were excl
4 0.8804872 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations
Introduction: Vincent Yip writes: I have read your paper [with Kobi Abayomi and Marc Levy] regarding multiple imputation application. In order to diagnostic my imputed data, I used Kolmogorov-Smirnov (K-S) tests to compare the distribution differences between the imputed and observed values of a single attribute as mentioned in your paper. My question is: For example I have this attribute X with the following data: (NA = missing) Original dataset: 1, NA, 3, 4, 1, 5, NA Imputed dataset: 1, 2 , 3, 4, 1, 5, 6 a) in order to run the KS test, will I treat the observed data as 1, 3, 4,1, 5? b) and for the observed data, will I treat 1, 2 , 3, 4, 1, 5, 6 as the imputed dataset for the K-S test? or just 2 ,6? c) if I used m=5, I will have 5 set of imputed data sets. How would I apply K-S test to 5 of them and compare to the single observed distribution? Do I combine the 5 imputed data set into one by averaging each imputed values so I get one single imputed data and compare with the ob
5 0.87914884 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
Introduction: Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. My reaction: I looked at the first 10. I liked 1, 3, and 5, I didn’t like 2, 7, 8, 9, and 10. I have neutral feelings about 4 and 6. I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram too much into a single image. I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. Also, I don’t see the point of the circular display. That makes no sense at all; it’s a misleading feature. That said, the graphs I dislike can still be fine for their purpose. A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on
6 0.87908411 687 andrew gelman stats-2011-04-29-Zero is zero
7 0.87791955 1080 andrew gelman stats-2011-12-24-Latest in blog advertising
8 0.87777925 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox
9 0.87742752 481 andrew gelman stats-2010-12-22-The Jumpstart financial literacy survey and the different purposes of tests
10 0.87590855 807 andrew gelman stats-2011-07-17-Macro causality
11 0.87400055 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
12 0.87337863 1713 andrew gelman stats-2013-02-08-P-values and statistical practice
13 0.87322891 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
15 0.87216789 1219 andrew gelman stats-2012-03-18-Tips on “great design” from . . . Microsoft!
16 0.87186348 1546 andrew gelman stats-2012-10-24-Hey—has anybody done this study yet?
17 0.87185633 1293 andrew gelman stats-2012-05-01-Huff the Magic Dragon
18 0.87177235 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
19 0.87154365 1881 andrew gelman stats-2013-06-03-Boot
20 0.8709566 1923 andrew gelman stats-2013-07-03-Bayes pays!