andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1684 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data; the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information; and the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi
sentIndex sentText sentNum sentScore
1 Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color. [sent-1, score-0.295]
2 It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. [sent-4, score-0.721]
3 But that’s part of the problem—the clearer graph would also be easier to make! [sent-5, score-0.39]
4 To get a distinctive graph, there needs to be some degree of difficulty. [sent-6, score-0.268]
5 Designers (and, I assume, newspaper readers) like circles, they’re so pretty and symmetric. [sent-10, score-0.078]
wordName wordTfidf (topN-words)
[('circular', 0.399), ('display', 0.373), ('circles', 0.232), ('designers', 0.208), ('color', 0.168), ('graph', 0.153), ('wedges', 0.147), ('shading', 0.147), ('abbreviations', 0.147), ('cote', 0.147), ('denis', 0.147), ('easier', 0.144), ('periodic', 0.128), ('alphabetical', 0.124), ('supplying', 0.124), ('symmetric', 0.124), ('enjoyment', 0.121), ('distinctive', 0.118), ('states', 0.116), ('legend', 0.114), ('makes', 0.113), ('spell', 0.11), ('difficult', 0.109), ('graphs', 0.108), ('pie', 0.108), ('unnecessary', 0.108), ('ordering', 0.107), ('heading', 0.103), ('requiring', 0.099), ('charts', 0.094), ('clearer', 0.093), ('label', 0.092), ('forth', 0.087), ('odd', 0.087), ('maps', 0.085), ('implies', 0.084), ('conclude', 0.083), ('degree', 0.079), ('harder', 0.079), ('newspaper', 0.078), ('sends', 0.077), ('names', 0.076), ('reader', 0.076), ('decide', 0.075), ('picture', 0.073), ('structure', 0.073), ('find', 0.071), ('requires', 0.071), ('needs', 0.071), ('continue', 0.07)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data; the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information; and the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
3 0.18195806 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
4 0.16603412 488 andrew gelman stats-2010-12-27-Graph of the year
Introduction: From blogger Matthew Yglesias: There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features: - Clear title - Clearly labeled axes - Vertical axis goes to zero - The cities are in a sensible order (not, for example, alphabetical) - The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed. What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e
5 0.15828434 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
Introduction: The visual display of quantitative information (to use Edward Tufte’s wonderful term) is a diverse field or set of fields, and its practitioners have different goals. The goals of software designers, applied statisticians, biologists, graphic designers, and journalists (to list just a few of the important creators of data graphics) often overlap—but not completely. One of our aims in writing our article [on Infovis and Statistical Graphics] was to emphasize the diversity of graphical goals, as it seems to us that even experts tend to consider one aspect of a graph and not others. Our main practical suggestion was that, in the internet age, we should not have to choose between attractive graphs and informational graphs: it should be possible to display both, via interactive displays. But to follow this suggestion, one must first accept that not every beautiful graph is informative, and not every informative graph is beautiful. . . . Yes, it can sometimes be possible for a graph to
6 0.14544921 61 andrew gelman stats-2010-05-31-A data visualization manifesto
7 0.13232151 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
9 0.13000409 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
10 0.1207006 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
12 0.11220133 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
14 0.10605208 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)
15 0.1058636 1116 andrew gelman stats-2012-01-13-Infographic on the economy
16 0.1056578 159 andrew gelman stats-2010-07-23-Popular governor, small state
17 0.10512706 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
18 0.10397553 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
19 0.10213902 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles
20 0.097937882 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
topicId topicWeight
[(0, 0.13), (1, -0.043), (2, -0.001), (3, 0.084), (4, 0.15), (5, -0.178), (6, -0.077), (7, 0.043), (8, -0.025), (9, 0.011), (10, 0.014), (11, -0.016), (12, -0.029), (13, 0.016), (14, 0.025), (15, 0.014), (16, 0.022), (17, -0.004), (18, -0.014), (19, 0.0), (20, 0.023), (21, 0.018), (22, -0.028), (23, 0.002), (24, 0.01), (25, -0.029), (26, 0.034), (27, 0.015), (28, -0.033), (29, 0.01), (30, 0.04), (31, -0.014), (32, -0.052), (33, -0.002), (34, -0.016), (35, -0.004), (36, -0.024), (37, -0.035), (38, -0.016), (39, 0.018), (40, 0.018), (41, -0.012), (42, -0.01), (43, 0.008), (44, -0.015), (45, 0.01), (46, -0.003), (47, 0.037), (48, 0.021), (49, 0.025)]
simIndex simValue blogId blogTitle
same-blog 1 0.96828902 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
2 0.90851539 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat
3 0.89553267 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying
4 0.89473522 488 andrew gelman stats-2010-12-27-Graph of the year
Introduction: From blogger Matthew Yglesias: There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features: - Clear title - Clearly labeled axes - Vertical axis goes to zero - The cities are in a sensible order (not, for example, alphabetical) - The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed. What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e
5 0.88745606 671 andrew gelman stats-2011-04-20-One more time-use graph
Introduction: Evan Hensleigh sends me this redesign of the cross-national time use graph: Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important–good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w
6 0.87174118 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
7 0.87149084 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
9 0.86329508 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
10 0.86181402 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
13 0.84755665 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization
14 0.84518009 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
15 0.84126627 61 andrew gelman stats-2010-05-31-A data visualization manifesto
16 0.83601993 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
18 0.82828128 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
20 0.81136006 2038 andrew gelman stats-2013-09-25-Great graphs of names
topicId topicWeight
[(5, 0.022), (15, 0.02), (16, 0.124), (21, 0.014), (24, 0.122), (42, 0.016), (52, 0.013), (77, 0.308), (79, 0.016), (86, 0.027), (99, 0.215)]
simIndex simValue blogId blogTitle
1 0.91727406 230 andrew gelman stats-2010-08-24-Kaggle forcasting update
Introduction: Anthony Goldbloom writes: The Elo rating system is now in 47th position (team Elo Benchmark on the leaderboard). Team Intuition submitted using Microsoft’s Trueskill rating system – Intuition is in 38th position. And for the tourism forecasting competition, the best submission is doing better than the threshold for publication in the International Journal of Forecasting.
2 0.88514006 1784 andrew gelman stats-2013-04-01-Wolfram on Mandelbrot
Introduction: The most perfect pairing of author and subject since Nicholson Baker and John Updike. Here’s Wolfram on the great researcher of fractals: In his way, Mandelbrot paid me some great compliments. When I was in my 20s, and he in his 60s, he would ask about my scientific work: “How can so many people take someone so young so seriously?” In 2002, my book “A New Kind of Science”—in which I argued that many phenomena across science are the complex results of relatively simple, program-like rules—appeared. Mandelbrot seemed to see it as a direct threat, once declaring that “Wolfram’s ‘science’ is not new except when it is clearly wrong; it deserves to be completely disregarded.” In private, though, several mutual friends told me, he fretted that in the long view of history it would overwhelm his work. In retrospect, I don’t think Mandelbrot had much to worry about on this account. The link from the above review came from Peter Woit, who also points to a review by Brian Hayes wit
same-blog 3 0.88068467 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
4 0.85246623 978 andrew gelman stats-2011-10-28-Cool job opening with brilliant researchers at Yahoo
Introduction: Duncan Watts writes: The Human Social Dynamics Group in Yahoo Research is seeking highly qualified candidates for a post-doctoral research scientist position. The Human and Social Dynamics group is devoted to understanding the interplay between individual-level behavior (e.g. how people make decisions about what music they like, which dates to go on, or which groups to join) and the social environment in which individual behavior necessarily plays itself out. In particular, we are interested in: * Structure and evolution of social groups and networks * Decision making, social influence, diffusion, and collective decisions * Networking and collaborative problem solving. The intrinsically multi-disciplinary and cross-cutting nature of the subject demands an eclectic range of researchers, both in terms of domain-expertise (e.g. decision sciences, social psychology, sociology) and technical skills (e.g. statistical analysis, mathematical modeling, computer simulations, design o
5 0.84926355 911 andrew gelman stats-2011-09-15-More data tools worth using from Google
Introduction: Speaking of open data and Google tools, see this post from Revolution R: How to use a Google Spreadsheet as data in R.
7 0.8406809 380 andrew gelman stats-2010-10-29-“Bluntly put . . .”
8 0.83289826 1071 andrew gelman stats-2011-12-19-“NYU Professor Claims He Was Fired for Giving James Franco a D”
9 0.83111727 1604 andrew gelman stats-2012-12-04-An epithet I can live with
11 0.81539613 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?
12 0.80613625 57 andrew gelman stats-2010-05-29-Roth and Amsterdam
13 0.77600634 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery
14 0.75403804 1297 andrew gelman stats-2012-05-03-New New York data research organizations
15 0.75149465 1561 andrew gelman stats-2012-11-04-Someone is wrong on the internet
16 0.75023401 93 andrew gelman stats-2010-06-17-My proposal for making college admissions fairer
17 0.74891436 401 andrew gelman stats-2010-11-08-Silly old chi-square!
18 0.74136662 207 andrew gelman stats-2010-08-14-Pourquoi Google search est devenu plus raisonnable?
19 0.73797435 74 andrew gelman stats-2010-06-08-“Extreme views weakly held”
20 0.73724353 2054 andrew gelman stats-2013-10-07-Bing is preferred to Google by people who aren’t like me