andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-855 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
sentIndex sentText sentNum sentScore
1 They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. [sent-7, score-1.03]
2 (c) Information visualization or infographics, as performed by graphics designers and statisticians who are particularly interested in the way that graphics can involve non-statisticians in the process of thinking about and understanding numerical data. [sent-8, score-0.949]
3 From about 1995-2010, most of my writing on graphics focused on the contrast between (a) and (b), in particular the way that exploratory graphics could be used to check the fit of and better understand probability models. [sent-10, score-0.627]
4 The differences between statistical graphics and information visualization are important because, on both sides, people are doing work that’s not as good as it could be. [sent-20, score-0.655]
5 In my ignorance of the different perspectives of infographics and statistical visualization, I naively thought that the best images for the statistical purpose of my understanding of the data would automatically be the best for involving others. [sent-39, score-0.613]
6 Statistical graphics people detest pie charts and tend to like plain-looking displays such as dotplots and lineplots, with even more complicated varieties such as mosaic plots looking fairly uniform from a visual perspective. [sent-47, score-0.664]
7 Statisticians seem to care a lot about displaying data optimally but not much about what people actually learn from their graphs in real life. [sent-48, score-0.538]
8 In contrast, infovis experts tend to like striking, unusual patterns and favor graphs with visual appeal. [sent-50, score-0.606]
9 On one hand you have people like Cleveland, Tufte, Antony Unwin, Kaiser Fung, myself, and many other statisticians who want graphs to be transparent so that users can identify what each data point on the graph stands for. [sent-53, score-0.868]
10 And somewhere on the side are the tens of millions of maybe not-so-satisfied users of Excel who are fumbling to display today’s data using last century’s tools, making graphs that hit that sweet spot of being both ugly and barely informative! [sent-56, score-0.665]
11 My commenters and I suggested various alternatives; for some purposes the original graphs might be fine; the key point, though, which we should all be able to agree on, is that there are a lot of things to look for here and we shouldn’t be stuck in any single form of visual data expression. [sent-62, score-0.589]
12 Different tastes, different goals The main point I’ve been trying to get at in the recent discussion, starting with my paper with Unwin, is that it makes sense to consider the different tastes of statgraphics and infovis people as reflecting different goals. [sent-63, score-1.024]
13 It’s not just that we statisticians are too lame to make graphs that look good, or that graphics designers are too clueless to display actual data. [sent-64, score-0.94]
14 In contrast, graphics designers are always being made aware of the problems of getting the attention of outsiders, hence they develop tools to make graphs more impressive and visually appealing. [sent-67, score-0.722]
15 Putting it all together Infovis people have a lot of knowledge that statisticians don’t have, ranging from technical issues of fonts and colors to a general user-focused visual and storytelling perspective. [sent-68, score-0.471]
16 I’ve spent a lot of time in the past thirty years making graphs and thinking about making graphs. [sent-70, score-0.492]
17 I’ve spent a lot of time in the past fifteen years or so thinking about statistical graphics as a central link connecting model building, Bayesian inference, and exploratory data analysis (in particular, check out my articles from 2003 and 2004 that I keep linking to). [sent-71, score-0.839]
18 And I’ve spent a lot of time in the past year thinking and writing about the different goals of different practitioners of statistical graphics. [sent-72, score-0.724]
19 I’ve tried to explore these differences by studying some graphs that infovis proponents really seem to like: Florence Nightingale’s plots, Yau’s 5 best data visualizations of the year, an award-winning plot from a newspaper contest, and others. [sent-75, score-0.889]
20 I think I’ve found some common features in these graphs which might indicate some systematic differences in goals between infovis and statgraphics. [sent-76, score-0.771]
wordName wordTfidf (topN-words)
[('graphics', 0.274), ('infovis', 0.243), ('graphs', 0.237), ('statisticians', 0.169), ('display', 0.163), ('yau', 0.151), ('infographics', 0.14), ('different', 0.136), ('cleveland', 0.133), ('differences', 0.128), ('visual', 0.126), ('data', 0.125), ('tastes', 0.12), ('visually', 0.114), ('transparent', 0.109), ('statistical', 0.106), ('lot', 0.101), ('pie', 0.101), ('ve', 0.101), ('designers', 0.097), ('wordle', 0.092), ('goals', 0.091), ('unwin', 0.09), ('visualizations', 0.09), ('spent', 0.089), ('charts', 0.088), ('statgraphics', 0.087), ('graph', 0.079), ('exploratory', 0.079), ('florence', 0.078), ('nightingale', 0.078), ('people', 0.075), ('users', 0.074), ('figuring', 0.072), ('kosara', 0.072), ('visualization', 0.072), ('common', 0.072), ('viewer', 0.071), ('reader', 0.071), ('recognize', 0.068), ('effective', 0.068), ('putting', 0.067), ('plot', 0.066), ('spot', 0.066), ('infographic', 0.066), ('past', 0.065), ('communicating', 0.063), ('nathan', 0.063), ('rock', 0.063), ('numerical', 0.063)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000007 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
Introduction: Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and I seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004)–but I am concerned that our dialogue with the graphic
4 0.38513029 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
Introduction: The visual display of quantitative information (to use Edward Tufte’s wonderful term) is a diverse field or set of fields, and its practitioners have different goals. The goals of software designers, applied statisticians, biologists, graphic designers, and journalists (to list just a few of the important creators of data graphics) often overlap—but not completely. One of our aims in writing our article [on Infovis and Statistical Graphics] was to emphasize the diversity of graphical goals, as it seems to us that even experts tend to consider one aspect of a graph and not others. Our main practical suggestion was that, in the internet age, we should not have to choose between attractive graphs and informational graphs: it should be possible to display both, via interactive displays. But to follow this suggestion, one must first accept that not every beautiful graph is informative, and not every informative graph is beautiful. . . . Yes, it can sometimes be possible for a graph to
5 0.30892473 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”
Introduction: By now you all must be tired of my one-sided presentations of the differences between infovis and statgraphics (for example, this article with Antony Unwin). Today is something different. Courtesy of Martin Theus, editor of the Statistical Computing and Graphics Newsletter, we have two short articles offering competing perspectives: Robert Kosara writes from an Infovis view: Information visualization is a field that has had trouble defining its boundaries, and that consequently is often misunderstood. It doesn’t help that InfoVis, as it is also known, produces pretty pictures that people like to look at and link to or send around. But InfoVis is more than pretty pictures, and it is more than statistical graphics. The key to understanding InfoVis is to ignore the images for a moment and focus on the part that is often lost: interaction. When we use visualization tools, we don’t just create one image or one kind of visualization. In fact, most people would argue that there is
6 0.2971687 1594 andrew gelman stats-2012-11-28-My talk on statistical graphics at MIT this Thurs aft
7 0.2859714 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers
8 0.27157235 1604 andrew gelman stats-2012-12-04-An epithet I can live with
9 0.26719213 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
10 0.24731819 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?
11 0.24396682 319 andrew gelman stats-2010-10-04-“Who owns Congress”
12 0.23071606 2279 andrew gelman stats-2014-04-02-Am I too negative?
13 0.22307111 61 andrew gelman stats-2010-05-31-A data visualization manifesto
14 0.22122656 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again
15 0.20883358 1450 andrew gelman stats-2012-08-08-My upcoming talk for the data visualization meetup
16 0.20138219 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles
17 0.19597328 1668 andrew gelman stats-2013-01-11-My talk at the NY data visualization meetup this Monday!
18 0.19230951 546 andrew gelman stats-2011-01-31-Infovis vs. statistical graphics: My talk tomorrow (Tues) 1pm at Columbia
19 0.19026245 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling
20 0.18844959 1661 andrew gelman stats-2013-01-08-Software is as software does
topicId topicWeight
[(0, 0.322), (1, -0.057), (2, -0.1), (3, 0.147), (4, 0.234), (5, -0.296), (6, -0.251), (7, 0.144), (8, -0.058), (9, 0.028), (10, 0.012), (11, -0.007), (12, -0.053), (13, -0.004), (14, 0.003), (15, -0.083), (16, -0.038), (17, -0.046), (18, 0.047), (19, 0.077), (20, -0.011), (21, -0.069), (22, 0.02), (23, 0.051), (24, -0.01), (25, -0.01), (26, 0.014), (27, 0.032), (28, -0.041), (29, 0.002), (30, -0.079), (31, 0.039), (32, 0.047), (33, 0.111), (34, 0.017), (35, 0.059), (36, 0.013), (37, 0.09), (38, 0.036), (39, -0.004), (40, 0.016), (41, -0.027), (42, -0.056), (43, -0.053), (44, 0.005), (45, -0.035), (46, -0.009), (47, 0.017), (48, -0.016), (49, 0.048)]
simIndex simValue blogId blogTitle
1 0.95906681 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
Introduction: The visual display of quantitative information (to use Edward Tufte’s wonderful term) is a diverse field or set of fields, and its practitioners have different goals. The goals of software designers, applied statisticians, biologists, graphic designers, and journalists (to list just a few of the important creators of data graphics) often overlap—but not completely. One of our aims in writing our article [on Infovis and Statistical Graphics] was to emphasize the diversity of graphical goals, as it seems to us that even experts tend to consider one aspect of a graph and not others. Our main practical suggestion was that, in the internet age, we should not have to choose between attractive graphs and informational graphs: it should be possible to display both, via interactive displays. But to follow this suggestion, one must first accept that not every beautiful graph is informative, and not every informative graph is beautiful. . . . Yes, it can sometimes be possible for a graph to
same-blog 2 0.95626134 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
3 0.92295903 1604 andrew gelman stats-2012-12-04-An epithet I can live with
Introduction: Here . Indeed, I’d much rather be a legend than a myth. I just want to clarify one thing. Walter Hickey writes: [Antony Unwin and Andrew Gelman] collaborated on this presentation where they take a hard look at what’s wrong with the recent trends of data visualization and infographics. The takeaway is that while there have been great leaps in visualization technology, some of the visualizations that have garnered the highest praises have actually been lacking in a number of key areas. Specifically, the pair does a takedown of the top visualizations of 2008 as decided by the popular statistics blog Flowing Data. This is a fair summary, but I want to emphasize that, although our dislike of some award-winning visualizations is central to our argument, it is only the first part of our story. As Antony and I worked more on our paper, and especially after seeing the discussions by Robert Kosara, Stephen Few, Hadley Wickham, and Paul Murrell (all to appear in Journal of Computati
Introduction: Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and I seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004)–but I am concerned that our dialogue with the graphic
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
6 0.8695395 319 andrew gelman stats-2010-10-04-“Who owns Congress”
7 0.8593291 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”
8 0.85176873 1594 andrew gelman stats-2012-11-28-My talk on statistical graphics at MIT this Thurs aft
9 0.83488733 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
10 0.82732993 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
11 0.82122761 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization
12 0.81366593 372 andrew gelman stats-2010-10-27-A use for tables (really)
13 0.81169879 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again
14 0.80407852 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?
15 0.80271888 1775 andrew gelman stats-2013-03-23-In which I disagree with John Maynard Keynes
16 0.79854667 1096 andrew gelman stats-2012-01-02-Graphical communication for legal scholarship
17 0.79499066 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
18 0.78341842 794 andrew gelman stats-2011-07-09-The quest for the holy graph
19 0.78053534 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers
20 0.7783457 61 andrew gelman stats-2010-05-31-A data visualization manifesto
topicId topicWeight
[(2, 0.012), (5, 0.029), (16, 0.08), (21, 0.049), (24, 0.189), (34, 0.012), (45, 0.018), (51, 0.021), (55, 0.016), (57, 0.015), (63, 0.023), (76, 0.056), (77, 0.011), (86, 0.025), (99, 0.278)]
simIndex simValue blogId blogTitle
Introduction: Jerzy Wieczorek has an interesting review of the book Graph Design for the Eye and Mind by psychology researcher Stephen Kosslyn. I recommend you read all of Wieczorek’s review (and maybe Kosslyn’s book, but that I haven’t seen), but here I’ll just focus on one point. Here’s Wieczorek summarizing Kosslyn: p. 18-19: the horizontal axis should be for the variable with the “most important part of the data.” See Kosslyn’s Figure 1.6 and 1.7 below. Figure 1.6 clearly shows that one of the sex-by-income groups reacts to age differently than the other three groups do. Figure 1.7 uses sex as the x-axis variable, making it much harder to see this same effect in the data. As a statistician exploring the data, I might make several plots using different groupings… but for communicating my results to an audience, I would choose the one plot that shows the findings most clearly. Those who know me well (or who have read the title of this post) will guess my reaction, whic
2 0.97867721 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
Introduction: Stephen Jenkins wrote: I was thinking that you and your blog readers might be interested in “An Economist’s Guide to Visualizing Data” by Jonathan Schwabish, in the most recent Journal of Economic Perspectives (which is the American Economic Association’s main “outreach” journal in some ways). I replied: Ooh, I hate this so much! This seems to represent a horrible example of economists not recognizing that outsiders can help them. We do much much better in political science. To which Jenkins wrote: Ha! I guessed as much — hence sent it. And I’ll now admit I was surprised that JEP took the piece without getting Schwabish to widen his reference points. To elaborate a bit: I agree with Schwabish’s general advice (“show the data,” “reduce the clutter,” and “integrate the text and the graph”). But then he illustrates with 8 before-and-after stories in which he shows an existing graph and then gives his improvements. My problem is that I don’t like most of his “afte
3 0.97614604 1818 andrew gelman stats-2013-04-22-Goal: Rules for Turing chess
Introduction: Daniel Murell has more thoughts on Turing chess (last discussed here): When I played with my brother, we had it that if you managed to lap someone while running around the house, then you got an additional move. This means that if you had the option to take the king on your additional move, you could, and doing so won you the game. He was fitter at the time so he slipped in two additional moves over the course of the game. I still won :) I am much better than him at chess though, so I’m sure he would have beaten me had we been more even. W.r.t. dsquared’s comment and your response, I’m not overly concerned about the first move, because you can enforce that white must reach a halfway point or that some time interval elapse before black makes his first move. This version though does have one significant weakness that is evident to me. If you wait a little for your opponent to return to make his second move in a row against you, you get your breath back. He couldn’t plan for th
Introduction: In this article, Oliver Sacks talks about his extreme difficulty in recognizing people (even close friends) and places (even extremely familiar locations such as his apartment and his office). After reading this, I started to wonder if I have a very mild case of face-blindness. I’m very good at recognizing places, but I’m not good at faces. And I can’t really visualize faces at all. Like Sacks and some of his correspondents, I often have to do it by cheating, by recognizing certain landmarks that I can remember, thus coding the face linguistically rather than visually. (On the other hand, when thinking about mathematics or statistics, I’m very visual, as readers of this blog can attest.) Anyway, in searching for the link to Sacks’s article, I came across the “Cambridge Face Memory Test.” My reaction when taking this test was mostly irritation. I just found it annoying to stare at all these unadorned faces, and in my attempt to memorize them, I was trying to use trick
5 0.97123015 1713 andrew gelman stats-2013-02-08-P-values and statistical practice
Introduction: From my new article in the journal Epidemiology: Sander Greenland and Charles Poole accept that P values are here to stay but recognize that some of their most common interpretations have problems. The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations). A Bayesian interpretation based on a spike-and-slab model makes little sense in applied contexts in epidemiology, political science, and other fields in which true effects are typically nonzero and bounded (thus violating both the “spike” and the “slab” parts of the model). I find Greenland and Poole’s perspective t
7 0.96962768 807 andrew gelman stats-2011-07-17-Macro causality
same-blog 8 0.96923637 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
9 0.96917307 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles
11 0.9688226 1792 andrew gelman stats-2013-04-07-X on JLP
13 0.96784186 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
14 0.96773374 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
15 0.9673171 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?
16 0.96676338 1080 andrew gelman stats-2011-12-24-Latest in blog advertising
17 0.96642494 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?
19 0.96560085 1881 andrew gelman stats-2013-06-03-Boot
20 0.96528608 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling