847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics


meta infos for this blog

Source: html

Introduction: Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and I seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004–but I am concerned that our dialogue with the graphic


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We have been writing about the different goals served by information visualization and statistical graphics. [sent-3, score-0.354]

2 I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004–but I am concerned that our dialogue with the graphics experts is not moving forward quite as I’d wished. [sent-8, score-0.581]

3 For example, Hadley Wickham, creator of the great ggplot2, wrote: Unfortunately both sides [statisticians and infovgraphics people] seem to be comparing the best of one side with the worst of the other. [sent-13, score-0.339]

4 There are some awful infovis papers that completely ignore utility in the pursuit of aesthetics. [sent-14, score-0.405]

5 There are many awful stat graphics papers that ignore aesthetics in the pursuit of utility (and often fail to achieve that). [sent-15, score-0.65]

6 Sure, sometimes this is true (as in the notorious “chartjunk” paper in which pretty graphs are compared to piss-poor plots that violate every principle of visualization and statistical graphics). [sent-19, score-0.436]

7 In my long article with Unwin, we discussed the “5 best data visualizations of the year”! [sent-21, score-0.329]

8 In our short article, we discuss Florence Nightingale’s spiral graph, which is considered a data visualization classic. [sent-22, score-0.441]

9 And, from the other side, my impression is that infographics gurus are happy to celebrate the best of statistical graphics. [sent-23, score-0.385]

10 In much of my recent writing on graphics, I’ve focused on visualizations that have been popular and effective–Wordle is an excellent example here–while not following what I would consider to be good principles of statistical graphics. [sent-27, score-0.312]

11 The differences between (a) and (b) are my subject, and a great way to highlight them is to consider examples that are effective as infovis but not as statistical graphics. [sent-29, score-0.371]

12 doing the opposite with my favorite statistical graphics: demonstrating that despite their savvy graphical arrangements of comparisons, my graphs don’t always communicate what I’d like them to. [sent-31, score-0.354]

13 I’m very open to the idea that graphics experts could help me communicate in ways that I didn’t think of, just as I’d hope that graphics experts would accept that even the coolest images and dynamic graphics could be reimagined if the goal is data exploration. [sent-32, score-1.648]

14 To get back to our exchange with Kosara, I stand firm in my belief that the swirly plot is not such a good way to display time series data–there are more effective ways of understanding periodicity, and no I don’t think this has anything to do with dynamic vs. [sent-33, score-0.311]

15 I’ll quarantine a discussion of the display of periodic data to another blog post. [sent-37, score-0.322]

16 It’s a display of strategies of Rock Paper Scissors that Nathan Yau featured a couple weeks ago on his blog: This is an attractive graphic that conveys some information–but the images have almost nothing to do with the info. [sent-39, score-0.486]

17 Difference in perspectives The graphic in question is titled, “How do I win rock, paper, scissors every time? [sent-41, score-0.489]

18 Conversely, a journalist wouldn’t be caught dead making a boring headline (for example, “Some strategies that might increase your odds in rock paper scissors”). [sent-46, score-0.334]

19 In contrast, my post from a few years ago (titled “How to win at rock-paper-scissors,” a bit misleading but much less so than “How to win every time”) had a lot more information and received exactly 6 comments. [sent-50, score-0.446]

20 Again I purposely chose a non-quantitative example to move the discussion away from “How’s the best way to display these data” and focus entirely on the different goals. [sent-57, score-0.361]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('graphics', 0.323), ('scissors', 0.193), ('kosara', 0.185), ('visualization', 0.153), ('win', 0.148), ('display', 0.148), ('side', 0.145), ('communicate', 0.144), ('infographics', 0.134), ('best', 0.127), ('statistical', 0.124), ('nathan', 0.12), ('rock', 0.12), ('wordle', 0.117), ('visualizations', 0.114), ('unexpected', 0.114), ('experts', 0.114), ('attractive', 0.113), ('spiral', 0.111), ('nightingale', 0.099), ('awful', 0.097), ('pursuit', 0.094), ('discuss', 0.089), ('data', 0.088), ('effective', 0.087), ('graphs', 0.086), ('discussion', 0.086), ('highlight', 0.082), ('graph', 0.081), ('titled', 0.08), ('antony', 0.079), ('infovis', 0.078), ('information', 0.077), ('unwin', 0.077), ('strategies', 0.076), ('dynamic', 0.076), ('graphic', 0.075), ('images', 0.074), ('popular', 0.074), ('every', 0.073), ('love', 0.072), ('statisticians', 0.072), ('fung', 0.071), ('utility', 0.071), ('dead', 0.07), ('goal', 0.069), ('journalist', 0.068), ('worst', 0.067), ('ignore', 0.065), ('sorry', 0.065)]
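
The page does not say how the word weights above or the similarity values below were computed. As a rough illustration only, here is a minimal sketch that assumes a standard smoothed TF-IDF weighting with L2 normalisation and cosine similarity between posts. The function names (`tfidf_vectors`, `cosine`, `top_words`) and the three toy documents are hypothetical and are not taken from the tool that produced this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised {word: weight} dict per document.

    Assumption: smoothed idf = log((1 + N) / (1 + df)) + 1, a common
    convention; the page's actual weighting scheme is not documented.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights = {
            w: tf[w] * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w in tf
        }
        # Guard against an empty document, which would have zero norm.
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        vectors.append({w: v / norm for w, v in weights.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def top_words(vector, n=5):
    """The n highest-weighted words, analogous to a topN-words list."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical toy corpus standing in for blog posts.
docs = [
    "statistical graphics display data",
    "information visualization graphics and infographics",
    "weakly informative priors in bayesian models",
]
vectors = tfidf_vectors(docs)
print(top_words(vectors[0], 3))
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Under these assumptions a post compared with itself scores approximately 1, and posts that share no words score 0, which is at least consistent with the shape of the similarity lists on this page.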

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

2 0.47176734 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week , consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

3 0.3766095 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.33044904 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

Introduction: The visual display of quantitative information (to use Edward Tufte’s wonderful term) is a diverse field or set of fields, and its practitioners have different goals. The goals of software designers, applied statisticians, biologists, graphic designers, and journalists (to list just a few of the important creators of data graphics) often overlap—but not completely. One of our aims in writing our article [on Infovis and Statistical Graphics] was to emphasize the diversity of graphical goals, as it seems to us that even experts tend to consider one aspect of a graph and not others. Our main practical suggestion was that, in the internet age, we should not have to choose between attractive graphs and informational graphs: it should be possible to display both, via interactive displays. But to follow this suggestion, one must first accept that not every beautiful graph is informative, and not every informative graph is beautiful. . . . Yes, it can sometimes be possible for a graph to

5 0.30994016 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

Introduction: By now you all must be tired of my one-sided presentations of the differences between infovis and statgraphics (for example, this article with Antony Unwin). Today is something different. Courtesy of Martin Theus, editor of the Statistical Computing and Graphics Newsletter, we have two short articles offering competing perspectives: Robert Kosara writes from an Infovis view: Information visualization is a field that has had trouble defining its boundaries, and that consequently is often misunderstood. It doesn’t help that InfoVis, as it is also known, produces pretty pictures that people like to look at and link to or send around. But InfoVis is more than pretty pictures, and it is more than statistical graphics. The key to understanding InfoVis is to ignore the images for a moment and focus on the part that is often lost: interaction. When we use visualization tools, we don’t just create one image or one kind of visualization. In fact, most people would argue that there is

6 0.30669683 1604 andrew gelman stats-2012-12-04-An epithet I can live with

7 0.27790475 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

8 0.23584959 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

9 0.21743144 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

10 0.2155817 2279 andrew gelman stats-2014-04-02-Am I too negative?

11 0.20301378 1594 andrew gelman stats-2012-11-28-My talk on statistical graphics at Mit this Thurs aft

12 0.19178295 1132 andrew gelman stats-2012-01-21-A counterfeit data graphic

13 0.18377452 546 andrew gelman stats-2011-01-31-Infovis vs. statistical graphics: My talk tomorrow (Tues) 1pm at Columbia

14 0.17845081 794 andrew gelman stats-2011-07-09-The quest for the holy graph

15 0.1700784 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling

16 0.16644405 61 andrew gelman stats-2010-05-31-A data visualization manifesto

17 0.16387108 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

18 0.16064134 1096 andrew gelman stats-2012-01-02-Graphical communication for legal scholarship

19 0.15912288 599 andrew gelman stats-2011-03-03-Two interesting posts elsewhere on graphics

20 0.15907171 2284 andrew gelman stats-2014-04-07-How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.28), (1, -0.074), (2, -0.113), (3, 0.114), (4, 0.182), (5, -0.266), (6, -0.194), (7, 0.095), (8, -0.046), (9, -0.009), (10, 0.017), (11, 0.014), (12, -0.017), (13, 0.017), (14, -0.058), (15, -0.067), (16, -0.063), (17, -0.033), (18, 0.022), (19, 0.094), (20, -0.019), (21, -0.064), (22, 0.029), (23, 0.012), (24, -0.022), (25, 0.017), (26, -0.003), (27, 0.065), (28, -0.034), (29, -0.027), (30, -0.074), (31, 0.026), (32, 0.068), (33, 0.089), (34, 0.038), (35, 0.042), (36, -0.005), (37, 0.068), (38, 0.053), (39, 0.037), (40, -0.041), (41, -0.039), (42, -0.037), (43, -0.077), (44, 0.024), (45, 0.029), (46, 0.05), (47, -0.016), (48, -0.026), (49, 0.031)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95756084 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

2 0.94320786 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

Introduction: The visual display of quantitative information (to use Edward Tufte’s wonderful term) is a diverse field or set of fields, and its practitioners have different goals. The goals of software designers, applied statisticians, biologists, graphic designers, and journalists (to list just a few of the important creators of data graphics) often overlap—but not completely. One of our aims in writing our article [on Infovis and Statistical Graphics] was to emphasize the diversity of graphical goals, as it seems to us that even experts tend to consider one aspect of a graph and not others. Our main practical suggestion was that, in the internet age, we should not have to choose between attractive graphs and informational graphs: it should be possible to display both, via interactive displays. But to follow this suggestion, one must first accept that not every beautiful graph is informative, and not every informative graph is beautiful. . . . Yes, it can sometimes be possible for a graph to

3 0.93306893 1604 andrew gelman stats-2012-12-04-An epithet I can live with

Introduction: Here . Indeed, I’d much rather be a legend than a myth. I just want to clarify one thing. Walter Hickey writes: [Antony Unwin and Andrew Gelman] collaborated on this presentation where they take a hard look at what’s wrong with the recent trends of data visualization and infographics. The takeaway is that while there have been great leaps in visualization technology, some of the visualizations that have garnered the highest praises have actually been lacking in a number of key areas. Specifically, the pair does a takedown of the top visualizations of 2008 as decided by the popular statistics blog Flowing Data. This is a fair summary, but I want to emphasize that, although our dislike of some award-winning visualizations is central to our argument, it is only the first part of our story. As Antony and I worked more on our paper, and especially after seeing the discussions by Robert Kosara, Stephen Few, Hadley Wickham, and Paul Murrell (all to appear in Journal of Computati

4 0.89992863 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week , consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

5 0.89589208 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

Introduction: By now you all must be tired of my one-sided presentations of the differences between infovis and statgraphics (for example, this article with Antony Unwin). Today is something different. Courtesy of Martin Theus, editor of the Statistical Computing and Graphics Newsletter, we have two short articles offering competing perspectives: Robert Kosara writes from an Infovis view: Information visualization is a field that has had trouble defining its boundaries, and that consequently is often misunderstood. It doesn’t help that InfoVis, as it is also known, produces pretty pictures that people like to look at and link to or send around. But InfoVis is more than pretty pictures, and it is more than statistical graphics. The key to understanding InfoVis is to ignore the images for a moment and focus on the part that is often lost: interaction. When we use visualization tools, we don’t just create one image or one kind of visualization. In fact, most people would argue that there is

6 0.85450226 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

7 0.84298468 1594 andrew gelman stats-2012-11-28-My talk on statistical graphics at Mit this Thurs aft

8 0.83048451 319 andrew gelman stats-2010-10-04-“Who owns Congress”

9 0.82223153 794 andrew gelman stats-2011-07-09-The quest for the holy graph

10 0.81608546 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

11 0.78706044 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

12 0.78551203 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

13 0.78020918 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

14 0.77556789 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

15 0.77465564 1096 andrew gelman stats-2012-01-02-Graphical communication for legal scholarship

16 0.76250672 546 andrew gelman stats-2011-01-31-Infovis vs. statistical graphics: My talk tomorrow (Tues) 1pm at Columbia

17 0.75530225 1775 andrew gelman stats-2013-03-23-In which I disagree with John Maynard Keynes

18 0.74843639 1308 andrew gelman stats-2012-05-08-chartsnthings !

19 0.74547607 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

20 0.73228115 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.015), (15, 0.028), (16, 0.08), (21, 0.01), (24, 0.379), (53, 0.014), (55, 0.016), (57, 0.014), (63, 0.017), (77, 0.017), (86, 0.016), (87, 0.011), (99, 0.238)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99192011 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism

Introduction: Interesting discussion by Alex Tabarrok (following up on an article by Rebecca Solnit) on the continuum between voluntarism (or, more generally, non-cash transactions) and markets with monetary exchange. I just have a few comments of my own: 1. Solnit writes of “the iceberg economy,” which she characterizes as “based on gift economies, barter, mutual aid, and giving without hope of return . . . the relations between friends, between family members, the activities of volunteers or those who have chosen their vocation on principle rather than for profit.” I just wonder whether “barter” completely fits in here. Maybe it depends on context. Sometimes barter is an informal way of keeping track (you help me and I help you), but in settings of low liquidity I could imagine barter being simply an inefficient way of performing an economic transaction. 2. I am no expert on capitalism but my impression is that it’s not just about “competition and selfishness” but also is related to the

2 0.98945546 1787 andrew gelman stats-2013-04-04-Wanna be the next Tyler Cowen? It’s not as easy as you might think!

Introduction: Someone told me he ran into someone who said his goal was to be Tyler Cowen. OK, fine, it’s a worthy goal, but I don’t think it’s so easy .

3 0.98845577 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

Introduction: A couple days ago we discussed some remarks by Tony O’Hagan and Jim Berger on weakly informative priors. Jim followed up on Deborah Mayo’s blog with this: Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine. I have a few reactions to this: 1. I agree

4 0.98755622 1706 andrew gelman stats-2013-02-04-Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?

Introduction: Justin Kinney writes: Since your blog has discussed the “maximal information coefficient” (MIC) of Reshef et al., I figured you might want to see the critique that Gurinder Atwal and I have posted. In short, Reshef et al.’s central claim that MIC is “equitable” is incorrect. We [Kinney and Atwal] offer mathematical proof that the definition of “equitability” Reshef et al. propose is unsatisfiable—no nontrivial dependence measure, including MIC, has this property. Replicating the simulations in their paper with modestly larger data sets validates this finding. The heuristic notion of equitability, however, can be formalized instead as a self-consistency condition closely related to the Data Processing Inequality. Mutual information satisfies this new definition of equitability but MIC does not. We therefore propose that simply estimating mutual information will, in many cases, provide the sort of dependence measure Reshef et al. seek. For background, here are my two p

5 0.98744202 938 andrew gelman stats-2011-10-03-Comparing prediction errors

Introduction: Someone named James writes: I’m working on a classification task, sentence segmentation. The classifier algorithm we use (BoosTexter, a boosted learning algorithm) classifies each word independently conditional on its features, i.e. a bag-of-words model, so any contextual clues need to be encoded into the features. The feature extraction system I am proposing in my thesis uses a heteroscedastic LDA to transform data to produce the features the classifier runs on. The HLDA system has a couple parameters I’m testing, and I’m running a 3×2 full factorial experiment. That’s the background which may or may not be relevant to the question. The output of each trial is a class (there are only 2 classes, right now) for every word in the dataset. Because of the nature of the task, one class strongly predominates, say 90-95% of the data. My question is this: in terms of overall performance (we use F1 score), many of these trials are pretty close together, which leads me to ask whethe

6 0.98675185 743 andrew gelman stats-2011-06-03-An argument that can’t possibly make sense

7 0.98539078 1479 andrew gelman stats-2012-09-01-Mothers and Moms

8 0.98486722 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

9 0.98388296 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

10 0.98337466 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

11 0.97989964 38 andrew gelman stats-2010-05-18-Breastfeeding, infant hyperbilirubinemia, statistical graphics, and modern medicine

12 0.9792338 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall

13 0.97837967 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

14 0.97729421 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

15 0.97588736 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

16 0.97275376 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

17 0.97250748 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

18 0.96915388 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

19 0.96894085 197 andrew gelman stats-2010-08-10-The last great essayist?

20 0.96683371 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine