
61 andrew gelman stats-2010-05-31-A data visualization manifesto

Meta information for this blog

Source: html

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. [sent-3, score-0.434]

2 They make some reasonable points, but a big problem I have with the article is in the details of the actual visualizations they show. [sent-7, score-0.428]

3 Figure 1B has that notorious alphabetical order, also some weird visual artifacts that get created by stacking curves, and an x-axis that is not fully labeled. [sent-9, score-0.451]

4 Yes, I realize that one purpose of the article is to criticize such graphs (“While such charts have proven popular in recent years, they do have some notable limitations. [sent-12, score-0.358]

5 Still, it doesn’t help to list the industries in alphabetical order. [sent-18, score-0.437]

6 Something went terribly wrong here; perhaps each graph was rescaled to its own range, which wouldn’t make much sense in a small multiples plot. [sent-21, score-0.35]

7 I could keep going here through all the other graphs in the article. But maybe these criticisms are irrelevant. [sent-25, score-0.43]

8 Perhaps such glitches (from my perspective) are either irrelevant to the general message of the graph or, from the other direction, force the reader to look at the graph and read the surrounding text more clearly to figure out what’s going on. [sent-31, score-0.645]

9 After all, a graph isn’t a TV show, readers aren’t passive, so maybe it’s actually good to make them work to figure out what’s going on. [sent-32, score-0.685]

10 At a statistical level, though, I think the details are very important, because they connect the data being graphed with the underlying questions being studied. [sent-33, score-0.448]

11 If you’re not interested in an alphabetical ordering, you don’t want to put it on a graph. [sent-35, score-0.302]

12 If you want to convey something beyond simply that big cars get worse gas mileage, you’ll want to invert the axes on your parallel coordinate plot. [sent-36, score-0.342]

13 If you wanted to say I’m wrong, you could perhaps invoke an opportunity cost argument, that the time I spend worrying about where to label the lines on a graph (not to mention the time I spend blogging about it! [sent-39, score-0.415]

14 For me, the details of the graphing are absolutely necessary to the statistical analysis–decades ago, before I did everything on the computer, I spent lots and lots of time making graphs by hand, using colored pens and all the rest–but for others, maybe not. [sent-41, score-0.688]

15 article is that it doesn’t mention what are perhaps the three most important kinds of graphs: dot plots, line plots, and scatterplots. [sent-43, score-0.587]

16 See here here for a dotplot (from Jeff and Justin), and here for some line plots and scatterplots. [sent-44, score-0.298]

17 A clearer understanding of line plots would’ve been a big help in making Figure 1C, for example. [sent-48, score-0.435]

18 What’s missing is the link from the substantive questions (what are the reasons for making the graph in the first place? [sent-54, score-0.354]

19 Instead we go through menus of possibilities (actual forced options on computer packages, or mental menus in which we make choices based on what we’ve seen before) and then have to go back and fix things. [sent-57, score-0.424]

20 I didn’t feel like revising the whole piece, but I guess I will if I want to rewrite the article for publication somewhere, which maybe I’ll do if I find the right coauthor. [sent-70, score-0.295]
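
The point in sentences 5 and 11 above, that an alphabetical ordering rarely serves the question being asked, can be illustrated with a minimal sketch. It sorts categories by value before drawing a plain-text dot plot on one common scale. The category names and numbers below are hypothetical, made up for illustration, and are not taken from the article under discussion; the function names `order_by_value` and `text_dot_plot` are likewise just illustrative.

```python
# Hypothetical values, invented purely for illustration.
rates = {
    "Agriculture": 12.0,
    "Construction": 20.0,
    "Finance": 6.0,
    "Government": 4.0,
    "Manufacturing": 10.0,
}


def order_by_value(data):
    """Return (label, value) pairs sorted from largest to smallest value."""
    return sorted(data.items(), key=lambda kv: kv[1], reverse=True)


def text_dot_plot(pairs, width=40, max_value=None):
    """Render one line per category, all on a common scale from 0 to max_value."""
    if max_value is None:
        max_value = max(value for _, value in pairs)
    label_width = max(len(label) for label, _ in pairs)
    lines = []
    for label, value in pairs:
        # Position of the dot along the shared axis.
        pos = round(value / max_value * width)
        lines.append(f"{label:<{label_width}} |{'.' * pos}o")
    return lines


ordered = order_by_value(rates)
for line in text_dot_plot(ordered):
    print(line)
```

Because the dictionary happens to be written in alphabetical order, plotting `rates.items()` directly would reproduce the alphabetical layout; passing `ordered` instead puts the largest category first, so the ranking is readable at a glance.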

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('details', 0.245), ('alphabetical', 0.212), ('plots', 0.208), ('graph', 0.194), ('figure', 0.193), ('heer', 0.188), ('industries', 0.164), ('visualization', 0.163), ('graphs', 0.161), ('dot', 0.138), ('manifesto', 0.137), ('menus', 0.137), ('article', 0.122), ('stacking', 0.118), ('unemployment', 0.109), ('axes', 0.101), ('perhaps', 0.095), ('thinking', 0.094), ('ordering', 0.091), ('line', 0.09), ('want', 0.09), ('readers', 0.09), ('computer', 0.089), ('questions', 0.084), ('labels', 0.084), ('maybe', 0.083), ('scale', 0.08), ('important', 0.079), ('labeled', 0.076), ('making', 0.076), ('criticize', 0.075), ('forth', 0.074), ('systematic', 0.069), ('hand', 0.066), ('graphical', 0.066), ('going', 0.064), ('mention', 0.063), ('faulting', 0.063), ('invoke', 0.063), ('linkages', 0.063), ('pens', 0.063), ('fully', 0.062), ('make', 0.061), ('help', 0.061), ('something', 0.061), ('statistical', 0.06), ('factor', 0.059), ('bostock', 0.059), ('artifacts', 0.059), ('graphed', 0.059)]
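
For readers unfamiliar with how a word list like the one above can arise, here is a minimal sketch of tf-idf weighting on a toy corpus. It assumes the common definition (term frequency times the log of inverse document frequency); the exact tokenization, normalization, and library used to produce the weights on this page are not stated, so the numbers here will not match them. The toy documents and the function name `tfidf_top_words` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_top_words(docs, doc_index, top_n=3):
    """Return the top_n (word, weight) pairs for docs[doc_index]."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    counts = Counter(tokenized[doc_index])
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        tf = count / total                 # relative frequency in this document
        idf = math.log(n_docs / df[word])  # zero for words found in every document
        scores[word] = tf * idf

    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy corpus, invented for illustration only.
docs = [
    "graph details details alphabetical",
    "graph plots line",
    "graph unemployment scale",
]
top = tfidf_top_words(docs, doc_index=0, top_n=3)
for word, weight in top:
    print(f"{word}: {weight:.3f}")
```

In this toy corpus "graph" appears in every document, so its weight is zero, while "details" is both frequent in the first document and absent elsewhere, so it ranks first. That is the same intuition behind distinctive words such as 'alphabetical' and 'heer' scoring highly in the list above.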

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 61 andrew gelman stats-2010-05-31-A data visualization manifesto

2 0.32999173 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

3 0.24532597 676 andrew gelman stats-2011-04-23-The payoff: $650. The odds: 1 in 500,000.

Introduction: Details here.

4 0.22307111 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

5 0.21962497 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code. In fact,

6 0.21773897 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

7 0.17753415 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

8 0.17701855 488 andrew gelman stats-2010-12-27-Graph of the year

9 0.16885544 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

10 0.16644405 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

11 0.16245027 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

12 0.15801953 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

13 0.15682431 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles

14 0.1551635 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

15 0.15003304 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly

16 0.14899975 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

17 0.14670005 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

18 0.1462025 2172 andrew gelman stats-2014-01-14-Advice on writing research articles

19 0.14544921 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

20 0.1444066 1767 andrew gelman stats-2013-03-17-The disappearing or non-disappearing middle class

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.301), (1, -0.087), (2, -0.029), (3, 0.105), (4, 0.186), (5, -0.231), (6, -0.108), (7, 0.059), (8, -0.026), (9, -0.004), (10, 0.028), (11, -0.025), (12, -0.044), (13, 0.015), (14, 0.023), (15, -0.013), (16, 0.021), (17, -0.019), (18, -0.061), (19, -0.001), (20, 0.007), (21, -0.01), (22, -0.001), (23, 0.053), (24, 0.008), (25, -0.016), (26, 0.036), (27, 0.006), (28, -0.018), (29, 0.011), (30, 0.028), (31, 0.003), (32, -0.025), (33, -0.014), (34, -0.031), (35, -0.013), (36, -0.018), (37, -0.01), (38, -0.002), (39, -0.04), (40, 0.009), (41, 0.013), (42, -0.004), (43, 0.022), (44, -0.029), (45, -0.029), (46, -0.022), (47, 0.001), (48, 0.016), (49, 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97313756 61 andrew gelman stats-2010-05-31-A data visualization manifesto

2 0.93193215 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla

3 0.93158847 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

4 0.92620611 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi

5 0.91476029 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

Introduction: Jerzy Wieczorek has an interesting review of the book Graph Design for the Eye and Mind by psychology researcher Stephen Kosslyn. I recommend you read all of Wieczorek’s review (and maybe Kosslyn’s book, but that I haven’t seen), but here I’ll just focus on one point. Here’s Wieczorek summarizing Kosslyn: p. 18-19: the horizontal axis should be for the variable with the “most important part of the data.” See Kosslyn’s Figure 1.6 and 1.7 below. Figure 1.6 clearly shows that one of the sex-by-income groups reacts to age differently than the other three groups do. Figure 1.7 uses sex as the x-axis variable, making it much harder to see this same effect in the data. As a statistician exploring the data, I might make several plots using different groupings… but for communicating my results to an audience, I would choose the one plot that shows the findings most clearly. Those who know me well (or who have read the title of this post) will guess my reaction, whic

6 0.90682185 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

7 0.90172452 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

8 0.89896691 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

9 0.89599693 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data

10 0.89598149 488 andrew gelman stats-2010-12-27-Graph of the year

11 0.89510036 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

12 0.89147812 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

13 0.88859022 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

14 0.88355082 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

15 0.87390047 319 andrew gelman stats-2010-10-04-“Who owns Congress”

16 0.85982931 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

17 0.85792959 671 andrew gelman stats-2011-04-20-One more time-use graph

18 0.85529774 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

19 0.85112029 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

20 0.84844774 296 andrew gelman stats-2010-09-26-A simple semigraphic display

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.046), (15, 0.041), (16, 0.078), (21, 0.026), (24, 0.17), (27, 0.014), (45, 0.02), (50, 0.084), (53, 0.024), (76, 0.017), (77, 0.017), (86, 0.032), (88, 0.01), (99, 0.307)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97159123 1805 andrew gelman stats-2013-04-16-Memo to Reinhart and Rogoff: I think it’s best to admit your errors and go on from there

Introduction: Jeff Ratto points me to this news article by Dean Baker reporting the work of three economists, Thomas Herndon, Michael Ash, and Robert Pollin, who found errors in a much-cited article by Carmen Reinhart and Kenneth Rogoff analyzing historical statistics of economic growth and public debt. Mike Konczal provides a clear summary; that’s where I got the above image. Errors in data processing and data analysis It turns out that Reinhart and Rogoff flubbed it. Herndon et al. write of “spreadsheet errors, omission of available data, weighting, and transcription.” The spreadsheet errors are the most embarrassing, but the other choices in data analysis seem pretty bad too. It can be tough to work with small datasets, so I have sympathy for Reinhart and Rogoff, but it does look like they were jumping to conclusions in their paper. Perhaps the urgency of the topic moved them to publish as fast as possible rather than carefully considering the impact of their data-analytic choi

same-blog 2 0.96856016 61 andrew gelman stats-2010-05-31-A data visualization manifesto

3 0.96535474 1793 andrew gelman stats-2013-04-08-The Supreme Court meets the fallacy of the one-sided bet

Introduction: Doug Hartmann writes (link from Jay Livingston): Justice Antonin Scalia’s comment in the Supreme Court hearings on the U.S. law defining marriage that “there’s considerable disagreement among sociologists as to what the consequences of raising a child in a single-sex family, whether that is harmful to the child or not.” Hartmann argues that Scalia is factually incorrect—there is not actually “considerable disagreement among sociologists” on this issue—and quotes a recent report from the American Sociological Association to this effect. Assuming there’s no other considerable group of sociologists (Hartmann knows of only one small group) arguing otherwise, it seems that Hartmann has a point. Scalia would’ve been better off omitting the phrase “among sociologists”—then he’d have been on safe ground, because you can always find somebody to take a position on the issue. Jerry Falwell’s no longer around but there’s a lot more where he came from. Even among scientists, there’s

4 0.96463716 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

Introduction: During our discussion of estimates of teacher performance, Steve Sailer wrote: I suspect we’re going to take years to work the kinks out of overall rating systems. By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr. James held off until 1999 before unveiling his win share model for overall rankings. I remember looking at Pete Palmer’s book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good–it’s presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples a

5 0.9634881 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

Introduction: Andreas Graefe writes (see here here here ): The usual procedure for developing linear models to predict any kind of target variable is to identify a subset of most important predictors and to estimate weights that provide the best possible solution for a given sample. The resulting “optimally” weighted linear composite is then used when predicting new data. This approach is useful in situations with large and reliable datasets and few predictor variables. However, a large body of analytical and empirical evidence since the 1970s shows that the weighting of variables is of little, if any, value in situations with small and noisy datasets and a large number of predictor variables. In such situations, including all relevant variables is more important than their weighting. These findings have yet to impact many fields. This study uses data from nine established U.S. election-forecasting models whose forecasts are regularly published in academic journals to demonstrate the value o

6 0.96304893 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters

7 0.9618929 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

8 0.95895392 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

9 0.95743465 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

10 0.9555037 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!

11 0.9552415 1914 andrew gelman stats-2013-06-25-Is there too much coauthorship in economics (and science more generally)? Or too little?

12 0.95506883 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

13 0.95492554 2013 andrew gelman stats-2013-09-08-What we need here is some peer review for statistical graphics

14 0.95451415 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

15 0.95429695 120 andrew gelman stats-2010-06-30-You can’t put Pandora back in the box

16 0.95429075 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

17 0.95424384 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

18 0.95423615 2080 andrew gelman stats-2013-10-28-Writing for free

19 0.95421946 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

20 0.95417476 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants