2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

meta info for this blog

Source: html

Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about eda and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparison (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. [sent-1, score-0.608]

2 I didn’t have a chance to look at the course so instead I responded with some generic comments about eda and visualization (in no particular order): - Think of a graph as a comparison. [sent-9, score-0.643]

3 All graphs are comparison (indeed, all statistical analyses are comparisons). [sent-10, score-0.222]

4 If you already have the graph in mind, think of what comparisons it’s enabling. [sent-11, score-0.569]

5 Or if you haven’t settled on the graph yet, think of what comparisons you’d like to make. [sent-12, score-0.648]

6 - For example, Tukey described EDA as the search for the unexpected (or something like that, I don’t remember the exact quote). [sent-13, score-0.13]

7 But, if you think about it, the unexpected is necessarily defined relative to what is expected, thus the (possibly implicit) model that the graph is being compared to. [sent-14, score-0.674]

8 - Consider two extreme views: (a) a graph as a pure exploration, where you bring no expectations whatsoever to the data, (b) a graph as pure execution, you know what you want to show and then you show it. [sent-15, score-1.19]

9 Exploration is always relative to expectations, but on the other hand you always want the capacity for being surprised. [sent-17, score-0.374]

10 - No need to cram all information onto a single graph. [sent-18, score-0.085]

11 - A related point: Make each graph small, then you can put lots of graphs on a page (or screen) - A tradeoff: clarity, recognized graphical forms (time series, scatterplot, etc), and spare (Tufte or Cleveland-like) design make a graph easier to read. [sent-20, score-1.096]

12 But too many similar-looking spare graphs can blur in the mind and then you’re not fully engaging the reader’s visual brain. [sent-21, score-0.538]

13 A graph with two or three lines (labeled directly, not with a legend, please! [sent-24, score-0.312]

14 I don’t see a big difference between exploratory graphics and presentation graphics. [sent-28, score-0.566]

15 When I make graphics for myself, I make them (roughly) presentation quality: I make them in pdf, give titles and axis labels, grids of graphs, etc. [sent-29, score-0.798]

16 - Statistical graphics is commonly presented as being exploratory and about plotting the raw data. [sent-30, score-0.532]

17 I think that’s important, no doubt about it, and I don’t do enough of it. [sent-31, score-0.087]

18 But, in addition, beyond data exploration, graphics is important for understanding the models we have fit, so I also like the term “exploratory model analysis.” [sent-33, score-0.298]

19 - I’ve only very rarely made dynamic graphs (oddly enough, the only one I can think of offhand, I made for a research project in 1989 and never published or even showed it to anyone else, I think). [sent-34, score-0.516]

20 When teaching, it’s good to give the students a good sense of the areas I don’t know about, to help them better map the territory relative to my course material. [sent-38, score-0.372]
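
The scores in the sentence table above appear consistent with a simple rule: split the sentence on whitespace, look up each token exactly as written (case-sensitive, punctuation still attached) in this post's tfidf weight list, and sum the matches. The source does not state this rule; it is inferred from the numbers shown, so the Python below is only a sketch under that assumption. The names `WEIGHTS` and `sentence_score` are hypothetical, and the weights are a subset copied from the topN-words list for this post.

```python
# Hypothetical sketch: the scoring rule is inferred from the table, not documented.
# Weights are copied from this post's topN-words list (rounded to 3 places there).
WEIGHTS = {
    "graph": 0.312, "graphics": 0.23, "exploratory": 0.23, "graphs": 0.222,
    "eda": 0.189, "comparisons": 0.17, "course": 0.142, "coworkers": 0.1,
    "mind": 0.091, "think": 0.087, "cram": 0.085, "settled": 0.079,
    "data": 0.068,
}


def sentence_score(sentence, weights=WEIGHTS):
    """Sum the weight of every whitespace-separated token found in `weights`.

    Tokens are matched exactly as written, so "mind," and "EDA" do not
    match the entries "mind" and "eda". Repeated tokens count each time.
    """
    return sum(weights.get(token, 0.0) for token in sentence.split())


s4 = "If you already have the graph in mind, think of what comparisons it’s enabling."
s10 = "No need to cram all information onto a single graph."

print(round(sentence_score(s4), 3))   # graph + think + comparisons
print(round(sentence_score(s10), 3))  # only "cram" matches; "graph." does not
```

Under this rule the two example sentences reproduce the scores listed for sentences 4 and 10 (0.569 and 0.085). It would also explain why sentence 3 scores only 0.222: "graphs" matches, while "comparisons)." carries trailing punctuation and does not.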

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('graph', 0.312), ('graphics', 0.23), ('exploratory', 0.23), ('graphs', 0.222), ('eda', 0.189), ('exploration', 0.186), ('comparisons', 0.17), ('spare', 0.152), ('relative', 0.145), ('course', 0.142), ('dynamic', 0.13), ('unexpected', 0.13), ('expectations', 0.126), ('pure', 0.113), ('presentation', 0.106), ('coworkers', 0.1), ('make', 0.098), ('mind', 0.091), ('eckles', 0.087), ('think', 0.087), ('grids', 0.085), ('cram', 0.085), ('territory', 0.085), ('titles', 0.083), ('highlighting', 0.083), ('clarity', 0.079), ('settled', 0.079), ('always', 0.078), ('legend', 0.078), ('execution', 0.078), ('https', 0.078), ('anyone', 0.077), ('whatsoever', 0.076), ('offhand', 0.075), ('launched', 0.074), ('tradeoff', 0.073), ('engaging', 0.073), ('capacity', 0.073), ('line', 0.072), ('plotting', 0.072), ('scatterplot', 0.072), ('tufte', 0.071), ('facebook', 0.07), ('show', 0.069), ('tukey', 0.068), ('dean', 0.068), ('data', 0.068), ('promote', 0.068), ('pdf', 0.068), ('amazingly', 0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000007 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

2 0.31087592 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

3 0.26719213 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

4 0.23584959 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

Introduction: Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and myself seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004–but I am concerned that our dialogue with the graphic

5 0.21773897 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

6 0.20172839 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

7 0.20066044 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

8 0.19524902 1604 andrew gelman stats-2012-12-04-An epithet I can live with

9 0.18667923 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

10 0.18540956 1450 andrew gelman stats-2012-08-08-My upcoming talk for the data visualization meetup

11 0.18222094 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

12 0.18195057 2279 andrew gelman stats-2014-04-02-Am I too negative?

13 0.17839101 1668 andrew gelman stats-2013-01-11-My talk at the NY data visualization meetup this Monday!

14 0.17263778 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

15 0.16827467 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

16 0.16227615 1066 andrew gelman stats-2011-12-17-Ripley on model selection, and some links on exploratory model analysis

17 0.15969932 319 andrew gelman stats-2010-10-04-“Who owns Congress”

18 0.15873718 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet

19 0.1578521 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

20 0.15535803 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
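
The source does not say how `simValue` is computed. A common choice for comparing tfidf vectors is cosine similarity, and a same-blog value such as 1.0000007 would be consistent with a cosine of exactly 1 plus floating-point rounding error; both points are assumptions, not facts from the page. A minimal Python sketch, with hypothetical names and toy vectors:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {word: weight} dicts."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# Toy vectors: the first borrows three weights from this post's list,
# the other two are made up purely for illustration.
post = {"graph": 0.312, "graphics": 0.23, "exploratory": 0.23}
other = {"graph": 0.2, "infovis": 0.4}
unrelated = {"bayes": 0.5}

print(cosine_similarity(post, post))       # a post compared with itself
print(cosine_similarity(post, other))      # partial overlap
print(cosine_similarity(post, unrelated))  # no shared words
```

A post compared with itself scores (up to rounding) 1, posts with no shared words score 0, and partial overlap falls in between, which matches the general shape of the ranked list above.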

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.249), (1, -0.049), (2, -0.071), (3, 0.115), (4, 0.247), (5, -0.246), (6, -0.182), (7, 0.112), (8, -0.055), (9, -0.006), (10, 0.035), (11, 0.05), (12, -0.044), (13, 0.004), (14, 0.016), (15, -0.071), (16, -0.009), (17, -0.016), (18, -0.006), (19, 0.03), (20, 0.019), (21, 0.026), (22, -0.005), (23, -0.036), (24, -0.038), (25, 0.0), (26, 0.062), (27, -0.018), (28, -0.008), (29, -0.016), (30, -0.023), (31, -0.022), (32, -0.041), (33, 0.008), (34, -0.038), (35, 0.022), (36, 0.033), (37, -0.023), (38, 0.012), (39, -0.007), (40, -0.016), (41, 0.026), (42, 0.006), (43, 0.005), (44, -0.031), (45, -0.016), (46, 0.04), (47, 0.025), (48, -0.027), (49, -0.061)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97424102 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

2 0.89577204 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

3 0.86011964 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat

4 0.84918725 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

Introduction: Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. My reaction: I looked at the first 10. I liked 1, 3, and 5, I didn’t like 2, 7, 8, 9, and 10. I have neutral feelings about 4 and 6. I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram too much into a single image. I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. Also, I don’t see the point of the circular display. That makes no sense at all; it’s a misleading feature. That said, the graphs I dislike can still be fine for their purpose. A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on

5 0.8409515 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

6 0.83642244 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

7 0.83556861 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

8 0.83554709 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

9 0.83488446 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

10 0.83385092 319 andrew gelman stats-2010-10-04-“Who owns Congress”

11 0.8243174 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

12 0.82184595 61 andrew gelman stats-2010-05-31-A data visualization manifesto

13 0.81706285 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

14 0.81694615 1604 andrew gelman stats-2012-12-04-An epithet I can live with

15 0.81248087 488 andrew gelman stats-2010-12-27-Graph of the year

16 0.80975103 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

17 0.80682534 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

18 0.80538005 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again

19 0.79839706 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

20 0.79800767 372 andrew gelman stats-2010-10-27-A use for tables (really)

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.026), (10, 0.053), (14, 0.033), (16, 0.094), (21, 0.058), (24, 0.205), (52, 0.017), (66, 0.055), (77, 0.025), (79, 0.016), (99, 0.309)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97169924 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

2 0.96724713 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code. In fact,

3 0.96603906 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

Introduction: Kaiser Fung shares this graph from Ritchie King: Kaiser writes: What they did right: - Did not put the data on a map - Ordered the countries by the most recent data point rather than alphabetically - Scale labels are found only on outer edge of the chart area, rather than one set per panel - Only used three labels for the 11 years on the plot - Did not overdo the vertical scale either The nicest feature was the XL scale applied only to South Korea. This destroys the small-multiples principle but draws attention to the top left corner, where the designer wants our eyes to go. I would have used smaller fonts throughout. I agree with all of Kaiser’s comments. I could even add a few more, like using light gray for the backgrounds and a bright blue for the lines, spacing the graphs well, using full country names rather than three-letter abbreviations. There are so many standard mistakes that go into default data displays that it is refreshing to see a simple graph done

4 0.9658711 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla

5 0.96324885 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

Introduction: There’s a paradigm in applied statistics that goes something like this: 1. There is a scientific or policy question of some theoretical or practical importance. 2. Researchers gather data on relevant outcomes and perform a statistical analysis, ideally leading to a clear conclusion (p less than 0.05, or a strong posterior distribution, or good predictive performance, or high reliability and validity, whatever). 3. This conclusion informs policy. This paradigm has room for positive findings (for example, that a new program is statistically significantly better, or statistically significantly worse than what came before) or negative findings (data are inconclusive, further study is needed), even if negative findings seem less likely to make their way into the textbooks. But what happens when step 2 simply isn’t possible. This came up a few years ago—nearly 10 years ago, now!—with the excellent paper by Donohue and Wolfers which explained why it’s just about impossible to

6 0.96324122 1792 andrew gelman stats-2013-04-07-X on JLP

7 0.96248007 807 andrew gelman stats-2011-07-17-Macro causality

8 0.96235824 1402 andrew gelman stats-2012-07-01-Ice cream! and temperature

9 0.9616732 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

10 0.96164453 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

11 0.96044654 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

12 0.96037436 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

13 0.96033823 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

14 0.9598074 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

15 0.95973116 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

16 0.9595418 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

17 0.95894146 2080 andrew gelman stats-2013-10-28-Writing for free

18 0.95844042 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

19 0.95799673 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

20 0.95788932 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?