2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)


meta information for this blog

Source: html

Introduction: Kaiser Fung shares this graph from Ritchie King:

Kaiser writes:

What they did right:
- Did not put the data on a map
- Ordered the countries by the most recent data point rather than alphabetically
- Scale labels are found only on outer edge of the chart area, rather than one set per panel
- Only used three labels for the 11 years on the plot
- Did not overdo the vertical scale either

The nicest feature was the XL scale applied only to South Korea. This destroys the small-multiples principle but draws attention to the top left corner, where the designer wants our eyes to go. I would have used smaller fonts throughout.

I agree with all of Kaiser’s comments. I could even add a few more, like using light gray for the backgrounds and a bright blue for the lines, spacing the graphs well, using full country names rather than three-letter abbreviations. There are so many standard mistakes that go into default data displays that it is refreshing to see a simple graph done


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This destroys the small-multiples principle but draws attention to the top left corner, where the designer wants our eyes to go. [sent-2, score-0.351]

2 I could even add a few more, like using light gray for the backgrounds and a bright blue for the lines, spacing the graphs well, using full country names rather than three-letter abbreviations. [sent-5, score-0.293]

3 Kaiser continues: One way to appreciate the greatness of the chart is to look at alternatives. [sent-7, score-0.189]

4 Here, the Economist tries the lazy approach of using a map: ( link ) For one thing, they have to give up the time dimension. [sent-8, score-0.155]

5 A variation is a cartogram in which the physical size and shape of countries are mapped to the underlying data. [sent-9, score-0.163]

6 Here’s one on Worldmapper ( link ): One problem with this transformation is what to do with missing data. [sent-10, score-0.083]

7 Also, the big big trouble with the transformed map is that the #1 piece of information it gives you is something we all know already—that China has a lot of people. [sent-12, score-0.424]

8 Sure, if you look carefully you can figure out other things—hey, India has a billion people too but it’s really small on the map, I guess nobody’s drinking much there—but that’s all complicated reasoning involving mental division. [sent-13, score-0.127]

9 Kaiser continues: Wikipedia has a better map with variations of one color ( link ): I agree that this one is better than the Economist map above. [sent-15, score-1.102]

10 )) and, (b) it shows the time trends (most notably, the declines in Russia and Brazil, the increase from a low baseline in India, and Korea’s steady #1 position). [sent-18, score-0.143]

11 The click-through solution: Let me conclude, as always in this sort of discussion, that displaying patterns in the data is not the only reason for a graph. [sent-19, score-0.078]

12 If an unusually-colored map catches people’s eyes, maybe that’s the best way to go. [sent-21, score-0.499]

13 My ideal solution would be click-through: the Economist (or wherever) has the colorful map with instructions to click to see the informative grid of line plots, then you can click again and get a spreadsheet with all the numbers. [sent-22, score-0.808]
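
The page does not say how sentScore is derived from the tfidf model. A minimal sketch, assuming the score is simply the sum of the tfidf weights of the distinct words in a sentence; the function name `score_sentence` and the three-word weight excerpt are illustrative choices, and the result is not expected to reproduce the scores listed above:

```python
import re

# Illustrative excerpt of the per-word tfidf weights listed on this page.
weights = {"map": 0.424, "catches": 0.075, "eyes": 0.119}

def score_sentence(sentence, weights):
    """Sum the weight of each distinct known word in the sentence (assumed rule)."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return round(sum(weights.get(w, 0.0) for w in words), 3)

sentence = "If an unusually-colored map catches people's eyes, maybe that's the best way to go."
score = score_sentence(sentence, weights)
print(score)
```

Counting each word once rather than per occurrence is a design choice of this sketch; a per-occurrence sum would favor sentences that repeat a heavy word such as "map".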


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('map', 0.424), ('kaiser', 0.231), ('korea', 0.189), ('economist', 0.172), ('legend', 0.133), ('india', 0.129), ('drinking', 0.127), ('grid', 0.122), ('ordered', 0.12), ('eyes', 0.119), ('chart', 0.116), ('labels', 0.115), ('scale', 0.11), ('color', 0.098), ('wikipedia', 0.093), ('click', 0.092), ('alphabetically', 0.086), ('portugal', 0.086), ('nicest', 0.086), ('countries', 0.085), ('continues', 0.084), ('link', 0.083), ('ritchie', 0.081), ('destroys', 0.081), ('attention', 0.08), ('lines', 0.078), ('solution', 0.078), ('brazil', 0.078), ('spacing', 0.078), ('shots', 0.078), ('mapped', 0.078), ('grabbing', 0.075), ('projection', 0.075), ('fonts', 0.075), ('outer', 0.075), ('catches', 0.075), ('xl', 0.075), ('shows', 0.074), ('huge', 0.074), ('comparisons', 0.073), ('variations', 0.073), ('greatness', 0.073), ('using', 0.072), ('bright', 0.071), ('wine', 0.071), ('wherever', 0.071), ('designer', 0.071), ('low', 0.069), ('quantitatively', 0.069), ('distorted', 0.069)]
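
A list like the one above can be produced by weighting each word by term frequency times inverse document frequency and keeping the highest-weighted words. The sketch below is one common variant (raw count times ln(N/df), then L2 normalization); the exact weighting used to build this page is not stated, so the formula, the function name `tfidf_top_words`, and the toy documents are all assumptions:

```python
import math
from collections import Counter

def tfidf_top_words(docs, doc_index, top_n=3):
    """Return the top_n (word, weight) pairs for one tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    tf = Counter(docs[doc_index])
    raw = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
    # L2-normalize so weights are comparable across documents.
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    ranked = sorted(raw.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, round(v / norm, 3)) for w, v in ranked[:top_n]]

# Toy corpus standing in for the blog posts (hypothetical data).
docs = [
    "map map chart kaiser".split(),
    "chart graph kaiser".split(),
    "graph model data".split(),
]
top = tfidf_top_words(docs, 0)
print(top)
```

In the toy corpus "map" ranks first because it is frequent in the first document and absent from the others, which is the same reason it tops the list for this post.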

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

2 0.22591521 1810 andrew gelman stats-2013-04-17-Subway series

Introduction: Abby points us to a spare but cool visualization. I don’t like the curvy connect-the-dots line, but my main suggested improvement would be a closer link to the map. Showing median income on census tracts along subway lines is cool, but ultimately it’s a clever gimmick that pulls me in and makes me curious about what the map looks like. (And, thanks to google, the map was easy to find.)

3 0.21971188 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet

Introduction: Kaiser points to this infoviz from MIT’s Technology Review: Kaiser writes: What makes the designer want to tilt the reader’s head? This chart is unreadable. It also fails the self-sufficiency test. All 13 data points are printed onto the chart. You really don’t need the axis, and the gridlines. A further design flaw is the use of signposts. Our eyes are drawn to the hexagons containing the brand icons but the data is at the other end of the signpost, where it is planted on the surface! Here is a sketch of something not as cute: I [Kaiser] expressed time as years . . . The mobile-related entities are labelled red. The dots could be replaced by the hexagonal brand icons. I agree with all of Kaiser’s criticisms, and I agree that his graph is, from the statistical perspective, a zillion times better than what was published. On the other hand, unusual images can get attention. Recall the famous/notorious clock plot from Florence Nightingale . This is why I’ve move

4 0.18794098 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

Introduction: Seth writes: Here’s my candidate for bad graphic of the year: I [Seth] studied it and learned nothing. I have no idea how they assigned colors to locations. I already knew that there were more within-city calls than calls to individual distant locations — for example that there are more SF-SF calls than SF-LA calls. The researchers took a huge rich database and boiled it down to nothing (in terms of information value) — and I have a funny feeling they don’t realize how awful this is and what a waste. I send it to you because it isn’t obvious how to do better — at least not obvious to them. My reply: My first reaction is to agree–I don’t get anything out of this graph either! But let me step back. I think it’s best to understand this using the framework of my paper with Antony Unwin , by thinking of the goals that are satisfied by different sorts of graphs. What does this graph convey? It doesn’t tell us much about phone calls, but it does tell us that some peop

5 0.16330333 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

6 0.15103911 648 andrew gelman stats-2011-04-04-The Case for More False Positives in Anti-doping Testing

7 0.15074584 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

8 0.15065029 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.

9 0.14624336 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”

10 0.14374346 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

11 0.14087236 61 andrew gelman stats-2010-05-31-A data visualization manifesto

12 0.13892341 388 andrew gelman stats-2010-11-01-The placebo effect in pharma

13 0.13382031 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

14 0.13151979 2031 andrew gelman stats-2013-09-19-What makes a statistician look like a hero?

15 0.12896228 1001 andrew gelman stats-2011-11-10-Three hours in the life of a statistician

16 0.12648515 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

17 0.12225377 543 andrew gelman stats-2011-01-28-NYT shills for personal DNA tests

18 0.11636542 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

19 0.11536177 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

20 0.1142337 1653 andrew gelman stats-2013-01-04-Census dotmap


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.2), (1, -0.048), (2, 0.03), (3, 0.055), (4, 0.126), (5, -0.162), (6, -0.044), (7, 0.06), (8, -0.013), (9, 0.028), (10, -0.039), (11, -0.044), (12, -0.012), (13, 0.032), (14, -0.05), (15, 0.008), (16, -0.013), (17, 0.106), (18, 0.005), (19, -0.007), (20, 0.015), (21, 0.054), (22, -0.047), (23, -0.056), (24, 0.018), (25, -0.016), (26, 0.031), (27, 0.013), (28, -0.001), (29, 0.069), (30, 0.031), (31, -0.06), (32, 0.012), (33, 0.023), (34, 0.029), (35, -0.058), (36, -0.003), (37, -0.048), (38, -0.054), (39, -0.015), (40, -0.026), (41, 0.097), (42, 0.0), (43, 0.013), (44, -0.008), (45, 0.061), (46, -0.022), (47, 0.061), (48, -0.0), (49, -0.021)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94603205 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

2 0.90011364 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet

Introduction: Kaiser points to this infoviz from MIT’s Technology Review: Kaiser writes: What makes the designer want to tilt the reader’s head? This chart is unreadable. It also fails the self-sufficiency test. All 13 data points are printed onto the chart. You really don’t need the axis, and the gridlines. A further design flaw is the use of signposts. Our eyes are drawn to the hexagons containing the brand icons but the data is at the other end of the signpost, where it is planted on the surface! Here is a sketch of something not as cute: I [Kaiser] expressed time as years . . . The mobile-related entities are labelled red. The dots could be replaced by the hexagonal brand icons. I agree with all of Kaiser’s criticisms, and I agree that his graph is, from the statistical perspective, a zillion times better than what was published. On the other hand, unusual images can get attention. Recall the famous/notorious clock plot from Florence Nightingale . This is why I’ve move

3 0.80298948 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”

Introduction: Kaiser writes: I have read a fair share of bore-them-to-tears compilation of survey research results – you know, those presentations with one multi-colored, stacked or grouped bar chart after another, extending for dozens of pages. I hate those grouped bar charts also—as I’ve written repeatedly, the central role of almost all statistical displays is to make comparisons, and you can make twice as many comparisons with a line plot as a bar plot. But I suspect the real problem with the reports that Kaiser is talking about is the “extending for dozens of pages” part. If they could just print each individual plot smaller and put dozens on a page, you could maybe get through the whole report in two or three pages. Almost always, graphs are too large. I’ve even seen abominations such as a fifty-page report with a single huge pie chart on each page. As Kaiser says, think about communication! A report with one big pie chart or bar plot per page is like a text document with one w

4 0.79353207 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

Introduction: 1. I remarked that Sharad had a good research article with some ugly graphs. 2. Dan posted Sharad’s graph and some unpleasant alternatives, inadvertently associating me with one of the unpleasant alternatives. Dan was comparing barplots with dotplots. 3. I commented on Dan’s site that, in this case, I’d much prefer a well-designed lineplot. I wrote: There’s a principle in decision analysis that the most important step is not the evaluation of the decision tree but the decision of what options to include in the tree in the first place. I think that’s what’s happening here. You’re seriously limiting yourself by considering the above options, which really are all the same graph with just slight differences in format. What you need to do is break outside the box. (Graph 2-which I think you think is the kind of thing that Gelman would like-indeed is the kind of thing that I think the R gurus like, but I don’t like it at all . It looks clean without actually being clea

5 0.78557098 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

Introduction: By popular demand, here’s my R script for the time-use graphs:

# The data
a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0)
a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1)
a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4)
a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7)
a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4)
a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1)
a <- rbind(a1,a2,a3,a4,a5,a6)
avg <- colMeans (a)
avg.array <- t (array (avg, rev(dim(a))))
diff <- a - avg.array
country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")

# The line plots
par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30")
for (i in 1:6){
  plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n")
  lines (1:6, diff[i,], col="blue")
  points (1:6, diff[i,], pch=19, col="black")
  if (i>3){
    axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
    axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),

6 0.7744025 1001 andrew gelman stats-2011-11-10-Three hours in the life of a statistician

7 0.74913239 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

8 0.74142057 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

9 0.72878623 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

10 0.7247656 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”

11 0.724572 461 andrew gelman stats-2010-12-09-“‘Why work?’”

12 0.71952724 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

13 0.71399403 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?

14 0.71149528 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

15 0.70912409 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

16 0.70794481 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

17 0.7040326 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

18 0.703345 488 andrew gelman stats-2010-12-27-Graph of the year

19 0.70303595 671 andrew gelman stats-2011-04-20-One more time-use graph

20 0.70162237 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.018), (10, 0.037), (16, 0.129), (21, 0.042), (24, 0.157), (30, 0.041), (34, 0.058), (63, 0.015), (65, 0.03), (66, 0.026), (73, 0.018), (77, 0.022), (80, 0.017), (82, 0.013), (86, 0.011), (95, 0.05), (99, 0.185)]
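
The lda vector above is sparse: only topics with non-negligible weight are listed as (topicId, topicWeight) pairs. One common way to turn such vectors into a simValue is cosine similarity, sketched below. The page does not state which measure it uses, so this is an assumption, as are the function name `cosine_sparse` and the second vector. The same-blog rows in the lsi and lda lists are below 1, so the real pipeline evidently differs in some detail from this exact computation.

```python
import math

def cosine_sparse(a, b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) lists."""
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0

this_post = [(16, 0.129), (24, 0.157), (99, 0.185)]   # excerpt of the vector above
other_post = [(16, 0.100), (24, 0.050), (42, 0.200)]  # hypothetical second post

sim = cosine_sparse(this_post, other_post)
print(sim)
```

Ranking every other post by this value in descending order and keeping the first 20 would give a list shaped like the ones on this page.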

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94253898 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

2 0.91904712 135 andrew gelman stats-2010-07-09-Rasmussen sez: “108% of Respondents Say . . .”

Introduction: The recent discussion of pollsters reminded me of a story from a couple years ago that perhaps is still relevant . . . I was looking up the governors’ popularity numbers on the web, and came across this page from Rasmussen Reports which shows Sarah Palin as the 3rd-most-popular governor. But then I looked more carefully. Janet Napolitano of Arizona was viewed as Excellent by 28% of respondents, Good by 27%, Fair by 26%, and Poor by 27%. That adds up to 108%! What’s going on? I’d think they would have a computer program to pipe the survey results directly into the spreadsheet. But I guess not, someone must be typing in these numbers one at a time. Another possibility is that they are altering their numbers by hand, and someone made a mistake with the Napolitano numbers, adding a few percent in one place and forgetting to subtract elsewhere. Or maybe there’s another explanation? P.S. Here are some thoughts from Mark Blumenthal P.P.S. I checked the Rasmussen link toda

3 0.90501624 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

Introduction: Vincent Yip writes: I have read your paper [with Kobi Abayomi and Marc Levy] regarding multiple imputation application. In order to diagnostic my imputed data, I used Kolmogorov-Smirnov (K-S) tests to compare the distribution differences between the imputed and observed values of a single attribute as mentioned in your paper. My question is: For example I have this attribute X with the following data: (NA = missing) Original dataset: 1, NA, 3, 4, 1, 5, NA Imputed dataset: 1, 2 , 3, 4, 1, 5, 6 a) in order to run the KS test, will I treat the observed data as 1, 3, 4,1, 5? b) and for the observed data, will I treat 1, 2 , 3, 4, 1, 5, 6 as the imputed dataset for the K-S test? or just 2 ,6? c) if I used m=5, I will have 5 set of imputed data sets. How would I apply K-S test to 5 of them and compare to the single observed distribution? Do I combine the 5 imputed data set into one by averaging each imputed values so I get one single imputed data and compare with the ob

4 0.90448052 2095 andrew gelman stats-2013-11-09-Typo in Ghitza and Gelman MRP paper

Introduction: Devin Caughey points out a typo in the second column of page 765 of our AJPS paper. Here’s what we have: The typo is in the third line of the second paragraph above. Where it says y^*_j = y.bar^*_j n_j, it should be y^*_j = y.bar^*_j n^*_j. One frustrating aspect of the current system of journal publication is that I know of no way to append this correction to the published article. I can put it here, but anyone who misses this is stuck. And I don’t think the AJPS can link from the article to this post. I contacted the editor of the AJPS who said there will be no problem appending the correction to the electronic version of the article.

5 0.90371329 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

Introduction: I just read this article on the treatment of medical volunteers, written by doctor and bioethicist Carl Elliott. As a statistician who has done a small amount of consulting for pharmaceutical companies, I have a slightly different perspective. As a doctor, Elliott focuses on individual patients, whereas, as a statistician, I’ve been trained to focus on the goal of accurately estimating treatment effects. I’ll go through Elliott’s article and give my reactions. Elliott: In Miami, investigative reporters for Bloomberg Markets magazine discovered that a contract research organisation called SFBC International was testing drugs on undocumented immigrants in a rundown motel; since that report, the motel has been demolished for fire and safety violations. . . . SFBC had recently been named one of the best small businesses in America by Forbes magazine. The Holiday Inn testing facility was the largest in North America, and had been operating for nearly ten years before inspecto

6 0.90116596 1871 andrew gelman stats-2013-05-27-Annals of spam

7 0.89906919 503 andrew gelman stats-2011-01-04-Clarity on my email policy

8 0.89816302 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi

9 0.89621103 1293 andrew gelman stats-2012-05-01-Huff the Magic Dragon

10 0.89501595 488 andrew gelman stats-2010-12-27-Graph of the year

11 0.89418167 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story

12 0.89363956 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

13 0.89326304 599 andrew gelman stats-2011-03-03-Two interesting posts elsewhere on graphics

14 0.89246345 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

15 0.89141035 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles

16 0.89104486 593 andrew gelman stats-2011-02-27-Heat map

17 0.89034438 674 andrew gelman stats-2011-04-21-Handbook of Markov Chain Monte Carlo

18 0.88972718 1402 andrew gelman stats-2012-07-01-Ice cream! and temperature

19 0.88889873 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

20 0.88816726 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update