829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals


meta info for this blog

Source: html

Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. [sent-1, score-0.148]

2 Here’s the image (link from Tyler Cowen): That’s the infovis. [sent-2, score-0.067]

3 The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) [sent-3, score-0.336]

4 Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. [sent-4, score-0.277]

5 The countries are directly comparable and the numbers are indicated by positions rather than area. [sent-5, score-0.564]

6 The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. [sent-6, score-1.257]

7 There’s nothing special about the top graph above except how it looks. [sent-8, score-0.203]

8 It represents neither a data-gathering effort, nor a statistical analysis, nor even a clever juxtaposition (as in the famous graph of health costs and life expectancies). [sent-9, score-0.52]

9 If someone had posted the second graph above (the lineplot), I doubt it would’ve been sent around the web, and I doubt that Cowen would’ve noticed it in the first place. [sent-10, score-0.489]

10 Thus, in this modern world of multichannel communications, chartjunk does have a purpose: it gets you noticed. [sent-11, score-0.103]

11 png", height=350, width=400) countries <- c ("South Africa", "Egypt", "Nigeria", "Algeria", "Morocco", "Angola", "Libya", "Tunisia", "Kenya", "Ethiopia", "Ghana", "Cameroon") gdp <- c (285. [sent-15, score-0.87]

12 2) dotchart (rev(gdp), rev(countries), xlab="GDP in billions of US dollars", main="African Countries by GDP", xlim=max(gdp)*c(. [sent-26, score-0.1]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('gdp', 0.52), ('countries', 0.35), ('rev', 0.213), ('graph', 0.203), ('cowen', 0.123), ('juxtaposition', 0.118), ('pch', 0.118), ('png', 0.118), ('ethiopia', 0.118), ('kenya', 0.118), ('libya', 0.118), ('xlim', 0.118), ('egypt', 0.111), ('tunisia', 0.111), ('statgraphics', 0.111), ('expectancies', 0.111), ('modifications', 0.106), ('doubt', 0.106), ('chartjunk', 0.103), ('xlab', 0.103), ('billions', 0.1), ('african', 0.097), ('lineplot', 0.097), ('dotplot', 0.095), ('distracting', 0.095), ('purposely', 0.091), ('africa', 0.089), ('communications', 0.088), ('width', 0.088), ('max', 0.085), ('demonstrates', 0.081), ('indicated', 0.079), ('infovis', 0.078), ('height', 0.075), ('minimal', 0.074), ('second', 0.074), ('clever', 0.073), ('south', 0.072), ('positions', 0.068), ('color', 0.067), ('perfectly', 0.067), ('comparable', 0.067), ('image', 0.067), ('minor', 0.065), ('visual', 0.065), ('equal', 0.064), ('costs', 0.064), ('dollars', 0.064), ('misleading', 0.063), ('represents', 0.062)]
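
The page does not say which TF-IDF variant or library produced the weights above. As a minimal sketch only, assuming raw term counts, a smoothed inverse document frequency, and L2 normalization (the function name `tfidf_top_words` and the toy documents in the usage note are hypothetical, not taken from the source), a top-N word list of this shape could be computed as follows:

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Assumptions (not stated by the source page):
      tf  = raw count of the word in the document
      idf = ln((1 + N) / (1 + df)) + 1, with N documents and df = document frequency
      the document vector is L2-normalized before ranking
    """
    # Naive whitespace tokenization; a real pipeline would also strip punctuation.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Unnormalized tf * idf weights for the chosen document.
    tf = Counter(tokenized[doc_index])
    weights = {
        word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1)
        for word, count in tf.items()
    }

    # L2-normalize, then sort by descending weight (ties broken alphabetically).
    norm = math.sqrt(sum(v * v for v in weights.values()))
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, round(value / norm, 3)) for word, value in ranked[:top_n]]
```

Called on a toy corpus such as `["gdp countries gdp graph", "graph color graph", "countries graph"]` with `doc_index=0`, the word that is frequent in the post but rare elsewhere ("gdp") ranks first, which matches the pattern in the list above where 'gdp' and 'countries' lead.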

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

2 0.21031597 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

Introduction: Matthew Yglesias shares this graph from the Economist: I hate this graph. OK, sure, I don’t hate hate hate hate it: it’s not a 3-d exploding pie chart or anything. It’s not misleading, it’s just extremely difficult to read. Basically, you have to go back and forth between the colors and the labels and the countries and read it like a table. OK, so here’s the table:

Average Hours Per Day Spent in Each Activity

Country   Work, study   Unpaid work   Eating, sleeping   Personal care   Leisure   Other
France    4             3             11                 1               2         2
Germany   4             3             10                 1               3         3
Japan     6             2             10                 1               2         2
Britain   4             3             10                 1               3         3
USA       5             3             10                 1               3         2
Turkey    4             3             11                 1               3         2

Hmm, that didn’t work too well. Let’s try subtracting the average from each column (for these six countries,

3 0.17023337 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.16496666 1010 andrew gelman stats-2011-11-14-“Free energy” and economic resources

Introduction: By “free energy” I don’t mean perpetual motion machines, cars that run on water and get 200 mpg, or the latest cold-fusion hype. No, I’m referring to the term from physics. The free energy of a system is, roughly, the amount of energy that can be directly extracted from it. For example, a rock at room temperature is just full of energy—not just the energy locked in its nuclei, but basic thermal energy—but at room temperature you can’t extract any of it. To the physicists in the audience: Yes, I realize that free energy has a technical meaning in statistical mechanics and that my above definition is sloppy. Please bear with me. And, to the non-physicists: feel free to head to Wikipedia or a physics textbook for a more careful treatment. I was thinking about free energy the other day when hearing someone on the radio say something about China bailing out the E.U. I did a double-take. Huh? The E.U. is rich, China’s not so rich. How can a middle-income country bail out a

5 0.14000095 1386 andrew gelman stats-2012-06-21-Belief in hell is associated with lower crime rates

Introduction: I remember attending a talk a few years ago by my political science colleague John Huber in which he discussed cross-national comparisons of religious attitudes. One thing I remember is that the U.S. is highly religious, another thing I remembered is that lots more Americans believe in heaven than believe in hell. Some of this went into Red State Blue State—not the heaven/hell thing, but the graph of religiosity vs. GDP: and the corresponding graph of religious attendance vs. GDP for U.S. states: Also we learned that, at the individual level, the correlation of religious attendance with income is zero (according to survey reports, rich Americans are neither more nor less likely than poor Americans to go to church regularly): while the correlation of prayer with income is strongly negative (poor Americans are much more likely than rich Americans to regularly pray): Anyway, with all this, I was primed to be interested in a recent study by psychologist

6 0.13683116 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?

7 0.13024394 1677 andrew gelman stats-2013-01-16-Greenland is one tough town

8 0.12328584 1665 andrew gelman stats-2013-01-10-That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy

9 0.12069851 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

10 0.11739168 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

11 0.11563946 1767 andrew gelman stats-2013-03-17-The disappearing or non-disappearing middle class

12 0.11406711 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

13 0.1059924 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

14 0.10588275 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

15 0.10516204 443 andrew gelman stats-2010-12-02-Automating my graphics advice

16 0.10027795 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

17 0.099988952 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

18 0.096978679 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

19 0.096252277 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

20 0.095721073 61 andrew gelman stats-2010-05-31-A data visualization manifesto
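
The page does not state how the simValue column is computed. Cosine similarity between per-post weight vectors is a common choice and is consistent with the post matching itself at roughly 1.0, so the following is a minimal sketch under that assumption. The names `cosine` and `rank_similar`, and the toy vectors in the usage note, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {word: weight} dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank_similar(vectors, query_id, top_n=20):
    """Rank every post (including the query itself) by similarity to the query.

    vectors maps a post id to its {word: weight} vector. The query post is kept
    in the ranking, mirroring the "same-blog" row at the top of the list above.
    """
    query = vectors[query_id]
    scored = [(cosine(query, vec), post_id) for post_id, vec in vectors.items()]
    # Highest similarity first; ties broken by post id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]
```

For example, with toy vectors `{"a": {"gdp": 0.5, "countries": 0.3}, "b": {"countries": 0.4, "graph": 0.3}, "c": {"energy": 1.0}}`, `rank_similar(vectors, "a")` puts "a" first with a score of about 1.0, then "b" (which shares the word "countries"), then "c" with a score of 0.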


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.118), (1, -0.046), (2, -0.001), (3, 0.058), (4, 0.116), (5, -0.153), (6, -0.082), (7, 0.039), (8, -0.032), (9, 0.009), (10, -0.025), (11, -0.015), (12, -0.042), (13, 0.047), (14, 0.025), (15, 0.009), (16, 0.044), (17, -0.017), (18, -0.016), (19, 0.004), (20, 0.035), (21, 0.018), (22, -0.011), (23, 0.02), (24, 0.014), (25, 0.001), (26, 0.01), (27, 0.021), (28, -0.039), (29, 0.029), (30, 0.005), (31, -0.032), (32, -0.043), (33, -0.04), (34, -0.01), (35, 0.011), (36, -0.017), (37, -0.016), (38, -0.008), (39, 0.025), (40, -0.013), (41, -0.005), (42, -0.026), (43, 0.013), (44, -0.002), (45, -0.014), (46, 0.044), (47, -0.006), (48, -0.022), (49, -0.0)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97786999 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

2 0.88734126 488 andrew gelman stats-2010-12-27-Graph of the year

Introduction: From blogger Matthew Yglesias: There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features:
- Clear title
- Clearly labeled axes
- Vertical axis goes to zero
- The cities are in a sensible order (not, for example, alphabetical)
- The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed.
What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e

3 0.88269687 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

Introduction: Denis Cote sends the following , under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi

4 0.85767823 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

Introduction: David Afshartous writes: I thought this graph [from Ed Easterling] might be good for your blog. The 71 outlined squares show the main story, and the regions of the graph present the information nicely. Looks like the bins for the color coding are not of equal size and of course the end bins are unbounded. Might be interesting to graph the distribution of the actual data for the 71 outlined squares. In addition, I assume that each period begins on Jan 1 so data size could be naturally increased by looking at intervals that start on June 1 as well (where the limit of this process would be to have it at the granularity of one day; while it most likely wouldn’t make much difference, I’ve seen some graphs before where 1 year returns can be quite sensitive to starting date, etc). I agree that (a) the graph could be improved in small ways–in particular, adding half-year data seems like a great idea–and (b) it’s a wonderful, wonderful graph as is. And the NYT graphics people ad

5 0.85708088 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can

6 0.85084122 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

7 0.83668113 671 andrew gelman stats-2011-04-20-One more time-use graph

8 0.82754427 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

9 0.82616985 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

10 0.81315583 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

11 0.80786747 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

12 0.80531991 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

13 0.80511892 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

14 0.802706 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

15 0.7983734 915 andrew gelman stats-2011-09-17-(Worst) graph of the year

16 0.7930634 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

17 0.79248208 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

18 0.78842551 61 andrew gelman stats-2010-05-31-A data visualization manifesto

19 0.78827929 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

20 0.78393579 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.012), (6, 0.013), (10, 0.012), (16, 0.073), (18, 0.18), (21, 0.033), (24, 0.163), (32, 0.011), (39, 0.022), (59, 0.041), (76, 0.014), (86, 0.013), (89, 0.018), (95, 0.125), (99, 0.161)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.91350257 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

2 0.82426465 969 andrew gelman stats-2011-10-22-Researching the cost-effectiveness of political lobbying organisations

Introduction: Sally Murray from Giving What We Can writes: We are an organisation that assesses different charitable (/fundable) interventions, to estimate which are the most cost-effective (measured in terms of the improvement of life for people in developing countries gained for every dollar invested). Our research guides and encourages greater donations to the most cost-effective charities we thus identify, and our members have so far pledged a total of $14m to these causes, with many hundreds more relying on our advice in a less formal way. I am specifically researching the cost-effectiveness of political lobbying organisations. We are initially focusing on organisations that lobby for ‘big win’ outcomes such as increased funding of the most cost-effective NTD treatments/ vaccine research, changes to global trade rules (potentially) and more obscure lobbies such as “Keep Antibiotics Working”. We’ve a great deal of respect for your work and the superbly rational way you go about it, and

3 0.79465199 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

Introduction: Andy Cooper writes: A link to an article , “Four Assumptions Of Multiple Regression That Researchers Should Always Test”, has been making the rounds on Twitter. Their first rule is “Variables are Normally distributed.” And they seem to be talking about the independent variables – but then later bring in tests on the residuals (while admitting that the normally-distributed error assumption is a weak assumption). I thought we had long-since moved away from transforming our independent variables to make them normally distributed for statistical reasons (as opposed to standardizing them for interpretability, etc.) Am I missing something? I agree that leverage in a influence is important, but normality of the variables? The article is from 2002, so it might be dated, but given the popularity of the tweet, I thought I’d ask your opinion. My response: There’s some useful advice on that page but overall I think the advice was dated even in 2002. In section 3.6 of my book wit

4 0.79310107 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

Introduction: Greg Kaplan writes: I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached. Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. The change in imputation procedures — not any actual change in migration behavior — explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010. I haven’t had a chance to give a serious look so could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to bett

5 0.792826 1973 andrew gelman stats-2013-08-08-For chrissake, just make up an analysis already! We have a lab here to run, y’know?

Introduction: Ben Hyde sends along this: Stuck in the middle of the supplemental data, reporting the total workup for their compounds, was this gem: Emma, please insert NMR data here! where are they? and for this compound, just make up an elemental analysis . . . I’m reminded of our recent discussions of coauthorship, where I argued that I see real advantages to having multiple people taking responsibility for the result. Jay Verkuilen responded: “On the flipside of collaboration . . . is diffusion of responsibility, where everybody thinks someone else ‘has that problem’ and thus things don’t get solved.” That’s what seems to have happened (hilariously) here.

6 0.78921497 1183 andrew gelman stats-2012-02-25-Calibration!

7 0.7835803 718 andrew gelman stats-2011-05-18-Should kids be able to bring their own lunches to school?

8 0.77956694 266 andrew gelman stats-2010-09-09-The future of R

9 0.77409434 832 andrew gelman stats-2011-07-31-Even a good data display can sometimes be improved

10 0.77388787 698 andrew gelman stats-2011-05-05-Shocking but not surprising

11 0.76736188 1820 andrew gelman stats-2013-04-23-Foundation for Open Access Statistics

12 0.76685834 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

13 0.76441371 1292 andrew gelman stats-2012-05-01-Colorless green facts asserted resolutely

14 0.76439428 2046 andrew gelman stats-2013-10-01-I’ll say it again

15 0.76263165 456 andrew gelman stats-2010-12-07-The red-state, blue-state war is happening in the upper half of the income distribution

16 0.76237333 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

17 0.75816679 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

18 0.75005263 588 andrew gelman stats-2011-02-24-In case you were wondering, here’s the price of milk

19 0.74764431 1691 andrew gelman stats-2013-01-25-Extreem p-values!

20 0.74688452 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America