
863 andrew gelman stats-2011-08-21-Bad graph

meta infos for this blog

Source: html

Introduction: Dan Goldstein points us to this : It’s a good infographic–it grabs the reader’s eye ( see discussion here ), no? P.S. The above remark is not meant as a dig at infographics. On the contrary, I am sincerely saying that a graph that violates all statistical principles and does not do a good job at displaying data, can still be valuable and useful as a data graphic. For this infographic, the numbers are used as ornamentation to attract the viewer, just as one might use a cartoon or a dramatic photo image. P.P.S. At Hadley’s suggestion (see comment below), I’ve changed all uses of “infovis” above to “infographic.”

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Dan Goldstein points us to this : It’s a good infographic–it grabs the reader’s eye ( see discussion here ), no? [sent-1, score-0.771]

2 The above remark is not meant as a dig at infographics. [sent-4, score-0.526]

3 On the contrary, I am sincerely saying that a graph that violates all statistical principles and does not do a good job at displaying data, can still be valuable and useful as a data graphic. [sent-5, score-1.465]

4 For this infographic, the numbers are used as ornamentation to attract the viewer, just as one might use a cartoon or a dramatic photo image. [sent-6, score-1.07]

5 At Hadley’s suggestion (see comment below), I’ve changed all uses of “infovis” above to “infographic.” [sent-10, score-0.481]

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('infographic', 0.375), ('grabs', 0.247), ('dig', 0.237), ('photo', 0.222), ('cartoon', 0.216), ('goldstein', 0.203), ('viewer', 0.203), ('violates', 0.199), ('attract', 0.196), ('sincerely', 0.193), ('hadley', 0.193), ('contrary', 0.177), ('infovis', 0.174), ('displaying', 0.172), ('eye', 0.17), ('dramatic', 0.158), ('remark', 0.153), ('valuable', 0.149), ('suggestion', 0.145), ('dan', 0.14), ('meant', 0.136), ('reader', 0.135), ('principles', 0.133), ('uses', 0.125), ('changed', 0.122), ('job', 0.096), ('numbers', 0.091), ('graph', 0.091), ('comment', 0.089), ('good', 0.087), ('useful', 0.084), ('saying', 0.08), ('data', 0.071), ('see', 0.069), ('points', 0.068), ('us', 0.066), ('discussion', 0.064), ('used', 0.063), ('still', 0.06), ('use', 0.051), ('statistical', 0.05), ('might', 0.047), ('ve', 0.044), ('one', 0.026)]
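The sentence scores in the summary table above appear to be reproducible from these word weights. The sketch below shows one reading that matches the listed numbers: each score is the sum of the weights of whitespace-delimited tokens that match a vocabulary word exactly, with no lowercasing and no punctuation stripping. The function name `sentence_score` and the matching rule are assumptions inferred from the numbers, not the generator's documented method.

```python
# Subset of the (word, weight) pairs listed above, enough for sentences 1 and 2.
TFIDF = {
    "grabs": 0.247, "dig": 0.237, "goldstein": 0.203, "eye": 0.17,
    "remark": 0.153, "dan": 0.14, "meant": 0.136, "good": 0.087,
    "see": 0.069, "points": 0.068, "us": 0.066, "discussion": 0.064,
}

def sentence_score(sentence, weights):
    """Sum the weights of whitespace tokens that match a vocabulary word exactly."""
    # Assumed rule: tokens such as "Dan" or "infographics." do not match,
    # because case and attached punctuation are left untouched.
    return round(sum(weights.get(tok, 0.0) for tok in sentence.split()), 3)

s1 = ("Dan Goldstein points us to this : It’s a good infographic–it grabs "
      "the reader’s eye ( see discussion here ), no?")
s2 = "The above remark is not meant as a dig at infographics."

print(sentence_score(s1, TFIDF))  # compare with score-0.771 in the table
print(sentence_score(s2, TFIDF))  # compare with score-0.526 in the table
```

Under this reading, capitalized names ("Dan", "Goldstein") and words with attached punctuation contribute nothing, which is why sentence 1 scores lower than its vocabulary overlap might suggest.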

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 863 andrew gelman stats-2011-08-21-Bad graph

2 0.18726525 1800 andrew gelman stats-2013-04-12-Too tired to mock

Introduction: Someone sent me an email with the subject line “A terrible infographic,” and it went on from there: “Given some of your recent writing on infovis, I thought you might get a kick out of this . . . I’m certainly sympathetic to their motivations, but some of these plots do not aid understanding… To pick on a few in particular, the first plot attached, cropped from the infographic, is a strange alternative to a bar plot. For the second attachment, I still don’t understand what they’ve plotted. . . .” I agree with everything he wrote, but at this point I think I’m getting too exhausted to laugh at graphs unless there is an obvious political bias to point to.

3 0.15755756 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.14051494 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week , consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

5 0.11881097 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story

Introduction: James Keirstead sends along this infographic : He hates it: First we’ve got an hourglass metaphor wrecked by the fact that “now” (i.e. the pinch point in the glass) is actually 3-5 years in the future and the past sand includes “up to three years” in the future. Then there are the percentages which appear to represent a vertical distance, not volume of sand or width of the hourglass. Add to that a strange color scheme in which green goes from dark to light to dark again. I know January’s not even finished yet, but surely a competitor for worst infographic of 2013? Keirstead doesn’t even comment on what I see as the worst aspect of the graph, which is that the “3-5 years” band is the narrowest on the graph, but expressed as a per-year rate it is actually the highest of all the percentages. The hourglass visualization does the astounding feat of taking the period where the executives expect the highest rate of change and presenting it as a minimum in the graph.

6 0.11471077 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate

7 0.11241721 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

8 0.1096916 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??

9 0.10942501 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

10 0.10317659 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly

11 0.099417314 190 andrew gelman stats-2010-08-07-Mister P makes the big jump from the New York Times to the Washington Post

12 0.097074948 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

13 0.094751537 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

14 0.089515723 583 andrew gelman stats-2011-02-21-An interesting assignment for statistical graphics

15 0.089258239 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology

16 0.086633161 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!

17 0.086042508 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

18 0.085921668 1851 andrew gelman stats-2013-05-11-Actually, I have no problem with this graph

19 0.085540466 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

20 0.084629275 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.093), (1, -0.023), (2, -0.031), (3, 0.043), (4, 0.081), (5, -0.1), (6, -0.066), (7, 0.05), (8, -0.005), (9, -0.016), (10, 0.003), (11, -0.012), (12, -0.009), (13, -0.003), (14, -0.016), (15, 0.01), (16, 0.014), (17, 0.0), (18, -0.015), (19, 0.007), (20, -0.001), (21, -0.003), (22, -0.005), (23, 0.017), (24, 0.001), (25, -0.008), (26, 0.01), (27, 0.058), (28, -0.031), (29, -0.009), (30, -0.014), (31, -0.008), (32, -0.028), (33, -0.007), (34, -0.05), (35, 0.04), (36, -0.008), (37, 0.006), (38, 0.006), (39, 0.019), (40, 0.035), (41, -0.037), (42, -0.014), (43, 0.016), (44, -0.025), (45, 0.027), (46, 0.032), (47, -0.013), (48, -0.023), (49, -0.015)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92398983 863 andrew gelman stats-2011-08-21-Bad graph

2 0.83431029 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image ( link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention , though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat

3 0.8199712 488 andrew gelman stats-2010-12-27-Graph of the year

Introduction: From blogger Matthew Yglesias : There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features: - Clear title - Clearly labeled axes - Vertical axis goes to zero - The cities are in a sensible order (not, for example, alphabetical) - The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed. What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e

4 0.80532724 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can

5 0.79442483 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

6 0.78900921 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

7 0.78680515 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

8 0.78600776 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

9 0.77409351 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

10 0.77217937 1604 andrew gelman stats-2012-12-04-An epithet I can live with

11 0.76858479 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

12 0.75671971 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

13 0.75066501 671 andrew gelman stats-2011-04-20-One more time-use graph

14 0.74515021 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

15 0.74436194 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

16 0.74259424 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

17 0.73985249 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

18 0.73668009 319 andrew gelman stats-2010-10-04-“Who owns Congress”

19 0.73544103 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

20 0.73156577 61 andrew gelman stats-2010-05-31-A data visualization manifesto

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.071), (15, 0.1), (16, 0.084), (24, 0.077), (42, 0.031), (53, 0.034), (79, 0.257), (84, 0.052), (99, 0.162)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.81187737 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

Introduction: This is a mini research note, not deserving of a paper, but perhaps useful to others. It reinvents what has already appeared on this blog. Let’s say we have a line chart with numbers between 152.134 and 210.823, with the mean of 183.463. How should we label the chart with about 3 tics? Perhaps 152.132, 181.4785 and 210.823? Don’t do it! Objective is to fit about 3-7 tics at the optimal level of rounding. I use the following sequence: decimal rounding : fitting integer power and single-digit decimal i , rounding to i * 10^ power (example: 100 200 300) binary having power , fitting single-digit decimal i and binary b , rounding to 2* i /(1+ b ) * 10^ power (150 200 250) (optional) quaternary having power , fitting single-digit decimal i and quaternary q (0,1,2,3) round to 4* i /(1+ q ) * 10^ power (150 175 200) quinary having power , fitting single-digit decimal i and quinary f (0,1,2,3,4) round to 5* i /(1+ f ) * 10^ power (160 180 200)
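The rounding sequence in the note above can be sketched in code. This is only one reading of it: the worked examples (100 200 300; 150 200 250; 150 175 200; 160 180 200) suggest tick steps of 10^power divided by 1, 2, 4, and 5, and the sketch tries those steps from coarse to fine until 3 to 7 ticks fit inside the data range. The function name `pick_ticks`, its parameters, and the coarse-to-fine search order are assumptions, not the post's own code.

```python
import math

# Divisors for decimal, binary, quaternary, and quinary rounding
# (assumed from the examples in the note).
DIVISORS = (1, 2, 4, 5)

def pick_ticks(lo, hi, min_ticks=3, max_ticks=7):
    """Return (step, ticks) for the coarsest step that fits min_ticks..max_ticks in [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Start one power of ten above the data span, then move to finer powers.
    start = math.floor(math.log10(hi - lo)) + 1
    for power in range(start, start - 4, -1):
        for d in DIVISORS:
            step = 10.0 ** power / d
            first = math.ceil(lo / step)   # index of first multiple of step >= lo
            last = math.floor(hi / step)   # index of last multiple of step <= hi
            n = last - first + 1
            if min_ticks <= n <= max_ticks:
                return step, [k * step for k in range(first, last + 1)]
    raise ValueError("no suitable step found")

step, ticks = pick_ticks(152.134, 210.823)
print(step, ticks)
```

For the range in the note (152.134 to 210.823), this search settles on a step of 20 with ticks at 160, 180, and 200, which agrees with the note's quinary example.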

2 0.80883533 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?

Introduction: Frank Hansen writes: Columbus Park is on Chicago’s west side, in the Austin neighborhood. The park is a big green area which includes a golf course. Here is the google satellite view. Here is the nytimes page. Go to Chicago, and zoom over to the census tract 2521, which is just north of the horizontal gray line (Eisenhower Expressway, aka I290) and just east of Oak Park. The park is labeled on the nytimes map. The census data have around 50 dots (they say 50 people per dot) in the park which has no residential buildings. Congressional district is Danny Davis, IL7. Here’s a map of the district. So, how do we explain the map showing ~50 dots worth of people living in the park. What’s up with the algorithm to place the dots? I dunno. I leave this one to you, the readers.

3 0.7967695 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes

Introduction: Interesting article by Jonah Berger and Gael Le Mens: Products, styles, and social movements often catch on and become popular, but little is known about why such identity-relevant cultural tastes and practices die out. We demonstrate that the velocity of adoption may affect abandonment: Analysis of over 100 years of data on first-name adoption in both France and the United States illustrates that cultural tastes that have been adopted quickly die faster (i.e., are less likely to persist). Mirroring this aggregate pattern, at the individual level, expecting parents are more hesitant to adopt names that recently experienced sharper increases in adoption. Further analyses indicate that these effects are driven by concerns about symbolic value: Fads are perceived negatively, so people avoid identity-relevant items with sharply increasing popularity because they believe that they will be short lived. Ancillary analyses also indicate that, in contrast to conventional wisdom, identity-r

4 0.79286897 2139 andrew gelman stats-2013-12-19-Happy birthday

Introduction: (Click for bigger image.) The above is Aki’s decomposition of the birthdays data (the number of babies born each day in the United States, from 1968 through 1988) using a Gaussian process model, as described in more detail in our book .

5 0.78799641 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)

Introduction: Aki writes: Here’s my version of the birthday frequency graph . I used Gaussian process with two slowly varying components and periodic component with decay, so that periodic form can change in time. I used Student’s t-distribution as observation model to allow exceptional dates to be outliers. I guess that periodic component due to week effect is still in the data because there is data only from twenty years. Naturally it would be better to model the whole timeseries, but it was easier to just use the csv by Mulligan. All I can say is . . . wow. Bayes wins again. Maybe Aki can supply the R or Matlab code? P.S. And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. P.P.S. More here .

6 0.78573138 1126 andrew gelman stats-2012-01-18-Bob on Stan

7 0.78513074 1538 andrew gelman stats-2012-10-17-Rust

8 0.78318286 1515 andrew gelman stats-2012-09-29-Jost Haidt

same-blog 9 0.77477914 863 andrew gelman stats-2011-08-21-Bad graph

10 0.7275874 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions

11 0.69271898 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence

12 0.66202295 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

13 0.6589489 1048 andrew gelman stats-2011-12-09-Maze generation algorithms!

14 0.65193433 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

15 0.64043534 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

16 0.63833296 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey

17 0.62274647 2004 andrew gelman stats-2013-09-01-Post-publication peer review: How it (sometimes) really works

18 0.61961377 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

19 0.61177069 636 andrew gelman stats-2011-03-29-The Conservative States of America

20 0.605582 329 andrew gelman stats-2010-10-08-More on those dudes who will pay your professor $8000 to assign a book to your class, and related stories about small-time sleazoids