andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-863 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Dan Goldstein points us to this: It’s a good infographic–it grabs the reader’s eye (see discussion here), no? P.S. The above remark is not meant as a dig at infographics. On the contrary, I am sincerely saying that a graph that violates all statistical principles and does not do a good job at displaying data can still be valuable and useful as a data graphic. For this infographic, the numbers are used as ornamentation to attract the viewer, just as one might use a cartoon or a dramatic photo image. P.P.S. At Hadley’s suggestion (see comment below), I’ve changed all uses of “infovis” above to “infographic.”
sentIndex sentText sentNum sentScore
1 Dan Goldstein points us to this: It’s a good infographic–it grabs the reader’s eye (see discussion here), no? [sent-1, score-0.771]
2 The above remark is not meant as a dig at infographics. [sent-4, score-0.526]
3 On the contrary, I am sincerely saying that a graph that violates all statistical principles and does not do a good job at displaying data can still be valuable and useful as a data graphic. [sent-5, score-1.465]
4 For this infographic, the numbers are used as ornamentation to attract the viewer, just as one might use a cartoon or a dramatic photo image. [sent-6, score-1.07]
5 At Hadley’s suggestion (see comment below), I’ve changed all uses of “infovis” above to “infographic.” [sent-10, score-0.481]
wordName wordTfidf (topN-words)
[('infographic', 0.375), ('grabs', 0.247), ('dig', 0.237), ('photo', 0.222), ('cartoon', 0.216), ('goldstein', 0.203), ('viewer', 0.203), ('violates', 0.199), ('attract', 0.196), ('sincerely', 0.193), ('hadley', 0.193), ('contrary', 0.177), ('infovis', 0.174), ('displaying', 0.172), ('eye', 0.17), ('dramatic', 0.158), ('remark', 0.153), ('valuable', 0.149), ('suggestion', 0.145), ('dan', 0.14), ('meant', 0.136), ('reader', 0.135), ('principles', 0.133), ('uses', 0.125), ('changed', 0.122), ('job', 0.096), ('numbers', 0.091), ('graph', 0.091), ('comment', 0.089), ('good', 0.087), ('useful', 0.084), ('saying', 0.08), ('data', 0.071), ('see', 0.069), ('points', 0.068), ('us', 0.066), ('discussion', 0.064), ('used', 0.063), ('still', 0.06), ('use', 0.051), ('statistical', 0.05), ('might', 0.047), ('ve', 0.044), ('one', 0.026)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 863 andrew gelman stats-2011-08-21-Bad graph
2 0.18726525 1800 andrew gelman stats-2013-04-12-Too tired to mock
Introduction: Someone sent me an email with the subject line “A terrible infographic,” and it went on from there: “Given some of your recent writing on infovis, I thought you might get a kick out of this . . . I’m certainly sympathetic to their motivations, but some of these plots do not aid understanding… To pick on a few in particular, the first plot attached, cropped from the infographic, is a strange alternative to a bar plot. For the second attachment, I still don’t understand what they’ve plotted. . . .” I agree with everything he wrote, but at this point I think I’m getting too exhausted to laugh at graphs unless there is an obvious political bias to point to.
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
4 0.14051494 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week , consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
5 0.11881097 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story
Introduction: James Keirstead sends along this infographic: He hates it: First we’ve got an hourglass metaphor wrecked by the fact that “now” (i.e. the pinch point in the glass) is actually 3-5 years in the future and the past sand includes “up to three years” in the future. Then there are the percentages which appear to represent a vertical distance, not volume of sand or width of the hourglass. Add to that a strange color scheme in which green goes from dark to light to dark again. I know January’s not even finished yet, but surely a competitor for worst infographic of 2013? Keirstead doesn’t even comment on what I see as the worst aspect of the graph, which is that the “3-5 years” band is the narrowest on the graph, but expressed as a per-year rate it is actually the highest of all the percentages. The hourglass visualization does the astounding feat of taking the period where the executives expect the highest rate of change and presenting it as a minimum in the graph.
6 0.11471077 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate
7 0.11241721 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
8 0.1096916 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
9 0.10942501 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
10 0.10317659 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly
11 0.099417314 190 andrew gelman stats-2010-08-07-Mister P makes the big jump from the New York Times to the Washington Post
12 0.097074948 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
13 0.094751537 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
14 0.089515723 583 andrew gelman stats-2011-02-21-An interesting assignment for statistical graphics
15 0.089258239 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology
16 0.086633161 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!
18 0.085921668 1851 andrew gelman stats-2013-05-11-Actually, I have no problem with this graph
19 0.085540466 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?
20 0.084629275 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”
topicId topicWeight
[(0, 0.093), (1, -0.023), (2, -0.031), (3, 0.043), (4, 0.081), (5, -0.1), (6, -0.066), (7, 0.05), (8, -0.005), (9, -0.016), (10, 0.003), (11, -0.012), (12, -0.009), (13, -0.003), (14, -0.016), (15, 0.01), (16, 0.014), (17, 0.0), (18, -0.015), (19, 0.007), (20, -0.001), (21, -0.003), (22, -0.005), (23, 0.017), (24, 0.001), (25, -0.008), (26, 0.01), (27, 0.058), (28, -0.031), (29, -0.009), (30, -0.014), (31, -0.008), (32, -0.028), (33, -0.007), (34, -0.05), (35, 0.04), (36, -0.008), (37, 0.006), (38, 0.006), (39, 0.019), (40, 0.035), (41, -0.037), (42, -0.014), (43, 0.016), (44, -0.025), (45, 0.027), (46, 0.032), (47, -0.013), (48, -0.023), (49, -0.015)]
simIndex simValue blogId blogTitle
same-blog 1 0.92398983 863 andrew gelman stats-2011-08-21-Bad graph
2 0.83431029 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat
3 0.8199712 488 andrew gelman stats-2010-12-27-Graph of the year
Introduction: From blogger Matthew Yglesias: There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features: - Clear title - Clearly labeled axes - Vertical axis goes to zero - The cities are in a sensible order (not, for example, alphabetical) - The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed. What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e
4 0.80532724 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
6 0.78900921 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
7 0.78680515 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
8 0.78600776 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
9 0.77409351 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
10 0.77217937 1604 andrew gelman stats-2012-12-04-An epithet I can live with
11 0.76858479 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
12 0.75671971 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
13 0.75066501 671 andrew gelman stats-2011-04-20-One more time-use graph
14 0.74515021 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
16 0.74259424 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?
17 0.73985249 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization
18 0.73668009 319 andrew gelman stats-2010-10-04-“Who owns Congress”
20 0.73156577 61 andrew gelman stats-2010-05-31-A data visualization manifesto
topicId topicWeight
[(5, 0.071), (15, 0.1), (16, 0.084), (24, 0.077), (42, 0.031), (53, 0.034), (79, 0.257), (84, 0.052), (99, 0.162)]
simIndex simValue blogId blogTitle
1 0.81187737 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances
Introduction: This is a mini research note, not deserving of a paper, but perhaps useful to others. It reinvents what has already appeared on this blog. Let’s say we have a line chart with numbers between 152.134 and 210.823, with a mean of 183.463. How should we label the chart with about 3 ticks? Perhaps 152.134, 181.4785 and 210.823? Don’t do it! The objective is to fit about 3-7 ticks at the optimal level of rounding. I use the following sequence: decimal rounding: fitting integer power and single-digit decimal i, rounding to i * 10^power (example: 100 200 300); binary: having power, fitting single-digit decimal i and binary b, rounding to 2*i/(1+b) * 10^power (150 200 250); (optional) quaternary: having power, fitting single-digit decimal i and quaternary q (0,1,2,3), round to 4*i/(1+q) * 10^power (150 175 200); quinary: having power, fitting single-digit decimal i and quinary f (0,1,2,3,4), round to 5*i/(1+f) * 10^power (160 180 200)
2 0.80883533 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?
Introduction: Frank Hansen writes: Columbus Park is on Chicago’s west side, in the Austin neighborhood. The park is a big green area which includes a golf course. Here is the google satellite view. Here is the nytimes page. Go to Chicago, and zoom over to the census tract 2521, which is just north of the horizontal gray line (Eisenhower Expressway, aka I290) and just east of Oak Park. The park is labeled on the nytimes map. The census data have around 50 dots (they say 50 people per dot) in the park which has no residential buildings. Congressional district is Danny Davis, IL7. Here’s a map of the district. So, how do we explain the map showing ~50 dots worth of people living in the park? What’s up with the algorithm to place the dots? I dunno. I leave this one to you, the readers.
3 0.7967695 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes
Introduction: Interesting article by Jonah Berger and Gael Le Mens: Products, styles, and social movements often catch on and become popular, but little is known about why such identity-relevant cultural tastes and practices die out. We demonstrate that the velocity of adoption may affect abandonment: Analysis of over 100 years of data on first-name adoption in both France and the United States illustrates that cultural tastes that have been adopted quickly die faster (i.e., are less likely to persist). Mirroring this aggregate pattern, at the individual level, expecting parents are more hesitant to adopt names that recently experienced sharper increases in adoption. Further analysis indicates that these effects are driven by concerns about symbolic value: Fads are perceived negatively, so people avoid identity-relevant items with sharply increasing popularity because they believe that they will be short lived. Ancillary analyses also indicate that, in contrast to conventional wisdom, identity-r
4 0.79286897 2139 andrew gelman stats-2013-12-19-Happy birthday
Introduction: (Click for bigger image.) The above is Aki’s decomposition of the birthdays data (the number of babies born each day in the United States, from 1968 through 1988) using a Gaussian process model, as described in more detail in our book.
5 0.78799641 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)
Introduction: Aki writes: Here’s my version of the birthday frequency graph. I used a Gaussian process with two slowly varying components and a periodic component with decay, so that the periodic form can change in time. I used Student’s t-distribution as the observation model to allow exceptional dates to be outliers. I guess that a periodic component due to the week effect is still in the data because there is data only from twenty years. Naturally it would be better to model the whole timeseries, but it was easier to just use the csv by Mulligan. All I can say is . . . wow. Bayes wins again. Maybe Aki can supply the R or Matlab code? P.S. And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. P.P.S. More here.
6 0.78573138 1126 andrew gelman stats-2012-01-18-Bob on Stan
7 0.78513074 1538 andrew gelman stats-2012-10-17-Rust
8 0.78318286 1515 andrew gelman stats-2012-09-29-Jost Haidt
same-blog 9 0.77477914 863 andrew gelman stats-2011-08-21-Bad graph
10 0.7275874 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions
11 0.69271898 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence
13 0.6589489 1048 andrew gelman stats-2011-12-09-Maze generation algorithms!
14 0.65193433 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions
16 0.63833296 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey
17 0.62274647 2004 andrew gelman stats-2013-09-01-Post-publication peer review: How it (sometimes) really works
18 0.61961377 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data
19 0.61177069 636 andrew gelman stats-2011-03-29-The Conservative States of America
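A note on the DBQQ rounding sequence quoted in the entry for post 939 above: the Python sketch below is one possible reading of it, not the post's own code. It assumes the four levels (decimal, binary, quaternary, quinary) amount to tick spacings of 10^power divided by 1, 2, 4 and 5, tried from coarsest to finest until 3 to 7 ticks fall inside the data range. The function name nice_ticks and its parameters are hypothetical.

```python
import math


def nice_ticks(lo, hi, min_ticks=3, max_ticks=7):
    """Return round tick values inside [lo, hi].

    Hypothetical helper: an interpretation of the decimal / binary /
    quaternary / quinary sequence, not the original author's formula.
    """
    if not hi > lo:
        raise ValueError("hi must be greater than lo")
    # Start one decade above the data span so the coarsest spacing is tried first.
    power = math.floor(math.log10(hi - lo)) + 1
    for _ in range(20):  # safety bound on how many decades we descend
        # Divisors 1, 2, 4, 5 stand in for the four rounding levels (assumption).
        for divisor in (1, 2, 4, 5):
            step = 10.0 ** power / divisor
            first = math.ceil(lo / step)   # index of first tick at or above lo
            last = math.floor(hi / step)   # index of last tick at or below hi
            count = last - first + 1
            if min_ticks <= count <= max_ticks:
                return [k * step for k in range(first, last + 1)]
        power -= 1
    raise RuntimeError("no suitable tick spacing found")


print(nice_ticks(152.134, 210.823))
```

For the range in the quoted example (152.134 to 210.823), this reading settles on the quinary level and prints [160.0, 180.0, 200.0], which matches the last example in the quoted post.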