andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1116 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Gabriel Bergin writes: Just thought I’d share an infographic you might enjoy. I [Bergin] quite like what they did with the colored ranges of previous curves in the two middle graphs: I like it. Would it be possible to put the two long time series on the same scale? As it is, one starts in 1948 and the other starts in 1980. The only thing about the display that I really don’t like are those balls on the top indicating the duration of recessions. It looks weird to me to display a time duration in the form of the area of a ball.
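Some arithmetic makes the objection to the recession balls concrete. If a ball's area is proportional to a recession's duration, its radius grows only with the square root of the duration. A recession three times as long therefore gets a ball only about 1.73 times as wide. The sketch below assumes area is exactly proportional to duration. The 6- and 18-month durations and the ball_radius helper are hypothetical and are not taken from the infographic.

```python
import math

def ball_radius(duration_months, area_per_month=1.0):
    """Radius of a circle whose area is proportional to duration (hypothetical scale)."""
    area = duration_months * area_per_month
    return math.sqrt(area / math.pi)

# Illustrative durations only; these are not read from the infographic.
short, long_ = 6, 18
ratio_duration = long_ / short
ratio_width = ball_radius(long_) / ball_radius(short)

print(f"duration ratio: {ratio_duration:.2f}")  # 3.00
print(f"width ratio:    {ratio_width:.2f}")     # 1.73, i.e. sqrt(3)
```

The duration triples, but the visible width grows by less than a factor of two. This is one reason comparing the balls by eye is likely to understate how much longer one recession lasted than another.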
sentIndex sentText sentNum sentScore
1 Gabriel Bergin writes: Just thought I’d share an infographic you might enjoy. [sent-1, score-0.38]
2 I [Bergin] quite like what they did with the colored ranges of previous curves in the two middle graphs: I like it. [sent-2, score-1.08]
3 Would it be possible to put the two long time series on the same scale? [sent-3, score-0.474]
4 As it is, one starts in 1948 and the other starts in 1980. [sent-4, score-0.624]
5 The only thing about the display that I really don’t like are those balls on the top indicating the duration of recessions. [sent-5, score-1.242]
6 It looks weird to me to display a time duration in the form of the area of a ball. [sent-6, score-1.125]
wordName wordTfidf (topN-words)
[('bergin', 0.525), ('duration', 0.384), ('starts', 0.3), ('display', 0.242), ('gabriel', 0.208), ('balls', 0.192), ('colored', 0.185), ('ranges', 0.181), ('infographic', 0.171), ('curves', 0.167), ('indicating', 0.163), ('ball', 0.161), ('weird', 0.141), ('middle', 0.119), ('share', 0.108), ('previous', 0.105), ('area', 0.104), ('scale', 0.102), ('series', 0.101), ('two', 0.09), ('form', 0.089), ('top', 0.089), ('graphs', 0.088), ('looks', 0.087), ('like', 0.078), ('time', 0.078), ('quite', 0.077), ('possible', 0.073), ('long', 0.072), ('put', 0.06), ('thought', 0.059), ('thing', 0.054), ('might', 0.042), ('really', 0.04), ('writes', 0.036), ('would', 0.026), ('one', 0.024)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1116 andrew gelman stats-2012-01-13-Infographic on the economy
2 0.1058636 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi
3 0.10172644 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses
Introduction: Alex Hoffman writes: I am reviewing an article with a whole bunch of tables with Likert scale responses. You know, the standard thing with each question on its own line, followed by 5 columns of numbers. Is there a good way to display this data graphically? OK, there’s no one best way, but can you point your readers to a few good examples? My reply: Some sort of small multiples. I’m thinking of lineplots. Maybe a grid of plots, each with three colored and labeled lines. For example, it might be a grid with 10 rows and 5 columns. To really know what to do, I’d have to have more sense of what’s being plotted. Feel free to contribute your ideas in the comments.
4 0.09341234 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things
Introduction: Dan Lakeland writes: I have some questions about some basic statistical ideas and would like your opinion on them: 1) Parameters that manifestly DON’T exist: It makes good sense to me to think about Bayesian statistics as narrowing in on the value of parameters based on a model and some data. But there are cases where “the parameter” simply doesn’t make sense as an actual thing. Yet, it’s not really a complete fiction, like unicorns either, it’s some kind of “effective” thing maybe. Here’s an example of what I mean. I did a simple toy experiment where we dropped crumpled up balls of paper and timed their fall times. (see here: http://models.street-artists.org/?s=falling+ball ) It was pretty instructive actually, and I did it to figure out how to in a practical way use an ODE to get a likelihood in MCMC procedures. One of the parameters in the model is the radius of the spherical ball of paper. But the ball of paper isn’t a sphere, not even approximately. There’s no single valu
5 0.084550358 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
6 0.082203306 1800 andrew gelman stats-2013-04-12-Too tired to mock
7 0.0778457 1464 andrew gelman stats-2012-08-20-Donald E. Westlake on George W. Bush
8 0.073690765 2239 andrew gelman stats-2014-03-09-Reviewing the peer review process?
10 0.072043978 1064 andrew gelman stats-2011-12-16-The benefit of the continuous color scale
11 0.071470283 1767 andrew gelman stats-2013-03-17-The disappearing or non-disappearing middle class
12 0.071284182 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go
13 0.071092226 488 andrew gelman stats-2010-12-27-Graph of the year
14 0.070457458 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series
16 0.066892534 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
17 0.066770114 863 andrew gelman stats-2011-08-21-Bad graph
18 0.063866369 1995 andrew gelman stats-2013-08-23-“I mean, what exact buttons do I have to hit?”
20 0.062042326 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
topicId topicWeight
[(0, 0.075), (1, -0.026), (2, -0.01), (3, 0.037), (4, 0.068), (5, -0.073), (6, -0.014), (7, 0.014), (8, -0.022), (9, 0.012), (10, 0.02), (11, -0.011), (12, -0.012), (13, -0.012), (14, 0.005), (15, -0.001), (16, 0.02), (17, -0.017), (18, 0.0), (19, 0.016), (20, 0.009), (21, 0.023), (22, -0.013), (23, 0.001), (24, 0.027), (25, -0.005), (26, 0.009), (27, 0.001), (28, -0.005), (29, 0.003), (30, -0.0), (31, -0.016), (32, -0.018), (33, 0.007), (34, -0.017), (35, -0.016), (36, -0.004), (37, 0.01), (38, 0.011), (39, 0.024), (40, 0.02), (41, 0.022), (42, -0.021), (43, 0.001), (44, -0.027), (45, 0.006), (46, -0.009), (47, 0.013), (48, 0.021), (49, 0.003)]
simIndex simValue blogId blogTitle
same-blog 1 0.93540239 1116 andrew gelman stats-2012-01-13-Infographic on the economy
2 0.86051291 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
3 0.83533627 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla
4 0.82859534 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
Introduction: By popular demand, here’s my R script for the time-use graphs : # The data a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0) a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1) a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4) a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7) a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4) a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1) a <- rbind(a1,a2,a3,a4,a5,a6) avg <- colMeans (a) avg.array <- t (array (avg, rev(dim(a)))) diff <- a - avg.array country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey") # The line plots par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30") for (i in 1:6){ plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n") lines (1:6, diff[i,], col="blue") points (1:6, diff[i,], pch=19, col="black") if (i>3){ axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2) axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),
Introduction: John Kastellec points me to this blog by Ezra Klein criticizing the following graph from a recent Republican Party report: Klein (following Alexander Hart) slams the graph for not going all the way to zero on the y-axis, thus making the projected change seem bigger than it really is. I agree with Klein and Hart that, if you’re gonna do a bar chart, you want the bars to go down to 0. On the other hand, a projected change from 19% to 23% is actually pretty big, and I don’t see the point of using a graphical display that hides it. The solution: Ditch the bar graph entirely and replace it by a lineplot, in particular, a time series with year-by-year data. The time series would have several advantages: 1. Data are placed in context. You’d see every year, instead of discrete averages, and you’d get to see the changes in the context of year-to-year variation. 2. With the time series, you can use whatever y-axis works with the data. No need to go to zero. P.S. I l
6 0.82398009 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
8 0.80126768 61 andrew gelman stats-2010-05-31-A data visualization manifesto
9 0.80064166 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
10 0.7985329 671 andrew gelman stats-2011-04-20-One more time-use graph
11 0.79660743 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization
12 0.79235625 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
13 0.78659332 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
14 0.78621316 488 andrew gelman stats-2010-12-27-Graph of the year
15 0.78455406 1011 andrew gelman stats-2011-11-15-World record running times vs. distance
17 0.7814579 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
18 0.77922773 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
19 0.77435219 319 andrew gelman stats-2010-10-04-“Who owns Congress”
20 0.77151716 2319 andrew gelman stats-2014-05-05-Can we make better graphs of global temperature history?
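The Ezra Klein excerpt in the list above turns on how a bar chart with a truncated y-axis distorts a change. Readers compare bar lengths, so the apparent ratio between two bars is (after - floor) / (before - floor) rather than after / before. The sketch below applies this to the 19% and 23% figures quoted in the excerpt. The excerpt does not say where the original graph's axis started, so the 17% floor is a hypothetical value chosen for illustration. The apparent_ratio helper is likewise illustrative.

```python
def apparent_ratio(before, after, axis_floor):
    """Ratio of bar lengths when bars start at axis_floor instead of zero."""
    return (after - axis_floor) / (before - axis_floor)

before, after = 19.0, 23.0  # projected change quoted in the excerpt, in percent

# Absolute and relative size of the change itself.
print(f"change: {after - before:.0f} percentage points, "
      f"{100 * (after - before) / before:.0f}% relative increase")

# Bars anchored at zero show the true ratio; a truncated axis inflates it.
print(f"bars from 0%:  {apparent_ratio(before, after, 0.0):.2f}")   # 1.21
print(f"bars from 17%: {apparent_ratio(before, after, 17.0):.2f}")  # 3.00, hypothetical floor
```

With a 17% floor, the 23% bar looks three times as long as the 19% bar, although the true ratio is about 1.21. At the same time, a 4-point rise is roughly a 21% relative increase. That is the tension in the excerpt: the change is worth showing, but on a year-by-year lineplot rather than with truncated bars.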
topicId topicWeight
[(1, 0.033), (10, 0.037), (16, 0.038), (24, 0.203), (34, 0.031), (77, 0.048), (89, 0.019), (93, 0.323), (99, 0.11)]
simIndex simValue blogId blogTitle
Introduction: I put it on the sister blog so you loyal readers here wouldn’t be distracted by it.
same-blog 2 0.81113535 1116 andrew gelman stats-2012-01-13-Infographic on the economy
3 0.76452506 1432 andrew gelman stats-2012-07-27-“Get off my lawn”-blogging
Introduction: Jay Livingston critiques the recent pronouncements of sociologist and cigarette shill Peter Berger, who recently has moved into cultural criticism of New York’s mayor for living with “a woman to whom he is not married” (this is apparently a European sort of thing, I guess they don’t have unmarried partners in the parts of the U.S. where Berger hangs out). But what impresses me is that Berger is doing regular blogging at the age of 84, writing a long essay each week. That’s really amazing to me. Some of the blogging is a bit suspect, for example the bit where he claims that he personally could convert gays to heterosexual orientation (“A few stubborn individuals may resist the Berger conversion program. The majority will succumb”)—but, really, you gotta admire that he’s doing this. I hope I’m that active when (if) I reach my mid-80s. (As a nonsmoker, I should have a pretty good chance of reaching that point.) P.S. More rant at the sister blog. P.P.S. In comments,
4 0.73607886 1397 andrew gelman stats-2012-06-27-Stand Your Ground laws and homicides
Introduction: Jeff points me to a paper by Chandler McClellan and Erdal Tekin which begins as follows: The controversies surrounding Stand Your Ground laws have recently captured the nation’s attention. Since 2005, eighteen states have passed laws extending the right to self-defense with no duty to retreat to any place a person has a legal right to be, and several additional states are debating the adoption of similar legislation. Despite the implications that these laws may have for public safety, there has been little empirical investigation of their impact on crime and victimization. In this paper, we use monthly data from the U.S. Vital Statistics to examine how Stand Your Ground laws affect homicides. We identify the impact of these laws by exploiting variation in the effective date of these laws across states. Our results indicate that Stand Your Ground laws are associated with a significant increase in the number of homicides among whites, especially white males. According to our estimat
5 0.73584247 1503 andrew gelman stats-2012-09-19-“Poor Smokers in New York State Spend 25% of Income on Cigarettes, Study Finds”
Introduction: Jeff points me to this news article and asks, Can this be right? Hmmm . . . the article defines “wealthier smokers” as “those earning 60,000 or more.” So suppose a “low-income smoker” makes $20K, then 25% is $5000, which is $100 a week, or $14/day, which according to the article is roughly the cost of a pack of cigarettes. So I guess it’s possible. It just depends where you put the cutoff for “low-income” and where you put the cutoff for “smoker.” I also wonder whether the numerator and denominator are comparable. It might be that if you add up all of these people’s expenses and divide by their income, you’ll get a ratio of more than 100%.
6 0.72312546 1569 andrew gelman stats-2012-11-08-30-30-40 Nation
7 0.70078951 1210 andrew gelman stats-2012-03-12-Plagiarists are in the habit of lying
8 0.69831657 1281 andrew gelman stats-2012-04-25-Dyson’s baffling love of crackpots
9 0.67320645 1123 andrew gelman stats-2012-01-17-Big corporations are more popular than you might realize
10 0.66386068 683 andrew gelman stats-2011-04-28-Asymmetry in Political Bias
11 0.66084182 1711 andrew gelman stats-2013-02-07-How Open Should Academic Papers Be?
12 0.65948284 1959 andrew gelman stats-2013-07-28-50 shades of gray: A research story
13 0.63577557 1693 andrew gelman stats-2013-01-25-Subsidized driving
14 0.5917989 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism
16 0.57223547 938 andrew gelman stats-2011-10-03-Comparing prediction errors
17 0.57223028 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
18 0.57071459 1479 andrew gelman stats-2012-09-01-Mothers and Moms
19 0.57055396 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing
20 0.57030022 1787 andrew gelman stats-2013-04-04-Wanna be the next Tyler Cowen? It’s not as easy as you might think!
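The cigarette-spending excerpt above (post 1503) does its plausibility check in rounded numbers. The sketch below repeats that back-of-the-envelope calculation. The $20K income and the 25% share come from the excerpt. The 52-week and 365-day divisors are the only added assumptions.

```python
income = 20_000  # hypothetical low-income smoker's annual income, from the excerpt
share = 0.25     # share of income spent on cigarettes, from the news article

annual = income * share
weekly = annual / 52
daily = annual / 365

print(f"annual: ${annual:,.0f}")  # $5,000
print(f"weekly: ${weekly:,.2f}")  # $96.15
print(f"daily:  ${daily:,.2f}")   # $13.70
```

This gives about $96 a week and $13.70 a day, which matches the excerpt's rounded figures of $100 a week and $14 a day.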