andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-572 knowledge-graph by maker-knowledge-mining
Introduction: Malecki asks: Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to. To connect to some of our recent themes, I agree this is a pretty horrible data display. But it’s not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren’t so bad. One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!), which is often the opposite of what we want in a clear graph. It’s Martin Amis vs. George Orwell all over again.
sentIndex sentText sentNum sentScore
1 Malecki asks: Is this the worst infographic ever to appear in NYT? [sent-1, score-0.514]
2 To connect to some of our recent themes, I agree this is a pretty horrible data display. [sent-3, score-0.678]
3 Considering the competition to be a cartoon or series of photos, these images aren’t so bad. [sent-5, score-0.723]
4 One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! [sent-6, score-0.916]
5 ) , which is often the opposite of what we want in a clear graph. [sent-8, score-0.316]
wordName wordTfidf (topN-words)
[('aspire', 0.25), ('originality', 0.236), ('photos', 0.226), ('shaped', 0.218), ('series', 0.211), ('cartoon', 0.206), ('malecki', 0.201), ('creativity', 0.197), ('histogram', 0.193), ('orwell', 0.19), ('combinations', 0.184), ('infographic', 0.179), ('designers', 0.177), ('usa', 0.177), ('bars', 0.169), ('themes', 0.159), ('martin', 0.159), ('images', 0.158), ('connect', 0.152), ('nyt', 0.151), ('unusual', 0.15), ('competition', 0.148), ('worst', 0.143), ('color', 0.143), ('horrible', 0.133), ('credit', 0.128), ('george', 0.126), ('considering', 0.124), ('opposite', 0.124), ('asks', 0.122), ('appear', 0.105), ('aren', 0.104), ('today', 0.098), ('ever', 0.087), ('issue', 0.083), ('clear', 0.075), ('bad', 0.074), ('agree', 0.074), ('often', 0.066), ('recent', 0.065), ('pretty', 0.061), ('want', 0.051), ('something', 0.048), ('get', 0.035), ('data', 0.034), ('like', 0.027), ('think', 0.027), ('one', 0.025)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate
2 0.14002302 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly
Introduction: Hamdan Azhar writes: I came across this graphic of vaccine-attributed decreases in mortality and was curious if you found it as unattractive and unintuitive as I did. Hope all is well with you! My reply: All’s well with me. And yes, that’s one horrible graph. It has all the problems with a bad infographic with none of the virtues. Compared to this monstrosity, the typical USA Today graph is a stunning, beautiful masterpiece. I don’t think I want to soil this webpage with the image. In fact, I don’t even want to link to it.
3 0.11974309 722 andrew gelman stats-2011-05-20-Why no Wegmania?
Introduction: A colleague asks: When I search the web, I find the story [of the article by Said, Wegman, et al. on social networks in climate research, which was recently bumped from the journal Computational Statistics and Data Analysis because of plagiarism] only on blogs, USA Today, and UPI. Why is that? Any idea why it isn’t reported by any of the major newspapers? Here’s my answer: 1. USA Today broke the story. Apparently this USA Today reporter put a lot of effort into it. The NYT doesn’t like to run a story that begins, “Yesterday, USA Today reported…” 2. To us it’s big news because we’re statisticians. [The main guy in the study, Edward Wegman, won the Founders Award from the American Statistical Association a few years ago.] To the rest of the world, the story is: “Obscure prof at an obscure college plagiarized an article in a journal that nobody’s ever heard of.” When a Harvard scientist paints black dots on white mice and says he’s curing cancer, that’s news. When P
4 0.11471077 863 andrew gelman stats-2011-08-21-Bad graph
Introduction: Dan Goldstein points us to this: It’s a good infographic–it grabs the reader’s eye (see discussion here), no? P.S. The above remark is not meant as a dig at infographics. On the contrary, I am sincerely saying that a graph that violates all statistical principles and does not do a good job at displaying data, can still be valuable and useful as a data graphic. For this infographic, the numbers are used as ornamentation to attract the viewer, just as one might use a cartoon or a dramatic photo image. P.P.S. At Hadley’s suggestion (see comment below), I’ve changed all uses of “infovis” above to “infographic.”
5 0.11316612 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series
Introduction: Bill Harris writes with two interesting questions involving time series analysis: I used to work in an organization that designed and made signal processing equipment. Antialiasing and windowing of time series was a big deal in performing analysis accurately. Now I’m in a place where I have to make inferences about human-scaled time series. It has dawned on me that the two are related. I’m not sure we often have data sampled at a rate at least twice the highest frequency present (not just the highest frequency of interest). The only articles I’ve seen about aliasing as applied to social science series are from Hinich or from related works. Box and Jenkins hint at it in section 13.3 of Time Series Analysis, but the analysis seems to be mostly heuristic. Yet I can imagine all sorts of time series subject to similar problems, from analyses of stock prices based on closing prices (mentioned in the latter article) to other economic series measured on a monthly basis to en
6 0.11202423 1064 andrew gelman stats-2011-12-16-The benefit of the continuous color scale
7 0.097188115 1943 andrew gelman stats-2013-07-18-Data to use for in-class sampling exercises?
8 0.087968841 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story
9 0.083047092 1800 andrew gelman stats-2013-04-12-Too tired to mock
10 0.082679942 1168 andrew gelman stats-2012-02-14-The tabloids strike again
11 0.080007717 822 andrew gelman stats-2011-07-26-Any good articles on the use of error bars?
13 0.075652957 2048 andrew gelman stats-2013-10-03-A comment on a post at the Monkey Cage
14 0.070950642 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
16 0.068378657 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
18 0.0667018 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions
19 0.065941699 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
topicId topicWeight
[(0, 0.074), (1, -0.028), (2, -0.019), (3, 0.024), (4, 0.039), (5, -0.051), (6, -0.013), (7, 0.014), (8, -0.005), (9, -0.012), (10, 0.004), (11, -0.024), (12, -0.006), (13, -0.003), (14, 0.001), (15, -0.014), (16, 0.015), (17, -0.013), (18, -0.001), (19, -0.009), (20, -0.004), (21, -0.018), (22, -0.008), (23, -0.008), (24, 0.017), (25, -0.022), (26, 0.007), (27, -0.024), (28, -0.007), (29, 0.006), (30, 0.004), (31, 0.014), (32, -0.008), (33, -0.008), (34, -0.017), (35, 0.032), (36, -0.032), (37, 0.033), (38, 0.032), (39, 0.029), (40, 0.003), (41, 0.033), (42, -0.013), (43, 0.002), (44, -0.016), (45, 0.037), (46, 0.005), (47, 0.034), (48, 0.007), (49, -0.008)]
simIndex simValue blogId blogTitle
same-blog 1 0.94085288 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate
Introduction: John Kastellec points me to this blog by Ezra Klein criticizing the following graph from a recent Republican Party report: Klein (following Alexander Hart) slams the graph for not going all the way to zero on the y-axis, thus making the projected change seem bigger than it really is. I agree with Klein and Hart that, if you’re gonna do a bar chart, you want the bars to go down to 0. On the other hand, a projected change from 19% to 23% is actually pretty big, and I don’t see the point of using a graphical display that hides it. The solution: Ditch the bar graph entirely and replace it by a lineplot, in particular, a time series with year-by-year data. The time series would have several advantages: 1. Data are placed in context. You’d see every year, instead of discrete averages, and you’d get to see the changes in the context of year-to-year variation. 2. With the time series, you can use whatever y-axis works with the data. No need to go to zero. P.S. I l
3 0.68694234 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization
Introduction: Alberto Cairo tells a fascinating story about John Snow, H. W. Acland, and the Mythmaking Problem: Every human community—nations, ethnic and cultural groups, professional guilds—inevitably raises a few of its members to the status of heroes and weaves myths around them. . . . The visual display of information is no stranger to heroes and myth. In fact, being a set of disciplines with a relatively small amount of practitioners and researchers, it has generated a staggering number of heroes, perhaps as a morale-enhancing mechanism. Most of us have heard of the wonders of William Playfair’s Commercial and Political Atlas, Florence Nightingale’s coxcomb charts, Charles Joseph Minard’s Napoleon’s march diagram, and Henry Beck’s 1933 redesign of the London Underground map. . . . Cairo’s goal, I think, is not to disparage these great pioneers of graphics but rather to put their work in perspective, recognizing the work of their excellent contemporaries. I would like to echo Cairo’
4 0.68241084 1116 andrew gelman stats-2012-01-13-Infographic on the economy
Introduction: Gabriel Bergin writes: Just thought I’d share an infographic you might enjoy. I [Bergin] quite like what they did with the colored ranges of previous curves in the two middle graphs: I like it. Would it be possible to put the two long time series on the same scale? As it is, one starts in 1948 and the other starts in 1980. The only thing about the display that I really don’t like are those balls on the top indicating the duration of recessions. It looks weird to me to display a time duration in the form of the area of a ball.
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
7 0.66838443 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts
8 0.66185576 61 andrew gelman stats-2010-05-31-A data visualization manifesto
9 0.65982211 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
10 0.65780908 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly
11 0.65655839 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
12 0.64522558 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?
13 0.64454758 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
14 0.642106 1253 andrew gelman stats-2012-04-08-Technology speedup graph
15 0.64107233 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
16 0.63889736 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
17 0.63586098 2091 andrew gelman stats-2013-11-06-“Marginally significant”
18 0.63306284 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect
19 0.63149065 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
20 0.63148797 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
topicId topicWeight
[(16, 0.677), (24, 0.063), (95, 0.046), (99, 0.082)]
simIndex simValue blogId blogTitle
1 0.99322212 1026 andrew gelman stats-2011-11-25-Bayes wikipedia update
Introduction: I checked and somebody went in and screwed up my fixes to the wikipedia page on Bayesian inference. I give up.
2 0.98638988 1745 andrew gelman stats-2013-03-02-Classification error
Introduction: 15-2040 != 19-3010 (and, for that matter, 25-1022 != 25-1063).
same-blog 3 0.96438247 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate
4 0.95320719 398 andrew gelman stats-2010-11-06-Quote of the day
Introduction: “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model.”
5 0.94588238 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data
Introduction: Cathy O’Neil organized this visualization project with NYPD stop-and-frisk data. It’s part of the Data Without Borders project. Unfortunately, because of legal restrictions I couldn’t send them the data Jeff, Alex, and I used in our project several years ago.
6 0.92147875 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?
7 0.91748959 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street
8 0.89484191 1659 andrew gelman stats-2013-01-07-Some silly things you (didn’t) miss by not reading the sister blog
9 0.89135814 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst
10 0.87512022 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?
11 0.87453574 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays
12 0.87295449 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram
13 0.85815489 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”
14 0.8253184 1487 andrew gelman stats-2012-09-08-Animated drought maps
15 0.80801374 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician
16 0.80636209 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation
17 0.78638798 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!
18 0.78607458 1025 andrew gelman stats-2011-11-24-Always check your evidence
19 0.76423472 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research
20 0.74366605 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples