andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1800 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Someone sent me an email with the subject line “A terrible infographic,” and it went on from there: “Given some of your recent writing on infovis, I thought you might get a kick out of this . . . I’m certainly sympathetic to their motivations, but some of these plots do not aid understanding… To pick on a few in particular, the first plot attached, cropped from the infographic, is a strange alternative to a bar plot. For the second attachment, I still don’t understand what they’ve plotted. . . .” I agree with everything he wrote, but at this point I think I’m getting too exhausted to laugh at graphs unless there is an obvious political bias to point to.
sentIndex sentText sentNum sentScore
1 Someone sent me an email with the subject line “A terrible infographic,” and it went on from there: “Given some of your recent writing on infovis, I thought you might get a kick out of this . [sent-1, score-1.268]
2 I’m certainly sympathetic to their motivations, but some of these plots do not aid understanding… To pick on a few in particular, the first plot attached, cropped from the infographic, is a strange alternative to a bar plot. [sent-4, score-1.654]
3 For the second attachment, I still don’t understand what they’ve plotted. [sent-5, score-0.23]
4 ” I agree with everything he wrote, but at this point I think I’m getting too exhausted to laugh at graphs unless there is an obvious political bias to point to. [sent-9, score-1.498]
wordName wordTfidf (topN-words)
[('infographic', 0.393), ('cropped', 0.259), ('attachment', 0.248), ('exhausted', 0.248), ('kick', 0.221), ('laugh', 0.202), ('terrible', 0.194), ('aid', 0.184), ('infovis', 0.182), ('motivations', 0.18), ('sympathetic', 0.175), ('bar', 0.175), ('strange', 0.166), ('attached', 0.162), ('plots', 0.152), ('plot', 0.132), ('unless', 0.132), ('pick', 0.128), ('alternative', 0.126), ('bias', 0.123), ('obvious', 0.121), ('subject', 0.117), ('email', 0.113), ('sent', 0.107), ('went', 0.107), ('certainly', 0.104), ('everything', 0.103), ('graphs', 0.101), ('point', 0.1), ('line', 0.099), ('understanding', 0.099), ('second', 0.086), ('getting', 0.083), ('writing', 0.083), ('understand', 0.081), ('agree', 0.081), ('someone', 0.078), ('political', 0.074), ('particular', 0.072), ('recent', 0.071), ('wrote', 0.069), ('thought', 0.068), ('given', 0.066), ('still', 0.063), ('first', 0.053), ('might', 0.049), ('ve', 0.047), ('get', 0.039), ('think', 0.03)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1800 andrew gelman stats-2013-04-12-Too tired to mock
2 0.18726525 863 andrew gelman stats-2011-08-21-Bad graph
Introduction: Dan Goldstein points us to this: It’s a good infographic–it grabs the reader’s eye (see discussion here), no? P.S. The above remark is not meant as a dig at infographics. On the contrary, I am sincerely saying that a graph that violates all statistical principles and does not do a good job at displaying data, can still be valuable and useful as a data graphic. For this infographic, the numbers are used as ornamentation to attract the viewer, just as one might use a cartoon or a dramatic photo image. P.P.S. At Hadley’s suggestion (see comment below), I’ve changed all uses of “infovis” above to “infographic.”
3 0.14802384 687 andrew gelman stats-2011-04-29-Zero is zero
Introduction: Nathan Roseberry writes: I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (or google) didn’t return what I was looking for. Is it considered a best practice to always include zero on the axis for bar charts? Has this been written in a book? My reply: The idea is that the area of the bar represents “how many” or “how much.” The bar has to go down to 0 for that to work. You don’t have to have your y-axis go to zero, but if you want the axis to go anywhere else, don’t use a bar graph, use a line graph. Usually line graphs are better anyway. I’m sure this is all in a book somewhere.
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
5 0.14352886 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
6 0.12729141 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story
7 0.12385246 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots
8 0.11122948 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”
9 0.098450631 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
10 0.09705168 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly
11 0.096084222 2144 andrew gelman stats-2013-12-23-I hate this stuff
12 0.090989776 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
13 0.088141955 61 andrew gelman stats-2010-05-31-A data visualization manifesto
14 0.085170045 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”
15 0.083047092 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate
16 0.082203306 1116 andrew gelman stats-2012-01-13-Infographic on the economy
17 0.081413873 1128 andrew gelman stats-2012-01-19-Sharon Begley: Worse than Stephen Jay Gould?
18 0.080558114 1028 andrew gelman stats-2011-11-26-Tenure lets you handle students who cheat
19 0.079908565 1747 andrew gelman stats-2013-03-03-More research on the role of puzzles in processing data graphics
20 0.079050779 319 andrew gelman stats-2010-10-04-“Who owns Congress”
topicId topicWeight
[(0, 0.108), (1, -0.05), (2, -0.021), (3, 0.036), (4, 0.053), (5, -0.097), (6, -0.023), (7, 0.021), (8, -0.016), (9, -0.006), (10, 0.002), (11, -0.011), (12, -0.005), (13, -0.005), (14, 0.006), (15, -0.01), (16, 0.006), (17, -0.047), (18, -0.018), (19, 0.033), (20, 0.0), (21, -0.013), (22, 0.011), (23, -0.011), (24, 0.018), (25, -0.016), (26, 0.042), (27, -0.004), (28, -0.037), (29, 0.01), (30, -0.03), (31, 0.005), (32, -0.036), (33, 0.034), (34, -0.05), (35, -0.04), (36, 0.017), (37, 0.01), (38, 0.042), (39, -0.028), (40, 0.051), (41, -0.016), (42, -0.017), (43, 0.013), (44, -0.017), (45, -0.021), (46, 0.014), (47, 0.04), (48, 0.051), (49, -0.036)]
simIndex simValue blogId blogTitle
same-blog 1 0.96254712 1800 andrew gelman stats-2013-04-12-Too tired to mock
2 0.77334052 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
Introduction: Jimmy passes along this article by Ahmad Reza Hosseinpoor and Carla AbouZahr. I have little to say, except that (a) they seem to be making a reasonable point, and (b) those bar graphs are pretty ugly.
3 0.7439993 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla
4 0.73338765 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying
5 0.72794104 61 andrew gelman stats-2010-05-31-A data visualization manifesto
Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th
6 0.71224356 1116 andrew gelman stats-2012-01-13-Infographic on the economy
7 0.71042293 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
8 0.69954896 319 andrew gelman stats-2010-10-04-“Who owns Congress”
9 0.69870961 296 andrew gelman stats-2010-09-26-A simple semigraphic display
10 0.69635856 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
11 0.69447815 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
12 0.6929639 2111 andrew gelman stats-2013-11-23-Tables > figures yet again
13 0.69277209 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
14 0.69082665 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
15 0.68992394 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
18 0.68125075 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly
topicId topicWeight
[(15, 0.312), (16, 0.052), (24, 0.215), (34, 0.075), (53, 0.026), (59, 0.024), (99, 0.165)]
simIndex simValue blogId blogTitle
same-blog 1 0.91889703 1800 andrew gelman stats-2013-04-12-Too tired to mock
2 0.90861142 439 andrew gelman stats-2010-11-30-Of psychology research and investment tips
Introduction: A few days after “Dramatic study shows participants are affected by psychological phenomena from the future” (see here), the British Psychological Society follows up with “Can psychology help combat pseudoscience?” Somehow I’m reminded of that bit of financial advice which says, if you want to save some money, your best investment is to pay off your credit card bills.
3 0.90580821 1081 andrew gelman stats-2011-12-24-Statistical ethics violation
Introduction: A colleague writes: When I was in NYC I went to a party held by a group of Japanese bio-scientists. There, one guy told me about how the biggest pharmaceutical company in Japan did their statistics. They ran 100 different tests and reported the most significant one. (This was in 2006 and he said they stopped doing this a few years back, so they were doing this until pretty recently…) I’m not sure if this was 100 multiple comparisons or 100 different kinds of tests, but I’m sure they wouldn’t want to disclose their data… Ouch!
4 0.86629891 908 andrew gelman stats-2011-09-14-Type M errors in the lab
Introduction: Jeff points us to this news article by Asher Mullard: Bayer halts nearly two-thirds of its target-validation projects because in-house experimental findings fail to match up with published literature claims, finds a first-of-a-kind analysis on data irreproducibility. An unspoken industry rule alleges that at least 50% of published studies from academic laboratories cannot be repeated in an industrial setting, wrote venture capitalist Bruce Booth in a recent blog post. A first-of-a-kind analysis of Bayer’s internal efforts to validate ‘new drug target’ claims now not only supports this view but suggests that 50% may be an underestimate; the company’s in-house experimental data do not match literature claims in 65% of target-validation projects, leading to project discontinuation. . . . Khusru Asadullah, Head of Target Discovery at Bayer, and his colleagues looked back at 67 target-validation projects, covering the majority of Bayer’s work in oncology, women’s health and cardiov
5 0.85302949 945 andrew gelman stats-2011-10-06-W’man < W’pedia, again
Introduction: Blogger Deep Climate looks at another paper by the 2002 recipient of the American Statistical Association’s Founders award. This time it’s not funny, it’s just sad. Here’s Wikipedia on simulated annealing: By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random “nearby” solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly “downhill” as T goes to zero. The allowance for “uphill” moves saves the method from becoming stuck at local minima—which are the bane of greedier methods. And here’s Wegman: During each step of the algorithm, the variable that will eventually represent the minimum is replaced by a random solution that is chosen according to a temperature
6 0.8453027 834 andrew gelman stats-2011-08-01-I owe it all to the haters
7 0.84017396 1541 andrew gelman stats-2012-10-19-Statistical discrimination again
8 0.82419217 2188 andrew gelman stats-2014-01-27-“Disappointed with your results? Boost your scientific paper”
9 0.82313824 133 andrew gelman stats-2010-07-08-Gratuitous use of “Bayesian Statistics,” a branding issue?
11 0.80913901 1794 andrew gelman stats-2013-04-09-My talks in DC and Baltimore this week
12 0.79416203 1394 andrew gelman stats-2012-06-27-99!
13 0.79232949 1624 andrew gelman stats-2012-12-15-New prize on causality in statistics education
14 0.78983819 762 andrew gelman stats-2011-06-13-How should journals handle replication studies?
15 0.78122222 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression
17 0.75497758 2278 andrew gelman stats-2014-04-01-Association for Psychological Science announces a new journal
18 0.75130892 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims
19 0.74732006 883 andrew gelman stats-2011-09-01-Arrow’s theorem update
20 0.73848337 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”