andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-126 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Jimmy passes this article by Ahmad Reza Hosseinpoor and Carla AbouZahr. I have little to say, except that (a) they seem to be making a reasonable point, and (b) those bar graphs are pretty ugly.
sentIndex sentText sentNum sentScore
1 Jimmy passes this article by Ahmad Reza Hosseinpoor and Carla AbouZahr. [sent-1, score-0.612]
2 I have little to say, except that (a) they seem to be making a reasonable point, and (b) those bar graphs are pretty ugly. [sent-2, score-1.708]
wordName wordTfidf (topN-words)
[('passes', 0.497), ('jimmy', 0.433), ('bar', 0.375), ('ugly', 0.351), ('except', 0.263), ('graphs', 0.217), ('reasonable', 0.204), ('making', 0.178), ('little', 0.173), ('seem', 0.154), ('pretty', 0.144), ('article', 0.115), ('say', 0.113), ('point', 0.107)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
2 0.2760933 687 andrew gelman stats-2011-04-29-Zero is zero
Introduction: Nathan Roseberry writes: I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (or google) didn’t return what I was looking for. Is it considered a best practice to always include zero on the axis for bar charts? Has this been written in a book? My reply: The idea is that the area of the bar represents “how many” or “how much.” The bar has to go down to 0 for that to work. You don’t have to have your y-axis go to zero, but if you want the axis to go anywhere else, don’t use a bar graph, use a line graph. Usually line graphs are better anyway. I’m sure this is all in a book somewhere.
Introduction: John Kastellec points me to this blog by Ezra Klein criticizing the following graph from a recent Republican Party report: Klein (following Alexander Hart) slams the graph for not going all the way to zero on the y-axis, thus making the projected change seem bigger than it really is. I agree with Klein and Hart that, if you’re gonna do a bar chart, you want the bars to go down to 0. On the other hand, a projected change from 19% to 23% is actually pretty big, and I don’t see the point of using a graphical display that hides it. The solution: Ditch the bar graph entirely and replace it by a lineplot, in particular, a time series with year-by-year data. The time series would have several advantages: 1. Data are placed in context. You’d see every year, instead of discrete averages, and you’d get to see the changes in the context of year-to-year variation. 2. With the time series, you can use whatever y-axis works with the data. No need to go to zero. P.S. I l
4 0.15730461 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”
Introduction: Kaiser writes : I have read a fair share of bore-them-to-tears compilation of survey research results – you know, those presentations with one multi-colored, stacked or grouped bar chart after another, extending for dozens of pages. I hate those grouped bar charts also—as I’ve written repeatedly, the central role of almost all statistical displays is to make comparisons, and you can make twice as many comparisons with a line plot as a bar plot. But I suspect the real problem with the reports that Kaiser is talking about is the “extending for dozens of pages” part. If they could just print each individual plot smaller and put dozens on a page, you could maybe get through the whole report in two or three pages. Almost always, graphs are too large. I’ve even seen abominations such as a fifty-page report with a single huge pie chart on each page. As Kaiser says, think about communication! A report with one big pie chart or bar plot per page is like a text document with one w
5 0.15397219 195 andrew gelman stats-2010-08-09-President Carter
Introduction: This assessment by Tyler Cowen reminded me that, in 1980, I and just about all my friends hated Jimmy Carter. Most of us much preferred him to Reagan but still hated Carter. I wouldn’t associate this with any particular ideological feeling—it’s not that we thought he was too liberal, or too conservative. He just seemed completely ineffectual. I remember feeling at the time that he had no principles, that he’d do anything to get elected. In retrospect, I think of this as an instance of uniform partisan swing: the president was unpopular nationally, and attitudes about him were negative, relatively speaking, among just about every group. My other Carter story comes from a conversation I had a couple years ago with an economist who’s about my age, a man who said that one reason he and his family moved from town A to town B in his metropolitan area was that, in town B, they didn’t feel like they were the only Republicans on their block. Anyway, this guy described himself as a “
6 0.13869254 747 andrew gelman stats-2011-06-06-Research Directions for Machine Learning and Algorithms
7 0.11945999 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology
8 0.10661512 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again
9 0.098450631 1800 andrew gelman stats-2013-04-12-Too tired to mock
10 0.096767411 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
11 0.095498763 319 andrew gelman stats-2010-10-04-“Who owns Congress”
12 0.08682932 428 andrew gelman stats-2010-11-24-Flawed visualization of U.S. voting maybe has some good features
13 0.080097958 1694 andrew gelman stats-2013-01-26-Reflections on ethicsblogging
14 0.079179674 61 andrew gelman stats-2010-05-31-A data visualization manifesto
15 0.074626923 1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand
16 0.072893888 1061 andrew gelman stats-2011-12-16-CrossValidated: A place to post your statistics questions
17 0.072868638 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?
18 0.071696088 501 andrew gelman stats-2011-01-04-A new R package for fititng multilevel models
19 0.070663363 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.
20 0.070438772 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
topicId topicWeight
[(0, 0.07), (1, -0.033), (2, -0.009), (3, 0.043), (4, 0.042), (5, -0.111), (6, -0.029), (7, 0.028), (8, -0.018), (9, 0.0), (10, 0.003), (11, -0.006), (12, -0.026), (13, -0.007), (14, 0.024), (15, -0.011), (16, 0.002), (17, -0.002), (18, -0.002), (19, 0.011), (20, -0.018), (21, 0.008), (22, -0.001), (23, 0.003), (24, 0.012), (25, -0.014), (26, 0.002), (27, 0.004), (28, -0.033), (29, 0.012), (30, -0.02), (31, -0.001), (32, -0.013), (33, 0.0), (34, -0.024), (35, -0.007), (36, 0.023), (37, -0.01), (38, 0.026), (39, -0.037), (40, 0.017), (41, 0.017), (42, -0.001), (43, 0.087), (44, -0.008), (45, -0.048), (46, -0.007), (47, 0.069), (48, 0.028), (49, 0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.9678213 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
2 0.75632721 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla
3 0.75384611 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying
4 0.7491436 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
Introduction: Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. My reaction: I looked at the first 10. I liked 1, 3, and 5, I didn’t like 2, 7, 8, 9, and 10. I have neutral feelings about 4 and 6. I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram too much into a single image. I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. Also, I don’t see the point of the circular display. That makes no sense at all; it’s a misleading feature. That said, the graphs I dislike can still be fine for their purpose. A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on
5 0.74719006 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again
Introduction: Pointing to some horrible graphs, Kaiser writes, “The Earth Institute needs a graphics adviser.” I agree. The graphs are corporate standard, neither pretty nor innovative enough to qualify as infographics, not informational enough to be good statistical data displays. Some examples include the above exploding pie chart, which, as Kaiser notes, is not merely ugly and ridiculously difficult to read (given that it is conveying only nine data points) but also invites suspicion of its numbers, and pages and pages of graphs that could be better compressed into compact displays (see pages 25-65 of the report). Yes, this is all better than tables of numbers, but I don’t see that much thought went into displaying patterns of information or telling a story. It’s more graph-as-data-dump. To be fair, the report does have a clean scatterplot (on page 65). But, overall, the graphs are not well-integrated with the messages in the text. I feel a little bit bad about this, beca
6 0.7399267 687 andrew gelman stats-2011-04-29-Zero is zero
7 0.72623253 1800 andrew gelman stats-2013-04-12-Too tired to mock
8 0.71654612 319 andrew gelman stats-2010-10-04-“Who owns Congress”
9 0.70753717 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
11 0.67532843 1011 andrew gelman stats-2011-11-15-World record running times vs. distance
12 0.67334223 61 andrew gelman stats-2010-05-31-A data visualization manifesto
13 0.67181379 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
14 0.66789681 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”
17 0.65099198 2038 andrew gelman stats-2013-09-25-Great graphs of names
18 0.64182043 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
19 0.6391567 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays
20 0.63700026 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
topicId topicWeight
[(16, 0.113), (63, 0.473), (99, 0.19)]
simIndex simValue blogId blogTitle
1 0.88687074 739 andrew gelman stats-2011-05-31-When Did Girls Start Wearing Pink?
Introduction: That cute picture is of toddler FDR in a dress, from 1884. Jeanne Maglaty writes: A Ladies’ Home Journal article [or maybe from a different source, according to a commenter] in June 1918 said, “The generally accepted rule is pink for the boys, and blue for the girls. The reason is that pink, being a more decided and stronger color, is more suitable for the boy, while blue, which is more delicate and dainty, is prettier for the girl.” Other sources said blue was flattering for blonds, pink for brunettes; or blue was for blue-eyed babies, pink for brown-eyed babies, according to Paoletti. In 1927, Time magazine printed a chart showing sex-appropriate colors for girls and boys according to leading U.S. stores. In Boston, Filene’s told parents to dress boys in pink. So did Best & Co. in New York City, Halle’s in Cleveland and Marshall Field in Chicago. Today’s color dictate wasn’t established until the 1940s . . . When the women’s liberation movement arrived in the mid-1960s, w
2 0.8666057 568 andrew gelman stats-2011-02-11-Calibration in chess
Introduction: Has anybody done this study yet? I’m curious about the results. Perhaps there’s some chess-playing cognitive psychologist who’d like to collaborate on this?
3 0.85143614 628 andrew gelman stats-2011-03-25-100-year floods
Introduction: According to the National Weather Service: What is a 100 year flood? A 100 year flood is an event that statistically has a 1% chance of occurring in any given year. A 500 year flood has a .2% chance of occurring and a 1000 year flood has a .1% chance of occurring. The accompanying map shows a part of Tennessee that in May 2010 had 1000-year levels of flooding. At first, it seems hard to believe that a 1000-year flood would have just happened to occur last year. But then, this is just a 1000-year flood for that particular place. I don’t really have a sense of the statistics of these events. How many 100-year, 500-year, and 1000-year flood events have been recorded by the Weather Service, and when have they occurred?
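The arithmetic behind the "particular place" remark in the flood excerpt can be sketched in a few lines. This is a rough illustration only: it assumes events are independent across years and across sites (real floods are spatially correlated), and the 30-year horizon and the count of 500 sites are made-up numbers, not figures from the post or the Weather Service.

```python
def prob_at_least_one(event_prob: float, trials: int) -> float:
    """Probability that an event with per-trial probability `event_prob`
    occurs at least once in `trials` independent trials."""
    return 1.0 - (1.0 - event_prob) ** trials


# A single site watched for 30 years (hypothetical horizon),
# with a 1% annual chance of a "100-year" flood.
p_site_30yr = prob_at_least_one(0.01, 30)

# 500 sites (hypothetical count, assumed independent) in a single year,
# each with a 0.1% annual chance of a "1000-year" flood.
p_any_site = prob_at_least_one(0.001, 500)

print(f"100-year flood at one site within 30 years: {p_site_30yr:.3f}")
print(f"1000-year flood somewhere among 500 sites this year: {p_any_site:.3f}")
```

Under these assumptions the first probability is about 26% and the second roughly 39%, so a 1000-year flood appearing somewhere in a given year is not surprising once many places are being watched.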
same-blog 4 0.80057347 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
5 0.79408252 684 andrew gelman stats-2011-04-28-Hierarchical ordered logit or probit
Introduction: Jeff writes: How far off is bglmer and can it handle ordered logit or multinom logit? My reply: bglmer is very close. No ordered logit but I was just talking about it with Sophia today. My guess is that the easiest way to fit a hierarchical ordered logit or multinom logit will be to use stan. For right now I’d recommend using glmer/bglmer to fit the ordered logits in order (e.g., 1 vs. 2,3,4, then 2 vs. 3,4, then 3 vs. 4). Or maybe there’s already a hierarchical multinomial logit in mcmcpack or somewhere?
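The "fit the ordered logits in order" suggestion in the excerpt above can be read as building one binary data set per split (1 vs. 2,3,4, then 2 vs. 3,4, then 3 vs. 4). The sketch below only constructs those data sets; it fits no model. The function name `sequential_splits` and the toy outcomes are hypothetical, and dropping lower categories at each step is one possible reading of the reply, not a confirmed description of it.

```python
from typing import List, Tuple


def sequential_splits(
    y: List[int], levels: List[int]
) -> List[Tuple[int, List[int], List[int]]]:
    """For each level k except the last (levels assumed sorted ascending),
    return (k, row_indices, labels): rows with outcome >= k are kept, and the
    label is 1 if the outcome equals k, 0 if it falls in a higher category."""
    splits = []
    for k in levels[:-1]:
        rows = [i for i, yi in enumerate(y) if yi >= k]
        labels = [1 if y[i] == k else 0 for i in rows]
        splits.append((k, rows, labels))
    return splits


# Toy ordinal outcomes on a 1-4 scale (made-up data).
y = [1, 3, 2, 4, 4, 1, 2]
splits = sequential_splits(y, [1, 2, 3, 4])
for k, rows, labels in splits:
    print(f"level {k} vs. higher: rows={rows} labels={labels}")
```

Each (rows, labels) pair could then be handed to whatever binary multilevel logit fitter is available, with the grouping structure carried along by row index.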
7 0.75488228 313 andrew gelman stats-2010-10-03-A question for psychometricians
8 0.74938428 745 andrew gelman stats-2011-06-04-High-level intellectual discussions in the Columbia statistics department
9 0.69084752 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits
10 0.68040729 1621 andrew gelman stats-2012-12-13-Puzzles of criminal justice
11 0.67926359 1078 andrew gelman stats-2011-12-22-Tables as graphs: The Ramanujan principle
12 0.65155137 293 andrew gelman stats-2010-09-23-Lowess is great
13 0.63550735 1484 andrew gelman stats-2012-09-05-Two exciting movie ideas: “Second Chance U” and “The New Dirty Dozen”
14 0.63432831 1480 andrew gelman stats-2012-09-02-“If our product is harmful . . . we’ll stop making it.”
15 0.62242162 2249 andrew gelman stats-2014-03-15-Recently in the sister blog
16 0.61882925 102 andrew gelman stats-2010-06-21-Why modern art is all in the mind
17 0.60717189 1316 andrew gelman stats-2012-05-12-black and Black, white and White
18 0.59357536 2163 andrew gelman stats-2014-01-08-How to display multinominal logit results graphically?
19 0.59029901 1201 andrew gelman stats-2012-03-07-Inference = data + model
20 0.57202077 286 andrew gelman stats-2010-09-20-Are the Democrats avoiding a national campaign?