andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-2146 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: They didn’t have room for all four graphs of the time-series decomposition so they just displayed the date-of-year graph: They rotated it so the graph fit better on the page. The rotation worked for me, but I was a bit bummed that they put the title and heading of the graph (“The birthrate tends to drop on holidays . . .”) on the left in the Mar-Apr slot, leaving no room to label Leap Day and April Fool’s. I suggested to the graphics people that they put the label at the very top and just shrink the rest of the graph by 5 or 10% so as to not take up any more total space. Then there’d be plenty of space to label Leap Day and April Fool’s. But they didn’t do it; maybe they felt that it wouldn’t look good to have the label right at the top, I dunno.
sentIndex sentText sentNum sentScore
1 They didn’t have room for all four graphs of the time-series decomposition so they just displayed the date-of-year graph: They rotated so the graph fit better on the page. [sent-1, score-1.056]
2 The rotation worked for me, but I was a bit bummed that they put the title and heading of the graph (“The birthrate tends to drop on holidays . [sent-2, score-1.137]
3 ”) on the left in the Mar-Apr slot, leaving no room to label Leap Day and April Fool’s. [sent-5, score-0.928]
4 I suggested to the graphics people that they put the label at the very top and just shrink the rest of the graph by 5 or 10% so as to not take up any more total space. [sent-6, score-1.611]
5 Then there’d be plenty of space to label Leap Day and April Fool’s. [sent-7, score-0.718]
6 But they didn’t do it, maybe they felt that it wouldn’t look good to have the label right at the top, I dunno. [sent-8, score-0.758]
wordName wordTfidf (topN-words)
[('label', 0.492), ('leap', 0.291), ('fool', 0.279), ('graph', 0.273), ('april', 0.255), ('room', 0.214), ('slot', 0.187), ('decomposition', 0.163), ('holidays', 0.163), ('dunno', 0.156), ('shrink', 0.148), ('top', 0.147), ('heading', 0.138), ('displayed', 0.138), ('leaving', 0.135), ('plenty', 0.133), ('tends', 0.132), ('day', 0.13), ('drop', 0.116), ('didn', 0.106), ('put', 0.099), ('suggested', 0.099), ('felt', 0.093), ('space', 0.093), ('rest', 0.092), ('total', 0.092), ('graphics', 0.091), ('four', 0.088), ('left', 0.087), ('title', 0.087), ('worked', 0.081), ('graphs', 0.073), ('wouldn', 0.069), ('fit', 0.064), ('look', 0.051), ('take', 0.051), ('bit', 0.048), ('right', 0.045), ('maybe', 0.044), ('better', 0.043), ('good', 0.033), ('people', 0.027)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
Introduction: They didn’t have room for all four graphs of the time-series decomposition so they just displayed the date-of-year graph: They rotated it so the graph fit better on the page. The rotation worked for me, but I was a bit bummed that they put the title and heading of the graph (“The birthrate tends to drop on holidays . . .”) on the left in the Mar-Apr slot, leaving no room to label Leap Day and April Fool’s. I suggested to the graphics people that they put the label at the very top and just shrink the rest of the graph by 5 or 10% so as to not take up any more total space. Then there’d be plenty of space to label Leap Day and April Fool’s. But they didn’t do it; maybe they felt that it wouldn’t look good to have the label right at the top, I dunno.
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
3 0.1397227 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about EDA and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what
4 0.13105486 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: I wrote, What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to se
Introduction: 1. I remarked that Sharad had a good research article with some ugly graphs. 2. Dan posted Sharad’s graph and some unpleasant alternatives, inadvertently associating me with one of the unpleasant alternatives. Dan was comparing barplots with dotplots. 3. I commented on Dan’s site that, in this case, I’d much prefer a well-designed lineplot. I wrote: There’s a principle in decision analysis that the most important step is not the evaluation of the decision tree but the decision of what options to include in the tree in the first place. I think that’s what’s happening here. You’re seriously limiting yourself by considering the above options, which really are all the same graph with just slight differences in format. What you need to do is break outside the box. (Graph 2, which I think you think is the kind of thing that Gelman would like, indeed is the kind of thing that I think the R gurus like, but I don’t like it at all. It looks clean without actually being clea
6 0.12414076 61 andrew gelman stats-2010-05-31-A data visualization manifesto
7 0.11220133 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
8 0.10407463 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
9 0.10088409 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?
11 0.099212117 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
12 0.099016771 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
14 0.091321558 737 andrew gelman stats-2011-05-30-Memorial Day question
15 0.084122576 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
16 0.083980195 488 andrew gelman stats-2010-12-27-Graph of the year
17 0.083726697 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
18 0.080393627 50 andrew gelman stats-2010-05-25-Looking for Sister Right
20 0.079914257 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
topicId topicWeight
[(0, 0.083), (1, -0.051), (2, -0.015), (3, 0.078), (4, 0.111), (5, -0.138), (6, -0.058), (7, 0.042), (8, -0.035), (9, -0.014), (10, 0.03), (11, 0.023), (12, -0.033), (13, 0.025), (14, 0.011), (15, -0.031), (16, 0.034), (17, -0.015), (18, -0.016), (19, 0.019), (20, 0.02), (21, -0.005), (22, -0.05), (23, -0.029), (24, 0.019), (25, -0.028), (26, -0.044), (27, -0.018), (28, -0.021), (29, -0.001), (30, -0.003), (31, -0.021), (32, -0.067), (33, -0.05), (34, -0.039), (35, -0.022), (36, -0.004), (37, -0.048), (38, -0.023), (39, 0.022), (40, -0.008), (41, 0.01), (42, -0.014), (43, 0.025), (44, -0.025), (45, -0.001), (46, 0.063), (47, -0.005), (48, -0.022), (49, -0.004)]
simIndex simValue blogId blogTitle
same-blog 1 0.99324042 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
2 0.83083165 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat
3 0.82974011 671 andrew gelman stats-2011-04-20-One more time-use graph
Introduction: Evan Hensleigh sends me this redesign of the cross-national time use graph: Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important–good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w
4 0.82023591 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can
5 0.81566179 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
Introduction: I continue to be the go-to guy for bad graphs. Today (i.e., 22 Feb), I received an email from Gary Rosin: I [Rosin] thought you might be interested in this graph showing the decline in median prices of homes since 1997. It exaggerates the proportions by using $150,000 as the floor, rather than zero. Indeed. Here’s the graph: A line plot, rather than a bar plot, would be appropriate here. Also, it’s weird that the headline says “10 years” but the graph has only 6 years. Why not give some perspective and show, say, 30 years?
6 0.81132567 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
7 0.79702574 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
8 0.79633605 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
10 0.79070824 443 andrew gelman stats-2010-12-02-Automating my graphics advice
11 0.79039937 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
12 0.7870788 1011 andrew gelman stats-2011-11-15-World record running times vs. distance
13 0.78682238 488 andrew gelman stats-2010-12-27-Graph of the year
14 0.77825093 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
15 0.77565467 1613 andrew gelman stats-2012-12-09-Hey—here’s a photo of me making fun of a silly infographic (from last year)
16 0.77016497 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
17 0.76411653 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?
19 0.74609929 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
20 0.73851621 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
topicId topicWeight
[(10, 0.047), (22, 0.023), (24, 0.199), (52, 0.023), (55, 0.023), (59, 0.046), (65, 0.171), (72, 0.027), (79, 0.027), (86, 0.093), (99, 0.18)]
simIndex simValue blogId blogTitle
same-blog 1 0.95961905 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
2 0.90966952 1475 andrew gelman stats-2012-08-30-A Stan is Born
Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.
3 0.85910976 2074 andrew gelman stats-2013-10-23-Can’t Stop Won’t Stop Mister P Beatdown
Introduction: Ben Highton and Matt Buttice point us to this response addressing some of the issues Jeff Lax raised in his most recent MRP post. P.S. Jeff replies in comments: It sounds like we’ve converged. They acknowledge MRP performance is significantly better on average than reported in their new paper in PA and yet performance variation in terms of correlation to “truth” remains higher than some might have thought. Cool. I hope this sort of blog exchange can be a model of scientific discussion. Instead of a paper just sitting there by itself, it can be openly explored. Ideally, the published paper would include a link to these discussions of Highton, Buttice, Lax, Phillips, and Ghitza, so that readers would automatically get all this information.
4 0.85874093 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys
Introduction: 25. You are using multilevel regression and poststratification (MRP) on a survey of 1500 people to estimate support for the space program, by state. The model is fit using, as a state-level predictor, the Republican presidential vote in the state, which turns out to have a low correlation with support for the space program. Which of the following statements are basically true? (Indicate all that apply.) (a) For small states, the MRP estimates will be determined almost entirely by the demographic characteristics of the respondents in the sample from that state. (b) For small states, the MRP estimates will be determined almost entirely by the demographic characteristics of the population in that state. (c) Adding a predictor specifically for this model (for example, a measure of per-capita space-program spending in the state) could dramatically improve the estimates of state-level opinion. (d) It would not be appropriate to add a predictor such as per-capita space-program spen
5 0.85033011 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?
Introduction: Nathaniel Egwu writes: I am a PhD student working on machine learning using artificial neural networks . . . Do you have some recent publications related to how one can construct priors depending on the type of input data available for training? I intend to construct a prior distribution for a given trade-off parameter of my non model obtained through training a neural network. At this stage, my argument is due to the fact that Bayesian nonparametric estimation offers some insight on how to proceed on this problem. As I’ve been writing here for a while, I’ve been interested in weakly informative priors. But I have little experience with nonparametric models. Perhaps Aki Vehtari or David Dunson or some other expert on these models can discuss how to set them up with weakly informative priors? This sounds like it could be important to me.
6 0.84979761 671 andrew gelman stats-2011-04-20-One more time-use graph
7 0.84486032 1197 andrew gelman stats-2012-03-04-“All Models are Right, Most are Useless”
8 0.84133327 457 andrew gelman stats-2010-12-07-Whassup with phantom-limb treatment?
10 0.83110338 1426 andrew gelman stats-2012-07-23-Special effects
11 0.82990098 1021 andrew gelman stats-2011-11-21-Don’t judge a book by its title
12 0.82570803 2062 andrew gelman stats-2013-10-15-Last word on Mister P (for now)
13 0.82327771 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys
14 0.81856412 2056 andrew gelman stats-2013-10-09-Mister P: What’s its secret sauce?
15 0.81586695 2076 andrew gelman stats-2013-10-24-Chasing the noise: W. Edwards Deming would be spinning in his grave
16 0.80994582 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
17 0.80759931 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
18 0.80515254 2252 andrew gelman stats-2014-03-17-Ma conférence demain (mardi) à l’École Polytechnique
19 0.80513799 2102 andrew gelman stats-2013-11-15-“Are all significant p-values created equal?”
20 0.80484897 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06