After our discussion of visual displays of regression uncertainty, I asked Solomon Hsiang and Lucas Leeman to send me their code. Both of them replied. Solomon wrote: The matlab and stata functions I wrote, as well as the script that replicates my figures, are all posted on my website. Also, I just added options to the main matlab function (vwregress.m) to make it display the spaghetti plot (similar to what Lucas did, but a simple bootstrap) and the shaded CI that you suggested (see figs below). They're good suggestions. Personally, I [Hsiang] like the shaded CI better, since I think that all the visual activity in the spaghetti plot is a little distracting and sometimes adds visual weight in places where I wouldn't want it. But the option is there in case people like it. Solomon then followed up with: I just thought of this small adjustment to your filled CI idea that seems neat. Cartographers like map projections that conserve area. We can do som

1 Also, I just added options to the main matlab function (vwregress.m) to make it display the spaghetti plot (similar to what Lucas did, but a simple bootstrap) and the shaded CI that you suggested (see figs below). [sent-5, score-0.716]

2 Personally, I [Hsiang] like the shaded CI better, since I think that all the visual activity in the spaghetti plot is a little distracting and sometimes adds visual weight in places where I wouldn’t want it. [sent-7, score-1.107]

3 Imagine that we squirt out ink uniformly to draw the conditional mean and then smear the ink vertically so that it stretches from the lower confidence bound to the upper confidence bound. [sent-12, score-0.883]

4 In places where the CI band is narrow, this will cause very little spreading of the ink so the CI band will be dark. [sent-13, score-0.912]

5 But in places where the CI band is wide, the ink is smeared a lot so it gets lighter. [sent-14, score-0.636]

6 For any vertical sliver of the CI band (think dx) the amount of ink displayed (integrated along a vertical line) will be constant. [sent-15, score-0.692]

7 But in places where we have a lot of information, the display will have more visual weight. [sent-16, score-0.281]

8 I think this is a somewhat more natural visual-weighting scheme for the CI band than the 1/sqrt(N) that I was using for just the mean regression. [sent-17, score-0.282]
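The "fixed ink" idea in sentences 3 to 6 can be illustrated with a minimal sketch. This is not the vwregress.m implementation; the function name `fixed_ink_alpha` and the `max_alpha` parameter are hypothetical, and the sketch assumes that "ink" in a vertical sliver is simply opacity times band height.

```python
def fixed_ink_alpha(lower, upper, max_alpha=1.0):
    """Return one opacity per x-position so that opacity * band height is constant.

    lower, upper: CI bounds evaluated on the same x grid.
    max_alpha: opacity assigned to the narrowest part of the band (hypothetical knob).
    """
    widths = [u - l for l, u in zip(lower, upper)]
    if any(w <= 0 for w in widths):
        raise ValueError("each upper bound must exceed its lower bound")
    narrowest = min(widths)
    # The narrowest sliver is darkest; wider slivers are proportionally lighter.
    return [max_alpha * narrowest / w for w in widths]


# A band that widens from 1 to 4 units as we move away from the data.
lower = [-0.5, -1.0, -2.0]
upper = [0.5, 1.0, 2.0]
alphas = fixed_ink_alpha(lower, upper)

# Ink per vertical sliver: opacity times height. It should not vary with x.
ink = [a * (u - l) for a, l, u in zip(alphas, lower, upper)]
```

Under these assumptions the opacity falls as 1/width, so a band four times as wide is drawn four times lighter, which is the "smeared ink" behavior described above.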

9 After thinking a little more about how to visually weight the CI bands and the spaghetti plots, I think that maybe we should be careful not to “double count” uncertainty. [sent-19, score-0.394]

10 For example, when the estimates begin to spread out in the spaghetti plots, then the apparent coloration begins to thin out simply because there is a lower density of lines. [sent-20, score-0.341]

11 This isn’t obviously wrong, but it does feel like we’re penalizing the graph in uncertain regions twice for the same thing. [sent-22, score-0.169]

12 Plotting the CI band with the “fixed ink” visual-weighting and the spaghetti plot with solid spaghetti seem like analogs to one another, since the vertically integrated quantity of ink is uniform in both plots. [sent-23, score-1.691]
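As a rough illustration of where the spaghetti lines come from, here is a sketch of a simple bootstrap. It swaps in a straight-line least-squares fit for whatever smoother vwregress.m actually uses, and the names `ols_line` and `bootstrap_lines` are hypothetical. Each returned (intercept, slope) pair would be drawn as one solid line of equal opacity.

```python
import random


def ols_line(xs, ys):
    """Least-squares intercept and slope for a straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_lines(xs, ys, n_boot=200, seed=0):
    """Refit the line on n_boot resamples of the (x, y) pairs, drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    lines = []
    while len(lines) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        if len(set(boot_x)) < 2:
            continue  # degenerate resample: a line cannot be fit to a single x value
        lines.append(ols_line(boot_x, [ys[i] for i in idx]))
    return lines


# Noise-free data on y = 2x + 1: every resample recovers the same line.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
lines = bootstrap_lines(xs, ys, n_boot=50)
```

With noisy data the refitted lines fan out where the data are sparse. Because every line carries the same amount of ink, the total ink crossing any vertical sliver stays roughly constant, which is the analogy to the fixed-ink band drawn in the sentence above.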

13 Commands to plot both (using the function I posted) are: x = randn(200,1); e = 4*randn(200,1). [sent-24, score-0.238]

14 5,'CI','FILL',200,[0 0 1]); figure %Solid spaghetti plot without visual weighting: vwregress(x, y, 300, . [sent-29, score-0.728]

15 5,'SPAG','SOLID',200,[0 0 1]); My reply: I know what Solomon is saying about the double-counting; I thought about this too in my original post, which is why I’d liked the idea of the spaghetti plot with additive shading. [sent-30, score-0.579]

16 The status quo in many fields is even worse than that, though, in that it is often standard to put little perpendicular lines at the edges of intervals to make “error bars” which emphasize the endpoints even more. [sent-34, score-0.193]

17 Meanwhile Lucas also responded to my request for code: The original plot is based on a nested model with data I cannot make available. [sent-35, score-0.238]

18 out=split) # The x-range you want to use for plotting later X1 <- cbind(1,xx1) # The Matrix of explanatory variables where # >one variable varies from row to row y. [sent-40, score-0.248]

19 lat2) # translate latent to predicted probability cib <- 0. [sent-47, score-0.252]

20 1 # Define level for CI lb <- round((dim(BETA1)[1] * cib)/2) # lb and ub define which predictions to be plotted ub <- dim(BETA1)[1] - lb # >and are based on "cib" # plot the median prediction plot(xx1,colMeans(y. [sent-48, score-1.426]
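The `lb` and `ub` bookkeeping in the R fragment above can be sketched in Python. The rest of Lucas's script is truncated here, so how `lb` and `ub` are actually used is an assumption: this sketch trims `lb` simulated curves from each tail at every x value. The function name `percentile_band` and the layout of `draws` are hypothetical.

```python
def percentile_band(draws, cib=0.1):
    """Pointwise band covering the central (1 - cib) share of simulated curves.

    draws: list of simulated prediction curves, each a list over the same x grid.
    cib: total tail mass to exclude, mirroring the `cib` level in the fragment above.
    """
    n = len(draws)
    lb = round(n * cib / 2)  # number of curves trimmed from each tail
    ub = n - lb
    if lb >= ub:
        raise ValueError("cib is too large for this number of draws")
    lower, upper = [], []
    for j in range(len(draws[0])):
        column = sorted(curve[j] for curve in draws)
        lower.append(column[lb])      # skip the lb smallest predictions
        upper.append(column[ub - 1])  # skip the lb largest predictions
    return lower, upper


# 20 toy "simulated curves" evaluated at two x positions.
draws = [[float(i), float(2 * i)] for i in range(20)]
lower, upper = percentile_band(draws, cib=0.1)
```

With 20 draws and `cib = 0.1`, one curve is dropped from each tail, so the band runs from the second-smallest to the second-largest prediction at each x.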


2 0.50713634 1478 andrew gelman stats-2012-08-31-Watercolor regression

Introduction: Solomon Hsiang writes: Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges). 1. I realized that if we plot the confidence intervals as a solid color that fades (e.g., using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fades near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically. 2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background

3 0.15069553 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

Introduction: Philip Jones writes: As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange). I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the “tails” of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question). I received some interesting replies, and the replies were not unanimous, in fact there is some serious disagreement there! After seeing this disagreement, I naturally thought of you, and whether you might be able to clear this up. Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals. My response: First

4 0.14708231 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

Introduction: Solomon Hsiang writes: One of my colleagues suggested that I send you this very short note that I wrote on a new approach for displaying regression result uncertainty (attached). It’s very simple, and I’ve found it effective in one of my papers where I actually use it, but if you have a chance to glance over it and have any ideas for how to sell the approach or make it better, I’d be very interested to hear them. (Also, if you’ve seen that someone else has already made this point, I’d appreciate knowing that too.) Here’s an example: Hsiang writes: In Panel A, our eyes are drawn outward, away from the center of the display and toward the swirling confidence intervals at the edges. But in Panel B, our eyes are attracted to the middle of the regression line, where the high contrast between the line and the background is sharp and visually heavy. By using visual-weighting, we focus our readers’ attention on those portions of the regression that contain the most inform

5 0.13326749 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

Introduction: Rink Hoekstra writes: A couple of months ago, you were visiting the University of Groningen, and after the talk you gave there I spoke briefly with you about a study that I conducted with Richard Morey, Jeff Rouder and Eric-Jan Wagenmakers. In the study, we found that researchers’ knowledge of how to interpret a confidence interval (CI) was almost as limited as the knowledge of students who had had no inferential statistics course yet. Our manuscript was recently accepted for publication in Psychonomic Bulletin & Review, and it’s now available online (see e.g., here). Maybe it’s interesting to discuss on your blog, especially since CIs are often promoted (for example in the new guidelines of Psychological Science), but apparently researchers seem to have little idea how to interpret them. Given that the confidence percentage of a CI tells something about the procedure rather than about the data at hand, this might be understandable, but, according to us, it’s problematic neve

6 0.12961958 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

7 0.10008679 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

8 0.090924114 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

9 0.088405676 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

10 0.087146819 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

11 0.086019106 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

12 0.085879035 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

13 0.082624882 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

14 0.080872871 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

15 0.080712572 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

16 0.080348842 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines

17 0.078725368 1968 andrew gelman stats-2013-08-05-Evidence on the impact of sustained use of polynomial regression on causal inference (a claim that coal heating is reducing lifespan by 5 years for half a billion people)

18 0.077803791 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

19 0.076337337 209 andrew gelman stats-2010-08-16-EdLab at Columbia’s Teachers’ College

20 0.075696416 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

