
1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

Meta information for this blog

Source: html

Introduction: After our discussion of visual displays of regression uncertainty, I asked Solomon Hsiang and Lucas Leeman to send me their code. Both of them replied. Solomon wrote: The MATLAB and Stata functions I wrote, as well as the script that replicates my figures, are all posted on my website. Also, I just added options to the main MATLAB function (vwregress.m) to make it display the spaghetti plot (similar to what Lucas did, but a simple bootstrap) and the shaded CI that you suggested (see figs below). They’re good suggestions. Personally, I [Hsiang] like the shaded CI better, since I think that all the visual activity in the spaghetti plot is a little distracting and sometimes adds visual weight in places where I wouldn’t want it. But the option is there in case people like it. Solomon then followed up with: I just thought of this small adjustment to your filled CI idea that seems neat. Cartographers like map projections that conserve area. We can do som
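The "simple bootstrap" behind a spaghetti plot can be sketched as follows. This is a minimal illustration in Python with NumPy, not the posted vwregress.m code: the polynomial fit stands in for whatever smoother the original function uses, and the names bootstrap_fits and pointwise_band, the simulated data, and all default values are assumptions made for the example.

```python
import numpy as np

def bootstrap_fits(x, y, n_boot=200, degree=2, grid_size=50, seed=0):
    """Refit a polynomial on resampled (x, y) pairs.

    Returns the evaluation grid and an (n_boot, grid_size) array with one
    fitted curve per bootstrap replicate (one strand of spaghetti each).
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(x.min(), x.max(), grid_size)
    curves = np.empty((n_boot, grid_size))
    for b in range(n_boot):
        # Resample observations (rows) with replacement.
        idx = rng.integers(0, len(x), size=len(x))
        coefs = np.polyfit(x[idx], y[idx], degree)
        curves[b] = np.polyval(coefs, grid)
    return grid, curves

def pointwise_band(curves, level=0.95):
    """Pointwise percentile band across the bootstrap curves."""
    tail = 100 * (1 - level) / 2
    lower = np.percentile(curves, tail, axis=0)
    upper = np.percentile(curves, 100 - tail, axis=0)
    return lower, upper

# Hypothetical data, chosen only so the example runs.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + x**2 + 4 * rng.normal(size=200)

grid, curves = bootstrap_fits(x, y)
lower, upper = pointwise_band(curves)
```

Drawing every row of curves against grid gives the spaghetti display; filling between lower and upper gives the shaded CI.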

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 m) to make it display the spaghetti plot (similar to what Lucas did, but a simple bootstrap) and the shaded CI that you suggested (see figs below). [sent-5, score-0.716]

2 Personally, I [Hsiang] like the shaded CI better, since I think that all the visual activity in the spaghetti plot is a little distracting and sometimes adds visual weight in places where I wouldn’t want it. [sent-7, score-1.107]

3 Imagine that we squirt out ink uniformly to draw the conditional mean and then smear the ink vertically so that it stretches from the lower confidence bound to the upper confidence bound. [sent-12, score-0.883]

4 In places where the CI band is narrow, this will cause very little spreading of the ink so the CI band will be dark. [sent-13, score-0.912]

5 But in places where the CI band is wide, the ink is smeared a lot so it gets lighter. [sent-14, score-0.636]

6 For any vertical sliver of the CI band (think dx) the amount of ink displayed (integrated along a vertical line) will be constant. [sent-15, score-0.692]

7 But in places where we have a lot of information, the display will have more visual weight. [sent-16, score-0.281]

8 I think this is a somewhat more natural visual-weighting scheme for the CI band than the 1/sqrt(N) that I was using for just the mean regression. [sent-17, score-0.282]

9 After thinking a little more about how to visually-weight the CI bands and the spaghetti plots, I think that maybe we should be careful not to “double count” uncertainty. [sent-19, score-0.394]

10 For example, when the estimates begin to spread out in the spaghetti plots, then the apparent coloration begins to thin out simply because there is a lower density of lines. [sent-20, score-0.341]

11 This isn’t obviously wrong, but it does feel like we’re penalizing the graph in uncertain regions twice for the same thing. [sent-22, score-0.169]

12 Plotting the CI band with the “fixed ink” visual-weighting and the spaghetti plot with solid spaghetti seem like analogs to one another, since the vertically integrated quantity of ink is uniform in both plots. [sent-23, score-1.691]

13 Commands to plot both (using the function I posted) are: x = randn(200,1); e = 4*randn(200,1). [sent-24, score-0.238]

14 5,'CI','FILL',200,[0 0 1]); figure %Solid spaghetti plot without visual weighting: vwregress(x, y, 300, . [sent-29, score-0.728]

15 5,'SPAG','SOLID',200,[0 0 1]); My reply: I know what Solomon is saying about the double-counting; I thought about this too in my original post, which is why I’d liked the idea of the spaghetti plot with additive shading. [sent-30, score-0.579]

16 The status quo in many fields is even worse than that, though, in that it is often standard to put little perpendicular lines at the edges of intervals to make “error bars” which emphasize the endpoints even more. [sent-34, score-0.193]

17 Meanwhile Lucas also responded to my request for code: The original plot is based on a nested model with data I cannot make available. [sent-35, score-0.238]

18 out=split) # The x-range you want to use for plotting later X1 <- cbind(1,xx1) # The Matrix of explanatory variables where # >one variable varies from row to row y. [sent-40, score-0.248]

19 lat2) # translate latent to predicted probability cib <- 0. [sent-47, score-0.252]

20 1 # Define level for CI lb <- round((dim(BETA1)[1] * cib)/2) # lb and ub define which predictions to be plotted ub <- dim(BETA1)[1] - lb # >and are based on "cib" # plot the median prediction plot(xx1,colMeans(y. [sent-48, score-1.426]
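The "fixed ink" idea in sentences 3 to 6 above amounts to making the opacity of each vertical sliver inversely proportional to the width of the CI band there, so that opacity times width is the same at every x. A minimal sketch in Python with NumPy, assuming opacity is a reasonable stand-in for ink density; the function name fixed_ink_alpha and the toy band are hypothetical and are not taken from vwregress.m:

```python
import numpy as np

def fixed_ink_alpha(lower, upper, max_alpha=1.0):
    """Opacity per vertical sliver such that alpha * band width is constant.

    The narrowest part of the band gets max_alpha; wider parts are lighter.
    """
    width = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    if np.any(width <= 0):
        raise ValueError("upper must exceed lower at every x")
    return max_alpha * width.min() / width

# Toy band that doubles in width at each step (illustrative values only).
lower = np.array([-0.5, -1.0, -2.0, -4.0])
upper = np.array([0.5, 1.0, 2.0, 4.0])

alpha = fixed_ink_alpha(lower, upper)
ink = alpha * (upper - lower)  # vertically integrated "ink" per sliver
```

Here alpha halves each time the band doubles in width, and ink is identical in every sliver, which is the conserved-area property the cartography analogy points to.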

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ci', 0.378), ('spaghetti', 0.341), ('ink', 0.327), ('lb', 0.248), ('plot', 0.238), ('band', 0.223), ('ub', 0.199), ('cib', 0.149), ('visual', 0.149), ('solomon', 0.14), ('lwd', 0.136), ('lucas', 0.115), ('dim', 0.112), ('rgb', 0.099), ('vertically', 0.099), ('shaded', 0.091), ('randn', 0.091), ('vwregress', 0.091), ('places', 0.086), ('col', 0.085), ('rnorm', 0.085), ('plots', 0.075), ('matlab', 0.074), ('hsiang', 0.074), ('probit', 0.071), ('vertical', 0.071), ('integrated', 0.068), ('confidence', 0.065), ('plotting', 0.065), ('row', 0.064), ('alpha', 0.063), ('uncertain', 0.063), ('round', 0.062), ('regions', 0.061), ('scheme', 0.059), ('split', 0.059), ('component', 0.059), ('weighting', 0.056), ('variable', 0.055), ('latent', 0.055), ('solid', 0.054), ('little', 0.053), ('predicted', 0.048), ('intervals', 0.048), ('emphasize', 0.047), ('define', 0.046), ('display', 0.046), ('similar', 0.046), ('perpendicular', 0.045), ('penalizing', 0.045)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

2 0.50713634 1478 andrew gelman stats-2012-08-31-Watercolor regression

Introduction: Solomon Hsiang writes: Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges). 1. I realized that if we plot the confidence intervals as a solid color that fades (eg. using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fade near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically. 2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background

3 0.15069553 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

Introduction: Philip Jones writes: As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange). I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the “tails” of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question). I received some interesting replies, and the replies were not unanimous, in fact there is some serious disagreement there! After seeing this disagreement, I naturally thought of you, and whether you might be able to clear this up. Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals. My response: First

4 0.14708231 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

Introduction: Solomon Hsiang writes: One of my colleagues suggested that I send you this very short note that I wrote on a new approach for displaying regression result uncertainty (attached). It’s very simple, and I’ve found it effective in one of my papers where I actually use it, but if you have a chance to glance over it and have any ideas for how to sell the approach or make it better, I’d be very interested to hear them. (Also, if you’ve seen that someone else has already made this point, I’d appreciate knowing that too.) Here’s an example: Hsiang writes: In Panel A, our eyes are drawn outward, away from the center of the display and toward the swirling confidence intervals at the edges. But in Panel B, our eyes are attracted to the middle of the regression line, where the high contrast between the line and the background is sharp and visually heavy. By using visual-weighting, we focus our readers’ attention on those portions of the regression that contain the most inform

5 0.13326749 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

Introduction: Rink Hoekstra writes: A couple of months ago, you were visiting the University of Groningen, and after the talk you gave there I spoke briefly with you about a study that I conducted with Richard Morey, Jeff Rouder and Eric-Jan Wagenmakers. In the study, we found that researchers’ knowledge of how to interpret a confidence interval (CI) was almost as limited as the knowledge of students who had had no inferential statistics course yet. Our manuscript was recently accepted for publication in Psychonomic Bulletin & Review, and it’s now available online (see e.g., here). Maybe it’s interesting to discuss on your blog, especially since CIs are often promoted (for example in the new guidelines of Psychological Science), but apparently researchers seem to have little idea how to interpret them. Given that the confidence percentage of a CI tells something about the procedure rather than about the data at hand, this might be understandable, but, according to us, it’s problematic neve

6 0.12961958 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

7 0.10008679 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

8 0.090924114 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

9 0.088405676 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

10 0.087146819 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

11 0.086019106 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

12 0.085879035 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

13 0.082624882 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

14 0.080872871 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

15 0.080712572 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

16 0.080348842 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines

17 0.078725368 1968 andrew gelman stats-2013-08-05-Evidence on the impact of sustained use of polynomial regression on causal inference (a claim that coal heating is reducing lifespan by 5 years for half a billion people)

18 0.077803791 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

19 0.076337337 209 andrew gelman stats-2010-08-16-EdLab at Columbia’s Teachers’ College

20 0.075696416 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.114), (1, 0.018), (2, 0.034), (3, 0.007), (4, 0.104), (5, -0.073), (6, -0.006), (7, 0.003), (8, 0.006), (9, -0.027), (10, -0.015), (11, -0.012), (12, -0.008), (13, -0.022), (14, -0.018), (15, 0.02), (16, 0.021), (17, -0.008), (18, -0.003), (19, -0.038), (20, 0.058), (21, 0.067), (22, 0.024), (23, -0.017), (24, 0.069), (25, -0.027), (26, 0.023), (27, -0.071), (28, -0.017), (29, 0.041), (30, 0.032), (31, -0.002), (32, -0.054), (33, -0.033), (34, 0.008), (35, -0.037), (36, -0.011), (37, 0.054), (38, 0.018), (39, -0.031), (40, 0.031), (41, 0.003), (42, 0.052), (43, -0.006), (44, 0.012), (45, 0.023), (46, -0.037), (47, 0.058), (48, 0.026), (49, -0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93881875 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

2 0.92842418 1478 andrew gelman stats-2012-08-31-Watercolor regression

3 0.87870085 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

4 0.85337871 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

Introduction: Following up on our recent discussion of visually-weighted displays of uncertainty in regression curves, Lucas Leeman sent in the following two graphs: First, the basic spaghetti-style plot showing inferential uncertainty in the E(y|x) curve: Then, a version using even lighter intensities for the lines that go further from the center, to further de-emphasize the edges: P.S. More (including code!) here.
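One way to get "lighter intensities for the lines that go further from the center" is to rank the simulated predictions at each x, drop the outer tails, and let opacity fall off with distance from the median rank. The sketch below is in Python with NumPy rather than the R of the original, and it is an assumption about the general approach, not a reconstruction of Lucas's script. The cib, lb and ub names echo the R fragment quoted in the main post; the linear fade, the pointwise sorting, and the alpha limits are hypothetical choices.

```python
import numpy as np

def central_lines_with_alpha(draws, cib=0.1, min_alpha=0.05, max_alpha=0.6):
    """Keep the central (1 - cib) share of simulated lines and fade the rest.

    draws: (n_draws, n_grid) array of simulated predictions, n_draws > 1.
    Returns the kept lines (sorted pointwise) and one opacity per line.
    """
    draws = np.asarray(draws, dtype=float)
    n = draws.shape[0]
    # Sort pointwise: row k holds the k-th smallest prediction at each x.
    ordered = np.sort(draws, axis=0)
    lb = int(round(n * cib / 2))   # first rank that is plotted
    ub = n - lb                    # one past the last rank that is plotted
    kept = ordered[lb:ub]
    ranks = np.arange(lb, ub)
    center = (n - 1) / 2
    # 0 at the median rank, 1 at the most extreme rank.
    dist = np.abs(ranks - center) / center
    alpha = max_alpha - (max_alpha - min_alpha) * dist
    return kept, alpha

# Hypothetical simulated predictions, used only to exercise the function.
rng = np.random.default_rng(0)
draws = rng.normal(size=(100, 20))
kept, alpha = central_lines_with_alpha(draws)
```

Each row of kept would then be drawn with its own opacity from alpha, so lines near the median dominate and the edges recede.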

5 0.7273885 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

Introduction: By popular demand, here’s my R script for the time-use graphs: # The data a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0) a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1) a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4) a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7) a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4) a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1) a <- rbind(a1,a2,a3,a4,a5,a6) avg <- colMeans (a) avg.array <- t (array (avg, rev(dim(a)))) diff <- a - avg.array country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey") # The line plots par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30") for (i in 1:6){ plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n") lines (1:6, diff[i,], col="blue") points (1:6, diff[i,], pch=19, col="black") if (i>3){ axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2) axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),

6 0.70493954 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

7 0.69789952 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines

8 0.65675378 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

9 0.64491105 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

10 0.64012206 293 andrew gelman stats-2010-09-23-Lowess is great

11 0.63559127 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”

12 0.62701726 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

13 0.61697918 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

14 0.61671603 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

15 0.61662221 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

16 0.60816169 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

17 0.59923816 2076 andrew gelman stats-2013-10-24-Chasing the noise: W. Edwards Deming would be spinning in his grave

18 0.59806919 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

19 0.59722495 1716 andrew gelman stats-2013-02-09-iPython Notebook

20 0.59318221 296 andrew gelman stats-2010-09-26-A simple semigraphic display

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.028), (4, 0.15), (16, 0.058), (21, 0.03), (24, 0.086), (30, 0.013), (34, 0.025), (36, 0.172), (76, 0.013), (86, 0.029), (90, 0.014), (95, 0.019), (99, 0.174)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.87589699 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

2 0.84535486 176 andrew gelman stats-2010-08-02-Information is good

Introduction: Washington Post and Slate reporter Anne Applebaum wrote a dismissive column about Wikileaks, saying that they “offer nothing more than raw data.” Applebaum argues that “The notion that the Internet can replace traditional news-gathering has just been revealed to be a myth. . . . without more journalism, more investigation, more work, these documents just don’t matter that much.” Fine. But don’t undervalue the role of mere data! The usual story is that we don’t get to see the raw data underlying newspaper stories. Wikileaks and other crowdsourced data can be extremely useful, whether or not they replace “traditional news-gathering.”

3 0.83022463 1797 andrew gelman stats-2013-04-10-“Proposition and experiment”

Introduction: Anna Lena Phillips writes: I. Many people will not, of their own accord, look at a poem. II. Millions of people will, of their own accord, spend lots and lots of time looking at photographs of cats. III. Therefore, earlier this year, I concluded that the best strategy for increasing the number of viewers for poems would be to print them on top of photographs of cats. IV. I happen to like looking at both poems and cats. V. So this is, for me, a win-win situation. VI. Fortunately, my own cat is a patient model, and (if I am to be believed) quite photogenic. VII. The aforementioned cat is Tisko Tansi, small hero. VII. Thus I present to you (albeit in digital rather than physical form) an Endearments broadside, featuring a poem that originally appeared in BlazeVOX spring 2011. VIII. If you want to share a copy of this image, please ask first. If you want a real copy, you can ask about that too. She follows up with an image of a cat, on which is superimposed a short

4 0.82527339 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing

Introduction: [Update: Revised given comments from Wingfeet, Andrew and germo. Thanks! I'd mistakenly translated the dlnorm priors in the first version --- amazing what a difference the priors make. I also escaped the less-than and greater-than signs in the constraints in the model so they're visible. I also updated to match the thin=2 output of JAGS.] We’re going to be starting a Stan “model of the P” (for some time period P) column, so I thought I’d kick things off with one of my own. I’ve been following the Wingvoet blog, the author of which is identified only by the Blogger handle Wingfeet; a couple of days ago this lovely post came out: PK calculation of IV and oral dosing in JAGS Wingfeet’s post implemented an answer to question 6 from chapter 6 of problem from Rowland and Tozer’s 2010 book, Clinical Pharmacokinetics and Pharmacodynamics, Fourth edition, Lippincott, Williams & Wilkins. So in the grand tradition of using this blog to procrastinate, I thought I’d t

5 0.82394373 1476 andrew gelman stats-2012-08-30-Stan is fast

Introduction: 10,000 iterations for 4 chains on the (precompiled) efficiently-parameterized 8-schools model: > date () [1] "Thu Aug 30 22:12:53 2012" > fit3 <- stan (fit=fit2, data = schools_dat, iter = 1e4, n_chains = 4) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1). Iteration: 10000 / 10000 [100%] (Sampling) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2). Iteration: 10000 / 10000 [100%] (Sampling) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3). Iteration: 10000 / 10000 [100%] (Sampling) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4). Iteration: 10000 / 10000 [100%] (Sampling) > date () [1] "Thu Aug 30 22:12:55 2012" > print (fit3) Inference for Stan model: anon_model. 4 chains: each with iter=10000; warmup=5000; thin=1; 10000 iterations saved. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat mu 8.0 0.1 5.1 -2.0 4.7 8.0 11.3 18.4 4032 1 tau 6.7 0.1 5.6 0.3 2.5 5.4 9.3 21.2 2958 1 eta[1] 0.4 0.0 0.9 -1.5 -0

6 0.8192606 1478 andrew gelman stats-2012-08-31-Watercolor regression

7 0.80578566 551 andrew gelman stats-2011-02-02-Obama and Reagan, sitting in a tree, etc.

8 0.78765529 1618 andrew gelman stats-2012-12-11-The consulting biz

9 0.78758276 947 andrew gelman stats-2011-10-08-GiveWell sez: Cost-effectiveness of de-worming was overstated by a factor of 100 (!) due to a series of sloppy calculations

10 0.78408903 370 andrew gelman stats-2010-10-25-Who gets wedding announcements in the Times?

11 0.76240659 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?

12 0.76164532 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”

13 0.75728285 1847 andrew gelman stats-2013-05-08-Of parsing and chess

14 0.75594282 1918 andrew gelman stats-2013-06-29-Going negative

15 0.75443244 883 andrew gelman stats-2011-09-01-Arrow’s theorem update

16 0.75206709 1217 andrew gelman stats-2012-03-17-NSF program “to support analytic and methodological research in support of its surveys”

17 0.75023937 1919 andrew gelman stats-2013-06-29-R sucks

18 0.74955881 238 andrew gelman stats-2010-08-27-No radon lobby

19 0.74703699 415 andrew gelman stats-2010-11-15-The two faces of Erving Goffman: Subtle observer of human interactions, and Smug organzation man

20 0.74379969 1898 andrew gelman stats-2013-06-14-Progress! (on the understanding of the role of randomization in Bayesian inference)