Visually weighting regression displays (andrew gelman stats-2012-08-09, post 1452)
Solomon Hsiang writes:

One of my colleagues suggested that I send you this very short note that I wrote on a new approach for displaying regression result uncertainty (attached). It’s very simple, and I’ve found it effective in one of my papers where I actually use it, but if you have a chance to glance over it and have any ideas for how to sell the approach or make it better, I’d be very interested to hear them. (Also, if you’ve seen that someone else has already made this point, I’d appreciate knowing that too.)

Here’s an example. Hsiang writes:

In Panel A, our eyes are drawn outward, away from the center of the display and toward the swirling confidence intervals at the edges. But in Panel B, our eyes are attracted to the middle of the regression line, where the high contrast between the line and the background is sharp and visually heavy. By using visual weighting, we focus our readers’ attention on those portions of the regression that contain the most information and where the findings are strongest. Furthermore, when we attempt to look at the edges of Panel B, we actually feel a little uncertain, as if we are trying to make out the shape of something through a fog. This is a good thing, because everyone knows that feeling, even if we have no statistical training (or ignore that training when it’s inconvenient). By aligning the feeling of uncertainty with actual statistical uncertainty, we can more intuitively and more effectively communicate uncertainty in our results to a broader set of viewers.
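To make the idea concrete, here is a minimal sketch of a visually-weighted fit. It is not taken from Hsiang’s note (his own code is not shown here); it just fits an ordinary least-squares line to toy data and draws the fitted line in short segments whose opacity scales with the local precision of the fit, i.e., the reciprocal of the pointwise standard error of E(y|x), so the line fades out where the fit is uncertain. The data, weighting rule, and variable names are all illustrative assumptions.

```python
# Minimal sketch of a visually-weighted regression line (illustrative, not Hsiang's code):
# the fitted line is drawn in short segments whose opacity scales with the local
# precision of the fit, so it fades out where the standard error is large.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)   # toy data (assumed)

# Ordinary least-squares fit and covariance of the coefficient estimates.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (x.size - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)

# Fitted mean and its pointwise standard error on a grid.
grid = np.linspace(x.min(), x.max(), 300)
G = np.column_stack([np.ones_like(grid), grid])
fit = G @ beta
se = np.sqrt(np.einsum("ij,jk,ik->i", G, cov, G))

# Visual weight: darker (more opaque) where the fitted mean is more precise.
alpha = (1.0 / se) / (1.0 / se).max()
plt.scatter(x, y, s=8, color="0.7")
for i in range(len(grid) - 1):
    plt.plot(grid[i:i + 2], fit[i:i + 2], color="black", alpha=alpha[i], lw=2)
plt.show()
```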
My reply: But, once you’re making those edges blurry, couldn’t you also spread them out, to get the best of both worlds, the uncertainty bounds and the visual weighting? Suppose that, instead of displaying the fitted curve and error bounds, you make a spaghetti-style plot showing, say, 1000 draws of the regression curve from the uncertainty distribution. Usually when we do this we just let the lines overwrite, but suppose that instead we make each of the 1000 lines really light gray and then increase the darkness when two or more lines overlap. Then you’ll get a graph where the curve is automatically darker where the uncertainty distribution is more concentrated and lighter where the distribution is more vague.
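Here is a sketch of that spaghetti idea, again purely illustrative: a simple pairs bootstrap stands in for draws from the uncertainty distribution of the curve, and each refit line is drawn with a very low alpha so that the display darkens automatically wherever many draws overlap.

```python
# Sketch of the spaghetti idea: 1000 draws of the regression curve, each drawn
# as a nearly transparent line; the ink accumulates where many curves agree,
# so the display is darker where the fit is better pinned down.
# (A pairs bootstrap stands in for draws from the curve's uncertainty distribution.)
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 200))
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)   # toy data (assumed)

grid = np.linspace(x.min(), x.max(), 200)
for _ in range(1000):
    idx = rng.integers(0, x.size, x.size)      # bootstrap resample of (x, y) pairs
    b = np.polyfit(x[idx], y[idx], deg=1)      # refit a straight line
    plt.plot(grid, np.polyval(b, grid), color="black", alpha=0.01, lw=1)
plt.scatter(x, y, s=8, color="C0", zorder=3)
plt.show()
```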
You don’t actually need to draw the 1000 lines; instead you can do it analytically and just plot the color intensities in proportion to the distributions. The result will look something like Hsiang’s visually-weighted regression, but more spread out where the curve is more uncertain.
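A sketch of that analytic shortcut, under the assumption that the uncertainty in E(y|x) at each x is approximately normal: instead of overplotting draws, compute that normal density on a grid over (x, y) and map it directly to ink intensity, normalizing each column so every x gets the same total amount of ink. With this normalization, the ink at the most uncertain x values is spread thinly over a wide band, which is roughly the effect described above. The normalization choice and grid sizes are illustrative.

```python
# Sketch of the analytic version: rather than overplotting 1000 lines, compute,
# for each x, the (approximately normal) uncertainty distribution of the fitted
# value E(y|x) and map its density directly to ink intensity.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)   # toy data (assumed)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (x.size - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)

xg = np.linspace(x.min(), x.max(), 300)
G = np.column_stack([np.ones_like(xg), xg])
fit = G @ beta
se = np.sqrt(np.einsum("ij,jk,ik->i", G, cov, G))   # se of E(y|x) on the grid

yg = np.linspace(y.min(), y.max(), 300)
# Density of E(y|x) on the (x, y) grid: one normal curve per column,
# normalized per column so each x gets the same total ink.
dens = np.exp(-0.5 * ((yg[:, None] - fit[None, :]) / se[None, :]) ** 2)
dens /= dens.sum(axis=0, keepdims=True)

plt.imshow(dens, origin="lower", aspect="auto", cmap="Greys",
           extent=[xg.min(), xg.max(), yg.min(), yg.max()])
plt.scatter(x, y, s=8, color="C0")
plt.show()
```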
Related posts:
- A follow-up with Lucas Leeman’s spaghetti-style graphs of inferential uncertainty in the E(y|x) curve, including code
- Watercolor regression (andrew gelman stats-2012-08-31, 1478)
- Graphs showing regression uncertainty: the code! (andrew gelman stats-2012-08-26, 1470)
- Instead of “confidence interval,” let’s say “uncertainty interval” (andrew gelman stats-2010-12-21, 480)
- Let’s play “Guess the smoother”! (andrew gelman stats-2012-04-26, 1283)