
134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”


meta infos for this blog

Source: html

Introduction: John Keltz writes: What do you think about curved lines connecting discrete data-points? (For example, here .) The problem with the smoothed graph is it seems to imply that something is going on in between the discrete data points, which is false. However, the straight-line version isn’t representing actual events either- it is just helping the eye connect each point. So maybe the curved version is also just helping the eye connect each point, and looks better doing it. In my own work (value-added modeling of achievement test scores) I use straight lines, but I guess I am not too bothered when people use smoothing. I’d appreciate your input. Regular readers will be unsurprised that, yes, I have an opinion on this one, and that this opinion is connected to some more general ideas about statistical graphics. In general I’m not a fan of the curved lines. They’re ok, but I don’t really see the point. I can connect the dots just fine without the curves. The more general id


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 John Keltz writes: What do you think about curved lines connecting discrete data-points? [sent-1, score-1.066]

2 The problem with the smoothed graph is it seems to imply that something is going on in between the discrete data points, which is false. [sent-3, score-0.3]

3 However, the straight-line version isn’t representing actual events either- it is just helping the eye connect each point. [sent-4, score-0.793]

4 So maybe the curved version is also just helping the eye connect each point, and looks better doing it. [sent-5, score-1.34]

5 In my own work (value-added modeling of achievement test scores) I use straight lines, but I guess I am not too bothered when people use smoothing. [sent-6, score-0.531]

6 Regular readers will be unsurprised that, yes, I have an opinion on this one, and that this opinion is connected to some more general ideas about statistical graphics. [sent-8, score-0.452]

7 I can connect the dots just fine without the curves. [sent-11, score-0.289]

8 The more general idea is that the line, whether curved or straight, serves two purposes: first, it’s an (interpolative) estimate of some continuous curve; second, it makes short-term trends apparent in a way that’s harder to see from the points alone. [sent-12, score-1.206]

9 The straight line also serves a third purpose, which is to make clear the reliance on the original data. [sent-13, score-0.77]

10 To put it another way, if I’m doing a simple interpolation, I want it to be clear that I’m doing a simple interpolation–and the straight lines make this clear. [sent-14, score-0.68]

11 To me, the added ambiguity is more of a cost than the smoother interpolation is a benefit. [sent-17, score-0.618]
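
The trade-off described in these sentences can be illustrated with a small sketch. It is not taken from the post: the data values are hypothetical, and the "curved" line is assumed to be a uniform Catmull-Rom spline, one common way charting tools draw smoothed connectors. The straight-line version stays inside the range of the observed values, while the curved version passes through every point yet dips below and rises above the data in between, which is the kind of implied in-between behavior the post objects to.

```python
def linear(ys, i, t):
    """Straight-line interpolation between points i and i+1, with t in [0, 1]."""
    return ys[i] + t * (ys[i + 1] - ys[i])


def catmull_rom(ys, i, t):
    """Uniform Catmull-Rom interpolation between points i and i+1.

    End segments reuse the first/last value as the missing neighbour
    (an assumption made here to keep the sketch short).
    """
    p0 = ys[max(i - 1, 0)]
    p1, p2 = ys[i], ys[i + 1]
    p3 = ys[min(i + 2, len(ys) - 1)]
    return 0.5 * (
        2 * p1
        + (p2 - p0) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3
    )


def sample(curve, ys, steps=10):
    """Evaluate a curve on an evenly spaced grid over every segment."""
    return [
        curve(ys, i, k / steps)
        for i in range(len(ys) - 1)
        for k in range(steps + 1)
    ]


# Hypothetical discrete observations: flat, one jump, flat again.
ys = [0.0, 0.0, 1.0, 1.0]
straight = sample(linear, ys)
curved = sample(catmull_rom, ys)

print("straight range:", min(straight), max(straight))
print("curved range:  ", min(curved), max(curved))
```

Both versions reproduce the data at the observed points; only the curved one suggests a dip before the jump and a bump after it, neither of which is in the data.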


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('curved', 0.638), ('interpolation', 0.315), ('straight', 0.255), ('connect', 0.212), ('lines', 0.212), ('serves', 0.16), ('eye', 0.151), ('helping', 0.144), ('discrete', 0.129), ('reliance', 0.105), ('smoothed', 0.105), ('smoother', 0.105), ('unsurprised', 0.105), ('version', 0.102), ('opinion', 0.094), ('ambiguity', 0.093), ('general', 0.089), ('connecting', 0.087), ('line', 0.084), ('achievement', 0.078), ('dots', 0.077), ('representing', 0.075), ('apparent', 0.075), ('purposes', 0.074), ('simple', 0.072), ('connected', 0.07), ('curve', 0.07), ('clear', 0.069), ('imply', 0.066), ('fan', 0.065), ('bothered', 0.065), ('scores', 0.063), ('trends', 0.062), ('harder', 0.062), ('events', 0.062), ('points', 0.061), ('purpose', 0.06), ('regular', 0.059), ('continuous', 0.059), ('appreciate', 0.056), ('third', 0.055), ('benefit', 0.055), ('added', 0.053), ('cost', 0.052), ('maybe', 0.051), ('actual', 0.047), ('use', 0.045), ('test', 0.043), ('looks', 0.042), ('original', 0.042)]
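
A list of (word, weight) pairs like the one above could be produced along the following lines. This is only a sketch under stated assumptions: the page does not say which tf-idf variant or tokenizer was used, so the code assumes raw term counts times log(N/df) with L2 normalization and whitespace tokenization, and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document (L2-normalized tf-idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {w: c * math.log(n_docs / df[w]) for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        vectors.append({w: v / norm for w, v in weights.items()})
    return vectors


def top_words(vector, n=5):
    """The n highest-weighted words, in the (word, weight) layout used above."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical toy corpus, not the real blog posts.
docs = [
    "curved lines connect discrete points curved interpolation",
    "straight lines connect points",
    "lowess smoother regression points",
]
vectors = tfidf_vectors(docs)
print(top_words(vectors[0], 3))
```

Words that occur in every document (here "points") get weight zero, while words concentrated in a single document (here "curved") rise to the top, which matches the shape of the list above.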

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”


2 0.11081477 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet

Introduction: Kaiser points to this infoviz from MIT’s Technology Review: Kaiser writes: What makes the designer want to tilt the reader’s head? This chart is unreadable. It also fails the self-sufficiency test. All 13 data points are printed onto the chart. You really don’t need the axis, and the gridlines. A further design flaw is the use of signposts. Our eyes are drawn to the hexagons containing the brand icons but the data is at the other end of the signpost, where it is planted on the surface! Here is a sketch of something not as cute: I [Kaiser] expressed time as years . . . The mobile-related entities are labelled red. The dots could be replaced by the hexagonal brand icons. I agree with all of Kaiser’s criticisms, and I agree that his graph is, from the statistical perspective, a zillion times better than what was published. On the other hand, unusual images can get attention. Recall the famous/notorious clock plot from Florence Nightingale . This is why I’ve move

3 0.093590118 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

Introduction: Solomon Hsiang writes : One of my colleagues suggested that I send you this very short note that I wrote on a new approach for displaying regression result uncertainty (attached). It’s very simple, and I’ve found it effective in one of my papers where I actually use it, but if you have a chance to glance over it and have any ideas for how to sell the approach or make it better, I’d be very interested to hear them. (Also, if you’ve seen that someone else has already made this point, I’d appreciate knowing that too.) Here’s an example: Hsiang writes: In Panel A, our eyes are drawn outward, away from the center of the display and toward the swirling confidence intervals at the edges. But in Panel B, our eyes are attracted to the middle of the regression line, where the high contrast between the line and the background is sharp and visually heavy. By using visual-weighting, we focus our readers’s attention on those portions of the regression that contain the most inform

4 0.083018593 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

Introduction: Andre de Boer writes: In my profession as a risk manager I encountered this graph: I can’t figure out what kind of regression this is, would you be so kind to enlighten me? The points represent (maturity,yield) of bonds. My reply: That’s a fun problem, reverse-engineering a curve fit! My first guess is lowess, although it seems too flat and asympoty on the right side of the graph to be lowess. Maybe a Gaussian process? Looks too smooth to be a spline. I guess I’ll go with my original guess, on the theory that lowess is the most accessible smoother out there, and if someone fit something much more complicated they’d make more of a big deal about it. On the other hand, if the curve is an automatic output of some software (Excel? Stata?) then it could be just about anything. Does anyone have any ideas?

5 0.080916762 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

6 0.079789959 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

7 0.078227013 217 andrew gelman stats-2010-08-19-The “either-or” fallacy of believing in discrete models: an example of folk statistics

8 0.078083843 293 andrew gelman stats-2010-09-23-Lowess is great

9 0.073579036 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

10 0.073250629 2079 andrew gelman stats-2013-10-27-Uncompressing the concept of compressed sensing

11 0.072279759 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

12 0.070048563 348 andrew gelman stats-2010-10-17-Joanne Gowa scooped me by 22 years in my criticism of Axelrod’s Evolution of Cooperation

13 0.0692036 1529 andrew gelman stats-2012-10-11-Bayesian brains?

14 0.06716916 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

15 0.065237045 266 andrew gelman stats-2010-09-09-The future of R

16 0.065092131 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

17 0.064245157 2132 andrew gelman stats-2013-12-13-And now, here’s something that would make Ed Tufte spin in his . . . ummm, Tufte’s still around, actually, so let’s just say I don’t think he’d like it!

18 0.06279622 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?

19 0.061227888 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

20 0.059880551 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks
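
The simValue column is not defined on this page. One plausible reading, and only an assumption, is cosine similarity between per-post weight vectors, which would explain why the post scores 1.0 against itself in the tfidf list. A minimal sketch with hypothetical post names and weights:

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query, corpus):
    """Sort (similarity, name) pairs from most to least similar."""
    return sorted(
        ((cosine(query, vec), name) for name, vec in corpus.items()),
        reverse=True,
    )


# Hypothetical weights; the names only echo the kind of posts listed above.
corpus = {
    "curved-lines": {"curved": 0.64, "interpolation": 0.32, "straight": 0.26},
    "guess-the-smoother": {"smoother": 0.50, "curve": 0.40, "interpolation": 0.10},
    "train-tickets": {"amazon": 0.70, "tickets": 0.60},
}
for score, name in rank_similar(corpus["curved-lines"], corpus):
    print(round(score, 3), name)
```

Under this reading, a post with no vocabulary in common with the query scores 0, and the query post itself always ranks first.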


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.114), (1, 0.004), (2, 0.011), (3, 0.032), (4, 0.05), (5, -0.049), (6, -0.006), (7, 0.004), (8, 0.026), (9, 0.001), (10, -0.009), (11, -0.011), (12, -0.025), (13, -0.011), (14, -0.026), (15, 0.008), (16, 0.02), (17, 0.014), (18, -0.018), (19, -0.0), (20, 0.013), (21, 0.029), (22, -0.021), (23, -0.028), (24, 0.023), (25, -0.01), (26, 0.02), (27, -0.005), (28, -0.029), (29, 0.011), (30, 0.04), (31, -0.006), (32, -0.024), (33, -0.019), (34, -0.015), (35, -0.021), (36, -0.011), (37, -0.003), (38, 0.005), (39, -0.001), (40, 0.014), (41, 0.031), (42, 0.01), (43, 0.008), (44, 0.006), (45, 0.011), (46, 0.003), (47, -0.011), (48, 0.012), (49, -0.023)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95622915 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”


2 0.86085415 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!


3 0.82704747 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

Introduction: After our discussion of visual displays of regression uncertainty, I asked Solomon Hsiang and Lucas Leeman to send me their code. Both of them replied. Solomon wrote: The matlab and stata functions I wrote, as well as the script that replicates my figures, are all posted on my website . Also, I just added options to the main matlab function (vwregress.m) to make it display the spaghetti plot (similar to what Lucas did, but a simple bootstrap) and the shaded CI that you suggested (see figs below). They’re good suggestions. Personally, I [Hsiang] like the shaded CI better, since I think that all the visual activity in the spaghetti plot is a little distracting and sometimes adds visual weight in places where I wouldn’t want it. But the option is there in case people like it. Solomon then followed up with: I just thought of this small adjustment to your filled CI idea that seems neat. Cartographers like map projections that conserve area. We can do som

4 0.82093161 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

Introduction: 1. I remarked that Sharad had a good research article with some ugly graphs. 2. Dan posted Sharad’s graph and some unpleasant alternatives, inadvertently associating me with one of the unpleasant alternatives. Dan was comparing barplots with dotplots. 3. I commented on Dan’s site that, in this case, I’d much prefer a well-designed lineplot. I wrote: There’s a principle in decision analysis that the most important step is not the evaluation of the decision tree but the decision of what options to include in the tree in the first place. I think that’s what’s happening here. You’re seriously limiting yourself by considering the above options, which really are all the same graph with just slight differences in format. What you need to do is break outside the box. (Graph 2-which I think you think is the kind of thing that Gelman would like-indeed is the kind of thing that I think the R gurus like, but I don’t like it at all . It looks clean without actually being clea

5 0.81329662 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

Introduction: Kaiser Fung shares this graph from Ritchie King: Kaiser writes: What they did right: - Did not put the data on a map - Ordered the countries by the most recent data point rather than alphabetically - Scale labels are found only on outer edge of the chart area, rather than one set per panel - Only used three labels for the 11 years on the plot - Did not overdo the vertical scale either The nicest feature was the XL scale applied only to South Korea. This destroys the small-multiples principle but draws attention to the top left corner, where the designer wants our eyes to go. I would have used smaller fonts throughout. I agree with all of Kaiser’s comments. I could even add a few more, like using light gray for the backgrounds and a bright blue for the lines, spacing the graphs well, using full country names rather than three-letter abbreviations. There are so many standard mistakes that go into default data displays that it is refreshing to see a simple graph done

6 0.81232744 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

7 0.80918354 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

8 0.80821049 671 andrew gelman stats-2011-04-20-One more time-use graph

9 0.79778737 293 andrew gelman stats-2010-09-23-Lowess is great

10 0.79425806 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

11 0.79343659 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

12 0.7914933 843 andrew gelman stats-2011-08-07-Non-rant

13 0.78652757 1478 andrew gelman stats-2012-08-31-Watercolor regression

14 0.78421479 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

15 0.77772343 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

16 0.77594095 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

17 0.77538234 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

18 0.76894128 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

19 0.76877302 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines

20 0.76735353 61 andrew gelman stats-2010-05-31-A data visualization manifesto


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.092), (21, 0.023), (22, 0.024), (24, 0.154), (27, 0.301), (43, 0.023), (72, 0.014), (99, 0.231)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94490469 802 andrew gelman stats-2011-07-13-Super Sam Fuld Needs Your Help (with Foul Ball stats)

Introduction: I was pleasantly surprised to have my recreational reading about baseball in the New Yorker interrupted by a digression on statistics. Sam Fuld of the Tampa Bay Rays was the subject of a Ben McGrath profile in the 4 July 2011 issue of the New Yorker , in an article titled Super Sam . After quoting a minor-league trainer who described Fuld as “a bit of a geek” (who isn’t these days?), McGrath gets into that lovely New Yorker detail: One could have pointed out the more persuasive and telling examples, such as the fact that in 2005, after his first pro season, with the Class-A Peoria Chiefs, Fuld applied for a fall internship with Stats, Inc., the research firm that supplies broadcasters with much of the data and analysis that you hear in sports telecasts. After a description of what they had him doing, reviewing footage of games and cataloguing, he said “I thought, They have a stat for everything, but they don’t have any stats regarding foul balls.” Fuld’s

2 0.91426432 1490 andrew gelman stats-2012-09-09-I’m still wondering . . .

Introduction: Why can’t I buy train and plane tickets through Amazon? That would be so much more convenient than the current system where I have to keep entering information into the damn forms over and over again.

3 0.90875071 347 andrew gelman stats-2010-10-17-Getting arm and lme4 running on the Mac

Introduction: Our “arm” package in R requires Doug Bates’s “lme4” which fits multilevel models. lme4 is currently having some problems on the Mac. But installation on the Mac can be done; it just takes a bit of work. I have two sets of instructions below. From Yu-Sung: If you have MAC OS DVD, you should install developer X code packages from it. Otherwise, install them from here . After this, do the following in R: install.packages("lme4", type = "source") Then you will have lme4 in R and you can install arm without a problem. And, from David Ozonoff: I installed the lme4 package via the Package Installer but this didn’t work, of course. I then installed, via this link , gfortran which seemed to put the libraries in the right place (I had earlier installed via Fink the gcc42 compiler, so I’m not sure if this is required or not). I then ran, in R, this: install.packages(c("Matrix","lme4"), repos="http://R-Forge.R-project.org") This does not appear to work since it wi

4 0.88905233 930 andrew gelman stats-2011-09-28-Wiley Wegman chutzpah update: Now you too can buy a selection of garbled Wikipedia articles, for a mere $1400-$2800 per year!

Introduction: Someone passed on to me a message from his university library announcing that the journal “Wiley Interdisciplinary Reviews: Computational Statistics” is no longer free. Librarians have to decide what to do, so I thought I’d offer the following consumer guide:

| | Wiley Computational Statistics journal | Wikipedia |
|---|---|---|
| Frequency | 6 issues per year | Continuously updated |
| Includes articles from Wikipedia? | Yes | Yes |
| Cites the Wikipedia sources it uses? | No | Yes |
| Edited by recipient of ASA Founders Award? | Yes | No |
| Articles are subject to rigorous review? | No | Yes |
| Errors, when discovered, get fixed? | No | Yes |
| Number of vertices in n-dimensional hypercube? | 2n | 2^n |
| Easy access to Brady Bunch trivia? | No | Yes |
| Cost (North America) | $1400-$2800 | $0 |
| Cost (UK) | £986-£1972 | £0 |
| Cost (Europe) | €1213-€2426 | €0 |

The choice seems pretty clear to me! It’s funny for the Wiley journal to start charging now

same-blog 5 0.87787223 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”


6 0.85456908 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore

7 0.84630239 173 andrew gelman stats-2010-07-31-Editing and clutch hitting

8 0.84571004 343 andrew gelman stats-2010-10-15-?

9 0.84212816 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge

10 0.83450162 1727 andrew gelman stats-2013-02-19-Beef with data

11 0.82946855 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?

12 0.81748867 1113 andrew gelman stats-2012-01-11-Toshiro Kageyama on professionalism

13 0.81530261 1255 andrew gelman stats-2012-04-10-Amtrak sucks

14 0.80898118 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

15 0.80800927 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing

16 0.80266631 652 andrew gelman stats-2011-04-07-Minor-league Stats Predict Major-league Performance, Sarah Palin, and Some Differences Between Baseball and Politics

17 0.80024767 804 andrew gelman stats-2011-07-15-Static sensitivity analysis

18 0.78289253 1982 andrew gelman stats-2013-08-15-Blaming scientific fraud on the Kuhnians

19 0.77743864 66 andrew gelman stats-2010-06-03-How can news reporters avoid making mistakes when reporting on technical issues? Or, Data used to justify “Data Used to Justify Health Savings Can Be Shaky” can be shaky

20 0.77281058 1293 andrew gelman stats-2012-05-01-Huff the Magic Dragon