andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-293 knowledge-graph by maker-knowledge-mining

293 andrew gelman stats-2010-09-23-Lowess is great


meta info for this blog

Source: html

Introduction: I came across this old blog entry that was just hilarious–but it’s from 2005 so I think most of you haven’t seen it. It’s the story of two people named Martin Voracek and Maryanne Fisher who in a published discussion criticized lowess (a justly popular nonlinear regression method). Curious, I looked up “Martin Voracek” on the web and found an article in the British Medical Journal whose title promised “trend analysis.” I was wondering what statistical methods they used–something more sophisticated than lowess, perhaps? They did have one figure, and here it is: Voracek and Fisher, the critics of lowess, fit straight lines to clearly nonlinear data! It’s most obvious in their leftmost graph. Voracek and Fisher get full credit for showing scatterplots, but hey . . . they should try lowess next time! What’s really funny in the graph are the little dotted lines indicating inferential uncertainty in the regression lines–all under the assumption of linearity, of course.
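To make the point concrete, here is a minimal, hand-rolled sketch of what lowess does (not the blog post’s own code, and not the full Cleveland algorithm–no robustness iterations): at each point it fits a small weighted straight line to the nearest fraction of the data with tricube weights, so the overall curve can bend where the data bend. The variable names and the quadratic demo data are illustrative only.

```python
def lowess(xs, ys, frac=0.5):
    """Minimal LOWESS sketch: for each x0, fit a weighted straight line
    to the nearest `frac` of the data using tricube weights, and return
    the locally fitted value at x0."""
    n = len(xs)
    k = max(2, int(frac * n))  # number of neighbors used in each local fit
    fitted = []
    for x0 in xs:
        # bandwidth = distance to the k-th nearest point (guard against 0)
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        # tricube weights: points beyond the bandwidth get weight 0
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # closed-form weighted least squares for a local line a + b*x
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, xs))
        swy = sum(wi * yi for wi, yi in zip(w, ys))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, xs))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, xs, ys))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:  # degenerate window: fall back to local mean
            fitted.append(swy / sw)
        else:
            b = (sw * swxy - swx * swy) / denom
            a = (swy - b * swx) / sw
            fitted.append(a + b * x0)
    return fitted

# Clearly nonlinear data: a single global straight line misfits,
# while the local fits track the curvature.
xs = [i / 10 for i in range(21)]
ys = [x * x for x in xs]
fit = lowess(xs, ys, frac=0.4)
```

In practice you would of course call a library implementation (R’s lowess/loess, or statsmodels’ lowess in Python) rather than hand-rolling this; the point is only that a smoother with a sensible span follows nonlinear structure that a single fitted line, dotted uncertainty bands and all, simply cannot.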


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I came across this old blog entry that was just hilarious–but it’s from 2005 so I think most of you haven’t seen it. [sent-1, score-0.061]

2 It’s the story of two people named Martin Voracek and Maryanne Fisher who in a published discussion criticized lowess (a justly popular nonlinear regression method). [sent-2, score-1.111]

3 Curious, I looked up “Martin Voracek” on the web and found an article in the British Medical Journal whose title promised “trend analysis. [sent-3, score-0.313]

4 ” I was wondering what statistical methods they used–something more sophisticated than lowess, perhaps? [sent-4, score-0.131]

5 They did have one figure, and here it is: Voracek and Fisher, the critics of lowess, fit straight lines to clearly nonlinear data! [sent-5, score-0.654]

6 Voracek and Fisher get full credit for showing scatterplots, but hey . [sent-7, score-0.169]

7 What’s really funny in the graph are the little dotted lines indicating inferential uncertainty in the regression lines–all under the assumption of linearity, of course. [sent-11, score-0.696]

8 (You can see enlarged versions of their graphs at this link . [sent-12, score-0.075]

9 ) As usual, my own house has some glass-based construction and so it’s probably not so wise of me to throw stones, but really! [sent-13, score-0.296]

10 Not knowing about lowess is one thing, but knowing about it, then fitting a straight line to nonlinear data, then criticizing someone else for doing it right–that’s a bit much. [sent-14, score-1.243]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('voracek', 0.528), ('lowess', 0.495), ('nonlinear', 0.234), ('fisher', 0.221), ('lines', 0.165), ('martin', 0.153), ('straight', 0.132), ('knowing', 0.129), ('dotted', 0.12), ('maryanne', 0.12), ('justly', 0.113), ('stones', 0.113), ('linearity', 0.105), ('scatterplots', 0.102), ('promised', 0.099), ('wise', 0.093), ('hilarious', 0.09), ('indicating', 0.082), ('construction', 0.08), ('inferential', 0.079), ('criticized', 0.077), ('regression', 0.077), ('british', 0.077), ('versions', 0.075), ('critics', 0.074), ('sophisticated', 0.074), ('trend', 0.071), ('criticizing', 0.068), ('named', 0.065), ('throw', 0.062), ('credit', 0.061), ('entry', 0.061), ('house', 0.061), ('assumption', 0.06), ('curious', 0.06), ('funny', 0.06), ('web', 0.058), ('wondering', 0.057), ('fitting', 0.056), ('medical', 0.056), ('showing', 0.055), ('whose', 0.054), ('obvious', 0.053), ('uncertainty', 0.053), ('hey', 0.053), ('title', 0.053), ('popular', 0.05), ('looked', 0.049), ('data', 0.049), ('usual', 0.048)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 293 andrew gelman stats-2010-09-23-Lowess is great


2 0.22733006 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

Introduction: Andre de Boer writes: In my profession as a risk manager I encountered this graph: I can’t figure out what kind of regression this is, would you be so kind to enlighten me? The points represent (maturity,yield) of bonds. My reply: That’s a fun problem, reverse-engineering a curve fit! My first guess is lowess, although it seems too flat and asymptote-y on the right side of the graph to be lowess. Maybe a Gaussian process? Looks too smooth to be a spline. I guess I’ll go with my original guess, on the theory that lowess is the most accessible smoother out there, and if someone fit something much more complicated they’d make more of a big deal about it. On the other hand, if the curve is an automatic output of some software (Excel? Stata?) then it could be just about anything. Does anyone have any ideas?

3 0.1061988 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

Introduction: Tomas Iesmantas writes: I’m dealing with high dimensional (40-50 parameters) hierarchical bayesian model applied to nonlinear Poisson regression problem. Now I’m using an adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift (Yves F. Atchade, 2003) to obtain samples from posterior. But this algorithm is not very efficient in my case, it needs several millions iterations as burn-in period. And simulation takes quite a long time, since algorithm has to work with 40×40 matrices. Maybe you know another MCMC algorithm which could take not so many burn-in samples and would be able to deal with nonlinear regression? In non-hierarchical nonlinear regression model adaptive metropolis algorithm is enough, but in hierarchical case I could use something more effective. My reply: Try fitting the model in Stan. If that doesn’t work, let me know.

4 0.099109814 2254 andrew gelman stats-2014-03-18-Those wacky anti-Bayesians used to be intimidating, but now they’re just pathetic

Introduction: From 2006 : Eric Archer forwarded this document by Nick Freemantle, “The Reverend Bayes—was he really a prophet?”, in the Journal of the Royal Society of Medicine: Does [Bayes's] contribution merit the enthusiasms of his followers? Or is his legacy overhyped? . . . First, Bayesians appear to have an absolute right to disapprove of any conventional approach in statistics without offering a workable alternative—for example, a colleague recently stated at a meeting that ‘. . . it is OK to have multiple comparisons because Bayesians’ don’t believe in alpha spending’. . . . Second, Bayesians appear to build an army of straw men—everything it seems is different and better from a Bayesian perspective, although many of the concepts seem remarkably familiar. For example, a very well known Bayesian statistician recently surprised the audience with his discovery of the P value as a useful Bayesian statistic at a meeting in Birmingham. Third, Bayesians possess enormous enthusiasm fo

5 0.095853239 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

Introduction: As a data analyst and a scientist, Fisher > Neyman, no question. But as a theorist, Fisher came up with ideas that worked just fine in his applications but can fall apart when people try to apply them too generally. Here’s an example that recently came up. Deborah Mayo pointed me to a comment by Stephen Senn on the so-called Fisher and Neyman null hypotheses. In an experiment with n participants (or, as we used to say, subjects or experimental units), the Fisher null hypothesis is that the treatment effect is exactly 0 for every one of the n units, while the Neyman null hypothesis is that the individual treatment effects can be negative or positive but have an average of zero. Senn explains why Neyman’s hypothesis in general makes no sense—the short story is that Fisher’s hypothesis seems relevant in some problems (sometimes we really are studying effects that are zero or close enough for all practical purposes), whereas Neyman’s hypothesis just seems weird (it’s implausible

6 0.094692566 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

7 0.088970333 1293 andrew gelman stats-2012-05-01-Huff the Magic Dragon

8 0.084685296 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

9 0.084285855 1880 andrew gelman stats-2013-06-02-Flame bait

10 0.081791431 289 andrew gelman stats-2010-09-21-“How segregated is your city?”: A story of why every graph, no matter how clear it seems to be, needs a caption to anchor the reader in some numbers

11 0.08112748 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

12 0.081108823 2339 andrew gelman stats-2014-05-19-On deck this week

13 0.078083843 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”

14 0.073921174 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

15 0.07046058 1881 andrew gelman stats-2013-06-03-Boot

16 0.06574066 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions

17 0.06442403 61 andrew gelman stats-2010-05-31-A data visualization manifesto

18 0.063318044 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet

19 0.061057314 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

20 0.058143228 2245 andrew gelman stats-2014-03-12-More on publishing in journals


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.105), (1, -0.011), (2, -0.017), (3, 0.015), (4, 0.052), (5, -0.067), (6, -0.027), (7, -0.014), (8, 0.029), (9, -0.008), (10, 0.014), (11, 0.01), (12, 0.007), (13, 0.008), (14, 0.007), (15, 0.009), (16, 0.005), (17, 0.002), (18, -0.017), (19, -0.024), (20, 0.027), (21, 0.03), (22, 0.005), (23, -0.023), (24, 0.029), (25, -0.001), (26, -0.005), (27, -0.017), (28, -0.009), (29, 0.009), (30, 0.056), (31, 0.029), (32, -0.023), (33, -0.004), (34, -0.012), (35, -0.012), (36, -0.027), (37, -0.024), (38, -0.01), (39, 0.008), (40, 0.011), (41, 0.058), (42, 0.041), (43, 0.007), (44, 0.037), (45, 0.018), (46, -0.039), (47, 0.001), (48, 0.019), (49, -0.012)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95092881 293 andrew gelman stats-2010-09-23-Lowess is great


2 0.80559349 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!


3 0.77990818 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

Introduction: Following up on our recent discussion of visually-weighted displays of uncertainty in regression curves, Lucas Leeman sent in the following two graphs: First, the basic spaghetti-style plot showing inferential uncertainty in the E(y|x) curve: Then, a version using even lighter intensities for the lines that go further from the center, to further de-emphasize the edges: P.S. More (including code!) here .

4 0.77117234 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

Introduction: Solomon Hsiang writes : One of my colleagues suggested that I send you this very short note that I wrote on a new approach for displaying regression result uncertainty (attached). It’s very simple, and I’ve found it effective in one of my papers where I actually use it, but if you have a chance to glance over it and have any ideas for how to sell the approach or make it better, I’d be very interested to hear them. (Also, if you’ve seen that someone else has already made this point, I’d appreciate knowing that too.) Here’s an example: Hsiang writes: In Panel A, our eyes are drawn outward, away from the center of the display and toward the swirling confidence intervals at the edges. But in Panel B, our eyes are attracted to the middle of the regression line, where the high contrast between the line and the background is sharp and visually heavy. By using visual-weighting, we focus our readers’s attention on those portions of the regression that contain the most inform

5 0.74697465 1478 andrew gelman stats-2012-08-31-Watercolor regression

Introduction: Solomon Hsiang writes: Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges). 1. I realized that if we plot the confidence intervals as a solid color that fades (eg. using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fade near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically. 2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background

6 0.74590576 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”

7 0.73143971 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

8 0.7108289 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

9 0.69416004 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines

10 0.69320381 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

11 0.69106257 61 andrew gelman stats-2010-05-31-A data visualization manifesto

12 0.69084102 2091 andrew gelman stats-2013-11-06-“Marginally significant”

13 0.68674445 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!

14 0.68044281 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

15 0.67481971 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

16 0.67426717 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

17 0.67010856 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

18 0.66815645 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

19 0.66555178 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

20 0.66343343 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.073), (16, 0.068), (20, 0.014), (24, 0.092), (26, 0.013), (27, 0.051), (53, 0.012), (57, 0.015), (63, 0.282), (89, 0.012), (93, 0.012), (99, 0.231)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93956131 739 andrew gelman stats-2011-05-31-When Did Girls Start Wearing Pink?

Introduction: That cute picture is of toddler FDR in a dress, from 1884. Jeanne Maglaty writes : A Ladies’ Home Journal article [or maybe from a different source, according to a commenter] in June 1918 said, “The generally accepted rule is pink for the boys, and blue for the girls. The reason is that pink, being a more decided and stronger color, is more suitable for the boy, while blue, which is more delicate and dainty, is prettier for the girl.” Other sources said blue was flattering for blonds, pink for brunettes; or blue was for blue-eyed babies, pink for brown-eyed babies, according to Paoletti. In 1927, Time magazine printed a chart showing sex-appropriate colors for girls and boys according to leading U.S. stores. In Boston, Filene’s told parents to dress boys in pink. So did Best & Co. in New York City, Halle’s in Cleveland and Marshall Field in Chicago. Today’s color dictate wasn’t established until the 1940s . . . When the women’s liberation movement arrived in the mid-1960s, w

2 0.93428731 745 andrew gelman stats-2011-06-04-High-level intellectual discussions in the Columbia statistics department

Introduction: In case anybody is wondering what we really spend our time talking about . . .

3 0.93085873 628 andrew gelman stats-2011-03-25-100-year floods

Introduction: According to the National Weather Service : What is a 100 year flood? A 100 year flood is an event that statistically has a 1% chance of occurring in any given year. A 500 year flood has a .2% chance of occurring and a 1000 year flood has a .1% chance of occurring. The accompanying map shows a part of Tennessee that in May 2010 had 1000-year levels of flooding. At first, it seems hard to believe that a 1000-year flood would have just happened to occur last year. But then, this is just a 1000-year flood for that particular place. I don’t really have a sense of the statistics of these events. How many 100-year, 500-year, and 1000-year flood events have been recorded by the Weather Service, and when have they occurred?

4 0.9156611 313 andrew gelman stats-2010-10-03-A question for psychometricians

Introduction: Don Coffin writes: A colleague of mine and I are doing a presentation for new faculty on a number of topics related to teaching. Our charge is to identify interesting issues and to find research-based information for them about how to approach things. So, what I wondered is, do you know of any published research dealing with the sort of issues about structuring a course and final exam in the ways you talk about in this blog post ? Some poking around in the usual places hasn’t turned anything up yet. I don’t really know the psychometrics literature but I imagine that some good stuff has been written on principles of test design. There are probably some good papers from back in the 1920s. Can anyone supply some references?

5 0.91249537 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios

Introduction: Jimmy passes this article by Ahmad Reza Hosseinpoor and Carla AbouZahr. I have little to say, except that (a) they seem to be making a reasonable point, and (b) those bar graphs are pretty ugly.

6 0.91054142 684 andrew gelman stats-2011-04-28-Hierarchical ordered logit or probit

7 0.90665066 33 andrew gelman stats-2010-05-14-Felix Salmon wins the American Statistical Association’s Excellence in Statistical Reporting Award

8 0.89128608 568 andrew gelman stats-2011-02-11-Calibration in chess

9 0.88488579 1621 andrew gelman stats-2012-12-13-Puzzles of criminal justice

same-blog 10 0.88433218 293 andrew gelman stats-2010-09-23-Lowess is great

11 0.88137531 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits

12 0.87728286 1078 andrew gelman stats-2011-12-22-Tables as graphs: The Ramanujan principle

13 0.85387683 1484 andrew gelman stats-2012-09-05-Two exciting movie ideas: “Second Chance U” and “The New Dirty Dozen”

14 0.84774327 102 andrew gelman stats-2010-06-21-Why modern art is all in the mind

15 0.83726931 1316 andrew gelman stats-2012-05-12-black and Black, white and White

16 0.83438325 1480 andrew gelman stats-2012-09-02-“If our product is harmful . . . we’ll stop making it.”

17 0.79385328 286 andrew gelman stats-2010-09-20-Are the Democrats avoiding a national campaign?

18 0.79354203 1201 andrew gelman stats-2012-03-07-Inference = data + model

19 0.7882846 2249 andrew gelman stats-2014-03-15-Recently in the sister blog

20 0.78559637 544 andrew gelman stats-2011-01-29-Splitting the data