andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1283 knowledge-graph by maker-knowledge-mining
Introduction: Andre de Boer writes: In my profession as a risk manager I encountered this graph: I can’t figure out what kind of regression this is, would you be so kind to enlighten me? The points represent (maturity,yield) of bonds. My reply: That’s a fun problem, reverse-engineering a curve fit! My first guess is lowess, although it seems too flat and asympoty on the right side of the graph to be lowess. Maybe a Gaussian process? Looks too smooth to be a spline. I guess I’ll go with my original guess, on the theory that lowess is the most accessible smoother out there, and if someone fit something much more complicated they’d make more of a big deal about it. On the other hand, if the curve is an automatic output of some software (Excel? Stata?) then it could be just about anything. Does anyone have any ideas?
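For readers who want to try the "guess the smoother" game themselves, here is a minimal sketch of a lowess-style fit in plain Python. It is a simplified version (local linear regression with tricube weights and no robustness iterations), not the code behind the graph in question, which is unknown. The function names `tricube` and `lowess_fit`, the `frac` parameter, and the maturity and yield numbers are all made up for illustration.

```python
import math

def tricube(u):
    """Tricube kernel weight for a scaled distance u; zero outside |u| < 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def lowess_fit(xs, ys, frac=0.5):
    """Locally weighted linear fit evaluated at each x (no robustness iterations)."""
    n = len(xs)
    k = max(2, int(math.ceil(frac * n)))  # number of neighbours used per local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        # Weighted means, then the weighted least-squares slope around x0.
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical (maturity in years, yield in percent) pairs, invented for this sketch.
maturities = [0.5, 1, 2, 3, 5, 7, 10, 15, 20, 30]
yields = [0.3, 0.5, 0.9, 1.3, 1.9, 2.3, 2.7, 3.0, 3.1, 3.2]

smooth = lowess_fit(maturities, yields, frac=0.6)
for m, s in zip(maturities, smooth):
    print(f"maturity {m:>4}: smoothed yield {s:.2f}")
```

Varying `frac` and comparing the right-hand tail of the fitted curve with the published one is one way to judge whether a curve is "too flat" on the right to be lowess.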
sentIndex sentText sentNum sentScore
1 Andre de Boer writes: In my profession as a risk manager I encountered this graph: I can’t figure out what kind of regression this is, would you be so kind to enlighten me? [sent-1, score-1.518]
2 My reply: That’s a fun problem, reverse-engineering a curve fit! [sent-3, score-0.4]
3 My first guess is lowess, although it seems too flat and asympoty on the right side of the graph to be lowess. [sent-4, score-0.887]
4 I guess I’ll go with my original guess, on the theory that lowess is the most accessible smoother out there, and if someone fit something much more complicated they’d make more of a big deal about it. [sent-7, score-1.84]
5 On the other hand, if the curve is an automatic output of some software (Excel? [sent-8, score-0.728]
wordName wordTfidf (topN-words)
[('lowess', 0.401), ('curve', 0.294), ('guess', 0.233), ('enlighten', 0.23), ('smoother', 0.22), ('manager', 0.185), ('kind', 0.182), ('smooth', 0.172), ('graph', 0.168), ('excel', 0.166), ('profession', 0.166), ('automatic', 0.166), ('fit', 0.157), ('accessible', 0.157), ('stata', 0.154), ('output', 0.151), ('gaussian', 0.148), ('flat', 0.148), ('de', 0.145), ('encountered', 0.142), ('complicated', 0.122), ('software', 0.117), ('represent', 0.117), ('risk', 0.114), ('fun', 0.106), ('deal', 0.103), ('side', 0.1), ('anyone', 0.094), ('figure', 0.094), ('process', 0.092), ('looks', 0.089), ('original', 0.088), ('although', 0.086), ('hand', 0.086), ('theory', 0.083), ('ideas', 0.08), ('regression', 0.078), ('reply', 0.071), ('someone', 0.069), ('big', 0.065), ('points', 0.064), ('ll', 0.055), ('right', 0.055), ('go', 0.055), ('maybe', 0.054), ('problem', 0.053), ('seems', 0.05), ('first', 0.047), ('something', 0.047), ('make', 0.04)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
2 0.22733006 293 andrew gelman stats-2010-09-23-Lowess is great
Introduction: I came across this old blog entry that was just hilarious–but it’s from 2005 so I think most of you haven’t seen it. It’s the story of two people named Martin Voracek and Maryanne Fisher who in a published discussion criticized lowess (a justly popular nonlinear regression method). Curious, I looked up “Martin Voracek” on the web and found an article in the British Medical Journal whose title promised “trend analysis.” I was wondering what statistical methods they used–something more sophisticated than lowess, perhaps? They did have one figure, and here it is: Voracek and Fisher, the critics of lowess, fit straight lines to clearly nonlinear data! It’s most obvious in their leftmost graph. Voracek and Fisher get full credit for showing scatterplots, but hey . . . they should try lowess next time! What’s really funny in the graph are the little dotted lines indicating inferential uncertainty in the regression lines–all under the assumption of linearity,
3 0.17829351 1543 andrew gelman stats-2012-10-21-Model complexity as a function of sample size
Introduction: As we get more data, we can fit more model. But at some point we become so overwhelmed by data that, for computational reasons, we can barely do anything at all. Thus, the curve above could be thought of as the product of two curves: a steadily increasing curve showing the statistical ability to fit more complex models with more data, and a steadily decreasing curve showing the computational feasibility of doing so.
4 0.12677272 1881 andrew gelman stats-2013-06-03-Boot
Introduction: Joshua Hartshorne writes: I ran several large-N experiments (separate participants) and looked at performance against age. What we want to do is compare age-of-peak-performance across the different tasks (again, different participants). We bootstrapped age-of-peak-performance. On each iteration, we sampled (with replacement) the X scores at each age, where X=num of participants at that age, and recorded the age at which performance peaked on that task. We then recorded the age at which performance was at peak and repeated. Once we had distributions of age-of-peak-performance, we used the means and SDs to calculate t-statistics to compare the results across different tasks. For graphical presentation, we used medians, interquartile ranges, and 95% confidence intervals (based on the distributions: the range within which 75% and 95% of the bootstrapped peaks appeared). While a number of people we consulted with thought this made a lot of sense, one reviewer of the paper insist
5 0.11584147 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over
Introduction: Steve Cohen writes: As someone who has been working with Bayesian statistical models for the past several years, I [Cohen] have been challenged recently to describe the difference between Bayesian Networks (as implemented in BayesiaLab software) and modeling and inference using MCMC methods. I hope you have the time to give me (or to write on your blog) a relatively simple explanation that an advanced layman could understand. My reply: I skimmed the above website but I couldn’t quite see what they do. My guess is that they use MCMC and also various parametric approximations such as variational Bayes. They also seem to have something set up for decision analysis. My guess is that, compared to a general-purpose tool such as Stan, this Bayesia software is more accessible to non-academics in particular application areas (in this case, it looks like business marketing). But I can’t be sure. I’ve also heard about another company that looks to be doing something similar: h
6 0.11466795 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays
8 0.10811995 1808 andrew gelman stats-2013-04-17-Excel-bashing
10 0.10521677 76 andrew gelman stats-2010-06-09-Both R and Stata
11 0.1033857 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
12 0.10096172 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.
13 0.09559048 61 andrew gelman stats-2010-05-31-A data visualization manifesto
14 0.092167027 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas
15 0.09146595 1735 andrew gelman stats-2013-02-24-F-f-f-fake data
16 0.09134315 1661 andrew gelman stats-2013-01-08-Software is as software does
17 0.090833887 2328 andrew gelman stats-2014-05-10-What property is important in a risk prediction model? Discrimination or calibration?
18 0.086057886 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture
19 0.083018593 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”
20 0.082512416 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go
topicId topicWeight
[(0, 0.125), (1, -0.004), (2, -0.016), (3, 0.048), (4, 0.094), (5, -0.061), (6, -0.007), (7, -0.013), (8, 0.037), (9, 0.011), (10, 0.007), (11, -0.005), (12, -0.022), (13, 0.006), (14, -0.001), (15, 0.014), (16, 0.064), (17, -0.014), (18, -0.014), (19, -0.02), (20, 0.033), (21, 0.04), (22, -0.025), (23, -0.039), (24, 0.013), (25, -0.005), (26, 0.02), (27, 0.0), (28, -0.008), (29, -0.013), (30, 0.042), (31, -0.001), (32, -0.04), (33, -0.068), (34, -0.033), (35, -0.022), (36, -0.043), (37, 0.019), (38, -0.04), (39, -0.004), (40, -0.007), (41, 0.035), (42, 0.058), (43, -0.003), (44, 0.05), (45, 0.02), (46, -0.028), (47, 0.039), (48, 0.031), (49, -0.043)]
simIndex simValue blogId blogTitle
same-blog 1 0.97379553 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
2 0.78827292 1478 andrew gelman stats-2012-08-31-Watercolor regression
Introduction: Solomon Hsiang writes: Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges). 1. I realized that if we plot the confidence intervals as a solid color that fades (eg. using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fade near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically. 2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background
3 0.75535935 293 andrew gelman stats-2010-09-23-Lowess is great
Introduction: Following up on our recent discussion of visually-weighted displays of uncertainty in regression curves, Lucas Leeman sent in the following two graphs: First, the basic spaghetti-style plot showing inferential uncertainty in the E(y|x) curve: Then, a version using even lighter intensities for the lines that go further from the center, to further de-emphasize the edges: P.S. More (including code!) here.
5 0.74494284 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines
Introduction: I want a graph-paper-style notebook, ideally something lightweight—I’m looking to make notes, not art drawings—and not too large. I’m currently using a 17 x 22 cm notebook, which is a fine size. It also has pretty small squares, which I like. My problem with the notebook I have now is that the ink is too heavy—that is, the lines are too dark. I want very faint lines, just visible enough to be used as guides but not so heavy that to be overwhelming. The notebooks I see in the stores all have pretty dark lines. Any suggestions?
6 0.74393713 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays
7 0.74069411 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!
8 0.73419577 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”
9 0.7330218 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions
10 0.72665501 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
11 0.72171289 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
12 0.71808738 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
13 0.70209002 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
15 0.68809336 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
16 0.68707055 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
17 0.68601781 671 andrew gelman stats-2011-04-20-One more time-use graph
18 0.68216729 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
19 0.66865158 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
20 0.66694212 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series
topicId topicWeight
[(16, 0.068), (24, 0.285), (35, 0.022), (42, 0.023), (55, 0.082), (56, 0.063), (61, 0.024), (75, 0.024), (82, 0.049), (86, 0.014), (95, 0.027), (99, 0.198)]
simIndex simValue blogId blogTitle
same-blog 1 0.96963727 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
2 0.94215274 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
Introduction: I continue to be the go-to guy for bad graphs. Today (i.e., 22 Feb), I received an email from Gary Rosin: I [Rosin] thought you might be interested in this graph showing the decline in median prices of homes since 1997. It exaggerates the proportions by using $150,000 as the floor, rather than zero. Indeed. Here’s the graph: A line plot, rather than a bar plot, would be appropriate here. Also, it’s weird that the headline says “10 years” but the graph has only 6 years. Why not give some perspective and show, say, 30 years?
3 0.93873048 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism
Introduction: Interesting discussion by Alex Tabarrok (following up on an article by Rebecca Solnit) on the continuum between voluntarism (or, more generally, non-cash transactions) and markets with monetary exchange. I just have a few comments of my own: 1. Solnit writes of “the iceberg economy,” which she characterizes as “based on gift economies, barter, mutual aid, and giving without hope of return . . . the relations between friends, between family members, the activities of volunteers or those who have chosen their vocation on principle rather than for profit.” I just wonder whether “barter” completely fits in here. Maybe it depends on context. Sometimes barter is an informal way of keeping track (you help me and I help you), but in settings of low liquidity I could imagine barter being simply an inefficient way of performing an economic transaction. 2. I am no expert on capitalism but my impression is that it’s not just about “competition and selfishness” but also is related to the
4 0.93524826 743 andrew gelman stats-2011-06-03-An argument that can’t possibly make sense
Introduction: Tyler Cowen writes: Texas has begun to enforce [a law regarding parallel parking] only recently . . . Up until now, of course, there has been strong net mobility into the state of Texas, so was the previous lack of enforcement so bad? I care not at all about the direction in which people park their cars and I have no opinion on this law, but I have to raise an alarm at Cowen’s argument here. Let me strip it down to its basic form: 1. Until recently, state X had policy A. 2. Up until now, there has been strong net mobility into state X 3. Therefore, the presumption is that policy A is ok. In this particular case, I think we can safely assume that parallel parking regulations have had close to zero impact on the population flows into and out of Texas. More generally, I think logicians could poke some holes into the argument that 1 and 2 above imply 3. For one thing, you could apply this argument to any policy in any state that’s had positive net migration. Hai
5 0.93507254 1787 andrew gelman stats-2013-04-04-Wanna be the next Tyler Cowen? It’s not as easy as you might think!
Introduction: Someone told me he ran into someone who said his goal was to be Tyler Cowen. OK, fine, it’s a worthy goal, but I don’t think it’s so easy.
6 0.93412572 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
7 0.93244851 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
8 0.93229043 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample
9 0.93188673 938 andrew gelman stats-2011-10-03-Comparing prediction errors
10 0.93092138 1479 andrew gelman stats-2012-09-01-Mothers and Moms
12 0.92885017 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall
13 0.92873061 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census
14 0.92472011 197 andrew gelman stats-2010-08-10-The last great essayist?
15 0.92451161 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model
16 0.92400891 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.
17 0.923908 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research
18 0.92337537 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
20 0.92180467 38 andrew gelman stats-2010-05-18-Breastfeeding, infant hyperbilirubinemia, statistical graphics, and modern medicine