andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1886 knowledge-graph by maker-knowledge-mining

1886 andrew gelman stats-2013-06-07-Robust logistic regression


meta info for this blog

Source: html

Introduction: Corey Yanofsky writes: In your work, you’ve robustificated logistic regression by having the logit function saturate at, e.g., 0.01 and 0.99, instead of 0 and 1. Do you have any thoughts on a sensible setting for the saturation values? My intuition suggests that it has something to do with the proportion of outliers expected in the data (assuming a reasonable model fit). It would be desirable to have them fit in the model, but my intuition is that integrability of the posterior distribution might become an issue. My reply: it should be no problem to put these saturation values in the model, I bet it would work fine in Stan if you give them uniform (0,.1) priors or something like that. Or you could just fit the robit model. And this reminds me . . . I’ve been told that when Stan’s on its optimization setting, it fits generalized linear models just about as fast as regular glm or bayesglm in R. This suggests to me that we should have some precompiled regression models in Stan, then we could run all those regressions that way, and we could feel free to use whatever priors we want.
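
To make the reply concrete, here is a minimal Stan sketch of a logistic regression whose success probability saturates at estimated values rather than at 0 and 1, with the uniform (0, .1) priors suggested above expressed through the parameter bounds. The variable names and the normal prior on the coefficients are my assumptions, not anything specified in the post:

data {
  int<lower=0> N;                    // observations
  int<lower=0> K;                    // predictors
  matrix[N, K] X;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  vector[K] beta;
  real<lower=0, upper=0.1> eps_lo;   // lower saturation value; the bounds
  real<lower=0, upper=0.1> eps_hi;   // give the flat uniform (0, .1) prior
}
model {
  // probability saturates at eps_lo and 1 - eps_hi instead of 0 and 1
  vector[N] p = eps_lo + (1 - eps_lo - eps_hi) * inv_logit(X * beta);
  beta ~ normal(0, 2.5);             // assumed weakly informative prior
  y ~ bernoulli(p);
}

The same program can also be handed to Stan's optimizer (e.g., optimizing() in rstan) for fast point estimates, which is the optimization setting mentioned at the end of the post.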


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Corey Yanofsky writes: In your work, you’ve robustificated logistic regression by having the logit function saturate at, e.g., 0.01 and 0.99, instead of 0 and 1. [sent-1, score-0.451]

2 Do you have any thoughts on a sensible setting for the saturation values? [sent-6, score-0.749]

3 My intuition suggests that it has something to do with proportion of outliers expected in the data (assuming a reasonable model fit). [sent-7, score-1.03]

4 It would be desirable to have them fit in the model, but my intuition is that integrability of the posterior distribution might become an issue. [sent-8, score-0.813]

5 My reply: it should be no problem to put these saturation values in the model, I bet it would work fine in Stan if you give them uniform (0,.1) priors or something like that. [sent-9, score-0.916]

6 I’ve been told that when Stan’s on its optimization setting, it fits generalized linear models just about as fast as regular glm or bayesglm in R. [sent-15, score-1.164]

7 This suggests to me that we should have some precompiled regression models in Stan, then we could run all those regressions that way, and we could feel free to use whatever priors we want. [sent-16, score-1.321]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('saturation', 0.366), ('stan', 0.265), ('intuition', 0.234), ('precompiled', 0.203), ('corey', 0.203), ('fit', 0.197), ('robit', 0.191), ('priors', 0.19), ('suggests', 0.183), ('setting', 0.177), ('values', 0.168), ('bayesglm', 0.167), ('glm', 0.152), ('outliers', 0.147), ('desirable', 0.145), ('regression', 0.13), ('logit', 0.128), ('sensible', 0.128), ('optimization', 0.127), ('generalized', 0.124), ('bet', 0.123), ('uniform', 0.12), ('model', 0.12), ('regressions', 0.113), ('fast', 0.109), ('proportion', 0.109), ('fits', 0.108), ('logistic', 0.104), ('regular', 0.103), ('models', 0.102), ('reminds', 0.1), ('assuming', 0.098), ('linear', 0.09), ('function', 0.089), ('could', 0.088), ('expected', 0.087), ('posterior', 0.084), ('become', 0.083), ('told', 0.082), ('something', 0.079), ('thoughts', 0.078), ('run', 0.077), ('free', 0.074), ('whatever', 0.073), ('reasonable', 0.071), ('distribution', 0.07), ('fine', 0.07), ('work', 0.069), ('ve', 0.069), ('instead', 0.066)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1886 andrew gelman stats-2013-06-07-Robust logistic regression


2 0.18028325 1748 andrew gelman stats-2013-03-04-PyStan!

Introduction: Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely free and open-source. P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.

3 0.17724152 2161 andrew gelman stats-2014-01-07-My recent debugging experience

Introduction: OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out—or if it doesn’t—I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear regression with normal priors and known data variance, because then the exact solution is Gaussian and I can also work with the problem analytically. So I programmed up the algorithm and, no surprise, it didn’t work. I went through my R code, put in print statements here and there, and cleared out bug after bug until at last it stopped crashing. But the algorithm still wasn’t doing what it was supposed to do. So I decided to do something simpler, and just check that the Stan linear regression gave the same answer as the analytic posterior distribution: I ran Stan for tons of iterations, then computed the sample mean and variance of the simulations. It was an example with two coefficients—I’d originally cho
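
For reference, the analytic posterior being checked there has a closed form. Assuming the model $y \sim \mathrm{N}(X\beta, \sigma^2 I)$ with known $\sigma$ and a generic prior $\beta \sim \mathrm{N}(0, \tau^2 I)$ (the excerpt doesn't give the exact prior used), the posterior is

$$\beta \mid y \sim \mathrm{N}(m, \Sigma), \qquad \Sigma = \left(\frac{X^\top X}{\sigma^2} + \frac{I}{\tau^2}\right)^{-1}, \qquad m = \Sigma \, \frac{X^\top y}{\sigma^2},$$

so the sample mean and variance of the Stan draws should converge to $m$ and the diagonal of $\Sigma$.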

4 0.17418006 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

Introduction: John Mount provides some useful background and follow-up on our discussion from last year on computational instability of the usual logistic regression solver. Just to refresh your memory, here’s a simple logistic regression with only a constant term and no separation, nothing pathological at all:

> y <- rep (c(1,0), c(10,5))
> display (glm (y ~ 1, family=binomial(link="logit")))
glm(formula = y ~ 1, family = binomial(link = "logit"))
            coef.est coef.se
(Intercept) 0.69     0.55
---
n = 15, k = 1
residual deviance = 19.1, null deviance = 19.1 (difference = 0.0)

And here’s what happens when we give it the not-outrageous starting value of -2:

> display (glm (y ~ 1, family=binomial(link="logit"), start=-2))
glm(formula = y ~ 1, family = binomial(link = "logit"), start = -2)
            coef.est coef.se
(Intercept) 71.97    17327434.18
---
n = 15, k = 1
residual deviance = 360.4, null deviance = 19.1 (difference = -341.3)
Warning message:

5 0.1738185 1580 andrew gelman stats-2012-11-16-Stantastic!

Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This is a “hurdle” model, in which a bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a
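
A minimal, non-hierarchical sketch in Stan of the zero-inflated gamma "hurdle" structure described above; McElreath's actual model adds correlated varying effects and imputation, and all names here are hypothetical:

data {
  int<lower=0> N;
  vector<lower=0>[N] y;          // outcome (e.g., kg of meat), zeros allowed
}
parameters {
  real<lower=0, upper=1> theta;  // bernoulli stage: probability of a nonzero
  real<lower=0> shape;           // gamma stage, for the nonzero values
  real<lower=0> rate;
}
model {
  for (n in 1:N) {
    if (y[n] == 0)
      target += log1m(theta);    // hurdle not cleared
    else
      target += log(theta) + gamma_lpdf(y[n] | shape, rate);
  }
}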

6 0.16715962 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

7 0.16366842 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits

8 0.16146155 1475 andrew gelman stats-2012-08-30-A Stan is Born

9 0.15843734 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

10 0.15786752 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

11 0.1476185 2110 andrew gelman stats-2013-11-22-A Bayesian model for an increasing function, in Stan!

12 0.14229453 1950 andrew gelman stats-2013-07-22-My talks that were scheduled for Tues at the Data Skeptics meetup and Wed at the Open Statistical Programming meetup

13 0.13306676 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

14 0.13266484 1465 andrew gelman stats-2012-08-21-D. Buggin

15 0.12630352 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

16 0.12558289 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

17 0.12516126 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

18 0.12503666 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

19 0.12450627 1431 andrew gelman stats-2012-07-27-Overfitting

20 0.12380464 1941 andrew gelman stats-2013-07-16-Priors


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.166), (1, 0.186), (2, 0.012), (3, 0.095), (4, 0.099), (5, 0.039), (6, 0.074), (7, -0.185), (8, -0.023), (9, -0.003), (10, -0.057), (11, 0.041), (12, -0.079), (13, -0.035), (14, 0.003), (15, -0.043), (16, -0.006), (17, 0.007), (18, -0.015), (19, -0.017), (20, 0.011), (21, -0.042), (22, -0.036), (23, -0.043), (24, -0.004), (25, -0.005), (26, 0.029), (27, -0.1), (28, -0.084), (29, -0.013), (30, 0.025), (31, 0.019), (32, 0.012), (33, 0.014), (34, -0.014), (35, 0.001), (36, 0.014), (37, 0.044), (38, -0.015), (39, -0.012), (40, 0.024), (41, 0.007), (42, 0.012), (43, -0.01), (44, 0.06), (45, -0.001), (46, -0.021), (47, 0.017), (48, -0.007), (49, 0.055)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9620232 1886 andrew gelman stats-2013-06-07-Robust logistic regression


2 0.8762812 2161 andrew gelman stats-2014-01-07-My recent debugging experience


3 0.86306095 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing

Introduction: [Update: Revised given comments from Wingfeet, Andrew and germo. Thanks! I'd mistakenly translated the dlnorm priors in the first version --- amazing what a difference the priors make. I also escaped the less-than and greater-than signs in the constraints in the model so they're visible. I also updated to match the thin=2 output of JAGS.] We’re going to be starting a Stan “model of the P” (for some time period P) column, so I thought I’d kick things off with one of my own. I’ve been following the Wingvoet blog, the author of which is identified only by the Blogger handle Wingfeet; a couple of days ago this lovely post came out: PK calculation of IV and oral dosing in JAGS. Wingfeet’s post implemented an answer to question 6 from chapter 6 of Rowland and Tozer’s 2010 book, Clinical Pharmacokinetics and Pharmacodynamics, Fourth edition, Lippincott, Williams & Wilkins. So in the grand tradition of using this blog to procrastinate, I thought I’d t

4 0.85445446 2110 andrew gelman stats-2013-11-22-A Bayesian model for an increasing function, in Stan!

Introduction: Following up on yesterday’s post, here’s David Chudzicki’s story (with graphs and Stan/R code!) of how he fit a model for an increasing function (“isotonic regression”). Chudzicki writes: This post will describe a way I came up with of fitting a function that’s constrained to be increasing, using Stan. If you want practical help, standard statistical approaches, or expert research, this isn’t the place for you (look up “isotonic regression” or “Bayesian isotonic regression” or David Dunson). This is the place for you if you want to read about how I thought about setting up a model, implemented the model in Stan, and created graphics to understand what was going on. The background is that a simple, natural-seeming uniform prior on the function values does not work so well—it’s a much stronger prior distribution than one might naively think, just one of those unexpected aspects of high-dimensional probability distributions. So Chudzicki sets up a more general family with a hype
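
A minimal sketch of one standard way to encode an increasing function in Stan (this is not Chudzicki's code): parameterize by the first value plus positive increments, and put a proper prior on the increments rather than a flat prior on the function values themselves:

data {
  int<lower=2> K;
  vector[K] y;                   // noisy observations of f at K ordered inputs
}
parameters {
  real f1;                       // value at the first input
  vector<lower=0>[K - 1] delta;  // positive increments force monotonicity
  real<lower=0> sigma;
}
transformed parameters {
  vector[K] f;
  f[1] = f1;
  for (k in 2:K)
    f[k] = f[k - 1] + delta[k - 1];
}
model {
  f1 ~ normal(0, 5);             // assumed scales, not from the post
  delta ~ exponential(1);
  sigma ~ normal(0, 1);          // half-normal via the lower bound
  y ~ normal(f, sigma);
}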

5 0.80768549 1580 andrew gelman stats-2012-11-16-Stantastic!


6 0.79830796 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

7 0.79492736 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

8 0.76076376 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

9 0.73961753 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore

10 0.72456205 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT

11 0.72027391 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

12 0.71298403 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0

13 0.71146488 2035 andrew gelman stats-2013-09-23-Scalable Stan

14 0.7109139 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

15 0.70052975 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

16 0.699579 2178 andrew gelman stats-2014-01-20-Mailing List Degree-of-Difficulty Difficulty

17 0.69668579 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models

18 0.69482034 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!

19 0.69392484 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

20 0.68250263 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.035), (16, 0.031), (21, 0.029), (24, 0.21), (53, 0.014), (54, 0.042), (58, 0.124), (59, 0.017), (63, 0.017), (86, 0.043), (99, 0.326)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97401166 1886 andrew gelman stats-2013-06-07-Robust logistic regression


2 0.96531999 815 andrew gelman stats-2011-07-22-Statistical inference based on the minimum description length principle

Introduction: Tom Ball writes: Here’s another query to add to the stats backlog…Minimum Description Length (MDL). I’m attaching a 2002 Psych Rev paper on same. Basically, it’s an approach to model selection that replaces goodness of fit with generalizability or complexity. Would be great to get your response to this approach. My reply: I’ve heard about the minimum description length principle for a long time but have never really understood it. So I have nothing to say! Anyone who has anything useful to say on the topic, feel free to add in the comments. The rest of you might wonder why I posted this. I just thought it would be good for you to have some sense of the boundaries of my knowledge.

3 0.96098167 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

Introduction: David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily because this is what I am writing a paper about. However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. For example, using JAGS, and th
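
A minimal Stan sketch of the crossed two-way structure described above (two varying intercepts, no cell-level predictors; the grouping names are placeholders for Hsu's airports and pilots):

data {
  int<lower=0> N;
  int<lower=1> J;                     // e.g., airports
  int<lower=1> K;                     // e.g., pilots
  array[N] int<lower=1, upper=J> jj;  // group 1 index per observation
  array[N] int<lower=1, upper=K> kk;  // group 2 index per observation
  vector[N] y;
}
parameters {
  real mu;
  vector[J] a;
  vector[K] b;
  real<lower=0> sigma_a;
  real<lower=0> sigma_b;
  real<lower=0> sigma_y;
}
model {
  a ~ normal(0, sigma_a);             // crossed varying intercepts
  b ~ normal(0, sigma_b);
  y ~ normal(mu + a[jj] + b[kk], sigma_y);
}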

4 0.95416987 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.

Introduction: Jimmy pointed me to this blog by Drew Conway on word clouds. I don’t have much to say about Conway’s specifics–word clouds aren’t really my thing, but I’m glad that people are thinking about how to do them better–but I did notice one phrase of his that I’ll dispute. Conway writes The best data visualizations should stand on their own . . . I disagree. I prefer the saying, “A picture plus 1000 words is better than two pictures or 2000 words.” That is, I see a positive interaction between words and pictures or, to put it another way, diminishing returns for words or pictures on their own. I don’t have any big theory for this, but I think, when expressed as a joint value function, my idea makes sense. Also, I live this suggestion in my own work. I typically accompany my graphs with long captions and I try to accompany my words with pictures (although I’m not doing it here, because with the software I use, it’s much easier to type more words than to find, scale, and insert i

5 0.95184386 979 andrew gelman stats-2011-10-29-Bayesian inference for the parameter of a uniform distribution

Introduction: Subhash Lele writes: I was wondering if you might know some good references to Bayesian treatment of parameter estimation for U(0,b) type distributions. I am looking for cases where the parameter is on the boundary. I would appreciate any help and advice you could provide. I am, in particular, looking for an MCMC (preferably in WinBUGS) based approach. I figured out the WinBUGS part but I am still curious about the theoretical papers, asymptotics etc. I actually can’t think of any examples! But maybe you, the readers, can. We also should think of the best way to implement this model in Stan. We like to transform to avoid hard boundary constraints, but it seems a bit tacky to do a data-based transformation (which itself would not work if there are latent variables). P.S. I actually saw Lele speak at a statistics conference around 20 years ago. There was a lively exchange between Lele and an older guy who was working on similar problems using a different method. The oth
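
For what it's worth, a minimal Stan sketch of the U(0, b) model, using the data-based hard boundary directly (exactly the construction the post calls a bit tacky):

data {
  int<lower=2> N;                // with a flat prior on b, the posterior is
  vector<lower=0>[N] y;          // proper for N >= 2
}
transformed data {
  real y_max = max(y);
}
parameters {
  real<lower=y_max> b;           // likelihood is zero below max(y)
}
model {
  y ~ uniform(0, b);             // log-likelihood is -N * log(b)
}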

6 0.94618344 119 andrew gelman stats-2010-06-30-Why is George Apley overrated?

7 0.94569975 1428 andrew gelman stats-2012-07-25-The problem with realistic advice?

8 0.94020683 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

9 0.93955135 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

10 0.93943036 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

11 0.93939906 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

12 0.93899626 970 andrew gelman stats-2011-10-24-Bell Labs

13 0.93829304 1465 andrew gelman stats-2012-08-21-D. Buggin

14 0.93789649 86 andrew gelman stats-2010-06-14-“Too much data”?

15 0.93787915 1502 andrew gelman stats-2012-09-19-Scalability in education

16 0.93783069 1941 andrew gelman stats-2013-07-16-Priors

17 0.93759143 1792 andrew gelman stats-2013-04-07-X on JLP

18 0.93727827 899 andrew gelman stats-2011-09-10-The statistical significance filter

19 0.93696845 1191 andrew gelman stats-2012-03-01-Hoe noem je?

20 0.93687499 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)