
1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?


meta info for this blog

Source: html

Introduction: Bill Harris writes: On pp. 250-251 of BDA second edition, you write about multiple comparisons, and you write about stepwise regression on p. 405. How would you look at stepwise regression analyses in light of the multiple comparisons problem? Is there an issue? My reply: In this case I think the right approach is to keep all the coefs but partially pool them toward 0 (after suitable transformation). But then the challenge is coming up with a general way to construct good prior distributions. I’m still thinking about that one! Yet another approach is to put something together purely nonparametrically as with Bart.
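One way to make "keep all the coefs but partially pool them toward 0" concrete is sketched below. The post does not say which prior to use (that is the open question it raises), so this sketch assumes the simplest case: independent normal priors beta_j ~ N(0, tau^2) with a known noise scale sigma, for which the posterior mean has a closed form equivalent to ridge regression. The function name `pooled_coefs`, the simulated data, the values of `tau` and `sigma`, and the choice of standardizing the predictors as the "suitable transformation" are all assumptions made for illustration, and the intercept is omitted for brevity.

```python
import numpy as np

def pooled_coefs(X, y, tau, sigma=1.0):
    """Posterior mean of beta for y ~ N(X beta, sigma^2 I), beta_j ~ N(0, tau^2).

    Assumed model for illustration only; equivalent to ridge regression
    with penalty (sigma / tau)^2.
    """
    p = X.shape[1]
    lam = (sigma / tau) ** 2
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical data: 10 candidate predictors, only the first two matter.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # put predictors on a common scale
beta_true = np.concatenate([[2.0, -1.0], np.zeros(p - 2)])
y = X @ beta_true + rng.normal(size=n)

no_pooling = pooled_coefs(X, y, tau=1e6)  # very wide prior: essentially least squares
pooled = pooled_coefs(X, y, tau=0.5)      # tighter prior: every coefficient shrinks

for j in range(p):
    print(f"beta[{j}]  no pooling: {no_pooling[j]: .3f}   pooled: {pooled[j]: .3f}")
```

Unlike stepwise selection, no predictor is dropped: every coefficient stays in the model, and the prior scale `tau` controls how strongly the estimates are pulled toward zero. In a fully hierarchical version `tau` would itself be estimated from the data rather than fixed as it is here.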


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 On pp. 250-251 of BDA second edition, you write about multiple comparisons, and you write about stepwise regression on p. 405. [sent-2, score-1.221]

2 How would you look at stepwise regression analyses in light of the multiple comparisons problem? [sent-4, score-1.423]

3 My reply: In this case I think the right approach is to keep all the coefs but partially pool them toward 0 (after suitable transformation). [sent-6, score-1.313]

4 But then the challenge is coming up with a general way to construct good prior distributions. [sent-7, score-0.657]

5 Yet another approach is to put something together purely nonparametrically as with Bart. [sent-9, score-0.614]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('stepwise', 0.467), ('bart', 0.242), ('coefs', 0.242), ('comparisons', 0.227), ('transformation', 0.211), ('multiple', 0.204), ('harris', 0.204), ('suitable', 0.187), ('partially', 0.185), ('pool', 0.174), ('regression', 0.172), ('construct', 0.171), ('approach', 0.171), ('bda', 0.167), ('edition', 0.159), ('purely', 0.153), ('write', 0.147), ('challenge', 0.14), ('light', 0.138), ('bill', 0.124), ('analyses', 0.116), ('toward', 0.111), ('together', 0.107), ('yet', 0.1), ('coming', 0.1), ('keep', 0.095), ('prior', 0.091), ('issue', 0.089), ('second', 0.084), ('thinking', 0.081), ('reply', 0.078), ('look', 0.069), ('general', 0.069), ('put', 0.067), ('another', 0.064), ('still', 0.061), ('right', 0.061), ('problem', 0.059), ('case', 0.058), ('something', 0.052), ('good', 0.044), ('way', 0.042), ('writes', 0.04), ('would', 0.03), ('think', 0.029), ('one', 0.027)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?


2 0.39441055 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

Introduction: Haynes Goddard writes: I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot. I learned in econometrics that stepwise is poor practice, as it defaults to the “theory of the regression line”, that is no theory at all, just the variation in the data. I don’t find the topic on your blog, and wonder if you have addressed the issue. My reply: Stepwise regression is one of these things, like outlier detection and pie charts, which appear to be popular among non-statisticians but are considered by statisticians to be a bit of a joke. For example, Jennifer and I don’t mention stepwise regression in our book, not even once. To address the issue more directly: the motivation behind stepwise regression is that you have a lot of potential predictors but not e

3 0.2311337 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

Introduction: Joe Northrup writes: I have a question about correcting for multiple comparisons in a Bayesian regression model. I believe I understand the argument in your 2012 paper in Journal of Research on Educational Effectiveness that when you have a hierarchical model there is shrinkage of estimates towards the group-level mean and thus there is no need to add any additional penalty to correct for multiple comparisons. In my case I do not have hierarchically structured data—i.e. I have only 1 observation per group but have a categorical variable with a large number of categories. Thus, I am fitting a simple multiple regression in a Bayesian framework. Would putting a strong, mean 0, multivariate normal prior on the betas in this model accomplish the same sort of shrinkage (it seems to me that it would) and do you believe this is a valid way to address criticism of multiple comparisons in this setting? My reply: Yes, I think this makes sense. One way to address concerns of multiple com

4 0.16996181 2356 andrew gelman stats-2014-06-02-On deck this week

Introduction: Mon: Why we hate stepwise regression Tues: Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons Wed: All the Assumptions That Are My Life Thurs: Identifying pathways for managing multiple disturbances to limit plant invasions Fri: Statistically savvy journalism Sat: “Does researching casual marijuana use cause brain abnormalities?” Sun: Regression and causality and variable ordering

5 0.15085152 2348 andrew gelman stats-2014-05-26-On deck this week

Introduction: Mon: WAIC and cross-validation in Stan! Tues: A whole fleet of gremlins: Looking more carefully at Richard Tol’s twice-corrected paper, “The Economic Effects of Climate Change” Wed: Just wondering Thurs: When you believe in things that you don’t understand Fri: I posted this as a comment on a sociology blog Sat: “Building on theories used to describe magnets, scientists have put together a model that captures something very different . . .” Sun: Why we hate stepwise regression

6 0.12828232 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

7 0.11768223 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

8 0.11572986 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

9 0.1026811 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

10 0.10010361 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

11 0.098557457 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

12 0.097521499 1691 andrew gelman stats-2013-01-25-Extreem p-values!

13 0.096150883 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James

14 0.095638245 848 andrew gelman stats-2011-08-11-That xkcd cartoon on multiple comparisons that all of you were sending me a couple months ago

15 0.094473839 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

16 0.092738517 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

17 0.087235473 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

18 0.086883411 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons

19 0.084943652 1725 andrew gelman stats-2013-02-17-“1.7%” ha ha ha

20 0.084837779 247 andrew gelman stats-2010-09-01-How does Bayes do it?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.123), (1, 0.06), (2, 0.02), (3, -0.006), (4, 0.034), (5, -0.033), (6, 0.044), (7, -0.008), (8, 0.011), (9, 0.063), (10, 0.025), (11, 0.043), (12, 0.048), (13, 0.015), (14, 0.05), (15, 0.012), (16, -0.049), (17, -0.011), (18, 0.006), (19, -0.002), (20, -0.003), (21, 0.063), (22, 0.007), (23, 0.04), (24, -0.004), (25, 0.004), (26, 0.058), (27, -0.088), (28, -0.041), (29, -0.045), (30, 0.049), (31, 0.026), (32, 0.059), (33, 0.058), (34, -0.015), (35, -0.012), (36, 0.039), (37, 0.049), (38, -0.041), (39, -0.008), (40, 0.044), (41, 0.116), (42, -0.039), (43, -0.081), (44, 0.057), (45, 0.012), (46, -0.058), (47, 0.027), (48, 0.007), (49, -0.113)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97338861 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?


2 0.78652382 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression


3 0.71128643 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model


4 0.70439786 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

Introduction: Denis Cote writes: Just read this today and my unsophisticated statistical mind is confused. “Initial bivariate analyses suggest that union membership is actually associated with worse health. This association disappears when controlling for demographics, then reverses and becomes significant when controlling for labor market characteristics.” From my education about statistics, I remember to be suspicious about multiple regression coefficients that are in the opposite direction of the bivariate coefficients. What am I missing? I vaguely remember something about the suppression effect. My reply: There’s a long literature on this from many decades ago. My general feeling about such situations is that, when the coefficient changes a lot after controlling for other variables, it is important to visualize this change, to understand what is the interaction among variables that is associated with the change in the coefficients. This is what we did in our Red State Blue State

5 0.68481576 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

Introduction: Matthew Bogard writes: Regarding the book Mostly Harmless Econometrics, you state: A casual reader of the book might be left with the unfortunate impression that matching is a competitor to regression rather than a tool for making regression more effective. But in fact isn’t that what they are arguing, that, in a ‘mostly harmless way’ regression is in fact a matching estimator itself? “Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance” (Chapter 3 p. 70) They seem to be distinguishing regression (without prior matching) from all other types of matching techniques, and therefore implying that regression can be a ‘mostly harmless’ substitute or competitor to matching. My previous understanding, before starting this book was as you say, that matching is a tool that makes regression more effective. I have n

6 0.68403685 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

7 0.64843905 1445 andrew gelman stats-2012-08-06-Slow progress

8 0.647075 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!

9 0.61539084 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

10 0.60894531 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

11 0.60845298 848 andrew gelman stats-2011-08-11-That xkcd cartoon on multiple comparisons that all of you were sending me a couple months ago

12 0.60817683 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

13 0.60234636 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

14 0.58346045 1849 andrew gelman stats-2013-05-09-Same old same old

15 0.58282298 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

16 0.57427424 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions

17 0.57100618 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

18 0.56564975 2356 andrew gelman stats-2014-06-02-On deck this week

19 0.56284428 1691 andrew gelman stats-2013-01-25-Extreem p-values!

20 0.55992687 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(11, 0.038), (21, 0.025), (24, 0.166), (42, 0.327), (65, 0.027), (82, 0.029), (99, 0.249)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9115845 808 andrew gelman stats-2011-07-18-The estimated effect size is implausibly large. Under what models is this a piece of evidence that the true effect is small?

Introduction: Paul Pudaite writes in response to my discussion with Bartels regarding effect sizes and measurement error models: You [Gelman] wrote: “I actually think there will be some (non-Gaussian) models for which, as y gets larger, E(x|y) can actually go back toward zero.” I [Pudaite] encountered this phenomenon some time in the ’90s. See this graph which shows the conditional expectation of X given Z, when Z = X + Y and the probability density functions of X and Y are, respectively, exp(-x^2) and 1/(y^2+1) (times appropriate constants). As the magnitude of Z increases, E[X|Z] shrinks to zero. I wasn’t sure it was worth the effort to try to publish a two paragraph paper. I suspect that this is true whenever the tail of one distribution is ‘sufficiently heavy’ with respect to the tail of the other. Hmm, I suppose there might be enough substance in a paper that attempted to characterize this outcome for, say, unimodal symmetric distributions. Maybe someone can do this? I think i
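The phenomenon Pudaite describes can be checked numerically. The sketch below takes the densities stated in the excerpt, X proportional to exp(-x^2) and Y proportional to 1/(y^2+1), with Z = X + Y, and evaluates E[X|Z=z] as a ratio of two integrals on a grid. The function name `cond_mean_x_given_z`, the grid width, and the grid resolution are assumptions chosen for illustration; this is a rough check, not a reproduction of the graph mentioned in the post.

```python
import numpy as np

def cond_mean_x_given_z(z, half_width=10.0, n_grid=200_001):
    """Approximate E[X | Z = z] for Z = X + Y on a uniform grid.

    X has density proportional to exp(-x^2) and Y has density proportional
    to 1 / (1 + y^2). The normalising constants and the grid spacing cancel
    in the ratio, so plain sums are enough.
    """
    x = np.linspace(-half_width, half_width, n_grid)
    weight = np.exp(-x**2) / (1.0 + (z - x) ** 2)  # p(x) * p(y = z - x), unnormalised
    return float(np.sum(x * weight) / np.sum(weight))

for z in (0.0, 1.0, 2.0, 5.0, 10.0, 50.0):
    print(f"z = {z:5.1f}   E[X|Z=z] ~ {cond_mean_x_given_z(z):.4f}")
```

For large |z| the conditional mean falls back toward zero, roughly like 1/z: with a heavy-tailed Y, an extreme value of Z is better explained by a large Y than by a large X.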

2 0.89196181 1002 andrew gelman stats-2011-11-10-“Venetia Orcutt, GWU med school professor, quits after complaints of no-show class”

Introduction: She was assigned to teach a class in “evidence-based medicine”! (link from my usual news source). I wonder what was in the syllabus? If anyone has a copy, feel free to send to me and I will post it here. My favorite part of the story, though, is this: Almost all physician assistant students refused to comment to a reporter Tuesday, saying they’d been told by the department not to talk to media. Talk about obedience to authority! They’re studying in a program that offers nonexistent courses, but then they follow the department’s gag order.

3 0.87107408 124 andrew gelman stats-2010-07-02-Note to the quals

Introduction: See here for latest rant.

same-blog 4 0.86263782 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?


5 0.83922911 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??

Introduction: Dan Goldstein asks what I think of this: My reply: It’s hard for me to imagine a compelling reason for anyone to go to London, Ontario–but, hey, I guess there’s all kinds of people in this world! More seriously, I see the appeal of the graph but it’s a bit busy for my taste. Over the years I’ve moved toward small multiples rather than single busy graphs. That’s one reason why I prefer Tufte’s second book to his first book. The Napoleon-in-Russia graph is a bad model, in that it inspires people to try to cram lots of variables on a single graph. Dan wrote back: I [Dan] like it as a travel planning graph, it gives you what you want to know (how hot will the days be, how cold will the nights be, will it rain) but is a bit easier on the brain than a table of highs and lows. Also makes it easy to see the trend. I agree the 2nd axis doesn’t help.

6 0.83920866 1233 andrew gelman stats-2012-03-27-Pushback against internet self-help gurus

7 0.83337939 1775 andrew gelman stats-2013-03-23-In which I disagree with John Maynard Keynes

8 0.8252399 1791 andrew gelman stats-2013-04-07-Scatterplot charades!

9 0.82287365 307 andrew gelman stats-2010-09-29-“Texting bans don’t reduce crashes; effects are slight crash increases”

10 0.82056022 60 andrew gelman stats-2010-05-30-What Auteur Theory and Freshwater Economics have in common

11 0.8053382 590 andrew gelman stats-2011-02-25-Good introductory book for statistical computation?

12 0.80148757 1138 andrew gelman stats-2012-01-25-Chris Schmid on Evidence Based Medicine

13 0.7982198 713 andrew gelman stats-2011-05-15-1-2 social scientist + 1-2 politician = ???

14 0.78431857 483 andrew gelman stats-2010-12-23-Science, ideology, and human origins

15 0.77799249 1060 andrew gelman stats-2011-12-15-Freakonomics: What went wrong?

16 0.77460968 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling

17 0.77386731 111 andrew gelman stats-2010-06-26-Tough love as a style of writing

18 0.76951134 1692 andrew gelman stats-2013-01-25-Freakonomics Experiments

19 0.76783538 2015 andrew gelman stats-2013-09-10-The ethics of lying, cheating, and stealing with data: A case study

20 0.76293874 1936 andrew gelman stats-2013-07-13-Economic policy does not occur in a political vacuum