andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1849 knowledge-graph by maker-knowledge-mining

1849 andrew gelman stats-2013-05-09-Same old same old


meta info for this blog

Source: html

Introduction: In an email I sent to a colleague who’s writing about lasso and Bayesian regression for R users: The one thing you might want to add, to fit with your pragmatic perspective, is to point out that these different methods are optimal under different assumptions about the data. However, these assumptions are never true (even in the rare cases where you have a believable prior, it won’t really follow the functional form assumed by bayesglm; even in the rare cases where you have a real loss function, it won’t really follow the mathematical form assumed by lasso, etc.), but these methods can still be useful and be given the interpretation of regularized estimates. Another thing that someone might naively think is that regularization is fine but “unbiased” is somehow the most honest. In practice, if you stick to “unbiased” methods such as least squares, you’ll restrict the number of variables you can include in your model. So in reality you suffer from omitted-variable bias. So th
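The post talks in terms of R tools (bayesglm, lasso), but as a rough sketch of the point that different penalties encode different assumptions, here is a small Python/scikit-learn analogue; the simulated data, estimator choices, and tuning values below are illustrative assumptions, not anything taken from the post.

import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, BayesianRidge

# Simulated regression with more predictors than the data can pin down precisely:
# only a few coefficients are truly nonzero.
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.normal(size=n)

# "Unbiased" least squares: no shrinkage, noisy estimates with this many predictors.
ols = LinearRegression().fit(X, y)

# Lasso: L1 penalty (sum of absolute coefficients); shrinks some coefficients exactly to zero.
lasso = Lasso(alpha=0.1).fit(X, y)

# Bayesian ridge: Gaussian-type shrinkage toward zero, a loose stand-in for the
# weakly informative priors behind bayesglm.
bayes = BayesianRidge().fit(X, y)

for name, est in [("OLS", ols), ("lasso", lasso), ("Bayesian ridge", bayes)]:
    print(f"{name:15s} sum of squared coefficient errors: {np.sum((est.coef_ - beta) ** 2):.3f}")

None of the penalties here matches the “true” data-generating process, which is exactly the email’s point: in setups like this the regularized fits tend to recover the coefficients at least as well as the unbiased fit, and they remain usable as the number of predictors grows.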


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In an email I sent to a colleague who’s writing about lasso and Bayesian regression for R users: The one thing you might want to add, to fit with your pragmatic perspective, is to point out that these different methods are optimal under different assumptions about the data. [sent-1, score-1.361]

2 Another thing that someone might naively think is that regularization is fine but “unbiased” is somehow the most honest. [sent-3, score-0.726]

3 In practice, if you stick to “unbiased” methods such as least squares, you’ll restrict the number of variables you can include in your model. [sent-4, score-0.454]

4 So in reality you suffer from omitted-variable bias. [sent-5, score-0.211]

5 It’s not like the user can simply do unregularized regression and then think of regularization as a frill. [sent-7, score-0.846]

6 The practitioner who uses unregularized regression has already essentially made a compromise with the devil by restricting the number of predictors in the model to a “manageable” level (whatever that means). [sent-8, score-1.437]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('unregularized', 0.367), ('lasso', 0.243), ('regularization', 0.223), ('unbiased', 0.217), ('assumed', 0.191), ('rare', 0.177), ('believable', 0.167), ('devil', 0.167), ('manageable', 0.167), ('regression', 0.161), ('practitioner', 0.158), ('assumptions', 0.146), ('regularized', 0.146), ('methods', 0.146), ('bayesglm', 0.138), ('pragmatic', 0.132), ('follow', 0.131), ('restricting', 0.129), ('compromise', 0.125), ('form', 0.125), ('naively', 0.123), ('cases', 0.123), ('won', 0.122), ('suffer', 0.121), ('functional', 0.113), ('optimal', 0.107), ('stick', 0.106), ('squares', 0.104), ('restrict', 0.104), ('safe', 0.099), ('number', 0.098), ('user', 0.095), ('loss', 0.093), ('users', 0.09), ('reality', 0.09), ('somehow', 0.088), ('colleague', 0.085), ('home', 0.081), ('uses', 0.08), ('interpretation', 0.08), ('predictors', 0.079), ('mathematical', 0.076), ('thing', 0.075), ('function', 0.073), ('essentially', 0.073), ('etc', 0.071), ('email', 0.069), ('different', 0.066), ('practice', 0.066), ('sent', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1849 andrew gelman stats-2013-05-09-Same old same old

Introduction: In an email I sent to a colleague who’s writing about lasso and Bayesian regression for R users: The one thing you might want to add, to fit with your pragmatic perspective, is to point out that these different methods are optimal under different assumptions about the data. However, these assumptions are never true (even in the rare cases where you have a believable prior, it won’t really follow the functional form assumed by bayesglm; even in the rare cases where you have a real loss function, it won’t really follow the mathematical form assumed by lasso, etc.), but these methods can still be useful and be given the interpretation of regularized estimates. Another thing that someone might naively think is that regularization is fine but “unbiased” is somehow the most honest. In practice, if you stick to “unbiased” methods such as least squares, you’ll restrict the number of variables you can include in your model. So in reality you suffer from omitted-variable bias. So th

2 0.276216 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

Introduction: Lasso and me For a long time I was wrong about lasso. Lasso (“least absolute shrinkage and selection operator”) is a regularization procedure that shrinks regression coefficients toward zero, and in its basic form is equivalent to maximum penalized likelihood estimation with a penalty function that is proportional to the sum of the absolute values of the regression coefficients. I first heard about lasso from a talk that Trevor Hastie and Rob Tibshirani gave at Berkeley in 1994 or 1995. They demonstrated that it shrunk regression coefficients to zero. I wasn’t impressed, first because it seemed like no big deal (if that’s the prior you use, that’s the shrinkage you get) and second because, from a Bayesian perspective, I don’t want to shrink things all the way to zero. In the sorts of social and environmental science problems I’ve worked on, just about nothing is zero. I’d like to control my noisy estimates but there’s nothing special about zero. At the end of the talk I stood
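For reference, the basic form described in the entry above (penalized maximum likelihood with a penalty proportional to the sum of the absolute coefficient values) corresponds, in the linear-regression case, to the usual lasso objective, with tuning parameter \lambda \ge 0:

\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{n} \big( y_i - x_i^{\top}\beta \big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\}

This is the standard Bayesian reading of the penalty as well: the lasso estimate is the posterior mode under independent double-exponential (Laplace) priors on the coefficients, in line with the “if that’s the prior you use, that’s the shrinkage you get” remark above.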

3 0.20982553 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

Introduction: Some things I respect When it comes to meta-models of statistics, here are two philosophies that I respect: 1. (My) Bayesian approach, which I associate with E. T. Jaynes, in which you construct models with strong assumptions, ride your models hard, check their fit to data, and then scrap them and improve them as necessary. 2. At the other extreme, model-free statistical procedures that are designed to work well under very weak assumptions—for example, instead of assuming a distribution is Gaussian, you would just want the procedure to work well under some conditions on the smoothness of the second derivative of the log density function. Both the above philosophies recognize that (almost) all important assumptions will be wrong, and they resolve this concern via aggressive model checking or via robustness. And of course there are intermediate positions, such as working with Bayesian models that have been shown to be robust, and then still checking them. Or, to flip it arou

4 0.13501443 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

Introduction: Robert Bell pointed me to this post by Brad De Long on Bayesian statistics, and then I also noticed this from Noah Smith, who wrote: My impression is that although the Bayesian/Frequentist debate is interesting and intellectually fun, there’s really not much “there” there… despite being so-hip-right-now, Bayesian is not the Statistical Jesus. I’m happy to see the discussion going in this direction. Twenty-five years ago or so, when I got into this biz, there were some serious anti-Bayesian attitudes floating around in mainstream statistics. Discussions in the journals sometimes devolved into debates of the form, “Bayesians: knaves or fools?”. You’d get all sorts of free-floating skepticism about any prior distribution at all, even while people were accepting without question (and doing theory on) logistic regressions, proportional hazards models, and all sorts of strong strong models. (In the subfield of survey sampling, various prominent researchers would refuse to mode

5 0.12380107 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

Introduction: Elias Bareinboim asked what I thought about his comment on selection bias in which he referred to a paper by himself and Judea Pearl, “Controlling Selection Bias in Causal Inference.” I replied that I have no problem with what he wrote, but that from my perspective I find it easier to conceptualize such problems in terms of multilevel models. I elaborated on that point in a recent post, “Hierarchical modeling as a framework for extrapolation,” which I think was read by only a few people (I say this because it received only two comments). I don’t think Bareinboim objected to anything I wrote, but like me he is comfortable working within his own framework. He wrote the following to me: In some sense, “not ad hoc” could mean logically consistent. In other words, if one agrees with the assumptions encoded in the model, one must also agree with the conclusions entailed by these assumptions. I am not aware of any other way of doing mathematics. As it turns out, to get causa

6 0.11029921 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

7 0.10288054 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics

8 0.09639816 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence

9 0.095594972 1469 andrew gelman stats-2012-08-25-Ways of knowing

10 0.091641054 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

11 0.090022244 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions

12 0.089351542 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

13 0.089147545 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo

14 0.088793077 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

15 0.086410403 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

16 0.085601687 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

17 0.083829977 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

18 0.083344728 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

19 0.08327239 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

20 0.082665116 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.155), (1, 0.094), (2, -0.008), (3, 0.024), (4, 0.003), (5, 0.008), (6, 0.025), (7, -0.026), (8, 0.035), (9, 0.029), (10, 0.018), (11, -0.011), (12, 0.013), (13, 0.003), (14, 0.014), (15, 0.02), (16, -0.02), (17, -0.024), (18, -0.031), (19, 0.012), (20, 0.011), (21, 0.004), (22, 0.039), (23, 0.022), (24, -0.008), (25, 0.037), (26, 0.059), (27, -0.033), (28, -0.037), (29, 0.025), (30, 0.05), (31, 0.062), (32, 0.015), (33, 0.036), (34, 0.008), (35, -0.047), (36, -0.008), (37, -0.012), (38, -0.019), (39, -0.015), (40, 0.014), (41, -0.005), (42, 0.025), (43, -0.048), (44, 0.072), (45, 0.001), (46, -0.043), (47, 0.012), (48, 0.018), (49, 0.008)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9766537 1849 andrew gelman stats-2013-05-09-Same old same old

Introduction: In an email I sent to a colleague who’s writing about lasso and Bayesian regression for R users: The one thing you might want to add, to fit with your pragmatic perspective, is to point out that these different methods are optimal under different assumptions about the data. However, these assumptions are never true (even in the rare cases where you have a believable prior, it won’t really follow the functional form assumed by bayesglm; even in the rare cases where you have a real loss function, it won’t really follow the mathematical form assumed by lasso, etc.), but these methods can still be useful and be given the interpretation of regularized estimates. Another thing that someone might naively think is that regularization is fine but “unbiased” is somehow the most honest. In practice, if you stick to “unbiased” methods such as least squares, you’ll restrict the number of variables you can include in your model. So in reality you suffer from omitted-variable bias. So th

2 0.81111509 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

Introduction: Andy Cooper writes: A link to an article, “Four Assumptions Of Multiple Regression That Researchers Should Always Test”, has been making the rounds on Twitter. Their first rule is “Variables are Normally distributed.” And they seem to be talking about the independent variables – but then later bring in tests on the residuals (while admitting that the normally-distributed error assumption is a weak assumption). I thought we had long since moved away from transforming our independent variables to make them normally distributed for statistical reasons (as opposed to standardizing them for interpretability, etc.) Am I missing something? I agree that leverage and influence are important, but normality of the variables? The article is from 2002, so it might be dated, but given the popularity of the tweet, I thought I’d ask your opinion. My response: There’s some useful advice on that page but overall I think the advice was dated even in 2002. In section 3.6 of my book wit

3 0.79767364 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

Introduction: Matthew Bogard writes: Regarding the book Mostly Harmless Econometrics, you state: A casual reader of the book might be left with the unfortunate impression that matching is a competitor to regression rather than a tool for making regression more effective. But in fact isn’t that what they are arguing, that, in a ‘mostly harmless way’ regression is in fact a matching estimator itself? “Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance” (Chapter 3 p. 70) They seem to be distinguishing regression (without prior matching) from all other types of matching techniques, and therefore implying that regression can be a ‘mostly harmless’ substitute or competitor to matching. My previous understanding, before starting this book, was, as you say, that matching is a tool that makes regression more effective. I have n

4 0.79151303 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

Introduction: Sam Seaver writes: I’m a graduate student in computational biology, and I’m relatively new to advanced statistics, and am trying to teach myself how best to approach a problem I have. My dataset is a small sparse matrix of 150 cases and 70 predictors; it is sparse as in many zeros, not many ‘NA’s. Each case is a nutrient that is fed into an in silico organism, and its response is whether or not it stimulates growth, and each predictor is one of 70 different pathways that the nutrient may or may not belong to. Because all of the nutrients do not belong to all of the pathways, there are thus many zeros in my matrix. My goal is to be able to use the pathways themselves to predict whether or not a nutrient could stimulate growth, thus I wanted to compute regression coefficients for each pathway, which I could then apply to other nutrients for other species. There are quite a few singularities in the dataset (summary(glm) reports that 14 coefficients are not defined because of sin
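The setup in the entry above (a sparse 150 × 70 binary pathway matrix, a binary growth response, and glm coefficients left undefined by collinearity) is the kind of situation where a penalized logistic regression keeps every coefficient defined. Here is a rough illustrative sketch in Python/scikit-learn, using simulated stand-in data rather than the poster’s dataset; all settings are assumptions for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 150, 70
X = (rng.random((n, p)) < 0.15).astype(float)   # sparse 0/1 pathway membership
X[:, 60:] = X[:, :10]                           # duplicated columns make the design rank-deficient
logit = 2.0 * X[:, 0] - 1.5 * X[:, 1] - 0.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Unpenalized maximum likelihood would leave the redundant coefficients undefined
# (as summary(glm) reports in R); an L2 penalty keeps all of them finite.
fit = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
print(fit.coef_.shape)   # (1, 70): every pathway gets a (shrunken) coefficient

The L2 penalty is just one choice; an L1 (lasso-type) penalty would instead zero out pathways carrying no signal, which may be closer to what the questioner wants when predicting for other nutrients and species.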

5 0.76791114 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

Introduction: Haynes Goddard writes: I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot. I learned in econometrics that stepwise is poor practice, as it defaults to the “theory of the regression line”, that is no theory at all, just the variation in the data. I don’t find the topic on your blog, and wonder if you have addressed the issue. My reply: Stepwise regression is one of these things, like outlier detection and pie charts, which appear to be popular among non-statisticans but are considered by statisticians to be a bit of a joke. For example, Jennifer and I don’t mention stepwise regression in our book, not even once. To address the issue more directly: the motivation behind stepwise regression is that you have a lot of potential predictors but not e

6 0.76349962 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

7 0.76141661 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

8 0.75232333 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

9 0.7309947 2110 andrew gelman stats-2013-11-22-A Bayesian model for an increasing function, in Stan!

10 0.73042876 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

11 0.71247464 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!

12 0.7114498 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression

13 0.70922619 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

14 0.69944185 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

15 0.69502395 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions

16 0.68334675 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

17 0.67650312 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?

18 0.67645818 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

19 0.67489624 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

20 0.6747281 553 andrew gelman stats-2011-02-03-is it possible to “overstratify” when assigning a treatment in a randomized control trial?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.016), (6, 0.013), (15, 0.021), (16, 0.075), (21, 0.026), (24, 0.185), (33, 0.147), (54, 0.017), (69, 0.029), (72, 0.041), (84, 0.016), (86, 0.061), (89, 0.023), (99, 0.234)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93922877 1849 andrew gelman stats-2013-05-09-Same old same old

Introduction: In an email I sent to a colleague who’s writing about lasso and Bayesian regression for R users: The one thing you might want to add, to fit with your pragmatic perspective, is to point out that these different methods are optimal under different assumptions about the data. However, these assumptions are never true (even in the rare cases where you have a believable prior, it won’t really follow the functional form assumed by bayesglm; even in the rare cases where you have a real loss function, it won’t really follow the mathematical form assumed by lasso, etc.), but these methods can still be useful and be given the interpretation of regularized estimates. Another thing that someone might naively think is that regularization is fine but “unbiased” is somehow the most honest. In practice, if you stick to “unbiased” methods such as least squares, you’ll restrict the number of variables you can include in your model. So in reality you suffer from omitted-variable bias. So th

2 0.89193249 346 andrew gelman stats-2010-10-16-Mandelbrot and Akaike: from taxonomy to smooth runways (pioneering work in fractals and self-similarity)

Introduction: Mandelbrot on taxonomy (from 1955; the first publication about fractals that I know of): Searching for Mandelbrot on the blog led me to Akaike, who also recently passed away and also did interesting early work on self-similar stochastic processes. For example, this wonderful opening of his 1962 paper, “On a limiting process which asymptotically produces f^{-2} spectral density”: In the recent papers in which the results of the spectral analyses of roughnesses of runways or roadways are reported, the power spectral densities of approximately the form f^{-2} (f: frequency) are often treated. This fact directed the present author to the investigation of the limiting process which will provide the f^{-2} form under fairly general assumptions. In this paper a very simple model is given which explains a way how the f^{-2} form is obtained asymptotically. Our fundamental model is that the stochastic process, which might be considered to represent the roughness of the runway

3 0.8836875 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q
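For reference, the logistic item-response model the exam question has in mind, with ability \theta_i for student i and difficulty b_j and discrimination a_j for question j, is conventionally written as (standard notation, not quoted from the exam):

\Pr(y_{ij} = 1) = \operatorname{logit}^{-1}\!\big( a_j (\theta_i - b_j) \big)

Under this monotone form, a large a_j means the probability of a correct answer rises sharply with ability in the neighborhood of the difficulty b_j, which is the property the statements above are testing.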

4 0.88291585 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

Introduction: We had some questions on the Stan list regarding identification. The topic arose because people were fitting models with improper posterior distributions, the kind of model where there’s a ridge in the likelihood and the parameters are not otherwise constrained. I tried to help by writing something on Bayesian identifiability for the Stan list. Then Ben Goodrich came along and cleaned up what I wrote. I think this might be of interest to many of you so I’ll repeat the discussion here. Here’s what I wrote: Identification is actually a tricky concept and is not so clearly defined. In the broadest sense, a Bayesian model is identified if the posterior distribution is proper. Then one can do Bayesian inference and that’s that. No need to require a finite variance or even a finite mean, all that’s needed is a finite integral of the probability distribution. That said, there are some reasons why a stronger definition can be useful: 1. Weak identification. Suppose that, wit
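In the broad sense used in the entry above, “the posterior distribution is proper” just means the unnormalized posterior has a finite integral; in standard notation (assumed here, not quoted from the post):

\int p(y \mid \theta)\, p(\theta)\, d\theta < \infty

Equivalently, the product of likelihood and prior can be normalized to a genuine probability distribution over \theta, even if that distribution has no finite mean or variance.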

5 0.8826344 846 andrew gelman stats-2011-08-09-Default priors update?

Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a

6 0.88203526 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

7 0.88192904 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

8 0.88130116 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

9 0.87947965 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

10 0.87946153 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

11 0.87830436 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

12 0.87778366 1240 andrew gelman stats-2012-04-02-Blogads update

13 0.87603438 1883 andrew gelman stats-2013-06-04-Interrogating p-values

14 0.87599719 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

15 0.87586838 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

16 0.87575936 899 andrew gelman stats-2011-09-10-The statistical significance filter

17 0.87564015 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

18 0.87512195 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

19 0.87481254 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

20 0.87474775 727 andrew gelman stats-2011-05-23-My new writing strategy