andrew_gelman_stats-2011-960 knowledge-graph by maker-knowledge-mining

960 andrew gelman stats-2011-10-15-The bias-variance tradeoff


meta info for this blog

Source: html

Introduction: Joshua Vogelstein asks for my thoughts as a Bayesian on the above topic. So here they are (briefly): The concept of the bias-variance tradeoff can be useful if you don’t take it too seriously. The basic idea is as follows: if you’re estimating something, you can slice your data finer and finer, or perform more and more adjustments, each time getting a purer—and less biased—estimate. But each subdivision or each adjustment reduces your sample size or increases potential estimation error, hence the variance of your estimate goes up. That story is real. In lots and lots of examples, there’s a continuum between a completely unadjusted general estimate (high bias, low variance) and a specific, focused, adjusted estimate (low bias, high variance). Suppose, for example, you’re using data from a large experiment to estimate the effect of a treatment on a fairly narrow group, say, white men between the ages of 45 and 50. At one extreme, you could just take the estimated treatment effect for the entire population, which could have high bias (to the extent the effect varies by age, sex, and ethnicity) but low variance (because you’re using all the data). …
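
To make the continuum above concrete, here is a minimal simulation sketch in Python (not from the original post; the effect sizes, group sizes, and noise level are made-up illustrative numbers) comparing the two extremes: the pooled estimate that uses all the data versus an estimate computed from the narrow subgroup alone.

# Minimal sketch, not from the original post: compare the pooled estimate
# (low variance, biased for the subgroup) with the subgroup-only estimate
# (unbiased, high variance). All numbers below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_effect_subgroup = 0.5    # hypothetical effect in the narrow subgroup
true_effect_others = 0.1      # hypothetical effect in the rest of the sample
n_subgroup, n_others, n_sims = 50, 950, 5000

pooled_errors, subgroup_errors = [], []
for _ in range(n_sims):
    # Simulated unit-level effect estimates with noise standard deviation 1.
    y_sub = true_effect_subgroup + rng.normal(0, 1, n_subgroup)
    y_oth = true_effect_others + rng.normal(0, 1, n_others)
    pooled_est = np.concatenate([y_sub, y_oth]).mean()   # uses all the data
    subgroup_est = y_sub.mean()                          # uses only the subgroup
    pooled_errors.append(pooled_est - true_effect_subgroup)
    subgroup_errors.append(subgroup_est - true_effect_subgroup)

for name, errs in [("pooled", pooled_errors), ("subgroup-only", subgroup_errors)]:
    errs = np.asarray(errs)
    print(f"{name:14s} bias = {errs.mean():+.3f}, sd = {errs.std():.3f}")

With these made-up numbers the pooled estimate should come out biased by roughly -0.4 but with a standard deviation several times smaller than that of the subgroup-only estimate, which is exactly the tradeoff the post describes.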


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 So here they are (briefly): The concept of the bias-variance tradeoff can be useful if you don’t take it too seriously. [sent-2, score-0.338]

2 The basic idea is as follows: if you’re estimating something, you can slice your data finer and finer, or perform more and more adjustments, each time getting a purer—and less biased—estimate. [sent-3, score-0.398]

3 But each subdivision or each adjustment reduces your sample size or increases potential estimation error, hence the variance of your estimate goes up. [sent-4, score-0.711]

4 In lots and lots of examples, there’s a continuum between a completely unadjusted general estimate (high bias, low variance) and a specific, focused, adjusted estimate (low bias, high variance). [sent-6, score-1.418]

5 Suppose, for example, you’re using data from a large experiment to estimate the effect of a treatment on a fairly narrow group, say, white men between the ages of 45 and 50. [sent-7, score-0.844]

6 At one extreme, you could just take the estimated treatment effect for the entire population, which could have high bias (to the extent the effect varies by age, sex, and ethnicity) but low variance (because you’re using all the data). [sent-8, score-1.193]

7 At the other extreme, you could form an estimate using only data from the group in question, which would then be unbiased (assuming an appropriate experimental design) but would have a high variance. [sent-9, score-0.946]

8 In between are various model-based solutions such as Mister P. [sent-10, score-0.068]

9 The bit about the bias-variance tradeoff that I don’t buy is that a researcher can feel free to move along this efficient frontier, with the choice of estimate being somewhat of a matter of taste. [sent-11, score-0.783]

10 The idea is that a conservative serious scientist type might prefer something unbiased, whereas a risk-lover might accept some trade-off. [sent-12, score-0.169]

11 One of the difficulties with this conventional view is that in some settings the unbiased estimate is taken to be the conservative choice while in other places it is considered more reasonable to go for low variance. [sent-13, score-1.015]

12 Lots of classically-minded statistics textbooks will give you the sense that the unbiased estimate is the safe, sober choice. [sent-14, score-0.81]

13 But then when it comes to subgroup analysis, these same sober advice-givers (link from Chris Blattman) will turn around and lecture you on why you shouldn’t try to estimate treatment effects in subgroups. [sent-15, score-0.826]

14 Here they’re preferring the biased but less variable estimate—but they typically won’t describe it in that way because that would sound a bit odd. [sent-16, score-0.291]

15 So my first problem with the bias-variance tradeoff idea is that there’s typically an assumption that bias is more important—except when it’s not. [sent-18, score-0.713]

16 My second problem is that, from a Bayesian perspective, given the model, there actually is a best estimate, or at least a best summary of posterior information about the parameter being estimated. [sent-19, score-0.2]

17 It’s not a matter of personal choice or a taste for unbiasedness or whatever. [sent-20, score-0.296]

18 For a Bayesian, the problem with the “bias” concept is that it is conditional on the true parameter value. [sent-23, score-0.315]
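
Sentences 8 and 16 above point to model-based compromises between the two extremes. The following is a minimal sketch, not Mister P itself (which combines multilevel regression with poststratification), of the basic normal-normal partial-pooling formula with made-up inputs; it also illustrates the point that, once the model is written down, the estimate is determined rather than being a matter of taste.

# Minimal sketch, not from the original post: the partial-pooling (shrinkage)
# estimate under a simple normal-normal hierarchical model. The inputs are
# made-up illustrative numbers.

def partial_pooling(y_j, sigma_j, mu, tau):
    """Posterior mean and sd for theta_j when
    y_j ~ Normal(theta_j, sigma_j) and theta_j ~ Normal(mu, tau)."""
    precision = 1 / sigma_j**2 + 1 / tau**2
    post_mean = (y_j / sigma_j**2 + mu / tau**2) / precision
    post_sd = precision ** -0.5
    return post_mean, post_sd

y_subgroup, se_subgroup = 0.45, 0.14   # unbiased but noisy subgroup estimate
mu_pooled, tau_between = 0.12, 0.20    # pooled mean and between-group sd

est, sd = partial_pooling(y_subgroup, se_subgroup, mu_pooled, tau_between)
print(f"partially pooled estimate = {est:.3f} (posterior sd = {sd:.3f})")
# The result falls between 0.45 and 0.12, closer to whichever source of
# information is more precise.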


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('estimate', 0.351), ('bias', 0.275), ('unbiased', 0.266), ('tradeoff', 0.223), ('sober', 0.193), ('variance', 0.185), ('finer', 0.178), ('low', 0.166), ('parameter', 0.133), ('choice', 0.132), ('treatment', 0.13), ('high', 0.121), ('biased', 0.12), ('concept', 0.115), ('bayesian', 0.106), ('subdivision', 0.103), ('conservative', 0.1), ('extreme', 0.099), ('continuum', 0.092), ('preferring', 0.092), ('re', 0.092), ('effect', 0.091), ('lots', 0.09), ('starters', 0.089), ('unadjusted', 0.087), ('frontier', 0.087), ('unbiasedness', 0.087), ('vogelstein', 0.087), ('subgroup', 0.084), ('slice', 0.081), ('blattman', 0.081), ('typically', 0.079), ('matter', 0.077), ('joshua', 0.077), ('group', 0.074), ('adjustments', 0.073), ('ages', 0.073), ('mister', 0.072), ('reduces', 0.072), ('adjusted', 0.07), ('varies', 0.07), ('data', 0.07), ('idea', 0.069), ('solutions', 0.068), ('virtue', 0.068), ('lecture', 0.068), ('problem', 0.067), ('narrow', 0.065), ('using', 0.064), ('ethnicity', 0.064)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff


2 0.25725111 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

Introduction: Some things I respect When it comes to meta-models of statistics, here are two philosophies that I respect: 1. (My) Bayesian approach, which I associate with E. T. Jaynes, in which you construct models with strong assumptions, ride your models hard, check their fit to data, and then scrap them and improve them as necessary. 2. At the other extreme, model-free statistical procedures that are designed to work well under very weak assumptions—for example, instead of assuming a distribution is Gaussian, you would just want the procedure to work well under some conditions on the smoothness of the second derivative of the log density function. Both the above philosophies recognize that (almost) all important assumptions will be wrong, and they resolve this concern via aggressive model checking or via robustness. And of course there are intermediate positions, such as working with Bayesian models that have been shown to be robust, and then still checking them. Or, to flip it arou

3 0.20380171 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

Introduction: Elias Bareinboim asked what I thought about his comment on selection bias in which he referred to a paper by himself and Judea Pearl, “Controlling Selection Bias in Causal Inference.” I replied that I have no problem with what he wrote, but that from my perspective I find it easier to conceptualize such problems in terms of multilevel models. I elaborated on that point in a recent post, “Hierarchical modeling as a framework for extrapolation,” which I think was read by only a few people (I say this because it received only two comments). I don’t think Bareinboim objected to anything I wrote, but like me he is comfortable working within his own framework. He wrote the following to me: In some sense, “not ad hoc” could mean logically consistent. In other words, if one agrees with the assumptions encoded in the model, one must also agree with the conclusions entailed by these assumptions. I am not aware of any other way of doing mathematics. As it turns out, to get causa

4 0.20135619 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

Introduction: Type S error: When your estimate is the wrong sign, compared to the true value of the parameter. Type M error: When the magnitude of your estimate is far off, compared to the true value of the parameter. More here.

5 0.18246797 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

Introduction: Andy McKenzie writes: In their March 9 “counterpoint” in nature biotech to the prospect that we should try to integrate more sources of data in clinical practice (see “point” arguing for this), Isaac Kohane and David Margulies claim that, “Finally, how much better is our new knowledge than older knowledge? When is the incremental benefit of a genomic variant(s) or gene expression profile relative to a family history or classic histopathology insufficient and when does it add rather than subtract variance?” Perhaps I am mistaken (thus this email), but it seems that this claim runs contra to the definition of conditional probability. That is, if you have a hierarchical model, and the family history / classical histopathology already suggests a parameter estimate with some variance, how could the new genomic info possibly increase the variance of that parameter estimate? Surely the question is how much variance the new genomic info reduces and whether it therefore justifies t

6 0.16492942 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

7 0.1577712 1196 andrew gelman stats-2012-03-04-Piss-poor monocausal social science

8 0.15670685 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

9 0.14920048 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?

10 0.14033557 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

11 0.13930839 1291 andrew gelman stats-2012-04-30-Systematic review of publication bias in studies on publication bias

12 0.13657698 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

13 0.13536428 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

14 0.13379182 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

15 0.13158257 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

16 0.1254375 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.

17 0.12527446 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

18 0.12391488 828 andrew gelman stats-2011-07-28-Thoughts on Groseclose book on media bias

19 0.12369491 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

20 0.12270335 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model
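
Entry 4 above (post 494) gives one-line definitions of Type S and Type M errors. As a minimal sketch, not taken from any of the linked posts, the rates could be estimated by simulation for a simple normal estimator; the true effect, standard error, and significance rule below are made-up illustrative choices.

# Minimal sketch, not from the linked posts: among "statistically significant"
# estimates of a small true effect, how often is the sign wrong (Type S) and
# by how much is the magnitude exaggerated (Type M)? All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
true_effect, se, n_sims = 0.1, 0.5, 100_000

estimates = true_effect + rng.normal(0, se, n_sims)
significant = np.abs(estimates) > 1.96 * se        # conventional 5% threshold

type_s_rate = np.mean(np.sign(estimates[significant]) != np.sign(true_effect))
type_m_factor = np.mean(np.abs(estimates[significant])) / abs(true_effect)

print(f"Type S rate among significant results: {type_s_rate:.2f}")
print(f"Type M (exaggeration) factor:          {type_m_factor:.1f}")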


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.24), (1, 0.123), (2, 0.112), (3, -0.097), (4, 0.004), (5, -0.001), (6, 0.032), (7, 0.044), (8, 0.01), (9, -0.029), (10, -0.031), (11, -0.043), (12, 0.048), (13, 0.034), (14, 0.037), (15, 0.007), (16, -0.067), (17, 0.028), (18, -0.014), (19, 0.015), (20, -0.038), (21, -0.021), (22, 0.064), (23, 0.029), (24, -0.009), (25, 0.027), (26, -0.037), (27, 0.019), (28, -0.008), (29, -0.003), (30, 0.012), (31, 0.032), (32, 0.001), (33, -0.051), (34, 0.014), (35, 0.003), (36, -0.076), (37, -0.055), (38, 0.046), (39, 0.007), (40, -0.013), (41, -0.006), (42, -0.127), (43, 0.023), (44, -0.032), (45, 0.065), (46, 0.114), (47, 0.043), (48, -0.043), (49, 0.017)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97734207 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff


2 0.77728122 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.

Introduction: This post is by Phil. I’m aware that there are some people who use a Bayesian approach largely because it allows them to provide a highly informative prior distribution based on subjective judgment, but that is not the appeal of Bayesian methods for a lot of us practitioners. It’s disappointing and surprising, twenty years after my initial experiences, to still hear highly informed professional statisticians who think that what distinguishes Bayesian statistics from Frequentist statistics is “subjectivity” (as seen in a recent blog post and its comments). My first encounter with Bayesian statistics was just over 20 years ago. I was a postdoc at Lawrence Berkeley National Laboratory, with a new PhD in theoretical atomic physics but working on various problems related to the geographical and statistical distribution of indoor radon (a naturally occurring radioactive gas that can be dangerous if present at high concentrations). One of the issues I ran into right at the start was th

3 0.77536064 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

Introduction: Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. Based on the recommendation given in the book by yourself and Dr. Hill (2007) I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance were more appropriate in the context of a model with fixed district effects? My reply: I agree that th

4 0.75515699 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

Introduction: David Radwin writes: I am seeking a statistic measuring an estimate’s reliability or stability as an alternative to the coefficient of variation (CV), also known as the relative standard error. The CV is the standard error of an estimate (proportion, mean, regression coefficient, etc.) divided by the estimate itself, usually expressed as a percentage. For example, if a survey finds 15% unemployment with a 6% standard error, the CV is .06/.15 = .4 = 40%. Some US government agencies flag or suppress as unreliable any estimate with a CV over a certain threshold such as 30% or 50%. But this standard can be arbitrary (for example, 85% employment would have a much lower CV of .06/.85 = 7%), and the CV has other drawbacks I won’t elaborate here. I don’t need an evaluation of the wisdom of using the CV or anything else for measuring an estimate’s stability, but one of my projects calls for such a measure and I would like to find a better alternative. Can you or your blog readers suggest

5 0.75203216 368 andrew gelman stats-2010-10-25-Is instrumental variables analysis particularly susceptible to Type M errors?

Introduction: Hendrik Juerges writes: I am an applied econometrician. The reason I am writing is that I am pondering a question for some time now and I am curious whether you have any views on it. One problem the practitioner of instrumental variables estimation faces is large standard errors even with very large samples. Part of the problem is of course that one estimates a ratio. Anyhow, more often than not, I and many other researchers I know end up with large point estimates and standard errors when trying IV on a problem. Sometimes some of us are lucky and get a statistically significant result. Those estimates that make it beyond the 2 standard error threshold are often ridiculously large (one famous example in my line of research being Lleras-Muney’s estimates of the 10% effect of one year of schooling on mortality). The standard defense here is that IV estimates the complier-specific causal effect (which is mathematically correct). But still, I find many of the IV results (including my

6 0.74845624 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

7 0.74327433 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

8 0.72793835 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

9 0.72280246 1409 andrew gelman stats-2012-07-08-Is linear regression unethical in that it gives more weight to cases that are far from the average?

10 0.71282214 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

11 0.71104562 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?

12 0.69609535 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

13 0.68853414 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

14 0.68575644 1377 andrew gelman stats-2012-06-13-A question about AIC

15 0.68574482 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

16 0.68517989 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

17 0.680987 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

18 0.67652458 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling

19 0.66679561 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

20 0.66277498 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?
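
Entry 4 in the list above (post 1441) works through the coefficient of variation by hand; a tiny sketch of the same arithmetic, using the numbers quoted in that excerpt, is below.

# Tiny sketch of the coefficient-of-variation arithmetic quoted in entry 4
# above: CV = standard error / estimate, reported as a percentage.
def cv(estimate, standard_error):
    return standard_error / estimate

print(f"15% unemployment, 6% SE: CV = {cv(0.15, 0.06):.0%}")   # 40%
print(f"85% employment, 6% SE: CV = {cv(0.85, 0.06):.0%}")     # about 7%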


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(10, 0.016), (15, 0.021), (16, 0.367), (18, 0.015), (24, 0.133), (42, 0.011), (62, 0.013), (87, 0.015), (95, 0.014), (99, 0.299)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99314457 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

Introduction: In a discussion of workplace segregation, Philip Cohen posts some graphs that led me to a statistical question. I’ll pose my question below, but first the graphs: In a world of zero segregation of jobs by sex, the top graph above would have a spike at 50% (or, whatever the actual percentage is of women in the labor force) and, in the bottom graph, the pink and blue lines would be in the same place and would look like very steep S curves. The difference between the pink and blue lines represents segregation by job. One thing I wonder is how these graphs would change if we redefine occupation. (For example, is my occupation “mathematical scientist,” “statistician,” “teacher,” “university professor,” “statistics professor,” or “tenured statistics professor”?) Finer or coarser classification would give different results, and I wonder how this would work. This is not at all meant as a criticism of Cohen’s claims, it’s just a statistical question. I’m guessing that

2 0.987499 1487 andrew gelman stats-2012-09-08-Animated drought maps

Introduction: Aleks sends along this dynamic graphic from Mike Bostock: I’m not so happy with the arrangement of years by decade—the lineup of all years ending in 0, or 1, or 2, etc., seems a bit of a distraction—but in many ways the display is impressive. And, as often is the case with such graphs, once it’s out there, other people can do similar things and make their own improvements.

3 0.98696125 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!

Introduction: So, I’m at MIT, twenty minutes into my talk on tradeoffs in information graphics to the computer scientists, when the power goes out. They had some dim backup lighting so we weren’t all sitting there in the dark, but the projector wasn’t working. So I took questions for the remaining 40 minutes. It went well, perhaps better than the actual talk would’ve gone, even though they didn’t get to see most of my slides.

4 0.98574692 1025 andrew gelman stats-2011-11-24-Always check your evidence

Introduction: Logical reasoning typically takes the following form: 1. I know that A is true. 2. I know that A implies B. 3. Therefore, I can conclude that B is true. I, like Lewis Carroll, have problems with this process sometimes, but it’s pretty standard. There is also a statistical version in which the above statements are replaced by averages (“A usually happens,” etc.). But in all these stories, the argument can fall down if you get the facts wrong. Perhaps that’s one reason that statisticians can be obsessed with detail. For example, David Brooks wrote the following, in a column called “Living with Mistakes”: The historian Leslie Hannah identified the ten largest American companies in 1912. None of those companies ranked in the top 100 companies by 1990. Huh? Could that really be? I googled “ten largest american companies 1912” and found this, from Leslie Hannah: No big deal: two still in the top 10 rather than zero in the top 100, but Brooks’s general

5 0.98562425 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research

Introduction: Howard Wainer writes in the Statistics Forum: The Chinese scientific literature is rarely read or cited outside of China. But the authors of this work are usually knowledgeable of the non-Chinese literature — at least the A-list journals. And so they too try to replicate the alpha finding. But do they? One would think that they would find the same diminished effect size, but they don’t! Instead they replicate the original result, even larger. Here’s one of the graphs: How did this happen? Full story here.

6 0.98200768 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples

7 0.9804467 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

8 0.98033631 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

9 0.97943985 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

10 0.97863597 609 andrew gelman stats-2011-03-13-Coauthorship norms

11 0.97548807 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi

12 0.97020876 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”

13 0.96936691 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

14 0.96732205 321 andrew gelman stats-2010-10-05-Racism!

15 0.96353865 1928 andrew gelman stats-2013-07-06-How to think about papers published in low-grade journals?

16 0.95743489 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram

17 0.95669556 1022 andrew gelman stats-2011-11-21-Progress for the Poor

18 0.9548347 377 andrew gelman stats-2010-10-28-The incoming moderate Republican congressmembers

19 0.95342863 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?

20 0.95293391 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials