andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-2109 knowledge-graph by maker-knowledge-mining

2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors


meta info for this blog

Source: html

Introduction: Following up on Christian’s post [link fixed] on the topic, I’d like to offer a few thoughts of my own. In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. So, in that sense, noninformative priors are no big deal, they’re just a way to get started. Just don’t take them too seriously. Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things happen: data per parameter become more sparse, and prior distributions that are innocuous in low dimensions become strong and highly informative (sometimes in a bad way) in high dimensions.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. [sent-2, score-2.44]

2 Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. [sent-3, score-0.625]

3 So, in that sense, noninformative priors are no big deal, they’re just a way to get started. [sent-4, score-0.543]

4 Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. [sent-6, score-0.302]

5 But if the data are sparse and prior information is strong, we have to think differently. [sent-7, score-0.89]

6 And, when you increase the dimensionality of a problem, both these things happen: data per parameter become more sparse, and prior distributions that are innocuous in low dimensions become strong and highly informative (sometimes in a bad way) in high dimensions. [sent-8, score-1.13]

7 Here are two examples of the dangers of noninformative priors: 1. [sent-9, score-0.486]

8 From section 3 of my 1996 paper, Bayesian Model-Building by Pure Thought: estimating a convex, increasing function with a flat prior on the function values (subject to the constraints). [sent-10, score-0.722]

9 As discussed in the article, the innocuous-seeming prior contains a huge amount of information as you increase the number of points at which the curve is estimated. [sent-12, score-0.691]
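This point can be seen with a crude rejection-sampling sketch (my own illustration with made-up grid sizes, not the computation from the 1996 paper): draw candidate function values uniformly and count how often the monotonicity and convexity constraints are satisfied. The acceptance rate is the prior mass the "flat" prior leaves on the constraint set, and it collapses rapidly as the number of grid points grows.

```python
import random

# Rejection-sampling sketch (my own illustration, not the paper's computation):
# draw n candidate function values uniformly on [0, 1] and keep only draws that
# are strictly increasing with nondecreasing first differences (convex).  The
# acceptance rate is the prior mass the flat prior puts on the constraint set;
# it collapses rapidly as n grows, which is one way of seeing that the
# innocuous-seeming prior becomes highly informative in higher dimensions.
def acceptance_rate(n, trials=200_000, seed=0):
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        y = [rng.random() for _ in range(n)]
        increasing = all(y[i] < y[i + 1] for i in range(n - 1))
        convex = all(y[i + 2] - y[i + 1] >= y[i + 1] - y[i]
                     for i in range(n - 2))
        if increasing and convex:
            accepted += 1
    return accepted / trials

for n in (3, 4, 5):
    print(n, acceptance_rate(n))
```

Already at n = 5 only a tiny fraction of uniform draws satisfy the constraints, so conditioning on them concentrates the prior sharply.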

10 A noninformative uniform prior on the coefficients is equivalent to a hierarchical N(0,tau^2) model with tau set to a very large value. [sent-15, score-1.171]

11 This is a very strong prior distribution pulling the estimates apart, and the resulting estimates of individual coefficients are implausible. [sent-16, score-1.026]
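A small simulation (with illustrative numbers of my own, not taken from the post) shows the mechanism: when true coefficients are tightly clustered but each is measured with noise, the flat-prior estimates are spread far wider than the truth, while a finite-tau prior pulls them back.

```python
import random
import statistics

# Sketch with made-up numbers: 50 true coefficients clustered near zero, each
# observed with sampling s.d. sigma.  A flat prior on the coefficients
# (equivalently N(0, tau^2) with tau -> infinity) leaves the raw noisy
# estimates untouched, so noise inflates their spread; a finite-tau normal
# prior shrinks each estimate by the factor tau^2 / (tau^2 + sigma^2).
rng = random.Random(1)
sigma = 1.0
theta = [rng.gauss(0, 0.2) for _ in range(50)]   # true effects: small
y = [t + rng.gauss(0, sigma) for t in theta]     # noisy raw estimates

flat_est = y                                     # flat prior: no shrinkage
tau = 0.2                                        # prior sd near the true spread
shrink = tau ** 2 / (tau ** 2 + sigma ** 2)
shrunk_est = [shrink * yi for yi in y]           # posterior means, N(0, tau^2) prior

print("sd of true effects:  ", round(statistics.stdev(theta), 2))
print("sd of flat-prior est:", round(statistics.stdev(flat_est), 2))
print("sd of shrunken est:  ", round(statistics.stdev(shrunk_est), 2))
```

The flat-prior estimates are several times more dispersed than the true effects, which is exactly the "pulling the estimates apart" described above.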

12 Any setting where the prior information really is strong, so that if you assume a flat prior, you can get silly estimates simply from noise variation. [sent-18, score-0.96]

13 For example , the claim that beautiful parents are more likely to have girls, which is based on data that are much much weaker than the prior information on this topic. [sent-19, score-0.846]
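A conjugate-normal calculation makes the point concrete. The numbers below are my own assumptions in the spirit of the sex-ratio example, not the published estimates: a raw difference of 8 percentage points with standard error 3, against prior knowledge that such differences are at most a few tenths of a percentage point.

```python
# Conjugate-normal sketch of "strong prior, weak data".  Numbers are assumed
# for illustration, not taken from the actual study: a noisy estimate of
# 8 percentage points with standard error 3, and an informative N(0, tau^2)
# prior with tau = 0.3 reflecting strong substantive knowledge.
y, se = 8.0, 3.0      # noisy data estimate and its standard error (assumed)
tau = 0.3             # sd of the informative prior (assumed)

# Flat prior: the posterior mean is just the noisy estimate.
flat_mean = y

# Informative prior: precision-weighted average of prior mean 0 and data y.
data_weight = (1 / se**2) / (1 / tau**2 + 1 / se**2)
post_mean = data_weight * y                      # close to 0: prior dominates
post_sd = (1 / tau**2 + 1 / se**2) ** -0.5

print(f"flat-prior estimate:        {flat_mean:.1f} points")
print(f"informative-prior estimate: {post_mean:.2f} points (sd {post_sd:.2f})")
```

With the prior this much more precise than the data, the posterior essentially reproduces the prior, and the flat-prior answer of 8 points is revealed as noise.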

14 Finally, the simplest example yet, and my new favorite: we assign a flat noninformative prior to a continuous parameter theta. [sent-21, score-1.13]
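A generic version of this example (the specific numbers here are mine, chosen to make the point): with a flat prior and y ~ N(theta, 1), the posterior is N(y, 1), so even an observation well within noise range of zero yields a confident-sounding posterior claim.

```python
from math import erf, sqrt

# Generic flat-prior sketch (numbers chosen by me for illustration): with a
# flat prior on theta and y ~ N(theta, 1), the posterior is N(y, 1).
# Observing y = 1 -- only one standard error from zero, i.e. pure noise
# territory -- already produces a confident-sounding posterior probability.
def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

y = 1.0
pr_positive = phi(y)   # Pr(theta > 0 | y) under the flat prior
print(f"Pr(theta > 0 | y = 1) = {pr_positive:.2f}")   # about 0.84
```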

15 Indeed, if posterior inferences don’t make sense, that’s another way of saying that we have external (prior) information that was not included in the model. [sent-27, score-0.601]

16 “Doesn’t make sense” implies some source of knowledge about which claims make sense and which don’t. [sent-28, score-0.455]

17 A better way to put it would be that people use conventional models that include much less information than is actually known. [sent-39, score-0.564]

18 If data are strong, you can often do just fine with conventional models. [sent-42, score-0.392]

19 But if data are sparse, it can often make sense to go back and add some real information to your model, in order to better answer your scientific questions. [sent-43, score-0.78]

20 But scientific reports typically don’t just report information in data, they also make general claims about the world, and for that it can be a terrible mistake to ignore strong information that is already known. [sent-45, score-0.846]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('prior', 0.412), ('noninformative', 0.354), ('conventional', 0.229), ('information', 0.205), ('strong', 0.198), ('sparse', 0.179), ('sense', 0.165), ('flat', 0.158), ('priors', 0.121), ('informative', 0.119), ('make', 0.113), ('external', 0.108), ('posterior', 0.107), ('uniform', 0.102), ('pure', 0.098), ('noise', 0.096), ('data', 0.094), ('distribution', 0.09), ('estimates', 0.089), ('coefficients', 0.088), ('elasticity', 0.078), ('function', 0.076), ('dimensionality', 0.075), ('convex', 0.075), ('highly', 0.075), ('parameter', 0.075), ('increase', 0.074), ('hierarchical', 0.074), ('tau', 0.073), ('dangers', 0.073), ('example', 0.072), ('become', 0.071), ('often', 0.069), ('model', 0.068), ('way', 0.068), ('innocuous', 0.067), ('order', 0.067), ('add', 0.067), ('claims', 0.064), ('scaling', 0.064), ('weaker', 0.063), ('re', 0.062), ('include', 0.062), ('terrible', 0.061), ('observe', 0.061), ('pulling', 0.06), ('shoot', 0.06), ('examples', 0.059), ('simplest', 0.059), ('traditionally', 0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors


2 0.50083548 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury y of transport etc etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

3 0.37703356 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; Surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start

4 0.36102095 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

Introduction: Robert Bell pointed me to this post by Brad De Long on Bayesian statistics, and then I also noticed this from Noah Smith, who wrote: My impression is that although the Bayesian/Frequentist debate is interesting and intellectually fun, there’s really not much “there” there… despite being so-hip-right-now, Bayesian is not the Statistical Jesus. I’m happy to see the discussion going in this direction. Twenty-five years ago or so, when I got into this biz, there were some serious anti-Bayesian attitudes floating around in mainstream statistics. Discussions in the journals sometimes devolved into debates of the form, “Bayesians: knaves or fools?”. You’d get all sorts of free-floating skepticism about any prior distribution at all, even while people were accepting without question (and doing theory on) logistic regressions, proportional hazards models, and all sorts of strong strong models. (In the subfield of survey sampling, various prominent researchers would refuse to mode

5 0.34949902 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

Introduction: A couple days ago we discussed some remarks by Tony O’Hagan and Jim Berger on weakly informative priors. Jim followed up on Deborah Mayo’s blog with this: Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine. I have a few reactions to this: 1. I agree

6 0.32048529 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo

7 0.31704924 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

8 0.31448516 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

9 0.31447548 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

10 0.28049892 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

11 0.26943573 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

12 0.26486975 1465 andrew gelman stats-2012-08-21-D. Buggin

13 0.24896744 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

14 0.24711576 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

15 0.24700414 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

16 0.23603745 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

17 0.22823048 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

18 0.22631675 1792 andrew gelman stats-2013-04-07-X on JLP

19 0.21744537 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

20 0.21718591 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.307), (1, 0.274), (2, 0.04), (3, 0.079), (4, -0.051), (5, -0.151), (6, 0.232), (7, 0.05), (8, -0.263), (9, 0.099), (10, -0.009), (11, 0.032), (12, 0.096), (13, 0.012), (14, -0.034), (15, 0.009), (16, -0.019), (17, -0.006), (18, 0.046), (19, -0.002), (20, -0.026), (21, -0.043), (22, -0.038), (23, 0.001), (24, -0.017), (25, 0.038), (26, 0.044), (27, -0.038), (28, -0.007), (29, 0.03), (30, 0.017), (31, -0.032), (32, -0.011), (33, -0.032), (34, -0.007), (35, 0.047), (36, 0.027), (37, -0.005), (38, -0.025), (39, 0.026), (40, 0.019), (41, 0.023), (42, 0.006), (43, -0.049), (44, 0.008), (45, 0.042), (46, -0.009), (47, -0.052), (48, -0.02), (49, 0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97319436 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors


2 0.95024872 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors


3 0.94832063 1155 andrew gelman stats-2012-02-05-What is a prior distribution?


4 0.94426924 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the

5 0.94191426 1941 andrew gelman stats-2013-07-16-Priors


6 0.9174875 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

7 0.91523975 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

8 0.9151957 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

9 0.90832555 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

10 0.89897364 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

11 0.89320797 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

12 0.8784163 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

13 0.86716223 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

14 0.86697447 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

15 0.86602724 1465 andrew gelman stats-2012-08-21-D. Buggin

16 0.86175239 846 andrew gelman stats-2011-08-09-Default priors update?

17 0.84698749 468 andrew gelman stats-2010-12-15-Weakly informative priors and imprecise probabilities

18 0.84684277 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

19 0.83437914 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

20 0.82494307 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(11, 0.024), (13, 0.011), (15, 0.037), (16, 0.086), (21, 0.029), (24, 0.322), (40, 0.016), (47, 0.017), (89, 0.024), (99, 0.333)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9908579 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

Introduction: Pointing to this news article by Megan McArdle discussing a recent study of Medicaid recipients, Jonathan Falk writes: Forget the interpretation for a moment, and the political spin, but haven’t we reached an interesting point when a journalist says things like: When you do an RCT with more than 12,000 people in it, and your defense of your hypothesis is that maybe the study just didn’t have enough power, what you’re actually saying is “the beneficial effects are probably pretty small”. and A good Bayesian—and aren’t most of us supposed to be good Bayesians these days?—should be updating in light of this new information. Given this result, what is the likelihood that Obamacare will have a positive impact on the average health of Americans? Every one of us, for or against, should be revising that probability downwards. I’m not saying that you have to revise it to zero; I certainly haven’t. But however high it was yesterday, it should be somewhat lower today. This

2 0.99073398 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

Introduction: Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and myself seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004–but I am concerned that our dialogue with the graphic

3 0.98886418 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

Introduction: I received the following message from “Patricia Lopez” of “Premium Link Ads”: Hello, I am interested in placing a text link on your page: http://andrewgelman.com/2011/07/super_sam_fuld/. The link would point to a page on a website that is relevant to your page and may be useful to your site visitors. We would be happy to compensate you for your time if it is something we are able to work out. The best way to reach me is through a direct response to this email. This will help me get back to you about the right link request. Please let me know if you are interested, and if not thanks for your time. Thanks. Usually I just ignore these, but after our recent discussion I decided to reply. I wrote: How much do you pay? But no answer. I wonder what’s going on? I mean, why bother sending the email in the first place if you’re not going to follow up?

4 0.9887327 197 andrew gelman stats-2010-08-10-The last great essayist?

Introduction: I recently read a bizarre article by Janet Malcolm on a murder trial in NYC. What threw me about the article was that the story was utterly commonplace (by the standards of today’s headlines): divorced mom kills ex-husband in a custody dispute over their four-year-old daughter. The only interesting features were (a) the wife was a doctor and the husband was a dentist, the sort of people you’d expect to sue rather than slay, and (b) the wife hired a hitman from within the insular immigrant community that she (and her husband) belonged to. But, really, neither of these was much of a twist. To add to the non-storyness of it all, there were no other suspects, the evidence against the wife and the hitman was overwhelming, and even the high-paid defense lawyers didn’t seem to be making much of an effort to convince anyone of their clients’ innocence. (One of the closing arguments was that one aspect of the wife’s story was so ridiculous that it had to be true. In the lawyer’s wo

5 0.98716271 1155 andrew gelman stats-2012-02-05-What is a prior distribution?


6 0.98676181 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

7 0.98600197 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

8 0.98588443 1240 andrew gelman stats-2012-04-02-Blogads update

9 0.98427117 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

10 0.9841904 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall

11 0.98362052 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

12 0.98353392 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

13 0.98330945 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

14 0.98319864 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

same-blog 15 0.98233265 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

16 0.98232734 2247 andrew gelman stats-2014-03-14-The maximal information coefficient

17 0.98147476 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

18 0.9806639 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

19 0.98053998 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

20 0.98045695 846 andrew gelman stats-2011-08-09-Default priors update?