andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1155 knowledge-graph by maker-knowledge-mining

1155 andrew gelman stats-2012-02-05-What is a prior distribution?


meta info for this blog

Source: html

Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Some recent blog discussion revealed some confusion that I’ll try to resolve here. [sent-1, score-0.19]

2 I wrote that I’m not a big fan of subjective priors. [sent-2, score-0.441]

3 Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. [sent-3, score-1.347]

4 But this does not mean that the (subjective) prior that you choose is irrefutable; surely a prior that reflects prior information just does not have to be inconsistent with that information. [sent-5, score-1.884]

5 But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. [sent-6, score-0.42]
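The sensitivity-analysis idea in sentence 5 can be sketched with a simple conjugate normal-normal model: sweep a range of prior standard deviations, all loosely consistent with the same prior information, and see how much the posterior moves. All numbers below are invented for illustration; none come from the post.

```python
# Sensitivity analysis: vary the prior sd over a plausible range and
# recompute the posterior for a one-observation normal-normal model.
# All numbers are made up for illustration.

def posterior(prior_mean, prior_sd, y, data_sd):
    """Conjugate normal posterior mean and sd given one observation y."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / data_sd**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * y)
    return post_mean, post_var**0.5

y, data_sd = 150.0, 5.0  # hypothetical measurement and its error sd
for prior_sd in [10.0, 50.0, 200.0]:  # a range of priors consistent with the same vague knowledge
    m, s = posterior(prior_mean=160.0, prior_sd=prior_sd, y=y, data_sd=data_sd)
    print(f"prior sd {prior_sd:6.1f} -> posterior mean {m:7.2f}, sd {s:5.2f}")
```

If the posterior summaries barely change across the sweep, the conclusions are robust to the choice within that range; if they change a lot, the prior is doing real work and deserves scrutiny.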

6 A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. [sent-8, score-1.463]

7 That sounds reasonable but I don’t think it works. [sent-9, score-0.077]

8 You hop on a scale that gives unbiased measurements with errors that have a standard deviation of 0. [sent-12, score-0.275]

9 To do Bayesian analysis, you assign a N(0,10000^2) prior on your true weight. [sent-14, score-0.58]
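The weighing example of sentences 8-9 works out to a one-line conjugate update. The extract truncates the scale's error sd, so the reading (150) and sd (1) used here are assumptions, not values from the post:

```python
# Weighing example: N(0, 10000^2) prior on your true weight, one scale reading.
# The reading (150) and its error sd (1) are assumed for illustration;
# the extract truncates the actual sd.
prior_mean, prior_sd = 0.0, 10000.0
y, meas_sd = 150.0, 1.0

prior_prec = prior_sd**-2
meas_prec = meas_sd**-2
post_var = 1.0 / (prior_prec + meas_prec)
post_mean = post_var * (prior_prec * prior_mean + meas_prec * y)

# With so diffuse a prior the posterior collapses onto the measurement,
# even though nobody literally believes their weight is drawn from N(0, 10000^2).
print(post_mean, post_var**0.5)
```

This is the force of the example: the N(0, 10000^2) "prior" gives sensible answers while representing no one's subjective belief, which is the tension the post is pointing at.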

10 Instead of thinking of these as subjective beliefs, I prefer to think of the joint probability distribution as a model, reflecting a set of assumptions. [sent-19, score-0.823]

11 In some settings these assumptions represent subjective beliefs, in other settings they don’t. [sent-20, score-0.74]

12 If I could go back and alter it, I’d add something on weakly informative priors, but I still agree with the general approach discussed there. [sent-22, score-0.228]

13 Just to give an example of what I mean by prior information: The analyses in Red State Blue State all use noninformative prior distributions. [sent-25, score-1.247]

14 But a lot of prior information comes in, in the selection of what questions to study, what models to consider, and what variables to include in the model. [sent-26, score-0.721]

15 For example, as state-level predictors we include region of the country, Republican vote in the previous presidential election, and average state income. [sent-27, score-0.309]

16 Prior information goes into the choice and construction of all these predictors. [sent-28, score-0.201]

17 But the prior distribution is a particular probability distribution that in this case is flat and does not reflect prior knowledge. [sent-29, score-1.587]

18 One way to think about informative prior distributions is as a form of smoothing: when setting the parameters of a probability distribution based on prior knowledge, we are imposing some smoothness on the parameters. [sent-30, score-1.612]
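The priors-as-smoothing idea in sentence 18 can be illustrated as shrinkage: a normal prior pulls noisy raw estimates toward the prior mean, more strongly the tighter the prior. The estimates, standard errors, and prior scale below are all invented for illustration:

```python
# Informative priors as smoothing: noisy per-group estimates are pulled
# toward the prior mean; a tighter prior pulls harder. Numbers are invented.
def shrink(y, se, prior_mean, prior_sd):
    """Posterior mean under a normal prior: a precision-weighted average."""
    w = prior_sd**2 / (prior_sd**2 + se**2)  # weight on the raw estimate
    return prior_mean + w * (y - prior_mean)

raw = [(-0.8, 0.5), (0.1, 0.5), (1.2, 0.5)]  # (estimate, standard error)
for y, se in raw:
    print(y, "->", round(shrink(y, se, prior_mean=0.0, prior_sd=0.3), 3))
```

As prior_sd shrinks, every estimate moves closer to 0: the prior is acting as a smoother on the collection of parameters, which is the sense in which an informative prior resembles hierarchical partial pooling.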

19 I think that’s probably a good idea and that the Red State Blue State analyses (among others) would be better for it. [sent-31, score-0.165]

20 I didn’t set up this prior structure because I wasn’t easily equipped to do so and it seemed like too much effort, but perhaps at some future time this sort of structuring will be as commonplace as hierarchical modeling is today. [sent-32, score-0.77]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('prior', 0.516), ('subjective', 0.441), ('state', 0.169), ('belief', 0.16), ('priors', 0.142), ('distribution', 0.14), ('information', 0.133), ('beliefs', 0.12), ('reflect', 0.117), ('settings', 0.101), ('blue', 0.099), ('represent', 0.097), ('probability', 0.096), ('equipped', 0.095), ('irrefutable', 0.095), ('bill', 0.093), ('informative', 0.093), ('hop', 0.091), ('smoothness', 0.091), ('red', 0.091), ('analyses', 0.088), ('imposing', 0.083), ('structuring', 0.081), ('commonplace', 0.078), ('think', 0.077), ('inconsistent', 0.076), ('smoothing', 0.076), ('alter', 0.072), ('include', 0.072), ('sensitivity', 0.07), ('reflecting', 0.069), ('noninformative', 0.069), ('reflects', 0.069), ('region', 0.068), ('construction', 0.068), ('leaves', 0.066), ('resolve', 0.066), ('unbiased', 0.066), ('revealed', 0.065), ('approximation', 0.065), ('assign', 0.064), ('purposes', 0.064), ('stated', 0.063), ('weakly', 0.063), ('flat', 0.062), ('deviation', 0.06), ('confusion', 0.059), ('surely', 0.059), ('measurements', 0.058), ('mean', 0.058)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1155 andrew gelman stats-2012-02-05-What is a prior distribution?


2 0.46216294 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

Introduction: A couple days ago we discussed some remarks by Tony O’Hagan and Jim Berger on weakly informative priors. Jim followed up on Deborah Mayo’s blog with this: Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine. I have a few reactions to this: 1. I agree

3 0.38588417 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameter of an infinite number of model we have never seen before and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury y of transport etc etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some model have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

4 0.37703356 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

Introduction: Following up on Christian’s post [link fixed] on the topic, I’d like to offer a few thoughts of my own. In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. So, in that sense, noninformative priors are no big deal, they’re just a way to get started. Just don’t take them too seriously. Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things hap

5 0.36922777 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to tread the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili

6 0.33759966 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

7 0.31939366 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

8 0.31135833 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo

9 0.30387747 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

10 0.30003792 1151 andrew gelman stats-2012-02-03-Philosophy of Bayesian statistics: my reactions to Senn

11 0.29163754 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

12 0.28557462 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

13 0.26926115 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

14 0.2613638 2138 andrew gelman stats-2013-12-18-In Memoriam Dennis Lindley

15 0.24432303 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

16 0.24402098 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

17 0.24320394 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

18 0.23458192 1182 andrew gelman stats-2012-02-24-Untangling the Jeffreys-Lindley paradox

19 0.23286332 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

20 0.23023266 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.253), (1, 0.265), (2, 0.077), (3, 0.151), (4, -0.146), (5, -0.145), (6, 0.21), (7, 0.081), (8, -0.286), (9, 0.104), (10, 0.047), (11, 0.006), (12, 0.116), (13, 0.041), (14, 0.003), (15, 0.019), (16, -0.004), (17, -0.002), (18, 0.053), (19, -0.003), (20, -0.058), (21, -0.066), (22, -0.07), (23, -0.005), (24, -0.013), (25, 0.051), (26, 0.076), (27, -0.065), (28, -0.054), (29, 0.02), (30, 0.059), (31, -0.065), (32, 0.014), (33, 0.021), (34, -0.054), (35, 0.022), (36, 0.06), (37, 0.019), (38, -0.045), (39, -0.004), (40, 0.026), (41, -0.026), (42, 0.042), (43, -0.046), (44, 0.035), (45, 0.028), (46, -0.063), (47, -0.051), (48, -0.063), (49, -0.018)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98970032 1155 andrew gelman stats-2012-02-05-What is a prior distribution?


2 0.94770461 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

Introduction: A couple days ago we discussed some remarks by Tony O’Hagan and Jim Berger on weakly informative priors. Jim followed up on Deborah Mayo’s blog with this: Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine. I have a few reactions to this: 1. I agree

3 0.90725935 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

Introduction: Deborah Mayo sent me this quote from Jim Berger: Too often I see people pretending to be subjectivists, and then using “weakly informative” priors that the objective Bayesian community knows are terrible and will give ridiculous answers; subjectivism is then being used as a shield to hide ignorance. . . . In my own more provocative moments, I claim that the only true subjectivists are the objective Bayesians, because they refuse to use subjectivism as a shield against criticism of sloppy pseudo-Bayesian practice. This caught my attention because I’ve become more and more convinced that weakly informative priors are the right way to go in many different situations. I don’t think Berger was talking about me , though, as the above quote came from a publication in 2006, at which time I’d only started writing about weakly informative priors. Going back to Berger’s article , I see that his “weakly informative priors” remark was aimed at this article by Anthony O’Hagan, who w

4 0.89945281 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

Introduction: David Kessler, Peter Hoff, and David Dunson write : Marginally specified priors for nonparametric Bayesian estimation Prior specification for nonparametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. Realistically, a statistician is unlikely to have informed opinions about all aspects of such a parameter, but may have real information about functionals of the parameter, such the population mean or variance. This article proposes a new framework for nonparametric Bayes inference in which the prior distribution for a possibly infinite-dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a nonparametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard nonparametric prior distributions in common use, and inherit the large support of the standard priors upon which they are based. Ad

5 0.88332063 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the

6 0.88170946 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

7 0.87679744 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

8 0.87089729 2138 andrew gelman stats-2013-12-18-In Memoriam Dennis Lindley

9 0.86067778 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

10 0.85885906 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

11 0.84697282 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

12 0.84146345 468 andrew gelman stats-2010-12-15-Weakly informative priors and imprecise probabilities

13 0.82881433 1941 andrew gelman stats-2013-07-16-Priors

14 0.82846671 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

15 0.82244897 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

16 0.8084361 846 andrew gelman stats-2011-08-09-Default priors update?

17 0.80139446 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

18 0.7929371 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

19 0.77707517 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?

20 0.76840979 1465 andrew gelman stats-2012-08-21-D. Buggin


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.012), (16, 0.1), (21, 0.022), (24, 0.302), (41, 0.01), (44, 0.014), (47, 0.074), (53, 0.027), (86, 0.024), (99, 0.291)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98241806 1155 andrew gelman stats-2012-02-05-What is a prior distribution?


2 0.97695744 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

Introduction: Pointing to this news article by Megan McArdle discussing a recent study of Medicaid recipients, Jonathan Falk writes: Forget the interpretation for a moment, and the political spin, but haven’t we reached an interesting point when a journalist says things like: When you do an RCT with more than 12,000 people in it, and your defense of your hypothesis is that maybe the study just didn’t have enough power, what you’re actually saying is “the beneficial effects are probably pretty small”. and A good Bayesian—and aren’t most of us are supposed to be good Bayesians these days?—should be updating in light of this new information. Given this result, what is the likelihood that Obamacare will have a positive impact on the average health of Americans? Every one of us, for or against, should be revising that probability downwards. I’m not saying that you have to revise it to zero; I certainly haven’t. But however high it was yesterday, it should be somewhat lower today. This

3 0.97495484 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

Introduction: Interesting discussion from David Gorski (which I found via this link from Joseph Delaney). I don’t have anything really to add to this discussion except to note the value of this sort of anecdote in a statistics discussion. It’s only n=1 and adds almost nothing to the literature on the effectiveness of various treatments, but a story like this can help focus one’s thoughts on the decision problems.

4 0.9735139 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

Introduction: Our discussion on data visualization continues. One one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe , who seem a bit annoyed at how my colleagues and myself seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004 )–but I am concerned that our dialogue with the graphic

5 0.97334969 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

Introduction: Sharad had a survey sampling question: We’re trying to use mechanical turk to conduct some surveys, and have quickly discovered that turkers tend to be quite young. We’d really like a representative sample of the U.S., or at the least be able to recruit a diverse enough sample from turk that we can post-stratify to adjust the estimates. The approach we ended up taking is to pay turkers a small amount to answer a couple of screening questions (age & sex), and then probabilistically recruit individuals to complete the full survey (for more money) based on the estimated turk population parameters and our desired target distribution. We use rejection sampling, so the end result is that individuals who are invited to take the full survey look as if they came from a representative sample, at least in terms of age and sex. I’m wondering whether this sort of technique—a two step design in which participants are first screened and then probabilistically selected to mimic a target distributio

6 0.97242689 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

7 0.97121704 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

8 0.97061408 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

9 0.97032619 197 andrew gelman stats-2010-08-10-The last great essayist?

10 0.96850681 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

11 0.96834606 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

12 0.96780843 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

13 0.96765101 1240 andrew gelman stats-2012-04-02-Blogads update

14 0.96679795 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

15 0.96661615 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

16 0.96657002 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall

17 0.96573937 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

18 0.96544528 846 andrew gelman stats-2011-08-09-Default priors update?

19 0.96536088 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

20 0.96514308 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys