andrew_gelman_stats-2012-1474: knowledge-graph by maker-knowledge-mining

1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence


meta info for this blog

Source: html

Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions.

Modeling the degrees of freedom and scale parameters in the t distribution. First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that there’s also the question of parameterization: it does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df.

Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model. The second case of parameterization in prior distributions arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [not the same as the scaled-inverse Wishart model; see discussion below], but there are some problems.
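
The point about the t distribution can be made concrete: for a fixed scale s, the standard deviation of a Student-t variate is s·sqrt(df/(df−2)) (finite only for df > 2), so priors that are independent in the (df, scale) parameterization are not independent in the (df, standard deviation) parameterization. Below is a minimal sketch of this effect (my own illustration in Python/SciPy, not code from the post):

```python
import numpy as np
from scipy import stats

s = 1.0  # fixed scale parameter of the t distribution
for df in [3.0, 5.0, 10.0, 1000.0]:
    # Analytic sd of a t_df variate with scale s (finite for df > 2):
    sd_analytic = s * np.sqrt(df / (df - 2))
    # Monte Carlo check using scipy's t distribution:
    sd_mc = stats.t(df=df, scale=s).rvs(size=200_000, random_state=0).std()
    print(f"df={df:7.1f}   sd analytic={sd_analytic:.3f}   sd simulated={sd_mc:.3f}")
```

With s held fixed, the implied standard deviation runs from about 1.7s at df = 3 down to s as df grows, which is the sense in which the meaning of the scale parameter changes with the df.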


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. [sent-1, score-0.668]

2 Modeling the degrees of freedom and scale parameters in the t distribution: First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. [sent-2, score-0.661]

3 Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [not the same as the scaled-inverse Wishart model; see discussion below], but there are some problems. [sent-7, score-0.852]

4 Under the standard “noninformative” version of the inverse-Wishart prior, which makes the marginal distribution of the correlations uniform, large standard deviations are associated with large absolute correlations. [sent-10, score-1.007] (A simulation of this dependence appears in the first sketch after this list.)

5 As I wrote in the book with Jennifer, the inverse-Wishart does not seem flexible enough as a prior, but I think the key problem is not the prior correlations between the correlation and scale parameters but rather the restricted prior range of the scales. [sent-15, score-1.585]

6 We wanted a prior that allowed us to be less informative on the scale parameters while still expressing ignorance about the correlations. [sent-17, score-0.727]

7 Instead of modeling Ω as a correlation matrix, only constrain it to be positive semi-definite, so that Δ and Ω jointly determine the standard deviations but Ω still determines the correlations alone. [sent-22, score-0.603] (See the construction sketch after this list.)

8 The scaled inverse-Wishart is much easier to work with, but theoretically it still allows for some dependence between the correlations and the variances in Σ. [sent-26, score-0.722]

9 Simpson then performs some simulations from various prior distributions and makes some graphs, after which he concludes: The [scaled inverse-Wishart] prior . . . [sent-27, score-0.866]

10 Frequentists have a point when they criticise Bayesians who argue that priors are fantastic because they allow you to express useful prior knowledge, and then turn around and use a conjugate prior because that’s what’s convenient. [sent-46, score-0.976]

11 Simpson and Barthelmé are unhappy with the scaled-inverse Wishart prior because they feel that the correlation and standard deviations should be a priori independent. [sent-59, score-0.957]

12 I don’t see that it’s so important to have prior independence of these parameters when ρ is close to ±1. [sent-61, score-0.868]

13 Models (1) and (2) are identical, and in both cases ρ is the correlation and σ is the marginal standard deviation (just as in Simpson and Barthelmé’s parameterizations). [sent-67, score-0.65]

14 Now σ is the conditional standard deviation; the marginal standard deviation is σ/√(1-ρ^2). [sent-69, score-0.605]

15 If you set independent priors on ρ and σ in model (3), this will induce dependence between ρ and the marginal standard deviation. [sent-70, score-0.728] (See the final sketch after this list.)

16 The above is not to say that the scaled-inverse Wishart model is best, or that prior independence of correlations and conditional variances is better (or worse) in general than prior independence of correlations and marginal variances. [sent-77, score-2.18]

17 A statement such as, “We think of correlation and scale as being two different things” does not imply that we should have prior independence in some particular parameterization. [sent-79, score-1.047]

18 That said, in Stan we’ve actually been working with a prior distribution in which the correlation and marginal variance parameters are independent. [sent-80, score-1.166]

19 The other thing to remember is that a prior distribution exists in relation to the likelihood. [sent-83, score-0.739]

20 The uniform prior distribution on [-1,1] seems reasonable. [sent-85, score-0.713]
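
The claims in sentences 4 and 8 are easy to check by simulation. A first minimal sketch (my own code, not Simpson's; it uses scipy.stats.invwishart with the conventional choice df = d + 1, the "noninformative" version that makes each marginal correlation uniform):

```python
import numpy as np
from scipy import stats

d = 2
# "Noninformative" inverse-Wishart: df = d + 1 makes each marginal correlation uniform.
draws = stats.invwishart(df=d + 1, scale=np.eye(d)).rvs(size=50_000, random_state=0)

sd1 = np.sqrt(draws[:, 0, 0])
corr = draws[:, 0, 1] / np.sqrt(draws[:, 0, 0] * draws[:, 1, 1])

# The marginal correlation is indeed roughly uniform on [-1, 1] ...
print("correlation deciles:", np.quantile(corr, [0.1, 0.5, 0.9]).round(2))
# ... but large standard deviations go with large absolute correlations:
print("corr(log sd, |corr|):", np.corrcoef(np.log(sd1), np.abs(corr))[0, 1].round(2))
```

The second printed number should come out clearly positive, which is exactly the dependence sentence 4 describes.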
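
Sentence 7 describes the scaled inverse-Wishart construction: Σ = ΔΩΔ, where Δ = diag(δ) carries free scale parameters and Ω is an inverse-Wishart draw, so δ and Ω jointly determine the standard deviations while Ω alone determines the correlations. A minimal sketch of one draw from such a prior (the lognormal prior on the δ's is my illustrative assumption, not something the post specifies):

```python
import numpy as np
from scipy import stats

d = 2

# Omega: an inverse-Wishart draw; it alone will determine the correlations.
omega = stats.invwishart(df=d + 1, scale=np.eye(d)).rvs(random_state=1)
# delta: free scale parameters with their own prior (lognormal, as an example).
delta = np.random.default_rng(1).lognormal(mean=0.0, sigma=1.0, size=d)

# The scaled inverse-Wishart draw: Sigma = diag(delta) @ Omega @ diag(delta).
sigma = np.diag(delta) @ omega @ np.diag(delta)

# The deltas cancel out of the correlation, so Sigma and Omega share it exactly:
corr_sigma = sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])
corr_omega = omega[0, 1] / np.sqrt(omega[0, 0] * omega[1, 1])
print(corr_sigma, corr_omega)  # equal up to floating-point rounding
```

This is the sense in which the construction frees up the scales: Δ can be given as diffuse a prior as you like without touching the correlations, though, per sentence 8, Ω's own inverse-Wishart draw still couples correlations and variances within Σ.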
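
Finally, sentences 14-15 and 18-20 can be checked together. In a bivariate normal, Var(y2 | y1) = σ_marginal²(1 − ρ²), so if σ is the conditional standard deviation the marginal one is σ/√(1 − ρ²), and independent priors on (ρ, σ) induce dependence between ρ and the marginal scale. Putting independent priors directly on the correlation (uniform on [-1, 1], as in sentence 20) and on the marginal scale avoids this by construction. A sketch (the half-Cauchy scale prior is my illustrative choice, not fixed by the post):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

rho = rng.uniform(-1.0, 1.0, size=n)           # uniform prior on the correlation
sigma = np.abs(2.5 * rng.standard_cauchy(n))   # half-Cauchy(0, 2.5) scale prior

# Model-(3)-style parameterization: sigma is the *conditional* sd, so the
# implied marginal sd is sigma / sqrt(1 - rho^2), which blows up as |rho| -> 1.
sd_marginal = sigma / np.sqrt(1.0 - rho**2)
print("conditional parameterization:",
      np.corrcoef(np.abs(rho), np.log(sd_marginal))[0, 1].round(2))

# Independent priors on correlation and *marginal* sd: no induced dependence.
print("independent-marginal strategy:",
      np.corrcoef(np.abs(rho), np.log(sigma))[0, 1].round(2))
```

The first number should be clearly positive and the second close to zero: the same pair of independent priors is "independent" or not depending on which standard deviation σ is taken to be, which is the post's point about parameterization.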


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('prior', 0.433), ('simpson', 0.304), ('independence', 0.23), ('correlation', 0.223), ('scaled', 0.223), ('correlations', 0.202), ('distribution', 0.196), ('marginal', 0.181), ('dependence', 0.165), ('scale', 0.161), ('barthelm', 0.142), ('standard', 0.134), ('parameters', 0.133), ('variances', 0.132), ('wishart', 0.117), ('deviation', 0.112), ('priors', 0.11), ('deviations', 0.098), ('noninformative', 0.097), ('covariance', 0.094), ('model', 0.093), ('uniform', 0.084), ('matrix', 0.084), ('frequentists', 0.082), ('parameterizations', 0.08), ('close', 0.072), ('conversations', 0.07), ('priori', 0.069), ('parameterization', 0.069), ('absolute', 0.062), ('parameter', 0.061), ('constraints', 0.057), ('bayesians', 0.053), ('pointing', 0.052), ('near', 0.049), ('default', 0.049), ('burdzy', 0.047), ('chatham', 0.047), ('zaslavsky', 0.047), ('parameterize', 0.047), ('strategy', 0.047), ('independent', 0.045), ('df', 0.044), ('jointly', 0.044), ('practicality', 0.044), ('malley', 0.044), ('conditional', 0.044), ('classical', 0.044), ('consider', 0.044), ('sacrifice', 0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000005 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence


2 0.43904141 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate, given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems; that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything; the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan

3 0.35691068 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather, I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

4 0.34196922 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis

5 0.33759966 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start

6 0.33253279 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

7 0.31448516 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

8 0.28646809 1465 andrew gelman stats-2012-08-21-D. Buggin

9 0.27273247 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

10 0.26401579 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

11 0.25174221 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

12 0.24248692 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

13 0.23430426 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

14 0.22426358 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

15 0.22366619 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices

16 0.21909876 846 andrew gelman stats-2011-08-09-Default priors update?

17 0.21863645 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

18 0.21734972 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

19 0.2166445 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

20 0.21349858 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.244), (1, 0.289), (2, 0.055), (3, 0.104), (4, -0.03), (5, -0.103), (6, 0.25), (7, -0.004), (8, -0.259), (9, 0.121), (10, 0.001), (11, 0.012), (12, 0.075), (13, 0.029), (14, -0.022), (15, -0.04), (16, -0.041), (17, 0.015), (18, 0.048), (19, -0.017), (20, 0.024), (21, -0.072), (22, 0.013), (23, 0.039), (24, 0.019), (25, 0.079), (26, 0.019), (27, 0.02), (28, -0.005), (29, -0.014), (30, -0.003), (31, -0.002), (32, 0.014), (33, 0.024), (34, 0.005), (35, -0.003), (36, 0.027), (37, -0.004), (38, -0.019), (39, 0.002), (40, -0.001), (41, 0.008), (42, 0.021), (43, 0.0), (44, 0.013), (45, -0.01), (46, -0.014), (47, -0.056), (48, 0.044), (49, -0.05)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98056597 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence


2 0.92434961 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization


3 0.9075098 846 andrew gelman stats-2011-08-09-Default priors update?

Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a

4 0.90660244 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

Introduction: David Kessler, Peter Hoff, and David Dunson write: Marginally specified priors for nonparametric Bayesian estimation Prior specification for nonparametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. Realistically, a statistician is unlikely to have informed opinions about all aspects of such a parameter, but may have real information about functionals of the parameter, such as the population mean or variance. This article proposes a new framework for nonparametric Bayes inference in which the prior distribution for a possibly infinite-dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a nonparametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard nonparametric prior distributions in common use, and inherit the large support of the standard priors upon which they are based. Ad

5 0.90265018 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

Introduction: Jouni Kerman did a cool bit of research justifying the Beta (1/3, 1/3) prior as noninformative for binomial data, and the Gamma (1/3, 0) prior for Poisson data. You probably thought that nothing new could be said about noninformative priors in such basic problems, but you were wrong! Here’s the story: The conjugate binomial and Poisson models are commonly used for estimating proportions or rates. However, it is not well known that the conventional noninformative conjugate priors tend to shrink the posterior quantiles toward the boundary or toward the middle of the parameter space, making them thus appear excessively informative. The shrinkage is always largest when the number of observed events is small. This behavior persists for all sample sizes and exposures. The effect of the prior is therefore most conspicuous and potentially controversial when analyzing rare events. As alternative default conjugate priors, I [Jouni] introduce Beta(1/3, 1/3) and Gamma(1/3, 0), which I cal

6 0.88890451 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

7 0.87984771 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

8 0.87444854 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

9 0.86749607 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

10 0.86520714 1465 andrew gelman stats-2012-08-21-D. Buggin

11 0.86252058 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

12 0.85851067 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

13 0.85740405 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

14 0.85647386 1941 andrew gelman stats-2013-07-16-Priors

15 0.84882259 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

16 0.83894587 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

17 0.83427668 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

18 0.83196354 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

19 0.80579776 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

20 0.80023056 184 andrew gelman stats-2010-08-04-That half-Cauchy prior


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(11, 0.051), (15, 0.011), (16, 0.06), (24, 0.238), (36, 0.066), (42, 0.018), (59, 0.026), (86, 0.073), (89, 0.041), (99, 0.244)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96949279 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence


2 0.96563035 846 andrew gelman stats-2011-08-09-Default priors update?


3 0.95946085 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q

4 0.95618618 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

Introduction: 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estimating individual outcomes such as death. (b) In their report, Burnham et al. did not identify their primary sampling units. (c) The second-stage sampling was not a probability sample. (d) Survey materials supplied by the authors are incomplete and inconsistent with published descriptions of the survey. Solution to question 26 From yesterday: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination

5 0.95389104 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

Introduction: Interesting discussion from David Gorski (which I found via this link from Joseph Delaney). I don’t have anything really to add to this discussion except to note the value of this sort of anecdote in a statistics discussion. It’s only n=1 and adds almost nothing to the literature on the effectiveness of various treatments, but a story like this can help focus one’s thoughts on the decision problems.

6 0.95371222 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

7 0.95343697 1851 andrew gelman stats-2013-05-11-Actually, I have no problem with this graph

8 0.94845176 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

9 0.94788802 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

10 0.94754255 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

11 0.94598502 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

12 0.94500971 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

13 0.94490039 197 andrew gelman stats-2010-08-10-The last great essayist?

14 0.9441309 1240 andrew gelman stats-2012-04-02-Blogads update

15 0.94412541 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models

16 0.94317949 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

17 0.94254708 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

18 0.94248927 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

19 0.94157803 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

20 0.9408679 1155 andrew gelman stats-2012-02-05-What is a prior distribution?