andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1946 knowledge-graph by maker-knowledge-mining

1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves


meta info for this blog

Source: html

Introduction: Following up on our discussion of the other day , Nick Firoozye writes: One thing I meant by my initial query (but really didn’t manage to get across) was this: I have no idea what my prior would be on many many models, but just like Utility Theory expects ALL consumers to attach a utility to any and all consumption goods (even those I haven’t seen or heard of), Bayesian Stats (almost) expects the same for priors. (Of course it’s not a religious edict much in the way Utility Theory has, since there is no theory of a “modeler” in the Bayesian paradigm—nonetheless there is still an expectation that we should have priors over all sorts of parameters which mean almost nothing to us). For most models with sufficient complexity, I also have no idea what my informative priors are actually doing and the only way to know anything is through something I can see and experience, through data, not parameters or state variables. My question was more on the—let’s use the prior to come up


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 (Of course it’s not a religious edict much in the way Utility Theory has, since there is no theory of a “modeler” in the Bayesian paradigm—nonetheless there is still an expectation that we should have priors over all sorts of parameters which mean almost nothing to us). [sent-2, score-0.549]

2 For most models with sufficient complexity, I also have no idea what my informative priors are actually doing and the only way to know anything is through something I can see and experience, through data, not parameters or state variables. [sent-3, score-0.622]

3 My question was more on the—let’s use the prior to come up with something that can be manipulated, and then use this to restrict or identify the prior. [sent-4, score-0.272]

4 We can use a simple conditioning rule with huge matrices to find the conditional densities of the other, e. [sent-7, score-0.359]
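
A minimal sketch of the conditioning rule this sentence alludes to, assuming the joint prior over the two blocks of variables (say, econ variables and yield-curve variables) is Gaussian; the means, covariance, and block labels below are made-up illustrations, not numbers from the post:

```python
import numpy as np

# Conditional density of a jointly Gaussian prior, split into two blocks.
mu = np.array([2.0, 3.0, 1.5])                  # prior means (hypothetical)
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.5, 0.5],
                  [0.3, 0.5, 0.8]])             # prior covariance (hypothetical)

idx1 = [0, 1]                                   # variables we condition on
idx2 = [2]                                      # variables whose conditional density we want
x1_obs = np.array([2.5, 2.0])                   # hypothetical conditioning values

S11 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S21 = Sigma[np.ix_(idx2, idx1)]
S22 = Sigma[np.ix_(idx2, idx2)]

# Standard Gaussian conditioning:
# mu_2|1 = mu2 + S21 S11^{-1} (x1 - mu1),   S_2|1 = S22 - S21 S11^{-1} S12
mu_cond = mu[idx2] + S21 @ np.linalg.solve(S11, x1_obs - mu[idx1])
Sigma_cond = S22 - S21 @ np.linalg.solve(S11, S12)
print(mu_cond, Sigma_cond)
```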

5 In the case of looking at prior conditional forecasts, if they do not seem (subjectively) reasonable, the priors need to be changed. [sent-12, score-0.846]

6 Easy enough and works wonders in these macro-financial models where the LT unconditional econ forecasts are very bad but the LT yield curve forecasts conditioned on econ data are generally quite reasonable. [sent-15, score-1.004]

7 ), that these motions were largely related to demand shocks in the economy, where growth and inflation move together (typically the only shocks that the Fed really knows how to deal with), but that the atypical motions bear steepening and bull flattening seemed to coincide with supply shocks. [sent-18, score-0.96]

8 At least conditional forecasts give something you might be able to see in reality. [sent-23, score-0.511]
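
The check described in sentences 5 to 8 can be sketched as a small simulation loop: draw parameters from a candidate prior, form the forecast conditional on observed data, and ask whether the implied forecasts look subjectively reasonable; if not, adjust the prior and repeat. Everything below (the one-parameter model, the conditioning value, the prior scale) is a hypothetical stand-in for the macro-financial VAR setting in the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a scalar parameter beta links observed "econ" data x
# to a forecastable quantity y (e.g. a yield-curve slope): y = beta * x + noise.
x_obs = 1.2                        # conditioning value, made up
prior_sd = 10.0                    # candidate prior: beta ~ Normal(0, prior_sd)

beta_draws = rng.normal(0.0, prior_sd, size=5000)
y_forecasts = beta_draws * x_obs + rng.normal(0.0, 0.5, size=5000)

# Inspect the implied prior conditional forecast; if its spread is absurd
# relative to what a forecaster would accept, tighten or re-center the prior.
print(np.percentile(y_forecasts, [5, 50, 95]))
```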

9 I have no idea whether one can easily put enough constraints on the priors to make them fully determined. [sent-24, score-0.642]

10 This is more like having some information for an informative prior but perhaps not enough to make it unique (e. [sent-26, score-0.538]

11 Say MaxEnt subject to the subjective constraints, or like in Reference priors, minimize the cross-entropy between the prior and the posterior subject to my subjective constraints, etc. [sent-30, score-0.504]
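
One concrete version of the MaxEnt idea in sentence 11, assuming a discrete support and a single subjective constraint (a prior mean): on a finite grid the maximum-entropy solution has the exponential-family form p_i proportional to exp(lambda * x_i), with lambda chosen to satisfy the constraint. The grid and target mean below are made up:

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy prior on a discrete grid, subject to one subjective
# constraint (a prior mean). Grid and target mean are made-up numbers.
x = np.linspace(0.0, 10.0, 41)     # hypothetical support for the quantity
target_mean = 3.0                  # subjective constraint

def mean_gap(lam):
    w = np.exp(lam * x)
    p = w / w.sum()
    return p @ x - target_mean

# The MaxEnt solution is p_i proportional to exp(lam * x_i); solve for the
# multiplier lam that makes the constraint hold.
lam = brentq(mean_gap, -10.0, 10.0)
p = np.exp(lam * x)
p /= p.sum()
print(lam, p @ x)                  # p @ x should be ~3.0
```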

12 Irrespective, the goal is not to have priors on parameters exactly since I think this is damn near impossible. [sent-34, score-0.495]

13 I think nobody knows what the correlation between the state variables in time t vs time t+1 should be to make the model all that reasonable (well hopefully they are uncorrelated, but who knows? [sent-35, score-0.29]

14 My actual contention here is—people do not have priors on parameters. [sent-38, score-0.324]

15 But relationships in data, forecasts, conditional forecasts, all these are observable or involve observable quantities. [sent-41, score-0.422]

16 But using these methods in this subjective prior identification problem seems not completely loony. [sent-49, score-0.451]

17 In some settings I think it can make sense to put a prior distribution on parameters; in other settings it can make more sense to encode prior information in terms of predictive quantities. [sent-51, score-1.0]

18 In my paper many years ago with Frederic Bois, we constructed priors on our model parameters that made sense to us on a transformed scale. [sent-52, score-0.617]
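
The Gelman and Bois priors were specific to their model, but the generic mechanics of "a prior that makes sense on a transformed scale" look like the sketch below: specify a Normal prior on log(theta) and carry the Jacobian term back to the original scale. The hyperparameters are made up:

```python
import numpy as np
from scipy import stats

# A Normal prior on phi = log(theta) implies a density on theta that picks up
# a Jacobian factor 1/theta (i.e. a lognormal prior on theta).
mu_log, sd_log = 0.0, 1.0          # made-up hyperparameters on the log scale

def log_prior_theta(theta):
    phi = np.log(theta)
    return stats.norm.logpdf(phi, mu_log, sd_log) - np.log(theta)  # + log Jacobian

theta = np.linspace(0.01, 20.0, 2000)
dens = np.exp(log_prior_theta(theta))
print(dens.sum() * (theta[1] - theta[0]))   # close to 1; the grid truncates the tails slightly
```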

19 In Stan, by the way, you can put priors on anything that can be computed: parameters, functions of parameters, predictions, whatever. [sent-53, score-0.378]
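
A plain-Python sketch (not Stan code) of the idea in sentence 19: evaluate the joint log density and add to it the log density of a prior placed on a derived, predictive quantity. The toy regression, data, and hyperparameters are invented for illustration; for a nonlinear function of the parameters, a Jacobian adjustment would be needed if the added term is meant as the density of that function:

```python
import numpy as np
from scipy import stats

x = np.array([0.5, 1.0, 1.5, 2.0])   # made-up data
y = np.array([1.1, 1.9, 3.2, 3.9])
x_new = 3.0                          # point at which we hold prior beliefs

def log_post(params):
    a, b = params
    lp = stats.norm.logpdf(y, a + b * x, 0.5).sum()        # likelihood
    lp += stats.norm.logpdf([a, b], 0.0, 10.0).sum()       # vague priors on the parameters
    # Prior information expressed directly on the predictive quantity a + b * x_new.
    lp += stats.norm.logpdf(a + b * x_new, 6.0, 1.0)
    return lp

print(log_post(np.array([0.0, 2.0])))
```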

20 As we’ve been discussing a lot on this blog recently, strong priors can make sense, especially in settings with sparse data where we want to avoid being jerked around by patterns in the noise. [sent-54, score-0.394]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('priors', 0.324), ('prior', 0.272), ('forecasts', 0.261), ('conditional', 0.25), ('parameters', 0.171), ('vars', 0.16), ('yield', 0.159), ('curve', 0.129), ('constraints', 0.128), ('density', 0.128), ('jefrreys', 0.117), ('lt', 0.117), ('motions', 0.117), ('steepening', 0.117), ('subjective', 0.116), ('conditioning', 0.109), ('maxent', 0.107), ('subjectively', 0.107), ('bull', 0.101), ('utility', 0.097), ('shocks', 0.096), ('entropy', 0.093), ('expects', 0.093), ('flattening', 0.093), ('reasonable', 0.093), ('objective', 0.091), ('observable', 0.086), ('gdp', 0.078), ('inflation', 0.075), ('bear', 0.075), ('slope', 0.075), ('informative', 0.073), ('cross', 0.073), ('knows', 0.073), ('predictive', 0.073), ('make', 0.07), ('scenario', 0.068), ('condition', 0.067), ('competing', 0.066), ('enough', 0.066), ('gaussian', 0.065), ('econ', 0.064), ('sense', 0.063), ('identification', 0.063), ('instance', 0.06), ('many', 0.059), ('unique', 0.057), ('theory', 0.054), ('put', 0.054), ('state', 0.054)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000005 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves


2 0.34804156 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

Introduction: A couple days ago we discussed some remarks by Tony O’Hagan and Jim Berger on weakly informative priors. Jim followed up on Deborah Mayo’s blog with this: Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine. I have a few reactions to this: 1. I agree

3 0.34067991 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury y of transport etc etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

4 0.31939366 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; Surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start

5 0.26943573 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

Introduction: Following up on Christian’s post [link fixed] on the topic, I’d like to offer a few thoughts of my own. In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. So, in that sense, noninformative priors are no big deal, they’re just a way to get started. Just don’t take them too seriously. Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things hap

6 0.26401579 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

7 0.25493556 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

8 0.24505137 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

9 0.24275199 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

10 0.23475248 846 andrew gelman stats-2011-08-09-Default priors update?

11 0.22503917 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

12 0.22285865 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

13 0.20754558 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo

14 0.20352837 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

15 0.20069304 1465 andrew gelman stats-2012-08-21-D. Buggin

16 0.18670043 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

17 0.18205118 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

18 0.17770873 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

19 0.17594297 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

20 0.17497623 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.266), (1, 0.239), (2, 0.022), (3, 0.12), (4, -0.072), (5, -0.08), (6, 0.171), (7, 0.013), (8, -0.189), (9, 0.099), (10, 0.005), (11, 0.002), (12, 0.065), (13, 0.019), (14, -0.027), (15, -0.014), (16, 0.03), (17, 0.011), (18, 0.021), (19, 0.011), (20, -0.038), (21, -0.026), (22, -0.062), (23, 0.053), (24, -0.023), (25, 0.031), (26, 0.057), (27, -0.01), (28, -0.009), (29, 0.064), (30, 0.011), (31, -0.04), (32, 0.001), (33, -0.033), (34, -0.031), (35, 0.003), (36, -0.004), (37, 0.022), (38, 0.012), (39, 0.001), (40, -0.011), (41, 0.016), (42, 0.049), (43, 0.005), (44, 0.005), (45, -0.013), (46, -0.023), (47, -0.018), (48, 0.037), (49, -0.001)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97332621 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves


2 0.94729775 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors


3 0.92803329 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

Introduction: Deborah Mayo sent me this quote from Jim Berger: Too often I see people pretending to be subjectivists, and then using “weakly informative” priors that the objective Bayesian community knows are terrible and will give ridiculous answers; subjectivism is then being used as a shield to hide ignorance. . . . In my own more provocative moments, I claim that the only true subjectivists are the objective Bayesians, because they refuse to use subjectivism as a shield against criticism of sloppy pseudo-Bayesian practice. This caught my attention because I’ve become more and more convinced that weakly informative priors are the right way to go in many different situations. I don’t think Berger was talking about me , though, as the above quote came from a publication in 2006, at which time I’d only started writing about weakly informative priors. Going back to Berger’s article , I see that his “weakly informative priors” remark was aimed at this article by Anthony O’Hagan, who w

4 0.9249115 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the

5 0.92232835 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

Introduction: David Kessler, Peter Hoff, and David Dunson write : Marginally specified priors for nonparametric Bayesian estimation Prior specification for nonparametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. Realistically, a statistician is unlikely to have informed opinions about all aspects of such a parameter, but may have real information about functionals of the parameter, such the population mean or variance. This article proposes a new framework for nonparametric Bayes inference in which the prior distribution for a possibly infinite-dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a nonparametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard nonparametric prior distributions in common use, and inherit the large support of the standard priors upon which they are based. Ad

6 0.91550869 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

7 0.90848106 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

8 0.89628983 468 andrew gelman stats-2010-12-15-Weakly informative priors and imprecise probabilities

9 0.89348024 846 andrew gelman stats-2011-08-09-Default priors update?

10 0.88264036 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

11 0.8811757 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

12 0.88010299 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?

13 0.87482393 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

14 0.86631531 1465 andrew gelman stats-2012-08-21-D. Buggin

15 0.8630054 1941 andrew gelman stats-2013-07-16-Priors

16 0.86270583 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

17 0.85830855 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

18 0.8352558 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

19 0.82774192 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

20 0.81000906 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.064), (21, 0.038), (24, 0.184), (45, 0.021), (50, 0.022), (53, 0.022), (57, 0.011), (60, 0.109), (72, 0.011), (77, 0.01), (86, 0.021), (89, 0.036), (95, 0.014), (96, 0.012), (99, 0.228)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94299591 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves


2 0.92840856 1465 andrew gelman stats-2012-08-21-D. Buggin

Introduction: Joe Zhao writes: I am trying to fit my data using the scaled inverse wishart model you mentioned in your book, Data analysis using regression and hierarchical models. Instead of using a uniform prior on the scale parameters, I try to use a log-normal distribution prior. However, I found that the individual coefficients don’t shrink much to a certain value even a highly informative prior (with extremely low variance) is considered. The coefficients are just very close to their least-squares estimations. Is it because of the log-normal prior I’m using or I’m wrong somewhere? My reply: If your priors are concentrated enough at zero variance, then yeah, the posterior estimates of the parameters should be pulled (almost) all the way to zero. If this isn’t happening, you got a problem. So as a start I’d try putting in some really strong priors concentrated at 0 (for example, N(0,.1^2)) and checking that you get a sensible answer. If not, you might well have a bug. You can also try

3 0.9236052 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

Introduction: I received the following email from someone who wishes to remain anonymous: My colleague and I are trying to understand the best way to approach a problem involving measuring a group of individuals’ abilities across time, and are hoping you can offer some guidance. We are trying to analyze the combined effect of two distinct groups of people (A and B, with no overlap between A and B) who collaborate to produce a binary outcome, using a mixed logistic regression along the lines of the following. Outcome ~ (1 | A) + (1 | B) + Other variables What we’re interested in testing was whether the observed A random effects in period 1 are predictive of the A random effects in the following period 2. Our idea being create two models, each using a different period’s worth of data, to create two sets of A coefficients, then observe the relationship between the two. If the A’s have a persistent ability across periods, the coefficients should be correlated or show a linear-ish relationshi

4 0.9226445 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrows value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i

5 0.92204273 1792 andrew gelman stats-2013-04-07-X on JLP

Introduction: Christian Robert writes on the Jeffreys-Lindley paradox. I have nothing to add to this beyond my recent comments : To me, the Lindley paradox falls apart because of its noninformative prior distribution on the parameter of interest. If you really think there’s a high probability the parameter is nearly exactly zero, I don’t see the point of the model saying that you have no prior information at all on the parameter. In short: my criticism of so-called Bayesian hypothesis testing is that it’s insufficiently Bayesian. To clarify, I’m speaking of all the examples I’ve ever worked on in social and environmental science, where in some settings I can imagine a parameter being very close to zero and in other settings I can imagine a parameter taking on just about any value in a wide range, but where I’ve never seen an example where a parameter could be either right at zero or taking on any possible value. But such examples might occur in areas of application that I haven’t worked on.

6 0.91992867 567 andrew gelman stats-2011-02-10-English-to-English translation

7 0.91989851 807 andrew gelman stats-2011-07-17-Macro causality

8 0.91952914 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

9 0.91941333 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

10 0.91931236 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

11 0.91912198 846 andrew gelman stats-2011-08-09-Default priors update?

12 0.91875076 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

13 0.91871852 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

14 0.91848254 1240 andrew gelman stats-2012-04-02-Blogads update

15 0.91805142 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

16 0.91769731 191 andrew gelman stats-2010-08-08-Angry about the soda tax

17 0.91738546 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

18 0.91720414 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

19 0.91678202 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

20 0.91663456 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes