andrew_gelman_stats-2012-1465 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Joe Zhao writes: I am trying to fit my data using the scaled inverse-Wishart model you mentioned in your book, Data Analysis Using Regression and Multilevel/Hierarchical Models. Instead of using a uniform prior on the scale parameters, I am trying a log-normal prior. However, I find that the individual coefficients don’t shrink much toward a common value even when a highly informative prior (with extremely low variance) is used. The coefficients stay very close to their least-squares estimates. Is it because of the log-normal prior I’m using, or am I doing something wrong? My reply: If your priors are concentrated enough at zero variance, then yeah, the posterior estimates of the parameters should be pulled (almost) all the way to zero. If this isn’t happening, you’ve got a problem. So as a start I’d try putting in some really strong priors concentrated at 0 (for example, N(0, 0.1^2)) and checking that you get a sensible answer. If not, you might well have a bug. You can also try fake-data simulation.
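To make that check concrete, here is a minimal sketch (not from the original exchange, and in Python/NumPy rather than the model Zhao is actually fitting). It assumes, purely for illustration, a plain linear regression with independent coefficients and a known residual sd instead of the full scaled inverse-Wishart setup: simulate fake data, compute the least-squares estimates, then compute the posterior mean under a tight N(0, 0.1^2) prior on each coefficient and confirm the estimates get pulled most of the way to zero.

```python
# Minimal sketch (not from the original post): simulate fake data for a linear
# regression and compare least-squares estimates with the posterior mean under
# a strong N(0, 0.1^2) prior on each coefficient. For simplicity this assumes
# independent coefficients and a known residual sd, not the full scaled
# inverse-Wishart model; the point is only to check that a prior tightly
# concentrated at 0 really does pull the estimates toward 0.
import numpy as np

rng = np.random.default_rng(42)

n, k = 100, 5
sigma_y = 1.0                       # assumed known residual sd
beta_true = rng.normal(0, 2, size=k)

X = rng.normal(size=(n, k))
y = X @ beta_true + rng.normal(0, sigma_y, size=n)

# Least-squares estimates (what the coefficients collapse to when the prior
# carries essentially no information).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Posterior mean with beta_j ~ N(0, tau^2), tau = 0.1 (a "really strong"
# prior concentrated at 0). Conjugate normal result:
#   beta_post = (X'X / sigma^2 + I / tau^2)^{-1} X'y / sigma^2
tau = 0.1
A = X.T @ X / sigma_y**2 + np.eye(k) / tau**2
beta_post = np.linalg.solve(A, X.T @ y / sigma_y**2)

print("true:      ", np.round(beta_true, 2))
print("least sq.: ", np.round(beta_ols, 2))
print("posterior: ", np.round(beta_post, 2))
# If the fitting code is correct, beta_post should sit much closer to 0 than
# beta_ols; if it still tracks the least-squares values, something is wrong.
```

If the posterior means under a prior this tight still track the least-squares values, the problem is almost certainly in the fitting code rather than in the choice of prior.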
simIndex simValue blogId blogTitle
same-blog 1 1.0 1465 andrew gelman stats-2012-08-21-D. Buggin
Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate, given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems; that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything, the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan (a small simulation illustrating this scale-correlation dependence for the plain inverse-Wishart appears after this list)
3 0.28646809 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple of days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution: First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that there’s also the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model: The second case of parameterization in prior distributions arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
4 0.26486975 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
Introduction: Following up on Christian’s post [link fixed] on the topic, I’d like to offer a few thoughts of my own. In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. So, in that sense, noninformative priors are no big deal, they’re just a way to get started. Just don’t take them too seriously. Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things hap
5 0.25926694 1941 andrew gelman stats-2013-07-16-Priors
Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in economics with utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities for all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather, I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also
6 0.25415519 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
7 0.24494658 846 andrew gelman stats-2011-08-09-Default priors update?
8 0.22855923 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
9 0.22342227 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
10 0.21838839 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
11 0.20353408 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
12 0.20069304 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
13 0.19778071 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
14 0.17286497 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
15 0.17064773 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
16 0.16828889 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
17 0.16586006 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors
18 0.16271093 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model
19 0.16004263 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
20 0.15986162 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model
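The excerpts above mention Simon Barthelmé’s point that standard deviations and correlations are dependent under these priors. The following sketch is not from any of these posts; it is a quick simulation check of that dependence for the plain (unscaled) inverse-Wishart, assuming a 2x2 matrix, identity scale matrix, and df = 3, values chosen here only for illustration.

```python
# Sketch (not from the posts above): draw covariance matrices from a plain
# inverse-Wishart prior and check whether the implied standard deviations
# and correlation are independent a priori. Assumes dimension 2, identity
# scale, and df = 3; these settings are illustrative, not canonical.
import numpy as np
from scipy.stats import invwishart

draws = invwishart.rvs(df=3, scale=np.eye(2), size=20000, random_state=0)

sd1 = np.sqrt(draws[:, 0, 0])
sd2 = np.sqrt(draws[:, 1, 1])
rho = draws[:, 0, 1] / (sd1 * sd2)

# If scales and correlation were independent under the prior, this would be
# near zero; in practice it comes out clearly positive, i.e. large standard
# deviations tend to go with correlations far from zero.
print(np.corrcoef(np.log(sd1), np.abs(rho))[0, 1])
```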
simIndex simValue blogId blogTitle
same-blog 1 0.96687073 1465 andrew gelman stats-2012-08-21-D. Buggin
2 0.93912202 846 andrew gelman stats-2011-08-09-Default priors update?
Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a
3 0.91874182 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
4 0.91143137 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
Introduction: For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this: sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
5 0.89932406 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly, I have read that you suggested using Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate it if you had the time to point me to this publication of yours. The reason is that I have used the prior distribution Gamma(1.4, 0.4) in a study which we are now revising for publication, and where a reviewer questions the choice of the distribution (claiming that it is too informative!). (A quick numerical comparison of the Gamma(1.4, 0.4) and Gamma(0.01, 0.01) densities appears after this list.) I am well aware that in recent publications (Prior distributions for variance parameters in hierarchical models, Bayesian Analysis; Data Analysis Using Regression and Multilevel/Hierarchical Models) you suggest modeling the precision as pow(standard deviatio
6 0.87374836 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
8 0.86259913 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
9 0.8593542 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
10 0.85450447 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
11 0.8509267 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
12 0.84802151 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
13 0.84792572 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions
14 0.84583747 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
15 0.84419769 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
16 0.83649844 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
17 0.82897526 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?
18 0.81730068 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
19 0.81051248 1941 andrew gelman stats-2013-07-16-Priors
20 0.80929762 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
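Item 5 in the list above (the Gamma(1.4, 0.4) question) turns on how informative that prior on a precision actually is compared with the conventional Gamma(0.01, 0.01). The sketch below is not from any of these posts; it just evaluates both densities at a few precision values, using the shape-rate parameterization as in BUGS (so scipy’s scale is 1/rate), to show where each puts its mass.

```python
# Sketch (not from the posts above): compare a Gamma(1.4, 0.4) prior with
# the conventional Gamma(0.01, 0.01) prior for a precision parameter.
# Shape-rate parameterization as in BUGS; scipy.stats.gamma takes a shape
# and a scale, with scale = 1 / rate.
from scipy.stats import gamma

informative = gamma(a=1.4, scale=1 / 0.4)     # prior mean 1.4 / 0.4 = 3.5
conventional = gamma(a=0.01, scale=1 / 0.01)  # prior mean 1, huge spread

for tau in [0.01, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0]:
    print(f"tau = {tau:5.2f}   Gamma(1.4, 0.4): {informative.pdf(tau):8.4f}   "
          f"Gamma(0.01, 0.01): {conventional.pdf(tau):10.4f}")

# The Gamma(0.01, 0.01) density blows up as tau -> 0 and has a very long
# right tail, so it is far from flat; whether it counts as "noninformative"
# depends entirely on the scale of the problem.
```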
simIndex simValue blogId blogTitle
same-blog 1 0.97054529 1465 andrew gelman stats-2012-08-21-D. Buggin
2 0.96270972 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. (A toy calculation of a posterior predictive p-value appears at the end of this list.) And here’s the abstract of the paper: Posterior predictive p-values do not i
3 0.96115524 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox
Introduction: From a couple years ago but still relevant, I think: To me, the Lindley paradox falls apart because of its noninformative prior distribution on the parameter of interest. If you really think there’s a high probability the parameter is nearly exactly zero, I don’t see the point of the model saying that you have no prior information at all on the parameter. In short: my criticism of so-called Bayesian hypothesis testing is that it’s insufficiently Bayesian. P.S. To clarify (in response to Bill’s comment below): I’m speaking of all the examples I’ve ever worked on in social and environmental science, where in some settings I can imagine a parameter being very close to zero and in other settings I can imagine a parameter taking on just about any value in a wide range, but where I’ve never seen an example where a parameter could be either right at zero or taking on any possible value. But such examples might occur in areas of application that I haven’t worked on.
4 0.96092635 896 andrew gelman stats-2011-09-09-My homework success
Introduction: A friend writes to me: You will be amused to know that students in our Bayesian Inference paper at 4th year found solutions to exercises from your book on-line. The amazing thing was that some of them were dumb enough to copy out solutions verbatim. However, I thought you might like to know you have done well in this class! I’m happy to hear this. I worked hard on those solutions!
5 0.95843554 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine
Introduction: Interesting discussion from David Gorski (which I found via this link from Joseph Delaney). I don’t have anything really to add to this discussion except to note the value of this sort of anecdote in a statistics discussion. It’s only n=1 and adds almost nothing to the literature on the effectiveness of various treatments, but a story like this can help focus one’s thoughts on the decision problems.
8 0.95476162 1080 andrew gelman stats-2011-12-24-Latest in blog advertising
9 0.95385456 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
10 0.95294458 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.
11 0.95244527 1792 andrew gelman stats-2013-04-07-X on JLP
12 0.95238197 1240 andrew gelman stats-2012-04-02-Blogads update
13 0.95236194 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research
14 0.95215464 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample
15 0.95179373 2247 andrew gelman stats-2014-03-14-The maximal information coefficient
16 0.95123315 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys
17 0.95106655 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes
18 0.95096564 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors
19 0.95090282 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
20 0.95083439 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model
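The posterior-predictive excerpt earlier in this list (item 2) describes a p-value of 0.4 as a 40% chance that a replicated T(y_rep) exceeds the observed T(y). The sketch below is not from the paper under discussion; it just shows the mechanics on a deliberately simple toy model, assuming normal data with known sd 1, a flat prior on the mean, and the sample variance as the test statistic, all choices made here purely for illustration.

```python
# Sketch (not from the paper discussed above): a posterior predictive
# p-value for a toy model. Data: y_i ~ N(mu, 1) with a flat prior on mu,
# so the posterior is mu | y ~ N(ybar, 1/n). Test statistic: sample variance.
import numpy as np

rng = np.random.default_rng(1)
n = 50
y = rng.normal(loc=0.3, scale=1.0, size=n)   # "today's" data

T_obs = np.var(y, ddof=1)                    # observed T(y)

# Draw mu from its posterior, then one replicated dataset per draw.
n_rep = 5000
mu_draws = rng.normal(loc=y.mean(), scale=1 / np.sqrt(n), size=n_rep)
y_rep = rng.normal(loc=mu_draws[:, None], scale=1.0, size=(n_rep, n))
T_rep = np.var(y_rep, axis=1, ddof=1)

# Posterior predictive p-value: Pr(T(y_rep) >= T(y) | y).
p_value = np.mean(T_rep >= T_obs)
print(f"posterior predictive p-value: {p_value:.2f}")

# Values near 0 or 1 flag misfit with respect to this statistic; a middling
# value just says replications exceed the observed T about that often, which
# is the calibration statement quoted in the excerpt.
```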