andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1466 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate, given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems; that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything; the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan
Top-scoring sentences:
1. Since we’re talking about the scaled inverse Wishart . . .
2. We know the inverse-Wishart has problems; that’s why we recommend the scaled inverse-Wishart, which is a more general class of models.
3. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything; the standard deviations of the individual coefficients and their covariance are still dependent.
4. My answer would be to use a prior that models the standard deviations and the correlations separately, so that you can express things like “I don’t expect my coefficients to be too large but I expect them to be correlated.”
5. Barthelmé mentions the Barnard, McCulloch, and Meng paper (which I just love, and which I cite in at least one of my books), in which the scale parameters and the correlations are modeled independently, and writes, “I don’t see why this isn’t the default in most statistical software, honestly.”
6. The answer to this last question is that computation is really slow with that model.
7. Also, it’s not really necessary for scale parameters and correlations to be precisely independent.
8. What you want is for these parameters to be uncoupled, or de facto independent.
9. To put it another way, what matters in a prior is not what the prior looks like; what matters is what the posterior looks like.
10. We’d like to be able to estimate, from hierarchical data, the scale parameters and also the correlations.
11. The redundant parameterization in the scaled inverse Wishart prior (which, just to remind you, is due to O’Malley and Zaslavsky, not me; all I’ve done is to publicize it) allows scale parameters and correlations to both be estimated from data.
12. It fixes the problem with the unscaled inverse-Wishart.
13. What I like about these models is that they are computationally convenient, and the scaled version allows the flexibility we want for a hierarchical model.
14. [Their] model is fine too (and, as I said, I love their article), but I don’t see any particular reason why these parameters should be independent in the prior.
15. What matters is how things get estimated in the posterior.
16. Unfortunately, even now I think the inverse-Wishart is considered the standard, and people don’t always know about the scaled inverse-Wishart.
17. Another problem is that people often think of these models in terms of how they work with direct multivariate data, but I’m more interested in the hierarchical modeling context, where a set of parameters (for example, regression coefficients) vary by group.
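The redundant parameterization mentioned above can be sketched as a prior draw. This is a minimal illustration, not the book's exact model: it assumes an identity Wishart scale matrix, nu = K+1 degrees of freedom, and a half-normal prior on the extra scale parameters xi, all of which are choices made here for the sketch.

```python
import numpy as np
from scipy.stats import invwishart

# Sketch of the O'Malley-Zaslavsky redundant parameterization:
# Sigma = diag(xi) Q diag(xi), with Q ~ Inv-Wishart(nu, I).
# The free scales xi let the implied standard deviations move
# separately from the correlations, which come from Q alone.
rng = np.random.default_rng(0)
K = 3        # dimension of the covariance matrix
nu = K + 1   # degrees of freedom; K + 1 is a common minimal choice

def draw_scaled_inv_wishart(rng, K, nu, xi_scale=1.0):
    """One prior draw of Sigma; the half-normal prior on xi is an assumption."""
    Q = invwishart(df=nu, scale=np.eye(K)).rvs(random_state=rng)
    xi = np.abs(rng.normal(0.0, xi_scale, size=K))  # half-normal scales
    return np.diag(xi) @ Q @ np.diag(xi)

Sigma = draw_scaled_inv_wishart(rng, K, nu)
sds = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(sds, sds)  # correlation matrix implied by the draw
```

Because xi multiplies rows and columns symmetrically, rescaling xi changes the standard deviations without touching the correlations, which is exactly the decoupling the post argues for.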
simIndex simValue blogId blogTitle
same-blog 1 1.0 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model
2 0.43904141 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
3 0.36037576 1465 andrew gelman stats-2012-08-21-D. Buggin
Introduction: Joe Zhao writes: I am trying to fit my data using the scaled inverse wishart model you mentioned in your book, Data analysis using regression and hierarchical models. Instead of using a uniform prior on the scale parameters, I try to use a log-normal distribution prior. However, I found that the individual coefficients don’t shrink much to a certain value even a highly informative prior (with extremely low variance) is considered. The coefficients are just very close to their least-squares estimations. Is it because of the log-normal prior I’m using or I’m wrong somewhere? My reply: If your priors are concentrated enough at zero variance, then yeah, the posterior estimates of the parameters should be pulled (almost) all the way to zero. If this isn’t happening, you got a problem. So as a start I’d try putting in some really strong priors concentrated at 0 (for example, N(0,.1^2)) and checking that you get a sensible answer. If not, you might well have a bug. You can also try
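The debugging check in that reply can be sketched with a conjugate-normal example. The numbers below are illustrative, not from the post: with a N(0, tau^2) prior on a normal mean, a tight prior (tau = 0.1) should pull the posterior mean almost all the way to zero; if it doesn't, suspect a bug.

```python
import numpy as np

# Conjugate-normal sketch of the sanity check: the posterior mean is the
# precision-weighted average of the sample mean ybar and the prior mean 0.
def posterior_mean(ybar, n, sigma, tau):
    data_precision = n / sigma**2
    prior_precision = 1.0 / tau**2
    return ybar * data_precision / (data_precision + prior_precision)

ybar, n, sigma = 2.0, 10, 1.0                      # illustrative data summary
weak = posterior_mean(ybar, n, sigma, tau=10.0)    # weak prior: stays near ybar
tight = posterior_mean(ybar, n, sigma, tau=0.1)    # N(0, .1^2): pulled near zero
```

Here `weak` is about 2.0 while `tight` is about 0.18, so a tightly concentrated prior visibly dominates the data, which is the behavior the reply says to check for.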
4 0.22073686 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices
Introduction: Since we’ve been discussing prior distributions on covariance matrices, I will recommend this recent article (coauthored with Tomoki Tokuda, Ben Goodrich, Iven Van Mechelen, and Francis Tuerlinckx) on their visualization: We present some methods for graphing distributions of covariance matrices and demonstrate them on several models, including the Wishart, inverse-Wishart, and scaled inverse-Wishart families in different dimensions. Our visualizations follow the principle of decomposing a covariance matrix into scale parameters and correlations, pulling out marginal summaries where possible and using two and three-dimensional plots to reveal multivariate structure. Visualizing a distribution of covariance matrices is a step beyond visualizing a single covariance matrix or a single multivariate dataset. Our visualization methods are available through the R package VisCov.
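The decomposition that abstract describes, splitting a covariance matrix into marginal scale parameters and a correlation matrix, can be written in a few lines (toy numbers, chosen here for illustration):

```python
import numpy as np

# Decompose a covariance matrix into standard deviations and correlations.
def decompose(Sigma):
    sds = np.sqrt(np.diag(Sigma))
    corr = Sigma / np.outer(sds, sds)
    return sds, corr

Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
sds, corr = decompose(Sigma)
# sds = [2.0, 1.0]; off-diagonal correlation = 1.2 / (2.0 * 1.0) = 0.6
```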
5 0.2033347 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
Introduction: In response to our recent posting of Amazon’s offer of Bayesian Data Analysis 3rd edition at 40% off, some people asked what was in this new edition, with more information beyond the beautiful cover image and the brief paragraph I’d posted earlier. Here’s the table of contents. The following sections have all-new material: 1.4 New introduction of BDA principles using a simple spell checking example 2.9 Weakly informative prior distributions 5.7 Weakly informative priors for hierarchical variance parameters 7.1-7.4 Predictive accuracy for model evaluation and comparison 10.6 Computing environments 11.4 Split R-hat 11.5 New measure of effective number of simulation draws 13.7 Variational inference 13.8 Expectation propagation 13.9 Other approximations 14.6 Regularization for regression models C.1 Getting started with R and Stan C.2 Fitting a hierarchical model in Stan C.4 Programming Hamiltonian Monte Carlo in R And the new chapters: 20 Basis function models 2
6 0.19542512 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
7 0.16474767 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
8 0.15254164 846 andrew gelman stats-2011-08-09-Default priors update?
9 0.15019083 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
10 0.15000625 1941 andrew gelman stats-2013-07-16-Priors
11 0.14505801 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?
12 0.1320101 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
13 0.13152072 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
14 0.12936257 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
15 0.12561421 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
16 0.12191731 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
17 0.12086524 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
18 0.11292474 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories
19 0.1067033 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings
20 0.10507607 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
simIndex simValue blogId blogTitle
same-blog 1 0.94898206 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model
2 0.90135592 846 andrew gelman stats-2011-08-09-Default priors update?
Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a
3 0.88588089 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
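The boundary problem and the prior-as-regularization fix can be sketched numerically. This is a toy setting, not the post's model: three group means with a known standard error, a profile likelihood for the group-level sd tau that peaks at the boundary tau = 0, and a Gamma(2, 1) prior (zero density at tau = 0) that moves the estimate off the boundary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

group_means = np.array([0.10, -0.05, 0.08])  # illustrative group estimates
se = 1.0  # assumed known within-group standard error

def neg_log_marginal(tau):
    # Marginal likelihood of tau: each group mean ~ N(0, se^2 + tau^2)
    var = se**2 + tau**2
    return 0.5 * np.sum(group_means**2 / var + np.log(var))

def neg_log_posterior(tau, rate=1.0):
    # Gamma(2, rate) log density is log(tau) - rate * tau + const
    return neg_log_marginal(tau) - (np.log(tau) - rate * tau)

mle = minimize_scalar(neg_log_marginal, bounds=(1e-8, 10), method="bounded").x
map_est = minimize_scalar(neg_log_posterior, bounds=(1e-8, 10), method="bounded").x
# mle sits at the boundary (~0); map_est is pulled to a positive value
```

Because the data barely inform tau, the maximum marginal likelihood estimate collapses to zero, while the weak prior gives a positive point estimate, which is the behavior the post is after.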
4 0.86264175 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
5 0.86058122 1465 andrew gelman stats-2012-08-21-D. Buggin
6 0.81594372 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
7 0.81291234 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
8 0.79462528 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
9 0.79445797 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions
10 0.7836588 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
11 0.7780081 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
12 0.77474022 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
13 0.77325273 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
14 0.77155513 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions
15 0.76803374 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
16 0.76515925 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters
17 0.76084197 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
18 0.75597703 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
19 0.74389702 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
20 0.74260771 555 andrew gelman stats-2011-02-04-Handy Matrix Cheat Sheet, with Gradients
simIndex simValue blogId blogTitle
1 0.95202684 273 andrew gelman stats-2010-09-13-Update on marathon statistics
Introduction: Frank Hansen updates his story and writes: Here is a link to the new stuff. The update is a little less than half way down the page. 1. used display() instead of summary() 2. include a proxy for [non] newbies — whether I can find their name in a previous Chicago Marathon. 3. graph actual pace vs. fitted pace (color code newbie proxy) 4. estimate the model separately for newbies and non-newbies. some incidental discussion of sd of errors. There are a few things unfinished but I have to get to bed, I’m running the 2010 Chicago Half tomorrow morning, and they moved the start up from 7:30 to 7:00 because it’s the day of the Bears home opener too.
same-blog 2 0.91433311 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model
Introduction: I know next to nothing about golf. My mini-golf scores typically approach the maximum of 7 per hole, and I’ve never actually played macro-golf. I did publish a paper on golf once ( A Probability Model for Golf Putting , with Deb Nolan), but it’s not so rare for people to publish papers on topics they know nothing about. Those who can’t, research. But I certainly have the ability to post other people’s ideas. Charles Murray writes: I [Murray] am playing around with the likelihood of Tiger Woods breaking Nicklaus’s record in the Majors. I’ve already gone on record two years ago with the reason why he won’t, but now I’m looking at it from a non-psychological perspective. Given the history of the majors, what how far above the average _for other great golfers_ does Tiger have to perform? Here’s the procedure I’ve been working on: 1. For all golfers who have won at at least one major since 1934 (the year the Masters began), create 120 lines: one for each Major for each year f
4 0.89954907 1386 andrew gelman stats-2012-06-21-Belief in hell is associated with lower crime rates
Introduction: I remember attending a talk a few years ago by my political science colleague John Huber in which he discussed cross-national comparisons of religious attitudes. One thing I remember is that the U.S. is highly religious, another thing I remembered is that lots more Americans believe in heaven than believe in hell. Some of this went into Red State Blue State—not the heaven/hell thing, but the graph of religiosity vs. GDP: and the corresponding graph of religious attendance vs. GDP for U.S. states: Also we learned that, at the individual level, the correlation of religious attendance with income is zero (according to survey reports, rich Americans are neither more nor less likely than poor Americans to go to church regularly): while the correlation of prayer with income is strongly negative (poor Americans are much more likely than rich Americans to regularly pray): Anyway, with all this, I was primed to be interested in a recent study by psychologist
5 0.89767528 458 andrew gelman stats-2010-12-08-Blogging: Is it “fair use”?
Introduction: Dave Kane writes: I [Kane] am involved in a dispute relating to whether or not a blog can be considered part of one’s academic writing. Williams College restricts the use of undergraduate theses as follows: Non-commercial, academic use within the scope of “Fair Use” standards is acceptable. Otherwise, you may not copy or distribute any content without the permission of the copyright holder. Seems obvious enough. Yet some folks think that my use of thesis material in a blog post fails this test because it is not “academic.” See this post for the gory details. Parenthetically, your readers might be interested in the substantive discovery here, the details of the Williams admissions process (which is probably very similar to Columbia’s). Williams places students into academic rating (AR) categories as follows: verbal math composite SAT II ACT AP AR 1: 770-800 750-800 1520-1600 750-800 35-36 mostly 5s AR 2: 730-770 720-750 1450-1520 720-770 33-34 4s an
6 0.89370453 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action
7 0.89107728 1311 andrew gelman stats-2012-05-10-My final exam for Design and Analysis of Sample Surveys
8 0.88983959 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs
9 0.87836885 382 andrew gelman stats-2010-10-30-“Presidential Election Outcomes Directly Influence Suicide Rates”
10 0.8747412 297 andrew gelman stats-2010-09-27-An interesting education and statistics blog
11 0.87238413 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy
12 0.87214708 1620 andrew gelman stats-2012-12-12-“Teaching effectiveness” as another dimension in cognitive ability
13 0.86821544 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge
14 0.85901475 598 andrew gelman stats-2011-03-03-Is Harvard hurting poor kids by cutting tuition for the upper middle class?
15 0.85885781 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics
17 0.85501289 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
19 0.8455714 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox
20 0.84443402 2321 andrew gelman stats-2014-05-05-On deck this week