andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1466 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate, given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems; that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything; the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan
Top-scoring sentences:
1. Since we’re talking about the scaled inverse Wishart . . .
2. We know the inverse-Wishart has problems; that’s why we recommend the scaled inverse-Wishart, which is a more general class of models.
3. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything; the standard deviations of the individual coefficients and their covariance are still dependent.
4. My answer would be to use a prior that models the standard deviations and the correlations separately, so that you can express things like “I don’t expect my coefficients to be too large but I expect them to be correlated.”
5. Barthelmé mentions the Barnard, McCulloch, and Meng paper (which I just love, and which I cite in at least one of my books), in which the scale parameters and the correlations are modeled independently, and writes, “I don’t see why this isn’t the default in most statistical software, honestly.”
6. The answer to this last question is that computation is really slow with that model.
7. Also, it’s not really necessary for scale parameters and correlations to be precisely independent.
8. What you want is for these parameters to be uncoupled, or de facto independent.
9. To put it another way, what matters in a prior is not what the prior looks like; what matters is what the posterior looks like.
10. We’d like to be able to estimate, from hierarchical data, the scale parameters and also the correlations.
11. The redundant parameterization in the scaled inverse Wishart prior (which, just to remind you, is due to O’Malley and Zaslavsky, not me; all I’ve done is to publicize it) allows scale parameters and correlations to both be estimated from data.
12. It fixes the problem with the unscaled inverse-Wishart.
13. What I like about these models is that they are computationally convenient, and the scaled version allows the flexibility we want for a hierarchical model.
14. [Their] model is fine too (and, as I said, I love their article), but I don’t see any particular reason why these parameters should be independent in the prior.
15. What matters is how things get estimated in the posterior.
16. Unfortunately, even now I think the inverse-Wishart is considered the standard, and people don’t always know about the scaled inverse-Wishart.
17. Another problem is that people often think of these models in terms of how they work with direct multivariate data, but I’m more interested in the hierarchical modeling context, where a set of parameters (for example, regression coefficients) vary by group.
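The redundant parameterization mentioned above can be sketched as a prior draw. This is a minimal illustration, not the book's exact model: it assumes an identity Wishart scale matrix, nu = K+1 degrees of freedom, and a half-normal prior on the extra scale parameters xi, all of which are choices made here for the sketch.

```python
import numpy as np
from scipy.stats import invwishart

# Sketch of the O'Malley-Zaslavsky redundant parameterization:
# Sigma = diag(xi) Q diag(xi), with Q ~ Inv-Wishart(nu, I).
# The free scales xi let the implied standard deviations move
# separately from the correlations, which come from Q alone.
rng = np.random.default_rng(0)
K = 3        # dimension of the covariance matrix
nu = K + 1   # degrees of freedom; K + 1 is a common minimal choice

def draw_scaled_inv_wishart(rng, K, nu, xi_scale=1.0):
    """One prior draw of Sigma; the half-normal prior on xi is an assumption."""
    Q = invwishart(df=nu, scale=np.eye(K)).rvs(random_state=rng)
    xi = np.abs(rng.normal(0.0, xi_scale, size=K))  # half-normal scales
    return np.diag(xi) @ Q @ np.diag(xi)

Sigma = draw_scaled_inv_wishart(rng, K, nu)
sds = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(sds, sds)  # correlation matrix implied by the draw
```

Because xi multiplies rows and columns symmetrically, rescaling xi changes the standard deviations without touching the correlations, which is exactly the decoupling the post argues for.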
simIndex simValue blogId blogTitle
same-blog 1 1.0 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model
2 0.43904141 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
3 0.36037576 1465 andrew gelman stats-2012-08-21-D. Buggin
Introduction: Joe Zhao writes: I am trying to fit my data using the scaled inverse wishart model you mentioned in your book, Data analysis using regression and hierarchical models. Instead of using a uniform prior on the scale parameters, I try to use a log-normal distribution prior. However, I found that the individual coefficients don’t shrink much to a certain value even a highly informative prior (with extremely low variance) is considered. The coefficients are just very close to their least-squares estimations. Is it because of the log-normal prior I’m using or I’m wrong somewhere? My reply: If your priors are concentrated enough at zero variance, then yeah, the posterior estimates of the parameters should be pulled (almost) all the way to zero. If this isn’t happening, you got a problem. So as a start I’d try putting in some really strong priors concentrated at 0 (for example, N(0,.1^2)) and checking that you get a sensible answer. If not, you might well have a bug. You can also try
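The debugging check in that reply can be sketched with a conjugate-normal example. The numbers below are illustrative, not from the post: with a N(0, tau^2) prior on a normal mean, a tight prior (tau = 0.1) should pull the posterior mean almost all the way to zero; if it doesn't, suspect a bug.

```python
import numpy as np

# Conjugate-normal sketch of the sanity check: the posterior mean is the
# precision-weighted average of the sample mean ybar and the prior mean 0.
def posterior_mean(ybar, n, sigma, tau):
    data_precision = n / sigma**2
    prior_precision = 1.0 / tau**2
    return ybar * data_precision / (data_precision + prior_precision)

ybar, n, sigma = 2.0, 10, 1.0                      # illustrative data summary
weak = posterior_mean(ybar, n, sigma, tau=10.0)    # weak prior: stays near ybar
tight = posterior_mean(ybar, n, sigma, tau=0.1)    # N(0, .1^2): pulled near zero
```

Here `weak` is about 2.0 while `tight` is about 0.18, so a tightly concentrated prior visibly dominates the data, which is the behavior the reply says to check for.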
4 0.22073686 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices
Introduction: Since we’ve been discussing prior distributions on covariance matrices, I will recommend this recent article (coauthored with Tomoki Tokuda, Ben Goodrich, Iven Van Mechelen, and Francis Tuerlinckx) on their visualization: We present some methods for graphing distributions of covariance matrices and demonstrate them on several models, including the Wishart, inverse-Wishart, and scaled inverse-Wishart families in different dimensions. Our visualizations follow the principle of decomposing a covariance matrix into scale parameters and correlations, pulling out marginal summaries where possible and using two and three-dimensional plots to reveal multivariate structure. Visualizing a distribution of covariance matrices is a step beyond visualizing a single covariance matrix or a single multivariate dataset. Our visualization methods are available through the R package VisCov.
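The decomposition that abstract describes, splitting a covariance matrix into marginal scale parameters and a correlation matrix, can be written in a few lines (toy numbers, chosen here for illustration):

```python
import numpy as np

# Decompose a covariance matrix into standard deviations and correlations.
def decompose(Sigma):
    sds = np.sqrt(np.diag(Sigma))
    corr = Sigma / np.outer(sds, sds)
    return sds, corr

Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
sds, corr = decompose(Sigma)
# sds = [2.0, 1.0]; off-diagonal correlation = 1.2 / (2.0 * 1.0) = 0.6
```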
5 0.2033347 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
Introduction: In response to our recent posting of Amazon’s offer of Bayesian Data Analysis 3rd edition at 40% off, some people asked what was in this new edition, with more information beyond the beautiful cover image and the brief paragraph I’d posted earlier. Here’s the table of contents. The following sections have all-new material: 1.4 New introduction of BDA principles using a simple spell checking example 2.9 Weakly informative prior distributions 5.7 Weakly informative priors for hierarchical variance parameters 7.1-7.4 Predictive accuracy for model evaluation and comparison 10.6 Computing environments 11.4 Split R-hat 11.5 New measure of effective number of simulation draws 13.7 Variational inference 13.8 Expectation propagation 13.9 Other approximations 14.6 Regularization for regression models C.1 Getting started with R and Stan C.2 Fitting a hierarchical model in Stan C.4 Programming Hamiltonian Monte Carlo in R And the new chapters: 20 Basis function models 2
6 0.19542512 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
7 0.16474767 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
8 0.15254164 846 andrew gelman stats-2011-08-09-Default priors update?
9 0.15019083 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
10 0.15000625 1941 andrew gelman stats-2013-07-16-Priors
11 0.14505801 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?
12 0.1320101 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
13 0.13152072 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
14 0.12936257 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
15 0.12561421 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
16 0.12191731 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
17 0.12086524 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
18 0.11292474 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories
19 0.1067033 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings
20 0.10507607 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
simIndex simValue blogId blogTitle
same-blog 1 0.94898206 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model
2 0.90135592 846 andrew gelman stats-2011-08-09-Default priors update?
Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a
3 0.88588089 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
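The boundary problem and the prior-as-regularization fix can be sketched numerically. This is a toy setting, not the post's model: three group means with a known standard error, a profile likelihood for the group-level sd tau that peaks at the boundary tau = 0, and a Gamma(2, 1) prior (zero density at tau = 0) that moves the estimate off the boundary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

group_means = np.array([0.10, -0.05, 0.08])  # illustrative group estimates
se = 1.0  # assumed known within-group standard error

def neg_log_marginal(tau):
    # Marginal likelihood of tau: each group mean ~ N(0, se^2 + tau^2)
    var = se**2 + tau**2
    return 0.5 * np.sum(group_means**2 / var + np.log(var))

def neg_log_posterior(tau, rate=1.0):
    # Gamma(2, rate) log density is log(tau) - rate * tau + const
    return neg_log_marginal(tau) - (np.log(tau) - rate * tau)

mle = minimize_scalar(neg_log_marginal, bounds=(1e-8, 10), method="bounded").x
map_est = minimize_scalar(neg_log_posterior, bounds=(1e-8, 10), method="bounded").x
# mle sits at the boundary (~0); map_est is pulled to a positive value
```

Because the data barely inform tau, the maximum marginal likelihood estimate collapses to zero, while the weak prior gives a positive point estimate, which is the behavior the post is after.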
4 0.86264175 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
5 0.86058122 1465 andrew gelman stats-2012-08-21-D. Buggin
6 0.81594372 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
7 0.81291234 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
8 0.79462528 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
9 0.79445797 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions
10 0.7836588 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
11 0.7780081 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
12 0.77474022 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
13 0.77325273 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
14 0.77155513 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions
15 0.76803374 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
16 0.76515925 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters
17 0.76084197 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
18 0.75597703 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
19 0.74389702 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
20 0.74260771 555 andrew gelman stats-2011-02-04-Handy Matrix Cheat Sheet, with Gradients
simIndex simValue blogId blogTitle
1 0.95202684 273 andrew gelman stats-2010-09-13-Update on marathon statistics
Introduction: Frank Hansen updates his story and writes: Here is a link to the new stuff. The update is a little less than half way down the page. 1. used display() instead of summary() 2. include a proxy for [non] newbies — whether I can find their name in a previous Chicago Marathon. 3. graph actual pace vs. fitted pace (color code newbie proxy) 4. estimate the model separately for newbies and non-newbies. some incidental discussion of sd of errors. There are a few things unfinished but I have to get to bed, I’m running the 2010 Chicago Half tomorrow morning, and they moved the start up from 7:30 to 7:00 because it’s the day of the Bears home opener too.
same-blog 2 0.91433311 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model
Introduction: I know next to nothing about golf. My mini-golf scores typically approach the maximum of 7 per hole, and I’ve never actually played macro-golf. I did publish a paper on golf once ( A Probability Model for Golf Putting , with Deb Nolan), but it’s not so rare for people to publish papers on topics they know nothing about. Those who can’t, research. But I certainly have the ability to post other people’s ideas. Charles Murray writes: I [Murray] am playing around with the likelihood of Tiger Woods breaking Nicklaus’s record in the Majors. I’ve already gone on record two years ago with the reason why he won’t, but now I’m looking at it from a non-psychological perspective. Given the history of the majors, what how far above the average _for other great golfers_ does Tiger have to perform? Here’s the procedure I’ve been working on: 1. For all golfers who have won at at least one major since 1934 (the year the Masters began), create 120 lines: one for each Major for each year f
4 0.89954907 1386 andrew gelman stats-2012-06-21-Belief in hell is associated with lower crime rates
Introduction: I remember attending a talk a few years ago by my political science colleague John Huber in which he discussed cross-national comparisons of religious attitudes. One thing I remember is that the U.S. is highly religious, another thing I remembered is that lots more Americans believe in heaven than believe in hell. Some of this went into Red State Blue State—not the heaven/hell thing, but the graph of religiosity vs. GDP: and the corresponding graph of religious attendance vs. GDP for U.S. states: Also we learned that, at the individual level, the correlation of religious attendance with income is zero (according to survey reports, rich Americans are neither more nor less likely than poor Americans to go to church regularly): while the correlation of prayer with income is strongly negative (poor Americans are much more likely than rich Americans to regularly pray): Anyway, with all this, I was primed to be interested in a recent study by psychologist
5 0.89767528 458 andrew gelman stats-2010-12-08-Blogging: Is it “fair use”?
Introduction: Dave Kane writes: I [Kane] am involved in a dispute relating to whether or not a blog can be considered part of one’s academic writing. Williams College restricts the use of undergraduate theses as follows: Non-commercial, academic use within the scope of “Fair Use” standards is acceptable. Otherwise, you may not copy or distribute any content without the permission of the copyright holder. Seems obvious enough. Yet some folks think that my use of thesis material in a blog post fails this test because it is not “academic.” See this post for the gory details. Parenthetically, your readers might be interested in the substantive discovery here, the details of the Williams admissions process (which is probably very similar to Columbia’s). Williams places students into academic rating (AR) categories as follows: verbal math composite SAT II ACT AP AR 1: 770-800 750-800 1520-1600 750-800 35-36 mostly 5s AR 2: 730-770 720-750 1450-1520 720-770 33-34 4s an
6 0.89370453 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action
7 0.89107728 1311 andrew gelman stats-2012-05-10-My final exam for Design and Analysis of Sample Surveys
8 0.88983959 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs
9 0.87836885 382 andrew gelman stats-2010-10-30-“Presidential Election Outcomes Directly Influence Suicide Rates”
10 0.8747412 297 andrew gelman stats-2010-09-27-An interesting education and statistics blog
11 0.87238413 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy
12 0.87214708 1620 andrew gelman stats-2012-12-12-“Teaching effectiveness” as another dimension in cognitive ability
13 0.86821544 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge
14 0.85901475 598 andrew gelman stats-2011-03-03-Is Harvard hurting poor kids by cutting tuition for the upper middle class?
15 0.85885781 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics
17 0.85501289 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
19 0.8455714 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox
20 0.84443402 2321 andrew gelman stats-2014-05-05-On deck this week