andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-184 knowledge-graph by maker-knowledge-mining
Introduction: Xiaoyu Qian writes: I have a question when I apply the half-Cauchy prior (Gelman, 2006) for the variance parameter in a hierarchical model. The model I used is a three level IRT model equivalent to a Rasch model. The variance parameter I try to estimate is at the third level. The group size ranges from 15 to 44. The data is TIMSS 2007 data. I used the syntax provided by the paper and found that the convergence of the standard deviation term is good (sigma.theta), however, the convergence for the parameter “xi” is not very good. Does it mean the whole model has not converged? Do you have any suggestion for this situation. I also used the uniform prior and correlate the result with the half-Cauchy result for the standard deviation term. The results correlated .99. My reply: It’s not a problem if xi does not converge well. It’s |xi|*sigma that is relevant. And, if the number of groups is large, the prior probably won’t matter so much, which would explain your 99% correlat
sentIndex sentText sentNum sentScore
1 Xiaoyu Qian writes: I have a question when I apply the half-Cauchy prior (Gelman, 2006) for the variance parameter in a hierarchical model. [sent-1, score-0.81]
2 The model I used is a three level IRT model equivalent to a Rasch model. [sent-2, score-0.574]
3 The variance parameter I try to estimate is at the third level. [sent-3, score-0.614]
4 I used the syntax provided by the paper and found that the convergence of the standard deviation term is good (sigma. [sent-6, score-1.11]
5 theta), however, the convergence for the parameter “xi” is not very good. [sent-7, score-0.493]
6 I also used the uniform prior and correlate the result with the half-Cauchy result for the standard deviation term. [sent-10, score-1.198]
7 My reply: It’s not a problem if xi does not converge well. [sent-13, score-0.671]
8 And, if the number of groups is large, the prior probably won’t matter so much, which would explain your 99% correlation. [sent-15, score-0.531]
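The reply's point that only |xi|*sigma matters can be checked with a small simulation. The sketch below assumes the parameter-expanded construction usually associated with Gelman (2006): xi is normal with scale A, the squared scale sigma_eta^2 is inverse-gamma(1/2, 1/2), and the group-level standard deviation is sigma_theta = |xi| * sigma_eta. The function name, the variable names and the value A = 25 are illustrative choices, not taken from the post. It is not the correspondent's IRT model, only a check on the prior itself.

```python
import numpy as np

def simulate_expanded_scale(A, n_draws, seed=0):
    """Draw (xi, sigma_eta, sigma_theta) from the assumed parameter-expanded prior."""
    rng = np.random.default_rng(seed)
    # xi ~ Normal(0, A^2): the redundant multiplicative parameter
    xi = rng.normal(0.0, A, size=n_draws)
    # sigma_eta^2 ~ inverse-gamma(1/2, 1/2), i.e. the reciprocal of a chi-square(1) draw
    sigma_eta = 1.0 / np.sqrt(rng.chisquare(1, size=n_draws))
    # The quantity with a model interpretation: the group-level standard deviation
    sigma_theta = np.abs(xi) * sigma_eta
    return xi, sigma_eta, sigma_theta

A = 25.0  # illustrative prior scale (an assumption)
xi, sigma_eta, sigma_theta = simulate_expanded_scale(A, 200_000)

# A half-Cauchy with scale A has median A, so the sample median should be near A.
print("median of sigma_theta:", np.median(sigma_theta))

# Rescaling xi by c and sigma_eta by 1/c leaves sigma_theta unchanged, so the
# model only sees the product, not xi on its own.
c = 3.0
rescaled = np.abs(c * xi) * (sigma_eta / c)
print("product unchanged under rescaling:", np.allclose(rescaled, sigma_theta))
```

Because the same sigma_theta is produced by many (xi, sigma_eta) pairs, a sampler can wander along that ridge, so xi may mix poorly while sigma_theta mixes well. Under these assumptions, convergence diagnostics are best read off sigma_theta rather than xi.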
wordName wordTfidf (topN-words)
[('xi', 0.529), ('convergence', 0.25), ('parameter', 0.243), ('deviation', 0.223), ('prior', 0.191), ('qian', 0.187), ('irt', 0.187), ('rasch', 0.187), ('variance', 0.169), ('converged', 0.169), ('correlate', 0.158), ('syntax', 0.158), ('sigma', 0.147), ('ranges', 0.142), ('converge', 0.142), ('used', 0.135), ('result', 0.131), ('standard', 0.118), ('uniform', 0.111), ('model', 0.111), ('suggestion', 0.104), ('equivalent', 0.096), ('correlated', 0.096), ('provided', 0.095), ('third', 0.089), ('correlation', 0.089), ('apply', 0.083), ('gelman', 0.083), ('groups', 0.081), ('hierarchical', 0.08), ('term', 0.08), ('size', 0.076), ('explain', 0.075), ('matter', 0.07), ('whole', 0.068), ('won', 0.068), ('group', 0.068), ('however', 0.063), ('level', 0.061), ('three', 0.06), ('probably', 0.059), ('estimate', 0.058), ('large', 0.056), ('number', 0.055), ('try', 0.055), ('reply', 0.054), ('mean', 0.053), ('results', 0.052), ('found', 0.051), ('question', 0.044)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
2 0.19364731 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
3 0.17935622 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
Introduction: John Lawson writes: I have been experimenting using Bayesian Methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method of moments or REML estimates. In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML. For data sets I have used that arise from a simple one-way random effects model, the Bayesian estimates of the between groups variance component is usually larger than the method of moments or REML estimates. When I use a uniform prior on the between standard deviation (as you recommended in your 2006 paper ) rather than an inverse gamma prior on the between variance component, the between variance component is usually reduced. However, for the dyestuff data in Davies(1949, p74), the opposite appears to be the case. I am a worried that the Bayesian estimators of the varian
4 0.17655078 1809 andrew gelman stats-2013-04-17-NUTS discussed on Xi’an’s Og
Introduction: Xi’an’s Og (aka Christian Robert’s blog) is featuring a very nice presentation of NUTS by Marco Banterle, with discussion and some suggestions. I’m not even sure how they found Michael Betancourt’s paper on geometric NUTS — I don’t see it on the arXiv yet, or I’d provide a link.
5 0.17190295 1941 andrew gelman stats-2013-07-16-Priors
Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameter of an infinite number of model we have never seen before and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury y of transport etc etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some model have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also
6 0.16848257 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
7 0.15573089 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox
9 0.14854495 1792 andrew gelman stats-2013-04-07-X on JLP
10 0.14019755 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)
11 0.13925573 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
12 0.13904408 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
13 0.13685337 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
14 0.13271621 1465 andrew gelman stats-2012-08-21-D. Buggin
15 0.1301945 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
16 0.13008408 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
17 0.1294806 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
18 0.12882829 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?
19 0.12729925 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
20 0.12403421 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29
topicId topicWeight
[(0, 0.134), (1, 0.174), (2, 0.085), (3, -0.009), (4, 0.016), (5, -0.026), (6, 0.109), (7, -0.015), (8, -0.111), (9, 0.086), (10, 0.037), (11, 0.008), (12, 0.051), (13, 0.029), (14, -0.006), (15, -0.017), (16, -0.036), (17, -0.004), (18, 0.035), (19, -0.009), (20, -0.011), (21, -0.044), (22, 0.036), (23, -0.017), (24, 0.005), (25, -0.007), (26, -0.036), (27, 0.02), (28, -0.001), (29, -0.02), (30, -0.021), (31, -0.014), (32, -0.003), (33, -0.017), (34, -0.016), (35, 0.01), (36, 0.011), (37, -0.023), (38, 0.03), (39, 0.018), (40, -0.033), (41, -0.009), (42, -0.035), (43, 0.04), (44, -0.049), (45, -0.006), (46, 0.024), (47, -0.063), (48, 0.004), (49, 0.005)]
simIndex simValue blogId blogTitle
same-blog 1 0.96662927 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
2 0.88307548 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
3 0.86390418 846 andrew gelman stats-2011-08-09-Default priors update?
Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a
4 0.8300131 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
5 0.82988799 1465 andrew gelman stats-2012-08-21-D. Buggin
Introduction: Joe Zhao writes: I am trying to fit my data using the scaled inverse wishart model you mentioned in your book, Data analysis using regression and hierarchical models. Instead of using a uniform prior on the scale parameters, I try to use a log-normal distribution prior. However, I found that the individual coefficients don’t shrink much to a certain value even a highly informative prior (with extremely low variance) is considered. The coefficients are just very close to their least-squares estimations. Is it because of the log-normal prior I’m using or I’m wrong somewhere? My reply: If your priors are concentrated enough at zero variance, then yeah, the posterior estimates of the parameters should be pulled (almost) all the way to zero. If this isn’t happening, you got a problem. So as a start I’d try putting in some really strong priors concentrated at 0 (for example, N(0,.1^2)) and checking that you get a sensible answer. If not, you might well have a bug. You can also try
6 0.81086749 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
7 0.80524302 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
9 0.78187698 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
10 0.77812344 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)
11 0.76669729 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
12 0.76502025 1941 andrew gelman stats-2013-07-16-Priors
14 0.74657011 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models
15 0.74583268 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
16 0.73590785 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters
17 0.72608989 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
18 0.71980089 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
19 0.71653414 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions
20 0.71559519 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
topicId topicWeight
[(1, 0.022), (16, 0.021), (24, 0.171), (63, 0.024), (84, 0.27), (86, 0.046), (99, 0.315)]
simIndex simValue blogId blogTitle
1 0.97507811 667 andrew gelman stats-2011-04-19-Free $5 gift certificate!
Introduction: I bought something online and got a gift certificate for $5 to use at BustedTees.com. The gift code is TP07zh4q5dc and it expires on 30 Apr. I don’t need a T-shirt so I’ll pass this on to you. I assume it only works once. So the first person who follows up on this gets the discount. Enjoy!
2 0.96843731 1181 andrew gelman stats-2012-02-23-Philosophy: Pointer to Salmon
Introduction: Larry Brownstein writes: I read your article on induction and deduction and your comments on Deborah Mayo’s approach and thought you might find the following useful in this discussion. It is Wesley Salmon’s Reality and Rationality (2005). Here he argues that Bayesian inferential procedures can replace the hypothetical-deductive method aka the Hempel-Oppenheim theory of explanation. He is concerned about the subjectivity problem, so takes a frequentist approach to the use of Bayes in this context. Hardly anyone agrees that the H-D approach accounts for scientific explanation. The problem has been to find a replacement. Salmon thought he had found it. I don’t know this book—but that’s no surprise since I know just about none of the philosophy of science literature that came after Popper, Kuhn, and Lakatos. That’s why I collaborated with Cosma Shalizi. He’s the one who connected me to Deborah Mayo and who put in the recent philosophy references in our articles. Anyway, I’m pa
3 0.96188772 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five
Introduction: Many years ago, a research psychologist whose judgment I greatly respect told me that the characterization of personality by the so-called Big Five traits (extraversion, etc.) was old-fashioned. So I’m always surprised to see that the Big Five keeps cropping up. I guess not everyone agrees that it’s a bad idea. For example, Hamdan Azhar wrote to me: I was wondering if you’d seen this recent paper (De Young et al. 2010) that finds significant correlations between brain volume in selected regions and personality trait measures (from the Big Five). This is quite a ground-breaking finding and it was covered extensively in the mainstream media. I think readers of your blog would be interested in your thoughts, statistically speaking, on their methodology and findings. My reply: I’d be interested in my thoughts on this too! But I don’t know enough to say anything useful. From the abstract of the paper under discussion: Controlling for age, sex, and whole-brain volume
4 0.95653558 323 andrew gelman stats-2010-10-06-Sociotropic Voting and the Media
Introduction: Stephen Ansolabehere, Marc Meredith, and Erik Snowberg write : The literature on economic voting notes that voters’ subjective evaluations of the overall state of the economy are correlated with vote choice, whereas personal economic experiences are not. Missing from this literature is a description of how voters acquire information about the general state of the economy, and how that information is used to form perceptions. In order to begin understanding this process, we [Ansolabehere, Meredith, and Snowberg] asked a series of questions on the 2006 ANES Pilot about respondents’ perceptions of the average price of gas and the unemployment rate in their home state. We find that questions about gas prices and unemployment show differences in the sources of information about these two economic variables. Information about unemployment rates come from media sources, and are systematically biased by partisan factors. Information about gas prices, in contrast, comes only from everyday
5 0.94946855 235 andrew gelman stats-2010-08-25-Term Limits for the Supreme Court?
Introduction: In the wake of the confirmation of Elena Kagan to the Supreme Court, political commentators have been expressing a bit of frustration about polarization within the court and polarization in the nomination process. One proposal that’s been floating around is to replace lifetime appointments by fixed terms, perhaps twelve or eighteen years. This would enforce a regular schedule of replacements, instead of the current system in which eighty-something judges have an incentive to hang on as long as possible so as to time their retirements to be during the administration of a politically-compatible president. A couple weeks ago at the sister blog, John Sides discussed some recent research that was relevant to the judicial term limits proposal. Political scientists Justin Crowe and Chris Karpowitz analyzed the historical record or Supreme Court terms and found that long terms of twenty years or more have been happening since the early years of the court. Yes, there is less turnover th
6 0.93009335 1152 andrew gelman stats-2012-02-03-Web equation
8 0.9264217 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings
9 0.92355818 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance
same-blog 10 0.91876805 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
11 0.90535414 1352 andrew gelman stats-2012-05-29-Question 19 of my final exam for Design and Analysis of Sample Surveys
12 0.90326178 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics
13 0.90039897 1353 andrew gelman stats-2012-05-30-Question 20 of my final exam for Design and Analysis of Sample Surveys
14 0.89729548 2053 andrew gelman stats-2013-10-06-Ideas that spread fast and slow
15 0.87880659 2004 andrew gelman stats-2013-09-01-Post-publication peer review: How it (sometimes) really works
16 0.87837446 1165 andrew gelman stats-2012-02-13-Philosophy of Bayesian statistics: my reactions to Wasserman
17 0.87709928 98 andrew gelman stats-2010-06-19-Further thoughts on happiness and life satisfaction research
18 0.86392736 1732 andrew gelman stats-2013-02-22-Evaluating the impacts of welfare reform?
19 0.86038268 186 andrew gelman stats-2010-08-04-“To find out what happens when you change something, it is necessary to change it.”
20 0.85983342 2302 andrew gelman stats-2014-04-23-A short questionnaire regarding the subjective assessment of evidence