andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-669 knowledge-graph by maker-knowledge-mining

669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)


meta info for this blog

Source: html

Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly I have read that you have suggested using Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate it if you had the time to point me to this publication of yours. The reason is that I have used the prior distribution (Gamma(1.4, 0.4)) in a study which we are now revising for publication, and where a reviewer questions the choice of the distribution (claiming that it is too informative!). I am well aware that you in recent publications (Prior distributions for variance parameters in hierarchical models. Bayesian Analysis; Data Analysis Using Regression and Multilevel/Hierarchical Models) suggest modeling the precision as pow(standard deviation, -2) and using either a Uniform or a Half-Cauchy distribution. …
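
The two parameterizations at issue here are easy to compare by simulation. Below is a minimal sketch (mine, not from the post), assuming the BUGS-style Gamma(shape, rate) convention the student is using and an arbitrarily chosen half-Cauchy scale of 5:

```python
# Minimal sketch (mine, not from the post): two ways to put a prior on
# the precision tau of a normal distribution. BUGS-style Gamma(shape, rate)
# is assumed, so the scipy scale parameter is 1/rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000

# Option 1: Gamma(1.4, 0.4) directly on the precision tau.
tau = stats.gamma(a=1.4, scale=1 / 0.4).rvs(n, random_state=rng)
sd_from_gamma = tau ** -0.5               # implied prior on the sd

# Option 2 (the later recommendation): prior on the sd itself, with the
# precision defined as pow(sd, -2); half-Cauchy scale 5 is an arbitrary choice.
sd_halfcauchy = stats.halfcauchy(scale=5).rvs(n, random_state=rng)

for name, s in [("Gamma(1.4, 0.4) on tau", sd_from_gamma),
                ("half-Cauchy(5) on sd  ", sd_halfcauchy)]:
    q = np.percentile(s, [2.5, 50, 97.5])
    print(name, "implied sd quantiles:", q.round(2))
```

Each choice implies a prior on the standard deviation; printing its quantiles makes visible how differently the two options spread their mass.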


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. [sent-1, score-1.633]

2 If I recall correctly I have read that you have suggested using Gamma(1.4, 0.4) [sent-2, score-0.296]

3 instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. [sent-6, score-1.144]

4 I would very much appreciate it if you had the time to point me to this publication of yours. [sent-7, score-0.353]

5 The reason is that I have used the prior distribution (Gamma(1.4, 0.4)) [sent-8, score-0.486]

6 in a study which we are now revising for publication, and where a reviewer questions the choice of the distribution (claiming that it is too informative!). [sent-10, score-0.657]

7 I am well aware that you in recent publications (Prior distributions for variance parameters in hierarchical models. [sent-12, score-0.597]

8 Bayesian Analysis; Data Analysis using regression and multilevel/hierarchical models) suggest modeling the precision as pow(standard deviation, -2) and using either a Uniform or a Half-Cauchy distribution. [sent-13, score-0.528]

9 However, since our model was fitted before I saw these publications, I would very much like to find your earlier recommendation (which works fine!). [sent-14, score-0.646]

10 My reply: I’ve never heard of a Gamma (1.4, 0.4). [sent-16, score-0.067]

11 But I can believe that it might work well; it would depend on the application. [sent-20, score-0.162]

12 Perhaps this was created by matching some moments or quantiles? [sent-25, score-0.294]
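
One way to make sense of the reply's "matching some moments or quantiles" guess is to compute what Gamma(1.4, 0.4) on the precision actually encodes. A quick sketch (mine; again assuming the BUGS-style shape/rate convention):

```python
# Sketch (mine): what does Gamma(1.4, 0.4) on the precision encode?
# Shape/rate convention assumed, as in BUGS.
from scipy import stats

shape, rate = 1.4, 0.4
tau = stats.gamma(a=shape, scale=1 / rate)

print("prior mean of tau:", shape / rate)           # 3.5
print("prior sd of tau  :", shape ** 0.5 / rate)    # about 2.96
lo, hi = tau.ppf([0.025, 0.975])
print("95% interval for tau:", (round(lo, 3), round(hi, 3)))
# tau = sd**-2, so the implied 95% interval for the sd is:
print("95% interval for sd :", (round(hi ** -0.5, 3), round(lo ** -0.5, 3)))

# For contrast, Gamma(0.01, 0.01) puts most of its mass near tau = 0
# (i.e., huge sd), as its median shows:
print("Gamma(0.01, 0.01) median tau:", stats.gamma(a=0.01, scale=100).ppf(0.5))
```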


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('gamma', 0.588), ('precision', 0.287), ('distribution', 0.272), ('prior', 0.214), ('publications', 0.183), ('recommendation', 0.173), ('hyper', 0.157), ('normal', 0.146), ('publication', 0.139), ('hyperparameter', 0.133), ('quantiles', 0.13), ('revise', 0.127), ('earlier', 0.12), ('reviewer', 0.116), ('moments', 0.11), ('matching', 0.099), ('deviation', 0.094), ('uniform', 0.093), ('depend', 0.093), ('claiming', 0.09), ('fitted', 0.088), ('correctly', 0.086), ('created', 0.085), ('reference', 0.079), ('suggested', 0.079), ('aware', 0.076), ('appreciate', 0.076), ('question', 0.074), ('informative', 0.072), ('variance', 0.071), ('saw', 0.071), ('election', 0.071), ('recall', 0.07), ('distributions', 0.07), ('would', 0.069), ('parameter', 0.068), ('choice', 0.068), ('hierarchical', 0.068), ('analysis', 0.067), ('heard', 0.067), ('suggest', 0.066), ('well', 0.066), ('student', 0.064), ('works', 0.063), ('parameters', 0.063), ('model', 0.062), ('use', 0.061), ('fine', 0.054), ('however', 0.053), ('either', 0.052)]
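
For context on the tables in this section: word lists like the one above and the simValue scores below are the standard outputs of a tfidf-plus-cosine-similarity pipeline. Here is an illustrative sketch (mine, with made-up stand-in documents, not the actual maker-knowledge-mining code):

```python
# Illustrative sketch (mine, not the maker-knowledge-mining code):
# tfidf vectors plus cosine similarity, the usual recipe behind
# wordName/wordTfidf tables and simValue scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "gamma prior for the precision of a normal distribution",  # stand-in for 669
    "avoiding boundary estimates using a prior distribution",  # stand-in for 779
    "teaching velocity and acceleration",                      # unrelated post
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Top-weighted words for the first document (cf. the topN-words table):
weights = X[0].toarray().ravel()
terms = vec.get_feature_names_out()
print(sorted(zip(terms, weights.round(3)), key=lambda t: -t[1])[:5])

# Pairwise similarities (cf. simValue; self-similarity is 1 up to rounding):
print(cosine_similarity(X).round(3))
```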

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)


2 0.31046146 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
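
The boundary problem described here is easy to reproduce with a grid computation. A self-contained sketch (mine, with made-up data; the gamma prior on tau echoes the boundary-avoiding idea rather than this post's exact recipe):

```python
# Sketch (mine, made-up data): the marginal likelihood for a group-level
# sd tau can peak at tau = 0, and a weak gamma prior pulls the mode off
# the boundary, in the spirit of the regularization described above.
import numpy as np
from scipy import stats

y  = np.array([1.2, -0.8, 0.5, -0.3, 0.9, -1.1, 0.2, -0.5])  # group estimates
se = np.full_like(y, 1.0)                                    # standard errors

taus = np.linspace(1e-6, 5, 2000)

def log_marglik(tau):
    v = se ** 2 + tau ** 2
    mu_hat = np.sum(y / v) / np.sum(1 / v)     # profile out the mean mu
    return np.sum(stats.norm.logpdf(y, mu_hat, np.sqrt(v)))

ll = np.array([log_marglik(t) for t in taus])
print("max marginal likelihood tau:", taus[ll.argmax()])   # lands at the boundary

log_post = ll + stats.gamma(a=2, scale=2).logpdf(taus)     # weak gamma prior
print("posterior mode of tau      :", taus[log_post.argmax()])  # pulled off zero
```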

3 0.21603888 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

Introduction: Jouni Kerman did a cool bit of research justifying the Beta (1/3, 1/3) prior as noninformative for binomial data, and the Gamma (1/3, 0) prior for Poisson data. You probably thought that nothing new could be said about noninformative priors in such basic problems, but you were wrong! Here’s the story: The conjugate binomial and Poisson models are commonly used for estimating proportions or rates. However, it is not well known that the conventional noninformative conjugate priors tend to shrink the posterior quantiles toward the boundary or toward the middle of the parameter space, making them thus appear excessively informative. The shrinkage is always largest when the number of observed events is small. This behavior persists for all sample sizes and exposures. The effect of the prior is therefore most conspicuous and potentially controversial when analyzing rare events. As alternative default conjugate priors, I [Jouni] introduce Beta(1/3, 1/3) and Gamma(1/3, 0), which I cal
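
The quantile-shrinkage claim can be checked in a few lines. A quick sketch (mine), comparing posterior medians for binomial data under three conjugate priors:

```python
# Quick check (mine) of the quantile-shrinkage claim: posterior medians
# for y successes in n = 10 trials under three conjugate Beta(a, a) priors.
from scipy import stats

n = 10
for y in [0, 1, 2, 5]:
    meds = [round(stats.beta(y + a, n - y + a).median(), 3)
            for a in (1.0, 0.5, 1 / 3)]
    print(f"y={y}: uniform={meds[0]}  Jeffreys={meds[1]}  "
          f"Beta(1/3,1/3)={meds[2]}  y/n={y / n}")
```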

4 0.21574637 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

Introduction: John Lawson writes: I have been experimenting with Bayesian methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method of moments or REML estimates. In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML. For data sets I have used that arise from a simple one-way random effects model, the Bayesian estimates of the between-groups variance component are usually larger than the method of moments or REML estimates. When I use a uniform prior on the between standard deviation (as you recommended in your 2006 paper) rather than an inverse gamma prior on the between variance component, the between variance component is usually reduced. However, for the dyestuff data in Davies (1949, p74), the opposite appears to be the case. I am worried that the Bayesian estimators of the varian
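
The contrast Lawson describes comes largely from what the two priors look like on the standard-deviation scale. A small sketch (mine) that transforms an inverse-gamma(eps, eps) prior on the variance to the sd scale:

```python
# Sketch (mine) comparing the two priors discussed above on the sd scale:
# an inverse-gamma(eps, eps) prior on the *variance*, transformed to the
# sd by the Jacobian 2*sd, versus a flat prior on the sd itself.
import numpy as np
from scipy import stats

eps = 0.001
sd = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0])
p_sd = stats.invgamma(a=eps, scale=eps).pdf(sd ** 2) * 2 * sd
print(dict(zip(sd, p_sd.round(5))))
# The implied density is largest near sd = sqrt(eps) and falls away,
# quite unlike a uniform prior on sd; that difference is one source of
# the sensitivity in the variance-component estimates described above.
```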

5 0.2116106 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
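
The dependence Simpson discusses can be seen directly in prior draws. A minimal sketch (mine), using a plain inverse-Wishart for a 2x2 covariance matrix:

```python
# Sketch (mine): draws from an inverse-Wishart prior for a 2x2 covariance
# matrix, checking whether the implied scale and correlation are independent.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
draws = stats.invwishart(df=4, scale=np.eye(2)).rvs(size=5000, random_state=rng)

log_sd = 0.5 * np.log(draws[:, 0, 0])
corr = draws[:, 0, 1] / np.sqrt(draws[:, 0, 0] * draws[:, 1, 1])
# A clearly nonzero value here means the prior ties scales to correlations:
print("corr(log sd, |correlation|):",
      np.corrcoef(log_sd, np.abs(corr))[0, 1].round(3))
```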

6 0.20792529 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

7 0.2001971 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

8 0.19690995 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

9 0.19218269 1941 andrew gelman stats-2013-07-16-Priors

10 0.18169565 846 andrew gelman stats-2011-08-09-Default priors update?

11 0.16580677 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

12 0.16004263 1465 andrew gelman stats-2012-08-21-D. Buggin

13 0.15559675 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

14 0.14463937 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

15 0.14249313 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

16 0.14156926 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

17 0.14093639 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

18 0.14039697 1897 andrew gelman stats-2013-06-13-When’s that next gamma-ray blast gonna come, already?

19 0.13685337 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

20 0.1348286 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.171), (1, 0.207), (2, 0.035), (3, 0.052), (4, -0.026), (5, -0.048), (6, 0.163), (7, -0.011), (8, -0.158), (9, 0.072), (10, 0.06), (11, 0.001), (12, 0.05), (13, 0.006), (14, -0.012), (15, -0.017), (16, -0.018), (17, 0.0), (18, 0.041), (19, -0.019), (20, 0.007), (21, 0.003), (22, 0.016), (23, 0.018), (24, 0.006), (25, 0.019), (26, 0.001), (27, -0.012), (28, -0.007), (29, 0.026), (30, -0.031), (31, -0.009), (32, -0.003), (33, 0.044), (34, -0.004), (35, 0.007), (36, -0.004), (37, 0.03), (38, -0.025), (39, 0.021), (40, 0.008), (41, 0.003), (42, -0.023), (43, -0.0), (44, 0.027), (45, -0.001), (46, 0.032), (47, 0.004), (48, 0.045), (49, -0.03)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97305495 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)


2 0.93407947 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence


3 0.89371997 1465 andrew gelman stats-2012-08-21-D. Buggin

Introduction: Joe Zhao writes: I am trying to fit my data using the scaled inverse-Wishart model you mentioned in your book, Data Analysis Using Regression and Multilevel/Hierarchical Models. Instead of using a uniform prior on the scale parameters, I tried using a log-normal prior. However, I found that the individual coefficients don’t shrink much toward a common value even when a highly informative prior (with extremely low variance) is used. The coefficients stay very close to their least-squares estimates. Is it because of the log-normal prior I’m using, or am I wrong somewhere? My reply: If your priors are concentrated enough at zero variance, then yeah, the posterior estimates of the parameters should be pulled (almost) all the way to zero. If this isn’t happening, you’ve got a problem. So as a start I’d try putting in some really strong priors concentrated at 0 (for example, N(0,.1^2)) and checking that you get a sensible answer. If not, you might well have a bug. You can also try
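
The debugging advice in the reply has a closed-form analogue in the normal linear model, where a N(0, 0.1^2) prior on the coefficients gives a ridge-type posterior mean. A sketch (mine, with simulated data and known error sd for simplicity):

```python
# Sketch (mine, simulated data, known error sd): with a N(0, 0.1^2) prior
# on the coefficients, the posterior mean is a ridge estimate and should
# sit much closer to zero than least squares. If a fitted model's
# coefficients stay at least squares under such a prior, suspect a bug.
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 50, 3, 1.0
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0.7])
y = X @ beta_true + rng.normal(scale=sigma, size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

prior_sd = 0.1                               # strong prior concentrated at 0
V_inv = X.T @ X / sigma ** 2 + np.eye(p) / prior_sd ** 2
beta_post = np.linalg.solve(V_inv, X.T @ y / sigma ** 2)

print("least squares :", beta_ols.round(3))
print("posterior mean:", beta_post.round(3))  # strongly shrunk toward zero
```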

4 0.88696587 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization


5 0.87974757 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

Introduction: David Kessler, Peter Hoff, and David Dunson write: Marginally specified priors for nonparametric Bayesian estimation. Prior specification for nonparametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. Realistically, a statistician is unlikely to have informed opinions about all aspects of such a parameter, but may have real information about functionals of the parameter, such as the population mean or variance. This article proposes a new framework for nonparametric Bayes inference in which the prior distribution for a possibly infinite-dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a nonparametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard nonparametric prior distributions in common use, and inherit the large support of the standard priors upon which they are based. Ad

6 0.87707293 846 andrew gelman stats-2011-08-09-Default priors update?

7 0.87115598 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

8 0.8634423 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

9 0.86332667 1941 andrew gelman stats-2013-07-16-Priors

10 0.85845274 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

11 0.85353529 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

12 0.84866035 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

13 0.84828532 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

14 0.84399927 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

15 0.83708102 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

16 0.83706856 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

17 0.82432705 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

18 0.81317377 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

19 0.81268746 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

20 0.80941004 184 andrew gelman stats-2010-08-04-That half-Cauchy prior


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(2, 0.02), (15, 0.054), (16, 0.043), (24, 0.292), (40, 0.022), (59, 0.019), (66, 0.048), (85, 0.014), (86, 0.068), (95, 0.016), (99, 0.285)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97814757 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization


2 0.97724605 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q
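
For reference, a minimal simulation (mine) of the two-parameter logistic item-response model the question describes, in one common parameterization; note the response curve is monotone in ability when the discrimination is positive, which bears on statement (a):

```python
# Sketch (mine) of a two-parameter logistic item-response model, in the
# common parameterization P(correct) = inv_logit(a_j * (theta_i - b_j)).
import numpy as np

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(3)
n_students, n_items = 15, 28
theta = rng.normal(size=n_students)            # abilities
b = rng.normal(size=n_items)                   # difficulties
a = rng.lognormal(sigma=0.3, size=n_items)     # discriminations, positive

# With a_j > 0 the response curve is monotone in ability, so a question
# missed only by middle-ability students does not fit this model well.
p = inv_logit(a * (theta[:, None] - b[None, :]))   # 15 x 28 probabilities
responses = rng.binomial(1, p)                     # simulated exam data
print(responses.shape, "overall proportion correct:", responses.mean().round(2))
```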

3 0.97665596 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

Introduction: Type S error: when your estimate is the wrong sign, compared to the true value of the parameter. Type M error: when the magnitude of your estimate is far off, compared to the true value of the parameter. More here.
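
Both definitions are easy to turn into a simulation. A sketch (mine): take a small true effect estimated with noise, keep the statistically significant results, and measure how often the sign is wrong and how inflated the magnitude is:

```python
# Sketch (mine): simulate a small true effect measured with noise and
# compute the type S error rate and magnitude exaggeration among
# statistically significant results.
import numpy as np

rng = np.random.default_rng(4)
theta, se = 0.1, 1.0                       # small true effect, noisy estimate
est = rng.normal(theta, se, size=1_000_000)
signif = np.abs(est) > 1.96 * se           # conventional significance filter

type_s = np.mean(est[signif] * theta < 0)                 # wrong sign
type_m = np.mean(np.abs(est[signif])) / abs(theta)        # exaggeration ratio

print(f"type S error rate among significant results: {type_s:.2f}")
print(f"typical exaggeration (type M) factor: {type_m:.0f}x")
```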

4 0.97622031 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine

Introduction: Interesting discussion from David Gorski (which I found via this link from Joseph Delaney). I don’t have anything really to add to this discussion except to note the value of this sort of anecdote in a statistics discussion. It’s only n=1 and adds almost nothing to the literature on the effectiveness of various treatments, but a story like this can help focus one’s thoughts on the decision problems.

5 0.97525716 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

Introduction: 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estimating individual outcomes such as death. (b) In their report, Burnham et al. did not identify their primary sampling units. (c) The second-stage sampling was not a probability sample. (d) Survey materials supplied by the authors are incomplete and inconsistent with published descriptions of the survey. Solution to question 26 From yesterday: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination

6 0.97415864 846 andrew gelman stats-2011-08-09-Default priors update?

7 0.97408485 197 andrew gelman stats-2010-08-10-The last great essayist?

8 0.97354507 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

9 0.97347569 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

10 0.97297841 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

11 0.97291082 1240 andrew gelman stats-2012-04-02-Blogads update

12 0.97116661 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

13 0.97064686 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

14 0.97051215 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

15 0.96896714 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall

16 0.96864986 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

17 0.96864581 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

18 0.96725214 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

19 0.96478379 1224 andrew gelman stats-2012-03-21-Teaching velocity and acceleration

20 0.96408308 2099 andrew gelman stats-2013-11-13-“What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the science?”