andrew_gelman_stats-2010-398 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model.”
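The quote is compact, so a minimal numerical sketch may help. Here “batching” is read as a group of exchangeable parameters sharing a common prior, as in a multilevel model; the normal-normal setup and all numbers below are assumptions chosen only to make the pooling visible, not anything from the quoted paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration of "batched" parameters: the J group means
# theta_1..theta_J form one batch and share the prior N(mu, tau^2).
J, sigma = 8, 10.0        # number of groups, known data s.d. (assumed)
mu, tau = 5.0, 4.0        # batch-level prior parameters (assumed)

theta = rng.normal(mu, tau, size=J)   # one draw of the whole batch
y = rng.normal(theta, sigma)          # one observation per group

# Because the batch shares a prior, each posterior mean partially pools
# y_j toward mu (standard conjugate normal-normal update):
shrink = (1 / tau**2) / (1 / tau**2 + 1 / sigma**2)
theta_post = shrink * mu + (1 - shrink) * y
print("raw estimates:   ", np.round(y, 1))
print("pooled estimates:", np.round(theta_post, 1))
```

Declaring theta_1..theta_J as one batch is what licenses the pooling; treat the same parameters as separate, unbatched unknowns and you have a different model, which is the point of the quote.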
sentIndex sentText sentNum sentScore
1 “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model. [sent-1, score-3.843]
wordName wordTfidf (topN-words)
[('batching', 0.444), ('likelihood', 0.391), ('summarized', 0.357), ('noting', 0.322), ('essential', 0.288), ('extra', 0.245), ('taken', 0.181), ('step', 0.179), ('parameters', 0.178), ('model', 0.175), ('typically', 0.17), ('usually', 0.168), ('distribution', 0.154), ('prior', 0.151), ('part', 0.12), ('take', 0.114), ('go', 0.1), ('statistical', 0.085)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 398 andrew gelman stats-2010-11-06-Quote of the day
Introduction: “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model.”
2 0.18123178 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you wa
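For readers without the BUGS volume at hand, here is a minimal sketch of the likelihood structure McLaughlin describes, with made-up data and hypothetical covariate names x1 and x2 (this is not the actual Seeds dataset):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(beta, x1, x2, n, r):
    """Binomial likelihood with logit link, as in the Seeds setup:
    logit(p[i]) = b0 + b1*x1[i] + b2*x2[i], r[i] ~ Binomial(n[i], p[i]).
    Constant log-binomial-coefficient terms are dropped."""
    b0, b1, b2 = beta
    p = 1 / (1 + np.exp(-(b0 + b1 * x1 + b2 * x2)))
    return -np.sum(r * np.log(p) + (n - r) * np.log1p(-p))

rng = np.random.default_rng(1)
x1 = rng.integers(0, 2, 21).astype(float)   # made-up binary covariates
x2 = rng.integers(0, 2, 21).astype(float)
n = rng.integers(10, 60, 21)                # plate sizes (made up)
r = rng.binomial(n, 1 / (1 + np.exp(-(-0.5 + 1.0 * x1 + 0.8 * x2))))

fit = minimize(neg_log_lik, x0=np.zeros(3), args=(x1, x2, n, r))
print(np.round(fit.x, 2))   # recovers (b0, b1, b2) approximately
```

Letting p[i] depend on n[i] would mean adding a term in n[i] to the linear predictor, so n would enter the likelihood as a predictor rather than the prior; that reading matches Gelman's "strictly speaking, n is data."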
3 0.16858289 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
Introduction: For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
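A toy computation shows both the boundary problem and the fix. Below, made-up group estimates y_j with known standard errors s_j yield a marginal likelihood for the group-level s.d. tau that peaks exactly at zero; multiplying in a weak gamma prior on tau moves the mode off the boundary. The specific gamma(shape 2, rate 0.5) is an assumption in the boundary-avoiding spirit of the post, not necessarily the prior Gelman would recommend here.

```python
import numpy as np
from scipy.stats import gamma

# Made-up group estimates y_j with known standard errors s_j, modeled as
# y_j ~ N(0, s_j^2 + tau^2) (group-level mean fixed at 0 for simplicity).
y = np.array([1.2, -0.8, 0.5, -1.1, 0.3])
s = np.array([2.0,  2.5, 1.8,  2.2, 2.1])

def log_marginal(tau):
    v = s**2 + tau**2
    return -0.5 * np.sum(np.log(v) + y**2 / v)

taus = np.linspace(0, 5, 1001)
ll = np.array([log_marginal(t) for t in taus])
print("max marginal likelihood tau:", taus[ll.argmax()])  # lands at 0.0

# Add a weak gamma(shape=2, rate=0.5) prior on tau (an assumed choice):
lp = ll + gamma.logpdf(taus, a=2, scale=2.0)
print("posterior mode of tau:      ", taus[lp.argmax()])  # off the boundary
```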
4 0.16237609 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
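The claim that the meaning of the scale parameter changes with the df can be made concrete: a Student-t with nu degrees of freedom and scale s has standard deviation s*sqrt(nu/(nu-2)) for nu > 2, so the same s implies different spread at different nu. A quick check with scipy, holding the scale fixed at 1:

```python
from scipy.stats import t

# Same scale, different degrees of freedom: the implied spread changes,
# which is why independent priors on (df, scale) can be hard to justify.
for df in [3, 5, 30]:
    dist = t(df=df, scale=1.0)
    print(df, round(dist.std(), 3),   # = sqrt(df / (df - 2))
          round(dist.ppf(0.95), 3))   # the 95th percentile shifts too
```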
5 0.15732108 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions
Introduction: David Hogg points me to this discussion: Martin Strasbourg and I [Hogg] discussed his project to detect new satellites of M31 in the PAndAS survey. He can construct a likelihood ratio (possibly even a marginalized likelihood ratio) at every position in the M31 imaging, between the best-fit satellite-plus-background model and the best nothing-plus-background model. He can make a two-dimensional map of these likelihood ratios and show the histogram of them. Looking at this histogram, which has a tail to very large ratios, he asked me, where should I put my cut? That is, at what likelihood ratio does a candidate deserve follow-up? Here’s my unsatisfying answer: To a statistician, the distribution of likelihood ratios is interesting and valuable to study. To an astronomer, it is uninteresting. You don’t want to know the distribution of likelihoods, you want to find satellites . . . I wrote that I think this makes sense and that it would actually be an interesting and useful rese
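One way to operationalize "you want to find satellites" is to choose the cut from a follow-up budget rather than from the shape of the histogram. The sketch below is a toy stand-in for Hogg's map, assuming a single amplitude parameter so that 2 log LR is roughly chi-squared with 1 df under background only (Wilks); all numbers are made up, and the budget-based rule is one illustrative decision criterion, not Gelman's or Hogg's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sky: a test statistic lam = 2*log(LR) at each of n_pix positions,
# comparing satellite-plus-background to background-only.
n_pix, n_sig = 100_000, 20
z = rng.normal(size=n_pix)             # background-only fluctuations
z[:n_sig] += rng.uniform(4, 8, n_sig)  # hypothetical injected satellites
lam = z**2                             # 2 log LR for one free amplitude

# Decision-oriented cut: set the threshold so the expected number of
# background false alarms matches a follow-up budget (here, 5 fields),
# calibrated on an independent background-only simulation.
budget = 5
thresh = np.quantile(rng.normal(size=n_pix) ** 2, 1 - budget / n_pix)
print("threshold on 2 log LR:", round(float(thresh), 2))
print("candidates above cut: ", int((lam > thresh).sum()))
```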
7 0.14929464 247 andrew gelman stats-2010-09-01-How does Bayes do it?
8 0.14239669 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
9 0.13567451 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis
10 0.13023084 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes
11 0.11688837 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
12 0.11346442 1941 andrew gelman stats-2013-07-16-Priors
13 0.11229108 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
14 0.11184394 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
15 0.11002759 846 andrew gelman stats-2011-08-09-Default priors update?
16 0.10895004 1518 andrew gelman stats-2012-10-02-Fighting a losing battle
17 0.10783512 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging
19 0.10682081 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood
20 0.10643288 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?
topicId topicWeight
[(0, 0.099), (1, 0.167), (2, 0.019), (3, 0.066), (4, -0.03), (5, -0.023), (6, 0.081), (7, 0.011), (8, -0.053), (9, 0.037), (10, 0.015), (11, 0.022), (12, -0.018), (13, 0.001), (14, -0.078), (15, -0.051), (16, 0.003), (17, -0.01), (18, 0.017), (19, -0.018), (20, 0.037), (21, -0.056), (22, 0.002), (23, -0.027), (24, -0.023), (25, 0.033), (26, -0.02), (27, 0.008), (28, 0.027), (29, 0.001), (30, -0.056), (31, -0.01), (32, -0.006), (33, 0.046), (34, 0.012), (35, 0.044), (36, -0.022), (37, -0.013), (38, -0.017), (39, -0.004), (40, 0.038), (41, 0.019), (42, -0.026), (43, 0.026), (44, 0.049), (45, -0.001), (46, 0.007), (47, -0.01), (48, 0.018), (49, 0.018)]
simIndex simValue blogId blogTitle
same-blog 1 0.96974671 398 andrew gelman stats-2010-11-06-Quote of the day
Introduction: “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model.”
2 0.80507606 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
3 0.80080992 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you wa
4 0.79440355 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly I have read that you suggested using Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate it if you would have the time to point me to this publication of yours. The reason is that I have used the prior distribution (Gamma(1.4, 0.4)) in a study which we now revise for publication, and where a reviewer questions the choice of the distribution (claiming that it is too informative!). I am well aware that, in recent publications (Prior distributions for variance parameters in hierarchical models. Bayesian Analysis; Data Analysis Using Regression and Multilevel/Hierarchical Models), you suggest modeling the precision as pow(standard deviatio
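Whatever the resolution for the reviewer, it is easy to simulate what the two priors imply on the standard-deviation scale sigma = tau^{-1/2}, which is usually the interpretable one. The sketch below assumes the usual BUGS (shape, rate) parameterization of the gamma:

```python
import numpy as np

rng = np.random.default_rng(3)

# Compare the implied prior on sigma = tau**-0.5 under both precision
# priors, tau ~ Gamma(shape, rate) in the BUGS parameterization.
for shape, rate in [(1.4, 0.4), (0.01, 0.01)]:
    tau = rng.gamma(shape, 1.0 / rate, size=100_000)  # numpy takes scale
    tau = np.clip(tau, 1e-300, None)  # guard: tiny shapes underflow to 0
    sigma = tau ** -0.5
    print((shape, rate), np.percentile(sigma, [2.5, 50, 97.5]))
```

In this simulation, Gamma(1.4, 0.4) keeps sigma within roughly an order of magnitude, while Gamma(0.01, 0.01) spreads sigma over dozens of orders of magnitude, which is one way to see why "too informative" is a strange complaint relative to the alternative.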
5 0.76952249 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?
Introduction: David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space — not the parameter space — because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in the space of their predictions, not in the space of their parameters (the parameters didn’t really “exist” at all for Sam). In that spirit, when we estimate the effectiveness of a MCMC method or tuning — by autocorrelation time or ESJD or anything else — shouldn’t we be looking at the changes in the model predictions over time, rather than the changes in the parameters over time? That is, the autocorrelation time should be the autocorrelation time in what the model (at the walker position) predicts for the data, and the ESJD should be the expected squared jump distance in what the model predicts for the data? This might resolve the concern I expressed a
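To make the proposal concrete: take the same integrated-autocorrelation-time estimator you would apply to a parameter chain and point it at the chain of model predictions instead. The sketch below uses a synthetic AR(1) chain in place of real MCMC output and a made-up one-parameter prediction function; only the target of the estimator matters here, not the model.

```python
import numpy as np

def iact(x, max_lag=200):
    """Crude integrated autocorrelation time for a 1-d chain: sum the
    empirical autocorrelations up to the first negative value."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]
    lags = acf[1:max_lag]
    neg = np.where(lags < 0)[0]
    cut = neg[0] if len(neg) else max_lag - 1
    return 1 + 2 * np.sum(lags[:cut])

rng = np.random.default_rng(4)
n, rho = 5_000, 0.95                     # AR(1) stand-in for an MCMC chain
theta = np.zeros(n)
for i in range(1, n):
    theta[i] = rho * theta[i - 1] + rng.normal(scale=np.sqrt(1 - rho**2))

x_new = 0.3                              # a fixed design point (made up)
y_pred = np.tanh(theta * x_new)          # hypothetical model prediction

print("IACT of the parameter: ", round(float(iact(theta)), 1))
print("IACT of the prediction:", round(float(iact(y_pred)), 1))
```

When the map from parameters to predictions is many-to-one (weakly identified parameters), the prediction-space autocorrelation time can be much smaller than the parameter-space one, which is presumably the situation Hogg has in mind.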
6 0.75899261 1459 andrew gelman stats-2012-08-15-How I think about mixture models
7 0.7516923 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
8 0.74329484 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
9 0.74307406 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”
10 0.73656213 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model
11 0.73414481 1363 andrew gelman stats-2012-06-03-Question about predictive checks
12 0.7295714 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes
13 0.72868466 1401 andrew gelman stats-2012-06-30-David Hogg on statistics
14 0.72573984 1941 andrew gelman stats-2013-07-16-Priors
15 0.72394192 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29
16 0.72370255 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?
17 0.72244763 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
18 0.72233671 1518 andrew gelman stats-2012-10-02-Fighting a losing battle
19 0.71992463 1465 andrew gelman stats-2012-08-21-D. Buggin
20 0.71981019 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
topicId topicWeight
[(16, 0.629), (24, 0.168)]
simIndex simValue blogId blogTitle
1 0.96618426 1745 andrew gelman stats-2013-03-02-Classification error
Introduction: 15-2040 != 19-3010 (and, for that matter, 25-1022 != 25-1063).
same-blog 2 0.96274179 398 andrew gelman stats-2010-11-06-Quote of the day
Introduction: “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model.”
3 0.95656645 1026 andrew gelman stats-2011-11-25-Bayes wikipedia update
Introduction: I checked and somebody went in and screwed up my fixes to the Wikipedia page on Bayesian inference. I give up.
4 0.92324948 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays
Introduction: My Take a Number feature appears in today’s Times. And here are the graphs that I wish they’d had space to include! Original story here.
5 0.92296606 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate
Introduction: Malecki asks: Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to. To connect to some of our recent themes, I agree this is a pretty horrible data display. But it’s not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren’t so bad. One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!), which is often the opposite of what we want in a clear graph. It’s Martin Amis vs. George Orwell all over again.
6 0.87818813 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data
7 0.8593632 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst
8 0.85470551 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?
9 0.84260821 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street
10 0.83617449 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?
11 0.81541342 1659 andrew gelman stats-2013-01-07-Some silly things you (didn’t) miss by not reading the sister blog
12 0.78802156 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram
13 0.77507079 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”
14 0.75656021 1487 andrew gelman stats-2012-09-08-Animated drought maps
15 0.73206145 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation
16 0.72945029 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician
17 0.71879107 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!
18 0.71735132 1025 andrew gelman stats-2011-11-24-Always check your evidence
19 0.70363683 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research
20 0.69973969 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects