andrew_gelman_stats-2010-442: bayesglm in Stata? (knowledge-graph by maker-knowledge-mining; source: html)
Introduction: Is there an implementation of bayesglm in Stata? (That is, approximate maximum penalized likelihood estimation with specified normal or t prior distributions on the coefficients.)
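For reference, a Stata port would need to reproduce the behavior of the existing R implementation. The sketch below shows that R behavior using bayesglm() and display() from the arm package; the simulated data and coefficient values are made up for illustration.

```r
# Minimal sketch of what bayesglm() does in R (arm package): penalized
# maximum likelihood with independent t priors on the coefficients.
# The data here are simulated purely for illustration.
library(arm)

set.seed(1)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.2 * x))  # hypothetical logistic data

# prior.df = 1 gives Cauchy priors; prior.df = Inf gives normal priors.
fit <- bayesglm(y ~ x, family = binomial(link = "logit"),
                prior.mean = 0, prior.scale = 2.5, prior.df = 1)
display(fit)
```

The fit comes from an approximate EM-type augmentation of the usual iteratively weighted least squares, so the call runs about as fast as glm() while shrinking the coefficients toward the prior mean.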
simIndex simValue blogId blogTitle
same-blog 1 1.0 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
2 0.22726306 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood
Introduction: Maximum likelihood gives the best fit to the training data but in general overfits, yielding overly noisy parameter estimates that don’t perform so well when predicting new data. A popular solution to this overfitting problem takes advantage of the iterative nature of most maximum likelihood algorithms by stopping early. In general, an iterative optimization algorithm goes from a starting point to the maximum of some objective function. If the starting point has some good properties, then early stopping can work well, keeping some of the virtues of the starting point while respecting the data. This trick can be performed the other way, too, starting with the data and then processing it to move it toward a model. That’s how the iterative proportional fitting algorithm of Deming and Stephan (1940) works to fit multivariate categorical data to known margins. In any case, the trick is to stop at the right point–not so soon that you’re ignoring the data but not so late that you en
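A toy sketch of the early-stopping idea in the excerpt above: gradient ascent on a logistic log-likelihood, started at zero and halted well before convergence, so the estimates stay partly shrunk toward the starting point. The data, step size, and iteration count are arbitrary choices of mine, not from the post.

```r
# Early stopping as implicit penalization: stop gradient ascent on the
# logistic log-likelihood after a fixed, small number of iterations.
set.seed(2)
n <- 50; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X[, 1]))          # only one true signal

beta <- rep(0, p)                          # shrinkage-friendly start
step <- 0.1
for (iter in 1:25) {                       # early stop at 25 iterations
  mu <- plogis(X %*% beta)
  beta <- beta + step * t(X) %*% (y - mu)  # log-likelihood gradient
}
# Stopping before convergence keeps beta shrunk toward 0, much as an
# explicit penalty on the coefficients would.
```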
3 0.20299828 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
Introduction: For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
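The boundary behavior described above is easy to reproduce. Here is a small simulation of my own (not from the post) in which the true group-level standard deviation is small relative to the residual noise, so lmer's maximum marginal likelihood estimate often lands exactly at zero:

```r
# Simulated multilevel data where lmer's group-level variance estimate
# frequently hits the boundary at 0. All numbers are illustrative.
library(lme4)

set.seed(3)
J <- 8; n_per <- 5
g <- rep(1:J, each = n_per)
alpha <- rnorm(J, 0, 0.1)                  # small true group sd
y <- alpha[g] + rnorm(J * n_per, 0, 1)
d <- data.frame(y, g = factor(g))

fit <- lmer(y ~ 1 + (1 | g), data = d)
VarCorr(fit)   # group sd often estimated as exactly 0 here
```

For the prior-as-regularization alternative the post argues for, the blme package provides blmer() and bglmer(), which add priors on the covariance parameters to keep the estimates off the boundary.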
4 0.16424081 76 andrew gelman stats-2010-06-09-Both R and Stata
Introduction: A student I’m working with writes: I was planning on getting an applied stat text as a desk reference, and for that I’m assuming you’d recommend your own book. Also, being an economics student, I was initially planning on doing my analysis in Stata, but I noticed on your blog that you use R, and apparently so does the rest of the statistics profession. Would you rather I do my programming in R this summer, or does it not matter? It doesn’t look too hard to learn, so just let me know what’s most convenient for you. My reply: Yes, I recommend my book with Jennifer Hill. Also the book by John Fox, An R and S-plus Companion to Applied Regression, is a good way to get into R. I recommend you use both Stata and R. If you’re already familiar with Stata, then stick with it–it’s a great system for working with big datasets. You can grab your data in Stata, do some basic manipulations, then save a smaller dataset to read into R (using R’s read.dta() function). Once you want to make fu
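A minimal sketch of the R side of that Stata-to-R handoff; the file name is hypothetical, and read.dta() is in the foreign package:

```r
# Read a dataset previously saved from Stata, as described above.
library(foreign)

d <- read.dta("smaller_dataset.dta")   # file written by Stata's save command
str(d)                                 # check that types and labels survived
```

(For files from newer Stata versions, haven::read_dta() is an alternative, but read.dta() is the function named in the post.)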
5 0.16026713 869 andrew gelman stats-2011-08-24-Mister P in Stata
Introduction: Maurizio Pisati sends along this presentation of work with Valeria Glorioso. He writes: “Our major problem, now, is uncertainty estimation — we’re still struggling to find a solution appropriate to the Stata environment.”
6 0.15903521 247 andrew gelman stats-2010-09-01-How does Bayes do it?
7 0.15650707 80 andrew gelman stats-2010-06-11-Free online course in multilevel modeling
8 0.13028035 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models
9 0.1295585 1560 andrew gelman stats-2012-11-03-Statistical methods that work in some settings but not others
10 0.11962494 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis
11 0.11438645 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
12 0.1138701 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data
13 0.11184394 398 andrew gelman stats-2010-11-06-Quote of the day
14 0.10868007 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?
16 0.10498375 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
17 0.10147719 1941 andrew gelman stats-2013-07-16-Priors
18 0.1000923 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
19 0.09766645 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
20 0.091550812 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments
simIndex simValue blogId blogTitle
same-blog 1 0.98735416 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
2 0.75795525 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
3 0.742984 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple of days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that there’s also the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distributions arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [not the same as the
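A quick numerical illustration (my numbers, not the post's) of that df-scale dependence: the standard deviation of a t distribution with scale s and degrees of freedom nu is s * sqrt(nu / (nu - 2)) for nu > 2, so the same scale implies quite different spreads at different df.

```r
# Same scale parameter, very different standard deviations as df varies,
# which is why independent priors on df and scale can be awkward.
s <- 1
df <- c(2.5, 4, 10, 30, 100)
round(s * sqrt(df / (df - 2)), 2)
# 2.24 1.41 1.12 1.04 1.01
```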
4 0.73945093 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly, I have read that you have suggested using Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate it if you would have the time to point me to this publication of yours. The reason is that I have used the prior distribution Gamma(1.4, 0.4) in a study which we are now revising for publication, and where a reviewer questions the choice of the distribution (claiming that it is too informative!). I am well aware that in recent publications (Prior distributions for variance parameters in hierarchical models, Bayesian Analysis; Data Analysis Using Regression and Multilevel/Hierarchical Models) you suggest modeling the precision as pow(standard deviatio
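One way to answer the reviewer's "too informative" worry is to look at what Gamma(1.4, 0.4) on the precision tau implies on the more interpretable standard-deviation scale sigma = 1 / sqrt(tau). This simulation is my own check, not from the post:

```r
# Implied prior on the standard deviation under tau ~ Gamma(1.4, 0.4).
set.seed(4)
tau <- rgamma(1e5, shape = 1.4, rate = 0.4)
sigma <- 1 / sqrt(tau)
quantile(sigma, c(0.025, 0.5, 0.975))  # central 95% interval and median
```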
Introduction: Jouni Kerman did a cool bit of research justifying the Beta(1/3, 1/3) prior as noninformative for binomial data, and the Gamma(1/3, 0) prior for Poisson data. You probably thought that nothing new could be said about noninformative priors in such basic problems, but you were wrong! Here’s the story: The conjugate binomial and Poisson models are commonly used for estimating proportions or rates. However, it is not well known that the conventional noninformative conjugate priors tend to shrink the posterior quantiles toward the boundary or toward the middle of the parameter space, making them thus appear excessively informative. The shrinkage is always largest when the number of observed events is small. This behavior persists for all sample sizes and exposures. The effect of the prior is therefore most conspicuous and potentially controversial when analyzing rare events. As alternative default conjugate priors, I [Jouni] introduce Beta(1/3, 1/3) and Gamma(1/3, 0), which I cal
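A small worked comparison (my example, not Kerman's) of the shrinkage described in that abstract: posterior medians for a proportion after observing y = 1 event in n = 20 trials, under three conjugate defaults. The Beta(1/3, 1/3) prior puts the posterior median essentially at the raw rate y/n.

```r
# Posterior medians under three "noninformative" conjugate Beta priors.
y <- 1; n <- 20
priors <- list(uniform  = c(1, 1),
               jeffreys = c(0.5, 0.5),
               kerman   = c(1/3, 1/3))
sapply(priors, function(ab) qbeta(0.5, y + ab[1], n - y + ab[2]))
# The Beta(1/3, 1/3) result sits closest to y / n = 0.05.
```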
6 0.73241466 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
7 0.72669071 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
8 0.69103116 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
9 0.67240834 247 andrew gelman stats-2010-09-01-How does Bayes do it?
10 0.66661239 846 andrew gelman stats-2011-08-09-Default priors update?
11 0.66657448 1941 andrew gelman stats-2013-07-16-Priors
12 0.66442078 1465 andrew gelman stats-2012-08-21-D. Buggin
13 0.64284122 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
14 0.64272124 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
15 0.64033723 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work
16 0.63875192 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
17 0.63096231 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments
18 0.62468553 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
19 0.62371773 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world
20 0.61979538 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
simIndex simValue blogId blogTitle
same-blog 1 0.98264313 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
2 0.76568919 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays
Introduction: My Take a Number feature appears in today’s Times. And here are the graphs that I wish they’d had space to include! Original story here.
3 0.69200587 436 andrew gelman stats-2010-11-29-Quality control problems at the New York Times
Introduction: I guess there’s a reason they put this stuff in the Opinion section and not in the Science section, huh? P.S. More here.
4 0.67426521 185 andrew gelman stats-2010-08-04-Why does anyone support private macroeconomic forecasts?
Introduction: Tyler Cowen asks the above question. I don’t have a full answer, but, in the Economics section of A Quantitative Tour of the Social Sciences, Richard Clarida discusses in detail the ways that researchers have tried to estimate the extent to which government or private forecasts supply additional information.
5 0.6675446 1576 andrew gelman stats-2012-11-13-Stan at NIPS 2012 Workshop on Probabilistic Programming
Introduction: If you need an excuse to go skiing in Tahoe next month, our paper on Stan as a probabilistic programming language was accepted for: Workshop on Probabilistic Programming NIPS 2012 7–8 December, 2012, Lake Tahoe, Nevada The workshop is organized by the folks behind the probabilistic programming language Church and has a great lineup of invited speakers (Chris Bishop, Josh Tennenbaum, and Stuart Russell). And in case you’re interested in the main conference, here’s the list of accepted NIPS 2012 papers and posters. To learn more about Stan, check out the links to the manual on the Stan Home Page. We’ll put up a link to our final NIPS workshop paper there when we finish it.
6 0.65683609 253 andrew gelman stats-2010-09-03-Gladwell vs Pinker
7 0.65637773 164 andrew gelman stats-2010-07-26-A very short story
8 0.65480673 795 andrew gelman stats-2011-07-10-Aleks says this is the future of visualization
9 0.65259981 873 andrew gelman stats-2011-08-26-Luck or knowledge?
10 0.64927959 398 andrew gelman stats-2010-11-06-Quote of the day
11 0.64892244 1118 andrew gelman stats-2012-01-14-A model rejection letter
13 0.63722962 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm
14 0.63139647 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
15 0.62959433 1718 andrew gelman stats-2013-02-11-Toward a framework for automatic model building
16 0.6272729 1427 andrew gelman stats-2012-07-24-More from the sister blog
17 0.62726802 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi
18 0.62662417 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers
19 0.62240326 1530 andrew gelman stats-2012-10-11-Migrating your blog from Movable Type to WordPress
20 0.62014955 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something