andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1941 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in the economics of utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they would never have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes: Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities for all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather, I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also
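To make "opinions on the behavior of models" concrete, here is a minimal Python sketch of prior predictive simulation. It assumes the one-parameter model used later in the post (theta ~ Cauchy(0, 1), y ~ N(theta, 1)); the function names, seed, and bounds are hypothetical illustrations, not anything from the post. The idea is simply to look at what data the prior implies before committing to an opinion about the parameter itself.

```python
import math
import random


def prior_predictive_draws(n_draws, seed=0):
    """Simulate y from the prior predictive: theta ~ Cauchy(0, 1), y | theta ~ N(theta, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Cauchy(0, 1) draw via the inverse CDF: tan(pi * (u - 1/2)) with u ~ Uniform(0, 1)
        theta = math.tan(math.pi * (rng.random() - 0.5))
        # Measurement noise with standard deviation 1
        draws.append(rng.gauss(theta, 1.0))
    return draws


def fraction_within(draws, bound):
    """Share of prior predictive draws with |y| < bound."""
    return sum(abs(y) < bound for y in draws) / len(draws)


if __name__ == "__main__":
    ys = prior_predictive_draws(100_000)
    for b in (1, 2, 10, 100):
        print(f"P(|y| < {b}) is approximately {fraction_within(ys, b):.3f}")
```

Summaries like P(|y| < 2) are statements about observable data, which is the level at which Firoozye says he is comfortable holding opinions; if they look implausible, that is a signal to revise the prior.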
sentIndex sentText sentNum sentScore
1 The prior predictive density is a good and sensible notion. [sent-14, score-0.47]
2 That this may or may not give enough info to ascribe a proper prior in parameter space all the better. [sent-17, score-0.544]
3 To the extent it does not we must arbitrarily pick one (e.g., reference prior or maxent prior subject to the data/model prior constraints). [sent-18, score-1.463]
4 As I wrote in response to Larry, in some specific cases, noninformative priors can improve our estimates (see here, for example), but in general I’ve found that it’s a good idea to include prior information. [sent-21, score-0.753]
5 Even weak prior information can make a big difference (see here, for example). [sent-22, score-0.773]
6 A reasonable goal, I think, is for us to set up a prior distribution that is informative without hoping that it will include all our prior information. [sent-24, score-1.064]
7 stan”): data { real y; } parameters { real theta; } model { theta ~ cauchy (0, 1); y ~ normal (theta, 1); } Then the R script: library ("rstan") y <- 1 fit1 <- stan(file="normal. [sent-38, score-0.919]
8 This time, indeed, 84% of my posterior simulations of theta are greater than 0. [sent-43, score-0.542]
9 If you look at your posterior inference and it doesn’t make sense to you, this “doesn’t make sense” corresponds to additional prior information you haven’t included in your analysis. [sent-49, score-0.705]
10 OK, so that’s one way to consider the unreasonableness of a noninformative prior in this setting. [sent-50, score-0.673]
11 The other way to see what’s going on with this example is to take that flat prior seriously. [sent-53, score-0.639]
12 Suppose theta really could be just about anything—or, to keep things finite, suppose you wanted to assign theta a uniform prior distribution on [-1000,1000], and then you gather enough data to estimate theta with a standard deviation of 1. [sent-54, score-2.182]
13 998 chance that your estimate will be more than 2 standard errors away from zero so that your posterior certainty about the sign of theta will be at least 20:1. [sent-57, score-0.882]
14 So, in your prior distribution, this particular event—that y is so close to zero that there is uncertainty about theta’s sign—is extremely unlikely. [sent-60, score-0.552]
15 In either case, the flat-prior analysis gives you a high posterior probability that the difference is positive in the general population, and a high posterior probability that this difference is large (more than 1 percentage point, say). [sent-70, score-0.951]
16 Based on the literature and on the difficulty of measuring attractiveness, I’d say that a reasonable weak prior distribution for the difference in probability of girl birth, comparing beautiful and ugly parents in the general population, is N(0,0. [sent-76, score-1.185]
17 The traditional way of presenting such examples in a Bayesian statistics book would be to use a flat prior or weak prior, perhaps trying to demonstrate a lack of sensitivity to the prior. [sent-94, score-0.675]
18 In some settings the data are strong and prior information is weak, and it’s not really worth the effort to think seriously about what external knowledge we have about the system being studied. [sent-98, score-0.627]
19 More often than not, though, I think we do know a lot, and we’re interested in various questions where data are sparse, and I think we should be putting more effort into quantifying our prior distribution. [sent-99, score-0.56]
20 Upsetting situations—for example, the data of 1 +/- 1 which lead to a seemingly too-strong claim of 5:1 odds in favor of a positive effect—are helpful in that they can reveal that we have prior information that we have not yet included in our models. [sent-100, score-0.627]
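The numbers in the sentences above (84% of posterior simulations above 0, odds of roughly 5:1, and the near-certain chance under a uniform(-1000, 1000) prior that y lands more than 2 standard errors from zero) can be checked without Stan. This is a minimal Python sketch, assuming the same one-observation model y ~ N(theta, 1) with y = 1. It swaps the post's MCMC run for plain midpoint-rule quadrature, and the function names and integration limits are hypothetical choices made for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sd=1.0):
    """Density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cauchy_pdf(x, loc=0.0, scale=1.0):
    """Density of Cauchy(loc, scale) at x."""
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


def flat_pdf(x):
    """Unnormalized flat prior; the constant cancels in the posterior ratio."""
    return 1.0


def posterior_prob_positive(prior_pdf, y, sd=1.0, lo=-50.0, hi=50.0, n=100_000):
    """P(theta > 0 | y) for y ~ N(theta, sd^2), by midpoint quadrature on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    positive = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        w = prior_pdf(theta) * normal_pdf(y, theta, sd)  # unnormalized posterior
        total += w
        if theta > 0:
            positive += w
    return positive / total


def prob_sign_is_clear(half_width=1000.0, sd=1.0, k=2.0, step=0.01):
    """Prior predictive P(|y| > k*sd) when theta ~ Uniform(-half_width, half_width)."""
    n = int(round(2 * half_width / step))
    inside = 0.0
    for i in range(n):
        theta = -half_width + (i + 0.5) * step
        # P(|y| <= k*sd | theta) for y ~ N(theta, sd^2)
        inside += normal_cdf((k * sd - theta) / sd) - normal_cdf((-k * sd - theta) / sd)
    # Average over the uniform prior density 1 / (2 * half_width)
    return 1.0 - inside * step / (2 * half_width)


if __name__ == "__main__":
    y = 1.0
    p_flat = posterior_prob_positive(flat_pdf, y)
    p_cauchy = posterior_prob_positive(cauchy_pdf, y)
    print(f"flat prior:        P(theta > 0 | y=1) = {p_flat:.3f}, odds {p_flat / (1 - p_flat):.1f}:1")
    print(f"Cauchy(0,1) prior: P(theta > 0 | y=1) = {p_cauchy:.3f}")
    print(f"uniform(-1000, 1000) prior predictive P(|y| > 2) = {prob_sign_is_clear():.4f}")
```

Under the flat prior the posterior probability is just Phi(1), which gives the roughly 5:1 odds; the Cauchy prior pulls that probability toward 1/2. The last line shows why the flat prior is implausible here: under it, data this close to zero were expected only about 0.2% of the time.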
wordName wordTfidf (topN-words)
[('prior', 0.47), ('theta', 0.374), ('posterior', 0.168), ('cauchy', 0.145), ('priors', 0.143), ('noninformative', 0.14), ('difference', 0.124), ('utility', 0.124), ('standard', 0.113), ('weak', 0.112), ('parents', 0.109), ('sqrt', 0.107), ('probability', 0.097), ('percentage', 0.096), ('flat', 0.093), ('beautiful', 0.092), ('data', 0.09), ('likely', 0.082), ('zero', 0.082), ('sign', 0.081), ('large', 0.077), ('example', 0.076), ('info', 0.074), ('normal', 0.071), ('distribution', 0.071), ('girls', 0.069), ('information', 0.067), ('stan', 0.067), ('print', 0.066), ('gather', 0.065), ('assign', 0.065), ('estimate', 0.064), ('effects', 0.064), ('consider', 0.063), ('parameters', 0.062), ('file', 0.061), ('deviation', 0.061), ('effect', 0.061), ('uniform', 0.061), ('model', 0.061), ('real', 0.058), ('stronger', 0.058), ('larry', 0.058), ('literature', 0.057), ('opinions', 0.055), ('result', 0.054), ('error', 0.054), ('reasonable', 0.053), ('near', 0.053), ('must', 0.053)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1941 andrew gelman stats-2013-07-16-Priors
2 0.50083548 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
Introduction: Following up on Christian’s post [link fixed] on the topic, I’d like to offer a few thoughts of my own. In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. So, in that sense, noninformative priors are no big deal, they’re just a way to get started. Just don’t take them too seriously. Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things hap
3 0.38696241 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili
4 0.38588417 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; Surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start
5 0.35691068 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
6 0.34822267 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
7 0.34265903 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
8 0.34067991 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
9 0.33021793 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo
10 0.31629992 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
11 0.29608259 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
12 0.29519963 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
13 0.27887544 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
14 0.27386504 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
15 0.27354062 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
16 0.26293916 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
17 0.25926694 1465 andrew gelman stats-2012-08-21-D. Buggin
18 0.24744451 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model
19 0.24564554 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions
20 0.23708378 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles
topicId topicWeight
[(0, 0.35), (1, 0.33), (2, 0.102), (3, 0.033), (4, -0.059), (5, -0.131), (6, 0.243), (7, 0.044), (8, -0.286), (9, -0.005), (10, -0.051), (11, -0.008), (12, 0.086), (13, 0.001), (14, -0.032), (15, -0.028), (16, -0.041), (17, 0.026), (18, 0.057), (19, -0.033), (20, -0.009), (21, -0.051), (22, 0.0), (23, -0.006), (24, 0.019), (25, 0.044), (26, 0.03), (27, -0.04), (28, -0.02), (29, 0.007), (30, -0.015), (31, -0.015), (32, -0.071), (33, -0.019), (34, -0.003), (35, 0.069), (36, 0.019), (37, 0.031), (38, -0.06), (39, 0.004), (40, 0.018), (41, 0.043), (42, -0.05), (43, -0.056), (44, -0.01), (45, 0.009), (46, 0.071), (47, -0.002), (48, 0.002), (49, 0.055)]
simIndex simValue blogId blogTitle
same-blog 1 0.97510165 1941 andrew gelman stats-2013-07-16-Priors
2 0.90767187 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
3 0.90483677 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
4 0.88847363 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
Introduction: Forest Gregg writes: I want to incorporate a prior belief into an estimation of a logistic regression classifier of points distributed in a 2d space. My prior belief is a funny kind of prior though. It’s a belief about where the decision boundary between classes should fall. Over the 2d space, I lay a grid, and I believe that a decision boundary that separates any two classes should fall along any of the grid lines with some probability, and that the decision boundary should fall anywhere except a gridline with a much lower probability. For the two class case, and a logistic regression model parameterized by W and data X, my prior could perhaps be expressed Pr(W) = (normalizing constant)/exp(d) where d = f(grid,W,X) such that when logistic(W^TX)= .5 and X is ‘far’ from grid lines, then d is large. Have you ever seen a model like this, or do you have any notions about a good avenue to pursue? My real data consist of geocoded Craigslist’s postings that are labeled with the
5 0.87712711 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly I have read that you have suggested to use Gamma(1.4, 0.4) instead of Gamma(0.01,0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate if you would have the time to point me to this publication of yours. The reason is that I have used the prior distribution (Gamma(1.4, 0.4)) in a study which we now revise for publication, and where a reviewer question the choice of the distribution (claiming that it is too informative!). I am well aware of that you in recent publications (Prior distributions for variance parameters in hierarchical models. Bayesian Analysis; Data Analysis using regression and multilevel/hierarchical models) suggest to model the precision as pow(standard deviatio
7 0.86449248 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
8 0.86131489 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
9 0.85115826 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
10 0.8458451 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors
11 0.84337091 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
12 0.83922708 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves
13 0.81689954 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”
14 0.81321663 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
15 0.80400491 1465 andrew gelman stats-2012-08-21-D. Buggin
16 0.79663175 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
17 0.7941305 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
18 0.7909345 846 andrew gelman stats-2011-08-09-Default priors update?
19 0.78278244 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
20 0.78196019 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
topicId topicWeight
[(2, 0.053), (13, 0.013), (15, 0.029), (16, 0.062), (21, 0.025), (24, 0.281), (36, 0.013), (53, 0.014), (60, 0.012), (65, 0.02), (77, 0.011), (86, 0.021), (94, 0.012), (96, 0.012), (99, 0.308)]
simIndex simValue blogId blogTitle
1 0.98732597 1240 andrew gelman stats-2012-04-02-Blogads update
Introduction: A few months ago I reported on someone who wanted to insert text links into the blog. I asked her how much they would pay and got no answer. Yesterday, though, I received this reply: Hello Andrew, I am sorry for the delay in getting back to you. I’d like to make a proposal for your site. Please refer below. We would like to place a simple text link ad on page http://andrewgelman.com/2011/07/super_sam_fuld/ to link to *** with the key phrase ***. We will incorporate the key phrase into a sentence so it would read well. Rest assured it won’t sound obnoxious or advertorial. We will then process the final text link code as soon as you agree to our proposal. We can offer you $200 for this with the assumption that you will keep the link “live” on that page for 12 months or longer if you prefer. Please get back to us with a quick reply on your thoughts on this and include your Paypal ID for payment process. Hoping for a positive response from you. I wrote back: Hi,
Introduction: Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and myself seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004)–but I am concerned that our dialogue with the graphic
Introduction: Pointing to this news article by Megan McArdle discussing a recent study of Medicaid recipients, Jonathan Falk writes: Forget the interpretation for a moment, and the political spin, but haven’t we reached an interesting point when a journalist says things like: When you do an RCT with more than 12,000 people in it, and your defense of your hypothesis is that maybe the study just didn’t have enough power, what you’re actually saying is “the beneficial effects are probably pretty small”. and A good Bayesian—and aren’t most of us supposed to be good Bayesians these days?—should be updating in light of this new information. Given this result, what is the likelihood that Obamacare will have a positive impact on the average health of Americans? Every one of us, for or against, should be revising that probability downwards. I’m not saying that you have to revise it to zero; I certainly haven’t. But however high it was yesterday, it should be somewhat lower today. This
4 0.98452628 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys
Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q
5 0.98417807 197 andrew gelman stats-2010-08-10-The last great essayist?
Introduction: I recently read a bizarre article by Janet Malcolm on a murder trial in NYC. What threw me about the article was that the story was utterly commonplace (by the standards of today’s headlines): divorced mom kills ex-husband in a custody dispute over their four-year-old daughter. The only interesting features were (a) the wife was a doctor and the husband was a dentist, the sort of people you’d expect to sue rather than slay, and (b) the wife hired a hitman from within the insular immigrant community that she (and her husband) belonged to. But, really, neither of these was much of a twist. To add to the non-storyness of it all, there were no other suspects, the evidence against the wife and the hitman was overwhelming, and even the high-paid defense lawyers didn’t seem to be making much of an effort to convince anyone of their client’s innocence. (One of the closing arguments was that one aspect of the wife’s story was so ridiculous that it had to be true. In the lawyer’s wo
7 0.98257792 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
9 0.98148966 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06
10 0.98141885 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors
11 0.98138237 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox
12 0.98099828 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine
14 0.98058999 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
15 0.98007131 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes
same-blog 16 0.98002797 1941 andrew gelman stats-2013-07-16-Priors
17 0.9789995 1080 andrew gelman stats-2011-12-24-Latest in blog advertising
18 0.97871244 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample
19 0.97832704 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors
20 0.97793806 2247 andrew gelman stats-2014-03-14-The maximal information coefficient