andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-247 knowledge-graph by maker-knowledge-mining

247 andrew gelman stats-2010-09-01-How does Bayes do it?


meta info for this blog

Source: html

Introduction: I received the following message from a statistician working in industry: I am studying your paper, A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models . I am not clear why the Bayesian approaches with some priors can usually handle the issue of nonidentifiability or can get stable estimates of parameters in model fit, while the frequentist approaches cannot. My reply: 1. The term “frequentist approach” is pretty general. “Frequentist” refers to an approach for evaluating inferences, not a method for creating estimates. In particular, any Bayes estimate can be viewed as a frequentist inference if you feel like evaluating its frequency properties. In logistic regression, maximum likelihood has some big problems that are solved with penalized likelihood–equivalently, Bayesian inference. A frequentist can feel free to consider the prior as a penalty function rather than a probability distribution of parameters. 2. The reason our approa
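The reply's point 1 (a frequentist can read the prior as a penalty) has a standard one-line formalization; this is my notation, not an equation from the post. The penalized-likelihood estimate is the posterior mode, with the log prior acting as the penalty:

$$
\hat\beta \;=\; \arg\max_{\beta}\left[\,\sum_{i=1}^{n}\log p(y_i \mid x_i,\beta) \;+\; \log p(\beta)\,\right],
$$

where the first term is the logistic log-likelihood and the second is the log prior density, which a frequentist is free to treat simply as a penalty function on $\beta$.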


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I received the following message from a statistician working in industry: I am studying your paper, A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models . [sent-1, score-0.142]

2 I am not clear why the Bayesian approaches with some priors can usually handle the issue of nonidentifiability or can get stable estimates of parameters in model fit, while the frequentist approaches cannot. [sent-2, score-1.112]

3 “Frequentist” refers to an approach for evaluating inferences, not a method for creating estimates. [sent-5, score-0.613]

4 In particular, any Bayes estimate can be viewed as a frequentist inference if you feel like evaluating its frequency properties. [sent-6, score-1.07]

5 In logistic regression, maximum likelihood has some big problems that are solved with penalized likelihood–equivalently, Bayesian inference. [sent-7, score-0.88]

6 A frequentist can feel free to consider the prior as a penalty function rather than a probability distribution of parameters. [sent-8, score-0.921]

7 The reason our approach works well is that we are adding information. [sent-10, score-0.22]

8 In a logistic regression with separation, there is a lack of information in the likelihood, and the prior distribution helps out by ruling out unrealistic possibilities. [sent-11, score-1.213]

9 There are settings where our Bayesian method will mess up. [sent-13, score-0.284]

10 For example, if the true logistic regression coefficient is -20, and you have a moderate sample size, our estimate will be much closer to zero (while the maximum likelihood estimate will be minus infinity, which for some purposes might be an acceptable estimate). [sent-14, score-1.794]

11 Various questions along those lines arose during my recent talk at Cambridge. [sent-16, score-0.092]
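To make sentences 5-10 concrete, here is a minimal sketch (mine, not code from the post or the paper) of separation in logistic regression: the outcome is a deterministic step in the predictor, so the maximum-likelihood coefficient heads off toward infinity, while an L2 penalty, equivalent to a zero-centered Gaussian prior, keeps the estimate finite and pulled toward zero. The paper itself recommends Cauchy priors (bayesglm in R); the Gaussian penalty and scikit-learn here are stand-ins for the penalized-likelihood idea.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
y = (x[:, 0] > 0).astype(int)  # outcome is a step function of x: complete separation

# Near-unpenalized fit: in scikit-learn C = 1/lambda, so a huge C approximates plain maximum likelihood.
near_mle = LogisticRegression(penalty="l2", C=1e8, solver="lbfgs", max_iter=10_000).fit(x, y)

# Weakly penalized fit: C = 1.0 corresponds to a unit-scale Gaussian prior (a penalty) on the coefficient.
penalized = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs", max_iter=10_000).fit(x, y)

print("near-MLE coefficient: ", near_mle.coef_[0, 0])   # very large, and grows as C grows
print("penalized coefficient:", penalized.coef_[0, 0])   # finite, pulled toward zero
```

As sentence 10 notes, the pull toward zero is the price of stability: if the true coefficient really were -20, the penalized estimate would land much closer to zero than the truth.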


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('frequentist', 0.421), ('logistic', 0.306), ('likelihood', 0.197), ('regression', 0.191), ('estimate', 0.186), ('evaluating', 0.183), ('maximum', 0.169), ('distribution', 0.155), ('prior', 0.152), ('nonidentifiability', 0.149), ('approaches', 0.147), ('approach', 0.143), ('ruling', 0.13), ('unrealistic', 0.123), ('equivalently', 0.118), ('bayesian', 0.116), ('cambridge', 0.115), ('infinity', 0.113), ('penalized', 0.112), ('method', 0.109), ('minus', 0.107), ('separation', 0.107), ('acceptable', 0.103), ('mess', 0.101), ('penalty', 0.101), ('solved', 0.096), ('viewed', 0.094), ('purposes', 0.094), ('stable', 0.094), ('industry', 0.094), ('frequency', 0.094), ('moderate', 0.093), ('weakly', 0.093), ('feel', 0.092), ('arose', 0.092), ('refers', 0.09), ('helps', 0.088), ('creating', 0.088), ('handle', 0.084), ('closer', 0.082), ('coefficient', 0.08), ('default', 0.078), ('adding', 0.077), ('settings', 0.074), ('studying', 0.073), ('inferences', 0.073), ('priors', 0.07), ('message', 0.069), ('lack', 0.068), ('informative', 0.068)]
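For readers wondering how a word-weight table like the one above is produced, here is a hedged sketch of a tf-idf pass (my guess at the general pipeline, not the maker-knowledge-mining code, and it assumes scikit-learn): vectorize the sentences, score each sentence by its total tf-idf weight, and rank; the same vectors built per post would drive the "similar blogs" cosine similarities listed below.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Frequentist refers to an approach for evaluating inferences, not a method for creating estimates.",
    "In logistic regression, maximum likelihood has big problems that are solved with penalized likelihood.",
    "Various questions along those lines arose during my recent talk at Cambridge.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(sentences)          # rows = sentences, columns = terms

scores = X.sum(axis=1).A1                 # total tf-idf weight per sentence (the sentScore analogue)
for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sentence}")

# Post-to-post similarity (the lists below) would apply cosine_similarity to
# tf-idf vectors built over whole posts rather than individual sentences.
print(cosine_similarity(X)[0])            # similarity of sentence 0 to each sentence
```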

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 247 andrew gelman stats-2010-09-01-How does Bayes do it?


2 0.3382546 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

Introduction: I sent Deborah Mayo a link to my paper with Cosma Shalizi on the philosophy of statistics, and she sent me the link to this conference which unfortunately already occurred. (It’s too bad, because I’d have liked to have been there.) I summarized my philosophy as follows: I am highly sympathetic to the approach of Lakatos (or of Popper, if you consider Lakatos’s “Popper_2” to be a reasonable simulation of the true Popperism), in that (a) I view statistical models as being built within theoretical structures, and (b) I see the checking and refutation of models to be a key part of scientific progress. A big problem I have with mainstream Bayesianism is its “inductivist” view that science can operate completely smoothly with posterior updates: the idea that new data causes us to increase the posterior probability of good models and decrease the posterior probability of bad models. I don’t buy that: I see models as ever-changing entities that are flexible and can be patched and ex

3 0.248612 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili

4 0.24434528 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis

5 0.23954025 1560 andrew gelman stats-2012-11-03-Statistical methods that work in some settings but not others

Introduction: David Hogg pointed me to this post by Larry Wasserman: 1. The Horwitz-Thompson estimator    satisfies the following condition: for every   , where   — the parameter space — is the set of all functions  . (There are practical improvements to the Horwitz-Thompson estimator that we discussed in our earlier posts but we won’t revisit those here.) 2. A Bayes estimator requires a prior   for  . In general, if   is not a function of   then (1) will not hold. . . . 3. If you let   be a function of  , (1) still, in general, does not hold. 4. If you make   a function of   in just the right way, then (1) will hold. . . . There is nothing wrong with doing this, but in our opinion this is not in the spirit of Bayesian inference. . . . 7. This example is only meant to show that Bayesian estimators do not necessarily have good frequentist properties. This should not be surprising. There is no reason why we should in general expect a Bayesian method to have a frequentist property

6 0.2072268 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

7 0.1982232 1182 andrew gelman stats-2012-02-24-Untangling the Jeffreys-Lindley paradox

8 0.19046828 1445 andrew gelman stats-2012-08-06-Slow progress

9 0.19034803 846 andrew gelman stats-2011-08-09-Default priors update?

10 0.18959069 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

11 0.18797857 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

12 0.18771838 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

13 0.18572396 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

14 0.18541846 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

15 0.17380036 534 andrew gelman stats-2011-01-24-Bayes at the end

16 0.16703591 1941 andrew gelman stats-2013-07-16-Priors

17 0.16605403 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

18 0.16221233 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

19 0.16167679 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

20 0.16048628 1355 andrew gelman stats-2012-05-31-Lindley’s paradox


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.196), (1, 0.288), (2, 0.028), (3, 0.059), (4, -0.056), (5, -0.03), (6, 0.092), (7, 0.052), (8, -0.093), (9, -0.03), (10, 0.033), (11, -0.051), (12, 0.042), (13, 0.078), (14, 0.008), (15, -0.005), (16, -0.043), (17, -0.008), (18, 0.021), (19, -0.006), (20, 0.01), (21, 0.015), (22, 0.063), (23, 0.021), (24, 0.063), (25, -0.024), (26, 0.039), (27, -0.061), (28, -0.036), (29, -0.002), (30, 0.048), (31, 0.055), (32, 0.022), (33, -0.014), (34, 0.031), (35, -0.073), (36, -0.013), (37, -0.001), (38, -0.013), (39, -0.01), (40, 0.028), (41, 0.059), (42, -0.034), (43, -0.007), (44, 0.078), (45, 0.068), (46, 0.021), (47, 0.069), (48, -0.044), (49, -0.021)]
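The 50-element vector above gives this post’s weight on each LSI topic. A plausible reading of how such weights are computed (an assumption on my part, not the miner’s actual code) is truncated SVD on the tf-idf matrix, with post-to-post similarity taken as cosine similarity in the reduced space; a toy version with 2 components instead of 50:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "weakly informative default priors for logistic regression and separation",
    "cross-validation and Bayesian estimation of tuning parameters for priors",
    "writing for free and the economics of blogging for a living",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(posts)
lsi = TruncatedSVD(n_components=2, random_state=0)   # the real listing uses 50 topics
topic_weights = lsi.fit_transform(tfidf)             # one weight vector per post

print(topic_weights[0])                      # analogue of the (topicId, topicWeight) list above
print(cosine_similarity(topic_weights)[0])   # simValue analogue: post 0 vs. every post
```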

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99298507 247 andrew gelman stats-2010-09-01-How does Bayes do it?


2 0.76312113 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

Introduction: Ilya Lipkovich writes: I read with great interest your 2008 paper [with Aleks Jakulin, Grazia Pittau, and Yu-Sung Su] on weakly informative priors for logistic regression and also followed an interesting discussion on your blog. This discussion was within the Bayesian community in relation to the validity of priors. However, I would like to approach it from a broader perspective on predictive modeling, bringing in ideas from the machine/statistical-learning approach. Actually you were the first to bring it up by mentioning in your paper “borrowing ideas from computer science” on cross-validation when comparing the predictive ability of your proposed priors with other choices. However, using cross-validation for comparing method performance is not the only or primary use of CV in machine learning. Most machine-learning methods have some “meta” or complexity parameters and use cross-validation to tune them. For example, one of your comparison methods is BBR, which actually

3 0.76020271 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

Introduction: Aki and I write: The very generality of the bootstrap creates both opportunity and peril, allowing researchers to solve otherwise intractable problems but also sometimes leading to an answer with an inappropriately high level of certainty. We demonstrate with two examples from our own research: one problem where bootstrap smoothing was effective and led us to an improved method, and another case where bootstrap smoothing would not solve the underlying problem. Our point in these examples is not to disparage bootstrapping but rather to gain insight into where it will be more or less effective as a smoothing tool. An example where bootstrap smoothing works well: Bayesian posterior distributions are commonly summarized using Monte Carlo simulations, and inferences for scalar parameters or quantities of interest can be summarized using 50% or 95% intervals. An interval for a continuous quantity is typically constructed either as a central probability interval (with probabili

4 0.75800902 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

Introduction: John Lawson writes: I have been experimenting using Bayesian methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method of moments or REML estimates. In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML. For data sets I have used that arise from a simple one-way random effects model, the Bayesian estimates of the between-groups variance component are usually larger than the method of moments or REML estimates. When I use a uniform prior on the between standard deviation (as you recommended in your 2006 paper) rather than an inverse gamma prior on the between variance component, the between variance component is usually reduced. However, for the dyestuff data in Davies (1949, p. 74), the opposite appears to be the case. I am worried that the Bayesian estimators of the varian

5 0.75145149 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis

6 0.75063819 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

7 0.74106663 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

8 0.74054968 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

9 0.7369417 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

10 0.72959554 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

11 0.72066498 1941 andrew gelman stats-2013-07-16-Priors

12 0.71859241 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

13 0.70077431 1560 andrew gelman stats-2012-11-03-Statistical methods that work in some settings but not others

14 0.69911617 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

15 0.69612592 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

16 0.69284356 2027 andrew gelman stats-2013-09-17-Christian Robert on the Jeffreys-Lindley paradox; more generally, it’s good news when philosophical arguments can be transformed into technical modeling issues

17 0.68646246 846 andrew gelman stats-2011-08-09-Default priors update?

18 0.68504852 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

19 0.67718858 449 andrew gelman stats-2010-12-04-Generalized Method of Moments, whatever that is

20 0.67604065 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.018), (5, 0.011), (7, 0.012), (15, 0.033), (16, 0.07), (24, 0.242), (40, 0.012), (43, 0.021), (44, 0.011), (63, 0.023), (81, 0.013), (82, 0.02), (84, 0.033), (85, 0.014), (99, 0.363)]
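The LDA list works the same way; again this is an assumed reconstruction of the general approach, not the mining code itself: fit a topic model on raw term counts, take each post’s topic proportions as its weight vector (the sparse (topicId, topicWeight) pairs above), and rank other posts by the similarity of those vectors.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "weakly informative default priors for logistic regression and separation",
    "hidden dangers of noninformative priors in Bayesian analysis",
    "Bell Labs summer jobs and physics experiments in a small lab",
]

counts = CountVectorizer(stop_words="english").fit_transform(posts)
lda = LatentDirichletAllocation(n_components=5, random_state=0)  # the listing above uses ~100 topics
theta = lda.fit_transform(counts)     # per-post topic proportions; each row sums to 1

print(theta[0])                        # topic weights for the first post
print(cosine_similarity(theta)[0])     # simValue analogue: post 0 vs. each post
```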

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99397433 247 andrew gelman stats-2010-09-01-How does Bayes do it?


2 0.98853165 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

Introduction: Following up on Christian’s post [link fixed] on the topic, I’d like to offer a few thoughts of my own. In BDA, we express the idea that a noninformative prior is a placeholder: you can use the noninformative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. Same thing for the data model (the “likelihood”), for that matter: it often makes sense to start with something simple and conventional and then go from there. So, in that sense, noninformative priors are no big deal, they’re just a way to get started. Just don’t take them too seriously. Traditionally in statistics we’ve worked with the paradigm of a single highly informative dataset with only weak external information. But if the data are sparse and prior information is strong, we have to think differently. And, when you increase the dimensionality of a problem, both these things hap

3 0.98680359 970 andrew gelman stats-2011-10-24-Bell Labs

Introduction: Sining Chen told me they’re hiring in the statistics group at Bell Labs. I’ll do my bit for economic stimulus by announcing this job (see below). I love Bell Labs. I worked there for three summers, in a physics lab in 1985-86 under the supervision of Loren Pfeiffer, and by myself in the statistics group in 1990. I learned a lot working for Loren. He was a really smart and driven guy. His lab was a small set of rooms—in Bell Labs, everything’s in a small room, as they value the positive externality of close physical proximity of different labs, which you get by making each lab compact—and it was Loren, his assistant (a guy named Ken West who kept everything running in the lab), and three summer students: me, Gowton Achaibar, and a girl whose name I’ve forgotten. Gowton and I had a lot of fun chatting in the lab. One day I made a silly comment about Gowton’s accent—he was from Guyana and pronounced “three” as “tree”—and then I apologized and said: Hey, here I am making fun o

4 0.98678708 2080 andrew gelman stats-2013-10-28-Writing for free

Introduction: Max Read points to discussions by Cord Jefferson and Tim Kreider about people who write for free, thus depressing the wages of paid journalists. The topic interests me because I’m one of those people who writes for free, all the time. As a commenter wrote in response to Cord Jefferson’s article: It’s not just people who have inherited money, it’s also people who have “day jobs” to support themselves while they pursue dream jobs in fields like journalism, fiction writing, theater and music. In this case, I’m pursuing the dream job of blogging, but it’s the same basic idea. I actually enjoy doing this, which is more than can be said of Tim Kreider, who writes: I will freely admit that writing beats baling hay or going door-to-door for a living, but it’s still shockingly unenjoyable work. I’m lucky enough not to ever have had to bale hay or go door-to-door for a living, but I find writing to be enjoyable! So I can see how it can be hard for Kreider to compete wi

5 0.98597151 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. (See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai

6 0.98554391 1941 andrew gelman stats-2013-07-16-Priors

7 0.98501438 2358 andrew gelman stats-2014-06-03-Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons

8 0.98459488 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

9 0.98439378 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

10 0.98430789 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

11 0.98409796 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

12 0.98327589 1240 andrew gelman stats-2012-04-02-Blogads update

13 0.98260808 1883 andrew gelman stats-2013-06-04-Interrogating p-values

14 0.98183328 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

15 0.98163116 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

16 0.98141599 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

17 0.98130912 899 andrew gelman stats-2011-09-10-The statistical significance filter

18 0.98072183 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

19 0.98041928 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

20 0.97986937 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes