andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1445 knowledge-graph by maker-knowledge-mining

1445 andrew gelman stats-2012-08-06-Slow progress


meta info for this blog

Source: html

Introduction: I received the following message: I am a Psychology postgraduate at the University of Glasgow and am writing for an article request. I’ve just read your 2008 published article titled “A weakly informative default prior distribution for logistic and other regression models” and found from it that your group also wrote a report on applying the Bayesian logistic regression approach to multilevel models, which is titled “An approximate EM algorithm for multilevel generalized linear models”. I have been looking for it online but did not find it, and was wondering if I may request this report from you? My first thought is that this is a good sign that psychology undergraduates are reading papers like this. Unfortunately I had to reply as follows: Hi, we actually programmed this up but never debugged it! So no actual paper . . . I think I could’ve done it if I had ever focused on the problem. Between the messiness of the algebra and the messiness of the R code, I never got it all to work.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I received the following message: I am a Psychology postgraduate at the University of Glasgow and am writing for an article request. [sent-1, score-0.246]

2 I have been looking for it online but did not find it, and was wondering if I may request this report from you? [sent-3, score-0.569]

3 My first thought is that this is a good sign that psychology undergraduates are reading papers like this. [sent-4, score-0.608]

4 Unfortunately I had to reply as follows: Hi, we actually programmed this up but never debugged it! [sent-5, score-0.579]

5 I think I could’ve done it if I had ever focused on the problem. [sent-9, score-0.259]

6 Between the messiness of the algebra and the messiness of the R code, I never got it all to work. [sent-10, score-1.173]
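The six sentences above were selected by scoring each sentence with a tf-idf model. A minimal sketch of such scoring, assuming a simple relative-frequency tf and a log idf (the site's actual tokenizer and weighting scheme are not documented here):

```python
import math

def tfidf_sentence_scores(sentences):
    """Score each sentence by the summed tf-idf weight of its words."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # document frequency: in how many sentences each word appears
    df = {}
    for words in docs:
        for w in set(words):
            df[w] = df.get(w, 0) + 1
    scores = []
    for words in docs:
        # tf = relative frequency within the sentence; idf = log(n / df)
        score = sum(
            (words.count(w) / len(words)) * math.log(n / df[w])
            for w in set(words)
        )
        scores.append(score)
    return scores

sents = [
    "the cat sat on the mat",
    "bayesian logistic regression with weakly informative priors",
    "the cat sat on the mat again",
]
scores = tfidf_sentence_scores(sents)
# the distinctive middle sentence gets the highest score
```

Ranking sentences by these scores and keeping the top few yields an extractive summary like the list above.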


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('messiness', 0.417), ('titled', 0.302), ('logistic', 0.227), ('debugged', 0.221), ('multilevel', 0.184), ('programmed', 0.174), ('undergraduates', 0.174), ('psychology', 0.171), ('em', 0.163), ('hi', 0.158), ('report', 0.155), ('algebra', 0.154), ('regression', 0.142), ('request', 0.141), ('weakly', 0.138), ('generalized', 0.136), ('approximate', 0.126), ('applying', 0.125), ('never', 0.12), ('algorithm', 0.116), ('sign', 0.116), ('focused', 0.115), ('default', 0.115), ('models', 0.111), ('follows', 0.104), ('wondering', 0.104), ('message', 0.103), ('informative', 0.101), ('online', 0.098), ('unfortunately', 0.098), ('linear', 0.098), ('code', 0.097), ('received', 0.093), ('actual', 0.089), ('article', 0.086), ('group', 0.08), ('university', 0.079), ('ever', 0.077), ('distribution', 0.076), ('papers', 0.075), ('prior', 0.075), ('ve', 0.075), ('reading', 0.072), ('looking', 0.071), ('approach', 0.07), ('done', 0.067), ('writing', 0.067), ('got', 0.065), ('reply', 0.064), ('published', 0.064)]
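The (wordName, wordTfidf) pairs above are the document's top-weighted words. A sketch of producing such a list, assuming tf-idf against a background corpus (the corpus, the add-one smoothing, and the log base here are illustrative assumptions):

```python
import math

def top_tfidf_words(doc, corpus, n=3):
    """Rank the words of `doc` by tf-idf against a background corpus."""
    words = doc.lower().split()
    ndocs = len(corpus) + 1  # background docs plus the document itself
    def idf(w):
        df = 1 + sum(w in d.lower().split() for d in corpus)
        return math.log(ndocs / df)
    weights = {w: (words.count(w) / len(words)) * idf(w) for w in set(words)}
    return sorted(weights.items(), key=lambda kv: -kv[1])[:n]

corpus = ["the polls of iowa moved", "the senate race of texas tightened"]
doc = "the messiness of the algebra and the messiness of the code"
top = top_tfidf_words(doc, corpus)
# 'messiness' ranks first: frequent here, absent from the background corpus
```

Common words like "the" get an idf of zero and drop out, which is why the list above is dominated by post-specific vocabulary.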

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1445 andrew gelman stats-2012-08-06-Slow progress


2 0.19046828 247 andrew gelman stats-2010-09-01-How does Bayes do it?

Introduction: I received the following message from a statistician working in industry: I am studying your paper, A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models . I am not clear why the Bayesian approaches with some priors can usually handle the issue of nonidentifiability or can get stable estimates of parameters in model fit, while the frequentist approaches cannot. My reply: 1. The term “frequentist approach” is pretty general. “Frequentist” refers to an approach for evaluating inferences, not a method for creating estimates. In particular, any Bayes estimate can be viewed as a frequentist inference if you feel like evaluating its frequency properties. In logistic regression, maximum likelihood has some big problems that are solved with penalized likelihood–equivalently, Bayesian inference. A frequentist can feel free to consider the prior as a penalty function rather than a probability distribution of parameters. 2. The reason our approa

3 0.16694349 846 andrew gelman stats-2011-08-09-Default priors update?

Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a

4 0.16486067 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili

5 0.14040753 852 andrew gelman stats-2011-08-13-Checking your model using fake data

Introduction: Someone sent me the following email: I tried to do a logistic regression . . . I programmed the model in different ways and got different answers . . . can’t get the results to match . . . What am I doing wrong? . . . Here’s my code . . . I didn’t have the time to look at his code so I gave the following general response: One way to check things is to try simulating data from the fitted model, then fit your model again to the simulated data and see what happens. P.S. He followed my suggestion and responded a few days later: Yeah, that did the trick! I was treating a factor variable as a covariate! I love it when generic advice works out!

6 0.13653612 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

7 0.13287573 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

8 0.12930119 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

9 0.12748976 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

10 0.12727976 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

11 0.12362594 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

12 0.12278938 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

13 0.11972331 2034 andrew gelman stats-2013-09-23-My talk Tues 24 Sept at 12h30 at Université de Technologie de Compiègne

14 0.11771498 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

15 0.11635977 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

16 0.11567335 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

17 0.10878292 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

18 0.107457 2081 andrew gelman stats-2013-10-29-My talk in Amsterdam tomorrow (Wed 29 Oct): Can we use Bayesian methods to resolve the current crisis of statistically-significant research findings that don’t hold up?

19 0.10455792 1886 andrew gelman stats-2013-06-07-Robust logistic regression

20 0.10245153 2157 andrew gelman stats-2014-01-02-2013


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.178), (1, 0.117), (2, -0.033), (3, 0.01), (4, 0.023), (5, 0.002), (6, 0.064), (7, -0.071), (8, -0.027), (9, 0.087), (10, 0.106), (11, -0.013), (12, 0.044), (13, 0.044), (14, 0.05), (15, 0.014), (16, -0.028), (17, -0.006), (18, 0.018), (19, 0.018), (20, -0.001), (21, 0.022), (22, 0.046), (23, -0.024), (24, -0.029), (25, -0.096), (26, 0.002), (27, -0.061), (28, -0.066), (29, -0.038), (30, -0.027), (31, 0.028), (32, 0.003), (33, 0.037), (34, -0.008), (35, -0.072), (36, -0.003), (37, 0.029), (38, -0.002), (39, -0.04), (40, 0.018), (41, 0.036), (42, 0.039), (43, -0.053), (44, 0.033), (45, 0.072), (46, -0.019), (47, 0.043), (48, -0.008), (49, -0.076)]
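The (topicId, topicWeight) pairs above are this blog's coordinates in a latent semantic space. A minimal sketch of how such weights arise from a truncated SVD of a term-document matrix; the toy counts and the rank k below are illustrative assumptions, not the pipeline that produced this page:

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are four blogs.
# Columns 0 and 2 are "statistics" blogs; columns 1 and 3 are "politics" blogs.
X = np.array([
    [2.0, 0.0, 1.0, 0.0],  # "prior"
    [1.0, 0.0, 1.0, 0.0],  # "regression"
    [0.0, 2.0, 0.0, 1.0],  # "election"
    [0.0, 1.0, 0.0, 2.0],  # "polling"
])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                      # keep the top-k latent topics
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T   # one topic-weight row per blog

def cos(a, b):
    """Cosine similarity in the latent topic space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# the two statistics blogs end up close in topic space;
# a statistics blog and a politics blog end up nearly orthogonal
```

Ranking other blogs by this cosine against the current blog's topic vector produces a similarity list like the one that follows.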

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96944886 1445 andrew gelman stats-2012-08-06-Slow progress


2 0.68128067 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?

Introduction: Bill Harris writes: On pp. 250-251 of BDA second edition, you write about multiple comparisons, and you write about stepwise regression on p. 405. How would you look at stepwise regression analyses in light of the multiple comparisons problem? Is there an issue? My reply: In this case I think the right approach is to keep all the coefs but partially pool them toward 0 (after suitable transformation). But then the challenge is coming up with a general way to construct good prior distributions. I’m still thinking about that one! Yet another approach is to put something together purely nonparametrically as with Bart.

3 0.67953205 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

Introduction: Ryan Seals writes: I’m an epidemiologist at Emory University, and I’m working on a project of release patterns in jails (basically trying to model how long individuals are in jail before they’re released, for purposes of designing short-term health interventions, i.e. HIV testing, drug counseling, etc…). The question lends itself to quantile regression; we’re interested in the # of days it takes for 50% and 75% of inmates to be released. But being a clustered/nested data structure, it also obviously lends itself to multilevel modeling, with the group-level being individual jails. So: do you know of any work on multilevel quantile regression? My quick lit search didn’t yield much, and I don’t see any preprogrammed way to do it in SAS. My reply: To start with, I’m putting in the R keyword here, on the hope that some readers might be able to refer you to an R function that does what you want. Beyond this, I think it should be possible to program something in Bugs. In ARM we hav

4 0.67333519 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

Introduction: Somebody emailed me: I am a researcher at ** University and I have recently read your article on average predictive comparisons for statistical models published 2007 in the journal “Sociological Methodology”. Gelman, Andrew/Iain Pardoe. 2007. “Average Predictive Comparisons for Models with Nonlinearity, Interactions, and Variance Components”. Sociological Methodology 37: 23-51. Currently I am working with multilevel models and find your approach very interesting and useful. May I ask you whether replication materials (e.g. R Code) for this article are available? I had to reply: Hi—I’m embarrassed to say that our R files are a mess! I had ideas of programming the approach more generally as an R package but this has not happened yet.

5 0.66986269 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions

Introduction: Somebody named David writes: I [David] thought you might be interested or have an opinion on the paper referenced below. I am largely skeptical on the techniques presented and thought you might have some insight because you work with datasets more similar to those in ‘social science’ than myself. Dana and Dawes. The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics (2004) vol. 29 (3) pp. 317. My reply: I read the abstract (available online) and it seemed reasonable to me. They prefer simple averages or weights based on correlations rather than regressions. From a Bayesian perspective, what they’re saying is that least-squares regression and similar methods are noisy, and they can do better via massive simplification. I’ve been a big fan of Robyn Dawes ever since reading his article in the classic Kahneman, Slovic, and Tversky volume. I have no idea how much Dawes knows about modern Bayesian

6 0.66635704 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

7 0.66532314 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

8 0.6636638 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

9 0.661955 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

10 0.66164976 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

11 0.66160077 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

12 0.65717256 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

13 0.65627557 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

14 0.6543318 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

15 0.65171218 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression

16 0.64905292 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

17 0.64890665 247 andrew gelman stats-2010-09-01-How does Bayes do it?

18 0.64518404 510 andrew gelman stats-2011-01-10-I guess they noticed that if you take the first word on every seventeenth page, it spells out “Death to the Shah”

19 0.64509374 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

20 0.64121765 848 andrew gelman stats-2011-08-11-That xkcd cartoon on multiple comparisons that all of you were sending me a couple months ago


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.017), (16, 0.029), (24, 0.228), (40, 0.103), (47, 0.023), (55, 0.024), (56, 0.019), (63, 0.014), (75, 0.022), (86, 0.03), (99, 0.38)]
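Each blog's lda output is a sparse (topicId, topicWeight) vector like the list above, and blog-to-blog similarity can then be computed as, for example, a cosine over these sparse vectors. A sketch under that assumption; the page does not document how simValue is actually derived:

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity between two {topicId: weight} sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# this blog's lda weights, taken (truncated) from the list above
this_blog = {5: 0.017, 16: 0.029, 24: 0.228, 40: 0.103, 99: 0.38}
# a blog compared with itself scores exactly 1.0, consistent with the
# near-1 simValue reported for the same-blog entry
```

Note that a cosine of a vector with itself is 1 regardless of the weights, which is why the same-blog row always tops each similarity list.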

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98018736 1445 andrew gelman stats-2012-08-06-Slow progress


2 0.9732089 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor

3 0.97123468 1671 andrew gelman stats-2013-01-13-Preregistration of Studies and Mock Reports

Introduction: The traditional system of scientific and scholarly publishing is breaking down in two different directions. On one hand, we are moving away from relying on a small set of journals as gatekeepers: the number of papers and research projects is increasing, the number of publication outlets is increasing, and important manuscripts are being posted on SSRN, Arxiv, and other nonrefereed sites. At the same time, many researchers are worried about the profusion of published claims that turn out not to replicate or, in plain language, to be false. This concern is not new–some prominent discussions include Rosenthal (1979), Ioannidis (2005), and Vul et al. (2009)–but there is a growing sense that the scientific signal is being swamped by noise. I recently had the opportunity to comment in the journal Political Analysis on two papers, one by Humphreys, Sierra, and Windt, and one by Monogan, on the preregistration of studies and mock reports. Here’s the issue of the journal. Given the hi

4 0.96832359 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili

5 0.96606731 1988 andrew gelman stats-2013-08-19-BDA3 still (I hope) at 40% off! (and a link to one of my favorite papers)

Introduction: Follow the Amazon link and check to see if it’s still on sale . P.S. I don’t make any money through this link. We do get some royalties from the book, but only a very small amount. I’m pushing the Amazon link right now because (a) I think the book is great, and I want as many people as possible to have it, and (b) 40% off is a pretty good deal and I don’t know how long this will last. P.P.S. Just so this post has some statistical content, here’s one of my favorite papers , Bayesian model-building by pure thought: some principles and examples. It’s from 1996, and here’s the abstract:

6 0.96587169 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

7 0.96518993 970 andrew gelman stats-2011-10-24-Bell Labs

8 0.96482527 2090 andrew gelman stats-2013-11-05-How much do we trust a new claim that early childhood stimulation raised earnings by 42%?

9 0.96418917 1733 andrew gelman stats-2013-02-22-Krugman sets the bar too high

10 0.96411777 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

11 0.96395791 1941 andrew gelman stats-2013-07-16-Priors

12 0.96354485 1170 andrew gelman stats-2012-02-16-A previous discussion with Charles Murray about liberals, conservatives, and social class

13 0.96317899 86 andrew gelman stats-2010-06-14-“Too much data”?

14 0.96285743 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

15 0.96270525 259 andrew gelman stats-2010-09-06-Inbox zero. Really.

16 0.9627037 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

17 0.9624998 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

18 0.96182233 408 andrew gelman stats-2010-11-11-Incumbency advantage in 2010

19 0.96172708 2283 andrew gelman stats-2014-04-06-An old discussion of food deserts

20 0.96150231 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy