
419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics


meta info for this blog

Source: html

Introduction: John Salvatier pointed me to this blog on derivative-based MCMC algorithms (also sometimes called “hybrid” or “Hamiltonian” Monte Carlo) and automatic differentiation as the future of MCMC. This all makes sense to me and is consistent both with my mathematical intuition from studying Metropolis algorithms and my experience with Matt using hybrid MCMC when fitting hierarchical spline models. In particular, I agree with Salvatier’s point about the potential for computation of analytic derivatives of the log-density function. As long as we’re mostly snapping together our models using analytically-simple pieces, the same part of the program that handles the computation of log-posterior densities should also be able to compute derivatives analytically. I’ve been a big fan of automatic derivative-based MCMC methods since I started hearing about them a couple years ago (I’m thinking of the DREAM project and of Mark Girolami’s paper), and I too wonder why they haven’t been used more. I guess we can try implementing them in our current project in which we’re trying to fit models with deep interactions.
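To make the “snapping together” point concrete, here is a minimal sketch (my own illustration, not code from the linked post) of a Hamiltonian/hybrid Monte Carlo transition in Python, where the same simple pieces that define the log-posterior also supply its analytic gradient. The standard-normal target and all function names are hypothetical.

```python
import numpy as np

# Each "analytically simple piece" supplies both its log-density and its
# analytic gradient, so the code path that evaluates the log-posterior
# can also evaluate its derivative.
def log_post(theta):
    return -0.5 * np.sum(theta ** 2)      # standard normal, up to a constant

def grad_log_post(theta):
    return -theta                         # its analytic gradient

def hmc_step(theta, eps=0.1, n_leapfrog=20, rng=None):
    """One HMC transition: momentum resample, leapfrog, Metropolis accept."""
    rng = np.random.default_rng() if rng is None else rng
    p = rng.standard_normal(theta.shape)
    theta_new, p_new = theta.copy(), p.copy()
    p_new = p_new + 0.5 * eps * grad_log_post(theta_new)   # half step, momentum
    for _ in range(n_leapfrog - 1):
        theta_new = theta_new + eps * p_new                # full step, position
        p_new = p_new + eps * grad_log_post(theta_new)     # full step, momentum
    theta_new = theta_new + eps * p_new
    p_new = p_new + 0.5 * eps * grad_log_post(theta_new)   # final half step
    # Metropolis correction for leapfrog discretization error.
    log_accept = (log_post(theta_new) - 0.5 * p_new @ p_new) - \
                 (log_post(theta) - 0.5 * p @ p)
    return theta_new if np.log(rng.uniform()) < log_accept else theta

# Usage: a short chain on a 3-dimensional target.
theta = np.zeros(3)
for _ in range(1000):
    theta = hmc_step(theta)
```

The gradient calls inside the leapfrog loop are exactly where automatic or analytic differentiation pays off: the proposal is steered toward regions of high posterior density rather than wandering at random.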


Summary: the most important sentences generated by the tfidf model (a scoring sketch follows the list)

sentIndex sentText sentNum sentScore

1 John Salvatier pointed me to this blog on derivative based MCMC algorithms (also sometimes called “hybrid” or “Hamiltonian” Monte Carlo) and automatic differentiation as the future of MCMC. [sent-1, score-0.635]

2 This all makes sense to me and is consistent both with my mathematical intuition from studying Metropolis algorithms and my experience with Matt using hybrid MCMC when fitting hierarchical spline models. [sent-2, score-1.039]

3 In particular, I agree with Salvatier’s point about the potential for computation of analytic derivatives of the log-density function. [sent-3, score-0.468]

4 As long as we’re mostly snapping together our models using analytically-simple pieces, the same part of the program that handles the computation of log-posterior densities should also be able to compute derivatives analytically. [sent-4, score-0.827]

5 I’ve been a big fan of automatic derivative-based MCMC methods since I started hearing about them a couple years ago (I’m thinking of the DREAM project and of Mark Girolami’s paper), and I too wonder why they haven’t been used more. [sent-5, score-0.548]

6 I guess we can try implementing them in our current project in which we’re trying to fit models with deep interactions. [sent-6, score-0.383]

7 I also suspect there are some underlying connections between derivative-based jumping rules and redundant parameterizations for hierarchical models . [sent-7, score-0.679]

8 Salvatier is saying what I’ve been saying (not very convincingly) for a couple years. [sent-9, score-0.256]

9 But, somehow, seeing it in somebody else’s words makes it much more persuasive, and again I’m all excited about this stuff. [sent-10, score-0.227]

10 My only amendment to Salvatier’s blog is that I wouldn’t refer to these as “new” algorithms; they’ve been around for something like 25 years, I think. [sent-11, score-0.199]
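A plausible reconstruction of the per-sentence scoring above (the pipeline’s actual code is not shown, so this is an assumption) ranks each sentence by the total tfidf weight of its terms:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "John Salvatier pointed me to this blog on derivative-based MCMC algorithms.",
    "This all makes sense to me and is consistent with my mathematical intuition.",
    "My only amendment is that I wouldn't refer to these as new algorithms.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(sentences)          # rows: sentences, columns: terms
scores = X.sum(axis=1).A1                 # sentScore: total tfidf mass per sentence
for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sent}")
```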


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('salvatier', 0.509), ('mcmc', 0.243), ('algorithms', 0.241), ('hybrid', 0.235), ('derivatives', 0.217), ('automatic', 0.184), ('computation', 0.153), ('amendment', 0.122), ('girolami', 0.122), ('handles', 0.122), ('hierarchical', 0.116), ('project', 0.115), ('parameterizations', 0.114), ('redundant', 0.111), ('spline', 0.109), ('derivative', 0.106), ('convincingly', 0.104), ('differentiation', 0.104), ('models', 0.101), ('persuasive', 0.101), ('analytic', 0.098), ('densities', 0.097), ('dream', 0.094), ('implementing', 0.094), ('metropolis', 0.094), ('jumping', 0.094), ('couple', 0.092), ('excited', 0.092), ('matt', 0.088), ('hamiltonian', 0.088), ('carlo', 0.084), ('saying', 0.082), ('hearing', 0.081), ('monte', 0.08), ('pieces', 0.078), ('connections', 0.078), ('intuition', 0.078), ('refer', 0.077), ('fan', 0.076), ('compute', 0.075), ('deep', 0.073), ('somehow', 0.071), ('makes', 0.069), ('ve', 0.069), ('somebody', 0.066), ('studying', 0.066), ('rules', 0.065), ('fitting', 0.063), ('mostly', 0.062), ('consistent', 0.062)]
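The (wordName, wordTfidf) pairs above can be reproduced in spirit with a standard vectorizer; a sketch, where the background corpus and the document string are hypothetical placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["text of other blog posts ...", "more background posts ..."]  # hypothetical
doc = "John Salvatier pointed me to this blog on derivative-based MCMC ..."

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(corpus + [doc])
weights = X[-1].toarray().ravel()             # tfidf vector for this post
terms = vec.get_feature_names_out()
top = sorted(zip(terms, weights), key=lambda t: -t[1])[:10]
print([(w, round(s, 3)) for w, s in top])     # e.g. [('salvatier', ...), ...]
```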

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics


2 0.28773969 181 andrew gelman stats-2010-08-03-MCMC in Python

Introduction: John Salvatier forwards a note from Anand Patil that a paper on PyMC has appeared in the Journal of Statistical Software. We’ll have to check this out.

3 0.20028323 575 andrew gelman stats-2011-02-15-What are the trickiest models to fit?

Introduction: John Salvatier writes: What do you and your readers think are the trickiest models to fit? If I had an algorithm that I claimed could fit many models with little fuss, what kinds of models would really impress you? I am interested in testing different MCMC sampling methods to evaluate their performance and I want to stretch the bounds of their abilities. I don’t know what’s the trickiest, but just about anything I work on in a serious way gives me some troubles. This reminds me that we should finish our Bayesian Benchmarks paper already.

4 0.16746351 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?

Introduction: We need help picking out an automatic differentiation package for Hamiltonian Monte Carlo sampling from the posterior of a generalized linear model with deep interactions. Specifically, we need to compute gradients for log probability functions with thousands of parameters that involve matrix (determinants, eigenvalues, inverses), stats (distributions), and math (log gamma) functions. Any suggestions? The Application: Hybrid Monte Carlo for Posteriors We’re getting serious about implementing posterior sampling using Hamiltonian Monte Carlo. HMC speeds up mixing by including gradient information to help guide the Metropolis proposals toward areas of high probability. In practice, the algorithm requires a handful of gradient calculations per sample, but there are many dimensions and the functions are hairy enough that we don’t want to compute derivatives by hand. Auto Diff: Perhaps not What you Think It may not have been clear to readers of this blog that automatic diffe
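For readers wondering what automatic differentiation buys here, this is a minimal forward-mode sketch using dual numbers (my own illustration, not any of the packages under consideration), which evaluates a log-probability and its exact derivative in a single pass:

```python
import math

class Dual:
    """Forward-mode autodiff in miniature: a value paired with its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def _wrap(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._wrap(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = self._wrap(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def dlog(x):
    """log with its chain rule attached: d log(x) = x' / x."""
    return Dual(math.log(x.val), x.dot / x.val)

# Derivative of a toy log-density log p(x) = -x^2/2 + log(x), at x = 1.5.
x = Dual(1.5, 1.0)                      # seed: dx/dx = 1
lp = -0.5 * x * x + dlog(x)
print(lp.val, lp.dot)                   # lp.dot equals -1.5 + 1/1.5 exactly
```

Real packages extend exactly this mechanism (or its reverse-mode cousin) to the matrix, stats, and log-gamma functions the post asks about.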

5 0.13739911 555 andrew gelman stats-2011-02-04-Handy Matrix Cheat Sheet, with Gradients

Introduction: This post is an (unpaid) advertisement for the following extremely useful resource: Petersen, K. B. and M. S. Pedersen. 2008. The Matrix Cookbook . Technical Report, Technical University of Denmark. It contains 70+ pages of useful relations and derivations involving matrices. What grabbed my eye was the computation of gradients for matrix operations ranging from eigenvalues and determinants to multivariate normal density functions. I had no idea the multivariate normal had such a clean gradient (see section 8). We’ve been playing around with Hamiltonian (aka Hybrid) Monte Carlo for sampling from the posterior of hierarchical generalized linear models with lots of interactions. HMC speeds up Metropolis sampling by using the gradient of the log probability to drive samples in the direction of higher probability density, which is particularly useful for correlated parameters that mix slowly with standard Gibbs sampling. Matt “III” Hoffman’s already got it workin
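The “clean gradient” in question is presumably the identity ∇_x log N(x; μ, Σ) = −Σ⁻¹(x − μ) from section 8 of the Cookbook; a quick numerical sanity check of that formula, as a sketch against scipy’s log-density:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
mu = rng.standard_normal(d)
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)               # a positive-definite covariance
x = rng.standard_normal(d)

analytic = -np.linalg.solve(Sigma, x - mu)    # -Sigma^{-1} (x - mu)

# Central finite differences against scipy's log-density.
eps = 1e-6
numeric = np.array([
    (multivariate_normal.logpdf(x + eps * e, mu, Sigma)
     - multivariate_normal.logpdf(x - eps * e, mu, Sigma)) / (2 * eps)
    for e in np.eye(d)
])
print(np.allclose(analytic, numeric, atol=1e-5))   # True
```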

6 0.1360935 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

7 0.12043784 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics

8 0.10592052 2156 andrew gelman stats-2014-01-01-“Though They May Be Unaware, Newlyweds Implicitly Know Whether Their Marriage Will Be Satisfying”

9 0.10501287 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

10 0.10177125 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

11 0.09511669 421 andrew gelman stats-2010-11-19-Just chaid

12 0.091294773 2067 andrew gelman stats-2013-10-18-EP and ABC

13 0.089410126 1772 andrew gelman stats-2013-03-20-Stan at Google this Thurs and at Berkeley this Fri noon

14 0.087879486 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

15 0.087143183 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

16 0.085460119 590 andrew gelman stats-2011-02-25-Good introductory book for statistical computation?

17 0.084520854 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

18 0.081207663 1399 andrew gelman stats-2012-06-28-Life imitates blog

19 0.079604894 1339 andrew gelman stats-2012-05-23-Learning Differential Geometry for Hamiltonian Monte Carlo

20 0.079171717 1749 andrew gelman stats-2013-03-04-Stan in L.A. this Wed 3:30pm


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.13), (1, 0.035), (2, -0.038), (3, 0.054), (4, 0.026), (5, 0.045), (6, 0.005), (7, -0.078), (8, 0.005), (9, 0.006), (10, -0.005), (11, -0.018), (12, -0.042), (13, -0.036), (14, 0.025), (15, 0.008), (16, -0.003), (17, 0.008), (18, -0.014), (19, -0.01), (20, 0.01), (21, -0.003), (22, -0.04), (23, 0.002), (24, 0.012), (25, -0.015), (26, -0.063), (27, 0.048), (28, 0.027), (29, -0.011), (30, -0.023), (31, 0.004), (32, 0.033), (33, -0.05), (34, 0.021), (35, -0.041), (36, -0.018), (37, 0.01), (38, -0.028), (39, 0.021), (40, -0.065), (41, 0.062), (42, 0.002), (43, -0.008), (44, -0.004), (45, -0.054), (46, -0.072), (47, 0.022), (48, 0.06), (49, -0.041)]
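For context, the topic weights above are this post’s coordinates in an LSI (truncated SVD) space; a sketch of how such coordinates and the simValue column could be computed (assumptions: scikit-learn, a hypothetical placeholder corpus, and 2 components instead of the 50 shown above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [   # hypothetical placeholder corpus
    "derivative based MCMC algorithms and automatic differentiation",
    "gradients for matrix operations and multivariate normal densities",
    "the trickiest models to fit with MCMC sampling methods",
]

X = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)   # the listing above uses 50
Z = lsi.fit_transform(X)                             # topicWeight coordinates per doc
print(cosine_similarity(Z[:1], Z).ravel())           # simValue row for the first post
```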

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95599359 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics


2 0.75959092 555 andrew gelman stats-2011-02-04-Handy Matrix Cheat Sheet, with Gradients


3 0.75281715 575 andrew gelman stats-2011-02-15-What are the trickiest models to fit?


4 0.7509591 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories

Introduction: Tomas Iesmantas had asked me for advice on a regression problem with 50 parameters, and I’d recommended Hamiltonian Monte Carlo. A few weeks later he reported back: After trying several modifications (HMC for all parameters at once, HMC just for first-level parameters, and the Riemann manifold Hamiltonian Monte Carlo method), I finally got it running with HMC just for the first-level parameters and direct sampling for the others, since the conditional distributions turned out to have closed form. However, even in this case it is quite tricky, since I had to employ a mass matrix, and not just a diagonal one, but at the beginning of the algorithm I generated it randomly (ensuring it is positive definite). Such random generation of the mass matrix is quite a blind step, but it proved to be quite helpful. Riemann manifold HMC is quite vagarious, or to be more specific, the metric of the manifold is very sensitive. In my model log-likelihood I had exponents and the values of the metric matrix elements were very large and wh
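One standard way to generate a random mass matrix that is guaranteed positive definite, possibly along the lines of what Iesmantas describes (an assumption; his exact construction isn’t given), is to draw a random square matrix A and take A Aᵀ plus a small ridge:

```python
import numpy as np

def random_mass_matrix(d, jitter=1e-3, rng=None):
    """A @ A.T is positive semidefinite for any A; the jitter makes it definite."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((d, d))
    M = A @ A.T + jitter * np.eye(d)
    np.linalg.cholesky(M)     # raises LinAlgError if M were not positive definite
    return M

M = random_mass_matrix(50)    # e.g. for the 50-parameter regression above
```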

5 0.73620075 1682 andrew gelman stats-2013-01-19-R package for Bayes factors

Introduction: Richard Morey writes: You and your blog readers may be interested to know that we’ve released a major new version of the BayesFactor package to CRAN. The package computes Bayes factors for linear mixed models and regression models. Of course, I’m aware you don’t like point-null model comparisons, but the package does more than that; it also allows sampling from posterior distributions of the compared models, in much the same way that your arm package does with lmer objects. The sampling (both for the Bayes factors and posteriors) is quite fast, since the back end is written in C. Some basic examples using the package can be found here , and the CRAN page is here . Indeed I don’t like point-null model comparisons . . . but maybe this will be useful to some of you!

6 0.73062998 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?

7 0.73045039 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!

8 0.68504614 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

9 0.68076533 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)

10 0.67784888 501 andrew gelman stats-2011-01-04-A new R package for fitting multilevel models

11 0.67360568 243 andrew gelman stats-2010-08-30-Computer models of the oil spill

12 0.67012876 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics

13 0.66969347 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

14 0.66123414 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT

15 0.64513302 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

16 0.63351703 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission

17 0.63094759 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

18 0.62089211 2332 andrew gelman stats-2014-05-12-“The results (not shown) . . .”

19 0.61992407 421 andrew gelman stats-2010-11-19-Just chaid

20 0.61572057 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(4, 0.197), (6, 0.012), (16, 0.054), (21, 0.042), (22, 0.029), (23, 0.025), (24, 0.149), (36, 0.025), (57, 0.012), (61, 0.029), (73, 0.012), (82, 0.023), (99, 0.286)]
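The (topicId, topicWeight) pairs above are the post’s LDA topic mixture; a sketch of producing such a mixture (assumptions: scikit-learn, a hypothetical placeholder corpus, and far fewer topics than the listing implies):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [   # hypothetical placeholder corpus
    "derivative based MCMC algorithms and automatic differentiation",
    "cost effectiveness estimates and spreadsheet errors",
    "R packages for fitting multilevel models",
]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(counts)        # rows: per-document topic mixtures
# (topicId, topicWeight) pairs for the first post, as in the listing above:
print([(k, round(w, 3)) for k, w in enumerate(theta[0]) if w > 0.02])
```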

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9587602 947 andrew gelman stats-2011-10-08-GiveWell sez: Cost-effectiveness of de-worming was overstated by a factor of 100 (!) due to a series of sloppy calculations

Introduction: Alexander at GiveWell writes : The Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation . . . provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. [I think they mean to say $300 -- ed.] We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of

2 0.95101309 1618 andrew gelman stats-2012-12-11-The consulting biz

Introduction: I received the following (unsolicited) email: Hello, *** LLC, a ***-based market research company, has a financial client who is interested in speaking with a statistician who has done research in the field of Alzheimer’s Disease and preferably familiar with the SOLA and BAPI trials. We offer an honorarium of $200 for a 30 minute telephone interview. Please advise us if you have an employment or consulting agreement with any organization or operate professionally pursuant to an organization’s code of conduct or employee manual that may control activities by you outside of your regular present and former employment, such as participating in this consulting project for MedPanel. If there are such contracts or other documents that do apply to you, please forward MedPanel a copy of each such document asap as we are obligated to review such documents to determine if you are permitted to participate as a consultant for MedPanel on a project with this particular client. If you are

3 0.94361174 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?

Introduction: Mike Zyphur writes: Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. $5,000 prizes. I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature and altitude of cities in Germany, and they said that altitude causes temperature). It has the feeling to me of one of those weird standardized tests we used to see sometimes in school, where there’s no real correct answer so the goal is to figure out what the test-writer wanted you to say. Nonetheless, this might be of interest, so I’m passing it along to you.

4 0.94165385 1919 andrew gelman stats-2013-06-29-R sucks

Introduction: I was trying to make some new graphs using 5-year-old R code and I got all these problems because I was reading in files with variable names such as “co.fipsid” and now R is automatically changing them to “co_fipsid”. Or maybe the names had underbars all along, and the old R had changed them into dots. Whatever. I understand that backward compatibility can be hard to maintain, but this is just annoying.

5 0.93186212 1918 andrew gelman stats-2013-06-29-Going negative

Introduction: Troels Ring writes: I have measured total phosphorus, TP, on a number of dialysis patients, and also measured conventional phosphate, Pi. Now P is exchanged with the environment as Pi, so in principle a correlation between TP and Pi could perhaps be expected. I’m really most interested in the fraction of TP which is not Pi, that is TP-Pi. I would also expect that to be positively correlated with Pi. However, looking at the data using a mixed model an insignificant negative correlation is obtained. Then I thought, that since TP-Pi is bound to be small if Pi is large a negative correlation is almost dictated by the math even if the biology would have it otherwise in so far as the TP-Pi, likely organic P, must someday have been Pi. Hence I thought about correcting the slight negative correlation between TP-Pi and Pi for the expected large negative correlation due to the math – to eventually recover what I came from: a positive correlation. People seem to agree that this thinki

6 0.92376471 238 andrew gelman stats-2010-08-27-No radon lobby

same-blog 7 0.92156708 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics

8 0.91908681 907 andrew gelman stats-2011-09-14-Reproducibility in Practice

9 0.9085443 113 andrew gelman stats-2010-06-28-Advocacy in the form of a “deliberative forum”

10 0.90321267 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

11 0.89921439 1829 andrew gelman stats-2013-04-28-Plain old everyday Bayesianism!

12 0.89224732 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff

13 0.88886887 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin

14 0.8860966 2078 andrew gelman stats-2013-10-26-“The Bayesian approach to forensic evidence”

15 0.88475955 2000 andrew gelman stats-2013-08-28-Why during the 1950-1960′s did Jerry Cornfield become a Bayesian?

16 0.8758561 1997 andrew gelman stats-2013-08-24-Measurement error in monkey studies

17 0.86854303 1605 andrew gelman stats-2012-12-04-Write This Book

18 0.86763597 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?

19 0.86444569 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population

20 0.85586107 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?