andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-2020 knowledge-graph by maker-knowledge-mining

2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT


meta info for this blog

Source: html

Introduction: Over the past few months, we’ve talked about modeling with particle physicists ( Allen Caldwell ), astrophysicists ( David Hogg , who regularly comments here), and climate and energy usage modelers ( Phil Price , who regularly posts here). Big Science Black Boxes: We’ve gotten pretty much the same story from all of them: their models involve “big science” components that are hugely complex and provided by outside implementations from labs like CERN or LBL. Some concrete examples for energy modeling are the TOUGH2 thermal simulator, the EnergyPlus building energy usage simulator, and global climate model (GCM) implementations. These models have the character of not only being black boxes, but taking several seconds or more to generate the equivalent of a likelihood function evaluation. So we can’t use something like Stan, because nobody has the person years required to implement something like TOUGH2 in Stan (and Stan doesn’t have the debugging or modularity tools to support large-scale development anyway). …


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Over the past few months, we’ve talked about modeling with particle physicists ( Allen Caldwell ), astrophysicists ( David Hogg , who regularly comments here), and climate and energy usage modelers ( Phil Price , who regularly posts here). [sent-1, score-0.736]

2 Big Science Black Boxes: We’ve gotten pretty much the same story from all of them: their models involve “big science” components that are hugely complex and provided by outside implementations from labs like CERN or LBL. [sent-2, score-0.07]

3 Some concrete examples for energy modeling are the TOUGH2 thermal simulator, the EnergyPlus building energy usage simulator, and global climate model (GCM) implementations. [sent-3, score-0.653]

4 These models have the character of not only being black boxes, but taking several seconds or more to generate the equivalent of a likelihood function evaluation. [sent-4, score-0.212]

5 So we can’t use something like Stan, because nobody has the person years required to implement something like TOUGH2 in Stan (and Stan doesn’t have the debugging or modularity tools to support large-scale development anyway). [sent-5, score-0.321]

6 Even just templating out the code so that we could use automatic differentiation on it would be a huge challenge. [sent-6, score-0.14]

7 Sampling and Optimization: Not surprisingly, these researchers tend to focus on methods that can be implemented using only black box log probability evaluators. [sent-7, score-0.237]
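
To make “black box log probability evaluator” concrete, here is a minimal Python sketch (mine, not from the post) of the usual wrapper pattern: the expensive simulator is treated as an external executable whose name, command line, and output format are hypothetical stand-ins for something like TOUGH2 or EnergyPlus, and results are cached because each call can take seconds. Samplers and optimizers downstream only ever see log_prob(theta).

```python
import functools
import subprocess

import numpy as np


@functools.lru_cache(maxsize=None)
def _run_simulator(theta_key):
    """Run the external 'big science' code once per parameter vector.

    The executable name, CLI, and output parsing are hypothetical stand-ins
    for a TOUGH2- or EnergyPlus-style black box; each call may take seconds,
    so results are memoized on the (hashable) parameter tuple.
    """
    out = subprocess.run(
        ["./big_simulator", *map(str, theta_key)],  # hypothetical CLI
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())  # assume it prints a log likelihood


def log_prob(theta):
    """Black-box log posterior: cached simulator call plus a simple prior."""
    theta = np.asarray(theta, dtype=float)
    log_lik = _run_simulator(tuple(theta))
    log_prior = -0.5 * np.sum(theta ** 2)  # e.g. independent standard normal priors
    return log_lik + log_prior
```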

8 Phil told us that he and his colleagues at LBL use optimizers that are based on ensemble techniques like the Nelder-Mead method. [sent-8, score-0.419]
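
The LBL optimizers themselves aren’t shown in the post; as a generic illustration of the same idea, here is a short sketch using SciPy’s derivative-free Nelder-Mead method on a toy stand-in for the black-box log posterior (the toy target, starting point, and dimensionality are my inventions):

```python
import numpy as np
from scipy.optimize import minimize


def log_prob(theta):
    # Toy stand-in for an expensive black-box log posterior (mode at [1, 1, 1]);
    # only function values are ever needed, never gradients.
    return -0.5 * np.sum((np.asarray(theta) - 1.0) ** 2)


# Nelder-Mead maximizes log_prob by minimizing its negative.
result = minimize(lambda t: -log_prob(t), x0=np.zeros(3), method="Nelder-Mead")
print("mode found near:", result.x)  # should be close to [1, 1, 1]
```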

9 Bayesian Analysis Toolkit (BAT): Allen Caldwell and his group (Daniel Kollar and Kevin Kröninger are listed as the core developers) are behind the Bayesian Analysis Toolkit (BAT), which is based on Metropolis. [sent-10, score-0.086]

10 BAT requires users to implement a class in C++ for the model, but that class can call any external libraries it wants. [sent-11, score-0.402]

11 emcee, the MCMC Hammer: David Hogg and his group (Daniel Foreman-Mackey seems to be doing the heavy code lifting) are behind emcee, aka “the MCMC Hammer,” which is based on Goodman and Weare’s ensemble sampler, which was motivated by the Nelder-Mead method for optimization. [sent-12, score-0.934]

12 emcee requires users to implement a log probability function in Python, which can then call C or C++ or Fortran on the back end. [sent-13, score-0.882]
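
For readers who haven’t used it, here is a minimal sketch along those lines, using the current (v3) emcee interface, which postdates this post; the Gaussian toy target, walker count, and chain length are my choices, and in real use log_prob would wrap the slow external code.

```python
import numpy as np
import emcee


def log_prob(theta):
    # Toy stand-in for the black-box log posterior; in practice this
    # function could call out to C, C++, or Fortran on the back end.
    return -0.5 * np.sum(theta ** 2)


ndim, nwalkers = 3, 32                      # an ensemble of 32 walkers
p0 = 0.1 * np.random.randn(nwalkers, ndim)  # small ball of starting points

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000)

samples = sampler.get_chain(discard=500, flat=True)  # drop warmup, pool walkers
print(samples.mean(axis=0), samples.std(axis=0))
```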

13 We plan to add the Goodman and Weare ensemble method to Stan. [sent-14, score-0.334]

14 We’re still working on how to integrate it into our sampling framework. [sent-15, score-0.143]
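
For reference, the Goodman and Weare move that emcee uses (and that the post says is planned for Stan) is simple enough to sketch. The following is a didactic, serial paraphrase of the published stretch move, not emcee or Stan code; the stretch parameter a = 2 is the conventional default.

```python
import numpy as np


def stretch_move_sweep(walkers, log_prob, a=2.0, rng=None):
    """One sweep of the Goodman & Weare (2010) stretch move over an ensemble.

    walkers: array of shape (n_walkers, ndim); log_prob: black-box log density.
    Didactic and serial; real implementations update half-ensembles in parallel
    and cache the current log densities instead of recomputing them.
    """
    rng = np.random.default_rng() if rng is None else rng
    walkers = walkers.copy()
    n_walkers, ndim = walkers.shape
    for k in range(n_walkers):
        j = rng.integers(n_walkers - 1)
        if j >= k:                      # pick a complementary walker j != k
            j += 1
        z = (1.0 + (a - 1.0) * rng.random()) ** 2 / a  # z ~ g(z), g proportional to 1/sqrt(z) on [1/a, a]
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        log_accept = (ndim - 1) * np.log(z) + log_prob(proposal) - log_prob(walkers[k])
        if np.log(rng.random()) < log_accept:
            walkers[k] = proposal
    return walkers
```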

15 Perhaps we should’ve heeded Hadley Wickham’s advice, which he followed in naming “ggplot2”: pick something with zero existing hits on Google. [sent-19, score-0.07]

16 Of course, “bugs” takes the cake in this competition. [sent-21, score-0.085]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('emcee', 0.374), ('ensemble', 0.241), ('bat', 0.231), ('caldwell', 0.187), ('toolkit', 0.187), ('weare', 0.187), ('hammer', 0.17), ('simulator', 0.17), ('implement', 0.159), ('goodman', 0.148), ('stan', 0.148), ('energy', 0.142), ('black', 0.137), ('boxes', 0.134), ('usage', 0.127), ('allen', 0.125), ('hogg', 0.122), ('regularly', 0.114), ('mcmc', 0.102), ('log', 0.1), ('climate', 0.1), ('daniel', 0.094), ('method', 0.093), ('phil', 0.092), ('users', 0.092), ('behind', 0.086), ('optimizers', 0.085), ('cake', 0.085), ('modularity', 0.085), ('requires', 0.082), ('ensembles', 0.08), ('debugging', 0.077), ('cern', 0.077), ('function', 0.075), ('code', 0.074), ('sampling', 0.074), ('modelers', 0.072), ('thermal', 0.072), ('lifting', 0.072), ('fortran', 0.072), ('naming', 0.07), ('concrete', 0.07), ('implementations', 0.07), ('libraries', 0.069), ('integrate', 0.069), ('particle', 0.067), ('python', 0.067), ('wickham', 0.066), ('differentiation', 0.066), ('aka', 0.066)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT


2 0.14648104 1748 andrew gelman stats-2013-03-04-PyStan!

Introduction: Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely free and open-source. P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.

3 0.13637711 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

Introduction: Kevin Cartier writes: I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect). My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort

4 0.12553221 1475 andrew gelman stats-2012-08-30-A Stan is Born

Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.

5 0.11301106 1950 andrew gelman stats-2013-07-22-My talks that were scheduled for Tues at the Data Skeptics meetup and Wed at the Open Statistical Programming meetup

Introduction: Statistical Methods and Data Skepticism Data analysis today is dominated by three paradigms: null hypothesis significance testing, Bayesian inference, and exploratory data analysis. There is concern that all these methods lead to overconfidence on the part of researchers and the general public, and this concern has led to the new “data skepticism” movement. But the history of statistics is already in some sense a history of data skepticism. Concepts of bias, variance, sampling and measurement error, least-squares regression, and statistical significance can all be viewed as formalizations of data skepticism. All these methods address the concern that patterns in observed data might not generalize to the population of interest. We discuss the challenge of attaining data skepticism while avoiding data nihilism, and consider some proposed future directions. Stan Stan (mc-stan.org) is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a

6 0.10713184 1580 andrew gelman stats-2012-11-16-Stantastic!

7 0.10230149 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?

8 0.10122589 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions

9 0.099910483 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

10 0.099078365 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

11 0.097141236 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list

12 0.09230949 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

13 0.09177161 1010 andrew gelman stats-2011-11-14-“Free energy” and economic resources

14 0.089807577 1948 andrew gelman stats-2013-07-21-Bayes related

15 0.086955987 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

16 0.084933154 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct

17 0.083513714 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

18 0.082897738 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

19 0.082853086 2161 andrew gelman stats-2014-01-07-My recent debugging experience

20 0.078790002 2325 andrew gelman stats-2014-05-07-Stan users meetup next week


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.139), (1, 0.068), (2, -0.031), (3, 0.056), (4, 0.058), (5, 0.07), (6, -0.003), (7, -0.113), (8, -0.024), (9, -0.076), (10, -0.069), (11, -0.022), (12, -0.051), (13, -0.02), (14, 0.012), (15, -0.008), (16, 0.02), (17, 0.014), (18, -0.002), (19, -0.016), (20, -0.016), (21, -0.0), (22, -0.053), (23, 0.014), (24, -0.002), (25, -0.005), (26, -0.012), (27, 0.009), (28, 0.022), (29, -0.004), (30, 0.021), (31, 0.027), (32, -0.002), (33, -0.025), (34, -0.037), (35, 0.001), (36, -0.001), (37, 0.024), (38, -0.02), (39, -0.018), (40, -0.031), (41, 0.027), (42, -0.038), (43, -0.012), (44, -0.003), (45, -0.007), (46, -0.012), (47, 0.017), (48, -0.018), (49, 0.014)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.962524 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT


2 0.88394547 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!

Introduction: We interrupt our usual program of Ed Wegman Gregg Easterbrook Niall Ferguson mockery to deliver a serious update on our statistical computing project. Stan (“Sampling Through Adaptive Neighborhoods”) is our new C++ program (written mostly by Bob Carpenter) that draws samples from Bayesian models. Stan can take different sorts of inputs: you can write the model in a Bugs-like syntax and it goes from there, or you can write the log-posterior directly as a C++ function. Most of the computation is done using Hamiltonian Monte Carlo. HMC requires some tuning, so Matt Hoffman up and wrote a new algorithm, Nuts (the “No-U-Turn Sampler”) which optimizes HMC adaptively. In many settings, Nuts is actually more computationally efficient than the optimal static HMC! When the the Nuts paper appeared on Arxiv, Christian Robert noticed it and had some reactions . In response to Xian’s comments, Matt writes: Christian writes: I wonder about the computing time (and the “una

3 0.84366709 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

Introduction: Hamiltonian Monte Carlo (HMC), as used by Stan , is only defined for continuous parameters. We’d love to be able to do discrete sampling. So I was excited when I saw this: Yichuan Zhang, Charles Sutton, Amos J Storkey, and Zoubin Ghahramani. 2012. Continuous Relaxations for Discrete Hamiltonian Monte Carlo . NIPS 25. Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference a

4 0.84353 1580 andrew gelman stats-2012-11-16-Stantastic!

Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This a “hurdle” model, in which a bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a

5 0.82306033 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

Introduction: Kevin Cartier writes: I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect). My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort

6 0.81764758 2161 andrew gelman stats-2014-01-07-My recent debugging experience

7 0.81529528 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

8 0.80006289 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

9 0.79438198 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0

10 0.76854539 1710 andrew gelman stats-2013-02-06-The new Stan 1.1.1, featuring Gaussian processes!

11 0.75651556 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing

12 0.75632793 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

13 0.75364363 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

14 0.75233293 1475 andrew gelman stats-2012-08-30-A Stan is Born

15 0.74659306 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list

16 0.74101764 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore

17 0.72525764 1748 andrew gelman stats-2013-03-04-PyStan!

18 0.72217637 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct

19 0.71842432 1855 andrew gelman stats-2013-05-13-Stan!

20 0.71365678 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.054), (16, 0.055), (17, 0.019), (24, 0.077), (27, 0.037), (35, 0.016), (50, 0.012), (58, 0.029), (73, 0.157), (78, 0.016), (82, 0.013), (84, 0.046), (86, 0.049), (90, 0.037), (91, 0.024), (97, 0.034), (99, 0.243)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93388408 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT


2 0.92640817 655 andrew gelman stats-2011-04-10-“Versatile, affordable chicken has grown in popularity”

Introduction: Awhile ago I was cleaning out the closet and found some old unread magazines. Good stuff. As we’ve discussed before , lots of things are better read a few years late. Today I was reading the 18 Nov 2004 issue of the London Review of Books, which contained (among other things) the following: - A review by Jenny Diski of a biography of Stanley Milgram. Diski appears to want to debunk: Milgram was a whiz at devising sexy experiments, but barely interested in any theoretical basis for them. They all have the same instant attractiveness of style, and then an underlying emptiness. Huh? Michael Jordan couldn’t hit the curveball and he was reportedly an easy mark for golf hustlers but that doesn’t diminish his greatness on the basketball court. She also criticizes Milgram for being “no help at all” for solving international disputes. OK, fine. I haven’t solved any international disputes either. Milgram, though, . . . he conducted an imaginative experiment whose results stu

3 0.92616487 1925 andrew gelman stats-2013-07-04-“Versatile, affordable chicken has grown in popularity”

Introduction: From two years ago : Awhile ago I was cleaning out the closet and found some old unread magazines. Good stuff. As we’ve discussed before , lots of things are better read a few years late. Today I was reading the 18 Nov 2004 issue of the London Review of Books, which contained (among other things) the following: - A review by Jenny Diski of a biography of Stanley Milgram. Diski appears to want to debunk: Milgram was a whiz at devising sexy experiments, but barely interested in any theoretical basis for them. They all have the same instant attractiveness of style, and then an underlying emptiness. Huh? Michael Jordan couldn’t hit the curveball and he was reportedly an easy mark for golf hustlers but that doesn’t diminish his greatness on the basketball court. She also criticizes Milgram for being “no help at all” for solving international disputes. OK, fine. I haven’t solved any international disputes either. Milgram, though, . . . he conducted an imaginative exp

4 0.91561329 794 andrew gelman stats-2011-07-09-The quest for the holy graph

Introduction: Eytan Adar writes: I was just going through the latest draft of your paper with Anthony Unwin . I heard part of it at the talk you gave (remotely) here at UMich. I’m curious about your discussion of the Baby Name Voyager . The tool in itself is simple, attractive, and useful. No argument from me there. It’s an awesome demonstration of how subtle interactions can be very helpful (click and it zooms, type and it filters… falls perfectly into the Shneiderman visualization mantra). It satisfies a very common use case: finding appropriate names for children. That said, I can’t help but feeling that what you are really excited about is the very static analysis on last letters (you spend most of your time on this). This analysis, incidentally, is not possible to infer from the interactive application (which doesn’t support this type of filtering and pivoting). In a sense, the two visualizations don’t have anything to do with each other (other than a shared context/dataset).

5 0.90762776 2238 andrew gelman stats-2014-03-09-Hipmunk worked

Introduction: In the past I’ve categorized Hipmunk as a really cool flight-finder that doesn’t actually work , as worse than Expedia , and as graphics without content . So, I thought it would be only fair to tell you that I bought a flight the other day using Hipmunk and it gave me the same flight as Expedia but at a lower cost (by linking to something called CheapOair, which I hope is legit). So score one for Hipmunk.

6 0.90098214 497 andrew gelman stats-2011-01-02-Hipmunk update

7 0.89704818 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?

8 0.89639604 1099 andrew gelman stats-2012-01-05-Approaching harmonic convergence

9 0.89599741 1748 andrew gelman stats-2013-03-04-PyStan!

10 0.88132155 917 andrew gelman stats-2011-09-20-Last post on Hipmunk

11 0.87463498 161 andrew gelman stats-2010-07-24-Differences in color perception by sex, also the Bechdel test for women in movies

12 0.87150085 280 andrew gelman stats-2010-09-16-Meet Hipmunk, a really cool flight-finder that doesn’t actually work

13 0.871351 1511 andrew gelman stats-2012-09-26-What do statistical p-values mean when the sample = the population?

14 0.85811591 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly

15 0.8487885 496 andrew gelman stats-2011-01-01-Tukey’s philosophy

16 0.84541893 641 andrew gelman stats-2011-04-01-So many topics, so little time

17 0.84187025 1846 andrew gelman stats-2013-05-07-Like Casper the ghost, Niall Ferguson is not only white. He is also very, very adorable.

18 0.84039837 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

19 0.83593273 2325 andrew gelman stats-2014-05-07-Stan users meetup next week

20 0.83475554 573 andrew gelman stats-2011-02-14-Hipmunk < Expedia, again