andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1036 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: We interrupt our usual program of Ed Wegman Gregg Easterbrook Niall Ferguson mockery to deliver a serious update on our statistical computing project. Stan (“Sampling Through Adaptive Neighborhoods”) is our new C++ program (written mostly by Bob Carpenter) that draws samples from Bayesian models. Stan can take different sorts of inputs: you can write the model in a Bugs-like syntax and it goes from there, or you can write the log-posterior directly as a C++ function. Most of the computation is done using Hamiltonian Monte Carlo. HMC requires some tuning, so Matt Hoffman up and wrote a new algorithm, Nuts (the “No-U-Turn Sampler”), which optimizes HMC adaptively. In many settings, Nuts is actually more computationally efficient than the optimal static HMC! When the Nuts paper appeared on arXiv, Christian Robert noticed it and had some reactions. In response to Xian’s comments, Matt writes: Christian writes: I wonder about the computing time (and the “una
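To make "write the log-posterior directly" and "most of the computation is Hamiltonian Monte Carlo" concrete, here is a minimal Python sketch of one static-HMC transition for a user-supplied log-posterior. This is illustrative only: the toy log-posterior, the function names, and the step-size/step-count defaults are made up for the example and are not Stan's actual interface.

```python
import numpy as np

def log_post(theta):
    # Toy log-posterior: independent standard normals (stand-in for a real model).
    return -0.5 * np.sum(theta ** 2)

def grad_log_post(theta):
    # Gradient of the toy log-posterior above.
    return -theta

def leapfrog(theta, r, eps):
    """One leapfrog step: the basic move that both HMC and NUTS are built on."""
    r = r + 0.5 * eps * grad_log_post(theta)   # half step on the momentum
    theta = theta + eps * r                    # full step on the position
    r = r + 0.5 * eps * grad_log_post(theta)   # half step on the momentum
    return theta, r

def hmc_transition(theta, eps=0.1, n_steps=20, rng=None):
    """One static-HMC transition with a fixed number of leapfrog steps;
    NUTS replaces the fixed n_steps with adaptive doubling plus a U-turn check."""
    rng = np.random.default_rng() if rng is None else rng
    r0 = rng.standard_normal(theta.shape)
    theta_new, r = theta, r0
    for _ in range(n_steps):
        theta_new, r = leapfrog(theta_new, r, eps)
    # Metropolis accept/reject on the joint (position, momentum) energy.
    log_accept = (log_post(theta_new) - 0.5 * r @ r) - (log_post(theta) - 0.5 * r0 @ r0)
    return theta_new if np.log(rng.uniform()) < log_accept else theta
```

The thing to notice is that essentially all of the work is in grad_log_post, one evaluation per leapfrog step, which is why the discussion below counts costs in gradient evaluations.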
sentIndex sentText sentNum sentScore
1 We interrupt our usual program of Ed Wegman Gregg Easterbrook Niall Ferguson mockery to deliver a serious update on our statistical computing project. [sent-1, score-0.289]
2 Most of the computation is done using Hamiltonian Monte Carlo. [sent-4, score-0.077]
3 In many settings, Nuts is actually more computationally efficient than the optimal static HMC! [sent-6, score-0.13]
4 In response to Xian’s comments, Matt writes: Christian writes: I wonder about the computing time (and the “unacceptably large amount of memory”, p. [sent-8, score-0.172]
5 12) required by the doubling procedure: 2^j is growing fast with j! [sent-9, score-0.188]
6 (If my intuition is right, the computing time should increase rather quickly with the dimension. [sent-10, score-0.243]
7 And I do not get the argument within the paper that the costly part is the gradient computation: it seems to me the gradient must be computed for all of the 2^j points. [sent-11, score-0.414]
8 ) 2^j does grow quickly with j, but so does the length of the trajectory, and it’s impossible to run a Hamiltonian trajectory for a seriously long time without making a U-turn and stopping the doubling process [a minimal sketch of this stopping rule appears just after this sentence list --ed.]. [sent-12, score-0.867]
9 (Just like it’s impossible to throw a ball out of an infinitely deep pit with a finite amount of energy.) [sent-13, score-0.198]
10 As far as memory goes, the “naive” implementation (algorithm 2) has to store all O(2^j) states it visits, but the more sophisticated implementation (algorithm 3) only needs to store O(j) states. [sent-15, score-0.63]
11 Finally, the gradient computations dominate precisely because we must compute 2^j of them—NUTS introduces O(2^j j) non-gradient overhead, which is usually trivial compared to O(2^j) gradient computations. [sent-16, score-0.494]
12 These costs scale linearly with dimension, like HMC’s costs. [sent-18, score-0.128]
13 Trajectory lengths will generally increase faster than linearly with dimension. [sent-19, score-0.313]
14 But the optimal trajectory length for HMC does as well, and in high dimensions HMC is pretty much the best we’ve got. [sent-22, score-0.553]
15 (Unless you can exploit problem structure in some really really clever ways [or for some specific models with lots of independence structure--ed.]) [sent-23, score-0.129]
16 There’s also a C++ implementation in Stan, but it’s not (yet) well documented. [sent-27, score-0.122]
17 I guess I [Matt] should also write Python and R implementations, although my R expertise doesn’t extend very far beyond what’s needed to use ggplot2. [sent-28, score-0.173]
18 And in any case, Stan will have the fast C++ version. [sent-30, score-0.073]
19 You’ll be able to run it from R just like you can run Bugs or Jags. [sent-35, score-0.228]
20 Compared to Bugs and Jags, Stan should be faster (especially for big models) and should be able to fit a wider range of models (for example, varying-intercept, varying-slope multilevel models with multiple coefficients per group). [sent-36, score-0.334]
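To make the doubling and U-turn discussion above (sentences 8-11) more concrete, here is a heavily simplified Python sketch of the stopping rule and the doubling loop. It is a sketch under stated assumptions, not Matt's actual sampler: it reuses a leapfrog step like the one sketched earlier, it stores every visited state the way the paper's "naive" algorithm 2 does, and it omits the slice-sampling/weighting machinery a real NUTS transition needs (which algorithm 3 handles with only O(j) memory).

```python
import numpy as np

def no_u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """NUTS-style stopping criterion: keep doubling only while both ends of
    the trajectory are still moving apart."""
    dtheta = theta_plus - theta_minus
    return (dtheta @ r_minus > 0) and (dtheta @ r_plus > 0)

def grow_trajectory(theta0, r0, eps, leapfrog, max_doublings=10, rng=None):
    """Schematic doubling loop in the spirit of the paper's 'naive' algorithm 2:
    it keeps the whole trajectory, so memory is O(2^j); algorithm 3 gets this
    down to O(j) by sampling on the fly instead of storing every state."""
    rng = np.random.default_rng() if rng is None else rng
    theta_minus = theta_plus = theta0
    r_minus = r_plus = r0
    visited = [theta0]
    for j in range(max_doublings):
        # Each doubling runs 2^j more leapfrog steps, so gradient evaluations
        # (the dominant cost) total O(2^j) over the whole trajectory.
        direction = 1 if rng.uniform() < 0.5 else -1
        for _ in range(2 ** j):
            if direction == 1:
                theta_plus, r_plus = leapfrog(theta_plus, r_plus, eps)
                visited.append(theta_plus)
            else:
                theta_minus, r_minus = leapfrog(theta_minus, r_minus, -eps)
                visited.append(theta_minus)
        if not no_u_turn(theta_minus, theta_plus, r_minus, r_plus):
            break
    # A real NUTS transition would now draw one state from the trajectory with
    # the correct weights; here we just return everything that was visited.
    return visited
```

The memory point in sentence 10 is the `visited` list, which grows like O(2^j); the cost point in sentence 11 is the inner loop, where each doubling adds 2^j more leapfrog steps and hence 2^j more gradient evaluations, with only cheap bookkeeping on top.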
wordName wordTfidf (topN-words)
[('hmc', 0.337), ('trajectory', 0.297), ('nuts', 0.282), ('matt', 0.223), ('gradient', 0.207), ('overhead', 0.184), ('stan', 0.178), ('memory', 0.134), ('optimal', 0.13), ('linearly', 0.128), ('faster', 0.126), ('length', 0.126), ('christian', 0.124), ('implementation', 0.122), ('doubling', 0.115), ('matlab', 0.112), ('computing', 0.11), ('algorithm', 0.108), ('python', 0.107), ('xian', 0.102), ('grow', 0.1), ('store', 0.093), ('hamiltonian', 0.089), ('dimension', 0.085), ('bugs', 0.082), ('compared', 0.08), ('run', 0.078), ('impossible', 0.077), ('computation', 0.077), ('bob', 0.075), ('quickly', 0.074), ('fast', 0.073), ('able', 0.072), ('models', 0.068), ('optimizes', 0.068), ('far', 0.066), ('unacceptably', 0.064), ('amount', 0.062), ('exploit', 0.061), ('trajectories', 0.061), ('interrupt', 0.061), ('negligible', 0.061), ('mockery', 0.059), ('infinitely', 0.059), ('program', 0.059), ('increase', 0.059), ('visits', 0.058), ('proprietary', 0.058), ('syntax', 0.058), ('carpenter', 0.058)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
2 0.23439893 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories
Introduction: Tomas Iesmantas had asked me for advice on a regression problem with 50 parameters, and I’d recommended Hamiltonian Monte Carlo. A few weeks later he reported back: After trying several modifications (HMC for all parameters at once, HMC just for first-level parameters, and the Riemann manifold Hamiltonian Monte Carlo method), I finally got it running with HMC just for the first-level parameters and direct sampling for the others, since their conditional distributions turned out to have closed form. However, even in this case it is quite tricky, since I had to employ a mass matrix, and not just a diagonal one; at the beginning of the algorithm I generated it randomly (ensuring it is positive definite). Such random generation of the mass matrix is quite a blind step, but it proved to be quite helpful. Riemann manifold HMC is quite vagarious, or to be more specific, the metric of the manifold is very sensitive. In my model log-likelihood I had exponents and the values of the metric matrix elements were very large and wh
3 0.22002137 1748 andrew gelman stats-2013-03-04-PyStan!
Introduction: Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely free and open-source. P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.
4 0.18656103 1772 andrew gelman stats-2013-03-20-Stan at Google this Thurs and at Berkeley this Fri noon
Introduction: Michael Betancourt will be speaking at Google and at the University of California, Berkeley. The Google talk is closed to outsiders (but if you work at Google, you should go!); the Berkeley talk is open to all: Friday March 22, 12:10 pm, Evans Hall 1011. Title of talk: Stan : Practical Bayesian Inference with Hamiltonian Monte Carlo Abstract: Practical implementations of Bayesian inference are often limited to approximation methods that only slowly explore the posterior distribution. By taking advantage of the curvature of the posterior, however, Hamiltonian Monte Carlo (HMC) efficiently explores even the most highly contorted distributions. In this talk I will review the foundations of and recent developments within HMC, concluding with a discussion of Stan, a powerful inference engine that utilizes HMC, automatic differentiation, and adaptive methods to minimize user input. This is cool stuff. And he’ll be showing the whirlpool movie!
5 0.17835897 1809 andrew gelman stats-2013-04-17-NUTS discussed on Xi’an’s Og
Introduction: Xi’an’s Og (aka Christian Robert’s blog) is featuring a very nice presentation of NUTS by Marco Banterle, with discussion and some suggestions. I’m not even sure how they found Michael Betancourt’s paper on geometric NUTS — I don’t see it on the arXiv yet, or I’d provide a link.
6 0.16490771 1749 andrew gelman stats-2013-03-04-Stan in L.A. this Wed 3:30pm
7 0.15606688 2114 andrew gelman stats-2013-11-26-“Please make fun of this claim”
8 0.15520316 1475 andrew gelman stats-2012-08-30-A Stan is Born
9 0.14720187 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0
10 0.14078894 1580 andrew gelman stats-2012-11-16-Stantastic!
11 0.14002292 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
12 0.13850656 2012 andrew gelman stats-2013-09-07-Job openings at American University
13 0.13629042 1339 andrew gelman stats-2012-05-23-Learning Differential Geometry for Hamiltonian Monte Carlo
14 0.13513473 555 andrew gelman stats-2011-02-04-Handy Matrix Cheat Sheet, with Gradients
16 0.12606506 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?
17 0.12318274 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs
18 0.12243891 1710 andrew gelman stats-2013-02-06-The new Stan 1.1.1, featuring Gaussian processes!
19 0.12099776 2161 andrew gelman stats-2014-01-07-My recent debugging experience
20 0.11732467 2291 andrew gelman stats-2014-04-14-Transitioning to Stan
topicId topicWeight
[(0, 0.151), (1, 0.048), (2, -0.048), (3, 0.064), (4, 0.074), (5, 0.084), (6, 0.027), (7, -0.196), (8, -0.07), (9, -0.089), (10, -0.105), (11, -0.028), (12, -0.092), (13, -0.02), (14, 0.061), (15, -0.05), (16, -0.033), (17, 0.012), (18, -0.025), (19, -0.014), (20, -0.028), (21, 0.035), (22, -0.043), (23, -0.004), (24, 0.027), (25, 0.011), (26, -0.021), (27, 0.045), (28, -0.02), (29, 0.016), (30, 0.021), (31, 0.03), (32, 0.007), (33, -0.019), (34, -0.009), (35, 0.002), (36, 0.028), (37, -0.023), (38, -0.013), (39, 0.006), (40, -0.021), (41, 0.008), (42, -0.053), (43, -0.025), (44, 0.022), (45, -0.074), (46, -0.047), (47, 0.007), (48, 0.007), (49, -0.052)]
simIndex simValue blogId blogTitle
same-blog 1 0.9623087 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
2 0.86931676 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0
Introduction: We’re happy to announce the release of Stan C++, CmdStan, RStan, and PyStan 2.1.0. This is a minor feature release, but it is also an important bug fix release. As always, the place to start is the (all new) Stan web pages: http://mc-stan.org Major Bug in 2.0.0, 2.0.1 Stan 2.0.0 and Stan 2.0.1 introduced a bug in the implementation of the NUTS criterion that led to poor tail exploration and thus biased the posterior uncertainty downward. There was no bug in NUTS in Stan 1.3 or earlier, and 2.1 has been extensively tested and tests put in place so this problem will not recur. If you are using Stan 2.0.0 or 2.0.1, you should switch to 2.1.0 as soon as possible and rerun any models you care about. New Target Acceptance Rate Default for Stan 2.1.0 Another big change aimed at reducing posterior estimation bias was an increase in the target acceptance rate during adaptation from 0.65 to 0.80. The bad news is that iterations will take around 50% longer
3 0.81604844 1475 andrew gelman stats-2012-08-30-A Stan is Born
Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.
4 0.80751121 1580 andrew gelman stats-2012-11-16-Stantastic!
Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This is a “hurdle” model, in which a Bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the Bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh, 2009, Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a
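For readers who want to see the basic shape of the model McElreath describes, here is a bare-bones Python sketch of a gamma hurdle log-likelihood. The parameter names are illustrative and scalar; the hunter-level varying effects, link functions, and imputation in the actual analysis are all omitted.

```python
import numpy as np
from scipy import stats

def hurdle_gamma_loglik(y, p_nonzero, shape, rate):
    """Log-likelihood of a minimal gamma hurdle model: a Bernoulli part decides
    zero vs. nonzero, and a gamma density handles the nonzero values."""
    y = np.asarray(y, dtype=float)
    is_zero = (y == 0)
    y_safe = np.where(is_zero, 1.0, y)  # placeholder so logpdf is evaluated safely
    ll = np.where(
        is_zero,
        np.log1p(-p_nonzero),                                             # P(zero)
        np.log(p_nonzero) + stats.gamma.logpdf(y_safe, a=shape, scale=1.0 / rate),
    )
    return ll.sum()
```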
5 0.80203974 2161 andrew gelman stats-2014-01-07-My recent debugging experience
Introduction: OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out—or if it doesn’t—I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear regression with normal priors and known data variance, because then the exact solution is Gaussian and I can also work with the problem analytically. So I programmed up the algorithm and, no surprise, it didn’t work. I went through my R code, put in print statements here and there, and cleared out bug after bug until at least it stopped crashing. But the algorithm still wasn’t doing what it was supposed to do. So I decided to do something simpler, and just check that the Stan linear regression gave the same answer as the analytic posterior distribution: I ran Stan for tons of iterations, then computed the sample mean and variance of the simulations. It was an example with two coefficients—I’d originally cho
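The check described in that post relies on the fact that, for linear regression with normal priors and known data variance, the exact posterior is Gaussian and available in closed form. A short sketch of that closed form (standard conjugate algebra; the function name is illustrative, not code from the post):

```python
import numpy as np

def gaussian_posterior(X, y, sigma, prior_mean, prior_cov):
    """Exact posterior for beta in y ~ N(X beta, sigma^2 I) with
    beta ~ N(prior_mean, prior_cov)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + (X.T @ X) / sigma ** 2
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + (X.T @ y) / sigma ** 2)
    return post_mean, post_cov
```

Comparing post_mean and the diagonal of post_cov against the sample mean and variance of the MCMC draws is exactly the kind of sanity check the post describes.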
6 0.80172795 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs
7 0.80129892 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0
8 0.79629833 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
9 0.78824127 1855 andrew gelman stats-2013-05-13-Stan!
10 0.78262997 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0
11 0.77963334 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT
12 0.77877879 1710 andrew gelman stats-2013-02-06-The new Stan 1.1.1, featuring Gaussian processes!
13 0.76851451 1748 andrew gelman stats-2013-03-04-PyStan!
14 0.7638427 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0
15 0.74619114 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models
16 0.73932421 712 andrew gelman stats-2011-05-14-The joys of working in the public domain
17 0.73910338 2291 andrew gelman stats-2014-04-14-Transitioning to Stan
18 0.71461219 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list
19 0.7130999 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action
20 0.7126509 1576 andrew gelman stats-2012-11-13-Stan at NIPS 2012 Workshop on Probabilistic Programming
topicId topicWeight
[(15, 0.02), (16, 0.073), (21, 0.027), (24, 0.154), (32, 0.012), (55, 0.026), (57, 0.133), (73, 0.033), (79, 0.017), (82, 0.066), (86, 0.029), (87, 0.016), (89, 0.029), (99, 0.206)]
simIndex simValue blogId blogTitle
same-blog 1 0.94738817 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
2 0.93180716 1101 andrew gelman stats-2012-01-05-What are the standards for reliability in experimental psychology?
Introduction: An experimental psychologist was wondering about the standards in that field for “acceptable reliability” (when looking at inter-rater reliability in coding data). He wondered, for example, if some variation on signal detectability theory might be applied to adjust for inter-rater differences in criteria for saying some code is present. What about Cohen’s kappa? The psychologist wrote: Cohen’s kappa does adjust for “guessing,” but its assumptions are not well motivated, perhaps not any more than adjustments for guessing versus the application of signal detectability theory where that can be applied. But one can’t do a straightforward application of signal detectability theory for reliability in that you don’t know whether the signal is present or not. I think measurement issues are important but I don’t have enough experience in this area to answer the question without knowing more about the problem that this researcher is working on. I’m posting it here because I imagine t
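For reference, the "adjustment for guessing" in Cohen's kappa is just the comparison of observed agreement to the agreement expected from the raters' marginal frequencies. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square confusion table of two raters' codes:
    kappa = (observed agreement - chance agreement) / (1 - chance agreement),
    where chance agreement comes from the raters' marginal frequencies."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_observed = np.trace(table) / n
    p_chance = (table.sum(axis=0) / n) @ (table.sum(axis=1) / n)
    return (p_observed - p_chance) / (1.0 - p_chance)
```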
3 0.91417789 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”
Introduction: Michael McLaughlin sent me the following query with the above title. Some time ago, I [McLaughlin] was handed a dataset that needed to be modeled. It was generated as follows: 1. Random navigation errors, historically a binary mixture of normal and Laplace with a common mean, were collected by observation. 2. Sadly, these data were recorded with too few decimal places so that the resulting quantization is clearly visible in a scatterplot. 3. The quantized data were then interpolated (to an unobserved location). The final result looks like fuzzy points (small scale jitter) at quantized intervals spanning a much larger scale (the parent mixture distribution). This fuzziness, likely ~normal or ~Laplace, results from the interpolation. Otherwise, the data would look like a discrete analogue of the normal/Laplace mixture. I would like to characterize the latent normal/Laplace mixture distribution but the quantization is “getting in the way”. When I tried MCMC on this proble
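To fix ideas about the latent distribution McLaughlin describes, here is a sketch of the two-component normal/Laplace mixture density with a shared mean. The quantization and interpolation steps that complicate his data are not modeled here, and the parameter names are illustrative.

```python
import numpy as np
from scipy import stats

def normal_laplace_mixture_logpdf(x, w, mu, sigma, b):
    """Log density of the latent mixture: with probability w a Normal(mu, sigma),
    with probability 1 - w a Laplace(mu, b); both components share the mean mu."""
    log_norm = np.log(w) + stats.norm.logpdf(x, loc=mu, scale=sigma)
    log_lap = np.log1p(-w) + stats.laplace.logpdf(x, loc=mu, scale=b)
    return np.logaddexp(log_norm, log_lap)
```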
4 0.89970344 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey
Introduction: Shalizi delivers a slow, drawn-out illustration of the point that economic efficiency is all about who’s got the $, which isn’t always related to what we would usually call “efficiency” in other settings. (His point is related to my argument that the phrase “willingness to pay” should generally be replaced by “ability to pay.”) The basic story is simple: Good guy needs a turkey, bad guy wants a turkey. Bad guy is willing and able to pay more for the turkey than good guy can afford, hence good guy starves to death. The counterargument is that a market in turkeys will motivate producers to breed more turkeys, ultimately saturating the bad guys’ desires and leaving surplus turkeys for the good guys at a reasonable price. I’m sure there’s a counter-counterargument too, but I don’t want to go there. But what really amused me about Cosma’s essay was how he scrambled the usual cultural/political associations. (I assume he did this on purpose.) In the standard version of t
Introduction: Uberbloggers Andrew Sullivan and Matthew Yglesias were kind enough to link to my five-year-old post with graphs from Red State Blue State on time trends of average income by state. Here are the graphs : Yglesias’s take-home point: There isn’t that much change over time in states’ economic well-being. All things considered the best predictor of how rich a state was in 2000 was simply how rich it was in 1929…. Massachusetts and Connecticut have always been rich and Arkansas and Mississippi have always been poor. I’d like to point to a different feature of the graphs, which is that, although the rankings of the states haven’t changed much (as can be seen from the “2000 compared to 1929″ scale), the relative values of the incomes have converged quite a bit—at least, they converged from about 1930 to 1980 before hitting some level of stability. And the rankings have changed a bit. My impression (without checking the numbers) is that New York and Connecticut were
6 0.8902396 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?
7 0.88782144 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories
8 0.88365495 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models
9 0.88331664 1120 andrew gelman stats-2012-01-15-Fun fight over the Grover search algorithm
10 0.87739897 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas
11 0.87364155 35 andrew gelman stats-2010-05-16-Another update on the spam email study
13 0.87019956 2015 andrew gelman stats-2013-09-10-The ethics of lying, cheating, and stealing with data: A case study
14 0.86966944 215 andrew gelman stats-2010-08-18-DataMarket
15 0.86853743 1542 andrew gelman stats-2012-10-20-A statistical model for underdispersion
16 0.86636728 1488 andrew gelman stats-2012-09-08-Annals of spam
19 0.86221343 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”
20 0.86133444 1713 andrew gelman stats-2013-02-08-P-values and statistical practice