andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1580 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This is a “hurdle” model, in which a Bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the Bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a
sentIndex sentText sentNum sentScore
1 The most important for me right now has been a hierarchical zero-inflated gamma problem. [sent-2, score-0.13]
2 This is a “hurdle” model, in which a Bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the Bernoulli process. [sent-3, score-0.989]
3 The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). [sent-4, score-0.342]
4 Observed values are kilograms of meat returned to camp. [sent-7, score-0.138]
5 The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. [sent-8, score-0.225]
6 Originally, I had written the sampler myself in raw R code. [sent-9, score-0.119]
7 Just before Stan version 1.0 was released, I had managed to get JAGS to do it all quite reliably. [sent-12, score-0.12]
8 Stan produces the same inferences as my JAGS code does, but with 8-hour runs (no thinning needed) instead of 30-hour runs (with massive thinning). [sent-16, score-0.82]
9 In the future, I should be getting similar data for about a dozen other foraging populations, so will want to scale this up to a meta-analytic level, with partial pooling across societies. [sent-17, score-0.293]
10 On the horizon, I have a harder project I’d like to port into Stan, involving cumulative multi-normal likelihoods. [sent-19, score-0.161]
11 I wrote my own sampler, using likelihoods from pmvnorm in the mvtnorm package, but it mixes very slowly, once all the varying effects are included. [sent-20, score-0.79]
12 Is there a clever way to get the same likelihoods in Stan yet? [sent-21, score-0.128]
13 If not, once you have a guide prepared for how to compile in new distributions, I can probably use that to hack mvtnorm’s pmvnorm into Stan. [sent-22, score-0.258]
14 I’m pretty sure that with some vectorization and other steps, he can get his model to run in much less than 8 hours in Stan. [sent-23, score-0.143]
15 And Lucas Leeman writes: I just wanted to say thank you for Stan! [sent-25, score-0.164]
16 I had this problem with a very slow mixing chain and I have finally managed to get Stan to do what I want. [sent-27, score-0.224]
17 With the mock example I am playing with, Stan drastically outperforms the software I was using before. [sent-28, score-0.148]
18 A few years ago, I had the attitude that I could fit a model in Bugs, and if that didn’t work I could program it myself. [sent-31, score-0.145]
19 Fitting a model in Stan is essentially the same as programming it myself, except that the program has already been optimized and debugged, thus combining the convenience of Bugs with the efficiency of compiled code. [sent-33, score-0.394]
20 Also, again we thank the Department of Energy, Institute for Education Sciences, and National Science Foundation for partial support of this project. [sent-34, score-0.265]
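The hurdle structure McElreath describes (a Bernoulli process deciding zero versus nonzero, then a gamma distribution for the nonzero values) can be sketched as a per-observation log-likelihood. This is only an illustrative sketch, not his model: the function names, the logit link for the zero process, the log link for the gamma mean, and the shape/mean parameterization are all assumptions made here, and the correlated varying effects and imputation are left out.

```python
import math


def inv_logit(x):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def gamma_logpdf(y, shape, rate):
    """Log density of a gamma(shape, rate) distribution at y > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(y) - rate * y)


def hurdle_gamma_loglik(y, zero_logit, log_mu, shape):
    """Log-likelihood of one observation under a gamma hurdle model.

    zero_logit: linear predictor for P(y == 0) (assumed logit link)
    log_mu:     linear predictor for the gamma mean (assumed log link)
    shape:      gamma shape parameter (assumed shared across observations)
    """
    p_zero = inv_logit(zero_logit)
    if y == 0:
        # Failed hunt: only the Bernoulli part contributes.
        return math.log(p_zero)
    # Successful hunt: probability of clearing the hurdle times gamma density.
    mu = math.exp(log_mu)
    rate = shape / mu  # mean of gamma(shape, rate) is shape / rate
    return math.log1p(-p_zero) + gamma_logpdf(y, shape, rate)


# Hypothetical returns in kilograms, with zeros for unsuccessful hunts.
returns_kg = [0.0, 0.0, 3.2, 0.0, 11.5]
total = sum(hurdle_gamma_loglik(y, zero_logit=0.4, log_mu=1.5, shape=2.0)
            for y in returns_kg)
```

In a full hierarchical version, `zero_logit` and `log_mu` would each include hunter-level varying effects, and it is those effects that would be modeled as correlated across the two processes.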
wordName wordTfidf (topN-words)
[('stan', 0.418), ('foraging', 0.192), ('pmvnorm', 0.192), ('jags', 0.19), ('mvtnorm', 0.175), ('thinning', 0.175), ('produces', 0.169), ('thank', 0.164), ('bugs', 0.157), ('bernoulli', 0.148), ('varying', 0.144), ('gamma', 0.13), ('glm', 0.13), ('likelihoods', 0.128), ('managed', 0.12), ('sampler', 0.119), ('efficiency', 0.105), ('slow', 0.104), ('runs', 0.101), ('partial', 0.101), ('hour', 0.099), ('port', 0.087), ('hunters', 0.087), ('mcelreath', 0.087), ('debugged', 0.087), ('horizon', 0.082), ('leeman', 0.082), ('effects', 0.081), ('translating', 0.079), ('drastically', 0.079), ('code', 0.076), ('comprising', 0.076), ('hurdle', 0.076), ('anthropology', 0.076), ('program', 0.076), ('project', 0.074), ('announcing', 0.074), ('vectorization', 0.074), ('optimized', 0.074), ('hunting', 0.074), ('lucas', 0.074), ('values', 0.072), ('compiled', 0.07), ('mixes', 0.07), ('model', 0.069), ('outperforms', 0.069), ('compile', 0.066), ('converge', 0.066), ('meat', 0.066), ('inefficient', 0.066)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000004 1580 andrew gelman stats-2012-11-16-Stantastic!
2 0.32029936 1475 andrew gelman stats-2012-08-30-A Stan is Born
Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.
3 0.25857335 2291 andrew gelman stats-2014-04-14-Transitioning to Stan
Introduction: Kevin Cartier writes: I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect). My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort
4 0.23778529 1748 andrew gelman stats-2013-03-04-PyStan!
Introduction: Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely free and open-source. P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.
Introduction: Statistical Methods and Data Skepticism Data analysis today is dominated by three paradigms: null hypothesis significance testing, Bayesian inference, and exploratory data analysis. There is concern that all these methods lead to overconfidence on the part of researchers and the general public, and this concern has led to the new “data skepticism” movement. But the history of statistics is already in some sense a history of data skepticism. Concepts of bias, variance, sampling and measurement error, least-squares regression, and statistical significance can all be viewed as formalizations of data skepticism. All these methods address the concern that patterns in observed data might not generalize to the population of interest. We discuss the challenge of attaining data skepticism while avoiding data nihilism, and consider some proposed future directions. Stan Stan (mc-stan.org) is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a
6 0.21699868 2161 andrew gelman stats-2014-01-07-My recent debugging experience
7 0.21559866 1855 andrew gelman stats-2013-05-13-Stan!
8 0.18721765 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0
9 0.18330847 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0
10 0.1738185 1886 andrew gelman stats-2013-06-07-Robust logistic regression
11 0.16606307 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct
12 0.15755974 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models
13 0.1534185 55 andrew gelman stats-2010-05-27-In Linux, use jags() to call Jags instead of using bugs() to call OpenBugs
14 0.15188004 1045 andrew gelman stats-2011-12-07-Martyn Plummer’s Secret JAGS Blog
15 0.14812312 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons
16 0.14698091 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list
17 0.14692231 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0
18 0.14237456 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas
19 0.14078894 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
20 0.13435419 2035 andrew gelman stats-2013-09-23-Scalable Stan
topicId topicWeight
[(0, 0.157), (1, 0.085), (2, -0.026), (3, 0.077), (4, 0.128), (5, 0.12), (6, 0.001), (7, -0.269), (8, -0.084), (9, -0.092), (10, -0.164), (11, -0.001), (12, -0.1), (13, -0.075), (14, 0.095), (15, -0.014), (16, -0.028), (17, 0.06), (18, -0.031), (19, 0.008), (20, -0.028), (21, 0.005), (22, -0.086), (23, -0.013), (24, -0.015), (25, -0.032), (26, 0.004), (27, -0.033), (28, -0.075), (29, 0.006), (30, -0.006), (31, -0.016), (32, 0.003), (33, -0.002), (34, -0.003), (35, 0.06), (36, 0.023), (37, 0.039), (38, 0.011), (39, -0.01), (40, -0.001), (41, 0.014), (42, -0.007), (43, -0.028), (44, -0.005), (45, -0.045), (46, 0.001), (47, -0.019), (48, -0.017), (49, 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.97198957 1580 andrew gelman stats-2012-11-16-Stantastic!
2 0.93857598 1475 andrew gelman stats-2012-08-30-A Stan is Born
Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.
3 0.93784791 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0
Introduction: We’re happy to announce the release of Stan C++, CmdStan, RStan, and PyStan 2.1.0. This is a minor feature release, but it is also an important bug fix release. As always, the place to start is the (all new) Stan web pages: http://mc-stan.org Major Bug in 2.0.0, 2.0.1 Stan 2.0.0 and Stan 2.0.1 introduced a bug in the implementation of the NUTS criterion that led to poor tail exploration and thus biased the posterior uncertainty downward. There was no bug in NUTS in Stan 1.3 or earlier, and 2.1 has been extensively tested and tests put in place so this problem will not recur. If you are using Stan 2.0.0 or 2.0.1, you should switch to 2.1.0 as soon as possible and rerun any models you care about. New Target Acceptance Rate Default for Stan 2.1.0 Another big change aimed at reducing posterior estimation bias was an increase in the target acceptance rate during adaptation from 0.65 to 0.80. The bad news is that iterations will take around 50% longer
4 0.92419088 1748 andrew gelman stats-2013-03-04-PyStan!
Introduction: Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely free and open-source. P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.
5 0.91687042 712 andrew gelman stats-2011-05-14-The joys of working in the public domain
Introduction: Stan will make a total lifetime profit of $0, so we can’t be sued!
6 0.91171318 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0
7 0.882819 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list
8 0.87615609 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0
9 0.87303072 2161 andrew gelman stats-2014-01-07-My recent debugging experience
10 0.87296563 2291 andrew gelman stats-2014-04-14-Transitioning to Stan
11 0.86756265 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models
12 0.84933013 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
13 0.83476835 2325 andrew gelman stats-2014-05-07-Stan users meetup next week
14 0.83129025 1576 andrew gelman stats-2012-11-13-Stan at NIPS 2012 Workshop on Probabilistic Programming
15 0.82935506 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing
16 0.82196683 1855 andrew gelman stats-2013-05-13-Stan!
17 0.81503904 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons
18 0.80010051 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
19 0.77762389 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs
20 0.76624846 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT
topicId topicWeight
[(5, 0.02), (15, 0.011), (16, 0.064), (21, 0.038), (24, 0.164), (27, 0.021), (36, 0.042), (40, 0.018), (44, 0.02), (45, 0.02), (54, 0.012), (64, 0.015), (78, 0.117), (86, 0.026), (89, 0.071), (99, 0.203)]
simIndex simValue blogId blogTitle
same-blog 1 0.93614352 1580 andrew gelman stats-2012-11-16-Stantastic!
2 0.89966762 2025 andrew gelman stats-2013-09-15-The it-gets-me-so-angry-I-can’t-deal-with-it threshold
Introduction: I happened to be looking at Slate (I know, I know, but I’d already browsed Gawker and I was desperately avoiding doing real work) and came across this article by Alice Gregory entitled, “I Read Everything Janet Malcolm Ever Published. I’m in awe of her.” I too think Malcolm is an excellent writer, but (a) I’m not happy that she gets off the hook for faking quotes, and (b) I’m really really not happy with her apparent attempt to try to force a mistrial for a convicted killer. I just can’t get over that, for some reason. I can appreciate Picasso’s genius even though he beat his wives or whatever it was he did, I can enjoy the music of Jackson Browne, etc. But for some reason this Malcolm stuff sticks in my craw. There’s no deep meaning to this—I recognize it is a somewhat irrational attitude on my part, I just wanted to share it with you.
3 0.88920856 1881 andrew gelman stats-2013-06-03-Boot
Introduction: Joshua Hartshorne writes: I ran several large-N experiments (separate participants) and looked at performance against age. What we want to do is compare age-of-peak-performance across the different tasks (again, different participants). We bootstrapped age-of-peak-performance. On each iteration, we sampled (with replacement) the X scores at each age, where X=num of participants at that age, and recorded the age at which performance peaked on that task. We then recorded the age at which performance was at peak and repeated. Once we had distributions of age-of-peak-performance, we used the means and SDs to calculate t-statistics to compare the results across different tasks. For graphical presentation, we used medians, interquartile ranges, and 95% confidence intervals (based on the distributions: the range within which 75% and 95% of the bootstrapped peaks appeared). While a number of people we consulted with thought this made a lot of sense, one reviewer of the paper insist
4 0.88892227 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
Introduction: Some people pointed me to this: I am happy to see statistical theory and methods be a topic in popular culture, and of course I’m glad that, contra Feller, the Bayesian is presented as the hero this time, but . . . . I think the lower-left panel of the cartoon unfairly misrepresents frequentist statisticians. Frequentist statisticians recognize many statistical goals. Point estimates trade off bias and variance. Interval estimates have the goal of achieving nominal coverage and the goal of being informative. Tests have the goals of calibration and power. Frequentists know that no single principle applies in all settings, and this is a setting where this particular method is clearly inappropriate. All statisticians use prior information in their statistical analysis. Non-Bayesians express their prior information not through a probability distribution on parameters but rather through their choice of methods. I think this non-Bayesian attitude is too restrictive, but in
5 0.88718218 207 andrew gelman stats-2010-08-14-Pourquoi Google search est devenu plus raisonnable?
Introduction: A few months ago I questioned Dan Ariely’s belief that Google is the voice of the people by reporting the following bizarre options that Google gave to complete the simplest search I could think of: Several commenters gave informed discussions about what was going on in Google’s program. Maybe things are better now, though? The latest version seems much more reasonable: (Aleks sent this to me, then I checked on my own computer and got the same thing.)
6 0.88495392 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda
8 0.88131225 846 andrew gelman stats-2011-08-09-Default priors update?
9 0.8803941 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
10 0.87807703 1639 andrew gelman stats-2012-12-26-Impersonators
11 0.87731302 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?
12 0.87606454 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?
14 0.87552059 1473 andrew gelman stats-2012-08-28-Turing chess run update
15 0.87398696 2089 andrew gelman stats-2013-11-04-Shlemiel the Software Developer and Unknown Unknowns
16 0.87373364 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
17 0.87282014 407 andrew gelman stats-2010-11-11-Data Visualization vs. Statistical Graphics
18 0.87171042 807 andrew gelman stats-2011-07-17-Macro causality
19 0.86981261 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
20 0.86927116 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values