andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2291 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Kevin Cartier writes: I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect). My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort of point estimate). At that point, Stan is a winner compared to programming one’s own Monte Carlo algorithm. We (the Stan team) should really prepare a document with a bunch of examples where Stan is a win, in one way or another.
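To make the advice concrete, here is a minimal RStan sketch of the kind of model described above: a moderate-sized data set with a latent time-series (local-level) structure, fit with full Bayes. The model, priors, and simulated data are illustrative assumptions, not anything from the original post.

    library(rstan)

    # Minimal sketch (assumptions, not the post's model): a local-level model
    # with a latent random-walk series, fit with full Bayes via RStan.
    model_code <- "
    data {
      int<lower=1> T;
      vector[T] y;
    }
    parameters {
      vector[T] mu;              // latent time series
      real<lower=0> sigma_obs;   // observation noise
      real<lower=0> sigma_state; // innovation noise
    }
    model {
      mu[2:T] ~ normal(mu[1:(T - 1)], sigma_state);  // random-walk prior on the latent series
      y ~ normal(mu, sigma_obs);                     // observation model
      sigma_obs ~ normal(0, 1);
      sigma_state ~ normal(0, 1);
    }
    "

    # Simulated data, just to make the sketch runnable
    set.seed(1)
    n_t <- 200
    latent <- cumsum(rnorm(n_t, 0, 0.3))
    y <- rnorm(n_t, latent, 1)

    fit <- stan(model_code = model_code, data = list(T = n_t, y = y),
                chains = 4, iter = 2000)
    print(fit, pars = c("sigma_obs", "sigma_state"))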
sentIndex sentText sentNum sentScore
1 Kevin Cartier writes: I’ve been happily using R for a number of years now and recently came across Stan. [sent-1, score-0.147]
2 Looks big and powerful, so I’d like to pick an appropriate project and try it out. [sent-2, score-0.324]
3 I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? [sent-3, score-1.021]
4 What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan. [sent-4, score-0.649]
5 My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e. [sent-6, score-0.272]
6 , we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). [sent-8, score-0.938]
7 The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort of point estimate). [sent-9, score-0.608]
8 At that point, Stan is a winner compared to programming one’s own Monte Carlo algorithm. [sent-10, score-0.302]
9 We (the Stan team) should really prepare a document with a bunch of examples where Stan is a win, in one way or another. [sent-11, score-0.564]
wordName wordTfidf (topN-words)
[('stan', 0.488), ('document', 0.352), ('point', 0.17), ('project', 0.159), ('happily', 0.147), ('preparing', 0.144), ('advise', 0.142), ('doc', 0.142), ('prepare', 0.138), ('straightforward', 0.136), ('kevin', 0.13), ('winner', 0.125), ('wondered', 0.124), ('collaborators', 0.116), ('carlo', 0.116), ('implement', 0.116), ('latent', 0.113), ('improving', 0.111), ('monte', 0.111), ('model', 0.111), ('powerful', 0.108), ('user', 0.107), ('programming', 0.104), ('motivation', 0.101), ('points', 0.098), ('tool', 0.096), ('blogging', 0.096), ('win', 0.095), ('structure', 0.092), ('million', 0.09), ('somewhat', 0.089), ('aside', 0.087), ('pick', 0.087), ('complex', 0.087), ('team', 0.086), ('bayes', 0.082), ('uncertainty', 0.082), ('spend', 0.081), ('huge', 0.08), ('takes', 0.08), ('series', 0.079), ('appropriate', 0.078), ('rather', 0.077), ('data', 0.076), ('parameters', 0.075), ('bunch', 0.074), ('compared', 0.073), ('words', 0.072), ('goes', 0.071), ('whole', 0.068)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 2291 andrew gelman stats-2014-04-14-Transitioning to Stan
2 0.33249018 1475 andrew gelman stats-2012-08-30-A Stan is Born
Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.
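For readers who want to see what “RStan is the R interface to Stan” looks like in practice, here is a minimal workflow sketch; the toy model and simulated data are assumptions for illustration only.

    library(rstan)

    # Toy example (illustration only): write a model as Stan code, pass data,
    # sample, and inspect the posterior from R.
    code <- "
    data { int<lower=0> N; vector[N] y; }
    parameters { real mu; real<lower=0> sigma; }
    model { y ~ normal(mu, sigma); }
    "
    fit <- stan(model_code = code, data = list(N = 50, y = rnorm(50, 3, 1)),
                chains = 4, iter = 1000)
    print(fit)             # posterior summaries
    draws <- extract(fit)  # posterior draws as R arrays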
3 0.26701081 1748 andrew gelman stats-2013-03-04-PyStan!
Introduction: Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely free and open-source. P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.
4 0.25857335 1580 andrew gelman stats-2012-11-16-Stantastic!
Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This is a “hurdle” model, in which a bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a
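As a rough illustration of the hurdle structure McElreath describes, here is a simplified Stan sketch with a bernoulli component for zero vs. nonzero returns and a gamma component for the positive values. It uses current Stan syntax rather than the 1.0-era syntax of the post, omits the varying effects and imputation in his real model, and all names, priors, and simulated data are assumptions.

    library(rstan)

    # Simplified hurdle sketch (assumptions only): bernoulli part decides
    # zero vs. nonzero, gamma part models the positive returns.
    hurdle_code <- "
    data {
      int<lower=1> N;
      vector<lower=0>[N] y;   // returns, zero or positive
    }
    parameters {
      real alpha;             // log-odds of a nonzero return
      real<lower=0> shape;    // gamma shape
      real<lower=0> rate;     // gamma rate
    }
    model {
      alpha ~ normal(0, 1.5);
      shape ~ exponential(1);
      rate ~ exponential(1);
      for (n in 1:N) {
        if (y[n] == 0) {
          target += bernoulli_logit_lpmf(0 | alpha);
        } else {
          target += bernoulli_logit_lpmf(1 | alpha);
          target += gamma_lpdf(y[n] | shape, rate);
        }
      }
    }
    "

    # Simulated data so the sketch runs end to end
    set.seed(2)
    N <- 500
    nonzero <- rbinom(N, 1, 0.6)
    y <- ifelse(nonzero == 1, rgamma(N, shape = 2, rate = 0.5), 0)
    fit <- stan(model_code = hurdle_code, data = list(N = N, y = y),
                chains = 4, iter = 1000)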
5 0.23895 2161 andrew gelman stats-2014-01-07-My recent debugging experience
Introduction: OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out—or if it doesn’t—I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear regression with normal priors and known data variance, because then the exact solution is Gaussian and I can also work with the problem analytically. So I programmed up the algorithm and, no surprise, it didn’t work. I went through my R code, put in print statements here and there, and cleared out bug after bug until at least it stopped crashing. But the algorithm still wasn’t doing what it was supposed to do. So I decided to do something simpler, and just check that the Stan linear regression gave the same answer as the analytic posterior distribution: I ran Stan for tons of iterations, then computed the sample mean and variance of the simulations. It was an example with two coefficients—I’d originally cho
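The check described above can be written out directly: fit a linear regression with normal priors and known data variance in Stan, then compare the sample mean and variance of the draws with the analytic Gaussian posterior. The sizes, priors, and variable names below are illustrative assumptions, not the original code.

    library(rstan)

    # Sanity-check sketch (assumptions only): Stan draws vs. the analytic
    # posterior for linear regression with normal priors and known sigma.
    set.seed(123)
    n <- 100; sigma <- 2; tau <- 10
    X <- cbind(1, rnorm(n))
    y <- as.vector(X %*% c(1, -0.5) + rnorm(n, 0, sigma))

    # Analytic Gaussian posterior under beta ~ normal(0, tau), known sigma
    V <- solve(crossprod(X) / sigma^2 + diag(2) / tau^2)
    m <- V %*% (crossprod(X, y) / sigma^2)

    code <- "
    data { int<lower=1> N; int<lower=1> K; matrix[N, K] X; vector[N] y;
           real<lower=0> sigma; real<lower=0> tau; }
    parameters { vector[K] beta; }
    model { beta ~ normal(0, tau); y ~ normal(X * beta, sigma); }
    "
    fit <- stan(model_code = code,
                data = list(N = n, K = 2, X = X, y = y, sigma = sigma, tau = tau),
                chains = 4, iter = 4000)
    draws <- extract(fit)$beta
    rbind(stan = colMeans(draws), analytic = as.vector(m))    # posterior means
    rbind(stan = apply(draws, 2, var), analytic = diag(V))    # posterior variances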
7 0.18942298 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0
8 0.17947638 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas
9 0.17801523 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct
10 0.17336471 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons
11 0.17204531 1948 andrew gelman stats-2013-07-21-Bayes related
12 0.16902037 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list
13 0.16715962 1886 andrew gelman stats-2013-06-07-Robust logistic regression
14 0.16647477 1855 andrew gelman stats-2013-05-13-Stan!
15 0.16604115 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0
16 0.16247204 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
17 0.16090706 2035 andrew gelman stats-2013-09-23-Scalable Stan
18 0.15855095 2325 andrew gelman stats-2014-05-07-Stan users meetup next week
19 0.15694869 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0
20 0.14470659 1131 andrew gelman stats-2012-01-20-Stan: A (Bayesian) Directed Graphical Model Compiler
topicId topicWeight
[(0, 0.209), (1, 0.086), (2, -0.036), (3, 0.105), (4, 0.131), (5, 0.101), (6, -0.001), (7, -0.268), (8, -0.068), (9, -0.14), (10, -0.152), (11, 0.002), (12, -0.11), (13, -0.086), (14, 0.053), (15, -0.037), (16, -0.045), (17, 0.051), (18, -0.013), (19, -0.0), (20, -0.056), (21, -0.036), (22, -0.081), (23, -0.01), (24, -0.013), (25, -0.021), (26, 0.032), (27, -0.045), (28, -0.079), (29, -0.02), (30, -0.01), (31, -0.032), (32, -0.024), (33, -0.051), (34, -0.017), (35, 0.089), (36, 0.036), (37, -0.006), (38, 0.017), (39, -0.045), (40, 0.017), (41, 0.006), (42, -0.029), (43, -0.036), (44, 0.005), (45, 0.011), (46, 0.006), (47, 0.002), (48, -0.049), (49, 0.032)]
simIndex simValue blogId blogTitle
same-blog 1 0.9509005 2291 andrew gelman stats-2014-04-14-Transitioning to Stan
2 0.94785041 1580 andrew gelman stats-2012-11-16-Stantastic!
3 0.91911572 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0
Introduction: We’re happy to announce the release of Stan C++, CmdStan, RStan, and PyStan 2.1.0. This is a minor feature release, but it is also an important bug fix release. As always, the place to start is the (all new) Stan web pages: http://mc-stan.org Major Bug in 2.0.0, 2.0.1 Stan 2.0.0 and Stan 2.0.1 introduced a bug in the implementation of the NUTS criterion that led to poor tail exploration and thus biased the posterior uncertainty downward. There was no bug in NUTS in Stan 1.3 or earlier, and 2.1 has been extensively tested and tests put in place so this problem will not recur. If you are using Stan 2.0.0 or 2.0.1, you should switch to 2.1.0 as soon as possible and rerun any models you care about. New Target Acceptance Rate Default for Stan 2.1.0 Another big change aimed at reducing posterior estimation bias was an increase in the target acceptance rate during adaptation from 0.65 to 0.80. The bad news is that iterations will take around 50% longer
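For reference, the target acceptance rate discussed above corresponds to the adapt_delta setting, which can be raised explicitly when sampling problems appear. The argument names below reflect current rstan and are assumed, not verified, to match the 2.1.0-era interface; the toy model is an illustration only.

    library(rstan)

    # Raising the target acceptance rate (adapt_delta). Argument names follow
    # current rstan and are an assumption about the 2.1.0-era interface.
    code <- "
    data { int<lower=1> N; vector[N] y; }
    parameters { real mu; real<lower=0> sigma; }
    model { y ~ normal(mu, sigma); }
    "
    fit <- stan(model_code = code, data = list(N = 30, y = rnorm(30)),
                chains = 4, control = list(adapt_delta = 0.9))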
4 0.91754824 1748 andrew gelman stats-2013-03-04-PyStan!
5 0.91525447 1475 andrew gelman stats-2012-08-30-A Stan is Born
6 0.90257663 712 andrew gelman stats-2011-05-14-The joys of working in the public domain
7 0.88693446 2161 andrew gelman stats-2014-01-07-My recent debugging experience
8 0.88199347 2325 andrew gelman stats-2014-05-07-Stan users meetup next week
9 0.88073403 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models
10 0.8776353 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list
11 0.87488234 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0
12 0.85369909 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0
13 0.84924382 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing
14 0.8402766 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
15 0.83872581 1576 andrew gelman stats-2012-11-13-Stan at NIPS 2012 Workshop on Probabilistic Programming
16 0.8342219 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons
17 0.81045657 2035 andrew gelman stats-2013-09-23-Scalable Stan
18 0.81032169 1855 andrew gelman stats-2013-05-13-Stan!
19 0.80296987 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT
20 0.79864919 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
topicId topicWeight
[(9, 0.039), (16, 0.028), (18, 0.017), (19, 0.017), (21, 0.015), (24, 0.152), (58, 0.018), (59, 0.064), (77, 0.012), (85, 0.017), (86, 0.08), (89, 0.038), (99, 0.403)]
simIndex simValue blogId blogTitle
same-blog 1 0.98645532 2291 andrew gelman stats-2014-04-14-Transitioning to Stan
2 0.97965729 1009 andrew gelman stats-2011-11-14-Wickham R short course
Introduction: Hadley writes: I [Hadley] am going to be teaching an R development master class in New York City on Dec 12-13. The basic idea of the class is to help you write better code, focused on the mantra of “do not repeat yourself”. On day one you will learn powerful new tools of abstraction, allowing you to solve a wider range of problems with fewer lines of code. Day two will teach you how to make packages, the fundamental unit of code distribution in R, so that others can save time by reusing your code. To get the most out of this course, you should have some experience programming in R already: you should be familiar with writing functions and with the basic data structures of R: vectors, matrices, arrays, lists, and data frames. You will find the course particularly useful if you’re an experienced R user looking to take the next step, or if you’re moving to R from other programming languages and you want to quickly get up to speed with R’s unique features. A coupl
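A tiny, made-up sketch of the “do not repeat yourself” idea the course is built around (the data and function here are illustrative assumptions, not course material):

    # Made-up example of the DRY idea: replace copied code with one function.
    df <- data.frame(a = rnorm(10), b = runif(10))

    # Repetitive version: the same rescaling copied for every column
    df$a <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
    df$b <- (df$b - min(df$b)) / (max(df$b) - min(df$b))

    # Abstracted version: write the operation once, apply it everywhere
    rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
    df[] <- lapply(df, rescale01)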
3 0.97930515 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample
Introduction: Lee Mobley writes: I recently read what you posted on your blog How does statistical analysis differ when analyzing the entire population rather than a sample? What you said in the blog accords with my training in econometrics. However I am concerned about a new wrinkle on this problem that derives from multilevel modeling. We are analyzing multilevel models of the probability of using cancer screening for the entire Medicare population. I argue that every state has different systems in place (politics, cancer control efforts, culture, insurance regulations, etc) so that essentially a different probability generating mechanism is in place for each state. Thus I estimate 50 separate regressions for the populations in each state, and then note and map the variability in the effect estimates (slope parameters) for each covariate. Reviewers argue that I should be using random slopes modeling, pooling all individuals in all states together. I am familiar with this approach
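The two strategies in this exchange can be sketched side by side on simulated data; the variable names and data-generating process below are assumptions for illustration, not the Medicare data.

    library(lme4)

    # Simulated stand-in for the screening data (assumptions only)
    set.seed(1)
    d <- data.frame(state = factor(rep(1:50, each = 200)),
                    x = rnorm(50 * 200))
    a <- rnorm(50, -1, 0.5); b <- rnorm(50, 0.3, 0.2)
    d$screened <- rbinom(nrow(d), 1, plogis(a[d$state] + b[d$state] * d$x))

    # The author's approach: 50 separate regressions, one per state (no pooling)
    separate <- lapply(split(d, d$state),
                       function(s) glm(screened ~ x, family = binomial, data = s))

    # The reviewers' suggestion: one model with varying intercepts and slopes
    pooled <- glmer(screened ~ x + (1 + x | state), family = binomial, data = d)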
Introduction: Thomas Basbøll writes: [Advertising executive] Russell Davies wrote a blog post called “The Tyranny of the Big Idea”. His five-point procedure begins: Start doing stuff. Start executing things which seem right. Do it quickly and do it often. Don’t cling onto anything, good or bad. Don’t repeat much. Take what was good and do it differently. And ends with: “And something else and something else.” This inspires several thoughts, which I’ll take advantage of the blog format to present with no attempt to be cohesively organized. 1. My first concern is the extent to which productivity-enhancing advice such as Davies’s (and Basbøll’s) is zero or even negative-sum, just helping people in the rat race. But, upon reflection, I’d rate the recommendations as positive-sum. If people learn to write better and be more productive, that’s not (necessarily) just positional. 2. Blogging fits with the “Do it quickly and do it often” advice. 3. I wonder what Basbøll thinks abo
5 0.9778257 1377 andrew gelman stats-2012-06-13-A question about AIC
Introduction: Jacob Oaknin asks: Akaike’s selection criterion is often justified on the basis of the empirical risk of an ML estimate being a biased estimate of the true generalization error of a parametric family, say the family, S_m, of linear regressors on an m-dimensional variable x = (x_1, …, x_m) with Gaussian noise independent of x (for instance in “Unifying the derivations for the Akaike and Corrected Akaike information criteria”, by J. E. Cavanaugh, Statistics and Probability Letters, vol. 33, 1997, pp. 201-208). On the other hand, the family S_m is known to have finite VC-dimension (VC = m+1), and this fact should guarantee that the empirical risk minimizer is asymptotically consistent regardless of the underlying probability distribution, and in particular for the assumed Gaussian distribution of the noise (“An overview of statistical learning theory”, by V. N. Vapnik, IEEE Transactions on Neural Networks, vol. 10, no. 5, 1999, pp. 988-999). What am I missing? My reply: I’m no expert on AIC so
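For concreteness, the AIC under discussion is just the maximized log likelihood penalized by the number of estimated parameters; here is a small R check with simulated data, as an illustration only.

    # AIC = -2 * logLik + 2 * k, with k the number of estimated parameters
    # (here: two regression coefficients plus the residual variance).
    set.seed(1)
    x <- rnorm(100); y <- 1 + 2 * x + rnorm(100)
    fit <- lm(y ~ x)

    k <- length(coef(fit)) + 1              # + 1 for the error variance
    -2 * as.numeric(logLik(fit)) + 2 * k    # by hand
    AIC(fit)                                # built-in; should match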
7 0.97389901 2058 andrew gelman stats-2013-10-11-Gladwell and Chabris, David and Goliath, and science writing as stone soup
8 0.97322124 2041 andrew gelman stats-2013-09-27-Setting up Jitts online
9 0.97312576 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”
10 0.97291386 1630 andrew gelman stats-2012-12-18-Postdoc positions at Microsoft Research – NYC
11 0.97255611 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering
12 0.97224599 1746 andrew gelman stats-2013-03-02-Fishing for cherries
13 0.9721303 1428 andrew gelman stats-2012-07-25-The problem with realistic advice?
14 0.97123635 731 andrew gelman stats-2011-05-26-Lottery probability update
15 0.97117448 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging
16 0.97087908 691 andrew gelman stats-2011-05-03-Psychology researchers discuss ESP
17 0.97078168 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks
18 0.97061884 114 andrew gelman stats-2010-06-28-More on Bayesian deduction-induction
19 0.97047961 2184 andrew gelman stats-2014-01-24-Parables vs. stories
20 0.9704656 2258 andrew gelman stats-2014-03-21-Random matrices in the news