
1476 andrew gelman stats-2012-08-30-Stan is fast


meta info for this blog

Source: html

Introduction: 10,000 iterations for 4 chains on the (precompiled) efficiently-parameterized 8-schools model:

> date ()
[1] "Thu Aug 30 22:12:53 2012"
> fit3 <- stan (fit=fit2, data = schools_dat, iter = 1e4, n_chains = 4)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Iteration: 10000 / 10000 [100%]  (Sampling)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Iteration: 10000 / 10000 [100%]  (Sampling)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Iteration: 10000 / 10000 [100%]  (Sampling)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Iteration: 10000 / 10000 [100%]  (Sampling)
> date ()
[1] "Thu Aug 30 22:12:55 2012"
> print (fit3)
Inference for Stan model: anon_model.
4 chains: each with iter=10000; warmup=5000; thin=1; 10000 iterations saved.

        mean se_mean  sd  2.5%  25%  50%  75% 97.5% n_eff Rhat
mu       8.0     0.1 5.1  -2.0  4.7  8.0 11.3  18.4  4032    1
tau      6.7     0.1 5.6   0.3  2.5  5.4  9.3  21.2  2958    1
eta[1]   0.4     0.0 0.9  -1.5  -0 …
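For orientation, here is a minimal sketch of the setup the transcript assumes: the classic 8-schools data and a non-centered ("efficiently-parameterized") model matching the mu, tau, and eta in the output above. The Stan code is written in current syntax and is an illustration, not the author's verbatim program; note that n_chains was the rstan argument name at the time (now chains).

library(rstan)

schools_code <- "
data {
  int<lower=0> J;            // number of schools
  vector[J] y;               // estimated treatment effects
  vector<lower=0>[J] sigma;  // standard errors of the estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta = mu + tau * eta;  // non-centered parameterization
}
model {
  eta ~ normal(0, 1);        // implies theta ~ normal(mu, tau)
  y ~ normal(theta, sigma);
}
"

schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 10))

fit2 <- stan(model_code = schools_code, data = schools_dat)  # compile once
fit3 <- stan(fit = fit2, data = schools_dat,
             iter = 1e4, chains = 4)                         # reuse, no recompile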


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 10,000 iterations for 4 chains on the (precompiled) efficiently-parameterized 8-schools model: > date () [1] "Thu Aug 30 22:12:53 2012" > fit3 <- stan (fit=fit2, data = schools_dat, iter = 1e4, n_chains = 4) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1). [sent-1, score-0.827]

2 Iteration: 10000 / 10000 [100%] (Sampling) > date () [1] "Thu Aug 30 22:12:55 2012" > print (fit3) Inference for Stan model: anon_model. [sent-5, score-0.125]

3 4 chains: each with iter=10000; warmup=5000; thin=1; 10000 iterations saved. [sent-6, score-0.29]

4 For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1); see the split-Rhat sketch after this list. [sent-162, score-0.437]

5 And, as you can see from the R-hats and effective sample sizes, 10,000 iterations is overkill here. [sent-165, score-0.476]

6 I’ll first simulate 800 schools’ worth of data, rerun, and see what happens (see the simulation sketch after this list). [sent-169, score-0.216]

7 That’s right, 4 chains of 1000 iterations (enough for convergence) for the 800 schools problem, in 10 seconds. [sent-171, score-0.632]

8 Well, it’s pretty horrible if you’re planning to do something with a billion data points. [sent-173, score-0.091]

9 For now, let me just point out that the 8 schools is not the ideal model to show the strengths of Stan vs. [sent-176, score-0.353]

10 The 8 schools model is conditionally conjugate and so Gibbs can work efficiently there. [sent-178, score-0.464]

11 Just for laffs, I tried the (nonconjugate) Student-t model (or, as Stan puts it, student_t) with no added parameterizations; I just replaced normal with student_t with 4 df (see the model-block sketch after this list). [sent-181, score-0.131]

12 The runs took 3 seconds for the 10,000 iterations of the 8 schools and 34 seconds for the 1000 iterations of the 800 schools. [sent-182, score-1.229]

13 But I think the reason it took a bit longer is not the nonconjugacy but just that we haven’t vectorized the student_t model yet. [sent-183, score-0.352]

14 That’s just a small implementation detail, not requiring any tricks or changes to the algorithm (see the vectorization sketch after this list). [sent-185, score-0.092]

15 These models did take 12 seconds each to compile. [sent-189, score-0.188]

16 Once it’s compiled, you can fit it immediately on new data without needing to recompile. [sent-191, score-0.137]
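On sentence 4 above: a minimal R sketch of split-Rhat for a single parameter, assuming draws is an iterations-by-chains matrix of post-warmup draws. rstan computes this for you; the sketch just shows the arithmetic behind the diagnostic:

split_rhat <- function(draws) {
  n <- floor(nrow(draws) / 2)
  # split each chain in half so within-chain drift inflates the diagnostic
  halves <- cbind(draws[1:n, , drop = FALSE],
                  draws[(n + 1):(2 * n), , drop = FALSE])
  B <- n * var(colMeans(halves))       # between-chain variance
  W <- mean(apply(halves, 2, var))     # within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # estimated marginal posterior variance
  sqrt(var_plus / W)                   # approaches 1 at convergence
}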
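On sentence 6: a hypothetical version of the 800-schools simulation. The true values and per-school standard errors below are made up for illustration (the post doesn't give them); only the shape of the exercise is the point:

set.seed(1)
J <- 800
theta_true <- rnorm(J, mean = 8, sd = 6)  # assumed true school effects
sigma <- runif(J, 9, 16)                  # assumed per-school standard errors
schools_big <- list(J = J,
                    y = rnorm(J, theta_true, sigma),
                    sigma = sigma)
# reuse the compiled model; 'n_chains' was the rstan argument name in 2012
fit_big <- stan(fit = fit2, data = schools_big, iter = 1000, n_chains = 4)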
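On sentence 11: the change amounts to the sampling statement for y in the model block. A hypothetical fragment, written with an explicit loop since (per sentence 13) student_t was not yet vectorized at the time; 4 is the degrees of freedom:

model {
  eta ~ normal(0, 1);
  for (j in 1:J)
    y[j] ~ student_t(4, theta[j], sigma[j]);  // was: y ~ normal(theta, sigma);
}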
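On sentences 13-14: vectorizing a distribution means accepting whole vectors in one sampling statement, so the log density is computed in a single call that can share work across observations instead of J separate calls from a loop. In current Stan, where student_t is vectorized, the loop above collapses to one line:

y ~ student_t(4, theta, sigma);  // one vectorized call instead of a loop over J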


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('eta', 0.502), ('theta', 0.324), ('iterations', 0.29), ('sampling', 0.23), ('iteration', 0.225), ('thu', 0.219), ('rhat', 0.2), ('seconds', 0.188), ('chains', 0.174), ('aug', 0.174), ('schools', 0.168), ('chain', 0.156), ('model', 0.131), ('iter', 0.12), ('vectorized', 0.116), ('stan', 0.116), ('took', 0.105), ('convergence', 0.089), ('date', 0.082), ('sample', 0.067), ('nonconjugate', 0.067), ('precompiled', 0.067), ('effective', 0.065), ('laffs', 0.063), ('conditionally', 0.063), ('warmup', 0.063), ('rerun', 0.06), ('tau', 0.056), ('mu', 0.056), ('parameterizations', 0.056), ('conjugate', 0.056), ('compiled', 0.054), ('overkill', 0.054), ('strengths', 0.054), ('thin', 0.05), ('needing', 0.049), ('simulate', 0.048), ('tricks', 0.047), ('efficiently', 0.046), ('billion', 0.046), ('sd', 0.046), ('data', 0.045), ('requiring', 0.045), ('reduction', 0.045), ('gibbs', 0.045), ('damn', 0.044), ('split', 0.043), ('print', 0.043), ('fit', 0.043), ('crude', 0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1476 andrew gelman stats-2012-08-30-Stan is fast


2 0.20389777 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega, i.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using Bayes factors. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1)p1(t1) d t1 / \int f2(y|t2)p2(t2) d t2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (the common mean of both sets of samples), and should be (mu1, mu2) in H1 (the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both are integrals. Each integral can be rewrit

3 0.17568199 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in economics, in utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes: Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities on all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather, I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

4 0.17393009 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

Introduction: Every once in a while I get a question that I can directly answer from my published research. When that happens it makes me so happy. Here’s an example. Patrick Lam wrote: Suppose one develops a Bayesian model to estimate a parameter theta. Now suppose one wants to evaluate the model via simulation by generating fake data where you know the value of theta and seeing how well you recover theta with your model, assuming that you use the posterior mean as the estimate. The traditional frequentist way of evaluating it might be to generate many datasets and see how well your estimator performs each time in terms of unbiasedness or mean squared error or something. But given that unbiasedness means nothing to a Bayesian and there is no repeated sampling interpretation in a Bayesian model, how would you suggest one would evaluate a Bayesian model? My reply: I actually have a paper on this! It is by Cook, Gelman, and Rubin. The idea is to draw theta from the prior distribution.

5 0.1506522 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

Introduction: Tomas Iesmantas writes: I’m facing a problem where the parameter space is bounded, e.g., all parameters have to be positive. If in MCMC I use a normal distribution as the proposal, then at some iterations I get negative proposals. So my question is: should I recalculate the acceptance probability every time I reject a proposal (something like the delayed rejection method), or do I have to use another proposal (lognormal, truncated normal, etc.)? The simplest solution is to just set p(theta)=0 for theta outside the legal region, thus rejecting those jumps. This will work fine (just remember that when you reject, you have to stay at the last value for one more iteration), but if you’re making these rejections all the time, you might want to reparameterize your space, for example using logs for positive parameters, logits for constrained parameters, and softmax for parameters that are constrained to sum to 1.

6 0.14971136 899 andrew gelman stats-2011-09-10-The statistical significance filter

7 0.14107987 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

8 0.14059834 2161 andrew gelman stats-2014-01-07-My recent debugging experience

9 0.13587445 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

10 0.11867672 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

11 0.11029533 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

12 0.10986167 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

13 0.10690106 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?

14 0.10672729 1950 andrew gelman stats-2013-07-22-My talks that were scheduled for Tues at the Data Skeptics meetup and Wed at the Open Statistical Programming meetup

15 0.10669054 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

16 0.10532069 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

17 0.10336915 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

18 0.10322066 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

19 0.10205717 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

20 0.10088801 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.106), (1, 0.115), (2, 0.018), (3, 0.03), (4, 0.066), (5, 0.047), (6, 0.02), (7, -0.065), (8, -0.021), (9, -0.097), (10, -0.049), (11, 0.001), (12, -0.071), (13, -0.015), (14, -0.046), (15, -0.068), (16, -0.006), (17, 0.022), (18, -0.001), (19, -0.021), (20, 0.026), (21, -0.04), (22, -0.024), (23, -0.032), (24, 0.026), (25, 0.017), (26, -0.049), (27, 0.026), (28, 0.04), (29, 0.036), (30, -0.04), (31, 0.003), (32, -0.044), (33, 0.025), (34, -0.033), (35, 0.084), (36, -0.003), (37, 0.012), (38, -0.052), (39, 0.023), (40, 0.06), (41, 0.052), (42, -0.062), (43, -0.028), (44, -0.026), (45, -0.056), (46, 0.042), (47, 0.004), (48, -0.019), (49, 0.041)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92119706 1476 andrew gelman stats-2012-08-30-Stan is fast


2 0.69848353 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing

Introduction: [Update: Revised given comments from Wingfeet, Andrew and germo. Thanks! I'd mistakenly translated the dlnorm priors in the first version --- amazing what a difference the priors make. I also escaped the less-than and greater-than signs in the constraints in the model so they're visible. I also updated to match the thin=2 output of JAGS.] We’re going to be starting a Stan “model of the P” (for some time period P) column, so I thought I’d kick things off with one of my own. I’ve been following the Wingvoet blog, the author of which is identified only by the Blogger handle Wingfeet; a couple of days ago this lovely post came out: PK calculation of IV and oral dosing in JAGS. Wingfeet’s post implemented an answer to question 6 of chapter 6 of Rowland and Tozer’s 2010 book, Clinical Pharmacokinetics and Pharmacodynamics, Fourth edition, Lippincott, Williams & Wilkins. So in the grand tradition of using this blog to procrastinate, I thought I’d t

3 0.68811518 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

Introduction: Hamiltonian Monte Carlo (HMC), as used by Stan, is only defined for continuous parameters. We’d love to be able to do discrete sampling. So I was excited when I saw this: Yichuan Zhang, Charles Sutton, Amos J. Storkey, and Zoubin Ghahramani. 2012. Continuous Relaxations for Discrete Hamiltonian Monte Carlo. NIPS 25. Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference a

4 0.67598951 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

Introduction: The Stan Model of the Week showcases research using Stan to push the limits of applied statistics. If you have a model that you would like to submit for a future post, then send us an email. Our inaugural post comes from Nathan Sanders, a graduate student finishing up his thesis on astrophysics at Harvard. Nathan writes, “Core-collapse supernovae, the luminous explosions of massive stars, exhibit an expansive and meaningful diversity of behavior in their brightness evolution over time (their “light curves”). Our group discovers and monitors these events using the Pan-STARRS1 telescope in Hawaii, and we’ve collected a dataset of about 20,000 individual photometric observations of about 80 Type IIP supernovae, the class my work has focused on. While this dataset provides one of the best available tools to infer the explosion properties of these supernovae, due to the nature of extragalactic astronomy (observing from distances greater than 1 billion light years), these light curves typicall

5 0.67072415 1363 andrew gelman stats-2012-06-03-Question about predictive checks

Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h

6 0.66865069 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

7 0.6652382 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

8 0.66221738 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

9 0.65340239 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

10 0.6447798 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT

11 0.64195329 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

12 0.64011586 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

13 0.625009 1580 andrew gelman stats-2012-11-16-Stantastic!

14 0.62058228 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

15 0.6163618 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

16 0.61595142 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

17 0.60486245 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

18 0.60197401 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

19 0.60053152 2161 andrew gelman stats-2014-01-07-My recent debugging experience

20 0.59741199 1401 andrew gelman stats-2012-06-30-David Hogg on statistics


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.013), (13, 0.013), (15, 0.016), (16, 0.036), (23, 0.012), (24, 0.158), (27, 0.013), (36, 0.372), (45, 0.012), (54, 0.013), (86, 0.021), (99, 0.172)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98925877 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing


same-blog 2 0.86481804 1476 andrew gelman stats-2012-08-30-Stan is fast


3 0.8538276 1797 andrew gelman stats-2013-04-10-“Proposition and experiment”

Introduction: Anna Lena Phillips writes : I. Many people will not, of their own accord, look at a poem. II. Millions of people will, of their own accord, spend lots and lots of time looking at photographs of cats. III. Therefore, earlier this year, I concluded that the best strategy for increasing the number of viewers for poems would be to print them on top of photographs of cats. IV. I happen to like looking at both poems and cats. V. So this is, for me, a win-win situation. VI. Fortunately, my own cat is a patient model, and (if I am to be believed) quite photogenic. VII. The aforementioned cat is Tisko Tansi, small hero. VII. Thus I present to you (albeit in digital rather than physical form) an Endearments broadside, featuring a poem that originally appeared in BlazeVOX spring 2011. VIII. If you want to share a copy of this image, please ask first. If you want a real copy, you can ask about that too. She follows up with an image of a cat, on which is superimposed a short

4 0.84997129 176 andrew gelman stats-2010-08-02-Information is good

Introduction: Washington Post and Slate reporter Anne Applebaum wrote a dismissive column about Wikileaks, saying that they “offer nothing more than raw data.” Applebaum argues that “The notion that the Internet can replace traditional news-gathering has just been revealed to be a myth. . . . without more journalism, more investigation, more work, these documents just don’t matter that much.” Fine. But don’t undervalue the role of mere data! The usual story is that we don’t get to see the raw data underlying newspaper stories. Wikileaks and other crowdsourced data can be extremely useful, whether or not they replace “traditional news-gathering.”

5 0.81868768 1478 andrew gelman stats-2012-08-31-Watercolor regression

Introduction: Solomon Hsiang writes: Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges). 1. I realized that if we plot the confidence intervals as a solid color that fades (e.g., using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fades near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (See “visually_weighted_fill_reverse,” attached.) My posted code has been updated to do this automatically. 2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background

6 0.75802612 551 andrew gelman stats-2011-02-02-Obama and Reagan, sitting in a tree, etc.

7 0.71114433 370 andrew gelman stats-2010-10-25-Who gets wedding announcements in the Times?

8 0.70847297 883 andrew gelman stats-2011-09-01-Arrow’s theorem update

9 0.69628632 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

10 0.68427896 1847 andrew gelman stats-2013-05-08-Of parsing and chess

11 0.68284458 1217 andrew gelman stats-2012-03-17-NSF program “to support analytic and methodological research in support of its surveys”

12 0.68108022 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”

13 0.67045128 1898 andrew gelman stats-2013-06-14-Progress! (on the understanding of the role of randomization in Bayesian inference)

14 0.66419429 1851 andrew gelman stats-2013-05-11-Actually, I have no problem with this graph

15 0.65885806 2105 andrew gelman stats-2013-11-18-What’s my Kasparov number?

16 0.65435016 415 andrew gelman stats-2010-11-15-The two faces of Erving Goffman: Subtle observer of human interactions, and Smug organzation man

17 0.6457181 55 andrew gelman stats-2010-05-27-In Linux, use jags() to call Jags instead of using bugs() to call OpenBugs

18 0.6308006 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

19 0.60675687 1666 andrew gelman stats-2013-01-10-They’d rather be rigorous than right

20 0.5865863 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0