andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2161 knowledge-graph by maker-knowledge-mining

2161 andrew gelman stats-2014-01-07-My recent debugging experience


meta info for this blog

Source: html

Introduction: OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out—or if it doesn’t—I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear regression with normal priors and known data variance, because then the exact solution is Gaussian and I can also work with the problem analytically. So I programmed up the algorithm and, no surprise, it didn’t work. I went through my R code, put in print statements here and there, and cleared out bug after bug until at least it stopped crashing. But the algorithm still wasn’t doing what it was supposed to do. So I decided to do something simpler, and just check that the Stan linear regression gave the same answer as the analytic posterior distribution: I ran Stan for tons of iterations, then computed the sample mean and variance of the simulations. It was an example with two coefficients—I’d originally cho


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I was working on a new idea (still working on it; if it ultimately works out—or if it doesn’t—I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. [sent-2, score-0.066]

2 I thought it would make sense to start with linear regression with normal priors and known data variance, because then the exact solution is Gaussian and I can also work with the problem analytically. [sent-3, score-0.522]
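The analytic solution mentioned here is the standard conjugate result: for y = Xb + e with e ~ N(0, sigma^2 I), prior b ~ N(0, tau^2 I), and sigma known, the posterior is Gaussian with covariance Sigma = (X'X/sigma^2 + I/tau^2)^-1 and mean Sigma X'y/sigma^2. A minimal NumPy sketch of that formula (the variable names and simulated data are mine, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data with known noise sd sigma and prior sd tau
n, k = 100, 2
sigma, tau = 1.0, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
b_true = np.array([0.5, -1.0])
y = X @ b_true + sigma * rng.normal(size=n)

# Conjugate posterior: Sigma = (X'X/sigma^2 + I/tau^2)^-1, mu = Sigma X'y/sigma^2
Sigma = np.linalg.inv(X.T @ X / sigma**2 + np.eye(k) / tau**2)
mu = Sigma @ X.T @ y / sigma**2

print("posterior mean:", mu)
print("posterior covariance:\n", Sigma)
```

Having these two quantities in closed form is what makes the debugging strategy below possible: any correct sampler must reproduce them up to Monte Carlo error.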

3 So I programmed up the algorithm and, no surprise, it didn’t work. [sent-4, score-0.15]

4 I went through my R code, put in print statements here and there, and cleared out bug after bug until at least it stopped crashing. [sent-5, score-0.681]

5 But the algorithm still wasn’t doing what it was supposed to do. [sent-6, score-0.161]

6 So I decided to do something simpler, and just check that the Stan linear regression gave the same answer as the analytic posterior distribution: I ran Stan for tons of iterations, then computed the sample mean and variance of the simulations. [sent-7, score-0.646]
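The check described here — run the sampler for many iterations, then compare the empirical mean and covariance of the draws to the analytic values — can be sketched generically. This uses NumPy draws as a stand-in for sampler output, and the tolerances are illustrative, not the ones used in the post:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for sampler draws from a known bivariate normal posterior
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
draws = rng.multivariate_normal(mu, Sigma, size=40000)

est_mean = draws.mean(axis=0)
est_cov = np.cov(draws, rowvar=False)

# Monte Carlo standard error of the mean is roughly sd/sqrt(n) for independent draws
mc_se = np.sqrt(np.diag(Sigma) / len(draws))
print("estimated mean:", est_mean, "MC se:", mc_se)
print("estimated covariance:\n", est_cov)

assert np.all(np.abs(est_mean - mu) < 5 * mc_se)
assert np.all(np.abs(est_cov - Sigma) < 0.05)
```

With correlated MCMC draws the effective sample size is smaller than the raw draw count, so the tolerance on the mean would need to be widened accordingly.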

7 The means looked fine but the covariance matrix from the Stan simulations was off. [sent-9, score-0.297]

8 The correlations were wrong—not by a lot, but by a nonzero amount; I think the value from the formula was 0. [sent-10, score-0.222]

9 Which I did, actually, but fixing the formula didn’t solve the problem. [sent-15, score-0.224]

10 I also tried direct simulation, and that gave the right answer too. [sent-16, score-0.248]

11 So I just fed Stan the posterior distribution directly. [sent-18, score-0.13]

12 I simplified further, forget regression entirely, just give independent normal priors:

parameters {
  real b1;
  real b2;
}
model {
  b1 ~ normal(0, 1);
  b2 ~ normal(0, 1);
}

You can’t get much more stripped down than that. [sent-20, score-1.014]
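Under this model, correct draws of b1 and b2 should be independent, so their empirical correlation should be near zero (on the order of 1/sqrt(n) for n independent draws). A quick version of that check, with NumPy draws standing in for the sampler's output:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for draws of b1, b2 under independent normal(0, 1) priors
n = 40000
b1 = rng.normal(size=n)
b2 = rng.normal(size=n)

r = np.corrcoef(b1, b2)[0, 1]
print("empirical correlation:", r)

# For truly independent draws, |r| should be roughly 1/sqrt(n), here about 0.005
assert abs(r) < 0.05
```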

13-14 Still had the problem:

> library("rstan")
> gaussian2 <- stan(file="gaussian2.stan", iter=20000, chains=4)
> sims <- extract(gaussian2)$b1
> print(mean(sims))
[1] -0. [sent-21, score-0.425] [sent-22, score-0.677]

15-16 I then tried to go even simpler, to one dimension:

parameters {
  real b1;
}
model {
  b1 ~ normal(0, 1);
}

This time it gave the right answer:

> gaussian1 <- stan(file="gaussian1.stan", iter=20000, chains=4)
> sims <- extract(gaussian1)$b1
> print(mean(sims))
[1] 0. [sent-27, score-0.919] [sent-28, score-0.677]

17 I have two adjacent examples, one where Stan works and one where it doesn’t. [sent-31, score-0.147]

18 Just to be clear: the above is not meant to represent exemplary practice. [sent-35, score-0.073]

19 It seems to be a general rule of programming, and of research, that no matter how simple we start, we should start even simpler, to get that solid rock of computational certainty on which we can stand while building our complex structures. [sent-37, score-0.081]

20 The indents just don’t show up when I use the “code” tag in html. [sent-42, score-0.15]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('stan', 0.425), ('sims', 0.413), ('print', 0.264), ('normal', 0.189), ('bug', 0.17), ('formula', 0.16), ('simpler', 0.147), ('iter', 0.147), ('var', 0.134), ('extract', 0.116), ('covariance', 0.108), ('chains', 0.107), ('regression', 0.104), ('gave', 0.101), ('file', 0.098), ('matrix', 0.096), ('simulations', 0.093), ('mean', 0.093), ('algorithm', 0.086), ('adjacent', 0.081), ('indent', 0.081), ('indents', 0.081), ('prospectively', 0.081), ('start', 0.081), ('answer', 0.077), ('cleared', 0.077), ('priors', 0.076), ('still', 0.075), ('variance', 0.073), ('exemplary', 0.073), ('linear', 0.072), ('code', 0.071), ('progression', 0.071), ('stripped', 0.071), ('ep', 0.071), ('tried', 0.07), ('real', 0.069), ('simplified', 0.069), ('tag', 0.069), ('html', 0.067), ('posterior', 0.067), ('works', 0.066), ('parameters', 0.065), ('programmed', 0.064), ('fixing', 0.064), ('fed', 0.063), ('nonzero', 0.062), ('rstan', 0.062), ('forthcoming', 0.059), ('analytic', 0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 2161 andrew gelman stats-2014-01-07-My recent debugging experience


2 0.27720076 1475 andrew gelman stats-2012-08-30-A Stan is Born

Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.

3 0.27111608 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

Introduction: We’re happy to announce the release of Stan C++, CmdStan, RStan, and PyStan 2.1.0.  This is a minor feature release, but it is also an important bug fix release.  As always, the place to start is the (all new) Stan web pages: http://mc-stan.org   Major Bug in 2.0.0, 2.0.1 Stan 2.0.0 and Stan 2.0.1 introduced a bug in the implementation of the NUTS criterion that led to poor tail exploration and thus biased the posterior uncertainty downward.  There was no bug in NUTS in Stan 1.3 or earlier, and 2.1 has been extensively tested and tests put in place so this problem will not recur. If you are using Stan 2.0.0 or 2.0.1, you should switch to 2.1.0 as soon as possible and rerun any models you care about.   New Target Acceptance Rate Default for Stan 2.1.0 Another big change aimed at reducing posterior estimation bias was an increase in the target acceptance rate during adaptation from 0.65 to 0.80.  The bad news is that iterations will take around 50% longer

4 0.23895 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

Introduction: Kevin Cartier writes: I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect). My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort

5 0.23537582 1748 andrew gelman stats-2013-03-04-PyStan!

Introduction: Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely free and open-source. P.S. Because Stan is open-source, it of course would also be possible for people to translate Stan into Python, or to take whatever features they like from Stan and incorporate them into a Python package. That’s fine too. But we think it would make sense in addition for users to be able to run Stan directly from Python, in the same way that it can be run from R.

6 0.22291826 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0

7 0.21798989 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0

8 0.21699868 1580 andrew gelman stats-2012-11-16-Stantastic!

9 0.20728654 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

10 0.20088065 952 andrew gelman stats-2011-10-11-More reason to like Sims besides just his name

11 0.19937719 1950 andrew gelman stats-2013-07-22-My talks that were scheduled for Tues at the Data Skeptics meetup and Wed at the Open Statistical Programming meetup

12 0.19068216 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

13 0.18040852 2332 andrew gelman stats-2014-05-12-“The results (not shown) . . .”

14 0.17807598 1710 andrew gelman stats-2013-02-06-The new Stan 1.1.1, featuring Gaussian processes!

15 0.17724152 1886 andrew gelman stats-2013-06-07-Robust logistic regression

16 0.16712348 1941 andrew gelman stats-2013-07-16-Priors

17 0.15731955 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list

18 0.15353374 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)

19 0.14578873 1855 andrew gelman stats-2013-05-13-Stan!

20 0.14331783 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.187), (1, 0.117), (2, -0.023), (3, 0.105), (4, 0.141), (5, 0.062), (6, 0.083), (7, -0.274), (8, -0.116), (9, -0.118), (10, -0.142), (11, -0.001), (12, -0.1), (13, -0.059), (14, 0.071), (15, -0.064), (16, -0.036), (17, 0.073), (18, -0.0), (19, -0.004), (20, -0.018), (21, -0.044), (22, -0.048), (23, 0.008), (24, 0.033), (25, 0.009), (26, 0.018), (27, -0.056), (28, -0.089), (29, -0.01), (30, 0.043), (31, 0.036), (32, -0.001), (33, 0.01), (34, 0.011), (35, 0.014), (36, -0.007), (37, 0.058), (38, -0.026), (39, -0.038), (40, 0.011), (41, 0.014), (42, 0.003), (43, -0.023), (44, 0.018), (45, 0.001), (46, 0.015), (47, 0.004), (48, 0.034), (49, -0.001)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97438568 2161 andrew gelman stats-2014-01-07-My recent debugging experience


2 0.9242689 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0


3 0.88354063 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0

Introduction: We’re happy to announce the availability of Stan and RStan versions 1.1.0, which are general tools for performing model-based Bayesian inference using the no-U-turn sampler, an adaptive form of Hamiltonian Monte Carlo. Information on downloading and installing and using them is available as always from Stan Home Page: http://mc-stan.org/ Let us know if you have any problems on the mailing lists or at the e-mails linked on the home page (please don’t use this web page). The full release notes follow. (R)Stan Version 1.1.0 Release Notes =================================== -- Backward Compatibility Issue * Categorical distribution recoded to match documentation; it now has support {1,...,K} rather than {0,...,K-1}. * (RStan) change default value of permuted flag from FALSE to TRUE for Stan fit S4 extract() method -- New Features * Conditional (if-then-else) statements * While statements -- New Functions * generalized multiply_lower_tri

4 0.88328844 1475 andrew gelman stats-2012-08-30-A Stan is Born


5 0.88193733 1580 andrew gelman stats-2012-11-16-Stantastic!

Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This a “hurdle” model, in which a bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a

6 0.88192451 1748 andrew gelman stats-2013-03-04-PyStan!

7 0.86235988 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0

8 0.85796207 712 andrew gelman stats-2011-05-14-The joys of working in the public domain

9 0.83652389 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!

10 0.82518947 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list

11 0.82323897 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

12 0.81920964 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

13 0.81707519 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore

14 0.8141396 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

15 0.80877191 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing

16 0.80815017 1855 andrew gelman stats-2013-05-13-Stan!

17 0.79200774 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models

18 0.78299844 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

19 0.78273189 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

20 0.78059751 1710 andrew gelman stats-2013-02-06-The new Stan 1.1.1, featuring Gaussian processes!


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(6, 0.031), (13, 0.02), (16, 0.088), (24, 0.185), (36, 0.048), (42, 0.018), (44, 0.028), (49, 0.039), (55, 0.043), (57, 0.018), (65, 0.064), (86, 0.054), (89, 0.025), (99, 0.221)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95467579 2161 andrew gelman stats-2014-01-07-My recent debugging experience


2 0.93538618 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an an exam with 28 questions and 15 students. You fit a logistic item- response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q

3 0.93097937 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey

Introduction: Daniel Sgroi and Andrew Oswald write : Many governments wish to assess the quality of their universities. A prominent example is the UK’s new Research Excellence Framework (REF) 2014. In the REF, peer-review panels will be provided with information on publications and citations. This paper suggests a way in which panels could choose the weights to attach to these two indicators. The analysis draws in an intuitive way on the concept of Bayesian updating (where citations gradually reveal information about the initially imperfectly-observed importance of the research). Our study should not be interpreted as the argument that only mechanistic measures ought to be used in a REF. I agree that, if you’re going to choose a weighted average, it makes sense to think about where the weights are coming from. Some aspects of Sgroi and Oswald’s proposal remind me of the old idea of evaluating journal articles by expected number of total citations. The idea is that you’d use four pieces of i

4 0.93095803 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

Introduction: Arnaud Trolle (no relation ) writes: I have a question about the interpretation of (non-)overlapping of 95% credibility intervals. In a Bayesian ANOVA (a within-subjects one), I computed 95% credibility intervals about the main effects of a factor. I’d like to compare two by two the main effects across the different conditions of the factor. Can I directly interpret the (non-)overlapping of these credibility intervals and make the following statements: “As the 95% credibility intervals do not overlap, both conditions have significantly different main effects” or conversely “As the 95% credibility intervals overlap, the main effects of both conditions are not significantly different, i.e. equivalent”? I heard that, in the case of classical confidence intervals, the second statement is false, but what happens when working within a Bayesian framework? My reply: I think it makes more sense to directly look at inference for the difference. Also, your statements about equivalence

5 0.92972916 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys

Introduction: 25. You are using multilevel regression and poststratification (MRP) to a survey of 1500 people to estimate support for the space program, by state. The model is fit using, as a state- level predictor, the Republican presidential vote in the state, which turns out to have a low correlation with support for the space program. Which of the following statements are basically true? (Indicate all that apply.) (a) For small states, the MRP estimates will be determined almost entirely by the demo- graphic characteristics of the respondents in the sample from that state. (b) For small states, the MRP estimates will be determined almost entirely by the demographic characteristics of the population in that state. (c) Adding a predictor specifically for this model (for example, a measure of per-capita space-program spending in the state) could dramatically improve the estimates of state-level opinion. (d) It would not be appropriate to add a predictor such as per-capita space-program spen

6 0.92952955 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

7 0.92951989 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?

8 0.92691725 846 andrew gelman stats-2011-08-09-Default priors update?

9 0.92498016 1197 andrew gelman stats-2012-03-04-“All Models are Right, Most are Useless”

10 0.92443848 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

11 0.92250574 1881 andrew gelman stats-2013-06-03-Boot

12 0.9217754 758 andrew gelman stats-2011-06-11-Hey, good news! Your p-value just passed the 0.05 threshold!

13 0.92142212 671 andrew gelman stats-2011-04-20-One more time-use graph

14 0.9212659 1475 andrew gelman stats-2012-08-30-A Stan is Born

15 0.92080534 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

16 0.92073572 1851 andrew gelman stats-2013-05-11-Actually, I have no problem with this graph

17 0.92055839 1240 andrew gelman stats-2012-04-02-Blogads update

18 0.92054951 693 andrew gelman stats-2011-05-04-Don’t any statisticians work for the IRS?

19 0.92038465 2074 andrew gelman stats-2013-10-23-Can’t Stop Won’t Stop Mister P Beatdown

20 0.91890919 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon