andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1019 knowledge-graph by maker-knowledge-mining

1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles


meta info for this blog

Source: html

Introduction: I love this stuff: This article presents a simulation-based method designed to establish the computational correctness of software developed to fit a specific Bayesian model, capitalizing on properties of Bayesian posterior distributions. We illustrate the validation technique with two examples. The validation method is shown to find errors in software when they exist and, moreover, the validation output can be informative about the nature and location of such errors. We also compare our method with that of an earlier approach. I hope we can put it into Stan.
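The procedure the abstract describes (draw the "true" parameter from the prior, simulate data from it, fit the model, and locate the true value among the posterior draws, whose quantiles should be uniform if the software is correct) can be sketched on a toy conjugate-normal model. Here the posterior is known in closed form, so the "software under test" is simply an exact sampler; the model, sample sizes, and replication count are illustrative assumptions, not the article's own examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_replication(n_obs=10, n_post=200):
    # Draw the "true" parameter from the prior: theta ~ N(0, 1).
    theta0 = rng.normal(0.0, 1.0)
    # Simulate data from the model: y_i | theta ~ N(theta, 1).
    y = rng.normal(theta0, 1.0, size=n_obs)
    # Exact conjugate posterior: N(n * ybar / (n + 1), 1 / (n + 1)).
    post_mean = n_obs * y.mean() / (n_obs + 1)
    post_sd = np.sqrt(1.0 / (n_obs + 1))
    draws = rng.normal(post_mean, post_sd, size=n_post)
    # Posterior quantile of the true value among the posterior draws.
    return (draws < theta0).mean()

q = np.array([one_replication() for _ in range(1000)])
# If the sampler matches the model, q is approximately Uniform(0, 1),
# so its mean should be near 0.5 and its variance near 1/12.
print(q.mean(), q.var())
```

A buggy sampler (say, one that forgets the prior term in the posterior mean) would push these quantiles away from uniformity, which is exactly the signal the validation method looks for.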


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I love this stuff : This article presents a simulation-based method designed to establish the computational correctness of software developed to fit a specific Bayesian model, capitalizing on properties of Bayesian posterior distributions. [sent-1, score-2.372]

2 The validation method is shown to find errors in software when they exist and, moreover, the validation output can be informative about the nature and location of such errors. [sent-3, score-2.607]

3 We also compare our method with that of an earlier approach. [sent-4, score-0.514]

4 I hope we can put it into Stan. [sent-5, score-0.16]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('validation', 0.567), ('method', 0.272), ('software', 0.239), ('capitalizing', 0.235), ('correctness', 0.205), ('establish', 0.181), ('moreover', 0.168), ('technique', 0.162), ('properties', 0.157), ('location', 0.156), ('output', 0.155), ('designed', 0.146), ('presents', 0.145), ('illustrate', 0.141), ('exist', 0.141), ('developed', 0.132), ('bayesian', 0.129), ('shown', 0.128), ('computational', 0.126), ('compare', 0.114), ('informative', 0.114), ('stan', 0.108), ('nature', 0.108), ('specific', 0.105), ('posterior', 0.102), ('love', 0.102), ('errors', 0.1), ('hope', 0.097), ('stuff', 0.096), ('earlier', 0.095), ('fit', 0.08), ('put', 0.063), ('find', 0.06), ('model', 0.049), ('article', 0.049), ('two', 0.047), ('also', 0.033)]
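For readers unfamiliar with the weights above, here is a minimal, self-contained sketch of how tf-idf weights and cosine similarity between posts can be computed. The three toy "documents" and the unsmoothed idf formula are assumptions for illustration only, not the actual pipeline that generated this page.

```python
import math
from collections import Counter

docs = [
    "validation of software for bayesian models using posterior quantiles",
    "cross validation to check missing data imputation",
    "coming to agreement on philosophy and statistics",
]

def tfidf_vectors(docs):
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        # Weight = term frequency times inverse document frequency.
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf_vectors(docs)
# Posts sharing a distinctive term ("validation") get a positive score;
# posts with no overlapping terms score zero.
print(cosine(vecs[0], vecs[1]))
print(cosine(vecs[0], vecs[2]))
```

The simValue numbers in the lists below are cosine similarities of exactly this kind, just computed over full post vocabularies.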

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles


2 0.13962997 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

Introduction: Aureliano Crameri writes: I have questions regarding one technique you and your colleagues described in your papers: cross-validation (Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, with reference to Gelman, King, and Liu, 1998). I think this is the technique I need for my purpose, but I am not sure I understand it right. I want to use the multiple imputation to estimate the outcome of psychotherapies based on longitudinal data. First I have to demonstrate that I am able to get unbiased estimates with the multiple imputation. The expected bias is the overestimation of the outcome of dropouts. I will test my imputation strategies by means of a series of simulations (delete values, impute, compare with the original). Due to the complexity of the statistical analyses I think I need at least 200 cases. Now I don’t have so many cases without any missing values. My data have missing values in different variables. The proportion of missing values is
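The delete-impute-compare simulation loop described in the email can be sketched as follows. Simple mean imputation stands in for the mi package's multiple imputation, purely to show the shape of the check; the data-generating model and the 20% missingness rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Complete synthetic data: one column to be degraded and re-imputed.
x = rng.normal(10.0, 2.0, size=500)

# Delete roughly 20% of the values completely at random.
mask = rng.random(x.size) < 0.2
observed = x.copy()
observed[mask] = np.nan

# A deliberately simple imputation (mean imputation) as a stand-in
# for a real multiple-imputation procedure.
imputed = observed.copy()
imputed[mask] = np.nanmean(observed)

# Compare the imputed values against the deleted originals.
rmse = np.sqrt(np.mean((imputed[mask] - x[mask]) ** 2))
bias = np.mean(imputed[mask] - x[mask])
print(rmse, bias)
```

With values deleted completely at random, mean imputation is unbiased but has RMSE near the data's standard deviation; a good multiple-imputation strategy would keep the bias near zero while also propagating the imputation uncertainty, which this sketch does not do.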

3 0.12262027 1761 andrew gelman stats-2013-03-13-Lame Statistics Patents

Introduction: Manoel Galdino wrote in a comment off-topic on another post (which I erased): I know you commented before about patents on statistical methods. Did you know this patent (http://www.archpatent.com/patents/8032473)? Do you have any comment on patents that don’t describe mathematically how it works and how and if they’re any different from previous methods? And what about the lack of scientific validation of the claims in such a method? The patent in question, US 8032473, “Generalized reduced error logistic regression method,” begins with the following “claim”: A system for machine learning comprising: a computer including a computer-readable medium having software stored thereon that, when executed by said computer, performs a method comprising the steps of being trained to learn a logistic regression match to a target class variable so to exhibit classification learning by which: an estimated error in each variable’s moment in the logistic regression be modeled and reduce

4 0.12064366 1205 andrew gelman stats-2012-03-09-Coming to agreement on philosophy of statistics

Introduction: Deborah Mayo collected some reactions to my recent article, Induction and Deduction in Bayesian Data Analysis. I’m pleased that everybody (philosopher Mayo, applied statistician Stephen Senn, and theoretical statistician Larry Wasserman) is so positive about my article and that nobody’s defending the sort of hard-core inductivism that’s featured on the Bayesian inference wikipedia page. Here’s the Wikipedia definition, which I disagree with: Bayesian inference uses aspects of the scientific method, which involves collecting evidence that is meant to be consistent or inconsistent with a given hypothesis. As evidence accumulates, the degree of belief in a hypothesis ought to change. With enough evidence, it should become very high or very low. . . . Bayesian inference uses a numerical estimate of the degree of belief in a hypothesis before evidence has been observed and calculates a numerical estimate of the degree of belief in the hypothesis after evidence has been obse

5 0.11049464 1469 andrew gelman stats-2012-08-25-Ways of knowing

Introduction: In this discussion from last month, computer science student and Judea Pearl collaborator Elias Bareinboim expressed an attitude that hierarchical Bayesian methods might be fine in practice but that they lack theory, that Bayesians can’t succeed in toy problems. I posted a P.S. there which might not have been noticed so I will put it here: I now realize that there is some disagreement about what constitutes a “guarantee.” In one of his comments, Bareinboim writes, “the assurance we have that the result must hold as long as the assumptions in the model are correct should be regarded as a guarantee.” In that sense, yes, we have guarantees! It is fundamental to Bayesian inference that the result must hold if the assumptions in the model are correct. We have lots of that in Bayesian Data Analysis (particularly in the first four chapters but implicitly elsewhere as well), and this is also covered in the classic books by Lindley, Jaynes, and others. This sort of guarantee is indeed p

6 0.10399823 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions

7 0.10026748 1514 andrew gelman stats-2012-09-28-AdviseStat 47% Campaign Ad

8 0.099442117 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

9 0.094544552 1214 andrew gelman stats-2012-03-15-Of forecasts and graph theory and characterizing a statistical method by the information it uses

10 0.094488256 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults

11 0.092278905 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

12 0.090884149 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity

13 0.088999331 932 andrew gelman stats-2011-09-30-Articles on the philosophy of Bayesian statistics by Cox, Mayo, Senn, and others!

14 0.088232681 6 andrew gelman stats-2010-04-27-Jelte Wicherts lays down the stats on IQ

15 0.07831347 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

16 0.076968059 496 andrew gelman stats-2011-01-01-Tukey’s philosophy

17 0.076708145 2368 andrew gelman stats-2014-06-11-Bayes in the research conversation

18 0.075971216 1431 andrew gelman stats-2012-07-27-Overfitting

19 0.075853184 1497 andrew gelman stats-2012-09-15-Our blog makes connections!

20 0.075805359 1948 andrew gelman stats-2013-07-21-Bayes related


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.096), (1, 0.11), (2, -0.051), (3, 0.043), (4, -0.015), (5, 0.017), (6, -0.035), (7, -0.059), (8, -0.026), (9, -0.067), (10, -0.009), (11, -0.032), (12, -0.039), (13, 0.026), (14, -0.001), (15, 0.009), (16, 0.031), (17, 0.02), (18, -0.034), (19, 0.025), (20, -0.004), (21, 0.024), (22, -0.022), (23, -0.012), (24, 0.034), (25, -0.013), (26, 0.005), (27, 0.003), (28, 0.024), (29, -0.013), (30, 0.048), (31, -0.0), (32, 0.065), (33, -0.019), (34, 0.011), (35, 0.038), (36, -0.012), (37, 0.024), (38, 0.017), (39, -0.031), (40, 0.018), (41, 0.001), (42, -0.03), (43, 0.025), (44, 0.054), (45, 0.003), (46, 0.002), (47, 0.018), (48, 0.022), (49, 0.042)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96943319 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles


2 0.77369565 2349 andrew gelman stats-2014-05-26-WAIC and cross-validation in Stan!

Introduction: Aki and I write: The Watanabe-Akaike information criterion (WAIC) and cross-validation are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model. WAIC is based on the series expansion of leave-one-out cross-validation (LOO), and asymptotically they are equal. With finite data, WAIC and cross-validation address different predictive questions and thus it is useful to be able to compute both. WAIC and an importance-sampling approximated LOO can be estimated directly using the log-likelihood evaluated at the posterior simulations of the parameter values. We show how to compute WAIC, IS-LOO, K-fold cross-validation, and related diagnostic quantities in the Bayesian inference package Stan as called from R. This is important, I think. One reason the deviance information criterion (DIC) has been so popular is its implementation in Bugs. We think WAIC and cross-validation make more sense than DIC, especially from a Bayesian perspective in whic
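The WAIC computation described above (a pointwise log predictive density computed from the posterior log-likelihood matrix, minus a pointwise-variance penalty) can be sketched in a few lines. The toy normal model and approximate posterior draws below are illustrative assumptions, not the paper's Stan/R implementation.

```python
import numpy as np

def waic(log_lik):
    """log_lik: S x n matrix of log p(y_i | theta_s) over S posterior draws."""
    # Pointwise log predictive density, computed stably via log-sum-exp.
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(log_lik - m), axis=0)))
    # Effective number of parameters: pointwise posterior variance
    # of the log-likelihood.
    p_waic = np.sum(log_lik.var(axis=0, ddof=1))
    return -2 * (lppd - p_waic)

# Toy example: normal model with known sd; approximate posterior draws
# of the mean under a flat prior.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50)
theta = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=1000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - theta[:, None]) ** 2
print(waic(log_lik))
```

Because only the S-by-n log-likelihood matrix is needed, the same function works on draws produced by any sampler, which is what makes WAIC (and IS-LOO) convenient to bolt onto Stan output.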

3 0.68582308 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct

Introduction: Stan: open-source Bayesian inference Speaker: Andrew Gelman, Columbia University Date: Thursday, October 11 2012 Time: 4:00PM to 5:00PM Location: 32-D507 Host: Polina Golland, CSAIL Contact: Polina Golland, 6172538005, polina@csail.mit.edu Stan (mc-stan.org) is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. We discuss how Stan works and what it can do, the problems that motivated us to write Stan, current challenges, and areas of planned development, including tools for improved generality and usability, more efficient sampling algorithms, and fuller integration of model building, model checking, and model understanding in Bayesian data analysis. P.S. Here’s the talk.

4 0.68100309 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics

Introduction: Burak Bayramli writes: In this paper by Sungjin Ahn, Anoop Korattikara, and Max Welling and this paper by Welling and Yee Whye Teh, there are some arguments on big data and the use of MCMC. Both papers have suggested improvements to speed up MCMC computations. I was wondering what your thoughts were, especially on this paragraph: When a dataset has a billion data-cases (as is not uncommon these days) MCMC algorithms will not even have generated a single (burn-in) sample when a clever learning algorithm based on stochastic gradients may already be making fairly good predictions. In fact, the intriguing results of Bottou and Bousquet (2008) seem to indicate that in terms of “number of bits learned per unit of computation”, an algorithm as simple as stochastic gradient descent is almost optimally efficient. We therefore argue that for Bayesian methods to remain useful in an age when the datasets grow at an exponential rate, they need to embrace the ideas of the stochastic optimiz
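The stochastic gradient Langevin dynamics update the quoted papers discuss (a half-step of a minibatch estimate of the log-posterior gradient plus injected Gaussian noise) can be sketched on a toy Gaussian model. The fixed step size, batch size, and prior below are simplifying assumptions; the theory calls for a decreasing step-size schedule.

```python
import numpy as np

rng = np.random.default_rng(2)

# Data: N observations from N(mu_true, 1); the target is the posterior
# of mu under a diffuse N(0, 10^2) prior.
N, mu_true = 10_000, 3.0
y = rng.normal(mu_true, 1.0, size=N)

def sgld(y, steps=5000, batch=100, eps=1e-4):
    theta = 0.0
    samples = []
    for _ in range(steps):
        idx = rng.integers(0, len(y), size=batch)
        # Stochastic gradient of the log posterior: prior term plus the
        # minibatch likelihood term rescaled by N / batch.
        grad = -theta / 100.0 + (len(y) / batch) * np.sum(y[idx] - theta)
        # Langevin update: half-step of the gradient plus injected noise
        # with variance equal to the step size.
        theta = theta + 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps))
        samples.append(theta)
    return np.array(samples)

draws = sgld(y)
# The late draws should concentrate near the posterior mean, ~3.0.
print(draws[-2000:].mean())
```

Note the trade-off the paragraph is pointing at: each update touches only 100 of the 10,000 data points, so useful (if slightly overdispersed, since there is no Metropolis correction) posterior draws appear long before a full-data MCMC sweep would finish.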

5 0.67462587 449 andrew gelman stats-2010-12-04-Generalized Method of Moments, whatever that is

Introduction: Xuequn Hu writes: I am an econ doctoral student, trying to do some empirical work using Bayesian methods. Recently I read a paper (and its discussion) that pitches Bayesian methods against GMM (Generalized Method of Moments), which is quite popular in econometrics for frequentists. I am wondering if you can, here or on your blog, give some insights about these two methods, from the perspective of a Bayesian statistician. I know GMM does not conform to the likelihood principle, but Bayesians are often charged with strong distribution assumptions. I can’t actually help on this, since I don’t know what GMM is. My guess is that, like other methods that don’t explicitly use prior estimation, this method will work well if sufficient information is included as data. Which would imply a hierarchical structure.

6 0.66458279 1205 andrew gelman stats-2012-03-09-Coming to agreement on philosophy of statistics

7 0.6576491 1648 andrew gelman stats-2013-01-02-A important new survey of Bayesian predictive methods for model assessment, selection and comparison

8 0.65749818 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT

9 0.64703333 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks

10 0.63939661 2254 andrew gelman stats-2014-03-18-Those wacky anti-Bayesians used to be intimidating, but now they’re just pathetic

11 0.63921899 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”

12 0.63867515 1950 andrew gelman stats-2013-07-22-My talks that were scheduled for Tues at the Data Skeptics meetup and Wed at the Open Statistical Programming meetup

13 0.63827097 904 andrew gelman stats-2011-09-13-My wikipedia edit

14 0.63794535 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

15 0.63613099 1438 andrew gelman stats-2012-07-31-What is a Bayesian?

16 0.63486272 1719 andrew gelman stats-2013-02-11-Why waste time philosophizing?

17 0.62644017 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

18 0.62426114 1712 andrew gelman stats-2013-02-07-Philosophy and the practice of Bayesian statistics (with all the discussions!)

19 0.62254 1749 andrew gelman stats-2013-03-04-Stan in L.A. this Wed 3:30pm

20 0.61818957 1571 andrew gelman stats-2012-11-09-The anti-Bayesian moment and its passing


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(1, 0.027), (16, 0.163), (21, 0.023), (24, 0.168), (40, 0.031), (41, 0.099), (45, 0.029), (86, 0.072), (89, 0.025), (99, 0.216)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9547888 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles


2 0.91693461 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

Introduction: I just read this article on the treatment of medical volunteers, written by doctor and bioethicist Carl Elliott. As a statistician who has done a small amount of consulting for pharmaceutical companies, I have a slightly different perspective. As a doctor, Elliott focuses on individual patients, whereas, as a statistician, I’ve been trained to focus on the goal of accurately estimating treatment effects. I’ll go through Elliott’s article and give my reactions. Elliott: In Miami, investigative reporters for Bloomberg Markets magazine discovered that a contract research organisation called SFBC International was testing drugs on undocumented immigrants in a rundown motel; since that report, the motel has been demolished for fire and safety violations. . . . SFBC had recently been named one of the best small businesses in America by Forbes magazine. The Holiday Inn testing facility was the largest in North America, and had been operating for nearly ten years before inspecto

3 0.91426212 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi

Introduction: Michael Gilligan, Eric Mvukiyehe, and Cyrus Samii write : We [Gilligan, Mvukiyehe, and Samii] use original survey data, collected in Burundi in the summer of 2007, to show that a World Bank ex-combatant reintegration program implemented after Burundi’s civil war caused significant economic reintegration for its beneficiaries but that this economic reintegration did not translate into greater political and social reintegration. Previous studies of reintegration programs have found them to be ineffective, but these studies have suffered from selection bias: only ex-combatants who self selected into those programs were studied. We avoid such bias with a quasi-experimental research design made possible by an exogenous bureaucratic failure in the implementation of the program. One of the World Bank’s implementing partners delayed implementation by almost a year due to an unforeseen contract dispute. As a result, roughly a third of ex-combatants had their program benefits withheld for reas

4 0.91150868 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event

Introduction: Todd Schneider writes: Apropos of your recent blog post about modeling score differential of basketball games, I thought you might enjoy a site I built, gambletron2000.com, that gathers real-time win probabilities from betting markets for most major sports (including NBA and college basketball). My original goal was to use the variance of changes in win probabilities to quantify which games were the most exciting, but I got a bit carried away and ended up pursuing a bunch of other ideas, which you can read about in the full writeup here. This particular passage from the anonymous someone in your post: My idea is for each timestep in a game (a second, 5 seconds, etc), use the Vegas line, the current score differential, who has the ball, and the number of possessions played already (to account for differences in pace) to create a point estimate probability of the home team winning. reminded me of a graph I made, which shows the mean-reverting tendency of N

5 0.90652609 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

Introduction: Don Green and Holger Kern write on one of my favorite topics, treatment interactions (see also here): We [Green and Kern] present a methodology that largely automates the search for systematic treatment effect heterogeneity in large-scale experiments. We introduce a nonparametric estimator developed in statistical learning, Bayesian Additive Regression Trees (BART), to model treatment effects that vary as a function of covariates. BART has several advantages over commonly employed parametric modeling strategies, in particular its ability to automatically detect and model relevant treatment-covariate interactions in a flexible manner. To increase the reliability and credibility of the resulting conditional treatment effect estimates, we suggest the use of a split sample design. The data are randomly divided into two equally-sized parts, with the first part used to explore treatment effect heterogeneity and the second part used to confirm the results. This approach permits a re

6 0.90344203 503 andrew gelman stats-2011-01-04-Clarity on my email policy

7 0.9025805 1871 andrew gelman stats-2013-05-27-Annals of spam

8 0.90245903 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.

9 0.90148413 1214 andrew gelman stats-2012-03-15-Of forecasts and graph theory and characterizing a statistical method by the information it uses

10 0.90082788 1626 andrew gelman stats-2012-12-16-The lamest, grudgingest, non-retraction retraction ever

11 0.89840567 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

12 0.89709228 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

13 0.89619851 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

14 0.89521778 1300 andrew gelman stats-2012-05-05-Recently in the sister blog

15 0.89304101 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)

16 0.89198208 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

17 0.89151311 185 andrew gelman stats-2010-08-04-Why does anyone support private macroeconomic forecasts?

18 0.89129198 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

19 0.89026362 1923 andrew gelman stats-2013-07-03-Bayes pays!

20 0.89023679 674 andrew gelman stats-2011-04-21-Handbook of Markov Chain Monte Carlo