andrew_gelman_stats-2011-833: knowledge graph by maker-knowledge-mining

833 andrew gelman stats-2011-07-31-Untunable Metropolis


meta info for this blog

Source: html

Introduction: Michael Margolis writes: What are we to make of it when a Metropolis-Hastings step just won’t tune? That is, the acceptance rate is zero at expected-jump-size X, and way above 1/2 at X-exp(-16) (i.e., machine precision). I’ve solved my practical problem by writing that I would have liked to include results from a diffuse prior, but couldn’t. But I’m bothered by the poverty of my intuition. And since everything I’ve read says this is an issue of efficiency, rather than accuracy, I wonder if I could solve it just by running massive and heavily thinned chains. My reply: I can’t see how this could happen in a well-specified problem! I suspect it’s a bug. Otherwise try rescaling your variables so that your parameters will have values on the order of magnitude of 1. To which Margolis responded: I hardly wrote any of the code, so I can’t speak to the bug question — it’s binomial kriging from the R package geoRglm. And there are no covariates to scale — just the zero and one of the binomial distribution. But I will look into rescaling the spatial units forthwith.
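To make the rescaling advice concrete, here is a minimal sketch (a toy one-dimensional normal target, not Margolis’s geoRglm kriging model, and not a reproduction of the exact behavior he reports): when a parameter lives on a scale far from 1, the acceptance rate of a random-walk Metropolis step swings from near 1 to near 0 over a narrow band of proposal step sizes, and rescaling the parameter to order 1 makes ordinary step sizes tunable again.

```python
# Toy illustration (not the geoRglm model): random-walk Metropolis on a
# normal target whose scale stands in for a badly scaled parameter.
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta, scale):
    # Unnormalized log density of N(0, scale^2).
    return -0.5 * (theta / scale) ** 2

def acceptance_rate(step, scale, n_iter=20_000):
    theta, accepted = 0.0, 0
    for _ in range(n_iter):
        proposal = theta + step * rng.normal()
        if np.log(rng.uniform()) < log_post(proposal, scale) - log_post(theta, scale):
            theta, accepted = proposal, accepted + 1
    return accepted / n_iter

# Parameter on an awkward scale (order 1e-6): acceptance collapses from ~1 to ~0
# as the step size crosses that scale, so there is almost nothing to "tune".
for step in [1e-8, 1e-6, 1e-4, 1e-2, 1.0]:
    print(f"scale 1e-6, step {step:.0e}: accept = {acceptance_rate(step, 1e-6):.2f}")

# After rescaling the parameter to order 1, ordinary step sizes behave sensibly.
for step in [0.1, 1.0, 5.0]:
    print(f"scale 1,    step {step:.1f}: accept = {acceptance_rate(step, 1.0):.2f}")
```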


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Michael Margolis writes: What are we to make of it when a Metropolis-Hastings step just won’t tune? [sent-1, score-0.079]

2 That is, the acceptance rate is zero at expected-jump-size X, and way above 1/2 at X-exp(-16) (i.e., machine precision). [sent-2, score-0.363]

3 I’ve solved my practical problem by writing that I would have liked to include results from a diffuse prior, but couldn’t. [sent-5, score-0.597]

4 And since everything I’ve read says this is an issue of efficiency, rather than accuracy, I wonder if I could solve it just by running massive and heavily thinned chains. [sent-7, score-0.585]

5 My reply: I can’t see how this could happen in a well-specified problem! [sent-8, score-0.082]

6 Otherwise try rescaling your variables so that your parameters will have values on the order of magnitude of 1. [sent-10, score-0.67]

7 To which Margolis responded: I hardly wrote any of the code, so I can’t speak to the bug question — it’s binomial kriging from the R package geoRglm. [sent-11, score-0.727]

8 And there are no covariates to scale — just the zero and one of the binomial distribution. [sent-12, score-0.647]

9 But I will look into rescaling the spatial units forthwith. [sent-13, score-0.585]

10 Don’t forget the folk theorem of statistical computing. [sent-17, score-0.356]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('margolis', 0.432), ('rescaling', 0.324), ('binomial', 0.278), ('diffuse', 0.185), ('tune', 0.162), ('zero', 0.157), ('bug', 0.137), ('folk', 0.136), ('poverty', 0.131), ('spatial', 0.131), ('units', 0.13), ('heavily', 0.129), ('covariates', 0.128), ('acceptance', 0.128), ('massive', 0.126), ('solved', 0.126), ('precision', 0.12), ('efficiency', 0.119), ('theorem', 0.111), ('bothered', 0.111), ('liked', 0.109), ('forget', 0.109), ('magnitude', 0.109), ('hardly', 0.108), ('accuracy', 0.106), ('computing', 0.106), ('machine', 0.106), ('package', 0.103), ('responded', 0.102), ('speak', 0.101), ('solve', 0.096), ('otherwise', 0.092), ('practical', 0.091), ('suspect', 0.089), ('couldn', 0.089), ('michael', 0.088), ('running', 0.087), ('problem', 0.086), ('code', 0.086), ('tried', 0.084), ('scale', 0.084), ('happen', 0.082), ('values', 0.082), ('worked', 0.08), ('step', 0.079), ('parameters', 0.079), ('rate', 0.078), ('order', 0.076), ('everything', 0.074), ('wonder', 0.073)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 833 andrew gelman stats-2011-07-31-Untunable Metropolis


2 0.15907985 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you wa
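For readers without the BUGS manual at hand, the binomial-plus-logit structure the question refers to looks roughly like this (a simplified sketch; the full Seeds example also includes an interaction term and a per-plate random effect). McLaughlin’s question is whether the n_i may also appear in the prior for the p_i without double-counting the data.

```latex
% Simplified sketch of the Seeds-style binomial-logit model discussed above.
\begin{aligned}
  r_i &\sim \operatorname{Binomial}(n_i,\; p_i), \\
  \operatorname{logit}(p_i) &= \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i},
  \qquad i = 1, \dots, N.
\end{aligned}
```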

3 0.10818859 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

Introduction: We’re happy to announce the release of Stan C++, CmdStan, RStan, and PyStan 2.1.0.  This is a minor feature release, but it is also an important bug fix release.  As always, the place to start is the (all new) Stan web pages: http://mc-stan.org . Major bug in 2.0.0, 2.0.1: Stan 2.0.0 and Stan 2.0.1 introduced a bug in the implementation of the NUTS criterion that led to poor tail exploration and thus biased the posterior uncertainty downward.  There was no bug in NUTS in Stan 1.3 or earlier, and 2.1 has been extensively tested and tests put in place so this problem will not recur. If you are using Stan 2.0.0 or 2.0.1, you should switch to 2.1.0 as soon as possible and rerun any models you care about. New target acceptance rate default for Stan 2.1.0: Another big change aimed at reducing posterior estimation bias was an increase in the target acceptance rate during adaptation from 0.65 to 0.80.  The bad news is that iterations will take around 50% longer

4 0.10752365 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

Introduction: Via Tom LaGatta, Boris Glebov writes: My labmates have a statistics problem. We are all experimentalists, but need an input on a fine statistics point. The problem is as follows. The data set consists of photon counts measured at a series of coordinates. The number of input photons is known, but the system transmission (T) is not known and needs to be estimated. The number of transmitted photons at each coordinate follows a binomial distribution, not a Gaussian one. The spatial distribution of T values is then fit using a Levenberg-Marquardt method modified to use weights for each data point. At present, my labmates are not sure how to properly calculate and use the weights. The equations are designed for Gaussian distributions, not binomial ones, and this is a problem because in many cases the photon counts are near the edge (say, zero), where a Gaussian width is nonsensical. Could you recommend a source they could use to guide their calculations? My reply: I don’t know a
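Gelman’s reply is cut off above. Purely to illustrate the weighting issue the labmates describe, here is one conventional inverse-variance weighting built from the binomial rather than the Gaussian variance; the variable names are made up, and this is a rough sketch rather than the recommendation from the post.

```python
# Sketch: inverse-variance weights for binomial photon counts, for use in a
# weighted least-squares / Levenberg-Marquardt fit. n = input photons per
# coordinate, t_hat = current estimate of the transmission T (hypothetical names).
import numpy as np

def binomial_weights(n, t_hat, floor=0.5):
    """Weight each point by 1 / Var(k), with Var(k) = n * T * (1 - T) for k ~ Binomial(n, T).

    Near the edges (T close to 0 or 1) the binomial variance goes to zero and the
    weights blow up, so a small floor keeps the fit stable; this is an ad hoc patch,
    and a full binomial likelihood would handle the edges more honestly.
    """
    var = np.maximum(n * t_hat * (1.0 - t_hat), floor)
    return 1.0 / var

n = np.full(5, 1000)                              # 1000 input photons per coordinate
t_hat = np.array([0.001, 0.01, 0.1, 0.5, 0.999])  # transmissions, some near the edge
print(binomial_weights(n, t_hat))
```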

5 0.10368353 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

Introduction: Eric McGhee writes: I’m trying to generate county-level estimates from a statewide survey of California using multilevel modeling. I would love to learn the full Bayesian approach, but I’m on a tight schedule and worried about teaching myself something of that complexity in the time available. I’m hoping I can use the classical approach and simulate standard errors using what you and Jennifer Hill call the “informal Bayesian” method. This has raised a few questions: First, what are the costs of using this approach as opposed to full Bayesian? Second, when I use the predictive simulation as described on p. 149 of “Data Analysis” on a binary dependent variable and a sample of 2000, I get a 5%-95% range of simulation results so large as to be effectively useless (on the order of +/- 15 points). This is true even for LA county, which has enough cases by itself (about 500) to get a standard error of about 2 points from simple disaggregation. However, if I simulate only with t
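The “informal Bayesian” simulation McGhee mentions amounts, roughly, to drawing coefficient vectors from a normal approximation to their sampling distribution and pushing each draw through the model. Below is a minimal sketch for a binary outcome that ignores the multilevel structure and uses made-up inputs (beta_hat, V_beta, and a design matrix X of county-level predictors).

```python
# Sketch of predictive simulation from a fitted logistic model: draw coefficients
# from N(beta_hat, V_beta), convert to probabilities, and summarize the 5%-95% range.
import numpy as np

rng = np.random.default_rng(1)

def simulate_rates(beta_hat, V_beta, X, n_sims=1000):
    betas = rng.multivariate_normal(beta_hat, V_beta, size=n_sims)  # (n_sims, k)
    probs = 1.0 / (1.0 + np.exp(-(betas @ X.T)))                    # (n_sims, n_units)
    lo, hi = np.percentile(probs, [5, 95], axis=0)
    return probs.mean(axis=0), lo, hi

# Hypothetical inputs: intercept plus one predictor, three "counties".
beta_hat = np.array([-0.4, 0.8])
V_beta = np.array([[0.02, 0.00],
                   [0.00, 0.05]])
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0]])
mean, lo, hi = simulate_rates(beta_hat, V_beta, X)
print(np.round(np.c_[mean, lo, hi], 3))
```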

6 0.10366549 846 andrew gelman stats-2011-08-09-Default priors update?

7 0.095172584 696 andrew gelman stats-2011-05-04-Whassup with glm()?

8 0.094733171 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model

9 0.090300336 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing

10 0.086487137 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

11 0.085563146 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

12 0.083508089 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

13 0.082002163 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

14 0.079310916 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission

15 0.078408912 1941 andrew gelman stats-2013-07-16-Priors

16 0.078146107 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

17 0.076702558 1792 andrew gelman stats-2013-04-07-X on JLP

18 0.076675691 217 andrew gelman stats-2010-08-19-The “either-or” fallacy of believing in discrete models: an example of folk statistics

19 0.076628506 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

20 0.073316641 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.126), (1, 0.058), (2, 0.008), (3, 0.021), (4, 0.049), (5, -0.014), (6, 0.066), (7, -0.033), (8, -0.029), (9, 0.008), (10, -0.013), (11, -0.014), (12, 0.013), (13, -0.019), (14, 0.01), (15, -0.013), (16, -0.02), (17, -0.012), (18, 0.025), (19, -0.013), (20, 0.021), (21, 0.017), (22, 0.013), (23, 0.008), (24, -0.008), (25, 0.004), (26, -0.005), (27, -0.02), (28, 0.009), (29, -0.007), (30, 0.032), (31, 0.029), (32, -0.02), (33, 0.019), (34, -0.008), (35, -0.035), (36, -0.017), (37, 0.025), (38, -0.015), (39, 0.012), (40, -0.026), (41, 0.048), (42, -0.032), (43, 0.025), (44, 0.033), (45, -0.017), (46, 0.032), (47, 0.01), (48, 0.031), (49, -0.009)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95864385 833 andrew gelman stats-2011-07-31-Untunable Metropolis


2 0.74280798 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

Introduction: Lasso and me: For a long time I was wrong about lasso. Lasso (“least absolute shrinkage and selection operator”) is a regularization procedure that shrinks regression coefficients toward zero, and in its basic form is equivalent to maximum penalized likelihood estimation with a penalty function that is proportional to the sum of the absolute values of the regression coefficients. I first heard about lasso from a talk that Trevor Hastie or Rob Tibshirani gave at Berkeley in 1994 or 1995. He demonstrated that it shrunk regression coefficients to zero. I wasn’t impressed, first because it seemed like no big deal (if that’s the prior you use, that’s the shrinkage you get) and second because, from a Bayesian perspective, I don’t want to shrink things all the way to zero. In the sorts of social and environmental science problems I’ve worked on, just about nothing is zero. I’d like to control my noisy estimates but there’s nothing special about zero. At the end of the talk I stood
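Written out, the estimator described above, for a linear model with outcome y and predictor matrix X, is the penalized least-squares problem below; it is also the posterior mode under independent double-exponential (Laplace) priors on the coefficients, which is the sense of “if that’s the prior you use, that’s the shrinkage you get.”

```latex
% One standard way to write the lasso estimator.
\hat{\beta}^{\text{lasso}}
  \;=\; \arg\min_{\beta}
  \Big\{ \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
         \;+\; \lambda \sum_{j} \lvert \beta_j \rvert \Big\}
```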

3 0.7286936 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

Introduction: Sam Seaver writes: I’m a graduate student in computational biology, and I’m relatively new to advanced statistics, and am trying to teach myself how best to approach a problem I have. My dataset is a small sparse matrix of 150 cases and 70 predictors, it is sparse as in many zeros, not many ‘NA’s. Each case is a nutrient that is fed into an in silico organism, and its response is whether or not it stimulates growth, and each predictor is one of 70 different pathways that the nutrient may or may not belong to. Because all of the nutrients do not belong to all of the pathways, there are thus many zeros in my matrix. My goal is to be able to use the pathways themselves to predict whether or not a nutrient could stimulate growth, thus I wanted to compute regression coefficients for each pathway, with which I could apply to other nutrients for other species. There are quite a few singularities in the dataset (summary(glm) reports that 14 coefficients are not defined because of sin
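As a small illustration of where the “not defined because of singularities” message comes from, the sketch below (with a made-up matrix standing in for the 150 x 70 nutrient-by-pathway data) checks the rank of the design matrix; the directions the data cannot identify are exactly where some form of regularization or partial pooling, the point of the reply’s title, has to do the work.

```python
# Sketch: diagnosing the rank deficiency behind glm's "coefficients not defined
# because of singularities". X is a hypothetical stand-in for the sparse 0/1
# nutrient-by-pathway matrix described above.
import numpy as np

rng = np.random.default_rng(3)
X = (rng.uniform(size=(150, 70)) < 0.1).astype(float)  # sparse binary matrix
X[:, 60:] = X[:, :10]                                   # force some exact collinearity

rank = np.linalg.matrix_rank(X)
print(f"columns: {X.shape[1]}, rank: {rank}, unidentified directions: {X.shape[1] - rank}")
```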

4 0.7222057 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems, that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything, the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan
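For reference, the scaled inverse-Wishart construction under discussion writes the covariance matrix as an inverse-Wishart “core” with separately modeled scale factors, roughly as below (following Gelman and Hill); Barthelmé’s point is that even then the implied standard deviation of the k-th coefficient, xi_k * sqrt(Q_kk), still involves Q, so scales and correlations are not independent a priori.

```latex
% Rough form of the scaled inverse-Wishart prior for a K x K covariance matrix.
\begin{aligned}
  \Sigma &= \operatorname{diag}(\xi)\, Q\, \operatorname{diag}(\xi), \\
  Q &\sim \text{Inv-Wishart}(\nu,\ I), \qquad \nu \text{ often taken as } K+1, \\
  \xi_k &\sim \text{separate (e.g.\ uniform or lognormal) priors}.
\end{aligned}
```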

5 0.72000152 650 andrew gelman stats-2011-04-05-Monitor the efficiency of your Markov chain sampler using expected squared jumped distance!

Introduction: Marc Tanguay writes in with a specific question that has a very general answer. First, the question: I [Tanguay] am currently running an MCMC for which I have 3 parameters that are restricted to a specific space. Two are bounded between 0 and 1 while the third is binary and updated by a Beta-Binomial. Since my priors are also bounded, I notice that, conditional on all the rest (which covers both data and other parameters), the density was not varying a lot within the space of the parameters. As a result, the acceptance rate is high, about 85%, and this despite the fact that all of the parameter space is explored. Since in your book, the optimal acceptance rates prescribed are lower than 50% (in case of multiple parameters), do you think I should worry about getting 85%? Or is this normal given the restrictions on the parameters? First off: Yes, my guess is that you should be taking bigger jumps. 85% seems like too high an acceptance rate for Metropolis jumping. More generally, t
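The post title points to expected squared jumped distance (ESJD) as the efficiency measure to monitor rather than the acceptance rate alone; below is a generic way to estimate it from a stored chain (a sketch, not tied to Tanguay’s sampler).

```python
# Expected squared jumped distance: mean squared Euclidean distance between
# successive draws. A high acceptance rate with tiny moves (as in the 85% case
# above) can give a much smaller ESJD than a lower acceptance rate with bigger jumps.
import numpy as np

def esjd(chain):
    """chain: array of shape (n_iterations, n_parameters)."""
    chain = np.asarray(chain, dtype=float)
    diffs = np.diff(chain, axis=0)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Example with a fake two-parameter chain.
fake_chain = np.cumsum(0.1 * np.random.default_rng(2).normal(size=(500, 2)), axis=0)
print(esjd(fake_chain))
```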

6 0.71721035 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

7 0.711918 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

8 0.697505 1142 andrew gelman stats-2012-01-29-Difficulties with the 1-4-power transformation

9 0.69622982 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

10 0.68754035 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

11 0.68675643 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

12 0.68148106 1465 andrew gelman stats-2012-08-21-D. Buggin

13 0.67705005 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories

14 0.67364216 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

15 0.6729092 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

16 0.67147881 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

17 0.66995484 2332 andrew gelman stats-2014-05-12-“The results (not shown) . . .”

18 0.66963667 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs

19 0.66823351 938 andrew gelman stats-2011-10-03-Comparing prediction errors

20 0.66141915 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.031), (15, 0.021), (16, 0.082), (24, 0.152), (53, 0.036), (86, 0.035), (89, 0.304), (99, 0.228)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96837103 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing

Introduction: Gur Yaari writes: Anyone who has ever watched a sports competition is familiar with expressions like “on fire”, “in the zone”, “on a roll”, “momentum” and so on. But what do these expressions really mean? In 1985 when Thomas Gilovich, Robert Vallone and Amos Tversky studied this phenomenon for the first time, they defined it as: “. . . these phrases express a belief that the performance of a player during a particular period is significantly better than expected on the basis of the player’s overall record”. Their conclusion was that what people tend to perceive as a “hot hand” is essentially a cognitive illusion caused by a misperception of random sequences. Until recently there was little, if any, evidence to rule out their conclusion. Increased computing power and new data availability from various sports now provide surprising evidence of this phenomenon, thus reigniting the debate. Yaari goes on to some studies that have found time dependence in basketball, baseball, voll

2 0.95574147 1160 andrew gelman stats-2012-02-09-Familial Linkage between Neuropsychiatric Disorders and Intellectual Interests

Introduction: When I spoke at Princeton last year, I talked with neuroscientist Sam Wang, who told me about a project he did surveying incoming Princeton freshmen about mental illness in their families. He and his coauthor Benjamin Campbell found some interesting results, which they just published: A link between intellect and temperament has long been the subject of speculation. . . . Studies of the artistically inclined report linkage with familial depression, while among eminent and creative scientists, a lower incidence of affective disorders is found. In the case of developmental disorders, a heightened prevalence of autism spectrum disorders (ASDs) has been found in the families of mathematicians, physicists, and engineers. . . . We surveyed the incoming class of 2014 at Princeton University about their intended academic major, familial incidence of neuropsychiatric disorders, and demographic variables. . . . Consistent with prior findings, we noticed a relation between intended academ

3 0.95561892 2243 andrew gelman stats-2014-03-11-The myth of the myth of the myth of the hot hand

Introduction: Phil pointed me to this paper so I thought I probably better repeat what I wrote a couple years ago: 1. The effects are certainly not zero. We are not machines, and anything that can affect our expectations (for example, our success in previous tries) should affect our performance. 2. The effects I’ve seen are small, on the order of 2 percentage points (for example, the probability of a success in some sports task might be 45% if you’re “hot” and 43% otherwise). 3. There’s a huge amount of variation, not just between but also among players. Sometimes if you succeed you will stay relaxed and focused, other times you can succeed and get overconfident. 4. Whatever the latest results on particular sports, I can’t see anyone overturning the basic finding of Gilovich, Vallone, and Tversky that players and spectators alike will perceive the hot hand even when it does not exist and dramatically overestimate the magnitude and consistency of any hot-hand phenomenon that does exist.

4 0.95547646 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices

Introduction: Since we’ve been discussing prior distributions on covariance matrices, I will recommend this recent article (coauthored with Tomoki Tokuda, Ben Goodrich, Iven Van Mechelen, and Francis Tuerlinckx) on their visualization: We present some methods for graphing distributions of covariance matrices and demonstrate them on several models, including the Wishart, inverse-Wishart, and scaled inverse-Wishart families in different dimensions. Our visualizations follow the principle of decomposing a covariance matrix into scale parameters and correlations, pulling out marginal summaries where possible and using two and three-dimensional plots to reveal multivariate structure. Visualizing a distribution of covariance matrices is a step beyond visualizing a single covariance matrix or a single multivariate dataset. Our visualization methods are available through the R package VisCov.
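The decomposition these visualizations are built on, pulling a covariance matrix apart into scale parameters and correlations, is simple to compute; here is a small self-contained sketch.

```python
# Split a covariance matrix into marginal standard deviations and a correlation matrix.
import numpy as np

def scales_and_correlations(sigma):
    sigma = np.asarray(sigma, dtype=float)
    sd = np.sqrt(np.diag(sigma))       # scale parameters (marginal standard deviations)
    corr = sigma / np.outer(sd, sd)    # correlation matrix
    return sd, corr

sigma = np.array([[ 4.0, 1.2, -0.6 ],
                  [ 1.2, 1.0,  0.3 ],
                  [-0.6, 0.3,  2.25]])
sd, corr = scales_and_correlations(sigma)
print(sd)
print(np.round(corr, 3))
```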

5 0.95109034 1685 andrew gelman stats-2013-01-21-Class on computational social science this semester, Fridays, 1:00-3:40pm

Introduction: Sharad Goel, Jake Hofman, and Sergei Vassilvitskii are teaching this awesome class on computational social science this semester in the applied math department at Columbia. Here’s the course info. You should take this course. These guys are amazing.

6 0.94575214 1756 andrew gelman stats-2013-03-10-He said he was sorry

7 0.93513793 1708 andrew gelman stats-2013-02-05-Wouldn’t it be cool if Glenn Hubbard were consulting for Herbalife and I were on the other side?

same-blog 8 0.92680037 833 andrew gelman stats-2011-07-31-Untunable Metropolis

9 0.9090184 459 andrew gelman stats-2010-12-09-Solve mazes by starting at the exit

10 0.89429331 1953 andrew gelman stats-2013-07-24-Recently in the sister blog

11 0.87609756 407 andrew gelman stats-2010-11-11-Data Visualization vs. Statistical Graphics

12 0.85230893 1855 andrew gelman stats-2013-05-13-Stan!

13 0.8360498 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)

14 0.83032942 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

15 0.82242423 566 andrew gelman stats-2011-02-09-The boxer, the wrestler, and the coin flip, again

16 0.81965059 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

17 0.81789625 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders

18 0.81333375 1032 andrew gelman stats-2011-11-28-Does Avastin work on breast cancer? Should Medicare be paying for it?

19 0.81056273 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity

20 0.79667258 1783 andrew gelman stats-2013-03-31-He’s getting ready to write a book