andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-858 knowledge-graph by maker-knowledge-mining

858 andrew gelman stats-2011-08-17-Jumping off the edge of the world


meta info for this blog

Source: html

Introduction: Tomas Iesmantas writes: I’m facing a problem where the parameter space is bounded, e.g. all parameters have to be positive. If in MCMC I use a normal distribution as the proposal distribution, then at some iterations I get negative proposals. So my question is: should I recalculate the acceptance probability every time I reject the proposal (something like in the delayed rejection method), or do I have to use another proposal (a lognormal, a truncated normal, etc.)?

The simplest solution is to just set p(theta)=0 for theta outside the legal region, and thus reject those jumps. This will work fine (just remember that when you reject, you have to stay at the last value for one more iteration), but if you’re doing these rejections all the time, you might want to reparameterize your space, for example using logs for positive parameters, logits for interval-constrained parameters, and softmax for parameters that are constrained to sum to 1.
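
To make the “reject and stay put” recipe concrete, here is a minimal sketch in R (my own illustration, not from the post); the Exponential(1) target is an arbitrary stand-in, chosen only so the example is self-contained:

# Random-walk Metropolis for one positive parameter, with hard rejection
# at the boundary. Target: p(theta) proportional to exp(-theta), theta > 0.
log_post <- function(theta) {
  if (theta <= 0) return(-Inf)   # p(theta) = 0 outside the legal region
  -theta                         # log density, up to an additive constant
}
n_iter <- 5000
theta <- numeric(n_iter)
theta[1] <- 1
for (t in 2:n_iter) {
  proposal <- rnorm(1, mean = theta[t - 1], sd = 0.5)   # plain normal jump
  if (log(runif(1)) < log_post(proposal) - log_post(theta[t - 1])) {
    theta[t] <- proposal         # accept
  } else {
    theta[t] <- theta[t - 1]     # reject: stay at the last value for one more iteration
  }
}

If those boundary rejections become frequent, the reparameterization route is to sample phi = log(theta) instead and add the log-Jacobian term log|d theta/d phi| = phi to the log density, so the sampler never sees a boundary at all:

log_post_phi <- function(phi) {  # phi = log(theta), unconstrained
  -exp(phi) + phi                # log p(theta) at theta = exp(phi), plus the Jacobian term phi
}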


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Tomas Iesmantas writes: I’m facing a problem where parameter space is bounded, e. [sent-1, score-0.416]

2 If in MCMC as proposal distribution I use normal distribution, then at some iterations I get negative proposals. [sent-4, score-0.921]

3 So my question is: should I use recalculation of acceptance probability every time I reject the proposal (something like in delayed rejection method), or I have to use another proposal (like lognormal, truncated normal, etc. [sent-5, score-2.212]

4 The simplest solution is to just calculate p(theta)=0 for theta outside the legal region, thus reject those jumps. [sent-7, score-1.112]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('reject', 0.329), ('proposal', 0.324), ('parameters', 0.286), ('constrained', 0.245), ('theta', 0.216), ('recalculation', 0.178), ('delayed', 0.168), ('logs', 0.168), ('logits', 0.168), ('iesmantas', 0.168), ('rejections', 0.168), ('tomas', 0.168), ('space', 0.167), ('normal', 0.165), ('truncated', 0.155), ('lognormal', 0.15), ('iteration', 0.15), ('facing', 0.133), ('iterations', 0.129), ('bounded', 0.129), ('distribution', 0.123), ('simplest', 0.121), ('rejection', 0.121), ('region', 0.12), ('sum', 0.118), ('acceptance', 0.115), ('calculate', 0.114), ('legal', 0.109), ('mcmc', 0.107), ('stay', 0.106), ('use', 0.103), ('outside', 0.083), ('solution', 0.08), ('negative', 0.077), ('parameter', 0.077), ('remember', 0.073), ('positive', 0.07), ('value', 0.065), ('method', 0.065), ('fine', 0.061), ('thus', 0.06), ('time', 0.058), ('probability', 0.056), ('every', 0.055), ('last', 0.051), ('another', 0.042), ('question', 0.042), ('problem', 0.039), ('like', 0.039), ('using', 0.037)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

2 0.16091919 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

Introduction: I was trying to explain in class how a (Bayesian) statistician reads the formula for a probability distribution. In old-fashioned statistics textbooks you’re told that if you want to compute a conditional distribution from a joint distribution you need to do some heavy math: p(a|b) = p(a,b)/\int p(a’,b)da’. When doing Bayesian statistics, though, you usually don’t have to do the integration or the division. If you have parameters theta and data y, you first write p(y,theta). Then to get p(theta|y), you don’t need to integrate or divide. All you have to do is look at p(y,theta) in a certain way: Treat y as a constant and theta as a variable. Similarly, if you’re doing the Gibbs sampler and want a conditional distribution, just consider the parameter you’re updating as the variable and everything else as a constant. No need to integrate or divide, you just take the joint distribution and look at it from the right perspective. (A small worked example of this appears just after this list.) Awhile ago Yair told me there’s something called

3 0.16049966 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

Introduction: David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space — not the parameter space — because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in the space of their predictions, not in the space of their parameters (the parameters didn’t really “exist” at all for Sam). In that spirit, when we estimate the effectiveness of a MCMC method or tuning — by autocorrelation time or ESJD or anything else — shouldn’t we be looking at the changes in the model predictions over time, rather than the changes in the parameters over time? That is, the autocorrelation time should be the autocorrelation time in what the model (at the walker position) predicts for the data, and the ESJD should be the expected squared jump distance in what the model predicts for the data? (A minimal R sketch of this idea appears just after this list.) This might resolve the concern I expressed a

4 0.15932122 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories

Introduction: Tomas Iesmantas had asked me for advice on a regression problem with 50 parameters, and I’d recommended Hamiltonian Monte Carlo. A few weeks later he reported back: After trying several modifications (HMC for all parameters at once, HMC just for first level parameters and Riemman manifold Hamiltonian Monte Carlo method), I finally got it running with HMC just for first level parameters and for others using direct sampling, since conditional distributions turned out to have closed form. However, even in this case it is quite tricky, since I had to employ mass matrix and not just diagonal but at the beginning of algorithm generated it randomly (ensuring it is positive definite). Such random generation of mass matrix is quite blind step, but it proved to be quite helpful. Riemman manifold HMC is quite vagarious, or to be more specific, metric of manifold is very sensitive. In my model log-likelihood I had exponents and values of metrics matrix elements was very large and wh

5 0.15844834 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega, i.e. the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using the Bayes factor. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1)p1(t1) d t1 / \int f2(y|t2)p2(t2) d t2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) in H1 (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit

6 0.1506522 1476 andrew gelman stats-2012-08-30-Stan is fast

7 0.14981514 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

8 0.14940551 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

9 0.14633481 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

10 0.14631455 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

11 0.14545055 1941 andrew gelman stats-2013-07-16-Priors

12 0.14046066 650 andrew gelman stats-2011-04-05-Monitor the efficiency of your Markov chain sampler using expected squared jumped distance!

13 0.13952535 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

14 0.12928502 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

15 0.10440574 899 andrew gelman stats-2011-09-10-The statistical significance filter

16 0.10373686 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

17 0.10354345 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

18 0.10293671 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

19 0.10280804 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

20 0.10249317 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions
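
A small worked case of the “read the joint as a function of theta” move described in entry 2 above (the normal model is my own illustration, not from that post): with known sigma and a flat prior on theta, the joint is p(y,theta) \propto exp(-(y-theta)^2/(2 sigma^2)). Holding y fixed and reading the same expression as a function of theta gives, with no integration or division, p(theta|y) \propto exp(-(theta-y)^2/(2 sigma^2)), i.e. theta|y ~ N(y, sigma^2).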
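
A minimal R sketch of the “measure the jumps in data space” idea from entry 3 above (the function names, and the assumption that each draw’s predictions come back as a numeric vector, are my own, not from the post):

# 'draws' is an iterations-by-parameters matrix of MCMC output; 'predict_fn'
# maps one parameter vector to the model's predicted data (a numeric vector).
esjd_predictions <- function(draws, predict_fn) {
  preds <- t(apply(draws, 1, predict_fn))   # one row of predictions per draw
  jumps <- diff(preds)                      # differences between successive draws
  mean(rowSums(jumps^2))                    # expected squared jump distance, in prediction space
}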


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.105), (1, 0.117), (2, 0.036), (3, 0.033), (4, 0.023), (5, -0.018), (6, 0.098), (7, -0.003), (8, -0.069), (9, -0.035), (10, -0.012), (11, -0.022), (12, -0.015), (13, -0.051), (14, -0.072), (15, -0.042), (16, -0.021), (17, -0.005), (18, 0.019), (19, -0.061), (20, 0.074), (21, -0.022), (22, 0.017), (23, -0.024), (24, 0.044), (25, 0.009), (26, -0.056), (27, 0.025), (28, 0.044), (29, 0.028), (30, -0.007), (31, -0.005), (32, -0.026), (33, 0.035), (34, -0.027), (35, -0.037), (36, -0.024), (37, 0.036), (38, -0.032), (39, 0.032), (40, 0.044), (41, 0.034), (42, -0.066), (43, 0.004), (44, -0.04), (45, -0.048), (46, 0.056), (47, 0.065), (48, 0.04), (49, 0.033)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96849245 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

2 0.86100632 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

3 0.81267381 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

Introduction: I have an optimization problem: I have a complicated physical model that predicts energy and thermal behavior of a building, given the values of a slew of parameters, such as insulation effectiveness, window transmissivity, etc. I’m trying to find the parameter set that best fits several weeks of thermal and energy use data from the real building that we modeled. (Of course I would rather explore parameter space and come up with probability distributions for the parameters, and maybe that will come later, but for now I’m just optimizing). To do the optimization, colleagues and I implemented a “particle swarm optimization” algorithm on a massively parallel machine. This involves giving each of about 120 “particles” an initial position in parameter space, then letting them move around, trying to move to better positions according to a specific algorithm. (A generic version of the update step is sketched just after this list.) We gave each particle an initial position sampled from our prior distribution for each parameter. So far we’ve run about 140 itera

4 0.72729683 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

Introduction: Shravan writes: I have a problem very similar to the one presented in chapter 6 of BDA, the speed of light example. You use the distribution of the minimum scores from the posterior predictive distribution, show that it’s not realistic given the data, and suggest that an asymmetric contaminated normal distribution or a symmetric long-tailed distribution would be better. How does one use such a distribution? My reply: You can actually use a symmetric long-tailed distribution such as t with low degrees of freedom. One striking feature of symmetric long-tailed distributions is that a small random sample from such a distribution can have outliers on one side or the other and look asymmetric. Just to see this, try the following in R:

par(mfrow=c(3,3), mar=c(1,1,1,1))
for (i in 1:9) hist(rt(100, 2), xlab="", ylab="", main="")

You’ll see some skewed distributions. So that’s the message (which I learned from an offhand comment of Rubin, actually): if you want to model

5 0.71178323 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

6 0.6992563 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

7 0.66460532 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

8 0.66058236 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

9 0.65577406 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

10 0.65047783 650 andrew gelman stats-2011-04-05-Monitor the efficiency of your Markov chain sampler using expected squared jumped distance!

11 0.64765912 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

12 0.64554107 1363 andrew gelman stats-2012-06-03-Question about predictive checks

13 0.63935828 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

14 0.63770574 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

15 0.63402641 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

16 0.62983406 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

17 0.6211592 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

18 0.61405766 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

19 0.60525399 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

20 0.59897965 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories
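
For concreteness, the core of the particle-swarm update mentioned in entry 3 above looks roughly like the sketch below (this is the generic textbook formulation; the post does not say which variant was actually implemented):

# One standard particle-swarm step. x, v, pbest are particles-by-dimensions
# matrices (current positions, velocities, and each particle's best position
# so far); gbest is the best position found by any particle so far.
pso_step <- function(x, v, pbest, gbest, w = 0.7, c1 = 1.5, c2 = 1.5) {
  n <- nrow(x); d <- ncol(x)
  r1 <- matrix(runif(n * d), n, d)
  r2 <- matrix(runif(n * d), n, d)
  gb <- matrix(gbest, n, d, byrow = TRUE)              # repeat gbest for each particle
  v_new <- w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gb - x)
  list(x = x + v_new, v = v_new)
}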


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(8, 0.022), (15, 0.096), (24, 0.22), (52, 0.017), (81, 0.28), (99, 0.239)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.91846228 915 andrew gelman stats-2011-09-17-(Worst) graph of the year

Introduction: This (forwarded to me from Jeff, from a powerpoint by Willam Gawthrop) wins not on form but on content: Really this graph should stand alone but it’s so wonderful that I can’t resist pointing out a few things: - The gap between 610 and 622 A.D. seems to be about the same as the previous 600 years, and only a little less than the 1400 years before that. - “Pious and devout” Jews are portrayed as having steadily increased in nonviolence up to the present day. Been to Israel lately? - I assume the line labeled “Bible” is referring to Christians? I’m sort of amazed to see pious and devout Christians listed as being maximally violent at the beginning. Huh? I thought Christ was supposed to be a nonviolent, mellow dude. The line starts at 3 B.C., implying that baby Jesus was at the extreme of violence. Going forward, we can learn from the graph that pious and devout Christians in 1492 or 1618, say, were much more peaceful than Jesus and his crew. - Most amusingly g

2 0.91500932 552 andrew gelman stats-2011-02-03-Model Makers’ Hippocratic Oath

Introduction: Emanuel Derman and Paul Wilmott wonder how to get their fellow modelers to give up their fantasy of perfection. In a Business Week article they proposed, not entirely in jest, a model makers’ Hippocratic Oath: I will remember that I didn’t make the world and that it doesn’t satisfy my equations. Though I will use models boldly to estimate value, I will not be overly impressed by mathematics. I will never sacrifice reality for elegance without explaining why I have done so. Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights. I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension. Found via Abductive Intelligence .

3 0.90619767 1762 andrew gelman stats-2013-03-13-“I have no idea who Catalina Garcia is, but she makes a decent ruler”: I don’t know if John Lee “little twerp” Anderson actually suffers from tall-person syndrome, but he is indeed tall

Introduction: I just want to share with you the best comment we’ve ever had in the nearly ten-year history of this blog. Also it has statistical content! Here’s the story. After seeing an amusing article by Tom Scocca relating how reporter John Lee Anderson called someone a “little twerp” on twitter: I conjectured that Anderson suffered from “tall person syndrome,” that problem that some people of above-average height have, that they think they’re more important than other people because they literally look down on them. But I had no idea of Anderson’s actual height. Commenter Gary responded with this impressive bit of investigative reporting: Based on this picture: he appears to be fairly tall. But the perspective makes it hard to judge. Based on this picture: he appears to be about 9-10 inches taller than Catalina Garcia. But how tall is Catalina Garcia? Not that tall – she’s shorter than the high-wire artist Phillipe Petit: And he doesn’t appear

same-blog 4 0.87180454 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

5 0.86572933 1129 andrew gelman stats-2012-01-20-Bugs Bunny, the governor of Massachusetts, the Dow 36,000 guy, presidential qualifications, and Peggy Noonan

Introduction: Elsewhere: 1. They asked me to write about my “favorite election- or campaign-related movie, novel, or TV show” (Salon) 2. The shopping period is over; the time for buying has begun (NYT) 3. If anybody’s gonna be criticizing my tax plan, I want it to be this guy (Monkey Cage) 4. The 4 key qualifications to be a great president; unfortunately George W. Bush satisfies all four, and Ronald Reagan doesn’t match any of them (Monkey Cage) 5. The politics of eyeliner (Monkey Cage)

6 0.84675467 484 andrew gelman stats-2010-12-24-Foreign language skills as an intrinsic good; also, beware the tyranny of measurement

7 0.84251511 1321 andrew gelman stats-2012-05-15-A statistical research project: Weeding out the fraudulent citations

8 0.83664101 1632 andrew gelman stats-2012-12-20-Who exactly are those silly academics who aren’t as smart as a Vegas bookie?

9 0.83257174 849 andrew gelman stats-2011-08-11-The Reliability of Cluster Surveys of Conflict Mortality: Violent Deaths and Non-Violent Deaths

10 0.81355864 556 andrew gelman stats-2011-02-04-Patterns

11 0.81183636 1033 andrew gelman stats-2011-11-28-Greece to head statistician: Tell the truth, go to jail

12 0.80632257 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

13 0.79754812 1962 andrew gelman stats-2013-07-30-The Roy causal model?

14 0.79562783 658 andrew gelman stats-2011-04-11-Statistics in high schools: Towards more accessible conceptions of statistical inference

15 0.79370534 1222 andrew gelman stats-2012-03-20-5 books book

16 0.79226005 1705 andrew gelman stats-2013-02-04-Recently in the sister blog

17 0.78875768 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

18 0.78068715 2250 andrew gelman stats-2014-03-16-“I have no idea who Catalina Garcia is, but she makes a decent ruler”

19 0.77417123 1057 andrew gelman stats-2011-12-14-Hey—I didn’t know that!

20 0.76074207 2188 andrew gelman stats-2014-01-27-“Disappointed with your results? Boost your scientific paper”