andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-961 knowledge-graph by maker-knowledge-mining

961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions


meta info for this blog

Source: html

Introduction: I was trying to explain in class how a (Bayesian) statistician reads the formula for a probability distribution. In old-fashioned statistics textbooks you’re told that if you want to compute a conditional distribution from a joint distribution you need to do some heavy math: p(a|b) = p(a,b)/\int p(a’,b)da’. When doing Bayesian statistics, though, you usually don’t have to do the integration or the division. If you have parameters theta and data y, you first write p(y,theta). Then to get p(theta|y), you don’t need to integrate or divide. All you have to do is look at p(y,theta) in a certain way: Treat y as a constant and theta as a variable. Similarly, if you’re doing the Gibbs sampler and want a conditional distribution, just consider the parameter you’re updating as the variable and everything else as a constant. No need to integrate or divide, you just take the joint distribution and look at it from the right perspective. Awhile ago Yair told me there’s something called
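To make the “look at p(y,theta) from the right perspective” point concrete, here is a minimal sketch in R (my own illustration, not from the post; the conjugate normal model and all numbers are assumed for the example). The joint density is evaluated on a grid of theta with y held fixed, the grid is normalized, and the result matches the analytic posterior, with no integration or division done by hand.

# A conjugate normal example: y ~ N(theta, 1), prior theta ~ N(0, 10^2).
# We never compute p(y); we just evaluate the joint p(y, theta) with y fixed.
y <- 2.7                                   # observed data, held constant
theta_grid <- seq(-10, 10, length.out = 2001)
log_joint <- dnorm(y, mean = theta_grid, sd = 1, log = TRUE) +  # log p(y | theta)
  dnorm(theta_grid, mean = 0, sd = 10, log = TRUE)              # log p(theta)
# The "Washington read": treat the joint as a function of theta only.
post_unnorm <- exp(log_joint - max(log_joint))
post_grid <- post_unnorm / sum(post_unnorm * diff(theta_grid)[1])
# Analytic posterior for comparison.
sd_post <- sqrt(1 / (1 / 10^2 + 1 / 1^2))
mu_post <- sd_post^2 * (y / 1^2 + 0 / 10^2)
max(abs(post_grid - dnorm(theta_grid, mu_post, sd_post)))       # should be near zero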


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I was trying to explain in class how a (Bayesian) statistician reads the formula for a probability distribution. [sent-1, score-0.173]

2 In old-fashioned statistics textbooks you’re told that if you want to compute a conditional distribution from a joint distribution you need to do some heavy math: p(a|b) = p(a,b)/\int p(a’,b)da’. [sent-2, score-1.598]

3 When doing Bayesian statistics, though, you usually don’t have to do the integration or the division. [sent-3, score-0.088]

4 If you have parameters theta and data y, you first write p(y,theta). [sent-4, score-0.434]

5 Then to get p(theta|y), you don’t need to integrate or divide. [sent-5, score-0.392]

6 All you have to do is look at p(y,theta) in a certain way: Treat y as a constant and theta as a variable. [sent-6, score-0.511]

7 Similarly, if you’re doing the Gibbs sampler and want a conditional distribution, just consider the parameter you’re updating as the variable and everything else as a constant. [sent-7, score-0.56]

8 No need to integrate or divide, you just take the joint distribution and look at it from the right perspective. [sent-8, score-0.927]
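The same reading gives Gibbs updates. Below is a minimal sketch in R (my own illustration; the semi-conjugate normal model and its hyperparameters are assumed for the example). Each update is the joint density viewed as a function of one parameter, with the data and the other parameter held constant, which here turns out to be a normal kernel for mu and an inverse-gamma kernel for sigma^2.

# y_i ~ N(mu, sigma^2), with mu ~ N(mu0, tau0^2) and sigma^2 ~ Inv-Gamma(a, b).
set.seed(1)
y <- rnorm(50, mean = 3, sd = 2)            # fake data
n <- length(y)
mu0 <- 0; tau0_sq <- 100; a <- 1; b <- 1    # hyperparameters (assumed)
n_iter <- 2000
draws <- matrix(NA, n_iter, 2, dimnames = list(NULL, c("mu", "sigma_sq")))
mu <- mean(y); sigma_sq <- var(y)           # starting values
for (t in 1:n_iter) {
  # p(mu | sigma^2, y): the joint read as a function of mu alone
  tau_n_sq <- 1 / (1 / tau0_sq + n / sigma_sq)
  mu_n <- tau_n_sq * (mu0 / tau0_sq + sum(y) / sigma_sq)
  mu <- rnorm(1, mu_n, sqrt(tau_n_sq))
  # p(sigma^2 | mu, y): the joint read as a function of sigma^2 alone
  sigma_sq <- 1 / rgamma(1, shape = a + n / 2, rate = b + 0.5 * sum((y - mu)^2))
  draws[t, ] <- c(mu, sigma_sq)
}
colMeans(draws[-(1:500), ])                 # posterior means after burn-in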

9 Awhile ago Yair told me there’s something called the “Washington read,” where you pick up a book, go straight to the index, and see if, where, and how often you’re mentioned. [sent-9, score-0.181]

10 It struck me, when explaining Bayesian algebra, that what we’re really doing when we get a conditional distribution is to take a Washington read of the joint distribution, from the perspective of the parameter or parameters of interest. [sent-10, score-1.421]

11 More generally, I’ve found that an important step in being able to do mathematics for statistics is learning how to focus on different symbols in a formula. [sent-11, score-0.428]

12 In math, all symbols are in some sense equal, whereas in statistics, x and y and pi and theta and sigma and lambda all play different roles. [sent-12, score-0.781]

13 The “Washington read” for conditional distributions is an example of statistical reading of mathematics. [sent-14, score-0.251]

14 (Another example is that, with rare exceptions, I read “38. [sent-15, score-0.207]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('theta', 0.326), ('distribution', 0.278), ('conditional', 0.251), ('symbols', 0.242), ('joint', 0.228), ('washington', 0.225), ('integrate', 0.215), ('read', 0.207), ('symbol', 0.134), ('math', 0.132), ('int', 0.126), ('re', 0.12), ('parameter', 0.116), ('statistics', 0.112), ('told', 0.108), ('pi', 0.108), ('parameters', 0.108), ('formulas', 0.105), ('sigma', 0.105), ('bayesian', 0.104), ('look', 0.104), ('need', 0.102), ('updating', 0.102), ('algebra', 0.093), ('exceptions', 0.091), ('sampler', 0.091), ('gibbs', 0.09), ('yair', 0.089), ('formula', 0.088), ('integration', 0.088), ('divide', 0.087), ('heavy', 0.087), ('reads', 0.085), ('passage', 0.084), ('struck', 0.084), ('personality', 0.083), ('flat', 0.081), ('constant', 0.081), ('dimensions', 0.081), ('index', 0.08), ('textbooks', 0.079), ('impossible', 0.076), ('get', 0.075), ('compute', 0.075), ('explaining', 0.074), ('mathematics', 0.074), ('treat', 0.074), ('straight', 0.073), ('equal', 0.073), ('stuck', 0.073)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

Introduction: I was trying to explain in class how a (Bayesian) statistician reads the formula for a probability distribution. In old-fashioned statistics textbooks you’re told that if you want to compute a conditional distribution from a joint distribution you need to do some heavy math: p(a|b) = p(a,b)/\int p(a’,b)da’. When doing Bayesian statistics, though, you usually don’t have to do the integration or the division. If you have parameters theta and data y, you first write p(y,theta). Then to get p(theta|y), you don’t need to integrate or divide. All you have to do is look at p(y,theta) in a certain way: Treat y as a constant and theta as a variable. Similarly, if you’re doing the Gibbs sampler and want a conditional distribution, just consider the parameter you’re updating as the variable and everything else as a constant. No need to integrate or divide, you just take the joint distribution and look at it from the right perspective. Awhile ago Yair told me there’s something called

2 0.33418396 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega, i.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using the Bayes factor. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1)p1(t1) d t1 / \int f2(y|t2)p2(t2) d t2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit

3 0.22284201 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before and I find this troubling. There is a similarly troubling problem in the economics of utility theory. Utility is on consumables. To be complete a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

4 0.21526183 899 andrew gelman stats-2011-09-10-The statistical significance filter

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o
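A quick simulation sketch of the filter described above (my own illustration; theta = 1 is an assumed value, not from the post): conditioning on |y| > 2 pushes the average estimate well above |theta|.

set.seed(1)
theta <- 1
y <- rnorm(1e6, mean = theta, sd = 1)
mean(abs(y))              # already exceeds |theta|, since E(|y|) > |E(y)|
mean(abs(y)[abs(y) > 2])  # much larger still, once only "significant" results are kept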

5 0.19747934 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you wa

6 0.18308637 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics

7 0.16150381 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

8 0.16091919 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

9 0.14810249 2072 andrew gelman stats-2013-10-21-The future (and past) of statistical sciences

10 0.14427963 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

11 0.14381997 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

12 0.1330144 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

13 0.13069123 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

14 0.12806669 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

15 0.1216868 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

16 0.12064036 1554 andrew gelman stats-2012-10-31-It not necessary that Bayesian methods conform to the likelihood principle

17 0.11988541 236 andrew gelman stats-2010-08-26-Teaching yourself mathematics

18 0.11937599 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

19 0.11909361 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

20 0.11867672 1476 andrew gelman stats-2012-08-30-Stan is fast


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.198), (1, 0.118), (2, -0.052), (3, 0.094), (4, -0.027), (5, 0.001), (6, 0.091), (7, 0.063), (8, -0.052), (9, -0.07), (10, -0.003), (11, -0.008), (12, 0.011), (13, -0.03), (14, 0.007), (15, -0.016), (16, -0.041), (17, 0.009), (18, 0.044), (19, -0.061), (20, 0.094), (21, 0.002), (22, 0.039), (23, -0.031), (24, 0.051), (25, 0.048), (26, -0.046), (27, 0.052), (28, 0.058), (29, 0.051), (30, 0.002), (31, 0.063), (32, -0.061), (33, 0.022), (34, 0.014), (35, 0.006), (36, -0.017), (37, 0.077), (38, -0.06), (39, -0.024), (40, 0.086), (41, 0.034), (42, -0.088), (43, -0.045), (44, -0.048), (45, -0.071), (46, 0.129), (47, 0.091), (48, 0.008), (49, 0.011)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96656418 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

Introduction: I was trying to explain in class how a (Bayesian) statistician reads the formula for a probability distribution. In old-fashioned statistics textbooks you’re told that if you want to compute a conditional distribution from a joint distribution you need to do some heavy math: p(a|b) = p(a,b)/\int p(a’,b)da’. When doing Bayesian statistics, though, you usually don’t have to do the integration or the division. If you have parameters theta and data y, you first write p(y,theta). Then to get p(theta|y), you don’t need to integrate or divide. All you have to do is look at p(y,theta) in a certain way: Treat y as a constant and theta as a variable. Similarly, if you’re doing the Gibbs sampler and want a conditional distribution, just consider the parameter you’re updating as the variable and everything else as a constant. No need to integrate or divide, you just take the joint distribution and look at it from the right perspective. Awhile ago Yair told me there’s something called

2 0.8549906 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega, i.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using the Bayes factor. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1)p1(t1) d t1 / \int f2(y|t2)p2(t2) d t2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit

3 0.79671252 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

Introduction: Tomas Iesmantas writes: I’m facing a problem where the parameter space is bounded, e.g. all parameters have to be positive. If in MCMC I use a normal distribution as the proposal distribution, then at some iterations I get negative proposals. So my question is: should I use recalculation of the acceptance probability every time I reject the proposal (something like in the delayed rejection method), or do I have to use another proposal (like lognormal, truncated normal, etc.)? The simplest solution is to just calculate p(theta)=0 for theta outside the legal region, thus rejecting those jumps. This will work fine (just remember that when you reject, you have to stay at the last value for one more iteration), but if you’re doing these rejections all the time, you might want to reparameterize your space, for example using logs for positive parameters, logits for constrained parameters, and softmax for parameters that are constrained to sum to 1.
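A minimal sketch in R of the “reject and stay at the last value” advice (my own illustration; the gamma target and the proposal scale are assumed for the example): a random-walk Metropolis step for a positive parameter, with the target density set to zero outside the legal region.

log_target <- function(theta) {
  if (theta <= 0) return(-Inf)                    # p(theta) = 0 outside the bound
  dgamma(theta, shape = 2, rate = 1, log = TRUE)  # example target on theta > 0
}
set.seed(1)
theta <- 1
draws <- numeric(5000)
for (t in seq_along(draws)) {
  proposal <- rnorm(1, mean = theta, sd = 0.5)    # normal proposal: can go negative
  if (log(runif(1)) < log_target(proposal) - log_target(theta)) {
    theta <- proposal                             # accept
  }                                               # otherwise stay at the current value
  draws[t] <- theta
}
mean(draws)                                       # should be near the target mean of 2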

4 0.74161446 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

Introduction: Shravan writes: I have a problem very similar to the one presented in chapter 6 of BDA, the speed of light example. You use the distribution of the minimum scores from the posterior predictive distribution, show that it’s not realistic given the data, and suggest that an asymmetric contaminated normal distribution or a symmetric long-tailed distribution would be better. How does one use such a distribution? My reply: You can actually use a symmetric long-tailed distribution such as t with low degrees of freedom. One striking feature of symmetric long-tailed distributions is that a small random sample from such a distribution can have outliers on one side or the other and look asymmetric. Just to see this, try the following in R: par(mfrow=c(3,3), mar=c(1,1,1,1)); for (i in 1:9) hist(rt(100, 2), xlab="", ylab="", main="") You’ll see some skewed distributions. So that’s the message (which I learned from an offhand comment of Rubin, actually): if you want to model

5 0.70551348 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

Introduction: I had the following email exchange with a reader of Bayesian Data Analysis. My correspondent wrote: Exercise 1(b) involves evaluating the normal pdf at a single point. But p(Y=y|mu,sigma) = 0 (and is not simply N(y|mu,sigma)), since the normal distribution is continuous. So it seems that part (b) of the exercise is inappropriate. The solution does actually evaluate the probability as the value of the pdf at the single point, which is wrong. The probabilities should all be 0, so the answer to (b) is undefined. I replied: The pdf is the probability density function, which for a continuous distribution is defined as the derivative of the cumulative density function. The notation in BDA is rigorous but we do not spell out all the details, so I can see how confusion is possible. My correspondent: I agree that the pdf is the derivative of the cdf. But to compute P(a < Y < b) for a continuous distribution (with support in the real line) requires integrating over t
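A tiny R sketch of the distinction in that exchange (my own example, with arbitrary numbers): the density at a point is not a probability, but integrating the density over an interval gives one.

dnorm(1.5, mean = 0, sd = 1)                       # a density value, not a probability
pnorm(1.7) - pnorm(1.3)                            # P(1.3 < Y < 1.7), via the cdf
integrate(dnorm, lower = 1.3, upper = 1.7)$value   # the same probability by integration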

6 0.705414 638 andrew gelman stats-2011-03-30-More on the correlation between statistical and political ideology

7 0.68277055 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

8 0.6431967 1560 andrew gelman stats-2012-11-03-Statistical methods that work in some settings but not others

9 0.63986218 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

10 0.63840562 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

11 0.63157588 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

12 0.62861371 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

13 0.62339139 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

14 0.61653835 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

15 0.61516935 984 andrew gelman stats-2011-11-01-David MacKay sez . . . 12??

16 0.61147553 566 andrew gelman stats-2011-02-09-The boxer, the wrestler, and the coin flip, again

17 0.60824841 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

18 0.60741585 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

19 0.604415 2072 andrew gelman stats-2013-10-21-The future (and past) of statistical sciences

20 0.60241795 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(4, 0.012), (15, 0.041), (16, 0.073), (21, 0.016), (24, 0.2), (27, 0.01), (40, 0.019), (45, 0.022), (55, 0.023), (62, 0.012), (76, 0.01), (83, 0.04), (86, 0.043), (89, 0.011), (95, 0.026), (99, 0.354)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98503685 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

Introduction: Daniel Lakeland asks, “Where do likelihoods come from?” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. This looks a little bit different from the usual way that models are presented in statistics textbooks, where the focus is typically on the random error process, not on the deterministic part of the model. A focus on the error process makes sense in some applications that have inherent randomness or variation (for example, genetics, psychology, and survey sampling) but not so much in the physical sciences, where the deterministic model can be complicated and is typically the essence of the study. Often in these sorts of studies, the starting point (and sometimes the ending point) is what the physicists call “nonlinear least squares” or what we would call normally-distributed errors. That’s what we did for our

2 0.98481834 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

Introduction: Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. ( See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended: The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study — say, a correlation between a personality trait and the risk of depression — is considered “significant” if its probability of occurring by chance is less than 5 percent. This arbitrary cutoff makes sense when the effect being studied is a large one — for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match (“red” in red letters) than when they do not (“red” in blue letters), and is very strong in almost everyone. “But if the true effect of what you are measuring is small,” sai

3 0.98431432 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

Introduction: Robert Bell pointed me to this post by Brad De Long on Bayesian statistics, and then I also noticed this from Noah Smith, who wrote: My impression is that although the Bayesian/Frequentist debate is interesting and intellectually fun, there’s really not much “there” there… despite being so-hip-right-now, Bayesian is not the Statistical Jesus. I’m happy to see the discussion going in this direction. Twenty-five years ago or so, when I got into this biz, there were some serious anti-Bayesian attitudes floating around in mainstream statistics. Discussions in the journals sometimes devolved into debates of the form, “Bayesians: knaves or fools?”. You’d get all sorts of free-floating skepticism about any prior distribution at all, even while people were accepting without question (and doing theory on) logistic regressions, proportional hazards models, and all sorts of strong strong models. (In the subfield of survey sampling, various prominent researchers would refuse to mode

4 0.98263162 899 andrew gelman stats-2011-09-10-The statistical significance filter

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o

5 0.98240113 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

Introduction: Konrad Scheffler writes: I was interested by your paper “Induction and deduction in Bayesian data analysis” and was wondering if you would entertain a few questions: – Under the banner of objective Bayesianism, I would posit something like this as a description of Bayesian inference: “Objective Bayesian probability is not a degree of belief (which would necessarily be subjective) but a measure of the plausibility of a hypothesis, conditional on a formally specified information state. One way of specifying a formal information state is to specify a model, which involves specifying both a prior distribution (typically for a set of unobserved variables) and a likelihood function (typically for a set of observed variables, conditioned on the values of the unobserved variables). Bayesian inference involves calculating the objective degree of plausibility of a hypothesis (typically the truth value of the hypothesis is a function of the variables mentioned above) given such a

6 0.98220265 1605 andrew gelman stats-2012-12-04-Write This Book

7 0.98203564 1671 andrew gelman stats-2013-01-13-Preregistration of Studies and Mock Reports

8 0.98191917 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

9 0.98167229 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

10 0.98153687 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards

11 0.98148656 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!

12 0.98147899 2244 andrew gelman stats-2014-03-11-What if I were to stop publishing in journals?

13 0.98127651 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

14 0.98116803 1560 andrew gelman stats-2012-11-03-Statistical methods that work in some settings but not others

same-blog 15 0.98115212 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

16 0.98081237 970 andrew gelman stats-2011-10-24-Bell Labs

17 0.98070008 2040 andrew gelman stats-2013-09-26-Difficulties in making inferences about scientific truth from distributions of published p-values

18 0.98057508 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

19 0.98046434 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey

20 0.9801265 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper