andrew_gelman_stats-2011-729 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Peng Yu writes: On page 180 of BDA2, deviance is defined as D(y,\theta) = -2 \log p(y|\theta). However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference of the log-likelihoods of the full model and the base model (times 2) (see the equation on the wiki webpage). The English word ‘deviance’ implies a difference from a standard (in this case, the base model). I’m wondering what the rationale is for your definition of deviance, which consists of only one term rather than two. My reply: Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. So the two definitions are equivalent.
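To make the equivalence explicit, here is a short derivation (a sketch; the saturated-model estimate \hat\theta_s is notation introduced for illustration, not from the original exchange):

% BDA's definition: an absolute deviance for a model with parameter theta
D(y, \theta) = -2 \log p(y \mid \theta)

% McCullagh and Nelder: deviance relative to the saturated ("full") model
D^*(y, \theta) = 2\left[ \log p(y \mid \hat\theta_s) - \log p(y \mid \theta) \right]
               = D(y, \theta) + 2 \log p(y \mid \hat\theta_s)

% The extra term depends only on y, not on theta, so for any two models A and B
% fit to the same data, the deviance differences agree:
D^*_A - D^*_B = D_A - D_B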
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 729 andrew gelman stats-2011-05-24-Deviance as a difference
Introduction: Jean Richardson writes: Do you know what might lead to a large negative cross-correlation (-0.95) between deviance and one of the model parameters? Here’s the (brief) background: I [Richardson] have written a Bayesian hierarchical site occupancy model for presence of disease on individual amphibians. The response variable is therefore binary (disease present/absent) and the probability of disease being present in an individual (psi) depends on various covariates (species of amphibian, location sampled, etc.), parameterized using a logit link function. Replicates are individuals sampled (tested for presence of disease) together. The possibility of imperfect detection is included as p = (prob. disease detected given disease is present). Posterior distributions were estimated using WinBUGS via R2WinBUGS. Simulated data from the model fit the real data very well and posterior distribution densities seem robust to any changes in the model (different priors, etc.) All autocor
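For readers who want to check such a cross-correlation in their own fits, here is a minimal R sketch, assuming an R2WinBUGS fit with DIC = TRUE (which monitors the deviance alongside the parameters); the model file and the parameter name psi.coef are hypothetical placeholders:

library(R2WinBUGS)
## Hypothetical call; substitute your own data, inits, and model file:
## fit <- bugs(data, inits, parameters.to.save = c("psi.coef"),
##             model.file = "occupancy.bug", n.chains = 3, DIC = TRUE)
## With DIC = TRUE the deviance draws sit in sims.list, so the cross-correlation
## Richardson describes can be read off directly:
## cor(fit$sims.list$deviance, fit$sims.list$psi.coef)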
3 0.30894631 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.
Introduction: John Mount provides some useful background and follow-up on our discussion from last year on computational instability of the usual logistic regression solver. Just to refresh your memory, here’s a simple logistic regression with only a constant term and no separation, nothing pathological at all:

> y <- rep (c(1,0),c(10,5))
> display (glm (y ~ 1, family=binomial(link="logit")))
glm(formula = y ~ 1, family = binomial(link = "logit"))
coef.est coef.se
(Intercept) 0.69 0.55
---
n = 15, k = 1
residual deviance = 19.1, null deviance = 19.1 (difference = 0.0)

And here’s what happens when we give it the not-outrageous starting value of -2:

> display (glm (y ~ 1, family=binomial(link="logit"), start=-2))
glm(formula = y ~ 1, family = binomial(link = "logit"), start = -2)
coef.est coef.se
(Intercept) 71.97 17327434.18
---
n = 15, k = 1
residual deviance = 360.4, null deviance = 19.1 (difference = -341.3)
Warning message:
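To see why a mild starting value can diverge, here is a minimal R sketch of the iteratively reweighted least squares (IRLS) update that glm performs internally, written out by hand for the intercept-only model (a simplified reconstruction for illustration, not glm's actual code):

y   <- rep(c(1, 0), c(10, 5))
eta <- rep(-2, length(y))        # the "not-outrageous" start from the post
for (step in 1:3) {
  mu <- plogis(eta)              # inverse-logit fitted probabilities
  w  <- mu * (1 - mu)            # IRLS weights; tiny when mu is near 0 or 1
  z  <- eta + (y - mu) / w       # working response
  b  <- sum(w * z) / sum(w)      # weighted least-squares intercept
  cat("step", step, "intercept =", round(b, 2), "\n")
  eta <- rep(b, length(y))
}
# Prints roughly 3.2, -4.7, 70: each step overshoots further because the
# weights collapse near the boundary, matching the wild glm output above.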
4 0.30265656 696 andrew gelman stats-2011-05-04-Whassup with glm()?
Introduction: We’re having problems with starting values in glm(). A very simple logistic regression with just an intercept and a very simple starting value (beta=5) blows up. Here’s the R code:

> y <- rep (c(1,0),c(10,5))
> glm (y ~ 1, family=binomial(link="logit"))

Call: glm(formula = y ~ 1, family = binomial(link = "logit"))

Coefficients:
(Intercept)
0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1 AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=2)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 2)

Coefficients:
(Intercept)
0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1 AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=5)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 5)

Coefficients:
(Intercept)
1.501e+15

Degrees of Freedom: 14 Total (i.
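Two ways around the problem, sketched in R (mustart is glm's documented alternative to start; bayesglm is from the arm package, and "stabilizes" here is that package's motivation rather than a guarantee):

y <- rep(c(1, 0), c(10, 5))
# Supply starting values on the probability scale rather than the coefficient
# scale; a flat guess of mean(y) keeps the first IRLS step well behaved:
glm(y ~ 1, family = binomial(link = "logit"), mustart = rep(mean(y), length(y)))
# Or regularize: bayesglm's weak default priors on the coefficients damp the
# runaway estimates seen above:
library(arm)
display(bayesglm(y ~ 1, family = binomial(link = "logit")))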
5 0.13177542 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega, i.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using Bayes factors. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1) p1(t1) d t1 / \int f2(y|t2) p2(t2) d t2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) in H1 (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit
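One standard resolution is to run a separate path for each marginal likelihood, so the two parameter spaces never need to match. A worked identity (the "power posterior" path; the temperature s and model index k are notation introduced for illustration):

% Geometric path from prior to posterior for model k, with temperature s in [0,1]:
z_k(s) = \int f_k(y \mid t_k)^{s} \, p_k(t_k) \, d t_k, \quad z_k(0) = 1, \quad z_k(1) = m_k(y)

% Differentiating log z_k(s) under the integral sign gives the path-sampling identity:
\frac{d}{ds} \log z_k(s) = E_{s}\!\left[ \log f_k(y \mid t_k) \right]
\log m_k(y) = \int_0^1 E_{s}\!\left[ \log f_k(y \mid t_k) \right] ds

% The Bayes factor m_1(y) / m_2(y) then comes from two independent path integrals,
% even when t_1 and t_2 have different dimensions.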
6 0.1214527 1941 andrew gelman stats-2013-07-16-Priors
7 0.11978365 776 andrew gelman stats-2011-06-22-Deviance, DIC, AIC, cross-validation, etc
8 0.097317882 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
9 0.097149417 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles
10 0.09711498 899 andrew gelman stats-2011-09-10-The statistical significance filter
11 0.08824192 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models
12 0.087236129 1476 andrew gelman stats-2012-08-30-Stan is fast
13 0.080732659 571 andrew gelman stats-2011-02-13-A departmental wiki page?
14 0.079227261 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions
15 0.0763603 2349 andrew gelman stats-2014-05-26-WAIC and cross-validation in Stan!
16 0.074544653 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions
17 0.073678389 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model
18 0.071264133 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
20 0.066449381 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests
simIndex simValue blogId blogTitle
same-blog 1 0.90058339 729 andrew gelman stats-2011-05-24-Deviance as a difference
3 0.70818084 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
4 0.69102883 696 andrew gelman stats-2011-05-04-Whassup with glm()?
5 0.59972167 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions
Introduction: David Hogg points me to this discussion: Martin Strasbourg and I [Hogg] discussed his project to detect new satellites of M31 in the PAndAS survey. He can construct a likelihood ratio (possibly even a marginalized likelihood ratio) at every position in the M31 imaging, between the best-fit satellite-plus-background model and the best nothing-plus-background model. He can make a two-dimensional map of these likelihood ratios and show the histogram of them. Looking at this histogram, which has a tail to very large ratios, he asked me, where should I put my cut? That is, at what likelihood ratio does a candidate deserve follow-up? Here’s my unsatisfying answer: To a statistician, the distribution of likelihood ratios is interesting and valuable to study. To an astronomer, it is uninteresting. You don’t want to know the distribution of likelihoods, you want to find satellites . . . I wrote that I think this makes sense and that it would actually be an interesting and useful rese
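To make "where should I put my cut?" concrete, here is one textbook decision-theoretic formalization (the benefit B, follow-up cost C, and prior probability \pi are illustrative assumptions, not values from the post):

% Follow up candidate i when the expected gain is positive:
% p_i = Pr(satellite | data_i); follow up iff  p_i B - C > 0,  i.e.  p_i > C / B.

% With prior probability \pi and likelihood ratio LR_i, the posterior odds are
% LR_i \cdot \pi / (1 - \pi), so the cut on the likelihood ratio becomes:
LR_i > \frac{C}{B - C} \cdot \frac{1 - \pi}{\pi}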
6 0.59885615 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles
7 0.59860235 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.
8 0.58251852 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables
9 0.57061929 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world
10 0.56861353 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
12 0.5296616 2342 andrew gelman stats-2014-05-21-Models with constraints
13 0.52899522 1476 andrew gelman stats-2012-08-30-Stan is fast
14 0.52747732 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits
15 0.52333391 1284 andrew gelman stats-2012-04-26-Modeling probability data
16 0.52327418 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!
17 0.52296317 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model
18 0.52004939 151 andrew gelman stats-2010-07-16-Wanted: Probability distributions for rank orderings
19 0.51806509 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29
20 0.51618332 1363 andrew gelman stats-2012-06-03-Question about predictive checks
simIndex simValue blogId blogTitle
same-blog 1 0.90691459 729 andrew gelman stats-2011-05-24-Deviance as a difference
2 0.77870291 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs
Introduction: As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core. But we all have multiple cores now, or we’re computing on a cluster or the cloud! So the behavior we’d like from rjags is to use the foreach package with each JAGS chain using a parallel-safe RNG. The default behavior with n.chain=1 will be that each parallel instance will use .RNG.name[1] , the Wichmann-Hill RNG. JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer" , .RNG.name="lecuyer::lecuyer" , and .RNG.name=
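A minimal sketch of the intended pattern, with the caveat that the RNG name "lecuyer::RngStream" is an assumption here (the module was undocumented at the time), and that doParallel plus a toy normal-mean model stand in for the real setup:

library(rjags)
library(foreach)
library(doParallel)

registerDoParallel(cores = 4)
model_string <- "model {
  for (i in 1:N) { y[i] ~ dnorm(mu, 1) }
  mu ~ dnorm(0, 0.001)
}"
dat <- list(y = rnorm(20, mean = 1), N = 20)

chains <- foreach(i = 1:4, .packages = "rjags") %dopar% {
  load.module("lecuyer")                            # must be loaded in each worker
  inits <- list(.RNG.name = "lecuyer::RngStream",   # assumed module RNG name
                .RNG.seed = i)                      # distinct parallel-safe streams
  m <- jags.model(textConnection(model_string), data = dat,
                  inits = inits, n.chains = 1, quiet = TRUE)
  coda.samples(m, variable.names = "mu", n.iter = 1000)
}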
3 0.74730194 374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.
Introduction: I was recently speaking with a member of the U.S. House of Representatives, a Californian in a tight race this year. I mentioned the fivethirtyeight.com prediction for him, and he said “fivethirtyeight.com? What’s that?”
4 0.7463814 696 andrew gelman stats-2011-05-04-Whassup with glm()?
5 0.74480128 16 andrew gelman stats-2010-05-04-Burgess on Kipling
Introduction: This is my last entry derived from Anthony Burgess’s book reviews , and it’ll be short. His review of Angus Wilson’s “The Strange Ride of Rudyard Kipling: His Life and Works” is a wonderfully balanced little thing. Nothing incredibly deep–like most items in the collection, the review is only two pages long–but I give it credit for being a rare piece of Kipling criticism I’ve seen that (a) seriously engages with the politics, without (b) congratulating itself on bravely going against the fashions of the politically incorrect chattering classes by celebrating Kipling’s magnificent achievement blah blah blah. Instead, Burgess shows respect for Kipling’s work and puts it in historical, biographical, and literary context. Burgess concludes that Wilson’s book “reminds us, in John Gross’s words, that Kipling ‘remains a haunting, unsettling presence, with whom we still have to come to terms.’ Still.” Well put, and generous of Burgess to end his review with another’s quote. Other cri
7 0.72339398 1433 andrew gelman stats-2012-07-28-LOL without the CATS
8 0.72003365 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models
9 0.71727717 1558 andrew gelman stats-2012-11-02-Not so fast on levees and seawalls for NY harbor?
10 0.71323454 1370 andrew gelman stats-2012-06-07-Duncan Watts and the Titanic
12 0.69896781 1028 andrew gelman stats-2011-11-26-Tenure lets you handle students who cheat
13 0.69796979 1793 andrew gelman stats-2013-04-08-The Supreme Court meets the fallacy of the one-sided bet
14 0.69516408 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox
15 0.69309723 2349 andrew gelman stats-2014-05-26-WAIC and cross-validation in Stan!
16 0.69032383 1805 andrew gelman stats-2013-04-16-Memo to Reinhart and Rogoff: I think it’s best to admit your errors and go on from there
17 0.68760127 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
19 0.68701279 827 andrew gelman stats-2011-07-28-Amusing case of self-defeating science writing
20 0.68114209 232 andrew gelman stats-2010-08-25-Dodging the diplomats