
729 andrew gelman stats-2011-05-24-Deviance as a difference


meta info for this blog

Source: html

Introduction: Peng Yu writes: On page 180 of BDA2, deviance is defined as D(y,\theta) = -2 \log p(y|\theta). However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference of the log-likelihoods of the full model and the base model (times 2) (see the equation on the wiki webpage). The English word ‘deviance’ implies a difference from a standard (in this case, the base model). I’m wondering what the rationale is for your definition of deviance, which consists of only one term rather than two. My reply: Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. So the two definitions are equivalent.
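To see the equivalence concretely, here is a minimal R sketch (mine, not from the post; the covariate x is made up, and BDA's D(y,\theta) is evaluated at the MLE for concreteness). glm()'s residual deviance follows McCullagh and Nelder's definition, 2 times (log-likelihood of the saturated model minus log-likelihood of the fitted model); the saturated-model term is the same for both fits, so it cancels when two models are differenced:

# BDA-style deviance is -2 * log-likelihood; the saturated term in
# glm()'s deviance cancels when we difference two models
set.seed(123)
y <- rep(c(1, 0), c(10, 5))
x <- rnorm(15)                         # made-up covariate for a second model
fit0 <- glm(y ~ 1, family = binomial(link = "logit"))
fit1 <- glm(y ~ x, family = binomial(link = "logit"))
D0 <- -2 * as.numeric(logLik(fit0))    # BDA: -2 log p(y | theta.hat)
D1 <- -2 * as.numeric(logLik(fit1))
D0 - D1                                # difference of BDA deviances ...
deviance(fit0) - deviance(fit1)        # ... equals difference of glm() deviances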


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Peng Yu writes: On page 180 of BDA2, deviance is defined as D(y,\theta) = -2 \log p(y|\theta). [sent-1, score-0.86]

2 However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference of the log-likelihoods of the full model and the base model (times 2) (see the equation on the wiki webpage). [sent-2, score-1.56]

3 The English word ‘deviance’ implies a difference from a standard (in this case, the base model). [sent-3, score-0.673]

4 I’m wondering what the rationale is for your definition of deviance, which consists of only one term rather than two. [sent-4, score-0.579]

5 My reply: Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. [sent-5, score-0.467]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('deviance', 0.708), ('theta', 0.219), ('base', 0.206), ('mccullagh', 0.18), ('peng', 0.18), ('nelder', 0.17), ('yu', 0.157), ('wiki', 0.145), ('consists', 0.139), ('rationale', 0.139), ('definitions', 0.137), ('glm', 0.134), ('difference', 0.124), ('computed', 0.117), ('equation', 0.114), ('webpage', 0.113), ('quantity', 0.112), ('model', 0.106), ('implies', 0.102), ('definition', 0.102), ('english', 0.101), ('equivalent', 0.092), ('relative', 0.086), ('defined', 0.085), ('wondering', 0.085), ('word', 0.083), ('term', 0.077), ('according', 0.074), ('typically', 0.069), ('page', 0.067), ('full', 0.065), ('however', 0.061), ('times', 0.057), ('standard', 0.057), ('reply', 0.052), ('look', 0.046), ('case', 0.039), ('rather', 0.037), ('different', 0.036), ('two', 0.034), ('writes', 0.027), ('people', 0.025), ('see', 0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 729 andrew gelman stats-2011-05-24-Deviance as a difference

Introduction: Peng Yu writes: On page 180 of BDA2, deviance is defined as D(y,\theta) = -2 \log p(y|\theta). However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference of the log-likelihoods of the full model and the base model (times 2) (see the equation on the wiki webpage). The English word ‘deviance’ implies a difference from a standard (in this case, the base model). I’m wondering what the rationale is for your definition of deviance, which consists of only one term rather than two. My reply: Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. So the two definitions are equivalent.

2 0.37461284 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

Introduction: Jean Richardson writes: Do you know what might lead to a large negative cross-correlation (-0.95) between deviance and one of the model parameters? Here’s the (brief) background: I [Richardson] have written a Bayesian hierarchical site occupancy model for presence of disease on individual amphibians. The response variable is therefore binary (disease present/absent) and the probability of disease being present in an individual (psi) depends on various covariates (species of amphibian, location sampled, etc.) parameterized using a logit link function. Replicates are individuals sampled (tested for presence of disease) together. The possibility of imperfect detection is included as p = (prob. disease detected given disease is present). Posterior distributions were estimated using WinBUGS via R2WinBUGS. Simulated data from the model fit the real data very well and posterior distribution densities seem robust to any changes in the model (different priors, etc.) All autocor
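As a hedged, self-contained illustration (not from the post; the draws below are simulated stand-ins, not Richardson's output), the cross-correlation in question is just a correlation between two columns of the matrix of posterior draws, e.g. the sims.matrix that R2WinBUGS returns when deviance is monitored:

# Simulated stand-in draws mimicking a deviance that falls as the
# parameter rises; with real output these columns would come from
# fit$sims.matrix (deviance monitoring turned on)
set.seed(1)
psi.draws <- rnorm(2000)
dev.draws <- 100 - 1.9 * psi.draws + rnorm(2000, sd = 0.62)
cor(dev.draws, psi.draws)   # roughly -0.95, as in the question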

3 0.30894631 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

Introduction: John Mount provides some useful background and follow-up on our discussion from last year on computational instability of the usual logistic regression solver. Just to refresh your memory, here’s a simple logistic regression with only a constant term and no separation, nothing pathological at all:

> y <- rep (c(1,0),c(10,5))
> display (glm (y ~ 1, family=binomial(link="logit")))
glm(formula = y ~ 1, family = binomial(link = "logit"))
            coef.est coef.se
(Intercept) 0.69     0.55
---
n = 15, k = 1
residual deviance = 19.1, null deviance = 19.1 (difference = 0.0)

And here’s what happens when we give it the not-outrageous starting value of -2:

> display (glm (y ~ 1, family=binomial(link="logit"), start=-2))
glm(formula = y ~ 1, family = binomial(link = "logit"), start = -2)
            coef.est coef.se
(Intercept) 71.97    17327434.18
---
n = 15, k = 1
residual deviance = 360.4, null deviance = 19.1 (difference = -341.3)
Warning message:

4 0.30265656 696 andrew gelman stats-2011-05-04-Whassup with glm()?

Introduction: We’re having problems with starting values in glm(). A very simple logistic regression with just an intercept and a very simple starting value (beta=5) blows up. Here’s the R code:

> y <- rep (c(1,0),c(10,5))
> glm (y ~ 1, family=binomial(link="logit"))

Call: glm(formula = y ~ 1, family = binomial(link = "logit"))

Coefficients:
(Intercept)
     0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1  AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=2)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 2)

Coefficients:
(Intercept)
     0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1  AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=5)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 5)

Coefficients:
(Intercept)
  1.501e+15

Degrees of Freedom: 14 Total (i.

5 0.13177542 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega. I.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using Bayes factors. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1) p1(t1) dt1 / \int f2(y|t2) p2(t2) dt2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit
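Since the excerpt's notation got flattened in extraction, here is the Bayes-factor ratio it describes, reconstructed in LaTeX with the excerpt's own symbols:

\mathrm{BF} = \frac{\int f_1(y \mid t_1)\, p_1(t_1)\, dt_1}{\int f_2(y \mid t_2)\, p_2(t_2)\, dt_2}

In the normal-means example, t_1 = \mu under H_0 and t_2 = (\mu_1, \mu_2) under H_1, so the numerator and denominator are integrals over spaces of different dimension, which is exactly why the questioner cannot apply the equal-dimension path-sampling identity directly.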

6 0.1214527 1941 andrew gelman stats-2013-07-16-Priors

7 0.11978365 776 andrew gelman stats-2011-06-22-Deviance, DIC, AIC, cross-validation, etc

8 0.097317882 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

9 0.097149417 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

10 0.09711498 899 andrew gelman stats-2011-09-10-The statistical significance filter

11 0.08824192 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models

12 0.087236129 1476 andrew gelman stats-2012-08-30-Stan is fast

13 0.080732659 571 andrew gelman stats-2011-02-13-A departmental wiki page?

14 0.079227261 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

15 0.0763603 2349 andrew gelman stats-2014-05-26-WAIC and cross-validation in Stan!

16 0.074544653 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

17 0.073678389 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

18 0.071264133 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

19 0.07061024 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

20 0.066449381 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.071), (1, 0.07), (2, 0.018), (3, 0.019), (4, 0.017), (5, -0.015), (6, 0.027), (7, -0.013), (8, 0.026), (9, -0.018), (10, -0.012), (11, 0.005), (12, -0.016), (13, -0.029), (14, -0.04), (15, 0.021), (16, 0.005), (17, -0.021), (18, -0.002), (19, -0.028), (20, 0.071), (21, 0.011), (22, 0.058), (23, -0.08), (24, 0.03), (25, 0.0), (26, -0.01), (27, -0.007), (28, 0.029), (29, -0.004), (30, -0.047), (31, 0.04), (32, 0.006), (33, 0.015), (34, 0.002), (35, 0.015), (36, -0.011), (37, -0.009), (38, 0.009), (39, 0.047), (40, 0.025), (41, 0.014), (42, -0.062), (43, -0.016), (44, -0.017), (45, -0.025), (46, 0.095), (47, 0.028), (48, -0.037), (49, 0.104)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90058339 729 andrew gelman stats-2011-05-24-Deviance as a difference

Introduction: Peng Yu writes: On page 180 of BDA2, deviance is defined as D(y,\theta) = -2 \log p(y|\theta). However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference of the log-likelihoods of the full model and the base model (times 2) (see the equation on the wiki webpage). The English word ‘deviance’ implies a difference from a standard (in this case, the base model). I’m wondering what the rationale is for your definition of deviance, which consists of only one term rather than two. My reply: Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. So the two definitions are equivalent.

2 0.76743579 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

Introduction: Jean Richardson writes: Do you know what might lead to a large negative cross-correlation (-0.95) between deviance and one of the model parameters? Here’s the (brief) background: I [Richardson] have written a Bayesian hierarchical site occupancy model for presence of disease on individual amphibians. The response variable is therefore binary (disease present/absent) and the probability of disease being present in an individual (psi) depends on various covariates (species of amphibian, location sampled, etc.) parameterized using a logit link function. Replicates are individuals sampled (tested for presence of disease) together. The possibility of imperfect detection is included as p = (prob. disease detected given disease is present). Posterior distributions were estimated using WinBUGS via R2WinBUGS. Simulated data from the model fit the real data very well and posterior distribution densities seem robust to any changes in the model (different priors, etc.) All autocor

3 0.70818084 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega) d omega / \int q1(omega) d omega. I.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using Bayes factors. In general (for BF), we have this problem: t1 and t2 may have no relation at all. \int f1(y|t1) p1(t1) dt1 / \int f2(y|t2) p2(t2) dt2 As an example, suppose that we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit

4 0.69102883 696 andrew gelman stats-2011-05-04-Whassup with glm()?

Introduction: We’re having problems with starting values in glm(). A very simple logistic regression with just an intercept and a very simple starting value (beta=5) blows up. Here’s the R code:

> y <- rep (c(1,0),c(10,5))
> glm (y ~ 1, family=binomial(link="logit"))

Call: glm(formula = y ~ 1, family = binomial(link = "logit"))

Coefficients:
(Intercept)
     0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1  AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=2)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 2)

Coefficients:
(Intercept)
     0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1  AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=5)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 5)

Coefficients:
(Intercept)
  1.501e+15

Degrees of Freedom: 14 Total (i.

5 0.59972167 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions

Introduction: David Hogg points me to this discussion: Martin Strasbourg and I [Hogg] discussed his project to detect new satellites of M31 in the PAndAS survey. He can construct a likelihood ratio (possibly even a marginalized likelihood ratio) at every position in the M31 imaging, between the best-fit satellite-plus-background model and the best nothing-plus-background model. He can make a two-dimensional map of these likelihood ratios and show a histogram of them. Looking at this histogram, which has a tail to very large ratios, he asked me, where should I put my cut? That is, at what likelihood ratio does a candidate deserve follow-up? Here’s my unsatisfying answer: To a statistician, the distribution of likelihood ratios is interesting and valuable to study. To an astronomer, it is uninteresting. You don’t want to know the distribution of likelihoods, you want to find satellites . . . I wrote that I think this makes sense and that it would actually be an interesting and useful rese

6 0.59885615 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

7 0.59860235 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

8 0.58251852 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables

9 0.57061929 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

10 0.56861353 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

11 0.5628866 1875 andrew gelman stats-2013-05-28-Simplify until your fake-data check works, then add complications until you can figure out where the problem is coming from

12 0.5296616 2342 andrew gelman stats-2014-05-21-Models with constraints

13 0.52899522 1476 andrew gelman stats-2012-08-30-Stan is fast

14 0.52747732 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits

15 0.52333391 1284 andrew gelman stats-2012-04-26-Modeling probability data

16 0.52327418 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

17 0.52296317 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

18 0.52004939 151 andrew gelman stats-2010-07-16-Wanted: Probability distributions for rank orderings

19 0.51806509 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

20 0.51618332 1363 andrew gelman stats-2012-06-03-Question about predictive checks


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.025), (16, 0.056), (21, 0.032), (24, 0.142), (50, 0.164), (54, 0.025), (55, 0.032), (61, 0.124), (87, 0.021), (89, 0.026), (94, 0.037), (98, 0.027), (99, 0.131)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90691459 729 andrew gelman stats-2011-05-24-Deviance as a difference

Introduction: Peng Yu writes: On page 180 of BDA2, deviance is defined as D(y,\theta) = -2 \log p(y|\theta). However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference of the log-likelihoods of the full model and the base model (times 2) (see the equation on the wiki webpage). The English word ‘deviance’ implies a difference from a standard (in this case, the base model). I’m wondering what the rationale is for your definition of deviance, which consists of only one term rather than two. My reply: Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. So the two definitions are equivalent.

2 0.77870291 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs

Introduction: As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core. But we all have multiple cores now, or we’re computing on a cluster or the cloud! So the behavior we’d like from rjags is to use the foreach package with each JAGS chain using a parallel-safe RNG. The default behavior with n.chain=1 will be that each parallel instance will use .RNG.name[1], the Wichmann-Hill RNG. JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer", .RNG.name="lecuyer::lecuyer", and .RNG.name=
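For what it’s worth, here is a hedged R sketch of the setup the post is reaching for, assuming the lecuyer module’s RNG factory is the one rjags’s parallel.seeds() knows as "lecuyer::RngStream" (my assumption, given the module was undocumented at the time); the model file, data list, and monitored parameter are placeholders:

# One JAGS chain per worker, each seeded from its own L'Ecuyer stream;
# "model.bug", data.list, and "theta" are hypothetical placeholders
library(rjags)
library(foreach)
library(doParallel)
load.module("lecuyer")
n.chains <- 4
# parallel.seeds() returns a list of .RNG.name/.RNG.state initial values
rng.inits <- parallel.seeds("lecuyer::RngStream", n.chains)  # assumed factory name
registerDoParallel(cores = n.chains)
chains <- foreach(i = 1:n.chains) %dopar% {
  load.module("lecuyer")                 # each worker must load the module itself
  m <- jags.model("model.bug", data = data.list,
                  inits = rng.inits[[i]], n.chains = 1)
  coda.samples(m, variable.names = "theta", n.iter = 1000)
}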

3 0.74730194 374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.

Introduction: I was recently speaking with a member of the U.S. House of Representatives, a Californian in a tight race this year. I mentioned the fivethirtyeight.com prediction for him, and he said “fivethirtyeight.com? What’s that?”

4 0.7463814 696 andrew gelman stats-2011-05-04-Whassup with glm()?

Introduction: We’re having problems with starting values in glm(). A very simple logistic regression with just an intercept and a very simple starting value (beta=5) blows up. Here’s the R code:

> y <- rep (c(1,0),c(10,5))
> glm (y ~ 1, family=binomial(link="logit"))

Call: glm(formula = y ~ 1, family = binomial(link = "logit"))

Coefficients:
(Intercept)
     0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1  AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=2)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 2)

Coefficients:
(Intercept)
     0.6931

Degrees of Freedom: 14 Total (i.e. Null); 14 Residual
Null Deviance: 19.1
Residual Deviance: 19.1  AIC: 21.1

> glm (y ~ 1, family=binomial(link="logit"), start=5)

Call: glm(formula = y ~ 1, family = binomial(link = "logit"), start = 5)

Coefficients:
(Intercept)
  1.501e+15

Degrees of Freedom: 14 Total (i.

5 0.74480128 16 andrew gelman stats-2010-05-04-Burgess on Kipling

Introduction: This is my last entry derived from Anthony Burgess’s book reviews , and it’ll be short. His review of Angus Wilson’s “The Strange Ride of Rudyard Kipling: His Life and Works” is a wonderfully balanced little thing. Nothing incredibly deep–like most items in the collection, the review is only two pages long–but I give it credit for being a rare piece of Kipling criticism I’ve seen that (a) seriously engages with the politics, without (b) congratulating itself on bravely going against the fashions of the politically incorrect chattering classes by celebrating Kipling’s magnificent achievement blah blah blah. Instead, Burgess shows respect for Kipling’s work and puts it in historical, biographical, and literary context. Burgess concludes that Wilson’s book “reminds us, in John Gross’s words, that Kipling ‘remains a haunting, unsettling presence, with whom we still have to come to terms.’ Still.” Well put, and generous of Burgess to end his review with another’s quote. Other cri

6 0.73014581 1662 andrew gelman stats-2013-01-09-The difference between “significant” and “non-significant” is not itself statistically significant

7 0.72339398 1433 andrew gelman stats-2012-07-28-LOL without the CATS

8 0.72003365 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models

9 0.71727717 1558 andrew gelman stats-2012-11-02-Not so fast on levees and seawalls for NY harbor?

10 0.71323454 1370 andrew gelman stats-2012-06-07-Duncan Watts and the Titanic

11 0.7120676 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

12 0.69896781 1028 andrew gelman stats-2011-11-26-Tenure lets you handle students who cheat

13 0.69796979 1793 andrew gelman stats-2013-04-08-The Supreme Court meets the fallacy of the one-sided bet

14 0.69516408 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

15 0.69309723 2349 andrew gelman stats-2014-05-26-WAIC and cross-validation in Stan!

16 0.69032383 1805 andrew gelman stats-2013-04-16-Memo to Reinhart and Rogoff: I think it’s best to admit your errors and go on from there

17 0.68760127 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

18 0.68724597 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

19 0.68701279 827 andrew gelman stats-2011-07-28-Amusing case of self-defeating science writing

20 0.68114209 232 andrew gelman stats-2010-08-25-Dodging the diplomats