andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-996 knowledge-graph by maker-knowledge-mining

996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values


meta info for this blog

Source: html

Introduction: William Perkins, Mark Tygert, and Rachel Ward write: If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ2 statistic can involve division by nearly zero. This often leads to serious trouble in practice — even in the absence of round-off errors . . . The problem is not merely that the chi-squared statistic doesn’t have the advertised chi-squared distribution — a reference distribution can always be computed via simulation, either using the posterior predictive distribution or by conditioning on a point estimate of the cell expectations and then making a degrees-of-freedom sort of adjustment. Rather, the problem is that, when there are lots of cells with near-zero expectation, the chi-squared test is mostly noise. And this is not merely a theoretical problem. It comes up in real examples. Here’s one, taken from the classic 1992 genetics paper of Guo and Thompson: And here are the e
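To make the failure mode concrete, here is a minimal sketch in R (the Guo and Thompson data are not reproduced in this excerpt, so the cell probabilities below are hypothetical): a multinomial with a few well-populated cells and many near-empty ones, where dividing by tiny expected counts lets a single stray observation dominate the Pearson statistic.

set.seed(1)
# Hypothetical cell probabilities: 4 well-populated cells, 36 near-empty ones
p <- c(rep(0.24, 4), rep(0.04 / 36, 36))
n <- 100
expected <- n * p                          # the 36 small cells each have expectation of about 0.11
observed <- as.vector(rmultinom(1, n, p))  # one simulated table of counts

# Pearson chi-squared: each near-empty cell that happens to catch a single
# count contributes (1 - 0.11)^2 / 0.11, roughly 7, to the statistic on its own.
chisq_stat <- sum((observed - expected)^2 / expected)
chisq_stat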


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 William Perkins, Mark Tygert, and Rachel Ward write : If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ2 statistic can involve division by nearly zero. [sent-1, score-0.78]

2 This often leads to serious trouble in practice — even in the absence of round-off errors . [sent-2, score-0.077]

3 Rather, the problem is that, when there are lots of cells with near-zero expectation, the chi-squared test is mostly noise. [sent-6, score-0.518]

4 Here’s one, taken from the classic 1992 genetics paper of Guo and Thompson: And here are the expected frequencies from the Guo and Thompson model: The p-value of the chi-squared test is 0. [sent-9, score-0.633]

5 But it turns out that if you do an equally-weighted mean square test (rather than chi-square, which weights each cell proportional to expected counts), you get a p-value of 0. [sent-12, score-0.984] (See the sketch after this summary.)

6 (Perkins, Tygert, and Ward compute the p-value via simulation. [sent-14, score-0.101]

7 All those zeroes and near-zeroes in the data give you a chi-squared test that is so noisy as to be useless. [sent-17, score-0.344]

8 If people really are going around saying their models fit in such situations, it could be causing real problems. [sent-18, score-0.23]

9 In the chi-squared statistic, all that noise in the empty cells is diluting the signal. [sent-21, score-0.533]
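As a rough illustration of sentences 5 and 6 above, here is a sketch in R of a simulation-based p-value that conditions on a point estimate of the cell probabilities and supports both weightings: the Pearson chi-squared statistic and the equally-weighted mean-square statistic. The function name and interface are illustrative; this is not Perkins, Tygert, and Ward's actual procedure, which also involves the degrees-of-freedom adjustment mentioned in the introduction.

sim_pvalue <- function(observed, p_hat, n_sims = 10000, stat = c("chisq", "meansq")) {
  stat <- match.arg(stat)
  n <- sum(observed)
  expected <- n * p_hat
  t_stat <- function(y) {
    if (stat == "chisq") sum((y - expected)^2 / expected)  # Pearson weighting: divide by expected counts
    else sum((y - expected)^2)                             # equal weighting across cells
  }
  t_obs <- t_stat(observed)
  # Reference distribution built by simulating new tables from the fitted multinomial
  t_rep <- replicate(n_sims, t_stat(as.vector(rmultinom(1, n, p_hat))))
  mean(t_rep >= t_obs)  # proportion of simulated statistics at least as extreme as the observed one
}

With the hypothetical cells from the sketch above, sim_pvalue(observed, p, stat = "chisq") and sim_pvalue(observed, p, stat = "meansq") can be compared directly; when many cells are near-empty the two weightings can disagree sharply, which is the contrast the post describes.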


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('cells', 0.348), ('statistic', 0.324), ('expected', 0.296), ('tygert', 0.252), ('perkins', 0.216), ('guo', 0.216), ('ward', 0.194), ('test', 0.17), ('distribution', 0.159), ('square', 0.158), ('cell', 0.149), ('merely', 0.116), ('summed', 0.115), ('zeroes', 0.108), ('diluting', 0.108), ('advertised', 0.104), ('thompson', 0.104), ('via', 0.101), ('rachel', 0.094), ('forming', 0.09), ('frequencies', 0.09), ('discrepancies', 0.089), ('going', 0.088), ('squared', 0.084), ('pearson', 0.083), ('conditioning', 0.078), ('rejection', 0.078), ('absence', 0.077), ('empty', 0.077), ('genetics', 0.077), ('causing', 0.077), ('largest', 0.075), ('proportional', 0.074), ('expectation', 0.074), ('computed', 0.074), ('william', 0.073), ('weights', 0.072), ('expectations', 0.072), ('tested', 0.071), ('weighting', 0.071), ('counts', 0.071), ('division', 0.07), ('signal', 0.069), ('uniform', 0.068), ('noisy', 0.066), ('involve', 0.066), ('simulation', 0.065), ('mean', 0.065), ('real', 0.065), ('situations', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values


2 0.1081041 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

Introduction: Yesterday we had a spirited discussion of the following conditional probability puzzle: “I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?” This reminded me of the principle, familiar from statistics instruction and the cognitive psychology literature, that the best way to teach these sorts of examples is through integers rather than fractions. For example, consider this classic problem: “10% of persons have disease X. You are tested for the disease and test positive, and the test has 80% accuracy. What is the probability that you have the disease?” This can be solved directly using conditional probability but it appears to be clearer to do it using integers: Start with 100 people. 10 will have the disease and 90 will not. Of the 10 with the disease, 8 will test positive and 2 will test negative. Of the 90 without the disease, 18 will test positive and 72 will test negative. (72 = 0.8*90.) So, out of the origin

3 0.084793083 1363 andrew gelman stats-2012-06-03-Question about predictive checks

Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h

4 0.082501985 376 andrew gelman stats-2010-10-28-My talk at American University

Introduction: Red State Blue State: How Will the U.S. Vote? It’s the “annual Halloween and pre-election extravaganza” of the Department of Mathematics and Statistics, and they suggested I could talk on the zombies paper (of course), but I thought the material on voting might be of more general interest. The “How will the U.S. vote?” subtitle was not of my choosing, but I suppose I can add a few slides about the forthcoming election. Fri 29 Oct 2010, 7pm in Ward I, in the basement of the Ward Circle building. Should be fun. I haven’t been to AU since taking a class there, over 30 years ago. P.S. It was indeed fun. Here’s the talk. I did end up briefly describing my zombie research but it didn’t make it into any of the slides.

5 0.081748255 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i

6 0.08072561 1767 andrew gelman stats-2013-03-17-The disappearing or non-disappearing middle class

7 0.080028705 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

8 0.079294473 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

9 0.078366689 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

10 0.078347065 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

11 0.077515773 351 andrew gelman stats-2010-10-18-“I was finding the test so irritating and boring that I just started to click through as fast as I could”

12 0.077178508 1886 andrew gelman stats-2013-06-07-Robust logistic regression

13 0.076671995 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

14 0.075310007 1605 andrew gelman stats-2012-12-04-Write This Book

15 0.075218603 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

16 0.07450445 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

17 0.073928133 1941 andrew gelman stats-2013-07-16-Priors

18 0.073621094 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

19 0.073531888 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

20 0.072959594 2258 andrew gelman stats-2014-03-21-Random matrices in the news


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.134), (1, 0.072), (2, 0.029), (3, 0.0), (4, 0.02), (5, -0.032), (6, 0.035), (7, 0.011), (8, 0.016), (9, -0.036), (10, 0.001), (11, 0.017), (12, -0.039), (13, -0.038), (14, -0.052), (15, -0.03), (16, 0.033), (17, -0.007), (18, 0.014), (19, -0.048), (20, 0.039), (21, 0.006), (22, 0.016), (23, -0.036), (24, 0.016), (25, 0.02), (26, -0.012), (27, 0.016), (28, 0.03), (29, 0.023), (30, -0.002), (31, 0.019), (32, -0.015), (33, -0.0), (34, 0.002), (35, 0.008), (36, 0.03), (37, -0.006), (38, -0.012), (39, 0.021), (40, 0.013), (41, -0.028), (42, -0.004), (43, -0.028), (44, -0.014), (45, 0.02), (46, 0.01), (47, 0.016), (48, 0.05), (49, -0.03)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97297788 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values


2 0.84287447 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i

3 0.78751791 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

Introduction: Yesterday we had a spirited discussion of the following conditional probability puzzle: “I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?” This reminded me of the principle, familiar from statistics instruction and the cognitive psychology literature, that the best way to teach these sorts of examples is through integers rather than fractions. For example, consider this classic problem: “10% of persons have disease X. You are tested for the disease and test positive, and the test has 80% accuracy. What is the probability that you have the disease?” This can be solved directly using conditional probability but it appears to be clearer to do it using integers: Start with 100 people. 10 will have the disease and 90 will not. Of the 10 with the disease, 8 will test positive and 2 will test negative. Of the 90 without the disease, 18 will test positive and 72 will test negative. (72 = 0.8*90.) So, out of the origin

4 0.77715254 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

Introduction: John Cook considers how people justify probability distribution assumptions: Sometimes distribution assumptions are not justified. Sometimes distributions can be derived from fundamental principles [or] . . . on theoretical grounds. For example, large samples and the central limit theorem together may justify assuming that something is normally distributed. Often the choice of distribution is somewhat arbitrary, chosen by intuition or for convenience, and then empirically shown to work well enough. Sometimes a distribution can be a bad fit and still work well, depending on what you’re asking of it. Cook continues: The last point is particularly interesting. It’s not hard to imagine that a poor fit would produce poor results. It’s surprising when a poor fit produces good results. And then he gives an example of an effective but inaccurate model used to model survival times in a clinical trial. Cook explains: The [poorly-fitting] method works well because of the q

5 0.77552396 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

Introduction: Shravan writes: I have a problem very similar to the one presented in chapter 6 of BDA, the speed of light example. You use the distribution of the minimum scores from the posterior predictive distribution, show that it’s not realistic given the data, and suggest that an asymmetric contaminated normal distribution or a symmetric long-tailed distribution would be better. How does one use such a distribution? My reply: You can actually use a symmetric long-tailed distribution such as t with low degrees of freedom. One striking feature of symmetric long-tailed distributions is that a small random sample from such a distribution can have outliers on one side or the other and look asymmetric. Just to see this, try the following in R: par (mfrow=c(3,3), mar=c(1,1,1,1)); for (i in 1:9) hist (rt (100, 2), xlab="", ylab="", main="") You’ll see some skewed distributions. So that’s the message (which I learned from an offhand comment of Rubin, actually): if you want to model

6 0.77403027 1363 andrew gelman stats-2012-06-03-Question about predictive checks

7 0.76295871 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

8 0.75156933 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables

9 0.74657172 1881 andrew gelman stats-2013-06-03-Boot

10 0.74013519 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

11 0.7271474 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

12 0.72403628 1284 andrew gelman stats-2012-04-26-Modeling probability data

13 0.72310996 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

14 0.72026139 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?

15 0.71304077 2258 andrew gelman stats-2014-03-21-Random matrices in the news

16 0.7121594 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

17 0.71097642 2342 andrew gelman stats-2014-05-21-Models with constraints

18 0.70808715 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

19 0.70765263 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

20 0.70619649 1518 andrew gelman stats-2012-10-02-Fighting a losing battle


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.016), (15, 0.02), (16, 0.035), (21, 0.033), (24, 0.153), (31, 0.024), (44, 0.011), (86, 0.056), (89, 0.023), (96, 0.012), (97, 0.247), (98, 0.022), (99, 0.224)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94900298 1573 andrew gelman stats-2012-11-11-Incredibly strange spam

Introduction: Unsolicited (of course) in the email the other day: Just wanted to touch base with you to see if you needed any quotes on Parking lot lighting or Garage Lighting? (Induction, LED, Canopy etc…) We help retrofit 1000′s of garages around the country. Let me know your specs and ill send you a quote in 24 hours. ** Owner Emergency Lights Co. Ill indeed. . . .

2 0.92810166 882 andrew gelman stats-2011-08-31-Meanwhile, on the sister blog . . .

Introduction: NYT columnist Douthat asks: Should we be disturbed that a leading presidential candidate endorses a pro-slavery position? Who’s on the web? And where are they? Sowell, Carlson, Barone: fools, knaves, or simply victims of a cognitive illusion? Don’t blame the American public for the D.C. deadlock Calvin College update Help reform the Institutional Review Board (IRB) system! Powerful credit-rating agencies are a creation of the government . . . what does it mean when they bite the hand that feeds them? “Waiting for a landslide” A simple theory of why Obama didn’t come out fighting in 2009 A modest proposal Noooooooooooooooo!!!!!!!!!!!!!!! The Family Research Council and the Barnard Center for Research on Women Sleazy data miners Genetic essentialism is in our genes Wow, that was a lot! No wonder I don’t get any research done…

3 0.90152955 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

Introduction: I have an optimization problem: I have a complicated physical model that predicts energy and thermal behavior of a building, given the values of a slew of parameters, such as insulation effectiveness, window transmissivity, etc. I’m trying to find the parameter set that best fits several weeks of thermal and energy use data from the real building that we modeled. (Of course I would rather explore parameter space and come up with probability distributions for the parameters, and maybe that will come later, but for now I’m just optimizing). To do the optimization, colleagues and I implemented a “particle swarm optimization” algorithm on a massively parallel machine. This involves giving each of about 120 “particles” an initial position in parameter space, then letting them move around, trying to move to better positions according to a specific algorithm. We gave each particle an initial position sampled from our prior distribution for each parameter. So far we’ve run about 140 itera

same-blog 4 0.89854753 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values


5 0.86937106 142 andrew gelman stats-2010-07-12-God, Guns, and Gaydar: The Laws of Probability Push You to Overestimate Small Groups

Introduction: Earlier today, Nate criticized a U.S. military survey that asks troops the question, “Do you currently serve with a male or female Service member you believe to be homosexual.” [emphasis added] As Nate points out, by asking this question in such a speculative way, “it would seem that you’ll be picking up a tremendous number of false positives–soldiers who are believed to be gay, but aren’t–and that these false positives will swamp any instances in which soldiers (in spite of DADT) are actually somewhat open about their same-sex attractions.” This is a general problem in survey research. In an article in Chance magazine in 1997, “The myth of millions of annual self-defense gun uses: a case study of survey overestimates of rare events” [see here for related references], David Hemenway uses the false-positive, false-negative reasoning to explain this bias in terms of probability theory. Misclassifications that induce seemingly minor biases in estimates of certain small probab

6 0.85074872 553 andrew gelman stats-2011-02-03-is it possible to “overstratify” when assigning a treatment in a randomized control trial?

7 0.84743655 1001 andrew gelman stats-2011-11-10-Three hours in the life of a statistician

8 0.83372498 13 andrew gelman stats-2010-04-30-Things I learned from the Mickey Kaus for Senate campaign

9 0.8325243 1651 andrew gelman stats-2013-01-03-Faculty Position in Visualization, Visual Analytics, Imaging, and Human Centered Computing

10 0.80017769 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense

11 0.79789323 2118 andrew gelman stats-2013-11-30-???

12 0.7931264 1812 andrew gelman stats-2013-04-19-Chomsky chomsky chomsky chomsky furiously

13 0.79058695 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study

14 0.78890431 1694 andrew gelman stats-2013-01-26-Reflections on ethicsblogging

15 0.76901555 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?

16 0.76470488 1335 andrew gelman stats-2012-05-21-Responding to a bizarre anti-social-science screed

17 0.76129216 115 andrew gelman stats-2010-06-28-Whassup with those crappy thrillers?

18 0.75944042 1518 andrew gelman stats-2012-10-02-Fighting a losing battle

19 0.75909162 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series

20 0.75788748 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data