1363 andrew gelman stats-2012-06-03-Question about predictive checks
Klaas Metselaar writes:

I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”.

My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments.

[...] It is the price we have to pay for imperfect knowledge (a small sample of experimental sites or too large a leap of faith in defining the population for which the sample is representative), new times and new places. Unless we can show that this predictive error distribution is essentially 0 for the population of interest, we as scientists have work to do.

[...] using a model for gravity with a posterior based on earth observations only, and wanting to use it predictively for earth and mars. A posterior predictive check for earth could be perfect, but would be completely wrong if the model is to be used for mars (the leap of faith I am talking about). I would reserve the notion “posterior distribution check” for checks involving the data “A” on which the posterior is based, and reserve the notion “posterior predictive check” for a posterior distribution check using data not contained in dataset “A”.
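To make the distinction concrete, here is a minimal sketch (my illustration, not part of the original exchange) for a toy normal regression y ~ N(a + b*x, sigma), written in Python. It contrasts the usual posterior predictive draws, which combine posterior uncertainty about the parameters with residual error, against Metselaar's proposal of also drawing from a separate prediction-error term; the scale of that extra term (extrapolation_sd below) is a pure assumption, not something estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are draws from the posterior of (a, b, sigma) for the toy
# regression y ~ N(a + b*x, sigma), e.g. as produced by Stan or any sampler.
n_draws = 4000
a = rng.normal(1.0, 0.10, n_draws)              # intercept draws
b = rng.normal(2.0, 0.05, n_draws)              # slope draws
sigma = np.abs(rng.normal(0.5, 0.05, n_draws))  # residual-sd draws

x_new = 3.0  # predictor value at which we want a prediction

# (1) Usual posterior predictive draws: parameter uncertainty + residual error.
mu = a + b * x_new
y_pred = rng.normal(mu, sigma)

# (2) Metselaar-style draws: additionally sample a separate "prediction error"
# term standing in for model error at new times and places. Its scale is an
# assumption made for illustration, not something the data can tell us here.
extrapolation_sd = 1.0
y_pred_extrap = rng.normal(mu, np.sqrt(sigma**2 + extrapolation_sd**2))

print("posterior predictive 95% interval:",
      np.percentile(y_pred, [2.5, 97.5]).round(2))
print("with extra prediction error:      ",
      np.percentile(y_pred_extrap, [2.5, 97.5]).round(2))
```

Adding the two error terms in quadrature assumes they are independent and Gaussian; that is a modeling choice, not a consequence of the Bayesian machinery.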
My reply: We speak of three sorts of predictive checks: within-sample, cross-validation, and out-of-sample. In any of these scenarios, we are comparing data to a predictive distribution. In the first case, we are comparing data to predictions based on a model fit to those data. In the second case, we hold out some of our data for the comparison. In the third case, we compare predictions to new data not from the original source. All three of these sorts of predictive comparisons can be useful.
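Here is a hedged sketch of those three comparisons for a toy normal model, again in Python and again my illustration rather than code from the post. The test statistic (the sample maximum), the crude normal-approximation "posterior" draws, and the choice to hold out the last 10 of 50 points are all arbitrary; the point is only to show where each kind of data enters the check.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit(y):
    # Crude stand-in for a real posterior sampler: normal-approximation draws
    # of (mu, sigma) for the model y ~ N(mu, sigma).
    n = len(y)
    sigma_hat = y.std(ddof=1)
    mu = rng.normal(y.mean(), sigma_hat / np.sqrt(n), 4000)
    sigma = np.abs(rng.normal(sigma_hat, sigma_hat / np.sqrt(2 * n), 4000))
    return mu, sigma

def ppc_pvalue(y_obs, mu, sigma, T=np.max):
    # Pr(T(y_rep) >= T(y_obs)) under the predictive distribution.
    n = len(y_obs)
    t_rep = np.array([T(rng.normal(m, s, n)) for m, s in zip(mu, sigma)])
    return float(np.mean(t_rep >= T(y_obs)))

y = rng.normal(0.0, 1.0, 50)      # "dataset A"
y_new = rng.normal(0.3, 1.0, 50)  # new data from a different source

# (1) Within-sample: compare y to replications from a model fit to y itself.
mu, sigma = fit(y)
print("within-sample p-value:   ", ppc_pvalue(y, mu, sigma))

# (2) Cross-validation: fit to part of y, check the held-out remainder.
mu_cv, sigma_cv = fit(y[:40])
print("cross-validation p-value:", ppc_pvalue(y[40:], mu_cv, sigma_cv))

# (3) Out-of-sample: compare predictions (model fit to all of y) to new data.
print("out-of-sample p-value:   ", ppc_pvalue(y_new, mu, sigma))
```

In Metselaar's terminology only the second and third checks use information not used in determining the posterior; in the terminology of the reply, all three are predictive checks because each compares data to a predictive distribution.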
Similar posts:
2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes
1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc
1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?
2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction
1247 andrew gelman stats-2012-04-05-More philosophy of Bayes
1941 andrew gelman stats-2013-07-16-Priors
754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging
2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work
781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis
1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model
1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model
772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?
154 andrew gelman stats-2010-07-18-Predictive checks for hierarchical models
1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things
2041 andrew gelman stats-2013-09-27-Setting up Jitts online