andrew_gelman_stats-2012-1401 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Data analysis recipes: Fitting a model to data : We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these conditions are rarely met in practice. We consider cases of general, heterogeneous, and arbitrarily covariant two-dimensional uncertainties, and situations in which there are bad data (large outliers), unknown uncertainties, and unknown but expected intrinsic scatter in the linear relationship being fit. Above all we emphasize the importance of having a “generative model” for the data, even an approximate one. Once there is a generative model, the subsequent fitting is non-arbitrary because the model permits direct computation of the likelihood of the parameters or the posterior probability distribution. Construction of a posterior probability distribution is indispensable if there are “nuisance parameters” to marginalize away.

Data analysis recipes: Probability calculus for inference : In this pedagogical text aimed at those wanting to start thinking about or brush up on probabilistic inference, I review the rules by which probability distribution functions can (and cannot) be combined. I connect these rules to the operations performed in probabilistic data analysis. Dimensional analysis is emphasized as a valuable tool for helping to construct non-wrong probabilistic statements. The applications of probability calculus in constructing likelihoods, marginalized likelihoods, posterior probabilities, and posterior predictions are all discussed.
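Hogg's central claim (once you have a generative model, the likelihood is a direct computation) is concrete enough to sketch. Below is a minimal R illustration, not from the paper itself: the data points, uncertainties, grid, and flat prior are all invented for the example. It writes down the generative model for a straight line with known Gaussian y-uncertainties plus unknown intrinsic scatter, computes the log-likelihood directly, and marginalizes the scatter away as a nuisance parameter on a grid.

# Minimal sketch (hypothetical data): fit y = m*x + b with known Gaussian
# uncertainties sigma_y and unknown intrinsic scatter s.
x       <- c(1, 2, 3, 4, 5)
y       <- c(1.1, 2.3, 2.8, 4.2, 5.1)
sigma_y <- c(0.2, 0.3, 0.2, 0.4, 0.3)

# Generative model: y_i ~ Normal(m*x_i + b, sqrt(sigma_y_i^2 + s^2)),
# so the log-likelihood is a direct computation.
log_lik <- function(m, b, s) {
  sum(dnorm(y, mean = m * x + b, sd = sqrt(sigma_y^2 + s^2), log = TRUE))
}

# Marginalize the nuisance parameter s over a grid with a flat prior,
# giving the marginal log-likelihood of (m, b).
s_grid <- seq(0, 2, length.out = 200)
log_lik_marginal <- function(m, b) {
  ll <- sapply(s_grid, function(s) log_lik(m, b, s))
  max(ll) + log(mean(exp(ll - max(ll))))  # log-mean-exp for stability
}

log_lik_marginal(1, 0)  # e.g., evaluate at slope 1, intercept 0

Evaluating log_lik_marginal over a grid of slopes and intercepts and adding a log-prior gives the unnormalized posterior; everything downstream of the generative model is mechanical, which is exactly the sense in which the fitting becomes non-arbitrary.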
wordName wordTfidf (topN-words): uncertainties (0.378), probabilistic (0.234), fitting (0.233), recipes (0.225), posterior (0.205), probability (0.197), generative (0.189), calculus (0.183), likelihoods (0.183), unknown (0.168), and a long tail of lower-weighted terms.
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1401 andrew gelman stats-2012-06-30-David Hogg on statistics
Introduction: (this is the present post; its introduction is quoted in full above)
2 0.16064805 1961 andrew gelman stats-2013-07-29-Postdocs in probabilistic modeling! With David Blei! And Stan!
Introduction: David Blei writes: I have two postdoc openings for basic research in probabilistic modeling . The thrusts are (a) scalable inference and (b) model checking. We will be developing new methods and implementing them in probabilistic programming systems. I am open to applicants interested in many kinds of applications and from any field. “Scalable inference” means black-box VB and related ideas, and “probabilistic programming systems” means Stan! (You might be familiar with Stan as an implementation of Nuts for posterior sampling, but Stan is also an efficient program for computing probability densities and their gradients, and as such is an ideal platform for developing scalable implementations of variational inference and related algorithms.) And you know I like model checking. Here’s the full ad: ===== POSTDOC POSITIONS IN PROBABILISTIC MODELING ===== We expect to have two postdoctoral positions available for January 2014 (or later). These positions are in D
3 0.15138176 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data
Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and w
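As a toy version of the idea in this abstract (my own simplified construction, not the paper's actual model), one can wrap a deterministic threshold rule in an error model whose flip probability depends on the model prediction, then fit the error-model parameters by maximum likelihood in R:

# Toy illustration: deterministic rule y_hat = 1{x > 0}, wrapped in a
# stochastic model where the flip probability depends on |x| (distance
# from the decision boundary). All numbers are invented.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(3 * x))  # simulated "truth"

neg_log_lik <- function(theta) {
  eps   <- plogis(theta[1] - theta[2] * abs(x))  # error prob, decays with |x|
  y_hat <- as.numeric(x > 0)                     # deterministic prediction
  p     <- ifelse(y == y_hat, 1 - eps, eps)      # stochastic wrapper
  -sum(log(p))
}

fit <- optim(c(0, 1), neg_log_lik)
fit$par  # fitted error-model parameters; eps quantifies the error rate

Fitting the wrapper explicitly, rather than just accepting misclassifications, is what yields the uncertainty quantification and error-rate estimate the abstract describes.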
4 0.14276941 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning
Introduction: Last month I wrote: Computer scientists are often brilliant but they can be unfamiliar with what is done in the worlds of data collection and analysis. This goes the other way too: statisticians such as myself can look pretty awkward, reinventing (or failing to reinvent) various wheels when we write computer programs or, even worse, try to design software. Andrew MacNamara followed up with some thoughts: I [MacNamara] had some basic statistics training through my MBA program, after having completed an undergrad degree in computer science. Since then I’ve been very interested in learning more about statistical techniques, including things like GLM and censored data analyses as well as machine learning topics like neural nets, SVMs, etc. I began following your blog after some research into Bayesian analysis topics and I am trying to dig deeper on that side of things. One thing I have noticed is that there seems to be a distinction between data analysi
5 0.13990189 1095 andrew gelman stats-2012-01-01-Martin and Liu: Probabilistic inference based on consistency of model with data
Introduction: What better way to start the new year than with some hard-core statistical theory? Ryan Martin and Chuanhai Liu send along a new paper on inferential models: Probability is a useful tool for describing uncertainty, so it is natural to strive for a system of statistical inference based on probabilities for or against various hypotheses. But existing probabilistic inference methods struggle to provide a meaningful interpretation of the probabilities across experiments in sufficient generality. In this paper we further develop a promising new approach based on what are called inferential models (IMs). The fundamental idea behind IMs is that there is an unobservable auxiliary variable that itself describes the inherent uncertainty about the parameter of interest, and that posterior probabilistic inference can be accomplished by predicting this unobserved quantity. We describe a simple and intuitive three-step construction of a random set of candidate parameter values, each being co
6 0.1383169 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes
7 0.12780546 1363 andrew gelman stats-2012-06-03-Question about predictive checks
8 0.12648593 1712 andrew gelman stats-2013-02-07-Philosophy and the practice of Bayesian statistics (with all the discussions!)
10 0.12385833 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging
11 0.12365258 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
12 0.12332181 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis
13 0.11778508 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes
14 0.11719774 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions
15 0.11710822 1941 andrew gelman stats-2013-07-16-Priors
17 0.11519995 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model
18 0.11207186 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?
19 0.11131135 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
20 0.10824936 1713 andrew gelman stats-2013-02-08-P-values and statistical practice
simIndex simValue blogId blogTitle
same-blog 1 0.94448608 1401 andrew gelman stats-2012-06-30-David Hogg on statistics
Introduction: (the present post; introduction quoted in full above)
2 0.85915762 1363 andrew gelman stats-2012-06-03-Question about predictive checks
Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h
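The distinction under discussion is easiest to state in code. Here is a minimal R sketch, with made-up numbers standing in for real posterior draws, separating the two ingredients of predictive uncertainty: sampling from the posterior of the parameters, and fresh residual error for each replicated observation.

# Sketch: predictive draws combine posterior parameter uncertainty with
# residual error. The "posterior draws" here are faked as normals for brevity.
set.seed(1)
n_sims <- 4000
x_new  <- 2.5                           # a new design point
beta   <- rnorm(n_sims, 1.2, 0.10)      # stand-in posterior draws of slope
alpha  <- rnorm(n_sims, 0.3, 0.20)      # stand-in posterior draws of intercept
sigma  <- abs(rnorm(n_sims, 0.5, 0.05)) # stand-in posterior draws of resid sd

mu_new <- alpha + beta * x_new          # parameter uncertainty only
y_new  <- rnorm(n_sims, mu_new, sigma)  # plus residual error

sd(mu_new)  # uncertainty from the posterior alone
sd(y_new)   # full predictive uncertainty, necessarily larger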
3 0.85032344 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i
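Gelman's calibration statement maps directly onto the standard computation. A schematic R version follows; the data, the stand-in posterior draws, and the test statistic T are all placeholders, not anything from Kaplan's setting.

# Schematic posterior predictive p-value: the proportion of posterior
# draws for which the replicated test statistic exceeds the observed one.
set.seed(1)
y      <- rnorm(50, 0, 1)                  # observed data (made up)
T_stat <- function(v) max(abs(v))          # an arbitrary test statistic
n_sims <- 4000
mu_draws <- rnorm(n_sims, mean(y), sd(y) / sqrt(length(y)))  # stand-in posterior
T_rep <- sapply(mu_draws, function(m) T_stat(rnorm(length(y), m, 1)))  # y_rep
mean(T_rep >= T_stat(y))  # ppp = 0.4 means a 40% chance T(y_rep) >= T(y)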
4 0.84636527 1284 andrew gelman stats-2012-04-26-Modeling probability data
Introduction: Rafael Huber writes: I conducted an experiment in which subjects were asked to estimate the probability of a certain event given a number of pieces of information (like a weather forecaster or a stock-market trader). These probability estimates are the dependent variable of my experiment. My goal is to model the data with a (hierarchical) Bayesian regression. A linear equation with all the presented information (quantified as log odds) defines the mu of a normal likelihood. The tau as precision is another free parameter.

y[r] ~ dnorm( mu[r] , tau[ subj[r] ] )
mu[r] <- b0[ subj[r] ] + b1[ subj[r] ] * x1[r] + b2[ subj[r] ] * x2[r] + b3[ subj[r] ] * x3[r]

My problem is that I do not believe that the normal is the correct probability distribution to model probability data (because the error is bounded). However, until now nobody was able to tell me how I can correctly model probability data. My reply: You can take the logit of the data before analyzing them. That is assuming there
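Gelman's suggested fix is straightforward to carry out. A minimal sketch with fabricated data, assuming all the probability estimates lie strictly between 0 and 1 (the logit is undefined at the endpoints):

# Sketch: model probability estimates on the logit scale, where a normal
# error model is defensible; the probabilities themselves are bounded in (0,1).
set.seed(1)
x     <- runif(100, -2, 2)                      # presented information (log odds)
p_hat <- plogis(0.8 * x + rnorm(100, 0, 0.5))   # subjects' probability estimates

z   <- qlogis(p_hat)   # logit transform; requires 0 < p_hat < 1
fit <- lm(z ~ x)       # ordinary normal regression is now reasonable
coef(fit)
plogis(predict(fit, newdata = data.frame(x = 1)))  # back on the probability scale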
Introduction: Lots of good statistical methods make use of two models. For example: - Classical statistics: estimates and standard errors using the likelihood function; tests and p-values using the sampling distribution. (The sampling distribution is not equivalent to the likelihood, as has been much discussed, for example in sequential stopping problems.) - Bayesian data analysis: inference using the posterior distribution; model checking using the predictive distribution (which, again, depends on the data-generating process in a way that the likelihood does not). - Machine learning: estimation using the data; evaluation using cross-validation (which requires some rule for partitioning the data, a rule that stands outside of the data themselves). - Bootstrap, jackknife, etc: estimation using an “estimator” (which, I would argue, is based in some sense on a model for the data), uncertainties using resampling (which, I would argue, is close to the idea of a “sampling distribution” in
6 0.79280293 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc
8 0.78976327 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings
9 0.78650337 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging
10 0.78618336 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model
11 0.78401637 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”
12 0.78309536 398 andrew gelman stats-2010-11-06-Quote of the day
13 0.77526653 82 andrew gelman stats-2010-06-12-UnConMax – uncertainty consideration maxims 7 ± 2
14 0.77085751 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values
15 0.75942057 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?
16 0.75632131 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series
17 0.74990368 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model
18 0.74844033 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes
19 0.74822599 1459 andrew gelman stats-2012-08-15-How I think about mixture models
20 0.74751508 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model
simIndex simValue blogId blogTitle
1 0.97231573 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
Introduction: By popular demand, here’s my R script for the time-use graphs :

# The data
a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0)
a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1)
a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4)
a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7)
a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4)
a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1)
a <- rbind(a1,a2,a3,a4,a5,a6)
avg <- colMeans (a)
avg.array <- t (array (avg, rev(dim(a))))
diff <- a - avg.array
country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")

# The line plots
par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30")
for (i in 1:6){
  plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n")
  lines (1:6, diff[i,], col="blue")
  points (1:6, diff[i,], pch=19, col="black")
  if (i>3){
    axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
    axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),
2 0.95633698 2298 andrew gelman stats-2014-04-21-On deck this week
Introduction:
Mon: Ticket to Baaaath
Tues: Ticket to Baaaaarf
Wed: Thinking of doing a list experiment? Here’s a list of reasons why you should think again
Thurs: An open site for researchers to post and share papers
Fri: Questions about “Too Good to Be True”
Sat: Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu
Sun: White stripes and dead armadillos
Introduction: A tall thin young man came to my office today to talk about one of my current pet topics: stories and social science. I brought up Tom Wolfe and his goal of compressing an entire city into a single novel, and how this reminded me of the psychologists Kahneman and Tversky’s concept of “the law of small numbers,” the idea that we expect any small sample to replicate all the properties of the larger population that it represents. Strictly speaking, the law of small numbers is impossible—any small sample necessarily has its own unique features—but this is even more true if we consider network properties. The average American knows about 700 people (depending on how you define “know”) and this defines a social network over the population. Now suppose you look at a few hundred people and all their connections. This mini-network will almost necessarily look much much sparser than the national network, as we’re removing the connections to the people not in the sample. Now consider how
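The subsampling point can be checked with a quick simulation. This is a scaled-down sketch with invented numbers (10,000 people with about 70 acquaintances each, rather than the full population with about 700): sampling a few hundred people and keeping only the ties among them makes the induced network look far sparser than the network it came from.

# Scaled-down sketch: average degree in a sampled subnetwork shrinks in
# proportion to the sampling fraction. All numbers are made up.
set.seed(1)
N <- 10000                   # population size (scaled down)
avg_degree <- 70             # scaled down from ~700 acquaintances
p <- avg_degree / (N - 1)    # Erdos-Renyi edge probability

# Sample n people and keep only the edges among them
n <- 300
adj <- matrix(rbinom(n * n, 1, p), n, n)
adj[lower.tri(adj, diag = TRUE)] <- 0
adj <- adj + t(adj)          # symmetric adjacency among the sampled nodes
mean(rowSums(adj))  # about avg_degree * n / N = 2.1, far sparser than 70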
same-blog 4 0.92504644 1401 andrew gelman stats-2012-06-30-David Hogg on statistics
Introduction: (the present post; introduction quoted in full above)
5 0.92152917 151 andrew gelman stats-2010-07-16-Wanted: Probability distributions for rank orderings
Introduction: Dietrich Stoyan writes: I asked the IMS people for an expert in statistics of voting/elections and they wrote me your name. I am a statistician, but never worked in the field voting/elections. It was my son-in-law who asked me for statistical theories in that field. He posed in particular the following problem: The aim of the voting is to come to a ranking of c candidates. Every vote is a permutation of these c candidates. The problem is to have probability distributions in the set of all permutations of c elements. Are there theories for such distributions? I should be very grateful for a fast answer with hints to literature. (I confess that I do not know your books.) My reply: Rather than trying to model the ranks directly, I’d recommend modeling a latent continuous outcome which then implies a distribution on ranks, if the ranks are of interest. There are lots of distributions of c-dimensional continuous outcomes. In political science, the usual way to start is
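Gelman's suggestion (model a latent continuous outcome and let it imply a distribution over rankings) can be sketched in a few lines of R. This is essentially a Thurstonian random-utility model; the candidate means below are made up for illustration.

# Minimal sketch of a latent-utility model for rankings: each voter gives
# candidate j a noisy latent score, and the observed ballot is the ordering
# of those scores. Candidate means are hypothetical.
set.seed(1)
c_means <- c(A = 1.0, B = 0.5, C = 0.0)
one_ballot <- function() {
  scores <- rnorm(length(c_means), mean = c_means, sd = 1)
  names(c_means)[order(scores, decreasing = TRUE)]
}
ballots <- replicate(10000, one_ballot())
# Implied distribution over the 6 possible rankings of 3 candidates
sort(table(apply(ballots, 2, paste, collapse = " > ")), decreasing = TRUE) / 10000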
6 0.91484094 1232 andrew gelman stats-2012-03-27-Banned in NYC school tests
7 0.9058122 432 andrew gelman stats-2010-11-27-Neumann update
8 0.90340447 894 andrew gelman stats-2011-09-07-Hipmunk FAIL: Graphics without content is not enough
10 0.88652122 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”
11 0.87959599 514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?
12 0.87536991 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again
13 0.86945462 62 andrew gelman stats-2010-06-01-Two Postdoc Positions Available on Bayesian Hierarchical Modeling
15 0.85945421 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)
16 0.85368371 854 andrew gelman stats-2011-08-15-A silly paper that tries to make fun of multilevel models
17 0.84772688 1857 andrew gelman stats-2013-05-15-Does quantum uncertainty have a place in everyday applied statistics?
19 0.83754468 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research
20 0.83053064 659 andrew gelman stats-2011-04-13-Jim Campbell argues that Larry Bartels’s “Unequal Democracy” findings are not robust