Difficulties with Bayesian model averaging
Andrew Gelman, June 9, 2011
In response to this article by Cosma Shalizi and myself on the philosophy of Bayesian statistics, David Hogg writes:

I [Hogg] agree–even in physics and astronomy–that the models are not “True” in the God-like sense of being absolute reality (that is, I am not a realist); and I have argued (a philosophically very naive paper, but hey, I was new to all this) that for pretty fundamental reasons we could never arrive at the True (with a capital “T”) model of the Universe. The goal of inference is to find the “best” model, where “best” might have something to do with prediction, or explanation, or message length, or (horror!) our utility. Needless to say, most of my physics friends *are* realists, even in the face of “effective theories,” as Newtonian mechanics is an effective theory of GR and GR is an effective theory of “quantum gravity” (this plays to your point, because if you think any theory is possibly an effective theory, how could you ever find Truth?). I also liked the ideas that the prior is really a testable regularization, and part of the model, and that model checking is our main work as scientists.
He continues:

. . . I am confused by [Sec.] 3, where you say that you can’t even use Bayes to average or compare the probabilities of models. I agree that you don’t think any of your models are True, but if you decide that what the scientist is trying to do is explain or encode (as in the translation between inference and signal compression), then model averaging using Bayes *will* give the best possible result. That is, it seems to me like there *is* an interpretation of what a scientist *does* that makes Bayesian averaging a good idea. I guess you can say that you don’t think that is what a scientist does, but that gets into technical assumptions about epistemology that I don’t understand. . . . It is just that you are not *done* when you have done all that; you still have to do model checking and expanding and generalizing afterwards (but even this can still be understood in terms of finding the best possible effective theory or encoding for the data). Yet another way of trying to explain my confusion is this: when you describe the convergence process in a model space that *doesn’t* contain the truth, you say that all it tries to do is match the distribution of the data. . . . Matching the distribution of the data with a simpler model?
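To fix ideas, here is a minimal sketch of what “model averaging using Bayes” computes, on a made-up pair of conjugate normal models (my toy example, not anything from Hogg’s paper or ours): posterior model probabilities from the marginal likelihoods, then a predictive density that mixes the within-model predictives with those weights.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, tau = 20, 1.0
y = rng.normal(0.3, 1.0, size=n)   # simulated data; true mean 0.3
ybar = y.mean()

# Candidate models: M1: y_i ~ N(0,1);  M2: y_i ~ N(theta,1), theta ~ N(0, tau^2).
# With the variance known, ybar is sufficient, so the marginal likelihoods
# differ only through the density of ybar: N(0, 1/n) under M1 and
# N(0, tau^2 + 1/n) under M2.
log_m = np.array([
    stats.norm.logpdf(ybar, 0.0, np.sqrt(1.0 / n)),
    stats.norm.logpdf(ybar, 0.0, np.sqrt(tau**2 + 1.0 / n)),
])
w = np.exp(log_m - log_m.max())
w /= w.sum()                  # posterior model probabilities (equal prior weights)

v = 1.0 / (n + 1.0 / tau**2)  # posterior variance of theta under M2
mu = v * n * ybar             # posterior mean of theta under M2

def bma_predictive(y_new):
    """BMA predictive density: the w-weighted mixture of model predictives."""
    return (w[0] * stats.norm.pdf(y_new, 0.0, 1.0)
            + w[1] * stats.norm.pdf(y_new, mu, np.sqrt(v + 1.0)))

print("P(M1|y), P(M2|y):", np.round(w, 3))
print("BMA predictive density at y_new = 0.5:", round(float(bma_predictive(0.5)), 3))
```

On Hogg’s reading, if the data really did come from one of the candidate models, priors included, this mixture is the predictive distribution that maximizes expected log score; that is the sense in which Bayes averaging “will give the best possible result.”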
My reply: Bayesian model averaging could work, and in some situations it does work, but it won’t necessarily work. The problem arises with the models being averaged: in particular, the posterior probabilities of the individual models depend crucially on untestable aspects of the prior distributions.
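Here is the same toy setup used to show the problem (again just an illustration; this is Bartlett’s paradox in its simplest form). Vary the prior scale tau on the extra parameter in M2, something the data within M2 can never pin down once tau is large, and the posterior probability of M2 swings from dominant to negligible, while the within-model posterior for theta hardly moves:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20
y = rng.normal(0.5, 1.0, size=n)   # data centered well away from zero
ybar = y.mean()

# M1: y_i ~ N(0,1).  M2: y_i ~ N(theta,1), theta ~ N(0, tau^2).
# The Bayes factor reduces to comparing densities of the sufficient statistic ybar.
log_m1 = stats.norm.logpdf(ybar, 0.0, np.sqrt(1.0 / n))
for tau in [0.5, 1.0, 5.0, 50.0, 500.0]:
    log_m2 = stats.norm.logpdf(ybar, 0.0, np.sqrt(tau**2 + 1.0 / n))
    p_m2 = 1.0 / (1.0 + np.exp(log_m1 - log_m2))  # equal prior model weights
    v = 1.0 / (n + 1.0 / tau**2)                  # posterior variance of theta under M2
    print(f"tau={tau:6.1f}  P(M2|y)={p_m2:.3f}  "
          f"theta | y, M2 ~ N({v * n * ybar:.3f}, sd={np.sqrt(v):.3f})")
```

The flatter the prior on theta, the more the marginal likelihood penalizes M2, so the averaging ends up driven by exactly the part of the prior that no amount of data within M2 can check.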
This is not to say that posterior model averaging is necessarily useless, merely that if you want to do it, I think you need to think seriously about the different pieces of the super-model that you’re estimating. At this point I’d prefer continuous model expansion rather than discrete model averaging.
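For contrast, here is a sketch of one form of continuous expansion on the same toy problem (the half-normal prior on tau is my assumption, purely for illustration): instead of choosing or averaging between theta = 0 and a fixed-tau prior, promote tau to a continuous hyperparameter. M1 then sits at the tau → 0 corner of one bigger model rather than standing as a separate discrete hypothesis, and the data can pull tau smoothly toward wherever it fits.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 20
y = rng.normal(0.3, 1.0, size=n)
ybar = y.mean()

# Super-model: y_i ~ N(theta,1), theta ~ N(0, tau^2), tau ~ half-N(0,1).
# Marginalizing theta gives ybar | tau ~ N(0, tau^2 + 1/n), so the posterior
# for tau can be evaluated on a grid.
taus = np.linspace(1e-3, 5.0, 500)
log_post = (stats.halfnorm.logpdf(taus, scale=1.0)                       # prior on tau
            + stats.norm.logpdf(ybar, 0.0, np.sqrt(taus**2 + 1.0 / n)))  # p(ybar | tau)
w = np.exp(log_post - log_post.max())
w /= w.sum()                                   # grid approximation to p(tau | y)

print("posterior mean of tau:", round(float((taus * w).sum()), 3))
print("P(tau < 0.1 | y):", round(float(w[taus < 0.1].sum()), 3))
```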
Hogg also asks: I would also be interested in your opinion about leave-one-out cross-validation; my engineer/CS friends love it.
My reply: To me, cross-validation is tied into predictive model checking in that ideas such as “leave one out” are fundamentally related to data collection. Xval is like model checking in that the data come in through the sampling distribution, not just the likelihood.
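For concreteness, here is what leave-one-out computes on the toy conjugate model from above (a sketch, not a recommendation of any particular implementation): hold out each observation in turn, refit, and score the held-out point under the resulting posterior predictive distribution. The held-out point plays the role of a fresh draw from the sampling distribution, which is the sense in which xval is tied to data collection.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, tau = 20, 1.0
y = rng.normal(0.3, 1.0, size=n)

# Leave-one-out: for each i, refit y_j ~ N(theta,1), theta ~ N(0, tau^2)
# to the data with y[i] removed, then score y[i] under the posterior predictive.
loo_scores = []
for i in range(n):
    y_rest = np.delete(y, i)
    m = len(y_rest)
    v = 1.0 / (m + 1.0 / tau**2)      # posterior variance of theta
    mu = v * m * y_rest.mean()        # posterior mean of theta
    loo_scores.append(stats.norm.logpdf(y[i], mu, np.sqrt(v + 1.0)))

print("LOO mean log predictive density:", round(float(np.mean(loo_scores)), 3))
```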