
1392 andrew gelman stats-2012-06-26-Occam


meta information for this blog

Source: html

Introduction: Cosma Shalizi and Larry Wasserman discuss some papers from a conference on Ockham’s Razor. I don’t have anything new to add on this so let me link to past blog entries on the topic and repost the following from 2004: A lot has been written in statistics about “parsimony”—that is, the desire to explain phenomena using fewer parameters–but I’ve never seen any good general justification for parsimony. (I don’t count “Occam’s Razor,” or “Ockham’s Razor,” or whatever, as a justification. You gotta do better than digging up a 700-year-old quote.) Maybe it’s because I work in social science, but my feeling is: if you can approximate reality with just a few parameters, fine. If you can use more parameters to fold in more information, that’s even better. In practice, I often use simple models—because they are less effort to fit and, especially, to understand. But I don’t kid myself that they’re better than more complicated efforts! My favorite quote on this comes from Rad


Summary: the most important sentences generated by the tfidf model

sentIndex sentText [sentNum, sentScore]

1 You gotta do better than digging up a 700-year-old quote. [sent-4, score-0.199]

2 In practice, I often use simple models—because they are less effort to fit and, especially, to understand. [sent-7, score-0.265]

3 103-104: Sometimes a simple model will outperform a more complex model. [sent-10, score-0.678]

4 Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. [sent-13, score-0.368]

5 Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well. [sent-14, score-1.282]

6 Simpler models are easier to understand, and that counts for a lot. [sent-17, score-0.236]

7 I start with simple models and then work from there. [sent-18, score-0.349]

8 I’m interested in the so-called network of models, the idea that we can and should routinely fit multiple models, not for the purpose of model choice or even model averaging, but so as to better understand how we are fitting the data. [sent-19, score-0.831]

9 Part of my attitude might come from my social-science experience: we often hear people saying, “Your model is fine but it should also include variables X, Y, and Z.” [sent-21, score-0.292]

10 I never hear people complaining and saying that my model would be better if it did not include some factor or another. [sent-22, score-0.407]

11 In many practical settings there can be a problem when a model contains too many variables or too much complexity. [sent-23, score-0.35]

12 So I think that, in some settings, Occam’s Razor is an alternative (and, to me, not the most desirable alternative) to using a more sophisticated estimation procedure. [sent-27, score-0.239]

13 The Occam applications I don’t like are the discrete versions such as those advocated by Adrian Raftery and others, in which some version of Bayesian calculation is used to get results saying that the posterior probability is 60%, say, that a certain coefficient in a model is exactly zero. [sent-28, score-0.281]

14 I’d rather keep the term in the model and just shrink it continuously toward zero. [sent-29, score-0.215]
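
To make that concrete, here is a minimal Python sketch of the continuous alternative (scikit-learn assumed, data simulated for illustration): a ridge penalty pulls a weak coefficient smoothly toward zero as the penalty grows, rather than declaring it exactly zero the way a discrete Occam-style selection would.

# Sketch: continuous shrinkage toward zero, in contrast to a discrete
# in-or-out model choice. Simulated data; scikit-learn assumed.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
# The third predictor has only a weak effect.
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=n)

for alpha in [0.01, 1.0, 10.0, 100.0, 1000.0]:
    fit = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:7.2f}  coef3={fit.coef_[2]: .4f}")
# The weak coefficient shrinks smoothly toward zero but never snaps
# to exactly zero.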

15 I find that in practice we can often get a nice interpretable picture if we do not ask for perfect smoothness (lowest variance), but as we allow for less and less smoothness the picture becomes hard to understand. [sent-41, score-0.701]
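
As a rough illustration of that tradeoff, the following Python sketch (scipy assumed, data simulated) fits smoothing splines with progressively less smoothing; the wigglier fits chase the data more closely but make for a harder picture to read.

# Sketch: varying smoothness. Heavier smoothing gives a simple,
# interpretable curve; lighter smoothing tracks the noise.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

for s in [50.0, 10.0, 1.0]:  # smaller s means less smoothing
    spline = UnivariateSpline(x, y, s=s)
    rss = float(np.sum((spline(x) - y) ** 2))
    print(f"s={s:5.1f}  knots={len(spline.get_knots()):3d}  in-sample RSS={rss:.2f}")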

16 For some types of models, we typically find that simpler models yield better out-of-sample forecasts than more complex ones. [sent-45, score-0.673]

17 I refer in particular to the choice of lag length in ARMA models (okay, not all that exciting) and to Lutkepohl’s work showing that the use of criteria like the BIC, which penalize complexity more strenuously, leads to better out-of-sample forecasts. [sent-46, score-0.692]
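
As a hedged sketch of that kind of BIC-based lag selection (statsmodels assumed, series simulated from an AR(1); this is not Lutkepohl’s own procedure), one can fit a grid of ARMA orders and keep the one with the lowest BIC.

# Sketch: choosing ARMA lag length by BIC. The true process is AR(1),
# so the complexity penalty should steer us toward a small model.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.6 * y[t - 1] + rng.normal()

best = None
for p in range(4):
    for q in range(4):
        fit = ARIMA(y, order=(p, 0, q)).fit()
        if best is None or fit.bic < best[0]:
            best = (fit.bic, p, q)
print(f"BIC-chosen order: p={best[1]}, q={best[2]} (BIC={best[0]:.1f})")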

18 A focus on cross-validation often leads to the choice of simpler models (though of course the data could suggest a more complicated model is superior). [sent-48, score-0.944]
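
Here is one hedged Python sketch of that pattern (scikit-learn assumed, data simulated from a linear truth): cross-validating over polynomial degree typically favors a low degree, though nothing in the procedure forbids a complicated winner when the data call for one.

# Sketch: cross-validation over model complexity. With noisy data from
# a low-degree truth, CV typically selects a simple model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=80).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + rng.normal(size=80)

for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, x, y, cv=5).mean()  # mean R^2 over folds
    print(f"degree={degree}  CV R^2 = {score: .3f}")
# Higher degrees fit each training fold better but cross-validate worse.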

19 This is what Radford Neal does for his Bayesian neural nets (lots of neurons; as the number of neurons gets large, the prior pulling them toward zero gets stronger). [sent-53, score-0.259]
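
A minimal numerical sketch of Neal’s scaling (numpy only; the 1/sqrt(H) prior scale and the tiny network are illustrative assumptions consistent with his published result) shows the individual weights being pulled ever closer to zero while the prior on the network’s output stays stable as neurons are added.

# Sketch of Radford Neal's scaling for Bayesian neural nets: give the
# hidden-to-output weights a prior sd proportional to 1/sqrt(H), so the
# per-weight prior tightens as the number of hidden units H grows while
# the implied prior on the network output stays stable.
import numpy as np

rng = np.random.default_rng(4)
x = np.array([0.5, -1.0])  # a fixed input point

for H in [10, 100, 1000, 10000]:
    draws = []
    for _ in range(500):  # sample networks from the prior
        W = rng.normal(size=(H, 2))                      # input-to-hidden weights
        v = rng.normal(scale=1.0 / np.sqrt(H), size=H)   # shrinking prior
        draws.append(v @ np.tanh(W @ x))
    print(f"H={H:6d}  prior sd of output: {np.std(draws):.3f}")
# Each weight is shrunk ever harder toward zero, yet the function prior
# converges (to a Gaussian process) rather than degenerating.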

20 As we discuss further in the comments below, if you’re doing least squares (for example, in fitting ARMA models), you need to penalize those big models, but this is not such a concern if you’re regularizing. [sent-55, score-0.241]
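
To make the contrast concrete, here is a small sketch (scikit-learn assumed, data simulated with only two real predictors): unpenalized least squares needs an explicit complexity penalty because its in-sample fit always improves as terms are added, while a regularized fit copes with the extra terms on its own.

# Sketch: unpenalized least squares vs regularization as mostly
# irrelevant predictors are added. Held-out error shows why big OLS
# models need a penalty, while ridge absorbs the extra terms.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n, p = 120, 60
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + rng.normal(size=n)  # only 2 real predictors

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for k in [2, 20, 60]:  # number of predictors actually used
    ols = LinearRegression().fit(X_tr[:, :k], y_tr)
    ridge = RidgeCV(alphas=np.logspace(-2, 3, 20)).fit(X_tr[:, :k], y_tr)
    print(f"k={k:2d}  OLS test R^2={ols.score(X_te[:, :k], y_te): .3f}"
          f"  ridge test R^2={ridge.score(X_te[:, :k], y_te): .3f}")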


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('models', 0.236), ('neal', 0.221), ('parsimony', 0.218), ('model', 0.215), ('razor', 0.206), ('occam', 0.197), ('simpler', 0.176), ('ockham', 0.16), ('neurons', 0.145), ('arma', 0.145), ('complex', 0.135), ('smoothness', 0.131), ('better', 0.126), ('parameters', 0.117), ('neural', 0.114), ('radford', 0.114), ('simple', 0.113), ('purpose', 0.113), ('desirable', 0.104), ('choice', 0.094), ('squares', 0.09), ('complexity', 0.09), ('discuss', 0.083), ('whatever', 0.078), ('often', 0.077), ('bayesian', 0.075), ('less', 0.075), ('leads', 0.073), ('complicated', 0.073), ('strenuously', 0.073), ('digging', 0.073), ('settings', 0.072), ('picture', 0.072), ('predictions', 0.07), ('prediction', 0.07), ('heavier', 0.069), ('estimation', 0.068), ('nice', 0.068), ('fitting', 0.068), ('alternative', 0.067), ('saying', 0.066), ('microeconomics', 0.066), ('lowering', 0.066), ('regularizing', 0.066), ('milton', 0.066), ('beck', 0.066), ('quote', 0.064), ('problem', 0.063), ('bic', 0.063), ('realism', 0.061)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 1392 andrew gelman stats-2012-06-26-Occam


2 0.39149788 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

Introduction: In my comments on David MacKay’s 2003 book on Bayesian inference, I wrote that I hate all the Occam-factor stuff that MacKay talks about, and I linked to this quote from Radford Neal: Sometimes a simple model will outperform a more complex model . . . Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well. MacKay replied as follows: When you said you disagree with me on Occam factors I think what you meant was that you agree with me on them. I’ve read your post on the topic and completely agreed with you (and Radford) that we should be using models the size of a house, models that we believe in, and that anyone who thinks it is a good idea to bias the model toward

3 0.2959238 2133 andrew gelman stats-2013-12-13-Flexibility is good

Introduction: If I made a separate post for each interesting blog discussion, we’d get overwhelmed. That’s why I often leave detailed responses in the comments section, even though I’m pretty sure that most readers don’t look in the comments at all. Sometimes, though, I think it’s good to bring such discussions to light. Here’s a recent example. Michael wrote : Poor predictive performance usually indicates that the model isn’t sufficiently flexible to explain the data, and my understanding of the proper Bayesian strategy is to feed that back into your original model and try again until you achieve better performance. Corey replied : It was my impression that — in ML at least — poor predictive performance is more often due to the model being too flexible and fitting noise. And Rahul agreed : Good point. A very flexible model will describe your training data perfectly and then go bonkers when unleashed on wild data. But I wrote : Overfitting comes from a model being flex

4 0.20537472 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

Introduction: In response to my remarks on his online book, Think Bayes, Allen Downey wrote: I [Downey] have a question about one of your comments: My [Gelman's] main criticism with both books is that they talk a lot about inference but not so much about model building or model checking (recall the three steps of Bayesian data analysis). I think it’s ok for an introductory book to focus on inference, which of course is central to the data-analytic process—but I’d like them to at least mention that Bayesian ideas arise in model building and model checking as well. This sounds like something I agree with, and one of the things I tried to do in the book is to put modeling decisions front and center. But the word “modeling” is used in lots of ways, so I want to see if we are talking about the same thing. For example, in many chapters, I start with a simple model of the scenario, do some analysis, then check whether the model is good enough, and iterate. Here’s the discussion of modeling

5 0.20141089 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

Introduction: In response to this article by Cosma Shalizi and myself on the philosophy of Bayesian statistics, David Hogg writes: I [Hogg] agree–even in physics and astronomy–that the models are not “True” in the God-like sense of being absolute reality (that is, I am not a realist); and I have argued (a philosophically very naive paper, but hey, I was new to all this) that for pretty fundamental reasons we could never arrive at the True (with a capital “T”) model of the Universe. The goal of inference is to find the “best” model, where “best” might have something to do with prediction, or explanation, or message length, or (horror!) our utility. Needless to say, most of my physics friends *are* realists, even in the face of “effective theories” as Newtonian mechanics is an effective theory of GR and GR is an effective theory of “quantum gravity” (this plays to your point, because if you think any theory is possibly an effective theory, how could you ever find Truth?). I also liked the i

6 0.19445369 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

7 0.19010767 1586 andrew gelman stats-2012-11-21-Readings for a two-week segment on Bayesian modeling?

8 0.18056561 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

9 0.17493129 1431 andrew gelman stats-2012-07-27-Overfitting

10 0.17227271 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

11 0.17006879 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

12 0.16386898 1459 andrew gelman stats-2012-08-15-How I think about mixture models

13 0.15805762 811 andrew gelman stats-2011-07-20-Kind of Bayesian

14 0.15622824 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

15 0.15520748 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

16 0.1547253 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

17 0.15370676 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

18 0.15339878 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

19 0.15281923 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

20 0.1503059 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.291), (1, 0.22), (2, -0.012), (3, 0.086), (4, -0.017), (5, 0.016), (6, 0.015), (7, -0.035), (8, 0.124), (9, 0.073), (10, 0.042), (11, 0.042), (12, -0.056), (13, -0.011), (14, -0.082), (15, -0.01), (16, 0.038), (17, -0.022), (18, -0.026), (19, -0.006), (20, 0.0), (21, -0.07), (22, -0.049), (23, -0.035), (24, -0.044), (25, -0.02), (26, -0.012), (27, 0.011), (28, 0.002), (29, -0.021), (30, -0.071), (31, -0.02), (32, 0.009), (33, -0.022), (34, 0.019), (35, -0.004), (36, 0.01), (37, -0.034), (38, 0.024), (39, 0.002), (40, -0.01), (41, -0.036), (42, 0.019), (43, 0.058), (44, 0.022), (45, 0.007), (46, -0.016), (47, 0.005), (48, 0.024), (49, -0.006)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98868972 1392 andrew gelman stats-2012-06-26-Occam


2 0.94356239 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging


3 0.94049269 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor


4 0.91841727 1459 andrew gelman stats-2012-08-15-How I think about mixture models

Introduction: Larry Wasserman refers to finite mixture models as “beasts” and jokes that they “should be avoided at all costs.” I’ve thought a lot about mixture models, ever since using them in an analysis of voting patterns that was published in 1990. First off, I’d like to say that our model was useful so I’d prefer not to pay the cost of avoiding it. For a quick description of our mixture model and its context, see pp. 379-380 of my article in the Jim Berger volume. Actually, our case was particularly difficult because we were not even fitting a mixture model to data, we were fitting it to latent data and using the model to perform partial pooling. My difficulties in trying to fit this model inspired our discussion of mixture models in Bayesian Data Analysis (page 109 in the second edition, in the section on “Counterexamples to the theorems”). I agree with Larry that if you’re fitting a mixture model, it’s good to be aware of the problems that arise if you try to estimate

5 0.91140836 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

Introduction: Nick Firoozye writes: I had a question about BMA [Bayesian model averaging] and model combinations in general, and direct it to you since they are a basic form of hierarchical model, albeit in the simplest of forms. I wanted to ask what the underlying assumptions are that could lead to BMA improving on a larger model. I know model combination is a topic of interest in the (frequentist) econometrics community (e.g., Bates & Granger, http://www.jstor.org/discover/10.2307/3008764?uid=3738032&uid=2&uid=4&sid=21101948653381) but at the time it was considered a bit of a puzzle. Perhaps small models combined outperform a big model due to standard errors, insufficient data, etc. But I haven’t seen much in way of Bayesian justification. In simplest terms, you might have a joint density P(Y,theta_1,theta_2) from which you could use the two marginals P(Y,theta_1) and P(Y,theta_2) to derive two separate forecasts. A BMA-er would do a weighted average of the two forecast densities, having p

6 0.90868914 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

7 0.90374517 1431 andrew gelman stats-2012-07-27-Overfitting

8 0.88992047 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

9 0.88916737 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

10 0.88805223 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings

11 0.87829065 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

12 0.878263 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

13 0.87586951 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

14 0.87507051 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

15 0.87432259 614 andrew gelman stats-2011-03-15-Induction within a model, deductive inference for model evaluation

16 0.87043792 1510 andrew gelman stats-2012-09-25-Incoherence of Bayesian data analysis

17 0.86551952 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency

18 0.86532605 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables

19 0.86490214 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression

20 0.86453527 2136 andrew gelman stats-2013-12-16-Whither the “bet on sparsity principle” in a nonsparse world?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.014), (15, 0.031), (16, 0.085), (21, 0.036), (24, 0.189), (29, 0.136), (53, 0.011), (55, 0.013), (79, 0.018), (84, 0.01), (86, 0.034), (95, 0.015), (99, 0.278)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97685903 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

Introduction: Mark Blumenthal writes: What do you think about the “random rejection” method used by PPP that was attacked at some length today by a Republican pollster. Our just published post on the debate includes all the details as I know them. The Storify of Martino’s tweets has some additional data tables linked to toward the end. Also, more specifically, setting aside Martino’s suggestion of manipulation (which is also quite possible with post-stratification weights), would the PPP method introduce more potential random error than weighting? From Blumenthal’s blog: B.J. Martino, a senior vice president at the Republican polling firm The Tarrance Group, went on a 30-minute Twitter rant on Tuesday questioning the unorthodox method used by PPP [Public Policy Polling] to select samples and weight data: “Looking at @ppppolls new VA SW. Wondering how many interviews they discarded to get down to 601 completes? Because @ppppolls discards a LOT of interviews. Of 64,811 conducted

2 0.95993394 1687 andrew gelman stats-2013-01-21-Workshop on science communication for graduate students

Introduction: Nathan Sanders writes: Applications are now open for the Communicating Science 2013 workshop (http://workshop.astrobites.com/), to be held in Cambridge, MA on June 13-15th, 2013. Graduate students at US institutions in all fields of science and engineering are encouraged to apply – funding is available for travel expenses and accommodations. The application can be found here: http://workshop.astrobites.org/application Participants will build the communication skills that technical professionals need to express complex ideas to their peers, experts in other fields, and the general public. There will be panel discussions on the following topics: * Engaging Non-Scientific Audiences * Science Writing for a Cause * Communicating Science Through Fiction * Sharing Science with Scientists * The World of Non-Academic Publishing * Communicating using Multimedia and the Web In addition to these discussions, ample time is allotted for interacting with the experts and with att

3 0.95661628 2133 andrew gelman stats-2013-12-13-Flexibility is good


4 0.95628071 2051 andrew gelman stats-2013-10-04-Scientific communication that accords you “the basic human dignity of allowing you to draw your own conclusions”

Introduction: Amanda Martinez, a writer for The Atlantic and others, advised attendees that her favorite writing “accorded me the basic human dignity of allowing me to draw my own conclusions.” I really like that way of putting it, and this is something we tried hard to do with Red State Blue State, to put the information and our reasoning right there in front of the reader, rather than hiding behind a bunch of statistically-significant regression coefficients. This is related to the idea of presenting research findings quantitatively (which, I think, lends itself to clearer statements of uncertainty and variation) rather than qualitatively (which seems to come out more deterministically, as “X causes Y” or “when A happens, B happens”). The above quote comes from a conference of students organized by Nathan Sanders, who writes: Thanks so much for posting an announcement about the Communicating Science workshop (ComSciCon) back in January! With the help of your blog, we received more than

5 0.95439798 651 andrew gelman stats-2011-04-06-My talk at Northwestern University tomorrow (Thursday)

Introduction: Of Beauty, Sex, and Power: Statistical Challenges in Estimating Small Effects. At the Institute of Policy Research, Thurs 7 Apr 2011, 3.30pm. Regular blog readers know all about this topic. (Here are the slides.) But, rest assured, I don’t just mock. I also offer constructive suggestions. My last talk at Northwestern was fifteen years ago. Actually, I gave two lectures then, in the process of being turned down for a job enjoying their chilly Midwestern hospitality. P.S. I searched on the web and also found this announcement which gives the wrong title.

same-blog 6 0.95392364 1392 andrew gelman stats-2012-06-26-Occam

7 0.95142376 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?

8 0.95128709 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

9 0.9494462 1491 andrew gelman stats-2012-09-10-Update on Levitt paper on child car seats

10 0.94906563 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

11 0.94546252 1539 andrew gelman stats-2012-10-18-IRB nightmares

12 0.94308388 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

13 0.93910915 1034 andrew gelman stats-2011-11-29-World Class Speakers and Entertainers

14 0.93710673 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

15 0.93544739 2057 andrew gelman stats-2013-10-10-Chris Chabris is irritated by Malcolm Gladwell

16 0.93306684 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves

17 0.92878813 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

18 0.92805463 2118 andrew gelman stats-2013-11-30-???

19 0.92473125 868 andrew gelman stats-2011-08-24-Blogs vs. real journalism

20 0.92409384 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards