andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-774 knowledge-graph by maker-knowledge-mining

774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing


meta info for this blog

Source: html

Introduction: Lots of good statistical methods make use of two models. For example: - Classical statistics: estimates and standard errors using the likelihood function; tests and p-values using the sampling distribution. (The sampling distribution is not equivalent to the likelihood, as has been much discussed, for example in sequential stopping problems.) - Bayesian data analysis: inference using the posterior distribution; model checking using the predictive distribution (which, again, depends on the data-generating process in a way that the likelihood does not). - Machine learning: estimation using the data; evaluation using cross-validation (which requires some rule for partitioning the data, a rule that stands outside of the data themselves). - Bootstrap, jackknife, etc: estimation using an “estimator” (which, I would argue, is based in some sense on a model for the data), uncertainties using resampling (which, I would argue, is close to the idea of a “sampling distribution” in
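
As a concrete illustration of the classical-statistics example in the introduction, here is a minimal simulation sketch (illustrative, not from the post; all names and settings are assumptions). Two analysts who end up with the same data have the same likelihood, but if one of them used a “stop when significant” rule, the sampling distribution of the test statistic is different and the naive fixed-n p-value is no longer calibrated.

```python
# A minimal simulation sketch (not from the post): the same observed data give the
# same likelihood no matter how we decided to stop, but a "stop when significant"
# rule changes the sampling distribution of the test, so the naive p-value is no
# longer calibrated. All names and settings here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def optional_stopping_trial(n_min=10, n_max=200, batch=10, z_crit=1.96):
    """Simulate one experiment under the null (theta = 0) with an optional-stopping rule:
    peek after every batch and stop as soon as |z| exceeds the nominal 1.96 cutoff."""
    y = rng.normal(0.0, 1.0, size=n_min)
    while True:
        z = y.mean() * np.sqrt(len(y))          # z-statistic with known sigma = 1
        if abs(z) > z_crit or len(y) >= n_max:
            return z, len(y)
        y = np.concatenate([y, rng.normal(0.0, 1.0, size=batch)])

# "Significance" rate under the null: the fixed-n test would give 5%,
# but the stopping rule inflates it well above that.
results = [optional_stopping_trial() for _ in range(2000)]
reject_rate = np.mean([abs(z) > 1.96 for z, n in results])
print(f"nominal 5% test rejects in {reject_rate:.1%} of null experiments")

# The likelihood, by contrast, ignores the stopping rule: for the data actually
# observed, it is the same normal log likelihood whether n was fixed in advance or not.
def log_likelihood(theta, y):
    return -0.5 * np.sum((y - theta) ** 2)      # sigma = 1, dropping constants
```

The point of the sketch is only that the stopping rule enters the sampling distribution but never the likelihood function for the data actually observed.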


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Lots of good statistical methods make use of two models. [sent-1, score-0.167]

2 For example: - Classical statistics: estimates and standard errors using the likelihood function; tests and p-values using the sampling distribution. [sent-2, score-0.875]

3 (The sampling distribution is not equivalent to the likelihood, as has been much discussed, for example in sequential stopping problems. [sent-3, score-0.846]

4 ) - Bayesian data analysis: inference using the posterior distribution; model checking using the predictive distribution (which, again, depends on the data-generating process in a way that the likelihood does not). [sent-4, score-1.218]

5 - Machine learning: estimation using the data; evaluation using cross-validation (which requires some rule for partitioning the data, a rule that stands outside of the data themselves). [sent-5, score-0.982]
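
A quick sketch of the machine-learning example (illustrative assumptions only, not the post's code): the fold assignment in cross-validation is a rule the analyst supplies, not something contained in the data, and changing it (random folds, grouped folds, time-ordered folds) changes the evaluation while the estimation step stays the same.

```python
# A small sketch (illustrative): cross-validation requires a partitioning rule that
# the data themselves do not supply. The random k-fold rule below is one possible
# choice; grouped or time-ordered folds would be a different evaluation model.
import numpy as np

def kfold_indices(n, k=5, rng=None):
    """One possible partitioning rule: random assignment of observations to k folds."""
    rng = rng or np.random.default_rng(0)
    return np.array_split(rng.permutation(n), k)

def cross_validate(x, y, fit, predict, folds):
    """Generic CV loop: the evaluation model is fixed once the folds are chosen."""
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        params = fit(x[train_idx], y[train_idx])
        errors.append(np.mean((y[test_idx] - predict(params, x[test_idx])) ** 2))
    return np.mean(errors)

# Example with least squares as the estimation model:
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)
fit = lambda x, y: np.polyfit(x, y, deg=1)
predict = lambda p, x: np.polyval(p, x)
print(cross_validate(x, y, fit, predict, kfold_indices(200)))
```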

6 This commonality across these very different statistical procedures suggests to me that thinking on parallel tracks is an important and fundamental property of statistics. [sent-7, score-0.317]

7 Perhaps, rather than trying to systematize all statistical learning into a single inferential framework (whether it be Neyman-Pearson hypothesis testing, Bayesian inference over graphical models, or some purely predictive behavioristic approach), we would be better off embracing our twoishness. [sent-8, score-0.623]

8 This relates to my philosophizing with Shalizi on falsification, Popper, Kuhn, and statistics as normal and revolutionary science. [sent-9, score-0.272]

9 Twoishness also has relevance to statistical practice in focusing one’s attention on both parts of the model. [sent-10, score-0.075]

10 To see this, step back for a moment and consider the transition from optimization problems such as “least squares” to model-based inference such as “maximum likelihood under the normal distribution.” [sent-11, score-0.61]

11 Moving from the procedure to the model was a step forward in that models can be understood, checked, and generalized, in a way that is more difficult with mere procedures. [sent-12, score-0.516]

12 Or maybe I will take a slightly more cautious and thus defensible position and say that, if the goal is to understand, check, and generalize a learning algorithm (such as least squares), it can help to understand its expression as model-based inference. [sent-13, score-0.297]
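
The least-squares-to-maximum-likelihood step referred to above can be written out explicitly (a standard derivation, included here for reference):

```latex
% Least squares as maximum likelihood under the normal model (standard derivation).
% With y_i = x_i^T beta + eps_i, eps_i ~ N(0, sigma^2) independent:
\log p(y \mid \beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2 .
% For fixed sigma, maximizing over beta is exactly minimizing the residual
% sum of squares, so
\hat{\beta}_{\text{MLE}}
  = \arg\min_{\beta}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2
  = \hat{\beta}_{\text{LS}} .
```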

13 Once we recognize, for example, that bootstrap inference has two models (the implicit data model underlying the estimator, and the sampling model for the bootstrapping), we can ask questions such as: - Are the two models coherent? [sent-15, score-1.738]

14 Can we learn anything from the data model that will help with the sampling model, and vice-versa? [sent-16, score-0.722]

15 This is often treated as automatic or as somewhat of a technical problem (for example, how do you bootstrap time series data), but ultimately, as with any sampling problem, it should depend on the problem context. [sent-18, score-0.795]

16 Recognizing the bootstrapping step as a model (rather than simply a computational trick), the user is on the way to choosing the model rather than automatically taking the default. [sent-19, score-0.693]
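
A small sketch of the two bootstrap models being discussed (illustrative assumptions; the block scheme below is one common way to bootstrap time series, not a recommendation from the post): the estimator carries the implicit data model, the resampling function is the sampling model, and swapping one resampling scheme for another changes the answer while the estimator stays the same.

```python
# A sketch (illustrative assumptions throughout): the bootstrap involves two models.
# The estimator below encodes an implicit data model; the resampling scheme is the
# sampling model, and it is a choice rather than a fixed computational trick.
import numpy as np

rng = np.random.default_rng(0)

def estimator(y):
    """The 'data model' side: here simply the sample mean."""
    return y.mean()

def iid_resample(y):
    """Sampling model 1: observations are exchangeable, resample them independently."""
    return rng.choice(y, size=len(y), replace=True)

def block_resample(y, block=10):
    """Sampling model 2: a moving-block scheme, one common choice when the data are
    a time series and nearby observations are dependent."""
    n = len(y)
    starts = rng.integers(0, n - block + 1, size=int(np.ceil(n / block)))
    return np.concatenate([y[s:s + block] for s in starts])[:n]

def bootstrap_se(y, resample, reps=2000):
    return np.std([estimator(resample(y)) for _ in range(reps)])

# An autocorrelated series: the two sampling models give noticeably different
# uncertainty estimates for the very same estimator.
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.8 * y[t - 1] + rng.normal()
print("iid resampling SE:  ", bootstrap_se(y, iid_resample))
print("block resampling SE:", bootstrap_se(y, block_resample))
```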

17 There are aspects of sampling distributions (for example, sequential design) that don’t arise in the data at hand, and there are aspects of inference (for example, regularization) that don’t come from the sampling distribution. [sent-22, score-1.499]

18 So it makes sense to me that two models are needed. [sent-23, score-0.214]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sampling', 0.379), ('bootstrap', 0.222), ('twoishness', 0.214), ('model', 0.211), ('likelihood', 0.172), ('sequential', 0.165), ('using', 0.162), ('inference', 0.155), ('bootstrapping', 0.153), ('estimator', 0.138), ('distribution', 0.135), ('data', 0.132), ('learning', 0.124), ('models', 0.122), ('squares', 0.121), ('step', 0.118), ('arise', 0.099), ('commonality', 0.097), ('jackknife', 0.097), ('philosophizing', 0.097), ('resampling', 0.097), ('example', 0.097), ('aspects', 0.095), ('requires', 0.094), ('partitioning', 0.092), ('systematize', 0.092), ('estimation', 0.092), ('two', 0.092), ('rule', 0.091), ('normal', 0.09), ('predictive', 0.089), ('defensible', 0.088), ('embracing', 0.088), ('argue', 0.086), ('revolutionary', 0.085), ('cautious', 0.085), ('tracks', 0.077), ('kuhn', 0.077), ('transition', 0.075), ('statistical', 0.075), ('uncertainties', 0.074), ('falsification', 0.071), ('stopping', 0.07), ('property', 0.068), ('stands', 0.066), ('automatic', 0.066), ('popper', 0.065), ('mere', 0.065), ('regularization', 0.065), ('problem', 0.064)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

Introduction: Lots of good statistical methods make use of two models. For example: - Classical statistics: estimates and standard errors using the likelihood function; tests and p-values using the sampling distribution. (The sampling distribution is not equivalent to the likelihood, as has been much discussed, for example in sequential stopping problems.) - Bayesian data analysis: inference using the posterior distribution; model checking using the predictive distribution (which, again, depends on the data-generating process in a way that the likelihood does not). - Machine learning: estimation using the data; evaluation using cross-validation (which requires some rule for partitioning the data, a rule that stands outside of the data themselves). - Bootstrap, jackknife, etc: estimation using an “estimator” (which, I would argue, is based in some sense on a model for the data), uncertainties using resampling (which, I would argue, is close to the idea of a “sampling distribution” in

2 0.24737117 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

Introduction: I’ve been writing a lot about my philosophy of Bayesian statistics and how it fits into Popper’s ideas about falsification and Kuhn’s ideas about scientific revolutions. Here’s my long, somewhat technical paper with Cosma Shalizi. Here’s our shorter overview for the volume on the philosophy of social science. Here’s my latest try (for an online symposium), focusing on the key issues. I’m pretty happy with my approach–the familiar idea that Bayesian data analysis iterates the three steps of model building, inference, and model checking–but it does have some unresolved (maybe unresolvable) problems. Here are a couple mentioned in the third of the above links. Consider a simple model with independent data y_1, y_2, .., y_10 ~ N(θ,σ^2), with a prior distribution θ ~ N(0,10^2) and σ known and taking on some value of approximately 10. Inference about μ is straightforward, as is model checking, whether based on graphs or numerical summaries such as the sample variance and skewn

3 0.20057833 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

Introduction: In response to this article by Cosma Shalizi and myself on the philosophy of Bayesian statistics, David Hogg writes: I [Hogg] agree–even in physics and astronomy–that the models are not “True” in the God-like sense of being absolute reality (that is, I am not a realist); and I have argued (a philosophically very naive paper, but hey, I was new to all this) that for pretty fundamental reasons we could never arrive at the True (with a capital “T”) model of the Universe. The goal of inference is to find the “best” model, where “best” might have something to do with prediction, or explanation, or message length, or (horror!) our utility. Needless to say, most of my physics friends *are* realists, even in the face of “effective theories” as Newtonian mechanics is an effective theory of GR and GR is an effective theory of “quantum gravity” (this plays to your point, because if you think any theory is possibly an effective theory, how could you ever find Truth?). I also liked the i

4 0.20008045 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

Introduction: In response to my remarks on his online book, Think Bayes, Allen Downey wrote: I [Downey] have a question about one of your comments: My [Gelman's] main criticism with both books is that they talk a lot about inference but not so much about model building or model checking (recall the three steps of Bayesian data analysis). I think it’s ok for an introductory book to focus on inference, which of course is central to the data-analytic process—but I’d like them to at least mention that Bayesian ideas arise in model building and model checking as well. This sounds like something I agree with, and one of the things I tried to do in the book is to put modeling decisions front and center. But the word “modeling” is used in lots of ways, so I want to see if we are talking about the same thing. For example, in many chapters, I start with a simple model of the scenario, do some analysis, then check whether the model is good enough, and iterate. Here’s the discussion of modeling

5 0.19868416 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?

Introduction: Stephen Collins writes: I’m reading your Multilevel modeling book and am trying to apply it to my work. I’m concerned with how to estimate a random intercept model if there are hundreds/thousands of levels. In the Gibbs sampling, am I sampling a parameter for each level? Or, just the hyper-parameters? In other words, say I had 500 zipcode intercepts modeled as ~ N(m,s). Would my posterior be two dimensional, sampling for “m” and “s,” or would it have 502 dimensions? My reply: Indeed you will have hundreds or thousands of parameters—or, in classical terms, hundreds or thousands of predictive quantities. But that’s ok. Even if none of those predictions is precise, you’re learning about the model. See page 526 of the book for more discussion of the number of parameters in a multilevel model.

6 0.19507544 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

7 0.19422649 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

8 0.18892057 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism

9 0.18841958 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

10 0.18753697 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

11 0.18397883 1363 andrew gelman stats-2012-06-03-Question about predictive checks

12 0.18005039 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

13 0.1770519 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

14 0.17227729 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

15 0.1712908 1431 andrew gelman stats-2012-07-27-Overfitting

16 0.16932298 1950 andrew gelman stats-2013-07-22-My talks that were scheduled for Tues at the Data Skeptics meetup and Wed at the Open Statistical Programming meetup

17 0.16814971 1719 andrew gelman stats-2013-02-11-Why waste time philosophizing?

18 0.16758995 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”

19 0.16669385 85 andrew gelman stats-2010-06-14-Prior distribution for design effects

20 0.16454342 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.265), (1, 0.278), (2, -0.022), (3, 0.05), (4, 0.008), (5, 0.059), (6, -0.089), (7, 0.011), (8, 0.092), (9, -0.04), (10, -0.018), (11, 0.006), (12, -0.087), (13, -0.009), (14, -0.098), (15, -0.056), (16, 0.025), (17, -0.029), (18, 0.0), (19, -0.03), (20, 0.032), (21, -0.088), (22, -0.031), (23, 0.003), (24, -0.025), (25, 0.044), (26, -0.064), (27, 0.05), (28, 0.095), (29, 0.068), (30, -0.062), (31, -0.025), (32, -0.038), (33, 0.065), (34, -0.023), (35, 0.054), (36, -0.053), (37, -0.023), (38, -0.059), (39, 0.051), (40, 0.044), (41, 0.007), (42, 0.001), (43, 0.033), (44, -0.003), (45, -0.017), (46, -0.008), (47, 0.014), (48, 0.071), (49, -0.048)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97077739 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

Introduction: Lots of good statistical methods make use of two models. For example: - Classical statistics: estimates and standard errors using the likelihood function; tests and p-values using the sampling distribution. (The sampling distribution is not equivalent to the likelihood, as has been much discussed, for example in sequential stopping problems.) - Bayesian data analysis: inference using the posterior distribution; model checking using the predictive distribution (which, again, depends on the data-generating process in a way that the likelihood does not). - Machine learning: estimation using the data; evaluation using cross-validation (which requires some rule for partitioning the data, a rule that stands outside of the data themselves). - Bootstrap, jackknife, etc: estimation using an “estimator” (which, I would argue, is based in some sense on a model for the data), uncertainties using resampling (which, I would argue, is close to the idea of a “sampling distribution” in

2 0.85846114 1363 andrew gelman stats-2012-06-03-Question about predictive checks

Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h

3 0.85373634 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

Introduction: Data analysis recipes: Fitting a model to data : We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these conditions are rarely met in practice. We consider cases of general, heterogeneous, and arbitrarily covariant two-dimensional uncertainties, and situations in which there are bad data (large outliers), unknown uncertainties, and unknown but expected intrinsic scatter in the linear relationship being fit. Above all we emphasize the importance of having a “generative model” for the data, even an approximate one. Once there is a generative model, the subsequent fitting is non-arbitrary because the model permits direct computation

4 0.84306729 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

Introduction: In an article catchily entitled, “I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb?”, Meng and Xie write: Possibly, but more likely you are merely a victim of conventional wisdom. More data or better models by no means guarantee better estimators (e.g., with a smaller mean squared error), when you are not following probabilistically principled methods such as MLE (for large samples) or Bayesian approaches. Estimating equations are particularly vulnerable in this regard, almost a necessary price for their robustness. These points will be demonstrated via common tasks of estimating regression parameters and correlations, under simple models such as bivariate normal and ARCH(1). Some general strategies for detecting and avoiding such pitfalls are suggested, including checking for self-efficiency (Meng, 1994, Statistical Science) and adopting a guiding working model. Using the example of estimating the autocorrelation ρ under a statio

5 0.83448535 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

Introduction: John Cook considers how people justify probability distribution assumptions: Sometimes distribution assumptions are not justified. Sometimes distributions can be derived from fundamental principles [or] . . . on theoretical grounds. For example, large samples and the central limit theorem together may justify assuming that something is normally distributed. Often the choice of distribution is somewhat arbitrary, chosen by intuition or for convenience, and then empirically shown to work well enough. Sometimes a distribution can be a bad fit and still work well, depending on what you’re asking of it. Cook continues: The last point is particularly interesting. It’s not hard to imagine that a poor fit would produce poor results. It’s surprising when a poor fit produces good results. And then he gives an example of an effective but inaccurate model used to model survival times in a clinical trial. Cook explains: The [poorly-fitting] method works well because of the q

6 0.81877881 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

7 0.81582224 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

8 0.81321156 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

9 0.80823708 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

10 0.80546427 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

11 0.79953158 1476 andrew gelman stats-2012-08-30-Stan is fast

12 0.79653275 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

13 0.78884447 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

14 0.7879023 1431 andrew gelman stats-2012-07-27-Overfitting

15 0.78699887 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

16 0.78556162 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

17 0.77842736 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

18 0.77807903 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency

19 0.77538037 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

20 0.76881468 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(13, 0.011), (15, 0.033), (16, 0.058), (21, 0.038), (24, 0.196), (53, 0.025), (68, 0.134), (84, 0.037), (86, 0.046), (99, 0.311)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98787618 1674 andrew gelman stats-2013-01-15-Prior Selection for Vector Autoregressions

Introduction: Brendan Nyhan sends along this paper by Domenico Giannone, Michele Lenza, and Giorgio Primiceri: Vector autoregressions are flexible time series models that can capture complex dynamic interrelationships among macroeconomic variables. However, their dense parameterization leads to unstable inference and inaccurate out-of-sample forecasts, particularly for models with many variables. A solution to this problem is to use informative priors, in order to shrink the richly parameterized unrestricted model towards a parsimonious naive benchmark, and thus reduce estimation uncertainty. This paper studies the optimal choice of the informativeness of these priors, which we treat as additional parameters, in the spirit of hierarchical modeling. This approach is theoretically grounded, easy to implement, and greatly reduces the number and importance of subjective choices in the setting of the prior. Moreover, it performs very well both in terms of out-of-sample forecasting—as well as factor

2 0.96691144 924 andrew gelman stats-2011-09-24-“Income can’t be used to predict political opinion”

Introduction: What really irritates me about this column (by John Steele Gordon) is not how stupid it is (an article about “millionaires” that switches within the very same paragraph between “a nest egg of $1 million” and “a $1 million annual income” without acknowledging the difference between these concepts) or the ignorance it displays (no, it’s not true that “McCain carried the middle class” in 2008—unless by “middle class” you mean “middle class whites”). No, what really ticks me off is that, when the Red State Blue State book was coming out, we pitched a “5 myths” article for the Washington Post, and they turned us down! Perhaps the rule is: if it’s in the Opinions section of the paper, it can’t contain any facts? Or, to be more precise, any facts it contains must be counterbalanced by an equal number of inanities? Grrrrr . . . I haven’t been so annoyed since reading that New York Times article that argued that electoral politics is just like high school. Who needs political scie

3 0.95756769 913 andrew gelman stats-2011-09-16-Groundhog day in August?

Introduction: A colleague writes: Due to my similar interest in plagiarism, I went to The Human Cultural and Social Landscape session. [The recipient of the American Statistical Association's Founders Award in 2002] gave the first talk in the session instead of Yasmin Said, which was modestly attended (20 or so people) and gave a sociology talk with no numbers — and no attribution to where these ideas (on Afghanistan culture) came from. Would it really have hurt to give the source of this? I’m on board with plain laziness for this one. I think he may have mentioned a number of his collaborators at the beginning, and all he talked about were cultural customs and backgrounds, no science to speak of. It’s kind of amazing to me that he actually showed up at JSM, but of course if he had any shame, he wouldn’t have repeatedly copied without proper attribution in the first place. It’s not even like Doris Kearns Goodwin who reportedly produced a well-written book out of it!

4 0.95685208 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

Introduction: Aki and I write: The very generality of the bootstrap creates both opportunity and peril, allowing researchers to solve otherwise intractable problems but also sometimes leading to an answer with an inappropriately high level of certainty. We demonstrate with two examples from our own research: one problem where bootstrap smoothing was effective and led us to an improved method, and another case where bootstrap smoothing would not solve the underlying problem. Our point in these examples is not to disparage bootstrapping but rather to gain insight into where it will be more or less effective as a smoothing tool. An example where bootstrap smoothing works well Bayesian posterior distributions are commonly summarized using Monte Carlo simulations, and inferences for scalar parameters or quantities of interest can be summarized using 50% or 95% intervals. An interval for a continuous quantity is typically constructed either as a central probability interval (with probabili

same-blog 5 0.95666111 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

Introduction: Lots of good statistical methods make use of two models. For example: - Classical statistics: estimates and standard errors using the likelihood function; tests and p-values using the sampling distribution. (The sampling distribution is not equivalent to the likelihood, as has been much discussed, for example in sequential stopping problems.) - Bayesian data analysis: inference using the posterior distribution; model checking using the predictive distribution (which, again, depends on the data-generating process in a way that the likelihood does not). - Machine learning: estimation using the data; evaluation using cross-validation (which requires some rule for partitioning the data, a rule that stands outside of the data themselves). - Bootstrap, jackknife, etc: estimation using an “estimator” (which, I would argue, is based in some sense on a model for the data), uncertainties using resampling (which, I would argue, is close to the idea of a “sampling distribution” in

6 0.95334846 1568 andrew gelman stats-2012-11-07-That last satisfaction at the end of the career

7 0.95311153 875 andrew gelman stats-2011-08-28-Better than Dennis the dentist or Laura the lawyer

8 0.95050788 877 andrew gelman stats-2011-08-29-Applying quantum probability to political science

9 0.94131458 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey

10 0.94044375 622 andrew gelman stats-2011-03-21-A possible resolution of the albedo mystery!

11 0.93804455 1284 andrew gelman stats-2012-04-26-Modeling probability data

12 0.93769062 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

13 0.93549234 1883 andrew gelman stats-2013-06-04-Interrogating p-values

14 0.9353205 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy

15 0.93477011 899 andrew gelman stats-2011-09-10-The statistical significance filter

16 0.93459588 2220 andrew gelman stats-2014-02-22-Quickies

17 0.93390906 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

18 0.9329021 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

19 0.93214446 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

20 0.93169439 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards