
1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model


meta info for this blog

Source: html

Introduction: Daniel Lakeland asks, “Where do likelihoods come from?” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. This looks a little bit different from the usual way that models are presented in statistics textbooks, where the focus is typically on the random error process, not on the deterministic part of the model. A focus on the error process makes sense in some applications that have inherent randomness or variation (for example, genetics, psychology, and survey sampling) but not so much in the physical sciences, where the deterministic model can be complicated and is typically the essence of the study. Often in these sorts of studies, the starting point (and sometimes the ending point) is what the physicists call “nonlinear least squares” or what we would call normally-distributed errors. That’s what we did for our toxicology and dilution-assay models.
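To make the nonlinear-least-squares connection concrete, here is a minimal sketch (mine, not from the post) of fitting a deterministic model with a normal error model. The exponential-decay model f and all parameter names are hypothetical stand-ins for whatever dynamic model you actually have:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical deterministic model: exponential decay.
# In a real problem f might be an ODE solver or other simulator.
def f(t, a, k):
    return a * np.exp(-k * t)

def neg_log_lik(params, t, y):
    """Normal error model: y_i ~ Normal(f(t_i), sigma^2).
    Minimizing this in (a, k) for fixed sigma is exactly
    nonlinear least squares."""
    a, k, log_sigma = params
    sigma = np.exp(log_sigma)  # keeps sigma positive
    resid = y - f(t, a, k)
    return len(y) * np.log(sigma) + 0.5 * np.sum((resid / sigma) ** 2)

# Simulated example data
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)
y = f(t, 2.0, 0.3) + rng.normal(0, 0.1, size=t.shape)

fit = minimize(neg_log_lik, x0=[1.0, 0.1, 0.0], args=(t, y))
print(fit.x)  # estimates of a, k, log(sigma)
```

Putting priors on (a, k, sigma) and sampling rather than optimizing turns this same likelihood into a Bayesian model.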


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Daniel Lakeland asks, “Where do likelihoods come from? [sent-1, score-0.186]

2 ” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. [sent-2, score-0.892]

3 The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. [sent-3, score-0.659]

4 This looks a little bit different from the usual way that models are presented in statistics textbooks, where the focus is typically on the random error process, not on the deterministic part of the model. [sent-4, score-1.271]

5 A focus on the error process makes sense in some applications that have inherent randomness or variation (for example, genetics, psychology, and survey sampling) but not so much in the physical sciences, where the deterministic model can be complicated and is typically the essence of the study. [sent-5, score-2.241]

6 Often in these sorts of studies, the starting point (and sometimes the ending point) is what the physicists call “nonlinear least squares” or what we would call normally-distributed errors. [sent-6, score-0.677]

7 That’s what we did for our toxicology and dilution-assay models. [sent-7, score-0.128]

8 Sometimes it makes sense to have the error variance scale as a power of the magnitude of the measurement (a sketch of this error model follows the list). [sent-8, score-0.608]

9 The error terms in these models typically include model error as well as measurement variation. [sent-9, score-1.254]

10 In other settings you might put errors in different places in the model, corresponding to different sources of variation and model error. [sent-10, score-0.806]

11 For discrete data, Iven Van Mechelen and I suggested a generic approach for adding error to a deterministic model, but I don’t think this really would work with Lakeland’s examples. [sent-11, score-1.139]
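Sentence 8 is easy to state as a formula. A minimal sketch, under my own assumption (not spelled out in the post) that the error sd is sigma times the predicted magnitude raised to a power gamma:

```python
import numpy as np

def neg_log_lik_power(params, f_pred, y):
    """y_i ~ Normal(f_i, (sigma * |f_i|**gamma)**2): the error sd
    scales as a power of the magnitude of the prediction.
    gamma = 0 recovers constant-variance least squares;
    gamma = 1 gives a constant coefficient of variation.
    As sentence 9 notes, sigma here absorbs model error
    as well as measurement variation."""
    log_sigma, gamma = params
    sd = np.exp(log_sigma) * np.abs(f_pred) ** gamma
    return np.sum(np.log(sd) + 0.5 * ((y - f_pred) / sd) ** 2)
```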


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('deterministic', 0.435), ('error', 0.374), ('lakeland', 0.287), ('model', 0.18), ('typically', 0.175), ('variation', 0.136), ('iven', 0.128), ('mechelen', 0.128), ('toxicology', 0.128), ('staring', 0.128), ('focus', 0.121), ('ending', 0.12), ('essence', 0.117), ('process', 0.115), ('call', 0.114), ('randomness', 0.112), ('likelihoods', 0.112), ('van', 0.106), ('genetics', 0.102), ('physicists', 0.102), ('introduce', 0.1), ('inherent', 0.1), ('sometimes', 0.099), ('dynamic', 0.099), ('nonlinear', 0.099), ('fit', 0.098), ('squares', 0.095), ('generic', 0.091), ('different', 0.09), ('textbooks', 0.09), ('perfectly', 0.087), ('corresponding', 0.086), ('discrete', 0.084), ('magnitude', 0.084), ('daniel', 0.084), ('describes', 0.08), ('adding', 0.079), ('physical', 0.078), ('makes', 0.078), ('complicated', 0.076), ('suggested', 0.076), ('models', 0.076), ('sources', 0.076), ('settings', 0.076), ('measurement', 0.075), ('asks', 0.074), ('sciences', 0.074), ('sense', 0.072), ('places', 0.072), ('applications', 0.072)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

2 0.45810711 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and w
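One way to read the abstract’s “probability of error depends on the model prediction” in code — a sketch in my own notation (the error rates eps0 and eps1 are not the paper’s):

```python
import numpy as np

def log_lik_binary(eps0, eps1, f_pred, y):
    """Deterministic binary predictor f_pred plus random errors:
    a predicted 1 is observed as 0 with probability eps1, and a
    predicted 0 is observed as 1 with probability eps0.
    f_pred and y are 0/1 arrays; eps0, eps1 lie strictly in (0, 1)."""
    p = np.where(f_pred == 1, 1.0 - eps1, eps0)  # Pr(y_i = 1)
    return np.sum(np.where(y == 1, np.log(p), np.log(1.0 - p)))
```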

3 0.39096999 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stocha

4 0.19463867 822 andrew gelman stats-2011-07-26-Any good articles on the use of error bars?

Introduction: Hadley Wickham asks: I was wondering if you knew of any good articles on the use of error bars. I’m particularly looking for articles that discuss the difference between error of means and error of difference in the context of models (e.g. mixed models) where they are very different. I suspect every applied field has a couple of good articles, but it’s really hard to search for them. Can anyone help on this? My only advice is to get rid of those horrible crossbars at the ends of the error bars. The crossbars draw attention to the error bars’ endpoints, which are generally not important at all. See, for example, my Anova paper , for some examples of how I like error bars to look.
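The means-versus-differences distinction Hadley raises fits in two lines of arithmetic; a sketch with made-up numbers:

```python
import numpy as np

# Hypothetical group estimates and their standard errors.
mean_a, se_a = 5.2, 0.3
mean_b, se_b = 4.6, 0.4

# For independent estimates, the error of the difference is
# sqrt(se_a^2 + se_b^2), not se_a + se_b.
se_diff = np.sqrt(se_a**2 + se_b**2)
print(f"difference = {mean_a - mean_b:.2f} +/- {se_diff:.2f}")

# In a mixed model the two estimates are typically correlated, so the
# covariance enters too: se_diff = sqrt(se_a^2 + se_b^2 - 2*cov_ab).
```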

5 0.16760302 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

Introduction: My (coauthored) books on Bayesian data analysis and applied regression are like almost all the other statistics textbooks out there, in that we spend most of our time on the basic distributions such as normal and logistic and then, only as an aside, discuss robust models such as t and robit. Why aren’t the t and robit front and center? Sure, I can see starting with the normal (at least in the Bayesian book, where we actually work out all the algebra), but then why don’t we move on immediately to the real stuff? This isn’t just (or mainly) a question of textbooks or teaching; I’m really thinking here about statistical practice. My statistical practice. Should t and robit be the default? If not, why not? Some possible answers: 10. Estimating the degrees of freedom in the error distribution isn’t so easy, and throwing this extra parameter into the model could make inference unstable. 9. Real data usually don’t have outliers. In practice, fitting a robust model costs you
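For readers who haven’t met the robit: it replaces the logit’s inverse link with the CDF of a t distribution. A minimal sketch, with the degrees of freedom held fixed (which sidesteps the instability mentioned in point 10):

```python
import numpy as np
from scipy.stats import t as student_t

def robit_prob(x, beta, df=7.0):
    """Robit link: Pr(y = 1) = F_t(x @ beta), the CDF of a t
    distribution with df degrees of freedom. Large df approaches
    the probit; df near 7 closely mimics the logit while keeping
    heavier tails, hence more robustness to aberrant points."""
    return student_t.cdf(x @ beta, df=df)
```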

6 0.14328058 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

7 0.14012654 1363 andrew gelman stats-2012-06-03-Question about predictive checks

8 0.13711387 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

9 0.13197814 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

10 0.13021746 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

11 0.12927762 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

12 0.12778042 1392 andrew gelman stats-2012-06-26-Occam

13 0.12689614 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

14 0.12658004 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

15 0.12137239 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

16 0.1197303 1165 andrew gelman stats-2012-02-13-Philosophy of Bayesian statistics: my reactions to Wasserman

17 0.11964537 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

18 0.11920153 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

19 0.11729473 214 andrew gelman stats-2010-08-17-Probability-processing hardware

20 0.11718092 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.191), (1, 0.163), (2, 0.035), (3, -0.003), (4, 0.032), (5, 0.031), (6, -0.029), (7, -0.004), (8, 0.107), (9, 0.005), (10, 0.037), (11, 0.012), (12, -0.114), (13, 0.001), (14, -0.117), (15, -0.054), (16, 0.006), (17, -0.042), (18, -0.028), (19, -0.002), (20, 0.04), (21, -0.076), (22, -0.01), (23, -0.061), (24, -0.039), (25, 0.021), (26, -0.059), (27, 0.027), (28, 0.029), (29, -0.041), (30, -0.068), (31, 0.059), (32, -0.04), (33, -0.043), (34, 0.023), (35, -0.002), (36, -0.069), (37, -0.086), (38, 0.002), (39, -0.046), (40, -0.018), (41, -0.012), (42, -0.012), (43, 0.068), (44, -0.008), (45, 0.048), (46, -0.02), (47, -0.009), (48, -0.027), (49, 0.02)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97551358 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

2 0.86663151 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and w

3 0.86167371 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stocha

4 0.81895542 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

Introduction: The “Canadian lynx data” is one of the famous examples used in time series analysis. And the usual models that are fit to these data in the statistics time-series literature don’t work well. Cavan Reilly and Angelique Zeringue write: Reilly and Zeringue then present their analysis. Their simple little predator-prey model with a weakly informative prior way outperforms the standard big-ass autoregression models. Check this out: Or, to put it into numbers, when they fit their model to the first 80 years and predict to the next 34, their root mean square out-of-sample error is 1480 (see scale of data above). In contrast, the standard model fit to these data (the SETAR model of Tong, 1990) has more than twice as many parameters but gets a worse-performing root mean square error of 1600, even when that model is fit to the entire dataset. (If you fit the SETAR or any similar autoregressive model to the first 80 years and use it to predict the next 34, the predictions
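The comparison Reilly and Zeringue run is a plain train/forecast split. A sketch of that evaluation loop, with fit_fn and predict_fn as hypothetical stand-ins for the predator-prey or SETAR fits:

```python
import numpy as np

def holdout_rmse(fit_fn, predict_fn, series, n_train=80):
    """Fit to the first n_train years, forecast the rest, and report
    root mean square error on the held-out stretch (80 + 34 years
    for the lynx series)."""
    train, test = series[:n_train], series[n_train:]
    model = fit_fn(train)
    preds = predict_fn(model, len(test))
    return np.sqrt(np.mean((np.asarray(preds) - np.asarray(test)) ** 2))
```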

5 0.79630589 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

Introduction: In an article catchily entitled, “I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb?”, Meng and Xie write: Possibly, but more likely you are merely a victim of conventional wisdom. More data or better models by no means guarantee better estimators (e.g., with a smaller mean squared error), when you are not following probabilistically principled methods such as MLE (for large samples) or Bayesian approaches. Estimating equations are particularly vulnerable in this regard, almost a necessary price for their robustness. These points will be demonstrated via common tasks of estimating regression parameters and correlations, under simple models such as bivariate normal and ARCH(1). Some general strategies for detecting and avoiding such pitfalls are suggested, including checking for self-efficiency (Meng, 1994, Statistical Science) and adopting a guiding working model. Using the example of estimating the autocorrelation ρ under a statio

6 0.79218882 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

7 0.78809297 1363 andrew gelman stats-2012-06-03-Question about predictive checks

8 0.78316033 1392 andrew gelman stats-2012-06-26-Occam

9 0.77955282 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

10 0.7779184 552 andrew gelman stats-2011-02-03-Model Makers’ Hippocratic Oath

11 0.76062357 2133 andrew gelman stats-2013-12-13-Flexibility is good

12 0.75986099 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency

13 0.75788713 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models

14 0.75526702 1459 andrew gelman stats-2012-08-15-How I think about mixture models

15 0.75277925 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

16 0.75022203 1431 andrew gelman stats-2012-07-27-Overfitting

17 0.74760354 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

18 0.74748206 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

19 0.74655497 1197 andrew gelman stats-2012-03-04-“All Models are Right, Most are Useless”

20 0.74368936 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.038), (15, 0.058), (16, 0.078), (17, 0.016), (21, 0.043), (24, 0.204), (34, 0.043), (54, 0.016), (56, 0.051), (86, 0.054), (89, 0.023), (99, 0.259)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96440351 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

2 0.96097755 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

Introduction: Makoto Hanita writes: We have been discussing the following two issues amongst ourselves, then with our methodological consultant for several days. However, we have not been able to arrive at a consensus. Consequently, we decided to seek an opinion from nationally known experts. FYI, we sent a similar inquiry to Larry Hedges and David Rogosa . . . 1)      We are wondering if a post-hoc covariate adjustment is a good practice in the context of RCTs [randomized clinical trials]. We have a situation where we found a significant baseline difference between the treatment and the control groups in 3 variables. Some of us argue that adding those three variables to the original impact analysis model is a good idea, as that would remove the confound from the impact estimate. Others among us, on the other hand, argue that a post-hoc covariate adjustment should never be done, on the ground that those covariates are correlated with the treatment, which makes the analysis model that of quasi

3 0.95945746 994 andrew gelman stats-2011-11-06-Josh Tenenbaum presents . . . a model of folk physics!

Introduction: Josh Tenenbaum describes some new work modeling people’s physical reasoning as probabilistic inferences over intuitive theories of mechanics. A general-purpose capacity for “physical intelligence”—inferring physical properties of objects and predicting future states in complex dynamical scenes—is central to how humans interpret their environment and plan safe and effective actions. The computations and representations underlying physical intelligence remain unclear, however. Cognitive studies have focused on mapping out judgment biases and errors, or on testing simple heuristic models suitable only for highly specific cases; they have not attempted to give general-purpose unifying models. In computer science, artificial intelligence and robotics researchers have long sought to formalize common-sense physical reasoning but without success in approaching human-level competence. Here we show that a wide range of human physical judgments can be explained by positing an “intuitive me

4 0.95770866 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q
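A minimal sketch of the two-parameter logistic item-response curve the exam question presupposes (ability theta, difficulty b, discrimination a); the key to option (a) is that the curve is monotone in theta:

```python
import numpy as np

def irt_prob(theta, a, b):
    """2PL item response: Pr(correct) = logit^{-1}(a * (theta - b)).
    For any fixed a the curve is monotone in ability theta, so a
    U-shaped pattern (missed only by middle-ability students) cannot
    be produced by a high discrimination parameter."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))
```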

5 0.95762849 1240 andrew gelman stats-2012-04-02-Blogads update

Introduction: A few months ago I reported on someone who wanted to insert text links into the blog. I asked her how much they would pay and got no answer. Yesterday, though, I received this reply: Hello Andrew, I am sorry for the delay in getting back to you. I’d like to make a proposal for your site. Please refer below. We would like to place a simple text link ad on page http://andrewgelman.com/2011/07/super_sam_fuld/ to link to *** with the key phrase ***. We will incorporate the key phrase into a sentence so it would read well. Rest assured it won’t sound obnoxious or advertorial. We will then process the final text link code as soon as you agree to our proposal. We can offer you $200 for this with the assumption that you will keep the link “live” on that page for 12 months or longer if you prefer. Please get back to us with a quick reply on your thoughts on this and include your Paypal ID for payment process. Hoping for a positive response from you. I wrote back: Hi,

6 0.95674503 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

7 0.95533669 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

8 0.95526332 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

9 0.95512444 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

10 0.95430851 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

11 0.95424902 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

12 0.95365024 846 andrew gelman stats-2011-08-09-Default priors update?

13 0.95339656 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

14 0.9533484 1883 andrew gelman stats-2013-06-04-Interrogating p-values

15 0.95301431 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

16 0.95218146 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

17 0.95205224 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

18 0.95110321 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards

19 0.95035052 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

20 0.94983137 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence