knowledge-graph by maker-knowledge-mining

1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series


meta info for this blog

Source: html

Introduction: The “Canadian lynx data” is one of the famous examples used in time series analysis. And the usual models that are fit to these data in the statistics time-series literature don’t work well. Cavan Reilly and Angelique Zeringue write: Reilly and Zeringue then present their analysis. Their simple little predator-prey model with a weakly informative prior way outperforms the standard big-ass autoregression models. Check this out: Or, to put it into numbers, when they fit their model to the first 80 years and predict to the next 34, their root mean square out-of-sample error is 1480 (see scale of data above). In contrast, the standard model fit to these data (the SETAR model of Tong, 1990) has more than twice as many parameters but gets a worse-performing root mean square error of 1600, even when that model is fit to the entire dataset. (If you fit the SETAR or any similar autoregressive model to the first 80 years and use it to predict the next 34, the predictions are a disaster—the predicted values quickly go toward the mean and can’t even attempt to track the curve.)
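To make the evaluation protocol concrete, here is a minimal Python sketch of the 80-year/34-year split and the root-mean-square-error criterion quoted above. The lynx series below is a synthetic stand-in (substitute the real 114 annual trapping counts, available e.g. as R's built-in lynx dataset), and the fitted model is a plain log-scale autoregression baseline, not Reilly and Zeringue's predator-prey model.

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)
# Placeholder series: substitute the real 114 annual lynx trapping counts
# (1821-1934); the real series cycles with a period of roughly ten years.
lynx = np.exp(7 + np.sin(np.arange(114) * 2 * np.pi / 10)
              + 0.3 * rng.standard_normal(114))

def rmse(y, yhat):
    # Root mean square error, the criterion quoted above (1480 vs. 1600).
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

train, test = lynx[:80], lynx[80:]  # fit to the first 80 years, predict the next 34

# Conventional baseline: an autoregression on log counts (the customary
# transformation for these data), forecast 34 years ahead. For a stationary
# AR model the h-step forecast decays geometrically toward the series mean,
# which is why such long-horizon predictions cannot track the cycle.
res = AutoReg(np.log(train), lags=2).fit()
pred = np.exp(res.forecast(steps=len(test)))

print("out-of-sample RMSE:", rmse(test, pred))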


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The “Canadian lynx data” is one of the famous examples used in time series analysis. [sent-1, score-0.342]

2 And the usual models that are fit to these data in the statistics time-series literature don’t work well. [sent-2, score-0.428]

3 Their simple little predator-prey model with a weakly informative prior way outperforms the standard big-ass autoregression models. [sent-4, score-0.779]

4 Check this out: Or, to put it into numbers, when they fit their model to the first 80 years and predict to the next 34, their root mean square out-of-sample error is 1480 (see scale of data above). [sent-5, score-1.223]

5 In contrast, the standard model fit to these data (the SETAR model of Tong, 1990) has more than twice as many parameters but gets a worse-performing root mean square error of 1600, even when that model is fit to the entire dataset. [sent-6, score-1.846]

6 (If you fit the SETAR or any similar autoregressive model to the first 80 years and use it to predict the next 34, the predictions are a disaster—the predicted values quickly go toward the mean and can’t even attempt to track the curve.) [sent-7, score-1.072]

7 As Reilly and Zeringue note, the above graph shows potential room for improvement in the model, but even as is, it shows the huge benefits that can be obtained by attempting to model the underlying process rather than simply fitting the data using a conventional family of models. [sent-8, score-0.918]

8 (It’s funny for me to emphasize this point, given how often I use conventional models such as linear and logistic regression. [sent-9, score-0.352]

9 The title and text above have been modified to reflect comments below with reference to models fit to the lynx data in the ecology literature. [sent-12, score-1.277]

10 There appears to be not enough communication between ecologists and statisticians. [sent-13, score-0.162]
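A minimal sketch of how sentence scores like these can arise, under the assumption that each sentence is scored by the summed tf-idf weight of its words; the page does not document its exact tokenization or normalization, so this mirrors the method only in spirit.

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the post's sentences.
sentences = [
    "The Canadian lynx data is one of the famous examples in time series analysis.",
    "The usual models fit to these data in the time-series literature do not work well.",
    "Their predator-prey model with a weakly informative prior outperforms autoregressions.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(sentences)      # rows = sentences, columns = words

scores = X.sum(axis=1).A.ravel()      # summed tf-idf weight per sentence
for idx, score in sorted(enumerate(scores), key=lambda t: -t[1]):
    print(f"[sent-{idx + 1}, score-{score:.3f}]", sentences[idx])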


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lynx', 0.342), ('zeringue', 0.342), ('reilly', 0.298), ('autoregression', 0.25), ('setar', 0.25), ('model', 0.225), ('fit', 0.221), ('ecology', 0.17), ('root', 0.168), ('square', 0.157), ('conventional', 0.121), ('models', 0.114), ('angelique', 0.114), ('predict', 0.106), ('ecologists', 0.103), ('cavan', 0.103), ('autoregressive', 0.103), ('shows', 0.098), ('mean', 0.097), ('data', 0.093), ('canadian', 0.092), ('check', 0.091), ('outperforms', 0.09), ('outperform', 0.088), ('disaster', 0.087), ('attempting', 0.084), ('modified', 0.083), ('error', 0.08), ('next', 0.076), ('comments', 0.072), ('standard', 0.072), ('simple', 0.071), ('obtained', 0.071), ('weakly', 0.071), ('holds', 0.069), ('generic', 0.068), ('improvement', 0.066), ('reflect', 0.066), ('track', 0.062), ('twice', 0.062), ('references', 0.062), ('quickly', 0.062), ('room', 0.062), ('predicted', 0.061), ('attempt', 0.059), ('emphasize', 0.059), ('text', 0.059), ('communication', 0.059), ('logistic', 0.058), ('reference', 0.057)]
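A minimal sketch of how both the (wordName, wordTfidf) pairs above and the simValue columns below can be computed, assuming a standard tf-idf plus cosine-similarity pipeline; the exact tokenizer and the topN cutoff used by this page are not documented.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus; in practice, the full text of every blog post.
posts = [
    "canadian lynx predator prey model autoregression setar time series",
    "deterministic probabilistic models binary data errors stochastic",
    "overfitting cross validation priors bayesian models",
]

vec = TfidfVectorizer()
X = vec.fit_transform(posts)

# Top-weighted words for the current post, analogous to (wordName, wordTfidf).
row = X[0].toarray().ravel()
top = np.argsort(row)[::-1][:5]
print([(vec.get_feature_names_out()[i], round(float(row[i]), 3)) for i in top])

# Pairwise cosine similarity, analogous to simValue; the self-match is ~1.0,
# matching the "same-blog ... 0.99999988" row below.
print(cosine_similarity(X[0], X).ravel())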

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series


2 0.16717355 1907 andrew gelman stats-2013-06-20-Amazing retro gnu graphics!

Introduction: Bill Harris writes: Speaking of strange graphics, http://makingsense.facilitatedsystems.com/2007/03/making-musical-sense-by-email-part-2.html shows an example of text (gnuplot’s dumb terminal) graphics of data from MCSim (code and other material available from http://makingsense.facilitatedsystems.com/2007/03/making-musical-sense-by-email-table-of.html). At another extreme, slide 20 of https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnx3c2hhcnJpczEzfGd4OjZkNGFjZWZhOTAyYTFkMDg shows a stereogram of more MCSim output (I was a bit more naive back then). I included the stereogram as a bit of humor just to show what could be done with J graphics. Surprisingly, one person in the audience focused intently on that slide and, after a moment, said “Got it!” We spoke afterwards, and it turned out that he was on the board or at least a volunteer at the Portland (OR) 3D Center of Art & Photography (http://www.3dcenterusa.com/index.html). Regarding mcsim, the

3 0.1572741 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and w

4 0.15207961 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

Introduction: In response to my remarks on his online book, Think Bayes, Allen Downey wrote: I [Downey] have a question about one of your comments: My [Gelman's] main criticism with both books is that they talk a lot about inference but not so much about model building or model checking (recall the three steps of Bayesian data analysis). I think it’s ok for an introductory book to focus on inference, which of course is central to the data-analytic process—but I’d like them to at least mention that Bayesian ideas arise in model building and model checking as well. This sounds like something I agree with, and one of the things I tried to do in the book is to put modeling decisions front and center. But the word “modeling” is used in lots of ways, so I want to see if we are talking about the same thing. For example, in many chapters, I start with a simple model of the scenario, do some analysis, then check whether the model is good enough, and iterate. Here’s the discussion of modeling

5 0.14776357 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

Introduction: In an article catchily entitled, “I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb?”, Meng and Xie write: Possibly, but more likely you are merely a victim of conventional wisdom. More data or better models by no means guarantee better estimators (e.g., with a smaller mean squared error), when you are not following probabilistically principled methods such as MLE (for large samples) or Bayesian approaches. Estimating equations are particularly vulnerable in this regard, almost a necessary price for their robustness. These points will be demonstrated via common tasks of estimating regression parameters and correlations, under simple models such as bivariate normal and ARCH(1). Some general strategies for detecting and avoiding such pitfalls are suggested, including checking for self-efficiency (Meng, 1994, Statistical Science) and adopting a guiding working model. Using the example of estimating the autocorrelation ρ under a statio

6 0.1413826 1392 andrew gelman stats-2012-06-26-Occam

7 0.13613775 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

8 0.13400351 2133 andrew gelman stats-2013-12-13-Flexibility is good

9 0.13235174 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

10 0.13197814 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

11 0.12734659 1459 andrew gelman stats-2012-08-15-How I think about mixture models

12 0.12484116 1431 andrew gelman stats-2012-07-27-Overfitting

13 0.12355798 1142 andrew gelman stats-2012-01-29-Difficulties with the 1-4-power transformation

14 0.12349265 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

15 0.11155114 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

16 0.11112433 1886 andrew gelman stats-2013-06-07-Robust logistic regression

17 0.11059258 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

18 0.1069738 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

19 0.10563438 852 andrew gelman stats-2011-08-13-Checking your model using fake data

20 0.10533426 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.18), (1, 0.158), (2, 0.022), (3, 0.067), (4, 0.056), (5, -0.02), (6, -0.003), (7, -0.038), (8, 0.066), (9, 0.056), (10, 0.047), (11, 0.065), (12, -0.058), (13, -0.011), (14, -0.089), (15, -0.011), (16, 0.039), (17, -0.029), (18, -0.005), (19, -0.008), (20, 0.021), (21, -0.04), (22, -0.031), (23, -0.076), (24, -0.021), (25, 0.016), (26, -0.026), (27, -0.027), (28, 0.016), (29, -0.027), (30, -0.056), (31, 0.006), (32, -0.031), (33, -0.013), (34, -0.003), (35, 0.038), (36, -0.014), (37, -0.02), (38, 0.014), (39, -0.023), (40, -0.002), (41, 0.002), (42, -0.009), (43, 0.042), (44, 0.001), (45, 0.005), (46, -0.047), (47, -0.06), (48, 0.01), (49, 0.015)]
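These (topicId, topicWeight) pairs read as the post's coordinates in a latent semantic space. A minimal sketch under that assumption, with LSI implemented as a truncated SVD of the tf-idf matrix:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Placeholder corpus; in practice, the full text of every blog post.
posts = [
    "canadian lynx predator prey model autoregression",
    "deterministic probabilistic models binary data",
    "overfitting cross validation bayesian priors",
]

X = TfidfVectorizer().fit_transform(posts)

# LSI = truncated SVD of the tf-idf matrix; each post gets a coordinate
# ("topic weight") along each latent dimension. Two components here; the
# list above suggests 50 on the real corpus.
lsi = TruncatedSVD(n_components=2, random_state=0)
Z = lsi.fit_transform(X)

print(list(enumerate(Z[0].round(3))))  # -> [(0, w0), (1, w1)], as in the pairs above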

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98587036 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series


2 0.92313325 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and w

3 0.91605008 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stocha

4 0.9107734 1431 andrew gelman stats-2012-07-27-Overfitting

Introduction: Ilya Esteban writes: In traditional machine learning and statistical learning techniques, you spend a lot of time selecting your input features, fiddling with model parameter values, etc., all of which leads to the problem of overfitting the data and producing overly optimistic estimates for how good the model really is. You can use techniques such as cross-validation and out-of-sample validation data to try to limit the damage, but they are imperfect solutions at best. While Bayesian models have the great advantage of not forcing you to manually select among the various weights and input features, you still often end up trying different priors and model structures (especially with hierarchical models), before coming up with a “final” model. When applying Bayesian modeling to real world data sets, how should you evaluate alternate priors and topologies for the model without falling into the same overfitting trap as you do with non-Bayesian models? If you try several different

5 0.90884739 2133 andrew gelman stats-2013-12-13-Flexibility is good

Introduction: If I made a separate post for each interesting blog discussion, we’d get overwhelmed. That’s why I often leave detailed responses in the comments section, even though I’m pretty sure that most readers don’t look in the comments at all. Sometimes, though, I think it’s good to bring such discussions to light. Here’s a recent example. Michael wrote: Poor predictive performance usually indicates that the model isn’t sufficiently flexible to explain the data, and my understanding of the proper Bayesian strategy is to feed that back into your original model and try again until you achieve better performance. Corey replied: It was my impression that — in ML at least — poor predictive performance is more often due to the model being too flexible and fitting noise. And Rahul agreed: Good point. A very flexible model will describe your training data perfectly and then go bonkers when unleashed on wild data. But I wrote: Overfitting comes from a model being flex

6 0.89179724 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

7 0.88464165 1459 andrew gelman stats-2012-08-15-How I think about mixture models

8 0.88359207 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

9 0.8804062 1392 andrew gelman stats-2012-06-26-Occam

10 0.87424982 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

11 0.86758929 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models

12 0.86419386 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings

13 0.86273408 328 andrew gelman stats-2010-10-08-Displaying a fitted multilevel model

14 0.86209494 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

15 0.86157519 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers

16 0.85742968 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

17 0.85626453 20 andrew gelman stats-2010-05-07-Bayesian hierarchical model for the prediction of soccer results

18 0.84622288 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

19 0.84571207 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

20 0.84159684 265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.02), (16, 0.071), (21, 0.036), (24, 0.189), (29, 0.021), (41, 0.018), (45, 0.014), (53, 0.031), (61, 0.017), (62, 0.012), (74, 0.144), (82, 0.011), (86, 0.04), (99, 0.276)]
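A minimal sketch under the assumption that these (topicId, topicWeight) pairs are a document-topic mixture from a fitted LDA model; only the non-negligible weights would be listed, as above.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; in practice, the full text of every blog post.
posts = [
    "canadian lynx predator prey model autoregression forecast",
    "marginal tax rate income incentives economics",
    "doping tests false positives probability",
]

X = CountVectorizer().fit_transform(posts)  # LDA works on raw word counts

# Three topics for the toy corpus; the list above suggests a much larger
# topic model on the real corpus. Each row of theta sums to 1.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(X)

print([(k, round(float(w), 3)) for k, w in enumerate(theta[0]) if w > 0.01])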

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96404839 140 andrew gelman stats-2010-07-10-SeeThroughNY

Introduction: From Ira Stoll, a link to this cool data site, courtesy of the Manhattan Institute, with all sorts of state budget information including the salaries of all city and state employees.

2 0.95463347 1612 andrew gelman stats-2012-12-08-The Case for More False Positives in Anti-doping Testing

Introduction: Kaiser Fung was ahead of the curve on Lance Armstrong: The media has gotten the statistics totally backwards. On the one hand, they faithfully report the colorful stories of athletes who fail drug tests pleading their innocence. (I have written about the Spanish cyclist Alberto Contador here.) On the other hand, they unquestioningly report athletes who claim “hundreds of negative tests” prove their honesty. Putting these two together implies that the media believes that negative test results are highly reliable while positive test results are unreliable. The reality is just the opposite. When an athlete tests positive, it’s almost sure that he/she has doped. Sure, most of the clean athletes will test negative but what is often missed is that the majority of dopers will also test negative. We don’t need to do any computation to see that this is true. In most major sports competitions, the proportion of tests declared positive is typically below 1%. If you believe that the pr

same-blog 3 0.95009089 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series


4 0.94860983 2261 andrew gelman stats-2014-03-23-Greg Mankiw’s utility function

Introduction: From 2010: Greg Mankiw writes (link from Tyler Cowen): Without any taxes, accepting that editor’s assignment would have yielded my children an extra $10,000. With taxes, it yields only $1,000. In effect, once the entire tax system is taken into account, my family’s marginal tax rate is about 90 percent. Is it any wonder that I [Mankiw] turn down most of the money-making opportunities I am offered? By contrast, without the tax increases advocated by the Obama administration, the numbers would look quite different. I would face a lower income tax rate, a lower Medicare tax rate, and no deduction phaseout or estate tax. Taking that writing assignment would yield my kids about $2,000. I would have twice the incentive to keep working. First, the good news: Obama’s tax rates are much lower than Mankiw had anticipated! According to the above quote, his marginal tax rate is currently 80% but threatens to rise to 90%. But, in October 2008, Mankiw calculated that Obama’s

5 0.94570988 336 andrew gelman stats-2010-10-11-Mankiw’s marginal tax rate (which declined from 93% to 80% in two years) and the difficulty of microeconomic reasoning

Introduction: Greg Mankiw writes (link from Tyler Cowen): Without any taxes, accepting that editor’s assignment would have yielded my children an extra $10,000. With taxes, it yields only $1,000. In effect, once the entire tax system is taken into account, my family’s marginal tax rate is about 90 percent. Is it any wonder that I [Mankiw] turn down most of the money-making opportunities I am offered? By contrast, without the tax increases advocated by the Obama administration, the numbers would look quite different. I would face a lower income tax rate, a lower Medicare tax rate, and no deduction phaseout or estate tax. Taking that writing assignment would yield my kids about $2,000. I would have twice the incentive to keep working. First, the good news: Obama’s tax rates are much lower than Mankiw had anticipated! According to the above quote, his marginal tax rate is currently 80% but threatens to rise to 90%. But, in October 2008, Mankiw calculated that Obama’s would tax his m

6 0.93766707 1780 andrew gelman stats-2013-03-28-Racism!

7 0.92958689 366 andrew gelman stats-2010-10-24-Mankiw tax update

8 0.92172658 301 andrew gelman stats-2010-09-28-Correlation, prediction, variation, etc.

9 0.9215039 338 andrew gelman stats-2010-10-12-Update on Mankiw’s work incentives

10 0.91825193 2239 andrew gelman stats-2014-03-09-Reviewing the peer review process?

11 0.91715831 285 andrew gelman stats-2010-09-18-Fiction is not for tirades? Tell that to Saul Bellow!

12 0.91082299 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

13 0.90869111 1085 andrew gelman stats-2011-12-27-Laws as expressive

14 0.90804899 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

15 0.90771168 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

16 0.90717345 1792 andrew gelman stats-2013-04-07-X on JLP

17 0.90649575 899 andrew gelman stats-2011-09-10-The statistical significance filter

18 0.90641546 1881 andrew gelman stats-2013-06-03-Boot

19 0.90605623 807 andrew gelman stats-2011-07-17-Macro causality

20 0.90602988 836 andrew gelman stats-2011-08-03-Another plagiarism mystery