1735 andrew gelman stats-2013-02-24-F-f-f-fake data
Source: Andrew Gelman’s blog, Statistical Modeling, Causal Inference, and Social Science
Introduction: Tiago Fragoso writes: Suppose I fit a two-stage regression model, Y = a + bx + e, where the intercept is itself modeled as a = cw + d + e1. I could fit it all in one step, using MCMC for example (my model is more complicated than that, so I’ll have to do it by MCMC). Alternatively, I could fit only the first regression using MCMC, because those estimates are hard to obtain, and then perform the second regression using least squares or a separate MCMC run. So there is a ‘one-step’ inference that does everything at once, and a ‘two-step’ inference that fits one stage and plugs its estimates into the next. What is gained or lost between the two? Has anything been done on this question? My response: Rather than answering your particular question, I’ll give you my generic answer, which is to simulate fake data from your model, then fit your model both ways and see how the results differ. Repeat the simulation a few thousand times and you can make all the statistical comparisons you like.
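To make this advice concrete, below is a minimal sketch of such a fake-data simulation in Python. It assumes a simplified, fully linear version of Fragoso’s model in which each of J groups has its own intercept, a_j = c*w_j + d + e1_j, and y_ij = a_j + b*x_ij + e_ij; the ‘two-step’ fit estimates the group intercepts by least squares and then regresses them on w, while the ‘one-step’ fit runs a single joint regression on the reduced form y = d + b*x + c*w + (e1 + e). All parameter values, the group structure, and the function names here are illustrative assumptions, not details from the original question, and with linear stages ordinary least squares stands in for the MCMC fits.

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameter values (arbitrary choices for the fake-data simulation).
b, c, d = 1.5, 0.8, 0.3        # within-group slope; group-level slope and intercept
sigma_e, sigma_e1 = 1.0, 0.5   # residual sd of each stage
J, n = 30, 20                  # number of groups; observations per group

def simulate():
    """One fake dataset from the assumed model:
    a_j = c*w_j + d + e1_j;  y_ij = a_j + b*x_ij + e_ij."""
    w = rng.normal(size=J)
    a = c * w + d + rng.normal(scale=sigma_e1, size=J)
    x = rng.normal(size=(J, n))
    y = a[:, None] + b * x + rng.normal(scale=sigma_e, size=(J, n))
    return w, x, y

def fit_two_step(w, x, y):
    """Stage 1: estimate each group's intercept by within-group least squares;
    stage 2: regress the estimated intercepts on w."""
    a_hat = np.empty(J)
    for j in range(J):
        X1 = np.column_stack([np.ones(n), x[j]])
        a_hat[j] = np.linalg.lstsq(X1, y[j], rcond=None)[0][0]
    X2 = np.column_stack([np.ones(J), w])
    return np.linalg.lstsq(X2, a_hat, rcond=None)[0][1]  # estimate of c

def fit_one_step(w, x, y):
    """Joint least-squares fit of the reduced form
    y = d + b*x + c*w + (e1 + e), i.e. both stages at once."""
    X = np.column_stack([np.ones(J * n), x.ravel(), np.repeat(w, n)])
    return np.linalg.lstsq(X, y.ravel(), rcond=None)[0][2]  # estimate of c

# Simulate, fit both ways, and repeat a few thousand times.
reps = 2000
ests = np.array([(fit_two_step(w, x, y), fit_one_step(w, x, y))
                 for w, x, y in (simulate() for _ in range(reps))])
for name, col in zip(("two-step", "one-step"), ests.T):
    print(f"{name}: bias of c-hat = {col.mean() - c:+.4f}, sd = {col.std():.4f}")
```

In this toy setup both estimators of c come out essentially unbiased, so the comparison of interest is their sampling variability; the same simulate-refit-repeat loop applies unchanged when each fit is replaced by the MCMC procedures the question describes.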
Similar posts:
861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?
1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over
1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys
246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling
1465 andrew gelman stats-2012-08-21-D. Buggin
2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
852 andrew gelman stats-2011-08-13-Checking your model using fake data
780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data
1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series
1460 andrew gelman stats-2012-08-16-“Real data can be a pain”
1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics
1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model
575 andrew gelman stats-2011-02-15-What are the trickiest models to fit?
1431 andrew gelman stats-2012-07-27-Overfitting
1886 andrew gelman stats-2013-06-07-Robust logistic regression
24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences
1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics
964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency
935 andrew gelman stats-2011-10-01-When should you worry about imputed data?
1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables
773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?
772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)
39 andrew gelman stats-2010-05-18-The 1.6 rule
823 andrew gelman stats-2011-07-26-Including interactions or not
1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making
250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models
328 andrew gelman stats-2010-10-08-Displaying a fitted multilevel model
1459 andrew gelman stats-2012-08-15-How I think about mixture models
1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model
147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults
854 andrew gelman stats-2011-08-15-A silly paper that tries to make fun of multilevel models
1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”
432 andrew gelman stats-2010-11-27-Neumann update
1451 andrew gelman stats-2012-08-08-Robert Kosara reviews Ed Tufte’s short course
62 andrew gelman stats-2010-06-01-Two Postdoc Positions Available on Bayesian Hierarchical Modeling
514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?
1857 andrew gelman stats-2013-05-15-Does quantum uncertainty have a place in everyday applied statistics?
2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”
1864 andrew gelman stats-2013-05-20-Evaluating Columbia University’s Frontiers of Science course
486 andrew gelman stats-2010-12-26-Age and happiness: The pattern isn’t as clear as you might think
789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time
2037 andrew gelman stats-2013-09-25-Classical probability does not apply to quantum systems (causal inference edition)
1275 andrew gelman stats-2012-04-22-Please stop me before I barf again