
1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression


meta info for this blog

Source: html

Introduction: Trey Causey writes: Do you have suggestions as to model selection strategies akin to Bayesian model averaging for multilevel models when level-2 inputs are of substantive interest? I [Causey] have seen plenty of R packages and procedures for non-multilevel models, and tried the glmulti package but found that it did not perform well with more than a few level-2 variables. My quick answer is: with a name like that, you should really be fitting three-level models! My longer answer is: regular readers will be unsurprised to hear that I’m no fan of Bayesian model averaging. Instead I’d prefer to bite the bullet and assign an informative prior distribution on these coefficients. I don’t have a great example of such an analysis but I’m more and more thinking that this is the way to go. I don’t see the point in aiming for the intermediate goal of pruning the predictors; I’d rather have a procedure that includes prior information on the predictors and their interactions.
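To make the longer answer concrete, here is a minimal sketch of the kind of model under discussion: a varying-intercept multilevel regression with group-level (level-2) predictors, written roughly in Gelman and Hill's notation. The unit-scale normal priors are an illustrative assumption (they presume standardized predictors), not a prescription from the post:

$$
\begin{aligned}
y_i &\sim \mathrm{N}\!\left(\alpha_{j[i]} + x_i^\top \beta,\; \sigma_y^2\right), && i = 1, \dots, n,\\
\alpha_j &\sim \mathrm{N}\!\left(\gamma_0 + u_j^\top \gamma,\; \sigma_\alpha^2\right), && j = 1, \dots, J,\\
\gamma_k &\sim \mathrm{N}(0, 1), && k = 1, \dots, K.
\end{aligned}
$$

Here the $u_j$ are the level-2 inputs of substantive interest. Informative priors on the $\gamma_k$ shrink weakly supported coefficients toward zero, which is the alternative offered to averaging over, or pruning down to, subsets of predictors.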


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Trey Causey writes: Do you have suggestions as to model selection strategies akin to Bayesian model averaging for multilevel models when level-2 inputs are of substantive interest? [sent-1, score-1.566]

2 I [Causey] have seen plenty of R packages and procedures for non-multilevel models, and tried the glmulti package but found that it did not perform well with more than a few level-2 variables. [sent-2, score-0.85]

3 My quick answer is: with a name like that, you should really be fitting three-level models! [sent-3, score-0.413]

4 My longer answer is: regular readers will be unsurprised to hear that I’m no fan of Bayesian model averaging. [sent-4, score-1.238]

5 Instead I’d prefer to bite the bullet and assign an informative prior distribution on these coefficients. [sent-5, score-0.899]

6 I don’t have a great example of such an analysis but I’m more and more thinking that this is the way to go. [sent-6, score-0.127]

7 I don’t see the point in aiming for the intermediate goal of pruning the predictors; I’d rather have a procedure that includes prior information on the predictors and their interactions. [sent-7, score-1.014]
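The per-sentence scores above look like summed tf-idf weights of each sentence's words, a common extractive-summary heuristic; the dataset's exact scoring rule is not documented here, so the sketch below is an assumption (all code sketches on this page use Python):

```python
import re

def sentence_scores(sentences, weights):
    """Score each sentence by the summed tf-idf weight of its words.

    sentences: list of str
    weights: dict mapping word -> tf-idf weight for this post
    """
    scores = []
    for s in sentences:
        words = re.findall(r"[a-z]+", s.lower())  # crude ASCII tokenizer
        scores.append(sum(weights.get(w, 0.0) for w in words))
    return scores
```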


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('causey', 0.432), ('averaging', 0.269), ('trey', 0.216), ('predictors', 0.204), ('unsurprised', 0.195), ('bite', 0.183), ('bullet', 0.17), ('akin', 0.17), ('models', 0.162), ('aiming', 0.159), ('inputs', 0.151), ('prior', 0.147), ('plenty', 0.146), ('intermediate', 0.143), ('strategies', 0.14), ('assign', 0.137), ('answer', 0.137), ('packages', 0.133), ('model', 0.128), ('fan', 0.122), ('substantive', 0.122), ('procedures', 0.118), ('package', 0.113), ('bayesian', 0.112), ('perform', 0.112), ('interactions', 0.112), ('procedure', 0.112), ('regular', 0.11), ('suggestions', 0.108), ('includes', 0.107), ('longer', 0.102), ('fitting', 0.101), ('informative', 0.099), ('selection', 0.098), ('hear', 0.098), ('tried', 0.093), ('multilevel', 0.09), ('quick', 0.088), ('prefer', 0.088), ('name', 0.087), ('goal', 0.085), ('readers', 0.077), ('seen', 0.076), ('distribution', 0.075), ('interest', 0.074), ('instead', 0.07), ('thinking', 0.065), ('great', 0.062), ('found', 0.059), ('information', 0.057)]
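The weights above are per-word tf-idf scores for this post against the whole corpus. A minimal sketch of how such a list can be reproduced with scikit-learn; the corpus contents and all settings (e.g. English stop-word removal) are assumptions, since the dataset's actual pipeline is not specified:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# posts: blogId -> full text; contents abbreviated here for illustration.
posts = {
    "1216": "Trey Causey writes: Do you have suggestions as to model selection ...",
    "754":  "In response to this article by Cosma Shalizi and myself ...",
    # ... the rest of the corpus
}

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts.values())       # documents x vocabulary
vocab = vectorizer.get_feature_names_out()

# Top-weighted words for post 1216 (row 0), analogous to the list above.
row = X[0].toarray().ravel()
top_words = sorted(zip(vocab, row), key=lambda t: t[1], reverse=True)[:50]
```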

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression


2 0.17550065 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

Introduction: In response to this article by Cosma Shalizi and myself on the philosophy of Bayesian statistics, David Hogg writes: I [Hogg] agree–even in physics and astronomy–that the models are not “True” in the God-like sense of being absolute reality (that is, I am not a realist); and I have argued (a philosophically very naive paper, but hey, I was new to all this) that for pretty fundamental reasons we could never arrive at the True (with a capital “T”) model of the Universe. The goal of inference is to find the “best” model, where “best” might have something to do with prediction, or explanation, or message length, or (horror!) our utility. Needless to say, most of my physics friends *are* realists, even in the face of “effective theories” as Newtonian mechanics is an effective theory of GR and GR is an effective theory of “quantum gravity” (this plays to your point, because if you think any theory is possibly an effective theory, how could you ever find Truth?). I also liked the i

3 0.15661263 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

Introduction: Nick Firoozye writes: I had a question about BMA [Bayesian model averaging] and model combinations in general, and direct it to you since they are a basic form of hierarchical model, albeit in the simplest of forms. I wanted to ask what the underlying assumptions are that could lead to BMA improving on a larger model. I know model combination is a topic of interest in the (frequentist) econometrics community (e.g., Bates & Granger, http://www.jstor.org/discover/10.2307/3008764?uid=3738032&uid=2&uid=4&sid=21101948653381) but at the time it was considered a bit of a puzzle. Perhaps small models combined outperform a big model due to standard errors, insufficient data, etc. But I haven’t seen much in the way of Bayesian justification. In simplest terms, you might have a joint density P(Y,theta_1,theta_2) from which you could use the two marginals P(Y,theta_1) and P(Y,theta_2) to derive two separate forecasts. A BMA-er would do a weighted average of the two forecast densities, having p

4 0.15541835 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; Surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start

5 0.14797856 811 andrew gelman stats-2011-07-20-Kind of Bayesian

Introduction: Astrophysicist Andrew Jaffe pointed me to this discussion of my philosophy of statistics (which is, in turn, my rational reconstruction of the statistical practice of Bayesians such as Rubin and Jaynes). Jaffe’s summary is fair enough and I only disagree in a few points: 1. Jaffe writes: Subjective probability, at least the way it is actually used by practicing scientists, is a sort of “as-if” subjectivity — how would an agent reason if her beliefs were reflected in a certain set of probability distributions? This is why when I discuss probability I try to make the pedantic point that all probabilities are conditional, at least on some background prior information or context. I agree, and my problem with the usual procedures used for Bayesian model comparison and Bayesian model averaging is not that these approaches are subjective but that the particular models being considered don’t make sense. I’m thinking of the sorts of models that say the truth is either A or

6 0.14338465 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

7 0.14210062 342 andrew gelman stats-2010-10-14-Trying to be precise about vagueness

8 0.13606669 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

9 0.1318993 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

10 0.13125615 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

11 0.12951054 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

12 0.12921685 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

13 0.12876827 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

14 0.12675901 1392 andrew gelman stats-2012-06-26-Occam

15 0.12584373 1941 andrew gelman stats-2013-07-16-Priors

16 0.12582488 1459 andrew gelman stats-2012-08-15-How I think about mixture models

17 0.12512653 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

18 0.12484969 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

19 0.12306624 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

20 0.1229834 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making
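The simValue column in the list above is consistent with cosine similarity between tf-idf vectors: the same-blog entry scores 0.99999994, i.e. 1.0 up to floating-point rounding. A sketch, reusing X from the tf-idf sketch above:

```python
from sklearn.metrics.pairwise import cosine_similarity

# Similarity of post 1216 (row 0) to every post, ranked descending;
# the top hit is the post itself, matching the same-blog row above.
sims = cosine_similarity(X[0], X).ravel()
ranking = sims.argsort()[::-1]
```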


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.152), (1, 0.205), (2, 0.002), (3, 0.08), (4, 0.004), (5, -0.006), (6, 0.035), (7, -0.003), (8, 0.017), (9, 0.091), (10, 0.049), (11, 0.017), (12, 0.006), (13, 0.022), (14, 0.006), (15, 0.0), (16, 0.008), (17, -0.008), (18, 0.019), (19, 0.026), (20, -0.033), (21, 0.008), (22, -0.02), (23, -0.031), (24, -0.043), (25, -0.045), (26, -0.02), (27, -0.038), (28, -0.021), (29, -0.014), (30, -0.029), (31, -0.007), (32, 0.017), (33, 0.004), (34, 0.002), (35, 0.017), (36, 0.019), (37, 0.021), (38, -0.008), (39, -0.01), (40, 0.004), (41, -0.004), (42, 0.048), (43, -0.01), (44, 0.013), (45, 0.026), (46, -0.019), (47, 0.003), (48, 0.012), (49, -0.023)]
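The 50 weights above (topics 0 through 49) suggest a 50-dimensional latent semantic indexing (LSI) representation. A sketch using truncated SVD on the tf-idf matrix from the earlier sketch; the component count is inferred from the list, and everything else is an assumption:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Project the tf-idf matrix X into a 50-dimensional LSI space, then
# rank posts by cosine similarity in that space.
lsi = TruncatedSVD(n_components=50, random_state=0)
Z = lsi.fit_transform(X)
lsi_sims = cosine_similarity(Z[:1], Z).ravel()
```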

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97645551 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression


2 0.85334027 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

Introduction: Eric Brown writes: I have come across a number of recommendations over the years about best practices for multilevel regression modeling. For example, the use of t-distributed priors for coefficients in logistic regression and standardizing input variables from one of your 2008 Annals of Applied Statistics papers; or recommendations for priors on variance parameters from your 2006 Bayesian Analysis paper. I understand that opinions on these vary among people in the field, but I was wondering if you have a reference that you point people to as a place to get started? I’ve tried looking through your blog posts but couldn’t find any summaries. For example, what are some examples of when I should use more than a two-level hierarchical model? Can I use a spike-slab coefficient model with a t-distributed prior for the slab rather than a normal? If I assume that my model is a priori wrong (but still useful), what are some recommended ways to choose how many interactions to u

3 0.83978373 1459 andrew gelman stats-2012-08-15-How I think about mixture models

Introduction: Larry Wasserman refers to finite mixture models as “beasts” and jokes that they “should be avoided at all costs.” I’ve thought a lot about mixture models, ever since using them in an analysis of voting patterns that was published in 1990. First off, I’d like to say that our model was useful so I’d prefer not to pay the cost of avoiding it. For a quick description of our mixture model and its context, see pp. 379-380 of my article in the Jim Berger volume. Actually, our case was particularly difficult because we were not even fitting a mixture model to data, we were fitting it to latent data and using the model to perform partial pooling. My difficulties in trying to fit this model inspired our discussion of mixture models in Bayesian Data Analysis (page 109 in the second edition, in the section on “Counterexamples to the theorems”). I agree with Larry that if you’re fitting a mixture model, it’s good to be aware of the problems that arise if you try to estimate

4 0.82725906 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

Introduction: Nick Firoozye writes: I had a question about BMA [Bayesian model averaging] and model combinations in general, and direct it to you since they are a basic form of hierarchical model, albeit in the simplest of forms. I wanted to ask what the underlying assumptions are that could lead to BMA improving on a larger model. I know model combination is a topic of interest in the (frequentist) econometrics community (e.g., Bates & Granger, http://www.jstor.org/discover/10.2307/3008764?uid=3738032&uid=2&uid=4&sid=21101948653381) but at the time it was considered a bit of a puzzle. Perhaps small models combined outperform a big model due to standard errors, insufficient data, etc. But I haven’t seen much in the way of Bayesian justification. In simplest terms, you might have a joint density P(Y,theta_1,theta_2) from which you could use the two marginals P(Y,theta_1) and P(Y,theta_2) to derive two separate forecasts. A BMA-er would do a weighted average of the two forecast densities, having p

5 0.80543602 1431 andrew gelman stats-2012-07-27-Overfitting

Introduction: Ilya Esteban writes: In traditional machine learning and statistical learning techniques, you spend a lot of time selecting your input features, fiddling with model parameter values, etc., all of which leads to the problem of overfitting the data and producing overly optimistic estimates for how good the model really is. You can use techniques such as cross-validation and out-of-sample validation data to try to limit the damage, but they are imperfect solutions at best. While Bayesian models have the great advantage of not forcing you to manually select among the various weights and input features, you still often end up trying different priors and model structures (especially with hierarchical models), before coming up with a “final” model. When applying Bayesian modeling to real world data sets, how should you evaluate alternate priors and topologies for the model without falling into the same overfitting trap as you do with non-Bayesian models? If you try several different

6 0.80110407 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

7 0.79943341 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

8 0.79616952 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?

9 0.79237276 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

10 0.79028732 1392 andrew gelman stats-2012-06-26-Occam

11 0.78537363 1465 andrew gelman stats-2012-08-21-D. Buggin

12 0.78078467 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?

13 0.77450389 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

14 0.77032512 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

15 0.76873368 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

16 0.76635921 1510 andrew gelman stats-2012-09-25-Incoherence of Bayesian data analysis

17 0.76627916 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

18 0.76239085 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

19 0.76051337 342 andrew gelman stats-2010-10-14-Trying to be precise about vagueness

20 0.75907719 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(2, 0.019), (16, 0.02), (21, 0.025), (22, 0.273), (24, 0.162), (63, 0.015), (86, 0.049), (89, 0.016), (99, 0.302)]
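The LDA representation above is sparse: only topics with non-negligible weight are listed, with the largest mass on topics 22, 24, and 99. Topic indices up to 99 suggest a 100-topic model; a sketch under that assumption, fitting LDA on raw term counts (reusing posts from the tf-idf sketch):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# LDA is conventionally fit on term counts rather than tf-idf weights.
counts = CountVectorizer(stop_words="english").fit_transform(posts.values())
lda = LatentDirichletAllocation(n_components=100, random_state=0)
theta = lda.fit_transform(counts)   # rows: per-post topic-weight vectors
```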

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95971149 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers

Introduction: In the annals of hack literature, it is sometimes said that if you aim to write best-selling crap, all you’ll end up with is crap. To truly produce best-selling crap, you have to have a conviction, perhaps misplaced, that your writing has integrity. Whether or not this is a good generalization about writing, I have seen an analogous phenomenon in statistics: If you try to do nothing but model the data, you can be in for a wild and unpleasant ride: real data always seem to have one more twist beyond our ability to model (von Neumann’s elephant’s trunk notwithstanding). But if you model the underlying process, sometimes your model can fit surprisingly well, as well as inviting openings for future research progress.

2 0.93220425 1398 andrew gelman stats-2012-06-28-Every time you take a sample, you’ll have to pay this guy a quarter

Introduction: Roy Mendelssohn pointed me to this heartwarming story of Jay Vadiveloo, an actuary who got a patent for the idea of statistical sampling. Vadiveloo writes, “the results were astounding: statistical sampling worked.” You may laugh, but wait till Albedo Man buys the patent and makes everybody do his bidding. They’re gonna dig up Laplace and make him pay retroactive royalties. And somehow Clippy will get involved in all this. P.S. Mendelssohn writes: “Yes, I felt it was a heartwarming story also. Perhaps we can get a patent for regression.” I say, forget a patent for regression. I want a patent for the sample mean. That’s where the real money is. You can’t charge a lot for each use, but consider the volume!

3 0.92389739 1037 andrew gelman stats-2011-12-01-Lamentably common misunderstanding of meritocracy

Introduction: Tyler Cowen pointed to an article by business-school professor Luigi Zingales about meritocracy. I’d expect a b-school prof to support the idea of meritocracy, and Zingales does not disappoint. But he says a bunch of other things that to me represent a confused conflation of ideas. Here’s Zingales: America became known as a land of opportunity—a place whose capitalist system benefited the hardworking and the virtuous [emphasis added]. In a word, it was a meritocracy. That’s interesting—and revealing. Here’s what I get when I look up “meritocracy” in the dictionary: 1 : a system in which the talented are chosen and moved ahead on the basis of their achievement 2 : leadership selected on the basis of intellectual criteria Nothing here about “hardworking” or “virtuous.” In a meritocracy, you can be as hardworking as John Kruk or as virtuous as Kobe Bryant and you’ll still get ahead—if you have the talent and achievement. Throwing in “hardworking” and “virtuous”

same-blog 4 0.91924226 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression


5 0.90575361 145 andrew gelman stats-2010-07-13-Statistical controversy regarding human rights violations in Colombia

Introduction: Megan Price wrote in that she and Daniel Guzmán of the Benetech Human Rights Program released a paper today entitled “Comments to the article ‘Is Violence Against Union Members in Colombia Systematic and Targeted?’” (or here in Spanish), which examines an article written by Colombian academics Daniel Mejía and María José Uribe. Price writes [in the third person]: The paper reviewed by Price and Guzmán concluded that “. . . on average, violence against unionists in Colombia is neither systematic nor targeted.” However, in their response, Price and Guzmán present – in technical and methodological detail – the reasons they find the conclusions in Mejía and Uribe’s study to be overstated. Price and Guzmán believe that weaknesses in the data, in the choice of the statistical model, and the interpretation of the model used in Mejía and Uribe’s study, all raise serious questions about the authors’ strong causal conclusions. Price and Guzmán point out that unchecked, those conclusio

6 0.90392077 1700 andrew gelman stats-2013-01-31-Snotty reviewers

7 0.89883077 477 andrew gelman stats-2010-12-20-Costless false beliefs

8 0.89172214 504 andrew gelman stats-2011-01-05-For those of you in the U.K., also an amusing paradox involving the infamous hookah story

9 0.8860842 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked

10 0.87392616 2123 andrew gelman stats-2013-12-04-Tesla fires!

11 0.87156481 1964 andrew gelman stats-2013-08-01-Non-topical blogging

12 0.8685351 92 andrew gelman stats-2010-06-17-Drug testing for recipents of NSF and NIH grants?

13 0.86820674 1161 andrew gelman stats-2012-02-10-If an entire article in Computational Statistics and Data Analysis were put together from other, unacknowledged, sources, would that be a work of art?

14 0.84886563 1804 andrew gelman stats-2013-04-15-How effective are football coaches?

15 0.84667784 879 andrew gelman stats-2011-08-29-New journal on causal inference

16 0.84593445 1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand

17 0.83704996 2317 andrew gelman stats-2014-05-04-Honored oldsters write about statistics

18 0.83145654 2167 andrew gelman stats-2014-01-10-Do you believe that “humans and other living things have evolved over time”?

19 0.82744104 1984 andrew gelman stats-2013-08-16-BDA at 40% off!

20 0.82317287 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants