
251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model


meta info for this blog

Source: html

Introduction: Michael Bader writes: What is the best way to examine interactions of independent variables in a propensity weights framework? Let’s say we are interested in estimating breathing difficulty (measured on a continuous scale) and our main predictor is age of housing. The object is to estimate whether living in housing 20 years or older is associated with breathing difficulty compared counterfactually to those living in housing less than 20 years old; as a secondary question, we want to know whether that effect differs for those in poverty compared to those not in poverty. In our first-stage propensity model, we include whether the respondent lives in poverty. The weights applied to the other covariates in the propensity model are similar to those living in poverty compared to those who are not. Now, can I simply interact the poverty variable with the age of construction variable to look at the interaction of age of housing and poverty on breathing difficulty? My thought is no —


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Michael Bader writes: What is the best way to examine interactions of independent variables in a propensity weights framework? [sent-1, score-1.105]

2 Let’s say we are interested in estimating breathing difficulty (measured on a continuous scale) and our main predictor is age of housing. [sent-2, score-0.579]

3 In our first-stage propensity model, we include whether the respondent lives in poverty. [sent-4, score-0.661]

4 The weights applied to the other covariates in the propensity model are similar to those living in poverty compared to those who are not. [sent-5, score-1.571]

5 Now, can I simply interact the poverty variable with the age of construction variable to look at the interaction of age of housing and poverty on breathing difficulty? [sent-6, score-2.074]

6 On the other hand, if the weights are comparable across the model when we stratify on poverty, I’m not sure whether it will have much of an effect. [sent-8, score-0.575]

7 Or, I could be totally incorrect and running the interaction with the poverty variable is sufficient. [sent-9, score-0.748]

8 I [Bader] am happy to read up on the subject; but when I tried doing a search, all I could find were debates about adding interactions into the propensity model itself, not looking at interactions of separate independent variables in the model. [sent-10, score-1.096]

9 My reply: I don’t think it’s a good idea to frame this in terms of weights or weighting. [sent-11, score-0.321]

10 I think of propensity scores as just one particular method for the more general problem of constructing similar groups in a treatment/control comparison. [sent-12, score-0.597]

11 In the example you describe above, you could compare people who lived in housing 20 years or older to people who lived in more recent housing, matching on other variables including their previous poverty status. [sent-14, score-1.339]

12 Then you can include the relevant interactions in your model. [sent-15, score-0.235]

13 The whole propensity-weighting thing seems like a distraction from your real goals here. [sent-16, score-0.074]
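
The reply's suggestion (build comparable groups, then include the relevant interactions in the outcome model) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the function name `fit_interaction_model` and the variable names are hypothetical, the two housing groups are assumed to be comparable already (the matching step is not shown), and plain least squares stands in for whatever model one would actually fit.

```python
import numpy as np

def fit_interaction_model(old_housing, poverty, outcome):
    """Least-squares fit of outcome ~ old_housing * poverty.

    Returns coefficients in the order:
    intercept, old_housing, poverty, old_housing:poverty.
    (Hypothetical helper, not from the original post.)
    """
    old_housing = np.asarray(old_housing, dtype=float)
    poverty = np.asarray(poverty, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    # Design matrix with both main effects and their product term.
    X = np.column_stack([
        np.ones_like(old_housing),
        old_housing,
        poverty,
        old_housing * poverty,
    ])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef

# Simulated data, entirely made up for illustration.
rng = np.random.default_rng(0)
n = 4000
poverty = rng.integers(0, 2, size=n)      # 1 = in poverty
old_housing = rng.integers(0, 2, size=n)  # 1 = housing 20 years or older
true_coef = np.array([1.0, 0.5, 0.3, 0.4])
breathing = (
    true_coef[0]
    + true_coef[1] * old_housing
    + true_coef[2] * poverty
    + true_coef[3] * old_housing * poverty
    + rng.normal(scale=0.5, size=n)
)

coef = fit_interaction_model(old_housing, poverty, breathing)
effect_not_poor = coef[1]        # old-housing difference for those not in poverty
effect_poor = coef[1] + coef[3]  # old-housing difference for those in poverty
```

In this sketch the interaction coefficient `coef[3]` is the estimated difference between the two groups' old-housing effects, which is the secondary question in the letter. Whether that difference has a causal reading depends on how well the groups were matched, which the code does not address.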


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('propensity', 0.436), ('poverty', 0.396), ('housing', 0.355), ('weights', 0.266), ('breathing', 0.229), ('bader', 0.186), ('living', 0.177), ('interactions', 0.175), ('older', 0.153), ('interaction', 0.142), ('difficulty', 0.121), ('lived', 0.117), ('age', 0.114), ('variable', 0.103), ('compared', 0.1), ('whether', 0.097), ('variables', 0.093), ('washed', 0.085), ('model', 0.084), ('independent', 0.081), ('stratify', 0.08), ('poor', 0.08), ('distraction', 0.074), ('differs', 0.068), ('respondent', 0.068), ('main', 0.066), ('interact', 0.065), ('cells', 0.064), ('secondary', 0.062), ('include', 0.06), ('constructing', 0.058), ('similar', 0.057), ('construction', 0.057), ('years', 0.055), ('frame', 0.055), ('covariates', 0.055), ('incorrect', 0.054), ('object', 0.054), ('examine', 0.054), ('matching', 0.053), ('totally', 0.053), ('estimate', 0.053), ('debates', 0.052), ('effect', 0.05), ('arm', 0.049), ('predictor', 0.049), ('comparable', 0.048), ('joint', 0.048), ('measured', 0.047), ('scores', 0.046)]
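
The page does not say how the similarity values in the lists are computed from these weights. One common choice for tfidf vectors is cosine similarity, sketched below as an assumption rather than a description of the tool that produced this page. The weights for `this_post` are the top four from the list above; `other_post` is made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {word: weight} dicts."""
    dot = sum(w * b[word] for word, w in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)

# Top four tfidf weights for this post, copied from the list above.
this_post = {"propensity": 0.436, "poverty": 0.396, "housing": 0.355, "weights": 0.266}
# Hypothetical weights for some other post (made up for illustration).
other_post = {"poverty": 0.5, "education": 0.6, "scores": 0.3}

sim_self = cosine_similarity(this_post, this_post)
sim_other = cosine_similarity(this_post, other_post)
```

Under this assumption a post compared with itself scores 1 up to floating-point error, and posts that share only a few weighted words score much lower.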

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model


2 0.18487248 560 andrew gelman stats-2011-02-06-Education and Poverty

Introduction: Jonathan Livengood writes: There has been some discussion about the recent PISA results (in which the U.S. comes out pretty badly), for example here and here . The claim being made is that the poor U.S. scores are due to rampant individual- or family-level poverty in the U.S. They claim that when one controls for poverty, the U.S. comes out on top in the PISA standings, and then they infer that poverty causes poor test scores. The further inference is then that the U.S. could improve education by the “simple” action of reducing poverty. Anyway, I was wondering what you thought about their analysis. My reply: I agree this is interesting and I agree it’s hard to know exactly what to say about these comparisons. When I’m stuck in this sort of question, I ask, WWJD? In this case, I think Jennifer would ask what are the potential interventions being considered. Various ideas for changing the school system would perhaps have different effects on different groups of students.

3 0.14343446 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is . I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

4 0.14144641 86 andrew gelman stats-2010-06-14-“Too much data”?

Introduction: Chris Hane writes: I am scientist needing to model a treatment effect on a population of ~500 people. The dependent variable in the model is the difference in a person’s pre-treatment 12 month total medical cost versus post-treatment cost. So there is large variation in costs, but not so much by using the difference between the pre and post treatment costs. The issue I’d like some advice on is that the treatment has already occurred so there is no possibility of creating a fully randomized control now. I do have a very large population of people to use as possible controls via propensity scoring or exact matching. If I had a few thousand people to possibly match, then I would use standard techniques. However, I have a potential population of over a hundred thousand people. An exact match of the possible controls to age, gender and region of the country still leaves a population of 10,000 controls. Even if I use propensity scores to weight the 10,000 observations (understan

5 0.12593427 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability

Introduction: Adam Gurri writes: Any chance you could do a post explaining Popper’s propensity theory of probability? I have never understood it. My reply: I’m a big fan of Popper (search this blog for details), especially as interpreted by Lakatos, but as far as I can tell, Popper’s theory of probability is hopeless. We’ve made a lot of progress on probability in the past 75 years, and I don’t see any real need to go back to the bad old days.

6 0.12349721 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

7 0.12167539 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

8 0.11699394 2171 andrew gelman stats-2014-01-13-Postdoc with Liz Stuart on propensity score methods when the covariates are measured with error

9 0.11586386 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

10 0.11226582 139 andrew gelman stats-2010-07-10-Life in New York, Then and Now

11 0.10723279 823 andrew gelman stats-2011-07-26-Including interactions or not

12 0.1017765 2164 andrew gelman stats-2014-01-09-Hermann Goering and Jane Jacobs, together at last!

13 0.097898439 561 andrew gelman stats-2011-02-06-Poverty, educational performance – and can be done about it

14 0.094733171 833 andrew gelman stats-2011-07-31-Untunable Metropolis

15 0.093712144 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression

16 0.093380466 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

17 0.093215376 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

18 0.088897139 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

19 0.088328749 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

20 0.087470472 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.132), (1, 0.052), (2, 0.068), (3, -0.031), (4, 0.057), (5, 0.021), (6, 0.0), (7, -0.006), (8, 0.08), (9, 0.057), (10, -0.001), (11, 0.013), (12, 0.005), (13, 0.015), (14, 0.012), (15, 0.017), (16, 0.03), (17, 0.006), (18, -0.003), (19, 0.011), (20, -0.013), (21, 0.019), (22, -0.005), (23, -0.022), (24, -0.014), (25, 0.036), (26, -0.017), (27, 0.021), (28, -0.034), (29, 0.011), (30, 0.032), (31, 0.036), (32, -0.027), (33, 0.03), (34, -0.025), (35, -0.006), (36, 0.019), (37, 0.045), (38, -0.01), (39, 0.008), (40, -0.022), (41, -0.043), (42, 0.046), (43, -0.008), (44, 0.021), (45, 0.003), (46, 0.024), (47, 0.004), (48, 0.009), (49, 0.057)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9527148 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model


2 0.84999144 86 andrew gelman stats-2010-06-14-“Too much data”?

Introduction: Chris Hane writes: I am scientist needing to model a treatment effect on a population of ~500 people. The dependent variable in the model is the difference in a person’s pre-treatment 12 month total medical cost versus post-treatment cost. So there is large variation in costs, but not so much by using the difference between the pre and post treatment costs. The issue I’d like some advice on is that the treatment has already occurred so there is no possibility of creating a fully randomized control now. I do have a very large population of people to use as possible controls via propensity scoring or exact matching. If I had a few thousand people to possibly match, then I would use standard techniques. However, I have a potential population of over a hundred thousand people. An exact match of the possible controls to age, gender and region of the country still leaves a population of 10,000 controls. Even if I use propensity scores to weight the 10,000 observations (understan

3 0.8028053 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

Introduction: Brandon Behlendorf writes: I [Behlendorf] am replicating some previous research using OLS [he's talking about what we call "linear regression"---ed.] to regress a logged rate (to reduce skew) of Y on a number of predictors (Xs). Y is the count of a phenomena divided by the population of the unit of the analysis. The problem that I am encountering is that Y is composite count of a number of distinct phenomena [A+B+C], and these phenomena are not uniformly distributed across the sample. Most of the research in this area has conducted regressions either with Y or with individual phenomena [A or B or C] as the dependent variable. Yet it seems that if [A, B, C] are not uniformly distributed across the sample of units in the same proportion, then the use of Y would be biased, since as a count of [A+B+C] divided by the population, it would treat as equivalent units both [2+0.5+1.5] and [4+0+0]. My goal is trying to find a methodology which allows a researcher to regress Y on a

4 0.77941537 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

Introduction: Andrew Eppig writes: I’m a physicist by training who is transitioning to the social sciences. I recently came across a reference in the Economist to a paper on IQ and parasites which I read as I have more than a passing interest in IQ research (having read much that you and others (e.g., Shalizi, Wicherts) have written). In this paper I note that the authors find a very high correlation between national IQ and parasite prevalence. The strength of the correlation (-0.76 to -0.82) surprised me, as I’m used to much weaker correlations in the social sciences. To me, it’s a bit too high, suggesting that there are other factors at play or that one of the variables is merely a proxy for a large number of other variables. But I have no basis for this other than a gut feeling and a memory of a plot on Language Log about the distribution of correlation coefficients in social psychology. So my question is this: Is a correlation in the range of (-0.82,-0.76) more likely to be a correlatio

5 0.77297354 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

Introduction: Elena Grewal writes: I am currently using the iterative regression imputation model as implemented in the Stata ICE package. I am using data from a survey of about 90,000 students in 142 schools and my variable of interest is parent level of education. I want only this variable to be imputed with as little bias as possible as I am not using any other variable. So I scoured the survey for every variable I thought could possibly predict parent education. The main variable I found is parent occupation, which explains about 35% of the variance in parent education for the students with complete data on both. I then include the 20 other variables I found in the survey in a regression predicting parent education, which explains about 40% of the variance in parent education for students with complete data on all the variables. My question is this: many of the other variables I found have more missing values than the parent education variable, and also, although statistically significant

6 0.76104164 1017 andrew gelman stats-2011-11-18-Lack of complete overlap

7 0.75962603 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

8 0.75490296 553 andrew gelman stats-2011-02-03-is it possible to “overstratify” when assigning a treatment in a randomized control trial?

9 0.7544148 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

10 0.74871343 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

11 0.74434221 14 andrew gelman stats-2010-05-01-Imputing count data

12 0.7443189 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

13 0.7366398 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

14 0.73429906 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

15 0.73389047 2296 andrew gelman stats-2014-04-19-Index or indicator variables

16 0.72505111 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

17 0.71958613 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

18 0.71631211 851 andrew gelman stats-2011-08-12-year + (1|year)

19 0.70947921 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

20 0.70262384 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.034), (16, 0.064), (21, 0.014), (24, 0.117), (38, 0.171), (47, 0.013), (85, 0.019), (86, 0.021), (88, 0.033), (97, 0.015), (99, 0.351)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98364311 393 andrew gelman stats-2010-11-04-Estimating the effect of A on B, and also the effect of B on A

Introduction: Lei Liu writes: I am working with clinicians in infectious disease and international health to study the (possible causal) relation between malnutrition and virus infection episodes (e.g., diarrhea) in babies in developing countries. Basically the clinicians are interested in two questions: does malnutrition cause more diarrhea episodes? does diarrhea lead to malnutrition? The malnutrition status is indicated by height and weight (adjusted, HAZ and WAZ measures) observed every 3 months from birth to 1 year. They also recorded the time of each diarrhea episode during the 1 year follow-up period. They have very solid datasets for analysis. As you can see, this is almost like a chicken and egg problem. I am a layman to causal inference. The method I use is just to do some simple regression. For example, to study the causal relation from malnutrition to diarrhea episodes, I use binary variable (diarrhea yes/no during months 0-3) as response, and use the HAZ at month 0 as covariate

2 0.96889359 527 andrew gelman stats-2011-01-20-Cars vs. trucks

Introduction: Anupam Agrawal writes: I am an Assistant Professor of Operations Management at the University of Illinois. . . . My main work is in supply chain area, and empirical in nature. . . . I am working with a firm that has two separate divisions – one making cars, and the other makes trucks. Four years back, the firm made an interesting organizational change. They created a separate group of ~25 engineers, in their car division (from within their quality and production engineers). This group was focused on improving supplier quality and reported to car plant head . The truck division did not (and still does not) have such an independent “supplier improvement group”. Other than this unit in car, the organizational arrangements in the two divisions mimic each other. There are many common suppliers to the car and truck division. Data on quality of components coming from suppliers has been collected (for the last four years). The organizational change happened in January 2007. My focus is

3 0.96847975 1874 andrew gelman stats-2013-05-28-Nostalgia

Introduction: Saw Argo the other day, was impressed by the way it was filmed in such a 70s style, sorta like that movie The Limey or an episode of the Rockford Files. I also felt nostalgia for that relatively nonviolent era. All those hostages and nobody was killed. It’s a good thing the Ayatollah didn’t have some fundamentalist Shiite equivalent of John Yoo telling him to waterboard everybody. At the time we were all so angry and upset about the hostage-taking, but from the perspective of our suicide-bomber era, that whole hostage episode seems so comfortingly mild.

same-blog 4 0.96832985 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model


5 0.94823796 600 andrew gelman stats-2011-03-04-“Social Psychologists Detect Liberal Bias Within”

Introduction: Mark Palko asks what I think of this news article by John Tierney. The article’s webpage is given the strange incomplete title above. My first comment is that the headline appears false. I didn’t see any evidence presented of liberal bias. (If the headline says “Social psychologists detect,” I expect to see some detection, not just anecdotes.) What I did see was a discussion of the fact that most academic psychologists consider themselves politically liberal (a pattern that holds for academic researchers in general), along with some anecdotes of moderates over the years who have felt their political views disrespected by the liberal majority. I’m interested in the topic, and I’m open to the possibility that there are all sorts of biases in academic research–but I don’t see the evidence from this article that social psychologists have detected any bias yet. Phrases such as “a statistically impossible lack of diversity” are just silly. What I really wonder is what John Jo

6 0.9422605 658 andrew gelman stats-2011-04-11-Statistics in high schools: Towards more accessible conceptions of statistical inference

7 0.92844176 717 andrew gelman stats-2011-05-17-Statistics plagiarism scandal

8 0.91722882 1722 andrew gelman stats-2013-02-14-Statistics for firefighters: update

9 0.91227406 48 andrew gelman stats-2010-05-23-The bane of many causes

10 0.91219187 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

11 0.90712839 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

12 0.90644985 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

13 0.90585488 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

14 0.90557289 2245 andrew gelman stats-2014-03-12-More on publishing in journals

15 0.90479022 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

16 0.90402007 1633 andrew gelman stats-2012-12-21-Kahan on Pinker on politics

17 0.90400696 1750 andrew gelman stats-2013-03-05-Watership Down, thick description, applied statistics, immutability of stories, and playing tennis with a net

18 0.9037959 2009 andrew gelman stats-2013-09-05-A locally organized online BDA course on G+ hangout?

19 0.90347332 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

20 0.90335196 2170 andrew gelman stats-2014-01-13-Judea Pearl overview on causal inference, and more general thoughts on the reexpression of existing methods by considering their implicit assumptions