
213 andrew gelman stats-2010-08-17-Matching at two levels


meta infos for this blog

Source: html

Introduction: Steve Porter writes with a question about matching for inferences in a hierarchical data structure. I’ve never thought about this particular issue, but it seems potentially important. Maybe one or more of you have some useful suggestions? Porter writes: After immersing myself in the relatively sparse literature on propensity scores with clustered data, it seems as if people take one of two approaches. If the treatment is at the cluster-level (like school policies), they match on only the cluster-level covariates. If the treatment is at the individual level, they match on individual-level covariates. (I have also found some papers that match on individual-level covariates when it seems as if the treatment is really at the cluster-level.) But what if there is a selection process at both levels? For my research question (effect of tenure systems on faculty behavior) there is a two-step selection process: first colleges choose whether to have a tenure system for faculty; then f


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Steve Porter writes with a question about matching for inferences in a hierarchical data structure. [sent-1, score-0.431]

2 I’ve never thought about this particular issue, but it seems potentially important. [sent-2, score-0.14]

3 Porter writes: After immersing myself in the relatively sparse literature on propensity scores with clustered data, it seems as if people take one of two approaches. [sent-4, score-0.666]

4 If the treatment is at the cluster-level (like school policies), they match on only the cluster-level covariates. [sent-5, score-0.645]

5 If the treatment is at the individual level, they match on individual-level covariates. [sent-6, score-0.553]

6 (I have also found some papers that match on individual-level covariates when it seems as if the treatment is really at the cluster-level.) [sent-7, score-0.858]

7 But what if there is a selection process at both levels? [sent-8, score-0.243]

8 For my research question (effect of tenure systems on faculty behavior) there is a two-step selection process: first colleges choose whether to have a tenure system for faculty; then faculty choose whether to work for a college that has a tenure system. [sent-9, score-2.066]

9 My concern is that there will be differences between treated and untreated at both levels, and matching at only one level will not achieve balance for covariates at the other level. [sent-10, score-0.992]

10 My idea for handling this is a three-step process: first, match multiple controls to treated schools to balance at the cluster-level, then using only faculty in the matched school sample, match again using individual-level variables. [sent-11, score-2.077]

11 Hopefully at this point I would have enough schools and faculty within schools for a two-level HLM, using covariates at both levels to handle any remaining bias. [sent-12, score-1.363]

12 Have you come across any applications where someone tries to match at two levels rather than one? [sent-14, score-0.706]

13 ”–to think about what comparisons you’d like to make, if sample size were not an issue. [sent-18, score-0.08]

14 Also, if you do end up fitting your model on a relatively small subset of your data, you could evaluate some aspects of your inferences on your larger data, to see if your fitted model gives reasonable predictions. [sent-19, score-0.449]
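The matching steps Porter describes in sentences 10 and 11 can be sketched roughly as follows. This is only an illustration under loud assumptions: the toy data, the field names (`tenure`, `size`, `years`), and the helper `nearest_matches` are all hypothetical, and greedy nearest-neighbour matching on a single covariate stands in for whatever propensity-score method would be used in practice. The final step (fitting a two-level model on the matched sample) is not shown.

```python
def nearest_matches(treated, controls, key, k=1):
    """Greedy nearest-neighbour matching without replacement on one covariate."""
    available = list(controls)
    pairs = []
    for unit in treated:
        # Order the remaining controls by distance on the matching covariate.
        available.sort(key=lambda c: abs(c[key] - unit[key]))
        chosen, available = available[:k], available[k:]
        pairs.append((unit, chosen))
    return pairs


# Hypothetical toy data: one cluster-level covariate (size) per school,
# one individual-level covariate (years of experience) per faculty member.
schools = [
    {"id": "A", "tenure": True, "size": 10},
    {"id": "B", "tenure": True, "size": 30},
    {"id": "C", "tenure": False, "size": 12},
    {"id": "D", "tenure": False, "size": 28},
    {"id": "E", "tenure": False, "size": 90},
]
faculty = [
    {"name": "f1", "school": "A", "years": 5},
    {"name": "f2", "school": "B", "years": 20},
    {"name": "f3", "school": "C", "years": 6},
    {"name": "f4", "school": "D", "years": 18},
    {"name": "f5", "school": "E", "years": 5},
]

# Step 1: match control schools to treated schools at the cluster level.
treated_schools = [s for s in schools if s["tenure"]]
control_schools = [s for s in schools if not s["tenure"]]
school_pairs = nearest_matches(treated_schools, control_schools, "size")
kept_ids = {s["id"] for t, cs in school_pairs for s in [t, *cs]}

# Step 2: using only faculty in matched schools, match again at the individual level.
tenure_by_school = {s["id"]: s["tenure"] for s in schools}
kept_faculty = [f for f in faculty if f["school"] in kept_ids]
treated_fac = [f for f in kept_faculty if tenure_by_school[f["school"]]]
control_fac = [f for f in kept_faculty if not tenure_by_school[f["school"]]]
faculty_pairs = nearest_matches(treated_fac, control_fac, "years")
```

In this toy example school E is dropped at the first step because it has no comparable treated school, so its faculty never enter the second match. That is the sample-size cost Porter worries about: each level of matching shrinks the data available for the multilevel model.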


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('match', 0.402), ('faculty', 0.323), ('tenure', 0.244), ('porter', 0.238), ('covariates', 0.232), ('levels', 0.228), ('schools', 0.18), ('balance', 0.152), ('treatment', 0.151), ('matching', 0.149), ('process', 0.135), ('treated', 0.134), ('relatively', 0.128), ('immersing', 0.119), ('overthinking', 0.119), ('inferences', 0.116), ('clustered', 0.112), ('hlm', 0.112), ('selection', 0.108), ('untreated', 0.107), ('choose', 0.105), ('school', 0.092), ('propensity', 0.088), ('hopefully', 0.088), ('question', 0.085), ('matched', 0.084), ('handling', 0.083), ('colleges', 0.083), ('sparse', 0.082), ('data', 0.081), ('achieve', 0.08), ('sample', 0.08), ('remaining', 0.079), ('level', 0.077), ('controls', 0.077), ('tries', 0.076), ('using', 0.074), ('subset', 0.074), ('seems', 0.073), ('whether', 0.068), ('steve', 0.067), ('potentially', 0.067), ('handle', 0.067), ('fitted', 0.067), ('systems', 0.066), ('policies', 0.065), ('scores', 0.064), ('evaluate', 0.064), ('rubin', 0.063), ('concern', 0.061)]
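The page does not say how these weights or the similarity values below were computed. As a rough sketch of one common formulation (term frequency times log inverse document frequency, compared by cosine similarity), assuming whitespace tokenization and made-up toy documents, the idea looks like this; the function names `tfidf_vectors` and `cosine` are hypothetical and not taken from the tool that generated this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            w: (c / len(toks)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents, invented for illustration only.
docs = [
    "match faculty tenure schools",
    "match controls propensity scores",
    "baseball novel review",
]
vecs = tfidf_vectors(docs)
sim_to_second = cosine(vecs[0], vecs[1])
sim_to_third = cosine(vecs[0], vecs[2])
```

Under this scheme a document compared with itself scores close to 1, which is consistent with the "same-blog" rows in the lists that follow, and documents that share no words score 0.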

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 213 andrew gelman stats-2010-08-17-Matching at two levels

2 0.21981256 86 andrew gelman stats-2010-06-14-“Too much data”?

Introduction: Chris Hane writes: I am a scientist needing to model a treatment effect on a population of ~500 people. The dependent variable in the model is the difference in a person’s pre-treatment 12 month total medical cost versus post-treatment cost. So there is large variation in costs, but not so much by using the difference between the pre and post treatment costs. The issue I’d like some advice on is that the treatment has already occurred so there is no possibility of creating a fully randomized control now. I do have a very large population of people to use as possible controls via propensity scoring or exact matching. If I had a few thousand people to possibly match, then I would use standard techniques. However, I have a potential population of over a hundred thousand people. An exact match of the possible controls to age, gender and region of the country still leaves a population of 10,000 controls. Even if I use propensity scores to weight the 10,000 observations (understan

3 0.20174944 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

Introduction: Chris Blattman writes: Matching is not an identification strategy a solution to your endogeneity problem; it is a weighting scheme. Saying matching will reduce endogeneity bias is like saying that the best way to get thin is to weigh yourself in kilos. The statement makes no sense. It confuses technique with substance. . . . When you run a regression, you control for the X you can observe. When you match, you are simply matching based on those same X. . . . I see what Chris is getting at–matching, like regression, won’t help for the variables you’re not controlling for–but I disagree with his characterization of matching as a weighting scheme. I see matching as a way to restrict your analysis to comparable cases. The statistical motivation: robustness. If you had a good enough model, you wouldn’t need to match, you’d just fit the model to the data. But in common practice we often use simple regression models and so it can be helpful to do some matching first before regress

4 0.18001547 2070 andrew gelman stats-2013-10-20-The institution of tenure

Introduction: Rohin Dhar writes: The Priceonomics blog is doing a feature where we ask a few economists what they think of the institution of tenure. If you’d be interested in participating, I’d love to get your response. As an economist, what do you think of tenure? Should it be abolished / kept / modified? My reply: Just to be clear, I’m assuming that when you say “tenure,” you’re talking about lifetime employment for college professors such as myself. I’m actually a political scientist, not an economist. So rather than giving my opinion, I’ll say what I think an economist might say. I think an economist could say one of two things: Economist as anthropologist would say: Tenure is decided by independent institutions acting freely. If they choose to offer tenure, they will have good reasons, and it is not part of an economist’s job to second-guess individual decisions. Economist as McKinsey consultant would say: Tenure can be evaluated based on a cost-benefit analysis. How

5 0.16540541 571 andrew gelman stats-2011-02-13-A departmental wiki page?

Introduction: I was recently struggling with the Columbia University philosophy department’s webpage (to see who might be interested in this stuff). The faculty webpage was horrible: it’s just a list of names and links with no information on research interests. So I did some searching on the web and found a wonderful wikipedia page which had exactly what I wanted. Then I checked my own department’s page, and it’s even worse than what they have in philosophy! (We also have this page, which is even worse in that it omits many of our faculty and has a bunch of ridiculously technical links for some of the faculty who are included.) I don’t know about the philosophy department, but the statistics department’s webpage is an overengineered mess, designed from the outset to look pretty rather than to be easily updated. Maybe we could replace it entirely with a wiki? In the meantime, if anybody feels like setting up a wikipedia entry for the research of Columbia’s statistics faculty, that

6 0.15150981 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

7 0.15131402 2108 andrew gelman stats-2013-11-20-That’s crazy talk!

8 0.13515513 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

9 0.13300268 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

10 0.11683339 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model

11 0.11638927 1028 andrew gelman stats-2011-11-26-Tenure lets you handle students who cheat

12 0.10383183 1353 andrew gelman stats-2012-05-30-Question 20 of my final exam for Design and Analysis of Sample Surveys

13 0.10110158 852 andrew gelman stats-2011-08-13-Checking your model using fake data

14 0.10070294 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

15 0.093033388 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

16 0.092461854 542 andrew gelman stats-2011-01-28-Homework and treatment levels

17 0.091662303 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

18 0.090781853 1651 andrew gelman stats-2013-01-03-Faculty Position in Visualization, Visual Analytics, Imaging, and Human Centered Computing

19 0.090055078 1352 andrew gelman stats-2012-05-29-Question 19 of my final exam for Design and Analysis of Sample Surveys

20 0.088296071 388 andrew gelman stats-2010-11-01-The placebo effect in pharma


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.167), (1, 0.042), (2, 0.036), (3, -0.072), (4, 0.066), (5, 0.069), (6, -0.004), (7, 0.036), (8, 0.017), (9, 0.058), (10, 0.006), (11, 0.027), (12, -0.014), (13, -0.039), (14, -0.001), (15, -0.0), (16, 0.028), (17, -0.005), (18, -0.018), (19, 0.042), (20, -0.023), (21, 0.032), (22, -0.035), (23, -0.034), (24, 0.005), (25, -0.004), (26, -0.026), (27, 0.022), (28, -0.065), (29, 0.073), (30, -0.021), (31, -0.017), (32, 0.004), (33, 0.1), (34, -0.039), (35, 0.032), (36, -0.014), (37, -0.012), (38, -0.011), (39, 0.082), (40, 0.005), (41, -0.036), (42, -0.007), (43, -0.021), (44, 0.047), (45, -0.028), (46, 0.008), (47, -0.038), (48, -0.012), (49, 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94238144 213 andrew gelman stats-2010-08-17-Matching at two levels

2 0.80097461 86 andrew gelman stats-2010-06-14-“Too much data”?

Introduction: Chris Hane writes: I am a scientist needing to model a treatment effect on a population of ~500 people. The dependent variable in the model is the difference in a person’s pre-treatment 12 month total medical cost versus post-treatment cost. So there is large variation in costs, but not so much by using the difference between the pre and post treatment costs. The issue I’d like some advice on is that the treatment has already occurred so there is no possibility of creating a fully randomized control now. I do have a very large population of people to use as possible controls via propensity scoring or exact matching. If I had a few thousand people to possibly match, then I would use standard techniques. However, I have a potential population of over a hundred thousand people. An exact match of the possible controls to age, gender and region of the country still leaves a population of 10,000 controls. Even if I use propensity scores to weight the 10,000 observations (understan

3 0.77379662 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

Introduction: Makoto Hanita writes: We have been discussing the following two issues amongst ourselves, then with our methodological consultant for several days. However, we have not been able to arrive at a consensus. Consequently, we decided to seek an opinion from nationally known experts. FYI, we sent a similar inquiry to Larry Hedges and David Rogosa . . . 1) We are wondering if a post-hoc covariate adjustment is a good practice in the context of RCTs [randomized clinical trials]. We have a situation where we found a significant baseline difference between the treatment and the control groups in 3 variables. Some of us argue that adding those three variables to the original impact analysis model is a good idea, as that would remove the confound from the impact estimate. Others among us, on the other hand, argue that a post-hoc covariate adjustment should never be done, on the ground that those covariates are correlated with the treatment, which makes the analysis model that of quasi

4 0.74493939 1017 andrew gelman stats-2011-11-18-Lack of complete overlap

Introduction: Evens Salies writes: I have a question regarding a randomizing constraint in my current funded electricity experiment. After elimination of missing data we have 110 voluntary households from a larger population (resource constraints do not allow us to have more households!). I randomly assign them to treated and non-treated where the treatment variable is some ICT that allows the treated to track their electricity consumption in real time. The ICT is made of two devices, one that is plugged on the household’s modem and the other on the electric meter. A necessary condition for being treated is that the distance between the box and the meter be below some threshold (d), the value of which is 20 meters approximately. 50 ICTs can be installed. 60 households will be in the control group. But, I can only assign 6 households in the control group for whom d is less than 20. Therefore, I have only 6 households in the control group who have a counterfactual in the group of treated.

5 0.7428534 1523 andrew gelman stats-2012-10-06-Comparing people from two surveys, one of which is a simple random sample and one of which is not

Introduction: Juli writes: I’m helping a professor out with an analysis, and I was hoping that you might be able to point me to some relevant literature… She has two studies that have been completed already (so we can’t go back to the planning stage in terms of sampling, unfortunately). Both studies are based around the population of adults in LA who attended LA public high schools at some point, so that is the same for both studies. Study #1 uses random digit dialing, so I consider that one to be SRS. Study #2, however, is a convenience sample in which all participants were involved with one of eight community-based organizations (CBOs). Of course, both studies can be analyzed independently, but she was hoping for there to be some way to combine/compare the two studies. Specifically, I am working on looking at the civic engagement of the adults in both studies. In study #1, this means looking at factors such as involvement in student government. In study #2, this means looking at involv

6 0.72578096 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

7 0.71307534 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

8 0.70205474 326 andrew gelman stats-2010-10-07-Peer pressure, selection, and educational reform

9 0.70038086 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

10 0.68614662 1688 andrew gelman stats-2013-01-22-That claim that students whose parents pay for more of college get worse grades

11 0.68184721 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary

12 0.68030542 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?

13 0.67890328 1657 andrew gelman stats-2013-01-06-Lee Nguyen Tran Kim Song Shimazaki

14 0.66731542 2046 andrew gelman stats-2013-10-01-I’ll say it again

15 0.66719013 560 andrew gelman stats-2011-02-06-Education and Poverty

16 0.66029495 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

17 0.65725678 527 andrew gelman stats-2011-01-20-Cars vs. trucks

18 0.65518975 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model

19 0.64658517 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?

20 0.64248222 1383 andrew gelman stats-2012-06-18-Hierarchical modeling as a framework for extrapolation


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.017), (16, 0.053), (21, 0.042), (24, 0.162), (30, 0.011), (45, 0.026), (72, 0.189), (86, 0.013), (89, 0.035), (95, 0.01), (99, 0.336)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98571885 1375 andrew gelman stats-2012-06-11-The unitary nature of consciousness: “It’s impossible to be insanely frustrated about 2 things at once”

Introduction: Dan Kahan writes: We all know it’s ridiculous to be able to go on an fMRI fishing trip & resort to post hoc story-telling to explain the “significant” correlations one (inevitably) observes (good fMRI studies *don’t* do this; only bad ones do– to the injury of the reputation of all the scholars doing good studies of this kind). But now one doesn’t even need correlations that support the post-hoc inferences one is drawing. This one’s good. Kahan continues: Headline: Religious Experiences Shrink Part of the Brain text: ” … The study, published March 30 [2011] in PLoS One, showed greater atrophy in the hippocampus in individuals who identify with specific religious groups as well as those with no religious affiliation … The results showed significantly greater hippocampal atrophy in individuals reporting a life-changing religious experience. In addition, they found significantly greater hippocampal atrophy among born-again Protestants, Catholics, and those with no religiou

2 0.97915685 741 andrew gelman stats-2011-06-02-At least he didn’t prove a false theorem

Introduction: Siobhan Mattison pointed me to this. I’m just disappointed they didn’t use my Fenimore Cooper line. Although I guess that reference wouldn’t resonate much outside the U.S. P.S. My guess was correct. See comments below. Actually, the reference probably wouldn’t resonate so well among under-50-year-olds in the U.S. either. Sort of like the Jaycees story.

3 0.97657168 1381 andrew gelman stats-2012-06-16-The Art of Fielding

Introduction: I liked it; the reviews were well-deserved. It indeed is a cross between The Mysteries of Pittsburgh and The Universal Baseball Association, J. Henry Waugh, Prop. What struck me most, though, was the contrast with Indecision, the novel by Harbach’s associate, Benjamin Kunkel. As I noted a few years ago, Indecision was notable in that all the characters had agency. That is, each character had his or her own ideas and seemed to act on his or her own ideas, rather than merely carrying the plot along or providing scenery. In contrast, the most gripping drama in The Art of Fielding seems to be characters’ struggling with their plot-determined roles (hence the connection with Coover’s God-soaked baseball classic). Also notable to me was the college-aged characters not being particularly obsessed with sex—I guess this is that easy-going hook-up culture I keep reading about—while at the same time, just about all the characters seem to be involved in serious drug addiction. I’ve re

4 0.97560573 84 andrew gelman stats-2010-06-14-Is it 1930?

Introduction: Lawrence Mishel of the Economic Policy Institute reports: Goldman Sachs’ latest forecast (and they’ve been pretty accurate so far) is that unemployment will rise to 9.9% by early 2011 and trend down to 9.7% for the last quarter of 2011. Obviously, this is a simply awful scenario but it seems one that is being accepted. That is, we seem to be in the process of accepting the unacceptable. Note that this scenario probably assumes the passage of the limited efforts now being considered in Congress. One might be surprised that Obama and congressional Democrats are not doing more to try to bring unemployment down. On the other hand, just to speak in generalities (not knowing any of the people involved), I would think that Obama would be much much more worried about the economy doing well in 2010 and then crashing in 2012. A crappy economy through 2011 and then improvement in 2012–that would be his ideal, no? Not that he would have the ability to time this sort of thing. But perhap

5 0.97497308 268 andrew gelman stats-2010-09-10-Fighting Migraine with Multilevel Modeling

Introduction: Hal Pashler writes: Ed Vul and I are working on something that, although less exciting than the struggle against voodoo correlations in fMRI :-) might interest you and your readers. The background is this: we have been struck for a long time by how many people get frustrated and confused trying to figure out whether something they are doing/eating/etc is triggering something bad, whether it be migraine headaches, children’s tantrums, arthritis pains, or whatever. It seems crazy to try to do such computations in one’s head–and the psychological literature suggests people must be pretty bad at this kind of thing–but what’s the alternative? We are trying to develop one alternative approach–starting with migraine as a pilot project. We created a website that migraine sufferers can sign up for. The users select a list of factors that they think might be triggering their headaches (eg drinking red wine, eating stinky cheese, etc.–the website suggests a big list of candidates drawn

6 0.96921569 1179 andrew gelman stats-2012-02-21-“Readability” as freedom from the actual sensation of reading

7 0.96507645 83 andrew gelman stats-2010-06-13-Silly Sas lays out old-fashioned statistical thinking

8 0.96362162 500 andrew gelman stats-2011-01-03-Bribing statistics

9 0.96273625 68 andrew gelman stats-2010-06-03-…pretty soon you’re talking real money.

10 0.95921052 190 andrew gelman stats-2010-08-07-Mister P makes the big jump from the New York Times to the Washington Post

11 0.95470333 2335 andrew gelman stats-2014-05-15-Bill Easterly vs. Jeff Sachs: What percentage of the recipients didn’t use the free malaria bed nets in Zambia?

12 0.95255613 1079 andrew gelman stats-2011-12-23-Surveys show Americans are populist class warriors, except when they aren’t

13 0.95123994 1244 andrew gelman stats-2012-04-03-Meta-analyses of impact evaluations of aid programs

14 0.94722372 624 andrew gelman stats-2011-03-22-A question about the economic benefits of universities

15 0.94714195 550 andrew gelman stats-2011-02-02-An IV won’t save your life if the line is tangled

16 0.94441921 737 andrew gelman stats-2011-05-30-Memorial Day question

17 0.93549871 2331 andrew gelman stats-2014-05-12-On deck this week

18 0.93369305 1524 andrew gelman stats-2012-10-07-An (impressive) increase in survival rate from 50% to 60% corresponds to an R-squared of (only) 1%. Counterintuitive, huh?

19 0.9314366 2044 andrew gelman stats-2013-09-30-Query from a textbook author – looking for stories to tell to undergrads about significance

20 0.93015057 727 andrew gelman stats-2011-05-23-My new writing strategy