
2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects


meta information for this blog

Source: html

Introduction: Don Green and Holger Kern write on one of my favorite topics, treatment interactions (see also here): We [Green and Kern] present a methodology that largely automates the search for systematic treatment effect heterogeneity in large-scale experiments. We introduce a nonparametric estimator developed in statistical learning, Bayesian Additive Regression Trees (BART), to model treatment effects that vary as a function of covariates. BART has several advantages over commonly employed parametric modeling strategies, in particular its ability to automatically detect and model relevant treatment-covariate interactions in a flexible manner. To increase the reliability and credibility of the resulting conditional treatment effect estimates, we suggest the use of a split sample design. The data are randomly divided into two equally-sized parts, with the first part used to explore treatment effect heterogeneity and the second part used to confirm the results. This approach permits a re
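To make "treatment effects that vary as a function of covariates" concrete, here is a minimal sketch that estimates the treated-minus-control difference in mean outcome separately within each level of one covariate. This is not BART and not Green and Kern's estimator; it is only the simplest subgroup comparison, and the field names (`age_group`, `treated`, `y`) and the toy data are assumptions made for illustration.

```python
from statistics import mean


def subgroup_effects(rows, covariate):
    """Treated-minus-control difference in mean outcome within each covariate level."""
    effects = {}
    for level in sorted({r[covariate] for r in rows}):
        treated = [r["y"] for r in rows if r[covariate] == level and r["treated"]]
        control = [r["y"] for r in rows if r[covariate] == level and not r["treated"]]
        # Skip levels that lack either a treated or a control unit.
        if treated and control:
            effects[level] = mean(treated) - mean(control)
    return effects


# Hypothetical toy experiment: the effect is present for "young" and absent for "old".
rows = [
    {"age_group": "young", "treated": True, "y": 1.0},
    {"age_group": "young", "treated": True, "y": 1.0},
    {"age_group": "young", "treated": False, "y": 0.0},
    {"age_group": "young", "treated": False, "y": 1.0},
    {"age_group": "old", "treated": True, "y": 1.0},
    {"age_group": "old", "treated": True, "y": 0.0},
    {"age_group": "old", "treated": False, "y": 1.0},
    {"age_group": "old", "treated": False, "y": 0.0},
]
effects = subgroup_effects(rows, "age_group")
print(effects)
```

With many covariates, the number of such subgroup comparisons grows quickly, which is the multiple-comparisons concern the abstract raises and part of the motivation for a more automated, flexible estimator.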


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Don Green and Holger Kern write on one of my favorite topics, treatment interactions (see also here): We [Green and Kern] present a methodology that largely automates the search for systematic treatment effect heterogeneity in large-scale experiments. [sent-1, score-1.65]

2 We introduce a nonparametric estimator developed in statistical learning, Bayesian Additive Regression Trees (BART), to model treatment effects that vary as a function of covariates. [sent-2, score-0.634]

3 BART has several advantages over commonly employed parametric modeling strategies, in particular its ability to automatically detect and model relevant treatment-covariate interactions in a flexible manner. [sent-3, score-0.725]

4 To increase the reliability and credibility of the resulting conditional treatment effect estimates, we suggest the use of a split sample design. [sent-4, score-0.904]

5 The data are randomly divided into two equally-sized parts, with the first part used to explore treatment effect heterogeneity and the second part used to confirm the results. [sent-5, score-1.298]

6 This approach permits a relatively unstructured data-driven exploration of treatment effect heterogeneity while avoiding charges of data dredging and mitigating multiple comparison problems. [sent-6, score-1.726]

7 We illustrate the value of our approach by offering two empirical examples, a survey experiment on Americans' support for social welfare spending and a voter mobilization field experiment. [sent-7, score-0.53]

8 In both applications, BART provides robust insights into the nature of systematic treatment effect heterogeneity. [sent-8, score-0.856]

9 Pretty pictures, too (except for ugly Table 1, but, hey, nobody’s perfect). [sent-11, score-0.076]
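The split sample design described in sentences 4 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_sample`, the fixed seed, and the integer unit IDs are all hypothetical.

```python
import random


def split_sample(units, seed=0):
    """Randomly divide units into two halves of (nearly) equal size."""
    shuffled = list(units)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical experiment with ten units identified by integers 0..9.
explore, confirm = split_sample(range(10), seed=42)

# Heterogeneity is searched for freely in `explore`; only the patterns found
# there are then re-estimated, once, on the untouched `confirm` half.
print(sorted(explore), sorted(confirm))
```

Because the confirmation half plays no part in the search, an estimate computed on it is not contaminated by the exploratory model selection, which is how the design answers charges of data dredging.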


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('treatment', 0.379), ('bart', 0.346), ('heterogeneity', 0.286), ('kern', 0.281), ('effect', 0.189), ('green', 0.164), ('systematic', 0.14), ('interactions', 0.132), ('mobilization', 0.128), ('mitigating', 0.12), ('dredging', 0.115), ('unstructured', 0.115), ('permits', 0.108), ('employed', 0.099), ('trees', 0.095), ('charges', 0.094), ('parametric', 0.094), ('additive', 0.091), ('avoiding', 0.091), ('credibility', 0.091), ('estimator', 0.09), ('pictures', 0.087), ('reliability', 0.087), ('confirm', 0.086), ('welfare', 0.086), ('detect', 0.084), ('introduce', 0.084), ('flexible', 0.083), ('split', 0.083), ('strategies', 0.083), ('voter', 0.082), ('advantages', 0.081), ('nonparametric', 0.081), ('offering', 0.081), ('approach', 0.081), ('commonly', 0.08), ('exploration', 0.079), ('divided', 0.078), ('ugly', 0.076), ('insights', 0.075), ('resulting', 0.075), ('methodology', 0.073), ('randomly', 0.073), ('robust', 0.073), ('automatically', 0.072), ('illustrate', 0.072), ('largely', 0.072), ('explore', 0.069), ('part', 0.069), ('relatively', 0.069)]
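The page does not say how these weights were computed. As a rough sketch of one common tf-idf variant (term count times log inverse document frequency, then L2 normalisation), assuming a toy three-document corpus that is not the real blog archive:

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: weight} dict per tokenised document, L2-normalised."""
    n = len(docs)
    # Document frequency: the number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {w: tf[w] * math.log2(n / df[w]) for w in tf}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        # Words that occur in every document get weight 0 and are dropped.
        vectors.append({w: v / norm for w, v in raw.items() if v > 0} if norm else {})
    return vectors


# Hypothetical toy corpus, already tokenised.
docs = [
    "treatment effect heterogeneity bart".split(),
    "treatment effect placebo".split(),
    "graphics maps".split(),
]
vectors = tfidf(docs)
top = sorted(vectors[0].items(), key=lambda kv: -kv[1])
print(top)
```

Under this weighting, words that are frequent in the post but rare across the corpus score highest, which would be consistent with 'treatment', 'bart' and 'heterogeneity' leading the list above.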

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects


2 0.32746956 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

Introduction: Avi sent along this old paper from Bryk and Raudenbush, who write: The presence of heterogeneity of variance across groups indicates that the standard statistical model for treatment effects no longer applies. Specifically, the assumption that treatments add a constant to each subject’s development fails. An alternative model is required to represent how treatment effects are distributed across individuals. We develop in this article a simple statistical model to demonstrate the link between heterogeneity of variance and random treatment effects. Next, we illustrate with results from two previously published studies how a failure to recognize the substantive importance of heterogeneity of variance obscured significant results present in these data. The article concludes with a review and synthesis of techniques for modeling variances. Although these methods have been well established in the statistical literature, they are not widely known by social and behavioral scientists. T

3 0.19248335 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

Introduction: This time from Bernard Fraga and Eitan Hersh. Once you think about it, it’s hard to imagine any nonzero treatment effects that don’t vary. I’m glad to see this area of research becoming more prominent. (Here’s a discussion of another political science example, also of voter turnout, from a few years ago, from Avi Feller and Chris Holmes.) Some of my fragmentary work on varying treatment effects is here (Treatment Effects in Before-After Data) and here (Estimating Incumbency Advantage and Its Variation, as an Example of a Before–After Study).

4 0.15699178 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models

Introduction: Becky Passonneau and colleagues at the Center for Computational Learning Systems (CCLS) at Columbia have been working on a project for ConEd (New York’s major electric utility) to rank structures based on vulnerability to secondary events (e.g., transformer explosions, cable meltdowns, electrical fires). They’ve been using the R implementation BayesTree of Chipman, George and McCulloch’s Bayesian Additive Regression Trees (BART). BART is a Bayesian non-parametric method that is non-identifiable in two ways. Firstly, it is an additive tree model with a fixed number of trees, the indexes of which aren’t identified (you get the same predictions in a model swapping the order of the trees). This is the same kind of non-identifiability you get with any mixture model (additive or interpolated) with an exchangeable prior on the mixture components. Secondly, the trees themselves have varying structure over samples in terms of number of nodes and their topology (depth, branching, etc

5 0.1468109 388 andrew gelman stats-2010-11-01-The placebo effect in pharma

Introduction: Bruce McCullough writes: The Sept 2009 issue of Wired had a big article on the increase in the placebo effect, and why it’s been getting bigger. Kaiser Fung has a synopsis. As if you don’t have enough to do, I thought you might be interested in blogging on this. My reply: I thought Kaiser’s discussion was good, especially this point: Effect on treatment group = Effect of the drug + effect of belief in being treated. Effect on placebo group = Effect of belief in being treated. Thus, the difference between the two groups = effect of the drug, since the effect of belief in being treated affects both groups of patients. Thus, as Kaiser puts it, if the treatment isn’t doing better than placebo, it doesn’t say that the placebo effect is big (let alone “too big”) but that the treatment isn’t showing any additional effect. It’s “treatment + placebo” vs. placebo, not treatment vs. placebo. That said, I’d prefer for Kaiser to make it clear that the additivity he’s assu

6 0.12930524 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?

7 0.12828232 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?

8 0.11838581 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?

9 0.11792828 86 andrew gelman stats-2010-06-14-“Too much data”?

10 0.11271329 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

11 0.10918744 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

12 0.10522464 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

13 0.10424538 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

14 0.099518962 823 andrew gelman stats-2011-07-26-Including interactions or not

15 0.099396236 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

16 0.099340811 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

17 0.098528251 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

18 0.093920089 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

19 0.093033388 213 andrew gelman stats-2010-08-17-Matching at two levels

20 0.092878975 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses
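The simValue column is not defined on this page. A plausible reading, and only an assumption here, is cosine similarity between sparse tf-idf vectors. In the sketch below, the three weights for this post are taken from the top-N list above, while the comparison vector is entirely hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# First three weights from the tfidf list for this post (the rest are omitted).
this_post = {"treatment": 0.379, "bart": 0.346, "heterogeneity": 0.286}
# Hypothetical vector for another post that shares only the word "treatment".
other_post = {"treatment": 0.3, "placebo": 0.4}

print(cosine(this_post, this_post), cosine(this_post, other_post))
```

Under that reading, a post compared with itself scores 1 up to floating-point rounding, which would fit the same-blog value of 0.99999982, and posts sharing few weighted words score near 0.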


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.15), (1, 0.06), (2, 0.051), (3, -0.09), (4, 0.025), (5, 0.01), (6, -0.066), (7, 0.021), (8, 0.044), (9, 0.032), (10, -0.041), (11, -0.009), (12, 0.056), (13, 0.001), (14, 0.03), (15, -0.008), (16, -0.035), (17, -0.009), (18, -0.038), (19, 0.042), (20, -0.038), (21, -0.009), (22, -0.007), (23, -0.006), (24, -0.006), (25, 0.048), (26, -0.051), (27, 0.031), (28, -0.05), (29, 0.016), (30, -0.052), (31, -0.001), (32, -0.044), (33, -0.004), (34, -0.013), (35, -0.002), (36, -0.053), (37, -0.046), (38, 0.047), (39, -0.023), (40, -0.003), (41, 0.038), (42, -0.012), (43, 0.005), (44, 0.066), (45, 0.032), (46, 0.004), (47, -0.051), (48, 0.0), (49, 0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97962379 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects


2 0.83554894 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

Introduction: Avi sent along this old paper from Bryk and Raudenbush, who write: The presence of heterogeneity of variance across groups indicates that the standard statistical model for treatment effects no longer applies. Specifically, the assumption that treatments add a constant to each subject’s development fails. An alternative model is required to represent how treatment effects are distributed across individuals. We develop in this article a simple statistical model to demonstrate the link between heterogeneity of variance and random treatment effects. Next, we illustrate with results from two previously published studies how a failure to recognize the substantive importance of heterogeneity of variance obscured significant results present in these data. The article concludes with a review and synthesis of techniques for modeling variances. Although these methods have been well established in the statistical literature, they are not widely known by social and behavioral scientists. T

3 0.8091535 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

Introduction: This time from Bernard Fraga and Eitan Hersh. Once you think about it, it’s hard to imagine any nonzero treatment effects that don’t vary. I’m glad to see this area of research becoming more prominent. (Here’s a discussion of another political science example, also of voter turnout, from a few years ago, from Avi Feller and Chris Holmes.) Some of my fragmentary work on varying treatment effects is here (Treatment Effects in Before-After Data) and here (Estimating Incumbency Advantage and Its Variation, as an Example of a Before–After Study).

4 0.75874603 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

Introduction: The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the sizes of effects (I actually prefer the term “comparisons” so as to avoid the causal baggage) rather than just their signs. I agree, and I wanted to elaborate a bit on a point that comes up in Funder’s discussion. He quotes an (unnamed) prominent social psychologist as writing: The key to our research . . . [is not] to accurately estimate effect size. If I were testing an advertisement for a marketing research firm and wanted to be sure that the cost of the ad would produce enough sales to make it worthwhile, effect size would be crucial. But when I am testing a theory about whether, say, positive mood reduces information processing in comparison with negative mood, I am worried abou

5 0.74763358 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?

Introduction: Let’s say you are repeatedly going to receive unselected sets of well-done RCTs on various, say, medical treatments. One reasonable assumption with all of these treatments is that they are monotonic – either helpful or harmful for all. The treatment effect will (as always) vary for subgroups in the population – these will not be explicitly identified in the studies – but each study very likely will enroll different percentages of the various patient subgroups. Being all randomized studies, these subgroups will be balanced in the treatment versus control arms – but each study will (as always) be estimating a different – but exchangeable – treatment effect (exchangeable due to the ignorance about the subgroup memberships of the enrolled patients). That reasonable assumption – monotonicity – will be to some extent (as always) wrong, but given that it is a risk believed well worth taking – if the average effect in any population is positive (versus negative) the average effect in any other

6 0.73626423 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

7 0.72179198 2097 andrew gelman stats-2013-11-11-Why ask why? Forward causal inference and reverse causal questions

8 0.71240515 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

9 0.71124262 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

10 0.71051115 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.

11 0.70838684 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

12 0.69987279 963 andrew gelman stats-2011-10-18-Question on Type M errors

13 0.69892371 393 andrew gelman stats-2010-11-04-Estimating the effect of A on B, and also the effect of B on A

14 0.68034708 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

15 0.66823548 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

16 0.6675809 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

17 0.66508734 2165 andrew gelman stats-2014-01-09-San Fernando Valley cityscapes: An example of the benefits of fractal devastation?

18 0.6632387 2227 andrew gelman stats-2014-02-27-“What Can we Learn from the Many Labs Replication Project?”

19 0.6625281 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?

20 0.66012043 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.411), (21, 0.01), (24, 0.141), (26, 0.012), (45, 0.018), (63, 0.021), (85, 0.025), (99, 0.232)]
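The LDA list above is sparse: only eight topics are shown. A short sketch of reading it, where the interpretation that unlisted topics fall below a reporting threshold is an assumption rather than something this page states:

```python
# (topicId, topicWeight) pairs copied from the lda listing for this post.
lda_weights = [(16, 0.411), (21, 0.01), (24, 0.141), (26, 0.012),
               (45, 0.018), (63, 0.021), (85, 0.025), (99, 0.232)]

# The topic carrying the largest share of the post.
dominant_topic, dominant_weight = max(lda_weights, key=lambda tw: tw[1])

# Total probability mass accounted for by the listed topics.
listed_mass = sum(weight for _, weight in lda_weights)

print(dominant_topic, dominant_weight, round(listed_mass, 3))
```

The listed weights sum to about 0.87 rather than 1, so roughly 13% of the topic mass is presumably spread over topics too small to be reported.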

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99228239 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

Introduction: In a discussion of workplace segregation, Philip Cohen posts some graphs that led me to a statistical question. I’ll pose my question below, but first the graphs: In a world of zero segregation of jobs by sex, the top graph above would have a spike at 50% (or, whatever the actual percentage is of women in the labor force) and, in the bottom graph, the pink and blue lines would be in the same place and would look like very steep S curves. The difference between the pink and blue lines represents segregation by job. One thing I wonder is how these graphs would change if we redefine occupation. (For example, is my occupation “mathematical scientist,” “statistician,” “teacher,” “university professor,” “statistics professor,” or “tenured statistics professor”?) Finer or coarser classification would give different results, and I wonder how this would work. This is not at all meant as a criticism of Cohen’s claims, it’s just a statistical question. I’m guessing that

2 0.98788673 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

Introduction: This is somebody’s dream job, I’m sure . . . ESPN is looking for a statistician to join the HR department as a Research Analyst . The job will consist of analytical research and producing statistics about the people that work at ESPN. Topics of interest will include productivity, efficiency, and retention of employees, among other items. In addition to data mining and producing reports, we also field surveys and analyze results. The position is located at the headquarters in Bristol, Connecticut, the same campus where nearly all ESPN shows are produced. ESPN is a Disney company, so discounts and free admission to Disney parks are available for employees. Flexible work arrangements are available, along with working in the New York City office part-time if desired. The role is a relatively new function and will have a high impact very quickly on helping the business function. Statistical software, text books, and any other resource needed to get the job done will be provided. T

3 0.97704601 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate

Introduction: Malecki asks: Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to. To connect to some of our recent themes, I agree this is a pretty horrible data display. But it’s not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren’t so bad. One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!), which is often the opposite of what we want in a clear graph. It’s Martin Amis vs. George Orwell all over again.

4 0.96834099 398 andrew gelman stats-2010-11-06-Quote of the day

Introduction: “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model.”

5 0.96599299 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?

Introduction: Jonathan Cantor points to this poll estimating rifle-armed QB Tim Tebow as America’s favorite pro athlete: In an ESPN survey of 1,502 Americans age 12 or older, three percent identified Tebow as their favorite professional athlete. Tebow finished in front of Kobe Bryant (2 percent), Aaron Rodgers (1.9 percent), Peyton Manning (1.8 percent), and Tom Brady (1.5 percent). Amusing. What this survey says to me is that there are no super-popular athletes who are active in America today. Which actually sounds about right. No Tiger Woods, no Magic Johnson, Muhammad Ali, John Elway, Pete Rose, Billie Jean King, etc. Tebow is an amusing choice, people might as well pick him now while he’s still on top. As a sports celeb, he’s like Bill Lee or the Refrigerator: colorful and a solid pro athlete, but no superstar. When you think about all the colorful superstar athletes of times gone by, it’s perhaps surprising that there’s nobody out there right now to play the role. I supp

6 0.96428156 1487 andrew gelman stats-2012-09-08-Animated drought maps

7 0.95564157 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!

8 0.95493478 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”

9 0.95462281 1025 andrew gelman stats-2011-11-24-Always check your evidence

10 0.95430911 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

11 0.95210308 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street

12 0.95132256 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data

13 0.9510529 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research

14 0.94943953 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram

15 0.9489525 1659 andrew gelman stats-2013-01-07-Some silly things you (didn’t) miss by not reading the sister blog

same-blog 16 0.94750738 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

17 0.94190943 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples

18 0.94128972 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi

19 0.93931019 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

20 0.93555301 609 andrew gelman stats-2011-03-13-Coauthorship norms