andrew gelman stats 2011-01-15 #518: Regression discontinuity designs: looking for the keys under the lamppost?
Source: html (knowledge-graph by maker-knowledge-mining)
Introduction: Jas sends along this paper (with Devin Caughey), entitled Regression-Discontinuity Designs and Popular Elections: Implications of Pro-Incumbent Bias in Close U.S. House Races, and writes:

The paper shows that regression discontinuity does not work for US House elections. Close House elections are anything but random. It isn’t election recounts or something like that (we collect recount data to show that it isn’t). We have collected much new data to try to hunt down what is going on (e.g., campaign finance data, CQ pre-election forecasts, correct many errors in the Lee dataset). The substantive implications are interesting. We also have a section that compares in detail Gelman and King versus the Lee estimand and estimator.

I had a few comments:

David Lee is not estimating the effect of incumbency; he’s estimating the effect of the incumbent party, which is a completely different thing. The regression discontinuity design is completely inappropriate for estimating the effect of incumbency. I guess I should’ve published this, but Lee’s idea seemed so evidently inappropriate (at least for the problem of estimating incumbency advantage in the U.S.) that it didn’t seem worth devoting effort to it. Leigh Linden did convince me that the estimate makes sense for India, though, where it really does seem to make more sense to think of incumbency as a property of a political party rather than of an individual legislator.

Gary King and I distinguished between the incumbency effect and the incumbent-party effect in our 1990 AJPS paper, where we explicitly laid out the causal inference. Finally, the most sophisticated analysis of incumbency advantage and its variation that I know of is my recent JASA paper with Huang.

In short, we’re estimating the effect of incumbency; Lee is estimating the effect of incumbent party. You can see this through a thought experiment in which all congressmembers are term-limited to serve only two years. There would then never be any incumbents running for reelection (thus, no incumbency effect), but there would be an incumbent party effect. And, correspondingly, the Gelman and Huang (or Gelman and King) estimate of incumbency advantage would be undefined, but the Lee estimator of the incumbent-party effect would work just fine.
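The thought experiment can be sketched in a small simulation. Everything here is invented for illustration (the `party_boost` size, the district-lean distribution, and the bandwidth are assumptions, not estimates from any real data): under a universal two-year term limit no incumbent ever runs again, so a personal-incumbency estimand is undefined, yet a Lee-style discontinuity comparison still recovers the incumbent-party effect.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=400_000, party_boost=0.05, noise=0.03):
    """Two successive elections in n districts under a universal two-year
    term limit: every candidate is new (so there can be no *personal*
    incumbency effect), but the party holding the seat receives
    `party_boost` of vote share (the incumbent-PARTY effect).
    All parameters are invented for illustration."""
    lean = rng.uniform(0.35, 0.65, n)                       # district Dem lean
    dem_t0 = np.clip(lean + rng.normal(0, noise, n), 0, 1)  # first election
    dem_won = dem_t0 > 0.5
    dem_t1 = np.clip(lean + party_boost * dem_won           # second election
                     + rng.normal(0, noise, n), 0, 1)
    return dem_t0, dem_t1, dem_won

def lee_rd_estimate(dem_t0, dem_t1, bandwidth=0.005):
    """Lee-style RD: next-election Dem vote share in seats barely won
    vs. barely lost.  This recovers the incumbent-party effect."""
    close = np.abs(dem_t0 - 0.5) < bandwidth
    above = close & (dem_t0 > 0.5)
    below = close & (dem_t0 <= 0.5)
    return dem_t1[above].mean() - dem_t1[below].mean()

def personal_incumbency_estimate(runs_again, vote_t0, vote_t1):
    """A Gelman-King-style estimand needs incumbents actually seeking
    reelection; with a universal term limit there are none, so the
    quantity is simply undefined."""
    if not runs_again.any():
        return None                                         # undefined
    return (vote_t1[runs_again] - vote_t0[runs_again]).mean()

dem_t0, dem_t1, dem_won = simulate()
rd = lee_rd_estimate(dem_t0, dem_t1)                        # ~ party_boost
nobody_runs_again = np.zeros(len(dem_t0), dtype=bool)       # term limits bind
inc = personal_incumbency_estimate(nobody_runs_again, dem_t0, dem_t1)
print(rd, inc)
```

The naive difference-of-means version shown here also picks up a little selection on district lean within the bandwidth, which is exactly why practical RD estimators fit trends on each side rather than comparing raw means.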
Perhaps you should make your title and abstract more general, since really the key contribution of your paper is not about incumbency (since, as you point out, the different estimates happen to be pretty similar for the U.S.).

The implication I get from the beginning of the paper is that, if only the RD assumptions were valid, Lee’s estimate would be just fine. Later on, you clarify that there are tradeoffs: basically, Lee is attempting to trade off validity for reliability. (I say that he’s trading off validity because I don’t think anyone would really consider the incumbent-party effect as an incumbency effect.) And a connection to the concepts of reliability and validity might be useful. That’s a general issue in causal inference: do you want a biased, assumption-laden estimate of the actual quantity of interest, or a crisp randomized estimate of something that’s vaguely related that you happen to have an experiment (or natural experiment) on?

Also, I really, really don’t recommend fitting fourth-order polynomials. I suppose I should write a paper about this; maybe you’d be interested in collaborating on such an effort?
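The warning about fourth-order polynomials can be illustrated with a minimal sketch (sample sizes, noise level, and bandwidth are all invented for illustration): simulate a smooth outcome with no discontinuity at the cutoff, then compare the jumps "found" by global quartics on each side against a modest local linear fit. The global quartic, despite using far more data, gives noisier estimates right at the cutoff, which is where RD needs them.

```python
import numpy as np

rng = np.random.default_rng(1)

def rd_jump(x, y, degree, bandwidth=None):
    """Estimated jump at x = 0 from separate polynomial fits on each
    side of the cutoff, optionally restricted to |x| < bandwidth."""
    if bandwidth is not None:
        keep = np.abs(x) < bandwidth
        x, y = x[keep], y[keep]
    left, right = x < 0, x >= 0
    c_left = np.polynomial.polynomial.polyfit(x[left], y[left], degree)
    c_right = np.polynomial.polynomial.polyfit(x[right], y[right], degree)
    return c_right[0] - c_left[0]   # difference of fitted values at x = 0

def one_draw(n=1000, sigma=0.5):
    """Smooth outcome with NO discontinuity at the cutoff, so any
    estimated jump is pure estimation error."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(2.5 * x) + rng.normal(0, sigma, n)
    return x, y

quartic, local_linear = [], []
for _ in range(300):
    x, y = one_draw()
    quartic.append(rd_jump(x, y, degree=4))
    local_linear.append(rd_jump(x, y, degree=1, bandwidth=0.3))

# Polynomial regression is least reliable at the boundary, and degree-4
# fits inflate that boundary variance relative to a local linear fit.
print(np.std(quartic), np.std(local_linear))
```

The design choice here mirrors the standard advice: estimate the jump from a trend fit within a window around the cutoff rather than from high-order global polynomials whose behavior at the boundary is erratic.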