andrew gelman stats 2011-01-15 #518: Regression discontinuity designs: looking for the keys under the lamppost?
Source: html (knowledge-graph by maker-knowledge-mining)
Introduction: Jas sends along this paper (with Devin Caughey), entitled Regression-Discontinuity Designs and Popular Elections: Implications of Pro-Incumbent Bias in Close U.S. House Races, and writes:

The paper shows that regression discontinuity does not work for US House elections. Close House elections are anything but random. It isn’t election recounts or something like that (we collect recount data to show that it isn’t). We have collected much new data to try to hunt down what is going on (e.g., campaign finance data, CQ pre-election forecasts, correct many errors in the Lee dataset). The substantive implications are interesting. We also have a section that compares in detail Gelman and King versus the Lee estimand and estimator.

I had a few comments:

David Lee is not estimating the effect of incumbency; he’s estimating the effect of the incumbent party, which is a completely different thing. The regression discontinuity design is completely inappropriate for estimating the effect of incumbency. I guess I should’ve published this, but Lee’s idea seemed so evidently inappropriate (at least for the problem of estimating incumbency advantage in the U.S.) that it didn’t seem worth devoting effort to it. Leigh Linden did convince me that the estimate makes sense for India, though, where it really does seem to make more sense to think of incumbency as a property of a political party rather than of an individual legislator.

Gary King and I distinguished between the incumbency effect and the incumbent-party effect in our 1990 AJPS paper, where we explicitly laid out the causal inference. Finally, the most sophisticated analysis of incumbency advantage and its variation that I know of is my recent JASA paper with Huang.

In short, we’re estimating the effect of incumbency; Lee is estimating the effect of incumbent party. You can see this through a thought experiment in which all congressmembers are term-limited to serve only two years. There would then never be any incumbents running for reelection (thus, no incumbency effect), but there would be an incumbent party effect. And, correspondingly, the Gelman and Huang (or Gelman and King) estimate of incumbency advantage would be undefined, but the Lee estimator of the incumbent-party effect would work just fine.
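The thought experiment can be sketched in a small simulation. Everything here is invented for illustration (the `party_boost` size, the district-lean distribution, and the bandwidth are assumptions, not estimates from any real data): under a universal two-year term limit no incumbent ever runs again, so a personal-incumbency estimand is undefined, yet a Lee-style discontinuity comparison still recovers the incumbent-party effect.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=400_000, party_boost=0.05, noise=0.03):
    """Two successive elections in n districts under a universal two-year
    term limit: every candidate is new (so there can be no *personal*
    incumbency effect), but the party holding the seat receives
    `party_boost` of vote share (the incumbent-PARTY effect).
    All parameters are invented for illustration."""
    lean = rng.uniform(0.35, 0.65, n)                       # district Dem lean
    dem_t0 = np.clip(lean + rng.normal(0, noise, n), 0, 1)  # first election
    dem_won = dem_t0 > 0.5
    dem_t1 = np.clip(lean + party_boost * dem_won           # second election
                     + rng.normal(0, noise, n), 0, 1)
    return dem_t0, dem_t1, dem_won

def lee_rd_estimate(dem_t0, dem_t1, bandwidth=0.005):
    """Lee-style RD: next-election Dem vote share in seats barely won
    vs. barely lost.  This recovers the incumbent-party effect."""
    close = np.abs(dem_t0 - 0.5) < bandwidth
    above = close & (dem_t0 > 0.5)
    below = close & (dem_t0 <= 0.5)
    return dem_t1[above].mean() - dem_t1[below].mean()

def personal_incumbency_estimate(runs_again, vote_t0, vote_t1):
    """A Gelman-King-style estimand needs incumbents actually seeking
    reelection; with a universal term limit there are none, so the
    quantity is simply undefined."""
    if not runs_again.any():
        return None                                         # undefined
    return (vote_t1[runs_again] - vote_t0[runs_again]).mean()

dem_t0, dem_t1, dem_won = simulate()
rd = lee_rd_estimate(dem_t0, dem_t1)                        # ~ party_boost
nobody_runs_again = np.zeros(len(dem_t0), dtype=bool)       # term limits bind
inc = personal_incumbency_estimate(nobody_runs_again, dem_t0, dem_t1)
print(rd, inc)
```

The naive difference-of-means version shown here also picks up a little selection on district lean within the bandwidth, which is exactly why practical RD estimators fit trends on each side rather than comparing raw means.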
Perhaps you should make your title and abstract more general, since really the key contribution of your paper is not about incumbency (since, as you point out, the different estimates happen to be pretty similar for the U.S.).

The implication I get from the beginning of the paper is that, if only the RD assumptions were valid, Lee’s estimate would be just fine. Later on, you clarify that there are tradeoffs: basically, Lee is attempting to trade off validity for reliability. (I say that he’s trading off validity because I don’t think anyone would really consider the incumbent-party effect as an incumbency effect.) And a connection to the concepts of reliability and validity might be useful. That’s a general issue in causal inference: do you want a biased, assumption-laden estimate of the actual quantity of interest, or a crisp randomized estimate of something that’s vaguely related that you happen to have an experiment (or natural experiment) on?

Also, I really, really don’t recommend fitting fourth-order polynomials. I suppose I should write a paper about this; maybe you’d be interested in collaborating on such an effort?
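The warning about fourth-order polynomials can be illustrated with a minimal sketch (sample sizes, noise level, and bandwidth are all invented for illustration): simulate a smooth outcome with no discontinuity at the cutoff, then compare the jumps "found" by global quartics on each side against a modest local linear fit. The global quartic, despite using far more data, gives noisier estimates right at the cutoff, which is where RD needs them.

```python
import numpy as np

rng = np.random.default_rng(1)

def rd_jump(x, y, degree, bandwidth=None):
    """Estimated jump at x = 0 from separate polynomial fits on each
    side of the cutoff, optionally restricted to |x| < bandwidth."""
    if bandwidth is not None:
        keep = np.abs(x) < bandwidth
        x, y = x[keep], y[keep]
    left, right = x < 0, x >= 0
    c_left = np.polynomial.polynomial.polyfit(x[left], y[left], degree)
    c_right = np.polynomial.polynomial.polyfit(x[right], y[right], degree)
    return c_right[0] - c_left[0]   # difference of fitted values at x = 0

def one_draw(n=1000, sigma=0.5):
    """Smooth outcome with NO discontinuity at the cutoff, so any
    estimated jump is pure estimation error."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(2.5 * x) + rng.normal(0, sigma, n)
    return x, y

quartic, local_linear = [], []
for _ in range(300):
    x, y = one_draw()
    quartic.append(rd_jump(x, y, degree=4))
    local_linear.append(rd_jump(x, y, degree=1, bandwidth=0.3))

# Polynomial regression is least reliable at the boundary, and degree-4
# fits inflate that boundary variance relative to a local linear fit.
print(np.std(quartic), np.std(local_linear))
```

The design choice here mirrors the standard advice: estimate the jump from a trend fit within a window around the cutoff rather than from high-order global polynomials whose behavior at the boundary is erratic.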