1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population


Introduction: Jeff Walker writes: Your blog has skirted around the value of observational studies and chided folks for using causal language when they only have associations but I sense that you ultimately find value in these associations. I would love for you to expand this thought in a blog. Specifically: Does a measured association “suggest” a causal relationship? Are measured associations a good and efficient way to narrow the field of things that should be studied? Of all the things we should pursue, should we start with the stuff that has some largish measured association? Certainly many associations are not directly causal but due to joint association. Similarly, there must be many variables that are directly causally associated (A -> B) but the effect, measured as an association, is masked by confounders. So if we took the “measured associations are worthwhile” approach, we’d never or rarely find the masked effects. But I’d also like to know if one is more likely to find a large causal


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Jeff Walker writes: Your blog has skirted around the value of observational studies and chided folks for using causal language when they only have associations but I sense that you ultimately find value in these associations. [sent-1, score-1.256]

2 I would love for you to expand this thought in a blog. [sent-2, score-0.094]

3 Specifically: Does a measured association “suggest” a causal relationship? [sent-3, score-0.949]

4 Are measured associations a good and efficient way to narrow the field of things that should be studied? [sent-4, score-0.891]

5 Of all the things we should pursue, should we start with the stuff that has some largish measured association? [sent-5, score-0.364]

6 Certainly many associations are not directly causal but due to joint association. [sent-6, score-0.813]

7 Similarly, there must be many variables that are directly causally associated (A -> B) but the effect, measured as an association, is masked by confounders. [sent-7, score-0.869]

8 So if we took the “measured associations are worthwhile” approach, we’d never or rarely find the masked effects. [sent-8, score-0.825]

9 But I’d also like to know if one is more likely to find a large causal effect given some association, so the association makes a good “working hypothesis”. [sent-9, score-0.756]

10 Effectively I’m asking, are observational studies worth the time and effort or would we be better to limit ourselves to experimental systems? [sent-11, score-0.31]

11 My response: I like Don Rubin’s take on this, which is that if you want to go from association to causation, state very clearly what the assumptions are for this step to work. [sent-12, score-0.594]

12 The clear statement of these assumptions can be helpful in moving forward (here’s an example from my own work, with Gary King). [sent-13, score-0.178]

13 Another way to say this is that all inference is about generalizing from sample to population, to predicting the outcomes of hypothetical interventions on new cases. [sent-14, score-0.397]

14 Even a perfectly clean randomized experiment is typically of interest only to the extent that it generalizes to new people not included in the original study. [sent-16, score-0.397]
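
Sentence 7 above describes a direct causal effect (A -> B) whose measured association is masked by confounders. The following is a minimal simulation sketch of that situation, under assumptions that are not in the original post: a single observed confounder U, linear effects, Gaussian noise, and illustrative coefficients chosen so that the masking is essentially complete. The function names (slope, residuals, simulate) are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (fit includes an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after a least-squares fit on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, effect=1.0, seed=1):
    """Return (naive slope of B on A, slope after adjusting for U)."""
    rng = random.Random(seed)
    # Assumed confounder: U pushes A up and pushes B down.
    u = [rng.gauss(0, 1) for _ in range(n)]
    a = [ui + rng.gauss(0, 1) for ui in u]
    # True causal effect of A on B is `effect`; the -2 * effect * U term
    # is an illustrative choice that cancels the A-B covariance.
    b = [effect * ai - 2.0 * effect * ui + rng.gauss(0, 1)
         for ai, ui in zip(a, u)]
    naive = slope(a, b)
    # Adjust for U by removing it from both A and B, then regressing
    # the residuals on each other.
    adjusted = slope(residuals(u, a), residuals(u, b))
    return naive, adjusted


naive, adjusted = simulate()
print("naive slope of B on A:", round(naive, 3))
print("slope after adjusting for U:", round(adjusted, 3))
```

With these settings the naive slope should come out near zero even though the simulated effect is 1, and the adjusted slope should land close to 1. The adjustment only works here because U is observed and the model is linear, which is exactly the sort of assumption that has to be stated clearly before going from association to causation.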


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('association', 0.377), ('associations', 0.372), ('measured', 0.364), ('masked', 0.285), ('causal', 0.208), ('observational', 0.147), ('causally', 0.122), ('generalizes', 0.117), ('assumptions', 0.113), ('walker', 0.107), ('clearly', 0.104), ('escape', 0.102), ('worthwhile', 0.102), ('generalizing', 0.099), ('directly', 0.098), ('leap', 0.096), ('pursue', 0.096), ('causation', 0.096), ('value', 0.095), ('find', 0.094), ('folks', 0.094), ('expand', 0.094), ('interventions', 0.092), ('studies', 0.089), ('narrow', 0.082), ('king', 0.079), ('effectively', 0.077), ('effect', 0.077), ('gary', 0.077), ('hypothetical', 0.076), ('rarely', 0.074), ('limit', 0.074), ('randomized', 0.074), ('perfectly', 0.074), ('joint', 0.074), ('efficient', 0.073), ('systems', 0.072), ('studied', 0.07), ('clean', 0.07), ('relationship', 0.07), ('rubin', 0.068), ('predicting', 0.068), ('jeff', 0.067), ('specifically', 0.066), ('moving', 0.065), ('included', 0.062), ('language', 0.062), ('outcomes', 0.062), ('asking', 0.062), ('due', 0.061)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population

2 0.18524803 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

Introduction: Elias Bareinboim asked what I thought about his comment on selection bias in which he referred to a paper by himself and Judea Pearl, “Controlling Selection Bias in Causal Inference.” I replied that I have no problem with what he wrote, but that from my perspective I find it easier to conceptualize such problems in terms of multilevel models. I elaborated on that point in a recent post , “Hierarchical modeling as a framework for extrapolation,” which I think was read by only a few people (I say this because it received only two comments). I don’t think Bareinboim objected to anything I wrote, but like me he is comfortable working within his own framework. He wrote the following to me: In some sense, “not ad hoc” could mean logically consistent. In other words, if one agrees with the assumptions encoded in the model, one must also agree with the conclusions entailed by these assumptions. I am not aware of any other way of doing mathematics. As it turns out, to get causa

3 0.17358525 1939 andrew gelman stats-2013-07-15-Forward causal reasoning statements are about estimation; reverse causal questions are about model checking and hypothesis generation

Introduction: Consider two broad classes of inferential questions : 1. Forward causal inference . What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth? 2. Reverse causal inference . What causes Y? Why do more attractive people earn more money? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse? When statisticians and econometricians write about causal inference, they focus on forward causal questions. Rubin always told us: Never ask Why? Only ask What if? And, from the econ perspective, causation is typically framed in terms of manipulations: if x had changed by 1, how much would y be expected to change, holding all else constant? But reverse causal questions are important too. They’re a natural way to think (consider the importance of the word “Why”) and are arguably more important than forward questions.

4 0.15178926 2268 andrew gelman stats-2014-03-26-New research journal on observational studies

Introduction: Dylan Small writes: I am starting an observational studies journal that aims to publish papers on all aspects of observational studies, including study protocols for observational studies, methodologies for observational studies, descriptions of data sets for observational studies, software for observational studies and analyses of observational studies. One of the goals of the journal is to promote the planning of observational studies and to publish study plans for observational studies, like study plans are published for major clinical trials. Regular readers will know my suggestion that scientific journals move away from the idea of being unique publishers of new material and move toward a “newsletter” approach, recommending papers from Arxiv, SSRN, etc. So, instead of going through exhausting review and revision processes, the journal editors would read and review recent preprints on observational studies and then, each month or quarter or whatever, produce a list of pap

5 0.12740441 1305 andrew gelman stats-2012-05-07-Happy news on happiness; what can we believe?

Introduction: Sharon Jayson writes : The conventional wisdom that’s developed over the past few decades — based on early research — has said parents are less happy, more depressed and have less-satisfying marriages than their childless counterparts. But now, two new studies presented as part of the Population Association of America’s annual meeting suggest that earlier findings in several studies weren’t so clear-cut and may, in fact, be flawed. The newer analyses presented this week use analytical methods based on data from almost 130,000 adults around the globe — including more than 52,000 parents — and the conclusions aren’t so grim. They say that parents today may indeed be happier than non-parents and that parental happiness levels — while they do drop — don’t dip below the levels they were before having children. . . . The other study, of some 120,000 adults from two nationally representative surveys between 1972-2008, finds that parents were indeed less happy than non-parents in the d

6 0.12119806 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”

7 0.12100846 879 andrew gelman stats-2011-08-29-New journal on causal inference

8 0.11581943 935 andrew gelman stats-2011-10-01-When should you worry about imputed data?

9 0.11068029 1191 andrew gelman stats-2012-03-01-Hoe noem je?

10 0.10674889 785 andrew gelman stats-2011-07-02-Experimental reasoning in social science

11 0.1065761 340 andrew gelman stats-2010-10-13-Randomized experiments, non-randomized experiments, and observational studies

12 0.10598314 2170 andrew gelman stats-2014-01-13-Judea Pearl overview on causal inference, and more general thoughts on the reexpression of existing methods by considering their implicit assumptions

13 0.1037636 460 andrew gelman stats-2010-12-09-Statistics gifts?

14 0.099704385 807 andrew gelman stats-2011-07-17-Macro causality

15 0.096680015 1802 andrew gelman stats-2013-04-14-Detecting predictability in complex ecosystems

16 0.096289195 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations

17 0.094314769 1624 andrew gelman stats-2012-12-15-New prize on causality in statstistics education

18 0.090253629 1778 andrew gelman stats-2013-03-27-My talk at the University of Michigan today 4pm

19 0.089985281 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

20 0.087083757 1732 andrew gelman stats-2013-02-22-Evaluating the impacts of welfare reform?
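
The simValue column above ranks posts by similarity of their tfidf vectors. The page does not say which tf-idf weighting or similarity measure was used, so the following is only a sketch under assumptions: whitespace tokenization, a smoothed idf, length-normalized vectors, and cosine similarity. The three toy documents and the function names (tfidf_vectors, cosine) are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one unit-length {word: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times a smoothed inverse document frequency
        # (an assumed variant; other weightings are common).
        vec = {
            word: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            for word, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({word: w / norm for word, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


docs = [
    "measured association causal effect masked by confounders",
    "causal inference from observational studies and association",
    "baby names popularity letters",
]
vecs = tfidf_vectors(docs)
sims = [cosine(vecs[0], v) for v in vecs]
for index, value in enumerate(sims):
    print(index, round(value, 3))
```

Under these assumptions a document compared with itself scores 1.0, which matches the same-blog row in the tfidf list, and documents sharing no words score 0.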


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.162), (1, 0.005), (2, 0.023), (3, -0.083), (4, -0.002), (5, -0.002), (6, -0.028), (7, 0.009), (8, 0.063), (9, 0.019), (10, -0.039), (11, 0.018), (12, 0.034), (13, -0.003), (14, 0.022), (15, 0.017), (16, 0.005), (17, 0.018), (18, -0.044), (19, 0.047), (20, -0.015), (21, -0.068), (22, 0.058), (23, 0.029), (24, 0.088), (25, 0.134), (26, 0.036), (27, -0.012), (28, 0.023), (29, 0.069), (30, 0.029), (31, -0.055), (32, 0.004), (33, -0.005), (34, -0.041), (35, 0.026), (36, 0.037), (37, -0.004), (38, -0.02), (39, 0.014), (40, -0.019), (41, -0.016), (42, 0.017), (43, -0.02), (44, 0.011), (45, -0.035), (46, 0.035), (47, 0.014), (48, -0.007), (49, -0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96429801 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population

2 0.85704792 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”

Introduction: Macartan Humphreys pointed me to this excellent guide . Here are the 10 items: 1. A causal claim is a statement about what didn’t happen. 2. There is a fundamental problem of causal inference. 3. You can estimate average causal effects even if you cannot observe any individual causal effects. 4. If you know that, on average, A causes B and that B causes C, this does not mean that you know that A causes C. 5. The counterfactual model is all about contribution, not attribution. 6. X can cause Y even if there is no “causal path” connecting X and Y. 7. Correlation is not causation. 8. X can cause Y even if X is not a necessary condition or a sufficient condition for Y. 9. Estimating average causal effects does not require that treatment and control groups are identical. 10. There is no causation without manipulation. The article follows with crisp discussions of each point. My favorite is item #6, not because it’s the most important but because it brings in some real s

3 0.8172121 2286 andrew gelman stats-2014-04-08-Understanding Simpson’s paradox using a graph

Introduction: Joshua Vogelstein pointed me to this post by Michael Nielsen on how to teach Simpson’s paradox. I don’t know if Nielsen (and others) are aware that people have developed some snappy graphical methods for displaying Simpson’s paradox (and, more generally, aggregation issues). We do some of this in our Red State Blue State book, but before that was the BK plot, named by Howard Wainer after a 2001 paper by Stuart Baker and Barnett Kramer, although it apparently appeared earlier in a 1987 paper by Jeon, Chung, and Bae, and doubtless was made by various other people before then. Here’s Wainer’s graphical explication from 2002 (adapted from Baker and Kramer’s 2001 paper): Here’s the version from our 2007 article (with Boris Shor, Joe Bafumi, and David Park): But I recommend Wainer’s article (linked to above) as the first thing to read on the topic of presenting aggregation paradoxes in a clear and grabby way. P.S. Robert Long writes in: I noticed your post ab

4 0.81193352 340 andrew gelman stats-2010-10-13-Randomized experiments, non-randomized experiments, and observational studies

Introduction: In the spirit of Dehejia and Wahba: Three Conditions under Which Experiments and Observational Studies Produce Comparable Causal Estimates: New Findings from Within-Study Comparisons , by Cook, Shadish, and Wong. Can Nonrandomized Experiments Yield Accurate Answers? A Randomized Experiment Comparing Random and Nonrandom Assignments, by Shadish, Clark, and Steiner. I just talk about causal inference. These people do it. The second link above is particularly interesting because it includes discussions by some causal inference heavyweights. WWJD and all that.

5 0.80094343 1939 andrew gelman stats-2013-07-15-Forward causal reasoning statements are about estimation; reverse causal questions are about model checking and hypothesis generation

6 0.80014682 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

7 0.79991841 1802 andrew gelman stats-2013-04-14-Detecting predictability in complex ecosystems

8 0.7866134 1492 andrew gelman stats-2012-09-11-Using the “instrumental variables” or “potential outcomes” approach to clarify causal thinking

9 0.77633339 1136 andrew gelman stats-2012-01-23-Fight! (also a bit of reminiscence at the end)

10 0.76499969 1732 andrew gelman stats-2013-02-22-Evaluating the impacts of welfare reform?

11 0.74801624 807 andrew gelman stats-2011-07-17-Macro causality

12 0.74484932 550 andrew gelman stats-2011-02-02-An IV won’t save your life if the line is tangled

13 0.73967254 393 andrew gelman stats-2010-11-04-Estimating the effect of A on B, and also the effect of B on A

14 0.72692484 1624 andrew gelman stats-2012-12-15-New prize on causality in statstistics education

15 0.72616005 1888 andrew gelman stats-2013-06-08-New Judea Pearl journal of causal inference

16 0.72299689 2170 andrew gelman stats-2014-01-13-Judea Pearl overview on causal inference, and more general thoughts on the reexpression of existing methods by considering their implicit assumptions

17 0.71617168 287 andrew gelman stats-2010-09-20-Paul Rosenbaum on those annoying pre-treatment variables that are sort-of instruments and sort-of covariates

18 0.71251112 2097 andrew gelman stats-2013-11-11-Why ask why? Forward causal inference and reverse causal questions

19 0.71147048 879 andrew gelman stats-2011-08-29-New journal on causal inference

20 0.70598292 1133 andrew gelman stats-2012-01-21-Judea Pearl on why he is “only a half-Bayesian”


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(4, 0.081), (8, 0.012), (10, 0.011), (16, 0.072), (21, 0.036), (24, 0.159), (28, 0.02), (55, 0.013), (77, 0.041), (86, 0.045), (95, 0.069), (99, 0.338)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.978387 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population

2 0.97598696 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff

Introduction: Ubs writes: I was looking at baby name data last night and I stumbled upon something curious. I follow the baby names blog occasionally but not regularly, so I’m not sure if it’s been noticed before. Let me present it like this: Take the statement… Of the top 100 boys and top 100 girls names, only ___% contain the letter __. I’m using the SSA baby names page, so that’s U.S. births, and I’m looking at the decade of 2000-2009 (so kids currently aged 4 to 13). Which letters would you expect to have the lowest rate of occurrence? As expected, the lowest score is for Q, which appears zero times. (Jacqueline ranks #104 for girls.) It’s the second lowest that surprised me. (… You can pause and try to guess now. Spoilers to follow.) Of the other big-point Scrabble letters, Z appears in four names (Elizabeth, Zachary, Mackenzie, Zoe) and X in six, of which five are closely related (Alexis, Alexander, Alexandra, Alexa, Alex, Xavier). J is heavily overrepresented, especial

3 0.97447878 1918 andrew gelman stats-2013-06-29-Going negative

Introduction: Troels Ring writes: I have measured total phosphorus, TP, on a number of dialysis patients, and also measured conventional phosphate, Pi. Now P is exchanged with the environment as Pi, so in principle a correlation between TP and Pi could perhaps be expected. I’m really most interested in the fraction of TP which is not Pi, that is TP-Pi. I would also expect that to be positively correlated with Pi. However, looking at the data using a mixed model an insignificant negative correlation is obtained. Then I thought, that since TP-Pi is bound to be small if Pi is large a negative correlation is almost dictated by the math even if the biology would have it otherwise in so far as the TP-Pi, likely organic P, must someday have been Pi. Hence I thought about correcting the slight negative correlation between TP-Pi and Pi for the expected large negative correlation due to the math – to eventually recover what I came from: a positive correlation. People seem to agree that this thinki

4 0.96800125 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics

Introduction: John Salvatier pointed me to this blog on derivative based MCMC algorithms (also sometimes called “hybrid” or “Hamiltonian” Monte Carlo) and automatic differentiation as the future of MCMC. This all makes sense to me and is consistent both with my mathematical intuition from studying Metropolis algorithms and my experience with Matt using hybrid MCMC when fitting hierarchical spline models. In particular, I agree with Salvatier’s point about the potential for computation of analytic derivatives of the log-density function. As long as we’re mostly snapping together our models using analytically-simple pieces, the same part of the program that handles the computation of log-posterior densities should also be able to compute derivatives analytically. I’ve been a big fan of automatic derivative-based MCMC methods since I started hearing about them a couple years ago (I’m thinking of the DREAM project and of Mark Girolami’s paper), and I too wonder why they haven’t been used more. I

5 0.96796089 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?

Introduction: Mike Zyphur writes: Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. $5,000 prizes. I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature and altitude of cities in Germany, and they said that altitude causes temperature). It has the feeling to me of one of those weird standardized tests we used to see sometimes in school, where there’s no real correct answer so the goal is to figure out what the test-writer wanted you to say. Nonetheless, this might be of interest, so I’m passing it along to you.

6 0.96526343 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

7 0.964118 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

8 0.96347106 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)

9 0.96326107 907 andrew gelman stats-2011-09-14-Reproducibility in Practice

10 0.96277976 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

11 0.96190441 1070 andrew gelman stats-2011-12-19-The scope for snooping

12 0.96020138 238 andrew gelman stats-2010-08-27-No radon lobby

13 0.95960689 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?

14 0.95943528 1605 andrew gelman stats-2012-12-04-Write This Book

15 0.95943409 2341 andrew gelman stats-2014-05-20-plus ça change, plus c’est la même chose

16 0.95921618 113 andrew gelman stats-2010-06-28-Advocacy in the form of a “deliberative forum”

17 0.9575007 1829 andrew gelman stats-2013-04-28-Plain old everyday Bayesianism!

18 0.95743382 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

19 0.95694643 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?

20 0.95668393 1305 andrew gelman stats-2012-05-07-Happy news on happiness; what can we believe?