1136 andrew gelman stats-2012-01-23-Fight! (also a bit of reminiscence at the end)

Introduction: Martin Lindquist and Michael Sobel published a fun little article in Neuroimage on models and assumptions for causal inference with intermediate outcomes. As their subtitle indicates (“A response to the comments on our comment”), this is a topic of some controversy. Lindquist and Sobel write: Our original comment (Lindquist and Sobel, 2011) made explicit the types of assumptions neuroimaging researchers are making when directed graphical models (DGMs), which include certain types of structural equation models (SEMs), are used to estimate causal effects. When these assumptions, which many researchers are not aware of, are not met, parameters of these models should not be interpreted as effects. . . . [Judea] Pearl does not disagree with anything we stated. However, he takes exception to our use of potential outcomes notation, which is the standard notation used in the statistical literature on causal inference, and his comment is devoted to promoting his alternative conventions. [C

1 Martin Lindquist and Michael Sobel published a fun little article in Neuroimage on models and assumptions for causal inference with intermediate outcomes. [sent-1, score-0.406]

2 Lindquist and Sobel write: Our original comment (Lindquist and Sobel, 2011) made explicit the types of assumptions neuroimaging researchers are making when directed graphical models (DGMs), which include certain types of structural equation models (SEMs), are used to estimate causal effects. [sent-3, score-1.034]

3 However, he takes exception to our use of potential outcomes notation, which is the standard notation used in the statistical literature on causal inference, and his comment is devoted to promoting his alternative conventions. [sent-9, score-0.361]

4 Glymour is also more optimistic than us about the potential of using directed graphical models (DGMs) to discover causal relations in neuroimaging research. [sent-11, score-0.524]

5 They consider a causal setting z -> x -> y, where z is the treatment variable, x is the intermediate outcome, and y is the ultimate outcome, and much of their discussion centers on estimating the causal effect of x on y. [sent-15, score-0.515]

6 If x is an observed variable that is not directly manipulated, I don’t know if it makes sense to talk about the effect of x on y, unconditional on the intervention that was used to change x. [sent-17, score-0.197]

7 In their example, I’d talk about “the effect of x on y, if x is changed through z.” [sent-18, score-0.15]

8 Lindquist and Sobel talk about the effect of z on x. [sent-21, score-0.15]

9 If z=0 or 1, they write x(z), so that the causal effect of z on x is x(1) – x(0) (or, more generally, x(1) compared to x(0), but we lose nothing by considering simple differences here). [sent-22, score-0.39]

10 If x can equal 0 or 1, they write y(z,x), so that the causal effect of x on y, conditional on z, is y(z,1) – y(z,0). [sent-25, score-0.329]

11 I don’t find Pearl’s response to be so convincing—I agree with Lindquist and Sobel’s statement that the graphical or structural equation modeling expression looks simple and appealing but the underlying assumptions in those expressions are not so clear. [sent-38, score-0.6]

12 To be specific, Pearl contrasts three expressions of a single model, the causal chain Z -> X -> Y. [sent-40, score-0.286]

13 Here’s Pearl: Pearl characterizes the third expression as a more meaningful and clear display. [sent-41, score-0.149]

14 In contrast, Lindquist and Sobel argue that the above graphical expression appears clear only because it sweeps the model’s assumptions under the rug. [sent-42, score-0.404]

15 Speaking of clear and simple, I’m reminded of a scene, several decades ago, when a bunch of us on the county math team won some competition, and the prize was that we each got to choose one of several math books. [sent-44, score-0.154]

16 Which brings back another memory: our coach for the Mathematical Olympiad program was an unbelievably grumpy old man. [sent-48, score-0.162]

17 At one point he interrupted one of his lectures to rant about how all the calculus books now are wasting their space with applications. [sent-49, score-0.199]

18 That all seemed natural to me at the time but in retrospect I’m amazed by how brainwashed we all were. [sent-51, score-0.139]

19 The other thing I remember about the grumpy coach dude, besides his personality (which, in retrospect, was perhaps necessary to keep a bunch of 15-year-old boys in line; even nerds can make trouble), was that he thought it was cheating to use calculus or analytic geometry. [sent-55, score-0.318]

20 His favorite sorts of problems used elaborate arguments from classical geometry and he always felt we should be able to solve these without resorting to technical means. [sent-56, score-0.142]
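
As a minimal sketch of the potential-outcomes notation used in sentences 9 and 10 above, the snippet below stores x(z) and y(z,x) for a single hypothetical unit as lookup tables and computes the two simple differences, x(1) - x(0) and y(z,1) - y(z,0). The numeric values and the names `x_of_z`, `y_of_zx`, `effect_z_on_x`, and `effect_x_on_y_given_z` are invented for illustration and are not taken from Lindquist and Sobel's article.

```python
# Hypothetical potential outcomes for a single unit (illustrative values only).
# x_of_z[z] is x(z): the intermediate outcome under treatment z in {0, 1}.
x_of_z = {0: 0, 1: 1}

# y_of_zx[(z, x)] is y(z, x): the ultimate outcome under treatment z and
# intermediate outcome x in {0, 1}.
y_of_zx = {(0, 0): 2.0, (0, 1): 3.5, (1, 0): 2.5, (1, 1): 5.0}


def effect_z_on_x(x_of_z):
    """Causal effect of z on x as a simple difference: x(1) - x(0)."""
    return x_of_z[1] - x_of_z[0]


def effect_x_on_y_given_z(y_of_zx, z):
    """Causal effect of x on y, conditional on z: y(z, 1) - y(z, 0)."""
    return y_of_zx[(z, 1)] - y_of_zx[(z, 0)]


if __name__ == "__main__":
    print("x(1) - x(0) =", effect_z_on_x(x_of_z))
    for z in (0, 1):
        print(f"y({z},1) - y({z},0) =", effect_x_on_y_given_z(y_of_zx, z))
```

In this made-up example the effect of x on y is 1.5 when z=0 and 2.5 when z=1, which illustrates why the definition in sentence 10 is stated conditional on z.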

1 0.94886982 2314 andrew gelman stats-2014-05-01-Heller, Heller, and Gorfine on univariate and multivariate information measures

Introduction: Malka Gorfine writes: We noticed that the important topic of association measures and tests came up again in your blog, and we have a few comments in this regard. It is useful to distinguish between the univariate and multivariate methods. A consistent multivariate method can recognise dependence between two vectors of random variables, while a univariate method can only loop over pairs of components and check for dependency between them. There are very few consistent multivariate methods. To the best of our knowledge there are three practical methods: 1) HSIC by Gretton et al.; 2) dcov by Szekely et al.; 3) a method we introduced in Heller et al. (Biometrika, 2013, 503-510), and an R package, HHG, is available as well. A

2 0.94491851 309 andrew gelman stats-2010-10-01-Why Development Economics Needs Theory?

Introduction: Robert Neumann writes: In the JEP 24(3), page 18, Daron Acemoglu states: Why Development Economics Needs Theory There is no general agreement on how much we should rely on economic theory in motivating empirical work and whether we should try to formulate and estimate “structural parameters.” I (Acemoglu) argue that the answer is largely “yes” because otherwise econometric estimates would lack external validity, in which case they can neither inform us about whether a particular model or theory is a useful approximation to reality, nor would they be useful in providing us guidance on what the effects of similar shocks and policies would be in different circumstances or if implemented in different scales. I therefore define “structural parameters” as those that provide external validity and would thus be useful in testing theories or in policy analysis beyond the specific environment and sample from which they are derived. External validity becomes a particularly challenging t

3 0.94434029 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures

Introduction: Malka Gorfine, Ruth Heller, and Yair Heller write a comment on the paper of Reshef et al. that we discussed a few months ago. Just to remind you what’s going on here, here’s my quick summary from December: Reshef et al. propose a new nonlinear R-squared-like measure. Unlike R-squared, this new method depends on a tuning parameter that controls the level of discretization, in a “How long is the coast of Britain” sort of way. The dependence on scale is inevitable for such a general method. Just consider: if you sample 1000 points from the unit bivariate normal distribution, (x,y) ~ N(0,I), you’ll be able to fit them perfectly by a 999-degree polynomial fit to the data. So the scale of the fit matters. The clever idea of the paper is that, instead of going for an absolute measure (which, as we’ve seen, will be scale-dependent), they focus on the problem of summarizing the grid of pairwise dependences in a large set of variables. As they put it: “Imagine a data set with hundreds

4 0.94121635 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’

Introduction: False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant [I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both? It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at leas

same-blog 5 0.93452197 1136 andrew gelman stats-2012-01-23-Fight! (also a bit of reminiscence at the end)


6 0.9265511 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting

7 0.92286336 1616 andrew gelman stats-2012-12-10-John McAfee is a Heinlein hero

8 0.9180541 2324 andrew gelman stats-2014-05-07-Once more on nonparametric measures of mutual information

9 0.90710896 1362 andrew gelman stats-2012-06-03-Question 24 of my final exam for Design and Analysis of Sample Surveys

10 0.90566367 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

11 0.90318346 1076 andrew gelman stats-2011-12-21-Derman, Rodrik and the nature of statistical models

12 0.89601481 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions

13 0.89072299 1467 andrew gelman stats-2012-08-23-The pinch-hitter syndrome again

14 0.88633823 1591 andrew gelman stats-2012-11-26-Politics as an escape hatch

15 0.8835218 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

16 0.88120991 2136 andrew gelman stats-2013-12-16-Whither the “bet on sparsity principle” in a nonsparse world?

17 0.87420458 1383 andrew gelman stats-2012-06-18-Hierarchical modeling as a framework for extrapolation

18 0.87250733 1272 andrew gelman stats-2012-04-20-More proposals to reform the peer-review system

19 0.86931157 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations

20 0.8675167 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks