1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?


Introduction: Mike Zyphur writes: Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. $5,000 prizes. I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature and altitude of cities in Germany, and they said that altitude causes temperature). It has the feeling to me of one of those weird standardized tests we used to see sometimes in school, where there’s no real correct answer so the goal is to figure out what the test-writer wanted you to say. Nonetheless, this might be of interest, so I’m passing it along to you.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. [sent-2, score-0.651]

2 They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. [sent-3, score-0.64]

3 I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature and altitude of cities in Germany, and they said that altitude causes temperature). [sent-5, score-2.412]

4 It has the feeling to me of one of those weird standardized tests we used to see sometimes in school, where there’s no real correct answer so the goal is to figure out what the test-writer wanted you to say. [sent-6, score-1.26]

5 Nonetheless, this might be of interest, so I’m passing it along to you. [sent-7, score-0.288]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('altitude', 0.532), ('temperature', 0.297), ('determine', 0.258), ('zyphur', 0.242), ('germany', 0.187), ('launched', 0.178), ('variables', 0.177), ('standardized', 0.176), ('passing', 0.165), ('nonetheless', 0.162), ('cities', 0.16), ('competition', 0.143), ('weird', 0.143), ('mike', 0.142), ('causes', 0.14), ('correlated', 0.124), ('feeling', 0.118), ('followed', 0.107), ('tests', 0.106), ('gave', 0.101), ('causal', 0.097), ('context', 0.096), ('correct', 0.096), ('goal', 0.095), ('wanted', 0.094), ('school', 0.093), ('figure', 0.093), ('asked', 0.09), ('re', 0.087), ('interest', 0.083), ('along', 0.08), ('sometimes', 0.079), ('link', 0.077), ('answer', 0.077), ('effect', 0.072), ('got', 0.071), ('said', 0.069), ('real', 0.069), ('didn', 0.065), ('used', 0.058), ('sense', 0.058), ('two', 0.046), ('might', 0.043), ('ve', 0.041), ('example', 0.04), ('make', 0.039), ('much', 0.037), ('writes', 0.037), ('see', 0.032), ('one', 0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?

2 0.1773496 1501 andrew gelman stats-2012-09-18-More studies on the economic effects of climate change

Introduction: After writing yesterday’s post, I was going through Solomon Hsiang’s blog and found a post pointing to three studies from researchers at business schools: Severe Weather and Automobile Assembly Productivity Gérard P. Cachon, Santiago Gallino and Marcelo Olivares Abstract: It is expected that climate change could lead to an increased frequency of severe weather. In turn, severe weather intuitively should hamper the productivity of work that occurs outside. But what is the effect of rain, snow, fog, heat and wind on work that occurs indoors, such as the production of automobiles? Using weekly production data from 64 automobile plants in the United States over a ten-year period, we find that adverse weather conditions lead to a significant reduction in production. For example, one additional day of high wind advisory by the National Weather Service (i.e., maximum winds generally in excess of 44 miles per hour) reduces production by 26%, which is comparable in order of magnitude t

3 0.13857055 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

Introduction: I hate to keep bumping our scheduled posts but this is just too important and too exciting to wait. So it’s time to jump the queue. The news is a paper from Michael Betancourt that presents a super-cool new way to compute normalizing constants: A common strategy for inference in complex models is the relaxation of a simple model into the more complex target model, for example the prior into the posterior in Bayesian inference. Existing approaches that attempt to generate such transformations, however, are sensitive to the pathologies of complex distributions and can be difficult to implement in practice. Leveraging the geometry of thermodynamic processes I introduce a principled and robust approach to deforming measures that presents a powerful new tool for inference. The idea is to generalize Hamiltonian Monte Carlo so that it moves through a family of distributions (that is, it transitions through an “inverse temperature” variable called beta that indexes the family) a

4 0.1263105 180 andrew gelman stats-2010-08-03-Climate Change News

Introduction: I. State of the Climate report The National Oceanic and Atmospheric Administration recently released their “State of the Climate Report” for 2009. The report has chapters discussing global climate (temperatures, water vapor, cloudiness, alpine glaciers,…); oceans (ocean heat content, sea level, sea surface temperatures, etc.); the arctic (sea ice extent, permafrost, vegetation, and so on); Antarctica (weather observations, sea ice extent,…), and regional climates. NOAA also provides a nice page that lets you display any of 11 relevant time-series datasets (land-surface air temperature, sea level, ocean heat content, September arctic sea-ice extent, sea-surface temperature, northern hemisphere snow cover, specific humidity, glacier mass balance, marine air temperature, tropospheric temperature, and stratospheric temperature). Each of the plots overlays data from several databases (not necessarily independent of each other), and you can select which ones to include or leave

5 0.11518022 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”

Introduction: Macartan Humphreys pointed me to this excellent guide. Here are the 10 items: 1. A causal claim is a statement about what didn’t happen. 2. There is a fundamental problem of causal inference. 3. You can estimate average causal effects even if you cannot observe any individual causal effects. 4. If you know that, on average, A causes B and that B causes C, this does not mean that you know that A causes C. 5. The counterfactual model is all about contribution, not attribution. 6. X can cause Y even if there is no “causal path” connecting X and Y. 7. Correlation is not causation. 8. X can cause Y even if X is not a necessary condition or a sufficient condition for Y. 9. Estimating average causal effects does not require that treatment and control groups are identical. 10. There is no causation without manipulation. The article follows with crisp discussions of each point. My favorite is item #6, not because it’s the most important but because it brings in some real s

6 0.11073975 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

7 0.1071562 2097 andrew gelman stats-2013-11-11-Why ask why? Forward causal inference and reverse causal questions

8 0.097971976 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

9 0.093436986 1402 andrew gelman stats-2012-07-01-Ice cream! and temperature

10 0.089174926 1201 andrew gelman stats-2012-03-07-Inference = data + model

11 0.08382798 945 andrew gelman stats-2011-10-06-W’man < W’pedia, again

12 0.079822384 357 andrew gelman stats-2010-10-20-Sas and R

13 0.078053825 1939 andrew gelman stats-2013-07-15-Forward causal reasoning statements are about estimation; reverse causal questions are about model checking and hypothesis generation

14 0.077601336 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

15 0.075698979 206 andrew gelman stats-2010-08-13-Indiemapper makes thematic mapping easy

16 0.074184246 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

17 0.072657324 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

18 0.07048548 401 andrew gelman stats-2010-11-08-Silly old chi-square!

19 0.067369796 1106 andrew gelman stats-2012-01-08-Intro to splines—with cool graphs

20 0.066589728 1500 andrew gelman stats-2012-09-17-“2% per degree Celsius . . . the magic number for how worker productivity responds to warm-hot temperatures”


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.108), (1, -0.01), (2, 0.021), (3, -0.025), (4, 0.049), (5, -0.002), (6, 0.022), (7, 0.016), (8, 0.045), (9, 0.013), (10, -0.021), (11, 0.02), (12, 0.036), (13, -0.023), (14, -0.0), (15, 0.012), (16, 0.027), (17, 0.009), (18, -0.026), (19, 0.016), (20, -0.042), (21, -0.005), (22, 0.037), (23, -0.0), (24, 0.056), (25, 0.02), (26, 0.01), (27, -0.074), (28, 0.01), (29, 0.033), (30, 0.035), (31, 0.038), (32, 0.018), (33, -0.027), (34, -0.063), (35, -0.034), (36, 0.035), (37, -0.013), (38, 0.026), (39, 0.053), (40, 0.002), (41, 0.045), (42, -0.014), (43, 0.014), (44, -0.043), (45, -0.019), (46, -0.058), (47, -0.003), (48, -0.017), (49, 0.031)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95480192 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?

2 0.63100797 1136 andrew gelman stats-2012-01-23-Fight! (also a bit of reminiscence at the end)

Introduction: Martin Lindquist and Michael Sobel published a fun little article in Neuroimage on models and assumptions for causal inference with intermediate outcomes. As their subtitle indicates (“A response to the comments on our comment”), this is a topic of some controversy. Lindquist and Sobel write: Our original comment (Lindquist and Sobel, 2011) made explicit the types of assumptions neuroimaging researchers are making when directed graphical models (DGMs), which include certain types of structural equation models (SEMs), are used to estimate causal effects. When these assumptions, which many researchers are not aware of, are not met, parameters of these models should not be interpreted as effects. . . . [Judea] Pearl does not disagree with anything we stated. However, he takes exception to our use of potential outcomes notation, which is the standard notation used in the statistical literature on causal inference, and his comment is devoted to promoting his alternative conventions. [C

3 0.63070387 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”

4 0.62738007 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing

Introduction: José Iparraguirre writes: There’s a letter in the latest issue of The Economist (July 31st) signed by Sir Richard Branson (Virgin), Michael Masters (Masters Capital Management) and David Frenk (Better Markets) about an OECD report on speculation and the prices of commodities, which includes the following: “The report uses a Granger causality test to measure the relationship between the level of commodities futures contracts held by swap dealers, and the prices of those commodities. Granger tests, however, are of dubious applicability to extremely volatile variables like commodities prices.” The report says: Granger causality is a standard statistical technique for determining whether one time series is useful in forecasting another. It is important to bear in mind that the term causality is used in a statistical sense, and not in a philosophical one of structural causation. More precisely a variable A is said to Granger cause B if knowing the time paths of B and A toge

5 0.61998099 807 andrew gelman stats-2011-07-17-Macro causality

Introduction: David Backus writes: This is from my area of work, macroeconomics. The suggestion here is that the economy is growing slowly because consumers aren’t spending money. But how do we know it’s not the reverse: that consumers are spending less because the economy isn’t doing well? As a teacher, I can tell you that it’s almost impossible to get students to understand that the first statement isn’t obviously true. What I’d call the demand-side story (more spending leads to more output) is everywhere, including this piece, from the usually reliable David Leonhardt. This whole situation reminds me of the story of the village whose inhabitants support themselves by taking in each others’ laundry. I guess we’re rich enough in the U.S. that we can stay afloat for a few decades just buying things from each other? Regarding the causal question, I’d like to move away from the idea of “Does A cause B or does B cause A” and toward a more intervention-based framework (Rubin’s model for

6 0.61426747 550 andrew gelman stats-2011-02-02-An IV won’t save your life if the line is tangled

7 0.60700166 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

8 0.60412705 481 andrew gelman stats-2010-12-22-The Jumpstart financial literacy survey and the different purposes of tests

9 0.60051477 1985 andrew gelman stats-2013-08-16-Learning about correlations using cross-sectional and over-time comparisons between and within countries

10 0.59937298 1492 andrew gelman stats-2012-09-11-Using the “instrumental variables” or “potential outcomes” approach to clarify causal thinking

11 0.59173328 307 andrew gelman stats-2010-09-29-“Texting bans don’t reduce crashes; effects are slight crash increases”

12 0.59168833 393 andrew gelman stats-2010-11-04-Estimating the effect of A on B, and also the effect of B on A

13 0.58890659 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?

14 0.58548623 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

15 0.57817233 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story

16 0.57471687 1666 andrew gelman stats-2013-01-10-They’d rather be rigorous than right

17 0.56775218 459 andrew gelman stats-2010-12-09-Solve mazes by starting at the exit

18 0.56643599 180 andrew gelman stats-2010-08-03-Climate Change News

19 0.56484395 983 andrew gelman stats-2011-10-31-Skepticism about skepticism of global warming skepticism skepticism

20 0.56433439 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.026), (4, 0.256), (15, 0.015), (16, 0.097), (21, 0.036), (24, 0.143), (76, 0.028), (93, 0.026), (99, 0.238)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93264902 1618 andrew gelman stats-2012-12-11-The consulting biz

Introduction: I received the following (unsolicited) email: Hello, *** LLC, a ***-based market research company, has a financial client who is interested in speaking with a statistician who has done research in the field of Alzheimer’s Disease and preferably familiar with the SOLA and BAPI trials. We offer an honorarium of $200 for a 30 minute telephone interview. Please advise us if you have an employment or consulting agreement with any organization or operate professionally pursuant to an organization’s code of conduct or employee manual that may control activities by you outside of your regular present and former employment, such as participating in this consulting project for MedPanel. If there are such contracts or other documents that do apply to you, please forward MedPanel a copy of each such document asap as we are obligated to review such documents to determine if you are permitted to participate as a consultant for MedPanel on a project with this particular client. If you are

2 0.91897774 947 andrew gelman stats-2011-10-08-GiveWell sez: Cost-effectiveness of de-worming was overstated by a factor of 100 (!) due to a series of sloppy calculations

Introduction: Alexander at GiveWell writes: The Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation . . . provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. [I think they mean to say $300 -- ed.] We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of

same-blog 3 0.89239991 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?

4 0.87277246 1919 andrew gelman stats-2013-06-29-R sucks

Introduction: I was trying to make some new graphs using 5-year-old R code and I got all these problems because I was reading in files with variable names such as “co.fipsid” and now R is automatically changing them to “co_fipsid”. Or maybe the names had underbars all along, and the old R had changed them into dots. Whatever. I understand that backward compatibility can be hard to maintain, but this is just annoying.

5 0.8629151 1918 andrew gelman stats-2013-06-29-Going negative

Introduction: Troels Ring writes: I have measured total phosphorus, TP, on a number of dialysis patients, and also measured conventional phosphate, Pi. Now P is exchanged with the environment as Pi, so in principle a correlation between TP and Pi could perhaps be expected. I’m really most interested in the fraction of TP which is not Pi, that is TP-Pi. I would also expect that to be positively correlated with Pi. However, looking at the data using a mixed model an insignificant negative correlation is obtained. Then I thought, that since TP-Pi is bound to be small if Pi is large a negative correlation is almost dictated by the math even if the biology would have it otherwise in so far as the TP-Pi, likely organic P, must someday have been Pi. Hence I thought about correcting the slight negative correlation between TP-Pi and Pi for the expected large negative correlation due to the math – to eventually recover what I came from: a positive correlation. People seem to agree that this thinki

6 0.84878576 238 andrew gelman stats-2010-08-27-No radon lobby

7 0.84267217 907 andrew gelman stats-2011-09-14-Reproducibility in Practice

8 0.84249055 113 andrew gelman stats-2010-06-28-Advocacy in the form of a “deliberative forum”

9 0.82246876 1829 andrew gelman stats-2013-04-28-Plain old everyday Bayesianism!

10 0.82062507 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics

11 0.81935012 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

12 0.81795561 1997 andrew gelman stats-2013-08-24-Measurement error in monkey studies

13 0.80191553 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff

14 0.79966623 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin

15 0.78733408 2000 andrew gelman stats-2013-08-28-Why during the 1950-1960′s did Jerry Cornfield become a Bayesian?

16 0.78350353 2078 andrew gelman stats-2013-10-26-“The Bayesian approach to forensic evidence”

17 0.77508754 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?

18 0.76831275 807 andrew gelman stats-2011-07-17-Macro causality

19 0.7636106 1605 andrew gelman stats-2012-12-04-Write This Book

20 0.76296651 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect