andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1918 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Troels Ring writes: I have measured total phosphorus, TP, on a number of dialysis patients, and also measured conventional phosphate, Pi. Now P is exchanged with the environment as Pi, so in principle a correlation between TP and Pi could perhaps be expected. I’m really most interested in the fraction of TP which is not Pi, that is TP-Pi. I would also expect that to be positively correlated with Pi. However, looking at the data using a mixed model, an insignificant negative correlation is obtained. Then I thought that, since TP-Pi is bound to be small if Pi is large, a negative correlation is almost dictated by the math, even if the biology would have it otherwise in so far as the TP-Pi, likely organic P, must someday have been Pi. Hence I thought about correcting the slight negative correlation between TP-Pi and Pi for the expected large negative correlation due to the math – to eventually recover what I came from: a positive correlation. People seem to agree that this thinki
sentIndex sentText sentNum sentScore
1 Troels Ring writes: I have measured total phosphorus, TP, on a number of dialysis patients, and also measured conventional phosphate, Pi. [sent-1, score-0.384]
2 Now P is exchanged with the environment as Pi, so in principle a correlation between TP and Pi could perhaps be expected. [sent-2, score-0.472]
3 I’m really most interested in the fraction of TP which is not Pi, that is TP-Pi. [sent-3, score-0.064]
4 I would also expect that to be positively correlated with Pi. [sent-4, score-0.125]
5 However, looking at the data using a mixed model, an insignificant negative correlation is obtained. [sent-5, score-0.628]
6 Then I thought that, since TP-Pi is bound to be small if Pi is large, a negative correlation is almost dictated by the math, even if the biology would have it otherwise in so far as the TP-Pi, likely organic P, must someday have been Pi. [sent-6, score-0.995]
7 Hence I thought about correcting the slight negative correlation between TP-Pi and Pi for the expected large negative correlation due to the math – to eventually recover what I came from: a positive correlation. [sent-7, score-1.514]
8 People seem to agree that this thinking is nonsense. [sent-8, score-0.059]
9 They say I can just keep to the analysis and forget about RTM. [sent-9, score-0.055]
10 I cannot help thinking that if I could measure TP-Pi by a method not requiring me to subtract Pi, I would get at least a cleaner result. [sent-10, score-0.281]
11 My reply: I’m getting confused on the details here, but, yes, it is typical that if you have two variables A and B measured on a common scale, A-B has a negative correlation with B. [sent-11, score-0.768]
12 This comes up, for example, in adjusting for pretest scores in education. [sent-12, score-0.388]
13 People often have the intuition that they should be analyzing posttest – pretest, but it typically makes more sense to look at posttest – 0. [sent-13, score-0.482]
14 Ultimately I suppose the solution is to go beyond correlations and to have a generative model for the joint distribution of TP and Pi. [sent-16, score-0.184]
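The point in the reply, that A-B tends to be negatively correlated with B, can be checked with a small simulation. The sketch below is illustrative only: the linear model TP = slope * Pi + noise, the parameter values, and the function names `pearson` and `simulate` are assumptions made for this example, not anything taken from the original data or analysis. Values are treated as centered, so negative draws are not a concern here. Under this assumed model, cov(TP-Pi, Pi) = cov(TP, Pi) - var(Pi), which is negative whenever the slope is below 1, even though TP and Pi themselves are positively related.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, slope=0.5, noise_sd=1.0, seed=1):
    """Draw hypothetical centered (Pi, TP) pairs where TP rises with Pi.

    Assumed model: TP = slope * Pi + noise, so
      cov(TP, Pi)      = slope * var(Pi)        (> 0 for slope > 0)
      cov(TP - Pi, Pi) = (slope - 1) * var(Pi)  (< 0 for slope < 1)
    """
    rng = random.Random(seed)
    pi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    tp = [slope * p + rng.gauss(0.0, noise_sd) for p in pi]
    diff = [t - p for t, p in zip(tp, pi)]
    return pearson(tp, pi), pearson(diff, pi)


r_tp_pi, r_diff_pi = simulate()
print(f"corr(TP, Pi)      = {r_tp_pi:+.3f}")
print(f"corr(TP - Pi, Pi) = {r_diff_pi:+.3f}")
```

With these assumed settings the first correlation comes out positive and the second negative, which is the pattern described in the question. A generative model for the joint distribution of TP and Pi, as suggested in the reply, would aim to estimate a quantity like the slope directly rather than the correlation of a difference.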
wordName wordTfidf (topN-words)
[('pi', 0.555), ('tp', 0.433), ('correlation', 0.28), ('pretest', 0.267), ('negative', 0.214), ('posttest', 0.186), ('measured', 0.166), ('phosphate', 0.099), ('math', 0.097), ('exchanged', 0.093), ('organic', 0.089), ('dictated', 0.086), ('ring', 0.083), ('subtract', 0.079), ('electric', 0.079), ('insignificant', 0.078), ('slight', 0.076), ('cleaner', 0.076), ('generative', 0.075), ('positively', 0.074), ('recover', 0.074), ('correcting', 0.071), ('adjusting', 0.068), ('bound', 0.067), ('requiring', 0.067), ('patients', 0.065), ('fraction', 0.064), ('thinking', 0.059), ('large', 0.059), ('intuition', 0.057), ('biology', 0.057), ('mixed', 0.056), ('joint', 0.056), ('confused', 0.055), ('forget', 0.055), ('eventually', 0.054), ('analyzing', 0.053), ('scores', 0.053), ('typical', 0.053), ('correlations', 0.053), ('jennifer', 0.052), ('environment', 0.052), ('conventional', 0.052), ('correlated', 0.051), ('company', 0.049), ('thought', 0.049), ('hence', 0.048), ('principle', 0.047), ('otherwise', 0.046), ('due', 0.046)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1918 andrew gelman stats-2013-06-29-Going negative
2 0.12863034 1997 andrew gelman stats-2013-08-24-Measurement error in monkey studies
Introduction: Following up on our recent discussion of combative linguist Noam Chomsky and disgraced primatologist Marc Hauser, here are some stories from Jay Livingston about monkey research. Don’t get me wrong—I eat burgers, so I’m not trying to get on my moral high horse here. But the stories do get you thinking about measurement error and why I would not trust the PI of a monkey study to code his own measurements and keep his data secret.
3 0.10705476 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions
Introduction: I was trying to explain in class how a (Bayesian) statistician reads the formula for a probability distribution. In old-fashioned statistics textbooks you’re told that if you want to compute a conditional distribution from a joint distribution you need to do some heavy math: p(a|b) = p(a,b) / \int p(a',b) da'. When doing Bayesian statistics, though, you usually don’t have to do the integration or the division. If you have parameters theta and data y, you first write p(y,theta). Then to get p(theta|y), you don’t need to integrate or divide. All you have to do is look at p(y,theta) in a certain way: Treat y as a constant and theta as a variable. Similarly, if you’re doing the Gibbs sampler and want a conditional distribution, just consider the parameter you’re updating as the variable and everything else as a constant. No need to integrate or divide, you just take the joint distribution and look at it from the right perspective. Awhile ago Yair told me there’s something called
4 0.10622288 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the
5 0.10365023 1688 andrew gelman stats-2013-01-22-That claim that students whose parents pay for more of college get worse grades
Introduction: Theodore Vasiloudis writes: I came upon this article by Laura Hamilton, an assistant professor in the University of California at Merced, that claims that “The more money that parents provide for higher education, the lower the grades their children earn.” I can’t help but feel that there is something wrong with the basis of the study or a confounding factor causing this apparent correlation, and since you often comment on studies on your blog I thought you might find this study interesting. My reply: I have to admit that the description above made me suspicious of the study before I even looked at it. On first thought, I’d expect the effect of parent’s financial contributions to be positive (as they free the student from the need to get a job during college), but not negative. Hamilton argues that “parental investments create a disincentive for student achievement,” which may be—but I’m generally suspicious of arguments in which the rebound is bigger than the main effect.
6 0.10343511 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations
7 0.087530822 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999
8 0.086352691 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population
9 0.081594467 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)
10 0.081438921 2148 andrew gelman stats-2013-12-25-Spam!
11 0.078714773 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets
12 0.071713917 1076 andrew gelman stats-2011-12-21-Derman, Rodrik and the nature of statistical models
13 0.069737196 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression
14 0.068414874 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?
16 0.066136479 2287 andrew gelman stats-2014-04-09-Advice: positive-sum, zero-sum, or negative-sum
17 0.064171292 561 andrew gelman stats-2011-02-06-Poverty, educational performance – and can be done about it
19 0.060845483 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model
20 0.058499612 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing
topicId topicWeight
[(0, 0.107), (1, 0.024), (2, 0.037), (3, -0.012), (4, 0.033), (5, 0.009), (6, 0.033), (7, 0.014), (8, 0.011), (9, 0.034), (10, 0.001), (11, 0.011), (12, -0.009), (13, -0.026), (14, 0.005), (15, 0.004), (16, 0.02), (17, 0.013), (18, 0.013), (19, -0.01), (20, 0.019), (21, 0.004), (22, 0.016), (23, -0.009), (24, 0.021), (25, 0.029), (26, 0.003), (27, 0.025), (28, -0.0), (29, 0.012), (30, -0.005), (31, 0.014), (32, 0.035), (33, 0.02), (34, 0.027), (35, 0.026), (36, 0.022), (37, 0.005), (38, -0.007), (39, -0.03), (40, 0.009), (41, -0.011), (42, 0.011), (43, 0.018), (44, -0.011), (45, -0.021), (46, 0.01), (47, 0.001), (48, 0.015), (49, 0.02)]
simIndex simValue blogId blogTitle
same-blog 1 0.94335693 1918 andrew gelman stats-2013-06-29-Going negative
2 0.77770352 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics
Introduction: John Cook and Joseph Delaney point to an article by Yurii Aulchenko et al., who write: 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people. . . . In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. . . . The message is that the simple approach of predicting child’s height using a regression model given parents’ average height performs much better than the method they have based on combining 54 genes. They also find that, if you start with the prediction based on parents’ heigh
3 0.76579994 301 andrew gelman stats-2010-09-28-Correlation, prediction, variation, etc.
Introduction: Hamdan Azhar writes: I [Azhar] write with a question about language in the context of statistics. Consider the three statements below. a) Y is significantly associated (correlated) with X; b) knowledge of X allows us to account for __% of the variance in Y; c) Y can be predicted to a significant extent given knowledge of X. To what extent are these statements equivalent? Much of the (non-statistical) scientific literature doesn’t seem to distinguish between these notions. Is this just about semantics — or are there meaningful differences here, particularly between b and c? Consider a framework where X constitutes a predictor space of p variables (x1,…,xp). We wish to generate a linear combination of these variables to yield a score that optimally correlates with Y. Can we substitute the word “predicts” for “optimally correlates with” in this context? One can argue that “correlating” or “accounting for variance” suggests that we are trying to maximize goodness-of-fit (i
4 0.76206619 561 andrew gelman stats-2011-02-06-Poverty, educational performance – and can be done about it
Introduction: Andrew has pointed to Jonathan Livengood’s analysis of the correlation between poverty and PISA results, whereby schools with poorer students get poorer test results. I’d have written a comment, but then I couldn’t have inserted a chart. Andrew points out that a causal analysis is needed. This reminds me of an intervention that has been done before: take a child out of poverty, and bring him up in a better-off family. What’s going to happen? There have been several studies examining correlations between adoptive and biological parents’ IQ (assuming IQ is a test analogous to the math and verbal tests, and that parent IQ is analogous to the quality of instruction – but the point is in the analysis not in the metric). This is the result (from Adoption Strategies by Robin P Corley in Encyclopedia of Life Sciences): So, while it did make a difference at an early age, with increasing age of the adopted child, the intelligence of adoptive parents might not be making any difference
5 0.74185801 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations
Introduction: Andrew Eppig writes: I’m a physicist by training who is transitioning to the social sciences. I recently came across a reference in the Economist to a paper on IQ and parasites which I read as I have more than a passing interest in IQ research (having read much that you and others (e.g., Shalizi, Wicherts) have written). In this paper I note that the authors find a very high correlation between national IQ and parasite prevalence. The strength of the correlation (-0.76 to -0.82) surprised me, as I’m used to much weaker correlations in the social sciences. To me, it’s a bit too high, suggesting that there are other factors at play or that one of the variables is merely a proxy for a large number of other variables. But I have no basis for this other than a gut feeling and a memory of a plot on Language Log about the distribution of correlation coefficients in social psychology. So my question is this: Is a correlation in the range of (-0.82,-0.76) more likely to be a correlatio
6 0.72450763 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999
7 0.71892667 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables
8 0.71384346 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?
9 0.70032132 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper
10 0.69798189 226 andrew gelman stats-2010-08-23-More on those L.A. Times estimates of teacher effectiveness
11 0.69297737 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
12 0.69247508 1142 andrew gelman stats-2012-01-29-Difficulties with the 1-4-power transformation
13 0.68432885 1070 andrew gelman stats-2011-12-19-The scope for snooping
14 0.68238854 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women
15 0.6822325 938 andrew gelman stats-2011-10-03-Comparing prediction errors
16 0.67985684 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models
17 0.67833662 1881 andrew gelman stats-2013-06-03-Boot
18 0.67623115 1688 andrew gelman stats-2013-01-22-That claim that students whose parents pay for more of college get worse grades
19 0.6757319 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?
20 0.67465365 2258 andrew gelman stats-2014-03-21-Random matrices in the news
topicId topicWeight
[(4, 0.248), (15, 0.015), (16, 0.062), (21, 0.044), (24, 0.088), (30, 0.011), (45, 0.037), (49, 0.011), (77, 0.012), (85, 0.019), (86, 0.01), (87, 0.011), (91, 0.013), (95, 0.048), (99, 0.247)]
simIndex simValue blogId blogTitle
Introduction: Alexander at GiveWell writes: The Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation . . . provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. [I think they mean to say $300 -- ed.] We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of
2 0.9112832 1618 andrew gelman stats-2012-12-11-The consulting biz
Introduction: I received the following (unsolicited) email: Hello, *** LLC, a ***-based market research company, has a financial client who is interested in speaking with a statistician who has done research in the field of Alzheimer’s Disease and preferably familiar with the SOLA and BAPI trials. We offer an honorarium of $200 for a 30 minute telephone interview. Please advise us if you have an employment or consulting agreement with any organization or operate professionally pursuant to an organization’s code of conduct or employee manual that may control activities by you outside of your regular present and former employment, such as participating in this consulting project for MedPanel. If there are such contracts or other documents that do apply to you, please forward MedPanel a copy of each such document asap as we are obligated to review such documents to determine if you are permitted to participate as a consultant for MedPanel on a project with this particular client. If you are
same-blog 3 0.89008272 1918 andrew gelman stats-2013-06-29-Going negative
4 0.88567603 1919 andrew gelman stats-2013-06-29-R sucks
Introduction: I was trying to make some new graphs using 5-year-old R code and I got all these problems because I was reading in files with variable names such as “co.fipsid” and now R is automatically changing them to “co_fipsid”. Or maybe the names had underbars all along, and the old R had changed them into dots. Whatever. I understand that backward compatibility can be hard to maintain, but this is just annoying.
5 0.86709368 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?
Introduction: Mike Zyphur writes: Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. $5,000 prizes. I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature and altitude of cities in Germany, and they said that altitude causes temperature). It has the feeling to me of one of those weird standardized tests we used to see sometimes in school, where there’s no real correct answer so the goal is to figure out what the test-writer wanted you to say. Nonetheless, this might be of interest, so I’m passing it along to you.
6 0.8576442 1997 andrew gelman stats-2013-08-24-Measurement error in monkey studies
7 0.85394263 238 andrew gelman stats-2010-08-27-No radon lobby
8 0.84721923 113 andrew gelman stats-2010-06-28-Advocacy in the form of a “deliberative forum”
9 0.83743393 1829 andrew gelman stats-2013-04-28-Plain old everyday Bayesianism!
10 0.8353551 907 andrew gelman stats-2011-09-14-Reproducibility in Practice
11 0.82012212 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics
12 0.81620455 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!
13 0.80733323 2000 andrew gelman stats-2013-08-28-Why during the 1950-1960′s did Jerry Cornfield become a Bayesian?
14 0.80402064 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff
15 0.80132121 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin
16 0.79315531 2078 andrew gelman stats-2013-10-26-“The Bayesian approach to forensic evidence”
17 0.77229935 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?
18 0.77054226 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population
19 0.75907016 48 andrew gelman stats-2010-05-23-The bane of many causes