andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2315 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Continuing our discussion of general measures of correlations, Ben Murrell sends along this paper (with corresponding R package), which begins: When two variables are related by a known function, the coefficient of determination (denoted R-squared) measures the proportion of the total variance in the observations that is explained by that function. This quantifies the strength of the relationship between variables by describing what proportion of the variance is signal as opposed to noise. For linear relationships, this is equal to the square of the correlation coefficient, ρ. When the parametric form of the relationship is unknown, however, it is unclear how to estimate the proportion of explained variance equitably – assigning similar values to equally noisy relationships. Here we demonstrate how to directly estimate a generalized R-squared when the form of the relationship is unknown, and we question the performance of the Maximal Information Coefficient (MIC) – a recently pr
sentIndex sentText sentNum sentScore
1 This quantifies the strength of the relationship between variables by describing what proportion of the variance is signal as opposed to noise. [sent-2, score-1.235]
2 For linear relationships, this is equal to the square of the correlation coefficient, ρ. [sent-3, score-0.166]
3 When the parametric form of the relationship is unknown, however, it is unclear how to estimate the proportion of explained variance equitably – assigning similar values to equally noisy relationships. [sent-4, score-1.833]
4 Here we demonstrate how to directly estimate a generalized R-squared when the form of the relationship is unknown, and we question the performance of the Maximal Information Coefficient (MIC) – a recently proposed information theoretic measure of dependence. [sent-5, score-0.838]
5 We show that our approach behaves equitably, has more power than MIC to detect association between variables, and converges faster with increasing sample size. [sent-6, score-0.515]
6 Most importantly, our approach generalizes to higher dimensions, which allows us to estimate the strength of multivariate relationships (Y against A,B,…) and to measure association while controlling for covariates (Y against X controlling for C). [sent-7, score-1.403]
7 And, since we’re talking about R-squared, let me point you to my 2006 paper with Iain Pardoe, Bayesian measures of explained variance and pooling in multilevel (hierarchical) models. [sent-8, score-0.73]
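The introduction says that for a linear relationship, R-squared equals ρ², the square of the correlation coefficient. A quick numerical check confirms this. The same check also shows why ρ² stops being enough once the relationship is nonlinear.

This is a minimal sketch using only the Python standard library. A few things are illustrative assumptions rather than anything from the post:

- the helper names `pearson_r` and `r_squared_ols`;
- the seed;
- the simulated linear and quadratic data.

It is not the generalized R-squared estimator from Murrell's paper or its R package, which is designed for the case where the form of the relationship is unknown.

```python
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def r_squared_ols(f, y):
    """R-squared (1 - SS_res / SS_tot) of the least-squares fit y ~ a + b * f."""
    n = len(f)
    mf, my = sum(f) / n, sum(y) / n
    b = sum((u - mf) * (v - my) for u, v in zip(f, y)) / sum((u - mf) ** 2 for u in f)
    a = my - b * mf
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(f, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    return 1 - ss_res / ss_tot

rng = random.Random(2014)  # illustrative seed
n = 2000
x = [rng.uniform(-1, 1) for _ in range(n)]

# Linear relationship: R-squared of the straight-line fit equals rho squared.
y_lin = [2 * xi + rng.gauss(0, 0.5) for xi in x]
lin_gap = abs(pearson_r(x, y_lin) ** 2 - r_squared_ols(x, y_lin))

# Nonlinear relationship y = x^2 + noise: rho squared is close to zero,
# yet regressing on the known function x^2 explains most of the variance.
y_quad = [xi ** 2 + rng.gauss(0, 0.1) for xi in x]
rho2_quad = pearson_r(x, y_quad) ** 2
r2_quad = r_squared_ols([xi ** 2 for xi in x], y_quad)

print(f"linear: |rho^2 - R^2| = {lin_gap:.2e}")
print(f"quadratic: rho^2 = {rho2_quad:.3f}, R^2 with known f(x) = x^2: {r2_quad:.3f}")
```

Because x is symmetric around zero, the quadratic case should give ρ² near zero but an R-squared of roughly 0.9. The variance of x² on (−1, 1) is 4/45 ≈ 0.089, against a noise variance of 0.01. That gap is what an equitable measure that ignores the functional form is meant to close.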
wordName wordTfidf (topN-words)
[('equitably', 0.269), ('variance', 0.243), ('mic', 0.227), ('relationship', 0.219), ('coefficient', 0.217), ('proportion', 0.216), ('explained', 0.2), ('measures', 0.2), ('unknown', 0.181), ('strength', 0.176), ('relationships', 0.17), ('controlling', 0.158), ('variables', 0.147), ('iain', 0.134), ('association', 0.13), ('determination', 0.127), ('theoretic', 0.127), ('converges', 0.127), ('estimate', 0.126), ('pardoe', 0.121), ('generalizes', 0.121), ('murrell', 0.114), ('measure', 0.111), ('maximal', 0.104), ('unclear', 0.1), ('form', 0.1), ('parametric', 0.099), ('assigning', 0.099), ('importantly', 0.093), ('square', 0.093), ('detect', 0.089), ('covariates', 0.087), ('pooling', 0.087), ('approach', 0.086), ('equally', 0.084), ('ben', 0.084), ('faster', 0.083), ('generalized', 0.082), ('dimensions', 0.081), ('signal', 0.081), ('multivariate', 0.08), ('opposed', 0.079), ('noisy', 0.078), ('continuing', 0.077), ('corresponding', 0.076), ('begins', 0.074), ('describing', 0.074), ('observations', 0.074), ('proposed', 0.073), ('equal', 0.073)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations
2 0.23081538 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures
Introduction: Malka Gorfine, Ruth Heller, and Yair Heller write a comment on the paper of Reshef et al. that we discussed a few months ago. Just to remind you what’s going on here, here’s my quick summary from December: Reshef et al. propose a new nonlinear R-squared-like measure. Unlike R-squared, this new method depends on a tuning parameter that controls the level of discretization, in a “How long is the coast of Britain” sort of way. The dependence on scale is inevitable for such a general method. Just consider: if you sample 1000 points from the unit bivariate normal distribution, (x,y) ~ N(0,I), you’ll be able to fit them perfectly by a 999-degree polynomial fit to the data. So the scale of the fit matters. The clever idea of the paper is that, instead of going for an absolute measure (which, as we’ve seen, will be scale-dependent), they focus on the problem of summarizing the grid of pairwise dependences in a large set of variables. As they put it: “Imagine a data set with hundreds
Introduction: Malka Gorfine writes: We noticed that the important topic of association measures and tests came up again in your blog, and we have a few comments in this regard. It is useful to distinguish between the univariate and multivariate methods. A consistent multivariate method can recognise dependence between two vectors of random variables, while a univariate method can only loop over pairs of components and check for dependency between them. There are very few consistent multivariate methods. To the best of our knowledge there are three practical methods: 1) HSIC by Gretton et al. (http://www.gatsby.ucl.ac.uk/~gretton/papers/GreBouSmoSch05.pdf) 2) dcov by Szekely et al. (http://projecteuclid.org/euclid.aoas/1267453933) 3) A method we introduced in Heller et al. (Biometrika, 2013, 503–510, http://biomet.oxfordjournals.org/content/early/2012/12/04/biomet.ass070.full.pdf+html, and an R package, HHG, is available as well http://cran.r-project.org/web/packages/HHG/index.html). A
4 0.15862453 2247 andrew gelman stats-2014-03-14-The maximal information coefficient
Introduction: Justin Kinney writes: I wanted to let you know that the critique Mickey Atwal and I wrote regarding equitability and the maximal information coefficient has just been published. We discussed this paper last year, under the heading, Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets? Kinney and Atwal’s paper is interesting, with my only criticism being that in some places they seem to aim for what might not be possible. For example, they write that “mutual information is already widely believed to quantify dependencies without bias for relationships of one type or another,” which seems a bit vague to me. And later they write, “How to compute such an estimate that does not bias the resulting mutual information value remains an open problem,” which seems to me to miss the point in that unbiased statistical estimates are not generally possible and indeed are often not desirable. Their
Introduction: Justin Kinney writes: Since your blog has discussed the “maximal information coefficient” (MIC) of Reshef et al., I figured you might want to see the critique that Gurinder Atwal and I have posted. In short, Reshef et al.’s central claim that MIC is “equitable” is incorrect. We [Kinney and Atwal] offer mathematical proof that the definition of “equitability” Reshef et al. propose is unsatisfiable—no nontrivial dependence measure, including MIC, has this property. Replicating the simulations in their paper with modestly larger data sets validates this finding. The heuristic notion of equitability, however, can be formalized instead as a self-consistency condition closely related to the Data Processing Inequality. Mutual information satisfies this new definition of equitability but MIC does not. We therefore propose that simply estimating mutual information will, in many cases, provide the sort of dependence measure Reshef et al. seek. For background, here are my two p
6 0.13860169 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)
7 0.13673051 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
8 0.12625396 2324 andrew gelman stats-2014-05-07-Once more on nonparametric measures of mutual information
9 0.12349215 301 andrew gelman stats-2010-09-28-Correlation, prediction, variation, etc.
10 0.12301692 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
11 0.12276942 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?
12 0.11935133 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations
13 0.11797613 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys
14 0.11720035 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
15 0.11134394 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?
16 0.10716553 850 andrew gelman stats-2011-08-11-Understanding how estimates change when you move to a multilevel model
17 0.10539529 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff
19 0.10275246 846 andrew gelman stats-2011-08-09-Default priors update?
20 0.097468667 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?
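The 2012 excerpt in the list above (entry 1230) makes a point about flexible fits. With 1000 points drawn from N(0, I), a 999-degree polynomial fits the sample perfectly. So a raw in-sample R-squared rewards flexibility rather than real structure.

Below is a scaled-down sketch of that argument in standard-library Python. The following are all illustrative choices, not taken from the post:

- 10 points instead of 1000;
- Lagrange interpolation as the degree n − 1 polynomial fit;
- the helper names `lagrange_eval` and `r_squared`;
- the seed.

```python
import random

def lagrange_eval(xs, ys, t):
    """Evaluate the unique degree-(n-1) polynomial through (xs[i], ys[i]) at t."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for m, xm in enumerate(xs):
            if m != i:
                term *= (t - xm) / (xi - xm)
        total += term
    return total

def r_squared(y, yhat):
    """1 - SS_res / SS_tot for observations y and predictions yhat."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

rng = random.Random(0)  # illustrative seed
n = 10
# x and y are independent standard normals: there is no relationship to find.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]

# In-sample: the interpolating polynomial passes through every point.
r2_in = r_squared(y, [lagrange_eval(x, y, xi) for xi in x])

# Out-of-sample: score the same polynomial on a fresh independent draw.
x_new = [rng.gauss(0, 1) for _ in range(n)]
y_new = [rng.gauss(0, 1) for _ in range(n)]
r2_out = r_squared(y_new, [lagrange_eval(x, y, t) for t in x_new])

print(f"in-sample R^2 = {r2_in:.3f}, out-of-sample R^2 = {r2_out:.3f}")
```

The in-sample R-squared comes out as 1 even though x and y are independent by construction. The same polynomial scored on fresh data will typically do far worse, often with a negative R-squared. This is one way to see why, as the excerpt puts it, the scale of the fit matters.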
topicId topicWeight
[(0, 0.121), (1, 0.11), (2, 0.094), (3, -0.069), (4, 0.073), (5, 0.015), (6, -0.003), (7, -0.047), (8, -0.0), (9, 0.066), (10, 0.028), (11, -0.0), (12, 0.017), (13, 0.055), (14, 0.035), (15, 0.013), (16, -0.018), (17, 0.022), (18, -0.021), (19, -0.033), (20, 0.017), (21, 0.022), (22, 0.092), (23, 0.018), (24, 0.085), (25, 0.016), (26, 0.034), (27, 0.037), (28, 0.06), (29, -0.025), (30, 0.064), (31, 0.093), (32, 0.093), (33, -0.064), (34, 0.081), (35, 0.005), (36, 0.034), (37, -0.007), (38, 0.03), (39, -0.021), (40, 0.017), (41, -0.019), (42, 0.046), (43, 0.035), (44, -0.053), (45, -0.027), (46, 0.058), (47, -0.037), (48, -0.051), (49, -0.047)]
simIndex simValue blogId blogTitle
same-blog 1 0.98641157 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations
2 0.74359512 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures
3 0.72630841 301 andrew gelman stats-2010-09-28-Correlation, prediction, variation, etc.
Introduction: Hamdan Azhar writes: I [Azhar] write with a question about language in the context of statistics. Consider the three statements below. a) Y is significantly associated (correlated) with X; b) knowledge of X allows us to account for __% of the variance in Y; c) Y can be predicted to a significant extent given knowledge of X. To what extent are these statements equivalent? Much of the (non-statistical) scientific literature doesn’t seem to distinguish between these notions. Is this just about semantics — or are there meaningful differences here, particularly between b and c? Consider a framework where X constitutes a predictor space of p variables (x1,…,xp). We wish to generate a linear combination of these variables to yield a score that optimally correlates with Y. Can we substitute the word “predicts” for “optimally correlates with” in this context? One can argue that “correlating” or “accounting for variance” suggests that we are trying to maximize goodness-of-fit (i
4 0.72084785 2247 andrew gelman stats-2014-03-14-The maximal information coefficient
6 0.66725582 2324 andrew gelman stats-2014-05-07-Once more on nonparametric measures of mutual information
7 0.63204575 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets
9 0.60027999 1663 andrew gelman stats-2013-01-09-The effects of fiscal consolidation
10 0.59305882 778 andrew gelman stats-2011-06-24-New ideas on DIC from Martyn Plummer and Sumio Watanabe
11 0.58424169 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology
12 0.58005822 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models
13 0.57571101 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions
14 0.57197791 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models
15 0.57152879 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression
16 0.56751382 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)
17 0.55521035 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations
18 0.55385506 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
20 0.52593291 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions
topicId topicWeight
[(4, 0.014), (10, 0.028), (13, 0.015), (15, 0.04), (17, 0.073), (24, 0.194), (52, 0.014), (55, 0.023), (57, 0.056), (84, 0.011), (85, 0.014), (86, 0.036), (88, 0.012), (89, 0.011), (90, 0.021), (98, 0.046), (99, 0.292)]
simIndex simValue blogId blogTitle
same-blog 1 0.98730874 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations
2 0.95442665 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures
3 0.94442707 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters
Introduction: Ilya Lipkovich writes: I read with great interest your 2008 paper [with Aleks Jakulin, Grazia Pittau, and Yu-Sung Su] on weakly informative priors for logistic regression and also followed an interesting discussion on your blog. This discussion was within the Bayesian community in relation to the validity of priors. However, I would like to approach it rather from a broader perspective on predictive modeling, bringing in the ideas from the machine/statistical learning approach. Actually you were the first to bring it up by mentioning in your paper “borrowing ideas from computer science” on cross-validation when comparing predictive ability of your proposed priors with other choices. However, using cross-validation for comparing method performance is not the only or primary use of CV in machine-learning. Most machine learning methods have some “meta” or complexity parameters and use cross-validation to tune them up. For example, one of your comparison methods is BBR which actually
4 0.94418836 2283 andrew gelman stats-2014-04-06-An old discussion of food deserts
Introduction: I happened to be reading an old comment thread from 2012 (follow the link from here) and came across this amusing exchange: Perhaps this is the paper Jonathan was talking about? Here’s more from the thread: Anyway, I don’t have anything to add right now, I just thought it was an interesting discussion.
5 0.94371879 2035 andrew gelman stats-2013-09-23-Scalable Stan
Introduction: Bob writes: If you have papers that have used Stan, we’d love to hear about it. We finally got some submissions, so we’re going to start a list on the web site for 2.0 in earnest. You can either mail them to the list, to me directly, or just update the issue (at least until it’s closed or moved): https://github.com/stan-dev/stan/issues/187 For example, Henrik Mannerstrom fit a hierarchical model the other day with 360,000 data points and 120,000 variables. And it worked just fine in Stan. I’ve asked him to write this up so we can post it here. Here’s the famous graph Bob made showing the scalability of Stan for a series of hierarchical item-response models:
6 0.94368529 1502 andrew gelman stats-2012-09-19-Scalability in education
7 0.94356841 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)
8 0.94349647 259 andrew gelman stats-2010-09-06-Inbox zero. Really.
9 0.94300222 2314 andrew gelman stats-2014-05-01-Heller, Heller, and Gorfine on univariate and multivariate information measures
10 0.94299519 77 andrew gelman stats-2010-06-09-Sof[t]
11 0.93997335 1363 andrew gelman stats-2012-06-03-Question about predictive checks
12 0.93979156 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures
13 0.93968022 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life
14 0.93939024 1170 andrew gelman stats-2012-02-16-A previous discussion with Charles Murray about liberals, conservatives, and social class
15 0.93916523 970 andrew gelman stats-2011-10-24-Bell Labs
16 0.93893248 963 andrew gelman stats-2011-10-18-Question on Type M errors
17 0.93862903 1941 andrew gelman stats-2013-07-16-Priors
18 0.93848908 1733 andrew gelman stats-2013-02-22-Krugman sets the bar too high
19 0.93831444 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals