The maximal information coefficient (andrew gelman stats, 2014-03-14, andrew_gelman_stats-2014-2247)
Source: html
Justin Kinney writes: I wanted to let you know that the critique Mickey Atwal and I wrote regarding equitability and the maximal information coefficient has just been published. We discussed this paper last year, under the heading, Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?

Kinney and Atwal’s paper is interesting, with my only criticism being that in some places they seem to aim for what might not be possible. For example, they write that “mutual information is already widely believed to quantify dependencies without bias for relationships of one type or another,” which seems a bit vague to me. And later they write, “How to compute such an estimate that does not bias the resulting mutual information value remains an open problem,” which seems to me to miss the point, in that unbiased statistical estimates are not generally possible and indeed are often not desirable.
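That point about bias can be made concrete with a small simulation (my illustration, not from the post or from Kinney and Atwal; the helper name plugin_mi and the bin and sample settings are arbitrary): the standard histogram “plug-in” estimate of mutual information is biased upward even when the true mutual information is exactly zero.

```python
# Illustration: the plug-in (histogram) mutual information estimate is
# biased upward even for independent variables, where the true MI is 0.
import math
import random
from collections import Counter

def plugin_mi(xs, ys, bins=10):
    """Histogram ("plug-in") mutual information estimate, in nats.

    Assumes xs and ys lie in [0, 1) and uses equal-width bins."""
    n = len(xs)
    bx = [min(int(x * bins), bins - 1) for x in xs]
    by = [min(int(y * bins), bins - 1) for y in ys]
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum over cells of p_ij * log(p_ij / (p_i * p_j))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

random.seed(1)
n = 500
estimates = []
for _ in range(200):
    xs = [random.random() for _ in range(n)]
    ys = [random.random() for _ in range(n)]  # independent of xs: true MI = 0
    estimates.append(plugin_mi(xs, ys))

mean_est = sum(estimates) / len(estimates)
# True MI is 0, but the estimator's expected value sits roughly
# (bins - 1)^2 / (2n) nats above it (the Miller-Madow bias term).
print(f"mean plug-in estimate: {mean_est:.4f} nats (true value: 0)")
print(f"approximate bias term: {(10 - 1) ** 2 / (2 * n):.4f} nats")
```

Corrections such as Miller-Madow shrink this bias but do not remove it uniformly across distributions, which is the sense in which exact unbiasedness is generally unattainable.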
Their criticisms of the MIC measure of Reshef et al. (see the above link to that earlier post for background) may well be reasonable, but, again, there are points where they (Kinney and Atwal) may be missing some possibilities. For example, they write, “nonmonotonic relationships have systematically reduced MIC values relative to monotonic ones” and refer to this as a “bias.” But it seems to me that nonmonotonic relationships really are less predictable.
Consider scatterplots A and B of the Kinney and Atwal paper. The two distributions have the same residual error sd(y|x), but in plot B (the nonmonotonic example) sd(x|y) is much bigger. Not that sd is necessarily the correct measure; in my earlier post, I asked what would be the appropriate measure of association between two variables whose scatterplot looked like a circle (that is, y = +/- sqrt(1 - x^2)).
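A quick way to see this asymmetry is to simulate a monotonic and a nonmonotonic relationship with identical residual error in y (a toy construction of mine, not the actual scatterplots A and B from the paper; the V-shaped curve and all settings below are arbitrary):

```python
# Toy check: same sd(y|x) by construction, very different sd(x|y).
import math
import random

random.seed(2)
n = 20000
noise_sd = 0.05

def sd(vals):
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

def sd_x_given_y(pairs, bins=20):
    """Average within-bin sd of x, binning on y: a crude estimate of sd(x|y)."""
    lo = min(y for _, y in pairs)
    hi = max(y for _, y in pairs)
    groups = [[] for _ in range(bins)]
    for x, y in pairs:
        j = min(int((y - lo) / (hi - lo) * bins), bins - 1)
        groups[j].append(x)
    cells = [g for g in groups if len(g) > 1]
    return sum(sd(g) * len(g) for g in cells) / sum(len(g) for g in cells)

xs = [random.random() for _ in range(n)]
mono = [(x, x + random.gauss(0, noise_sd)) for x in xs]                # increasing in x
nonmono = [(x, abs(x - 0.5) + random.gauss(0, noise_sd)) for x in xs]  # V-shaped in x

# Both have sd(y|x) = noise_sd, but in the V shape each y value is
# consistent with two branches of x, so sd(x|y) blows up.
print("monotonic    sd(x|y) ~", round(sd_x_given_y(mono), 3))
print("nonmonotonic sd(x|y) ~", round(sd_x_given_y(nonmono), 3))
```

The nonmonotonic case comes out several times larger in sd(x|y), matching the intuition that it really is less predictable in that direction.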
More generally, I fear that Kinney and Atwal could be painting themselves into a corner if they define the strength of association between two variables in terms of the distribution of y given x. I’m not so much bothered by the asymmetry as by the implicit dismissal of any smoothness in x. One could, for example, consider a function where sd(y|x) = 0, that is, y is a deterministic function of x, but in a really jumpy way, with lots of big discontinuities going up and down. This to me would be a weaker association than a simple y = a + bx.
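Here is a sketch of that contrast (my construction, not anything from the paper): two relationships that both have sd(y|x) = 0, one smooth and one full of discontinuities. Any measure built only from p(y|x) rates them as equally strong, while a smoothness-sensitive summary such as linear correlation does not:

```python
# Two deterministic (zero-residual) functions of x: a line vs. a jumpy
# step function with an arbitrary level in each of 50 cells.
import math
import random

random.seed(3)
n = 2000
xs = [i / n for i in range(n)]

smooth = [0.2 + 0.5 * x for x in xs]  # y = a + b x

levels = [random.random() for _ in range(50)]
jumpy = [levels[min(int(x * 50), 49)] for x in xs]  # big up-and-down jumps

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = math.sqrt(sum((u - ma) ** 2 for u in a))
    vb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (va * vb)

# Both are perfectly predictable from x (sd(y|x) = 0 for each), but only
# the smooth one registers as a strong association to linear correlation.
print("corr(x, smooth) =", round(corr(xs, smooth), 3))
print("corr(x, jumpy)  =", round(corr(xs, jumpy), 3))
```

Any conditional-distribution-based measure, mutual information included, cannot distinguish these two cases; that is the smoothness point.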
[…] was that they were interested in quickly summarizing pairwise relations in large sets of variables. In contrast, Kinney and Atwal focus on getting an efficient measure of mutual information for a single pair of variables. I suppose that Kinney and Atwal could apply their method to a larger structure in the manner of Reshef et al.
I’d also be interested in a discussion of the idea that the measure of dependence can depend on the scale of discretization, as discussed in my earlier post. In any case, lots of good stuff here, and I imagine that different measures of dependence could be useful for different purposes.
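The discretization point can be illustrated with a hypothetical sketch (not code from any of the papers; all settings are arbitrary): the same dependent data give noticeably different histogram mutual-information estimates as the bin count changes, so any such measure implicitly commits to a scale.

```python
# One fixed dataset, several discretization scales: the histogram MI
# estimate moves substantially with the bin count.
import math
import random
from collections import Counter

def plugin_mi(xs, ys, bins):
    """Equal-width-bin plug-in mutual information estimate, in nats."""
    n = len(xs)
    lo_x, hi_x = min(xs), max(xs)
    lo_y, hi_y = min(ys), max(ys)
    bx = [min(int((x - lo_x) / (hi_x - lo_x) * bins), bins - 1) for x in xs]
    by = [min(int((y - lo_y) / (hi_y - lo_y) * bins), bins - 1) for y in ys]
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

random.seed(4)
n = 1000
xs = [random.random() for _ in range(n)]
ys = [x + random.gauss(0, 0.1) for x in xs]  # one fixed dependent dataset

mis = {b: plugin_mi(xs, ys, b) for b in (2, 5, 10, 20, 50)}
for b, mi in mis.items():
    print(f"{b:3d} bins: MI estimate = {mi:.3f} nats")
# Here the estimate keeps growing with the bin count (finer discretization
# captures more of the dependence, and small-sample bias grows too), so
# there is no single scale-free answer.
```

This is the same "How long is the coast of Britain" issue raised in the earlier post: the tuning of the discretization is part of the definition of the measure, not a nuisance detail.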