andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1212 knowledge-graph by maker-knowledge-mining
Introduction: Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. He is now claiming the national security response: . . . disclosing the reputational data would violate the terms on which the evaluators agreed to complete the surveys (did they even bother to read the description of the methodology, one wonders?). I [Helzner] do not find this to be a compelling reply in this case. In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. Experimental designs must be open so that others can run the experiment. Mathematical proofs must be open so that they can be reviewed by others. Likewise, it seems to me that the details of statistical argument should be open to inspection. Do you have any thoughts on this? Or do you know of any other leading statistici
sentIndex sentText sentNum sentScore
1 Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. [sent-1, score-0.801]
2 disclosing the reputational data would violate the terms on which the evaluators agreed to complete the surveys (did they even bother to read the description of the methodology, one wonders? [sent-5, score-0.336]
3 In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. [sent-8, score-0.497]
4 Experimental designs must be open so that others can run the experiment. [sent-9, score-0.576]
5 Mathematical proofs must be open so that they can be reviewed by others. [sent-10, score-0.474]
6 Likewise, it seems to me that the details of statistical argument should be open to inspection. [sent-11, score-0.418]
7 Or do you know of any other leading statisticians who might have a line on this issue, something that we could offer in response to this guy’s reluctance to share his data? [sent-13, score-0.384]
8 In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. [sent-15, score-0.497]
9 Experimental designs must be open so that others can run the experiment. [sent-16, score-0.576]
10 Mathematical proofs must be open so that they can be reviewed by others. [sent-17, score-0.474]
11 Likewise, it seems to me that the details of statistical argument should be open to inspection. [sent-18, score-0.418]
12 Or do you know of any other leading statisticians who might have a line on this issue, something that we could offer in response to this guy’s reluctance to share his data? [sent-20, score-0.384]
13 Jeff also points to this long discussion among philosophers regarding the practicality of data sharing in this context. [sent-21, score-0.285]
14 Here’s a relevant bit from my article (which tells the story of a team of researchers at a Federal lab who declined to share with me their data from an animal experiment, several years ago): If you really believe your results, you should want your data out in the open. [sent-22, score-0.412]
15 Here’s a page all about his “research on various aspects of statistical disclosure limitation, including assessing risk and utility, synthetic data methods, remote access servers, and secure analyses of distributed data. [sent-30, score-0.267]
16 I agree with Jeff that data should be disclosed, designs should be open so that others can run the experiment, and the details of statistical argument should be open to inspection. [sent-34, score-1.06]
17 That said, if his study is a private effort, Leiter has no obligation to share any of the above, and as some people wrote in the above-linked comment thread, the most obvious ways of releasing data could destroy some of the confidentiality. [sent-35, score-0.378]
18 If he chooses not to release any of his raw data in any form, then I think it’s appropriate to interpret his claims with skepticism. [sent-38, score-0.252]
19 I have an agreement to give Professor Healy the most recent data as well, as soon as we can retain an RA to assist in the anonymization. [sent-65, score-0.226]
20 If there is interest in sharing the data with others, perhaps Kieran Healy could speak with Jerry Reiter (also at Duke) about how best to do so. [sent-66, score-0.227]
wordName wordTfidf (topN-words)
[('leiter', 0.473), ('helzner', 0.338), ('open', 0.218), ('jeff', 0.19), ('disclosed', 0.174), ('healy', 0.152), ('data', 0.15), ('reiter', 0.135), ('juvenile', 0.123), ('designs', 0.117), ('releasing', 0.116), ('share', 0.112), ('kieran', 0.111), ('duke', 0.104), ('raw', 0.102), ('likewise', 0.099), ('confidentiality', 0.099), ('reluctance', 0.097), ('jerry', 0.097), ('proofs', 0.095), ('flaw', 0.091), ('others', 0.087), ('must', 0.084), ('reveals', 0.082), ('compelling', 0.081), ('reviewed', 0.077), ('sharing', 0.077), ('agreement', 0.076), ('guy', 0.075), ('details', 0.072), ('thoughts', 0.071), ('run', 0.07), ('argument', 0.069), ('philosophy', 0.065), ('response', 0.062), ('surveys', 0.062), ('insults', 0.062), ('snark', 0.062), ('disclosing', 0.062), ('confined', 0.062), ('ra', 0.062), ('reputational', 0.062), ('statistical', 0.059), ('synthetic', 0.058), ('hmmmm', 0.058), ('practicality', 0.058), ('experimental', 0.057), ('leading', 0.057), ('offer', 0.056), ('experiment', 0.056)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?
Introduction: Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. He is now claiming the national security response: . . . disclosing the reputational data would violate the terms on which the evaluators agreed to complete the surveys (did they even bother to read the description of the methodology, one wonders?). I [Helzner] do not find this to be a compelling reply in this case. In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. Experimental designs must be open so that others can run the experiment. Mathematical proofs must be open so that they can be reviewed by others. Likewise, it seems to me that the details of statistical argument should be open to inspection. Do you have any thoughts on this? Or do you know of any other leading statistici
Introduction: David Karger writes: Your recent post on sharing data was of great interest to me, as my own research in computer science asks how to incentivize and lower barriers to data sharing. I was particularly curious about your highlighting of effort as the major dis-incentive to sharing. I would love to hear more, as this question of effort is one we specifically target in our development of tools for data authoring and publishing. As a straw man, let me point out that sharing data technically requires no more than posting an excel spreadsheet online. And that you likely already produced that spreadsheet during your own analytic work. So, in what way does such low-tech publishing fail to meet your data sharing objectives? Our own hypothesis has been that the effort is really quite low, with the problem being a lack of *immediate/tangible* benefits (as opposed to the long-term values you accurately describe). To attack this problem, we’re developing tools (and, since it appear
3 0.10584165 58 andrew gelman stats-2010-05-29-Stupid legal crap
Introduction: From the website of a journal where I published an article: In Springer journals you have the choice of publishing with or without open access. If you choose open access, your article will be freely available to everyone everywhere. In exchange for an open access fee of € 2000 / US $3000 you retain the copyright and your article will carry the Creative Commons License. Please make your choice below. Hmmm . . . pay $3000 so that an article that I wrote and gave to the journal for free can be accessed by others? Sounds like a good deal to me!
4 0.10133134 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing
Introduction: Several months ago, Sam Behseta, the new editor of Chance magazine, asked me if I’d like to have a column. I said yes, I’d like to write on ethics and statistics. My first column was called “Open Data and Open Methods” and I discussed the ethical obligation to share data and make our computations transparent wherever possible. In my column, I recounted a story from a bit over 20 years ago when I noticed a problem in a published analysis (involving electromagnetic fields and calcium flow in chicken brains) and contacted the researcher in charge of the study, who would not share his data with me. Two of the people from that research team—biologist Carl Blackman and statistician Dennis House—saw my Chance column and felt that I had misrepresented the situation and had criticized them unfairly. Blackman and House expressed their concerns in letters to the editor which were just published, along with my reply, in the latest issue of Chance. Seeing as I posted my article here, I
5 0.095494002 875 andrew gelman stats-2011-08-28-Better than Dennis the dentist or Laura the lawyer
Introduction: Kieran Healy points to Robin Mahfood, the CEO of the charity Food for the Poor. This really is pretty impressive: you see a lot of good first-name or last-name matches but not so many where the entire name forms a coherent and relevant phrase.
6 0.088788509 1054 andrew gelman stats-2011-12-12-More frustrations trying to replicate an analysis published in a reputable journal
7 0.084424756 1774 andrew gelman stats-2013-03-22-Likelihood Ratio ≠ 1 Journal
9 0.082098663 1758 andrew gelman stats-2013-03-11-Yes, the decision to try (or not) to have a child can be made rationally
10 0.080231115 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!
11 0.079178065 911 andrew gelman stats-2011-09-15-More data tools worth using from Google
12 0.078356594 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things
13 0.078029603 1719 andrew gelman stats-2013-02-11-Why waste time philosophizing?
14 0.077293061 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning
15 0.075756162 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers
16 0.07524541 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup
17 0.075039752 61 andrew gelman stats-2010-05-31-A data visualization manifesto
18 0.074547321 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty
19 0.074110501 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?
20 0.074036345 544 andrew gelman stats-2011-01-29-Splitting the data
topicId topicWeight
[(0, 0.178), (1, -0.034), (2, -0.036), (3, -0.022), (4, 0.004), (5, -0.0), (6, -0.038), (7, -0.015), (8, 0.001), (9, 0.004), (10, 0.002), (11, -0.024), (12, -0.004), (13, 0.011), (14, -0.02), (15, 0.035), (16, 0.01), (17, -0.021), (18, 0.032), (19, 0.01), (20, -0.02), (21, 0.013), (22, -0.021), (23, 0.011), (24, -0.051), (25, 0.005), (26, 0.028), (27, -0.003), (28, 0.016), (29, 0.022), (30, 0.015), (31, -0.018), (32, 0.014), (33, 0.015), (34, -0.013), (35, 0.033), (36, 0.029), (37, 0.021), (38, 0.009), (39, 0.043), (40, 0.05), (41, 0.009), (42, 0.019), (43, 0.005), (44, -0.016), (45, -0.021), (46, -0.014), (47, -0.031), (48, 0.009), (49, 0.011)]
simIndex simValue blogId blogTitle
same-blog 1 0.94875789 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?
Introduction: Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. He is now claiming the national security response: . . . disclosing the reputational data would violate the terms on which the evaluators agreed to complete the surveys (did they even bother to read the description of the methodology, one wonders?). I [Helzner] do not find this to be a compelling reply in this case. In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. Experimental designs must be open so that others can run the experiment. Mathematical proofs must be open so that they can be reviewed by others. Likewise, it seems to me that the details of statistical argument should be open to inspection. Do you have any thoughts on this? Or do you know of any other leading statistici
2 0.85436308 1835 andrew gelman stats-2013-05-02-7 ways to separate errors from statistics
Introduction: Betsey Stevenson and Justin Wolfers have been inspired by the recent Reinhart and Rogoff debacle to list “six ways to separate lies from statistics” in economics research: 1. “Focus on how robust a finding is, meaning that different ways of looking at the evidence point to the same conclusion.” 2. Don’t confuse statistical with practical significance. 3. “Be wary of scholars using high-powered statistical techniques as a bludgeon to silence critics who are not specialists.” 4. “Don’t fall into the trap of thinking about an empirical finding as ‘right’ or ‘wrong.’ At best, data provide an imperfect guide.” 5. “Don’t mistake correlation for causation.” 6. “Always ask ‘so what?’” I like all these points, especially #4, which I think doesn’t get said enough. As I wrote a few months ago, high-profile social science research aims for proof, not for understanding—and that’s a problem. My addition to the list If you compare my title above to that of Stevenson
3 0.82492661 1525 andrew gelman stats-2012-10-08-Ethical standards in different data communities
Introduction: I opened the paper today and saw this from Paul Krugman, on Jack Welch, the former chairman of General Electric, who posted an assertion on Twitter that the [recent unemployment data] had been cooked to help President Obama’s re-election campaign. His claim was quickly picked up by right-wing pundits and media personalities. It was nonsense, of course. Job numbers are prepared by professional civil servants, at an agency that currently has no political appointees. But then maybe Mr. Welch — under whose leadership G.E. reported remarkably smooth earnings growth, with none of the short-term fluctuations you might have expected (fluctuations that reappeared under his successor) — doesn’t know how hard it would be to cook the jobs data. I was curious so I googled *general electric historical earnings*. It was surprisingly difficult to find the numbers! Most of the links just went back to 2011, or to 2008. Eventually I came across this blog by Barry Ritholtz that showed this
4 0.82205027 989 andrew gelman stats-2011-11-03-This post does not mention Wegman
Introduction: A correspondent writes: Since you have commented on scientific fraud a lot, I wanted to give you an update on the Diederik Stapel case. I’d rather not see my name on the blog if you would elaborate on this any further. It is long but worth the read I guess. I’ll first give you the horrible details which will fill you with a mixture of horror and stupefied amazement at Stapel’s behavior. Then I’ll share Stapel’s abject apology, which might make you feel sorry for the guy. First the amazing story of how he perpetrated the fraud: There has been an interim report delivered to the rector of Tilburg University. Tilburg University is cooperating with the university of Amsterdam and of Groningen in this case. The results are pretty severe, I provide here a quick and literal translation of some comments by the chairman of the investigation committee. This report is publicly available on the university webpage (along with some other things of interest) but in Dutch: What
5 0.82034755 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing
Introduction: Several months ago, Sam Behseta, the new editor of Chance magazine, asked me if I’d like to have a column. I said yes, I’d like to write on ethics and statistics. My first column was called “Open Data and Open Methods” and I discussed the ethical obligation to share data and make our computations transparent wherever possible. In my column, I recounted a story from a bit over 20 years ago when I noticed a problem in a published analysis (involving electromagnetic fields and calcium flow in chicken brains) and contacted the researcher in charge of the study, who would not share his data with me. Two of the people from that research team—biologist Carl Blackman and statistician Dennis House—saw my Chance column and felt that I had misrepresented the situation and had criticized them unfairly. Blackman and House expressed their concerns in letters to the editor which were just published, along with my reply, in the latest issue of Chance. Seeing as I posted my article here, I
7 0.80960631 242 andrew gelman stats-2010-08-29-The Subtle Micro-Effects of Peacekeeping
8 0.79996866 544 andrew gelman stats-2011-01-29-Splitting the data
9 0.79745024 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want
10 0.7869401 946 andrew gelman stats-2011-10-07-Analysis of Power Law of Participation
11 0.78411365 192 andrew gelman stats-2010-08-08-Turning pages into data
14 0.7764042 2309 andrew gelman stats-2014-04-28-Crowdstorming a dataset
15 0.76910979 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”
16 0.76730311 1844 andrew gelman stats-2013-05-06-Against optimism about social science
17 0.76713127 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?
18 0.7657513 1640 andrew gelman stats-2012-12-26-What do people do wrong? WSJ columnist is looking for examples!
19 0.76544029 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data
20 0.76434952 1449 andrew gelman stats-2012-08-08-Gregor Mendel’s suspicious data
topicId topicWeight
[(13, 0.012), (15, 0.043), (16, 0.089), (24, 0.096), (53, 0.015), (57, 0.012), (68, 0.023), (72, 0.015), (77, 0.014), (82, 0.021), (91, 0.185), (96, 0.025), (99, 0.266)]
simIndex simValue blogId blogTitle
1 0.96284086 637 andrew gelman stats-2011-03-29-Unfinished business
Introduction: This blog by J. Robert Lennon on abandoned novels made me think of the more general topic of abandoned projects. I seem to recall George V. Higgins writing that he’d written and discarded 14 novels or so before publishing The Friends of Eddie Coyle. I haven’t abandoned any novels but I’ve abandoned lots of research projects (and also have started various projects that there’s no way I’ll finish). If you think about the decisions involved, it really has to be that way. You learn while you’re working on a project whether it’s worth continuing. Sometimes I’ve put in the hard work and pushed a project to completion, published the article, and then I think . . . what was the point? The modal number of citations of our articles is zero, etc.
2 0.92987072 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision
Introduction: When I posted this link to Dean Foster’s rants, some commenters pointed out this linked claim by famed statistician/provocateur Bjorn Lomborg: If [writes Lomborg] you reduce your child’s intake of fruits and vegetables by just 0.03 grams a day (that’s the equivalent of half a grain of rice) when you opt for more expensive organic produce, the total risk of cancer goes up, not down. Omit buying just one apple every 20 years because you have gone organic, and your child is worse off. Let’s unpack Lomborg’s claim. I don’t know anything about the science of pesticides and cancer, but can he really be so sure that the effects are so small as to be comparable to the health effects of eating “just one apple every 20 years”? I can’t believe you could estimate effects to anything like that precision. I can’t believe anyone has such a precise estimate of the health effects of pesticides, and also I can’t believe anyone has such a precise estimate of the health effect of eating an app
3 0.92262298 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct
Introduction: Stan: open-source Bayesian inference Speaker: Andrew Gelman, Columbia University Date: Thursday, October 11 2012 Time: 4:00PM to 5:00PM Location: 32-D507 Host: Polina Golland, CSAIL Contact: Polina Golland, 6172538005, polina@csail.mit.edu Stan (mc-stan.org) is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. We discuss how Stan works and what it can do, the problems that motivated us to write Stan, current challenges, and areas of planned development, including tools for improved generality and usability, more efficient sampling algorithms, and fuller integration of model building, model checking, and model understanding in Bayesian data analysis. P.S. Here’s the talk.
4 0.91376948 920 andrew gelman stats-2011-09-22-Top 10 blog obsessions
Introduction: I was just thinking about this because we seem to be circling around the same few topics over and over (while occasionally slipping in some new statistical ideas): 10. Wegman 9. Hipmunk 8. Dennis the dentist 7. Freakonomics 6. The difference between significant and non-significant is not itself statistically significant 5. Just use a hierarchical model already! 4. Innumerate journalists who think that presidential elections are just like high school 3. A graph can be pretty but convey essentially no information 2. Stan is coming 1. Clippy! Did I miss anything important?
same-blog 5 0.91346407 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?
Introduction: Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. He is now claiming the national security response: . . . disclosing the reputational data would violate the terms on which the evaluators agreed to complete the surveys (did they even bother to read the description of the methodology, one wonders?). I [Helzner] do not find this to be a compelling reply in this case. In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. Experimental designs must be open so that others can run the experiment. Mathematical proofs must be open so that they can be reviewed by others. Likewise, it seems to me that the details of statistical argument should be open to inspection. Do you have any thoughts on this? Or do you know of any other leading statistici
6 0.90624189 53 andrew gelman stats-2010-05-26-Tumors, on the left, or on the right?
7 0.8945244 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”
8 0.87142158 48 andrew gelman stats-2010-05-23-The bane of many causes
9 0.86814654 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0
10 0.86776233 2296 andrew gelman stats-2014-04-19-Index or indicator variables
11 0.85597563 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics
12 0.84609532 1533 andrew gelman stats-2012-10-14-If x is correlated with y, then y is correlated with x
13 0.84589159 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys
14 0.84378016 2114 andrew gelman stats-2013-11-26-“Please make fun of this claim”
15 0.84003717 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research
16 0.83664578 2227 andrew gelman stats-2014-02-27-“What Can we Learn from the Many Labs Replication Project?”
17 0.83622903 2137 andrew gelman stats-2013-12-17-Replication backlash
19 0.83490145 2007 andrew gelman stats-2013-09-03-Popper and Jaynes