1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

meta info for this blog

Source: html

Introduction: Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. He is now claiming the national security response: . . . disclosing the reputational data would violate the terms on which the evaluators agreed to complete the surveys (did they even bother to read the description of the methodology, one wonders?). I [Helzner] do not find this to be a compelling reply in this case. In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. Experimental designs must be open so that others can run the experiment. Mathematical proofs must be open so that they can be reviewed by others. Likewise, it seems to me that the details of statistical argument should be open to inspection. Do you have any thoughts on this? Or do you know of any other leading statistici

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. [sent-1, score-0.801]

2 disclosing the reputational data would violate the terms on which the evaluators agreed to complete the surveys (did they even bother to read the description of the methodology, one wonders?). [sent-5, score-0.336]

3 In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. [sent-8, score-0.497]

4 Experimental designs must be open so that others can run the experiment. [sent-9, score-0.576]

5 Mathematical proofs must be open so that they can be reviewed by others. [sent-10, score-0.474]

6 Likewise, it seems to me that the details of statistical argument should be open to inspection. [sent-11, score-0.418]

7 Or do you know of any other leading statisticians who might have a line on this issue, something that we could offer in response to this guy’s reluctance to share his data? [sent-13, score-0.384]

8 In fact, I would say that when such data cannot be disclosed it reveals a flaw in the design of the survey. [sent-15, score-0.497]

9 Experimental designs must be open so that others can run the experiment. [sent-16, score-0.576]

10 Mathematical proofs must be open so that they can be reviewed by others. [sent-17, score-0.474]

11 Likewise, it seems to me that the details of statistical argument should be open to inspection. [sent-18, score-0.418]

12 Or do you know of any other leading statisticians who might have a line on this issue, something that we could offer in response to this guy’s reluctance to share his data? [sent-20, score-0.384]

13 Jeff also points to this long discussion among philosophers regarding the practicality of data sharing in this context. [sent-21, score-0.285]

14 Here’s a relevant bit from my article (which tells the story of a team of researchers at a Federal lab who declined to share with me their data from an animal experiment, several years ago): If you really believe your results, you should want your data out in the open. [sent-22, score-0.412]

15 Here’s a page all about his “research on various aspects of statistical disclosure limitation, including assessing risk and utility, synthetic data methods, remote access servers, and secure analyses of distributed data.” [sent-30, score-0.267]

16 I agree with Jeff that data should be disclosed, designs should be open so that others can run the experiment, and the details of statistical argument should be open to inspection. [sent-34, score-1.06]

17 That said, if his study is a private effort, Leiter has no obligation to share any of the above, and as some people wrote in the above-linked comment thread, the most obvious ways of releasing data could destroy some of the confidentiality. [sent-35, score-0.378]

18 If he chooses not to release any of his raw data in any form, then I think it’s appropriate to interpret his claims with skepticism. [sent-38, score-0.252]

19 I have an agreement to give Professor Healy the most recent data as well, as soon as we can retain an RA to assist in the anonymization. [sent-65, score-0.226]

20 If there is interest in sharing the data with others, perhaps Kieran Healy could speak with Jerry Reiter (also at Duke) about how best to do so. [sent-66, score-0.227]

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('leiter', 0.473), ('helzner', 0.338), ('open', 0.218), ('jeff', 0.19), ('disclosed', 0.174), ('healy', 0.152), ('data', 0.15), ('reiter', 0.135), ('juvenile', 0.123), ('designs', 0.117), ('releasing', 0.116), ('share', 0.112), ('kieran', 0.111), ('duke', 0.104), ('raw', 0.102), ('likewise', 0.099), ('confidentiality', 0.099), ('reluctance', 0.097), ('jerry', 0.097), ('proofs', 0.095), ('flaw', 0.091), ('others', 0.087), ('must', 0.084), ('reveals', 0.082), ('compelling', 0.081), ('reviewed', 0.077), ('sharing', 0.077), ('agreement', 0.076), ('guy', 0.075), ('details', 0.072), ('thoughts', 0.071), ('run', 0.07), ('argument', 0.069), ('philosophy', 0.065), ('response', 0.062), ('surveys', 0.062), ('insults', 0.062), ('snark', 0.062), ('disclosing', 0.062), ('confined', 0.062), ('ra', 0.062), ('reputational', 0.062), ('statistical', 0.059), ('synthetic', 0.058), ('hmmmm', 0.058), ('practicality', 0.058), ('experimental', 0.057), ('leading', 0.057), ('offer', 0.056), ('experiment', 0.056)]
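
The page does not say how the word weights above were produced. As a rough illustration only, here is a minimal sketch of one common way to get such a top-N list: term frequency times a smoothed inverse document frequency, scaled to unit length. The function name `tfidf_top_words`, the whitespace tokenizer, and the particular smoothing are assumptions made for this sketch, not the site's actual pipeline.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Hypothetical helper: weights are tf * smoothed idf, L2-normalized,
    which is one common convention and not necessarily the one used here.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Raw term counts for the document of interest.
    term_freq = Counter(tokenized[doc_index])

    # Smoothed idf keeps words that occur in every document above zero.
    weights = {
        word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
        for word, count in term_freq.items()
    }

    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return []

    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, round(w / norm, 3)) for word, w in ranked[:top_n]]


if __name__ == "__main__":
    toy_docs = [
        "leiter data open data",
        "open access journal",
        "data sharing tools",
    ]
    print(tfidf_top_words(toy_docs, 0))
```

Words that are frequent in this post but rare across the rest of the corpus (here 'leiter', 'helzner') would get the largest weights under a scheme of this kind, which matches the shape of the list above.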

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

2 0.13117485 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

Introduction: David Karger writes: Your recent post on sharing data was of great interest to me, as my own research in computer science asks how to incentivize and lower barriers to data sharing. I was particularly curious about your highlighting of effort as the major disincentive to sharing. I would love to hear more, as this question of effort is one we specifically target in our development of tools for data authoring and publishing. As a straw man, let me point out that sharing data technically requires no more than posting an Excel spreadsheet online. And that you likely already produced that spreadsheet during your own analytic work. So, in what way does such low-tech publishing fail to meet your data sharing objectives? Our own hypothesis has been that the effort is really quite low, with the problem being a lack of *immediate/tangible* benefits (as opposed to the long-term values you accurately describe). To attack this problem, we’re developing tools (and, since it appear

3 0.10584165 58 andrew gelman stats-2010-05-29-Stupid legal crap

Introduction: From the website of a journal where I published an article: In Springer journals you have the choice of publishing with or without open access. If you choose open access, your article will be freely available to everyone everywhere. In exchange for an open access fee of € 2000 / US $3000 you retain the copyright and your article will carry the Creative Commons License. Please make your choice below. Hmmm . . . pay $3000 so that an article that I wrote and gave to the journal for free can be accessed by others? Sounds like a good deal to me!

4 0.10133134 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing

Introduction: Several months ago, Sam Behseta, the new editor of Chance magazine, asked me if I’d like to have a column. I said yes, I’d like to write on ethics and statistics. My first column was called “Open Data and Open Methods” and I discussed the ethical obligation to share data and make our computations transparent wherever possible. In my column, I recounted a story from a bit over 20 years ago when I noticed a problem in a published analysis (involving electromagnetic fields and calcium flow in chicken brains) and contacted the researcher in charge of the study, who would not share his data with me. Two of the people from that research team—biologist Carl Blackman and statistician Dennis House—saw my Chance column and felt that I had misrepresented the situation and had criticized them unfairly. Blackman and House expressed their concerns in letters to the editor which were just published, along with my reply, in the latest issue of Chance . Seeing as I posted my article here, I

5 0.095494002 875 andrew gelman stats-2011-08-28-Better than Dennis the dentist or Laura the lawyer

Introduction: Kieran Healy points to Robin Mahfood, the CEO of the charity Food for the Poor. This really is pretty impressive: you see a lot of good first-name or last-name matches but not so many where the entire name forms a coherent and relevant phrase.

6 0.088788509 1054 andrew gelman stats-2011-12-12-More frustrations trying to replicate an analysis published in a reputable journal

7 0.084424756 1774 andrew gelman stats-2013-03-22-Likelihood Ratio ≠ 1 Journal

8 0.083666503 2235 andrew gelman stats-2014-03-06-How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?

9 0.082098663 1758 andrew gelman stats-2013-03-11-Yes, the decision to try (or not) to have a child can be made rationally

10 0.080231115 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!

11 0.079178065 911 andrew gelman stats-2011-09-15-More data tools worth using from Google

12 0.078356594 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

13 0.078029603 1719 andrew gelman stats-2013-02-11-Why waste time philosophizing?

14 0.077293061 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

15 0.075756162 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

16 0.07524541 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup

17 0.075039752 61 andrew gelman stats-2010-05-31-A data visualization manifesto

18 0.074547321 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty

19 0.074110501 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

20 0.074036345 544 andrew gelman stats-2011-01-29-Splitting the data
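
The similarity measure behind the simValue column is not stated on this page. A score of about 1 for the same post is consistent with cosine similarity between TF-IDF vectors, so the following is a minimal sketch under that assumption. The names `cosine_similarity` and `rank_similar` are hypothetical, and the vectors for posts 1447 and 58 in the usage example are made-up toy values; only the three weights for post 1212 come from the word list above.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as {word: weight}."""
    dot = sum(w * vec_b.get(word, 0.0) for word, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_id, vectors, top_n=20):
    """Return (similarity, post_id) pairs, most similar first.

    The query post is scored against itself as well, which is why it
    would appear first with a similarity of (about) 1.
    """
    query = vectors[query_id]
    scored = [
        (cosine_similarity(query, vec), post_id)
        for post_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], str(pair[1])))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy vectors: 1212 uses three weights from the list above; the others
    # are invented for illustration.
    toy_vectors = {
        1212: {"leiter": 0.473, "open": 0.218, "data": 0.15},
        1447: {"data": 0.3, "sharing": 0.4},
        58: {"open": 0.5, "access": 0.5},
    }
    for score, post_id in rank_similar(1212, toy_vectors):
        print(post_id, round(score, 4))
```

Under this reading, the LSI and LDA lists further down would use the same ranking step, only with topic-weight vectors in place of word weights; that too is an assumption rather than something the page confirms.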

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.178), (1, -0.034), (2, -0.036), (3, -0.022), (4, 0.004), (5, -0.0), (6, -0.038), (7, -0.015), (8, 0.001), (9, 0.004), (10, 0.002), (11, -0.024), (12, -0.004), (13, 0.011), (14, -0.02), (15, 0.035), (16, 0.01), (17, -0.021), (18, 0.032), (19, 0.01), (20, -0.02), (21, 0.013), (22, -0.021), (23, 0.011), (24, -0.051), (25, 0.005), (26, 0.028), (27, -0.003), (28, 0.016), (29, 0.022), (30, 0.015), (31, -0.018), (32, 0.014), (33, 0.015), (34, -0.013), (35, 0.033), (36, 0.029), (37, 0.021), (38, 0.009), (39, 0.043), (40, 0.05), (41, 0.009), (42, 0.019), (43, 0.005), (44, -0.016), (45, -0.021), (46, -0.014), (47, -0.031), (48, 0.009), (49, 0.011)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94875789 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

2 0.85436308 1835 andrew gelman stats-2013-05-02-7 ways to separate errors from statistics

Introduction: Betsey Stevenson and Justin Wolfers have been inspired by the recent Reinhart and Rogoff debacle to list “six ways to separate lies from statistics” in economics research: 1. “Focus on how robust a finding is, meaning that different ways of looking at the evidence point to the same conclusion.” 2. Don’t confuse statistical with practical significance. 3. “Be wary of scholars using high-powered statistical techniques as a bludgeon to silence critics who are not specialists.” 4. “Don’t fall into the trap of thinking about an empirical finding as ‘right’ or ‘wrong.’ At best, data provide an imperfect guide.” 5. “Don’t mistake correlation for causation.” 6. “Always ask ‘so what?’” I like all these points, especially #4, which I think doesn’t get said enough. As I wrote a few months ago, high-profile social science research aims for proof, not for understanding, and that’s a problem. My addition to the list If you compare my title above to that of Stevenson

3 0.82492661 1525 andrew gelman stats-2012-10-08-Ethical standards in different data communities

Introduction: I opened the paper today and saw this from Paul Krugman, on Jack Welch, the former chairman of General Electric, who posted an assertion on Twitter that the [recent unemployment data] had been cooked to help President Obama’s re-election campaign. His claim was quickly picked up by right-wing pundits and media personalities. It was nonsense, of course. Job numbers are prepared by professional civil servants, at an agency that currently has no political appointees. But then maybe Mr. Welch — under whose leadership G.E. reported remarkably smooth earnings growth, with none of the short-term fluctuations you might have expected (fluctuations that reappeared under his successor) — doesn’t know how hard it would be to cook the jobs data. I was curious so I googled *general electric historical earnings*. It was surprisingly difficult to find the numbers! Most of the links just went back to 2011, or to 2008. Eventually I came across this blog by Barry Ritholtz that showed this

4 0.82205027 989 andrew gelman stats-2011-11-03-This post does not mention Wegman

Introduction: A correspondent writes: Since you have commented on scientific fraud a lot, I wanted to give you an update on the Diederik Stapel case. I’d rather not see my name on the blog if you would elaborate on this any further. It is long but worth the read I guess. I’ll first give you the horrible details which will fill you with a mixture of horror and stupefied amazement at Stapel’s behavior. Then I’ll share Stapel’s abject apology, which might make you feel sorry for the guy. First the amazing story of how he perpetrated the fraud: There has been an interim report delivered to the rector of Tilburg University. Tilburg University is cooperating with the universities of Amsterdam and Groningen in this case. The results are pretty severe; I provide here a quick and literal translation of some comments by the chairman of the investigation committee. This report is publicly available on the university webpage (along with some other things of interest) but in Dutch: What

5 0.82034755 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing

6 0.8143329 1681 andrew gelman stats-2013-01-19-Participate in a short survey about the weight of evidence provided by statistics

7 0.80960631 242 andrew gelman stats-2010-08-29-The Subtle Micro-Effects of Peacekeeping

8 0.79996866 544 andrew gelman stats-2011-01-29-Splitting the data

9 0.79745024 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

10 0.7869401 946 andrew gelman stats-2011-10-07-Analysis of Power Law of Participation

11 0.78411365 192 andrew gelman stats-2010-08-08-Turning pages into data

12 0.7814793 1805 andrew gelman stats-2013-04-16-Memo to Reinhart and Rogoff: I think it’s best to admit your errors and go on from there

13 0.78140581 2355 andrew gelman stats-2014-05-31-Jessica Tracy and Alec Beall (authors of the fertile-women-wear-pink study) comment on our Garden of Forking Paths paper, and I comment on their comments

14 0.7764042 2309 andrew gelman stats-2014-04-28-Crowdstorming a dataset

15 0.76910979 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

16 0.76730311 1844 andrew gelman stats-2013-05-06-Against optimism about social science

17 0.76713127 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

18 0.7657513 1640 andrew gelman stats-2012-12-26-What do people do wrong? WSJ columnist is looking for examples!

19 0.76544029 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data

20 0.76434952 1449 andrew gelman stats-2012-08-08-Gregor Mendel’s suspicious data

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(13, 0.012), (15, 0.043), (16, 0.089), (24, 0.096), (53, 0.015), (57, 0.012), (68, 0.023), (72, 0.015), (77, 0.014), (82, 0.021), (91, 0.185), (96, 0.025), (99, 0.266)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96284086 637 andrew gelman stats-2011-03-29-Unfinished business

Introduction: This blog by J. Robert Lennon on abandoned novels made me think of the more general topic of abandoned projects. I seem to recall George V. Higgins writing that he’d written and discarded 14 novels or so before publishing The Friends of Eddie Coyle. I haven’t abandoned any novels but I’ve abandoned lots of research projects (and also have started various projects that there’s no way I’ll finish). If you think about the decisions involved, it really has to be that way. You learn while you’re working on a project whether it’s worth continuing. Sometimes I’ve put in the hard work and pushed a project to completion, published the article, and then I think . . . what was the point? The modal number of citations of our articles is zero, etc.

2 0.92987072 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

Introduction: When I posted this link to Dean Foster’s rants, some commenters pointed out this linked claim by famed statistician/provocateur Bjorn Lomborg: If [writes Lomborg] you reduce your child’s intake of fruits and vegetables by just 0.03 grams a day (that’s the equivalent of half a grain of rice) when you opt for more expensive organic produce, the total risk of cancer goes up, not down. Omit buying just one apple every 20 years because you have gone organic, and your child is worse off. Let’s unpack Lomborg’s claim. I don’t know anything about the science of pesticides and cancer, but can he really be so sure that the effects are so small as to be comparable to the health effects of eating “just one apple every 20 years”? I can’t believe you could estimate effects to anything like that precision. I can’t believe anyone has such a precise estimate of the health effects of pesticides, and also I can’t believe anyone has such a precise estimate of the health effect of eating an app

3 0.92262298 1528 andrew gelman stats-2012-10-10-My talk at MIT on Thurs 11 Oct

Introduction: Stan: open-source Bayesian inference Speaker: Andrew Gelman, Columbia University Date: Thursday, October 11 2012 Time: 4:00PM to 5:00PM Location: 32-D507 Host: Polina Golland, CSAIL Contact: Polina Golland, 6172538005, polina@csail.mit.edu Stan (mc-stan.org) is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. We discuss how Stan works and what it can do, the problems that motivated us to write Stan, current challenges, and areas of planned development, including tools for improved generality and usability, more efficient sampling algorithms, and fuller integration of model building, model checking, and model understanding in Bayesian data analysis. P.S. Here’s the talk.

4 0.91376948 920 andrew gelman stats-2011-09-22-Top 10 blog obsessions

Introduction: I was just thinking about this because we seem to be circling around the same few topics over and over (while occasionally slipping in some new statistical ideas): 10. Wegman 9. Hipmunk 8. Dennis the dentist 7. Freakonomics 6. The difference between significant and non-significant is not itself statistically significant 5. Just use a hierarchical model already! 4. Innumerate journalists who think that presidential elections are just like high school 3. A graph can be pretty but convey essentially no information 2. Stan is coming 1. Clippy! Did I miss anything important?

same-blog 5 0.91346407 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

6 0.90624189 53 andrew gelman stats-2010-05-26-Tumors, on the left, or on the right?

7 0.8945244 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”

8 0.87142158 48 andrew gelman stats-2010-05-23-The bane of many causes

9 0.86814654 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

10 0.86776233 2296 andrew gelman stats-2014-04-19-Index or indicator variables

11 0.85597563 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics

12 0.84609532 1533 andrew gelman stats-2012-10-14-If x is correlated with y, then y is correlated with x

13 0.84589159 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys

14 0.84378016 2114 andrew gelman stats-2013-11-26-“Please make fun of this claim”

15 0.84003717 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research

16 0.83664578 2227 andrew gelman stats-2014-02-27-“What Can we Learn from the Many Labs Replication Project?”

17 0.83622903 2137 andrew gelman stats-2013-12-17-Replication backlash

18 0.83593631 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

19 0.83490145 2007 andrew gelman stats-2013-09-03-Popper and Jaynes

20 0.83487648 1865 andrew gelman stats-2013-05-20-What happened that the journal Psychological Science published a paper with no identifiable strengths?