andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-751 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: At the time of our last discussion, Edward Wegman, a statistics professor who has also worked for government research agencies, had been involved in three cases of plagiarism: a report for the U.S. Congress on climate models, a paper on social networks, a paper on color graphics. Each of the plagiarism stories was slightly different: the congressional report involved the distorted copying of research by a scientist (Raymond Bradley) whose conclusions Wegman disagreed with, the social networks paper included copied material in its background section, and the color graphics paper included various bits and pieces by others that had been used in old lecture notes. Since then, blogger Deep Climate has uncovered another plagiarized article by Wegman, this time an article in a 2005 volume on data mining and data visualization. Deep Climate writes, “certain sections of Statistical Data Mining rely heavily on lightly edited portions on lectures from Wegman’s statistical data mining c
sentIndex sentText sentNum sentScore
1 At the time of our last discussion, Edward Wegman, a statistics professor who has also worked for government research agencies, had been involved in three cases of plagiarism: a report for the U. [sent-1, score-0.215]
2 Congress on climate models, a paper on social networks, a paper on color graphics. [sent-3, score-0.262]
3 Since then, blogger Deep Climate has uncovered another plagiarized article by Wegman, this time an article in a 2005 volume on data mining and data visualization. [sent-5, score-0.358]
4 Deep Climate writes, “certain sections of Statistical Data Mining rely heavily on lightly edited portions on lectures from Wegman’s statistical data mining course at GMU. [sent-6, score-0.361]
5 John Mashey wrote to me: See here for a 12-pager that highlights, not the plagiarism in the Wegman report, but some of the easiest-to-see problems that I think rise to falsification/fabrication, always much harder to explain, especially to a general audience. [sent-14, score-0.382]
6 A chunk of it started with DC’s analysis of problems with the way they hacked Bradley’s tree-rings, and applies another visual display style to bring that out and explain to a broader audience. [sent-16, score-0.196]
7 I suspect this goes beyond the laziness of typical plagiarism, but when one finds a mass of plagiarized text, the embedded changes leap out. [sent-17, score-0.33]
8 I’ve noticed this in other academic settings–not usually involving plagiarism, but a setting where researcher A disagrees with researcher B, but instead of citing and quoting B’s work, A will simply vaguely refer to what B is doing and then disparage it, without even the courtesy of a citation. [sent-21, score-0.078]
9 If you want to disagree with someone, I think it’s best to directly explain what you’re disagreeing with! [sent-23, score-0.175]
10 I’ve read many an expert disagreement in various literatures, but I’ve always thought the paradigms were: a) Expert A offers data, analysis and conclusions. [sent-25, score-0.152]
11 b) Expert B cites a) carefully, then says: - The data was confounded, or was wrong, or there is new data. [sent-26, score-0.145]
12 - The data and analysis are OK, but the conclusions go beyond the data OR - My model and analysis does a better job of accounting for the data. [sent-28, score-0.409]
13 The other that confuses people no end, are 1000-year temperature reconstructions, all valiantly trying to extract signal from noise, and non-experts see differences and assume they disagree. [sent-31, score-0.244]
14 This is the only case I’ve ever seen like this, especially in such a high-profile report. [sent-38, score-0.082]
15 My [Mashey's] hypothesis is: - Wegman & Said had zero expertise in paleoclimate and were totally incapable of citing Bradley and arguing with him. [sent-39, score-0.244]
16 - So they plagiarized to simulate expertise, I conjecture, figuring that experts would skip over it, especially seeing cites of Bradley tables beforehand (albeit with ludicrous errors inserted in copying them) and a vague Bradley cite at end. [sent-41, score-0.5]
17 - They couldn’t just quote the relevant sections, because they had the “wrong” answers, and anyone who actually reads Ray’s book knows that it is mostly about the techniques for extracting signal from noise and coping with the various confounders. [sent-42, score-0.183]
18 - They couldn’t quote some sections and then change the conclusions without being totally obvious. [sent-43, score-0.262]
19 And, while I’ve done work in climate reconstruction, social networks, and color graphics, I don’t consider any of these to be my primary areas of expertise. [sent-48, score-0.262]
20 My main explanation of plagiarism is that it’s laziness, the desire to simulate expertise or creativity where there is none. [sent-50, score-0.623]
wordName wordTfidf (topN-words)
[('wegman', 0.395), ('bradley', 0.372), ('plagiarism', 0.3), ('climate', 0.166), ('laziness', 0.147), ('mashey', 0.133), ('plagiarized', 0.121), ('mining', 0.115), ('sections', 0.11), ('disagree', 0.107), ('networks', 0.098), ('color', 0.096), ('expertise', 0.095), ('deep', 0.092), ('steal', 0.087), ('cites', 0.084), ('goodwin', 0.083), ('edward', 0.082), ('especially', 0.082), ('conclusions', 0.081), ('simulate', 0.081), ('expert', 0.08), ('report', 0.079), ('citing', 0.078), ('copying', 0.077), ('couldn', 0.077), ('lectures', 0.075), ('involved', 0.074), ('analysis', 0.072), ('totally', 0.071), ('scholar', 0.071), ('temperature', 0.069), ('explain', 0.068), ('signal', 0.068), ('noise', 0.062), ('beyond', 0.062), ('cases', 0.062), ('data', 0.061), ('ve', 0.057), ('hemisphere', 0.056), ('hacked', 0.056), ('christy', 0.056), ('everitt', 0.056), ('latitude', 0.056), ('literatures', 0.056), ('spencer', 0.056), ('experts', 0.055), ('included', 0.054), ('reconstructions', 0.053), ('coping', 0.053)]
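The word list above has the shape of a Python list literal of (word, weight) tuples, so it can be read back with the standard library. A minimal sketch, assuming the line is available as a string: `raw` is a shortened copy of the first four pairs, and `top_words` is a hypothetical helper name that does not come from the source.

```python
import ast

# Shortened copy of the first four pairs from the list above (illustration only).
raw = "[('wegman', 0.395), ('bradley', 0.372), ('plagiarism', 0.3), ('climate', 0.166)]"

def top_words(literal, n=3):
    """Parse a list-of-tuples literal and return the n highest-weighted words.

    `top_words` is a hypothetical helper, not part of the original tool.
    """
    pairs = ast.literal_eval(literal)  # evaluates Python literals only, no arbitrary code
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]

print(top_words(raw))
```

`ast.literal_eval` is used here rather than `eval` because it only accepts literals, which seems the safer choice for text scraped from a web page.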
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 751 andrew gelman stats-2011-06-08-Another Wegman plagiarism
2 0.45956504 728 andrew gelman stats-2011-05-24-A (not quite) grand unified theory of plagiarism, as applied to the Wegman case
Introduction: A common reason for plagiarism is laziness: you want credit for doing something but you don’t really feel like doing it–maybe you’d rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you. Interestingly enough, we see that in many defenses of plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn’t credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work. As I wrote last year, I like to think that directness and openness is a virtue in scientific writing. For example, clearly citing the works we draw from, even when such citing of secondary sources might make us appear less erudite. But I can see how some scholars might feel a pressure to cover their traces. Wegman Which brings us to Ed Wegman, whose defense of plagiari
Introduction: As regular readers of this blog are aware, I am fascinated by academic and scientific cheating and the excuses people give for it. Bruno Frey and colleagues published a single article (with only minor variants) in five different major journals, and these articles did not cite each other. And there have been several other cases of his self-plagiarism (see this review from Olaf Storbeck). I do not mind the general practice of repeating oneself for different audiences—in the social sciences, we call this Arrow’s Theorem —but in this case Frey seems to have gone a bit too far. Blogger Economic Logic has looked into this and concluded that this sort of common practice is standard in “the context of the German(-speaking) academic environment,” and what sets Frey apart is not his self-plagiarism or even his brazenness but rather his practice of doing it in high-visibility journals. Economic Logic writes that “[Frey's] contribution is pedagogical, he found a good and interesting
4 0.3131206 766 andrew gelman stats-2011-06-14-Last Wegman post (for now)
Introduction: John Mashey points me to a news article by Eli Kintisch with the following wonderful quote: Will Happer, a physicist at Princeton University who questions the consensus view on climate, thinks Mashey is a destructive force who uses “totalitarian tactics”–publishing damaging documents online, without peer review–to carry out personal vendettas. I’ve never thought of uploading files as “totalitarian” but maybe they do things differently at Princeton. I actually think of totalitarians as acting secretly–denunciations without evidence, midnight arrests, trials in undisclosed locations, and so forth. Mashey’s practice of putting everything out in the open seems to me the opposite of totalitarian. The article also reports that Edward Wegman’s lawyer said that Wegman “has never engaged in plagiarism.” If I were the lawyer, I’d be pretty mad at Wegman at this point. I can just imagine the conversation: Lawyer: You never told me about that 2005 paper where you stole from Bria
5 0.23867853 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime
Introduction: I’ve been blogging a lot lately about plagiarism (sorry, Bob!), and one thing that’s been bugging me is, why does it bother me so much. Part of the story is simple: much of my reputation comes from the words I write, so I bristle at any attempt to devalue words. I feel the same way about plagiarism that a rich person would feel about counterfeiting: Don’t debase my currency! But it’s more than that. After discussing this a bit with Thomas Basbøll, I realized that I’m bothered by the way that plagiarism interferes with the transmission of information: Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work. In statisti
6 0.22402021 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
8 0.20667058 722 andrew gelman stats-2011-05-20-Why no Wegmania?
9 0.17701563 945 andrew gelman stats-2011-10-06-W’man < W’pedia, again
10 0.17413723 2234 andrew gelman stats-2014-03-05-Plagiarism, Arizona style
11 0.1681294 1568 andrew gelman stats-2012-11-07-That last satisfaction at the end of the career
12 0.1605991 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist
13 0.16006666 345 andrew gelman stats-2010-10-15-Things we do on sabbatical instead of actually working
14 0.15297838 902 andrew gelman stats-2011-09-12-The importance of style in academic writing
15 0.145799 1324 andrew gelman stats-2012-05-16-Wikipedia author confronts Ed Wegman
16 0.1417474 1435 andrew gelman stats-2012-07-30-Retracted articles and unethical behavior in economics journals?
17 0.13976023 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring
18 0.12939197 1440 andrew gelman stats-2012-08-02-“A Christmas Carol” as applied to plagiarism
19 0.11699979 2334 andrew gelman stats-2014-05-14-“The subtle funk of just a little poultry offal”
topicId topicWeight
[(0, 0.205), (1, -0.065), (2, -0.075), (3, -0.023), (4, -0.006), (5, -0.038), (6, 0.023), (7, -0.029), (8, 0.018), (9, 0.02), (10, 0.048), (11, -0.013), (12, -0.031), (13, -0.001), (14, -0.039), (15, -0.032), (16, 0.042), (17, -0.022), (18, 0.133), (19, -0.082), (20, -0.031), (21, -0.007), (22, -0.034), (23, 0.01), (24, 0.021), (25, -0.089), (26, -0.039), (27, -0.1), (28, -0.024), (29, 0.006), (30, 0.135), (31, 0.157), (32, -0.024), (33, 0.098), (34, 0.067), (35, 0.103), (36, -0.067), (37, -0.182), (38, 0.109), (39, 0.04), (40, -0.097), (41, 0.03), (42, -0.025), (43, -0.066), (44, -0.081), (45, -0.032), (46, -0.019), (47, -0.0), (48, -0.002), (49, -0.054)]
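The source does not say how the similarity values in the following list are computed from topic-weight vectors like the one above. Cosine similarity is one common choice for comparing such vectors, and the sketch below is written under that assumption only. The function name `cosine` and the dict-based sparse representation are illustrative, not taken from the original tool.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {topicId: weight}.

    Assumption: the source does not state which similarity measure it uses.
    """
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty or all-zero vectors
    return dot / (norm_a * norm_b)

# First three (topicId, weight) pairs of the vector above.
this_post = dict([(0, 0.205), (1, -0.065), (2, -0.075)])

# A rescaled copy points in the same direction, so the similarity stays at about 1.
doubled = {topic: 2.0 * weight for topic, weight in this_post.items()}

print(cosine(this_post, this_post))
print(cosine(this_post, doubled))
```

Under this measure only the direction of a vector matters, not its length, which is why the rescaled copy scores essentially the same as the original.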
simIndex simValue blogId blogTitle
same-blog 1 0.92951494 751 andrew gelman stats-2011-06-08-Another Wegman plagiarism
2 0.90283293 766 andrew gelman stats-2011-06-14-Last Wegman post (for now)
Introduction: John Mashey points me to a news article by Eli Kintisch with the following wonderful quote: Will Happer, a physicist at Princeton University who questions the consensus view on climate, thinks Mashey is a destructive force who uses “totalitarian tactics”–publishing damaging documents online, without peer review–to carry out personal vendettas. I’ve never thought of uploading files as “totalitarian” but maybe they do things differently at Princeton. I actually think of totalitarians as acting secretly–denunciations without evidence, midnight arrests, trials in undisclosed locations, and so forth. Mashey’s practice of putting everything out in the open seems to me the opposite of totalitarian. The article also reports that Edward Wegman’s lawyer said that Wegman “has never engaged in plagiarism.” If I were the lawyer, I’d be pretty mad at Wegman at this point. I can just imagine the conversation: Lawyer: You never told me about that 2005 paper where you stole from Bria
Introduction: A common reason for plagiarism is laziness: you want credit for doing something but you don’t really feel like doing it–maybe you’d rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you. Interestingly enough, we see that in many defenses of plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn’t credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work. As I wrote last year, I like to think that directness and openness is a virtue in scientific writing. For example, clearly citing the works we draw from, even when such citing of secondary sources might make us appear less erudite. But I can see how some scholars might feel a pressure to cover their traces. Wegman Which brings us to Ed Wegman, whose defense of plagiari
4 0.82134438 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime
Introduction: I’ve been blogging a lot lately about plagiarism (sorry, Bob!), and one thing that’s been bugging me is, why does it bother me so much. Part of the story is simple: much of my reputation comes from the words I write, so I bristle at any attempt to devalue words. I feel the same way about plagiarism that a rich person would feel about counterfeiting: Don’t debase my currency! But it’s more than that. After discussing this a bit with Thomas Basbøll, I realized that I’m bothered by the way that plagiarism interferes with the transmission of information: Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work. In statisti
5 0.81062764 722 andrew gelman stats-2011-05-20-Why no Wegmania?
Introduction: A colleague asks: When I search the web, I find the story [of the article by Said, Wegman, et al. on social networks in climate research, which was recently bumped from the journal Computational Statistics and Data Analysis because of plagiarism] only on blogs, USA Today, and UPI. Why is that? Any idea why it isn’t reported by any of the major newspapers? Here’s my answer: 1. USA Today broke the story. Apparently this USA Today reporter put a lot of effort into it. The NYT doesn’t like to run a story that begins, “Yesterday, USA Today reported…” 2. To us it’s big news because we’re statisticians. [The main guy in the study, Edward Wegman, won the Founders Award from the American Statistical Association a few years ago.] To the rest of the world, the story is: “Obscure prof at an obscure college plagiarized an article in a journal that nobody’s ever heard of.” When a Harvard scientist paints black dots on white mice and says he’s curing cancer, that’s news. When P
6 0.78902358 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist
7 0.78752869 1568 andrew gelman stats-2012-11-07-That last satisfaction at the end of the career
9 0.76517469 1236 andrew gelman stats-2012-03-29-Resolution of Diederik Stapel case
10 0.76285195 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring
11 0.75571996 1324 andrew gelman stats-2012-05-16-Wikipedia author confronts Ed Wegman
12 0.74395168 345 andrew gelman stats-2010-10-15-Things we do on sabbatical instead of actually working
13 0.72209007 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
14 0.68234819 1442 andrew gelman stats-2012-08-03-Double standard? Plagiarizing journos get slammed, plagiarizing profs just shrug it off
15 0.6563409 755 andrew gelman stats-2011-06-09-Recently in the award-winning sister blog
17 0.65537643 2234 andrew gelman stats-2014-03-05-Plagiarism, Arizona style
18 0.63684565 2334 andrew gelman stats-2014-05-14-“The subtle funk of just a little poultry offal”
19 0.63287944 913 andrew gelman stats-2011-09-16-Groundhog day in August?
topicId topicWeight
[(15, 0.012), (16, 0.056), (21, 0.043), (24, 0.102), (25, 0.013), (27, 0.105), (42, 0.011), (45, 0.053), (47, 0.02), (52, 0.011), (53, 0.01), (59, 0.033), (82, 0.016), (86, 0.03), (89, 0.018), (90, 0.011), (95, 0.017), (99, 0.289)]
simIndex simValue blogId blogTitle
1 0.97222209 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?
Introduction: This entry was posted by Phil Price. A colleague is looking at data on car (and SUV and light truck) collisions and casualties. He’s interested in causal relationships. For instance, suppose car manufacturers try to improve gas mileage without decreasing acceleration. The most likely way they will do that is to make cars lighter. But perhaps lighter cars are more dangerous; how many more people will die for each mpg increase in gas mileage? There are a few different data sources, all of them seriously deficient from the standpoint of answering this question. Deaths are very well reported, so if someone dies in an auto accident you can find out what kind of car they were in, what other kinds of cars (if any) were involved in the accident, whether the person was a driver or passenger, and so on. But it’s hard to normalize: OK, I know that N people who were passengers in a particular model of car died in car accidents last year, but I don’t know how many passenger-miles that
2 0.9681142 173 andrew gelman stats-2010-07-31-Editing and clutch hitting
Introduction: Regarding editing : The only serious editing I’ve ever received has been for my New York Times op-eds and my article in the American Scientist. My book editors have all been nice people, and they’ve helped me with many things (including suggestions of what my priorities should be in communicating with readers)–they’ve been great–but they’ve not given (nor have I expected or asked for) serious editing. Maybe I should’ve asked for it, I don’t know. I’ve had time-wasting experiences with copy editors and a particularly annoying experience with a production editor (who was so difficult that my coauthors and I actually contacted our agent and a lawyer about the possibility of getting out of our contract), but that’s another story. Regarding clutch hitting , Bill James once noted that it’s great when a Bucky Dent hits an unexpected home run, but what’s really special is being able to get the big hit when it’s expected of you. The best players can do their best every time they come t
3 0.96519005 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge
Introduction: i received the following press release from the Heritage Provider Network, “the largest limited Knox-Keene licensed managed care organization in California.” I have no idea what this means, but I assume it’s some sort of HMO. In any case, this looks like it could be interesting: Participants in the Health Prize challenge will be given a data set comprised of the de-identified medical records of 100,000 individuals who are members of HPN. The teams will then need to predict the hospitalization of a set percentage of those members who went to the hospital during the year following the start date, and do so with a defined accuracy rate. The winners will receive the $3 million prize. . . . the contest is designed to spur involvement by others involved in analytics, such as those involved in data mining and predictive modeling who may not currently be working in health care. “We believe that doing so will bring innovative thinking to health analytics and may allow us to solve at
4 0.96466339 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”
Introduction: John Keltz writes: What do you think about curved lines connecting discrete data-points? (For example, here .) The problem with the smoothed graph is it seems to imply that something is going on in between the discrete data points, which is false. However, the straight-line version isn’t representing actual events either- it is just helping the eye connect each point. So maybe the curved version is also just helping the eye connect each point, and looks better doing it. In my own work (value-added modeling of achievement test scores) I use straight lines, but I guess I am not too bothered when people use smoothing. I’d appreciate your input. Regular readers will be unsurprised that, yes, I have an opinion on this one, and that this opinion is connected to some more general ideas about statistical graphics. In general I’m not a fan of the curved lines. They’re ok, but I don’t really see the point. I can connect the dots just fine without the curves. The more general id
Introduction: In politics, as in baseball, hot prospects from the minors can have trouble handling big-league pitching. Right after Sarah Palin was chosen as the Republican nominee for vice president in 2008, my friend Ubs, who grew up in Alaska and follows politics closely, wrote the following : Palin would probably be a pretty good president. . . . She is fantastically popular. Her percentage approval ratings have reached the 90s. Even now, with a minor nepotism scandal going on, she’s still about 80%. . . . How does one do that? You might get 60% or 70% who are rabidly enthusiastic in their love and support, but you’re also going to get a solid core of opposition who hate you with nearly as much passion. The way you get to 90% is by being boringly competent while remaining inoffensive to people all across the political spectrum. Ubs gives a long discussion of Alaska’s unique politics and then writes: Palin’s magic formula for success has been simply to ignore partisan crap and get
6 0.96314549 343 andrew gelman stats-2010-10-15-?
same-blog 9 0.95566744 751 andrew gelman stats-2011-06-08-Another Wegman plagiarism
10 0.95412147 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
11 0.95240206 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing
12 0.95104259 804 andrew gelman stats-2011-07-15-Static sensitivity analysis
13 0.94915253 1982 andrew gelman stats-2013-08-15-Blaming scientific fraud on the Kuhnians
14 0.94756335 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities
15 0.94703829 802 andrew gelman stats-2011-07-13-Super Sam Fuld Needs Your Help (with Foul Ball stats)
16 0.94423163 120 andrew gelman stats-2010-06-30-You can’t put Pandora back in the box
17 0.94326997 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?
18 0.94149697 2079 andrew gelman stats-2013-10-27-Uncompressing the concept of compressed sensing
19 0.93890858 2177 andrew gelman stats-2014-01-19-“The British amateur who debunked the mathematics of happiness”
20 0.93668258 1518 andrew gelman stats-2012-10-02-Fighting a losing battle