andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-722 knowledge-graph by maker-knowledge-mining
Introduction: A colleague asks: When I search the web, I find the story [of the article by Said, Wegman, et al. on social networks in climate research, which was recently bumped from the journal Computational Statistics and Data Analysis because of plagiarism] only on blogs, USA Today, and UPI. Why is that? Any idea why it isn’t reported by any of the major newspapers? Here’s my answer: 1. USA Today broke the story. Apparently this USA Today reporter put a lot of effort into it. The NYT doesn’t like to run a story that begins, “Yesterday, USA Today reported…” 2. To us it’s big news because we’re statisticians. [The main guy in the study, Edward Wegman, won the Founders Award from the American Statistical Association a few years ago.] To the rest of the world, the story is: “Obscure prof at an obscure college plagiarized an article in a journal that nobody’s ever heard of.” When a Harvard scientist paints black dots on white mice and says he’s curing cancer, that’s news. When P
sentIndex sentText sentNum sentScore
1 A colleague asks: When I search the web, I find the story [of the article by Said, Wegman, et al. [sent-1, score-0.237]
2 on social networks in climate research, which was recently bumped from the journal Computational Statistics and Data Analysis because of plagiarism] only on blogs, USA Today, and UPI. [sent-2, score-0.555]
3 Any idea why it isn’t reported by any of the major newspapers? [sent-4, score-0.123]
4 Apparently this USA Today reporter put a lot of effort into it. [sent-7, score-0.085]
5 The NYT doesn’t like to run a story that begins, “Yesterday, USA Today reported…” 2. [sent-8, score-0.155]
6 To us it’s big news because we’re statisticians. [sent-9, score-0.149]
7 ] To the rest of the world, the story is: “Obscure prof at an obscure college plagiarized an article in a journal that nobody’s ever heard of. [sent-11, score-0.701]
8 ” When a Harvard scientist paints black dots on white mice and says he’s curing cancer, that’s news. [sent-12, score-0.347]
9 Nobody retracts an article on social networks, that’s not so exciting. [sent-14, score-0.162]
10 I think it’s possible the story will develop further. [sent-16, score-0.155]
11 If these statisticians get accused of lying to Congress, that could hit the papers. [sent-17, score-0.271]
12 Basically, plagiarism is exciting to academics but not so thrilling to the general public if no celebrities are involved. [sent-18, score-0.445]
13 I expect someone at the Chronicle of Higher Education 3. [sent-19, score-0.079]
14 One more thing: newspapers like to report things that are clearly news: earthquakes, fires, elections, arrests, . [sent-20, score-0.193]
15 If criminal charges come up or if someone starts suing, then I could see the court events as a hook on which to hang a news story. [sent-23, score-0.83]
wordName wordTfidf (topN-words)
[('usa', 0.396), ('today', 0.22), ('newspapers', 0.193), ('wegman', 0.179), ('obscure', 0.177), ('plagiarism', 0.166), ('networks', 0.163), ('story', 0.155), ('news', 0.149), ('bumped', 0.14), ('fires', 0.132), ('paints', 0.132), ('earthquakes', 0.126), ('nobody', 0.125), ('reported', 0.123), ('chronicle', 0.122), ('arrests', 0.122), ('mice', 0.122), ('celebrities', 0.115), ('founders', 0.113), ('broke', 0.11), ('hang', 0.11), ('hook', 0.108), ('charges', 0.103), ('edward', 0.102), ('criminal', 0.1), ('accused', 0.1), ('plagiarized', 0.1), ('prof', 0.097), ('lying', 0.094), ('court', 0.093), ('dots', 0.093), ('warming', 0.092), ('award', 0.091), ('journal', 0.09), ('starts', 0.088), ('academics', 0.085), ('nyt', 0.085), ('reporter', 0.085), ('climate', 0.082), ('article', 0.082), ('cancer', 0.08), ('social', 0.08), ('congress', 0.08), ('someone', 0.079), ('exciting', 0.079), ('blogs', 0.078), ('begins', 0.077), ('hit', 0.077), ('global', 0.076)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 722 andrew gelman stats-2011-05-20-Why no Wegmania?
2 0.23390141 728 andrew gelman stats-2011-05-24-A (not quite) grand unified theory of plagiarism, as applied to the Wegman case
Introduction: A common reason for plagiarism is laziness: you want credit for doing something but you don’t really feel like doing it–maybe you’d rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you. Interestingly enough, we see that in many defenses of plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn’t credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work. As I wrote last year, I like to think that directness and openness is a virtue in scientific writing. For example, clearly citing the works we draw from, even when such citing of secondary sources might make us appear less erudite. But I can see how some scholars might feel a pressure to cover their traces. Wegman Which brings us to Ed Wegman, whose defense of plagiari
3 0.20667058 751 andrew gelman stats-2011-06-08-Another Wegman plagiarism
Introduction: At the time of our last discussion, Edward Wegman, a statistics professor who has also worked for government research agencies, had been involved in three cases of plagiarism: a report for the U.S. Congress on climate models, a paper on social networks, a paper on color graphics. Each of the plagiarism stories was slightly different: the congressional report involved the distorted copying of research by a scientist (Raymond Bradley) whose conclusions Wegman disagreed with, the social networks paper included copied material in its background section, and the color graphics paper included various bits and pieces by others that had been used in old lecture notes. Since then, blogger Deep Climate has uncovered another plagiarized article by Wegman, this time an article in a 2005 volume on data mining and data visualization. Deep Climate writes, “certain sections of Statistical Data Mining rely heavily on lightly edited portions on lectures from Wegman’s statistical data mining c
4 0.16092788 1943 andrew gelman stats-2013-07-18-Data to use for in-class sampling exercises?
Introduction: Mark Street writes: I teach a high school (grade 11) statistics class outside the USA and I am always looking for hands-on demonstrations. In fact, last week (the start of our school year here), I did the in-class exercise about “guessing ages of ten pictures” (p. 11-13) from your book “Teaching Statistics – A Bag of Tricks”. I am interested in using the “candy weighing” demonstration (p. 120-121) to talk about random sampling. I agree with your advice (p.48, Sec 5.1) that it’s better to have students do sampling from actual populations. I also agree with your suggestion that actually doing personal interviews is not an effective use of time, except for larger projects. To that end, can you suggest some sources of actual population data that I could use in class? As I am outside the USA (in Thailand), we do not have phonebooks here. Certainly, this must soon be a problem for people in the USA with phonebooks going the way of the 8-track tape. I even looked online for di
Introduction: As regular readers of this blog are aware, I am fascinated by academic and scientific cheating and the excuses people give for it. Bruno Frey and colleagues published a single article (with only minor variants) in five different major journals, and these articles did not cite each other. And there have been several other cases of his self-plagiarism (see this review from Olaf Storbeck). I do not mind the general practice of repeating oneself for different audiences—in the social sciences, we call this Arrow’s Theorem—but in this case Frey seems to have gone a bit too far. Blogger Economic Logic has looked into this and concluded that this sort of common practice is standard in “the context of the German(-speaking) academic environment,” and what sets Frey apart is not his self-plagiarism or even his brazenness but rather his practice of doing it in high-visibility journals. Economic Logic writes that “[Frey's] contribution is pedagogical, he found a good and interesting
6 0.14454001 766 andrew gelman stats-2011-06-14-Last Wegman post (for now)
7 0.132425 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime
8 0.12598932 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
9 0.12582742 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly
10 0.11974309 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate
11 0.11947708 1568 andrew gelman stats-2012-11-07-That last satisfaction at the end of the career
13 0.10987958 558 andrew gelman stats-2011-02-05-Fattening of the world and good use of the alpha channel
14 0.10519992 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist
15 0.10241996 1435 andrew gelman stats-2012-07-30-Retracted articles and unethical behavior in economics journals?
16 0.10178234 367 andrew gelman stats-2010-10-25-In today’s economy, the rich get richer
17 0.10165583 345 andrew gelman stats-2010-10-15-Things we do on sabbatical instead of actually working
18 0.0969574 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays
19 0.095189534 408 andrew gelman stats-2010-11-11-Incumbency advantage in 2010
20 0.094724283 2115 andrew gelman stats-2013-11-27-Three unblinded mice
topicId topicWeight
[(0, 0.155), (1, -0.103), (2, -0.032), (3, -0.035), (4, -0.033), (5, 0.0), (6, 0.023), (7, -0.032), (8, -0.023), (9, 0.023), (10, 0.017), (11, -0.034), (12, -0.022), (13, 0.031), (14, -0.048), (15, 0.006), (16, 0.055), (17, 0.002), (18, 0.061), (19, -0.046), (20, -0.04), (21, 0.004), (22, -0.033), (23, -0.02), (24, 0.051), (25, -0.029), (26, -0.086), (27, -0.038), (28, -0.048), (29, -0.025), (30, 0.053), (31, 0.095), (32, -0.009), (33, 0.088), (34, 0.008), (35, 0.079), (36, -0.065), (37, -0.098), (38, 0.05), (39, 0.055), (40, -0.074), (41, 0.036), (42, 0.009), (43, -0.026), (44, -0.024), (45, 0.001), (46, -0.019), (47, 0.019), (48, -0.037), (49, -0.061)]
simIndex simValue blogId blogTitle
same-blog 1 0.96194291 722 andrew gelman stats-2011-05-20-Why no Wegmania?
2 0.83365613 766 andrew gelman stats-2011-06-14-Last Wegman post (for now)
Introduction: John Mashey points me to a news article by Eli Kintisch with the following wonderful quote: Will Happer, a physicist at Princeton University who questions the consensus view on climate, thinks Mashey is a destructive force who uses “totalitarian tactics”–publishing damaging documents online, without peer review–to carry out personal vendettas. I’ve never thought of uploading files as “totalitarian” but maybe they do things differently at Princeton. I actually think of totalitarians as acting secretly–denunciations without evidence, midnight arrests, trials in undisclosed locations, and so forth. Mashey’s practice of putting everything out in the open seems to me the opposite of totalitarian. The article also reports that Edward Wegman’s lawyer said that Wegman “has never engaged in plagiarism.” If I were the lawyer, I’d be pretty mad at Wegman at this point. I can just imagine the conversation: Lawyer: You never told me about that 2005 paper where you stole from Bria
3 0.82610363 728 andrew gelman stats-2011-05-24-A (not quite) grand unified theory of plagiarism, as applied to the Wegman case
Introduction: A common reason for plagiarism is laziness: you want credit for doing something but you don’t really feel like doing it–maybe you’d rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you. Interestingly enough, we see that in many defenses of plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn’t credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work. As I wrote last year, I like to think that directness and openness is a virtue in scientific writing. For example, clearly citing the works we draw from, even when such citing of secondary sources might make us appear less erudite. But I can see how some scholars might feel a pressure to cover their traces. Wegman Which brings us to Ed Wegman, whose defense of plagiari
4 0.79868799 1568 andrew gelman stats-2012-11-07-That last satisfaction at the end of the career
Introduction: I just finished reading an amusing but somewhat disturbing article by Mark Singer, a reporter for the New Yorker who follows in that magazine’s tradition of writing about amiable frauds. (For those who are keeping score at home, Singer employs a McKelway-style relaxed tolerance rather than Liebling-style pyrotechnics.) Singer’s topic was a midwestern dentist named Kip Litton who fraudulently invented a side career for himself as a sub-3-hour marathoner. What was amazing was not so much that Litton lied about his accomplishments but, rather, the huge efforts that he undertook to support these lies. He went to faraway cities to not run marathons. He fabricated multiple personas on running message boards. He even invented an entire marathon and made up a list of participants. This got me thinking about Ed Wegman (sorry!), the statistician who got tangled in a series of plagiarism scandals. As with Litton, once Wegman was caught once, energetic people looked at the records and
5 0.79139674 751 andrew gelman stats-2011-06-08-Another Wegman plagiarism
Introduction: At the time of our last discussion, Edward Wegman, a statistics professor who has also worked for government research agencies, had been involved in three cases of plagiarism: a report for the U.S. Congress on climate models, a paper on social networks, a paper on color graphics. Each of the plagiarism stories was slightly different: the congressional report involved the distorted copying of research by a scientist (Raymond Bradley) whose conclusions Wegman disagreed with, the social networks paper included copied material in its background section, and the color graphics paper included various bits and pieces by others that had been used in old lecture notes. Since then, blogger Deep Climate has uncovered another plagiarized article by Wegman, this time an article in a 2005 volume on data mining and data visualization. Deep Climate writes, “certain sections of Statistical Data Mining rely heavily on lightly edited portions on lectures from Wegman’s statistical data mining c
7 0.76417041 755 andrew gelman stats-2011-06-09-Recently in the award-winning sister blog
8 0.75575101 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime
9 0.74992186 1236 andrew gelman stats-2012-03-29-Resolution of Diederik Stapel case
10 0.74329621 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring
11 0.74203759 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist
12 0.7229777 1324 andrew gelman stats-2012-05-16-Wikipedia author confronts Ed Wegman
13 0.71348304 345 andrew gelman stats-2010-10-15-Things we do on sabbatical instead of actually working
15 0.66967535 2234 andrew gelman stats-2014-03-05-Plagiarism, Arizona style
16 0.6672529 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
18 0.62187493 1484 andrew gelman stats-2012-09-05-Two exciting movie ideas: “Second Chance U” and “The New Dirty Dozen”
19 0.61956328 1901 andrew gelman stats-2013-06-16-Evilicious: Why We Evolved a Taste for Being Bad
20 0.60893822 2334 andrew gelman stats-2014-05-14-“The subtle funk of just a little poultry offal”
topicId topicWeight
[(1, 0.013), (16, 0.224), (18, 0.013), (21, 0.01), (24, 0.072), (27, 0.029), (45, 0.011), (52, 0.031), (59, 0.019), (62, 0.011), (63, 0.035), (66, 0.02), (72, 0.02), (77, 0.012), (84, 0.01), (85, 0.014), (86, 0.061), (89, 0.013), (99, 0.294)]
simIndex simValue blogId blogTitle
1 0.9728272 321 andrew gelman stats-2010-10-05-Racism!
Introduction: Last night I spoke at the Columbia Club of New York, along with some of my political science colleagues, in a panel about politics, the economy, and the forthcoming election. The discussion was fine . . . until one guy in the audience accused us of bias based on what he imputed as our ethnicity. One of the panelists replied by asking the questioner what of all the things we had said was biased, and the questioner couldn’t actually supply any examples. It makes sense that the questioner couldn’t come up with a single example of bias on our part, considering that we were actually presenting facts. At some level, the questioner’s imputation of our ethnicity and accusation of bias isn’t so horrible. When talking with my friends, I engage in casual ethnic stereotyping all the time–hey, it’s a free country!–and one can certainly make the statistical argument that you can guess people’s ethnicities from their names, appearance, and speech patterns, and in turn you can infer a lot
2 0.97277695 377 andrew gelman stats-2010-10-28-The incoming moderate Republican congressmembers
Introduction: Boris writes: By nearly all accounts, the Republicans look set to take over the US House of Representatives in next week’s November 2010 general election. . . . Republicans, in this wave election that recalls 1994, look set to win not just swing districts, but also those districts that have been traditionally Democratic, or those with strong or longtime Democratic incumbents. Naturally, just as in 2008, this has led to overclaiming by jubilant conservatives and distraught liberals-though the adjectives were then reversed-that this portends a realignment in American politics. . . . Republican moderates in Congress are often associated with two factors: 1) a liberal voting record earlier in their career, and 2) a liberal district. Of course, both are related, in the sense that ambitious moderates choose liberal districts to run in, and liberal districts weed out conservative candidates. . . . Given how competitive Republicans are in 2010, even in otherwise unfriendly territory,
3 0.96981561 1022 andrew gelman stats-2011-11-21-Progress for the Poor
Introduction: Lane Kenworthy writes: The book is full of graphs that support the above claims. One thing I like about Kenworthy’s approach is that he performs a separate analysis to examine each of his hypotheses. A lot of social scientists seem to think that the ideal analysis will conclude with a big regression where each coefficient tells a story and you can address all your hypotheses by looking at which predictors and interactions have statistically significant coefficients. Really, though, I think you need a separate analysis for each causal question (see chapters 9 and 10 of my book with Jennifer, follow this link). Kenworthy’s overall recommendation is to increase transfer payments to low-income families and to increase overall government spending on social services, and to fund this through general tax increases. What will it take for this to happen? After a review of the evidence from economic trends and opinion polls, Kenworthy writes, “Americans are potentially recepti
4 0.96918398 609 andrew gelman stats-2011-03-13-Coauthorship norms
Introduction: I followed this link from Chris Blattman to an article by economist Roland Fryer, who writes: I [Fryer] find no evidence that teacher incentives increase student performance, attendance, or graduation, nor do I find any evidence that the incentives change student or teacher behavior. What struck me were not the findings (which, as Fryer notes in his article, are plausible enough) but the use of the word “I” rather than “we.” A field experiment is a big deal, and I was surprised to read that Fryer did it all by himself! Here’s the note of acknowledgments (on the first page of the article): This project would not have been possible without the leadership and support of Joel Klein. I am also grateful to Jennifer Bell-Ellwanger, Joanna Cannon, and Dominique West for their cooperation in collecting the data necessary for this project, and to my colleagues Edward Glaeser, Richard Holden, and Lawrence Katz for helpful comments and discussions. Vilsa E. Curto, Meghan L. Howard,
5 0.96849453 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research
Introduction: Howard Wainer writes in the Statistics Forum: The Chinese scientific literature is rarely read or cited outside of China. But the authors of this work are usually knowledgeable of the non-Chinese literature — at least the A-list journals. And so they too try to replicate the alpha finding. But do they? One would think that they would find the same diminished effect size, but they don’t! Instead they replicate the original result, even larger. Here’s one of the graphs: How did this happen? Full story here.
6 0.96840054 1928 andrew gelman stats-2013-07-06-How to think about papers published in low-grade journals?
7 0.96713781 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples
9 0.96347296 1495 andrew gelman stats-2012-09-13-Win $5000 in the Economist’s data visualization competition
10 0.96301854 159 andrew gelman stats-2010-07-23-Popular governor, small state
same-blog 11 0.9626106 722 andrew gelman stats-2011-05-20-Why no Wegmania?
12 0.96148497 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff
13 0.9604944 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!
14 0.95986927 1025 andrew gelman stats-2011-11-24-Always check your evidence
15 0.95774263 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation
16 0.95602268 1712 andrew gelman stats-2013-02-07-Philosophy and the practice of Bayesian statistics (with all the discussions!)
17 0.95318043 185 andrew gelman stats-2010-08-04-Why does anyone support private macroeconomic forecasts?
18 0.95214748 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
19 0.95118368 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects