1488 andrew gelman stats-2012-09-08-Annals of spam


meta info for this blog

Source: html

Introduction: I have to go through the inbox to approve new comments. When I set to auto-approve, I get overwhelmed with spam. As is, I still get spam but it’s manageable. Usually the spam is uninteresting but this one caught my eye: At first this seemed reasonable enough: law firm is desperate for business, spams blogs to raise its Google ranking. But what’s with the writing in the actual comment? It’s incoherent but it doesn’t look computer-generated. My guess is that the law firm in Massachusetts hired a company that promised to raise their Google rankings, and that this company hired some non-English-speaking foreigners to search through the web and write some spam comments. If anyone actually reads the comments, they might get the impression that this law firm is staffed by illiterates . . . but, as we all know, nobody reads blog comments! P.S. I followed the link (sorry!) and came across this: I guess if they’re going to use a tragedy as an excuse to troll for Faceb


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I have to go through the inbox to approve new comments. [sent-1, score-0.245]

2 When I set to auto-approve, I get overwhelmed with spam. [sent-2, score-0.189]

3 Usually the spam is uninteresting but this one caught my eye: At first this seemed reasonable enough: law firm is desperate for business, spams blogs to raise its Google ranking. [sent-4, score-1.673]

4 But what’s with the writing in the actual comment? [sent-5, score-0.062]

5 My guess is that the law firm in Massachusetts hired a company that promised to raise their Google rankings, and that this company hired some non-English-speaking foreigners to search through the web and write some spam comments. [sent-7, score-2.438]

6 If anyone actually reads the comments, they might get the impression that this law firm is staffed by illiterates . [sent-8, score-1.043]

7 ) and came across this: I guess if they’re going to use a tragedy as an excuse to troll for Facebook hits, I shouldn’t be surprised that they’d spam a statistics blog. [sent-15, score-1.057]

8 Sometimes people email to tell me that their comment did not appear on the blog. [sent-19, score-0.232]

9 If so, it almost certainly got trapped in the spam filter and I never saw it. [sent-20, score-0.849]
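
The page does not document how sentScore is computed. The listed numbers are, however, consistent with a simple rule: split the sentence on whitespace and sum the tfidf weights of the tokens that exactly match a word in the top-N list (case-sensitive, punctuation left attached, so "comments." and "Google" do not match). The sketch below assumes that rule; `WEIGHTS` and `sentence_score` are hypothetical names, and the weights are a subset of the values listed for this post.

```python
# Subset of the (word, tfidf) pairs listed for this post.
WEIGHTS = {
    "inbox": 0.13, "approve": 0.115, "comments": 0.098,
    "overwhelmed": 0.124, "get": 0.065, "spam": 0.499,
    "certainly": 0.058, "trapped": 0.121, "filter": 0.102, "saw": 0.069,
}


def sentence_score(sentence, weights=WEIGHTS):
    """Sum the weights of whitespace tokens that exactly match a listed word.

    Assumed rule (inferred from the listed scores, not documented):
    no lowercasing and no punctuation stripping, so "spam." != "spam".
    """
    return round(sum(weights.get(tok, 0.0) for tok in sentence.split()), 3)


print(sentence_score("I have to go through the inbox to approve new comments."))
print(sentence_score("When I set to auto-approve, I get overwhelmed with spam."))
```

Under this assumed rule the two calls give 0.245 and 0.189, matching the scores listed for sentences 1 and 2.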


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('spam', 0.499), ('firm', 0.287), ('law', 0.217), ('hired', 0.212), ('reads', 0.195), ('raise', 0.181), ('company', 0.154), ('foreigners', 0.154), ('staffed', 0.154), ('tragedy', 0.154), ('google', 0.139), ('desperate', 0.134), ('troll', 0.134), ('inbox', 0.13), ('uninteresting', 0.13), ('promised', 0.127), ('overwhelmed', 0.124), ('rankings', 0.121), ('trapped', 0.121), ('incoherent', 0.119), ('approve', 0.115), ('facebook', 0.107), ('comment', 0.104), ('excuse', 0.104), ('massachusetts', 0.103), ('hits', 0.103), ('filter', 0.102), ('eye', 0.1), ('guess', 0.098), ('comments', 0.098), ('blogs', 0.086), ('sorry', 0.085), ('caught', 0.079), ('web', 0.074), ('shouldn', 0.071), ('saw', 0.069), ('search', 0.069), ('nobody', 0.069), ('followed', 0.068), ('surprised', 0.068), ('impression', 0.066), ('get', 0.065), ('business', 0.065), ('appear', 0.065), ('email', 0.063), ('actual', 0.062), ('seemed', 0.06), ('anyone', 0.059), ('usually', 0.058), ('certainly', 0.058)]
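
For readers unfamiliar with how a (word, weight) list like the one above can be produced, here is a minimal sketch. The exact weighting used for this page is not stated, so the smoothed idf, the L2 normalisation, the lowercasing, and the toy corpus are all assumptions; `tfidf_top_words` is a hypothetical helper, not the generator's actual code.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index]."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(tokenized)
    # Raw term counts for the target document.
    tf = Counter(tokenized[doc_index])
    # Assumed weighting: term count times a smoothed inverse document frequency.
    raw = {w: c * math.log((1 + n_docs) / (1 + df[w])) for w, c in tf.items()}
    # L2-normalise so weights are comparable across documents.
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    ranked = sorted(
        ((w, round(v / norm, 3)) for w, v in raw.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_n]


# Toy corpus, invented for illustration only.
docs = [
    "spam spam filter comment",
    "comment blog statistics",
    "blog comment model",
]
print(tfidf_top_words(docs, 0, top_n=2))
```

A word that is frequent in one post but rare elsewhere ("spam" in the toy corpus) gets the largest weight, while a word that occurs in every document ("comment") gets weight zero.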

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1488 andrew gelman stats-2012-09-08-Annals of spam

2 0.40333176 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .

Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.

3 0.35582221 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”

Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”

4 0.35381556 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

Introduction: A commenter wrote (by email): I’ve noticed that you’ve quit approving my comments on your blog. I hope I didn’t anger you in some way or write something you felt was inappropriate. My reply: I have not been unapproving any comments. If you have comments that have not appeared, they have probably been going into the spam filter. I get literally thousands of spam comments a day and so anything that hits the spam filter is gone forever. I think there is a way to register as a commenter; that could help.

5 0.24211589 27 andrew gelman stats-2010-05-11-Update on the spam email study

Introduction: A few days ago I reported on the spam email that I received from two business school professors (one at Columbia)! As noted on the blog, I sent an email directly to the study’s authors at the time of reading the email, but they have yet to respond. This surprises me a bit. Certainly if 6300 faculty each have time to respond to one email on this study, the two faculty have time to respond to 6300 email replies, no? I was actually polite enough to respond to both of their emails! If I do hear back, I’ll let youall know! P.S. Paul Basken interviewed me briefly for a story in the Chronicle of Higher Education on the now-notorious spam email study. Basken’s article is reasonable–he points out that (a) the study irritated a lot of people, but (b) is ultimately no big deal. One interesting thing about the article is that, although some people felt that the spam email study was ethical, nobody came forth with an argument that the study was actually worth doing. P.P.S. In

6 0.2279436 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something

7 0.20753759 817 andrew gelman stats-2011-07-23-New blog home

8 0.20281769 771 andrew gelman stats-2011-06-16-30 days of statistics

9 0.19333923 523 andrew gelman stats-2011-01-18-Spam is out of control

10 0.15123594 2160 andrew gelman stats-2014-01-06-Spam names

11 0.13949238 545 andrew gelman stats-2011-01-30-New innovations in spam

12 0.12133366 635 andrew gelman stats-2011-03-29-Bayesian spam!

13 0.12026425 1709 andrew gelman stats-2013-02-06-The fractal nature of scientific revolutions

14 0.11544321 370 andrew gelman stats-2010-10-25-Who gets wedding announcements in the Times?

15 0.10753417 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)

16 0.10697123 220 andrew gelman stats-2010-08-20-Why I blog?

17 0.10675831 9 andrew gelman stats-2010-04-28-But it all goes to pay for gas, car insurance, and tolls on the turnpike

18 0.10426684 981 andrew gelman stats-2011-10-30-rms2

19 0.093632095 373 andrew gelman stats-2010-10-27-It’s better than being forwarded the latest works of you-know-who

20 0.092785493 633 andrew gelman stats-2011-03-28-“The New Tyranny: Carbon Monoxide Detectors?”
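
The similarity measure behind simValue is not stated on the page. A self-similarity just below 1 (0.99999988 for the same-blog row) would be consistent with cosine similarity computed in single precision, so the sketch below assumes cosine similarity over sparse word-weight vectors. `cosine`, `rank_similar`, and the example vectors are hypothetical and carry made-up weights.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_id, vectors, top_n=20):
    """Return (similarity, blog_id) pairs, most similar first."""
    query = vectors[query_id]
    scored = [(cosine(query, vec), blog_id) for blog_id, vec in vectors.items()]
    # Sort by descending similarity; break ties by blog id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Illustrative vectors keyed by blog id; the weights are invented.
vectors = {
    1488: {"spam": 0.5, "firm": 0.3},
    425: {"spam": 0.6, "filter": 0.2},
    1772: {"stan": 0.9},
}
for sim, blog_id in rank_similar(1488, vectors):
    print(blog_id, round(sim, 3))
```

The query post always ranks first against itself, which is why each list on this page opens with a same-blog row.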


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.105), (1, -0.069), (2, -0.053), (3, 0.032), (4, 0.033), (5, 0.025), (6, 0.078), (7, -0.051), (8, 0.033), (9, -0.063), (10, 0.007), (11, -0.016), (12, 0.187), (13, 0.024), (14, -0.064), (15, 0.108), (16, 0.012), (17, -0.093), (18, -0.042), (19, 0.06), (20, 0.106), (21, -0.089), (22, 0.014), (23, -0.179), (24, 0.0), (25, -0.026), (26, 0.041), (27, 0.137), (28, -0.084), (29, -0.023), (30, 0.006), (31, 0.062), (32, -0.003), (33, -0.007), (34, -0.083), (35, 0.162), (36, -0.018), (37, 0.058), (38, 0.018), (39, 0.01), (40, -0.107), (41, 0.11), (42, -0.08), (43, 0.082), (44, -0.039), (45, -0.053), (46, 0.079), (47, 0.048), (48, 0.034), (49, -0.053)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97631472 1488 andrew gelman stats-2012-09-08-Annals of spam

2 0.96737236 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .

Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.

3 0.92471468 523 andrew gelman stats-2011-01-18-Spam is out of control

Introduction: I just took a look at the spam folder . . . 600 messages in the past hour ! Seems pretty ridiculous to me.

4 0.92074114 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”

Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”

5 0.91854888 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

Introduction: A commenter wrote (by email): I’ve noticed that you’ve quit approving my comments on your blog. I hope I didn’t anger you in some way or write something you felt was inappropriate. My reply: I have not been unapproving any comments. If you have comments that have not appeared, they have probably been going into the spam filter. I get literally thousands of spam comments a day and so anything that hits the spam filter is gone forever. I think there is a way to register as a commenter; that could help.

6 0.90983814 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something

7 0.80958295 817 andrew gelman stats-2011-07-23-New blog home

8 0.71345955 2160 andrew gelman stats-2014-01-06-Spam names

9 0.69522095 1709 andrew gelman stats-2013-02-06-The fractal nature of scientific revolutions

10 0.66931361 545 andrew gelman stats-2011-01-30-New innovations in spam

11 0.6644606 876 andrew gelman stats-2011-08-28-Vaguely related to the coke-dumping story

12 0.66056478 27 andrew gelman stats-2010-05-11-Update on the spam email study

13 0.64533645 9 andrew gelman stats-2010-04-28-But it all goes to pay for gas, car insurance, and tolls on the turnpike

14 0.63838577 771 andrew gelman stats-2011-06-16-30 days of statistics

15 0.63431799 199 andrew gelman stats-2010-08-11-Note to semi-spammers

16 0.620718 1168 andrew gelman stats-2012-02-14-The tabloids strike again

17 0.59084958 1791 andrew gelman stats-2013-04-07-Scatterplot charades!

18 0.58187765 220 andrew gelman stats-2010-08-20-Why I blog?

19 0.57124263 635 andrew gelman stats-2011-03-29-Bayesian spam!

20 0.53259927 1871 andrew gelman stats-2013-05-27-Annals of spam


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.014), (13, 0.018), (15, 0.056), (16, 0.05), (20, 0.025), (24, 0.163), (27, 0.028), (29, 0.025), (57, 0.016), (82, 0.195), (84, 0.017), (92, 0.018), (98, 0.08), (99, 0.189)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9231683 1488 andrew gelman stats-2012-09-08-Annals of spam

2 0.90991247 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.

Introduction: David Hogg pointed me to this news article by Angela Saini: It’s not often that the quiet world of mathematics is rocked by a murder case. But last summer saw a trial that sent academics into a tailspin, and has since swollen into a fevered clash between science and the law. At its heart, this is a story about chance. And it begins with a convicted killer, “T”, who took his case to the court of appeal in 2010. Among the evidence against him was a shoeprint from a pair of Nike trainers, which seemed to match a pair found at his home. While appeals often unmask shaky evidence, this was different. This time, a mathematical formula was thrown out of court. The footwear expert made what the judge believed were poor calculations about the likelihood of the match, compounded by a bad explanation of how he reached his opinion. The conviction was quashed. . . . “The impact will be quite shattering,” says Professor Norman Fenton, a mathematician at Queen Mary, University of London.

3 0.90980375 178 andrew gelman stats-2010-08-03-(Partisan) visualization of health care legislation

Introduction: Congressman Kevin Brady from Texas distributes this visualization of reformed health care in the US (click for a bigger picture): Here’s a PDF at Brady’s page, and a local copy of it. Complexity has its costs. Beyond the cost of writing it, learning it, following it, there’s also the cost of checking it. John Walker has some funny examples of what’s hidden in the almost 8000 pages of IRS code. Text mining and applied statistics will solve all that, hopefully. Anyone interested in developing a pork detection system for the legislation? Or an analysis of how much entropy to the legal code did each congressman contribute? There are already spin detectors , that help you detect whether the writer is a Democrat (“stimulus”, “health care”) or a Republican (“deficit spending”, “ObamaCare”). D+0.1: Jared Lander points to versions by Rep. Boehner and Robert Palmer .

4 0.89816022 335 andrew gelman stats-2010-10-11-How to think about Lou Dobbs

Introduction: I was unsurprised to read that Lou Dobbs, the former CNN host who crusaded against illegal immigrants, had actually hired a bunch of them himself to maintain his large house and his horse farm. (OK, I have to admit I was surprised by the part about the horse farm.) But I think most of the reactions to this story missed the point. Isabel Macdonald’s article that broke the story was entitled, “Lou Dobbs, American Hypocrite,” and most of the discussion went from there, with some commenters piling on Dobbs and others defending him by saying that Dobbs hired his laborers through contractors and may not have known they were in the country illegally. To me, though, the key issue is slightly different. And Macdonald’s story is relevant whether or not Dobbs knew he was hiring illegals. My point is not that Dobbs is a bad guy, or a hypocrite, or whatever. My point is that, in his setting, it would take an extraordinary effort to not hire illegal immigrants to take care of his house

5 0.8819831 1772 andrew gelman stats-2013-03-20-Stan at Google this Thurs and at Berkeley this Fri noon

Introduction: Michael Betancourt will be speaking at Google and at the University of California, Berkeley. The Google talk is closed to outsiders (but if you work at Google, you should go!); the Berkeley talk is open to all: Friday March 22, 12:10 pm, Evans Hall 1011. Title of talk: Stan : Practical Bayesian Inference with Hamiltonian Monte Carlo Abstract: Practical implementations of Bayesian inference are often limited to approximation methods that only slowly explore the posterior distribution. By taking advantage of the curvature of the posterior, however, Hamiltonian Monte Carlo (HMC) efficiently explores even the most highly contorted distributions. In this talk I will review the foundations of and recent developments within HMC, concluding with a discussion of Stan, a powerful inference engine that utilizes HMC, automatic differentiation, and adaptive methods to minimize user input. This is cool stuff. And he’ll be showing the whirlpool movie!

6 0.85944551 1749 andrew gelman stats-2013-03-04-Stan in L.A. this Wed 3:30pm

7 0.85099906 2162 andrew gelman stats-2014-01-08-Belief aggregation

8 0.84466934 1440 andrew gelman stats-2012-08-02-“A Christmas Carol” as applied to plagiarism

9 0.83589554 357 andrew gelman stats-2010-10-20-Sas and R

10 0.83573413 340 andrew gelman stats-2010-10-13-Randomized experiments, non-randomized experiments, and observational studies

11 0.82458907 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

12 0.82335746 1963 andrew gelman stats-2013-07-31-Response by Jessica Tracy and Alec Beall to my critique of the methods in their paper, “Women Are More Likely to Wear Red or Pink at Peak Fertility”

13 0.82127666 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories

14 0.81997609 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission

15 0.81853783 326 andrew gelman stats-2010-10-07-Peer pressure, selection, and educational reform

16 0.81457502 67 andrew gelman stats-2010-06-03-More on that Dartmouth health care study

17 0.81417269 1553 andrew gelman stats-2012-10-30-Real rothko, fake rothko

18 0.80763972 359 andrew gelman stats-2010-10-21-Applied Statistics Center miniconference: Statistical sampling in developing countries

19 0.80625159 366 andrew gelman stats-2010-10-24-Mankiw tax update

20 0.80158943 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs