andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2160 knowledge-graph by maker-knowledge-mining

2160 andrew gelman stats-2014-01-06-Spam names


meta info for this blog

Source: html

Introduction: There was this thing going around awhile ago, the “porn star name,” which you create by taking the name of your childhood pet, followed by the name of the street where you grew up (for example, Blitz Clifton). But recently I’ve been thinking about spam names. Just in the last two days, I’ve received emails from “Blair Williams” (“I’m sorry to have to tell you this. Tomorrow is the last day that the 40% discount will be available.”), “Audrey Woods” (“I wanted to reach out to you to let you know that we just launched an infographic . . .”), “Steven Harris” (“Part-Time Job – Earn $600/day in your spare-time”), and “Nick Bagnall” (“I sent you an email some weeks ago concerning . . .”). Actually, I think “Nick Bagnall” is probably a real person who’s just spamming me. But the first three names above look fake fake fake. And then there were “George Stoneriver,” “Scott Wolfe,” and just plain “Paul,” who were sockpuppeting our discussion on compressed sensing a couple months ago. And do


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 There was this thing going around awhile ago, the “porn star name,” which you create by taking the name of your childhood pet, followed by the name of the street where you grew up (for example, Blitz Clifton). [sent-1, score-0.616]

2 But recently I’ve been thinking about spam names. [sent-2, score-0.244]

3 Just in the last two days, I’ve received emails from “Blair Williams” (“I’m sorry to have to tell you this. [sent-3, score-0.255]

4 Tomorrow is the last day that the 40% discount will be available. [sent-4, score-0.149]

5 ”), “Steven Harris” (“Part-Time Job – Earn $600/day in your spare-time”), and “Nick Bagnall” (“I sent you an email some weeks ago concerning . [sent-8, score-0.062]

6 Actually, I think “Nick Bagnall” is probably a real person who’s just spamming me. [sent-12, score-0.295]

7 But the first three names above look fake fake fake. [sent-13, score-0.51]

8 And then there were “George Stoneriver,” “Scott Wolfe,” and just plain “Paul,” who were sockpuppeting our discussion on compressed sensing a couple months ago. [sent-14, score-0.062]

9 McKee is a real person who just did some spamming on the side (as Kahneman and Tversky might say, “Marty is a real person and is active in the shamanist movement”), but I’m guessing that Alexa Russell and Maricel Anderson are fakes. [sent-17, score-0.459]

10 It sounds generic, but not too generic (no John Smiths, surely). [sent-20, score-0.09]

11 And, of course, if you happen to be named “Audrey Woods” for real, you’re screwed, as everyone will be sending your messages straight to the spam folder. [sent-23, score-0.429]

12 Unfair, I know, but the same thing must happen to people named John Smith who try to register at hotels, no? [sent-24, score-0.185]

13 To create a good spammer name, you perhaps need a certain nonchalance, so as to conceal all art and make whatever one does or says appear to be without effort and almost without any thought about it. [sent-26, score-0.214]

14 Perhaps there is a simple algorithm to come up with a spammer name, e.g. [sent-27, score-0.142]

15 make a first name out of the first two letters of the names of each of your first two children, then make a last name out of the first two letters of the names of each of your last two children. [sent-29, score-1.78]

16 I suppose I should just switch all the way to gmail and then maybe more of my spam would get caught? [sent-34, score-0.319]

17 But I’d like to be able to continue to use the mac email reader, as it allows me to read and write emails while offline. [sent-35, score-0.16]

18 The people in the above image are not porn stars. [sent-39, score-0.15]

19 Speaking of spam, check out this expose from Nenad SEO. [sent-50, score-0.065]

20 Although I’m sure there’s nothing special about online scammers, I’d also be repulsed by student-loan scammers, legal double-billers, and other participants in the scam-industrial complex. [sent-52, score-0.075]
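
Sentences 14 and 15 above describe a tongue-in-cheek rule for building a spammer name from children's names. A minimal Python sketch of one reading of that rule follows. The function name `spam_name`, the capitalization step, and the sample names are assumptions made for illustration and do not come from the post.

```python
def spam_name(children):
    """Build a two-part name from a list of children's names.

    One reading of the rule in the post: the first name joins the first
    two letters of each of the first two children's names, and the last
    name does the same with the last two children. With fewer than four
    children the two pairs overlap (an assumption; the post does not say
    what to do in that case).
    """
    if len(children) < 2:
        raise ValueError("need at least two names")

    # First two letters of each of the first two names.
    first = children[0][:2] + children[1][:2]
    # First two letters of each of the last two names.
    last = children[-2][:2] + children[-1][:2]

    # Capitalize so the result reads like an ordinary name (assumed step).
    return f"{first.capitalize()} {last.capitalize()}"


if __name__ == "__main__":
    # Hypothetical input, borrowing first names that appear in the post.
    print(spam_name(["Audrey", "Steven", "Blair", "Nick"]))
```

With the sample list, the first name is built from "Au" and "St" and the last name from "Bl" and "Ni", so the script prints `Aust Blni`.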


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('name', 0.272), ('clifton', 0.248), ('spam', 0.244), ('audrey', 0.225), ('blitz', 0.225), ('bagnall', 0.225), ('woods', 0.186), ('scammers', 0.165), ('nick', 0.155), ('names', 0.155), ('maricel', 0.15), ('porn', 0.15), ('stoneriver', 0.15), ('spammer', 0.142), ('alexa', 0.142), ('fake', 0.141), ('marty', 0.136), ('mckee', 0.136), ('spamming', 0.131), ('named', 0.122), ('anderson', 0.109), ('russell', 0.106), ('emails', 0.098), ('generic', 0.09), ('letters', 0.089), ('last', 0.086), ('real', 0.085), ('john', 0.08), ('person', 0.079), ('george', 0.076), ('nah', 0.075), ('repulsed', 0.075), ('gmail', 0.075), ('first', 0.073), ('create', 0.072), ('rumor', 0.071), ('blair', 0.071), ('chaos', 0.071), ('two', 0.071), ('expose', 0.065), ('closing', 0.063), ('discount', 0.063), ('happen', 0.063), ('wolfe', 0.062), ('screwed', 0.062), ('sensing', 0.062), ('email', 0.062), ('curiosity', 0.06), ('shop', 0.06), ('sleazy', 0.06)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 2160 andrew gelman stats-2014-01-06-Spam names


2 0.21876574 2306 andrew gelman stats-2014-04-26-Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu

Introduction: Some asshole who has a bug up his ass about compressed sensing is spamming our comments with a bunch of sock puppets. All from the same IP address: “George Stoneriver,” “Scott Wolfe,” and just plain “Paul,” all saying pretty much the same thing in the same sort of broken English (except for Paul, whose post was too short to do a dialect analysis). “Scott Wolfe” is a generic sort of name, but a quick Google search reveals nothing related to this topic. “George Stoneriver” seems to have no internet presence at all (besides the comments at this blog). As for “Paul,” I don’t know, maybe the spammer was too lazy to invent a last name? Our spammer spends about half his time slamming the field of compressed sensing and the other half pumping up the work of someone named Xiteng Liu. There’s no excuse for this behavior. It’s horrible, a true abuse of our scholarly community. If Scott Adams wants to use a sock puppet, fine, the guy’s an artist and we should cut him some slack. If tha

3 0.20978355 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves

Introduction: I received the following two emails within fifteen minutes of each other. First, from “Alexa Russell,” subject line “An idea for a blog post: The Role, Importance, and Power of Words”: Hi Andrew, I’m a researcher/writer for a resource covering the importance of English proficiency in today’s workplace. I came across your blog andrewgelman.com as I was conducting research and I’m interested in contributing an article to your blog because I found the topics you cover very engaging. I’m thinking about writing an article that looks at how the Internet has changed the way English is used today; not only has its syntax changed as a result of the Internet Revolution, but the amount of job opportunities has also shifted as a result of this shift. I’d be happy to work with you on the topic if you have any insights. Thanks, and I look forward to hearing from you soon. Best, Alexa Second, From “Maricel Anderson,” subject line “An idea for a blog post: Healthcare Management and Geri

4 0.19137141 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .

Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.

5 0.18321113 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”

Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”

6 0.16525573 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

7 0.16223989 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff

8 0.15647367 27 andrew gelman stats-2010-05-11-Update on the spam email study

9 0.15123594 1488 andrew gelman stats-2012-09-08-Annals of spam

10 0.14934662 2148 andrew gelman stats-2013-12-25-Spam!

11 0.12839893 28 andrew gelman stats-2010-05-12-Alert: Incompetent colleague wastes time of hardworking Wolfram Research publicist

12 0.11527087 523 andrew gelman stats-2011-01-18-Spam is out of control

13 0.11486897 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something

14 0.11243997 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin

15 0.10509063 771 andrew gelman stats-2011-06-16-30 days of statistics

16 0.10376628 605 andrew gelman stats-2011-03-09-Does it feel like cheating when I do this? Variation in ethical standards and expectations

17 0.10220861 1701 andrew gelman stats-2013-01-31-The name that fell off a cliff

18 0.095897883 1249 andrew gelman stats-2012-04-06-Thinking seriously about social science research

19 0.095887735 817 andrew gelman stats-2011-07-23-New blog home

20 0.094024405 2282 andrew gelman stats-2014-04-05-Bizarre academic spam


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.13), (1, -0.075), (2, -0.043), (3, 0.041), (4, 0.038), (5, 0.001), (6, 0.068), (7, -0.034), (8, 0.026), (9, -0.047), (10, 0.011), (11, -0.009), (12, 0.106), (13, 0.04), (14, -0.029), (15, 0.087), (16, 0.013), (17, -0.067), (18, 0.0), (19, 0.036), (20, 0.019), (21, -0.039), (22, 0.04), (23, -0.11), (24, -0.01), (25, -0.04), (26, -0.002), (27, 0.071), (28, -0.054), (29, -0.016), (30, 0.005), (31, 0.072), (32, -0.04), (33, 0.023), (34, -0.076), (35, 0.066), (36, -0.006), (37, 0.053), (38, -0.056), (39, -0.004), (40, -0.077), (41, 0.071), (42, 0.011), (43, 0.027), (44, 0.006), (45, 0.009), (46, 0.035), (47, 0.013), (48, 0.127), (49, 0.008)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94458956 2160 andrew gelman stats-2014-01-06-Spam names


2 0.85121727 1488 andrew gelman stats-2012-09-08-Annals of spam

Introduction: I have to go through the inbox to approve new comments. When I set to auto-approve, I get overwhelmed with spam. As is, I still get spam but it’s manageable. Usually the spam is uninteresting but this one caught my eye: At first this seemed reasonable enough: law firm is desperate for business, spams blogs to raise its Google ranking. But what’s with the writing in the actual comment? It’s incoherent but it doesn’t look computer-generated. My guess is that the law firm in Massachusetts hired a company that promised to raise their Google rankings, and that this company hired some non-English-speaking foreigners to search through the web and write some spam comments. If anyone actually reads the comments, they might get the impression that this law firm is staffed by illiterates . . . but, as we all know, nobody reads blog comments! P.S. I followed the link (sorry!) and came across this: I guess if they’re going to use a tragedy as an excuse to troll for Faceb

3 0.7954461 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .

Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.

4 0.79211926 523 andrew gelman stats-2011-01-18-Spam is out of control

Introduction: I just took a look at the spam folder . . . 600 messages in the past hour ! Seems pretty ridiculous to me.

5 0.79030514 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”

Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”

6 0.78855991 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

7 0.7243402 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something

8 0.71203554 27 andrew gelman stats-2010-05-11-Update on the spam email study

9 0.71005058 1249 andrew gelman stats-2012-04-06-Thinking seriously about social science research

10 0.69725984 817 andrew gelman stats-2011-07-23-New blog home

11 0.66236997 2306 andrew gelman stats-2014-04-26-Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu

12 0.65982634 545 andrew gelman stats-2011-01-30-New innovations in spam

13 0.62375957 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves

14 0.61049139 9 andrew gelman stats-2010-04-28-But it all goes to pay for gas, car insurance, and tolls on the turnpike

15 0.60252428 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin

16 0.60055727 605 andrew gelman stats-2011-03-09-Does it feel like cheating when I do this? Variation in ethical standards and expectations

17 0.59238702 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff

18 0.58806109 1534 andrew gelman stats-2012-10-15-The strange reappearance of Matthew Klam

19 0.58326465 771 andrew gelman stats-2011-06-16-30 days of statistics

20 0.58038032 28 andrew gelman stats-2010-05-12-Alert: Incompetent colleague wastes time of hardworking Wolfram Research publicist


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.039), (16, 0.042), (21, 0.044), (24, 0.143), (26, 0.011), (29, 0.026), (34, 0.035), (42, 0.017), (63, 0.069), (68, 0.016), (69, 0.021), (73, 0.011), (75, 0.041), (86, 0.059), (94, 0.013), (96, 0.015), (98, 0.111), (99, 0.173)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.91693705 2160 andrew gelman stats-2014-01-06-Spam names


2 0.89022142 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica

Introduction: Miguel Paz writes : Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, a regional data repository to free data and use it on Hackathons and other activities by HacksHackers chapters and other organizations. We are doing this because the road to the future of news has been littered with lost datasets. A day or so after every hackathon and meeting where a group has come together to analyze, compare and understand a particular set of data, someone tries to remember where the successful files were stored. Too often, no one is certain. Therefore with Mariano Blejman we realized that we need a central repository where you can share the data that you have proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows. If you work in Latin America or Central America your organization can take part in OpenDataLatinoamerica.org. To apply, go to the website and answer a simple form agreeing to meet the standard

3 0.88423789 420 andrew gelman stats-2010-11-18-Prison terms for financial fraud?

Introduction: My econ dept colleague Joseph Stiglitz suggests that financial fraudsters be sent to prison. He points out that the usual penalty–million-dollar fines–just isn’t enough for crimes whose rewards can be in the hundreds of millions of dollars. That all makes sense, but why do the options have to be: 1. No punishment 2. A fine with little punishment or deterrent value 3. Prison. What’s the point of putting nonviolent criminals in prison? As I’ve said before , I’d prefer if the government just took all these convicted thieves’ assets along with 95% of their salary for several years, made them do community service (sorting bottles and cans at the local dump, perhaps; a financier should be good at this sort of thing, no?), etc. If restriction of personal freedom is judged be part of the sentence, they could be given some sort of electronic tag that would send a message to the police if you are ever more than 3 miles from your home. And a curfew so you have to stay home bet

4 0.87898254 1239 andrew gelman stats-2012-04-01-A randomized trial of the set-point diet

Introduction: Someone pointed me to this forthcoming article in the journal Nutrition by J. F. Lee et al. It looks pretty cool. I’m glad that someone went to the effort of performing this careful study. Regular readers will know that I’ve been waiting for this one for awhile. In case you can’t read the article through the paywall, here’s the abstract: Background: Under a widely-accepted theory of caloric balance, any individual has a set-point weight and will find it uncomfortable and typically unsustainable to keep his or her weight below that point. Set-points have evidently been increasing over the past few decades in the United States and other countries, leading to a public-health crisis of obesity. In an n=1 study, Roberts (2004, 2006) proposed an intervention to lower the set-point via daily consumption of unflavored sugar water or vegetable oil. Objective: To evaluate weight-loss outcomes under the diet proposed by Roberts (2004, 2006). Design: Randomized clinica

5 0.87466139 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime

Introduction: I’ve been blogging a lot lately about plagiarism (sorry, Bob!), and one thing that’s been bugging me is, why does it bother me so much. Part of the story is simple: much of my reputation comes from the words I write, so I bristle at any attempt to devalue words. I feel the same way about plagiarism that a rich person would feel about counterfeiting: Don’t debase my currency! But it’s more than that. After discussing this a bit with Thomas Basbøll, I realized that I’m bothered by the way that plagiarism interferes with the transmission of information: Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work. In statisti

6 0.86958402 96 andrew gelman stats-2010-06-18-Course proposal: Bayesian and advanced likelihood statistical methods for zombies.

7 0.85881627 1399 andrew gelman stats-2012-06-28-Life imitates blog

8 0.85868633 1556 andrew gelman stats-2012-11-01-Recently in the sister blogs: special pre-election edition!

9 0.85314918 196 andrew gelman stats-2010-08-10-The U.S. as welfare state

10 0.85227352 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”

11 0.83922821 102 andrew gelman stats-2010-06-21-Why modern art is all in the mind

12 0.83516359 215 andrew gelman stats-2010-08-18-DataMarket

13 0.83394611 742 andrew gelman stats-2011-06-02-Grouponomics, counterfactuals, and opportunity cost

14 0.83355403 26 andrew gelman stats-2010-05-11-Update on religious affiliations of Supreme Court justices

15 0.83183193 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits

16 0.83138901 1 andrew gelman stats-2010-04-22-Political Belief Networks: Socio-cognitive Heterogeneity in American Public Opinion

17 0.8278172 1701 andrew gelman stats-2013-01-31-The name that fell off a cliff

18 0.82648218 959 andrew gelman stats-2011-10-14-The most clueless political column ever—I think this Easterbrook dude has the journalistic equivalent of “tenure”

19 0.82614344 994 andrew gelman stats-2011-11-06-Josh Tenenbaum presents . . . a model of folk physics!

20 0.8252002 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics