
635 andrew gelman stats-2011-03-29-Bayesian spam!


Meta information for this blog

Source: html

Introduction: Cool! I know Bayes has reached the big time when I receive spam like this: Bayesian networks are rapidly emerging as a new research paradigm . . . With this monthly newsletter, we’ll keep you up to date . . . Financial Analytics Webinar . . . will exhibit at this year’s INFORMS Analytics Conference in downtown Chicago. Please join us for our Bayesian networks technology workshop on April 10 . . . a powerful desktop application (Windows/Mac/Unix) for knowledge discovery, data mining, analytics, predictive modeling and simulation . . . the world’s only comprehensive software package for learning, editing and analyzing Bayesian networks . . . If you no longer wish to receive these emails, please reply to this message with “Unsubscribe” in the subject line . . . You know the saying, “It’s not real unless it’s on TV”? My saying is: It’s not real until it’s on spam.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I know Bayes has reached the big time when I receive spam like this: Bayesian networks are rapidly emerging as a new research paradigm . [sent-2, score-1.292]

2 With this monthly newsletter, we’ll keep you up to date . [sent-5, score-0.297]

3 will exhibit at this year’s INFORMS Analytics Conference in downtown Chicago. [sent-11, score-0.304]

4 Please join us for our Bayesian networks technology workshop on April 10 . [sent-12, score-0.685]

5 a powerful desktop application (Windows/Mac/Unix) for knowledge discovery, data mining, analytics, predictive modeling and simulation . [sent-15, score-0.691]

6 the world’s only comprehensive software package for learning, editing and analyzing Bayesian networks . [sent-18, score-0.877]

7 If you no longer wish to receive these emails, please reply to this message with “Unsubscribe” in the subject line . [sent-21, score-0.786]

8 You know the saying, “It’s not real unless it’s on TV”? [sent-24, score-0.192]

9 My saying is: It’s not real until it’s on spam. [sent-25, score-0.216]
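The page does not state how sentScore is computed. As a minimal sketch, assuming each sentence is scored by summing the tf-idf weights of its distinct words (an assumption that does not reproduce the exact scores listed above), a ranking could be computed as follows. The function names are hypothetical; the weights are a few entries copied from the tf-idf word list for this post.

```python
import re

# A few (word, weight) pairs copied from this post's tf-idf word list.
WEIGHTS = {"saying": 0.112, "real": 0.104, "spam": 0.238, "unless": 0.088, "tv": 0.115}


def sentence_score(sentence, weights):
    """Sum the weights of the distinct known words in a sentence (assumed scoring rule)."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return sum(weights.get(tok, 0.0) for tok in tokens)


def rank_sentences(sentences, weights):
    """Return the sentences ordered from highest to lowest score."""
    return sorted(sentences, key=lambda s: sentence_score(s, weights), reverse=True)
```

Summing over distinct words keeps a sentence from being rewarded for repeating one heavy term; a real summarizer might instead normalize by sentence length.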


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('analytics', 0.411), ('networks', 0.32), ('spam', 0.238), ('receive', 0.219), ('downtown', 0.173), ('desktop', 0.166), ('newsletter', 0.166), ('informs', 0.16), ('please', 0.15), ('emerging', 0.148), ('bayesian', 0.143), ('comprehensive', 0.142), ('workshop', 0.139), ('exhibit', 0.131), ('rapidly', 0.131), ('editing', 0.131), ('mining', 0.125), ('paradigm', 0.12), ('emails', 0.119), ('monthly', 0.119), ('april', 0.118), ('reached', 0.116), ('join', 0.116), ('tv', 0.115), ('date', 0.113), ('saying', 0.112), ('conference', 0.11), ('technology', 0.11), ('discovery', 0.108), ('powerful', 0.107), ('simulation', 0.105), ('real', 0.104), ('wish', 0.102), ('analyzing', 0.1), ('package', 0.096), ('application', 0.089), ('software', 0.088), ('unless', 0.088), ('financial', 0.088), ('cool', 0.088), ('longer', 0.086), ('message', 0.085), ('predictive', 0.083), ('bayes', 0.081), ('subject', 0.078), ('learning', 0.078), ('knowledge', 0.077), ('line', 0.066), ('keep', 0.065), ('modeling', 0.064)]
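For orientation, here is a minimal sketch of how per-word weights like these are commonly obtained. It assumes the textbook tf-idf definition (term count times log of inverse document frequency) followed by L2 normalization; the page does not confirm which variant was used, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one L2-normalised {word: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({w: v / norm for w, v in raw.items()})
    return vectors


def top_n(vector, n):
    """Return the n highest-weighted (word, weight) pairs, like the topN-words list."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Under this definition a word that occurs in every document gets weight zero, which is why common words are absent from a list like the one above.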

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 635 andrew gelman stats-2011-03-29-Bayesian spam!


2 0.23893592 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!

Introduction: David Shor sends along a job announcement for Civis Analytics, which he describes as “basically Obama’s Analytics team reconstituted as a company”: Data Scientist Position Overview Data Scientists are responsible for providing the fundamental data science that powers our work – including predictive analytics, data mining, experimental design and ad-hoc statistical analysis. As a Data Scientist, you will join our Chicago-based data science team, working closely and collaboratively with analysts and engineers to identify, quantify and solve big, meaningful problems. Data Scientists will have the opportunity to dive deeply into big problems and work in a variety of areas. Civis Analytics has opportunities for applicants who are seasoned professionals, brilliant new comers, and anywhere in between. Qualifications · Master’s degree in statistics, machine learning, computer science with heavy quant focus, a related subject, or a Bachelor’s degree and significant work ex

3 0.17692471 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .

Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.

4 0.16157377 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge

Introduction: I received the following press release from the Heritage Provider Network, “the largest limited Knox-Keene licensed managed care organization in California.” I have no idea what this means, but I assume it’s some sort of HMO. In any case, this looks like it could be interesting: Participants in the Health Prize challenge will be given a data set comprised of the de-identified medical records of 100,000 individuals who are members of HPN. The teams will then need to predict the hospitalization of a set percentage of those members who went to the hospital during the year following the start date, and do so with a defined accuracy rate. The winners will receive the $3 million prize. . . . the contest is designed to spur involvement by others involved in analytics, such as those involved in data mining and predictive modeling who may not currently be working in health care. “We believe that doing so will bring innovative thinking to health analytics and may allow us to solve at

5 0.1580434 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”

Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”

6 0.15788455 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity

7 0.15662213 1276 andrew gelman stats-2012-04-22-“Gross misuse of statistics” can be a good thing, if it indicates the acceptance of the importance of statistical reasoning

8 0.13487773 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

9 0.12133366 1488 andrew gelman stats-2012-09-08-Annals of spam

10 0.12064236 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign

11 0.1171803 1110 andrew gelman stats-2012-01-10-Jobs in statistics research! In New Jersey!

12 0.1107925 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

13 0.10916315 27 andrew gelman stats-2010-05-11-Update on the spam email study

14 0.10412814 1651 andrew gelman stats-2013-01-03-Faculty Position in Visualization, Visual Analytics, Imaging, and Human Centered Computing

15 0.10353165 2118 andrew gelman stats-2013-11-30-???

16 0.10159817 771 andrew gelman stats-2011-06-16-30 days of statistics

17 0.10097076 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

18 0.09865959 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something

19 0.095093533 1992 andrew gelman stats-2013-08-21-Workshop for Women in Machine Learning

20 0.092276588 2282 andrew gelman stats-2014-04-05-Bizarre academic spam
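The simValue column is presumably a cosine similarity between document vectors, although the page does not say so explicitly. A minimal sketch under that assumption, using sparse {word: weight} dictionaries and hypothetical function names:

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, vectors, k):
    """Return the k (score, doc_id) pairs most similar to vectors[query_id]."""
    scores = [(cosine(vectors[query_id], vec), doc_id) for doc_id, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)[:k]
```

The query document is compared with itself as well, so it appears first, much like the same-blog row. A self-similarity that is not exactly 1, such as 1.0000001, would be consistent with floating-point rounding in whatever tool produced the list.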


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.104), (1, 0.025), (2, -0.094), (3, 0.045), (4, -0.018), (5, 0.082), (6, -0.055), (7, -0.034), (8, -0.034), (9, -0.007), (10, -0.032), (11, -0.061), (12, 0.11), (13, 0.027), (14, -0.041), (15, 0.074), (16, 0.03), (17, -0.063), (18, -0.02), (19, 0.045), (20, 0.054), (21, 0.024), (22, 0.017), (23, -0.084), (24, -0.015), (25, -0.03), (26, 0.031), (27, 0.058), (28, 0.013), (29, -0.031), (30, 0.001), (31, -0.012), (32, -0.015), (33, -0.011), (34, 0.009), (35, 0.072), (36, -0.052), (37, 0.013), (38, 0.0), (39, -0.013), (40, -0.086), (41, 0.061), (42, -0.052), (43, 0.063), (44, -0.009), (45, 0.021), (46, 0.065), (47, 0.049), (48, 0.016), (49, -0.029)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94706613 635 andrew gelman stats-2011-03-29-Bayesian spam!


2 0.69455701 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .

Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.

3 0.69021392 1488 andrew gelman stats-2012-09-08-Annals of spam

Introduction: I have to go through the inbox to approve new comments. When I set to auto-approve, I get overwhelmed with spam. As is, I still get spam but it’s manageable. Usually the spam is uninteresting but this one caught my eye: At first this seemed reasonable enough: law firm is desperate for business, spams blogs to raise its Google ranking. But what’s with the writing in the actual comment? It’s incoherent but it doesn’t look computer-generated. My guess is that the law firm in Massachusetts hired a company that promised to raise their Google rankings, and that this company hired some non-English-speaking foreigners to search through the web and write some spam comments. If anyone actually reads the comments, they might get the impression that this law firm is staffed by illiterates . . . but, as we all know, nobody reads blog comments! P.S. I followed the link (sorry!) and came across this: I guess if they’re going to use a tragedy as an excuse to troll for Faceb

4 0.66577017 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”

Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”

5 0.65969753 523 andrew gelman stats-2011-01-18-Spam is out of control

Introduction: I just took a look at the spam folder . . . 600 messages in the past hour! Seems pretty ridiculous to me.

6 0.65902317 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something

7 0.626697 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity

8 0.60986888 545 andrew gelman stats-2011-01-30-New innovations in spam

9 0.58977842 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

10 0.58297497 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!

11 0.57514435 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

12 0.57135707 223 andrew gelman stats-2010-08-21-Statoverflow

13 0.56770474 1297 andrew gelman stats-2012-05-03-New New York data research organizations

14 0.56296378 453 andrew gelman stats-2010-12-07-Biostatistics via Pragmatic and Perceptive Bayes.

15 0.56106293 817 andrew gelman stats-2011-07-23-New blog home

16 0.55955338 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

17 0.55288005 1119 andrew gelman stats-2012-01-15-Excellence in Statistical Reporting Award

18 0.54244375 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!

19 0.53782207 1497 andrew gelman stats-2012-09-15-Our blog makes connections!

20 0.51937234 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.018), (15, 0.037), (16, 0.054), (18, 0.02), (21, 0.036), (22, 0.042), (23, 0.016), (24, 0.124), (27, 0.018), (32, 0.045), (34, 0.015), (48, 0.015), (53, 0.055), (59, 0.016), (65, 0.029), (88, 0.019), (98, 0.076), (99, 0.26)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97134459 635 andrew gelman stats-2011-03-29-Bayesian spam!


2 0.94346118 955 andrew gelman stats-2011-10-12-Why it doesn’t make sense to chew people out for not reading the help page

Introduction: Karl Broman writes: Barry Rowlingson gave an interesting talk at UseR 2011, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange. I haven’t used R-help recently but I do occasionally send people there. Just to see what was going on there, I clicked on over, did a little searching, and found this delight from a renowned professor of R. There’s something about the “please” there that just makes it all that much more special. (In contrast, the advice here to “please do your homework” just seems rude.) I have a larger (or maybe smaller) point to make, though, which is about the silliness of advice to “read the damn manual” etc. Several years ago I read a fascinating book called City by William Whyte. He and his students had gone around various public places in NYC and observed how people actually behaved—how they walked, sat, stood, and interacted. One of Whyte’s ce

3 0.93994564 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime

Introduction: I’ve been blogging a lot lately about plagiarism (sorry, Bob!), and one thing that’s been bugging me is, why does it bother me so much. Part of the story is simple: much of my reputation comes from the words I write, so I bristle at any attempt to devalue words. I feel the same way about plagiarism that a rich person would feel about counterfeiting: Don’t debase my currency! But it’s more than that. After discussing this a bit with Thomas Basbøll, I realized that I’m bothered by the way that plagiarism interferes with the transmission of information: Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work. In statisti

4 0.93501735 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica

Introduction: Miguel Paz writes: Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, a regional data repository to free data and use it on Hackathons and other activities by HacksHackers chapters and other organizations. We are doing this because the road to the future of news has been littered with lost datasets. A day or so after every hackathon and meeting where a group has come together to analyze, compare and understand a particular set of data, someone tries to remember where the successful files were stored. Too often, no one is certain. Therefore with Mariano Blejman we realized that we need a central repository where you can share the data that you have proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows. If you work in Latin America or Central America your organization can take part in OpenDataLatinoamerica.org. To apply, go to the website and answer a simple form agreeing to meet the standard

5 0.93119538 2334 andrew gelman stats-2014-05-14-“The subtle funk of just a little poultry offal”

Introduction: Today’s item mixes two of my favorite themes in a horrible way, sort of like a Reese’s Cup but combining brussels sprouts and liver instead of peanut butter and chocolate. In this case, the disturbing flavors that go together are plagiarism (you know what that is) and the publication filter (the idea that there should be very stringent standards for criticizing something, once it happens to be published somewhere). The copyist The first ingredient comes from Matthew Whitaker, an Arizona State University Foundation Professor of History who has a deplorable record of copying material from other writers without attribution. For convenience, I’ll reproduce an example here: On the plus side, Whitaker removed the cliche’d phrase, “undisputed rulers of the roost” when copying from the online encyclopedia; on the downside, I don’t know what he was thinking when he rendered “Conservatives” with a capital letter. And in case you were wondering what the policy on t

6 0.93043655 2234 andrew gelman stats-2014-03-05-Plagiarism, Arizona style

7 0.93011296 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

8 0.92971349 2013 andrew gelman stats-2013-09-08-What we need here is some peer review for statistical graphics

9 0.9278599 758 andrew gelman stats-2011-06-11-Hey, good news! Your p-value just passed the 0.05 threshold!

10 0.92772388 959 andrew gelman stats-2011-10-14-The most clueless political column ever—I think this Easterbrook dude has the journalistic equivalent of “tenure”

11 0.9276191 625 andrew gelman stats-2011-03-23-My last post on albedo, I promise

12 0.92752761 196 andrew gelman stats-2010-08-10-The U.S. as welfare state

13 0.92703074 2160 andrew gelman stats-2014-01-06-Spam names

14 0.92613447 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs

15 0.92598432 1 andrew gelman stats-2010-04-22-Political Belief Networks: Socio-cognitive Heterogeneity in American Public Opinion

16 0.92373139 2313 andrew gelman stats-2014-04-30-Seth Roberts

17 0.9234271 2220 andrew gelman stats-2014-02-22-Quickies

18 0.92274761 1701 andrew gelman stats-2013-01-31-The name that fell off a cliff

19 0.92229092 1933 andrew gelman stats-2013-07-10-Please send all comments to -dev-ripley

20 0.92215461 695 andrew gelman stats-2011-05-04-Statistics ethics question