andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-635 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Cool! I know Bayes has reached the big time when I receive spam like this: Bayesian networks are rapidly emerging as a new research paradigm . . . With this monthly newsletter, we’ll keep you up to date . . . Financial Analytics Webinar . . . will exhibit at this year’s INFORMS Analytics Conference in downtown Chicago. Please join us for our Bayesian networks technology workshop on April 10 . . . a powerful desktop application (Windows/Mac/Unix) for knowledge discovery, data mining, analytics, predictive modeling and simulation . . . the world’s only comprehensive software package for learning, editing and analyzing Bayesian networks . . . If you no longer wish to receive these emails, please reply to this message with “Unsubscribe” in the subject line . . . You know the saying, “It’s not real unless it’s on TV”? My saying is: It’s not real until it’s on spam.
sentIndex sentText sentNum sentScore
1 I know Bayes has reached the big time when I receive spam like this: Bayesian networks are rapidly emerging as a new research paradigm . [sent-2, score-1.292]
2 With this monthly newsletter, we’ll keep you up to date . [sent-5, score-0.297]
3 will exhibit at this year’s INFORMS Analytics Conference in downtown Chicago. [sent-11, score-0.304]
4 Please join us for our Bayesian networks technology workshop on April 10 . [sent-12, score-0.685]
5 a powerful desktop application (Windows/Mac/Unix) for knowledge discovery, data mining, analytics, predictive modeling and simulation . [sent-15, score-0.691]
6 the world’s only comprehensive software package for learning, editing and analyzing Bayesian networks . [sent-18, score-0.877]
7 If you no longer wish to receive these emails, please reply to this message with “Unsubscribe” in the subject line . [sent-21, score-0.786]
8 You know the saying, “It’s not real unless it’s on TV”? [sent-24, score-0.192]
9 My saying is: It’s not real until it’s on spam. [sent-25, score-0.216]
wordName wordTfidf (topN-words)
[('analytics', 0.411), ('networks', 0.32), ('spam', 0.238), ('receive', 0.219), ('downtown', 0.173), ('desktop', 0.166), ('newsletter', 0.166), ('informs', 0.16), ('please', 0.15), ('emerging', 0.148), ('bayesian', 0.143), ('comprehensive', 0.142), ('workshop', 0.139), ('exhibit', 0.131), ('rapidly', 0.131), ('editing', 0.131), ('mining', 0.125), ('paradigm', 0.12), ('emails', 0.119), ('monthly', 0.119), ('april', 0.118), ('reached', 0.116), ('join', 0.116), ('tv', 0.115), ('date', 0.113), ('saying', 0.112), ('conference', 0.11), ('technology', 0.11), ('discovery', 0.108), ('powerful', 0.107), ('simulation', 0.105), ('real', 0.104), ('wish', 0.102), ('analyzing', 0.1), ('package', 0.096), ('application', 0.089), ('software', 0.088), ('unless', 0.088), ('financial', 0.088), ('cool', 0.088), ('longer', 0.086), ('message', 0.085), ('predictive', 0.083), ('bayes', 0.081), ('subject', 0.078), ('learning', 0.078), ('knowledge', 0.077), ('line', 0.066), ('keep', 0.065), ('modeling', 0.064)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 635 andrew gelman stats-2011-03-29-Bayesian spam!
2 0.23893592 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!
Introduction: David Shor sends along a job announcement for Civis Analytics, which he describes as “basically Obama’s Analytics team reconstituted as a company”: Data Scientist Position Overview Data Scientists are responsible for providing the fundamental data science that powers our work – including predictive analytics, data mining, experimental design and ad-hoc statistical analysis. As a Data Scientist, you will join our Chicago-based data science team, working closely and collaboratively with analysts and engineers to identify, quantify and solve big, meaningful problems. Data Scientists will have the opportunity to dive deeply into big problems and work in a variety of areas. Civis Analytics has opportunities for applicants who are seasoned professionals, brilliant newcomers, and anywhere in between. Qualifications · Master’s degree in statistics, machine learning, computer science with heavy quant focus, a related subject, or a Bachelor’s degree and significant work ex
3 0.17692471 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .
Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.
4 0.16157377 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge
Introduction: I received the following press release from the Heritage Provider Network, “the largest limited Knox-Keene licensed managed care organization in California.” I have no idea what this means, but I assume it’s some sort of HMO. In any case, this looks like it could be interesting: Participants in the Health Prize challenge will be given a data set comprised of the de-identified medical records of 100,000 individuals who are members of HPN. The teams will then need to predict the hospitalization of a set percentage of those members who went to the hospital during the year following the start date, and do so with a defined accuracy rate. The winners will receive the $3 million prize. . . . the contest is designed to spur involvement by others involved in analytics, such as those involved in data mining and predictive modeling who may not currently be working in health care. “We believe that doing so will bring innovative thinking to health analytics and may allow us to solve at
5 0.1580434 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”
Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”
6 0.15788455 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity
8 0.13487773 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever
9 0.12133366 1488 andrew gelman stats-2012-09-08-Annals of spam
10 0.12064236 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign
11 0.1171803 1110 andrew gelman stats-2012-01-10-Jobs in statistics research! In New Jersey!
12 0.1107925 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!
13 0.10916315 27 andrew gelman stats-2010-05-11-Update on the spam email study
14 0.10412814 1651 andrew gelman stats-2013-01-03-Faculty Position in Visualization, Visual Analytics, Imaging, and Human Centered Computing
15 0.10353165 2118 andrew gelman stats-2013-11-30-???
16 0.10159817 771 andrew gelman stats-2011-06-16-30 days of statistics
17 0.10097076 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over
18 0.09865959 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something
19 0.095093533 1992 andrew gelman stats-2013-08-21-Workshop for Women in Machine Learning
20 0.092276588 2282 andrew gelman stats-2014-04-05-Bizarre academic spam
topicId topicWeight
[(0, 0.104), (1, 0.025), (2, -0.094), (3, 0.045), (4, -0.018), (5, 0.082), (6, -0.055), (7, -0.034), (8, -0.034), (9, -0.007), (10, -0.032), (11, -0.061), (12, 0.11), (13, 0.027), (14, -0.041), (15, 0.074), (16, 0.03), (17, -0.063), (18, -0.02), (19, 0.045), (20, 0.054), (21, 0.024), (22, 0.017), (23, -0.084), (24, -0.015), (25, -0.03), (26, 0.031), (27, 0.058), (28, 0.013), (29, -0.031), (30, 0.001), (31, -0.012), (32, -0.015), (33, -0.011), (34, 0.009), (35, 0.072), (36, -0.052), (37, 0.013), (38, 0.0), (39, -0.013), (40, -0.086), (41, 0.061), (42, -0.052), (43, 0.063), (44, -0.009), (45, 0.021), (46, 0.065), (47, 0.049), (48, 0.016), (49, -0.029)]
simIndex simValue blogId blogTitle
same-blog 1 0.94706613 635 andrew gelman stats-2011-03-29-Bayesian spam!
2 0.69455701 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .
Introduction: It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand). If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.
3 0.69021392 1488 andrew gelman stats-2012-09-08-Annals of spam
Introduction: I have to go through the inbox to approve new comments. When I set to auto-approve, I get overwhelmed with spam. As is, I still get spam but it’s manageable. Usually the spam is uninteresting but this one caught my eye: At first this seemed reasonable enough: law firm is desperate for business, spams blogs to raise its Google ranking. But what’s with the writing in the actual comment? It’s incoherent but it doesn’t look computer-generated. My guess is that the law firm in Massachusetts hired a company that promised to raise their Google rankings, and that this company hired some non-English-speaking foreigners to search through the web and write some spam comments. If anyone actually reads the comments, they might get the impression that this law firm is staffed by illiterates . . . but, as we all know, nobody reads blog comments! P.S. I followed the link (sorry!) and came across this: I guess if they’re going to use a tragedy as an excuse to troll for Faceb
4 0.66577017 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”
Introduction: To the person who posted an apparently non-spam comment with a URL link to a “cheap cigarettes” website: In case you’re wondering, no, your comment didn’t get caught by the spam filter–I’m not sure why not, given that URL. I put it in the spam file manually. If you’d like to participate in blog discussion in the future, please refrain from including spam links. Thank you. Also, it’s “John Tukey,” not “John Turkey.”
5 0.65969753 523 andrew gelman stats-2011-01-18-Spam is out of control
Introduction: I just took a look at the spam folder . . . 600 messages in the past hour! Seems pretty ridiculous to me.
6 0.65902317 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something
7 0.626697 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity
8 0.60986888 545 andrew gelman stats-2011-01-30-New innovations in spam
9 0.58977842 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever
10 0.58297497 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!
11 0.57514435 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over
12 0.57135707 223 andrew gelman stats-2010-08-21-Statoverflow
13 0.56770474 1297 andrew gelman stats-2012-05-03-New New York data research organizations
14 0.56296378 453 andrew gelman stats-2010-12-07-Biostatistics via Pragmatic and Perceptive Bayes.
15 0.56106293 817 andrew gelman stats-2011-07-23-New blog home
16 0.55955338 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst
17 0.55288005 1119 andrew gelman stats-2012-01-15-Excellence in Statistical Reporting Award
18 0.54244375 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!
19 0.53782207 1497 andrew gelman stats-2012-09-15-Our blog makes connections!
20 0.51937234 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!
topicId topicWeight
[(9, 0.018), (15, 0.037), (16, 0.054), (18, 0.02), (21, 0.036), (22, 0.042), (23, 0.016), (24, 0.124), (27, 0.018), (32, 0.045), (34, 0.015), (48, 0.015), (53, 0.055), (59, 0.016), (65, 0.029), (88, 0.019), (98, 0.076), (99, 0.26)]
simIndex simValue blogId blogTitle
same-blog 1 0.97134459 635 andrew gelman stats-2011-03-29-Bayesian spam!
2 0.94346118 955 andrew gelman stats-2011-10-12-Why it doesn’t make sense to chew people out for not reading the help page
Introduction: Karl Broman writes: Barry Rowlingson gave an interesting talk at UseR 2011, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange. I haven’t used R-help recently but I do occasionally send people there. Just to see what was going on there, I clicked on over, did a little searching, and found this delight from a renowned professor of R. There’s something about the “please” there that just makes it all that much more special. (In contrast, the advice here to “please do your homework” just seems rude. I have a larger (or maybe smaller) point to make, though, which is about the silliness of advice to “read the damn manual” etc. Several years ago I read a fascinating book called City by William Whyte. He and his students had gone around various public places in NYC and observed how people actually behaved—how they walked, sat, stood, and interacted. One of Whyte’s ce
3 0.93994564 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime
Introduction: I’ve been blogging a lot lately about plagiarism (sorry, Bob!), and one thing that’s been bugging me is, why does it bother me so much. Part of the story is simple: much of my reputation comes from the words I write, so I bristle at any attempt to devalue words. I feel the same way about plagiarism that a rich person would feel about counterfeiting: Don’t debase my currency! But it’s more than that. After discussing this a bit with Thomas Basbøll, I realized that I’m bothered by the way that plagiarism interferes with the transmission of information: Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work. In statisti
4 0.93501735 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica
Introduction: Miguel Paz writes: Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, a regional data repository to free data and use it on Hackathons and other activities by HacksHackers chapters and other organizations. We are doing this because the road to the future of news has been littered with lost datasets. A day or so after every hackathon and meeting where a group has come together to analyze, compare and understand a particular set of data, someone tries to remember where the successful files were stored. Too often, no one is certain. Therefore with Mariano Blejman we realized that we need a central repository where you can share the data that you have proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows. If you work in Latin America or Central America your organization can take part in OpenDataLatinoamerica.org. To apply, go to the website and answer a simple form agreeing to meet the standard
5 0.93119538 2334 andrew gelman stats-2014-05-14-“The subtle funk of just a little poultry offal”
Introduction: Today’s item mixes two of my favorite themes in a horrible way, sort of like a Reese’s Cup but combining brussels sprouts and liver instead of peanut butter and chocolate. In this case, the disturbing flavors that go together are plagiarism (you know what that is) and the publication filter (the idea that there should be very stringent standards for criticizing something, once it happens to be published somewhere). The copyist The first ingredient comes from Matthew Whitaker, an Arizona State University Foundation Professor of History who has a deplorable record of copying material from other writers without attribution. For convenience, I’ll reproduce an example here: On the plus side, Whitaker removed the cliche’d phrase, “undisputed rulers of the roost” when copying from the online encyclopedia; on the downside, I don’t know what he was thinking when he rendered “Conservatives” with a capital letter. And in case you were wondering what the policy on t
6 0.93043655 2234 andrew gelman stats-2014-03-05-Plagiarism, Arizona style
7 0.93011296 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?
8 0.92971349 2013 andrew gelman stats-2013-09-08-What we need here is some peer review for statistical graphics
9 0.9278599 758 andrew gelman stats-2011-06-11-Hey, good news! Your p-value just passed the 0.05 threshold!
11 0.9276191 625 andrew gelman stats-2011-03-23-My last post on albedo, I promise
12 0.92752761 196 andrew gelman stats-2010-08-10-The U.S. as welfare state
13 0.92703074 2160 andrew gelman stats-2014-01-06-Spam names
14 0.92613447 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs
15 0.92598432 1 andrew gelman stats-2010-04-22-Political Belief Networks: Socio-cognitive Heterogeneity in American Public Opinion
16 0.92373139 2313 andrew gelman stats-2014-04-30-Seth Roberts
17 0.9234271 2220 andrew gelman stats-2014-02-22-Quickies
18 0.92274761 1701 andrew gelman stats-2013-01-31-The name that fell off a cliff
19 0.92229092 1933 andrew gelman stats-2013-07-10-Please send all comments to -dev-ripley
20 0.92215461 695 andrew gelman stats-2011-05-04-Statistics ethics question