andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-793 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Just as scientists should never really have to think much about statistics, I feel that, in an ideal world, statisticians would never have to worry about computing. In the real world, though, we have to spend a lot of time building our own tools. It would be great if we could routinely run R with speed and memory limitations being less of a concern. One suggestion that sometimes arises is to run things on “the cloud.” So I was interested upon receiving this email from Niklas Frassa: Time intensive calculations, as known from life science, finance or business intelligence, can now be processed at a whole new level of speed – in the Cloud. cloudnumbers.com provides an intuitive platform that enables everyone to run time consuming calculations on clusters with more than 1000 CPUs. So far, High Performance Computing has only been accessible for large corporations and universities leading to significant competitive disadvantages for small and medium-sized companies. With cloudnu
sentIndex sentText sentNum sentScore
1 Just as scientists should never really have to think much about statistics, I feel that, in an ideal world, statisticians would never have to worry about computing. [sent-1, score-0.463]
2 In the real world, though, we have to spend a lot of time building our own tools. [sent-2, score-0.165]
3 It would be great if we could routinely run R with speed and memory limitations being less of a concern. [sent-3, score-0.817]
4 One suggestion that sometimes arises is to run things on “the cloud.” [sent-4, score-0.382]
5 So I was interested upon receiving this email from Niklas Frassa: Time intensive calculations, as known from life science, finance or business intelligence, can now be processed at a whole new level of speed – in the Cloud. [sent-5, score-0.797]
6 cloudnumbers.com provides an intuitive platform that enables everyone to run time consuming calculations on clusters with more than 1000 CPUs. [sent-7, score-1.15]
7 So far, High Performance Computing has only been accessible for large corporations and universities leading to significant competitive disadvantages for small and medium-sized companies. [sent-8, score-0.703]
8 With cloudnumbers.com we finally make High Performance Computing accessible to everyone. [sent-10, score-0.217]
9 cloudnumbers.com’s scalable server environment results in minimal idle times – and great cost savings, as customers only pay for what they actually consume. [sent-12, score-0.88]
10 Furthermore, users do no longer need a degree in computer science to be able to access the computing power of supercomputers. [sent-13, score-0.546]
11 com thing doesn’t sound so wonderful, at least for people like me who are already using Rstudio. [sent-19, score-0.086]
12 See response from the company in comments below. [sent-23, score-0.192]
wordName wordTfidf (topN-words)
[('computing', 0.271), ('accessible', 0.217), ('speed', 0.207), ('calculations', 0.193), ('run', 0.192), ('idle', 0.169), ('enables', 0.159), ('performance', 0.153), ('server', 0.152), ('consuming', 0.142), ('intensive', 0.139), ('disadvantages', 0.139), ('clusters', 0.139), ('processed', 0.139), ('scalable', 0.139), ('corporations', 0.136), ('savings', 0.13), ('customers', 0.128), ('platform', 0.122), ('intuitive', 0.121), ('furthermore', 0.119), ('finance', 0.119), ('competitive', 0.111), ('memory', 0.11), ('intelligence', 0.109), ('comments', 0.107), ('minimal', 0.107), ('receiving', 0.107), ('routinely', 0.107), ('limitations', 0.105), ('feel', 0.104), ('world', 0.102), ('universities', 0.1), ('high', 0.1), ('wonderful', 0.098), ('arises', 0.097), ('great', 0.096), ('access', 0.094), ('suggestion', 0.093), ('ideal', 0.093), ('never', 0.092), ('users', 0.091), ('degree', 0.09), ('environment', 0.089), ('sound', 0.086), ('upon', 0.086), ('company', 0.085), ('building', 0.083), ('time', 0.082), ('worry', 0.082)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 793 andrew gelman stats-2011-07-09-R on the cloud
2 0.12162332 786 andrew gelman stats-2011-07-04-Questions about quantum computing
Introduction: I read this article by Rivka Galchen on quantum computing. Much of the article was about an eccentric scientist in his fifties named David Deutsch. I’m sure the guy is brilliant but I wasn’t particularly interested in his not particularly interesting life story (apparently he’s thin and lives in Oxford). There was a brief description of quantum computing itself, which reminds me of the discussion we had a couple years ago under the heading, The laws of conditional probability are false (and the update here). I don’t have anything new to say here; I’d just never heard of quantum computing before and it seemed relevant to our discussion. The uncertainty inherent in quantum computing seems closely related to Jouni’s idea of fully Bayesian computing, that uncertainty should be inherent in the computational structure rather than tacked on at the end. P.S. No, I’m not working on July 4th! This post is two months old, we just have a long waiting list of blog entries.
3 0.10768449 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.
Introduction: This post is by David K. Park. As we have witnessed, the term “big data” has been thrust into the zeitgeist in the past several years; however, when one pushes beyond the hype, there seems to be little substance there. We’ve always had “data,” so what’s so unique about it this time? Yes, we recognize it’s “big,” but is there anything unique about data this time around? I’ve spent some time thinking about this and the answer seems to be yes, and it falls on three dimensions: Capturing Conversations & Relationships: Individuals have always communicated with one another, but now we can capture some of that conversation – email, blogs, social media (Facebook, Twitter, Pinterest) – and we can now do it with machines via sensors, i.e., “the internet of things” as we hear so much about; Granularity: We can now understand individuals at a much finer level of analysis. No longer do we need to rely on a sample size of 500 people to “represent” the nation, but instead we can acc
4 0.10093004 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
Introduction: We interrupt our usual program of Ed Wegman, Gregg Easterbrook, Niall Ferguson mockery to deliver a serious update on our statistical computing project. Stan (“Sampling Through Adaptive Neighborhoods”) is our new C++ program (written mostly by Bob Carpenter) that draws samples from Bayesian models. Stan can take different sorts of inputs: you can write the model in a Bugs-like syntax and it goes from there, or you can write the log-posterior directly as a C++ function. Most of the computation is done using Hamiltonian Monte Carlo. HMC requires some tuning, so Matt Hoffman up and wrote a new algorithm, Nuts (the “No-U-Turn Sampler”) which optimizes HMC adaptively. In many settings, Nuts is actually more computationally efficient than the optimal static HMC! When the Nuts paper appeared on Arxiv, Christian Robert noticed it and had some reactions. In response to Xian’s comments, Matt writes: Christian writes: I wonder about the computing time (and the “una
5 0.099700674 503 andrew gelman stats-2011-01-04-Clarity on my email policy
Introduction: I never read email before 4. That doesn’t mean I never send email before 4.
6 0.099265032 2173 andrew gelman stats-2014-01-15-Postdoc involving pathbreaking work in MRP, Stan, and the 2014 election!
7 0.097825512 122 andrew gelman stats-2010-07-01-MCMC machine
8 0.09602277 2304 andrew gelman stats-2014-04-24-An open site for researchers to post and share papers
9 0.093726821 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics
10 0.091393813 1110 andrew gelman stats-2012-01-10-Jobs in statistics research! In New Jersey!
11 0.087350488 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead
12 0.086516403 1748 andrew gelman stats-2013-03-04-PyStan!
13 0.086374372 1605 andrew gelman stats-2012-12-04-Write This Book
14 0.084340721 597 andrew gelman stats-2011-03-02-RStudio – new cross-platform IDE for R
15 0.083785996 2282 andrew gelman stats-2014-04-05-Bizarre academic spam
16 0.083111912 426 andrew gelman stats-2010-11-22-Postdoc opportunity here at Columbia — deadline soon!
17 0.083090685 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want
19 0.082352541 1464 andrew gelman stats-2012-08-20-Donald E. Westlake on George W. Bush
20 0.07905937 1832 andrew gelman stats-2013-04-29-The blogroll
topicId topicWeight
[(0, 0.168), (1, -0.053), (2, -0.043), (3, -0.001), (4, 0.03), (5, 0.051), (6, 0.008), (7, -0.025), (8, -0.003), (9, 0.016), (10, -0.038), (11, -0.027), (12, 0.026), (13, -0.024), (14, -0.046), (15, 0.03), (16, 0.022), (17, -0.027), (18, 0.002), (19, 0.01), (20, 0.057), (21, -0.0), (22, 0.011), (23, 0.012), (24, -0.034), (25, 0.01), (26, 0.002), (27, 0.053), (28, -0.013), (29, 0.004), (30, 0.012), (31, -0.021), (32, 0.017), (33, 0.0), (34, 0.016), (35, -0.021), (36, -0.002), (37, 0.036), (38, 0.021), (39, 0.002), (40, 0.03), (41, 0.004), (42, -0.081), (43, -0.009), (44, 0.025), (45, -0.075), (46, -0.004), (47, 0.005), (48, 0.015), (49, -0.016)]
simIndex simValue blogId blogTitle
same-blog 1 0.96064663 793 andrew gelman stats-2011-07-09-R on the cloud
2 0.77636772 2282 andrew gelman stats-2014-04-05-Bizarre academic spam
Introduction: I’ve been getting these sorts of emails every couple days lately: Respected Professor Gelman I am a senior undergraduate at Indian Institute of Technology Kanpur (IIT Kanpur). I am currently in the 8th Semester of my Master of Science (Integrated) in Mathematics and Scientific Computing program. I went through some of your previous work and found it to be very interesting, especially ‘Discussion of the article “website morphing”‘. I am interested in working under your guidance in a full time research during this summer (May 2014 – July 2014) I have a deep interest in Economics (especially Game Theory), Applied Mathematics and Statistics and I have consistently performed well in many courses. My past research experience convinced me of my potential for research and I am in search of an opportunity under your guidance to hone my analytic and research skills As evident from my resume, most of my work till now hovers around analysis and application of abstract ideas, where in mos
3 0.76624382 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics
Introduction: Bob wrote this long comment that I think is worth posting: I [Bob] have done a fair bit of consulting for my small natural language processing company over the past ten years. Like statistics, natural language processing is something many companies think they want, but have no idea how to do themselves. We almost always handed out “free” consulting, usually on the phone to people who called us out of the blue. The Google ranking of our blog and tutorials was pretty much our only approach to marketing other than occasionally going to business-oriented conferences. Our goal was to sell software licenses (because consulting doesn’t scale nor does it provide continuing royalty income), but since so few people knew how to use toolkits like ours, we had to help them along the way. We even provided “free” consulting with our startup license package. We were brutally honest with customers, both about our goals and their goals. Their goals were often incompatible with ours (use company X’
Introduction: Ever since I got this new sound system for my bike, I’ve been listening to a lot of podcasts. This American Life is really good. I know, I know, everybody knows that, but it’s true. The only segments I don’t like are the ones that are too “writerly,” when they read a short story aloud. They don’t work for me. Most of the time, though, the show is as great as everyone says it is. Anyway, the other day I listened to program #466: Blackjack . It started with some items on card counting. That stuff is always fun. Then they get to the longer story, which is all about a moderately rich housewife from Iowa who, over a roughly ten-year period, lost her life savings, something like a million dollars, at Harrah’s casinos. Did you know they had casinos in Iowa and Indiana? I didn’t. Anyway, the lady was a gambling addict. That part’s pretty clear. You don’t lose your life savings at a casino by accident. The scary part, though, was how the casino company craftily enabled her to
Introduction: Sandeep Baliga writes : [In a recent study , Gilles Duranton and Matthew Turner write:] For interstate highways in metropolitan areas we [Duranton and Turner] find that VKT (vehicle kilometers traveled) increases one for one with interstate highways, confirming the fundamental law of highway congestion.’ Provision of public transit also simply leads to the people taking public transport being replaced by drivers on the road. Therefore: These findings suggest that both road capacity expansions and extensions to public transit are not appropriate policies with which to combat traffic congestion. This leaves congestion pricing as the main candidate tool to curb traffic congestion. To which I reply: Sure, if your goal is to curb traffic congestion . But what sort of goal is that? Thinking like a microeconomist, my policy goal is to increase people’s utility. Sure, traffic congestion is annoying, but there must be some advantages to driving on that crowded road or pe
6 0.71419007 732 andrew gelman stats-2011-05-26-What Do We Learn from Narrow Randomized Studies?
7 0.71170217 1520 andrew gelman stats-2012-10-03-Advice that’s so eminently sensible but so difficult to follow
8 0.71054947 2330 andrew gelman stats-2014-05-12-Historical Arc of Universities
9 0.70759493 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
10 0.70740813 1261 andrew gelman stats-2012-04-12-The Naval Research Lab
11 0.70694244 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?
12 0.70681214 1519 andrew gelman stats-2012-10-02-Job!
13 0.70565742 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!
14 0.70368278 179 andrew gelman stats-2010-08-03-An Olympic size swimming pool full of lithium water
15 0.70016098 1058 andrew gelman stats-2011-12-14-Higgs bozos: Rosencrantz and Guildenstern are spinning in their graves
16 0.69935507 1245 andrew gelman stats-2012-04-03-Redundancy and efficiency: In praise of Penn Station
17 0.69755894 592 andrew gelman stats-2011-02-26-“Do you need ideal conditions to do great work?”
18 0.69542235 167 andrew gelman stats-2010-07-27-Why don’t more medical discoveries become cures?
19 0.69427305 166 andrew gelman stats-2010-07-27-The Three Golden Rules for Successful Scientific Research
20 0.69113201 1670 andrew gelman stats-2013-01-13-More Bell Labs happy talk
topicId topicWeight
[(2, 0.011), (9, 0.041), (15, 0.055), (16, 0.078), (21, 0.014), (24, 0.099), (30, 0.092), (32, 0.012), (34, 0.012), (40, 0.015), (52, 0.029), (54, 0.013), (56, 0.032), (59, 0.021), (76, 0.014), (84, 0.015), (86, 0.029), (93, 0.014), (99, 0.326)]
simIndex simValue blogId blogTitle
Introduction: Howard Wainer writes : When we focus only on the differences between groups, we too easily lose track of the big picture. Nowhere is this more obvious than in the current public discussions of the size of the gap in test scores that is observed between racial groups. It has been noted that in New Jersey the gap between the average scores of white and black students on the well-developed scale of the National Assessment of Educational Progress (NAEP) has shrunk by only about 25 percent over the past two decades. The conclusion drawn was that even though the change is in the right direction, it is far too slow. But focusing on the difference blinds us to what has been a remarkable success in education over the past 20 years. Although the direction and size of student improvements are considered across many subject areas and many age groups, I will describe just one — 4th grade mathematics. . . . there have been steep gains for both racial groups over this period (somewhat steeper g
2 0.97244906 1623 andrew gelman stats-2012-12-14-GiveWell charity recommendations
Introduction: In a rare Christmas-themed post here, I pass along this note from Alexander Berger at GiveWell : We just published a blog post following up on the *other* famous piece of evidence for deworming, the Miguel and Kremer experiment from Kenya. They shared data and code from their working paper (!) follow-up finding that deworming increases incomes ten years later, and we came out of the re-analysis feeling more confident in, though not wholly convinced by, the results. We’ve also just released our new list of top charities for giving season this year, which I think might be a good fit for your audience. We wrote a blog post explaining our choices , and have also published extensive reviews of the top charities and the interventions on which they work. Perhaps the most interesting change since last year is the addition of GiveDirectly in the #2 spot; they do direct unconditional cash transfers to people living on less than a dollar a day in Kenya. We think it’s a remarkable mode
3 0.96328115 1768 andrew gelman stats-2013-03-18-Mertz’s reply to Unz’s response to Mertz’s comments on Unz’s article
Introduction: Here. And here’s the story so far: Ron Unz posted a long article on college admissions of Asians and Jews with some numbers and comparisons that made their way into some blogs (including here ) and also a David Brooks NYT column which was read by many people, including Janet Mertz, who’d done previous research on ethnic composition of high-end math students. Mertz contacted me (she’d earlier tried Brooks and others but received no helpful reply), and I posted her findings along with those of another correspondent. Unz then replied , motivating Mertz to write a seven-page document expanding on her earlier emails. Unz responded to that, characterizing Mertz as maybe “emotional” but not actually disputing any of her figures. Unz did, however, make the unconvincing (to me) implication that his original numbers were basically OK even in light of Mertz’s corrections. So Mertz responded once more . (There’s also a side discussion about women’s representation in m
same-blog 4 0.96321428 793 andrew gelman stats-2011-07-09-R on the cloud
5 0.95989561 631 andrew gelman stats-2011-03-28-Explaining that plot.
Introduction: With some upgrades from a previous post. And with a hopefully clear 40+ page draft paper (see page 16). Drawing Inference – Literally and by Individual Contribution.pdf Comments are welcome, though my responses may be delayed. (Working on how to best render the graphs.) K? p.s. Plot was modified so that it might be better interpreted without reading any of the paper – though I would not suggest that – reading at least pages 1 to 17 is recommended.
6 0.95942569 1831 andrew gelman stats-2013-04-29-The Great Race
7 0.95862031 109 andrew gelman stats-2010-06-25-Classics of statistics
8 0.95784104 179 andrew gelman stats-2010-08-03-An Olympic size swimming pool full of lithium water
9 0.955235 412 andrew gelman stats-2010-11-13-Time to apply for the hackNY summer fellows program
10 0.9550246 2073 andrew gelman stats-2013-10-22-Ivy Jew update
11 0.95316261 1259 andrew gelman stats-2012-04-11-How things sound to us, versus how they sound to others
12 0.95294523 1195 andrew gelman stats-2012-03-04-Multiple comparisons dispute in the tabloids
13 0.95251411 170 andrew gelman stats-2010-07-29-When is expertise relevant?
14 0.95209062 1497 andrew gelman stats-2012-09-15-Our blog makes connections!
17 0.95008206 2337 andrew gelman stats-2014-05-18-Never back down: The culture of poverty and the culture of journalism
18 0.94927579 2245 andrew gelman stats-2014-03-12-More on publishing in journals
19 0.94864672 1751 andrew gelman stats-2013-03-06-Janet Mertz’s response to “The Myth of American Meritocracy”
20 0.94861233 2158 andrew gelman stats-2014-01-03-Booze: Been There. Done That.