272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead


meta info for this blog

Source: html

Introduction: Christian Robert posts these thoughts: I [Ross Ihaka] have been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too. Some of these were inherited from S and some are peculiar to R. One of the worst problems is scoping. Consider the following little gem. f = function() { if (runif(1) > .5) x = 10; x } The x being returned by this function is randomly local or global. There are other examples where variables alternate between local and non-local throughout the body of a function. No sensible language would allow this. It’s ugly and it makes optimisation really difficult. This isn’t the only problem, even weirder things happen because of interactions between scoping and lazy evaluation. In light of this, I [Ihaka] have come to the c
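
The scoping example quoted above can be tried directly. What follows is a minimal sketch in R, not code from the original post beyond the function f itself: the global value x <- 1, the deterministic helper g, and the seed are all assumptions added for illustration. It shows that the final x inside the function resolves to a local variable only when the assignment branch has run, and otherwise falls through to the global.

```r
# Assumption: a global x exists before f() is called; the value 1 is arbitrary.
x <- 1

# The function from the quoted post, written across several lines.
f <- function() {
  if (runif(1) > .5) x = 10  # creates a local x only when this branch runs
  x                          # local x if it was created, otherwise the global x
}

# Hypothetical deterministic variant: the caller decides whether the
# local assignment happens, which makes the two lookups easy to compare.
g <- function(assign_local) {
  if (assign_local) x = 10
  x
}

g(TRUE)   # returns 10 (the local x)
g(FALSE)  # returns 1  (the global x)

# Calling f() repeatedly yields a mix of the two values.
set.seed(42)  # arbitrary seed, only for reproducibility
results <- replicate(1000, f())
table(results)
```

Because which x is returned depends on a run-time branch, the lookup cannot be resolved by reading the function body alone, which is the difficulty for optimisation that the quoted passage points to.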


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Christian Robert posts these thoughts: I [Ross Ihaka] have been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. [sent-1, score-0.268]

2 There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too. [sent-3, score-0.419]

3 Some of these were inherited from S and some are peculiar to R. [sent-4, score-0.203]

4 The x being returned by this function is randomly local or global. [sent-8, score-0.293]

5 There are other examples where variables alternate between local and non-local throughout the body of a function. [sent-9, score-0.278]

6 This isn’t the only problem, even weirder things happen because of interactions between scoping and lazy evaluation. [sent-12, score-0.193]

7 In light of this, I [Ihaka] have come to the conclusion that rather than “fixing” R, it would be much more productive to simply start over and build something better. [sent-13, score-0.082]

8 I think the best you could hope for by fixing the efficiency problems in R would be to boost performance by a small multiple, or perhaps as much as an order of magnitude. [sent-14, score-0.723]

9 This probably isn’t enough to justify the effort (Luke Tierney has been working on R compilation for over a decade now). [sent-15, score-0.181]

10 Adding this to the mix might just make it possible to get a three order-of-magnitude performance boost with just a fraction of the memory that R uses. [sent-20, score-0.5]

11 When writing ARM, I was careful to write code in what I considered a readable way, which in many instances involved looping rather than vectorization and the much-hated apply() function. [sent-26, score-0.367]

12 (A particular difficulty arises when dealing with posterior simulations, where scalars become matrices, matrices become two-way arrays, and so forth.) [sent-27, score-0.464]

13 In my programming, I’ve found myself using notational conventions where the structure in the program should be, and I think this is a common problem in R. [sent-28, score-0.215]

14 (Consider the various objects such as rownames, rows, row. [sent-29, score-0.078]

15 And anyone who’s worked with R for awhile has had the frustration of having to take a dataset and shake it to wring out all the layers of structure that are put there by default. [sent-31, score-0.514]

16 I’ll read in some ascii data and then be going through different permutations of functions such as as. [sent-32, score-0.205]

17 character() to convert data from “levels” into numbers or strings. [sent-35, score-0.084]

18 And I recognize that many of its problems arise from its generality. [sent-39, score-0.127]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ihaka', 0.348), ('fixing', 0.183), ('ross', 0.179), ('boost', 0.168), ('matrices', 0.16), ('memory', 0.152), ('efficiency', 0.14), ('problems', 0.127), ('isn', 0.117), ('permutations', 0.116), ('wring', 0.116), ('looping', 0.116), ('rownames', 0.116), ('scalars', 0.116), ('scoping', 0.116), ('structure', 0.114), ('local', 0.11), ('layers', 0.109), ('luke', 0.109), ('tierney', 0.109), ('performance', 0.105), ('peculiar', 0.105), ('runif', 0.105), ('function', 0.102), ('compilation', 0.101), ('conventions', 0.101), ('bureaucratic', 0.098), ('annoyance', 0.098), ('arrays', 0.098), ('inherited', 0.098), ('vectorization', 0.098), ('alternate', 0.095), ('shake', 0.095), ('become', 0.094), ('thoughts', 0.09), ('going', 0.089), ('rows', 0.087), ('convert', 0.084), ('productive', 0.082), ('returned', 0.081), ('justify', 0.08), ('frustration', 0.08), ('objects', 0.078), ('readable', 0.077), ('lazy', 0.077), ('instances', 0.076), ('fraction', 0.075), ('sensible', 0.073), ('body', 0.073), ('faster', 0.072)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead

2 0.14176913 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

Introduction: Andrew and I have been discussing how we’re going to define functions in Stan for defining systems of differential equations; see our evolving ode design doc ; comments welcome, of course. About Scope I mentioned to Andrew I would prefer pure lexical, static scoping, as found in languages like C++ and Java. If you’re not familiar with the alternatives, there’s a nice overview in the Wikipedia article on scope . Let me call out a few passages that will help set the context. A fundamental distinction in scoping is what “context” means – whether name resolution depends on the location in the source code (lexical scope, static scope, which depends on the lexical context) or depends on the program state when the name is encountered (dynamic scope, which depends on the execution context or calling context). Lexical resolution can be determined at compile time, and is also known as early binding, while dynamic resolution can in general only be determined at run time, and thus

3 0.12657066 1009 andrew gelman stats-2011-11-14-Wickham R short course

Introduction: Hadley writes: I [Hadley] am going to be teaching an R development master class in New York City on Dec 12-13. The basic idea of the class is to help you write better code, focused on the mantra of “do not repeat yourself”. In day one you will learn powerful new tools of abstraction, allowing you to solve a wider range of problems with fewer lines of code. Day two will teach you how to make packages, the fundamental unit of code distribution in R, allowing others to save time by allowing them to use your code. To get the most out of this course, you should have some experience programming in R already: you should be familiar with writing functions, and the basic data structures of R: vectors, matrices, arrays, lists and data frames. You will find the course particularly useful if you’re an experienced R user looking to take the next step, or if you’re moving to R from other programming languages and you want to quickly get up to speed with R’s unique features. A coupl

4 0.11540385 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

Introduction: This post is by Phil A recent post on this blog discusses a prominent case of an Excel error leading to substantially wrong results from a statistical analysis. Excel is notorious for this because it is easy to add a row or column of data (or intermediate results) but forget to update equations so that they correctly use the new data. That particular error is less common in a language like R because R programmers usually refer to data by variable name (or by applying functions to a named variable), so the same code works even if you add or remove data. Still, there is plenty of opportunity for errors no matter what language one uses. Andrew ran into problems fairly recently, and also blogged about another instance. I’ve never had to retract a paper, but that’s partly because I haven’t published a whole lot of papers. Certainly I have found plenty of substantial errors pretty late in some of my data analyses, and I obviously don’t have sufficient mechanisms in place to be sure

5 0.10461958 1710 andrew gelman stats-2013-02-06-The new Stan 1.1.1, featuring Gaussian processes!

Introduction: We just released Stan 1.1.1 and RStan 1.1.1 As usual, you can find download and install instructions at: http://mc-stan.org/ This is a patch release and is fully backward compatible with Stan and RStan 1.1.0. The main thing you should notice is that the multivariate models should be much faster and all the bugs reported for 1.1.0 have been fixed. We’ve also added a bit more functionality. The substantial changes are listed in the following release notes. v1.1.1 (5 February 2012) ====================================================================== Bug Fixes ———————————- * fixed bug in comparison operators, which swapped operator< with operator<= and swapped operator> with operator>= semantics * auto-initialize all variables to prevent segfaults * atan2 gradient propagation fixed * fixed off-by-one in NUTS treedepth bound so NUTS goes at most to specified tree depth rather than specified depth + 1 * various compiler compatibility and minor consistency issues * f

6 0.10056394 1753 andrew gelman stats-2013-03-06-Stan 1.2.0 and RStan 1.2.0

7 0.097172394 266 andrew gelman stats-2010-09-09-The future of R

8 0.093477048 1948 andrew gelman stats-2013-07-21-Bayes related

9 0.090169907 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?

10 0.089901917 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

11 0.089560226 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

12 0.087975845 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”

13 0.087588295 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

14 0.087529108 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

15 0.087350488 793 andrew gelman stats-2011-07-09-R on the cloud

16 0.086032607 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

17 0.085305169 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

18 0.085270718 1661 andrew gelman stats-2013-01-08-Software is as software does

19 0.085202791 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

20 0.083075613 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.197), (1, 0.001), (2, -0.023), (3, 0.03), (4, 0.076), (5, 0.002), (6, 0.023), (7, -0.042), (8, -0.005), (9, -0.004), (10, -0.034), (11, 0.0), (12, -0.027), (13, -0.052), (14, 0.017), (15, 0.013), (16, 0.002), (17, -0.029), (18, -0.011), (19, 0.023), (20, 0.019), (21, 0.013), (22, -0.025), (23, 0.041), (24, -0.029), (25, 0.007), (26, 0.035), (27, 0.038), (28, 0.034), (29, 0.034), (30, 0.019), (31, -0.006), (32, 0.03), (33, 0.004), (34, 0.03), (35, -0.034), (36, 0.001), (37, 0.039), (38, -0.021), (39, 0.017), (40, 0.003), (41, 0.002), (42, -0.026), (43, 0.014), (44, -0.004), (45, 0.014), (46, -0.016), (47, -0.005), (48, 0.041), (49, 0.006)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95664102 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead

2 0.8765226 266 andrew gelman stats-2010-09-09-The future of R

Introduction: Some thoughts from Christian , including this bit: We need to consider separately 1. R’s brilliant library 2. R’s not-so-brilliant language and/or interpreter. I don’t know that R’s library is so brilliant as all that–if necessary, I don’t think it would be hard to reprogram the important packages in a new language. I would say, though, that the problems with R are not just in the technical details of the language. I think the culture of R has some problems too. As I’ve written before, R functions used to be lean and mean, and now they’re full of exception-handling and calls to other packages. R functions are spaghetti-like messes of connections in which I keep expecting to run into syntax like “GOTO 120.” I learned about these problems a couple years ago when writing bayesglm(), which is a simple adaptation of glm(). But glm(), and its workhorse, glm.fit(), are a mess: They’re about 10 lines of functioning code, plus about 20 lines of necessary front-end, plus a cou

3 0.8727898 2089 andrew gelman stats-2013-11-04-Shlemiel the Software Developer and Unknown Unknowns

Introduction: The Stan meeting today reminded me of Joel Spolsky’s recasting of the Yiddish joke about Shlemiel the Painter. Joel retold it on his blog, Joel on Software , in the post Back to Basics : Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. “That’s pretty good!” says his boss, “you’re a fast worker!” and pays him a kopeck. The next day Shlemiel only gets 150 yards done. “Well, that’s not nearly as good as yesterday, but you’re still a fast worker. 150 yards is respectable,” and pays him a kopeck. The next day Shlemiel paints 30 yards of the road. “Only 30!” shouts his boss. “That’s unacceptable! On the first day you did ten times that much work! What’s going on?” “I can’t help it,” says Shlemiel. “Every day I get farther and farther away from the paint can!” Joel used it as an example of the kind of string processing naive programmers ar

4 0.87116075 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

Introduction: This post is by Phil A recent post on this blog discusses a prominent case of an Excel error leading to substantially wrong results from a statistical analysis. Excel is notorious for this because it is easy to add a row or column of data (or intermediate results) but forget to update equations so that they correctly use the new data. That particular error is less common in a language like R because R programmers usually refer to data by variable name (or by applying functions to a named variable), so the same code works even if you add or remove data. Still, there is plenty of opportunity for errors no matter what language one uses. Andrew ran into problems fairly recently, and also blogged about another instance. I’ve never had to retract a paper, but that’s partly because I haven’t published a whole lot of papers. Certainly I have found plenty of substantial errors pretty late in some of my data analyses, and I obviously don’t have sufficient mechanisms in place to be sure

5 0.85148603 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs

Introduction: As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core. But we all have multiple cores now, or we’re computing on a cluster or the cloud! So the behavior we’d like from rjags is to use the foreach package with each JAGS chain using a parallel-safe RNG. The default behavior with n.chain=1 will be that each parallel instance will use .RNG.name[1] , the Wichmann-Hill RNG. JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer" , .RNG.name="lecuyer::lecuyer" , and .RNG.name=

6 0.85052413 597 andrew gelman stats-2011-03-02-RStudio – new cross-platform IDE for R

7 0.82925242 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?

8 0.82485795 1655 andrew gelman stats-2013-01-05-The statistics software signal

9 0.80568683 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

10 0.8014735 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

11 0.79898632 470 andrew gelman stats-2010-12-16-“For individuals with wine training, however, we find indications of a positive relationship between price and enjoyment”

12 0.79758751 907 andrew gelman stats-2011-09-14-Reproducibility in Practice

13 0.79217505 1520 andrew gelman stats-2012-10-03-Advice that’s so eminently sensible but so difficult to follow

14 0.77693266 1716 andrew gelman stats-2013-02-09-iPython Notebook

15 0.77281839 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission

16 0.76922089 1808 andrew gelman stats-2013-04-17-Excel-bashing

17 0.76817924 166 andrew gelman stats-2010-07-27-The Three Golden Rules for Successful Scientific Research

18 0.76742995 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”

19 0.76505822 910 andrew gelman stats-2011-09-15-Google Refine

20 0.76394475 360 andrew gelman stats-2010-10-21-Forensic bioinformatics, or, Don’t believe everything you read in the (scientific) papers


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.15), (2, 0.012), (9, 0.011), (16, 0.106), (24, 0.103), (32, 0.019), (42, 0.038), (44, 0.013), (45, 0.024), (63, 0.038), (77, 0.015), (86, 0.027), (89, 0.023), (99, 0.288)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96925163 525 andrew gelman stats-2011-01-19-Thiel update

Introduction: A year or so ago I discussed the reasoning of zillionaire financier Peter Thiel, who seems to believe his own hype and, worse, seems to be able to convince reporters of his infallibility as well. Apparently he “possesses a preternatural ability to spot patterns that others miss.” More recently, Felix Salmon commented on Thiel’s financial misadventures: Peter Thiel’s hedge fund, Clarium Capital, ain’t doing so well. Its assets under management are down 90% from their peak, and total returns from the high point are -65%. Thiel is smart, successful, rich, well-connected, and on top of all that his calls have actually been right . . . None of that, clearly, was enough for Clarium to make money on its trades: the fund was undone by volatility and weakness in risk management. There are a few lessons to learn here. Firstly, just because someone is a Silicon Valley gazillionaire, or any kind of successful entrepreneur for that matter, doesn’t mean they should be trusted with oth

2 0.96510887 973 andrew gelman stats-2011-10-26-Antman again courts controversy

Introduction: Commenter Zbicyclist links to a fun article by Howard French on biologist E. O. Wilson: Wilson announced that his new book may be his last. It is not limited to the discussion of evolutionary biology, but ranges provocatively through the humanities, as well. . . . Generation after generation of students have suffered trying to “puzzle out” what great thinkers like Socrates, Plato, and Descartes had to say on the great questions of man’s nature, Wilson said, but this was of little use, because philosophy has been based on “failed models of the brain.” This reminds me of my recent remarks on the use of crude folk-psychology models as microfoundations for social sciences. The article also discusses Wilson’s recent crusade against selfish-gene-style simplifications of human and animal nature. I’m with Wilson 100% on this one. “Two brothers or eight cousins” is a cute line but it doesn’t seem to come close to describing how species or societies work, and it’s always seemed a

3 0.95959842 1154 andrew gelman stats-2012-02-04-“Turn a Boring Bar Graph into a 3D Masterpiece”

Introduction: Jimmy sends in this . Steps include “Make whimsical sparkles by drawing an ellipse using the Ellipse Tool,” “Rotate the sparkles . . . Give some sparkles less Opacity by using the Transparency Palette,” and “Add a haze around each sparkle by drawing a white ellipse using the Ellipse Tool.” The punchline: Now, the next time you need to include a boring graph in one of your designs you’ll be able to add some extra emphasis and get people to really pay attention to those numbers! P.S. to all the commenters: Yeah, yeah, do your contrarian best and tell me why chartjunk is actually a good thing, how I’m just a snob, etc etc.

4 0.95686799 664 andrew gelman stats-2011-04-16-Dilbert update: cartooning can give you the strength to open jars with your bare hands

Introduction: We were having so much fun on this thread that I couldn’t resist linking to this news item by Adrian Chen. The good news is that Scott Adams (creater of the Dilbert comic strip) “has a certified genius IQ” and that he “can open jars with [his] bare hands.” He is also “able to lift heavy objects.” Cool! In all seriousness, I knew nothing about this aspect of Adams when I wrote the earlier blog. I was just surprised (and remain surprised) that he was so impressed with Charlie Sheen for being good-looking and being able to remember his lines. At the time I thought it was just a matter of Adams being overly-influenced by his direct experience, along with some satisfaction in separating himself from the general mass of Sheen-haters out there. But now I wonder if something more is going on, that maybe he feels that he and Sheen are on the same side in a culture war. In any case, the ultimate topic of interest here is not Sheen or Adams but rather more general questions of what

5 0.93855119 1449 andrew gelman stats-2012-08-08-Gregor Mendel’s suspicious data

Introduction: Howard Wainer points me to a thoughtful discussion by Moti Nissani on “Psychological, Historical, and Ethical Reflections on the Mendelian Paradox.” The paradox, as Nissani defines it, is that Mendel’s data seem in many cases too good to be true, yet Mendel had a reputation for probity and it seems doubtful that he had a Mark-Hauser-style attitude toward reporting scientific data. Nissani writes: Taken together, the situation seems paradoxical. On the one hand, we have evidence that “the data of most, if not all, of the experiments have been falsified so as to agree closely with Mendel’s expectations.” We also have good reasons to believe that Mendel encountered linkage but failed to report it and that he may have taken the somewhat unusual step of having his scientific records destroyed shortly after his death. On the other hand, everything else we know about him/in addition to his undisputed genius/suggests a man of unimpeachable integrity, fine observational powers, and a pa

6 0.93801385 581 andrew gelman stats-2011-02-19-“The best living writer of thrillers”

7 0.93720961 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James

same-blog 8 0.93516946 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead

9 0.91870648 1665 andrew gelman stats-2013-01-10-That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy

10 0.91837728 587 andrew gelman stats-2011-02-24-5 seconds of every #1 pop single

11 0.9156307 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

12 0.91435659 1419 andrew gelman stats-2012-07-17-“Faith means belief in something concerning which doubt is theoretically possible.” — William James

13 0.90968966 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

14 0.90652657 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy

15 0.9063291 906 andrew gelman stats-2011-09-14-Another day, another stats postdoc

16 0.90370131 657 andrew gelman stats-2011-04-11-Note to Dilbert: The difference between Charlie Sheen and Superman is that the Man of Steel protected Lois Lane, he didn’t bruise her

17 0.89851195 2030 andrew gelman stats-2013-09-19-Is coffee a killer? I don’t think the effect is as high as was estimated from the highest number that came out of a noisy study

18 0.89817858 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

19 0.89445519 148 andrew gelman stats-2010-07-15-“Gender Bias Still Exists in Modern Children’s Literature, Say Centre Researchers”

20 0.8929258 966 andrew gelman stats-2011-10-20-A qualified but incomplete thanks to Gregg Easterbrook’s editor at Reuters