Introduction: Last month I wrote: Computer scientists are often brilliant but they can be unfamiliar with what is done in the worlds of data collection and analysis. This goes the other way too: statisticians such as myself can look pretty awkward, reinventing (or failing to reinvent) various wheels when we write computer programs or, even worse, try to design software. Andrew MacNamara followed up with some thoughts: I [MacNamara] had some basic statistics training through my MBA program, after having completed an undergrad degree in computer science. Since then I’ve been very interested in learning more about statistical techniques, including things like GLM and censored data analyses as well as machine learning topics like neural nets, SVMs, etc. I began following your blog after some research into Bayesian analysis topics and I am trying to dig deeper on that side of things. One thing I have noticed is that there seems to be a distinction between data analysi

1 Last month I wrote: Computer scientists are often brilliant but they can be unfamiliar with what is done in the worlds of data collection and analysis. [sent-1, score-0.203]

2 This goes the other way too: statisticians such as myself can look pretty awkward, reinventing (or failing to reinvent) various wheels when we write computer programs or, even worse, try to design software. [sent-2, score-0.58]

3 Andrew MacNamara followed up with some thoughts: I [MacNamara] had some basic statistics training through my MBA program, after having completed an undergrad degree in computer science. [sent-3, score-0.501]

4 Since then I’ve been very interested in learning more about statistical techniques, including things like GLM and censored data analyses as well as machine learning topics like neural nets, SVMs, etc. [sent-4, score-0.837]

5 One thing I have noticed is that there seems to be a distinction between data analysis as approached from a statistical perspective (e. [sent-6, score-0.365]

6 , generalized linear models) versus from a computer science perspective (e. [sent-8, score-0.348]

7 Many of the computer scientists I work with approach a data analysis problem by throwing as many ‘features’ at the model as possible, letting the computer do the work, and trying to get the best-performing model as measured by some cross-validation technique. [sent-11, score-1.28]

8 ) and testing for statistical issues that are known to cause problems with models or diagnostics (e. [sent-13, score-0.326]

9 To me, a symptom of this difference in philosophies is that the machine learning software packages I have tried do not seem to output any statistics showing the relative importance or errors of the input features like I would expect from a statistical regression package. [sent-17, score-0.755]

10 My reply: The big difference I’ve noticed between the two fields is that statisticians like to demonstrate our methods on new examples whereas computer scientists seem to prefer to show better performance on benchmark problems. [sent-21, score-0.878]

11 To a computer scientist, though, solving a new problem is no big deal—they can solve problems whenever they want, and it is through benchmarks that they can make fair comparisons. [sent-24, score-0.535]

12 Now to return to the original question: Yes, CS methods seem to focus on prediction while statistical methods focus on understanding. [sent-25, score-0.485]

13 One might describe the basic approaches of different quantitative fields as follows: Economics: identify the causal effect; Psychology: model the underlying process; Statistics: fit the data; Computer science: predict. [sent-26, score-0.5]

14 About ten years ago I had several meetings with a computer scientist here at Columbia who was working on interesting statistical methods. [sent-28, score-0.524]

15 I was wondering if his methods could help on my problems, or if my methods could help on his. [sent-29, score-0.298]

16 Conversely, it seemed impossible to apply my computationally-intensive hierarchical modeling methods with his huge masses of information. [sent-32, score-0.215]

17 I’ve long thought that machine-learning-style approaches would benefit from predictive model checking. [sent-35, score-0.334]

18 Seeing where your model doesn’t fit the data can suggest where improvements would make sense. [sent-36, score-0.23]

19 Then again, I’ve long thought that statistical model fits should be checked against data too, and a lot of statisticians (particularly Bayesians) have resisted this. [sent-37, score-0.555]

20 Machine learning methods are not always generative, in which case the first step to model checking is the construction of a generative model corresponding to (or approximating) the estimation procedure. [sent-39, score-1.004]
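
The predictive model checking described in the sentences above can be sketched roughly as follows. This is a minimal illustration and not the procedure from the post: it assumes a simple normal model, uses plug-in estimates of the mean and standard deviation instead of draws from a full posterior, and the function name `predictive_check`, the simulated data, and the choice of test statistics are all hypothetical. The idea is to simulate replicated datasets from the fitted generative model and see whether a chosen statistic of the observed data looks typical of the replicates.

```python
import random
import statistics


def predictive_check(y, test_stat, n_rep=1000, seed=0):
    """Plug-in predictive check for a normal model fitted to y.

    Returns the fraction of replicated datasets whose test statistic
    is at least as large as the observed one. Values near 0 or 1
    indicate an aspect of the data the model does not reproduce.
    """
    rng = random.Random(seed)
    # "Fit" the model: plug-in estimates (an assumption made for brevity;
    # a Bayesian check would draw these parameters from the posterior).
    mu = statistics.fmean(y)
    sigma = statistics.stdev(y)
    t_obs = test_stat(y)
    exceed = 0
    for _ in range(n_rep):
        # Simulate a replicated dataset of the same size from the fitted model.
        y_rep = [rng.gauss(mu, sigma) for _ in y]
        if test_stat(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_rep


# Hypothetical data: 99 well-behaved values plus one extreme point.
data_rng = random.Random(1)
y = [data_rng.gauss(0.0, 1.0) for _ in range(99)] + [15.0]

p_max = predictive_check(y, max)                 # sensitive to the outlier
p_mean = predictive_check(y, statistics.fmean)   # fitted directly, so uninformative

print(f"p-value for max:  {p_max:.3f}")
print(f"p-value for mean: {p_mean:.3f}")
```

The two statistics are chosen to contrast with each other. The maximum probes a feature the normal model was not fitted to, so the check can expose the misfit. The mean is estimated directly from the data, so replicated means scatter around the observed one and the check reveals little. For a non-generative machine learning method, the step that has no counterpart here is constructing a generative model to simulate from in the first place.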

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98438811 1926 andrew gelman stats-2013-07-05-More plain old everyday Bayesianism

Introduction: Following up on this story , Bob Goodman writes: A most recent issue of the New England Journal of Medicine published a study entitled “Biventricular Pacing for Atrioventricular Block and Systolic Dysfunction,” (N Engl J Med 2013; 368:1585-1593), whereby “A hierarchical Bayesian proportional-hazards model was used for analysis of the primary outcome.” It is the first study I can recall in this journal that has reported on Table 2 (primary outcomes) “The Posterior Probability of Hazard Ratio < 1" (which in this case was .9978). This is ok, but to be really picky I will say that there’s typically not so much reason to care about the posterior probability that the effect is greater than 1; I’d rather have an estimate of the effect. Also we should be using informative priors.

2 0.98368877 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

Introduction: John Mount provides some useful background and follow-up on our discussion from last year on computational instability of the usual logistic regression solver. Just to refresh your memory, here’s a simple logistic regression with only a constant term and no separation, nothing pathological at all: > y <- rep (c(1,0),c(10,5)) > display (glm (y ~ 1, family=binomial(link="logit"))) glm(formula = y ~ 1, family = binomial(link = "logit")) coef.est (Intercept) 0.69 0.55 --- n = 15, k = 1 residual deviance = 19.1, null deviance = 19.1 (difference = 0.0) And here’s what happens when we give it the not-outrageous starting value of -2: > display (glm (y ~ 1, family=binomial(link="logit"), start=-2)) glm(formula = y ~ 1, family = binomial(link = "logit"), start = -2) coef.est (Intercept) 71.97 17327434.18 --- n = 15, k = 1 residual deviance = 360.4, null deviance = 19.1 (difference = -341.3) Warning message:

3 0.98349655 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics

Introduction: Burak Bayramli writes: In this paper by Sungjin Ahn, Anoop Korattikara, and Max Welling and this paper by Welling and Yee Whye Teh, there are some arguments on big data and the use of MCMC. Both papers have suggested improvements to speed up MCMC computations. I was wondering what your thoughts were, especially on this paragraph: When a dataset has a billion data-cases (as is not uncommon these days) MCMC algorithms will not even have generated a single (burn-in) sample when a clever learning algorithm based on stochastic gradients may already be making fairly good predictions. In fact, the intriguing results of Bottou and Bousquet (2008) seem to indicate that in terms of “number of bits learned per unit of computation”, an algorithm as simple as stochastic gradient descent is almost optimally efficient. We therefore argue that for Bayesian methods to remain useful in an age when the datasets grow at an exponential rate, they need to embrace the ideas of the stochastic optimiz

4 0.98241031 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking

Introduction: People keep pointing me to this excellent news article by David Brown, about a scientist who was convicted of data manipulation: In all, 330 patients were randomly assigned to get either interferon gamma-1b or placebo injections. Disease progression or death occurred in 46 percent of those on the drug and 52 percent of those on placebo. That was not a significant difference, statistically speaking. When only survival was considered, however, the drug looked better: 10 percent of people getting the drug died, compared with 17 percent of those on placebo. However, that difference wasn’t “statistically significant,” either. Specifically, the so-called P value — a mathematical measure of the strength of the evidence that there’s a true difference between a treatment and placebo — was 0.08. . . . Technically, the study was a bust, although the results leaned toward a benefit from interferon gamma-1b. Was there a group of patients in which the results tipped? Harkonen asked the statis

5 0.97868776 807 andrew gelman stats-2011-07-17-Macro causality

Introduction: David Backus writes: This is from my area of work, macroeconomics. The suggestion here is that the economy is growing slowly because consumers aren’t spending money. But how do we know it’s not the reverse: that consumers are spending less because the economy isn’t doing well. As a teacher, I can tell you that it’s almost impossible to get students to understand that the first statement isn’t obviously true. What I’d call the demand-side story (more spending leads to more output) is everywhere, including this piece, from the usually reliable David Leonhardt. This whole situation reminds me of the story of the village whose inhabitants support themselves by taking in each others’ laundry. I guess we’re rich enough in the U.S. that we can stay afloat for a few decades just buying things from each other? Regarding the causal question, I’d like to move away from the idea of “Does A cause B or does B cause A” and toward a more intervention-based framework (Rubin’s model for

6 0.97857195 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

7 0.97786266 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

8 0.97760463 1061 andrew gelman stats-2011-12-16-CrossValidated: A place to post your statistics questions

same-blog 9 0.97756964 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

10 0.97725034 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

11 0.97711551 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

12 0.97709787 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

13 0.97693968 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

14 0.97683966 789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time

15 0.97636086 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism

16 0.97604114 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

17 0.97596693 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?

18 0.97588164 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

19 0.97554976 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?

20 0.97546673 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation