Model checking and model understanding in machine learning
Andrew Gelman, September 4, 2012
Last month I wrote: Computer scientists are often brilliant but they can be unfamiliar with what is done in the worlds of data collection and analysis. This goes the other way too: statisticians such as myself can look pretty awkward, reinventing (or failing to reinvent) various wheels when we write computer programs or, even worse, try to design software.

Andrew MacNamara followed up with some thoughts:

I [MacNamara] had some basic statistics training through my MBA program, after having completed an undergrad degree in computer science. Since then I’ve been very interested in learning more about statistical techniques, including things like GLM and censored data analyses as well as machine learning topics like neural nets, SVMs, etc. I began following your blog after some research into Bayesian analysis topics and I am trying to dig deeper on that side of things.

One thing I have noticed is that there seems to be a distinction between data analysis as approached from a statistical perspective (e.g., generalized linear models) versus from a computer science perspective (e.g., […]). […]

Many of the computer scientists I work with approach a data analysis problem by throwing as many ‘features’ at the model as possible, letting the computer do the work, and trying to get the best-performing model as measured by some cross-validation technique.
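The workflow MacNamara describes (throw in candidate models, let held-out error pick the winner) can be sketched in a few lines of plain Python. This is a generic illustration with made-up data, not code from the post; the candidate models `fit_mean` and `fit_line` are hypothetical stand-ins for whatever models are being compared:

```python
import random

def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cv_mse(x, y, fit, k=5):
    """Average held-out squared error of a fitted model over k folds."""
    total, count = 0.0, 0
    for train, test in k_fold_splits(len(y), k):
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            total += (model(x[i]) - y[i]) ** 2
            count += 1
    return total / count

# Two candidate models: predict the training mean, or fit a
# one-feature least-squares line.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / \
        sum((u - mx) ** 2 for u in xs)
    return lambda x, a=my - b * mx, b=b: a + b * x

random.seed(1)
x = [i / 10 for i in range(100)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3) for xi in x]
print("mean model CV-MSE:", cv_mse(x, y, fit_mean))
print("line model CV-MSE:", cv_mse(x, y, fit_line))
```

Cross-validation here correctly prefers the line, but note that it reports only predictive error; it says nothing about the uncertainty in the fitted coefficients, which is the gap MacNamara goes on to describe.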
[…] and testing for statistical issues that are known to cause problems with models or diagnostics (e.g., […]).

To me, a symptom of this difference in philosophies is that the machine learning software packages I have tried do not seem to output any statistics showing the relative importance or errors of the input features like I would expect from a statistical regression package. […]
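The regression-package output MacNamara has in mind (each coefficient reported with a standard error, so the importance and uncertainty of each input can be judged) can be computed by hand in the simplest case. This is a minimal sketch for one predictor under the usual linear-model assumptions, with simulated data, not tied to any particular package:

```python
import math
import random

def ols_with_se(x, y):
    """Simple linear regression y = a + b*x, returning point estimates
    and standard errors, the kind of table a statistical regression
    package prints by default."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                              # residual variance
    se_b = math.sqrt(s2 / sxx)                      # s.e. of the slope
    se_a = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))  # s.e. of the intercept
    return (a, se_a), (b, se_b)

random.seed(2)
x = [i / 5 for i in range(50)]
y = [1.0 + 0.8 * xi + random.gauss(0, 0.5) for xi in x]
(a, se_a), (b, se_b) = ols_with_se(x, y)
print(f"intercept {a:6.3f}  (s.e. {se_a:.3f})")
print(f"slope     {b:6.3f}  (s.e. {se_b:.3f})")
```

A coefficient several standard errors from zero is the usual quick reading of "this input matters"; a purely predictive pipeline gives you no such summary unless you bolt it on.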
My reply: The big difference I’ve noticed between the two fields is that statisticians like to demonstrate our methods on new examples whereas computer scientists seem to prefer to show better performance on benchmark problems. […] To a computer scientist, though, solving a new problem is no big deal—they can solve problems whenever they want, and it is through benchmarks that they can make fair comparisons.

Now to return to the original question: Yes, CS methods seem to focus on prediction while statistical methods focus on understanding. One might describe the basic approaches of different quantitative fields as follows: Economics: identify the causal effect; Psychology: model the underlying process; Statistics: fit the data; Computer science: predict.

About ten years ago I had several meetings with a computer scientist here at Columbia who was working on interesting statistical methods. I was wondering if his methods could help on my problems, or if my methods could help on his. […] Conversely, it seemed impossible to apply my computationally-intensive hierarchical modeling methods with his huge masses of information.
I’ve long thought that machine-learning-style approaches would benefit from predictive model checking. When you see where your model doesn’t fit the data, this can give a sense of where improvements would help. Then again, I’ve long thought that statistical model fits should be checked against data too, and a lot of statisticians (particularly Bayesians) have resisted this. […] Machine learning methods are not always generative, in which case the first step to model checking is the construction of a generative model corresponding to (or approximating) the estimation procedure.
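The recipe in that last paragraph can be sketched as: fit a (generative) model, simulate replicated datasets from the fit, and see whether a test statistic on the replications covers the observed value. Below is a minimal illustration with simulated data; the function name `predictive_check` and the choice of a normal working model are my own, not from any package or from the post:

```python
import random
import statistics

def predictive_check(y, stat, n_rep=1000, seed=0):
    """Fit a normal model to y, simulate replicated datasets from the
    fitted model, and return the fraction of replications whose test
    statistic is at least as large as the observed one (a predictive
    p-value).  Values near 0 or 1 flag misfit in the direction
    that the statistic measures."""
    rng = random.Random(seed)
    mu = statistics.mean(y)
    sigma = statistics.stdev(y)
    observed = stat(y)
    hits = 0
    for _ in range(n_rep):
        y_rep = [rng.gauss(mu, sigma) for _ in y]
        if stat(y_rep) >= observed:
            hits += 1
    return hits / n_rep

# Data with a long right tail: a normal model should fail a max-value check.
rng = random.Random(42)
y = [rng.expovariate(1.0) for _ in range(200)]
p = predictive_check(y, max)
print(f"predictive p-value for max(y): {p:.3f}")
```

Here the observed maximum sits far out in the tail of the maxima simulated under the fitted normal model, so the p-value is extreme and the misfit is flagged; the same check applied to data the model describes well would give a moderate p-value. For a non-generative estimation procedure, the step before this sketch is exactly the one the post describes: constructing a generative model that corresponds to, or approximates, the procedure.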
simIndex simValue blogId blogTitle
1 0.98438811 1926 andrew gelman stats-2013-07-05-More plain old everyday Bayesianism
Introduction: Following up on this story , Bob Goodman writes: A most recent issue of the New England Journal of Medicine published a study entitled “Biventricular Pacing for Atrioventricular Block and Systolic Dysfunction,” (N Engl J Med 2013; 368:1585-1593), whereby “A hierarchical Bayesian proportional-hazards model was used for analysis of the primary outcome.” It is the first study I can recall in this journal that has reported on Table 2 (primary outcomes) “The Posterior Probability of Hazard Ratio < 1" (which in this case was .9978). This is ok, but to be really picky I will say that there’s typically not so much reason to care about the posterior probability that the effect is greater than 1; I’d rather have an estimate of the effect. Also we should be using informative priors.
2 0.98368877 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.
Introduction: John Mount provides some useful background and follow-up on our discussion from last year on computational instability of the usual logistic regression solver. Just to refresh your memory, here’s a simple logistic regression with only a constant term and no separation, nothing pathological at all: > y <- rep (c(1,0),c(10,5)) > display (glm (y ~ 1, family=binomial(link="logit"))) glm(formula = y ~ 1, family = binomial(link = "logit")) coef.est coef.se (Intercept) 0.69 0.55 --- n = 15, k = 1 residual deviance = 19.1, null deviance = 19.1 (difference = 0.0) And here’s what happens when we give it the not-outrageous starting value of -2: > display (glm (y ~ 1, family=binomial(link="logit"), start=-2)) glm(formula = y ~ 1, family = binomial(link = "logit"), start = -2) coef.est coef.se (Intercept) 71.97 17327434.18 --- n = 15, k = 1 residual deviance = 360.4, null deviance = 19.1 (difference = -341.3) Warning message:
3 0.98349655 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics
Introduction: Burak Bayramli writes: In this paper by Sunjin Ahn, Anoop Korattikara, and Max Welling and this paper by Welling and Yee Whye The, there are some arguments on big data and the use of MCMC. Both papers have suggested improvements to speed up MCMC computations. I was wondering what your thoughts were, especially on this paragraph: When a dataset has a billion data-cases (as is not uncommon these days) MCMC algorithms will not even have generated a single (burn-in) sample when a clever learning algorithm based on stochastic gradients may already be making fairly good predictions. In fact, the intriguing results of Bottou and Bousquet (2008) seem to indicate that in terms of “number of bits learned per unit of computation”, an algorithm as simple as stochastic gradient descent is almost optimally efficient. We therefore argue that for Bayesian methods to remain useful in an age when the datasets grow at an exponential rate, they need to embrace the ideas of the stochastic optimiz
4 0.98241031 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking
Introduction: People keep pointing me to this excellent news article by David Brown, about a scientist who was convicted of data manipulation: In all, 330 patients were randomly assigned to get either interferon gamma-1b or placebo injections. Disease progression or death occurred in 46 percent of those on the drug and 52 percent of those on placebo. That was not a significant difference, statistically speaking. When only survival was considered, however, the drug looked better: 10 percent of people getting the drug died, compared with 17 percent of those on placebo. However, that difference wasn’t “statistically significant,” either. Specifically, the so-called P value — a mathematical measure of the strength of the evidence that there’s a true difference between a treatment and placebo — was 0.08. . . . Technically, the study was a bust, although the results leaned toward a benefit from interferon gamma-1b. Was there a group of patients in which the results tipped? Harkonen asked the statis
5 0.97868776 807 andrew gelman stats-2011-07-17-Macro causality
Introduction: David Backus writes: This is from my area of work, macroeconomics. The suggestion here is that the economy is growing slowly because consumers aren’t spending money. But how do we know it’s not the reverse: that consumers are spending less because the economy isn’t doing well. As a teacher, I can tell you that it’s almost impossible to get students to understand that the first statement isn’t obviously true. What I’d call the demand-side story (more spending leads to more output) is everywhere, including this piece, from the usually reliable David Leonhardt. This whole situation reminds me of the story of the village whose inhabitants support themselves by taking in each others’ laundry. I guess we’re rich enough in the U.S. that we can stay afloat for a few decades just buying things from each other? Regarding the causal question, I’d like to move away from the idea of “Does A causes B or does B cause A” and toward a more intervention-based framework (Rubin’s model for
8 0.97760463 1061 andrew gelman stats-2011-12-16-CrossValidated: A place to post your statistics questions
same-blog 9 0.97756964 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning
10 0.97725034 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
11 0.97711551 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems
13 0.97693968 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value
14 0.97683966 789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time
15 0.97636086 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism
16 0.97604114 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
17 0.97596693 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?
20 0.97546673 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation