Model checking and model understanding in machine learning
Andrew Gelman, September 4, 2012
Last month I wrote: Computer scientists are often brilliant but they can be unfamiliar with what is done in the worlds of data collection and analysis. This goes the other way too: statisticians such as myself can look pretty awkward, reinventing (or failing to reinvent) various wheels when we write computer programs or, even worse, try to design software.

Andrew MacNamara followed up with some thoughts:

I [MacNamara] had some basic statistics training through my MBA program, after having completed an undergrad degree in computer science. Since then I’ve been very interested in learning more about statistical techniques, including things like GLM and censored data analyses as well as machine learning topics like neural nets, SVMs, etc. I began following your blog after some research into Bayesian analysis topics and I am trying to dig deeper on that side of things.

One thing I have noticed is that there seems to be a distinction between data analysis as approached from a statistical perspective (e.g., generalized linear models) versus from a computer science perspective (e.g., […]). […]

Many of the computer scientists I work with approach a data analysis problem by throwing as many ‘features’ at the model as possible, letting the computer do the work, and trying to get the best-performing model as measured by some cross-validation technique.
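The workflow MacNamara describes (throw in candidate models, let held-out error pick the winner) can be sketched in a few lines of plain Python. This is a generic illustration with made-up data, not code from the post; the candidate models `fit_mean` and `fit_line` are hypothetical stand-ins for whatever models are being compared:

```python
import random

def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cv_mse(x, y, fit, k=5):
    """Average held-out squared error of a fitted model over k folds."""
    total, count = 0.0, 0
    for train, test in k_fold_splits(len(y), k):
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            total += (model(x[i]) - y[i]) ** 2
            count += 1
    return total / count

# Two candidate models: predict the training mean, or fit a
# one-feature least-squares line.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / \
        sum((u - mx) ** 2 for u in xs)
    return lambda x, a=my - b * mx, b=b: a + b * x

random.seed(1)
x = [i / 10 for i in range(100)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3) for xi in x]
print("mean model CV-MSE:", cv_mse(x, y, fit_mean))
print("line model CV-MSE:", cv_mse(x, y, fit_line))
```

Cross-validation here correctly prefers the line, but note that it reports only predictive error; it says nothing about the uncertainty in the fitted coefficients, which is the gap MacNamara goes on to describe.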
[…] and testing for statistical issues that are known to cause problems with models or diagnostics (e.g., […]).

To me, a symptom of this difference in philosophies is that the machine learning software packages I have tried do not seem to output any statistics showing the relative importance or errors of the input features like I would expect from a statistical regression package. […]
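The regression-package output MacNamara has in mind (each coefficient reported with a standard error, so the importance and uncertainty of each input can be judged) can be computed by hand in the simplest case. This is a minimal sketch for one predictor under the usual linear-model assumptions, with simulated data, not tied to any particular package:

```python
import math
import random

def ols_with_se(x, y):
    """Simple linear regression y = a + b*x, returning point estimates
    and standard errors, the kind of table a statistical regression
    package prints by default."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                              # residual variance
    se_b = math.sqrt(s2 / sxx)                      # s.e. of the slope
    se_a = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))  # s.e. of the intercept
    return (a, se_a), (b, se_b)

random.seed(2)
x = [i / 5 for i in range(50)]
y = [1.0 + 0.8 * xi + random.gauss(0, 0.5) for xi in x]
(a, se_a), (b, se_b) = ols_with_se(x, y)
print(f"intercept {a:6.3f}  (s.e. {se_a:.3f})")
print(f"slope     {b:6.3f}  (s.e. {se_b:.3f})")
```

A coefficient several standard errors from zero is the usual quick reading of "this input matters"; a purely predictive pipeline gives you no such summary unless you bolt it on.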
My reply: The big difference I’ve noticed between the two fields is that statisticians like to demonstrate our methods on new examples whereas computer scientists seem to prefer to show better performance on benchmark problems. […] To a computer scientist, though, solving a new problem is no big deal—they can solve problems whenever they want, and it is through benchmarks that they can make fair comparisons.

Now to return to the original question: Yes, CS methods seem to focus on prediction while statistical methods focus on understanding. One might describe the basic approaches of different quantitative fields as follows: Economics: identify the causal effect; Psychology: model the underlying process; Statistics: fit the data; Computer science: predict.

About ten years ago I had several meetings with a computer scientist here at Columbia who was working on interesting statistical methods. I was wondering if his methods could help on my problems, or if my methods could help on his. […] Conversely, it seemed impossible to apply my computationally-intensive hierarchical modeling methods with his huge masses of information.
I’ve long thought that machine-learning-style approaches would benefit from predictive model checking. When you see where your model doesn’t fit the data, this can give a sense of where improvements would help. Then again, I’ve long thought that statistical model fits should be checked against data too, and a lot of statisticians (particularly Bayesians) have resisted this. […] Machine learning methods are not always generative, in which case the first step to model checking is the construction of a generative model corresponding to (or approximating) the estimation procedure.
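The recipe in that last paragraph can be sketched as: fit a (generative) model, simulate replicated datasets from the fit, and see whether a test statistic on the replications covers the observed value. Below is a minimal illustration with simulated data; the function name `predictive_check` and the choice of a normal working model are my own, not from any package or from the post:

```python
import random
import statistics

def predictive_check(y, stat, n_rep=1000, seed=0):
    """Fit a normal model to y, simulate replicated datasets from the
    fitted model, and return the fraction of replications whose test
    statistic is at least as large as the observed one (a predictive
    p-value).  Values near 0 or 1 flag misfit in the direction
    that the statistic measures."""
    rng = random.Random(seed)
    mu = statistics.mean(y)
    sigma = statistics.stdev(y)
    observed = stat(y)
    hits = 0
    for _ in range(n_rep):
        y_rep = [rng.gauss(mu, sigma) for _ in y]
        if stat(y_rep) >= observed:
            hits += 1
    return hits / n_rep

# Data with a long right tail: a normal model should fail a max-value check.
rng = random.Random(42)
y = [rng.expovariate(1.0) for _ in range(200)]
p = predictive_check(y, max)
print(f"predictive p-value for max(y): {p:.3f}")
```

Here the observed maximum sits far out in the tail of the maxima simulated under the fitted normal model, so the p-value is extreme and the misfit is flagged; the same check applied to data the model describes well would give a moderate p-value. For a non-generative estimation procedure, the step before this sketch is exactly the one the post describes: constructing a generative model that corresponds to, or approximates, the procedure.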
simIndex simValue blogId blogTitle
1 0.98438811 1926 andrew gelman stats-2013-07-05-More plain old everyday Bayesianism
Introduction: Following up on this story , Bob Goodman writes: A most recent issue of the New England Journal of Medicine published a study entitled “Biventricular Pacing for Atrioventricular Block and Systolic Dysfunction,” (N Engl J Med 2013; 368:1585-1593), whereby “A hierarchical Bayesian proportional-hazards model was used for analysis of the primary outcome.” It is the first study I can recall in this journal that has reported on Table 2 (primary outcomes) “The Posterior Probability of Hazard Ratio < 1" (which in this case was .9978). This is ok, but to be really picky I will say that there’s typically not so much reason to care about the posterior probability that the effect is greater than 1; I’d rather have an estimate of the effect. Also we should be using informative priors.
2 0.98368877 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.
Introduction: John Mount provides some useful background and follow-up on our discussion from last year on computational instability of the usual logistic regression solver. Just to refresh your memory, here’s a simple logistic regression with only a constant term and no separation, nothing pathological at all: > y <- rep (c(1,0),c(10,5)) > display (glm (y ~ 1, family=binomial(link="logit"))) glm(formula = y ~ 1, family = binomial(link = "logit")) coef.est coef.se (Intercept) 0.69 0.55 --- n = 15, k = 1 residual deviance = 19.1, null deviance = 19.1 (difference = 0.0) And here’s what happens when we give it the not-outrageous starting value of -2: > display (glm (y ~ 1, family=binomial(link="logit"), start=-2)) glm(formula = y ~ 1, family = binomial(link = "logit"), start = -2) coef.est coef.se (Intercept) 71.97 17327434.18 --- n = 15, k = 1 residual deviance = 360.4, null deviance = 19.1 (difference = -341.3) Warning message:
3 0.98349655 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics
Introduction: Burak Bayramli writes: In this paper by Sunjin Ahn, Anoop Korattikara, and Max Welling and this paper by Welling and Yee Whye The, there are some arguments on big data and the use of MCMC. Both papers have suggested improvements to speed up MCMC computations. I was wondering what your thoughts were, especially on this paragraph: When a dataset has a billion data-cases (as is not uncommon these days) MCMC algorithms will not even have generated a single (burn-in) sample when a clever learning algorithm based on stochastic gradients may already be making fairly good predictions. In fact, the intriguing results of Bottou and Bousquet (2008) seem to indicate that in terms of “number of bits learned per unit of computation”, an algorithm as simple as stochastic gradient descent is almost optimally efficient. We therefore argue that for Bayesian methods to remain useful in an age when the datasets grow at an exponential rate, they need to embrace the ideas of the stochastic optimiz
4 0.98241031 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking
Introduction: People keep pointing me to this excellent news article by David Brown, about a scientist who was convicted of data manipulation: In all, 330 patients were randomly assigned to get either interferon gamma-1b or placebo injections. Disease progression or death occurred in 46 percent of those on the drug and 52 percent of those on placebo. That was not a significant difference, statistically speaking. When only survival was considered, however, the drug looked better: 10 percent of people getting the drug died, compared with 17 percent of those on placebo. However, that difference wasn’t “statistically significant,” either. Specifically, the so-called P value — a mathematical measure of the strength of the evidence that there’s a true difference between a treatment and placebo — was 0.08. . . . Technically, the study was a bust, although the results leaned toward a benefit from interferon gamma-1b. Was there a group of patients in which the results tipped? Harkonen asked the statis
5 0.97868776 807 andrew gelman stats-2011-07-17-Macro causality
Introduction: David Backus writes: This is from my area of work, macroeconomics. The suggestion here is that the economy is growing slowly because consumers aren’t spending money. But how do we know it’s not the reverse: that consumers are spending less because the economy isn’t doing well. As a teacher, I can tell you that it’s almost impossible to get students to understand that the first statement isn’t obviously true. What I’d call the demand-side story (more spending leads to more output) is everywhere, including this piece, from the usually reliable David Leonhardt. This whole situation reminds me of the story of the village whose inhabitants support themselves by taking in each others’ laundry. I guess we’re rich enough in the U.S. that we can stay afloat for a few decades just buying things from each other? Regarding the causal question, I’d like to move away from the idea of “Does A causes B or does B cause A” and toward a more intervention-based framework (Rubin’s model for
8 0.97760463 1061 andrew gelman stats-2011-12-16-CrossValidated: A place to post your statistics questions
same-blog 9 0.97756964 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning
10 0.97725034 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
11 0.97711551 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems
13 0.97693968 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value
14 0.97683966 789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time
15 0.97636086 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism
16 0.97604114 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
17 0.97596693 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?
20 0.97546673 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation