1960 andrew gelman stats-2013-07-28-More on that machine learning course


Meta information for this blog

Source: html

Introduction: Following up on our discussion the other day, Andrew Ng writes: Looking at the “typical” ML syllabus, I think most classes do a great job teaching the core ideas, but that there’re two recent trends in ML that are usually not yet reflected. First, unlike 10 years ago, a lot of our students are now taking ML not to do ML research, but to apply it in other research areas or in industry. I’d like to serve these students as well. While many ML classes do a nice job teaching the theory and core algorithms, I’ve seen very few that teach the “hands-on” tactics for how to actually build a high-performance ML system, or on how to think about piecing together a complex ML architecture. For example, what sorts of diagnostics do you run to figure out why your algorithm isn’t giving reasonable accuracy? How much do you invest in collecting additional training data? How do you structure your org chart and metrics if you think there’re 3 components that need to be built and plugged


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Following up on our discussion the other day, Andrew Ng writes: Looking at the “typical” ML syllabus, I think most classes do a great job teaching the core ideas, but that there’re two recent trends in ML that are usually not yet reflected. [sent-1, score-0.486]

2 First, unlike 10 years ago, a lot of our students are now taking ML not to do ML research, but to apply it in other research areas or in industry. [sent-2, score-0.132]

3 While many ML classes do a nice job teaching the theory and core algorithms, I’ve seen very few that teach the “hands-on” tactics for how to actually build a high-performance ML system, or on how to think about piecing together a complex ML architecture. [sent-4, score-0.675]

4 For example, what sorts of diagnostics do you run to figure out why your algorithm isn’t giving reasonable accuracy? [sent-5, score-0.165]

5 How much do you invest in collecting additional training data? [sent-6, score-0.219]

6 How do you structure your org chart and metrics if you think there’re 3 components that need to be built and plugged together? [sent-7, score-0.231]

7 Second, I think most ML classes have been slow to appreciate the rise of Big Data. [sent-12, score-0.585]

8 (Surprisingly, I find that even some classes titled “Big Data” are still slow to appreciate the rise of Big Data.) [sent-13, score-0.564]

9 The volume, scale, magnitude of data that we all have access to now is completely unprecedented, and a lot of Silicon Valley’s ML advances have been because of this. [sent-14, score-0.099]

10 (For example, the single most commonly used learning algorithm in the Valley is probably logistic regression, only applied at massive scale.) [sent-15, score-0.201]

11 This rise of data has led to a parallel literature on data warehousing, and tools like Hadoop/Hive/Storm/Kafka/AWS/… for exploiting this data. [sent-16, score-0.33]

12 The way you think about obtaining and training on 1B examples is very different than the way you think about training on 10K examples, and it goes beyond the algorithmic questions like online vs. [sent-17, score-0.606]

13 batch, into computer systems issues, hardware constraints, and questions like how to plan for polyglot persistence. [sent-18, score-0.133]

14 I think this will become even more true over time. [sent-20, score-0.069]
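The contrast between online and batch training in sentences 10 to 13 can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not anything taken from the post: it trains a logistic regression one example at a time with stochastic gradient descent, so memory use stays constant however long the stream is. The function names (`sgd_step`, `train_online`, `synthetic_stream`), the learning rate, and the synthetic data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(weights, features, label, lr=0.1):
    """One online update using a single (features, label) example."""
    z = sum(w * x for w, x in zip(weights, features))
    error = sigmoid(z) - label  # gradient of the log loss with respect to z
    return [w - lr * error * x for w, x in zip(weights, features)]


def train_online(stream, n_features, lr=0.1):
    """Consume examples one at a time; nothing is kept except the weights."""
    weights = [0.0] * n_features
    for features, label in stream:
        weights = sgd_step(weights, features, label, lr)
    return weights


def synthetic_stream(n, seed=0):
    """Hypothetical data: label is 1 when x > 0; the first feature is a bias term."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield [1.0, x], 1 if x > 0 else 0


weights = train_online(synthetic_stream(5000), n_features=2)
```

A batch method would instead need the whole data set available for every pass, which is one reason the 10K-example and 1B-example settings call for different thinking.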


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ml', 0.795), ('classes', 0.189), ('rise', 0.176), ('training', 0.117), ('algorithm', 0.111), ('valley', 0.105), ('teaching', 0.091), ('learning', 0.09), ('core', 0.086), ('slow', 0.084), ('algorithms', 0.084), ('questions', 0.07), ('piecing', 0.07), ('proximate', 0.07), ('students', 0.07), ('think', 0.069), ('appreciate', 0.067), ('algorithmic', 0.066), ('exploit', 0.063), ('hardware', 0.063), ('tactics', 0.063), ('iterating', 0.063), ('apply', 0.062), ('batch', 0.061), ('unprecedented', 0.061), ('plugged', 0.059), ('silicon', 0.058), ('exploiting', 0.058), ('syllabus', 0.056), ('invest', 0.056), ('metrics', 0.056), ('big', 0.056), ('together', 0.056), ('graduates', 0.055), ('diagnostics', 0.054), ('expense', 0.053), ('superior', 0.053), ('job', 0.051), ('advances', 0.051), ('tradeoff', 0.051), ('obtaining', 0.05), ('served', 0.05), ('examples', 0.048), ('titled', 0.048), ('data', 0.048), ('chart', 0.047), ('trade', 0.046), ('serve', 0.046), ('collecting', 0.046), ('mathematically', 0.046)]
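As a rough idea of how a word list like the one above could be produced, here is a minimal tf-idf sketch. It is an assumption-laden illustration, not the pipeline that generated this page: the whitespace tokeniser, the smoothed idf formula, the L2 normalisation, and the names `tfidf_vectors` and `top_words` are all hypothetical choices, and the three toy documents are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term count times a smoothed inverse document frequency (assumed form).
        raw = {w: c * (math.log(n_docs / doc_freq[w]) + 1.0) for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        vectors.append({w: v / norm for w, v in raw.items()} if norm else {})
    return vectors


def top_words(vector, n):
    """The n highest-weighted (word, weight) pairs, largest first."""
    return sorted(vector.items(), key=lambda item: item[1], reverse=True)[:n]


# Invented toy corpus, for illustration only.
docs = [
    "ml classes teach ml theory",
    "students apply ml in industry",
    "bar charts should include zero",
]
vectors = tfidf_vectors(docs)
best = top_words(vectors[0], 1)
```

In the toy corpus, "ml" ranks first for the first document because it occurs twice there, even though it is discounted for also appearing in the second document.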

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1960 andrew gelman stats-2013-07-28-More on that machine learning course

2 0.63783747 1956 andrew gelman stats-2013-07-25-What should be in a machine learning course?

Introduction: Nando de Freitas writes: We’re designing two machine learning (ML) courses at Oxford (introductory and advanced ML). In doing this, we have many questions and wonder what your thoughts are on the following: - Which do you think are the key optimization papers/ideas that should be covered. - Which topics do you think are coolest things in ML? - Which are the essential ideas, tools and approaches? - Are there other courses you would recommend? - Which are good resources for students to learn to code and apply convolutional nets? Theano? What are the key deep learning things to know first? - Which are the best scalable classifiers? … pegasos .. etc. - Which are the coolest applications that can be easily given as a programming exercise? - What theory to teach? PAC? PAC-Bayes? CLTs? - What are the best tutorials on sample complexity for ML? - How much should we emphasize the trade-offs of computing/optimization-approximation-estimation. - What are the ML algorithms mostly

3 0.1269947 2133 andrew gelman stats-2013-12-13-Flexibility is good

Introduction: If I made a separate post for each interesting blog discussion, we’d get overwhelmed. That’s why I often leave detailed responses in the comments section, even though I’m pretty sure that most readers don’t look in the comments at all. Sometimes, though, I think it’s good to bring such discussions to light. Here’s a recent example. Michael wrote: Poor predictive performance usually indicates that the model isn’t sufficiently flexible to explain the data, and my understanding of the proper Bayesian strategy is to feed that back into your original model and try again until you achieve better performance. Corey replied: It was my impression that — in ML at least — poor predictive performance is more often due to the model being too flexible and fitting noise. And Rahul agreed: Good point. A very flexible model will describe your training data perfectly and then go bonkers when unleashed on wild data. But I wrote: Overfitting comes from a model being flex

4 0.11472203 1377 andrew gelman stats-2012-06-13-A question about AIC

Introduction: Jacob Oaknin asks: Akaike’s selection criterion is often justified on the basis of the empirical risk of a ML estimate being a biased estimate of the true generalization error of a parametric family, say the family, S_m, of linear regressors on a m-dimensional variable x=(x_1,..,x_m) with gaussian noise independent of x (for instance in “Unifying the derivations for the Akaike and Corrected Akaike information criteria”, by J.E.Cavanaugh, Statistics and Probability Letters, vol. 33, 1997, pp. 201-208). On the other hand, the family S_m is known to have finite VC-dimension (VC = m+1), and this fact should grant that empirical risk minimizer is asymptotically consistent regardless of the underlying probability distribution, and in particular for the assumed gaussian distribution of noise (“An overview of statistical learning theory”, by V.N.Vapnik, IEEE Transactions On Neural Networks, vol. 10, No. 5, 1999, pp. 988-999) What am I missing? My reply: I’m no expert on AIC so

5 0.10559429 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

Introduction: Soren Lorensen wrote: I’m working on a project that uses a binary choice model on panel data. Since I have panel data and am using MLE, I’m concerned about heteroskedasticity making my estimates inconsistent and biased. Are you familiar with any statistical packages with pre-built tests for heteroskedasticity in binary choice ML models? If not, is there value in cutting my data into groups over which I guess the error variance might vary and eyeballing residual plots? Have you other suggestions about how I might resolve this concern? I replied that I wouldn’t worry so much about heteroskedasticity. Breaking up the data into pieces might make sense, but for the purpose of estimating how the coefficients might vary—that is, nonlinearity and interactions. Soren shot back: I’m somewhat puzzled however: homoskedasticity is an identifying assumption in estimating a probit model: if we don’t have it all sorts of bad things can happen to our parameter estimates. Do you suggest n

6 0.094379224 1517 andrew gelman stats-2012-10-01-“On Inspiring Students and Being Human”

7 0.078528896 390 andrew gelman stats-2010-11-02-Fragment of statistical autobiography

8 0.074402072 1788 andrew gelman stats-2013-04-04-When is there “hidden structure in data” to be discovered?

9 0.073346809 938 andrew gelman stats-2011-10-03-Comparing prediction errors

10 0.070954345 1740 andrew gelman stats-2013-02-26-“Is machine learning a subset of statistics?”

11 0.06981004 1582 andrew gelman stats-2012-11-18-How to teach methods we don’t like?

12 0.069192208 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

13 0.068708688 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia

14 0.067741029 1864 andrew gelman stats-2013-05-20-Evaluating Columbia University’s Frontiers of Science course

15 0.067013755 1895 andrew gelman stats-2013-06-12-Peter Thiel is writing another book!

16 0.06548433 22 andrew gelman stats-2010-05-07-Jenny Davidson wins Mark Van Doren Award, also some reflections on the continuity of work within literary criticism or statistics

17 0.065418415 2041 andrew gelman stats-2013-09-27-Setting up Jitts online

18 0.064246982 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

19 0.063918017 1630 andrew gelman stats-2012-12-18-Postdoc positions at Microsoft Research – NYC

20 0.063488506 236 andrew gelman stats-2010-08-26-Teaching yourself mathematics
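The page does not say how simValue is computed. One common choice for comparing tf-idf vectors is cosine similarity, and the sketch below ranks posts that way. Treat it as an assumption rather than a description of the actual pipeline: the function names (`cosine`, `rank_similar`) and the three tiny word-weight vectors are hypothetical, with blog ids borrowed from this page only as labels.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_id, vectors):
    """Return (similarity, blog_id) pairs, most similar first."""
    query = vectors[query_id]
    scored = [(cosine(query, vec), blog_id) for blog_id, vec in vectors.items()]
    return sorted(scored, reverse=True)


# Invented weights, for illustration only; the keys reuse blog ids as labels.
vectors = {
    1960: {"ml": 0.8, "classes": 0.2},
    1956: {"ml": 0.7, "courses": 0.3},
    687: {"bar": 0.9, "zero": 0.4},
}
ranking = rank_similar(1960, vectors)
```

Under this measure a post is always maximally similar to itself, which matches the same-blog row scoring 1.0 in the tfidf list above; posts with no words in common score 0.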


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.127), (1, -0.009), (2, -0.037), (3, 0.01), (4, 0.045), (5, 0.075), (6, -0.002), (7, 0.047), (8, -0.032), (9, 0.019), (10, 0.012), (11, 0.039), (12, -0.039), (13, -0.03), (14, -0.011), (15, -0.036), (16, 0.005), (17, -0.013), (18, -0.031), (19, 0.011), (20, 0.012), (21, 0.0), (22, -0.007), (23, 0.041), (24, -0.022), (25, 0.02), (26, 0.025), (27, 0.01), (28, 0.031), (29, 0.009), (30, 0.017), (31, -0.017), (32, -0.035), (33, -0.025), (34, 0.017), (35, -0.017), (36, -0.041), (37, 0.023), (38, -0.024), (39, -0.055), (40, -0.025), (41, 0.032), (42, -0.056), (43, 0.04), (44, 0.025), (45, 0.056), (46, -0.005), (47, 0.012), (48, 0.029), (49, 0.019)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92058891 1960 andrew gelman stats-2013-07-28-More on that machine learning course

2 0.84746903 1956 andrew gelman stats-2013-07-25-What should be in a machine learning course?

3 0.77835357 1517 andrew gelman stats-2012-10-01-“On Inspiring Students and Being Human”

Introduction: Rachel Schutt (the author of the Taxonomy of Confusion) has a blog! for the course she’s teaching at Columbia, “Introduction to Data Science.” It sounds like a great course—I wish I could take it! Her latest post is “On Inspiring Students and Being Human”: Of course one hopes as a teacher that one will inspire students . . . But what I actually mean by “inspiring students” is that you are inspiring me; you are students who inspire: “inspiring students”. This is one of the happy unintended consequences of this course so far for me. She then gives examples of some of the students in her class and some of their interesting ideas: Phillip is a PhD student in the sociology department . . . He’s in the process of developing his thesis topic around some of the themes we’ve been discussing in this class, such as the emerging data science community. Arvi works at the College Board and is a part time student . . . He analyzes user-level data of students who have signed up f

4 0.75013137 308 andrew gelman stats-2010-09-30-Nano-project qualifying exam process: An intensified dialogue between students and faculty

Introduction: Joe Blitzstein and Xiao-Li Meng write: An effectively designed examination process goes far beyond revealing students’ knowledge or skills. It also serves as a great teaching and learning tool, incentivizing the students to think more deeply and to connect the dots at a higher level. This extends throughout the entire process: pre-exam preparation, the exam itself, and the post-exam period (the aftermath or, more appropriately, afterstat of the exam). As in the publication process, the first submission is essential but still just one piece in the dialogue. Viewing the entire exam process as an extended dialogue between students and faculty, we discuss ideas for making this dialogue induce more inspiration than perspiration, and thereby making it a memorable deep-learning triumph rather than a wish-to-forget test-taking trauma. We illustrate such a dialogue through a recently introduced course in the Harvard Statistics Department, Stat 399: Problem Solving in Statistics, and tw

5 0.72082204 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia

Introduction: In one of the final Jitts, we asked the students how the course could be improved. Some of their suggestions would work, some would not. I’m putting all the suggestions below, interpolating my responses. (Overall, I think the course went well. Please remember that the remarks below are not course evaluations; they are answers to my specific question of how the course could be better. If we’d had a Jitt asking all the ways the course was good, you’d be seeing lots of positive remarks. But that wouldn’t be particularly useful or interesting.) The best thing about the course is that the kids worked hard each week on their homeworks. OK, here are the comments and my replies: Could have been better if we did less amount but more in detail. I don’t know if this would’ve been possible. I wanted to get to the harder stuff (HMC, VB, nonparametric models) which required a certain amount of preparation. And, even so, there was not time for everything. And also, needs solut

6 0.71235853 462 andrew gelman stats-2010-12-10-Who’s holding the pen?, The split screen, and other ideas for one-on-one instruction

7 0.71091855 1752 andrew gelman stats-2013-03-06-Online Education and Jazz

8 0.70327419 516 andrew gelman stats-2011-01-14-A new idea for a science core course based entirely on computer simulation

9 0.70325595 277 andrew gelman stats-2010-09-14-In an introductory course, when does learning occur?

10 0.70195967 236 andrew gelman stats-2010-08-26-Teaching yourself mathematics

11 0.70110422 2104 andrew gelman stats-2013-11-17-Big bad education bureaucracy does big bad things

12 0.69934791 1009 andrew gelman stats-2011-11-14-Wickham R short course

13 0.68740928 402 andrew gelman stats-2010-11-09-Kaggle: forecasting competitions in the classroom

14 0.68385029 1890 andrew gelman stats-2013-06-09-Frontiers of Science update

15 0.67880028 515 andrew gelman stats-2011-01-13-The Road to a B

16 0.66181916 1056 andrew gelman stats-2011-12-13-Drawing to Learn in Science

17 0.65868527 579 andrew gelman stats-2011-02-18-What is this, a statistics class or a dentist’s office??

18 0.65857357 1224 andrew gelman stats-2012-03-21-Teaching velocity and acceleration

19 0.65544099 1864 andrew gelman stats-2013-05-20-Evaluating Columbia University’s Frontiers of Science course

20 0.65267879 2257 andrew gelman stats-2014-03-20-The candy weighing demonstration, or, the unwisdom of crowds


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.017), (13, 0.049), (15, 0.015), (16, 0.043), (24, 0.147), (42, 0.024), (52, 0.027), (53, 0.103), (55, 0.016), (77, 0.018), (86, 0.038), (90, 0.054), (95, 0.042), (99, 0.253)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94968659 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

2 0.94508982 1555 andrew gelman stats-2012-10-31-Social scientists who use medical analogies to explain causal inference are, I think, implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes

Introduction: I’m sorry I don’t have any new zombie papers in time for Halloween. Instead I’d like to be a little monster by reproducing a mini-rant from this article on experimental reasoning in social science: I will restrict my discussion to social science examples. Social scientists are often tempted to illustrate their ideas with examples from medical research. When it comes to medicine, though, we are, with rare exceptions, at best ignorant laypersons (in my case, not even reaching that level), and it is my impression that by reaching for medical analogies we are implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes. Evidence-based medicine is the subject of a large literature of its own (see, for example, Lau, Ioannidis, and Schmid, 1998).

3 0.9446286 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data

Introduction: Jelte Wicherts writes: I thought you might be interested in reading this paper that is to appear this week in PLoS ONE. In it we [Wicherts, Marjan Bakker, and Dylan Molenaar] show that the willingness to share data from published psychological research is associated both with “the strength of the evidence” (against H0) and the prevalence of errors in the reporting of p-values. The issue of data archiving will likely be put on the agenda of granting bodies and the APA/APS because of what Diederik Stapel did. I hate hate hate hate hate when people don’t share their data. In fact, that’s the subject of my very first column on ethics for Chance magazine. I have a story from 22 years ago, when I contacted some scientists and showed them how I could reanalyze their data more efficiently (based on a preliminary analysis of their published summary statistics). They seemed to feel threatened by the suggestion and refused to send me their raw data. (It was an animal experiment

4 0.9430269 687 andrew gelman stats-2011-04-29-Zero is zero

Introduction: Nathan Roseberry writes: I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (or google) didn’t return what I was looking for. Is it considered a best practice to always include zero on the axis for bar charts? Has this been written in a book? My reply: The idea is that the area of the bar represents “how many” or “how much.” The bar has to go down to 0 for that to work. You don’t have to have your y-axis go to zero, but if you want the axis to go anywhere else, don’t use a bar graph, use a line graph. Usually line graphs are better anyway. I’m sure this is all in a book somewhere.

5 0.94300324 1905 andrew gelman stats-2013-06-18-There are no fat sprinters

Introduction: This post is by Phil. A little over three years ago I wrote a post about exercise and weight loss in which I described losing a fair amount of weight due to (I believe) an exercise regime, with no effort to change my diet; this contradicted the prediction of studies that had recently been released. The comment thread on that post is quite interesting: a lot of people had had similar experiences — losing weight, or keeping it off, with an exercise program that includes very short periods of exercise at maximal intensity — while other people expressed some skepticism about my claims. Some commenters said that I risked injury; others said it was too early to judge anything because my weight loss might not last. The people who predicted injury were right: running the curve during a 200m sprint a month or two after that post, I strained my Achilles tendon. Nothing really serious, but it did keep me off the track for a couple of months, and rather than go back to sprinting I switched t

6 0.94293737 495 andrew gelman stats-2010-12-31-“Threshold earners” and economic inequality

7 0.9418599 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

same-blog 8 0.94077408 1960 andrew gelman stats-2013-07-28-More on that machine learning course

9 0.93999386 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

10 0.93518996 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

11 0.93447185 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables

12 0.93438953 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs

13 0.93158281 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!

14 0.92962921 2313 andrew gelman stats-2014-04-30-Seth Roberts

15 0.92788869 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

16 0.92788386 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

17 0.92698646 880 andrew gelman stats-2011-08-30-Annals of spam

18 0.92636698 1956 andrew gelman stats-2013-07-25-What should be in a machine learning course?

19 0.92322302 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

20 0.9218924 499 andrew gelman stats-2011-01-03-5 books