fast_ml fast_ml-2013 fast_ml-2013-22 knowledge-graph by maker-knowledge-mining

22 fast ml-2013-03-07-Choosing a machine learning algorithm


meta info for this blog

Source: html

Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is, bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting; however, we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is, training and re-training over and over, even if done automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor. *By “everything else” we mean “everything popular”, mostly things like
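
To make "try fast things first" concrete, here is a minimal sketch with scikit-learn on a synthetic dataset; the data, models and settings are illustrative assumptions, not taken from the post. It times a plain linear model against a bagged-tree ensemble and reports accuracy for both.

    # Minimal sketch of "try fast things first": a linear model vs a random forest.
    # Synthetic data and default-ish settings are assumptions for illustration only.
    from time import time

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = [
        ("linear model (logistic regression)", LogisticRegression(max_iter=1000)),
        ("bagged trees (random forest)", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]

    for name, model in models:
        start = time()
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print("%s: accuracy %.3f, trained in %.1fs" % (name, acc, time() - start))

On a mid-sized tabular problem like this, the linear model usually finishes first; whether the forest's extra accuracy is worth the extra time and tuning is exactly the trade-off the post describes.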


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. [sent-1, score-0.346]

2 One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. [sent-2, score-0.323]

3 Three things come to mind: We’d try fast things first. [sent-4, score-0.546]

4 In terms of speed, here’s how we imagine the order: linear models; trees, that is, bagged or boosted trees; everything else*. We’d use something we are comfortable with. [sent-5, score-1.258]

5 Learning new things is very exciting; however, we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? [sent-6, score-0.651]

6 We’d prefer something with fewer hyperparameters to set. [sent-7, score-0.468]

7 More params means more tuning, that is, training and re-training over and over, even if done automatically. [sent-8, score-0.116]

8 Random forests are hard to beat in this department. [sent-9, score-0.205]

9 A random forest scene, credit: Jonathan MacGregor. *By “everything else” we mean “everything popular”, mostly things like SVMs or neural networks. [sent-11, score-0.259]

10 There’s been some interesting research about fast non-linear methods and we hope to write about it when we get to grips with the stuff. [sent-12, score-0.396]

11 We’d also like to mention some other simple algorithms besides linear models, for example nearest neighbours or naive Bayes. [sent-13, score-0.887]

12 Sometimes they may yield good or at least acceptable results. [sent-14, score-0.28]
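
For reference, a summary like the one above can be produced by scoring each sentence by its tf-idf weights and keeping the highest-scoring ones. The sketch below is a guess at such a pipeline, not the actual code behind this page, and uses only a small sample of sentences.

    # Rough sketch of tf-idf sentence scoring for extractive summaries.
    # Pipeline details are assumptions; this is not the code that generated the list above.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    sentences = [
        "To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here.",
        "One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand.",
        "Sometimes they may yield good or at least acceptable results.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(sentences)      # one row of term weights per sentence
    scores = np.asarray(tfidf.sum(axis=1)).ravel()   # sentence score = sum of its term weights

    for rank, i in enumerate(np.argsort(scores)[::-1], start=1):
        print(rank, round(scores[i], 3), sentences[i])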


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('everything', 0.323), ('models', 0.252), ('fast', 0.196), ('else', 0.189), ('things', 0.175), ('boosted', 0.158), ('bayes', 0.158), ('followers', 0.158), ('naive', 0.158), ('neighbours', 0.158), ('responders', 0.158), ('suggested', 0.158), ('trees', 0.151), ('hyperparameters', 0.14), ('bagged', 0.14), ('acceptable', 0.14), ('ml', 0.14), ('nearest', 0.14), ('yield', 0.14), ('twitter', 0.126), ('something', 0.119), ('wasn', 0.116), ('params', 0.116), ('tuning', 0.116), ('forests', 0.116), ('besides', 0.116), ('mention', 0.116), ('hope', 0.116), ('linear', 0.115), ('asked', 0.108), ('svms', 0.108), ('fewer', 0.108), ('technique', 0.108), ('order', 0.101), ('prefer', 0.101), ('topic', 0.101), ('new', 0.1), ('ask', 0.094), ('mind', 0.094), ('want', 0.091), ('hard', 0.089), ('choose', 0.089), ('credit', 0.089), ('algorithms', 0.084), ('speed', 0.084), ('research', 0.084), ('random', 0.084), ('read', 0.08), ('question', 0.076), ('algorithm', 0.076)]
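
The "similar blogs" list that follows can be reproduced, roughly, by vectorizing every post with tf-idf and ranking the others by cosine similarity. A hedged sketch with placeholder post texts:

    # Sketch of tf-idf similarity between posts; the post texts here are placeholders,
    # not the real corpus behind the scores below.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    posts = {
        "22-choosing-a-machine-learning-algorithm": "linear models trees bagged boosted random forests hyperparameters",
        "16-intro-to-random-forests": "random forest bagged trees boosted trees ensembles overfitting",
        "46-13-nips-papers": "neural networks dropout nips papers deep learning",
    }

    ids = list(posts)
    tfidf = TfidfVectorizer().fit_transform(list(posts.values()))
    sims = cosine_similarity(tfidf[0], tfidf).ravel()   # similarity of post 22 to every post

    for blog_id, sim in sorted(zip(ids, sims), key=lambda x: -x[1]):
        print("%.3f  %s" % (sim, blog_id))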

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 22 fast ml-2013-03-07-Choosing a machine learning algorithm

Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is, bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting; however, we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is, training and re-training over and over, even if done automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor. *By “everything else” we mean “everything popular”, mostly things like

2 0.20558283 16 fast ml-2013-01-12-Intro to random forests

Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. to be the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably the GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectiveness

3 0.11766096 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

Introduction: Recently Rob Zinkov published his selection of interesting-looking NIPS papers. Inspired by this, we list some more. Rob seems to like Bayesian stuff; we’re more into neural networks. If you feel like browsing, Andrej Karpathy has a page with all NIPS 2013 papers. They are categorized by topics discovered by running LDA. When you see an interesting paper, you can discover ones ranked similar by TF-IDF. Here’s what we found. Understanding Dropout Pierre Baldi, Peter J. Sadowski Dropout is a relatively new algorithm for training neural networks which relies on stochastically dropping out neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characte

4 0.10335237 37 fast ml-2013-09-03-Our followers and who else they follow

Introduction: Recently we hit the 400 followers mark on Twitter. To celebrate we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window.   Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past.   5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains!   4582 thuske thuske & 4442 ram Ram Ravichandran So

5 0.091847539 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

Introduction: Last time we explored dimensionality reduction in practice using Gensim’s LSI and LDA. Now, having spent some time researching the subject matter, we will give an overview of other options. UPDATE: We now consider the topic quite irrelevant, because sparse high-dimensional data is precisely where linear models shine. See Amazon aspires to automate access control, Predicting advertised salaries and Predicting closed questions on Stack Overflow. And the few most popular methods are: LSI/LSA - a multinomial PCA LDA - Latent Dirichlet Allocation matrix factorization, in particular non-negative variants: NMF ICA, or Independent Components Analysis mixtures of Bernoullis stacked RBMs correlated topic models, an extension of LDA We tried the first two before. As regards matrix factorization, you do the same stuff as with movie recommendations (think Netflix challenge). The difference is, now all the matrix elements are known and we are only interested in

6 0.091197461 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

7 0.090968668 19 fast ml-2013-02-07-The secret of the big guys

8 0.088358887 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

9 0.079884879 40 fast ml-2013-10-06-Pylearn2 in practice

10 0.079490319 13 fast ml-2012-12-27-Spearmint with a random forest

11 0.074277297 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

12 0.073250815 27 fast ml-2013-05-01-Deep learning made easy

13 0.071649611 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

14 0.070420399 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

15 0.063878193 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers

16 0.061811674 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview

17 0.060929306 20 fast ml-2013-02-18-Predicting advertised salaries

18 0.060096219 18 fast ml-2013-01-17-A very fast denoising autoencoder

19 0.056812715 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

20 0.05158091 8 fast ml-2012-10-15-Merck challenge


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.244), (1, 0.171), (2, 0.028), (3, 0.012), (4, 0.098), (5, 0.177), (6, -0.086), (7, -0.192), (8, 0.028), (9, -0.024), (10, -0.395), (11, -0.301), (12, -0.005), (13, 0.162), (14, -0.144), (15, -0.258), (16, 0.03), (17, -0.045), (18, -0.016), (19, 0.101), (20, -0.131), (21, 0.145), (22, -0.058), (23, 0.017), (24, -0.06), (25, 0.023), (26, -0.002), (27, 0.051), (28, -0.002), (29, 0.07), (30, -0.056), (31, 0.071), (32, -0.022), (33, 0.023), (34, 0.064), (35, 0.0), (36, -0.185), (37, -0.105), (38, -0.01), (39, 0.231), (40, -0.047), (41, -0.064), (42, -0.127), (43, 0.141), (44, 0.032), (45, -0.105), (46, -0.126), (47, -0.011), (48, -0.157), (49, -0.05)]
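
The LSI variant does the same ranking in a low-dimensional latent topic space; the weights above are this post's coordinates in that space. A sketch with gensim, using made-up texts and an arbitrary topic count:

    # Sketch of LSI-based similarity with gensim; corpus and num_topics are assumptions.
    from gensim import corpora, models, similarities

    texts = [doc.lower().split() for doc in [
        "linear models trees bagged boosted random forests hyperparameters",
        "random forest bagged trees boosted trees ensembles overfitting",
        "neural networks dropout nips papers deep learning",
    ]]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
    index = similarities.MatrixSimilarity(lsi[corpus])

    print(lsi[corpus[0]])          # (topicId, topicWeight) pairs for the first post
    print(index[lsi[corpus[0]]])   # cosine similarity of that post to every post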

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99214733 22 fast ml-2013-03-07-Choosing a machine learning algorithm

Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is, bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting; however, we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is, training and re-training over and over, even if done automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor. *By “everything else” we mean “everything popular”, mostly things like

2 0.44801429 16 fast ml-2013-01-12-Intro to random forests

Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. to be the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably the GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectiveness

3 0.28863052 37 fast ml-2013-09-03-Our followers and who else they follow

Introduction: Recently we hit the 400 followers mark on Twitter. To celebrate we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window.   Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past.   5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains!   4582 thuske thuske & 4442 ram Ram Ravichandran So

4 0.23961665 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

Introduction: Recently Rob Zinkov published his selection of interesting-looking NIPS papers. Inspired by this, we list some more. Rob seems to like Bayesian stuff; we’re more into neural networks. If you feel like browsing, Andrej Karpathy has a page with all NIPS 2013 papers. They are categorized by topics discovered by running LDA. When you see an interesting paper, you can discover ones ranked similar by TF-IDF. Here’s what we found. Understanding Dropout Pierre Baldi, Peter J. Sadowski Dropout is a relatively new algorithm for training neural networks which relies on stochastically dropping out neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characte

5 0.21645997 19 fast ml-2013-02-07-The secret of the big guys

Introduction: Are you interested in linear models, or K-means clustering? Probably not much. These are very basic techniques with fancier alternatives. But here’s the bomb: when you combine those two methods for supervised learning, you can get better results than from a random forest. And maybe even faster. We have already written about Vowpal Wabbit, a fast linear learner from Yahoo/Microsoft. Google’s response (or at least, a Google guy’s response) seems to be Sofia-ML. The software consists of two parts: a linear learner and K-means clustering. We found Sofia a while ago and wondered about K-means: who needs K-means? Here’s a clue: This package can be used for learning cluster centers (…) and for mapping a given data set onto a new feature space based on the learned cluster centers. Our eyes only opened when we read a certain paper, namely An Analysis of Single-Layer Networks in Unsupervised Feature Learning (PDF). The paper, by Coates, Lee and Ng, is about object recogni

6 0.2143575 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

7 0.19294611 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

8 0.19215561 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

9 0.17588703 40 fast ml-2013-10-06-Pylearn2 in practice

10 0.17173788 27 fast ml-2013-05-01-Deep learning made easy

11 0.15379879 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

12 0.15073413 13 fast ml-2012-12-27-Spearmint with a random forest

13 0.14522727 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

14 0.1409682 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview

15 0.13114139 32 fast ml-2013-07-05-Processing large files, line by line

16 0.12980901 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

17 0.12888578 18 fast ml-2013-01-17-A very fast denoising autoencoder

18 0.12395409 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers

19 0.12094072 8 fast ml-2012-10-15-Merck challenge

20 0.11219651 42 fast ml-2013-10-28-How much data is enough?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(69, 0.07), (71, 0.798), (99, 0.024)]
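
The LDA variant works analogously: fit a topic model and read off each post's topic mixture, which is what the (topicId, topicWeight) pairs above represent. Again a sketch with gensim on made-up data:

    # Sketch of an LDA topic mixture with gensim; corpus and parameters are assumptions.
    from gensim import corpora, models

    texts = [doc.lower().split() for doc in [
        "linear models trees bagged boosted random forests hyperparameters",
        "random forest bagged trees boosted trees ensembles overfitting",
        "sparse binary data dimensionality reduction topic models lda",
    ]]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=3, passes=10, random_state=0)

    # Topic mixture of the first post; topics with negligible weight are dropped,
    # which is why the lists above are short.
    print(lda.get_document_topics(corpus[0]))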

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99350441 22 fast ml-2013-03-07-Choosing a machine learning algorithm

Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is, bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting; however, we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is, training and re-training over and over, even if done automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor. *By “everything else” we mean “everything popular”, mostly things like

2 0.60281372 16 fast ml-2013-01-12-Intro to random forests

Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. to be the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably the GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectiveness

3 0.34496012 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

Introduction: Much of data in machine learning is sparse, that is mostly zeros, and often binary. The phenomenon may result from converting categorical variables to one-hot vectors, and from converting text to bag-of-words representation. If each feature is binary - either zero or one - then it holds exactly one bit of information. Surely we could somehow compress such data to fewer real numbers. To do this, we turn to topic models, an area of research with roots in natural language processing. In NLP, a training set is called a corpus, and each document is like a row in the set. A document might be three pages of text, or just a few words, as in a tweet. The idea of topic modelling is that you can group words in your corpus into relatively few topics and represent each document as a mixture of these topics. It’s attractive because you can interpret the model by looking at words that form the topics. Sometimes they seem meaningful, sometimes not. A meaningful topic might be, for example: “cric

4 0.33244604 18 fast ml-2013-01-17-A very fast denoising autoencoder

Introduction: Once upon a time we were browsing machine learning papers and software. We were interested in autoencoders and found a rather unusual one. It was called marginalized Stacked Denoising Autoencoder and the author claimed that it preserves the strong feature learning capacity of Stacked Denoising Autoencoders, but is orders of magnitudes faster. We like all things fast, so we were hooked. About autoencoders Wikipedia says that an autoencoder is an artificial neural network and its aim is to learn a compressed representation for a set of data. This means it is being used for dimensionality reduction . In other words, an autoencoder is a neural network meant to replicate the input. It would be trivial with a big enough number of units in a hidden layer: the network would just find an identity mapping. Hence dimensionality reduction: a hidden layer size is typically smaller than input layer. mSDA is a curious specimen: it is not a neural network and it doesn’t reduce dimension

5 0.29363659 19 fast ml-2013-02-07-The secret of the big guys

Introduction: Are you interested in linear models, or K-means clustering? Probably not much. These are very basic techniques with fancier alternatives. But here’s the bomb: when you combine those two methods for supervised learning, you can get better results than from a random forest. And maybe even faster. We have already written about Vowpal Wabbit, a fast linear learner from Yahoo/Microsoft. Google’s response (or at least, a Google guy’s response) seems to be Sofia-ML. The software consists of two parts: a linear learner and K-means clustering. We found Sofia a while ago and wondered about K-means: who needs K-means? Here’s a clue: This package can be used for learning cluster centers (…) and for mapping a given data set onto a new feature space based on the learned cluster centers. Our eyes only opened when we read a certain paper, namely An Analysis of Single-Layer Networks in Unsupervised Feature Learning (PDF). The paper, by Coates, Lee and Ng, is about object recogni

6 0.28740773 13 fast ml-2012-12-27-Spearmint with a random forest

7 0.27695233 9 fast ml-2012-10-25-So you want to work for Facebook

8 0.27642781 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview

9 0.27571878 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

10 0.26399088 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

11 0.25718859 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

12 0.25624803 28 fast ml-2013-05-12-And deliver us from Weka

13 0.25405082 40 fast ml-2013-10-06-Pylearn2 in practice

14 0.25220028 26 fast ml-2013-04-17-Regression as classification

15 0.24798828 25 fast ml-2013-04-10-Gender discrimination

16 0.23534125 32 fast ml-2013-07-05-Processing large files, line by line

17 0.23237176 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

18 0.23137288 8 fast ml-2012-10-15-Merck challenge

19 0.22385573 34 fast ml-2013-07-14-Running things on a GPU

20 0.21864119 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition