fast_ml fast_ml-2013 fast_ml-2013-16 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. as the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectivnes
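The bagging/boosting distinction described in the introduction is easy to try out directly. The post itself points to the GBM package in R; purely as an illustrative sketch (scikit-learn and the synthetic dataset are assumptions for the example, not part of the original post), here is how the two ensemble types compare side by side:

```python
# Illustrative sketch only -- scikit-learn stands in for the R packages the
# post mentions, and the synthetic dataset is an assumption for the example.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# A generic classification task, just to have something to fit.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: every tree is grown independently of the others.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# Boosting: each new tree tries to correct the mistakes of the earlier ones.
gbm = GradientBoostingClassifier(n_estimators=200, random_state=0)
gbm.fit(X_train, y_train)

print("random forest accuracy:   ", rf.score(X_test, y_test))
print("gradient boosting accuracy:", gbm.score(X_test, y_test))
```

Both models expose the same fit/score interface, so swapping between the bagging and boosting approach is a one-line change.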
sentIndex sentText sentNum sentScore
1 Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. [sent-1, score-0.551]
2 First of all, decision tree ensembles have been found by Caruana et al. [sent-3, score-0.668]
3 as the best overall approach for a variety of problems. [sent-4, score-0.257]
4 Random forests, specifically, perform well in both low-dimensional and high-dimensional tasks. [sent-5, score-0.43]
5 There are basically two kinds of tree ensembles: bagged trees and boosted trees. [sent-6, score-1.075]
6 Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). [sent-7, score-0.984]
7 Random forest is an example of the bagging approach, and is less prone to overfitting. [sent-8, score-0.386]
8 Gradient boosted trees (notably the GBM package in R) represent the other approach. [sent-9, score-0.627]
9 Trees are also relatively fast to train, compared to some more involved methods. [sent-11, score-0.092]
10 Besides effectiveness and speed, random forests are easy to use: there are few hyperparams to tune, the number of trees being the most important. [sent-12, score-0.826]
11 It’s not difficult to tune, as usually more trees are better, up to a certain point. [sent-13, score-0.187]
12 With bigger datasets it’s almost a matter of how many trees you can afford computationally. [sent-14, score-0.351]
13 Having few hyperparams differentiates random forests from gradient boosted trees, which have more parameters to tweak. [sent-15, score-0.909]
14 Shifting means subtracting the mean so that values in each column are centered around zero; scaling means dividing by the standard deviation so that the magnitudes of all features are similar (see the standardization sketch after this sentence list). [sent-17, score-0.394]
15 Gradient-descent-based methods (for example, neural networks and SVMs) benefit from such pre-processing of the data. [sent-18, score-0.073]
16 Not that it’s much work, but still an additional step to perform. [sent-19, score-0.093]
17 One last thing we will mention is that a random forest generates an internal unbiased estimate of the generalization error as the forest building progresses [Breiman], known as out-of-bag error, or OOBE (see the OOB sketch after this sentence list). [sent-20, score-1.239]
18 It stems from the fact that any given tree only uses a subset of available data for training, and the rest can be used to estimate the error. [sent-21, score-0.655]
19 Thus you can immediately get a rough idea of how the learning goes, even without a validation set. [sent-22, score-0.092]
20 All this makes random forests one of the first choices in supervised learning. [sent-23, score-0.372]
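Sentence 14 above defines shifting and scaling. As a minimal sketch only (the example matrix and the numpy usage are assumptions for illustration, not from the post, whose point is that random forests don't need this step), standardization looks like this:

```python
# Minimal sketch of shifting and scaling, assuming X is a plain numeric
# feature matrix (one column per feature). Random forests don't need this
# step; gradient-descent-based models usually benefit from it.
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

X_shifted = X - X.mean(axis=0)        # shift: center each column at zero
X_scaled = X_shifted / X.std(axis=0)  # scale: divide by the standard deviation

print(X_scaled)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # roughly zeros and ones
```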
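Sentences 10-12 and 17-19 above make two practical points: the number of trees is the main hyperparam, and the out-of-bag estimate tells you how learning goes without a validation set. A hedged scikit-learn sketch combining both (the dataset and the list of tree counts are assumptions for illustration, not the post's code):

```python
# Illustrative sketch only: the dataset and the tree counts are assumptions.
# With oob_score=True, scikit-learn keeps an out-of-bag accuracy in
# oob_score_, so 1 - oob_score_ plays the role of the OOB error (OOBE).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for n_trees in (25, 50, 100, 300):
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                bootstrap=True, random_state=0)
    rf.fit(X, y)
    print("%3d trees, OOB error: %.4f" % (n_trees, 1.0 - rf.oob_score_))
```

No validation split is needed here: each bootstrapped tree skips roughly a third of the rows, and those held-out rows provide the error estimate.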
wordName wordTfidf (topN-words)
[('tree', 0.367), ('trees', 0.351), ('boosted', 0.276), ('estimate', 0.221), ('forests', 0.202), ('gradient', 0.202), ('dimensional', 0.184), ('random', 0.17), ('earlier', 0.162), ('forest', 0.158), ('building', 0.147), ('ensembles', 0.147), ('bagging', 0.147), ('tune', 0.125), ('hyperparams', 0.103), ('approach', 0.098), ('step', 0.093), ('decision', 0.092), ('compensate', 0.092), ('cutting', 0.092), ('generates', 0.092), ('immediately', 0.092), ('involved', 0.092), ('magnitudes', 0.092), ('shift', 0.092), ('shifting', 0.092), ('variety', 0.092), ('bagged', 0.081), ('centered', 0.081), ('internal', 0.081), ('prone', 0.081), ('subtracting', 0.081), ('attractive', 0.073), ('descent', 0.073), ('generalization', 0.073), ('thus', 0.073), ('error', 0.072), ('means', 0.07), ('subset', 0.067), ('overall', 0.067), ('besides', 0.067), ('notably', 0.067), ('edge', 0.067), ('mention', 0.067), ('techniques', 0.067), ('certain', 0.062), ('perform', 0.062), ('svms', 0.062), ('et', 0.062), ('topics', 0.062)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 16 fast ml-2013-01-12-Intro to random forests
Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. as the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectivnes
2 0.20558283 22 fast ml-2013-03-07-Choosing a machine learning algorithm
Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose an ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models trees, that is bagged or boosted trees everything else* We’d use something we are comfortable with. Learning new things is very exciting, however we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is training and re-training over and over, even if automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor *By “everything else” we mean “everything popular”, mostly things like
3 0.14721784 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
Introduction: Little Spearmint couldn’t sleep that night. I was so close… - he was thinking. It seemed that he had found a better-than-default value for one of the random forest hyperparams, but it turned out to be false. He made a decision as he fell asleep: Next time, I will show them! The way to do this is to use a dataset that is known to produce lower error with high mtry values, namely the previously mentioned Madelon from the NIPS 2003 Feature Selection Challenge. Among 500 attributes, only 20 are informative, the rest are noise. That’s the reason why high mtry is good here: you have to consider a lot of features to find a meaningful one. The dataset consists of train, validation and test parts, with labels being available for train and validation. We will further split the training set into our train and validation sets, and use the original validation set as a test set to evaluate the final results of parameter tuning. As an error measure we use Area Under Curve, or AUC, which was
4 0.14282188 13 fast ml-2012-12-27-Spearmint with a random forest
Introduction: Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: a number of trees ( ntrees ) and a number of candidate features at each split ( mtry ). Here’s some code . We’re going to use a red wine quality dataset. It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. This is a regression* task, as ratings are in (0,10) range. We will split the data 80/10/10 into train, validation and test set, and use the first two to establish optimal hyperparams and then predict on the test set. As an error measure we will use RMSE. At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there’s eleven features total, so that’s the upper bound). Here are the results of two Spearmint runs with 71 and 95 tries respectively. Colors denote a validation error value: green : RMSE < 0.57 blue : RMSE < 0.58 black : RMSE >= 0.58 Turns out that some diffe
5 0.097227529 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
Introduction: How to represent features for machine learning is an important business. For example, deep learning is all about finding good representations. What exactly they are depends on a task at hand. We investigate how to use available labels to obtain good representations. Motivation The paper that inspired us a while ago was Nonparametric Guidance of Autoencoder Representations using Label Information by Snoek, Adams and LaRochelle. It’s about autoencoders, but contains a greater idea: Discriminative algorithms often work best with highly-informative features; remarkably, such features can often be learned without the labels. (…) However, pure unsupervised learning (…) can find representations that may or may not be useful for the ultimate discriminative task. (…) In this work, we are interested in the discovery of latent features which can be later used as alternate representations of data for discriminative tasks. That is, we wish to find ways to extract statistical structu
6 0.082152449 18 fast ml-2013-01-17-A very fast denoising autoencoder
7 0.076205581 19 fast ml-2013-02-07-The secret of the big guys
8 0.073847204 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
9 0.073803522 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
10 0.072042055 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
11 0.068489686 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
12 0.066714838 20 fast ml-2013-02-18-Predicting advertised salaries
13 0.066549323 27 fast ml-2013-05-01-Deep learning made easy
14 0.062656455 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
15 0.061092407 15 fast ml-2013-01-07-Machine learning courses online
16 0.060133588 43 fast ml-2013-11-02-Maxing out the digits
17 0.058985438 8 fast ml-2012-10-15-Merck challenge
18 0.05834173 61 fast ml-2014-05-08-Impute missing values with Amelia
19 0.057187077 25 fast ml-2013-04-10-Gender discrimination
20 0.055801455 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
topicId topicWeight
[(0, 0.245), (1, 0.192), (2, -0.153), (3, 0.024), (4, 0.091), (5, 0.128), (6, -0.024), (7, -0.197), (8, 0.029), (9, -0.071), (10, -0.337), (11, -0.311), (12, -0.005), (13, 0.1), (14, -0.192), (15, -0.066), (16, 0.02), (17, 0.042), (18, 0.004), (19, 0.204), (20, -0.025), (21, 0.096), (22, -0.021), (23, 0.14), (24, -0.004), (25, -0.054), (26, 0.003), (27, 0.178), (28, 0.103), (29, 0.05), (30, 0.0), (31, -0.154), (32, 0.023), (33, 0.192), (34, 0.003), (35, 0.095), (36, 0.075), (37, -0.047), (38, 0.045), (39, -0.044), (40, 0.051), (41, 0.06), (42, 0.136), (43, -0.345), (44, 0.07), (45, 0.15), (46, 0.074), (47, 0.094), (48, 0.169), (49, 0.033)]
simIndex simValue blogId blogTitle
same-blog 1 0.98713553 16 fast ml-2013-01-12-Intro to random forests
Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. as the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectivnes
2 0.44582376 22 fast ml-2013-03-07-Choosing a machine learning algorithm
Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose an ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models trees, that is bagged or boosted trees everything else* We’d use something we are comfortable with. Learning new things is very exciting, however we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is training and re-training over and over, even if automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor *By “everything else” we mean “everything popular”, mostly things like
3 0.26845199 13 fast ml-2012-12-27-Spearmint with a random forest
Introduction: Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: a number of trees ( ntrees ) and a number of candidate features at each split ( mtry ). Here’s some code . We’re going to use a red wine quality dataset. It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. This is a regression* task, as ratings are in (0,10) range. We will split the data 80/10/10 into train, validation and test set, and use the first two to establish optimal hyperparams and then predict on the test set. As an error measure we will use RMSE. At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there’s eleven features total, so that’s the upper bound). Here are the results of two Spearmint runs with 71 and 95 tries respectively. Colors denote a validation error value: green : RMSE < 0.57 blue : RMSE < 0.58 black : RMSE >= 0.58 Turns out that some diffe
4 0.2559779 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
Introduction: Little Spearmint couldn’t sleep that night. I was so close… - he was thinking. It seemed that he had found a better-than-default value for one of the random forest hyperparams, but it turned out to be false. He made a decision as he fell asleep: Next time, I will show them! The way to do this is to use a dataset that is known to produce lower error with high mtry values, namely the previously mentioned Madelon from the NIPS 2003 Feature Selection Challenge. Among 500 attributes, only 20 are informative, the rest are noise. That’s the reason why high mtry is good here: you have to consider a lot of features to find a meaningful one. The dataset consists of train, validation and test parts, with labels being available for train and validation. We will further split the training set into our train and validation sets, and use the original validation set as a test set to evaluate the final results of parameter tuning. As an error measure we use Area Under Curve, or AUC, which was
5 0.24021889 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera , Udacity , edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m
6 0.18655087 18 fast ml-2013-01-17-A very fast denoising autoencoder
7 0.17240982 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
9 0.1602812 19 fast ml-2013-02-07-The secret of the big guys
10 0.15857539 27 fast ml-2013-05-01-Deep learning made easy
11 0.15688525 20 fast ml-2013-02-18-Predicting advertised salaries
12 0.14204727 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
13 0.14089751 25 fast ml-2013-04-10-Gender discrimination
14 0.12399143 61 fast ml-2014-05-08-Impute missing values with Amelia
15 0.12363346 58 fast ml-2014-04-12-Deep learning these days
16 0.12152147 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
17 0.11560203 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
18 0.11489563 31 fast ml-2013-06-19-Go non-linear with Vowpal Wabbit
19 0.11242254 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
20 0.10830013 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers
topicId topicWeight
[(26, 0.027), (31, 0.021), (35, 0.03), (37, 0.021), (55, 0.032), (58, 0.012), (69, 0.177), (71, 0.238), (99, 0.332)]
simIndex simValue blogId blogTitle
same-blog 1 0.95380735 16 fast ml-2013-01-12-Intro to random forests
Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. as the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectivnes
2 0.8492893 6 fast ml-2012-09-25-Best Buy mobile contest - full disclosure
Introduction: Here’s the final version of the script, with two ideas improving on our previous take: searching in product names and spelling correction. As far as product names go, we will use product data available in XML format to extract SKU and name for each product: 8564564 Ace Combat 6: Fires of Liberation Platinum Hits 2755149 Ace Combat: Assault Horizon 1208344 Adrenalin Misfits Further, we will process the names in the same way we processed queries: 8564564 acecombat6firesofliberationplatinumhits 2755149 acecombatassaulthorizon 1208344 adrenalinmisfits When we need to fill in some predictions, instead of taking them from the benchmark, we will search in names. If that is not enough, then we go to the benchmark. But wait! There’s more. We will spell-correct test queries when looking in our query -> sku mapping. For example, we may have a SKU for L.A. Noire ( lanoire ), but not for L.A. Noir ( lanoir ). It is easy to see that this is basically the same query, o
3 0.81424403 5 fast ml-2012-09-19-Best Buy mobile contest - big data
Introduction: Last time we talked about the small data branch of Best Buy contest . Now it’s time to tackle the big boy . It is positioned as “cloud computing sized problem”, because there is 7GB of unpacked data, vs. younger brother’s 20MB. This is reflected in “cloud computing” and “cluster” and “Oracle” talk in the forum , and also in small number of participating teams: so far, only six contestants managed to beat the benchmark. But don’t be scared. Most of data mass is in XML product information. Training and test sets together are 378MB. Good news. The really interesting thing is that we can take the script we’ve used for small data and apply it to this contest, obtaining 0.355 in a few minutes (benchmark is 0.304). Not impressed? With simple extension you can up the score to 0.55. Read below for details. This is the very same script , with one difference. In this challenge, benchmark recommendations differ from line to line, so we can’t just hard-code five item IDs like before
4 0.78694755 4 fast ml-2012-09-17-Best Buy mobile contest
Introduction: There’s a contest on Kaggle called ACM Hackaton . Actually, there are two, one based on small data and one on big data. Here we will be talking about small data contest - specifically, about beating the benchmark - but the ideas are equally applicable to both. The deal is, we have a training set from Best Buy with search queries and items which users clicked after the query, plus some other data. Items are in this case Xbox games like “Batman” or “Rocksmith” or “Call of Duty”. We are asked to predict an item user clicked given the query. Metric is MAP@5 (see an explanation of MAP ). The problem isn’t typical for Kaggle, because it doesn’t really require using machine learning in traditional sense. To beat the benchmark, it’s enough to write a short script. Concretely ;), we’re gonna build a mapping from queries to items, using the training set. It will be just a Python dictionary looking like this: 'forzasteeringwheel': {'2078113': 1}, 'finalfantasy13': {'9461183': 3, '351
5 0.65622258 9 fast ml-2012-10-25-So you want to work for Facebook
Introduction: Good news, everyone! There’s a new contest on Kaggle - Facebook is looking for talent . They won’t pay, but just might interview. This post is in a way a bonus for active readers because most visitors of fastml.com originally come from Kaggle forums. For this competition the forums are disabled to encourage own work . To honor this, we won’t publish any code. But own work doesn’t mean original work , and we wouldn’t want to reinvent the wheel, would we? The contest differs substantially from a Kaggle stereotype, if there is such a thing, in three major ways: there’s no money prizes, as mentioned above it’s not a real world problem, but rather an assignment to screen job candidates (this has important consequences, described below) it’s not a typical machine learning project, but rather a broader AI exercise You are given a graph of the internet, actually a snapshot of the graph for each of 15 time steps. You are also given a bunch of paths in this graph, which a
6 0.63865304 25 fast ml-2013-04-10-Gender discrimination
7 0.6240747 22 fast ml-2013-03-07-Choosing a machine learning algorithm
8 0.59141058 18 fast ml-2013-01-17-A very fast denoising autoencoder
9 0.58773321 41 fast ml-2013-10-09-Big data made easy
10 0.58306319 19 fast ml-2013-02-07-The secret of the big guys
11 0.57840037 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
12 0.56554681 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
13 0.56097627 61 fast ml-2014-05-08-Impute missing values with Amelia
14 0.55314785 35 fast ml-2013-08-12-Accelerometer Biometric Competition
15 0.55076671 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
16 0.5472182 20 fast ml-2013-02-18-Predicting advertised salaries
17 0.54675293 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
18 0.5420233 43 fast ml-2013-11-02-Maxing out the digits
19 0.53804553 28 fast ml-2013-05-12-And deliver us from Weka
20 0.53704154 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit