fast_ml fast_ml-2012 fast_ml-2012-13 knowledge-graph by maker-knowledge-mining

13 fast ml-2012-12-27-Spearmint with a random forest


meta info for this blog

Source: html

Introduction: Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: the number of trees ( ntrees ) and the number of candidate features at each split ( mtry ). Here’s some code. We’re going to use a red wine quality dataset. It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. This is a regression* task, as ratings are in the 0-10 range. We will split the data 80/10/10 into train, validation and test sets, use the first two to establish optimal hyperparams, and then predict on the test set. As an error measure we will use RMSE. At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there are eleven features in total, so that’s the upper bound). Here are the results of two Spearmint runs with 71 and 95 tries respectively. Colors denote the validation error value: green: RMSE < 0.57, blue: RMSE < 0.58, black: RMSE >= 0.58. It turns out that some differences in the error value, even though not that big, are present, so a little effort to choose good hyperparams makes sense.
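The post links to its own code, which is not reproduced on this page. As a rough illustration only, here is a minimal sketch of what a Spearmint objective for this setup might look like, assuming Python with scikit-learn's RandomForestRegressor standing in for R's randomForest (max_features plays the role of mtry), hypothetical train.csv and validation.csv files holding the 80% and 10% portions of the wine data with a 'quality' target column, and classic Spearmint's convention of calling main(job_id, params) and minimizing the returned value.

```python
# Sketch of a Spearmint objective for tuning ntrees and mtry (not the
# post's actual code). Assumptions: scikit-learn's RandomForestRegressor
# stands in for R's randomForest, max_features plays the role of mtry,
# and train.csv / validation.csv are hypothetical files with the wine
# data and a 'quality' target column.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def rmse_for_params(ntrees, mtry):
    train = pd.read_csv('train.csv')
    valid = pd.read_csv('validation.csv')

    x_train, y_train = train.drop('quality', axis=1), train['quality']
    x_valid, y_valid = valid.drop('quality', axis=1), valid['quality']

    model = RandomForestRegressor(n_estimators=ntrees, max_features=mtry)
    model.fit(x_train, y_train)

    # validation RMSE is the value Spearmint will try to minimize
    preds = model.predict(x_valid)
    return np.sqrt(mean_squared_error(y_valid, preds))


# Classic Spearmint calls main(job_id, params), where params maps each
# variable declared in the config file to an array of sampled values.
def main(job_id, params):
    return rmse_for_params(int(params['ntrees'][0]), int(params['mtry'][0]))
```

The search ranges themselves (ntrees from 10 to 200, mtry from 3 to 11, later lowered) would be declared in Spearmint's config file rather than in this script.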


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: the number of trees ( ntrees ) and the number of candidate features at each split ( mtry ). [sent-1, score-1.647]

2 It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. [sent-4, score-0.266]

3 We will split the data 80/10/10 into train, validation and test set, and use the first two to establish optimal hyperparams and then predict on the test set. [sent-6, score-0.357]

4 At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there’s eleven features total, so that’s the upper bound). [sent-8, score-0.926]

5 Here are the results of two Spearmint runs with 71 and 95 tries respectively. [sent-9, score-0.349]

6 Colors denote the validation error value: green: RMSE < 0.57, blue: RMSE < 0.58, black: RMSE >= 0.58. [sent-10, score-0.294]

7 Turns out that some differences in the error value, even though not that big, are present, so a little effort to choose good hyperparams makes sense. [sent-13, score-0.466]

8 The number of trees is on the horizontal axis and the number of candidate features on the vertical. [sent-14, score-0.526]

9 As you can see, the two runs are pretty similar. [sent-15, score-0.19]

10 The number of trees doesn’t matter so much as long as it is big enough - good results can be achieved with less than a hundred trees. [sent-16, score-0.28]

11 However, the mtry parameter is clearly very important - most of the good runs used a value of three. [sent-17, score-1.044]

12 This is close to the default setting, which is the square root of the number of features (√11 ≈ 3.3). [sent-18, score-0.518]

13 If three is so good, we might try setting the lower bound to two or even one. [sent-20, score-0.551]

14 Here is the plot of the third and final run, with 141 tries: it seems that two is an even better value for mtry than three. [sent-22, score-0.722]

15 And indeed, the best combination found is 158 trees with mtry = 2. [sent-23, score-0.606]

16 Random forests are intrinsically random, so in another run with these settings you will probably get a worse result than that or a lower error with a different number of trees. [sent-26, score-0.487]

17 Let’s see how the error depends on mtry (we omit a few outliers to get a close-up): the best value for mtry is probably two, with three the runner-up. [sent-27, score-1.554]

18 So maybe we discovered a better-than-default value for this hyperparam after all. [sent-28, score-0.31]

19 To verify this, we will train ten random forests with the default setting and ten forests with mtry = 2 (see the sketch after this sentence list). [sent-29, score-1.583]

20 If you’d like to learn more about the impact of mtry on random forest accuracy, there’s a whole paper on the subject by Bernard et al. [sent-34, score-0.758]
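Sentence 19 above describes a sanity check: ten forests trained with the default mtry against ten trained with mtry = 2. Below is a minimal sketch of that comparison, under the same assumptions as the earlier sketch (scikit-learn instead of R's randomForest, hypothetical train.csv / validation.csv files); the 158-tree setting is borrowed from sentence 15, and whether the original check used it is not stated here.

```python
# Sketch of the verification from sentence 19: ten forests with the
# default mtry versus ten with mtry = 2, compared by mean validation
# RMSE. Same assumptions as the earlier sketch; 'sqrt' mirrors the
# default described in the post (square root of the number of features).

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

train = pd.read_csv('train.csv')
valid = pd.read_csv('validation.csv')

x_train, y_train = train.drop('quality', axis=1), train['quality']
x_valid, y_valid = valid.drop('quality', axis=1), valid['quality']


def run_once(mtry):
    # 158 trees is the best combination Spearmint found (sentence 15)
    model = RandomForestRegressor(n_estimators=158, max_features=mtry)
    model.fit(x_train, y_train)
    preds = model.predict(x_valid)
    return np.sqrt(mean_squared_error(y_valid, preds))


default_scores = [run_once('sqrt') for _ in range(10)]
mtry2_scores = [run_once(2) for _ in range(10)]

print('default mtry: mean RMSE %.4f' % np.mean(default_scores))
print('mtry = 2:     mean RMSE %.4f' % np.mean(mtry2_scores))
```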


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mtry', 0.606), ('value', 0.179), ('ntrees', 0.165), ('setting', 0.164), ('error', 0.163), ('forests', 0.151), ('wine', 0.137), ('candidate', 0.137), ('dramatic', 0.137), ('default', 0.131), ('trees', 0.131), ('rmse', 0.131), ('runs', 0.131), ('hyperparams', 0.116), ('ten', 0.11), ('bound', 0.11), ('root', 0.11), ('square', 0.11), ('spearmint', 0.101), ('tries', 0.101), ('improvement', 0.101), ('regression', 0.099), ('number', 0.091), ('random', 0.091), ('lower', 0.082), ('try', 0.079), ('features', 0.076), ('parameter', 0.073), ('differences', 0.069), ('nailed', 0.069), ('sensible', 0.069), ('rating', 0.069), ('denote', 0.069), ('prepared', 0.069), ('verify', 0.069), ('validation', 0.062), ('ratings', 0.061), ('concretely', 0.061), ('regular', 0.061), ('basics', 0.061), ('effort', 0.061), ('impact', 0.061), ('limit', 0.061), ('predict', 0.06), ('split', 0.06), ('two', 0.059), ('results', 0.058), ('even', 0.057), ('treat', 0.055), ('clearly', 0.055)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 13 fast ml-2012-12-27-Spearmint with a random forest

2 0.58491415 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

Introduction: Little Spearmint couldn’t sleep that night. I was so close… - he was thinking. It seemed that he had found a better than default value for one of the random forest hyperparams, but it turned out to be false. He made a decision as he fell asleep: Next time, I will show them! The way to do this is to use a dataset that is known to produce lower error with high mtry values, namely previously mentioned Madelon from NIPS 2003 Feature Selection Challenge. Among 500 attributes, only 20 are informative, the rest are noise. That’s the reason why high mtry is good here: you have to consider a lot of features to find a meaningful one. The dataset consists of a train, validation and test parts, with labels being available for train and validation. We will further split the training set into our train and validation sets, and use the original validation set as a test set to evaluate final results of parameter tuning. As an error measure we use Area Under Curve , or AUC, which was

3 0.24709208 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

Introduction: The promise What’s attractive in machine learning? That a machine is learning, instead of a human. But an operator still has a lot of work to do. First, he has to learn how to teach a machine, in general. Then, when it comes to a concrete task, there are two main areas where a human needs to do the work (and remember, laziness is a virtue, at least for a programmer, so we’d like to minimize amount of work done by a human): data preparation model tuning This story is about model tuning. Typically, to achieve satisfactory results, first we need to convert raw data into format accepted by the model we would like to use, and then tune a few hyperparameters of the model. For example, some hyperparams to tune for a random forest may be a number of trees to grow and a number of candidate features at each split ( mtry in R randomForest). For a neural network, there are quite a lot of hyperparams: number of layers, number of neurons in each layer (specifically, in each hid

4 0.15169109 18 fast ml-2013-01-17-A very fast denoising autoencoder

Introduction: Once upon a time we were browsing machine learning papers and software. We were interested in autoencoders and found a rather unusual one. It was called marginalized Stacked Denoising Autoencoder and the author claimed that it preserves the strong feature learning capacity of Stacked Denoising Autoencoders, but is orders of magnitudes faster. We like all things fast, so we were hooked. About autoencoders Wikipedia says that an autoencoder is an artificial neural network and its aim is to learn a compressed representation for a set of data. This means it is being used for dimensionality reduction . In other words, an autoencoder is a neural network meant to replicate the input. It would be trivial with a big enough number of units in a hidden layer: the network would just find an identity mapping. Hence dimensionality reduction: a hidden layer size is typically smaller than input layer. mSDA is a curious specimen: it is not a neural network and it doesn’t reduce dimension

5 0.14282188 16 fast ml-2013-01-12-Intro to random forests

Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. as the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfit. Gradient boosted trees (notably GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectivnes

6 0.12474725 11 fast ml-2012-12-07-Predicting wine quality

7 0.1012109 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

8 0.10055304 20 fast ml-2013-02-18-Predicting advertised salaries

9 0.09978155 43 fast ml-2013-11-02-Maxing out the digits

10 0.098353028 19 fast ml-2013-02-07-The secret of the big guys

11 0.082037494 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

12 0.079490319 22 fast ml-2013-03-07-Choosing a machine learning algorithm

13 0.073086999 17 fast ml-2013-01-14-Feature selection in practice

14 0.072775565 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

15 0.068612903 27 fast ml-2013-05-01-Deep learning made easy

16 0.067543872 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python

17 0.067305461 26 fast ml-2013-04-17-Regression as classification

18 0.067094244 25 fast ml-2013-04-10-Gender discrimination

19 0.056808162 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

20 0.055593427 31 fast ml-2013-06-19-Go non-linear with Vowpal Wabbit


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.337), (1, 0.353), (2, -0.618), (3, -0.07), (4, 0.01), (5, 0.011), (6, 0.077), (7, -0.058), (8, 0.129), (9, -0.064), (10, 0.025), (11, 0.12), (12, 0.05), (13, 0.04), (14, 0.069), (15, 0.043), (16, -0.103), (17, 0.023), (18, -0.052), (19, -0.05), (20, 0.034), (21, 0.039), (22, -0.001), (23, 0.05), (24, -0.007), (25, -0.017), (26, 0.019), (27, -0.016), (28, -0.003), (29, 0.003), (30, -0.009), (31, -0.038), (32, -0.056), (33, 0.064), (34, -0.082), (35, 0.004), (36, 0.001), (37, 0.097), (38, -0.009), (39, -0.023), (40, 0.072), (41, -0.095), (42, -0.035), (43, 0.109), (44, -0.061), (45, 0.057), (46, -0.136), (47, -0.02), (48, 0.007), (49, 0.046)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97530484 13 fast ml-2012-12-27-Spearmint with a random forest

2 0.92765242 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

3 0.48668891 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

4 0.28000686 16 fast ml-2013-01-12-Intro to random forests

5 0.26502252 43 fast ml-2013-11-02-Maxing out the digits

Introduction: Recently we’ve been investigating the basics of Pylearn2 . Now it’s time for a more advanced example: a multilayer perceptron with dropout and maxout activation for the MNIST digits. Maxout explained If you’ve been following developments in deep learning, you know that Hinton’s most recent recommendation for supervised learning, after a few years of bashing backpropagation in favour of unsupervised pretraining, is to use classic multilayer perceptrons with dropout and rectified linear units. For us, this breath of simplicity is a welcome change. Rectified linear is f(x) = max( 0, x ) . This makes backpropagation trivial: for x > 0, the derivative is one, else zero. Note that ReLU consists of two linear functions. But why stop at two? Let’s take max. out of three, or four, or five linear functions… And so maxout is a generalization of ReLU. It can approximate any convex function. Now backpropagation is easy and dropout prevents overfitting, so we can train a deep

6 0.24646164 18 fast ml-2013-01-17-A very fast denoising autoencoder

7 0.23139198 20 fast ml-2013-02-18-Predicting advertised salaries

8 0.22760879 11 fast ml-2012-12-07-Predicting wine quality

9 0.20887588 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

10 0.18924034 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python

11 0.18486634 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

12 0.17758115 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

13 0.17423916 27 fast ml-2013-05-01-Deep learning made easy

14 0.17220953 17 fast ml-2013-01-14-Feature selection in practice

15 0.1582084 19 fast ml-2013-02-07-The secret of the big guys

16 0.14285406 25 fast ml-2013-04-10-Gender discrimination

17 0.142176 26 fast ml-2013-04-17-Regression as classification

18 0.13623719 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers

19 0.13106067 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

20 0.12833473 22 fast ml-2013-03-07-Choosing a machine learning algorithm


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(6, 0.01), (20, 0.25), (26, 0.031), (31, 0.031), (35, 0.038), (55, 0.015), (69, 0.395), (71, 0.082), (84, 0.032), (99, 0.018)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93422914 13 fast ml-2012-12-27-Spearmint with a random forest

2 0.83949357 27 fast ml-2013-05-01-Deep learning made easy

Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple benchmarks for this competition and the best one is unusually hard to beat 1 - only less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github , and has one or two very appealling properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers 2 . Geoff Hinton from Toronto talks about two ends

3 0.83347416 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

4 0.83268666 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision

Introduction: Let’s say that there are some users and some items, like movies, songs or jobs. Each user might be interested in some items. The client asks us to recommend a few items (the number is x) for each user. They will evaluate the results using mean average precision, or MAP, metric. Specifically MAP@x - this means they ask us to recommend x items for each user. So what is this MAP? First, we will get M out of the way. MAP is just an average of APs, or average precision, for all users. In other words, we take the mean for Average Precision, hence Mean Average Precision. If we have 1000 users, we sum APs for each user and divide the sum by 1000. This is MAP. So now, what is AP, or average precision? It may be that we don’t really need to know. But we probably need to know this: we can recommend at most x items for each user it pays to submit all x recommendations, because we are not penalized for bad guesses order matters, so it’s better to submit more certain recommendations fi

5 0.74311 18 fast ml-2013-01-17-A very fast denoising autoencoder

6 0.7394169 17 fast ml-2013-01-14-Feature selection in practice

7 0.73810762 43 fast ml-2013-11-02-Maxing out the digits

8 0.73651057 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

9 0.71959651 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

10 0.69397861 9 fast ml-2012-10-25-So you want to work for Facebook

11 0.68953425 35 fast ml-2013-08-12-Accelerometer Biometric Competition

12 0.68132836 20 fast ml-2013-02-18-Predicting advertised salaries

13 0.6753276 40 fast ml-2013-10-06-Pylearn2 in practice

14 0.66985333 19 fast ml-2013-02-07-The secret of the big guys

15 0.64362019 16 fast ml-2013-01-12-Intro to random forests

16 0.64170521 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

17 0.63733065 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

18 0.62156528 26 fast ml-2013-04-17-Regression as classification

19 0.61753774 25 fast ml-2013-04-10-Gender discrimination

20 0.60932046 61 fast ml-2014-05-08-Impute missing values with Amelia