fast_ml fast_ml-2012 fast_ml-2012-11 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This post is as much about wine as it is about machine learning, so if you enjoy wine, like we do, you may find it especially interesting. Here’s some R and Matlab code, and if you want to get right to the point, skip to the charts. There’s a book by Philipp Janert called Data Analysis with Open Source Tools, which, by the way, we would recommend. From this book we found out about the wine quality datasets. There are two, one for red wine and one for white wine, and they are interesting because they contain quality ratings (1 - 10) for a few thousand wines, along with their physical and chemical properties. We could probably use these properties to predict a rating for a wine. We’ll be looking at white and red wine separately, for reasons you will see shortly. Principal component analysis for white wine: Janert performs a principal component analysis (PCA) and shows the resulting plot for white wine. What’s interesting about this plot is that, judging by the first two principal components, quality is strongly correlated with alcohol content and fixed acidity.
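The post links to its own R and Matlab code rather than reproducing it here, so below is a minimal R sketch of this kind of PCA, not the original code. It assumes a local copy of winequality-white.csv from the UCI wine quality datasets (semicolon-separated, header row, the 1-10 quality rating in the last column), and it keeps the quality column in the PCA so that its loading can be compared with the physicochemical properties - whether Janert does exactly that is not stated here.

    # Minimal PCA sketch (not the post's original R/Matlab code).
    # Assumes winequality-white.csv from the UCI wine quality datasets:
    # semicolon-separated, header row, quality rating in the last column.
    wine <- read.csv("winequality-white.csv", sep = ";")

    # PCA on all columns, quality included, centered and scaled to unit
    # variance since the features have very different ranges.
    pca <- prcomp(wine, center = TRUE, scale. = TRUE)

    # Variance explained by each component, plus a biplot of the first two
    # principal components - the view discussed in the post.
    summary(pca)
    biplot(pca, cex = 0.6)

On the biplot, the question is simply which feature loadings line up with quality; according to the post, alcohol content and fixed acidity (or pH) are the ones that do.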
sentIndex sentText sentNum sentScore
1 This post is as much about wine as it is about machine learning, so if you enjoy wine, like we do, you may find it especially interesting. [sent-1, score-0.579]
2 From this book we found out about the wine quality datasets. [sent-4, score-0.689]
3 There are two, one for red wine and one for white wine, and they are interesting because they contain quality ratings (1 - 10) for a few thousand wines, along with their physical and chemical properties. [sent-5, score-1.027]
4 We’ll be looking at white and red wine separately for the reasons you will see shortly. [sent-7, score-0.832]
5 Principal component analysis for white wine: Janert performs a principal component analysis (PCA) and shows the resulting plot for white wine. [sent-8, score-1.232]
6 What’s interesting about this plot is that, judging by the first two principal components, quality is strongly correlated with alcohol content and fixed acidity (or pH - it’s basically the same thing). [sent-9, score-1.462]
7 This is especially true for alcohol content, which is listed on every bottle. [sent-11, score-0.507]
8 Now let’s do some more analysis involving quality, alcohol content and fixed acidity. [sent-15, score-0.991]
9 Let’s introduce some terminology: good means a rating of seven or higher, bad means four or lower, mediocre means five, and OK means six. [sent-18, score-1.201]
10 White wine - good and bad. Red points mark good wines. [sent-20, score-0.67]
11 The horizontal axis is alcohol content, the vertical axis is fixed acidity (a rough R sketch of this chart follows the sentence list). [sent-22, score-0.921]
12 If alcohol content is at least 11%, or better yet, 12%, you are very likely to have a good wine. [sent-23, score-0.738]
13 Basically, you can get away with lower alcohol content if the wine has relatively low fixed acidity. [sent-27, score-1.537]
14 If you see a wine on the right side of that line, it is very likely to be good. [sent-30, score-0.616]
15 Red wine - alcohol OR acidity, please. As with white wines, if alcohol content is 12% or more, you’re probably all set. [sent-35, score-1.38]
16 Also, you want a wine which is either high in alcohol or highly acidic. [sent-36, score-1.038]
17 In other words, there are quite a lot of good wines with less than 12% alcohol; what they have in common is high fixed acidity, with some mediocre ones scattered among them. [sent-37, score-1.004]
18 Compare with the white wine case - there it’s the other way around. [sent-39, score-0.687]
19 To sum up, judging a wine on just two properties is rather simplistic. [sent-42, score-0.644]
20 All other things being equal, we much prefer a two-year-old wine to a one-year-old wine. [sent-44, score-0.733]
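The charts described above (sentences 10-18) are straightforward to approximate. Here is a rough R sketch, again not the post’s original code, under the same assumption about the UCI file layout; the cut-offs follow the terminology from sentence 9 (bad is four or lower, mediocre five, OK six, good seven or higher).

    # Rough sketch of the good-vs-bad scatter plot described above
    # (not the post's original code). Swap in winequality-red.csv for
    # the red wine version of the chart.
    wine <- read.csv("winequality-white.csv", sep = ";")

    # Bucket the 1-10 ratings into the post's terminology:
    # bad <= 4, mediocre = 5, OK = 6, good >= 7.
    wine$category <- cut(wine$quality,
                         breaks = c(-Inf, 4, 5, 6, Inf),
                         labels = c("bad", "mediocre", "OK", "good"))

    # Alcohol on the horizontal axis, fixed acidity on the vertical axis;
    # red points are good wines, blue points are bad ones, the rest are
    # left out to keep the picture readable.
    good <- wine[wine$category == "good", ]
    bad  <- wine[wine$category == "bad", ]
    plot(good$alcohol, good$fixed.acidity, col = "red", pch = 16,
         xlim = range(wine$alcohol), ylim = range(wine$fixed.acidity),
         xlab = "alcohol [%]", ylab = "fixed acidity")
    points(bad$alcohol, bad$fixed.acidity, col = "blue", pch = 16)
    legend("topright", legend = c("good", "bad"),
           col = c("red", "blue"), pch = 16)

For white wine this should roughly reproduce the pattern described above: good wines cluster at higher alcohol, and at a given alcohol level the better wines tend to sit lower on the fixed acidity axis; for red wine, the good low-alcohol wines sit higher on that axis instead.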
wordName wordTfidf (topN-words)
[('wine', 0.517), ('alcohol', 0.475), ('wines', 0.256), ('fixed', 0.243), ('acidity', 0.219), ('content', 0.218), ('rating', 0.182), ('white', 0.17), ('mediocre', 0.146), ('red', 0.145), ('quality', 0.111), ('bad', 0.107), ('charts', 0.091), ('principal', 0.089), ('axis', 0.077), ('assumptions', 0.073), ('janert', 0.073), ('ph', 0.073), ('picture', 0.073), ('terminology', 0.073), ('components', 0.073), ('properties', 0.073), ('book', 0.061), ('analysis', 0.055), ('ratings', 0.054), ('similarly', 0.054), ('judging', 0.054), ('old', 0.054), ('side', 0.054), ('year', 0.054), ('plot', 0.053), ('story', 0.049), ('vertical', 0.049), ('less', 0.048), ('means', 0.046), ('lot', 0.046), ('high', 0.046), ('likely', 0.045), ('horizontal', 0.045), ('pca', 0.045), ('tell', 0.045), ('shows', 0.041), ('component', 0.041), ('lower', 0.036), ('especially', 0.032), ('bear', 0.03), ('contain', 0.03), ('draw', 0.03), ('enjoy', 0.03), ('generalized', 0.03)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 11 fast ml-2012-12-07-Predicting wine quality
2 0.12474725 13 fast ml-2012-12-27-Spearmint with a random forest
Introduction: Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: the number of trees (ntrees) and the number of candidate features at each split (mtry). Here’s some code. We’re going to use a red wine quality dataset. It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. This is a regression* task, as ratings are in the (0,10) range. We will split the data 80/10/10 into train, validation and test sets, and use the first two to establish optimal hyperparams and then predict on the test set. As an error measure we will use RMSE (a rough R sketch of this setup follows this list). At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there are eleven features total, so that’s the upper bound). Here are the results of two Spearmint runs with 71 and 95 tries respectively. Colors denote a validation error value: green: RMSE < 0.57, blue: RMSE < 0.58, black: RMSE >= 0.58. Turns out that some diffe
3 0.061164211 47 fast ml-2013-12-15-A-B testing with bayesian bandits in Google Analytics
Introduction: A/B testing is a way to optimize a web page. Half of the visitors see one version, the other half another, so you can tell which version is more conducive to your goal - for example selling something. Since June 2013 A/B testing can be conveniently done with Google Analytics. Here’s how. This article is not quite about machine learning. If you’re not interested in testing, scroll down to the bayesian bandits section. Google Content Experiments: We remember Google Website Optimizer from a few years ago. It wasn’t exactly user friendly or slick, but it felt solid and did the job. Unfortunately, at one point in time Google pulled the plug, leaving Genetify as the sole free (and open source) tool for multivariate testing. Multivariate means testing a few elements on a page simultaneously. At that time they launched Content Experiments in Google Analytics, but it was a giant step backward. Content experiments were very primitive and only allowed rudimentary A/B split testing. It i
4 0.055270601 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
Introduction: Last time we explored dimensionality reduction in practice using Gensim’s LSI and LDA. Now, having spent some time researching the subject matter, we will give an overview of other options. UPDATE : We now consider the topic quite irrelevant, because sparse high-dimensional data is precisely where linear models shine. See Amazon aspires to automate access control , Predicting advertised salaries and Predicting closed questions on Stack Overflow . And the few most popular methods are: LSI/LSA - a multinomial PCA LDA - Latent Dirichlet Allocation matrix factorization, in particular non-negative variants: NMF ICA, or Independent Components Analysis mixtures of Bernoullis stacked RBMs correlated topic models, an extension of LDA We tried the first two before. As regards matrix factorization, you do the same stuff as with movie recommendations (think Netflix challenge). The difference is, now all the matrix elements are known and we are only interested in
5 0.05240396 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
Introduction: Little Spearmint couldn’t sleep that night. I was so close… - he was thinking. It seemed that he had found a better than default value for one of the random forest hyperparams, but it turned out to be false. He made a decision as he fell asleep: Next time, I will show them! The way to do this is to use a dataset that is known to produce lower error with high mtry values, namely previously mentioned Madelon from NIPS 2003 Feature Selection Challenge. Among 500 attributes, only 20 are informative, the rest are noise. That’s the reason why high mtry is good here: you have to consider a lot of features to find a meaningful one. The dataset consists of a train, validation and test parts, with labels being available for train and validation. We will further split the training set into our train and validation sets, and use the original validation set as a test set to evaluate final results of parameter tuning. As an error measure we use Area Under Curve , or AUC, which was
6 0.042193424 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
7 0.035950374 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
8 0.03201 53 fast ml-2014-02-20-Are stocks predictable?
9 0.028829416 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
10 0.02814536 18 fast ml-2013-01-17-A very fast denoising autoencoder
11 0.027786363 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
12 0.026893759 15 fast ml-2013-01-07-Machine learning courses online
13 0.026446443 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
14 0.025100123 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
15 0.024747688 56 fast ml-2014-03-31-If you use R, you may want RStudio
16 0.024238449 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
17 0.023575524 33 fast ml-2013-07-09-Introducing phraug
18 0.021265745 19 fast ml-2013-02-07-The secret of the big guys
19 0.020538833 51 fast ml-2014-01-25-Why IPy: reasons for using IPython interactively
20 0.02004973 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
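One of the related posts listed above, Spearmint with a random forest, predicts red wine quality with a random forest, an 80/10/10 train/validation/test split and RMSE as the error measure. The sketch below is a rough, hedged illustration of that setup, not that post’s code: it assumes the UCI winequality-red.csv file and the randomForest package, and the ntree and mtry values are arbitrary examples rather than tuned settings.

    # Hedged sketch of the random forest setup described in the
    # "Spearmint with a random forest" entry above (not that post's code).
    # Assumes winequality-red.csv from the UCI repository and the
    # randomForest package.
    library(randomForest)

    red <- read.csv("winequality-red.csv", sep = ";")

    # 80/10/10 split into train, validation and test sets.
    set.seed(1)
    n <- nrow(red)
    idx <- sample(n)
    train <- red[idx[1:floor(0.8 * n)], ]
    valid <- red[idx[(floor(0.8 * n) + 1):floor(0.9 * n)], ]
    test  <- red[idx[(floor(0.9 * n) + 1):n], ]

    # Quality stays numeric, so this is a regression forest; ntree and
    # mtry are example values, the kind of hyperparams Spearmint tunes.
    rf <- randomForest(quality ~ ., data = train, ntree = 100, mtry = 5)

    # RMSE on the validation set, the error measure used in that post.
    pred <- predict(rf, valid)
    sqrt(mean((pred - valid$quality)^2))

In the original post these two hyperparams are chosen on the validation error, and only then is the test set touched.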
topicId topicWeight
[(0, 0.107), (1, 0.086), (2, -0.091), (3, -0.053), (4, 0.029), (5, 0.095), (6, -0.052), (7, -0.073), (8, 0.104), (9, 0.014), (10, 0.209), (11, 0.22), (12, 0.014), (13, 0.033), (14, -0.197), (15, -0.118), (16, -0.35), (17, -0.084), (18, -0.329), (19, -0.177), (20, 0.094), (21, -0.026), (22, 0.056), (23, 0.044), (24, -0.282), (25, -0.32), (26, 0.046), (27, 0.048), (28, 0.363), (29, -0.034), (30, -0.14), (31, 0.153), (32, -0.014), (33, -0.105), (34, -0.041), (35, 0.121), (36, -0.023), (37, -0.103), (38, -0.157), (39, 0.156), (40, -0.14), (41, 0.066), (42, 0.036), (43, -0.093), (44, 0.069), (45, -0.033), (46, 0.086), (47, 0.058), (48, -0.011), (49, -0.045)]
simIndex simValue blogId blogTitle
same-blog 1 0.99309075 11 fast ml-2012-12-07-Predicting wine quality
2 0.1545604 13 fast ml-2012-12-27-Spearmint with a random forest
3 0.10795964 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
4 0.082970098 47 fast ml-2013-12-15-A-B testing with bayesian bandits in Google Analytics
5 0.061861821 53 fast ml-2014-02-20-Are stocks predictable?
Introduction: We’d like to be able to predict the stock market. That seems like a nice way of making money. We’ll address the fundamental issue: can stocks be predicted in the short term, that is, a few days ahead? There’s a technique that seeks to answer the question of predictability. It’s called Forecastable Component Analysis. Based on a new forecastability measure, ForeCA finds an optimal transformation to separate a multivariate time series into a forecastable and an orthogonal white noise space. The author, Georg M. Goerg*, implemented it in the R package ForeCA. It might be useful in two ways: it can tell you how forecastable a time series is, and given a multivariate time series, let’s say a portfolio of stocks, it can find forecastable components. The idea in the second point is similar to PCA - ForeCA is a linear dimensionality reduction technique. The main difference is that the method explicitly addresses forecastability. It does so by considering an interplay between time and
6 0.054828554 19 fast ml-2013-02-07-The secret of the big guys
7 0.053764749 18 fast ml-2013-01-17-A very fast denoising autoencoder
8 0.052293353 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
9 0.051600259 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
10 0.046390615 56 fast ml-2014-03-31-If you use R, you may want RStudio
11 0.046112366 58 fast ml-2014-04-12-Deep learning these days
12 0.046008468 40 fast ml-2013-10-06-Pylearn2 in practice
13 0.045689207 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
14 0.044524629 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
15 0.04418467 33 fast ml-2013-07-09-Introducing phraug
16 0.043918628 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
17 0.043676071 15 fast ml-2013-01-07-Machine learning courses online
18 0.039783981 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers
19 0.03961752 27 fast ml-2013-05-01-Deep learning made easy
20 0.037141036 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
topicId topicWeight
[(6, 0.015), (22, 0.022), (26, 0.021), (31, 0.01), (35, 0.027), (55, 0.018), (69, 0.152), (71, 0.022), (73, 0.012), (81, 0.013), (84, 0.535), (99, 0.018)]
simIndex simValue blogId blogTitle
1 0.96282059 37 fast ml-2013-09-03-Our followers and who else they follow
Introduction: Recently we hit the 400 followers mark on Twitter. To celebrate we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers: This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So
same-blog 2 0.92917252 11 fast ml-2012-12-07-Predicting wine quality
3 0.32780355 9 fast ml-2012-10-25-So you want to work for Facebook
Introduction: Good news, everyone! There’s a new contest on Kaggle - Facebook is looking for talent . They won’t pay, but just might interview. This post is in a way a bonus for active readers because most visitors of fastml.com originally come from Kaggle forums. For this competition the forums are disabled to encourage own work . To honor this, we won’t publish any code. But own work doesn’t mean original work , and we wouldn’t want to reinvent the wheel, would we? The contest differs substantially from a Kaggle stereotype, if there is such a thing, in three major ways: there’s no money prizes, as mentioned above it’s not a real world problem, but rather an assignment to screen job candidates (this has important consequences, described below) it’s not a typical machine learning project, but rather a broader AI exercise You are given a graph of the internet, actually a snapshot of the graph for each of 15 time steps. You are also given a bunch of paths in this graph, which a
4 0.30873862 13 fast ml-2012-12-27-Spearmint with a random forest
5 0.2940031 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera , Udacity , edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m
6 0.28068483 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
7 0.27893752 27 fast ml-2013-05-01-Deep learning made easy
8 0.27768084 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
9 0.27576771 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
10 0.27530351 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
11 0.27274731 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network
12 0.27178162 35 fast ml-2013-08-12-Accelerometer Biometric Competition
13 0.26848176 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
14 0.26278013 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
15 0.25699088 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
16 0.25422734 17 fast ml-2013-01-14-Feature selection in practice
17 0.25097874 43 fast ml-2013-11-02-Maxing out the digits
18 0.24932112 36 fast ml-2013-08-23-A bag of words and a nice little network
19 0.24882795 18 fast ml-2013-01-17-A very fast denoising autoencoder
20 0.24800862 53 fast ml-2014-02-20-Are stocks predictable?