fast_ml fast_ml-2013 fast_ml-2013-28 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Sometimes, fortunately not very often, we see people mention Weka as useful machine learning software. This is misleading, because Weka is just a toy: it can give a beginner a bit of a taste of machine learning, but if you want to accomplish anything meaningful, there are many way better tools. Warning: this is a post with strong opinions. Weka has at least two serious shortcomings: It’s written in Java. We think that Java has its place*, but not really in desktop machine learning. Particularly, Java is very memory-hungry and it means that you either constrain yourself to really small data or buy some RAM. This also applies to other popular GUI tools like Rapid Miner or KNIME. It is no accident that some of the best known Java projects concern themselves with using many machines. One is just not enough with Java. Weka people chose another way: their next project, MOA, is about data stream mining. Online learning, in other words. That way you only need one example in memory at a time.
sentIndex sentText sentNum sentScore
1 Sometimes, fortunately not very often, we see people mention Weka as useful machine learning software. [sent-1, score-0.381]
2 This is misleading, because Weka is just a toy: it can give a beginner a bit of a taste of machine learning, but if you want to accomplish anything meaningful, there are many way better tools. [sent-2, score-0.421]
3 Warning: this is a post with strong opinions. [sent-3, score-0.114]
4 Weka has at least two serious shortcomings: It’s written in Java. [sent-4, score-0.156]
5 We think that Java has its place*, but not really in desktop machine learning. [sent-5, score-0.327]
6 Particularly, Java is very memory-hungry and it means that you either constrain yourself to really small data or buy some RAM. [sent-6, score-0.26]
7 This also applies to other popular GUI tools like Rapid Miner or KNIME. [sent-7, score-0.07]
8 It is no accident that some of the best known Java projects concern themselves with using many machines. [sent-8, score-0.285]
9 Weka people chose another way: their next project, MOA, is about data stream mining. [sent-10, score-0.233]
10 That way you only need one example in memory at a time. [sent-12, score-0.104]
11 (Image caption: Bruce Lee responding to the “Scaling Weka With Hadoop” proposal.) Weka uses its own proprietary file format called ARFF. [sent-13, score-0.152]
12 It’s seriously stupid and you will discover that if you try to work with it. [sent-14, score-0.185]
13 *Java seems to be successfully applied in distributed infrastructure-type software like Hadoop. [sent-17, score-0.152]
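Sentences 11 and 12 above complain about Weka's ARFF format without showing it. As a point of reference, here is a minimal Python sketch of how one might write an ARFF file from a plain CSV; the relation name, the file names and the assumption of all-numeric features with a nominal label in the last column are illustrative choices, not something from the post.

```python
import csv

def csv_to_arff(csv_path, arff_path, relation="dataset"):
    """Write a minimal ARFF version of a simple CSV file.

    Assumes a header row, numeric feature columns and a nominal label in
    the last column -- real ARFF also has string/date types, sparse rows
    and other details that Weka is strict about.
    """
    with open(csv_path) as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]

    labels = sorted({row[-1] for row in data})  # observed class values
    with open(arff_path, "w") as out:
        out.write("@relation %s\n\n" % relation)
        for name in header[:-1]:
            out.write("@attribute %s numeric\n" % name)  # every column declared up front
        out.write("@attribute %s {%s}\n\n" % (header[-1], ",".join(labels)))
        out.write("@data\n")
        for row in data:
            out.write(",".join(row) + "\n")

# csv_to_arff("train.csv", "train.arff")   # hypothetical file names
```

The excerpt's point stands either way: the @relation/@attribute boilerplate is extra work compared to the plain CSV that most other tools accept.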
wordName wordTfidf (topN-words)
[('weka', 0.743), ('java', 0.364), ('hadoop', 0.182), ('really', 0.131), ('misleading', 0.103), ('lee', 0.103), ('warning', 0.103), ('projects', 0.103), ('proprietary', 0.103), ('seriously', 0.103), ('taste', 0.103), ('people', 0.099), ('desktop', 0.091), ('believe', 0.091), ('concern', 0.091), ('serious', 0.091), ('buy', 0.082), ('distributed', 0.082), ('meaningful', 0.082), ('chose', 0.082), ('anything', 0.082), ('discover', 0.082), ('particularly', 0.076), ('project', 0.076), ('mention', 0.076), ('applied', 0.07), ('place', 0.07), ('tools', 0.07), ('written', 0.065), ('strong', 0.065), ('scaling', 0.065), ('machine', 0.062), ('check', 0.058), ('memory', 0.058), ('fortunately', 0.055), ('online', 0.055), ('next', 0.052), ('known', 0.049), ('post', 0.049), ('uses', 0.049), ('small', 0.047), ('way', 0.046), ('bit', 0.045), ('useful', 0.045), ('often', 0.045), ('learning', 0.044), ('think', 0.043), ('enough', 0.043), ('many', 0.042), ('give', 0.041)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 28 fast ml-2013-05-12-And deliver us from Weka
Introduction: Sometimes, fortunately not very often, we see people mention Weka as useful machine learning software. This is misleading, because Weka is just a toy: it can give a beginner a bit of a taste of machine learning, but if you want to accomplish anything meaningful, there are many way better tools. Warning: this is a post with strong opinions. Weka has at least two serious shortcomings: It’s written in Java. We think that Java has its place*, but not really in desktop machine learning. Particularly, Java is very memory-hungry and it means that you either constrain yourself to really small data or buy some RAM. This also applies to other popular GUI tools like Rapid Miner or KNIME. It is no accident that some of the best known Java projects concern themselves with using many machines. One is just not enough with Java. Weka people chose another way: their next project, MOA, is about data stream mining. Online learning, in other words. That way you only need one example in memory at a time.
2 0.064715333 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v
3 0.056763109 37 fast ml-2013-09-03-Our followers and who else they follow
Introduction: Recently we hit the 400 followers mark on Twitter. To celebrate we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers: This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So
4 0.041884653 42 fast ml-2013-10-28-How much data is enough?
Introduction: A Reddit reader asked how much data is needed for a machine learning project to get meaningful results. Prof. Yaser Abu-Mostafa from Caltech answered this very question in his online course. The answer is that as a rule of thumb, you need roughly 10 times as many examples as there are degrees of freedom in your model. In case of a linear model, degrees of freedom essentially equal data dimensionality (the number of columns). We find that thinking in terms of dimensionality vs. the number of examples is a convenient shortcut. The more powerful the model, the more it’s prone to overfitting and so the more examples you need. And of course the way of controlling this is through validation. Breaking the rules: In practice you can get away with less than 10x, especially if your model is simple and uses regularization. In Kaggle competitions the ratio is often closer to 1:1, and sometimes dimensionality is far greater than the number of examples, depending on how you pre-process the data
5 0.041178271 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m
6 0.040017288 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
7 0.037657876 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
8 0.036625884 32 fast ml-2013-07-05-Processing large files, line by line
9 0.03658779 27 fast ml-2013-05-01-Deep learning made easy
10 0.035249244 41 fast ml-2013-10-09-Big data made easy
11 0.03501327 5 fast ml-2012-09-19-Best Buy mobile contest - big data
12 0.030920468 58 fast ml-2014-04-12-Deep learning these days
13 0.03083263 33 fast ml-2013-07-09-Introducing phraug
14 0.030760719 19 fast ml-2013-02-07-The secret of the big guys
15 0.029211821 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
16 0.027164349 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
17 0.027067445 4 fast ml-2012-09-17-Best Buy mobile contest
18 0.026727777 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
19 0.026410429 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
20 0.025804022 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
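Item 4 in the list above (“How much data is enough?”) quotes a 10x rule of thumb. The helper below just restates that arithmetic; the factor of 10 is the rule of thumb from the excerpt, and treating degrees of freedom as the column count is the linear-model simplification it mentions.

```python
def examples_needed(n_columns, factor=10):
    """Rule-of-thumb sample size for a roughly linear model: about
    `factor` examples per degree of freedom, with degrees of freedom
    approximated by the number of columns."""
    return factor * n_columns

def rule_satisfied(n_examples, n_columns, factor=10):
    """Check a dataset against the rule of thumb."""
    return n_examples >= examples_needed(n_columns, factor)

print(examples_needed(100))        # 1000 examples for a 100-column linear model
print(rule_satisfied(5000, 300))   # True: 5000 >= 3000
```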
topicId topicWeight
[(0, 0.109), (1, 0.024), (2, 0.084), (3, -0.05), (4, 0.032), (5, 0.094), (6, -0.025), (7, 0.08), (8, 0.134), (9, -0.08), (10, -0.055), (11, 0.111), (12, 0.064), (13, 0.037), (14, 0.077), (15, -0.048), (16, 0.164), (17, 0.216), (18, -0.23), (19, 0.384), (20, 0.166), (21, -0.193), (22, 0.558), (23, -0.036), (24, 0.089), (25, 0.045), (26, -0.192), (27, -0.038), (28, 0.036), (29, 0.086), (30, 0.053), (31, 0.333), (32, -0.153), (33, 0.082), (34, 0.073), (35, 0.158), (36, -0.047), (37, 0.06), (38, 0.043), (39, -0.012), (40, 0.013), (41, 0.007), (42, 0.024), (43, -0.025), (44, -0.043), (45, 0.043), (46, 0.057), (47, -0.063), (48, 0.014), (49, 0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.98321527 28 fast ml-2013-05-12-And deliver us from Weka
Introduction: Sometimes, fortunately not very often, we see people mention Weka as useful machine learning software. This is misleading, because Weka is just a toy: it can give a beginner a bit of a taste of machine learning, but if you want to accomplish anything meaningful, there are many way better tools. Warning: this is a post with strong opinions. Weka has at least two serious shortcomings: It’s written in Java. We think that Java has its place*, but not really in desktop machine learning. Particularly, Java is very memory-hungry and it means that you either constrain yourself to really small data or buy some RAM. This also applies to other popular GUI tools like Rapid Miner or KNIME. It is no accident that some of the best known Java projects concern themselves with using many machines. One is just not enough with Java. Weka people chose another way: their next project, MOA, is about data stream mining. Online learning, in other words. That way you only need one example in memory at a time.
2 0.14395331 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v
3 0.11470852 37 fast ml-2013-09-03-Our followers and who else they follow
Introduction: Recently we hit the 400 followers mark on Twitter. To celebrate we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers: This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So
4 0.093846709 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m
5 0.086480498 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
Introduction: This time we enter the Stack Overflow challenge, which is about predicting the status of a given question on SO. There are five possible statuses, so it’s a multi-class classification problem. We would prefer a tool able to perform multiclass classification by itself. It can be done by hand by constructing five datasets, each with binary labels (one class against all others), and then combining predictions, but it might be a bit tricky to get right - we tried. Fortunately, nice people at Yahoo, excuse us, Microsoft, recently released a new version of Vowpal Wabbit, and this new version supports multiclass classification. In case you’re wondering, Vowpal Wabbit is a fast linear learner. We like the “fast” part and “linear” is OK for dealing with lots of words, as in this contest. In any case, with more than three million data points it wouldn’t be that easy to train a kernel SVM, a neural net or what have you. VW, being a well-polished tool, has a few very convenient features.
6 0.086169548 27 fast ml-2013-05-01-Deep learning made easy
7 0.079167351 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
8 0.073471829 42 fast ml-2013-10-28-How much data is enough?
9 0.073299766 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
10 0.069937386 19 fast ml-2013-02-07-The secret of the big guys
11 0.069639079 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
12 0.068894014 4 fast ml-2012-09-17-Best Buy mobile contest
13 0.066110007 32 fast ml-2013-07-05-Processing large files, line by line
14 0.065170348 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
15 0.064746782 16 fast ml-2013-01-12-Intro to random forests
16 0.060735028 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
17 0.059574693 5 fast ml-2012-09-19-Best Buy mobile contest - big data
18 0.055773571 58 fast ml-2014-04-12-Deep learning these days
19 0.05486355 44 fast ml-2013-11-18-CUDA on a Linux laptop
20 0.053577844 41 fast ml-2013-10-09-Big data made easy
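Item 5 in the list above describes moving from hand-built one-vs-rest datasets to Vowpal Wabbit's built-in multiclass mode. Below is a small sketch of what that looks like; the feature names and file names are made up, and the flags are the one-against-all reduction as we recall it, so check vw --help for your version.

```python
import subprocess

def vw_line(label, features):
    """One example in VW's multiclass format: an integer label in 1..k,
    then the features after the pipe (name or name:value)."""
    return "%d | %s" % (label, " ".join(features))

# Tiny made-up training and test files with 5 possible classes.
with open("train.vw", "w") as f:
    f.write(vw_line(3, ["word_count:87", "has_code"]) + "\n")
    f.write(vw_line(1, ["word_count:12"]) + "\n")
with open("test.vw", "w") as f:
    f.write(vw_line(1, ["word_count:30"]) + "\n")

# --oaa 5 trains one-against-all over 5 classes; predictions are class indices.
subprocess.check_call(["vw", "--oaa", "5", "train.vw", "-f", "model.vw"])
subprocess.check_call(["vw", "-t", "-i", "model.vw", "test.vw", "-p", "preds.txt"])
```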
topicId topicWeight
[(6, 0.024), (15, 0.529), (26, 0.012), (58, 0.053), (69, 0.092), (71, 0.04), (79, 0.029), (99, 0.068)]
simIndex simValue blogId blogTitle
same-blog 1 0.85211188 28 fast ml-2013-05-12-And deliver us from Weka
Introduction: Sometimes, fortunately not very often, we see people mention Weka as useful machine learning software. This is misleading, because Weka is just a toy: it can give a beginner a bit of a taste of machine learning, but if you want to accomplish anything meaningful, there are many way better tools. Warning: this is a post with strong opinions. Weka has at least two serious shortcomings: It’s written in Java. We think that Java has its place*, but not really in desktop machine learning. Particularly, Java is very memory-hungry and it means that you either constrain yourself to really small data or buy some RAM. This also applies to other popular GUI tools like Rapid Miner or KNIME. It is no accident that some of the best known Java projects concern themselves with using many machines. One is just not enough with Java. Weka people chose another way: their next project, MOA, is about data stream mining. Online learning, in other words. That way you only need one example in memory at a time.
2 0.21568978 16 fast ml-2013-01-12-Intro to random forests
Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. to be the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfitting. Gradient boosted trees (notably the GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectiveness
3 0.20044279 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
Introduction: The promise: What’s attractive in machine learning? That a machine is learning, instead of a human. But an operator still has a lot of work to do. First, he has to learn how to teach a machine, in general. Then, when it comes to a concrete task, there are two main areas where a human needs to do the work (and remember, laziness is a virtue, at least for a programmer, so we’d like to minimize the amount of work done by a human): data preparation and model tuning. This story is about model tuning. Typically, to achieve satisfactory results, first we need to convert raw data into a format accepted by the model we would like to use, and then tune a few hyperparameters of the model. For example, some hyperparams to tune for a random forest may be the number of trees to grow and the number of candidate features at each split (mtry in R’s randomForest). For a neural network, there are quite a lot of hyperparams: the number of layers, the number of neurons in each layer (specifically, in each hid
4 0.18702112 4 fast ml-2012-09-17-Best Buy mobile contest
Introduction: There’s a contest on Kaggle called ACM Hackathon. Actually, there are two, one based on small data and one on big data. Here we will be talking about the small data contest - specifically, about beating the benchmark - but the ideas are equally applicable to both. The deal is, we have a training set from Best Buy with search queries and items which users clicked after the query, plus some other data. The items in this case are Xbox games like “Batman” or “Rocksmith” or “Call of Duty”. We are asked to predict the item a user clicked given the query. The metric is MAP@5 (see an explanation of MAP). The problem isn’t typical for Kaggle, because it doesn’t really require using machine learning in the traditional sense. To beat the benchmark, it’s enough to write a short script. Concretely ;), we’re gonna build a mapping from queries to items, using the training set. It will be just a Python dictionary looking like this: 'forzasteeringwheel': {'2078113': 1}, 'finalfantasy13': {'9461183': 3, '351
5 0.18529505 13 fast ml-2012-12-27-Spearmint with a random forest
Introduction: Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: the number of trees (ntrees) and the number of candidate features at each split (mtry). Here’s some code. We’re going to use a red wine quality dataset. It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. This is a regression* task, as ratings are in the (0,10) range. We will split the data 80/10/10 into train, validation and test set, and use the first two to establish optimal hyperparams and then predict on the test set. As an error measure we will use RMSE. At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there are eleven features total, so that’s the upper bound). Here are the results of two Spearmint runs with 71 and 95 tries respectively. Colors denote a validation error value: green: RMSE < 0.57, blue: RMSE < 0.58, black: RMSE >= 0.58. Turns out that some diffe
6 0.18402195 9 fast ml-2012-10-25-So you want to work for Facebook
7 0.18364038 27 fast ml-2013-05-01-Deep learning made easy
8 0.18324445 18 fast ml-2013-01-17-A very fast denoising autoencoder
9 0.18215668 43 fast ml-2013-11-02-Maxing out the digits
10 0.18075451 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
11 0.17743883 17 fast ml-2013-01-14-Feature selection in practice
12 0.17711899 6 fast ml-2012-09-25-Best Buy mobile contest - full disclosure
13 0.17666967 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
14 0.1756838 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
15 0.17552309 20 fast ml-2013-02-18-Predicting advertised salaries
16 0.17442876 25 fast ml-2013-04-10-Gender discrimination
17 0.17345148 35 fast ml-2013-08-12-Accelerometer Biometric Competition
18 0.1732306 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
19 0.16987424 19 fast ml-2013-02-07-The secret of the big guys
20 0.16827324 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
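Item 4 in the list above (the Best Buy contest) builds a query-to-items dictionary and predicts the most frequently clicked items for each query. Here is a minimal sketch of that idea, assuming a training CSV with "query" and "sku" columns; the real contest files use different names and carry extra fields.

```python
import csv
import re
from collections import defaultdict, Counter

def normalize(query):
    """Collapse a raw query to lowercase alphanumerics so that
    'Forza Steering Wheel' and 'forza  steering wheel' share a key."""
    return re.sub(r"[^a-z0-9]", "", query.lower())

# query -> Counter of clicked item ids, built from the training set
clicks = defaultdict(Counter)
with open("train.csv") as f:            # hypothetical file and column layout
    for row in csv.DictReader(f):
        clicks[normalize(row["query"])][row["sku"]] += 1

def predict(query, k=5):
    """Up to k item ids, most frequently clicked first (what MAP@5 scores)."""
    return [sku for sku, _ in clicks[normalize(query)].most_common(k)]
```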
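Items 2, 3 and 5 above are about tuning a random forest's ntrees and mtry; the posts do it with Spearmint, which searches the hyperparameter space automatically. The sketch below swaps in a plain grid search over a validation split just to show the objective being optimized; the scikit-learn names (n_estimators for ntrees, max_features for mtry), the file path and the 80/20 split are substitutions of ours, not the posts' setup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# UCI red wine quality data; path and separator are assumptions.
data = pd.read_csv("winequality-red.csv", sep=";")
X = data.drop(columns="quality").values
y = data["quality"].values

# Simple 80/20 train/validation split (the post also holds out a test set).
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best = None
for n_trees in (10, 50, 100, 200):           # ntrees
    for mtry in (3, 5, 7, 11):               # mtry; eleven features total
        model = RandomForestRegressor(n_estimators=n_trees, max_features=mtry,
                                      random_state=0, n_jobs=-1)
        model.fit(X_tr, y_tr)
        rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
        if best is None or rmse < best[0]:
            best = (rmse, n_trees, mtry)

print("best validation RMSE %.3f at ntrees=%d, mtry=%d" % best)
```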