fast_ml-2014-58 knowledge graph by maker-knowledge-mining

58 fast ml-2014-04-12-Deep learning these days


meta info for this blog

Source: html

Introduction: It seems that quite a few people with interest in deep learning think of it in terms of unsupervised pre-training, autoencoders, stacked RBMs and deep belief networks. It’s easy to get into this groove by watching one of Geoff Hinton’s videos from a few years ago, where he bashes backpropagation in favour of unsupervised methods that are able to discover the structure in data by themselves, the same way the human brain does. Those videos, papers and tutorials linger. They were state of the art once, but things have changed since then. These days supervised learning is the king again. This has to do with the fact that you can look at data from many different angles, and usually you’d prefer a representation that is useful for the discriminative task at hand. Unsupervised learning will find some angle, but will it be the one you want? In the case of the MNIST digits, sure. Otherwise probably not. Or maybe it will find a lot of angles while you only need one. Ladies and gentlemen, please welcome Sander Dieleman.


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 It seems that quite a few people with interest in deep learning think of it in terms of unsupervised pre-training, autoencoders, stacked RBMs and deep belief networks. [sent-1, score-0.798]

2 It’s easy to get into this groove by watching one of Geoff Hinton’s videos from a few years ago, where he bashes backpropagation in favour of unsupervised methods that are able to discover the structure in data by themselves, the same way the human brain does. [sent-2, score-0.891]

3 This has to do with the fact that you can look at data from many different angles, and usually you’d prefer a representation that is useful for the discriminative task at hand. [sent-6, score-0.361]

4 Or maybe it will find a lot of angles while you only need one. [sent-10, score-0.291]

5 Ladies and gentlemen, please welcome Sander Dieleman. [sent-11, score-0.075]

6 He has won the Galaxy Zoo challenge on Kaggle, so he might know a thing or two about deep learning. [sent-12, score-0.202]

7 Here’s a Reddit comment about deep learning Sander recently made, reprinted with the author’s permission: It’s true that unsupervised pre-training was initially what made it possible to train deeper networks, but over the last few years the pre-training approach has been largely obsoleted. [sent-13, score-1.04]

8 Nowadays, deep neural networks are a lot more similar to their 80’s cousins. [sent-14, score-0.288]

9 Instead of pre-training, the difference is now in the activation functions and regularisation methods used (and sometimes in the optimisation algorithm, although much more rarely). [sent-15, score-0.245]

10 I would say that the “pre-training era”, which started around 2006, ended in the early ’10s when people started using rectified linear units (ReLUs), and later dropout, and discovered that pre-training was no longer beneficial for this type of networks. [sent-16, score-0.653]

11 ReLUs (and modern variants such as maxout) suffer significantly less from the vanishing gradient problem. [sent-17, score-0.245]
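
To make the two activations concrete, here is a small numpy sketch (ours, not Sander's or the post's); the function names, shapes and layer sizes are arbitrary. The maxout layer takes the element-wise maximum over k linear pieces per output unit, generalising ReLU.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): the derivative is 1 for x > 0 and 0 otherwise,
    # so gradients are not squashed layer after layer as with sigmoids.
    return np.maximum(0.0, x)

def maxout(x, W, b):
    # W: (n_in, n_out, k), b: (n_out, k); each output unit is the max
    # over its k linear functions of the input.
    z = np.einsum('ni,iok->nok', x, W) + b
    return z.max(axis=-1)

x = np.random.randn(5, 10)            # 5 examples, 10 inputs
W = 0.1 * np.random.randn(10, 4, 3)   # 4 maxout units, k = 3 pieces each
b = np.zeros((4, 3))
print(relu(x).shape, maxout(x, W, b).shape)   # (5, 10) (5, 4)
```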

12 Dropout is a strong regulariser that helps ensure the solution we get generalises well. [sent-18, score-0.075]

13 These are precisely the two issues pre-training sought to solve, but they are now solved in different ways. [sent-19, score-0.235]
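
And a minimal sketch of dropout in the "inverted" form it is usually implemented with (again our illustration, not code from the post): at training time random units are zeroed and the survivors rescaled, so at test time the layer can be used unchanged.

```python
import numpy as np

def dropout(h, p_drop=0.5, train=True, rng=np.random):
    if not train:
        return h                                    # no dropout at test time
    mask = (rng.rand(*h.shape) > p_drop) / (1.0 - p_drop)
    return h * mask                                 # drop units, rescale survivors

h = np.random.randn(4, 6)                           # some hidden-layer activations
print(dropout(h, p_drop=0.5, train=True))
```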

14 Using ReLUs also tends to result in faster training, because they’re faster to compute, and because the optimisation is easier. [sent-20, score-0.319]

15 Commercial applications of deep learning usually consist of very large, feed-forward neural nets with ReLUs and dropout, and trained simply with backprop. [sent-21, score-0.548]
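
To tie it together, here is a toy version of the kind of network the comment describes: a feed-forward net with ReLU hidden units and dropout, trained with plain backprop/SGD. It is a sketch on random data with made-up sizes, not anyone's production setup.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(256, 20)                        # toy data: 256 examples, 20 features
y = rng.randint(0, 3, size=256)               # 3 classes

n_in, n_hid, n_out = 20, 64, 3
W1 = 0.1 * rng.randn(n_in, n_hid); b1 = np.zeros(n_hid)
W2 = 0.1 * rng.randn(n_hid, n_out); b2 = np.zeros(n_out)
lr, p_drop = 0.1, 0.5

for epoch in range(50):
    # forward pass: ReLU hidden layer with inverted dropout
    h_pre = X @ W1 + b1
    h = np.maximum(0.0, h_pre)
    mask = (rng.rand(*h.shape) > p_drop) / (1.0 - p_drop)
    h_drop = h * mask
    scores = h_drop @ W2 + b2

    # softmax cross-entropy loss
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()

    # backprop
    dscores = probs.copy()
    dscores[np.arange(len(y)), y] -= 1.0
    dscores /= len(y)
    dW2 = h_drop.T @ dscores; db2 = dscores.sum(axis=0)
    dh = (dscores @ W2.T) * mask              # gradient flows through the dropout mask
    dh[h_pre <= 0] = 0.0                      # ReLU derivative: 1 for x > 0, else 0
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)

    # plain SGD update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print('final training loss: %.3f' % loss)
```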

16 Unsupervised pre-training still has its applications when labeled data is very scarce and unlabeled data is abundant, but more often than not, it’s no longer needed. [sent-22, score-0.371]

17 If you want more, Sander pointed us to a similar comment by our favourite Dave W-F from Toronto. [sent-23, score-0.321]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('relus', 0.411), ('sander', 0.308), ('unsupervised', 0.227), ('angles', 0.205), ('deep', 0.202), ('comment', 0.171), ('optimisation', 0.171), ('videos', 0.171), ('dropout', 0.163), ('started', 0.125), ('applications', 0.125), ('years', 0.109), ('made', 0.096), ('longer', 0.096), ('lot', 0.086), ('angle', 0.085), ('variants', 0.085), ('dave', 0.085), ('favour', 0.085), ('belief', 0.085), ('modern', 0.085), ('discriminative', 0.085), ('largely', 0.085), ('solved', 0.085), ('people', 0.082), ('deeper', 0.075), ('issues', 0.075), ('ended', 0.075), ('precisely', 0.075), ('human', 0.075), ('watching', 0.075), ('autoencoders', 0.075), ('labeled', 0.075), ('unlabeled', 0.075), ('helps', 0.075), ('rbms', 0.075), ('significantly', 0.075), ('consist', 0.075), ('digits', 0.075), ('early', 0.075), ('pointed', 0.075), ('favourite', 0.075), ('welcome', 0.075), ('backpropagation', 0.075), ('initially', 0.075), ('nets', 0.075), ('later', 0.075), ('methods', 0.074), ('faster', 0.074), ('usually', 0.071)]
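
For what it's worth, the simValue numbers in the lists below could probably be reproduced as cosine similarities between tfidf vectors of the posts; that is our guess at how the page was generated, not something it documents. A sketch with sklearn and placeholder text:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# placeholder snippets standing in for the full post texts
posts = [
    "unsupervised pre-training relus dropout backprop ...",        # 58: Deep learning these days
    "unsupervised auto-encoders boltzmann machines sampling ...",  # 62: LeCun AMA
]
tfidf = TfidfVectorizer().fit_transform(posts)
print(cosine_similarity(tfidf[0], tfidf[1])[0, 0])   # analogue of a simValue entry
```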

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 58 fast ml-2014-04-12-Deep learning these days

2 0.21394759 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit . We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v

3 0.17775629 27 fast ml-2013-05-01-Deep learning made easy

Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to the ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple of benchmarks for this competition and the best one is unusually hard to beat - less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github, and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers. Geoff Hinton from Toronto talks about two ends

4 0.16230737 43 fast ml-2013-11-02-Maxing out the digits

Introduction: Recently we’ve been investigating the basics of Pylearn2. Now it’s time for a more advanced example: a multilayer perceptron with dropout and maxout activation for the MNIST digits. Maxout explained If you’ve been following developments in deep learning, you know that Hinton’s most recent recommendation for supervised learning, after a few years of bashing backpropagation in favour of unsupervised pretraining, is to use classic multilayer perceptrons with dropout and rectified linear units. For us, this breath of simplicity is a welcome change. Rectified linear is f(x) = max(0, x). This makes backpropagation trivial: for x > 0, the derivative is one, else zero. Note that ReLU consists of two linear functions. But why stop at two? Let’s take max. out of three, or four, or five linear functions… And so maxout is a generalization of ReLU. It can approximate any convex function. Now backpropagation is easy and dropout prevents overfitting, so we can train a deep

5 0.097969517 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

Introduction: We continue with the CIFAR-10-based competition at Kaggle to get to know DropConnect. It’s supposed to be an improvement over dropout. And dropout is certainly one of the bigger steps forward in neural network development. Is DropConnect really better than dropout? TL;DR DropConnect seems to offer results similar to dropout. State of the art scores reported in the paper come from model ensembling. Dropout Dropout, by Hinton et al., is perhaps the biggest invention in the field of neural networks in recent years. It addresses the main problem in machine learning, that is overfitting. It does so by “dropping out” some unit activations in a given layer, that is setting them to zero. Thus it prevents co-adaptation of units and can also be seen as a method of ensembling many networks sharing the same weights. For each training example a different set of units to drop is randomly chosen. The idea has a biological inspiration. When a child is conceived, it receives half its genes f

6 0.097306192 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

7 0.08696495 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

8 0.084932163 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview

9 0.082460597 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

10 0.066061266 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

11 0.065746747 2 fast ml-2012-08-27-Kaggle job recommendation challenge

12 0.065347031 15 fast ml-2013-01-07-Machine learning courses online

13 0.059656031 18 fast ml-2013-01-17-A very fast denoising autoencoder

14 0.05224574 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python

15 0.050566312 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

16 0.046122454 19 fast ml-2013-02-07-The secret of the big guys

17 0.044235799 40 fast ml-2013-10-06-Pylearn2 in practice

18 0.043716248 10 fast ml-2012-11-17-The Facebook challenge HOWTO

19 0.041909318 41 fast ml-2013-10-09-Big data made easy

20 0.041308973 16 fast ml-2013-01-12-Intro to random forests


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.232), (1, 0.221), (2, 0.238), (3, 0.07), (4, 0.151), (5, -0.047), (6, 0.173), (7, 0.167), (8, -0.251), (9, -0.014), (10, 0.115), (11, -0.067), (12, 0.022), (13, 0.131), (14, -0.064), (15, 0.106), (16, -0.06), (17, 0.1), (18, -0.134), (19, 0.088), (20, 0.219), (21, -0.16), (22, 0.002), (23, -0.005), (24, -0.033), (25, 0.05), (26, 0.241), (27, -0.187), (28, 0.035), (29, 0.129), (30, -0.104), (31, -0.211), (32, 0.017), (33, 0.146), (34, 0.072), (35, -0.122), (36, -0.009), (37, -0.157), (38, -0.095), (39, -0.055), (40, 0.111), (41, 0.031), (42, -0.139), (43, -0.071), (44, 0.066), (45, 0.047), (46, -0.022), (47, 0.179), (48, -0.138), (49, -0.087)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9808327 58 fast ml-2014-04-12-Deep learning these days

2 0.58121014 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

3 0.35474432 27 fast ml-2013-05-01-Deep learning made easy

4 0.30386284 43 fast ml-2013-11-02-Maxing out the digits

5 0.18480195 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

6 0.16889428 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

7 0.14358422 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

8 0.14277469 2 fast ml-2012-08-27-Kaggle job recommendation challenge

9 0.12370271 15 fast ml-2013-01-07-Machine learning courses online

10 0.11474344 16 fast ml-2013-01-12-Intro to random forests

11 0.11395502 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

12 0.11303103 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network

13 0.10884416 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python

14 0.10677829 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

15 0.10543646 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

16 0.10408244 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

17 0.10098268 18 fast ml-2013-01-17-A very fast denoising autoencoder

18 0.093333349 10 fast ml-2012-11-17-The Facebook challenge HOWTO

19 0.09282475 8 fast ml-2012-10-15-Merck challenge

20 0.090794727 31 fast ml-2013-06-19-Go non-linear with Vowpal Wabbit


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.46), (24, 0.04), (26, 0.018), (31, 0.068), (35, 0.047), (69, 0.15), (71, 0.036), (78, 0.058), (99, 0.027)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.89376342 58 fast ml-2014-04-12-Deep learning these days

2 0.3513228 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

3 0.35037616 19 fast ml-2013-02-07-The secret of the big guys

Introduction: Are you interested in linear models, or K-means clustering? Probably not much. These are very basic techniques with fancier alternatives. But here’s the bomb: when you combine those two methods for supervised learning, you can get better results than from a random forest. And maybe even faster. We have already written about Vowpal Wabbit, a fast linear learner from Yahoo/Microsoft. Google’s response (or at least, a Google guy’s response) seems to be Sofia-ML. The software consists of two parts: a linear learner and K-means clustering. We found Sofia a while ago and wondered about K-means: who needs K-means? Here’s a clue: This package can be used for learning cluster centers (…) and for mapping a given data set onto a new feature space based on the learned cluster centers. Our eyes only opened when we read a certain paper, namely An Analysis of Single-Layer Networks in Unsupervised Feature Learning (PDF). The paper, by Coates, Lee and Ng, is about object recogni

4 0.33947483 43 fast ml-2013-11-02-Maxing out the digits

5 0.33426923 27 fast ml-2013-05-01-Deep learning made easy

6 0.32456616 13 fast ml-2012-12-27-Spearmint with a random forest

7 0.31744239 18 fast ml-2013-01-17-A very fast denoising autoencoder

8 0.31640649 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

9 0.3099618 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

10 0.30742362 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision

11 0.30354777 9 fast ml-2012-10-25-So you want to work for Facebook

12 0.30264676 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

13 0.29954365 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python

14 0.29844579 17 fast ml-2013-01-14-Feature selection in practice

15 0.29778865 40 fast ml-2013-10-06-Pylearn2 in practice

16 0.29099938 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

17 0.28702724 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

18 0.28351802 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

19 0.27910763 26 fast ml-2013-04-17-Regression as classification

20 0.27782708 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet