fast_ml-2014-62 knowledge-graph by maker-knowledge-mining

62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA


meta info for this blog

Source: html

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit . We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. [sent-7, score-0.375]

2 Certainly, we all agree that AI systems of the future will be hierarchical (it’s the very idea of deep learning) and will use temporal prediction. [sent-16, score-0.336]

3 Don’t get fooled by people who claim to have a solution to Artificial General Intelligence, who claim to have AI systems that work “just like the human brain”, or who claim to have figured out how the brain works (well, except if it’s Geoff Hinton making the claim). [sent-28, score-0.524]

4 for machine translation), video understanding, learning complex control. [sent-41, score-0.323]

5 Do you think that deep learning would be a good tool for finding similarities in the medical domain (e. [sent-42, score-0.327]

6 Joan Bruna, Arthur Szlam) have come from that community because I think they can help with cracking the unsupervised learning problem. [sent-95, score-0.498]

7 I do not believe that classical learning theory with “IID samples, convex optimization, and supervised classification and regression” is sufficient for representation learning. [sent-96, score-0.365]

8 The theory of deep learning is a wide open field. [sent-102, score-0.442]

9 I like kernel methods (as Woody Allen would say “some of my best friends are kernel methods”). [sent-121, score-0.404]

10 I proposed/used metric learning to learn embeddings with neural nets before it was cool to do this with kernel machines. [sent-131, score-0.409]

11 Learning complex/hierarchical/non-linear features/representations/metrics cannot be done with kernel methods as it can be done with deep architectures. [sent-132, score-0.431]

12 Unsupervised learning: The interest of the ML community in representation learning was rekindled by early results with unsupervised learning: stacked sparse auto-encoders, RBMs, etc. [sent-154, score-0.748]

13 It is true that the recent practical success of deep learning in image and speech all use purely supervised backprop (mostly applied to convolutional nets). [sent-155, score-0.476]

14 Still, there are a few applications where unsupervised pre-training does bring an improvement over purely supervised learning. [sent-157, score-0.288]

15 Generally, unsupervised learning is a means to an end. [sent-167, score-0.353]

16 Torch7 is what is being used for deep learning R&D at NYU, at Facebook AI Research, at Deep Mind, and at Google Brain. [sent-192, score-0.327]

17 The future: Deep learning has become the dominant method for acoustic modeling in speech recognition, and is quickly becoming the dominant method for several vision tasks such as object recognition, object detection, and semantic segmentation. [sent-193, score-0.421]

18 The next frontier for deep learning are language understanding, video, and control/planning (e. [sent-194, score-0.401]

19 Integrating deep learning (or representation learning) with reasoning and making unsupervised learning actually work are two big challenges for the next several years. [sent-197, score-0.972]

20 Integrating deep learning (or representation learning) with reasoning and making unsupervised learning actually work are two big challenges for the next several years. [sent-205, score-0.972]
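The sentScore values above most likely derive from the tf-idf weights of each sentence's words, though the exact formula isn't given on this page. Below is a minimal plain-Python sketch (the helper name tfidf_sentence_scores is ours, not the page's) that scores each sentence by the mean tf-idf weight of its tokens, treating every sentence as its own document:

```python
import math
import re
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by the mean tf-idf weight of its words.

    Illustrative sketch only -- the exact scoring behind the sentScore
    column above is not specified on this page.
    """
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(tokenized)
    # document frequency: in how many sentences each word appears
    df = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        weights = [(tf[w] / len(toks)) * math.log(n / df[w]) for w in tf]
        scores.append(sum(weights) / max(len(weights), 1))
    return scores

sentences = [
    "The theory of deep learning is a wide open field.",
    "Generally, unsupervised learning is a means to an end.",
]
for score, s in zip(tfidf_sentence_scores(sentences), sentences):
    print(round(score, 3), s)
```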


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('unsupervised', 0.197), ('deep', 0.171), ('learning', 0.156), ('nyu', 0.149), ('community', 0.145), ('torch', 0.145), ('physics', 0.145), ('kernel', 0.144), ('detection', 0.124), ('pedestrian', 0.124), ('convnets', 0.124), ('methods', 0.116), ('theory', 0.115), ('math', 0.109), ('nets', 0.109), ('lot', 0.104), ('mathematical', 0.103), ('claim', 0.099), ('cv', 0.099), ('reasoning', 0.099), ('several', 0.099), ('representation', 0.094), ('video', 0.091), ('applications', 0.091), ('bell', 0.083), ('labs', 0.083), ('future', 0.083), ('theoretical', 0.083), ('speech', 0.083), ('systems', 0.082), ('people', 0.079), ('research', 0.077), ('understanding', 0.076), ('recognition', 0.076), ('objective', 0.076), ('colleagues', 0.074), ('convnet', 0.074), ('htm', 0.074), ('marrying', 0.074), ('pathological', 0.074), ('signals', 0.074), ('vc', 0.074), ('vladimir', 0.074), ('mind', 0.074), ('language', 0.074), ('worked', 0.073), ('natural', 0.07), ('brain', 0.066), ('problems', 0.066), ('convolutional', 0.066)]
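The pairs above are this post's top tf-idf terms. The simValue column in the lists that follow is presumably cosine similarity between such tf-idf vectors; the page doesn't name the measure, so the sketch below is an assumption, and the second vector's weights are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse tf-idf vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Top-weighted terms of this post, taken from the list above (truncated).
lecun_ama = {"unsupervised": 0.197, "deep": 0.171, "learning": 0.156, "nyu": 0.149}
# Hypothetical weights for another post -- placeholder values, not real data.
other_post = {"unsupervised": 0.21, "deep": 0.19, "learning": 0.17, "mnist": 0.12}

print(round(cosine(lecun_ama, other_post), 4))
```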

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999952 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit . We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v

2 0.21394759 58 fast ml-2014-04-12-Deep learning these days

Introduction: It seems that quite a few people with interest in deep learning think of it in terms of unsupervised pre-training, autoencoders, stacked RBMs and deep belief networks. It’s easy to get into this groove by watching one of Geoff Hinton’s videos from a few years ago, where he bashes backpropagation in favour of unsupervised methods that are able to discover the structure in data by themselves, the same way as human brain does. Those videos, papers and tutorials linger. They were state of the art once, but things have changed since then. These days supervised learning is the king again. This has to do with the fact that you can look at data from many different angles and usually you’d prefer representation that is useful for the discriminative task at hand . Unsupervised learning will find some angle, but will it be the one you want? In case of the MNIST digits, sure. Otherwise probably not. Or maybe it will find a lot of angles while you only need one. Ladies and gentlemen, pleas

3 0.19788156 27 fast ml-2013-05-01-Deep learning made easy

Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple benchmarks for this competition and the best one is unusually hard to beat 1 - only less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github , and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers 2 . Geoff Hinton from Toronto talks about two ends

4 0.18242559 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

Introduction: Recently Rob Zinkov published his selection of interesting-looking NIPS papers . Inspired by this, we list some more. Rob seems to like Bayesian stuff, we’re more into neural networks. If you feel like browsing, Andrej Karpathy has a page with all NIPS 2013 papers . They are categorized by topics discovered by running LDA. When you see an interesting paper, you can discover ones ranked similar by TF-IDF. Here’s what we found. Understanding Dropout Pierre Baldi, Peter J. Sadowski Dropout is a relatively new algorithm for training neural networks which relies on stochastically dropping out neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characte

5 0.17096066 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview

Introduction: Geoff Hinton is a living legend. He almost single-handedly invented backpropagation for training feed-forward neural networks. Despite in theory being universal function approximators, these networks turned out to be pretty much useless for more complex problems, like computer vision and speech recognition. Professor Hinton responded by creating deep networks and deep learning, an ultimate form of machine learning. Recently we’ve been fortunate to ask Geoff a few questions and have him answer them. Geoff, thanks so much for talking to us. You’ve had a long and fruitful career. What drives you these days? Well, after a man hits a certain age, his priorities change. Back in the 80s I was happy when I was able to train a network with eight hidden units. Now I can finally have thousands and possibly millions of them. So I guess the answer is scale. Apart from that, I like people at Google and I like making them a ton of money. They happen to pay me well, so it’s a win-win situ

6 0.15401696 15 fast ml-2013-01-07-Machine learning courses online

7 0.13928397 37 fast ml-2013-09-03-Our followers and who else they follow

8 0.12635961 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

9 0.11860725 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

10 0.11039407 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

11 0.099074133 41 fast ml-2013-10-09-Big data made easy

12 0.0959545 19 fast ml-2013-02-07-The secret of the big guys

13 0.092817739 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

14 0.091367759 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

15 0.091197461 22 fast ml-2013-03-07-Choosing a machine learning algorithm

16 0.085027091 40 fast ml-2013-10-06-Pylearn2 in practice

17 0.082525246 43 fast ml-2013-11-02-Maxing out the digits

18 0.077658482 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

19 0.07148876 18 fast ml-2013-01-17-A very fast denoising autoencoder

20 0.070592798 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.341), (1, 0.256), (2, 0.331), (3, 0.046), (4, 0.165), (5, 0.052), (6, 0.104), (7, 0.17), (8, 0.035), (9, 0.008), (10, -0.011), (11, -0.068), (12, 0.088), (13, 0.04), (14, 0.063), (15, -0.019), (16, -0.038), (17, 0.018), (18, -0.197), (19, -0.131), (20, 0.077), (21, -0.145), (22, 0.03), (23, -0.033), (24, -0.07), (25, 0.064), (26, -0.018), (27, -0.096), (28, -0.092), (29, 0.077), (30, 0.048), (31, -0.126), (32, -0.059), (33, -0.049), (34, -0.029), (35, -0.169), (36, -0.066), (37, -0.008), (38, 0.093), (39, 0.011), (40, 0.009), (41, -0.207), (42, 0.002), (43, -0.047), (44, -0.006), (45, -0.037), (46, -0.095), (47, 0.171), (48, -0.005), (49, -0.017)]
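The numbers above are this post's coordinates along the topics of the LSI model. The page doesn't show its pipeline, but gensim's LsiModel produces exactly this kind of (topicId, topicWeight) output; here is a minimal sketch with a toy corpus and made-up num_topics, not the actual fast.ml archive or settings:

```python
# Sketch only: toy corpus and settings, assuming a gensim-style pipeline.
from gensim import corpora, models, similarities

docs = [
    "deep learning unsupervised learning convolutional nets".split(),
    "kernel methods and classical learning theory".split(),
    "torch7 convnets for pedestrian detection".split(),
]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

tfidf = models.TfidfModel(bow)              # tf-idf weighting of the bag-of-words corpus
lsi = models.LsiModel(tfidf[bow], id2word=dictionary, num_topics=2)

query = lsi[tfidf[bow[0]]]                  # [(topicId, topicWeight), ...] for the first doc
print(query)

index = similarities.MatrixSimilarity(lsi[tfidf[bow]])
print(list(index[query]))                   # simValue-style cosine scores against every doc
```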

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97438419 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit . We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v

2 0.66565049 58 fast ml-2014-04-12-Deep learning these days

Introduction: It seems that quite a few people with interest in deep learning think of it in terms of unsupervised pre-training, autoencoders, stacked RBMs and deep belief networks. It’s easy to get into this groove by watching one of Geoff Hinton’s videos from a few years ago, where he bashes backpropagation in favour of unsupervised methods that are able to discover the structure in data by themselves, the same way as human brain does. Those videos, papers and tutorials linger. They were state of the art once, but things have changed since then. These days supervised learning is the king again. This has to do with the fact that you can look at data from many different angles and usually you’d prefer representation that is useful for the discriminative task at hand . Unsupervised learning will find some angle, but will it be the one you want? In case of the MNIST digits, sure. Otherwise probably not. Or maybe it will find a lot of angles while you only need one. Ladies and gentlemen, pleas

3 0.46708238 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

Introduction: Recently Rob Zinkov published his selection of interesting-looking NIPS papers . Inspired by this, we list some more. Rob seems to like Bayesian stuff, we’re more into neural networks. If you feel like browsing, Andrej Karpathy has a page with all NIPS 2013 papers . They are categorized by topics discovered by running LDA. When you see an interesting paper, you can discover ones ranked similar by TF-IDF. Here’s what we found. Understanding Dropout Pierre Baldi, Peter J. Sadowski Dropout is a relatively new algorithm for training neural networks which relies on stochastically dropping out neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characte

4 0.4652206 27 fast ml-2013-05-01-Deep learning made easy

Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple benchmarks for this competition and the best one is unusually hard to beat 1 - only less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github , and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers 2 . Geoff Hinton from Toronto talks about two ends

5 0.39812642 15 fast ml-2013-01-07-Machine learning courses online

Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera , Udacity , edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m

6 0.37489572 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview

7 0.33706552 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

8 0.3113071 37 fast ml-2013-09-03-Our followers and who else they follow

9 0.2797218 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

10 0.26254284 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

11 0.25507155 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

12 0.24338168 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

13 0.23478876 41 fast ml-2013-10-09-Big data made easy

14 0.23146325 22 fast ml-2013-03-07-Choosing a machine learning algorithm

15 0.21279536 40 fast ml-2013-10-06-Pylearn2 in practice

16 0.20461658 28 fast ml-2013-05-12-And deliver us from Weka

17 0.1975102 19 fast ml-2013-02-07-The secret of the big guys

18 0.19135551 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network

19 0.18193023 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

20 0.16363481 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.026), (15, 0.014), (26, 0.025), (31, 0.046), (35, 0.032), (55, 0.017), (69, 0.107), (71, 0.035), (73, 0.033), (78, 0.422), (79, 0.017), (84, 0.039), (96, 0.041), (99, 0.03)]
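The pairs above are this post's LDA topic mixture; topics below a minimum probability are simply not listed, which is why only a subset of topic ids appears. A minimal sketch, again assuming a gensim-style LdaModel with a toy corpus and placeholder settings rather than the page's actual pipeline:

```python
# Sketch only: toy corpus and placeholder settings, assuming gensim's LdaModel.
from gensim import corpora, models

docs = [
    "deep learning unsupervised learning convolutional nets".split(),
    "kernel methods and classical learning theory".split(),
    "torch7 convnets for pedestrian detection".split(),
]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

lda = models.LdaModel(bow, id2word=dictionary, num_topics=4, passes=10, random_state=0)

# [(topicId, topicWeight), ...] -- topics below minimum_probability are dropped,
# which is why the list above covers only a subset of topic ids.
print(lda.get_document_topics(bow[0], minimum_probability=0.01))
```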

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9279986 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit . We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v

2 0.76198888 19 fast ml-2013-02-07-The secret of the big guys

Introduction: Are you interested in linear models, or K-means clustering? Probably not much. These are very basic techniques with fancier alternatives. But here’s the bomb: when you combine those two methods for supervised learning, you can get better results than from a random forest. And maybe even faster. We have already written about Vowpal Wabbit , a fast linear learner from Yahoo/Microsoft. Google’s response (or at least, a Google’s guy response) seems to be Sofia-ML . The software consists of two parts: a linear learner and K-means clustering. We found Sofia a while ago and wondered about K-means: who needs K-means? Here’s a clue: This package can be used for learning cluster centers (…) and for mapping a given data set onto a new feature space based on the learned cluster centers. Our eyes only opened when we read a certain paper, namely An Analysis of Single-Layer Networks in Unsupervised Feature Learning ( PDF ). The paper, by Coates , Lee and Ng, is about object recogni

3 0.39738211 58 fast ml-2014-04-12-Deep learning these days

Introduction: It seems that quite a few people with interest in deep learning think of it in terms of unsupervised pre-training, autoencoders, stacked RBMs and deep belief networks. It’s easy to get into this groove by watching one of Geoff Hinton’s videos from a few years ago, where he bashes backpropagation in favour of unsupervised methods that are able to discover the structure in data by themselves, the same way as human brain does. Those videos, papers and tutorials linger. They were state of the art once, but things have changed since then. These days supervised learning is the king again. This has to do with the fact that you can look at data from many different angles and usually you’d prefer representation that is useful for the discriminative task at hand . Unsupervised learning will find some angle, but will it be the one you want? In case of the MNIST digits, sure. Otherwise probably not. Or maybe it will find a lot of angles while you only need one. Ladies and gentlemen, pleas

4 0.38098317 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

Introduction: How to represent features for machine learning is an important business. For example, deep learning is all about finding good representations. What exactly they are depends on a task at hand. We investigate how to use available labels to obtain good representations. Motivation The paper that inspired us a while ago was Nonparametric Guidance of Autoencoder Representations using Label Information by Snoek, Adams and LaRochelle. It’s about autoencoders, but contains a greater idea: Discriminative algorithms often work best with highly-informative features; remarkably, such features can often be learned without the labels. (…) However, pure unsupervised learning (…) can find representations that may or may not be useful for the ultimate discriminative task. (…) In this work, we are interested in the discovery of latent features which can be later used as alternate representations of data for discriminative tasks. That is, we wish to find ways to extract statistical structu

5 0.36655346 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

Introduction: Recently Rob Zinkov published his selection of interesting-looking NIPS papers . Inspired by this, we list some more. Rob seems to like Bayesian stuff, we’re more into neural networks. If you feel like browsing, Andrej Karpathy has a page with all NIPS 2013 papers . They are categorized by topics discovered by running LDA. When you see an interesting paper, you can discover ones ranked similar by TF-IDF. Here’s what we found. Understanding Dropout Pierre Baldi, Peter J. Sadowski Dropout is a relatively new algorithm for training neural networks which relies on stochastically dropping out neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characte

6 0.34930441 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

7 0.33289957 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

8 0.32410139 36 fast ml-2013-08-23-A bag of words and a nice little network

9 0.31826732 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

10 0.31803706 43 fast ml-2013-11-02-Maxing out the digits

11 0.31697679 18 fast ml-2013-01-17-A very fast denoising autoencoder

12 0.31424403 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

13 0.31398785 40 fast ml-2013-10-06-Pylearn2 in practice

14 0.31354806 27 fast ml-2013-05-01-Deep learning made easy

15 0.3118827 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

16 0.30391261 9 fast ml-2012-10-25-So you want to work for Facebook

17 0.29382354 17 fast ml-2013-01-14-Feature selection in practice

18 0.2901473 26 fast ml-2013-04-17-Regression as classification

19 0.28559196 13 fast ml-2012-12-27-Spearmint with a random forest

20 0.28107655 15 fast ml-2013-01-07-Machine learning courses online