fast_ml fast_ml-2014 fast_ml-2014-57 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Geoff Hinton is a living legend. He almost single-handedly invented backpropagation for training feed-forward neural networks. Despite in theory being universal function approximators, these networks turned out to be pretty much useless for more complex problems, like computer vision and speech recognition. Professor Hinton responded by creating deep networks and deep learning, an ultimate form of machine learning. Recently we’ve been fortunate to ask Geoff a few questions and have him answer them. Geoff, thanks so much for talking to us. You’ve had a long and fruitful career. What drives you these days? Well, after a man hits a certain age, his priorities change. Back in the 80s I was happy when I was able to train a network with eight hidden units. Now I can finally have thousands and possibly millions of them. So I guess the answer is scale. Apart from that, I like people at Google and I like making them a ton of money. They happen to pay me well, so it’s a win-win situation.
sentIndex sentText sentNum sentScore
1 He almost single-handedly invented backpropagation for training feed-forward neural networks. [sent-2, score-0.222]
2 Despite in theory being universal function approximators, these networks turned out to be pretty much useless for more complex problems, like computer vision and speech recognition. [sent-3, score-0.787]
3 Professor Hinton responded by creating deep networks and deep learning, an ultimate form of machine learning. [sent-4, score-0.594]
4 Recently we’ve been fortunate to ask Geoff a few questions and have him answer them. [sent-5, score-0.161]
5 Well, after a man hits a certain age, his priorities change. [sent-9, score-0.387]
6 Back in the 80s I was happy when I was able to train a network with eight hidden units. [sent-10, score-0.199]
7 Now I can finally have thousands and possibly millions of them. [sent-11, score-0.094]
8 Apart from that, I like people at Google and I like making them a ton of money. [sent-13, score-0.198]
9 They happen to pay me well, so it’s a win-win situation. [sent-14, score-0.208]
10 And I’ve been impressed with Charlie Tang (Charlie’s my student at the University of Toronto) and his idea of using a linear SVM as the final layer of a network instead of softmax. [sent-17, score-0.293]
11 I’d like to go further by putting a kernel SVM on top, or perhaps a random forest. [sent-18, score-0.104]
12 Backpropagating through trees is quite challenging but I have some ideas on how to cope with that. [sent-19, score-0.161]
13 Have you seen any new big ideas in current research? [sent-20, score-0.161]
14 I don’t have much time to read papers anymore, even when I’m listed as a co-author. [sent-21, score-0.205]
15 I’ve heard of Radford’s [Neal] attempt to speed up R. [sent-22, score-0.094]
16 He’s a very smart man, so I’m going to get my students to write some R instead of Matlab, and we’ll be looking at whether we can run it in parallel at Google. [sent-23, score-0.481]
17 I haven’t had much of it lately, but I enjoy spending time with my family. [sent-26, score-0.205]
18 When I get a moment alone, I work in the garden or learn Java. [sent-27, score-0.104]
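The linear-SVM-as-final-layer idea mentioned in sentences 10-12 above comes from Charlie Tang's work on replacing softmax with a linear SVM objective ("Deep Learning using Linear Support Vector Machines"). Below is a minimal sketch of ours, not code from the interview or the paper: a one-vs-rest L2-SVM loss computed on the penultimate-layer activations, returning a gradient that can be backpropagated into the network below. The names (l2_svm_layer, h, W, C) are chosen for illustration.

    import numpy as np

    def l2_svm_layer(h, W, y, C=1.0):
        # h: (n, d) penultimate-layer activations, W: (k, d) class weights,
        # y: (n,) integer class labels; returns the loss and its gradient w.r.t. h,
        # which can be backpropagated into the layers below.
        n = h.shape[0]
        k = W.shape[0]
        t = -np.ones((n, k))
        t[np.arange(n), y] = 1.0                              # one-vs-rest +/-1 targets
        margins = np.maximum(0.0, 1.0 - t * h.dot(W.T))       # (n, k) hinge margins
        loss = 0.5 * np.sum(W ** 2) + C * np.sum(margins ** 2)
        grad_h = -2.0 * C * (t * margins).dot(W)              # (n, d) gradient w.r.t. h
        return loss, grad_h

    # toy usage: 4 examples, 3-dimensional features, 2 classes
    h = np.random.randn(4, 3)
    W = np.random.randn(2, 3)
    loss, grad_h = l2_svm_layer(h, W, np.array([0, 1, 1, 0]))

The squared hinge keeps the objective differentiable except exactly at the margin, which is what makes backpropagating through an SVM layer straightforward; a kernel SVM or a random forest on top, as joked about above, would not give such a convenient gradient.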
wordName wordTfidf (topN-words)
[('ve', 0.266), ('geoff', 0.26), ('charlie', 0.237), ('man', 0.189), ('ideas', 0.161), ('answer', 0.161), ('svm', 0.141), ('hinton', 0.126), ('impressed', 0.118), ('hits', 0.118), ('alone', 0.118), ('enjoy', 0.118), ('listed', 0.118), ('professor', 0.118), ('university', 0.118), ('thanks', 0.118), ('radford', 0.118), ('ultimate', 0.118), ('despite', 0.118), ('happy', 0.118), ('invented', 0.118), ('responded', 0.118), ('speech', 0.118), ('useless', 0.118), ('pay', 0.104), ('smart', 0.104), ('lately', 0.104), ('moment', 0.104), ('ton', 0.104), ('tang', 0.104), ('perhaps', 0.104), ('backpropagation', 0.104), ('happen', 0.104), ('complex', 0.104), ('parallel', 0.094), ('turned', 0.094), ('theory', 0.094), ('heard', 0.094), ('possibly', 0.094), ('problems', 0.094), ('making', 0.094), ('student', 0.094), ('students', 0.094), ('deep', 0.093), ('much', 0.087), ('vision', 0.087), ('creating', 0.087), ('networks', 0.085), ('network', 0.081), ('certain', 0.08)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
2 0.17096066 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v
3 0.1431199 27 fast ml-2013-05-01-Deep learning made easy
Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple of benchmarks for this competition and the best one is unusually hard to beat 1 - fewer than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github, and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers 2. Geoff Hinton from Toronto talks about two ends
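Since the blurb above leans on sparse filtering, here is a rough sketch (ours, not the code used in the post) of the objective from Ngiam et al.: features pass through a soft absolute value, are L2-normalized per feature across examples, then per example, and the sum of the resulting matrix (an L1 penalty) is minimized over the weights. The toy example feeds it to scipy's L-BFGS with a finite-difference gradient, which is fine here but far too slow for real data; a practical implementation would supply an analytic gradient.

    import numpy as np
    from scipy.optimize import minimize

    def sparse_filtering_objective(w_flat, X, n_features, eps=1e-8):
        # X: (n_inputs, n_examples); rows of F are features, columns are examples
        W = w_flat.reshape(n_features, X.shape[0])
        F = np.sqrt(W.dot(X) ** 2 + eps)                                # soft absolute value
        F = F / np.sqrt(np.sum(F ** 2, axis=1, keepdims=True) + eps)    # normalize each feature
        F = F / np.sqrt(np.sum(F ** 2, axis=0, keepdims=True) + eps)    # normalize each example
        return np.sum(F)                                                # L1 sparsity penalty

    X = np.random.randn(20, 100)          # 20 inputs, 100 examples (made-up toy data)
    w0 = np.random.randn(5 * 20)          # 5 features to learn
    result = minimize(sparse_filtering_objective, w0, args=(X, 5),
                      method='L-BFGS-B', options={'maxiter': 10})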
4 0.13621296 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition
Introduction: The Black Box challenge has just ended. We were thoroughly thrilled to learn that the winner, doubleshot, used sparse filtering, apparently following our cue. His score in terms of accuracy is 0.702, ours 0.645, and the best benchmark 0.525. We ranked 15th out of 217, a few places ahead of the Toronto team consisting of Charlie Tang and Nitish Srivastava. To their credit, Charlie has won the two remaining Challenges in Representation Learning. Not-so-deep learning The difference from our previous, beating-the-benchmark attempt is twofold: one layer instead of two for supervised learning, and VW instead of a random forest. Somewhat surprisingly, one layer works better than two. Even more surprisingly, with enough units you can get 0.634 using a linear model (Vowpal Wabbit, of course, One-Against-All). In our understanding, that’s the point of overcomplete representations*, which Stanford people seem to care much about. Recall The secret of the big guys and the pape
5 0.099016279 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m
6 0.084932163 58 fast ml-2014-04-12-Deep learning these days
7 0.069774233 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
8 0.06548278 43 fast ml-2013-11-02-Maxing out the digits
9 0.064662695 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
10 0.061811674 22 fast ml-2013-03-07-Choosing a machine learning algorithm
11 0.061559644 5 fast ml-2012-09-19-Best Buy mobile contest - big data
12 0.056593146 31 fast ml-2013-06-19-Go non-linear with Vowpal Wabbit
13 0.056202922 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python
14 0.056177124 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers
15 0.055801455 16 fast ml-2013-01-12-Intro to random forests
16 0.05425816 50 fast ml-2014-01-20-How to get predictions from Pylearn2
17 0.052937668 41 fast ml-2013-10-09-Big data made easy
18 0.052040912 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
19 0.050767422 34 fast ml-2013-07-14-Running things on a GPU
20 0.050756522 40 fast ml-2013-10-06-Pylearn2 in practice
topicId topicWeight
[(0, 0.23), (1, 0.165), (2, 0.203), (3, 0.003), (4, 0.105), (5, -0.038), (6, 0.172), (7, 0.14), (8, 0.06), (9, -0.072), (10, -0.059), (11, -0.185), (12, -0.025), (13, -0.084), (14, 0.064), (15, 0.111), (16, -0.125), (17, -0.231), (18, 0.002), (19, -0.227), (20, -0.057), (21, 0.135), (22, 0.222), (23, 0.029), (24, -0.122), (25, -0.141), (26, -0.281), (27, -0.041), (28, -0.133), (29, -0.023), (30, -0.101), (31, -0.202), (32, -0.101), (33, 0.161), (34, 0.11), (35, 0.12), (36, 0.128), (37, -0.054), (38, 0.196), (39, 0.036), (40, -0.121), (41, -0.06), (42, 0.156), (43, 0.021), (44, -0.125), (45, -0.19), (46, 0.218), (47, -0.238), (48, -0.095), (49, -0.025)]
simIndex simValue blogId blogTitle
same-blog 1 0.99155682 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
2 0.31919026 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
3 0.2072693 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition
4 0.1946369 27 fast ml-2013-05-01-Deep learning made easy
5 0.18101442 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
Introduction: The promise What’s attractive in machine learning? That a machine is learning, instead of a human. But an operator still has a lot of work to do. First, he has to learn how to teach a machine, in general. Then, when it comes to a concrete task, there are two main areas where a human needs to do the work (and remember, laziness is a virtue, at least for a programmer, so we’d like to minimize the amount of work done by a human): data preparation model tuning This story is about model tuning. Typically, to achieve satisfactory results, first we need to convert raw data into a format accepted by the model we would like to use, and then tune a few hyperparameters of the model. For example, some hyperparams to tune for a random forest may be the number of trees to grow and the number of candidate features at each split (mtry in R randomForest). For a neural network, there are quite a lot of hyperparams: number of layers, number of neurons in each layer (specifically, in each hid
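Spearmint automates this search; as a much simpler point of reference, here is what tuning the two random forest hyperparams named above (the number of trees, and mtry, the candidate features per split) can look like with a plain grid search in scikit-learn. This is our own illustration on made-up toy data, not Spearmint.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    param_grid = {
        'n_estimators': [50, 200, 500],        # number of trees to grow
        'max_features': ['sqrt', 0.5, None],   # scikit-learn's analogue of mtry
    }
    search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))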
6 0.14596756 43 fast ml-2013-11-02-Maxing out the digits
7 0.14514796 15 fast ml-2013-01-07-Machine learning courses online
8 0.12959118 22 fast ml-2013-03-07-Choosing a machine learning algorithm
9 0.12480411 34 fast ml-2013-07-14-Running things on a GPU
10 0.1123956 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python
11 0.11117921 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
12 0.1003868 16 fast ml-2013-01-12-Intro to random forests
13 0.10010647 31 fast ml-2013-06-19-Go non-linear with Vowpal Wabbit
14 0.099226341 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
15 0.097446233 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
16 0.092507318 18 fast ml-2013-01-17-A very fast denoising autoencoder
17 0.091939658 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition
18 0.091784202 58 fast ml-2014-04-12-Deep learning these days
19 0.091419928 50 fast ml-2014-01-20-How to get predictions from Pylearn2
20 0.089902051 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
topicId topicWeight
[(26, 0.02), (31, 0.035), (69, 0.094), (71, 0.059), (73, 0.641), (99, 0.052)]
simIndex simValue blogId blogTitle
same-blog 1 0.95017236 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
2 0.18790558 16 fast ml-2013-01-12-Intro to random forests
Introduction: Let’s step back from forays into cutting edge topics and look at a random forest, one of the most popular machine learning techniques today. Why is it so attractive? First of all, decision tree ensembles have been found by Caruana et al. to be the best overall approach for a variety of problems. Random forests, specifically, perform well both in low dimensional and high dimensional tasks. There are basically two kinds of tree ensembles: bagged trees and boosted trees. Bagging means that when building each subsequent tree, we don’t look at the earlier trees, while in boosting we consider the earlier trees and strive to compensate for their weaknesses (which may lead to overfitting). Random forest is an example of the bagging approach, less prone to overfitting. Gradient boosted trees (notably the GBM package in R) represent the other one. Both are very successful in many applications. Trees are also relatively fast to train, compared to some more involved methods. Besides effectiveness
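To make the bagging/boosting distinction above concrete, here is a small sketch of ours on made-up toy data, fitting both kinds of tree ensemble in scikit-learn: the random forest grows its trees independently on bootstrap samples, while gradient boosting adds each tree to correct the errors of the ones before it.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

    bagged = RandomForestClassifier(n_estimators=200, random_state=0)       # bagging: independent trees
    boosted = GradientBoostingClassifier(n_estimators=200, random_state=0)  # boosting: sequential trees

    print('random forest     ', cross_val_score(bagged, X, y, cv=5).mean())
    print('gradient boosting ', cross_val_score(boosted, X, y, cv=5).mean())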
3 0.18761533 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
4 0.18332115 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python
Introduction: We have already written a few articles about Pylearn2. Today we’ll look at PyBrain. It is another Python neural networks library, and this is where the similarities end. They’re like day and night: Pylearn2 - Byzantinely complicated, PyBrain - simple. We attempted to train a regression model and succeeded on the first try (more on this below). Try this with Pylearn2. While there are a few machine learning libraries out there, PyBrain aims to be a very easy-to-use modular library that can be used by entry-level students but still offers the flexibility and algorithms for state-of-the-art research. The library features the classic perceptron as well as recurrent neural networks and other things, some of which, for example Evolino, would be hard to find elsewhere. On the downside, PyBrain feels unfinished, abandoned. It is no longer actively developed and the documentation is skimpy. There are no modern gimmicks like dropout or rectified linear units - just good ol’ sigmoid and ta
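For flavour, here is roughly what that "succeeded on the first try" experience looks like: a minimal PyBrain regression sketch of ours on a made-up toy target, not the code from the post.

    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer
    from pybrain.tools.shortcuts import buildNetwork

    net = buildNetwork(2, 5, 1)            # 2 inputs, 5 sigmoid hidden units, 1 linear output
    ds = SupervisedDataSet(2, 1)
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        ds.addSample((a, b), (a + b,))     # toy regression target: the sum of the inputs

    trainer = BackpropTrainer(net, ds, learningrate=0.01)
    for epoch in range(200):
        trainer.train()                    # one pass over the dataset per call

    print(net.activate((1, 1)))            # should be close to 2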
5 0.17098825 34 fast ml-2013-07-14-Running things on a GPU
Introduction: You’ve heard about running things on a graphics card, but have you tried it? All you need to taste the speed is an Nvidia card and some software. We run experiments using Cudamat and Theano in Python. GPUs differ from CPUs in that they are optimized for throughput instead of latency. Here’s a metaphor: when you play an online game, you want fast response times, or low latency. Otherwise you get lag. However, when you’re downloading a movie, you don’t care about response times, you care about bandwidth - that’s throughput. Massive computations are similar to downloading a movie in this respect. The setup We’ll be testing things on a platform with an Intel Dual Core CPU @3Ghz and either a GeForce 9600 GT, an old card, or a GeForce GTX 550 Ti, a more recent card. See the appendix for more info on GPU hardware. The software we’ll be using is Python. On CPU it employs one core*. That is OK in everyday use because while one core is working hard, you can comfortably do something else becau
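As a taste of the Theano side mentioned above, here is a tiny sketch of ours, not from the post: a matrix product compiled into a callable. Run with something like THEANO_FLAGS=device=gpu,floatX=float32 (on a Theano of that era), the same code executes on the graphics card instead of the CPU.

    import numpy as np
    import theano
    import theano.tensor as T

    A = T.matrix('A')
    B = T.matrix('B')
    matmul = theano.function([A, B], T.dot(A, B))   # compiled once, reused many times

    a = np.random.randn(2000, 2000).astype(theano.config.floatX)
    b = np.random.randn(2000, 2000).astype(theano.config.floatX)
    c = matmul(a, b)                                # runs on the GPU when so configured
    print(c.shape)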
6 0.16607654 18 fast ml-2013-01-17-A very fast denoising autoencoder
7 0.16544773 13 fast ml-2012-12-27-Spearmint with a random forest
8 0.16427834 27 fast ml-2013-05-01-Deep learning made easy
9 0.16102141 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
10 0.1600194 9 fast ml-2012-10-25-So you want to work for Facebook
11 0.15631312 19 fast ml-2013-02-07-The secret of the big guys
12 0.15552868 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
13 0.15375385 41 fast ml-2013-10-09-Big data made easy
14 0.15349787 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
15 0.15349041 43 fast ml-2013-11-02-Maxing out the digits
16 0.15191542 40 fast ml-2013-10-06-Pylearn2 in practice
17 0.1510178 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
18 0.15057753 17 fast ml-2013-01-14-Feature selection in practice
19 0.14800411 25 fast ml-2013-04-10-Gender discrimination
20 0.14786349 26 fast ml-2013-04-17-Regression as classification