hunch_net hunch_net-2011 hunch_net-2011-438 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single-layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Adam Coates, Honglak Lee, Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization, Adam Coates, Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art performance with random weights (that is, without actually learning).
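As a rough, hypothetical illustration of the Saxe et al. observation (random convolutional filters and pooling, with only a linear classifier trained on top), here is a minimal Python sketch on toy data; the filter count, the global average pooling, and the synthetic images are my own assumptions, not the paper's architecture.

```python
# Hypothetical sketch (mine, not the paper's architecture): random convolutional
# filters plus pooling, with only a linear classifier trained on top, in the
# spirit of the Saxe et al. result described above.
import numpy as np
from scipy.signal import convolve2d
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

def random_conv_pool_features(images, n_filters=16, filter_size=5):
    """Convolve with fixed random filters, rectify, and average-pool each response map."""
    filters = rng.randn(n_filters, filter_size, filter_size)
    feats = []
    for img in images:
        feats.append([np.maximum(convolve2d(img, f, mode="valid"), 0).mean()
                      for f in filters])       # one pooled value per (untrained) filter
    return np.array(feats)

# toy stand-in for an image classification task
X_img = rng.rand(200, 16, 16)
y = rng.randint(0, 2, size=200)

X_feat = random_conv_pool_features(X_img)
clf = LogisticRegression().fit(X_feat, y)      # only this linear layer is learned
print("train accuracy:", clf.score(X_feat, y))
```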
sentIndex sentText sentNum sentScore
1 In general, my feeling is that these papers both demystify deep learning and show its broader applicability. [sent-3, score-0.332]
2 showed that single layer networks can work surprisingly well. [sent-6, score-0.205]
3 Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. [sent-9, score-0.268]
4 who show that certain convolutional pooling architectures can obtain close to state-of-the-art performance with random weights (that is, without actually learning). [sent-10, score-0.346]
5 Ng Of course, in most cases we do want to train these models eventually. [sent-13, score-0.189]
6 There were two interesting papers on the topic of training neural networks. [sent-14, score-0.547]
7 show that a simple, off-the-shelf L-BFGS optimizer is often preferable to stochastic gradient descent. [sent-16, score-0.124]
8 On optimization methods for deep learning , Quoc V. [sent-17, score-0.133]
9 It certainly seems possible since even with standard L-BFGS our recursive neural network (see previous post) can outperform CRF-type models on several challenging computer vision tasks such as semantic segmentation of scene images. [sent-20, score-0.904]
10 This common vision task of labeling each pixel with an object class has not received much attention from the deep learning community. [sent-21, score-0.296]
11 Apart from the vision experiments, this paper further solidifies the trend that neural networks are being used more and more in natural language processing. [sent-22, score-0.737]
12 Another neat example of this trend comes from Yann Dauphin et al. [sent-24, score-0.297]
13 They present an interesting solution for learning with sparse bag-of-word representations. [sent-26, score-0.157]
14 Large-Scale Learning of Embeddings with Reconstruction Sampling , Yann Dauphin, Xavier Glorot, Yoshua Bengio Such sparse representations had previously been problematic for neural architectures. [sent-27, score-0.491]
15 In summary, these papers have helped us understand a bit better which “deep” or “neural” architectures work, why they work and how we should train them. [sent-28, score-0.328]
16 Furthermore, the scope of problems that these architectures can handle has been widened to harder and more real-life problems. [sent-29, score-0.135]
17 Of the non-neural papers, these two papers stood out for me: Sparse Additive Generative Models of Text , Jacob Eisenstein , Amr Ahmed , Eric Xing – the idea is to model each topic only in terms of how it differs from a background distribution. [sent-30, score-0.134]
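Sentences 7 and 8 above refer to the Le et al. finding that a simple, off-the-shelf L-BFGS optimizer is often preferable to stochastic gradient descent. Below is a minimal sketch of that kind of comparison, assuming scikit-learn's MLPClassifier as a stand-in for the models studied in the paper; the dataset and network size are my own choices, not the paper's experiments.

```python
# Hypothetical sketch (mine, not the paper's experiments): an off-the-shelf
# L-BFGS solver versus SGD for a small neural network, in the spirit of
# Le et al., "On optimization methods for deep learning".
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for solver in ("lbfgs", "sgd"):
    net = MLPClassifier(hidden_layer_sizes=(64,), solver=solver,
                        max_iter=500, random_state=0)
    net.fit(X_train, y_train)
    print(solver, "test accuracy:", net.score(X_test, y_test))
```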
wordName wordTfidf (topN-words)
[('neural', 0.334), ('coates', 0.262), ('andrew', 0.258), ('ng', 0.212), ('networks', 0.205), ('et', 0.204), ('recurrent', 0.197), ('adam', 0.161), ('sparse', 0.157), ('architectures', 0.135), ('deep', 0.133), ('dauphin', 0.131), ('diffusion', 0.131), ('saxe', 0.131), ('sutskever', 0.131), ('show', 0.124), ('tasks', 0.121), ('train', 0.118), ('quoc', 0.116), ('martens', 0.116), ('le', 0.108), ('vision', 0.105), ('bengio', 0.102), ('yoshua', 0.102), ('trend', 0.093), ('outperform', 0.093), ('dirichlet', 0.093), ('weights', 0.087), ('unsupervised', 0.08), ('training', 0.079), ('papers', 0.075), ('yann', 0.074), ('models', 0.071), ('trees', 0.068), ('sequence', 0.066), ('maybe', 0.065), ('network', 0.064), ('group', 0.064), ('topic', 0.059), ('embeddings', 0.058), ('jacob', 0.058), ('ghahramani', 0.058), ('scene', 0.058), ('maneesh', 0.058), ('segmentation', 0.058), ('wei', 0.058), ('pixel', 0.058), ('chen', 0.058), ('bobby', 0.058), ('additive', 0.058)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
2 0.25044847 201 hunch net-2006-08-07-The Call of the Deep
Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a Gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun’s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number
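To make the shallow-versus-deep contrast in this excerpt concrete, here is a small hypothetical sketch (mine, not from the post): a linear separator of the raw features versus a 2-layer network on data that is not linearly separable.

```python
# Hypothetical sketch (mine, not from the post): a linear separator of the raw
# features versus a 2-layer neural network on data that is not linearly separable.
from sklearn.datasets import make_moons
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

linear = Perceptron(random_state=0).fit(X, y)                # linear separator of basic features
two_layer = MLPClassifier(hidden_layer_sizes=(16,),          # one hidden layer of learned features
                          max_iter=2000, random_state=0).fit(X, y)

print("perceptron accuracy:", linear.score(X, y))            # limited by a linear boundary
print("2-layer network accuracy:", two_layer.score(X, y))
```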
3 0.16372432 16 hunch net-2005-02-09-Intuitions from applied learning
Introduction: Since learning is far from an exact science, it’s good to pay attention to basic intuitions of applied learning. Here are a few I’ve collected. Integration In Bayesian learning, the posterior is computed by an integral, and the optimal thing to do is to predict according to this integral. This phenomenon seems to be far more general. Bagging, Boosting, SVMs, and Neural Networks all take advantage of this idea to some extent. The phenomenon is more general: you can average over many different classification predictors to improve performance. Sources: Zoubin, Caruana Differentiation Different pieces of an average should differentiate to achieve good performance by different methods. This is known as the ‘symmetry breaking’ problem for neural networks, and it’s why weights are initialized randomly. Boosting explicitly attempts to achieve good differentiation by creating new, different, learning problems. Sources: Yann LeCun, Phil Long Deep Representation Ha
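A minimal sketch of the two intuitions in this excerpt, under my own assumptions about models and data: averaging the predictions of several members ("integration"), where each member differs only in its random initialization ("differentiation" via symmetry breaking).

```python
# Hypothetical sketch (mine): "integration" as averaging several predictors, and
# "differentiation" via symmetry breaking, i.e. each member differs only in its
# random weight initialization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

members = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                         random_state=seed).fit(X_tr, y_tr)
           for seed in range(5)]                       # different random initializations

avg_proba = np.mean([m.predict_proba(X_te) for m in members], axis=0)
print("single model accuracy:", members[0].score(X_te, y_te))
print("averaged ensemble accuracy:", (avg_proba.argmax(axis=1) == y_te).mean())
```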
4 0.14028822 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006
Introduction: Here are some papers that I found surprisingly interesting. Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, Greedy Layer-wise Training of Deep Networks. Empirically investigates some of the design choices behind deep belief networks. Long Zhu, Yuanhao Chen, Alan Yuille Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing. An unsupervised method for detecting objects using simple feature filters that works remarkably well on the (supervised) caltech-101 dataset. Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira, Analysis of Representations for Domain Adaptation. This is the first analysis I’ve seen of learning with respect to samples drawn differently from the evaluation distribution which depends on reasonable measurable quantities. All of these papers turn out to have a common theme—the power of unlabeled data to do generically useful things.
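As a rough sketch of the greedy layer-wise idea in the Bengio et al. paper above (my own construction, using RBMs from scikit-learn rather than the paper's setup): each unsupervised layer is fit on the output of the previous one, and only a simple classifier is trained on top.

```python
# Hypothetical sketch (mine, using RBMs from scikit-learn rather than the paper's
# setup): greedy layer-wise unsupervised pretraining -- each layer is fit on the
# output of the previous one, and only a simple classifier is trained on top.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)

model = Pipeline([
    ("scale", MinMaxScaler()),                                   # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=128, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)   # the pipeline fits each RBM greedily on the previous stage's output
print("train accuracy:", model.score(X, y))
```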
5 0.12741691 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control
Introduction: Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. The basic idea of this algorithm is: Collect data from a human controlling a machine. Build a transition model based upon the experience. Build a policy which optimizes the transition model. Evaluate the policy. If it works well, halt, otherwise add the experience into the pool and go to (2). The paper proves that this technique will converge to some policy with expected performance near human expected performance assuming the world fits certain assumptions (MDP or linear dynamics). This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. To see (b), note that we can create intricate rei
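The loop described in this excerpt is concrete enough to sketch. Below is a toy, hypothetical version on a small chain MDP (my own construction, not the paper's setting): collect expert data, fit a transition model from counts, plan on the model with value iteration, evaluate, and add the new experience if performance is still short of the expert's.

```python
# Hypothetical toy sketch (mine, not the paper's setting): the apprenticeship
# loop described above on a tiny chain MDP -- collect expert data, fit a
# transition model from counts, plan on the model, evaluate, and if the policy
# is still weak, add its experience to the pool and repeat.
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 6, 2, 20                       # states, actions (0 = left, 1 = right), horizon
R = np.zeros(S); R[-1] = 1.0             # reward only in the final state

def true_step(s, a):
    move = (1 if a == 1 else -1) if rng.random() < 0.9 else 0
    return int(np.clip(s + move, 0, S - 1))

def rollout(policy):
    s, traj, ret = 0, [], 0.0
    for _ in range(H):
        a = policy(s); s2 = true_step(s, a)
        traj.append((s, a, s2)); ret += R[s2]; s = s2
    return traj, ret

def plan(counts):
    """Value iteration on the transition model estimated from counts."""
    P = (counts + 1e-3) / (counts + 1e-3).sum(axis=2, keepdims=True)
    V = np.zeros(S)
    for _ in range(100):
        Q = P @ (R + 0.95 * V)           # Q[s, a] under the estimated model
        V = Q.max(axis=1)
    return lambda s: int(Q[s].argmax())

expert = lambda s: 1                     # the "human" demonstrator always moves right
counts = np.zeros((S, A, S))
data, expert_return = rollout(expert)    # (1) collect data from the expert

for it in range(5):
    for s, a, s2 in data:                # (2) update the transition model
        counts[s, a, s2] += 1
    policy = plan(counts)                # (3) optimize a policy on the model
    data, ret = rollout(policy)          # (4) evaluate; this run is also new experience
    print(f"iteration {it}: return {ret:.1f} (expert {expert_return:.1f})")
    if ret >= 0.9 * expert_return:       # (5) halt if near expert performance
        break
```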
6 0.12143345 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning
7 0.11625649 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
8 0.10825863 97 hunch net-2005-07-23-Interesting papers at ACL
9 0.10651304 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
10 0.10643776 456 hunch net-2012-02-24-ICML+50%
11 0.10515977 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
12 0.10495069 189 hunch net-2006-07-05-more icml papers
13 0.10389647 385 hunch net-2009-12-27-Interesting things at NIPS 2009
14 0.10370748 197 hunch net-2006-07-17-A Winner
15 0.098195516 111 hunch net-2005-09-12-Fast Gradient Descent
16 0.096026011 131 hunch net-2005-11-16-The Everything Ensemble Edge
17 0.09373609 469 hunch net-2012-07-09-Videolectures
18 0.091279462 144 hunch net-2005-12-28-Yet more nips thoughts
19 0.088950641 188 hunch net-2006-06-30-ICML papers
20 0.088676192 431 hunch net-2011-04-18-A paper not at Snowbird
topicId topicWeight
[(0, 0.164), (1, 0.023), (2, 0.014), (3, -0.024), (4, 0.143), (5, 0.002), (6, -0.066), (7, -0.018), (8, 0.09), (9, -0.1), (10, -0.072), (11, -0.073), (12, -0.181), (13, -0.217), (14, 0.061), (15, 0.206), (16, 0.039), (17, 0.096), (18, -0.06), (19, 0.019), (20, -0.003), (21, -0.002), (22, -0.039), (23, -0.019), (24, 0.017), (25, 0.075), (26, -0.039), (27, 0.058), (28, -0.05), (29, 0.043), (30, 0.068), (31, 0.029), (32, 0.048), (33, -0.01), (34, 0.103), (35, -0.023), (36, -0.004), (37, 0.019), (38, -0.018), (39, -0.056), (40, 0.045), (41, 0.025), (42, 0.008), (43, -0.009), (44, 0.023), (45, -0.03), (46, 0.029), (47, -0.017), (48, 0.0), (49, 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.97291803 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
2 0.76531106 201 hunch net-2006-08-07-The Call of the Deep
3 0.67520928 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006
4 0.67281687 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning
Introduction: About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of MNIST which are artificially hardened so as to make them solvable only by deep learning algorithms. This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing wha
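A minimal sketch of the kind of boosted-decision-tree baseline discussed above, assuming scikit-learn's GradientBoostingClassifier and a standard small dataset (not the artificially hardened MNIST variants from the paper):

```python
# Hypothetical sketch (mine): a boosted-decision-tree baseline, assuming
# scikit-learn's GradientBoostingClassifier and a standard small dataset
# (not the artificially hardened MNIST variants from the paper).
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)
print("boosted trees test accuracy:", gbt.score(X_te, y_te))
```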
5 0.6683079 16 hunch net-2005-02-09-Intuitions from applied learning
6 0.64538836 431 hunch net-2011-04-18-A paper not at Snowbird
7 0.56988764 477 hunch net-2013-01-01-Deep Learning 2012
8 0.53671449 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
9 0.5162406 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
10 0.51121378 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
11 0.51020283 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
12 0.50614429 32 hunch net-2005-02-27-Antilearning: When proximity goes bad
13 0.49447107 189 hunch net-2006-07-05-more icml papers
14 0.4897348 466 hunch net-2012-06-05-ICML acceptance statistics
15 0.48887587 188 hunch net-2006-06-30-ICML papers
16 0.48382545 144 hunch net-2005-12-28-Yet more nips thoughts
17 0.48154461 329 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates
18 0.46255356 197 hunch net-2006-07-17-A Winner
19 0.43913597 456 hunch net-2012-02-24-ICML+50%
20 0.43776488 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control
topicId topicWeight
[(10, 0.015), (12, 0.268), (16, 0.024), (27, 0.123), (30, 0.033), (37, 0.018), (38, 0.023), (49, 0.059), (53, 0.113), (55, 0.062), (60, 0.023), (61, 0.015), (65, 0.036), (94, 0.081), (95, 0.02)]
simIndex simValue blogId blogTitle
same-blog 1 0.8646822 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
2 0.85298288 24 hunch net-2005-02-19-Machine learning reading groups
Introduction: Yaroslav collected an extensive list of machine learning reading groups.
3 0.8486498 421 hunch net-2011-01-03-Herman Goldstine 2011
Introduction: Vikas points out the Herman Goldstine Fellowship at IBM. I was a Herman Goldstine Fellow, and benefited from the experience a great deal—that’s where work on learning reductions started. If you can do research independently, it’s recommended. Applications are due January 6.
4 0.75655836 482 hunch net-2013-05-04-COLT and ICML registration
Introduction: Sebastien Bubeck points out COLT registration with a May 13 early registration deadline. The local organizers have done an admirable job of containing costs with a $300 registration fee. ICML registration is also available, at about an x3 higher cost. My understanding is that this is partly due to the costs of a larger conference being harder to contain, partly due to ICML lasting twice as long with tutorials and workshops, and partly because the conference organizers were a bit over-conservative in various ways.
5 0.60674822 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
Introduction: There were two papers at ICML presenting learning algorithms for a contextual bandit-style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require an exploration scavenging viewpoint to understand if the experimental assignment was nonrandom.) I strongly approve of these papers and further work in this setting and its variants, because I expect it to become more important than supervised learning. As a quick review, we are thinking about situations where repeatedly: The world reveals feature values (aka context information). A policy chooses an action. The world provides a reward. Sometimes this is done in an online fashion where the policy can change based on immediate feedback and sometimes it’s done in a batch setting where many samples are collected before the policy can change. If you haven’t spent time thinking about the setting, you might want to because there are many natural applications. I’m g
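The repeated protocol in this excerpt maps directly to a simulation loop. Here is a hypothetical sketch (my own construction, not either ICML paper): an epsilon-greedy policy with a per-action ridge-regression reward model, observing the reward only for the chosen action.

```python
# Hypothetical sketch (mine, not either ICML paper): the contextual-bandit
# protocol above -- context revealed, one action chosen, only that action's
# reward observed -- with an epsilon-greedy policy over per-action ridge models.
import numpy as np

rng = np.random.default_rng(0)
n_actions, d, T, eps = 3, 5, 5000, 0.1
true_w = rng.normal(size=(n_actions, d))            # hidden reward model (simulation only)

A_inv = np.stack([np.eye(d) for _ in range(n_actions)])   # per-action ridge state
b = np.zeros((n_actions, d))
total = 0.0

for t in range(T):
    x = rng.normal(size=d)                          # 1. world reveals context features
    w_hat = np.einsum("aij,aj->ai", A_inv, b)       # current reward estimates
    if rng.random() < eps:
        a = int(rng.integers(n_actions))            # explore
    else:
        a = int((w_hat @ x).argmax())               # 2. policy chooses an action
    r = true_w[a] @ x + rng.normal(scale=0.1)       # 3. world provides a reward (for a only)
    total += r
    Ax = A_inv[a] @ x                               # Sherman-Morrison update for action a
    A_inv[a] -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
    b[a] += r * x

print("average reward:", total / T)
```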
6 0.56516421 259 hunch net-2007-08-19-Choice of Metrics
7 0.55730039 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project
8 0.5553841 141 hunch net-2005-12-17-Workshops as Franchise Conferences
9 0.55042315 249 hunch net-2007-06-21-Presentation Preparation
10 0.55004066 201 hunch net-2006-08-07-The Call of the Deep
11 0.54877567 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
12 0.54687464 37 hunch net-2005-03-08-Fast Physics for Learning
13 0.54564834 444 hunch net-2011-09-07-KDD and MUCMD 2011
14 0.54361457 297 hunch net-2008-04-22-Taking the next step
15 0.5406099 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
16 0.54060376 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?
17 0.53933442 416 hunch net-2010-10-29-To Vidoelecture or not
18 0.53648496 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
19 0.53426635 19 hunch net-2005-02-14-Clever Methods of Overfitting
20 0.53188986 207 hunch net-2006-09-12-Incentive Compatible Reviewing