hunch_net hunch_net-2006 hunch_net-2006-201 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a Gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun’s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number…
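The support-vector point above is easy to see empirically. Below is a minimal sketch, not from the original post: it assumes scikit-learn, NumPy, and a synthetic dataset in which each “background” is a separate, well-separated Gaussian cluster. On typical runs, the RBF-kernel SVM’s support-vector count grows with the number of distinct backgrounds, which is the sense in which it cannot represent such decision surfaces succinctly; a linear separator, by contrast, always has a fixed number of parameters, whether or not it fits a given problem.

```python
# Hypothetical illustration (scikit-learn + synthetic clusters), not code from the post:
# count how many support vectors an RBF-kernel SVM keeps as the number of distinct
# "backgrounds" (well-separated clusters with alternating labels) grows.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_backgrounds(n_backgrounds, n_per=50, dim=20):
    """Each background is its own Gaussian cluster; labels alternate across clusters."""
    X, y = [], []
    for b in range(n_backgrounds):
        center = rng.normal(scale=5.0, size=dim)
        X.append(center + rng.normal(scale=0.5, size=(n_per, dim)))
        y.append(np.full(n_per, b % 2))
    return np.vstack(X), np.concatenate(y)

for k in (2, 8, 32):
    X, y = make_backgrounds(k)
    svm = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
    print(f"backgrounds={k:3d}  support vectors kept={len(svm.support_):4d}")
```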
sentIndex sentText sentNum sentScore
1 Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). [sent-2, score-1.276]
2 Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. [sent-5, score-1.023]
3 This is not considered deep learning, because an SVM with a Gaussian kernel can’t succinctly represent certain decision surfaces. [sent-8, score-1.38]
4 An SVM will need a new support vector for each significantly different background. [sent-10, score-0.133]
5 This is not considered deep learning for essentially the same reason as the Gaussian SVM. [sent-13, score-1.086]
6 A decision tree might be considered a deep learning system. [sent-16, score-1.092]
7 However, there exist simple learning problems that defeat decision trees using axis-aligned splits. [sent-17, score-0.573]
8 It’s easy to find problems that defeat such decision trees by rotating a linear separator through many dimensions, as in the sketch after this sentence list. [sent-18, score-0.667]
9 A two-layer neural network isn’t considered deep learning because it isn’t a deep architecture. [sent-20, score-2.003]
10 More importantly, perhaps, the problem of object recognition with occluding backgrounds implies that the hidden layer must be very large to do general-purpose detection. [sent-21, score-0.262]
11 (for example, convolutional neural networks) A neural network with several layers might be considered deep. [sent-23, score-1.716]
12 Automated feature generation and selection systems might be considered deep since they can certainly develop deep dependencies between the input and the output. [sent-25, score-1.85]
13 One test for a deep learning system is: are there well-defined learning problems which the system cannot solve but a human easily could? [sent-26, score-0.929]
14 If the answer is ‘yes’, then it’s perhaps not a deep learning system. [sent-27, score-0.736]
15 There are several theorems of the form: “nearest neighbor can learn any measurable function”, “2-layer neural networks can represent any function”, “a support vector machine with a Gaussian kernel can learn any function”. [sent-29, score-1.314]
16 These theorems imply that deep learning is only interesting in the bounded data or computation case. [sent-30, score-0.801]
17 And yet, for the small data situation (think “30 examples”), problems with overfitting become so severe that it’s difficult to imagine using more complex learning algorithms than the shallow systems commonly in use. [sent-31, score-0.415]
18 So the domain where a deep learning system might be most useful involves large quantities of data with computational constraints. [sent-32, score-0.862]
19 What are the principles of design for deep learning systems? [sent-33, score-0.729]
20 Can we learn an architecture on the fly or must it be prespecified? [sent-36, score-0.172]
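Sentence 8’s claim is easy to check numerically. The sketch below is not from the post; it assumes scikit-learn, a synthetic labeling rule given by a dense (“rotated”) hyperplane over all coordinates, and an arbitrary depth cap of 8 on the tree. Exact numbers vary with the depth cap and sample size, but the gap is the point: each axis-aligned split carves the rotated boundary only coarsely, while an ordinary linear classifier recovers it almost exactly.

```python
# Illustrative sketch (scikit-learn + synthetic data), not code from the post:
# labels come from a dense linear separator "rotated" through all coordinates,
# which a depth-capped, axis-aligned decision tree approximates poorly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d = 30
X = rng.normal(size=(4000, d))
w = rng.normal(size=d)              # a hyperplane involving every dimension
y = (X @ w > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_tr, y_tr)
lin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("axis-aligned tree test accuracy:", tree.score(X_te, y_te))
print("linear model test accuracy:    ", lin.score(X_te, y_te))
```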
wordName wordTfidf (topN-words)
[('deep', 0.607), ('neural', 0.243), ('considered', 0.232), ('separator', 0.2), ('svm', 0.184), ('gaussian', 0.18), ('layer', 0.174), ('networks', 0.134), ('defeat', 0.133), ('dependencies', 0.116), ('decision', 0.114), ('represent', 0.107), ('complex', 0.097), ('background', 0.088), ('linear', 0.085), ('systems', 0.08), ('trees', 0.077), ('function', 0.076), ('theorems', 0.076), ('kernel', 0.073), ('network', 0.073), ('might', 0.072), ('vector', 0.072), ('input', 0.069), ('learning', 0.067), ('succinctly', 0.067), ('representationally', 0.067), ('generation', 0.067), ('pixel', 0.067), ('learn', 0.066), ('system', 0.065), ('perhaps', 0.062), ('axis', 0.062), ('backgrounds', 0.062), ('aligned', 0.062), ('measurable', 0.062), ('layers', 0.062), ('shallow', 0.062), ('support', 0.061), ('problems', 0.058), ('winnow', 0.058), ('kernelized', 0.058), ('recognizing', 0.058), ('features', 0.056), ('principles', 0.055), ('architecture', 0.053), ('fly', 0.053), ('intuitively', 0.053), ('data', 0.051), ('convolutional', 0.05)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 201 hunch net-2006-08-07-The Call of the Deep
2 0.32483643 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning
Introduction: About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of MNIST which are artificially hardened so as to make them solvable only by deep learning algorithms. This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing what…
3 0.28592381 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) cannot concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant-depth, unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees, depending on who you ask) can’t learn XOR very easily. (A hand-built contrast between a single threshold unit and a two-layer circuit on XOR is sketched after this list.)
4 0.25044847 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Adam Coates, Honglak Lee, Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization, Adam Coates, Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art performance…
5 0.22586417 477 hunch net-2013-01-01-Deep Learning 2012
Introduction: 2012 was a tumultuous year for me, but it was undeniably a great year for deep learning efforts. Signs of this include: Winning a Kaggle competition. Wide adoption of deep learning for speech recognition. Significant industry support. Gains in image recognition. This is a rare event in research: a significant capability breakout. Congratulations are definitely in order for those who managed to achieve it. At this point, deep learning algorithms seem like a choice undeniably worth investigating for real applications with significant data.
6 0.18834843 16 hunch net-2005-02-09-Intuitions from applied learning
7 0.158685 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006
8 0.15569611 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
9 0.14196846 431 hunch net-2011-04-18-A paper not at Snowbird
10 0.13688639 131 hunch net-2005-11-16-The Everything Ensemble Edge
11 0.13478552 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
12 0.12484159 329 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates
13 0.11483438 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity
14 0.1124365 235 hunch net-2007-03-03-All Models of Learning have Flaws
15 0.11081503 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
16 0.10882462 466 hunch net-2012-06-05-ICML acceptance statistics
17 0.10782007 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
18 0.10704906 197 hunch net-2006-07-17-A Winner
19 0.10299331 266 hunch net-2007-10-15-NIPS workshops extended to 3 days
20 0.098642193 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
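The XOR point in the “Deep Belief Net Learning Problem” summary above (item 3) has a tiny concrete instance. The sketch below is illustrative only, with hand-built threshold units rather than anything from the post: a two-layer threshold circuit computes 2-bit XOR exactly, while a brute-force search over a weight grid finds no single threshold unit that does, since XOR is not linearly separable.

```python
# Minimal, hand-built sketch (not from the post): depth 2 represents XOR, depth 1 cannot.
from itertools import product

def step(z):
    return 1 if z > 0 else 0

def two_layer_xor(x1, x2):
    # Hidden layer: h1 fires on "at least one input on", h2 fires on "both on".
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output unit: "at least one, but not both" = XOR.
    return step(h1 - h2 - 0.5)

# Depth 2 suffices: the circuit matches XOR on all four inputs.
assert all(two_layer_xor(a, b) == (a ^ b) for a, b in product([0, 1], repeat=2))

# Depth 1 does not: no single threshold unit step(w1*x1 + w2*x2 - t) on this weight
# grid matches XOR on all four inputs (XOR is not linearly separable).
grid = [i / 10 for i in range(-20, 21)]
hits = [
    (w1, w2, t)
    for w1 in grid for w2 in grid for t in grid
    if all(step(w1 * a + w2 * b - t) == (a ^ b) for a, b in product([0, 1], repeat=2))
]
print("single-threshold-unit solutions found:", len(hits))  # expect 0
```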
topicId topicWeight
[(0, 0.212), (1, 0.095), (2, -0.038), (3, -0.012), (4, 0.147), (5, -0.012), (6, -0.081), (7, 0.071), (8, 0.106), (9, -0.098), (10, -0.195), (11, -0.132), (12, -0.11), (13, -0.242), (14, -0.01), (15, 0.32), (16, 0.016), (17, 0.074), (18, -0.139), (19, 0.136), (20, 0.027), (21, 0.076), (22, 0.031), (23, 0.024), (24, -0.042), (25, 0.016), (26, -0.033), (27, 0.012), (28, 0.045), (29, 0.094), (30, 0.063), (31, 0.042), (32, 0.002), (33, 0.031), (34, 0.118), (35, -0.043), (36, 0.054), (37, -0.029), (38, -0.063), (39, -0.017), (40, -0.016), (41, 0.082), (42, -0.054), (43, -0.041), (44, -0.004), (45, 0.016), (46, -0.002), (47, -0.027), (48, 0.032), (49, -0.061)]
simIndex simValue blogId blogTitle
same-blog 1 0.97073281 201 hunch net-2006-08-07-The Call of the Deep
2 0.9139961 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning
3 0.81272769 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
4 0.80523545 477 hunch net-2013-01-01-Deep Learning 2012
5 0.7560522 16 hunch net-2005-02-09-Intuitions from applied learning
Introduction: Since learning is far from an exact science, it’s good to pay attention to basic intuitions of applied learning. Here are a few I’ve collected. Integration In Bayesian learning, the posterior is computed by an integral, and the optimal thing to do is to predict according to this integral. This phenomenon seems to be far more general. Bagging, Boosting, SVMs, and Neural Networks all take advantage of this idea to some extent. The phenomenon is more general still: you can average over many different classification predictors to improve performance. Sources: Zoubin, Caruana Differentiation Different pieces of an average should differentiate to achieve good performance by different methods. This is known as the ‘symmetry breaking’ problem for neural networks, and it’s why weights are initialized randomly. Boosting explicitly attempts to achieve good differentiation by creating new, different, learning problems. Sources: Yann LeCun, Phil Long Deep Representation Ha…
6 0.74402535 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
7 0.70757705 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006
8 0.70320457 431 hunch net-2011-04-18-A paper not at Snowbird
9 0.63856882 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
10 0.63434982 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
11 0.54208624 329 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates
12 0.50958538 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
13 0.50799817 131 hunch net-2005-11-16-The Everything Ensemble Edge
14 0.46573371 348 hunch net-2009-04-02-Asymmophobia
15 0.45610732 253 hunch net-2007-07-06-Idempotent-capable Predictors
16 0.44620484 466 hunch net-2012-06-05-ICML acceptance statistics
17 0.44605827 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity
18 0.43880323 32 hunch net-2005-02-27-Antilearning: When proximity goes bad
19 0.43293533 197 hunch net-2006-07-17-A Winner
20 0.42127752 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
topicId topicWeight
[(3, 0.02), (10, 0.015), (27, 0.213), (38, 0.037), (48, 0.013), (49, 0.035), (53, 0.201), (55, 0.061), (61, 0.217), (64, 0.01), (77, 0.01), (94, 0.062), (98, 0.011)]
simIndex simValue blogId blogTitle
1 0.91735637 412 hunch net-2010-09-28-Machined Learnings
Introduction: Paul Mineiro has started Machined Learnings where he’s seriously attempting to do ML research in public. I personally need to read through in greater detail, as much of it is learning reduction related, trying to deal with the sorts of complex source problems that come up in practice.
2 0.89889836 416 hunch net-2010-10-29-To Vidoelecture or not
Introduction: (update: cross-posted on CACM) For the first time in several years, ICML 2010 did not have videolectures attending. Luckily, the tutorial on exploration and learning which Alina and I put together can be viewed, since we also presented at KDD 2010, which included videolecture support. ICML didn’t cover the cost of a videolecture, because PASCAL didn’t provide a grant for it this year. On the other hand, KDD covered it out of registration costs. The cost of videolectures isn’t cheap. For a workshop the baseline quote we have is 270 euro per hour, plus a similar cost for the cameraman’s travel and accommodation. This can be reduced substantially by having a volunteer with a camera handle the cameraman duties, uploading the video and slides to be processed for a quoted 216 euro per hour. Youtube is the most predominant free video site with a cost of $0, but it turns out to be a poor alternative. 15-minute upload limits do not match typical talk lengths.
3 0.88472104 106 hunch net-2005-09-04-Science in the Government
Introduction: I found the article on “Political Science” at the New York Times interesting. Essentially the article is about allegations that the US government has been systematically distorting scientific views. With a petition by some 7000+ scientists alleging such behavior, this is clearly a significant concern. One thing not mentioned explicitly in this discussion is that there are fundamental cultural differences between academic research and the rest of the world. In academic research, careful, clear thought is valued. This value is achieved by both formal and informal mechanisms. One example of a formal mechanism is peer review. In contrast, in the land of politics, the basic value is agreement. It is only with some amount of agreement that a new law can be passed or other actions can be taken. Since Science (with a capital ‘S’) has accomplished many things, it can be a significant tool in persuading people. This makes it compelling for a politician to use science as a mec…
same-blog 4 0.88120937 201 hunch net-2006-08-07-The Call of the Deep
5 0.7772615 6 hunch net-2005-01-27-Learning Complete Problems
Introduction: Let’s define a learning problem as making predictions given past data. There are several ways to attack the learning problem which seem to be equivalent to solving the learning problem. Find the Invariant This viewpoint says that learning is all about learning (or incorporating) transformations of objects that do not change the correct prediction. The best possible invariant is the one which says “all things of the same class are the same”. Finding this is equivalent to learning. This viewpoint is particularly common when working with image features. Feature Selection This viewpoint says that the way to learn is by finding the right features to input to a learning algorithm. The best feature is the one which is the class to predict. Finding this is equivalent to learning for all reasonable learning algorithms. This viewpoint is common in several applications of machine learning. See Gilad’s and Bianca’s comments. Find the Representation This is almost the same a…
6 0.76991999 367 hunch net-2009-08-16-Centmail comments
7 0.76517284 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
8 0.75419676 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
9 0.7528379 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
10 0.75039029 91 hunch net-2005-07-10-Thinking the Unthought
11 0.74546146 131 hunch net-2005-11-16-The Everything Ensemble Edge
12 0.74213511 21 hunch net-2005-02-17-Learning Research Programs
13 0.74043936 2 hunch net-2005-01-24-Holy grails of machine learning?
14 0.73733664 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?
15 0.73681659 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
16 0.73548007 141 hunch net-2005-12-17-Workshops as Franchise Conferences
17 0.73279488 19 hunch net-2005-02-14-Clever Methods of Overfitting
18 0.73226416 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
19 0.73000103 370 hunch net-2009-09-18-Necessary and Sufficient Research
20 0.72896683 347 hunch net-2009-03-26-Machine Learning is too easy