hunch_net hunch_net-2006 hunch_net-2006-152 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Let’s suppose that we are trying to create a general purpose machine learning box. The box is fed many examples of the function it is supposed to learn and (hopefully) succeeds. To date, most such attempts to produce a box of this form take a vector as input. The elements of the vector might be bits, real numbers, or ‘categorical’ data (a discrete set of values). On the other hand, there are a number of successful applications of machine learning which do not seem to use a vector representation as input. For example, in vision, convolutional neural networks have been used to solve several vision problems. The input to the convolutional neural network is essentially the raw camera image as a matrix. In learning for natural languages, several people have had success on problems like parts-of-speech tagging using predictors restricted to a window surrounding the word to be predicted. A vector window and a matrix both imply a notion of locality which is being actively and effectively used by these algorithms.
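To make the contrast concrete, here is a minimal sketch, not taken from the post, of the two views of the same input: the flattened vector a generic learning box receives, and the small local windows over the image matrix that a convolutional network or a tagging-window predictor exploits. The 6x6 image, the 3x3 window size, and the helper name local_patches are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the post): the same "camera image"
# viewed as a flat feature vector versus as a set of local windows.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((6, 6))       # raw camera image as a matrix (hypothetical size)

flat = image.reshape(-1)         # the vector-box view: 36 features, locality discarded

def local_patches(img, k=3):
    """Every k x k window of the matrix; each window is one local neighborhood."""
    rows, cols = img.shape
    return [img[i:i + k, j:j + k]
            for i in range(rows - k + 1)
            for j in range(cols - k + 1)]

patches = local_patches(image)   # 16 overlapping 3x3 neighborhoods
print(len(flat), len(patches))   # 36 vs 16

# Shuffling the flat vector leaves a permutation-invariant learner's view of the
# data unchanged, but it destroys every window a locality-aware method relies on.
shuffled_image = rng.permutation(flat).reshape(6, 6)
```

The part-of-speech window mentioned above is the one-dimensional analogue: a slice of neighboring words rather than a patch of neighboring pixels.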
sentIndex sentText sentNum sentScore
1 Let’s suppose that we are trying to create a general purpose machine learning box. [sent-1, score-0.203]
2 The box is fed many examples of the function it is supposed to learn and (hopefully) succeeds. [sent-2, score-0.448]
3 To date, most such attempts to produce a box of this form take a vector as input. [sent-3, score-0.537]
4 The elements of the vector might be bits, real numbers, or ‘categorical’ data (a discrete set of values). [sent-4, score-0.357]
5 On the other hand, there are a number of successful applications of machine learning which do not seem to use a vector representation as input. [sent-5, score-0.391]
6 For example, in vision, convolutional neural networks have been used to solve several vision problems. [sent-6, score-0.455]
7 The input to the convolutional neural network is essentially the raw camera image as a matrix . [sent-7, score-0.81]
8 In learning for natural languages, several people have had success on problems like parts-of-speech tagging using predictors restricted to a window surrounding the word to be predicted. [sent-8, score-0.546]
9 A vector window and a matrix both imply a notion of locality which is being actively and effectively used by these algorithms. [sent-9, score-1.178]
10 In contrast, common algorithms like support vector machines, neural networks, and decision trees do not use or take advantage of locality in the input representation. [sent-10, score-1.437]
11 For example, many of these algorithms are (nearly) invariant under permutation of input vector features. [sent-11, score-0.665]
12 A basic question we should ask is: “Does it really matter whether or not a learning algorithm knows the locality structure of the input?” [sent-12, score-0.728]
13 Consider a simplistic example where we have n input bits as features and the function f to learn uses k of the n bits. [sent-13, score-0.671]
14 Suppose we also know that the function is one of two candidates: f1 and f2, but that these are two otherwise complicated functions. [sent-14, score-0.166]
15 Then, finding which k of the n bits to use might require an O(n choose k) = O(n!/(k!(n-k)!)) computation. [sent-15, score-0.055]
16 This suggests that telling the algorithm which features are “near” which other features can be very helpful in solving the problem, at least computationally. [sent-20, score-0.354]
17 There are several natural questions which arise: How do we specify locality? [sent-21, score-0.098]
18 What are natural algorithms subject to locality? [sent-24, score-0.177]
19 It seems most practical existing learning algorithms using locality have some form of sweeping operation over local neighborhoods. [sent-25, score-0.824]
20 Can a general purpose locality aware algorithm perform well on a broad variety of different tasks? [sent-27, score-0.892]
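Two of the points above can be checked with small sketches. First, the permutation-invariance remark: the snippet below is my illustration, not the post's, and it uses ordinary least squares as a stand-in for the permutation-invariant learners mentioned (SVMs, neural networks, decision trees). Permuting the columns of the input merely permutes the learned weights, so the achievable fit is identical.

```python
# Sketch of (near) permutation invariance: shuffling the input features of a
# least-squares linear predictor reorders its weights but does not change the fit.
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 10))                        # 200 examples, 10 features
y = X @ rng.random(10) + 0.01 * rng.random(200)  # a noisy linear target

perm = rng.permutation(10)                       # an arbitrary reordering of the features
w_orig, res_orig, *_ = np.linalg.lstsq(X, y, rcond=None)
w_perm, res_perm, *_ = np.linalg.lstsq(X[:, perm], y, rcond=None)

print(np.allclose(res_orig, res_perm))           # True: identical residual error
print(np.allclose(w_orig[perm], w_perm))         # True: the weights are just reordered
```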
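Second, the combinatorial point in the n-bit example: with no locality information there are n choose k candidate feature subsets, while if the k relevant bits are known to sit in a contiguous window only n - k + 1 positions need to be checked. The numbers below are my own illustration of the gap, not figures from the post.

```python
# Rough count of the search space with and without locality, assuming the k
# relevant bits form one contiguous window (an illustrative assumption).
from math import comb

n, k = 100, 5
print(comb(n, k))   # 75287520 subsets of k features out of n
print(n - k + 1)    # 96 contiguous windows of width k
```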
wordName wordTfidf (topN-words)
[('locality', 0.605), ('vector', 0.281), ('input', 0.225), ('window', 0.173), ('neighborhood', 0.16), ('box', 0.144), ('neural', 0.135), ('convolutional', 0.129), ('matrix', 0.119), ('graph', 0.112), ('features', 0.11), ('bits', 0.104), ('vision', 0.104), ('purpose', 0.102), ('suppose', 0.101), ('function', 0.099), ('natural', 0.098), ('hand', 0.091), ('networks', 0.087), ('categorical', 0.086), ('invariant', 0.08), ('algorithms', 0.079), ('simplistic', 0.076), ('camera', 0.076), ('telling', 0.076), ('discrete', 0.076), ('fed', 0.076), ('operation', 0.076), ('surrounding', 0.076), ('tagging', 0.072), ('variety', 0.072), ('candidates', 0.072), ('supposed', 0.072), ('raw', 0.069), ('languages', 0.067), ('complicated', 0.067), ('knows', 0.065), ('using', 0.064), ('arise', 0.063), ('restricted', 0.063), ('tasks', 0.06), ('algorithm', 0.058), ('take', 0.057), ('image', 0.057), ('date', 0.057), ('learn', 0.057), ('use', 0.055), ('broad', 0.055), ('attempts', 0.055), ('succesful', 0.055)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
2 0.15569611 201 hunch net-2006-08-07-The Call of the Deep
Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a Gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun’s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number
3 0.11849312 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) can not concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees, depending on who you ask) can’t learn XOR very easily.
4 0.11708339 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
Introduction: There are a number of learning algorithms which explicitly incorporate randomness into their execution. This includes, amongst others: Neural Networks. Neural networks use randomization to assign initial weights. Boltzmann Machines / Deep Belief Networks. Boltzmann machines are something like a stochastic version of multinode logistic regression. The use of randomness is more essential in Boltzmann machines, because the predicted value at test time also uses randomness. Bagging. Bagging is a process where a learning algorithm is run several different times on several different datasets, creating a final predictor which makes a majority vote. Policy descent. Several algorithms in reinforcement learning such as Conservative Policy Iteration use random bits to create stochastic policies. Experts algorithms. Randomized weighted majority use random bits as a part of the prediction process to achieve better theoretical guarantees. A basic question is: “Should there
5 0.11625649 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Adam Coates, Honglak Lee, Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization, Adam Coates, Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art pe
6 0.1115444 253 hunch net-2007-07-06-Idempotent-capable Predictors
7 0.11037365 6 hunch net-2005-01-27-Learning Complete Problems
8 0.10934176 308 hunch net-2008-07-06-To Dual or Not
9 0.10853758 16 hunch net-2005-02-09-Intuitions from applied learning
10 0.10793605 131 hunch net-2005-11-16-The Everything Ensemble Edge
11 0.10313762 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
12 0.095366977 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
13 0.092664957 347 hunch net-2009-03-26-Machine Learning is too easy
14 0.091081284 160 hunch net-2006-03-02-Why do people count for learning?
15 0.090440325 84 hunch net-2005-06-22-Languages of Learning
16 0.090399735 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
17 0.088074297 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?
18 0.087568752 235 hunch net-2007-03-03-All Models of Learning have Flaws
19 0.085871853 262 hunch net-2007-09-16-Optimizing Machine Learning Programs
20 0.083628148 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
topicId topicWeight
[(0, 0.184), (1, 0.106), (2, -0.035), (3, -0.007), (4, 0.06), (5, -0.034), (6, -0.05), (7, 0.039), (8, 0.07), (9, -0.028), (10, -0.133), (11, -0.081), (12, -0.023), (13, -0.09), (14, -0.008), (15, 0.103), (16, 0.018), (17, -0.012), (18, -0.08), (19, 0.071), (20, 0.042), (21, 0.04), (22, 0.074), (23, -0.007), (24, -0.001), (25, -0.0), (26, -0.012), (27, 0.001), (28, -0.047), (29, 0.002), (30, -0.078), (31, -0.018), (32, 0.033), (33, 0.043), (34, -0.007), (35, 0.042), (36, -0.016), (37, 0.046), (38, -0.01), (39, -0.041), (40, -0.002), (41, 0.022), (42, 0.029), (43, -0.004), (44, -0.08), (45, -0.151), (46, 0.046), (47, 0.004), (48, -0.022), (49, 0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.95033896 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
2 0.74104792 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
Introduction: There are a number of learning algorithms which explicitly incorporate randomness into their execution. This includes, amongst others: Neural Networks. Neural networks use randomization to assign initial weights. Boltzmann Machines / Deep Belief Networks. Boltzmann machines are something like a stochastic version of multinode logistic regression. The use of randomness is more essential in Boltzmann machines, because the predicted value at test time also uses randomness. Bagging. Bagging is a process where a learning algorithm is run several different times on several different datasets, creating a final predictor which makes a majority vote. Policy descent. Several algorithms in reinforcement learning such as Conservative Policy Iteration use random bits to create stochastic policies. Experts algorithms. Randomized weighted majority use random bits as a part of the prediction process to achieve better theoretical guarantees. A basic question is: “Should there
3 0.72117031 253 hunch net-2007-07-06-Idempotent-capable Predictors
Introduction: One way to distinguish different learning algorithms is by their ability or inability to easily use an input variable as the predicted output. This is desirable for at least two reasons: Modularity If we want to build complex learning systems via reuse of a subsystem, it’s important to have compatible I/O. “Prior” knowledge Machine learning is often applied in situations where we do have some knowledge of what the right solution is, often in the form of an existing system. In such situations, it’s good to start with a learning algorithm that can be at least as good as any existing system. When doing classification, most learning algorithms can do this. For example, a decision tree can split on a feature, and then classify. The real differences come up when we attempt regression. Many of the algorithms we know and commonly use are not idempotent predictors. Logistic regressors can not be idempotent, because all input features are mapped through a nonlinearity.
4 0.69676399 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) can not concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees, depending on who you ask) can’t learn XOR very easily.
5 0.69241053 201 hunch net-2006-08-07-The Call of the Deep
Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a Gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun’s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number
6 0.67545581 348 hunch net-2009-04-02-Asymmophobia
7 0.65733039 16 hunch net-2005-02-09-Intuitions from applied learning
8 0.62268686 6 hunch net-2005-01-27-Learning Complete Problems
9 0.6181218 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?
10 0.60147363 84 hunch net-2005-06-22-Languages of Learning
11 0.59124273 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
12 0.58458018 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity
13 0.57081747 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
14 0.55230534 308 hunch net-2008-07-06-To Dual or Not
15 0.53726536 197 hunch net-2006-07-17-A Winner
16 0.53489226 164 hunch net-2006-03-17-Multitask learning is Black-Boxable
17 0.52204144 131 hunch net-2005-11-16-The Everything Ensemble Edge
18 0.52036417 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
19 0.51668638 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations
20 0.51639104 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
topicId topicWeight
[(3, 0.031), (27, 0.221), (38, 0.034), (48, 0.011), (53, 0.176), (55, 0.088), (71, 0.256), (94, 0.056), (95, 0.022)]
simIndex simValue blogId blogTitle
1 0.92869657 161 hunch net-2006-03-05-“Structural” Learning
Introduction: Fernando Pereira pointed out Ando and Zhang’s paper on “structural” learning. Structural learning is multitask learning on subproblems created from unlabeled data. The basic idea is to take a look at the unlabeled data and create many supervised problems. On text data, which they test on, these subproblems might be of the form “Given surrounding words predict the middle word”. The hope here is that successfully predicting on these subproblems is relevant to the prediction of your core problem. In the long run, the precise mechanism used (essentially, linear predictors with parameters tied by a common matrix) and the precise problems formed may not be critical. What seems critical is that the hope is realized: the technique provides a significant edge in practice. Some basic questions about this approach are: Are there effective automated mechanisms for creating the subproblems? Is it necessary to use a shared representation?
2 0.90890568 417 hunch net-2010-11-18-ICML 2011 – Call for Tutorials
Introduction: I would like to encourage people to consider giving a tutorial at next year’s ICML. The ideal tutorial attracts a wide audience, provides a gentle and easily taught introduction to the chosen research area, and also covers the most important contributions in depth. Submissions are due January 14 (about two weeks before paper deadline). http://www.icml-2011.org/tutorials.php Regards, Ulf
same-blog 3 0.88623631 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
4 0.87858069 147 hunch net-2006-01-08-Debugging Your Brain
Introduction: One part of doing research is debugging your understanding of reality. This is hard work: How do you even discover where you misunderstand? If you discover a misunderstanding, how do you go about removing it? The process of debugging computer programs is quite analogous to debugging reality misunderstandings. This is natural—a bug in a computer program is a misunderstanding between you and the computer about what you said. Many of the familiar techniques from debugging have exact parallels. Details When programming, there are often signs that some bug exists like: “the graph my program output is shifted a little bit” = maybe you have an indexing error. In debugging yourself, we often have some impression that something is “not right”. These impressions should be addressed directly and immediately. (Some people have the habit of suppressing worries in favor of excess certainty. That’s not healthy for research.) Corner Cases A “corner case” is an input to a program wh
5 0.7969895 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
Introduction: Suppose you have a dataset with 2 terafeatures (we only count nonzero entries in a datamatrix), and want to learn a good linear predictor in a reasonable amount of time. How do you do it? As a learning theorist, the first thing you do is pray that this is too much data for the number of parameters—but that’s not the case, there are around 16 billion examples, 16 million parameters, and people really care about a high quality predictor, so subsampling is not a good strategy. Alekh visited us last summer, and we had a breakthrough (see here for details), coming up with the first learning algorithm I’ve seen that is provably faster than any future single machine learning algorithm. The proof of this is simple: We can output an optimal-up-to-precision linear predictor faster than the data can be streamed through the network interface of any single machine involved in the computation. It is necessary but not sufficient to have an effective communication infrastructure. It is ne
6 0.74047297 201 hunch net-2006-08-07-The Call of the Deep
7 0.72307706 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
8 0.72044283 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
9 0.71987319 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
10 0.71612382 134 hunch net-2005-12-01-The Webscience Future
11 0.71528357 131 hunch net-2005-11-16-The Everything Ensemble Edge
12 0.71503836 370 hunch net-2009-09-18-Necessary and Sufficient Research
13 0.71471256 141 hunch net-2005-12-17-Workshops as Franchise Conferences
14 0.71061468 151 hunch net-2006-01-25-1 year
15 0.7083438 19 hunch net-2005-02-14-Clever Methods of Overfitting
16 0.70771635 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
17 0.70768815 358 hunch net-2009-06-01-Multitask Poisoning
18 0.70739079 6 hunch net-2005-01-27-Learning Complete Problems
19 0.70724523 347 hunch net-2009-03-26-Machine Learning is too easy
20 0.70631593 297 hunch net-2008-04-22-Taking the next step