hunch_net hunch_net-2005 hunch_net-2005-16 knowledge-graph by maker-knowledge-mining

16 hunch net-2005-02-09-Intuitions from applied learning


meta info for this blog

Source: html

Introduction: Since learning is far from an exact science, it’s good to pay attention to basic intuitions of applied learning. Here are a few I’ve collected. Integration: In Bayesian learning, the posterior is computed by an integral, and the optimal thing to do is to predict according to this integral. This phenomenon seems to be far more general. Bagging, Boosting, SVMs, and Neural Networks all take advantage of this idea to some extent. The phenomenon is more general: you can average over many different classification predictors to improve performance. Sources: Zoubin, Caruana. Differentiation: Different pieces of an average should differentiate to achieve good performance by different methods. This is known as the ‘symmetry breaking’ problem for neural networks, and it’s why weights are initialized randomly. Boosting explicitly attempts to achieve good differentiation by creating new, different learning problems. Sources: Yann LeCun, Phil Long. Deep Representation: Ha
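Where the introduction says you can average over many different classification predictors to improve performance, a minimal sketch of that idea follows. It is not code from the post; the models, the synthetic data, and the equal-weight probability averaging are illustrative assumptions.

```python
# Sketch of the "Integration" intuition: average the predictions of several
# different classifiers. Models and data are hypothetical choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

models = [LogisticRegression(max_iter=1000),
          GaussianNB(),
          DecisionTreeClassifier(max_depth=5, random_state=0)]

# Average the predicted class-1 probabilities across the fitted models.
probs = []
for m in models:
    m.fit(X_train, y_train)
    probs.append(m.predict_proba(X_test)[:, 1])
avg = np.mean(probs, axis=0)
ensemble_pred = (avg >= 0.5).astype(int)

print("ensemble accuracy:", (ensemble_pred == y_test).mean())
```

Weighted or Bayesian averaging refines the same idea; the uniform average here is just the simplest instance.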


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Since learning is far from an exact science, it’s good to pay attention to basic intuitions of applied learning. [sent-1, score-0.486]

2 Integration In Bayesian learning, the posterior is computed by an integral, and the optimal thing to do is to predict according to this integral. [sent-3, score-0.194]

3 Bagging, Boosting, SVMs, and Neural Networks all take advantage of this idea to some extent. [sent-5, score-0.297]

4 The phenomenon is more general: you can average over many different classification predictors to improve performance. [sent-6, score-0.438]

5 Sources: Zoubin, Caruana. Differentiation: Different pieces of an average should differentiate to achieve good performance by different methods. [sent-7, score-0.7]

6 This is known as the ‘symmetry breaking’ problem for neural networks, and it’s why weights are initialized randomly (see the sketch after this list). [sent-8, score-0.274]

7 Boosting explicitly attempts to achieve good differentiation by creating new, different learning problems. [sent-9, score-0.527]

8 Sources: Yann LeCun, Phil Long. Deep Representation: Having a deep representation is necessary for a good general learner. [sent-10, score-0.427]

9 Decision Trees and Convolutional neural networks take advantage of this. [sent-11, score-0.66]

10 SVMs get around it by allowing the user to engineer knowledge into the kernel. [sent-12, score-0.202]

11 Boosting and Bagging rely on another algorithm for this. [sent-13, score-0.084]

12 Sources: Yann LeCun. Fine Representation of Bias: Many learning theory applications use just a coarse representation of bias, such as “function in the hypothesis class or not”. [sent-14, score-0.541]

13 In practice, significantly better performance is achieved from a more finely tuned bias. [sent-15, score-0.266]

14 Bayesian learning has this built in with a prior. [sent-16, score-0.118]

15 Other techniques can take advantage of this as well. [sent-17, score-0.297]
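Sentences 5-7 describe the symmetry-breaking problem: if the hidden units of a network start out identical, gradient descent updates them identically and they never differentiate. The sketch below is a toy illustration under assumed choices (a tiny two-layer numpy network, squared error, made-up data); it is not from the post.

```python
# Symmetry breaking sketch: identical initial weights keep hidden units
# identical under gradient descent; random initialization lets them diverge.
# Network, data, and loss are hypothetical choices for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # toy inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float)     # toy target

def train(W1, w2, steps=300, lr=0.1):
    """Plain gradient descent on squared error for a 3-4-1 tanh network."""
    for _ in range(steps):
        h = np.tanh(X @ W1)                   # hidden activations, shape (64, 4)
        err = h @ w2 - y                      # output error
        grad_w2 = h.T @ err / len(X)
        grad_W1 = X.T @ ((err[:, None] * w2) * (1.0 - h ** 2)) / len(X)
        w2 = w2 - lr * grad_w2
        W1 = W1 - lr * grad_W1
    return W1

# Identical initial weights: every hidden unit receives the same gradient forever.
W1_same = train(np.full((3, 4), 0.5), np.full(4, 0.5))
# Random initial weights break the tie, so units can specialize.
W1_rand = train(rng.normal(scale=0.5, size=(3, 4)), rng.normal(scale=0.5, size=4))

print("spread across units, identical init:", np.ptp(W1_same, axis=1).max())  # stays 0
print("spread across units, random init:   ", np.ptp(W1_rand, axis=1).max())  # > 0
```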


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sources', 0.271), ('differentiation', 0.266), ('representation', 0.243), ('bagging', 0.237), ('boosting', 0.21), ('zoubin', 0.207), ('phenomena', 0.197), ('neural', 0.185), ('advantage', 0.18), ('networks', 0.178), ('svms', 0.172), ('lecun', 0.163), ('yann', 0.15), ('bias', 0.128), ('different', 0.122), ('average', 0.119), ('builtin', 0.118), ('engineer', 0.118), ('take', 0.117), ('achieve', 0.11), ('computed', 0.11), ('differentiate', 0.11), ('deep', 0.108), ('breaking', 0.103), ('caruana', 0.103), ('phil', 0.103), ('coarse', 0.099), ('symmetry', 0.099), ('bayesian', 0.098), ('integral', 0.095), ('tuned', 0.095), ('performance', 0.093), ('far', 0.093), ('convolutional', 0.089), ('weights', 0.089), ('intuitions', 0.086), ('integration', 0.084), ('posterior', 0.084), ('rely', 0.084), ('user', 0.084), ('exact', 0.08), ('achieved', 0.078), ('fine', 0.078), ('pay', 0.077), ('good', 0.076), ('attempts', 0.075), ('attention', 0.074), ('hypothesis', 0.071), ('pieces', 0.07), ('trees', 0.069)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 16 hunch net-2005-02-09-Intuitions from applied learning


2 0.18834843 201 hunch net-2006-08-07-The Call of the Deep

Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun ‘s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number

3 0.16568185 348 hunch net-2009-04-02-Asymmophobia

Introduction: One striking feature of many machine learning algorithms is the gymnastics that designers go through to avoid symmetry breaking. In the most basic form of machine learning, there are labeled examples composed of features. Each of these can be treated symmetrically or asymmetrically by algorithms. feature symmetry Every feature is treated the same. In gradient update rules, the same update is applied whether the feature is first or last. In metric-based predictions, every feature is just as important in computing the distance. example symmetry Every example is treated the same. Batch learning algorithms are great exemplars of this approach. label symmetry Every label is treated the same. This is particularly noticeable in multiclass classification systems which predict according to arg max_l w_l x, but it occurs in many other places as well. Empirically, breaking symmetry well seems to yield great algorithms. feature asymmetry For those who like t

4 0.16372432 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML , it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , Adam Coates , Honglak Lee , Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , Adam Coates , Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art pe

5 0.14076078 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms

Introduction: There are a number of learning algorithms which explicitly incorporate randomness into their execution. This includes, amongst others: Neural Networks. Neural networks use randomization to assign initial weights. Boltzmann Machines/Deep Belief Networks. Boltzmann machines are something like a stochastic version of multinode logistic regression. The use of randomness is more essential in Boltzmann machines, because the predicted value at test time also uses randomness. Bagging. Bagging is a process where a learning algorithm is run several different times on several different datasets, creating a final predictor which makes a majority vote (a minimal bagging sketch follows this list). Policy descent. Several algorithms in reinforcement learning such as Conservative Policy Iteration use random bits to create stochastic policies. Experts algorithms. Randomized weighted majority uses random bits as a part of the prediction process to achieve better theoretical guarantees. A basic question is: “Should there

6 0.12485612 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

7 0.12267027 131 hunch net-2005-11-16-The Everything Ensemble Edge

8 0.1177882 382 hunch net-2009-12-09-Future Publication Models @ NIPS

9 0.11759502 179 hunch net-2006-05-16-The value of the orthodox view of Boosting

10 0.11284916 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning

11 0.1110186 483 hunch net-2013-06-10-The Large Scale Learning class notes

12 0.10853758 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

13 0.09852016 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

14 0.091941014 95 hunch net-2005-07-14-What Learning Theory might do

15 0.090828955 165 hunch net-2006-03-23-The Approximation Argument

16 0.0893379 423 hunch net-2011-02-02-User preferences for search engines

17 0.086953461 431 hunch net-2011-04-18-A paper not at Snowbird

18 0.085845955 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

19 0.085100137 235 hunch net-2007-03-03-All Models of Learning have Flaws

20 0.082411483 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
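The excerpt for blog 219 above describes bagging as running one learning algorithm on several different (bootstrap) datasets and taking a majority vote. A minimal sketch under assumed choices (a decision-tree base learner, synthetic data, 25 rounds) might look like this; it is not code from either post.

```python
# Minimal bagging sketch: train the same base learner on bootstrap resamples
# and predict by majority vote. Base learner and data are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=1)
X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]

rng = np.random.default_rng(1)
votes = np.zeros((len(X_test), 2))
for _ in range(25):
    idx = rng.integers(0, len(X_train), size=len(X_train))    # bootstrap sample
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    pred = tree.predict(X_test)
    votes[np.arange(len(X_test)), pred] += 1                  # tally the vote

bagged_pred = votes.argmax(axis=1)
single_pred = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).predict(X_test)
print("single tree accuracy:", (single_pred == y_test).mean())
print("bagged vote accuracy:", (bagged_pred == y_test).mean())
```

scikit-learn's BaggingClassifier packages the same loop; the explicit version above just shows where the randomness enters.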


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.17), (1, 0.073), (2, -0.01), (3, 0.003), (4, 0.058), (5, -0.017), (6, -0.059), (7, 0.058), (8, 0.114), (9, -0.069), (10, -0.111), (11, -0.096), (12, -0.032), (13, -0.126), (14, 0.041), (15, 0.171), (16, -0.003), (17, 0.023), (18, -0.122), (19, 0.041), (20, 0.038), (21, -0.062), (22, 0.047), (23, 0.007), (24, -0.057), (25, 0.013), (26, -0.013), (27, 0.107), (28, 0.004), (29, 0.096), (30, 0.016), (31, -0.024), (32, 0.015), (33, -0.091), (34, 0.034), (35, -0.06), (36, -0.022), (37, -0.006), (38, -0.059), (39, 0.073), (40, 0.028), (41, 0.038), (42, -0.022), (43, -0.026), (44, 0.075), (45, -0.114), (46, 0.037), (47, -0.137), (48, 0.003), (49, -0.009)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97058839 16 hunch net-2005-02-09-Intuitions from applied learning


2 0.75732547 201 hunch net-2006-08-07-The Call of the Deep

Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun ‘s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number

3 0.69324368 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML , it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , Adam Coates , Honglak Lee , Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , Adam Coates , Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art pe

4 0.65855217 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms

Introduction: There are a number of learning algorithms which explicitly incorporate randomness into their execution. This includes, amongst others: Neural Networks. Neural networks use randomization to assign initial weights. Boltzmann Machines/Deep Belief Networks. Boltzmann machines are something like a stochastic version of multinode logistic regression. The use of randomness is more essential in Boltzmann machines, because the predicted value at test time also uses randomness. Bagging. Bagging is a process where a learning algorithm is run several different times on several different datasets, creating a final predictor which makes a majority vote. Policy descent. Several algorithms in reinforcement learning such as Conservative Policy Iteration use random bits to create stochastic policies. Experts algorithms. Randomized weighted majority uses random bits as a part of the prediction process to achieve better theoretical guarantees. A basic question is: “Should there

5 0.63964444 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning

Introduction: About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of Mnist which are artificially hardened so as to make them solvable only by deep learning algorithms. This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing wha

6 0.62051034 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

7 0.5668174 348 hunch net-2009-04-02-Asymmophobia

8 0.56269103 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006

9 0.55750257 253 hunch net-2007-07-06-Idempotent-capable Predictors

10 0.55516726 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

11 0.53891975 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

12 0.538782 477 hunch net-2013-01-01-Deep Learning 2012

13 0.52107829 431 hunch net-2011-04-18-A paper not at Snowbird

14 0.48910528 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

15 0.46956572 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

16 0.46808362 179 hunch net-2006-05-16-The value of the orthodox view of Boosting

17 0.45806509 6 hunch net-2005-01-27-Learning Complete Problems

18 0.45470977 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?

19 0.45305672 131 hunch net-2005-11-16-The Everything Ensemble Edge

20 0.45100409 197 hunch net-2006-07-17-A Winner


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.116), (49, 0.034), (53, 0.698), (55, 0.035), (94, 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98987699 107 hunch net-2005-09-05-Site Update

Introduction: I tweaked the site in a number of ways today, including: Updating to WordPress 1.5. Installing and heavily tweaking the Geekniche theme. Update: I switched back to a tweaked version of the old theme. Adding the Customizable Post Listings plugin. Installing the StatTraq plugin. Updating some of the links. I particularly recommend looking at the computer research policy blog. Adding threaded comments . This doesn’t thread old comments obviously, but the extra structure may be helpful for new ones. Overall, I think this is an improvement, and it addresses a few of my earlier problems . If you have any difficulties or anything seems “not quite right”, please speak up. A few other tweaks to the site may happen in the near future.

2 0.98293877 56 hunch net-2005-04-14-Families of Learning Theory Statements

Introduction: The diagram above shows a very broad viewpoint of learning theory. Past->Past: some prediction algorithm A does almost as well as any of a set of algorithms (example: Weighted Majority). Past->Future: assuming independent samples, past performance predicts future performance (examples: PAC analysis, ERM analysis). Future->Future: future prediction performance on subproblems implies future prediction performance using algorithm A (examples: ECOC, Probing). A basic question is: Are there other varieties of statements of this type? Avrim noted that there are also “arrows between arrows”: generic methods for transforming between Past->Past statements and Past->Future statements. Are there others?

same-blog 3 0.9813202 16 hunch net-2005-02-09-Intuitions from applied learning


4 0.94942808 91 hunch net-2005-07-10-Thinking the Unthought

Introduction: One thing common to much research is that the researcher must be the first person ever to have some thought. How do you think of something that has never been thought of? There seems to be no methodical manner of doing this, but there are some tricks. The easiest method is to just have some connection come to you. There is a trick here however: you should write it down and fill out the idea immediately because it can just as easily go away. A harder method is to set aside a block of time and simply think about an idea. Distraction elimination is essential here because thinking about the unthought is hard work which your mind will avoid. Another common method is in conversation. Sometimes the process of verbalizing implies new ideas come up and sometimes whoever you are talking to replies just the right way. This method is dangerous though—you must speak to someone who helps you think rather than someone who occupies your thoughts. Try to rephrase the problem so the a

5 0.94930059 2 hunch net-2005-01-24-Holy grails of machine learning?

Introduction: Let me kick things off by posing this question to ML researchers: What do you think are some important holy grails of machine learning? For example: – “A classifier with SVM-level performance but much more scalable” – “Practical confidence bounds (or learning bounds) for classification” – “A reinforcement learning algorithm that can handle the ___ problem” – “Understanding theoretically why ___ works so well in practice” etc. I pose this question because I believe that when goals are stated explicitly and well (thus providing clarity as well as opening up the problems to more people), rather than left implicit, they are likely to be achieved much more quickly. I would also like to know more about the internal goals of the various machine learning sub-areas (theory, kernel methods, graphical models, reinforcement learning, etc) as stated by people in these respective areas. This could help people cross sub-areas.

6 0.93458927 145 hunch net-2005-12-29-Deadline Season

7 0.92743069 367 hunch net-2009-08-16-Centmail comments

8 0.87668294 6 hunch net-2005-01-27-Learning Complete Problems

9 0.7314412 21 hunch net-2005-02-17-Learning Research Programs

10 0.60572976 151 hunch net-2006-01-25-1 year

11 0.60487109 201 hunch net-2006-08-07-The Call of the Deep

12 0.60042429 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?

13 0.59054834 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

14 0.58897203 265 hunch net-2007-10-14-NIPS workshp: Learning Problem Design

15 0.58819956 141 hunch net-2005-12-17-Workshops as Franchise Conferences

16 0.5545451 283 hunch net-2008-01-07-2008 Summer Machine Learning Conference Schedule

17 0.54873848 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

18 0.54749209 321 hunch net-2008-10-19-NIPS 2008 workshop on Kernel Learning

19 0.54329538 292 hunch net-2008-03-15-COLT Open Problems

20 0.53683364 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning