hunch_net hunch_net-2010 hunch_net-2010-407 knowledge-graph by maker-knowledge-mining

407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning


meta info for this blog

Source: html

Introduction: About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of Mnist which are artificially hardened so as to make them solvable only by deep learning algorithms. This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing what works.
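
To make the claim concrete, here is a minimal sketch of this kind of comparison using scikit-learn's gradient-boosted trees on its small digits dataset as a stand-in for Mnist; the library, dataset, and parameters are my own assumptions, not Ping Li's actual setup or his hardened Mnist variants.

```python
# Minimal sketch (not the experiment from the paper): boosted decision trees
# on a small digit-classification task, using scikit-learn.
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Each depth-3 tree is weak on its own, but several hundred boosted together
# yield a decision rule that is highly nonlinear in the raw pixels.
gbt = GradientBoostingClassifier(
    n_estimators=300, max_depth=3, learning_rate=0.1)
gbt.fit(X_train, y_train)
print("test accuracy:", gbt.score(X_test, y_test))
```

That nonlinearity in the input representation is the sense in which boosted trees count as "deep" here.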


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. [sent-1, score-1.384]

2 Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of Mnist which are artificially hardened so as to make them solvable only by deep learning algorithms. [sent-2, score-2.828]

3 This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. [sent-3, score-0.955]

4 In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. [sent-5, score-0.945]

5 Geoff Hinton once told me that the great thing about deep belief networks is that they work. [sent-6, score-0.957]

6 I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing what works. [sent-7, score-0.628]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('deep', 0.397), ('ping', 0.348), ('boosted', 0.322), ('trees', 0.203), ('belief', 0.184), ('decision', 0.179), ('networks', 0.175), ('valuing', 0.174), ('artificially', 0.161), ('versions', 0.145), ('hinton', 0.145), ('beat', 0.139), ('told', 0.13), ('objective', 0.13), ('solvable', 0.127), ('li', 0.127), ('geoff', 0.123), ('drawbacks', 0.123), ('nonlinear', 0.123), ('proved', 0.12), ('published', 0.108), ('sorts', 0.108), ('showing', 0.106), ('empirically', 0.096), ('uai', 0.094), ('accept', 0.094), ('ago', 0.093), ('correct', 0.091), ('input', 0.091), ('tree', 0.09), ('definition', 0.089), ('decisions', 0.085), ('getting', 0.084), ('achieve', 0.081), ('probably', 0.081), ('reviewers', 0.081), ('difficulty', 0.079), ('step', 0.079), ('years', 0.079), ('pretty', 0.079), ('algorithm', 0.078), ('make', 0.077), ('ability', 0.072), ('substantially', 0.072), ('thing', 0.071), ('experience', 0.069), ('standard', 0.068), ('works', 0.068), ('computational', 0.067), ('hope', 0.063)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning

Introduction: About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of Mnist which are artificially hardened so as to make them solvable only by deep learning algorithms. This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing what works.

2 0.32483643 201 hunch net-2006-08-07-The Call of the Deep

Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a Gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun’s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number

3 0.21811172 131 hunch net-2005-11-16-The Everything Ensemble Edge

Introduction: Rich Caruana, Alexandru Niculescu, Geoff Crew, and Alex Ksikes have done a lot of empirical testing which shows that using all methods to make a prediction is more powerful than using any single method. This is in rough agreement with the Bayesian way of solving problems, but based upon a different (essentially empirical) motivation. A rough summary is: Take all of {decision trees, boosted decision trees, bagged decision trees, boosted decision stumps, K nearest neighbors, neural networks, SVM} with all reasonable parameter settings. Run the methods on each of 8 problems with a large test set, calibrating margins using either sigmoid fitting or isotonic regression. For each loss of {accuracy, area under the ROC curve, cross entropy, squared error, etc…} evaluate the average performance of the method. A series of conclusions can be drawn from the observations. (Calibrated) boosted decision trees appear to perform best, in general although support v
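
As a hedged illustration of the calibration step mentioned in this snippet, the sketch below uses scikit-learn's CalibratedClassifierCV with isotonic regression on a small binary dataset; the Caruana et al. study predates this library, so this shows the idea rather than their code.

```python
# Sketch: calibrating a boosted decision tree model with isotonic regression.
# The dataset and parameters are placeholders, not those of the study.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# method="isotonic" learns a monotone map from raw scores to probabilities;
# method="sigmoid" would perform Platt-style sigmoid fitting instead.
model = CalibratedClassifierCV(
    GradientBoostingClassifier(n_estimators=200, max_depth=3),
    method="isotonic", cv=5)
model.fit(X_tr, y_tr)
print(model.predict_proba(X_te)[:3])
```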

4 0.18280581 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) can not concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees depending on who you ask) can’t learn XOR very easily.
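
A toy illustration of the depth point (separate from the AC0 result cited above): no single linear threshold of the inputs computes XOR, but a two-layer threshold circuit represents it exactly. The construction below is one standard choice, not anything from the post.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR: no single linear threshold of x1, x2 matches it

def xor_two_layer(x):
    # Depth-2 threshold circuit: h1 = OR(x1, x2), h2 = AND(x1, x2),
    # output = h1 AND NOT h2.
    h1 = (x[:, 0] + x[:, 1] >= 1).astype(int)
    h2 = (x[:, 0] + x[:, 1] >= 2).astype(int)
    return (h1 - h2 >= 1).astype(int)

print(xor_two_layer(X))  # [0 1 1 0], matching y
```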

5 0.14211453 477 hunch net-2013-01-01-Deep Learning 2012

Introduction: 2012 was a tumultuous year for me, but it was undeniably a great year for deep learning efforts. Signs of this include: Winning a Kaggle competition. Wide adoption of deep learning for speech recognition. Significant industry support. Gains in image recognition. This is a rare event in research: a significant capability breakout. Congratulations are definitely in order for those who managed to achieve it. At this point, deep learning algorithms seem like a choice undeniably worth investigating for real applications with significant data.

6 0.12628831 431 hunch net-2011-04-18-A paper not at Snowbird

7 0.12143345 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

8 0.11739264 329 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates

9 0.11284916 16 hunch net-2005-02-09-Intuitions from applied learning

10 0.11035947 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006

11 0.10482782 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

12 0.094711013 466 hunch net-2012-06-05-ICML acceptance statistics

13 0.089779384 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms

14 0.088370785 103 hunch net-2005-08-18-SVM Adaptability

15 0.088323504 343 hunch net-2009-02-18-Decision by Vetocracy

16 0.088049464 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

17 0.085123904 403 hunch net-2010-07-18-ICML & COLT 2010

18 0.082534291 304 hunch net-2008-06-27-Reviewing Horror Stories

19 0.080471791 424 hunch net-2011-02-17-What does Watson mean?

20 0.076895937 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.152), (1, 0.026), (2, 0.032), (3, 0.007), (4, 0.117), (5, -0.015), (6, -0.045), (7, 0.051), (8, 0.056), (9, -0.061), (10, -0.086), (11, -0.092), (12, -0.105), (13, -0.186), (14, -0.03), (15, 0.304), (16, 0.033), (17, 0.069), (18, -0.127), (19, 0.108), (20, -0.002), (21, 0.103), (22, 0.014), (23, 0.033), (24, -0.093), (25, -0.002), (26, -0.02), (27, -0.027), (28, 0.094), (29, 0.11), (30, 0.099), (31, 0.022), (32, -0.03), (33, 0.053), (34, 0.123), (35, -0.05), (36, 0.088), (37, -0.019), (38, -0.031), (39, 0.066), (40, -0.045), (41, 0.064), (42, -0.121), (43, -0.035), (44, 0.053), (45, 0.086), (46, -0.007), (47, -0.02), (48, -0.022), (49, -0.066)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98631346 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning

Introduction: About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of Mnist which are artificially hardened so as to make them solvable only by deep learning algorithms. This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing what works.

2 0.86824697 201 hunch net-2006-08-07-The Call of the Deep

Introduction: Many learning algorithms used in practice are fairly simple. Viewed representationally, many prediction algorithms either compute a linear separator of basic features (perceptron, winnow, weighted majority, SVM) or perhaps a linear separator of slightly more complex features (2-layer neural networks or kernelized SVMs). Should we go beyond this, and start using “deep” representations? What is deep learning? Intuitively, deep learning is about learning to predict in ways which can involve complex dependencies between the input (observed) features. Specifying this more rigorously turns out to be rather difficult. Consider the following cases: SVM with Gaussian Kernel. This is not considered deep learning, because an SVM with a Gaussian kernel can’t succinctly represent certain decision surfaces. One of Yann LeCun’s examples is recognizing objects based on pixel values. An SVM will need a new support vector for each significantly different background. Since the number

3 0.7684378 477 hunch net-2013-01-01-Deep Learning 2012

Introduction: 2012 was a tumultuous year for me, but it was undeniably a great year for deep learning efforts. Signs of this include: Winning a Kaggle competition. Wide adoption of deep learning for speech recognition. Significant industry support. Gains in image recognition. This is a rare event in research: a significant capability breakout. Congratulations are definitely in order for those who managed to achieve it. At this point, deep learning algorithms seem like a choice undeniably worth investigating for real applications with significant data.

4 0.6880551 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Adam Coates, Honglak Lee, Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization, Adam Coates, Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art pe
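
A rough sketch of the single-layer pipeline this snippet refers to, in the spirit of Coates et al.: learn a dictionary with k-means, encode inputs with a simple soft-threshold ("triangle") activation, then train a linear classifier on top. The dataset and the omission of patch extraction, whitening, and pooling are my simplifications.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "Single layer": a dictionary of centroids learned without labels.
centroids = KMeans(n_clusters=100, n_init=10, random_state=0).fit(X_tr).cluster_centers_

def encode(X, C):
    # Triangle activation: how much closer a point is to each centroid
    # than its average centroid distance, with negative values clipped to 0.
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)

clf = LogisticRegression(max_iter=2000).fit(encode(X_tr, centroids), y_tr)
print("test accuracy:", clf.score(encode(X_te, centroids), y_te))
```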

5 0.66345608 431 hunch net-2011-04-18-A paper not at Snowbird

Introduction: Unfortunately, a scheduling failure meant I missed all of AIStat and most of the learning workshop, otherwise known as Snowbird, when it’s at Snowbird. At Snowbird, the talk on Sum-Product networks by Hoifung Poon stood out to me (Pedro Domingos is a coauthor). The basic point was that by appropriately constructing networks based on sums and products, the normalization problem in probabilistic models is eliminated, yielding a highly tractable yet flexible representation+learning algorithm. As an algorithm, this is noticeably cleaner than deep belief networks with a claim to being an order of magnitude faster and working better on an image completion task. Snowbird doesn’t have real papers—just the abstract above. I look forward to seeing the paper. (added: Rodrigo points out the deep learning workshop draft.)
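
A toy example of the normalization point (my own construction, far simpler than the networks in the talk): if every sum node's weights sum to one and every leaf is a normalized distribution, the root of a sum-product network evaluates directly to a probability, so no partition function is ever computed.

```python
from itertools import product

# Two-component sum node over products of Bernoulli leaves for x1 and x2.
components = [
    {"w": 0.7, "p1": 0.9, "p2": 0.2},
    {"w": 0.3, "p1": 0.1, "p2": 0.8},
]

def spn(x1, x2):
    # Root = sum node; children = product nodes over leaf probabilities.
    return sum(c["w"]
               * (c["p1"] if x1 else 1.0 - c["p1"])
               * (c["p2"] if x2 else 1.0 - c["p2"])
               for c in components)

# The root values already sum to 1 over all assignments; no normalization step.
print(sum(spn(a, b) for a, b in product([0, 1], repeat=2)))  # 1.0
```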

6 0.65810859 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

7 0.63041425 16 hunch net-2005-02-09-Intuitions from applied learning

8 0.61135519 224 hunch net-2006-12-12-Interesting Papers at NIPS 2006

9 0.53586322 329 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates

10 0.52878606 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms

11 0.46552846 131 hunch net-2005-11-16-The Everything Ensemble Edge

12 0.43827912 466 hunch net-2012-06-05-ICML acceptance statistics

13 0.40606496 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

14 0.37229395 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

15 0.34865388 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

16 0.34531364 490 hunch net-2013-11-09-Graduates and Postdocs

17 0.34324801 424 hunch net-2011-02-17-What does Watson mean?

18 0.34155765 456 hunch net-2012-02-24-ICML+50%

19 0.33641523 118 hunch net-2005-10-07-On-line learning of regular decision rules

20 0.33638537 266 hunch net-2007-10-15-NIPS workshops extended to 3 days


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(14, 0.341), (27, 0.251), (53, 0.157), (55, 0.123)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9387787 94 hunch net-2005-07-13-Text Entailment at AAAI

Introduction: Rajat Raina presented a paper on the technique they used for the PASCAL Recognizing Textual Entailment challenge. “Text entailment” is the problem of deciding if one sentence implies another. For example the previous sentence entails: Text entailment is a decision problem. One sentence can imply another. The challenge was of the form: given an original sentence and another sentence predict whether there was an entailment. All current techniques for predicting correctness of an entailment are at the “flail” stage—accuracies of around 58% where humans could achieve near 100% accuracy, so there is much room to improve. Apparently, there may be another PASCAL challenge on this problem in the near future.

same-blog 2 0.88683939 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning

Introduction: About 4 years ago, I speculated that decision trees qualify as a deep learning algorithm because they can make decisions which are substantially nonlinear in the input representation. Ping Li has proved this correct, empirically at UAI by showing that boosted decision trees can beat deep belief networks on versions of Mnist which are artificially hardened so as to make them solvable only by deep learning algorithms. This is an important point, because the ability to solve these sorts of problems is probably the best objective definition of a deep learning algorithm we have. I’m not that surprised. In my experience, if you can accept the computational drawbacks of a boosted decision tree, they can achieve pretty good performance. Geoff Hinton once told me that the great thing about deep belief networks is that they work. I understand that Ping had very substantial difficulty in getting this published, so I hope some reviewers step up to the standard of valuing what works.

3 0.82117593 380 hunch net-2009-11-29-AI Safety

Introduction: Dan Reeves introduced me to Michael Vassar who ran the Singularity Summit and educated me a bit on the subject of AI safety which the Singularity Institute has small grants for. I still believe that interstellar space travel is necessary for long term civilization survival, and the AI is necessary for interstellar space travel. On these grounds alone, we could judge that developing AI is much more safe than not. Nevertheless, there is a basic reasonable fear, as expressed by some commenters, that AI could go bad. A basic scenario starts with someone inventing an AI and telling it to make as much money as possible. The AI promptly starts trading in various markets to make money. To improve, it crafts a virus that takes over most of the world’s computers using it as a surveillance network so that it can always make the right decision. The AI also branches out into any form of distance work, taking over the entire outsourcing process for all jobs that are entirely di

4 0.80930686 471 hunch net-2012-08-24-Patterns for research in machine learning

Introduction: There are a handful of basic code patterns that I wish I was more aware of when I started research in machine learning. Each on its own may seem pointless, but collectively they go a long way towards making the typical research workflow more efficient. Here they are: Separate code from data. Separate input data, working data and output data. Save everything to disk frequently. Separate options from parameters. Do not use global variables. Record the options used to generate each run of the algorithm. Make it easy to sweep options. Make it easy to execute only portions of the code. Use checkpointing. Write demos and tests. Click here for discussion and examples for each item. Also see Charles Sutton’s and HackerNews’ thoughts on the same topic. My guess is that these patterns will not only be useful for machine learning, but also any other computational work that involves either a) processing large amounts of data, or b) algorithms that take a signif
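
As a small, hedged sketch of two of the patterns listed in this snippet (separating options from code and recording the options used for each run), with made-up names and layout; the post's own examples sit behind its "click here" link.

```python
import json
import pathlib
import time

def run_experiment(options, workdir="runs"):
    # One directory per run, so working data and outputs never mix across runs.
    run_dir = pathlib.Path(workdir) / time.strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    # Record the exact options alongside the outputs so the run is reproducible.
    (run_dir / "options.json").write_text(json.dumps(options, indent=2))
    # ... train the model using `options`, checkpointing into run_dir ...
    return run_dir

run_experiment({"learning_rate": 0.1, "n_estimators": 300, "seed": 0})
```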

5 0.77789575 430 hunch net-2011-04-11-The Heritage Health Prize

Introduction: The Heritage Health Prize is potentially the largest prediction prize yet at $3M, which is sure to get many people interested. Several elements of the competition may be worth discussing. The most straightforward way for HPN to deploy this predictor is in determining who to cover with insurance. This might easily cover the costs of running the contest itself, but the value to the health system of a whole is minimal, as people not covered still exist. While HPN itself is a provider network, they have active relationships with a number of insurance companies, and the right to resell any entrant. It’s worth keeping in mind that the research and development may nevertheless end up being useful in the longer term, especially as entrants also keep the right to their code. The judging metric is something I haven’t seen previously. If a patient has probability 0.5 of being in the hospital 0 days and probability 0.5 of being in the hospital ~53.6 days, the optimal prediction in e
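
The snippet above is cut off mid-sentence, but the flavor of the metric can still be illustrated. Assuming the evaluation is squared error in log(1 + days) space (my reading of the prize's published metric; treat it as an assumption here), the optimal single prediction for that 50/50 patient is far below the expected number of days:

```python
import math

# Patient is in hospital 0 days or ~53.6 days, each with probability 0.5.
outcomes = [0.0, 53.6]

expected_days = sum(outcomes) / 2                        # ~26.8: optimal under plain squared error
expected_log = sum(math.log1p(d) for d in outcomes) / 2  # ~2.0
best_log_space = math.expm1(expected_log)                # ~6.4 days: optimal under log-space squared error

print(expected_days, best_log_space)
```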

6 0.64251101 131 hunch net-2005-11-16-The Everything Ensemble Edge

7 0.64250857 201 hunch net-2006-08-07-The Call of the Deep

8 0.63246357 225 hunch net-2007-01-02-Retrospective

9 0.630481 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

10 0.62500614 458 hunch net-2012-03-06-COLT-ICML Open Questions and ICML Instructions

11 0.62465781 151 hunch net-2006-01-25-1 year

12 0.62387788 134 hunch net-2005-12-01-The Webscience Future

13 0.62135094 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

14 0.62074184 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

15 0.62034059 116 hunch net-2005-09-30-Research in conferences

16 0.61792392 370 hunch net-2009-09-18-Necessary and Sufficient Research

17 0.61785465 22 hunch net-2005-02-18-What it means to do research.

18 0.61603016 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?

19 0.61572832 437 hunch net-2011-07-10-ICML 2011 and the future

20 0.61570525 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning