hunch_net hunch_net-2005 hunch_net-2005-138 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Here is a set of papers that I found interesting (and why). A PAC-Bayes approach to the Set Covering Machine improves the set covering machine. The set covering machine approach is a new way to do classification characterized by a very close connection between theory and algorithm. At this point, the approach seems to be competing well with SVMs in about all dimensions: similar computational speed, similar accuracy, stronger learning theory guarantees, more general information source (a kernel has strictly more structure than a metric), and more sparsity. Developing a classification algorithm is not very easy, but the results so far are encouraging. Off-Road Obstacle Avoidance through End-to-End Learning and Learning Depth from Single Monocular Images both effectively showed that depth information can be predicted from camera images (using notably different techniques). This ability is strongly enabling because cameras are cheap, tiny, light, and potentially provide longer range distance information than the laser range finders people traditionally use.
sentIndex sentText sentNum sentScore
1 Here is a set of papers that I found interesting (and why). [sent-1, score-0.221]
2 A PAC-Bayes approach to the Set Covering Machine improves the set covering machine. [sent-2, score-0.518]
3 The set covering machine approach is a new way to do classification characterized by a very close connection between theory and algorithm. [sent-3, score-0.841]
4 At this point, the approach seems to be competing well with SVMs in about all dimensions: similar computational speed, similar accuracy, stronger learning theory guarantees, more general information source (a kernel has strictly more structure than a metric), and more sparsity. [sent-4, score-0.775]
5 Developing a classification algorithm is not very easy, but the results so far are encouraging. [sent-5, score-0.101]
6 Off-Road Obstacle Avoidance through End-to-End Learning and Learning Depth from Single Monocular Images both effectively showed that depth information can be predicted from camera images (using notably different techniques). [sent-6, score-0.896]
7 This ability is strongly enabling because cameras are cheap, tiny, light, and potentially provide longer range distance information than the laser range finders people traditionally use. [sent-7, score-1.069]
8 Roughly speaking, this implies that the perceptron approach can learn arbitrary (via the kernel) reasonably simple concepts from unbounded quantities of data. [sent-9, score-0.818]
9 (Feel free to add any that you found interesting.) [sent-11, score-0.091]
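A minimal sketch of what a kernel perceptron running on unbounded data under a memory budget can look like. The RBF kernel and the drop-the-oldest eviction rule below are illustrative assumptions, not the construction from any particular paper:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian (RBF) kernel; an illustrative choice of kernel.
    return np.exp(-gamma * np.sum((x - z) ** 2))

class BudgetKernelPerceptron:
    """Kernel perceptron with a fixed budget of support vectors, so it can
    run on unbounded streams with bounded memory. The eviction rule
    (drop the oldest support vector) is a simplification for illustration."""

    def __init__(self, budget=50, gamma=1.0):
        self.budget = budget
        self.gamma = gamma
        self.support = []   # stored mistake examples
        self.alphas = []    # their signed weights (here just the labels)

    def predict(self, x):
        score = sum(a * rbf(s, x, self.gamma)
                    for s, a in zip(self.support, self.alphas))
        return 1 if score >= 0 else -1

    def update(self, x, y):
        # Perceptron rule: store the example only when a mistake is made.
        if self.predict(x) != y:
            self.support.append(x)
            self.alphas.append(y)
            if len(self.support) > self.budget:  # enforce the memory budget
                self.support.pop(0)
                self.alphas.pop(0)
```

Streaming usage is just repeated calls to `update(x, y)`; memory never exceeds the budget no matter how long the stream runs.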
wordName wordTfidf (topN-words)
[('perceptron', 0.27), ('covering', 0.24), ('depth', 0.229), ('unbounded', 0.216), ('images', 0.185), ('memory', 0.175), ('range', 0.167), ('approach', 0.148), ('kernel', 0.136), ('set', 0.13), ('avoidance', 0.124), ('cameras', 0.124), ('obstacle', 0.124), ('thrun', 0.124), ('characterized', 0.114), ('provider', 0.114), ('tiny', 0.114), ('information', 0.109), ('laser', 0.108), ('sebastian', 0.108), ('kernelized', 0.108), ('connection', 0.108), ('camera', 0.108), ('competes', 0.108), ('strictly', 0.108), ('coarse', 0.103), ('grand', 0.103), ('cheap', 0.103), ('enabling', 0.103), ('classification', 0.101), ('notably', 0.099), ('concepts', 0.099), ('dimensions', 0.095), ('traditionally', 0.092), ('similar', 0.092), ('found', 0.091), ('light', 0.09), ('svms', 0.09), ('dasgupta', 0.09), ('competing', 0.09), ('functional', 0.087), ('darpa', 0.087), ('contains', 0.087), ('distance', 0.085), ('quantities', 0.085), ('budget', 0.085), ('developing', 0.085), ('proved', 0.085), ('predicted', 0.083), ('showed', 0.083)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 138 hunch net-2005-12-09-Some NIPS papers
2 0.11218802 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
Introduction: There were two papers at ICML presenting learning algorithms for a contextual bandit-style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require an exploration scavenging viewpoint to understand if the experimental assignment was nonrandom.) I strongly approve of these papers and further work in this setting and its variants, because I expect it to become more important than supervised learning. As a quick review, we are thinking about situations where repeatedly: The world reveals feature values (aka context information). A policy chooses an action. The world provides a reward. Sometimes this is done in an online fashion where the policy can change based on immediate feedback and sometimes it’s done in a batch setting where many samples are collected before the policy can change. If you haven’t spent time thinking about the setting, you might want to because there are many natural applications. I’m g
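A minimal sketch of the interaction protocol described in this entry; `world` and `policy` are hypothetical stand-in objects, not any real API:

```python
def contextual_bandit_loop(world, policy, rounds=1000):
    """Repeatedly: observe context, choose an action, observe the reward
    for that action only (the losses of the other actions stay hidden)."""
    history = []
    for t in range(rounds):
        x = world.context()     # the world reveals feature values
        a = policy.choose(x)    # a policy chooses an action
        r = world.reward(x, a)  # the world provides a reward for that action
        history.append((x, a, r))
        policy.learn(x, a, r)   # online variant; a batch variant would
                                # update only after many samples
    return history
```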
3 0.10996264 143 hunch net-2005-12-27-Automated Labeling
Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this: Unsupervised Learning. You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way. Active Learning. You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy. It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict
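The stated first step of automated labeling ("identify some subset of observed values to predict") admits a simple sketch: treat one observed value as the label and learn to predict it from the rest, so unlabeled data supplies its own supervision. The single-column choice and the `fit` signature here are assumptions for illustration:

```python
import numpy as np

def automated_labeling(X, target_col, fit):
    """Turn unlabeled data (rows of X) into a supervised problem by
    predicting one observed column from the others.
    `fit` is any supervised learner with the hypothetical signature
    fit(features, labels) -> prediction function."""
    y = X[:, target_col]                       # observed values become labels
    features = np.delete(X, target_col, axis=1)
    return fit(features, y)
```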
4 0.097113445 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) can not concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees depending on who you ask) can’t learn XOR very easily.
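The depth claim for XOR can be made concrete at depth one. A short standard argument (a sketch, not taken from the cited notes) shows that no single thresholded weighted sum computes XOR, while two layers suffice:

```latex
% Suppose f(x_1,x_2) = \mathbf{1}[w_1 x_1 + w_2 x_2 + b \ge 0] computes XOR.
\begin{align*}
f(0,0)=0 &\implies b < 0 \\
f(1,0)=1 &\implies w_1 + b \ge 0 \\
f(0,1)=1 &\implies w_2 + b \ge 0 \\
f(1,1)=0 &\implies w_1 + w_2 + b < 0
\end{align*}
% Adding the middle two inequalities gives w_1 + w_2 + 2b \ge 0, so
% w_1 + w_2 + b \ge -b > 0, contradicting the last line. Two layers suffice:
% x_1 \oplus x_2 = \mathbf{1}[x_1 + x_2 \ge 1] - \mathbf{1}[x_1 + x_2 \ge 2].
```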
5 0.094808407 259 hunch net-2007-08-19-Choice of Metrics
Introduction: How do we judge success in Machine Learning? As Aaron notes, the best way is to use the loss imposed on you by the world. This turns out to be infeasible sometimes for various reasons. The ones I’ve seen are: The learned prediction is used in some complicated process that does not give the feedback necessary to understand the prediction’s impact on the loss. The prediction is used by some other system which expects some semantics to the predicted value. This is similar to the previous example, except that the issue is design modularity rather than engineering modularity. The correct loss function is simply unknown (and perhaps unknowable, except by experimentation). In these situations, it’s unclear what metric for evaluation should be chosen. This post has some design advice for this murkier case. I’m using the word “metric” here to distinguish the fact that we are considering methods for evaluating predictive systems rather than a loss imposed by the real world.
6 0.093788102 120 hunch net-2005-10-10-Predictive Search is Coming
7 0.092392765 111 hunch net-2005-09-12-Fast Gradient Descent
8 0.090795524 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
9 0.087541804 260 hunch net-2007-08-25-The Privacy Problem
10 0.087051332 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
11 0.086738437 41 hunch net-2005-03-15-The State of Tight Bounds
12 0.083649114 229 hunch net-2007-01-26-Parallel Machine Learning Problems
13 0.083581835 235 hunch net-2007-03-03-All Models of Learning have Flaws
14 0.082569793 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
15 0.0823033 432 hunch net-2011-04-20-The End of the Beginning of Active Learning
16 0.081987455 332 hunch net-2008-12-23-Use of Learning Theory
17 0.081455648 347 hunch net-2009-03-26-Machine Learning is too easy
18 0.079734147 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
19 0.074584983 20 hunch net-2005-02-15-ESPgame and image labeling
20 0.073422588 258 hunch net-2007-08-12-Exponentiated Gradient
topicId topicWeight
[(0, 0.182), (1, 0.08), (2, -0.031), (3, -0.026), (4, 0.074), (5, 0.006), (6, -0.055), (7, -0.0), (8, -0.037), (9, -0.019), (10, 0.005), (11, 0.036), (12, -0.033), (13, -0.023), (14, -0.023), (15, 0.023), (16, -0.015), (17, 0.055), (18, -0.035), (19, 0.046), (20, 0.036), (21, 0.025), (22, -0.019), (23, 0.011), (24, 0.032), (25, -0.044), (26, -0.037), (27, 0.078), (28, 0.004), (29, -0.035), (30, -0.025), (31, -0.062), (32, 0.008), (33, -0.0), (34, -0.037), (35, 0.055), (36, -0.04), (37, 0.06), (38, 0.062), (39, 0.006), (40, 0.078), (41, -0.086), (42, 0.063), (43, -0.009), (44, 0.017), (45, 0.051), (46, 0.008), (47, -0.087), (48, 0.074), (49, -0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.94139779 138 hunch net-2005-12-09-Some NIPS papers
2 0.61946887 143 hunch net-2005-12-27-Automated Labeling
3 0.60775578 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
Introduction: At NIPS, Andrew Ng asked me what should be in a large scale learning class. After some discussion with him and Nando and mulling it over a bit, these are the topics that I think should be covered. There are many different kinds of scaling. Scaling in examples This is the most basic kind of scaling. Online Gradient Descent This is an old algorithm—I’m not sure if anyone can be credited with it in particular. Perhaps the Perceptron is a good precursor, but substantial improvements come from the notion of a loss function, of which squared loss, logistic loss, hinge loss, and quantile loss are all worth covering. It’s important to cover the semantics of these loss functions as well. Vowpal Wabbit is a reasonably fast codebase implementing these. Second Order Gradient Descent methods For some problems, methods taking into account second derivative information can be more effective. I’ve seen preconditioned conjugate gradient work well, for which Jonath
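A minimal sketch of the online gradient descent update for a linear predictor under the losses named above. The gradient formulas are standard, but this is a bare-bones illustration (no feature hashing, adaptive learning rates, or the other refinements a system like Vowpal Wabbit adds):

```python
import numpy as np

def grad(loss, p, y):
    """Derivative of the per-example loss with respect to the prediction
    p = w.x (labels y in {-1,+1} for logistic/hinge, real-valued otherwise)."""
    if loss == "squared":                      # (p - y)^2
        return 2 * (p - y)
    if loss == "logistic":                     # log(1 + exp(-y p))
        return -y / (1 + np.exp(y * p))
    if loss == "hinge":                        # max(0, 1 - y p)
        return -y if y * p < 1 else 0.0
    if loss == "quantile":                     # pinball loss, tau = 0.5 here
        tau = 0.5
        return -tau if y > p else 1 - tau
    raise ValueError(loss)

def online_gradient_descent(stream, dim, loss="logistic", eta=0.1):
    w = np.zeros(dim)
    for x, y in stream:
        p = w @ x
        w -= eta * grad(loss, p, y) * x        # chain rule: dL/dw = (dL/dp) x
    return w
```

Swapping the loss changes the semantics of the learned predictor: squared loss estimates the conditional mean, quantile loss a conditional quantile, and logistic loss a conditional probability.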
4 0.59918928 32 hunch net-2005-02-27-Antilearning: When proximity goes bad
Introduction: Joel Predd mentioned “Antilearning” by Adam Kowalczyk, which is interesting from a foundational intuitions viewpoint. There is a pervasive intuition that “nearby things tend to have the same label”. This intuition is instantiated in SVMs, nearest neighbor classifiers, decision trees, and neural networks. It turns out there are natural problems where this intuition is the opposite of the truth. One natural situation where this occurs is in competition. For example, when Intel fails to meet its earnings estimate, is this evidence that AMD is doing badly also? Or evidence that AMD is doing well? This violation of the proximity intuition means that when the number of examples is few, negating a classifier which attempts to exploit proximity can provide predictive power (thus, the term “antilearning”).
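A toy rendering of the final point: when nearby examples tend to carry opposite labels, negating a proximity-based classifier turns its systematic mistakes into predictive power. This anti-nearest-neighbor is purely illustrative (labels assumed in {-1,+1}), not Kowalczyk’s construction:

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    # Plain 1-nearest-neighbor: the "nearby things share labels" intuition.
    i = np.argmin(np.sum((X_train - x) ** 2, axis=1))
    return y_train[i]

def anti_nn_predict(X_train, y_train, x):
    # Antilearning: on problems where proximity anti-correlates with the
    # label, the negated classifier is the better predictor.
    return -nn_predict(X_train, y_train, x)
```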
5 0.59298301 420 hunch net-2010-12-26-NIPS 2010
Introduction: I enjoyed attending NIPS this year, with several things interesting me. For the conference itself: Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona, The Multidimensional Wisdom of Crowds. This paper is about using Mechanical Turk to get label information, with results superior to a majority vote approach. David McAllester, Tamir Hazan, and Joseph Keshet, Direct Loss Minimization for Structured Prediction. This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition. Mohammad Saberian and Nuno Vasconcelos, Boosting Classifier Cascades. This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades which are worth looking at if interested. Alan Fern and Prasad Tadepalli, A Computational Decision Theory for Interactive Assistants. This paper carves out some
6 0.55704671 247 hunch net-2007-06-14-Interesting Papers at COLT 2007
7 0.54820591 77 hunch net-2005-05-29-Maximum Margin Mismatch?
8 0.54227865 334 hunch net-2009-01-07-Interesting Papers at SODA 2009
9 0.53953481 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
10 0.53903461 348 hunch net-2009-04-02-Asymmophobia
11 0.53770047 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics
12 0.53439558 87 hunch net-2005-06-29-Not EM for clustering at COLT
13 0.53383613 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC
14 0.53315055 6 hunch net-2005-01-27-Learning Complete Problems
15 0.52579504 26 hunch net-2005-02-21-Problem: Cross Validation
16 0.52470088 253 hunch net-2007-07-06-Idempotent-capable Predictors
17 0.51727819 131 hunch net-2005-11-16-The Everything Ensemble Edge
18 0.5160622 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)
19 0.51401728 111 hunch net-2005-09-12-Fast Gradient Descent
20 0.51196438 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009
topicId topicWeight
[(3, 0.031), (17, 0.022), (27, 0.222), (30, 0.024), (37, 0.239), (38, 0.036), (51, 0.032), (53, 0.069), (55, 0.061), (63, 0.013), (64, 0.017), (92, 0.036), (94, 0.115)]
simIndex simValue blogId blogTitle
same-blog 1 0.93267554 138 hunch net-2005-12-09-Some NIPS papers
2 0.91241682 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning
Introduction: When I was thinking about the best “10 year paper” for ICML, I also took a look at a few other conferences. Here is one from 10 years ago that interested me: David McAllester, PAC-Bayesian Model Averaging, COLT 1999. 2001 Journal Draft. Prior to this paper, the only mechanism known for controlling or estimating the necessary sample complexity for learning over continuously parameterized predictors was VC theory and variants, all of which suffered from a basic problem: they were incredibly pessimistic in practice. This meant that only very gross guidance could be provided for learning algorithm design. The PAC-Bayes bound provided an alternative approach to sample complexity bounds which was radically tighter, quantitatively. It also imported and explained many of the motivations for Bayesian learning in a way that learning theory and perhaps optimization people might appreciate. Since this paper came out, there have been a number of moderately successful attempts t
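For reference, one commonly quoted form of the bound from that line of work (stated from memory, so treat the exact constants as approximate): for a prior P over classifiers fixed before seeing the m i.i.d. samples, with probability at least 1 - δ, simultaneously for every posterior Q,

```latex
\mathop{\mathbb{E}}_{h \sim Q}[\mathrm{err}(h)]
  \;\le\;
\mathop{\mathbb{E}}_{h \sim Q}[\widehat{\mathrm{err}}(h)]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{m}{\delta}}{2(m-1)}}
```

The KL(Q||P) term replaces the worst-case capacity terms of VC theory, which is the source of the quantitative tightening described above.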
3 0.90913105 63 hunch net-2005-04-27-DARPA project: LAGR
Introduction: Larry Jackel has set up the LAGR (“Learning Applied to Ground Robotics”) project (and competition) which seems to be quite well designed. Features include: Many participants (8 going on 12?) Standardized hardware. In the DARPA grand challenge, contestants entering with motorcycles are at a severe disadvantage to those entering with a Hummer. Similarly, contestants using more powerful sensors can gain huge advantages. Monthly contests, with full feedback (but since the hardware is standardized, only code is shipped). One of the premises of the program is that robust systems are desired. Monthly evaluations at different locations can help measure this and provide data. Attacks a known hard problem. (cross country driving)
4 0.89016545 431 hunch net-2011-04-18-A paper not at Snowbird
Introduction: Unfortunately, a scheduling failure meant I missed all of AIStats and most of the learning workshop, otherwise known as Snowbird, when it’s at Snowbird. At Snowbird, the talk on Sum-Product networks by Hoifung Poon stood out to me (Pedro Domingos is a coauthor). The basic point was that by appropriately constructing networks based on sums and products, the normalization problem in probabilistic models is eliminated, yielding a highly tractable yet flexible representation+learning algorithm. As an algorithm, this is noticeably cleaner than deep belief networks, with a claim to being an order of magnitude faster and working better on an image completion task. Snowbird doesn’t have real papers—just the abstract above. I look forward to seeing the paper. (Added: Rodrigo points out the deep learning workshop draft.)
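A tiny sketch of the normalization point, under the standard conditions (sum-node weights form a convex combination, leaves are normalized distributions, product nodes combine disjoint variables): the root value is then a normalized probability by construction, so no partition function is ever computed. The numbers below are made up for illustration:

```python
from itertools import product

def leaf(p):
    # Normalized Bernoulli leaf: returns P(x) for x in {0, 1}.
    return lambda x: p if x == 1 else 1 - p

def spn(x1, x2):
    # Two product nodes over disjoint variables, mixed by a sum node
    # whose weights (0.6, 0.4) sum to one.
    comp_a = leaf(0.9)(x1) * leaf(0.2)(x2)
    comp_b = leaf(0.1)(x1) * leaf(0.7)(x2)
    return 0.6 * comp_a + 0.4 * comp_b

# Normalization holds by construction, with no partition function computed:
print(sum(spn(x1, x2) for x1, x2 in product([0, 1], repeat=2)))  # 1.0
```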
5 0.85557365 1 hunch net-2005-01-19-Why I decided to run a weblog.
Introduction: I have decided to run a weblog on machine learning and learning theory research. Here are some reasons: 1) Weblogs enable new functionality: Public comment on papers. No mechanism for this exists at conferences and most journals. I have encountered it once for a science paper. Some communities have mailing lists supporting this, but not machine learning or learning theory. I have often read papers and found myself wishing there was some method to consider others’ questions and read the replies. Conference shortlists. One of the most common conversations at a conference is “what did you find interesting?” There is no explicit mechanism for sharing this information at conferences, and it’s easy to imagine that it would be handy to do so. Evaluation and comment on research directions. Papers are almost exclusively about new research, rather than evaluation (and consideration) of research directions. This last role is satisfied by funding agencies to some extent, but
6 0.78415018 194 hunch net-2006-07-11-New Models
7 0.75966603 143 hunch net-2005-12-27-Automated Labeling
8 0.75734532 21 hunch net-2005-02-17-Learning Research Programs
9 0.7403152 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity
10 0.73102641 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
11 0.72691095 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning
12 0.72405279 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
13 0.72374505 351 hunch net-2009-05-02-Wielding a New Abstraction
14 0.72068822 95 hunch net-2005-07-14-What Learning Theory might do
15 0.72026116 258 hunch net-2007-08-12-Exponentiated Gradient
16 0.72015756 235 hunch net-2007-03-03-All Models of Learning have Flaws
17 0.71906477 253 hunch net-2007-07-06-Idempotent-capable Predictors
18 0.71774721 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
19 0.71709055 41 hunch net-2005-03-15-The State of Tight Bounds
20 0.71698105 347 hunch net-2009-03-26-Machine Learning is too easy