hunch_net hunch_net-2005 hunch_net-2005-14 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
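Read literally, "machines which turn solvers for one problem into solvers for another" means a pair of mappings: one that rewrites the hard problem's examples as examples for the easy problem, and one that turns the learned easy predictor back into a predictor for the hard problem. The sketch below illustrates this with a one-vs-all multiclass-to-binary reduction; the names and interface are illustrative choices, not from the post or any particular library.

```python
# Minimal sketch of a learning reduction as two mappings: R rewrites the
# complex problem's examples as binary examples, Q turns learned binary
# predictors back into a predictor for the complex problem.
# (Illustrative one-vs-all example; all names here are assumptions.)
from typing import Any, Callable, List, Tuple


def one_vs_all_R(examples: List[Tuple[Any, int]], k: int):
    """Map multiclass examples (x, y in {0..k-1}) to k binary datasets."""
    return [[(x, int(y == c)) for (x, y) in examples] for c in range(k)]


def one_vs_all_Q(binary_scorers: List[Callable[[Any], float]]):
    """Map k learned binary scorers back to a single multiclass predictor."""
    def predict(x):
        scores = [score(x) for score in binary_scorers]
        return max(range(len(scores)), key=lambda c: scores[c])
    return predict
```

Any binary learning algorithm can be plugged in between R and Q, which is the "at least 10 learning algorithms for free" point made above.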
sentIndex sentText sentNum sentScore
1 Reductions are machines which turn solvers for one problem into solvers for another problem. [sent-2, score-0.288]
2 Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. [sent-6, score-0.235]
3 Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. [sent-8, score-0.232]
4 By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. [sent-11, score-0.817]
5 In the beginning, there was the observation that every complex object on a computer can be written as a sequence of bits. [sent-18, score-0.236]
6 This observation leads to the notion that a classifier (which predicts a single bit) can be used to predict any complex object. [sent-19, score-0.451]
7 This observation also often doesn’t work well in practice, because the classifiers are sometimes wrong, so at least one of many classifiers is often wrong. [sent-22, score-0.523]
8 Worrying about errors leads to the notion of robust reductions (= ways of using simple predictors such as classifiers to make complex predictions). [sent-24, score-0.78]
9 These were analyzed in terms of error rates on training sets and general losses on training sets. [sent-26, score-0.487]
10 The robustness can be (more generally) analyzed with respect to arbitrary test distributions, and algorithms optimized with respect to this notion are often very simple and yield good performance. [sent-27, score-0.438]
11 Solving created classification problems up to error rate e implies: Solving importance weighted classifications up to error rate eN where N is the expected importance. [sent-28, score-1.242]
12 Costing Solving multiclass classification up to error rate 4e using ECOC. [sent-29, score-0.66]
13 Error limiting reductions paper Solving Cost sensitive classification up to loss 2eZ where Z is the sum of costs. [sent-30, score-0.809]
14 Weighted All Pairs algorithm Finding a policy within expected reward (T+1)e/2 of the optimal policy for T step reinforcement learning with a generative model. [sent-31, score-0.545]
15 RLgen paper The same statement holds much more efficiently when the distribution of states of a near optimal policy is also known. [sent-32, score-0.267]
16 Regret Transform Reductions To cope with this, we can analyze how good performance minus the best possible performance (called “regret”) is transformed under reduction. [sent-35, score-0.284]
17 Solving created binary classification problems to regret r implies: Solving importance weighted regret up to rN using the same algorithm as for errors. [sent-36, score-1.276]
18 Costing Solving class membership probability up to l_2 regret 2r. [sent-37, score-0.354]
19 Probing paper Solving multiclass classification to regret 4r^0.5. [sent-38, score-0.683]
20 SECOC paper Predicting costs in cost sensitive classification up to l_2 regret 4r SECOC again Solving cost sensitive classification up to regret 4(rZ)^0.5. [sent-40, score-1.683]
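Sentence 11 above (with the "Costing" reference that spills into sentence 12) states the reduction from importance weighted classification to plain classification with error rate eN. One standard way to realize such a reduction is rejection sampling on the importances: keep each example with probability proportional to its weight, then hand the unweighted sample to any ordinary classifier learner. A rough sketch, with the data format and the downstream learner assumed rather than taken from the post:

```python
import random


def costing_resample(weighted_examples, rng=None):
    """Rejection-sample (x, y, w) triples into an unweighted dataset.

    Each example is kept with probability w / w_max; training an ordinary
    classifier on the result (ideally averaging over several resamples)
    is the rejection-sampling route to importance weighted classification.
    Sketch only: the triple format and the learner are assumptions.
    """
    rng = rng or random.Random(0)
    w_max = max(w for (_, _, w) in weighted_examples)
    return [(x, y) for (x, y, w) in weighted_examples
            if rng.random() < w / w_max]
```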
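Sentence 12 cites ECOC (error correcting output codes) as the multiclass-to-binary reduction behind the 4e error bound. Mechanically: assign each class a binary codeword, train one binary classifier per codeword bit, and decode a test point to the class whose codeword is nearest in Hamming distance. A self-contained sketch; the random code matrix and the learner interface (fit/predict on {0,1} labels) are illustrative assumptions, not the specific code for which the bound is proved:

```python
import numpy as np


class ECOCReduction:
    """Multiclass -> binary reduction via error correcting output codes.

    `learner_factory()` must return an object with fit(X, y) / predict(X)
    for binary labels in {0, 1}. The random code matrix is an illustrative
    choice.
    """

    def __init__(self, learner_factory, n_classes, code_len=15, seed=0):
        rng = np.random.default_rng(seed)
        self.code = rng.integers(0, 2, size=(n_classes, code_len))
        self.learner_factory = learner_factory
        self.learners = []

    def fit(self, X, y):
        y = np.asarray(y)
        for j in range(self.code.shape[1]):
            bits = self.code[y, j]           # induced binary labels, column j
            clf = self.learner_factory()
            clf.fit(X, bits)
            self.learners.append(clf)
        return self

    def predict(self, X):
        bits = np.column_stack([clf.predict(X) for clf in self.learners])
        # decode: pick the class whose codeword is closest in Hamming distance
        dists = np.abs(bits[:, None, :] - self.code[None, :, :]).sum(axis=2)
        return dists.argmin(axis=1)
```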
wordName wordTfidf (topN-words)
[('regret', 0.282), ('reductions', 0.268), ('secoc', 0.243), ('solving', 0.24), ('classification', 0.235), ('error', 0.194), ('costing', 0.144), ('solvers', 0.144), ('performance', 0.142), ('sensitive', 0.134), ('observation', 0.131), ('problems', 0.126), ('classifiers', 0.122), ('created', 0.117), ('generative', 0.115), ('primitives', 0.115), ('policy', 0.112), ('analyzed', 0.111), ('notion', 0.111), ('complex', 0.105), ('leads', 0.104), ('transform', 0.104), ('costs', 0.099), ('cost', 0.098), ('sets', 0.091), ('reducing', 0.088), ('sum', 0.086), ('paper', 0.086), ('weighted', 0.084), ('greater', 0.082), ('rate', 0.081), ('implies', 0.081), ('importance', 0.08), ('multiclass', 0.08), ('programming', 0.079), ('often', 0.074), ('crystallization', 0.072), ('en', 0.072), ('membership', 0.072), ('rlgen', 0.072), ('validate', 0.072), ('using', 0.07), ('reinforcement', 0.07), ('optimal', 0.069), ('expected', 0.067), ('classifications', 0.067), ('fern', 0.067), ('inventing', 0.067), ('psdp', 0.067), ('studying', 0.067)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 14 hunch net-2005-02-07-The State of the Reduction
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
2 0.3958911 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. The use of “regret” is more desi
3 0.2722849 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
Introduction: This post is about an open problem in learning reductions. Background A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e – b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we cannot rule out the possibility that a reduction
4 0.26801041 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as A mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification). A mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem For all base predictors b, for all distributions D over examples of type T: E_{(x,y) ~ D} L_T(y, Q(b,x)) <= f(E_{(x’,y’) ~ R(D)} L_{T’}(y’, b(x’))) Here L_T is the loss for the type T problem and L_{T’} is the loss for the type T’ problem. Also, R(D) is the distribution ov
5 0.25840333 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
Introduction: For learning reductions we have been concentrating on reducing various complex learning problems to binary classification. This choice needs to be actively questioned, because it was not carefully considered. Binary classification is learning a classifier c:X -> {0,1} so as to minimize the probability of being wrong, Pr_{x,y~D}(c(x) ≠ y). The primary alternative candidate seems to be squared error regression. In squared error regression, you learn a regressor s:X -> [0,1] so as to minimize squared error, E_{x,y~D}(s(x)-y)^2. It is difficult to judge one primitive against another. The judgement must at least partially be made on nontheoretical grounds because (essentially) we are evaluating a choice between two axioms/assumptions. These two primitives are significantly related. Classification can be reduced to regression in the obvious way: you use the regressor to predict D(y=1|x), then threshold at 0.5. For this simple reduction a squared error regret of r
6 0.24812332 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
7 0.21348301 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
8 0.1923151 103 hunch net-2005-08-18-SVM Adaptability
9 0.17966212 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
10 0.1766326 351 hunch net-2009-05-02-Wielding a New Abstraction
11 0.17593291 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
12 0.16583277 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression
13 0.16042484 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
14 0.158136 82 hunch net-2005-06-17-Reopening RL->Classification
15 0.15516295 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
16 0.15412685 133 hunch net-2005-11-28-A question of quantification
17 0.15093061 78 hunch net-2005-06-06-Exact Online Learning for Classification
18 0.14796935 274 hunch net-2007-11-28-Computational Consequences of Classification
19 0.14791486 68 hunch net-2005-05-10-Learning Reductions are Reductionist
20 0.14166518 235 hunch net-2007-03-03-All Models of Learning have Flaws
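Entry 5 in the list above (Regression vs. Classification as a Primitive) describes the obvious reduction from classification to squared error regression: fit a regressor to the 0/1 labels as an estimate of D(y=1|x), then threshold its output at 0.5. A minimal sketch using linear least squares as the regressor (an illustrative choice; any squared error regression learner would do):

```python
import numpy as np


def fit_regression_classifier(X, y):
    """Reduce binary classification to squared error regression.

    Fits a linear least squares regressor s(x) as an estimate of E[y | x]
    on 0/1 labels (an illustrative choice of regressor) and classifies by
    thresholding the regressed value at 0.5.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])          # add a bias column
    w, *_ = np.linalg.lstsq(Xb, np.asarray(y, dtype=float), rcond=None)

    def classify(X_test):
        Xt = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
        return (Xt @ w >= 0.5).astype(int)                  # threshold at 0.5

    return classify
```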
topicId topicWeight
[(0, 0.291), (1, 0.254), (2, 0.138), (3, -0.135), (4, -0.066), (5, -0.088), (6, 0.322), (7, -0.064), (8, -0.221), (9, -0.025), (10, -0.132), (11, -0.059), (12, -0.017), (13, 0.005), (14, 0.005), (15, 0.027), (16, 0.061), (17, -0.022), (18, 0.033), (19, -0.051), (20, -0.024), (21, -0.074), (22, 0.009), (23, -0.042), (24, 0.01), (25, -0.006), (26, -0.028), (27, 0.017), (28, -0.068), (29, 0.045), (30, 0.056), (31, 0.041), (32, -0.001), (33, -0.037), (34, 0.038), (35, 0.024), (36, 0.044), (37, 0.009), (38, 0.057), (39, 0.023), (40, 0.056), (41, -0.052), (42, -0.004), (43, 0.025), (44, 0.024), (45, 0.005), (46, -0.056), (47, -0.069), (48, 0.004), (49, 0.069)]
simIndex simValue blogId blogTitle
same-blog 1 0.98081386 14 hunch net-2005-02-07-The State of the Reduction
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
2 0.85792798 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
Introduction: This post is about an open problem in learning reductions. Background A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e – b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we cannot rule out the possibility that a reduction
3 0.82912004 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
Introduction: This problem has been cracked (but not quite completely solved) by Alina, Pradeep, and I. The problem is essentially finding a better way to reduce multiclass classification to binary classification. The solution is to use a carefully crafted tournament, the simplest version of which is a single elimination tournament where the “players” are the different classes. An example of the structure is here: For the single elimination tournament, we can prove that: For all multiclass problems D, for all learned binary classifiers c, the regret of an induced multiclass classifier is bounded by the regret of the binary classifier times log_2 k. Restated: reg_multiclass(D, Filter_tree_test(c)) <= reg_binary(Filter_tree_train(D), c) Here: Filter_tree_train(D) is the induced binary classification problem, Filter_tree_test(c) is the induced multiclass classifier, and reg_multiclass is the multiclass regret (= difference between error rate and minim
4 0.7901144 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. The use of “regret” is more desi
5 0.76513743 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
Introduction: I’m offering a reward of $1000 for a solution to this problem. This joins the cross validation problem which I’m offering a $500 reward for. I believe both of these problems are hard but plausibly solvable, and plausibly with a solution of substantial practical value. While it’s unlikely these rewards are worth your time on an hourly wage basis, the recognition for solving them definitely should be The Problem The problem is finding a general, robust, and efficient mechanism for estimating a conditional probability P(y|x) where robustness and efficiency are measured using techniques from learning reductions. In particular, suppose we have access to a binary regression oracle B which has two interfaces—one for specifying training information and one for testing. Training information is specified as B(x’,y’) where x’ is a feature vector and y’ is a scalar in [0,1] with no value returned. Testing is done according to B(x’) with a value in [0,1] returned.
6 0.74844086 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
7 0.73432481 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
8 0.70688319 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time
9 0.68604356 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
10 0.6470229 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
11 0.64603418 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics
12 0.63552552 103 hunch net-2005-08-18-SVM Adaptability
13 0.62313265 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression
14 0.62310594 351 hunch net-2009-05-02-Wielding a New Abstraction
15 0.62242812 67 hunch net-2005-05-06-Don’t mix the solution into the problem
16 0.61433017 82 hunch net-2005-06-17-Reopening RL->Classification
17 0.60255677 176 hunch net-2006-05-01-A conversation between Theo and Pat
18 0.57201791 78 hunch net-2005-06-06-Exact Online Learning for Classification
19 0.56157905 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
20 0.54351717 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
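Entry 3 in the list above (Conditional Tournaments for Multiclass to Binary) describes the filter tree: a single elimination tournament over the class labels in which every internal node is a learned binary classifier, giving multiclass regret at most log_2 k times the binary regret. Prediction is just following the bracket from the root down to a leaf; the sketch below shows only that part. The bracket layout and the `node_classifiers` container are assumptions for illustration, and the training step (which filters examples through the classifiers below) is omitted.

```python
def filter_tree_predict(x, labels, node_classifiers, node_id=0):
    """Play a single elimination tournament over `labels` from the root down.

    `labels` is a power-of-two length list of classes at the leaves of a
    balanced bracket, and `node_classifiers[node_id](x)` returns 0 if the
    left half of the bracket wins or 1 if the right half wins (both are
    assumed structures for illustration).
    """
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2
    go_right = node_classifiers[node_id](x)
    # heap-style child indices: 2*i + 1 is the left child, 2*i + 2 the right
    if go_right:
        return filter_tree_predict(x, labels[mid:], node_classifiers, 2 * node_id + 2)
    return filter_tree_predict(x, labels[:mid], node_classifiers, 2 * node_id + 1)
```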
topicId topicWeight
[(0, 0.037), (1, 0.011), (20, 0.011), (27, 0.32), (38, 0.132), (48, 0.018), (53, 0.108), (55, 0.039), (63, 0.011), (77, 0.018), (79, 0.017), (80, 0.011), (86, 0.021), (94, 0.081), (95, 0.033)]
simIndex simValue blogId blogTitle
same-blog 1 0.98497832 14 hunch net-2005-02-07-The State of the Reduction
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
2 0.98279971 131 hunch net-2005-11-16-The Everything Ensemble Edge
Introduction: Rich Caruana, Alexandru Niculescu, Geoff Crew, and Alex Ksikes have done a lot of empirical testing which shows that using all methods to make a prediction is more powerful than using any single method. This is in rough agreement with the Bayesian way of solving problems, but based upon a different (essentially empirical) motivation. A rough summary is: Take all of {decision trees, boosted decision trees, bagged decision trees, boosted decision stumps, K nearest neighbors, neural networks, SVM} with all reasonable parameter settings. Run the methods on each problem of 8 problems with a large test set, calibrating margins using either sigmoid fitting or isotonic regression. For each loss of {accuracy, area under the ROC curve, cross entropy, squared error, etc…} evaluate the average performance of the method. A series of conclusions can be drawn from the observations. (Calibrated) boosted decision trees appear to perform best, in general although support v
3 0.96702874 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) cannot concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees, depending on who you ask) can’t learn XOR very easily.
4 0.9657408 12 hunch net-2005-02-03-Learning Theory, by assumption
Introduction: One way to organize learning theory is by assumption (in the assumption = axiom sense), from no assumptions to many assumptions. As you travel down this list, the statements become stronger, but the scope of applicability decreases. No assumptions Online learning There exists a meta prediction algorithm which competes well with the best element of any set of prediction algorithms. Universal Learning Using a “bias” of 2^(-description length of Turing machine) in learning is equivalent to all other computable biases up to some constant. Reductions The ability to predict well on classification problems is equivalent to the ability to predict well on many other learning problems. Independent and Identically Distributed (IID) Data Performance Prediction Based upon past performance, you can predict future performance. Uniform Convergence Performance prediction works even after choosing classifiers based on the data from large sets of classifiers.
5 0.95729518 41 hunch net-2005-03-15-The State of Tight Bounds
Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation also here and here, train and test bounds, and cross-validation (but see the big open problem). These tec
6 0.95680875 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
7 0.95200425 259 hunch net-2007-08-19-Choice of Metrics
8 0.95089406 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
9 0.95037711 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
10 0.94942522 133 hunch net-2005-11-28-A question of quantification
11 0.9491545 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
12 0.9468267 19 hunch net-2005-02-14-Clever Methods of Overfitting
13 0.94459021 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
14 0.94444084 235 hunch net-2007-03-03-All Models of Learning have Flaws
15 0.94372845 351 hunch net-2009-05-02-Wielding a New Abstraction
16 0.94015306 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
17 0.9376238 370 hunch net-2009-09-18-Necessary and Sufficient Research
18 0.93741304 143 hunch net-2005-12-27-Automated Labeling
19 0.93685824 347 hunch net-2009-03-26-Machine Learning is too easy
20 0.93612766 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
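Entry 2 in the list above (The Everything Ensemble Edge) describes running many different learning methods, calibrating their outputs, and combining them. The sketch below shows only the simplest combination step, averaging predicted probabilities from a few scikit-learn model families; the specific models and the plain averaging are illustrative simplifications of the fuller recipe (many parameter settings per family plus per-model sigmoid or isotonic calibration).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier


def ensemble_proba(X_train, y_train, X_test):
    """Average predicted class-1 probabilities over several model families.

    Plain averaging of predict_proba outputs is an illustrative stand-in for
    the post's fuller recipe of calibrating each model before combining.
    """
    models = [
        LogisticRegression(max_iter=1000),
        RandomForestClassifier(n_estimators=200, random_state=0),
        GradientBoostingClassifier(random_state=0),
    ]
    probs = []
    for m in models:
        m.fit(X_train, y_train)
        probs.append(m.predict_proba(X_test)[:, 1])
    return np.mean(probs, axis=0)
```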
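Entry 5 in the list above (The State of Tight Bounds) mentions testing bounds, which turn held-out error counts into a guarantee on future error. The simplest instance is a Hoeffding-style upper confidence bound on the true error rate; the function below is a sketch with an interface of my own choosing.

```python
import math


def test_set_error_bound(num_errors, num_test, delta=0.05):
    """Hoeffding-style upper confidence bound on the true error rate.

    With probability at least 1 - delta over the draw of an i.i.d. test set
    of size m, the true error rate is at most the empirical error rate plus
    sqrt(ln(1/delta) / (2 m)).
    """
    empirical = num_errors / num_test
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * num_test))
    return min(1.0, empirical + slack)


# Example: 12 errors on 1000 test examples gives a bound of about 0.05
# at delta = 0.05.
```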