hunch net-2007-05-08: Conditional Tournaments for Multiclass to Binary (hunch_net-2007-243)
Source: html
This problem has been cracked (but not quite completely solved) by Alina, Pradeep, and me.
The problem is essentially finding a better way to reduce multiclass classification to binary classification. The solution is to use a carefully crafted tournament, the simplest version of which is a single elimination tournament where the “players” are the different classes.
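To make the tournament structure concrete, here is a tiny sketch of a single-elimination bracket over hypothetical classes A through D (the symbolic winner placeholders stand in for whatever the learned binary classifiers would decide):

```python
# Single-elimination bracket over k classes: round r pairs up the k / 2**r
# surviving "players". Class names here are purely illustrative.
def bracket(players):
    rounds = []
    while len(players) > 1:
        matches = [(players[i], players[i + 1]) for i in range(0, len(players), 2)]
        rounds.append(matches)
        # A learned binary classifier would decide each match; we just record
        # a symbolic winner so the structure is visible.
        players = ["w(%s,%s)" % m for m in matches]
    return rounds

for r in bracket(["A", "B", "C", "D"]):
    print(r)
# [('A', 'B'), ('C', 'D')]
# [('w(A,B)', 'w(C,D)')]
```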
An example of the structure is here: For the single elimination tournament, we can prove that for all multiclass problems D and all learned binary classifiers c, the regret of the induced multiclass classifier is bounded by the regret of the binary classifier times log2(k).
Restated: reg_multiclass(D, Filter_tree_test(c)) <= log2(k) * reg_binary(Filter_tree_train(D), c), where Filter_tree_train(D) is the induced binary classification problem and Filter_tree_test(c) is the induced multiclass classifier.
reg_multiclass is the multiclass regret (the difference between the error rate and the minimum possible error rate) and reg_binary is the binary regret. This result has a slight dependence on k which we suspect is removable.
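The regret quantities can be illustrated with hypothetical numbers (these are not from the paper): regret is just an error rate minus the best achievable error rate, and the theorem converts a binary regret into a multiclass guarantee via the log2(k) factor.

```python
import math

def error_rate(predictions, labels):
    """Fraction of examples the classifier gets wrong."""
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical data for illustration only.
labels      = [0, 1, 2, 2, 1, 0, 2, 1]
predictions = [0, 1, 2, 1, 1, 0, 2, 2]   # some learned classifier's outputs

err = error_rate(predictions, labels)     # 2/8 = 0.25
bayes_err = 0.125                         # assumed minimum possible error on D
regret = err - bayes_err                  # 0.125

# With k = 8 classes, a binary regret of 0.01 on the induced problem bounds
# the multiclass regret by log2(8) * 0.01, i.e. about 0.03.
k, binary_regret = 8, 0.01
multiclass_bound = math.log2(k) * binary_regret
print(regret, multiclass_bound)
```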
The current conjecture is that this dependence can be removed by using higher order tournaments such as double elimination, triple elimination, up to log2(k)-elimination.
The key insight which makes the result possible is conditionally defining the prediction problems at the interior nodes. In essence, we use the learned classifiers from the first level of the tree to filter the distribution over examples reaching the second level of the tree.
This process repeats until the root node is reached. Further details, including a more precise description and some experimental results, are in the draft paper.
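The training and testing process just described can be sketched end to end. This is a minimal illustration of the idea, not the paper's implementation: the 1-D decision-stump learner and the toy data are stand-ins for an arbitrary binary learner and distribution, and k is assumed to be a power of two.

```python
# Sketch of a single-elimination filter tree. Each interior node trains a
# binary classifier on a conditionally defined problem: only examples whose
# true label survives the lower-level rounds reach the node's training set.

def train_stump(pairs):
    """Best-training-error 1-D threshold classifier on (x, y) pairs, y in {0,1}.
    A placeholder for an arbitrary binary learner."""
    best = None
    for t in sorted({x for x, _ in pairs}):
        for sign in (1, -1):
            err = sum((sign * (x - t) > 0) != bool(y) for x, y in pairs)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: int(sign * (x - t) > 0)

def filter_tree_train(data, classes):
    """data: list of (x, label); classes: labels competing in this subtree.
    Returns a tree: either a leaf class or (classifier, left, right)."""
    if len(classes) == 1:
        return classes[0]
    mid = len(classes) // 2
    left = filter_tree_train(data, classes[:mid])
    right = filter_tree_train(data, classes[mid:])
    # Conditional filtering: keep only examples whose true label is the winner
    # of the child subtree containing it; the binary label says which child
    # the overall winner should come from.
    pairs = []
    for x, y in data:
        if y in classes[:mid] and filter_tree_test(left, x) == y:
            pairs.append((x, 0))
        elif y in classes[mid:] and filter_tree_test(right, x) == y:
            pairs.append((x, 1))
    return (train_stump(pairs), left, right)

def filter_tree_test(tree, x):
    """Play the tournament: log2(k) binary evaluations from root to a leaf."""
    while not isinstance(tree, int):
        clf, left, right = tree
        tree = right if clf(x) else left
    return tree

# Toy data: class i clusters around the value i.
data = [(i + d, i) for i in range(4) for d in (-0.2, 0.0, 0.2)]
tree = filter_tree_train(data, [0, 1, 2, 3])
print([filter_tree_test(tree, x) for x in (0.1, 1.1, 2.1, 3.1)])  # [0, 1, 2, 3]
```

The conditional definition shows up in the `elif`/`if` filter inside `filter_tree_train`: examples that a lower-level classifier already got wrong are dropped, so each node's problem is defined relative to the classifiers below it.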