hunch_net hunch_net-2006 hunch_net-2006-181 knowledge-graph by maker-knowledge-mining

181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?


meta info for this blog

Source: html

Introduction: This post is about an open problem in learning reductions. Background: A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e - b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem: This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we cannot rule out the possibility that a reduction R exists with a regret transform guaranteeing that binary regret r implies multiclass regret at most c(k) r, where c(k) is a k-dependent constant between 1 and 4.
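
For readers who prefer symbols, here is the setup restated as a small LaTeX sketch; the names reg_bin and reg_multi are introduced only for this illustration and are not notation from the post.

```latex
% Regret of a predictor with error rate e, where b is the Bayes (minimum possible) error rate.
\[ r = e - b \]
% PECOC guarantee: binary regret r on the induced problem gives multiclass regret at most 4 r^{1/2}.
\[ \mathrm{reg}_{\mathrm{multi}} \le 4\,\mathrm{reg}_{\mathrm{bin}}^{1/2} \]
% Open problem: does some reduction achieve a linear regret transform of the form
\[ \mathrm{reg}_{\mathrm{multi}} \le c(k)\,\mathrm{reg}_{\mathrm{bin}}, \qquad 1 \le c(k) \le 4\,? \]
```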


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This post is about an open problem in learning reductions. [sent-1, score-0.086]

2 Background: A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. [sent-2, score-1.587]

3 On this induced binary problem we might learn a binary classifier with some error rate e . [sent-3, score-1.138]

4 After subtracting the minimum possible (Bayes) error rate b , we get a regret r = e – b . [sent-4, score-0.723]

5 The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. [sent-5, score-1.886]

6 The problem: This is not the “rightest” answer. [sent-7, score-0.086]

7 Consider the k=2 case, where we reduce binary to binary. [sent-8, score-0.322]

8 There exists a reduction (the identity) with the property that regret r implies regret r . [sent-9, score-1.418]

9 This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k . [sent-10, score-0.826]

10 For example, we cannot rule out the possibility that a reduction R exists with a regret transform guaranteeing that binary regret r implies multiclass regret at most c(k) r, where c(k) is a k-dependent constant between 1 and 4. [sent-11, score-2.769]
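
A quick numeric illustration of why the square root matters (the numbers below are illustrative, not from the post): at binary regret r = 0.01 the two guarantees differ by an order of magnitude.

```latex
% PECOC-style bound at binary regret r = 0.01:
\[ 4\sqrt{0.01} = 4 \times 0.1 = 0.4 \]
% Hypothetical linear bound with c(k) <= 4 at the same r:
\[ c(k)\,r \le 4 \times 0.01 = 0.04 \]
```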

11 Difficulty I believe this is a solvable problem, given some serious thought. [sent-12, score-0.058]

12 Impact The use of some reduction from multiclass to binary is common practice, so a good solution could be widely useful. [sent-13, score-1.018]

13 One thing to be aware of is that there is a common and reasonable concern about the ‘naturalness’ of induced problems. [sent-14, score-0.266]

14 There seems to be no way to address this concern other than via empirical testing. [sent-15, score-0.144]

15 On the theoretical side, a better reduction may help us understand whether classification or l_2 regression is the more natural primitive for reduction. [sent-16, score-0.555]

16 The PECOC reduction essentially first turns a binary classifier into an l_2 regressor and then uses the regressor repeatedly to make multiclass predictions. [sent-17, score-1.301]
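
To make the "classifier, then regressor, then multiclass prediction" pattern concrete, here is a minimal Python sketch. It uses a plain one-versus-all scheme with logistic regression standing in for the binary learner; this is not the actual PECOC code construction, and the function names are hypothetical.

```python
# A sketch of the general pattern only, not the PECOC reduction itself.
import numpy as np
from sklearn.linear_model import LogisticRegression  # any probabilistic binary learner works here

def train_one_vs_all(X, y, k):
    """Train one binary learner per class; its probability estimate plays the
    role of the regressor in the discussion above."""
    learners = []
    for i in range(k):
        clf = LogisticRegression()
        clf.fit(X, (y == i).astype(int))  # induced binary problem: "is the label i?"
        learners.append(clf)
    return learners

def predict_multiclass(learners, X):
    """Query the per-class regressors repeatedly and predict the argmax."""
    scores = np.column_stack([clf.predict_proba(X)[:, 1] for clf in learners])
    return scores.argmax(axis=1)
```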

17 Some background material which may help: Dietterich and Bakiri introduce Error Correcting Output Codes . [sent-18, score-0.274]

18 Guruswami and Sahai analyze ECOC as an error transform reduction . [sent-19, score-0.928]

19 (see lemma 2) Allwein, Schapire, and Singer generalize ECOC to use loss-based decoding . [sent-20, score-0.205]
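
To make the difference between the referenced decoding schemes concrete, here is a minimal Python sketch of the decoding step only (my notation: a hypothetical k x L code matrix M with entries in {-1, +1} and a vector of L binary margins for one example; an illustration, not the papers' exact formulation).

```python
import numpy as np

def hamming_decode(M, margins):
    """Classic ECOC decoding: pick the class whose codeword disagrees least
    with the signs of the binary predictions."""
    preds = np.sign(margins)                  # shape (L,)
    disagreements = (M != preds).sum(axis=1)  # per-class Hamming distance, shape (k,)
    return int(disagreements.argmin())

def loss_based_decode(M, margins, loss=lambda z: np.log1p(np.exp(-z))):
    """Loss-based decoding in the style of Allwein, Schapire, and Singer:
    pick the class minimizing the summed margin loss (logistic loss here,
    chosen only as an example)."""
    total_loss = loss(M * margins).sum(axis=1)  # broadcast to shape (k, L)
    return int(total_loss.argmin())
```

Hamming decoding only looks at the signs of the margins, while loss-based decoding also uses their magnitudes, which is the generalization the Allwein, Schapire, and Singer paper analyzes.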

20 Beygelzimer and Langford showed that ECOC is not a regret transform and proved the PECOC regret transform . [sent-21, score-1.578]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('regret', 0.389), ('reduction', 0.381), ('transform', 0.346), ('binary', 0.322), ('pecoc', 0.294), ('multiclass', 0.22), ('ecoc', 0.208), ('error', 0.153), ('regressor', 0.127), ('induced', 0.122), ('correcting', 0.112), ('background', 0.105), ('concern', 0.097), ('property', 0.095), ('implies', 0.09), ('problem', 0.086), ('output', 0.086), ('naturalness', 0.079), ('dietterich', 0.079), ('exists', 0.074), ('possible', 0.073), ('lemma', 0.073), ('singer', 0.073), ('classifier', 0.073), ('decoding', 0.069), ('guaranteeing', 0.069), ('beygelzimer', 0.069), ('langford', 0.069), ('help', 0.064), ('generalize', 0.063), ('introduce', 0.063), ('identity', 0.063), ('primitive', 0.063), ('rate', 0.06), ('codes', 0.059), ('material', 0.059), ('solvable', 0.058), ('proved', 0.055), ('schapire', 0.053), ('showed', 0.053), ('superior', 0.052), ('possibility', 0.052), ('repeatedly', 0.051), ('analyze', 0.048), ('widely', 0.048), ('dependent', 0.048), ('minimum', 0.048), ('may', 0.047), ('address', 0.047), ('common', 0.047)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

2 0.52366138 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. The use of “regret” is more desi

3 0.33840042 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

Introduction: This problem has been cracked (but not quite completely solved) by Alina, Pradeep, and I. The problem is essentially finding a better way to reduce multiclass classification to binary classification. The solution is to use a carefully crafted tournament, the simplest version of which is a single elimination tournament where the “players” are the different classes. An example of the structure is here: For the single elimination tournament, we can prove that: For all multiclass problems D, for all learned binary classifiers c, the regret of an induced multiclass classifier is bounded by the regret of the binary classifier times log_2 k. Restated: reg_multiclass(D, Filter_tree_test(c)) <= log_2(k) * reg_binary(Filter_tree_train(D), c) Here: Filter_tree_train(D) is the induced binary classification problem Filter_tree_test(c) is the induced multiclass classifier. reg_multiclass is the multiclass regret (= difference between error rate and minim

4 0.30574203 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as A mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification). A mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem For all base predictors b, for all distributions D over examples of type T: E_{(x,y)~D} L_T(y, Q(b,x)) <= f(E_{(x’,y’)~R(D)} L_{T’}(y’, b(x’))) Here L_T is the loss for the type T problem and L_{T’} is the loss for the type T’ problem. Also, R(D) is the distribution ov

5 0.2722849 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r

6 0.26199201 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

7 0.24037376 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

8 0.20925155 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

9 0.1948086 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

10 0.17036575 103 hunch net-2005-08-18-SVM Adaptability

11 0.16193612 439 hunch net-2011-08-01-Interesting papers at COLT 2011

12 0.14381787 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

13 0.12687966 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

14 0.12077241 351 hunch net-2009-05-02-Wielding a New Abstraction

15 0.11974552 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

16 0.118642 78 hunch net-2005-06-06-Exact Online Learning for Classification

17 0.11372234 186 hunch net-2006-06-24-Online convex optimization at COLT

18 0.098100714 343 hunch net-2009-02-18-Decision by Vetocracy

19 0.096859984 82 hunch net-2005-06-17-Reopening RL->Classification

20 0.096170291 274 hunch net-2007-11-28-Computational Consequences of Classification


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.169), (1, 0.188), (2, 0.134), (3, -0.15), (4, -0.081), (5, -0.066), (6, 0.374), (7, -0.15), (8, -0.356), (9, -0.023), (10, -0.157), (11, -0.073), (12, -0.041), (13, -0.046), (14, -0.028), (15, 0.024), (16, -0.012), (17, 0.053), (18, 0.091), (19, -0.057), (20, -0.05), (21, -0.048), (22, 0.11), (23, 0.02), (24, 0.047), (25, -0.045), (26, 0.006), (27, -0.018), (28, 0.023), (29, 0.004), (30, 0.048), (31, 0.005), (32, -0.048), (33, 0.029), (34, 0.056), (35, -0.075), (36, -0.073), (37, -0.054), (38, -0.066), (39, -0.013), (40, 0.023), (41, 0.053), (42, 0.003), (43, 0.0), (44, -0.017), (45, -0.009), (46, -0.016), (47, -0.047), (48, 0.01), (49, 0.016)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99092889 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

2 0.93663061 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

3 0.89725745 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

4 0.74663061 14 hunch net-2005-02-07-The State of the Reduction

5 0.69191074 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

Introduction: For learning reductions we have been concentrating on reducing various complex learning problems to binary classification. This choice needs to be actively questioned, because it was not carefully considered. Binary classification is learning a classifier c: X -> {0,1} so as to minimize the probability of being wrong, Pr_{x,y~D}(c(x) != y). The primary alternative candidate seems to be squared error regression. In squared error regression, you learn a regressor s: X -> [0,1] so as to minimize squared error, E_{x,y~D}(s(x)-y)^2. It is difficult to judge one primitive against another. The judgement must at least partially be made on nontheoretical grounds because (essentially) we are evaluating a choice between two axioms/assumptions. These two primitives are significantly related. Classification can be reduced to regression in the obvious way: you use the regressor to predict D(y=1|x), then threshold at 0.5. For this simple reduction a squared error regret of r

6 0.6557703 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

7 0.64336711 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

8 0.6212455 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

9 0.6153 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

10 0.55209112 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

11 0.50037968 78 hunch net-2005-06-06-Exact Online Learning for Classification

12 0.49367148 439 hunch net-2011-08-01-Interesting papers at COLT 2011

13 0.44923627 176 hunch net-2006-05-01-A conversation between Theo and Pat

14 0.42591912 82 hunch net-2005-06-17-Reopening RL->Classification

15 0.42437026 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

16 0.3884024 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

17 0.38575912 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

18 0.37257725 103 hunch net-2005-08-18-SVM Adaptability

19 0.36182398 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

20 0.3417733 351 hunch net-2009-05-02-Wielding a New Abstraction


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.025), (27, 0.213), (38, 0.49), (53, 0.028), (55, 0.036), (67, 0.035), (94, 0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97914046 339 hunch net-2009-01-27-Key Scientific Challenges

Introduction: Yahoo released the Key Scientific Challenges program. There is a Machine Learning list I worked on and a Statistics list which Deepak worked on. I’m hoping this is taken quite seriously by graduate students. The primary value is that it gave us a chance to sit down and publicly specify directions of research which would be valuable to make progress on. A good strategy for a beginning graduate student is to pick one of these directions, pursue it, and make substantial advances for a PhD. The directions are sufficiently general that I’m sure any serious advance has applications well beyond Yahoo. A secondary point (which I’m sure is primary for many) is that there is money for graduate students here. It’s unrestricted, so you can use it for any reasonable travel, supplies, etc…

2 0.97867233 125 hunch net-2005-10-20-Machine Learning in the News

Introduction: The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.

same-blog 3 0.97861588 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

4 0.97122598 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here. At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa

5 0.94545054 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background One branch of learning theory focuses on theorems which Assume samples are drawn IID from an unknown distribution D . Fix a set of classifiers Find a high probability bound on the maximum true error rate (with respect to D ) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da

6 0.93689549 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS

7 0.87484646 353 hunch net-2009-05-08-Computability in Artificial Intelligence

8 0.86857891 233 hunch net-2007-02-16-The Forgetting

9 0.84675127 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

10 0.8348937 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

11 0.75924069 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

12 0.70306283 239 hunch net-2007-04-18-$50K Spock Challenge

13 0.68652904 26 hunch net-2005-02-21-Problem: Cross Validation

14 0.6463443 82 hunch net-2005-06-17-Reopening RL->Classification

15 0.6429531 19 hunch net-2005-02-14-Clever Methods of Overfitting

16 0.63789648 131 hunch net-2005-11-16-The Everything Ensemble Edge

17 0.6293661 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

18 0.62396538 14 hunch net-2005-02-07-The State of the Reduction

19 0.62328351 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

20 0.61454636 259 hunch net-2007-08-19-Choice of Metrics