hunch_net-2005-72: knowledge graph by maker-knowledge-mining

72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions


meta info for this blog

Source: html

Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers you reduce to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy, the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. The use of “regret” is more desirable…
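As a rough illustration of the two styles of guarantee described above, here is a minimal Python sketch. The function names and the example numbers are invented for illustration; only the constants come from the bounds quoted in the post (the 4 * error bound for ECOC and the 4 * square-root regret bound for the probabilistic variant).

```python
def regret(error_rate, minimum_error_rate):
    """Regret = error rate minus the minimum (Bayes) error rate."""
    return error_rate - minimum_error_rate

def ecoc_error_bound(binary_error_rate):
    """Error limiting guarantee: multiclass error <= 4 * binary error."""
    return 4 * binary_error_rate

def pecoc_regret_bound(binary_regret):
    """Regret minimizing guarantee: multiclass regret <= 4 * (binary regret)**0.5."""
    return 4 * binary_regret ** 0.5

# A noisy example: binary minimum error rate 0.2, learned binary error rate 0.25.
binary_error, binary_minimum = 0.25, 0.2
print(ecoc_error_bound(binary_error))                            # 1.0 (vacuous)
print(pecoc_regret_bound(regret(binary_error, binary_minimum)))  # ~0.894 (still informative)
```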


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This post is about a reductions-related problem that I find mysterious. [sent-1, score-0.063]

2 There are two kinds of reductions analysis currently under consideration. [sent-2, score-0.204]

3 Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. [sent-4, score-2.626]

4 A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. [sent-5, score-2.823]

5 Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers you reduce to. [sent-7, score-2.109]

6 The regret is the error rate minus the minimum error rate. [sent-8, score-1.877]

7 When the learning problem is noisy, the minimum error rate may not be 0. [sent-9, score-0.911]

8 An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. [sent-10, score-1.462]

9 The use of “regret” is more desirable than the use of error rates, because (for example) the ECOC error rate bound implies nothing when there is enough noise so that the binary classifiers always have error rate 0.25. [sent-12, score-2.99]

10 However, the square root dependence introduced when analyzing regret is not desirable. [sent-14, score-0.807]

11 Can we find some algorithm doing multiclass classification with binary classifiers such that average regret r for the binary classifiers implies average regret bounded by 4r for the multiclass classifier? [sent-16, score-2.838]

12 If the answer is “yes”, that reduction algorithm may be empirically superior to the one we use now. [sent-17, score-0.227]

13 If the answer is “no”, that is a sharp and unexpected distinction between error rate analysis and regret analysis. [sent-18, score-1.533]
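Sentence 4 above (and the introduction) refer to the error correcting output code reduction. The sketch below is only illustrative: the 3-bit codeword matrix and the use of scikit-learn's LogisticRegression as the base binary learner are arbitrary choices, and no claim is made that this particular code achieves the 4 * error guarantee.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # arbitrary base binary learner

# Each of the 4 classes gets a binary codeword; one binary classifier is
# trained per code bit, and prediction decodes by Hamming distance.
CODE = np.array([[0, 0, 1],   # class 0
                 [0, 1, 0],   # class 1
                 [1, 0, 0],   # class 2
                 [1, 1, 1]])  # class 3

def ecoc_train(X, y):
    # One binary problem per column of the code matrix.
    return [LogisticRegression().fit(X, CODE[y, b]) for b in range(CODE.shape[1])]

def ecoc_predict(classifiers, X):
    bits = np.column_stack([c.predict(X) for c in classifiers])
    # Predict the class whose codeword is closest in Hamming distance.
    distances = np.array([np.abs(bits - codeword).sum(axis=1) for codeword in CODE])
    return distances.argmin(axis=0)
```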


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('regret', 0.477), ('error', 0.477), ('binary', 0.314), ('rate', 0.266), ('classifiers', 0.249), ('multiclass', 0.197), ('classifier', 0.163), ('codes', 0.132), ('correcting', 0.125), ('bound', 0.121), ('minimum', 0.106), ('analysis', 0.103), ('created', 0.096), ('output', 0.096), ('average', 0.089), ('worlds', 0.088), ('ecoc', 0.077), ('terms', 0.077), ('minus', 0.074), ('distinction', 0.074), ('root', 0.074), ('goal', 0.073), ('square', 0.071), ('sharp', 0.071), ('introduced', 0.068), ('implies', 0.067), ('minimizing', 0.066), ('answer', 0.065), ('find', 0.063), ('noisy', 0.062), ('reduced', 0.062), ('analyzing', 0.061), ('limiting', 0.06), ('superior', 0.058), ('desirable', 0.057), ('use', 0.056), ('dependence', 0.056), ('bounded', 0.056), ('nothing', 0.054), ('rates', 0.053), ('noise', 0.053), ('certain', 0.052), ('kinds', 0.052), ('currently', 0.049), ('empirically', 0.048), ('reduce', 0.047), ('yes', 0.047), ('probabilistic', 0.047), ('prove', 0.047), ('code', 0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

2 0.52366138 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

Introduction: This post is about an open problem in learning reductions. Background: A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e - b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem: This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we cannot rule out the possibility that a reduction…
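A quick numeric comparison of the guarantees mentioned in this entry, with an arbitrary binary regret value chosen only for illustration:

```python
# k = 2 case from the entry above: the identity reduction carries binary
# regret r through unchanged, while the PECOC guarantee gives 4 * r**0.5.
# The hypothesized linear guarantee from the main post is shown for contrast.
r = 0.01
print("identity reduction:        ", r)             # 0.01
print("PECOC guarantee:           ", 4 * r ** 0.5)  # 0.4
print("hypothesized 4r guarantee: ", 4 * r)         # 0.04
```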

3 0.40632313 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

Introduction: This problem has been cracked (but not quite completely solved) by Alina, Pradeep, and me. The problem is essentially finding a better way to reduce multiclass classification to binary classification. The solution is to use a carefully crafted tournament, the simplest version of which is a single elimination tournament where the “players” are the different classes. An example of the structure is here: For the single elimination tournament, we can prove that: For all multiclass problems D, for all learned binary classifiers c, the regret of an induced multiclass classifier is bounded by the regret of the binary classifier times log_2 k. Restated: reg_multiclass(D, Filter_tree_test(c)) <= reg_binary(Filter_tree_train(D), c). Here: Filter_tree_train(D) is the induced binary classification problem, Filter_tree_test(c) is the induced multiclass classifier, and reg_multiclass is the multiclass regret (= difference between error rate and minimum error rate…
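Prediction with the single elimination tournament described above is easy to sketch. In the Python sketch below, match_classifiers[(a, b)] is assumed to be an already trained binary classifier that returns the winning class label for the match between classes a and b; constructing those binary training problems from multiclass data (the Filter_tree_train step) is omitted.

```python
# Single elimination tournament prediction over the classes (illustrative).
# About log2(k) rounds are played, matching the log_2 k factor in the bound above.
def tournament_predict(x, classes, match_classifiers):
    players = list(classes)
    while len(players) > 1:
        next_round = []
        # Pair up the remaining players; an odd player out gets a bye.
        for i in range(0, len(players) - 1, 2):
            a, b = players[i], players[i + 1]
            next_round.append(match_classifiers[(a, b)].predict(x))  # winning label
        if len(players) % 2 == 1:
            next_round.append(players[-1])
        players = next_round
    return players[0]
```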

4 0.3958911 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r…

5 0.31093696 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

Introduction: For learning reductions we have been concentrating on reducing various complex learning problems to binary classification. This choice needs to be actively questioned, because it was not carefully considered. Binary classification is learning a classifier c: X -> {0,1} so as to minimize the probability of being wrong, Pr_{x,y~D}(c(x) ≠ y). The primary alternative candidate seems to be squared error regression. In squared error regression, you learn a regressor s: X -> [0,1] so as to minimize squared error, E_{x,y~D}(s(x)-y)^2. It is difficult to judge one primitive against another. The judgement must at least partially be made on nontheoretical grounds because (essentially) we are evaluating a choice between two axioms/assumptions. These two primitives are significantly related. Classification can be reduced to regression in the obvious way: you use the regressor to predict D(y=1|x), then threshold at 0.5. For this simple reduction a squared error regret of r…
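The reduction described at the end of this entry (regress on the 0/1 labels to estimate D(y=1|x), then threshold at 0.5) is short enough to write out. The sketch below uses scikit-learn's LinearRegression purely as a stand-in for any squared error regressor; the point is the threshold at 0.5, not the particular regressor.

```python
from sklearn.linear_model import LinearRegression  # arbitrary squared error regressor

# Classification reduced to regression: fit a regressor to the 0/1 labels
# (an estimate of D(y=1|x)) and threshold its predictions at 0.5.
def train_threshold_classifier(X, y):  # y is an array of 0/1 labels
    regressor = LinearRegression().fit(X, y)
    return lambda X_new: (regressor.predict(X_new) >= 0.5).astype(int)
```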

6 0.26540032 78 hunch net-2005-06-06-Exact Online Learning for Classification

7 0.19299772 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

8 0.18433957 439 hunch net-2011-08-01-Interesting papers at COLT 2011

9 0.17659192 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

10 0.17495666 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

11 0.17340034 35 hunch net-2005-03-04-The Big O and Constants in Learning

12 0.16787346 67 hunch net-2005-05-06-Don’t mix the solution into the problem

13 0.16671206 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

14 0.16592364 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

15 0.14983985 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

16 0.14081267 43 hunch net-2005-03-18-Binomial Weighting

17 0.13765107 218 hunch net-2006-11-20-Context and the calculation misperception

18 0.13282233 170 hunch net-2006-04-06-Bounds greater than 1

19 0.13089095 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

20 0.12824498 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.17), (1, 0.229), (2, 0.176), (3, -0.186), (4, -0.069), (5, -0.102), (6, 0.409), (7, -0.157), (8, -0.369), (9, -0.019), (10, -0.151), (11, 0.017), (12, 0.019), (13, -0.14), (14, -0.025), (15, 0.001), (16, -0.083), (17, 0.057), (18, 0.069), (19, -0.009), (20, -0.018), (21, -0.003), (22, 0.067), (23, 0.107), (24, 0.048), (25, -0.049), (26, 0.051), (27, -0.019), (28, -0.004), (29, -0.066), (30, -0.022), (31, 0.042), (32, -0.008), (33, 0.01), (34, 0.065), (35, -0.048), (36, 0.035), (37, -0.118), (38, -0.055), (39, -0.028), (40, -0.028), (41, -0.027), (42, -0.028), (43, -0.015), (44, -0.005), (45, 0.013), (46, 0.016), (47, -0.11), (48, 0.056), (49, 0.025)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99685597 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

2 0.93725395 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

3 0.91678464 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

4 0.76698852 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

5 0.71734035 14 hunch net-2005-02-07-The State of the Reduction

6 0.66499907 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

7 0.64949197 78 hunch net-2005-06-06-Exact Online Learning for Classification

8 0.5913713 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

9 0.56937385 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

10 0.49754584 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

11 0.45901015 176 hunch net-2006-05-01-A conversation between Theo and Pat

12 0.45876342 439 hunch net-2011-08-01-Interesting papers at COLT 2011

13 0.45445272 35 hunch net-2005-03-04-The Big O and Constants in Learning

14 0.43486232 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

15 0.42315075 218 hunch net-2006-11-20-Context and the calculation misperception

16 0.41556802 67 hunch net-2005-05-06-Don’t mix the solution into the problem

17 0.40855819 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

18 0.38801178 170 hunch net-2006-04-06-Bounds greater than 1

19 0.38589102 43 hunch net-2005-03-18-Binomial Weighting

20 0.37610441 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.063), (27, 0.375), (38, 0.236), (48, 0.044), (55, 0.052), (83, 0.061), (94, 0.03)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98236251 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

2 0.97457296 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

Introduction: Here are a few of the papers I enjoyed at ICML. Steffen Bickel, Michael Brüeckner, Tobias Scheffer, Discriminative Learning for Differing Training and Test Distributions. There is a nice trick in this paper: they predict the probability that an unlabeled sample is in the training set vs. the test set, and then use this prediction to importance weight labeled samples in the training set. This paper uses a specific parametric model, but the approach is easily generalized. Steve Hanneke, A Bound on the Label Complexity of Agnostic Active Learning. This paper bounds the number of labels required by the A^2 algorithm for active learning in the agnostic case. Last year we figured out agnostic active learning was possible. This year, it’s quantified. Hopefully soon, it will be practical. Sylvian Gelly, David Silver, Combining Online and Offline Knowledge in UCT. This paper is about techniques for improving MoGo with various sorts of learning. MoGo has a fair…
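The Bickel et al. trick summarized above can be sketched generically. The paper uses a specific parametric model, so the logistic regression discriminator below is only an illustrative stand-in, and the variable names are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Learn to discriminate training inputs from test inputs, then importance
# weight each labeled training example by an estimate of p_test(x)/p_train(x).
def estimate_importance_weights(X_train, X_test):
    X = np.vstack([X_train, X_test])
    is_test = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    discriminator = LogisticRegression().fit(X, is_test)
    p_test = discriminator.predict_proba(X_train)[:, 1]
    # Odds ratio, proportional to p_test(x)/p_train(x) up to the sample size ratio.
    return p_test / (1 - p_test)
```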

3 0.94887871 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as: A mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification). A mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem: For all base predictors b, for all distributions D over examples of type T: E_{(x,y)~D} L_T(y, Q(b,x)) <= f(E_{(x’,y’)~R(D)} L_{T’}(y’, b(x’))). Here L_T is the loss for the type T problem and L_{T’} is the loss for the type T’ problem. Also, R(D) is the distribution ov…
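The pair of mappings (R, Q) in this definition can be written down as type signatures. The Python sketch below only mirrors the types in the definition (the names are invented); it is not any particular reduction.

```python
from typing import Callable, Iterable, Tuple, TypeVar

X = TypeVar("X")      # inputs
YT = TypeVar("YT")    # labels of type T (e.g. one of k classes)
YTp = TypeVar("YTp")  # labels of type T' (e.g. binary)

# R maps samples of type T to samples of type T'.
SampleMapR = Callable[[Iterable[Tuple[X, YT]]], Iterable[Tuple[X, YTp]]]

# Q maps a predictor for type T' to a predictor for type T,
# so the theorem's Q(b, x) corresponds to Q(b)(x).
PredictorMapQ = Callable[[Callable[[X], YTp]], Callable[[X], YT]]
```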

4 0.93772358 353 hunch net-2009-05-08-Computability in Artificial Intelligence

Introduction: Normally I do not blog, but John kindly invited me to do so. Since computability issues play a major role in Artificial Intelligence and Machine Learning, I would like to take the opportunity to comment on that and raise some questions. The general attitude is that AI is about finding efficient smart algorithms. For large parts of machine learning, the same attitude is not too dangerous. If you want to concentrate on conceptual problems, simply become a statistician. There is no analogous escape for modern research on AI (as opposed to GOFAI rooted in logic). Let me show by analogy why limiting research to computational questions is bad for any field. Except in computer science, computational aspects play little role in the development of fundamental theories: Consider e.g. set theory with axiom of choice, foundations of logic, exact/full minimax for zero-sum games, quantum (field) theory, string theory, … Indeed, at least in physics, every new fundamental theory seems to

5 0.93086874 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background: One branch of learning theory focuses on theorems which: Assume samples are drawn IID from an unknown distribution D. Fix a set of classifiers. Find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current: Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da…
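For context, here is the most standard bound of this general form: Hoeffding's inequality plus a union bound over a finite set of classifiers. This is not the bound analyzed by Srebro and Ben-David; it is only meant to show how such bounds can exceed 1, and hence say nothing, when the sample is small relative to the log of the number of classifiers.

```python
import math

# With probability at least 1 - delta, for every one of the num_classifiers
# classifiers: true error <= empirical error + sqrt(ln(|H|/delta) / (2m)).
def finite_class_bound(empirical_error, num_classifiers, m, delta=0.05):
    return empirical_error + math.sqrt(math.log(num_classifiers / delta) / (2 * m))

print(finite_class_bound(0.3, num_classifiers=10**6, m=100))  # ~0.59
print(finite_class_bound(0.3, num_classifiers=10**6, m=10))   # ~1.22, greater than 1
```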

6 0.91005063 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

7 0.90477043 259 hunch net-2007-08-19-Choice of Metrics

8 0.90313911 14 hunch net-2005-02-07-The State of the Reduction

9 0.90194333 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

10 0.90184766 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

11 0.90111738 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

12 0.89707464 133 hunch net-2005-11-28-A question of quantification

13 0.89104182 233 hunch net-2007-02-16-The Forgetting

14 0.87676531 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

15 0.87665111 131 hunch net-2005-11-16-The Everything Ensemble Edge

16 0.87654155 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

17 0.87556589 26 hunch net-2005-02-21-Problem: Cross Validation

18 0.87058717 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

19 0.86883157 289 hunch net-2008-02-17-The Meaning of Confidence

20 0.86568779 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”