hunch_net hunch_net-2007 hunch_net-2007-236 knowledge-graph by maker-knowledge-mining

236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions


meta info for this blog

Source: html

Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as (1) a mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification), and (2) a mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem. For all base predictors b, for all distributions D over examples of type T: E_{(x,y)~D} L_T(y, Q(b,x)) <= f(E_{(x',y')~R(D)} L_{T'}(y', b(x'))). Here L_T is the loss for the type T problem and L_{T'} is the loss for the type T’ problem. Also, R(D) is the distribution over samples induced by first drawing from D and then mapping the sample via R.
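Below is a minimal sketch of the (R, Q) view of a reduction described above, using a one-against-all style multiclass-to-binary mapping purely as an illustration; the names R, Q, K, and the tie-breaking rule are assumptions of this sketch, not the post's construction.

K = 3  # number of multiclass labels in this toy illustration

def R(sample):
    """Map one multiclass sample (x, y) to K binary samples ((x, i), [y == i])."""
    x, y = sample
    return [((x, i), int(y == i)) for i in range(K)]

def Q(b):
    """Turn a binary predictor b into a multiclass predictor for type T."""
    def multiclass_predictor(x):
        # Predict the label whose one-against-all subproblem the base
        # predictor scores highest (ties broken toward the smallest label).
        return max(range(K), key=lambda i: b((x, i)))
    return multiclass_predictor

Here Q(b) queries the base predictor once per candidate label, and sampling from R(D) corresponds to drawing (x, y) from D and then emitting the K induced binary examples, matching the induced distribution R(D) used in the theorem.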


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). [sent-1, score-1.23]

2 For this post, we can think of a learning reduction as a mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification). [sent-3, score-1.341]

3 A mapping Q from predictors for type T’ to predictors for type T. [sent-4, score-1.569]

4 The simplest sort of learning reduction is a “loss reduction”. [sent-5, score-0.162]

5 Also, R(D) is the distribution over samples induced by first drawing from D and then mapping the sample via R . [sent-7, score-0.644]

6 The function f() is the loss transform function—we try to find reductions R,Q which minimize its value. [sent-8, score-0.803]
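As a sanity check on a candidate loss transform f, one can estimate both sides of the theorem on a finite sample. The sketch below assumes the reduction (R, Q) and the two losses are supplied as Python callables in the style of the earlier sketch; the names check_loss_transform, L_T, L_Tp, and f are all illustrative.

def check_loss_transform(samples, b, R, Q, L_T, L_Tp, f):
    """Empirically compare E L_T(y, Q(b,x)) against f(E L_T'(y', b(x')))."""
    q = Q(b)  # predictor for the original type T problem
    lhs = sum(L_T(y, q(x)) for x, y in samples) / len(samples)
    induced = [ex for s in samples for ex in R(s)]  # samples from R(D)
    rhs_inner = sum(L_Tp(yp, b(xp)) for xp, yp in induced) / len(induced)
    return lhs, f(rhs_inner), lhs <= f(rhs_inner)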

7 If R,Q are deterministic, then there always exists a choice of D,b such that the loss rate on the right hand side is 0. [sent-9, score-0.278]

8 However, it’s common to encounter real-world learning problems D which are inherently noisy, implying that the induced problem D’ is often inherently noisy. [sent-10, score-0.342]

9 Distinguishing between errors due to environmental noise and errors due to base predictor mistakes seems important (and experimentally, it has been). [sent-11, score-0.568]

10 The skeletons of the theory for these families of reductions have been laid out at this point. [sent-14, score-0.474]

11 This hope is pretty reasonable—empirically, we have observed a consistent step up in performance going from loss transform to regret transform reductions. [sent-17, score-0.867]

12 The fact that the minimum is taken over all predictors in regret transforms is counterintuitive to some people, who are used to “Empirical Risk Minimization” statements where a minimum is taken over a limited set of predictors. [sent-19, score-0.993]
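For concreteness, here is one plausible shape for a regret transform statement, written as a LaTeX formula; it assumes the standard definition of regret as loss minus the minimum achievable loss, with the minimum taken over all predictors as noted above, and g standing for the regret transform function. This is a hedged reconstruction, not the post's exact statement.

\forall b,\ \forall D:\quad
\mathbb{E}_{(x,y)\sim D} L_T(y, Q(b,x)) \;-\; \min_{q}\, \mathbb{E}_{(x,y)\sim D} L_T(y, q(x))
\;\le\;
g\!\Bigl( \mathbb{E}_{(x',y')\sim R(D)} L_{T'}(y', b(x')) \;-\; \min_{b'}\, \mathbb{E}_{(x',y')\sim R(D)} L_{T'}(y', b'(x')) \Bigr)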

13 Essentially, we limit ourselves to reductions with the property that they are reversible. [sent-24, score-0.353]

14 Reversibility can be tested by mapping from one problem to another, and then back. [sent-25, score-0.245]
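As a concrete reading of that test, the short sketch below round-trips samples through a hypothetical pair of maps and checks that the originals are recovered; forward and backward are placeholder names, not maps defined in the post.

def is_reversible(samples, forward, backward):
    """Check reversibility by mapping each sample there and back."""
    return all(backward(forward(s)) == s for s in samples)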

15 There are several variant theorem statements we could imagine. [sent-26, score-0.36]

16 We would like it to be the case that Bayes Law and reductions compose. [sent-29, score-0.292]

17 A theorem statement of the following form might be about right. [sent-30, score-0.43]

18 Also, R(P) is the prior induced by mapping D to R(D) after drawing from P . [sent-32, score-0.528]
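One hedged guess at the shape of such a composed statement, as a LaTeX formula: P is a prior over distributions D of type T, R(P) is the induced prior described above, and Bayes(P') denotes the Bayes-optimal predictor under a prior P'. This is only a plausible form for illustration, not the post's actual theorem.

\mathbb{E}_{D\sim P}\; \mathbb{E}_{(x,y)\sim D}\; L_T\bigl(y,\ Q(\mathrm{Bayes}(R(P)), x)\bigr)
\;\le\;
f\!\Bigl( \mathbb{E}_{D'\sim R(P)}\; \mathbb{E}_{(x',y')\sim D'}\; L_{T'}\bigl(y',\ \mathrm{Bayes}(R(P))(x')\bigr) \Bigr)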

19 The two missing components for these kinds of reductions are: Theoretical evidence that we can satisfy these definitions of reduction between interesting types of learning problems. [sent-33, score-0.454]

20 My experience is that analyzing reductions has yielded significant insight into how to solve learning problems, so I would encourage anyone with a bit of theoretical inclination in Machine Learning to consider the above (or other) families of reductions. [sent-35, score-0.519]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('type', 0.42), ('reductions', 0.292), ('predictors', 0.268), ('transform', 0.245), ('base', 0.232), ('loss', 0.211), ('theorem', 0.208), ('mapping', 0.193), ('induced', 0.186), ('regret', 0.166), ('reduction', 0.162), ('distributions', 0.15), ('drawing', 0.149), ('min', 0.14), ('families', 0.134), ('bayes', 0.132), ('statement', 0.102), ('transforms', 0.096), ('statements', 0.087), ('minimum', 0.087), ('binary', 0.074), ('samples', 0.072), ('exists', 0.067), ('errors', 0.067), ('idea', 0.066), ('variant', 0.065), ('losses', 0.065), ('law', 0.062), ('limit', 0.061), ('form', 0.06), ('following', 0.06), ('classification', 0.059), ('noise', 0.058), ('function', 0.055), ('problem', 0.052), ('inherently', 0.052), ('taken', 0.052), ('limited', 0.05), ('counterintuitive', 0.048), ('layed', 0.048), ('yielded', 0.048), ('osindero', 0.048), ('simon', 0.048), ('environmental', 0.048), ('due', 0.048), ('empirical', 0.046), ('experimentally', 0.045), ('inclination', 0.045), ('modifications', 0.045), ('sample', 0.044)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions


2 0.43694499 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make a statement of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here . At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa

3 0.30574203 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

Introduction: This post is about an open problem in learning reductions. Background A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e . After subtracting the minimum possible (Bayes) error rate b , we get a regret r = e – b . The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5 . The problem This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r . This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k . For example, we cannot rule out the possibility that a reduction

4 0.26801041 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness . Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization . The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization . By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now . Typesafe r

5 0.23931739 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

Introduction: This post is some combination of belaboring the obvious and speculating wildly about the future. The basic issue to be addressed is how to think about machine learning in terms given to us from Programming Language theory. Types and Reductions John’s research programme (I feel this should be in British spelling to reflect the grandiousness of the idea…) of machine learning reductions StateOfReduction is at some essential level type-theoretic in nature. The fundamental elements are the classifier, a function f: alpha -> beta, and the corresponding classifier trainer g: List of (alpha,beta) -> (alpha -> beta). The research goal is to create *combinators* that produce new f’s and g’s given existing ones. John (probably quite rightly) seems unwilling at the moment to commit to any notion stronger than these combinators are correctly typed. One way to see the result of a reduction is something typed like: (For those denied the joy of the Hindley-Milner type system, “simple” is probab

6 0.22160807 9 hunch net-2005-02-01-Watchword: Loss

7 0.21577363 245 hunch net-2007-05-12-Loss Function Semantics

8 0.20180236 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

9 0.19682983 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

10 0.18430567 103 hunch net-2005-08-18-SVM Adaptability

11 0.17987184 133 hunch net-2005-11-28-A question of quantification

12 0.17495666 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

13 0.17052461 235 hunch net-2007-03-03-All Models of Learning have Flaws

14 0.16365212 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

15 0.1488315 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

16 0.13635972 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

17 0.132715 12 hunch net-2005-02-03-Learning Theory, by assumption

18 0.13153504 82 hunch net-2005-06-17-Reopening RL->Classification

19 0.13041152 259 hunch net-2007-08-19-Choice of Metrics

20 0.12891282 351 hunch net-2009-05-02-Wielding a New Abstraction


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.23), (1, 0.239), (2, 0.151), (3, -0.179), (4, -0.198), (5, 0.01), (6, 0.197), (7, -0.05), (8, -0.111), (9, -0.041), (10, -0.051), (11, -0.117), (12, -0.02), (13, 0.03), (14, 0.032), (15, 0.002), (16, 0.099), (17, -0.028), (18, 0.0), (19, -0.116), (20, 0.028), (21, -0.096), (22, -0.041), (23, -0.167), (24, -0.013), (25, -0.029), (26, -0.017), (27, 0.029), (28, 0.083), (29, 0.071), (30, 0.1), (31, -0.014), (32, -0.057), (33, 0.04), (34, -0.014), (35, -0.151), (36, -0.176), (37, 0.045), (38, -0.037), (39, 0.002), (40, -0.006), (41, 0.098), (42, 0.028), (43, -0.069), (44, -0.082), (45, 0.024), (46, -0.012), (47, 0.102), (48, -0.024), (49, 0.042)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98292619 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions


2 0.85643387 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions


3 0.66543853 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?


4 0.65827906 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?


5 0.63939279 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

Introduction: I’m offering a reward of $1000 for a solution to this problem. This joins the cross validation problem which I’m offering a $500 reward for. I believe both of these problems are hard but plausibly solvable, and plausibly with a solution of substantial practical value. While it’s unlikely these rewards are worth your time on an hourly wage basis, the recognition for solving them definitely should be The Problem The problem is finding a general, robust, and efficient mechanism for estimating a conditional probability P(y|x) where robustness and efficiency are measured using techniques from learning reductions. In particular, suppose we have access to a binary regression oracle B which has two interfaces—one for specifying training information and one for testing. Training information is specified as B(x’,y’) where x’ is a feature vector and y’ is a scalar in [0,1] with no value returned. Testing is done according to B(x’) with a value in [0,1] returned.

6 0.63776207 103 hunch net-2005-08-18-SVM Adaptability

7 0.62440503 14 hunch net-2005-02-07-The State of the Reduction

8 0.54233271 82 hunch net-2005-06-17-Reopening RL->Classification

9 0.51580757 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

10 0.51106411 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

11 0.46755531 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

12 0.46629095 9 hunch net-2005-02-01-Watchword: Loss

13 0.46013463 245 hunch net-2007-05-12-Loss Function Semantics

14 0.4550814 351 hunch net-2009-05-02-Wielding a New Abstraction

15 0.45301351 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

16 0.44684833 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

17 0.44058239 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

18 0.43922454 133 hunch net-2005-11-28-A question of quantification

19 0.43567586 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

20 0.42634669 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(0, 0.018), (3, 0.052), (17, 0.082), (27, 0.245), (38, 0.269), (53, 0.078), (55, 0.047), (77, 0.033), (94, 0.056)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96676135 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions


2 0.95193893 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions


3 0.95070213 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?


4 0.94942826 353 hunch net-2009-05-08-Computability in Artificial Intelligence

Introduction: Normally I do not blog, but John kindly invited me to do so. Since computability issues play a major role in Artificial Intelligence and Machine Learning, I would like to take the opportunity to comment on that and raise some questions. The general attitude is that AI is about finding efficient smart algorithms. For large parts of machine learning, the same attitude is not too dangerous. If you want to concentrate on conceptual problems, simply become a statistician. There is no analogous escape for modern research on AI (as opposed to GOFAI rooted in logic). Let me show by analogy why limiting research to computational questions is bad for any field. Except in computer science, computational aspects play little role in the development of fundamental theories: Consider e.g. set theory with axiom of choice, foundations of logic, exact/full minimax for zero-sum games, quantum (field) theory, string theory, … Indeed, at least in physics, every new fundamental theory seems to

5 0.94496632 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background One branch of learning theory focuses on theorems which Assume samples are drawn IID from an unknown distribution D . Fix a set of classifiers Find a high probability bound on the maximum true error rate (with respect to D ) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da

6 0.93741667 233 hunch net-2007-02-16-The Forgetting

7 0.9196533 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

8 0.90603364 339 hunch net-2009-01-27-Key Scientific Challenges

9 0.89064574 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

10 0.84545141 131 hunch net-2005-11-16-The Everything Ensemble Edge

11 0.838166 125 hunch net-2005-10-20-Machine Learning in the News

12 0.83666694 19 hunch net-2005-02-14-Clever Methods of Overfitting

13 0.83521175 26 hunch net-2005-02-21-Problem: Cross Validation

14 0.82950783 14 hunch net-2005-02-07-The State of the Reduction

15 0.82672411 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

16 0.82536006 259 hunch net-2007-08-19-Choice of Metrics

17 0.82413393 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

18 0.81507337 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS

19 0.81049407 143 hunch net-2005-12-27-Automated Labeling

20 0.80732059 82 hunch net-2005-06-17-Reopening RL->Classification