hunch_net hunch_net-2005 hunch_net-2005-83 knowledge-graph by maker-knowledge-mining

83 hunch net-2005-06-18-Lower Bounds for Learning Reductions


meta info for this blog

Source: html

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and a learning algorithm for problems of type A where regret r on induced problems implies regret at least f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here. At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understanding of how feasible they are or which techniques may be useful in proving them.
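To make the quantifier structure of these two statements explicit, here is a minimal formalization. The notation is mine rather than the post's: reg_A denotes regret on the induced subproblems of type A, and reg_B denotes regret on the original problem of type B.

```latex
% Upper bound (what a reduction analysis establishes): a single reduction R
% works for every problem of type B and every base algorithm for type A.
\exists R \;\; \forall\, \text{problems of type } B \;\; \forall\, \text{algorithms for } A:\quad
  \mathrm{reg}_A \le r \;\Longrightarrow\; \mathrm{reg}_B \le f(r)

% Lower bound (what this post asks for): no reduction can do better than f.
\forall R \;\; \exists\, \text{a problem of type } B \;\; \exists\, \text{an algorithm for } A:\quad
  \mathrm{reg}_A \le r \;\text{ and }\; \mathrm{reg}_B \ge f(r)
```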


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. [sent-1, score-1.695]

2 When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. [sent-2, score-1.868]

3 A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and a learning algorithm for problems of type A where regret r on induced problems implies regret at least f(r) for B”. [sent-3, score-2.485]

4 The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. [sent-4, score-0.912]

5 Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. [sent-5, score-0.524]

6 At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understanding of how feasible they are or which techniques may be useful in proving them. [sent-7, score-0.896]

7 Here is a rough summary of what I know: For structured prediction , we have a partially worked out lower bound for all reductions using the structure to only carry single bits . [sent-8, score-1.441]

8 A proof for reductions using the structure in other ways seems tricky at the moment. [sent-9, score-0.455]

9 For Reinforcement learning it may (this is unclear) be possible to prove a lower bound showing that prediction ability alone can not solve RL well. [sent-10, score-0.825]

10 There are various results which can be thought of as lower bounds for more limited families of reductions. [sent-11, score-0.677]

11 One example is analyzing exactly how badly margin optimization can underperform for 0-1 loss when there is noise. [sent-12, score-0.407]

12 Overall, this is a moderately interesting direction of research which has not been much investigated. [sent-13, score-0.166]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('type', 0.436), ('lower', 0.36), ('reductions', 0.272), ('solver', 0.233), ('regret', 0.229), ('bounds', 0.209), ('learnable', 0.187), ('bound', 0.16), ('investigated', 0.117), ('reduction', 0.112), ('families', 0.108), ('carry', 0.108), ('structure', 0.108), ('moderately', 0.102), ('pursuit', 0.102), ('questionable', 0.097), ('alone', 0.093), ('loss', 0.093), ('coherent', 0.09), ('induced', 0.09), ('implies', 0.088), ('margin', 0.087), ('feasible', 0.085), ('subproblems', 0.085), ('transform', 0.085), ('proving', 0.083), ('robustness', 0.081), ('analyzing', 0.081), ('tool', 0.081), ('summary', 0.079), ('rough', 0.079), ('rl', 0.075), ('tricky', 0.075), ('form', 0.073), ('moment', 0.073), ('unlike', 0.073), ('badly', 0.073), ('exactly', 0.073), ('prediction', 0.072), ('analyze', 0.071), ('upper', 0.071), ('showing', 0.071), ('bits', 0.07), ('property', 0.07), ('original', 0.07), ('may', 0.069), ('partially', 0.067), ('unclear', 0.067), ('structured', 0.066), ('direction', 0.064)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions


2 0.43694499 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X x {0,1}, X x [0,1], X x {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as (1) a mapping R from samples of one type T (like multiclass classification) to samples of another type T’ (like binary classification), and (2) a mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem: For all base predictors b, for all distributions D over examples of type T: E_{(x,y)~D} L_T(y, Q(b,x)) <= f( E_{(x’,y’)~R(D)} L_{T’}(y’, b(x’)) ). Here L_T is the loss for the type T problem and L_{T’} is the loss for the type T’ problem. Also, R(D) is the distribution ov
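As a concrete illustration of the (R, Q) pair above, here is a minimal sketch of a multiclass-to-binary reduction in Python. The one-vs-all construction and all names and types are my own choices for illustration, not code from the post.

```python
from typing import Callable, List, Tuple

# A sample of type T (multiclass): (features x, label y in {0, ..., k-1}).
# An induced sample of type T' (binary): ((x, class index i), 1 if y == i else 0).
MulticlassSample = Tuple[list, int]
BinarySample = Tuple[Tuple[list, int], int]

def R(samples: List[MulticlassSample], k: int) -> List[BinarySample]:
    """Map samples of type T to samples of type T' (one binary example per class)."""
    return [((x, i), int(y == i)) for x, y in samples for i in range(k)]

def Q(b: Callable[[Tuple[list, int]], float], k: int) -> Callable[[list], int]:
    """Map a predictor for type T' back to a predictor for type T."""
    def multiclass_predictor(x: list) -> int:
        # Predict the class whose induced binary subproblem b scores highest.
        return max(range(k), key=lambda i: b((x, i)))
    return multiclass_predictor
```

The theorem quoted above then bounds the type-T loss of Q(b) under D in terms of the type-T’ loss of b under the induced distribution R(D).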

3 0.24307716 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

Introduction: This post is some combination of belaboring the obvious and speculating wildly about the future. The basic issue to be addressed is how to think about machine learning in terms given to us by Programming Language theory. Types and Reductions John’s research programme (I feel this should be in British spelling to reflect the grandiosity of the idea…) of machine learning reductions StateOfReduction is at some essential level type-theoretic in nature. The fundamental elements are the classifier, a function f: alpha -> beta, and the corresponding classifier trainer g: List of (alpha,beta) -> (alpha -> beta). The research goal is to create *combinators* that produce new f’s and g’s given existing ones. John (probably quite rightly) seems unwilling at the moment to commit to any notion stronger than that these combinators are correctly typed. One way to see the result of a reduction is something typed like: (For those denied the joy of the Hindley-Milner type system, “simple” is probab
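A minimal transliteration of those signatures into Python's typing module may make the combinator idea concrete. This is my own sketch under the f/g notation quoted above; none of the names below come from the post.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")  # alpha: the input (feature) type
B = TypeVar("B")  # beta: the label type of the target problem
C = TypeVar("C")  # label type of the base problem being reduced to

# The two fundamental elements quoted above:
#   a classifier         f : alpha -> beta
#   a classifier trainer g : List of (alpha, beta) -> (alpha -> beta)
Classifier = Callable[[A], B]
Trainer = Callable[[List[Tuple[A, B]]], Callable[[A], B]]

def reduce_trainer(
    encode: Callable[[List[Tuple[A, B]]], List[Tuple[A, C]]],
    decode: Callable[[Callable[[A], C]], Callable[[A], B]],
    base_trainer: Callable[[List[Tuple[A, C]]], Callable[[A], C]],
) -> Callable[[List[Tuple[A, B]]], Callable[[A], B]]:
    """A combinator: from a trainer for the base type, build a trainer for the new type."""
    def trainer(data: List[Tuple[A, B]]) -> Callable[[A], B]:
        return decode(base_trainer(encode(data)))
    return trainer
```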

4 0.21348301 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r

5 0.20925155 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

Introduction: This post is about an open problem in learning reductions. Background A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e - b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we can not rule out the possibility that a reduction
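Plugging in a number shows how large the gap between the PECOC guarantee and the identity reduction can be; the value r = 0.01 below is chosen purely for illustration.

```latex
% The PECOC guarantee quoted above, instantiated at binary regret r = 0.01:
\mathrm{reg}_{\text{multiclass}} \;\le\; 4\sqrt{r} \;=\; 4\sqrt{0.01} \;=\; 0.4
% For k = 2 the identity reduction gives regret at most r = 0.01, a factor-of-40
% gap, which is why the square-root dependence is suspected to be loose.
```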

6 0.20255633 41 hunch net-2005-03-15-The State of Tight Bounds

7 0.16592364 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

8 0.15867627 82 hunch net-2005-06-17-Reopening RL->Classification

9 0.13593148 351 hunch net-2009-05-02-Wielding a New Abstraction

10 0.12904753 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

11 0.12530644 103 hunch net-2005-08-18-SVM Adaptability

12 0.12263677 343 hunch net-2009-02-18-Decision by Vetocracy

13 0.11950964 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

14 0.11806416 202 hunch net-2006-08-10-Precision is not accuracy

15 0.11769285 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

16 0.1137845 332 hunch net-2008-12-23-Use of Learning Theory

17 0.11321035 235 hunch net-2007-03-03-All Models of Learning have Flaws

18 0.11072067 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

19 0.10729883 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

20 0.10430884 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.204), (1, 0.176), (2, 0.095), (3, -0.105), (4, -0.082), (5, -0.067), (6, 0.247), (7, -0.073), (8, -0.123), (9, -0.042), (10, -0.03), (11, -0.008), (12, -0.057), (13, 0.065), (14, 0.034), (15, 0.001), (16, 0.133), (17, -0.001), (18, -0.036), (19, -0.094), (20, 0.104), (21, -0.099), (22, -0.17), (23, -0.106), (24, -0.025), (25, 0.018), (26, -0.072), (27, 0.067), (28, 0.081), (29, 0.099), (30, 0.103), (31, -0.02), (32, -0.066), (33, 0.018), (34, -0.084), (35, -0.161), (36, -0.166), (37, 0.002), (38, -0.125), (39, 0.006), (40, -0.04), (41, 0.061), (42, 0.006), (43, -0.025), (44, -0.066), (45, 0.054), (46, 0.025), (47, 0.085), (48, -0.054), (49, 0.067)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9691909 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions


2 0.84947103 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions


3 0.68269604 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?


4 0.5822407 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?


5 0.57483113 103 hunch net-2005-08-18-SVM Adaptability

Introduction: Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions. This is a good thing because it is implausible that the loss function imposed by the world can not be taken into account in the process of solving a prediction problem. Even people used to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated can allow both computational effort and design effort to be focused so as to improve performance. A current laundry list of capabilities includes: 2002 multiclass SVM including arbitrary cost matrices ICML 2003 Hidden Markov Models NIPS 2003 Markov Networks (see some discussion) EMNLP 2004 Context free grammars ICML 2004 Any loss (with much computation) ICML 2005 Any constrained linear prediction model (that’s my own

6 0.56728411 14 hunch net-2005-02-07-The State of the Reduction

7 0.55041432 82 hunch net-2005-06-17-Reopening RL->Classification

8 0.48545063 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

9 0.47676063 351 hunch net-2009-05-02-Wielding a New Abstraction

10 0.45286363 170 hunch net-2006-04-06-Bounds greater than 1

11 0.45252258 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

12 0.44878191 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

13 0.44466454 41 hunch net-2005-03-15-The State of Tight Bounds

14 0.43649265 85 hunch net-2005-06-28-A COLT paper

15 0.42837697 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

16 0.418607 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

17 0.41601008 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

18 0.41467676 244 hunch net-2007-05-09-The Missing Bound

19 0.41213524 421 hunch net-2011-01-03-Herman Goldstine 2011

20 0.40586811 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.015), (27, 0.193), (34, 0.011), (38, 0.532), (53, 0.055), (55, 0.033), (94, 0.021), (95, 0.033)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98364305 125 hunch net-2005-10-20-Machine Learning in the News

Introduction: The New York Times had a short interview about machine learning in data mining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as Google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.

2 0.97451627 339 hunch net-2009-01-27-Key Scientific Challenges

Introduction: Yahoo released the Key Scientific Challenges program. There is a Machine Learning list I worked on and a Statistics list which Deepak worked on. I’m hoping this is taken quite seriously by graduate students. The primary value is that it gave us a chance to sit down and publicly specify directions of research which would be valuable to make progress on. A good strategy for a beginning graduate student is to pick one of these directions, pursue it, and make substantial advances for a PhD. The directions are sufficiently general that I’m sure any serious advance has applications well beyond Yahoo. A secondary point (which I’m sure is primary for many) is that there is money for graduate students here. It’s unrestricted, so you can use it for any reasonable travel, supplies, etc…

3 0.96373552 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?


same-blog 4 0.96162915 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions


5 0.94754273 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS

Introduction: Manik and I are organizing the extreme classification workshop at NIPS this year. We have a number of good speakers lined up, but I would further encourage anyone working in the area to submit an abstract by October 9. I believe this is an idea whose time has now come. The NIPS website doesn’t have other workshops listed yet, but I expect several others to be of significant interest.

6 0.92579544 170 hunch net-2006-04-06-Bounds greater than 1

7 0.85108572 353 hunch net-2009-05-08-Computability in Artificial Intelligence

8 0.85096222 233 hunch net-2007-02-16-The Forgetting

9 0.81733572 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

10 0.79859167 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

11 0.71574903 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

12 0.69216204 239 hunch net-2007-04-18-$50K Spock Challenge

13 0.65830106 26 hunch net-2005-02-21-Problem: Cross Validation

14 0.62907439 82 hunch net-2005-06-17-Reopening RL->Classification

15 0.61674756 19 hunch net-2005-02-14-Clever Methods of Overfitting

16 0.60677147 131 hunch net-2005-11-16-The Everything Ensemble Edge

17 0.59362561 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

18 0.58377773 14 hunch net-2005-02-07-The State of the Reduction

19 0.5790332 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

20 0.57671064 284 hunch net-2008-01-18-Datasets