hunch_net hunch_net-2005 hunch_net-2005-83 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness, we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and a learning algorithm for problems of type A where regret r on induced problems implies regret at least f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here. At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understanding of how feasible they are or which techniques may be useful in proving them.
sentIndex sentText sentNum sentScore
1 Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. [sent-1, score-1.695]
2 When we analyze these for robustness, we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. [sent-2, score-1.868]
3 A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and a learning algorithm for problems of type A where regret r on induced problems implies regret at least f(r) for B”. [sent-3, score-2.485]
4 The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. [sent-4, score-0.912]
5 Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. [sent-5, score-0.524]
6 At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understanding of how feasible they are or which techniques may be useful in proving them. [sent-7, score-0.896]
7 Here is a rough summary of what I know: For structured prediction, we have a partially worked out lower bound for all reductions using the structure to only carry single bits. [sent-8, score-1.441]
8 A proof for reductions using the structure in other ways seems tricky at the moment. [sent-9, score-0.455]
9 For reinforcement learning, it may (this is unclear) be possible to prove a lower bound showing that prediction ability alone can not solve RL well. [sent-10, score-0.825]
10 There are various results which can be thought of as lower bounds for more limited families of reductions. [sent-11, score-0.677]
11 One example is analyzing exactly how badly margin optimization can underperform for 0-1 loss when there is noise. [sent-12, score-0.407]
12 Overall, this is a moderately interesting direction of research which has not been much investigated. [sent-13, score-0.166]
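The contrast between the upper-bound statement (sentence 2) and the lower-bound statement (sentence 3) is purely one of quantifier order; a LaTeX rendering of that structure, writing r_A for regret on the induced subproblems and r_B for regret on the original problem (shorthand introduced here, not notation from the post):

\[
\textbf{Upper bound (a property of one reduction } R\textbf{):}\quad
\forall\, \text{problems } B,\ \forall\, \text{learners for the induced } A\text{-problems}:\quad
r_A \le r \;\Longrightarrow\; r_B \le f(r)
\]
\[
\textbf{Lower bound (a limit on every reduction):}\quad
\forall\, R,\ \exists\, B,\ \exists\, \text{a learner for } A:\quad
r_A \le r \ \text{ and } \ r_B \ge f(r)
\]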
wordName wordTfidf (topN-words)
[('type', 0.436), ('lower', 0.36), ('reductions', 0.272), ('solver', 0.233), ('regret', 0.229), ('bounds', 0.209), ('learnable', 0.187), ('bound', 0.16), ('investigated', 0.117), ('reduction', 0.112), ('families', 0.108), ('carry', 0.108), ('structure', 0.108), ('moderately', 0.102), ('pursuit', 0.102), ('questionable', 0.097), ('alone', 0.093), ('loss', 0.093), ('coherent', 0.09), ('induced', 0.09), ('implies', 0.088), ('margin', 0.087), ('feasible', 0.085), ('subproblems', 0.085), ('transform', 0.085), ('proving', 0.083), ('robustness', 0.081), ('analyzing', 0.081), ('tool', 0.081), ('summary', 0.079), ('rough', 0.079), ('rl', 0.075), ('tricky', 0.075), ('form', 0.073), ('moment', 0.073), ('unlike', 0.073), ('badly', 0.073), ('exactly', 0.073), ('prediction', 0.072), ('analyze', 0.071), ('upper', 0.071), ('showing', 0.071), ('bits', 0.07), ('property', 0.07), ('original', 0.07), ('may', 0.069), ('partially', 0.067), ('unclear', 0.067), ('structured', 0.066), ('direction', 0.064)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
2 0.43694499 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
Introduction: A type of prediction problem is specified by the type of samples produced by a data source (Example: X × {0,1}, X × [0,1], X × {1,2,3,4,5}, etc…) and a loss function (0/1 loss, squared error loss, cost sensitive losses, etc…). For simplicity, we’ll assume that all losses have a minimum of zero. For this post, we can think of a learning reduction as: A mapping R from samples of one type T (like multiclass classification) to another type T’ (like binary classification). A mapping Q from predictors for type T’ to predictors for type T. The simplest sort of learning reduction is a “loss reduction”. The idea in a loss reduction is to prove a statement of the form: Theorem: For all base predictors b, for all distributions D over examples of type T: E_{(x,y) ~ D} L_T(y, Q(b,x)) <= f( E_{(x’,y’) ~ R(D)} L_{T’}(y’, b(x’)) ). Here L_T is the loss for the type T problem and L_{T’} is the loss for the type T’ problem. Also, R(D) is the distribution ov
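A minimal sketch of that (R, Q) formulation, assuming a plain Python encoding where a sample is an (x, y) pair and a predictor is a callable; the class name Reduction and the variable identity are illustrative choices, not names from the post. The identity reduction (binary to binary, discussed in the multiclass-to-binary entry below) is the simplest instance:

from typing import Any, Callable, List, Tuple

Sample = Tuple[Any, Any]          # an (x, y) pair; the label type differs between T and T'
Predictor = Callable[[Any], Any]  # a map from x to a predicted label

class Reduction:
    """A learning reduction as a pair of mappings:
    R takes samples of type T to samples of type T',
    Q takes predictors for T' back to predictors for T."""
    def __init__(self,
                 R: Callable[[List[Sample]], List[Sample]],
                 Q: Callable[[Predictor], Predictor]) -> None:
        self.R = R
        self.Q = Q

# Simplest instance: the identity reduction from binary to binary.
# Base regret r on the induced problem implies regret exactly r on the original,
# i.e. its loss/regret transform is f(r) = r.
identity = Reduction(R=lambda samples: samples, Q=lambda predictor: predictor)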
3 0.24307716 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
Introduction: This post is some combination of belaboring the obvious and speculating wildly about the future. The basic issue to be addressed is how to think about machine learning in terms given to us from Programming Language theory. Types and Reductions: John’s research programme (I feel this should be in British spelling to reflect the grandiosity of the idea…) of machine learning reductions StateOfReduction is at some essential level type-theoretic in nature. The fundamental elements are the classifier, a function f: alpha -> beta, and the corresponding classifier trainer g: List of (alpha,beta) -> (alpha -> beta). The research goal is to create *combinators* that produce new f’s and g’s given existing ones. John (probably quite rightly) seems unwilling at the moment to commit to any notion stronger than that these combinators are correctly typed. One way to see the result of a reduction is something typed like: (For those denied the joy of the Hindley-Milner type system, “simple” is probab
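A sketch of the two fundamental types above, plus one combinator, in Python's typing vocabulary. The one-versus-all construction is my own illustrative choice of combinator (the post does not name a specific one), beta is fixed to int labels for concreteness, and the hard 0/1 tie-breaking is deliberately naive:

from typing import Callable, List, Tuple, TypeVar

Alpha = TypeVar("Alpha")

# classifier          f : alpha -> beta                      (beta taken to be an int label)
Classifier = Callable[[Alpha], int]
# classifier trainer  g : List of (alpha, beta) -> (alpha -> beta)
Trainer = Callable[[List[Tuple[Alpha, int]]], Classifier]

def one_vs_all(binary_trainer: Trainer, labels: List[int]) -> Trainer:
    """Combinator: from a trainer of {0,1} classifiers, build a trainer of
    classifiers over `labels`, one induced binary subproblem per label."""
    def multiclass_trainer(data: List[Tuple[Alpha, int]]) -> Classifier:
        per_label = {
            k: binary_trainer([(x, int(y == k)) for x, y in data])
            for k in labels
        }
        # Return a label whose binary classifier predicts 1; if several (or none)
        # do, max falls back to the first such label in `labels`.
        return lambda x: max(labels, key=lambda k: per_label[k](x))
    return multiclass_trainer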
4 0.21348301 14 hunch net-2005-02-07-The State of the Reduction
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average-case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
5 0.20925155 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
Introduction: This post is about an open problem in learning reductions. Background: A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e – b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem: This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we can not rule out the possibility that a reduction
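The gap the post describes is easy to see numerically; the following sketch just tabulates the stated guarantees (PECOC's 4r^0.5 versus the identity reduction's r for the k=2 case), nothing more:

# Multiclass regret guaranteed by PECOC from binary regret r, versus what the
# identity reduction guarantees when k=2 (regret passes through unchanged).
def pecoc_bound(r: float) -> float:
    return 4 * r ** 0.5

def identity_bound(r: float) -> float:
    return r

for r in (0.0001, 0.001, 0.01, 0.04):
    print(f"binary regret {r}: PECOC bound {pecoc_bound(r):.3f}, identity bound {identity_bound(r):.4f}")
# At r = 0.01 the PECOC bound is already 0.400 while the identity reduction
# guarantees 0.0100 -- the sense in which 4r^0.5 is not the "rightest" answer at k=2.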
6 0.20255633 41 hunch net-2005-03-15-The State of Tight Bounds
7 0.16592364 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
8 0.15867627 82 hunch net-2005-06-17-Reopening RL->Classification
9 0.13593148 351 hunch net-2009-05-02-Wielding a New Abstraction
10 0.12904753 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
11 0.12530644 103 hunch net-2005-08-18-SVM Adaptability
12 0.12263677 343 hunch net-2009-02-18-Decision by Vetocracy
13 0.11950964 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
14 0.11806416 202 hunch net-2006-08-10-Precision is not accuracy
15 0.11769285 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification
16 0.1137845 332 hunch net-2008-12-23-Use of Learning Theory
17 0.11321035 235 hunch net-2007-03-03-All Models of Learning have Flaws
18 0.11072067 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
19 0.10729883 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
20 0.10430884 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
topicId topicWeight
[(0, 0.204), (1, 0.176), (2, 0.095), (3, -0.105), (4, -0.082), (5, -0.067), (6, 0.247), (7, -0.073), (8, -0.123), (9, -0.042), (10, -0.03), (11, -0.008), (12, -0.057), (13, 0.065), (14, 0.034), (15, 0.001), (16, 0.133), (17, -0.001), (18, -0.036), (19, -0.094), (20, 0.104), (21, -0.099), (22, -0.17), (23, -0.106), (24, -0.025), (25, 0.018), (26, -0.072), (27, 0.067), (28, 0.081), (29, 0.099), (30, 0.103), (31, -0.02), (32, -0.066), (33, 0.018), (34, -0.084), (35, -0.161), (36, -0.166), (37, 0.002), (38, -0.125), (39, 0.006), (40, -0.04), (41, 0.061), (42, 0.006), (43, -0.025), (44, -0.066), (45, 0.054), (46, 0.025), (47, 0.085), (48, -0.054), (49, 0.067)]
simIndex simValue blogId blogTitle
same-blog 1 0.9691909 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
2 0.84947103 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
3 0.68269604 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
4 0.5822407 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
5 0.57483113 103 hunch net-2005-08-18-SVM Adaptability
Introduction: Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions. This is a good thing because it is implausible that the loss function imposed by the world can not be taken into account in the process of solving a prediction problem. Even people used to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated can allow both computational effort and design effort to be focused so as to improve performance. A current laundry list of capabilities includes: 2002: multiclass SVM including arbitrary cost matrices; ICML 2003: Hidden Markov Models; NIPS 2003: Markov Networks (see some discussion); EMNLP 2004: Context free grammars; ICML 2004: Any loss (with much computation); ICML 2005: Any constrained linear prediction model (that’s my own
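As one concrete illustration of the kind of loss these optimizations absorb, here is the margin-rescaled, cost-sensitive multiclass hinge surrogate, written as my own sketch rather than taken from any of the papers listed above; scores[k] is a model's score for class k and cost[y][k] is an arbitrary cost matrix with zero diagonal:

from typing import List

def cost_sensitive_hinge(scores: List[float], y: int, cost: List[List[float]]) -> float:
    """Margin-rescaled hinge for multiclass with an arbitrary cost matrix:
    max over labels k of ( cost[y][k] + scores[k] ) - scores[y].
    With cost[y][y] = 0 this is always >= 0 and upper bounds cost[y][argmax scores]."""
    return max(cost[y][k] + scores[k] for k in range(len(scores))) - scores[y]

# Example: three classes, true label 0, and a cost matrix that penalizes
# confusing class 0 with class 2 more heavily than with class 1.
scores = [1.0, 0.5, 0.8]
cost = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
print(cost_sensitive_hinge(scores, y=0, cost=cost))  # 2.0 + 0.8 - 1.0 = 1.8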
6 0.56728411 14 hunch net-2005-02-07-The State of the Reduction
7 0.55041432 82 hunch net-2005-06-17-Reopening RL->Classification
8 0.48545063 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
9 0.47676063 351 hunch net-2009-05-02-Wielding a New Abstraction
10 0.45286363 170 hunch net-2006-04-06-Bounds greater than 1
11 0.45252258 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0
12 0.44878191 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
13 0.44466454 41 hunch net-2005-03-15-The State of Tight Bounds
14 0.43649265 85 hunch net-2005-06-28-A COLT paper
15 0.42837697 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
16 0.418607 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
17 0.41601008 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
18 0.41467676 244 hunch net-2007-05-09-The Missing Bound
19 0.41213524 421 hunch net-2011-01-03-Herman Goldstine 2011
20 0.40586811 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
topicId topicWeight
[(3, 0.015), (27, 0.193), (34, 0.011), (38, 0.532), (53, 0.055), (55, 0.033), (94, 0.021), (95, 0.033)]
simIndex simValue blogId blogTitle
1 0.98364305 125 hunch net-2005-10-20-Machine Learning in the News
Introduction: The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.
2 0.97451627 339 hunch net-2009-01-27-Key Scientific Challenges
Introduction: Yahoo released the Key Scientific Challenges program. There is a Machine Learning list I worked on and a Statistics list which Deepak worked on. I’m hoping this is taken quite seriously by graduate students. The primary value is that it gave us a chance to sit down and publicly specify directions of research which would be valuable to make progress on. A good strategy for a beginning graduate student is to pick one of these directions, pursue it, and make substantial advances for a PhD. The directions are sufficiently general that I’m sure any serious advance has applications well beyond Yahoo. A secondary point (which I’m sure is primary for many) is that there is money for graduate students here. It’s unrestricted, so you can use it for any reasonable travel, supplies, etc…
3 0.96373552 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
same-blog 4 0.96162915 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
5 0.94754273 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS
Introduction: Manik and I are organizing the extreme classification workshop at NIPS this year. We have a number of good speakers lined up, but I would further encourage anyone working in the area to submit an abstract by October 9. I believe this is an idea whose time has now come. The NIPS website doesn’t have other workshops listed yet, but I expect several others to be of significant interest.
6 0.92579544 170 hunch net-2006-04-06-Bounds greater than 1
7 0.85108572 353 hunch net-2009-05-08-Computability in Artificial Intelligence
8 0.85096222 233 hunch net-2007-02-16-The Forgetting
9 0.81733572 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
10 0.79859167 251 hunch net-2007-06-24-Interesting Papers at ICML 2007
11 0.71574903 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
12 0.69216204 239 hunch net-2007-04-18-$50K Spock Challenge
13 0.65830106 26 hunch net-2005-02-21-Problem: Cross Validation
14 0.62907439 82 hunch net-2005-06-17-Reopening RL->Classification
15 0.61674756 19 hunch net-2005-02-14-Clever Methods of Overfitting
16 0.60677147 131 hunch net-2005-11-16-The Everything Ensemble Edge
17 0.59362561 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
18 0.58377773 14 hunch net-2005-02-07-The State of the Reduction
19 0.5790332 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
20 0.57671064 284 hunch net-2008-01-18-Datasets