hunch_net hunch_net-2006 hunch_net-2006-170 knowledge-graph by maker-knowledge-mining

170 hunch net-2006-04-06-Bounds greater than 1


meta info for this blog

Source: html

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background: One branch of learning theory focuses on theorems which: assume samples are drawn IID from an unknown distribution D; fix a set of classifiers; and find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current: Everyone working on this subject wants “tighter bounds”, but there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da
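For a sense of how a bound on an error rate can end up vacuous, here is a minimal numeric sketch using a standard Occam/Hoeffding-style bound (an illustrative stand-in, not the specific bounds analyzed by Srebro and Ben-David): with a rich enough hypothesis class and a modest training set, the bound exceeds 1 and says nothing, since a true error rate can never be above 1.

```python
import math

def occam_bound(emp_err, hyp_count, m, delta=0.05):
    """Occam/Hoeffding-style bound: with probability at least 1 - delta over
    an IID training sample of size m, every classifier in a finite class of
    hyp_count hypotheses satisfies
        true_err <= emp_err + sqrt((ln(hyp_count) + ln(1/delta)) / (2 * m))."""
    return emp_err + math.sqrt((math.log(hyp_count) + math.log(1 / delta)) / (2 * m))

# A rich hypothesis class (2^200 classifiers) and only 100 training examples:
# the "bound" on a quantity that lives in [0, 1] comes out above 1, i.e. vacuous.
print(occam_bound(emp_err=0.3, hyp_count=2**200, m=100))  # ~1.14
```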


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. [sent-1, score-0.793]

2 Background: One branch of learning theory focuses on theorems which assume samples are drawn IID from an unknown distribution D. [sent-2, score-0.36]

3 Fix a set of classifiers; find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set. [sent-3, score-0.835]

4 Many of these bounds become extremely complex and hairy. [sent-4, score-0.549]

5 Current: Everyone working on this subject wants “tighter bounds”, but there are different definitions of “tighter”. [sent-5, score-0.178]

6 Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). [sent-6, score-1.552]

7 One of the dangers of striving for “functional tightness” is that the bound can depend on strangely interrelated parameters. [sent-8, score-0.347]

8 In fact, apparently these strange interrelations can become so complex that they end up always larger than 1 (some bounds here and here). [sent-9, score-0.818]

9 If it is just done to get a paper accepted under review, perhaps this is unsatisfying. [sent-11, score-0.156]

10 The real value of math comes when it guides us in designing learning algorithms. [sent-12, score-0.514]

11 Math from bounds greater than 1 is a dangerously weak motivation for learning algorithm design. [sent-13, score-0.547]

12 There is a significant danger in taking this “oops” too strongly. [sent-14, score-0.113]

13 There exist some reasonable arguments (not made here) for aiming at functional tightness. [sent-15, score-0.428]

14 The value of the research a person does is more related to the best they have done than the worst. [sent-16, score-0.17]
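The sentence scores above came from the site's own pipeline, which is not shown here; as a hedged sketch of how such an extractive summary could be produced, one might score each sentence by its summed tfidf weight (assuming scikit-learn):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def top_sentences(sentences, k=3):
    """Score each sentence by the sum of its tfidf term weights (treating
    each sentence as a document) and return the k highest-scoring ones."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    scores = tfidf.sum(axis=1).A1          # one aggregate weight per sentence
    ranked = sorted(zip(scores, sentences), reverse=True)
    return ranked[:k]

sentences = [
    "Several previous error bounds are always greater than 1.",
    "Many of these bounds become extremely complex and hairy.",
    "The real value of math comes when it guides us in designing learning algorithms.",
]
for score, sentence in top_sentences(sentences, k=2):
    print(round(float(score), 3), sentence)
```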


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('functional', 0.346), ('bounds', 0.328), ('tightness', 0.321), ('math', 0.247), ('tighter', 0.178), ('error', 0.141), ('greater', 0.14), ('focus', 0.134), ('dependency', 0.122), ('strangely', 0.122), ('complex', 0.119), ('practical', 0.117), ('depend', 0.113), ('danger', 0.113), ('nati', 0.113), ('parameterization', 0.113), ('bound', 0.112), ('training', 0.11), ('oops', 0.107), ('branch', 0.107), ('appendix', 0.107), ('guides', 0.102), ('focuses', 0.102), ('become', 0.102), ('striking', 0.098), ('proves', 0.098), ('strange', 0.094), ('rate', 0.092), ('shai', 0.091), ('wants', 0.089), ('definitions', 0.089), ('apparently', 0.089), ('value', 0.087), ('always', 0.086), ('done', 0.083), ('arguments', 0.082), ('definitely', 0.082), ('background', 0.081), ('unknown', 0.081), ('motivation', 0.079), ('designing', 0.078), ('maximum', 0.078), ('fix', 0.076), ('assume', 0.075), ('worst', 0.075), ('groups', 0.075), ('hypothesis', 0.073), ('accepted', 0.073), ('theorems', 0.07), ('classifiers', 0.069)]
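The per-word weights above and the ranked list that follows are outputs of the site's tfidf model; a hedged sketch of the usual recipe (assuming scikit-learn) is to embed each post as a tfidf vector and rank other posts by cosine similarity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_neighbors(docs, query_index, k=5):
    """Embed documents as tfidf vectors and return the k documents most
    similar (by cosine similarity) to the one at query_index."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sims = cosine_similarity(tfidf[query_index], tfidf).ravel()
    ranked = sims.argsort()[::-1]          # most similar first
    return [(int(i), round(float(sims[i]), 3)) for i in ranked if i != query_index][:k]
```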

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background: One branch of learning theory focuses on theorems which: assume samples are drawn IID from an unknown distribution D; fix a set of classifiers; and find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current: Everyone working on this subject wants “tighter bounds”, but there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da

2 0.24995089 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now: There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec
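As a minimal sketch of the simplest kind of testing bound mentioned in the excerpt above, here is a Hoeffding-style test set bound computed from held-out error (a crude form under standard IID assumptions, not necessarily the exact statement in the linked post):

```python
import math

def test_set_bound(test_err, n_test, delta=0.05):
    """With probability at least 1 - delta over an IID test set of size
    n_test, the true error rate is at most
        test_err + sqrt(ln(1/delta) / (2 * n_test))."""
    return test_err + math.sqrt(math.log(1 / delta) / (2 * n_test))

# 12% error on 1000 held-out examples -> roughly 15.9% true error, w.p. 0.95.
print(test_set_bound(test_err=0.12, n_test=1000))
```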

3 0.14783072 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.
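Progressive validation, mentioned in the excerpt above, is easy to sketch: predict on each example before training on it, so the running mistake rate estimates online performance without a held-out set. The learner below is a hypothetical toy, and this shows only the evaluation protocol, not Zhang's bound.

```python
class MajorityLearner:
    """A toy online learner (hypothetical, purely for illustration): it
    predicts the most frequently seen label so far."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0
    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def progressive_validation(learner, examples):
    """Count a mistake on each example *before* the learner updates on it;
    the returned rate estimates online error with no separate test set."""
    mistakes = 0
    for x, y in examples:
        if learner.predict(x) != y:   # evaluate first ...
            mistakes += 1
        learner.update(x, y)          # ... then train on this example
    return mistakes / len(examples)

print(progressive_validation(MajorityLearner(), [(None, 0), (None, 0), (None, 1), (None, 0)]))  # 0.25
```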

4 0.13282233 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy, the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. The use of “regret” is more desi
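The two guarantees quoted in the excerpt above are simple enough to evaluate directly; a small illustrative computation (numbers chosen arbitrarily):

```python
# Error limiting guarantee (error correcting output codes):
#   multiclass error <= 4 * binary error.
binary_err = 0.05
print(4 * binary_err)             # at most 0.2 multiclass error

# Regret minimizing guarantee (probabilistic ECOC):
#   multiclass regret <= 4 * (binary regret) ** 0.5.
binary_regret = 0.01              # binary error minus the best achievable binary error
print(4 * binary_regret ** 0.5)   # at most 0.4 multiclass regret
```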

5 0.11697856 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

6 0.11585537 202 hunch net-2006-08-10-Precision is not accuracy

7 0.1157653 163 hunch net-2006-03-12-Online learning or online preservation of learning?

8 0.11388376 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

9 0.11353772 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

10 0.11093672 332 hunch net-2008-12-23-Use of Learning Theory

11 0.10955691 57 hunch net-2005-04-16-Which Assumptions are Reasonable?

12 0.10945246 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

13 0.1014677 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

14 0.098822914 190 hunch net-2006-07-06-Branch Prediction Competition

15 0.09735965 34 hunch net-2005-03-02-Prior, “Prior” and Bias

16 0.097187608 14 hunch net-2005-02-07-The State of the Reduction

17 0.096434407 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

18 0.094501182 351 hunch net-2009-05-02-Wielding a New Abstraction

19 0.09353447 244 hunch net-2007-05-09-The Missing Bound

20 0.093169473 2 hunch net-2005-01-24-Holy grails of machine learning?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.204), (1, 0.076), (2, 0.073), (3, -0.031), (4, 0.017), (5, -0.082), (6, 0.116), (7, -0.002), (8, -0.022), (9, -0.047), (10, 0.061), (11, 0.16), (12, 0.03), (13, -0.06), (14, 0.098), (15, -0.067), (16, -0.039), (17, 0.07), (18, -0.096), (19, 0.023), (20, 0.112), (21, 0.099), (22, -0.149), (23, 0.049), (24, 0.018), (25, -0.015), (26, -0.006), (27, 0.05), (28, 0.049), (29, -0.019), (30, -0.014), (31, 0.038), (32, -0.02), (33, 0.041), (34, -0.092), (35, -0.116), (36, 0.058), (37, -0.053), (38, -0.049), (39, 0.059), (40, -0.074), (41, -0.067), (42, -0.049), (43, -0.023), (44, -0.057), (45, -0.061), (46, 0.062), (47, -0.004), (48, -0.038), (49, 0.077)]
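The (topicId, topicWeight) pairs above came from the site's own LSI model, which is not shown; a hedged sketch of how comparable weights could be produced (assuming scikit-learn's TruncatedSVD over tfidf vectors, with 50 latent dimensions to match the list above):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

def lsi_topic_weights(docs, query_index, n_topics=50):
    """Project tfidf document vectors onto n_topics latent dimensions with a
    truncated SVD (LSI) and return the query document's weight on each
    dimension; weights can be negative, as in the list above."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    weights = TruncatedSVD(n_components=n_topics).fit_transform(tfidf)[query_index]
    return [(topic, round(float(w), 3)) for topic, w in enumerate(weights)]
```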

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98248261 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background: One branch of learning theory focuses on theorems which: assume samples are drawn IID from an unknown distribution D; fix a set of classifiers; and find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current: Everyone working on this subject wants “tighter bounds”, but there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da

2 0.81270754 244 hunch net-2007-05-09-The Missing Bound

Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
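As a minimal numeric sketch of the contrast described in the excerpt above: a range-only Hoeffding-style deviation term versus a Bernstein/Bennett-style term that also uses the variance (simplified forms with illustrative constants, not the tightest known statements):

```python
import math

def hoeffding_dev(interval, m, delta=0.05):
    """Deviation term that depends only on the range I of the variable."""
    return interval * math.sqrt(math.log(1 / delta) / (2 * m))

def bernstein_dev(interval, variance, m, delta=0.05):
    """Deviation term that uses the (true, assumed-known) variance; much
    smaller when the variance is small relative to the range. Plugging the
    *observed* variance in here is exactly the invalid move the post warns
    about."""
    return (math.sqrt(2 * variance * math.log(1 / delta) / m)
            + interval * math.log(1 / delta) / (3 * m))

m, interval, variance = 1000, 1.0, 0.01
print(hoeffding_dev(interval, m))             # ~0.039
print(bernstein_dev(interval, variance, m))   # ~0.009
```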

3 0.7724399 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.

4 0.75127733 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now: There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec

5 0.66036761 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

Introduction: “Science” has many meanings, but one common meaning is “the scientific method”, which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”

6 0.65106732 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

7 0.64373767 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

8 0.63735098 163 hunch net-2006-03-12-Online learning or online preservation of learning?

9 0.61620635 392 hunch net-2010-03-26-A Variance only Deviation Bound

10 0.60416675 33 hunch net-2005-02-28-Regularization

11 0.50600564 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

12 0.49720719 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?

13 0.48756576 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

14 0.47140869 26 hunch net-2005-02-21-Problem: Cross Validation

15 0.46917588 35 hunch net-2005-03-04-The Big O and Constants in Learning

16 0.45599371 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

17 0.45024815 351 hunch net-2009-05-02-Wielding a New Abstraction

18 0.44975862 185 hunch net-2006-06-16-Regularization = Robustness

19 0.44242343 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

20 0.44213146 67 hunch net-2005-05-06-Don’t mix the solution into the problem


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.027), (27, 0.225), (38, 0.516), (55, 0.103), (95, 0.026)]
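The sparse (topicId, topicWeight) pairs above came from the site's own LDA model; as a hedged sketch of how such a topic mixture could be produced (assuming scikit-learn's LatentDirichletAllocation over word counts; the number of topics is a guess):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def lda_topic_weights(docs, query_index, n_topics=100, threshold=0.02):
    """Fit an LDA topic model on word counts and return the query document's
    (topicId, topicWeight) pairs above a small threshold; unlike LSI weights,
    these are non-negative and sum to one before thresholding."""
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    mixture = lda.fit_transform(counts)[query_index]
    return [(topic, round(float(w), 3)) for topic, w in enumerate(mixture) if w > threshold]
```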

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9905448 339 hunch net-2009-01-27-Key Scientific Challenges

Introduction: Yahoo released the Key Scientific Challenges program. There is a Machine Learning list I worked on and a Statistics list which Deepak worked on. I’m hoping this is taken quite seriously by graduate students. The primary value is that it gave us a chance to sit down and publicly specify directions of research which would be valuable to make progress on. A good strategy for a beginning graduate student is to pick one of these directions, pursue it, and make substantial advances for a PhD. The directions are sufficiently general that I’m sure any serious advance has applications well beyond Yahoo. A secondary point (which I’m sure is primary for many) is that there is money for graduate students here. It’s unrestricted, so you can use it for any reasonable travel, supplies, etc…

2 0.97534299 125 hunch net-2005-10-20-Machine Learning in the News

Introduction: The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.

3 0.97151983 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

Introduction: This post is about an open problem in learning reductions. Background: A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e – b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem: This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we can not rule out the possibility that a reduction

4 0.96654618 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and a learning algorithm for problems of type A where regret r on induced problems implies at least regret f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here. At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa

5 0.96232146 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS

Introduction: Manik and I are organizing the extreme classification workshop at NIPS this year. We have a number of good speakers lined up, but I would further encourage anyone working in the area to submit an abstract by October 9. I believe this is an idea whose time has now come. The NIPS website doesn’t have other workshops listed yet, but I expect several others to be of significant interest.

same-blog 6 0.9569357 170 hunch net-2006-04-06-Bounds greater than 1

7 0.87873507 353 hunch net-2009-05-08-Computability in Artificial Intelligence

8 0.8660931 233 hunch net-2007-02-16-The Forgetting

9 0.83048505 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

10 0.82967675 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

11 0.7592231 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

12 0.68701756 239 hunch net-2007-04-18-$50K Spock Challenge

13 0.68031204 26 hunch net-2005-02-21-Problem: Cross Validation

14 0.65713149 82 hunch net-2005-06-17-Reopening RL->Classification

15 0.64170969 264 hunch net-2007-09-30-NIPS workshops are out.

16 0.63446504 19 hunch net-2005-02-14-Clever Methods of Overfitting

17 0.62678003 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

18 0.62663394 131 hunch net-2005-11-16-The Everything Ensemble Edge

19 0.61847353 439 hunch net-2011-08-01-Interesting papers at COLT 2011

20 0.61212957 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem