hunch_net hunch_net-2005 hunch_net-2005-41 knowledge-graph by maker-knowledge-mining

41 hunch net-2005-03-15-The State of Tight Bounds


meta info for this blog

Source: html

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec
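
The test set bound mentioned in the introduction is easy to make concrete. The sketch below is my own minimal illustration rather than code from the post (the function names are hypothetical): it inverts the binomial tail so that a mistake count on independently drawn held-out data yields an upper confidence bound on the future error rate.

```python
import math

def binom_cdf(k, n, p):
    """P[Binomial(n, p) <= k]: chance of seeing at most k errors."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def test_set_bound(errors, n, delta=0.05):
    """Largest true error rate p for which observing `errors` mistakes
    on n independent held-out examples still has probability >= delta.
    With probability at least 1 - delta, the true error rate is below
    the returned value."""
    lo, hi = errors / n, 1.0
    for _ in range(50):                  # binary search on p
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi

# 8 mistakes on 100 held-out examples -> roughly 0.14 at 95% confidence
print(test_set_bound(8, 100))
```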


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. [sent-2, score-0.422]

2 This helps you decide if the problem is solved or not. [sent-7, score-0.177]

3 The form of some of these bounds helps you understand what the essence of learning is. [sent-9, score-0.752]

4 Some of these bounds suggest, motivate, or even directly imply learning algorithms. [sent-11, score-0.519]

5 What We Know Now There are several families of bounds, based on how information is used. [sent-12, score-0.09]

6 These are methods which use labeled data not used in training to estimate the future error rate. [sent-14, score-0.648]

7 Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem); a progressive validation sketch appears after this summary list. [sent-15, score-1.362]

8 These techniques are the best available for goal (1) above, but provide little information towards goals (2) or (3). [sent-16, score-0.365]

9 Some of these techniques are computationally efficient while others are not. [sent-17, score-0.08]

10 Unlabeled test set bounds: Instead of using labeled data to construct a tight bound, it is sometimes possible to use unlabeled data. [sent-18, score-1.252]

11 Training Bounds These are methods which use labeled data for both training and testing. [sent-19, score-0.5]

12 These bounds provide insight into goals (2) and (3), but are not very effective for goal (1) above. [sent-20, score-0.881]

13 These bounds teach us about how many independent examples are required to guarantee learning success on different representations. [sent-21, score-0.603]

14 Any bound of this sort implies a learning algorithm: “minimize the bound” (a sketch of this idea also appears after this summary list). [sent-22, score-0.309]

15 The set covering machine is an application of this approach to a (variant) sample compression bound. [sent-23, score-0.343]

16 Here is a list of learning algorithms and bounds that “almost work” for these algorithms. [sent-25, score-0.519]

17 Perceptron: PAC-Bayes margin bound, Sample Compression Bound. Neural Network: PAC-Bayes bound. [sent-27, score-0.145]

18 Decision Tree Pruning: Rademacher Complexity Bounds. Semisupervised Clustering: PAC-MDL or transductive PAC-Bayes bound. Nearest neighbor: ? [sent-29, score-0.309]

19 Limitations: The independence assumption is a significant limitation in the applicability of this approach since we often cannot believe that independence is satisfied on natural learning problems. [sent-32, score-0.617]

20 Some work has gone into weakening this assumption. [sent-33, score-0.164]
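
Two of the ideas summarized above lend themselves to short sketches. First, the progressive validation estimate from sentence 7: the outline below is illustrative only (not the post's code) and assumes a hypothetical online `learner` object exposing `predict` and `update` methods.

```python
def progressive_validation_error(learner, data):
    """Predict each example with the model trained only on earlier
    examples, then train on it. The average of these out-of-sample
    mistakes estimates the future error rate, much like a test set,
    while every example is still used for training."""
    mistakes = 0
    for t, (x, y) in enumerate(data):
        if t > 0:                      # need at least one training example
            mistakes += int(learner.predict(x) != y)
        learner.update(x, y)
    return mistakes / max(1, len(data) - 1)
```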
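
Second, sentence 14's observation that a training bound directly implies a learning algorithm. As an assumed stand-in for the bounds the post lists, the sketch uses the standard Occam's Razor bound for a countable hypothesis class with a prior; the `hypotheses`/`prior`/`train_error` interface here is hypothetical.

```python
import math

def occam_bound(train_error, m, prior_h, delta=0.05):
    """Occam's Razor training bound: with probability 1 - delta,
    true_error(h) <= train_error(h)
                     + sqrt((ln(1/prior(h)) + ln(1/delta)) / (2m))
    holds simultaneously for every hypothesis h, where prior sums to 1
    over the hypothesis class and m is the number of IID examples."""
    return train_error + math.sqrt(
        (math.log(1 / prior_h) + math.log(1 / delta)) / (2 * m))

def minimize_the_bound(hypotheses, prior, train_error, m):
    """The implied learning algorithm: pick the hypothesis whose bound
    (training error plus complexity penalty) is smallest."""
    return min(hypotheses,
               key=lambda h: occam_bound(train_error(h), m, prior(h)))
```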


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('bounds', 0.519), ('bound', 0.309), ('compression', 0.179), ('labeled', 0.171), ('margin', 0.145), ('validation', 0.133), ('independence', 0.133), ('training', 0.131), ('essence', 0.128), ('data', 0.123), ('test', 0.122), ('goals', 0.118), ('unlabeled', 0.118), ('assumption', 0.105), ('helps', 0.105), ('tree', 0.099), ('limitations', 0.097), ('pruning', 0.097), ('weakening', 0.097), ('applicability', 0.09), ('families', 0.09), ('rademacher', 0.09), ('sample', 0.088), ('provide', 0.087), ('relating', 0.084), ('teach', 0.084), ('limitation', 0.081), ('progressive', 0.081), ('goal', 0.08), ('techniques', 0.08), ('insight', 0.077), ('occam', 0.077), ('razor', 0.077), ('semisupervised', 0.077), ('set', 0.076), ('applications', 0.076), ('methods', 0.075), ('satisfied', 0.075), ('judgement', 0.075), ('motivate', 0.075), ('classical', 0.075), ('error', 0.075), ('future', 0.073), ('decide', 0.072), ('perceptron', 0.07), ('clustering', 0.067), ('svm', 0.067), ('assuming', 0.067), ('gone', 0.067), ('decision', 0.066)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec

2 0.28480452 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.

3 0.24995089 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background One branch of learning theory focuses on theorems which Assume samples are drawn IID from an unknown distribution D . Fix a set of classifiers Find a high probability bound on the maximum true error rate (with respect to D ) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da

4 0.24440098 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

Introduction: “Science” has many meanings, but one common meaning is “the scientific method” which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”

5 0.20255633 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statement of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f ( r ) on the original problem of type B “. A lower bound for a learning reduction would have the form “for all reductions R , there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f ( r ) for B “. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here . At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa

6 0.18510951 244 hunch net-2007-05-09-The Missing Bound

7 0.18123776 2 hunch net-2005-01-24-Holy grails of machine learning?

8 0.17669176 26 hunch net-2005-02-21-Problem: Cross Validation

9 0.17266485 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

10 0.16473243 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

11 0.16163793 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

12 0.15081125 332 hunch net-2008-12-23-Use of Learning Theory

13 0.14585032 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

14 0.1430719 163 hunch net-2006-03-12-Online learning or online preservation of learning?

15 0.1410502 143 hunch net-2005-12-27-Automated Labeling

16 0.14008588 19 hunch net-2005-02-14-Clever Methods of Overfitting

17 0.13615434 34 hunch net-2005-03-02-Prior, “Prior” and Bias

18 0.13136616 131 hunch net-2005-11-16-The Everything Ensemble Edge

19 0.12814218 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.

20 0.12595662 45 hunch net-2005-03-22-Active learning


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.246), (1, 0.175), (2, 0.038), (3, -0.065), (4, 0.135), (5, -0.107), (6, 0.104), (7, 0.014), (8, 0.023), (9, -0.064), (10, 0.1), (11, 0.236), (12, 0.068), (13, -0.039), (14, 0.032), (15, -0.105), (16, -0.007), (17, 0.077), (18, -0.21), (19, 0.02), (20, 0.191), (21, 0.062), (22, -0.235), (23, 0.011), (24, -0.029), (25, 0.048), (26, -0.058), (27, 0.073), (28, 0.088), (29, 0.031), (30, 0.026), (31, -0.002), (32, -0.011), (33, -0.051), (34, -0.063), (35, 0.046), (36, 0.021), (37, -0.08), (38, -0.093), (39, 0.031), (40, -0.025), (41, -0.053), (42, -0.052), (43, 0.023), (44, -0.022), (45, 0.049), (46, 0.008), (47, 0.031), (48, -0.029), (49, -0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97756028 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec

2 0.88779795 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.

3 0.83516526 244 hunch net-2007-05-09-The Missing Bound

Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
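
To make the comparison in that summary concrete, here is an illustrative sketch (my own, not from the post) of the two deviation bounds for the mean of m samples in a bounded range. For the variance-dependent bound I use Bernstein's inequality, a standard corollary in the Bennett family; note that, as the post warns, sigma2 here must be the true variance (or a bound on it), not the observed one.

```python
import math

def hoeffding_dev(interval, m, delta=0.05):
    """Hoeffding: |mean - mu| <= I * sqrt(ln(2/delta) / (2m)) w.p. 1-delta,
    where I is the size of the interval containing the samples."""
    return interval * math.sqrt(math.log(2 / delta) / (2 * m))

def bernstein_dev(sigma2, b, m, delta=0.05):
    """Bernstein (Bennett family): deviation <=
    sqrt(2 * sigma2 * ln(2/delta) / m) + 2b * ln(2/delta) / (3m),
    where sigma2 is the TRUE variance and b bounds |x - mu|."""
    L = math.log(2 / delta)
    return math.sqrt(2 * sigma2 * L / m) + 2 * b * L / (3 * m)

# With small true variance, the Bennett-style bound is much tighter:
print(hoeffding_dev(1.0, 1000))          # ~0.043
print(bernstein_dev(0.01, 1.0, 1000))    # ~0.011
```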

4 0.7922222 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background One branch of learning theory focuses on theorems which Assume samples are drawn IID from an unknown distribution D . Fix a set of classifiers Find a high probability bound on the maximum true error rate (with respect to D ) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da

5 0.77582914 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

Introduction: “Science” has many meanings, but one common meaning is “the scientific method” which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”

6 0.70201868 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

7 0.67767614 163 hunch net-2006-03-12-Online learning or online preservation of learning?

8 0.64950389 26 hunch net-2005-02-21-Problem: Cross Validation

9 0.64937764 392 hunch net-2010-03-26-A Variance only Deviation Bound

10 0.5834102 2 hunch net-2005-01-24-Holy grails of machine learning?

11 0.57667303 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

12 0.52822316 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

13 0.50982785 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

14 0.5098232 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

15 0.47838178 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

16 0.47302729 131 hunch net-2005-11-16-The Everything Ensemble Edge

17 0.46956402 56 hunch net-2005-04-14-Families of Learning Theory Statements

18 0.46184644 33 hunch net-2005-02-28-Regularization

19 0.46095499 138 hunch net-2005-12-09-Some NIPS papers

20 0.45972201 351 hunch net-2009-05-02-Wielding a New Abstraction


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.051), (27, 0.333), (30, 0.035), (38, 0.074), (48, 0.017), (53, 0.109), (55, 0.051), (67, 0.128), (94, 0.088), (95, 0.025)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96907896 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec

2 0.9496581 296 hunch net-2008-04-21-The Science 2.0 article

Introduction: I found the article about science using modern tools interesting, especially the part about ‘blogophobia’, which in my experience is often a substantial issue: many potential guest posters aren’t quite ready, because of the fear of a permanent public mistake, because it is particularly hard to write about the unknown (the essence of research), and because the system for public credit doesn’t yet really handle blog posts. So far, science has been relatively resistant to discussing research on blogs. Some things need to change to get there. Public tolerance of the occasional mistake is essential, as is a willingness to cite (and credit) blogs as freely as papers. I’ve often run into another reason for holding back myself: I don’t want to overtalk my own research. Nevertheless, I’m slowly changing to the opinion that I’m holding back too much: the real power of a blog in research is that it can be used to confer with many people, and that just makes research work better.

3 0.94718409 252 hunch net-2007-07-01-Watchword: Online Learning

Introduction: It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of. Online Information Setting Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired. Online Adversarial Setting Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithm has regret at most log(number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts. Online Optimization Constraint Online learning refers to optimizing a predictor via a learning algorithm that tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory. Online Computational Constra
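
The adversarial "experts" guarantee quoted in that summary can be illustrated with the classic exponential weights (Hedge) algorithm, whose expected regret against the best of N fixed strategies grows like sqrt(T ln N). This is my own sketch under those standard assumptions, not code from the post; the function name and interface are hypothetical.

```python
import math

def hedge_regret(loss_matrix, eta):
    """Exponential weights ('Hedge') over N fixed strategies.
    loss_matrix[t][i] in [0, 1] is strategy i's loss in round t.
    With eta ~ sqrt(8 * ln(N) / T), the expected regret against the
    best single strategy is O(sqrt(T * ln N)) for ANY loss sequence."""
    n = len(loss_matrix[0])
    weights = [1.0] * n
    total_loss = 0.0
    for round_losses in loss_matrix:
        s = sum(weights)
        probs = [w / s for w in weights]        # randomized play
        total_loss += sum(p * l for p, l in zip(probs, round_losses))
        # multiplicatively downweight each strategy by its loss
        weights = [w * math.exp(-eta * l)
                   for w, l in zip(weights, round_losses)]
    best = min(sum(col) for col in zip(*loss_matrix))
    return total_loss - best
```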

4 0.93557894 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) cannot concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees, depending on who you ask) can’t learn XOR very easily.
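
The depth point in that summary can be checked by hand: XOR is not computable by a single linear threshold unit (the four XOR points are not linearly separable), but one hidden layer of threshold units computes it exactly. The sketch below is my own illustration with hand-picked weights, wiring XOR as AND(OR, NAND).

```python
def step(z):
    """Linear threshold activation."""
    return 1 if z >= 0 else 0

def xor_two_layer(x1, x2):
    """XOR via one hidden layer of threshold units: OR and NAND,
    combined by AND. A depth-1 thresholded weighted sum cannot
    compute this function."""
    h_or = step(x1 + x2 - 0.5)          # fires unless both inputs are 0
    h_nand = step(-x1 - x2 + 1.5)       # fires unless both inputs are 1
    return step(h_or + h_nand - 1.5)    # AND of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        assert xor_two_layer(a, b) == (a ^ b)
```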

5 0.92758757 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r

6 0.9246304 12 hunch net-2005-02-03-Learning Theory, by assumption

7 0.9242866 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

8 0.92168432 131 hunch net-2005-11-16-The Everything Ensemble Edge

9 0.91953534 347 hunch net-2009-03-26-Machine Learning is too easy

10 0.9193778 351 hunch net-2009-05-02-Wielding a New Abstraction

11 0.91914207 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

12 0.91601861 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

13 0.91600329 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

14 0.91596299 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

15 0.91573232 183 hunch net-2006-06-14-Explorations of Exploration

16 0.91562015 258 hunch net-2007-08-12-Exponentiated Gradient

17 0.91451949 259 hunch net-2007-08-19-Choice of Metrics

18 0.91441858 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

19 0.91265833 463 hunch net-2012-05-02-ICML: Behind the Scenes

20 0.91194636 370 hunch net-2009-09-18-Necessary and Sufficient Research