hunch_net hunch_net-2007 hunch_net-2007-247 knowledge-graph by maker-knowledge-mining

247 hunch net-2007-06-14-Interesting Papers at COLT 2007


meta info for this blog

Source: html

Introduction: Here are two papers that seem particularly interesting at this year’s COLT. Gilles Blanchard and François Fleuret, Occam’s Hammer. When we are interested in very tight bounds on the true error rate of a classifier, it is tempting to use a PAC-Bayes bound which can (empirically) be quite tight. A disadvantage of the PAC-Bayes bound is that it applies to a classifier which is randomized over a set of base classifiers rather than a single classifier. This paper shows that a similar bound can be proved which holds for a single classifier drawn from the set. The ability to safely use a single classifier is very nice. This technique applies generically to any base bound, so it has other applications covered in the paper. Adam Tauman Kalai, Learning Nested Halfspaces and Uphill Decision Trees. Classification PAC-learning, where you prove that any problem amongst some set is polytime learnable with respect to any distribution over the input X, is extraordinarily challenging as judged by lack of progress over a long period of time. This paper is about regression PAC-learning, and the results appear much more encouraging than exist in classification PAC-learning. Under the assumption that the level sets of the correct regressed value are halfspaces (together with a Lipschitz condition on the regressed value), this paper proves that a good regressor can be PAC-learned using a boosting algorithm. (The “uphill decision trees” part of the paper is about one special case where you don’t need the Lipschitz condition.)
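
For reference, a standard statement of the PAC-Bayes bound being discussed (the Seeger/Maurer form, not the paper's own statement; constants vary across versions). Here P is a prior over base classifiers fixed before seeing data, ê(Q) and e(Q) are the empirical and true error rates of the Q-randomized (Gibbs) classifier, and kl is the binary KL divergence:

```latex
% Seeger/Maurer form of the PAC-Bayes bound; constants differ across versions.
% With probability at least 1 - \delta over m i.i.d. samples, simultaneously
% for all posteriors Q over the base classifiers:
\mathrm{kl}\!\left(\hat{e}(Q) \,\middle\|\, e(Q)\right)
  \le \frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\left(2\sqrt{m}/\delta\right)}{m}
```

Occam’s Hammer, per the introduction above, gives a comparable guarantee for a single classifier drawn from Q rather than for the Q-average.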


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 When we are interested in very tight bounds on the true error rate of a classifier, it is tempting to use a PAC-Bayes bound which can (empirically) be quite tight . [sent-3, score-0.738]

2 A disadvantage of the PAC-Bayes bound is that it applies to a classifier which is randomized over a set of base classifiers rather than a single classifier. [sent-4, score-1.419]

3 This paper shows that a similar bound can be proved which holds for a single classifier drawn from the set. [sent-5, score-1.183]

4 The ability to safely use a single classifier is very nice. [sent-6, score-0.555]

5 This technique applies generically to any base bound, so it has other applications covered in the paper. [sent-7, score-0.638]

6 Classification PAC-learning, where you prove that any problem amongst some set is polytime learnable with respect to any distribution over the input X is extraordinarily challenging as judged by lack of progress over a long period of time. [sent-10, score-0.894]

7 This paper is about regression PAC-learning, and the results appear much more encouraging than exist in classification PAC-learning. [sent-11, score-0.443]

8 Under the assumption that: The level sets of the correct regressed value are halfspaces. [sent-12, score-0.447]

9 this paper proves that a good regressor can be PAC-learned using a boosting algorithm. [sent-14, score-0.443]

10 (The “uphill decision trees” part of the paper is about one special case where you don’t need the Lipschitz condition.) [sent-15, score-0.321]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lipschitz', 0.318), ('uphill', 0.318), ('classifier', 0.26), ('bound', 0.258), ('tight', 0.19), ('single', 0.186), ('sets', 0.179), ('applies', 0.179), ('base', 0.17), ('trees', 0.164), ('paper', 0.134), ('halfspaces', 0.131), ('condition', 0.123), ('generically', 0.123), ('level', 0.118), ('encouraging', 0.118), ('classification', 0.115), ('judged', 0.113), ('occam', 0.113), ('learnable', 0.113), ('proves', 0.113), ('regressor', 0.113), ('randomized', 0.109), ('kalai', 0.109), ('safely', 0.109), ('period', 0.106), ('disadvantage', 0.103), ('challenging', 0.103), ('holds', 0.1), ('tempting', 0.1), ('adam', 0.097), ('extraordinarily', 0.097), ('proved', 0.097), ('decision', 0.097), ('covered', 0.091), ('special', 0.09), ('boosting', 0.083), ('classifiers', 0.08), ('empirically', 0.077), ('assumption', 0.076), ('regression', 0.076), ('technique', 0.075), ('drawn', 0.074), ('shows', 0.074), ('prove', 0.074), ('set', 0.074), ('correct', 0.074), ('input', 0.074), ('progress', 0.071), ('lack', 0.069)]
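
The ranked word weights above are standard tfidf scores. As a minimal sketch of how such a ranking and the similarity list below can be produced (assuming a scikit-learn pipeline; the page's actual tooling is not specified, and the corpus here is a placeholder):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus standing in for the full set of post texts.
docs = [
    "pac bayes bound classifier error rate tight",
    "tight bounds error rates confidence intervals",
    "regression classification reduction squared error",
]
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Top-weighted words for document 0, analogous to the list above.
row = X[0].toarray().ravel()
words = vectorizer.get_feature_names_out()
print(sorted(zip(words, row), key=lambda t: -t[1])[:5])

# Cosine similarity against every post yields a ranked "similar blogs" list.
print(cosine_similarity(X[0], X).ravel())
```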

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

2 0.17266485 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec
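
As a concrete instance of a testing bound, here is a minimal sketch (an illustration under the stated i.i.d. assumption, not from the post) of the simple Hoeffding-style test set bound: with probability at least 1 - δ over m i.i.d. held-out examples, the true error is at most the empirical test error plus sqrt(ln(1/δ)/(2m)).

```python
import math

def test_set_bound(mistakes: int, m: int, delta: float = 0.05) -> float:
    """Upper-bound the true error rate from m i.i.d. held-out examples.

    Hoeffding closed form; the exact binomial test set bound is tighter
    but has no closed form.
    """
    empirical = mistakes / m
    return empirical + math.sqrt(math.log(1.0 / delta) / (2.0 * m))

# 30 mistakes on 1000 held-out examples -> true error <= ~0.069 w.p. 0.95.
print(test_set_bound(30, 1000))
```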

3 0.1575183 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.
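
Progressive validation, referenced above, is easy to state in code: each example is predicted before it is used for training, so the running average of mistakes is an honest estimate of online performance. A minimal sketch, where the learner interface (predict/update) is hypothetical:

```python
def progressive_validation(learner, stream):
    """Average loss when each example is predicted before being trained on.

    `learner` is any object with (hypothetical) predict(x) and update(x, y)
    methods; `stream` yields (x, y) pairs.
    """
    mistakes = n = 0
    for x, y in stream:
        mistakes += int(learner.predict(x) != y)  # evaluate first...
        learner.update(x, y)                      # ...then train on it
        n += 1
    return mistakes / max(n, 1)
```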

4 0.15707658 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

Introduction: For learning reductions we have been concentrating on reducing various complex learning problems to binary classification. This choice needs to be actively questioned, because it was not carefully considered. Binary classification is learning a classifier c: X -> {0,1} so as to minimize the probability of being wrong, Pr_{x,y~D}(c(x) ≠ y). The primary alternative candidate seems to be squared error regression. In squared error regression, you learn a regressor s: X -> [0,1] so as to minimize squared error, E_{x,y~D}(s(x)-y)^2. It is difficult to judge one primitive against another. The judgement must at least partially be made on nontheoretical grounds because (essentially) we are evaluating a choice between two axioms/assumptions. These two primitives are significantly related. Classification can be reduced to regression in the obvious way: you use the regressor to predict D(y=1|x), then threshold at 0.5. For this simple reduction a squared error regret of r
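
A minimal sketch of the reduction just described, thresholding a squared-loss regressor at 0.5 (scikit-learn's LinearRegression is used for concreteness; any squared-error learner would do, and the synthetic data is a placeholder):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(float)  # labels in {0,1}

# Learn s: X -> R by minimizing squared error against the 0/1 labels,
# so s(x) estimates D(y=1|x).
s = LinearRegression().fit(X, y)

# The reduction: classify by thresholding the regressed value at 0.5.
predictions = (s.predict(X) >= 0.5).astype(int)
print((predictions == y).mean())  # accuracy of the reduced classifier
```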

5 0.15547876 244 hunch net-2007-05-09-The Missing Bound

Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
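
To see the size of the gap numerically, a sketch comparing the Hoeffding deviation with the Bernstein deviation, a close relative of Bennett's bound that is easier to state in closed form (standard textbook formulas, not from the post):

```python
import math

def hoeffding_dev(I: float, m: int, delta: float) -> float:
    """Deviation of the sample mean: range-I variables, m i.i.d. samples."""
    return I * math.sqrt(math.log(1 / delta) / (2 * m))

def bernstein_dev(var: float, b: float, m: int, delta: float) -> float:
    """Bernstein deviation: needs a variance bound `var` and a bound b
    on |x - E[x]|; much tighter when the true variance is small."""
    L = math.log(1 / delta)
    return math.sqrt(2 * var * L / m) + 2 * b * L / (3 * m)

# Range-1 variables with true variance 0.01, m = 1000, delta = 0.05:
print(hoeffding_dev(1.0, 1000, 0.05))        # ~0.039
print(bernstein_dev(0.01, 1.0, 1000, 0.05))  # ~0.010
```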

6 0.14983985 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

7 0.14922437 26 hunch net-2005-02-21-Problem: Cross Validation

8 0.13276488 274 hunch net-2007-11-28-Computational Consequences of Classification

9 0.12081455 14 hunch net-2005-02-07-The State of the Reduction

10 0.11442 43 hunch net-2005-03-18-Binomial Weighting

11 0.11353772 170 hunch net-2006-04-06-Bounds greater than 1

12 0.11239773 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

13 0.10108186 220 hunch net-2006-11-27-Continuizing Solutions

14 0.10080582 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

15 0.099963218 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

16 0.09952344 131 hunch net-2005-11-16-The Everything Ensemble Edge

17 0.099206686 133 hunch net-2005-11-28-A question of quantification

18 0.098985419 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

19 0.095534243 406 hunch net-2010-08-22-KDD 2010

20 0.092674427 164 hunch net-2006-03-17-Multitask learning is Black-Boxable


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.183), (1, 0.099), (2, 0.114), (3, -0.086), (4, 0.047), (5, -0.038), (6, 0.131), (7, -0.028), (8, -0.085), (9, -0.082), (10, 0.027), (11, 0.124), (12, -0.03), (13, -0.132), (14, 0.014), (15, -0.026), (16, -0.028), (17, 0.098), (18, -0.064), (19, 0.004), (20, 0.082), (21, 0.073), (22, -0.101), (23, 0.027), (24, -0.018), (25, 0.022), (26, 0.009), (27, 0.105), (28, 0.017), (29, -0.006), (30, 0.004), (31, -0.093), (32, -0.094), (33, -0.026), (34, 0.036), (35, 0.077), (36, 0.008), (37, -0.036), (38, 0.04), (39, 0.065), (40, -0.011), (41, -0.103), (42, -0.041), (43, -0.013), (44, 0.005), (45, 0.001), (46, 0.047), (47, -0.012), (48, 0.025), (49, -0.119)]
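
The 50 weights above are coordinates in a latent semantic indexing (topic) space. A minimal sketch of the standard construction, a truncated SVD of the tfidf matrix (scikit-learn assumed; the toy corpus and component count are placeholders, while the page appears to use 50 components):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the post texts.
docs = [
    "pac bayes bound classifier error",
    "tight bounds error rates confidence",
    "regression classification reduction squared error",
    "cross validation folds classifier bound",
]
X = TfidfVectorizer().fit_transform(docs)

# LSI: project tfidf vectors onto latent topics (2 here, 50 on this page).
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
print(cosine_similarity(Z[:1], Z).ravel())  # similarity in topic space
```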

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98164111 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

2 0.71046799 85 hunch net-2005-06-28-A COLT paper

3 0.67890918 41 hunch net-2005-03-15-The State of Tight Bounds

4 0.65744364 26 hunch net-2005-02-21-Problem: Cross Validation

Introduction: The essential problem here is the large gap between experimental observation and theoretical understanding. Method K-fold cross validation is a commonly used technique which takes a set of m examples and partitions them into K sets (“folds”) of size m/K. For each fold, a classifier is trained on the other folds and then tested on the fold. Problem Assume only independent samples. Derive a classifier from the K classifiers with a small bound on the true error rate. Past Work (I’ll add more as I remember/learn.) Devroye, Rogers, and Wagner analyzed cross validation and found algorithm specific bounds. Not all of this is online, but here is one paper. Michael Kearns and Dana Ron analyzed cross validation and found that under additional stability assumptions the bound for the classifier which learns on all the data is not much worse than for a test set of size m/K. Avrim Blum, Adam Kalai, and myself analyzed cross validation and found tha
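
A minimal sketch of the K-fold procedure just described (`train` is a hypothetical learning routine returning a classifier with a predict method):

```python
def k_fold_errors(examples, K, train):
    """Partition m examples into K folds; for each fold, train on the
    remaining folds and record the held-out error rate."""
    folds = [examples[i::K] for i in range(K)]
    errors = []
    for i, held_out in enumerate(folds):
        train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        clf = train(train_set)
        mistakes = sum(clf.predict(x) != y for x, y in held_out)
        errors.append(mistakes / len(held_out))
    # The open problem: derive one classifier plus a tight true-error
    # bound from these K error estimates.
    return errors
```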

5 0.65385097 244 hunch net-2007-05-09-The Missing Bound

6 0.65357345 170 hunch net-2006-04-06-Bounds greater than 1

7 0.54906219 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

8 0.54690617 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

9 0.53533328 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

10 0.52502942 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC

11 0.50855857 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

12 0.5053637 138 hunch net-2005-12-09-Some NIPS papers

13 0.47236761 439 hunch net-2011-08-01-Interesting papers at COLT 2011

14 0.4645322 163 hunch net-2006-03-12-Online learning or online preservation of learning?

15 0.45903438 14 hunch net-2005-02-07-The State of the Reduction

16 0.45108283 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

17 0.44846097 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

18 0.44433281 131 hunch net-2005-11-16-The Everything Ensemble Edge

19 0.43639651 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

20 0.42954537 392 hunch net-2010-03-26-A Variance only Deviation Bound


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(27, 0.853), (55, 0.034)]
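
The sparse pairs above are topic weights from an LDA model. A minimal sketch of producing such weights (scikit-learn assumed; the corpus and topic count are placeholders):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the post texts.
docs = [
    "pac bayes bound classifier error rate",
    "loss function semantics classification",
    "active learning labels queries",
]
counts = CountVectorizer().fit_transform(docs)  # LDA uses raw term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
weights = lda.fit_transform(counts)
print(weights[0])  # a distribution over topics, like the pairs above
```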

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99980223 274 hunch net-2007-11-28-Computational Consequences of Classification

Introduction: In the regression vs classification debate, I’m adding a new “pro” to classification. It seems there are computational shortcuts available for classification which simply aren’t available for regression. This arises in several situations. In active learning it is sometimes possible to find an e error classifier with just log(1/e) labeled samples. Only much more modest improvements appear to be achievable for squared loss regression. The essential reason is that the loss function on many examples is flat with respect to large variations in the parameter space of a learned classifier, which implies that many of these classifiers do not need to be considered. In contrast, for squared loss regression, most substantial variations in the parameter space influence the loss at most points. In budgeted learning, where there is either a computational time constraint or a feature cost constraint, a classifier can sometimes be learned to very high accuracy under the constraints
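
The classic example behind the log(1/e) claim is binary search for a threshold. A minimal sketch with a noiseless labeling oracle (the oracle interface is hypothetical):

```python
def learn_threshold(label, eps):
    """Binary-search a noiseless threshold classifier on [0, 1].

    `label(x)` returns 1 iff x >= the unknown threshold. Reaches error
    at most eps with about log2(1/eps) label queries, versus ~1/eps
    labels for passive sampling.
    """
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if label(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2, queries

estimate, q = learn_threshold(lambda x: int(x >= 0.37), 1e-3)
print(estimate, q)  # ~0.37 after about 10 queries
```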

2 0.9992137 166 hunch net-2006-03-24-NLPers

Introduction: Hal Daume has started the NLPers blog to discuss learning for language problems.

3 0.9992137 246 hunch net-2007-06-13-Not Posting

Introduction: If you have been disappointed by the lack of a post for the last month, consider contributing your own (I’ve been busy+uninspired). Also, keep in mind that there is a community of machine learning blogs (see the sidebar).

4 0.9992137 418 hunch net-2010-12-02-Traffic Prediction Problem

Introduction: Slashdot points out the Traffic Prediction Challenge which looks pretty fun. The temporal aspect seems to be very common in many real-world problems and somewhat understudied.

same-blog 5 0.99918866 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

6 0.99623722 308 hunch net-2008-07-06-To Dual or Not

7 0.99565452 288 hunch net-2008-02-10-Complexity Illness

8 0.99534428 172 hunch net-2006-04-14-JMLR is a success

9 0.99445117 245 hunch net-2007-05-12-Loss Function Semantics

10 0.99313772 400 hunch net-2010-06-13-The Good News on Exploration and Learning

11 0.98801661 45 hunch net-2005-03-22-Active learning

12 0.97729665 9 hunch net-2005-02-01-Watchword: Loss

13 0.97300833 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

14 0.97230822 352 hunch net-2009-05-06-Machine Learning to AI

15 0.96731687 304 hunch net-2008-06-27-Reviewing Horror Stories

16 0.95887548 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

17 0.9464668 483 hunch net-2013-06-10-The Large Scale Learning class notes

18 0.94335711 244 hunch net-2007-05-09-The Missing Bound

19 0.94119173 293 hunch net-2008-03-23-Interactive Machine Learning

20 0.93265337 67 hunch net-2005-05-06-Don’t mix the solution into the problem