hunch_net hunch_net-2005 hunch_net-2005-85 knowledge-graph by maker-knowledge-mining

85 hunch net-2005-06-28-A COLT paper


metadata for this blog

Source: html

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.
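The progressive validation idea behind that estimate is simple to state in code: score each example with the current hypothesis before training on it, so every example acts as a fresh test point, and the average online loss estimates the future error rate that the bound controls. A minimal sketch, assuming a generic online-learner interface; the `MajorityLearner` toy is hypothetical, just to make it runnable:

```python
# Progressive validation sketch: test on each example first, then train
# on it. The running average of online mistakes estimates future error.

class MajorityLearner:
    """Toy online learner for illustration (hypothetical): predicts the
    most common label seen so far."""
    def __init__(self):
        self.counts = {0: 0, 1: 0}

    def predict(self, x):
        return max(self.counts, key=self.counts.get)

    def update(self, x, y):
        self.counts[y] += 1

def progressive_validation_error(learner, examples):
    mistakes = 0
    for t, (x, y) in enumerate(examples, start=1):
        if learner.predict(x) != y:   # score with the current hypothesis ...
            mistakes += 1
        learner.update(x, y)          # ... then train on the same example
    return mistakes / t

data = [(None, 0), (None, 0), (None, 1), (None, 0)]
print(progressive_validation_error(MajorityLearner(), data))  # 0.25
```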


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. [sent-1, score-0.189]

2 Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. [sent-2, score-1.836]

3 This bound is easily computed and will make the progressive validation approaches used here significantly more practical. [sent-3, score-1.595]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('computed', 0.289), ('bound', 0.285), ('concentration', 0.26), ('progressive', 0.26), ('tong', 0.234), ('sequential', 0.227), ('zhang', 0.227), ('validation', 0.216), ('assuming', 0.216), ('tight', 0.211), ('speaking', 0.191), ('dependent', 0.191), ('states', 0.188), ('roughly', 0.182), ('drawn', 0.165), ('samples', 0.155), ('practical', 0.15), ('significantly', 0.148), ('algorithms', 0.142), ('bounds', 0.14), ('approaches', 0.122), ('easily', 0.121), ('error', 0.12), ('future', 0.118), ('rate', 0.118), ('found', 0.115), ('online', 0.099), ('prediction', 0.096), ('used', 0.085), ('data', 0.08), ('paper', 0.074), ('make', 0.069), ('learning', 0.019)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.

2 0.28480452 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation also here and here, train and test bounds, and cross-validation (but see the big open problem). These tec
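For concreteness, here is what the simplest member of that family, the test set bound, looks like when computed by inverting a binomial tail. This is a sketch under the IID assumption; the function names are mine:

```python
from math import comb

def binomial_cdf(k, m, p):
    """P(Binomial(m, p) <= k)."""
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k + 1))

def test_set_bound(errors, m, delta=0.05, tol=1e-6):
    """Largest true error rate p at which observing <= `errors` mistakes
    on m IID test examples still has probability >= delta; with
    probability at least 1 - delta the true error is below this value.
    Found by binary search (the binomial CDF is decreasing in p)."""
    lo, hi = errors / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_cdf(errors, m, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi

print(test_set_bound(5, 100))  # 5 mistakes on 100 examples -> roughly 0.10
```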

3 0.16180462 244 hunch net-2007-05-09-The Missing Bound

Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
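The contrast the post draws can be made concrete with the standard inverted forms of the two inequalities. A sketch, assuming m IID samples with range I and, for the Bernstein/Bennett case, a known true variance:

```python
from math import log, sqrt

def hoeffding_dev(I, m, delta):
    """Hoeffding deviation of a sample mean: uses only the range I."""
    return I * sqrt(log(2 / delta) / (2 * m))

def bernstein_dev(I, var, m, delta):
    """Bernstein (Bennett-style) deviation: also uses the TRUE variance.
    Caveat from the post: plugging in the *observed* variance here
    invalidates the guarantee."""
    return sqrt(2 * var * log(2 / delta) / m) + I * log(2 / delta) / (3 * m)

print(hoeffding_dev(1.0, 1000, 0.05))        # ~0.043
print(bernstein_dev(1.0, 0.01, 1000, 0.05))  # ~0.010
```

With I = 1, true variance 0.01, m = 1000, and δ = 0.05, the Hoeffding width is about 0.043 while the Bernstein width is about 0.010, the kind of improvement the post describes.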

4 0.1575183 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

Introduction: Here are two papers that seem particularly interesting at this year’s COLT. Gilles Blanchard and François Fleuret, Occam’s Hammer. When we are interested in very tight bounds on the true error rate of a classifier, it is tempting to use a PAC-Bayes bound which can (empirically) be quite tight. A disadvantage of the PAC-Bayes bound is that it applies to a classifier which is randomized over a set of base classifiers rather than a single classifier. This paper shows that a similar bound can be proved which holds for a single classifier drawn from the set. The ability to safely use a single classifier is very nice. This technique applies generically to any base bound, so it has other applications covered in the paper. Adam Tauman Kalai. Learning Nested Halfspaces and Uphill Decision Trees. Classification PAC-learning, where you prove that any problem amongst some set is polytime learnable with respect to any distribution over the input X is extraordinarily ch

5 0.14832127 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

Introduction: “Science” has many meanings, but one common meaning is “the scientific method”, which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”

6 0.14783072 170 hunch net-2006-04-06-Bounds greater than 1

7 0.13908684 26 hunch net-2005-02-21-Problem: Cross Validation

8 0.12319648 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

9 0.11014093 163 hunch net-2006-03-12-Online learning or online preservation of learning?

10 0.10901433 56 hunch net-2005-04-14-Families of Learning Theory Statements

11 0.093929783 78 hunch net-2005-06-06-Exact Online Learning for Classification

12 0.092824854 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

13 0.089680977 235 hunch net-2007-03-03-All Models of Learning have Flaws

14 0.089471743 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

15 0.085274979 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

16 0.084083155 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

17 0.083988607 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

18 0.07939662 120 hunch net-2005-10-10-Predictive Search is Coming

19 0.078078233 33 hunch net-2005-02-28-Regularization

20 0.075936288 86 hunch net-2005-06-28-The cross validation problem: cash reward


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.146), (1, 0.11), (2, 0.04), (3, -0.078), (4, 0.101), (5, -0.078), (6, 0.092), (7, -0.028), (8, -0.073), (9, -0.003), (10, 0.051), (11, 0.18), (12, 0.049), (13, -0.086), (14, 0.069), (15, -0.098), (16, -0.033), (17, 0.046), (18, -0.121), (19, -0.015), (20, 0.161), (21, 0.016), (22, -0.209), (23, 0.085), (24, -0.003), (25, 0.071), (26, 0.01), (27, 0.097), (28, 0.084), (29, -0.004), (30, -0.004), (31, -0.041), (32, 0.001), (33, -0.031), (34, -0.086), (35, 0.039), (36, 0.018), (37, -0.034), (38, -0.141), (39, 0.065), (40, -0.082), (41, -0.022), (42, -0.018), (43, 0.031), (44, 0.001), (45, -0.021), (46, 0.076), (47, -0.003), (48, -0.028), (49, -0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9773674 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.

2 0.83937031 244 hunch net-2007-05-09-The Missing Bound

Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,

3 0.83402646 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation also here and here, train and test bounds, and cross-validation (but see the big open problem). These tec

4 0.75582665 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background One branch of learning theory focuses on theorems which Assume samples are drawn IID from an unknown distribution D. Fix a set of classifiers. Find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da
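To see how such a bound can come out vacuous, consider a common textbook VC-style variant, used here purely as an illustration; the paper's appendix analyzes specific previously published bounds, not this one:

```python
from math import log, sqrt

def vc_style_bound(emp_err, d, m, delta):
    """One common textbook VC-style bound on true error: empirical error
    plus a deviation term depending on VC dimension d and sample size m.
    Illustrative variant only, not a bound from the Srebro/Ben-David paper."""
    return emp_err + sqrt((d * (log(2 * m / d) + 1) + log(4 / delta)) / m)

# With d = 1000 and m = 2000, the deviation term alone exceeds 1, so the
# "bound" on an error *rate* is vacuous even at zero empirical error.
print(vc_style_bound(0.0, d=1000, m=2000, delta=0.05))  # ~1.09
```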

5 0.7204634 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

Introduction: “Science” has many meanings, but one common meaning is “the scientific method ” which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here : including “reprobleming”, “data set selection”, and probably “overfitting by review”

6 0.69073486 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

7 0.64873624 163 hunch net-2006-03-12-Online learning or online preservation of learning?

8 0.54095846 26 hunch net-2005-02-21-Problem: Cross Validation

9 0.53963906 392 hunch net-2010-03-26-A Variance only Deviation Bound

10 0.51462853 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

11 0.50919122 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

12 0.44434837 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

13 0.38450924 33 hunch net-2005-02-28-Regularization

14 0.37868947 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

15 0.36840859 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

16 0.36442503 78 hunch net-2005-06-06-Exact Online Learning for Classification

17 0.35749415 87 hunch net-2005-06-29-Not EM for clustering at COLT

18 0.35477611 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?

19 0.35035262 220 hunch net-2006-11-27-Continuizing Solutions

20 0.33960932 56 hunch net-2005-04-14-Families of Learning Theory Statements


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(27, 0.302), (30, 0.344), (53, 0.11), (94, 0.091)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95306963 189 hunch net-2006-07-05-more icml papers

Introduction: Here are a few other papers I enjoyed from ICML06. Topic Models: Dynamic Topic Models David Blei, John Lafferty A nice model for how topics in LDA type models can evolve over time, using a linear dynamical system on the natural parameters and a very clever structured variational approximation (in which the mean field parameters are pseudo-observations of a virtual LDS). Like all Blei papers, he makes it look easy, but it is extremely impressive. Pachinko Allocation Wei Li, Andrew McCallum A very elegant (but computationally challenging) model which induces correlation amongst topics using a multi-level DAG whose interior nodes are “super-topics” and “sub-topics” and whose leaves are the vocabulary words. Makes the slumbering monster of structure learning stir. Sequence Analysis (I missed these talks since I was chairing another session) Online Decoding of Markov Models with Latency Constraints Mukund Narasimhan, Paul Viola, Michael Shilman An “a

same-blog 2 0.92370355 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.

3 0.89750201 364 hunch net-2009-07-11-Interesting papers at KDD

Introduction: I attended KDD this year. The conference has always had a strong grounding in what works based on the KDDcup, but it has developed a halo of workshops on various subjects. It seems that KDD has become a place where the economy meets machine learning in a stronger sense than many other conferences. There were several papers that other people might like to take a look at. Yehuda Koren, Collaborative Filtering with Temporal Dynamics. This paper describes how to incorporate temporal dynamics into a couple of collaborative filtering approaches. This was also a best paper award. D. Sculley, Robert Malkin, Sugato Basu, Roberto J. Bayardo, Predicting Bounce Rates in Sponsored Search Advertisements. The basic claim of this paper is that the probability people immediately leave (“bounce”) after clicking on an advertisement is predictable. Frank McSherry and Ilya Mironov, Differentially Private Recommender Systems: Building Privacy into the Netflix Prize Contende

4 0.89464021 154 hunch net-2006-02-04-Research Budget Changes

Introduction: The announcement of an increase in funding for basic research in the US is encouraging. There is some discussion of this at the Computing Research Policy blog. One part of this discussion has a graph of NSF funding over time, presumably in dollar budgets. I don’t believe that dollar budgets are the right way to judge the impact of funding changes on researchers. A better way to judge seems to be in terms of dollar budget divided by GDP which provides a measure of the relative emphasis on research. This graph was assembled by dividing the NSF budget by the US GDP. For 2005 GDP, I used the current estimate and for 2006 and 2007 assumed an increase by a factor of 1.04 per year. The 2007 number also uses the requested 2007 budget which is certain to change. This graph makes it clear why researchers were upset: research funding emphasis has fallen for 3 years in a row. The reality has been significantly more severe due to DARPA decreasing funding and industrial

5 0.87485164 455 hunch net-2012-02-20-Berkeley Streaming Data Workshop

Introduction: The From Data to Knowledge workshop May 7-11 at Berkeley should be of interest to the many people encountering streaming data in different disciplines. It’s run by a group of astronomers who encounter streaming data all the time. I met Josh Bloom recently and he is broadly interested in a workshop covering all aspects of Machine Learning on streaming data. The hope here is that techniques developed in one area turn out useful in another which seems quite plausible. Particularly if you are in the bay area, consider checking it out.

6 0.82130754 444 hunch net-2011-09-07-KDD and MUCMD 2011

7 0.71580589 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

8 0.70761174 41 hunch net-2005-03-15-The State of Tight Bounds

9 0.67392296 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

10 0.67197192 258 hunch net-2007-08-12-Exponentiated Gradient

11 0.67044544 347 hunch net-2009-03-26-Machine Learning is too easy

12 0.66352177 67 hunch net-2005-05-06-Don’t mix the solution into the problem

13 0.66345888 14 hunch net-2005-02-07-The State of the Reduction

14 0.66332388 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

15 0.66276252 352 hunch net-2009-05-06-Machine Learning to AI

16 0.66186523 483 hunch net-2013-06-10-The Large Scale Learning class notes

17 0.66162938 9 hunch net-2005-02-01-Watchword: Loss

18 0.66155773 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

19 0.66138667 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

20 0.65859371 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”