hunch_net hunch_net-2009 hunch_net-2009-368 knowledge-graph by maker-knowledge-mining

368 hunch net-2009-08-26-Another 10-year paper in Machine Learning


meta info for this blog

Source: html

Introduction: When I was thinking about the best “10 year paper” for ICML, I also took a look at a few other conferences. Here is one from 10 years ago that interested me: David McAllester, PAC-Bayesian Model Averaging, COLT 1999. 2001 Journal Draft. Prior to this paper, the only mechanism known for controlling or estimating the necessary sample complexity for learning over continuously parameterized predictors was VC theory and variants, all of which suffered from a basic problem: they were incredibly pessimistic in practice. This meant that only very gross guidance could be provided for learning algorithm design. The PAC-Bayes bound provided an alternative approach to sample complexity bounds which was radically tighter, quantitatively. It also imported and explained many of the motivations for Bayesian learning in a way that learning theory and perhaps optimization people might appreciate. Since this paper came out, there have been a number of moderately successful attempts to drive algorithms directly by the PAC-Bayes bound. We’ve gone from thinking that a bound-driven algorithm is completely useless to merely a bit more pessimistic and computationally intense than might be necessary. The PAC-Bayes bound is related to the “bits-back” argument that Geoff Hinton and Drew van Camp made at COLT 6 years earlier. What other machine learning or learning theory papers from 10 years ago have had a substantial impact?
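For reference, one standard relaxed statement of the PAC-Bayes theorem is sketched below. Constants and logarithmic factors vary across versions (McAllester's original, later tightenings), so treat this as an illustrative form rather than the exact statement of the 1999 paper.

```latex
% A common relaxed form of the PAC-Bayes bound (constants vary by version).
% For any prior $P$ over predictors and any $\delta \in (0,1]$, with
% probability at least $1-\delta$ over an i.i.d. sample of size $m$,
% simultaneously for all posterior distributions $Q$:
\[
  R(Q) \;\le\; \widehat{R}(Q)
  \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}},
\]
% where $R(Q)$ is the expected error of the Gibbs predictor drawn from $Q$,
% $\widehat{R}(Q)$ is its empirical error on the sample, and
% $\mathrm{KL}(Q\|P)$ is the Kullback--Leibler divergence.
```

The key quantitative win over VC-style bounds is that the complexity term is KL(Q‖P), which can be small whenever the posterior concentrates near the prior, rather than a worst-case capacity measure of the whole hypothesis class.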


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 When I was thinking about the best “10 year paper” for ICML , I also took a look at a few other conferences. [sent-1, score-0.245]

2 Here is one from 10 years ago that interested me: David McAllester PAC-Bayesian Model Averaging , COLT 1999. [sent-2, score-0.376]

3 Prior to this paper, the only mechanism known for controlling or estimating the necessary sample complexity for learning over continuously parameterized predictors was VC theory and variants, all of which suffered from a basic problem: they were incredibly pessimistic in practice. [sent-4, score-1.552]

4 This meant that only very gross guidance could be provided for learning algorithm design. [sent-5, score-0.583]

5 The PAC-Bayes bound provided an alternative approach to sample complexity bounds which was radically tighter, quantitatively. [sent-6, score-0.878]

6 It also imported and explained many of the motivations for Bayesian learning in a way that learning theory and perhaps optimization people might appreciate. [sent-7, score-0.388]

7 Since this paper came out, there have been a number of moderately successful attempts to drive algorithms directly by the PAC-Bayes bound. [sent-8, score-0.681]

8 We’ve gone from thinking that a bound driven algorithm is completely useless to merely a bit more pessimistic and computationally intense than might be necessary. [sent-9, score-1.321]

9 The PAC-Bayes bound is related to the “bits-back” argument that Geoff Hinton and Drew van Camp made at COLT 6 years earlier. [sent-10, score-0.652]

10 What other machine learning or learning theory papers from 10 years ago have had a substantial impact? [sent-11, score-0.516]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('pessimistic', 0.286), ('provided', 0.213), ('bound', 0.211), ('years', 0.211), ('ago', 0.165), ('averaging', 0.154), ('incredibly', 0.143), ('van', 0.143), ('sample', 0.141), ('theory', 0.14), ('thinking', 0.136), ('intense', 0.135), ('drive', 0.135), ('gross', 0.135), ('moderately', 0.135), ('guidance', 0.135), ('continuously', 0.135), ('suffered', 0.135), ('vc', 0.129), ('motivations', 0.129), ('hinton', 0.129), ('colt', 0.127), ('parameterized', 0.124), ('mcallester', 0.124), ('controlling', 0.124), ('estimating', 0.119), ('driven', 0.119), ('explained', 0.119), ('complexity', 0.119), ('merely', 0.116), ('drew', 0.112), ('tighter', 0.112), ('paper', 0.11), ('journal', 0.109), ('useless', 0.109), ('geoff', 0.109), ('took', 0.109), ('successful', 0.109), ('gone', 0.107), ('draft', 0.107), ('variants', 0.104), ('completely', 0.102), ('radically', 0.1), ('meant', 0.1), ('attempts', 0.098), ('alternative', 0.094), ('came', 0.094), ('argument', 0.087), ('predictors', 0.086), ('david', 0.083)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning


2 0.15707739 332 hunch net-2008-12-23-Use of Learning Theory

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example

3 0.14585032 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement . In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence . The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design . Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds . These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound , progressive validation also here and here , train and test bounds , and cross-validation (but see the big open problem ). These tec

4 0.14317256 89 hunch net-2005-07-04-The Health of COLT

Introduction: The health of COLT (Conference on Learning Theory or Computational Learning Theory depending on who you ask) has been questioned over the last few years. Low points for the conference occurred when EuroCOLT merged with COLT in 2001, and the attendance at the 2002 Sydney COLT fell to a new low. This occurred in the general context of machine learning conferences rising in both number and size over the last decade. Any discussion of why COLT has had difficulties is inherently controversial as is any story about well-intentioned people making the wrong decisions. Nevertheless, this may be worth discussing in the hope of avoiding problems in the future and general understanding. In any such discussion there is a strong tendency to identify with a conference/community in a patriotic manner that is detrimental to thinking. Keep in mind that conferences exist to further research. My understanding (I wasn’t around) is that COLT started as a subcommunity of the computer science

5 0.1181535 244 hunch net-2007-05-09-The Missing Bound

Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D . Through the magic of exponential moment method we know that: If the range of x is bounded by an interval of size I , a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here . If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
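To make the comparison in this excerpt concrete, here is a small sketch (my own illustration, with hypothetical function names) contrasting the variance-oblivious Hoeffding deviation with a Bernstein-style deviation, a tractable relaxation closely related to Bennett's bound, assuming the true variance is known:

```python
import math

def hoeffding_width(interval: float, m: int, delta: float = 0.05) -> float:
    """Hoeffding deviation: O(I / sqrt(m)); uses only the range I."""
    return interval * math.sqrt(math.log(1.0 / delta) / (2.0 * m))

def bernstein_width(variance: float, interval: float, m: int,
                    delta: float = 0.05) -> float:
    """Bernstein deviation (a relaxation of Bennett's bound).

    The leading sqrt term scales with the true variance, so the bound
    is far tighter than Hoeffding's when the variance is well below
    its worst case; the I/m term is a lower-order correction.
    """
    log_term = math.log(1.0 / delta)
    return (math.sqrt(2.0 * variance * log_term / m)
            + interval * log_term / (3.0 * m))

# Range [0, 1], small true variance, 10,000 samples:
h = hoeffding_width(1.0, 10_000)          # variance-oblivious
b = bernstein_width(0.01, 1.0, 10_000)    # variance-aware: much smaller
```

As the post warns, plugging the *observed* variance into `bernstein_width` as if it were the true variance invalidates the guarantee; empirical-Bernstein bounds (developed later) fix this by paying an extra range-dependent term.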

6 0.10985341 486 hunch net-2013-07-10-Thoughts on Artificial Intelligence

7 0.10894823 453 hunch net-2012-01-28-Why COLT?

8 0.101814 343 hunch net-2009-02-18-Decision by Vetocracy

9 0.097672522 304 hunch net-2008-06-27-Reviewing Horror Stories

10 0.096434407 170 hunch net-2006-04-06-Bounds greater than 1

11 0.09555497 163 hunch net-2006-03-12-Online learning or online preservation of learning?

12 0.094401315 452 hunch net-2012-01-04-Why ICML? and the summer conferences

13 0.091927581 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

14 0.091233462 437 hunch net-2011-07-10-ICML 2011 and the future

15 0.091030098 403 hunch net-2010-07-18-ICML & COLT 2010

16 0.09096507 95 hunch net-2005-07-14-What Learning Theory might do

17 0.090041898 351 hunch net-2009-05-02-Wielding a New Abstraction

18 0.088537969 235 hunch net-2007-03-03-All Models of Learning have Flaws

19 0.088441908 454 hunch net-2012-01-30-ICML Posters and Scope

20 0.086835392 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.205), (1, 0.003), (2, 0.053), (3, -0.023), (4, 0.072), (5, -0.06), (6, 0.018), (7, -0.019), (8, 0.053), (9, -0.038), (10, 0.132), (11, 0.063), (12, -0.078), (13, 0.009), (14, 0.09), (15, -0.033), (16, 0.082), (17, 0.032), (18, -0.055), (19, -0.06), (20, 0.064), (21, 0.095), (22, -0.093), (23, 0.068), (24, 0.025), (25, 0.041), (26, -0.008), (27, 0.014), (28, 0.067), (29, -0.037), (30, 0.004), (31, -0.044), (32, 0.046), (33, 0.06), (34, -0.072), (35, -0.075), (36, 0.027), (37, -0.054), (38, -0.019), (39, 0.049), (40, -0.095), (41, 0.01), (42, -0.095), (43, 0.016), (44, 0.058), (45, -0.003), (46, -0.021), (47, -0.116), (48, -0.056), (49, 0.044)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95165813 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning


2 0.6984297 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background One branch of learning theory focuses on theorems which Assume samples are drawn IID from an unknown distribution D . Fix a set of classifiers Find a high probability bound on the maximum true error rate (with respect to D ) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da
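The "greater than 1" phenomenon is easy to reproduce with a generic VC-style deviation term. The sketch below is my own illustration of one common textbook variant (variants differ in constants), showing a plausible-looking setting where the guarantee is vacuous because an error-rate bound above 1 says nothing:

```python
import math

def vc_width(d: int, m: int, delta: float = 0.05) -> float:
    """One common textbook form of a VC-style deviation term:
    sqrt((d * (ln(2m/d) + 1) + ln(4/delta)) / m),
    for VC dimension d and sample size m. Illustrative only;
    published variants differ in constants and log factors.
    """
    return math.sqrt((d * (math.log(2.0 * m / d) + 1.0)
                      + math.log(4.0 / delta)) / m)

# Rich hypothesis class, modest sample: the bound exceeds 1 (vacuous).
vacuous = vc_width(d=500, m=1000)

# The same form only becomes meaningful with far more data per unit
# of capacity:
meaningful = vc_width(d=50, m=100_000)
```

This is exactly the "practical tightness" complaint: the functional dependence on m and d can be right while the bound's actual value is useless at realistic sample sizes.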

3 0.57985651 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement . In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence . The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design . Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds . These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound , progressive validation also here and here , train and test bounds , and cross-validation (but see the big open problem ). These tec

4 0.57926154 85 hunch net-2005-06-28-A COLT paper

Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.

5 0.56561399 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three

Introduction: Following up on Hal Daume’s post and John’s post on cool and interesting things seen at NIPS I’ll post my own little list of neat papers here as well. Of course it’s going to be biased towards what I think is interesting. Also, I have to say that I hadn’t been able to see many papers this year at nips due to myself being too busy, so please feel free to contribute the papers that you liked 1. P. Mudigonda, V. Kolmogorov, P. Torr. An Analysis of Convex Relaxations for MAP Estimation. A surprising paper which shows that many of the more sophisticated convex relaxations that had been proposed recently turns out to be subsumed by the simplest LP relaxation. Be careful next time you try a cool new convex relaxation! 2. D. Sontag, T. Jaakkola. New Outer Bounds on the Marginal Polytope. The title says it all. The marginal polytope is the set of local marginal distributions over subsets of variables that are globally consistent in the sense that there is at least one distributio

6 0.55994391 244 hunch net-2007-05-09-The Missing Bound

7 0.55456632 89 hunch net-2005-07-04-The Health of COLT

8 0.52997452 332 hunch net-2008-12-23-Use of Learning Theory

9 0.52446324 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science

10 0.51808971 325 hunch net-2008-11-10-ICML Reviewing Criteria

11 0.5052352 42 hunch net-2005-03-17-Going all the Way, Sometimes

12 0.50411773 403 hunch net-2010-07-18-ICML & COLT 2010

13 0.50145704 392 hunch net-2010-03-26-A Variance only Deviation Bound

14 0.50014734 188 hunch net-2006-06-30-ICML papers

15 0.49562973 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

16 0.49365446 163 hunch net-2006-03-12-Online learning or online preservation of learning?

17 0.49040931 33 hunch net-2005-02-28-Regularization

18 0.48865464 439 hunch net-2011-08-01-Interesting papers at COLT 2011

19 0.48285604 454 hunch net-2012-01-30-ICML Posters and Scope

20 0.48261049 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.014), (27, 0.271), (37, 0.4), (38, 0.029), (53, 0.063), (55, 0.084), (94, 0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.92123979 431 hunch net-2011-04-18-A paper not at Snowbird

Introduction: Unfortunately, a scheduling failure meant I missed all of AIStat and most of the learning workshop , otherwise known as Snowbird, when it’s at Snowbird . At snowbird, the talk on Sum-Product networks by Hoifung Poon stood out to me ( Pedro Domingos is a coauthor.). The basic point was that by appropriately constructing networks based on sums and products, the normalization problem in probabilistic models is eliminated, yielding a highly tractable yet flexible representation+learning algorithm. As an algorithm, this is noticeably cleaner than deep belief networks with a claim to being an order of magnitude faster and working better on an image completion task. Snowbird doesn’t have real papers—just the abstract above. I look forward to seeing the paper. (added: Rodrigo points out the deep learning workshop draft .)

2 0.91284466 63 hunch net-2005-04-27-DARPA project: LAGR

Introduction: Larry Jackal has set up the LAGR (“Learning Applied to Ground Robotics”) project (and competition) which seems to be quite well designed. Features include: Many participants (8 going on 12?) Standardized hardware. In the DARPA grand challenge contestants entering with motorcycles are at a severe disadvantage to those entering with a Hummer. Similarly, contestants using more powerful sensors can gain huge advantages. Monthly contests, with full feedback (but since the hardware is standardized, only code is shipped). One of the premises of the program is that robust systems are desired. Monthly evaluations at different locations can help measure this and provide data. Attacks a known hard problem. (cross country driving)

same-blog 3 0.89246112 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning


4 0.82089204 138 hunch net-2005-12-09-Some NIPS papers

Introduction: Here is a set of papers that I found interesting (and why). A PAC-Bayes approach to the Set Covering Machine improves the set covering machine. The set covering machine approach is a new way to do classification characterized by a very close connection between theory and algorithm. At this point, the approach seems to be competing well with SVMs in about all dimensions: similar computational speed, similar accuracy, stronger learning theory guarantees, more general information source (a kernel has strictly more structure than a metric), and more sparsity. Developing a classification algorithm is not very easy, but the results so far are encouraging. Off-Road Obstacle Avoidance through End-to-End Learning and Learning Depth from Single Monocular Images both effectively showed that depth information can be predicted from camera images (using notably different techniques). This ability is strongly enabling because cameras are cheap, tiny, light, and potentially provider lo

5 0.75299108 1 hunch net-2005-01-19-Why I decided to run a weblog.

Introduction: I have decided to run a weblog on machine learning and learning theory research. Here are some reasons: 1) Weblogs enable new functionality: Public comment on papers. No mechanism for this exists at conferences and most journals. I have encountered it once for a science paper. Some communities have mailing lists supporting this, but not machine learning or learning theory. I have often read papers and found myself wishing there was some method to consider other’s questions and read the replies. Conference shortlists. One of the most common conversations at a conference is “what did you find interesting?” There is no explicit mechanism for sharing this information at conferences, and it’s easy to imagine that it would be handy to do so. Evaluation and comment on research directions. Papers are almost exclusively about new research, rather than evaluation (and consideration) of research directions. This last role is satisfied by funding agencies to some extent, but

6 0.70274514 21 hunch net-2005-02-17-Learning Research Programs

7 0.68278658 194 hunch net-2006-07-11-New Models

8 0.59671009 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

9 0.59502625 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

10 0.58971864 143 hunch net-2005-12-27-Automated Labeling

11 0.58708572 325 hunch net-2008-11-10-ICML Reviewing Criteria

12 0.58651352 144 hunch net-2005-12-28-Yet more nips thoughts

13 0.58333439 304 hunch net-2008-06-27-Reviewing Horror Stories

14 0.57967025 352 hunch net-2009-05-06-Machine Learning to AI

15 0.57789332 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

16 0.57709622 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

17 0.57316661 293 hunch net-2008-03-23-Interactive Machine Learning

18 0.57274312 220 hunch net-2006-11-27-Continuizing Solutions

19 0.5721404 347 hunch net-2009-03-26-Machine Learning is too easy

20 0.57147837 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design