hunch_net hunch_net-2005 hunch_net-2005-104 knowledge-graph by maker-knowledge-mining

104 hunch net-2005-08-22-Do you believe in induction?


meta info for this blog

Source: html

Introduction: Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings. The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to thi
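To make the symmetry argument concrete, here is a minimal sketch in Python (not from the talk or the post; the three-point input space, the binary labels, and the parity predictor are illustrative assumptions). It enumerates every possible labeling of a small input set and confirms that a fixed predictor is wrong on exactly half of all (example, labeling) pairs.

```python
from itertools import product

# A minimal sketch of the no-free-lunch symmetry argument, under assumed
# toy settings: a three-point input space, binary labels, one fixed predictor.
X = [0, 1, 2]

def predictor(x):
    # Any fixed deterministic rule gives the same overall result; this one
    # happens to predict the parity of x.
    return x % 2

total = 0
errors = 0
for labeling in product([0, 1], repeat=len(X)):  # every possible labeling of X
    for x, y in zip(X, labeling):
        total += 1
        errors += predictor(x) != y

print(errors / total)  # prints 0.5: wrong exactly half the time, by symmetry
```

Swapping in any other deterministic predictor leaves the printed value at 0.5, which is the content of the theorem in this toy setting.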


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. [sent-1, score-0.608]

2 As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. [sent-2, score-0.769]

3 The simplest way to see this is in a nonprobabilistic setting. [sent-3, score-0.159]

4 If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. [sent-4, score-1.106]

5 The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. [sent-5, score-1.55]

6 The basic idea of this proof has been applied to many other settings. [sent-6, score-0.188]

7 The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. [sent-7, score-0.587]

8 In the real world, we do not care about the expectation over all possible sequences, but perhaps instead about some (weighted) expectation over the set of problems we actually encounter. [sent-9, score-0.812]

9 It is entirely possible that we can form a prediction algorithm with good performance over this set of problems. [sent-10, score-0.576]

10 If we want to access the set of problems we actually encounter, we must do this empirically. [sent-12, score-0.408]

11 Although we must work with the world to understand what a good general-purpose learning algorithm is, quantifying how good the algorithm is may be difficult. [sent-13, score-0.702]

12 In particular, performing well on the last 100 encountered learning problems may say nothing about performing well on the next encountered learning problem. [sent-14, score-1.203]

13 It has been noted by Hume that there is no mathematical proof that the sun will rise tomorrow which does not rely on unverifiable assumptions about the world. [sent-16, score-0.643]

14 Nevertheless, the belief in sunrise tomorrow is essentially universal. [sent-17, score-0.434]

15 A good general purpose learning algorithm is similar to ‘sunrise’: we can’t prove that we will succeed on the next learning problem encountered, but nevertheless we might believe it for inductive reasons. [sent-18, score-0.65]
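The contrast drawn above (a uniform expectation over all sequences versus a weighted expectation over the problems we actually encounter) can be sketched in the same toy setting. Everything here is an illustrative assumption rather than something from the post: the three-point input space, the parity rule, and the mixture weights standing in for "the problems we actually encounter".

```python
import random
from itertools import product

random.seed(0)
X = [0, 1, 2]

def predictor(x):
    # Fixed rule, biased toward labelings that follow the parity of x.
    return x % 2

def error_rate(labeling):
    return sum(predictor(x) != y for x, y in zip(X, labeling)) / len(X)

all_labelings = list(product([0, 1], repeat=len(X)))

# Uniform average over every possible labeling: exactly 0.5 (no free lunch).
print(sum(error_rate(l) for l in all_labelings) / len(all_labelings))

# Weighted average over the problems we "actually encounter": a mixture
# concentrated on the parity labeling, where the biased predictor does well.
parity = tuple(x % 2 for x in X)
weights = [4.0 if labeling == parity else 0.1 for labeling in all_labelings]
encountered = random.choices(all_labelings, weights=weights, k=10000)
print(sum(error_rate(l) for l in encountered) / len(encountered))  # well below 0.5
```

The first number is forced to 0.5 no matter which predictor is used; the second depends entirely on the assumed weighting, which is the empirical question the post points at.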


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sequences', 0.278), ('sunrise', 0.261), ('half', 0.24), ('encountered', 0.234), ('expectation', 0.22), ('lunch', 0.214), ('metalearning', 0.214), ('performing', 0.193), ('proof', 0.188), ('tomorrow', 0.173), ('theorem', 0.167), ('algorithm', 0.129), ('set', 0.122), ('provost', 0.116), ('induction', 0.116), ('must', 0.108), ('prediction', 0.107), ('rise', 0.107), ('errs', 0.107), ('inductive', 0.107), ('free', 0.107), ('say', 0.104), ('wrong', 0.103), ('actually', 0.102), ('jump', 0.101), ('quantifying', 0.101), ('simplistic', 0.101), ('nevertheless', 0.098), ('next', 0.098), ('symmetry', 0.096), ('foster', 0.096), ('noted', 0.093), ('dead', 0.093), ('complicated', 0.089), ('interpretation', 0.089), ('way', 0.088), ('world', 0.087), ('rely', 0.082), ('wish', 0.082), ('encounter', 0.078), ('succeed', 0.076), ('problems', 0.076), ('good', 0.074), ('gave', 0.073), ('form', 0.072), ('possible', 0.072), ('nothing', 0.071), ('simplest', 0.071), ('purpose', 0.068), ('right', 0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 104 hunch net-2005-08-22-Do you believe in induction?

Introduction: Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings. The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to thi

2 0.16874725 133 hunch net-2005-11-28-A question of quantification

Introduction: This is about methods for phrasing and thinking about the scope of some theorems in learning theory. The basic claim is that there are several different ways of quantifying the scope which sound different yet are essentially the same. For all sequences of examples. This is the standard quantification in online learning analysis. Standard theorems would say something like “for all sequences of predictions by experts, the algorithm A will perform almost as well as the best expert.” For all training sets. This is the standard quantification for boosting analysis such as adaboost or multiclass boosting. Standard theorems have the form “for all training sets the error rate inequalities … hold”. For all distributions over examples. This is the one that we have been using for reductions analysis. Standard theorem statements have the form “For all distributions over examples, the error rate inequalities … hold”. It is not quite true that each of these is equivalent. F

3 0.13556869 187 hunch net-2006-06-25-Presentation of Proofs is Hard.

Introduction: When presenting part of the Reinforcement Learning theory tutorial at ICML 2006 , I was forcibly reminded of this. There are several difficulties. When creating the presentation, the correct level of detail is tricky. With too much detail, the proof takes too much time and people may be lost to boredom. With too little detail, the steps of the proof involve too-great a jump. This is very difficult to judge. What may be an easy step in the careful thought of a quiet room is not so easy when you are occupied by the process of presentation. What may be easy after having gone over this (and other) proofs is not so easy to follow in the first pass by a viewer. These problems seem only correctable by process of repeated test-and-revise. When presenting the proof, simply speaking with sufficient precision is substantially harder than in normal conversation (where precision is not so critical). Practice can help here. When presenting the proof, going at the right p

4 0.13274944 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

Introduction: Bob Williamson and I are the learning theory PC members at NIPS this year. This is some attempt to state the standards and tests I applied to the papers. I think it is a good idea to talk about this for two reasons: Making community standards a matter of public record seems healthy. It gives us a chance to debate what is and is not the right standard. It might even give us a bit more consistency across the years. It may save us all time. There are a number of papers submitted which just aren’t there yet. Avoiding submitting is the right decision in this case. There are several criteria for judging a paper. All of these were active this year. Some criteria are uncontroversial while others may not be. The paper must have a theorem establishing something new for which it is possible to derive high confidence in the correctness of the results. A surprising number of papers fail this test. This criterion seems essential to the definition of “theory”. Missing theo

5 0.11350004 131 hunch net-2005-11-16-The Everything Ensemble Edge

Introduction: Rich Caruana, Alexandru Niculescu, Geoff Crew, and Alex Ksikes have done a lot of empirical testing which shows that using all methods to make a prediction is more powerful than using any single method. This is in rough agreement with the Bayesian way of solving problems, but based upon a different (essentially empirical) motivation. A rough summary is: Take all of {decision trees, boosted decision trees, bagged decision trees, boosted decision stumps, K nearest neighbors, neural networks, SVM} with all reasonable parameter settings. Run the methods on each of 8 problems with a large test set, calibrating margins using either sigmoid fitting or isotonic regression. For each loss of {accuracy, area under the ROC curve, cross entropy, squared error, etc…} evaluate the average performance of the method. A series of conclusions can be drawn from the observations. (Calibrated) boosted decision trees appear to perform best, in general although support v

6 0.11197671 160 hunch net-2006-03-02-Why do people count for learning?

7 0.10717429 90 hunch net-2005-07-07-The Limits of Learning Theory

8 0.10359146 258 hunch net-2007-08-12-Exponentiated Gradient

9 0.10313512 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

10 0.10206848 347 hunch net-2009-03-26-Machine Learning is too easy

11 0.10135854 202 hunch net-2006-08-10-Precision is not accuracy

12 0.09906929 348 hunch net-2009-04-02-Asymmophobia

13 0.096174084 343 hunch net-2009-02-18-Decision by Vetocracy

14 0.095792279 7 hunch net-2005-01-31-Watchword: Assumption

15 0.093769141 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

16 0.093370721 332 hunch net-2008-12-23-Use of Learning Theory

17 0.090801068 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

18 0.09077701 388 hunch net-2010-01-24-Specializations of the Master Problem

19 0.089848101 43 hunch net-2005-03-18-Binomial Weighting

20 0.089539483 95 hunch net-2005-07-14-What Learning Theory might do


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.228), (1, 0.066), (2, 0.035), (3, 0.026), (4, 0.01), (5, -0.027), (6, 0.017), (7, 0.038), (8, 0.063), (9, -0.035), (10, -0.006), (11, 0.014), (12, 0.118), (13, -0.022), (14, 0.048), (15, 0.038), (16, 0.029), (17, -0.067), (18, -0.039), (19, 0.041), (20, -0.104), (21, -0.079), (22, 0.027), (23, -0.093), (24, 0.009), (25, -0.059), (26, 0.04), (27, -0.084), (28, -0.061), (29, -0.032), (30, -0.049), (31, 0.018), (32, -0.07), (33, 0.045), (34, -0.017), (35, -0.022), (36, -0.042), (37, 0.102), (38, 0.029), (39, -0.012), (40, -0.004), (41, -0.01), (42, -0.101), (43, -0.03), (44, 0.077), (45, 0.013), (46, -0.052), (47, 0.075), (48, 0.05), (49, -0.056)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94293445 104 hunch net-2005-08-22-Do you believe in induction?

Introduction: Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings. The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to thi

2 0.72025043 133 hunch net-2005-11-28-A question of quantification

Introduction: This is about methods for phrasing and thinking about the scope of some theorems in learning theory. The basic claim is that there are several different ways of quantifying the scope which sound different yet are essentially the same. For all sequences of examples. This is the standard quantification in online learning analysis. Standard theorems would say something like “for all sequences of predictions by experts, the algorithm A will perform almost as well as the best expert.” For all training sets. This is the standard quantification for boosting analysis such as adaboost or multiclass boosting. Standard theorems have the form “for all training sets the error rate inequalities … hold”. For all distributions over examples. This is the one that we have been using for reductions analysis. Standard theorem statements have the form “For all distributions over examples, the error rate inequalities … hold”. It is not quite true that each of these is equivalent. F

3 0.65911871 160 hunch net-2006-03-02-Why do people count for learning?

Introduction: This post is about a confusion of mine with respect to many commonly used machine learning algorithms. A simple example where this comes up is Bayes net prediction. A Bayes net is a directed acyclic graph over a set of nodes where each node is associated with a variable and the edges indicate dependence. The joint probability distribution over the variables is given by a set of conditional probabilities. For example, a very simple Bayes net might express: P(A,B,C) = P(A | B,C)P(B)P(C) What I don’t understand is the mechanism commonly used to estimate P(A | B, C). If we let N(A,B,C) be the number of instances of A,B,C then people sometimes form an estimate according to: P’(A | B,C) = [N(A,B,C)/N] / [N(B)/N * N(C)/N] = N(A,B,C) N / [N(B) N(C)] … in other words, people just estimate P’(A | B,C) according to observed relative frequencies. This is a reasonable technique when you have a large number of samples compared to the size of the space A x B x C, but it (nat

4 0.65872777 202 hunch net-2006-08-10-Precision is not accuracy

Introduction: In my experience, there are two different groups of people who believe the same thing: the mathematics encountered in typical machine learning conference papers is often of questionable value. The two groups who agree on this are applied machine learning people who have given up on math, and mature theoreticians who understand the limits of theory. Partly, this is just a statement about where we are with respect to machine learning. In particular, we have no mechanism capable of generating a prescription for how to solve all learning problems. In the absence of such certainty, people try to come up with formalisms that partially describe and motivate how and why they do things. This is natural and healthy—we might hope that it will eventually lead to just such a mechanism. But, part of this is simply an emphasis on complexity over clarity. A very natural and simple theoretical statement is often obscured by complexifications. Common sources of complexification include:

5 0.65772581 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: If preconditions Then postconditions When we design learning algorithms it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed some of which have fewer preconditions. Simply demanding that these forms of analysi

6 0.65259939 187 hunch net-2006-06-25-Presentation of Proofs is Hard.

7 0.64740521 7 hunch net-2005-01-31-Watchword: Assumption

8 0.64322996 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

9 0.63965386 57 hunch net-2005-04-16-Which Assumptions are Reasonable?

10 0.63507676 162 hunch net-2006-03-09-Use of Notation

11 0.61684418 347 hunch net-2009-03-26-Machine Learning is too easy

12 0.60833973 43 hunch net-2005-03-18-Binomial Weighting

13 0.60804576 90 hunch net-2005-07-07-The Limits of Learning Theory

14 0.60619384 28 hunch net-2005-02-25-Problem: Online Learning

15 0.58012474 177 hunch net-2006-05-05-An ICML reject

16 0.57316673 148 hunch net-2006-01-13-Benchmarks for RL

17 0.56998944 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

18 0.56905055 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

19 0.56004882 42 hunch net-2005-03-17-Going all the Way, Sometimes

20 0.54050165 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.034), (27, 0.184), (38, 0.085), (49, 0.024), (53, 0.051), (55, 0.068), (94, 0.102), (95, 0.057), (96, 0.3)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.92758334 175 hunch net-2006-04-30-John Langford –> Yahoo Research, NY

Introduction: I will join Yahoo Research (in New York) after my contract ends at TTI-Chicago . The deciding reasons are: Yahoo is running into many hard learning problems. This is precisely the situation where basic research might hope to have the greatest impact. Yahoo Research understands research including publishing, conferences, etc… Yahoo Research is growing, so there is a chance I can help it grow well. Yahoo understands the internet, including (but not at all limited to) experimenting with research blogs. In the end, Yahoo Research seems like the place where I might have a chance to make the greatest difference. Yahoo (as a company) has made a strong bet on Yahoo Research. We-the-researchers all hope that bet will pay off, and this seems plausible. I’ll certainly have fun trying.

2 0.88693321 53 hunch net-2005-04-06-Structured Regret Minimization

Introduction: Geoff Gordon made an interesting presentation at the snowbird learning workshop discussing the use of no-regret algorithms for several robot-related learning problems. There seems to be a draft here. This seems interesting in two ways: Drawback Removal One of the significant problems with these online algorithms is that they can’t cope with structure very easily. This drawback is addressed for certain structures. Experiments One criticism of such algorithms is that they are too “worst case”. Several experiments suggest that protecting yourself against this worst case does not necessarily incur a great loss.

same-blog 3 0.85617375 104 hunch net-2005-08-22-Do you believe in induction?

Introduction: Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings. The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to thi

4 0.74498427 443 hunch net-2011-09-03-Fall Machine Learning Events

Introduction: Many Machine Learning related events are coming up this fall. September 9, abstracts for the New York Machine Learning Symposium are due. Send a 2 page pdf, if interested, and note that we: widened submissions to be from anybody rather than students. set aside a larger fraction of time for contributed submissions. September 15, there is a machine learning meetup, where I’ll be discussing terascale learning at AOL. September 16, there is a CS&Econ day at New York Academy of Sciences. This is not ML focused, but it’s easy to imagine interest. September 23 and later NIPS workshop submissions start coming due. As usual, there are too many good ones, so I won’t be able to attend all those that interest me. I do hope some workshop makers consider ICML this coming summer, as we are increasing to a 2 day format for you. Here are a few that interest me: Big Learning is about dealing with lots of data. Abstracts are due September 30. The Bayes

5 0.68393695 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

Introduction: Martin Pool and I recently discussed the similarities and differences between academia and open source programming. Similarities: Cost profile Research and programming share approximately the same cost profile: A large upfront effort is required to produce something useful, and then “anyone” can use it. (The “anyone” is not quite right for either group because only sufficiently technical people could use it.) Wealth profile A “wealthy” academic or open source programmer is someone who has contributed a lot to other people in research or programs. Much of academia is a “gift culture”: whoever gives the most is most respected. Problems Both academia and open source programming suffer from similar problems. Whether or not (and which) open source program is used is perhaps too often personality driven rather than driven by capability or usefulness. Similar phenomena can happen in academia with respect to directions of research. Funding is often a problem for

6 0.63968384 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

7 0.61997515 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

8 0.6146462 95 hunch net-2005-07-14-What Learning Theory might do

9 0.61418414 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

10 0.60792512 235 hunch net-2007-03-03-All Models of Learning have Flaws

11 0.60726535 351 hunch net-2009-05-02-Wielding a New Abstraction

12 0.60693699 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

13 0.60582626 143 hunch net-2005-12-27-Automated Labeling

14 0.60513848 237 hunch net-2007-04-02-Contextual Scaling

15 0.6050306 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

16 0.60477704 371 hunch net-2009-09-21-Netflix finishes (and starts)

17 0.6047169 360 hunch net-2009-06-15-In Active Learning, the question changes

18 0.60385764 306 hunch net-2008-07-02-Proprietary Data in Academic Research?

19 0.60308129 19 hunch net-2005-02-14-Clever Methods of Overfitting

20 0.60300463 98 hunch net-2005-07-27-Not goal metrics