hunch_net-2005-56: Families of Learning Theory Statements (hunch net, 2005-04-14). Knowledge graph by maker-knowledge-mining.
Source: html
Introduction: The diagram above shows a very broad viewpoint of learning theory.

arrow | Typical statement | Examples
Past->Past | Some prediction algorithm A does almost as well as any of a set of algorithms. | Weighted Majority
Past->Future | Assuming independent samples, past performance predicts future performance. | PAC analysis, ERM analysis
Future->Future | Future prediction performance on subproblems implies future prediction performance using algorithm A. | ECOC, Probing

A basic question is: Are there other varieties of statements of this type? Avrim noted that there are also “arrows between arrows”: generic methods for transforming between Past->Past statements and Past->Future statements. Are there others?
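To make the three rows concrete, here is a hedged sketch of the kind of bound each family produces. The notation (mistake counts M, true error e, empirical error ê over m samples, average binary error ε) and the specific constants are my illustration, not quoted from the post:

```latex
% Past->Past (Weighted Majority, Littlestone & Warmuth, beta = 1/2):
M_{\mathrm{WM}} \;\le\; 2.41\,\bigl(\min_{i \le N} M_i + \log_2 N\bigr)

% Past->Future (PAC/ERM uniform convergence, finite H, with prob. 1-\delta
% over m i.i.d. samples):
\forall h \in H:\quad e(h) \;\le\; \hat{e}(h) + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}

% Future->Future (ECOC with a Hadamard code: multiclass error from the
% average binary subproblem error, roughly):
e_{\mathrm{multiclass}} \;\le\; 4\,\epsilon_{\mathrm{binary}}
```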
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 56 hunch net-2005-04-14-Families of Learning Theory Statements
2 0.17610814 12 hunch net-2005-02-03-Learning Theory, by assumption
Introduction: One way to organize learning theory is by assumption (in the assumption = axiom sense), from no assumptions to many assumptions. As you travel down this list, the statements become stronger, but the scope of applicability decreases.

No assumptions
- Online learning: There exists a meta prediction algorithm which competes well with the best element of any set of prediction algorithms.
- Universal Learning: Using a “bias” of 2^(-description length of Turing machine) in learning is equivalent to all other computable biases up to some constant.
- Reductions: The ability to predict well on classification problems is equivalent to the ability to predict well on many other learning problems.

Independent and Identically Distributed (IID) Data
- Performance Prediction: Based upon past performance, you can predict future performance.
- Uniform Convergence: Performance prediction works even after choosing classifiers based on the data from large sets of classifiers.
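The “bias of 2^(-description length)” statement can be made precise via the standard Occam’s razor bound; this is a hedged sketch in my notation, not text from the post. With probability 1-δ over m i.i.d. samples:

```latex
% Occam's razor bound: |h| is the description length of h in bits, so a
% hypothesis with prior weight 2^{-|h|} pays a penalty linear in |h|.
\forall h:\quad e(h) \;\le\; \hat{e}(h) + \sqrt{\frac{|h|\ln 2 + \ln(1/\delta)}{2m}}
```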
3 0.1468581 19 hunch net-2005-02-14-Clever Methods of Overfitting
Introduction: “Overfitting” is traditionally defined as training some flexible representation so that it memorizes the data but fails to predict well in the future. For this post, I will define overfitting more generally as over-representing the performance of systems. There are two styles of general overfitting: overrepresenting performance on particular datasets and (implicitly) overrepresenting performance of a method on future datasets. We should all be aware of these methods, avoid them where possible, and take them into account otherwise. I have used “reproblem” and “old datasets”, and may have participated in “overfitting by review”; some of these are very difficult to avoid.

Name | Method | Explanation | Remedy
Traditional overfitting | Train a complex predictor on too-few examples. | | Hold out pristine examples for testing. Use a simpler predictor. Get more training examples. Integrate over many predictors. Reject papers which do this.
Parameter twe
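As a minimal sketch of the first remedy above (“hold out pristine examples”), assuming scikit-learn is available; the dataset and model here are illustrative, not from the post:

```python
# Minimal holdout sketch: the held-out split is touched exactly once,
# after all modeling choices are made.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:  ", model.score(X_train, y_train))
print("holdout accuracy:", model.score(X_holdout, y_holdout))
```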
4 0.11463309 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning
Introduction: There are several different flavors of Machine Learning classes. Many classes are of the ‘zoo’ sort: many different learning algorithms are presented. Others avoid the zoo by not covering the full scope of machine learning. This is my view of what makes a good machine learning class, along with why. I’d like to specifically invite comment on whether things are missing, misemphasized, or misplaced.

Phase | Subject | Why?
Introduction | What is a machine learning problem? | A good understanding of the characteristics of machine learning problems seems essential. Characteristics include: a data source, some hope the data is predictive, and a need for generalization. This is probably best taught in a case study manner: lay out the specifics of some problem and then ask “Is this a machine learning problem?”
Introduction | Machine Learning Problem Identification | Identification and recognition of the type of learning problem is (obviously) a very important step i
5 0.11173776 14 hunch net-2005-02-07-The State of the Reduction
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons.

Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve the problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work.

Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average-case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on.

Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder.

What we know now. Typesafe r
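As an illustration of a reduction in this sense, here is a hedged sketch of one-against-all, which turns any binary learner into a multiclass solver. This is a generic example in Python with scikit-learn, not the post’s specific construction:

```python
# One-against-all: reduce K-class classification to K binary problems.
import numpy as np
from sklearn.linear_model import LogisticRegression

class OneAgainstAll:
    """Turn a binary-classifier factory into a multiclass learner."""
    def __init__(self, make_binary_learner):
        self.make_binary_learner = make_binary_learner

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One binary problem per class: "class k vs. the rest".
        self.learners_ = [
            self.make_binary_learner().fit(X, (y == k).astype(int))
            for k in self.classes_]
        return self

    def predict(self, X):
        # Predict the class whose binary learner is most confident.
        scores = np.column_stack(
            [l.predict_proba(X)[:, 1] for l in self.learners_])
        return self.classes_[np.argmax(scores, axis=1)]

# Usage: any binary learner becomes a multiclass learner.
X = np.random.RandomState(0).randn(300, 5)
y = np.random.RandomState(1).randint(0, 3, 300)
clf = OneAgainstAll(lambda: LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:5]))
```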
6 0.10901433 85 hunch net-2005-06-28-A COLT paper
7 0.10636865 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions
8 0.095621094 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
9 0.090071395 131 hunch net-2005-11-16-The Everything Ensemble Edge
10 0.089048773 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
11 0.081866793 161 hunch net-2006-03-05-“Structural” Learning
12 0.080051914 388 hunch net-2010-01-24-Specializations of the Master Problem
13 0.078289643 6 hunch net-2005-01-27-Learning Complete Problems
14 0.077864833 57 hunch net-2005-04-16-Which Assumptions are Reasonable?
15 0.075443655 177 hunch net-2006-05-05-An ICML reject
16 0.072863325 163 hunch net-2006-03-12-Online learning or online preservation of learning?
17 0.072402149 104 hunch net-2005-08-22-Do you believe in induction?
18 0.071959443 41 hunch net-2005-03-15-The State of Tight Bounds
19 0.068756126 112 hunch net-2005-09-14-The Predictionist Viewpoint
20 0.068642609 21 hunch net-2005-02-17-Learning Research Programs
simIndex simValue blogId blogTitle
same-blog 1 0.98669118 56 hunch net-2005-04-14-Families of Learning Theory Statements
2 0.66371155 19 hunch net-2005-02-14-Clever Methods of Overfitting
3 0.63509357 12 hunch net-2005-02-03-Learning Theory, by assumption
4 0.58485019 133 hunch net-2005-11-28-A question of quantification
Introduction: This is about methods for phrasing and thinking about the scope of some theorems in learning theory. The basic claim is that there are several different ways of quantifying the scope which sound different yet are essentially the same.

For all sequences of examples. This is the standard quantification in online learning analysis. Standard theorems would say something like “for all sequences of predictions by experts, the algorithm A will perform almost as well as the best expert.”

For all training sets. This is the standard quantification for boosting analysis such as adaboost or multiclass boosting. Standard theorems have the form “for all training sets the error rate inequalities … hold”.

For all distributions over examples. This is the one that we have been using for reductions analysis. Standard theorem statements have the form “For all distributions over examples, the error rate inequalities … hold”.

It is not quite true that each of these is equivalent. F
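A hedged sketch of the three quantifier shapes, in my notation rather than the post’s:

```latex
% Online (for all sequences of T examples, N experts):
\forall\,(x_1,\dots,x_T):\quad \mathrm{Loss}(A) \;\le\; \min_{i \le N} \mathrm{Loss}_i + O\bigl(\sqrt{T \ln N}\bigr)

% Boosting (for all training sets S of size m):
\forall\, S=\{(x_i,y_i)\}_{i=1}^m:\quad \hat{e}\bigl(\mathrm{boost}(S)\bigr) \;\le\; f\bigl(\hat{e}_{\mathrm{weak}}\bigr)

% Reductions (for all distributions D over examples):
\forall\, D \text{ over } X \times Y:\quad e_D(\mathrm{reduction}) \;\le\; g\bigl(e_D(\mathrm{base})\bigr)
```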
5 0.56892109 131 hunch net-2005-11-16-The Everything Ensemble Edge
Introduction: Rich Caruana, Alexandru Niculescu, Geoff Crew, and Alex Ksikes have done a lot of empirical testing which shows that using all methods to make a prediction is more powerful than using any single method. This is in rough agreement with the Bayesian way of solving problems, but based upon a different (essentially empirical) motivation. A rough summary is: Take all of {decision trees, boosted decision trees, bagged decision trees, boosted decision stumps, K nearest neighbors, neural networks, SVM} with all reasonable parameter settings. Run the methods on each of 8 problems with a large test set, calibrating margins using either sigmoid fitting or isotonic regression. For each loss of {accuracy, area under the ROC curve, cross entropy, squared error, etc…} evaluate the average performance of the method. A series of conclusions can be drawn from the observations. (Calibrated) boosted decision trees appear to perform best in general, although support v
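A minimal sketch of the calibration step described above, assuming scikit-learn; the authors’ actual experimental setup is not reproduced here:

```python
# Calibrate a boosted-tree classifier's margins with isotonic regression
# (sigmoid fitting is the other option mentioned: method="sigmoid").
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(), method="isotonic", cv=5)
calibrated.fit(X_tr, y_tr)
print("test accuracy:", calibrated.score(X_te, y_te))
```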
6 0.55313659 163 hunch net-2006-03-12-Online learning or online preservation of learning?
7 0.54286295 63 hunch net-2005-04-27-DARPA project: LAGR
8 0.51382756 104 hunch net-2005-08-22-Do you believe in induction?
9 0.50850153 190 hunch net-2006-07-06-Branch Prediction Competition
10 0.49978819 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
11 0.49216595 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning
12 0.48776734 41 hunch net-2005-03-15-The State of Tight Bounds
13 0.48728827 14 hunch net-2005-02-07-The State of the Reduction
14 0.4800891 43 hunch net-2005-03-18-Binomial Weighting
15 0.47666597 112 hunch net-2005-09-14-The Predictionist Viewpoint
16 0.47266576 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics
17 0.46917051 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC
18 0.46619236 351 hunch net-2009-05-02-Wielding a New Abstraction
19 0.46393439 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
20 0.46231458 345 hunch net-2009-03-08-Prediction Science
simIndex simValue blogId blogTitle
same-blog 1 0.98849183 56 hunch net-2005-04-14-Families of Learning Theory Statements
2 0.98789704 107 hunch net-2005-09-05-Site Update
Introduction: I tweaked the site in a number of ways today, including: Updating to WordPress 1.5. Installing and heavily tweaking the Geekniche theme. Update: I switched back to a tweaked version of the old theme. Adding the Customizable Post Listings plugin. Installing the StatTraq plugin. Updating some of the links. I particularly recommend looking at the computer research policy blog. Adding threaded comments . This doesn’t thread old comments obviously, but the extra structure may be helpful for new ones. Overall, I think this is an improvement, and it addresses a few of my earlier problems . If you have any difficulties or anything seems “not quite right”, please speak up. A few other tweaks to the site may happen in the near future.
3 0.98268509 16 hunch net-2005-02-09-Intuitions from applied learning
Introduction: Since learning is far from an exact science, it’s good to pay attention to basic intuitions of applied learning. Here are a few I’ve collected.

Integration. In Bayesian learning, the posterior is computed by an integral, and the optimal thing to do is to predict according to this integral. This phenomenon seems to be far more general. Bagging, Boosting, SVMs, and Neural Networks all take advantage of this idea to some extent. The phenomenon is more general: you can average over many different classification predictors to improve performance. Sources: Zoubin, Caruana

Differentiation. Different pieces of an average should differentiate to achieve good performance by different methods. This is known as the ‘symmetry breaking’ problem for neural networks, and it’s why weights are initialized randomly. Boosting explicitly attempts to achieve good differentiation by creating new, different learning problems. Sources: Yann LeCun, Phil Long

Deep Representation Ha
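As a small illustration of the symmetry-breaking point, assuming NumPy; purely illustrative, not from the post:

```python
# With identical initial weights, every hidden unit computes the same
# function (and would receive identical gradients), so units never
# differentiate. Random initialization breaks the symmetry.
import numpy as np

rng = np.random.default_rng(0)
W_symmetric = np.zeros((4, 8))               # all units identical
W_random = rng.normal(scale=0.1, size=(4, 8))  # units can specialize

x = rng.normal(size=4)
print("symmetric hidden activations:", np.tanh(x @ W_symmetric))  # all equal
print("random hidden activations:   ", np.tanh(x @ W_random))     # all differ
```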
4 0.95313066 2 hunch net-2005-01-24-Holy grails of machine learning?
Introduction: Let me kick things off by posing this question to ML researchers: What do you think are some important holy grails of machine learning? For example: – “A classifier with SVM-level performance but much more scalable” – “Practical confidence bounds (or learning bounds) for classification” – “A reinforcement learning algorithm that can handle the ___ problem” – “Understanding theoretically why ___ works so well in practice” etc. I pose this question because I believe that when goals are stated explicitly and well (thus providing clarity as well as opening up the problems to more people), rather than left implicit, they are likely to be achieved much more quickly. I would also like to know more about the internal goals of the various machine learning sub-areas (theory, kernel methods, graphical models, reinforcement learning, etc) as stated by people in these respective areas. This could help people cross sub-areas.
5 0.95042229 91 hunch net-2005-07-10-Thinking the Unthought
Introduction: One thing common to much research is that the researcher must be the first person ever to have some thought. How do you think of something that has never been thought of? There seems to be no methodical manner of doing this, but there are some tricks. The easiest method is to just have some connection come to you. There is a trick here, however: you should write it down and fill out the idea immediately, because it can just as easily go away. A harder method is to set aside a block of time and simply think about an idea. Distraction elimination is essential here because thinking about the unthought is hard work which your mind will avoid. Another common method is conversation. Sometimes the process of verbalizing brings new ideas up, and sometimes whoever you are talking to replies in just the right way. This method is dangerous though: you must speak to someone who helps you think rather than someone who occupies your thoughts. Try to rephrase the problem so the a
6 0.92932838 367 hunch net-2009-08-16-Centmail comments
7 0.91595066 145 hunch net-2005-12-29-Deadline Season
8 0.88508505 6 hunch net-2005-01-27-Learning Complete Problems
9 0.72578931 21 hunch net-2005-02-17-Learning Research Programs
10 0.60583472 201 hunch net-2006-08-07-The Call of the Deep
11 0.59446239 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
12 0.59383941 151 hunch net-2006-01-25-1 year
13 0.59237325 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?
14 0.58101434 265 hunch net-2007-10-14-NIPS workshp: Learning Problem Design
15 0.57458353 141 hunch net-2005-12-17-Workshops as Franchise Conferences
16 0.55053169 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
17 0.54869801 321 hunch net-2008-10-19-NIPS 2008 workshop on Kernel Learning
18 0.53991437 283 hunch net-2008-01-07-2008 Summer Machine Learning Conference Schedule
19 0.53597367 407 hunch net-2010-08-23-Boosted Decision Trees for Deep Learning
20 0.52097476 25 hunch net-2005-02-20-At One Month