hunch_net hunch_net-2005 hunch_net-2005-57 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: One of the most confusing things about understanding learning theory is the vast array of differing assumptions. Some critical thought about which of these assumptions are reasonable for real-world problems may be useful. Before we even start thinking about assumptions, it’s important to realize that the word has multiple meanings. The meaning used here is “assumption = axiom” (i.e. something you cannot verify).

| Assumption | Reasonable? | Which analysis? | Example/notes |
|---|---|---|---|
| Independent and Identically Distributed (IID) Data | Sometimes | PAC, ERM, prediction bounds, statistics | The KDD cup 2004 physics dataset is plausibly IID. There are a number of situations which are “almost IID” in the sense that IID analysis results in correct intuitions. Unreasonable in adversarial situations (stock market, war, etc.). |
| Independently Distributed Data | More often than IID, but still only sometimes | Online->batch conversion | Losing “identical” can be helpful in situations where you have a cyclic process generating data. |
| Finite exchangeability (FEX) | Sometimes, as for IID | As for IID | There are a good number of situations where there is a population we wish to classify: pay someone to classify a random subset, and then try to learn. |
| Functional form: monotonic in variables | Often | PAC-style | Many natural problems seem to have behavior monotonic in their input variables. |
| Functional form: xor | Occasionally | PAC | I was surprised to observe this. |
| Known optimal state distribution | Sometimes | RL | Sometimes humans know what is going on, and sometimes not. |
| Small approximation error everywhere | Rarely | RL | Approximate policy iteration is known to sometimes behave oddly. |

If anyone particularly agrees or disagrees with the reasonableness of these assumptions, I’m quite interested.
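The “functional form: xor” row refers to a target that no linear or monotone hypothesis can represent. A minimal sketch (mine, not from the post) makes this concrete: a brute-force search over linear thresholds finds no perfect classifier for the four xor points, while adding the interaction feature x1*x2 restores separability. All weights and grids here are illustrative choices.

```python
# Hypothetical sketch: why the xor functional form defeats linear
# hypotheses. Brute-force linear thresholds sign(w1*x1 + w2*x2 + b)
# over a small grid and check whether any classifies xor perfectly.

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linearly_separable(points):
    """True if some grid-searched linear threshold fits all points."""
    grid = [i / 2 for i in range(-4, 5)]  # weights and bias in [-2, 2]
    for w1 in grid:
        for w2 in grid:
            for b in grid:
                if all((w1 * x[0] + w2 * x[1] + b > 0) == (y == 1)
                       for x, y in points):
                    return True
    return False

# No linear threshold fits xor (and none exists at any scale)...
print(linearly_separable(XOR))  # False

# ...but adding the interaction feature x1*x2 makes it separable,
# e.g. x1 + x2 - 2*x1*x2 > 0 holds exactly on the positive points.
LIFTED = [((x1, x2, x1 * x2), y) for (x1, x2), y in XOR]

def separable_3d(points):
    grid = range(-3, 4)
    for w1 in grid:
        for w2 in grid:
            for w3 in grid:
                for b in grid:
                    if all((w1 * x[0] + w2 * x[1] + w3 * x[2] + b > 0) == (y == 1)
                           for x, y in points):
                        return True
    return False

print(separable_3d(LIFTED))  # True
```

This is the standard reason xor-style targets require either richer features or nonlinear hypothesis classes, which is consistent with xor being only “occasionally” a reasonable functional-form assumption.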
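The finite-exchangeability (FEX) row describes a fixed population from which a random subset is labeled. A small sketch (my own, with made-up numbers) shows the key point: draws without replacement from a finite population are exchangeable but not IID, yet the labeled-subset average is still an unbiased estimate of the population average.

```python
# Hypothetical sketch of the FEX setting: a fixed finite population,
# with labels purchased for a random subset drawn WITHOUT replacement.
# The draws are exchangeable but not independent, yet the subset mean
# remains an unbiased estimate of the population mean.

import random

random.seed(0)

# A fixed population of 10,000 items with hidden binary labels.
population = [random.random() < 0.3 for _ in range(10_000)]
true_rate = sum(population) / len(population)

# Pay someone to label a random subset of 500 (without replacement).
labeled = random.sample(population, 500)
estimate = sum(labeled) / len(labeled)

print(f"population positive rate: {true_rate:.3f}")
print(f"labeled-subset estimate:  {estimate:.3f}")
```

Under this assumption, IID-style generalization arguments often carry over (hence “sometimes, as for IID” in the table), with the finite-population correction typically making estimates slightly tighter.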
Similar posts on hunch.net:

- Learning Theory, by assumption (2005-02-03)
- Watchword: Assumption (2005-01-31)
- All Models of Learning have Flaws (2007-03-03)
- Progress in Active Learning (2005-11-02)
- Machine Learning is too easy (2009-03-26)
- Bounds greater than 1 (2006-04-06)
- Explorations of Exploration (2006-06-14)
- Why do people count for learning? (2006-03-02)
- Incompatibilities between classical confidence intervals and learning. (2006-10-08)
- Fallback Analysis is a Secret to Useful Algorithms (2005-10-26)
- Continuizing Solutions (2006-11-27)
- Learning Theory standards for NIPS 2006 (2006-08-28)
- Idempotent-capable Predictors (2007-07-06)
- Specializations of the Master Problem (2010-01-24)
- Wielding a New Abstraction (2009-05-02)
- Optimizing Machine Learning Programs (2007-09-16)
- A Fundamentalist Organization of Machine Learning (2006-02-24)
- Decision by Vetocracy (2009-02-18)
- The Big O and Constants in Learning (2005-03-04)