hunch_net hunch_net-2005 hunch_net-2005-7 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: “Assumption” is another word to be careful with in machine learning because it is used in several ways. Assumption = Bias There are several ways to see that some form of ‘bias’ (= preferring of one solution over another) is necessary. This is obvious in an adversarial setting. A good bit of work has been expended explaining this in other settings with “no free lunch” theorems. This is a usage specialized to learning which is particularly common when talking about priors for Bayesian Learning. Assumption = “if” of a theorem The assumptions are the ‘if’ part of the ‘if-then’ in a theorem. This is a fairly common usage. Assumption = Axiom The assumptions are the things that we assume are true, but which we cannot verify. Examples are “the IID assumption” or “my problem is a DNF on a small number of bits”. This is the usage which I prefer. One difficulty with any use of the word “assumption” is that you often encounter “if assumption then conclusion so if not assumption then not conclusion”.
sentIndex sentText sentNum sentScore
1 “Assumption” is another word to be careful with in machine learning because it is used in several ways. [sent-1, score-0.383]
2 Assumption = Bias There are several ways to see that some form of ‘bias’ (= preferring of one solution over another) is necessary. [sent-2, score-0.321]
3 A good bit of work has been expended explaining this in other settings with “no free lunch” theorems. [sent-4, score-0.562]
4 This is a usage specialized to learning which is particularly common when talking about priors for Bayesian Learning. [sent-5, score-0.673]
5 Assumption = “if” of a theorem The assumptions are the ‘if’ part of the ‘if-then’ in a theorem. [sent-6, score-0.267]
6 Assumption = Axiom The assumptions are the things that we assume are true, but which we cannot verify. [sent-8, score-0.229]
7 Examples are “the IID assumption” or “my problem is a DNF on a small number of bits”. [sent-9, score-0.047]
8 One difficulty with any use of the word “assumption” is that you often encounter “if assumption then conclusion so if not assumption then not conclusion”. [sent-11, score-2.082]
9 For example, with variant (1), “the assumption of my prior is not met so the algorithm will not learn”. [sent-13, score-1.067]
10 Or, with variant (3), “the data is not IID, so my learning algorithm designed for IID data will not work”. [sent-14, score-0.464]
11 In each of these cases “will” must be replaced with “may” for correctness. [sent-15, score-0.216]
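The pattern criticized in sentences 8–11 is the classical fallacy of denying the antecedent. As a minimal formalization (my own illustration, not text from the post):

```latex
% From (A => C) one cannot conclude (not A => not C).
% Example: A = "the data is IID", C = "the learned predictor works well".
% A theorem of the form A => C says nothing about the case where A fails,
% so C may or may not hold there.
\[
  (A \Rightarrow C) \;\not\models\; (\lnot A \Rightarrow \lnot C)
\]
```

This is exactly why “will not learn” and “will not work” in the examples above must be weakened to “may not”.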
wordName wordTfidf (topN-words)
[('assumption', 0.664), ('usage', 0.252), ('iid', 0.221), ('conclusion', 0.218), ('variant', 0.184), ('word', 0.164), ('bias', 0.147), ('assumptions', 0.146), ('expended', 0.136), ('lunch', 0.126), ('preferring', 0.126), ('axiom', 0.119), ('incorrect', 0.119), ('correctness', 0.113), ('replaced', 0.109), ('priors', 0.105), ('specialized', 0.102), ('explaining', 0.096), ('met', 0.096), ('encounter', 0.092), ('talking', 0.09), ('careful', 0.086), ('assume', 0.083), ('settings', 0.083), ('adversarial', 0.082), ('bits', 0.082), ('another', 0.081), ('common', 0.08), ('designed', 0.079), ('data', 0.07), ('fairly', 0.069), ('theorem', 0.065), ('cases', 0.065), ('obvious', 0.065), ('free', 0.063), ('difficulty', 0.062), ('prior', 0.062), ('algorithm', 0.061), ('work', 0.058), ('true', 0.058), ('bayesian', 0.057), ('part', 0.056), ('several', 0.052), ('solution', 0.05), ('ways', 0.05), ('small', 0.047), ('learn', 0.045), ('particularly', 0.044), ('form', 0.043), ('must', 0.042)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 7 hunch net-2005-01-31-Watchword: Assumption
2 0.22780354 12 hunch net-2005-02-03-Learning Theory, by assumption
Introduction: One way to organize learning theory is by assumption (in the assumption = axiom sense), from no assumptions to many assumptions. As you travel down this list, the statements become stronger, but the scope of applicability decreases. No assumptions: Online learning: there exists a meta prediction algorithm which competes well with the best element of any set of prediction algorithms. Universal Learning: using a “bias” of 2^(-description length of the Turing machine) in learning is equivalent to all other computable biases up to some constant. Reductions: the ability to predict well on classification problems is equivalent to the ability to predict well on many other learning problems. Independent and Identically Distributed (IID) Data: Performance Prediction: based upon past performance, you can predict future performance. Uniform Convergence: performance prediction works even after choosing classifiers based on the data from large sets of classifiers.
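The “no assumptions / online learning” entry above refers to meta prediction algorithms in the style of weighted majority or exponential weights. Below is a minimal sketch of an exponentially weighted forecaster; it is an illustration of that family of algorithms under my own assumed setting (predictions and outcomes in [0, 1], squared loss, a hand-picked learning rate), not an algorithm specified in the post.

```python
import math

def exponentially_weighted_forecaster(expert_predictions, outcomes, eta=0.5):
    """Combine a fixed set of experts online, with no distributional assumptions.

    expert_predictions: list over rounds, each element a list of per-expert predictions in [0, 1].
    outcomes: list of observed outcomes in [0, 1], one per round.
    Returns the forecaster's own prediction for each round.
    """
    num_experts = len(expert_predictions[0])
    log_weights = [0.0] * num_experts  # log-weights avoid numerical underflow
    forecasts = []
    for preds, y in zip(expert_predictions, outcomes):
        # predict the weighted average of the experts' predictions
        m = max(log_weights)
        weights = [math.exp(lw - m) for lw in log_weights]
        total = sum(weights)
        forecasts.append(sum(w * p for w, p in zip(weights, preds)) / total)
        # exponential update: each expert is penalized by its loss this round
        log_weights = [lw - eta * (p - y) ** 2 for lw, p in zip(log_weights, preds)]
    return forecasts
```

For a suitably chosen eta, forecasters of this style have regret against the best single expert growing only like the square root of (number of rounds times the log of the number of experts), which is the sense in which they “compete well” without assumptions on the data.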
3 0.18966216 57 hunch net-2005-04-16-Which Assumptions are Reasonable?
Introduction: One of the most confusing things about understanding learning theory is the vast array of differing assumptions. Some critical thought about which of these assumptions are reasonable for real-world problems may be useful. Before we even start thinking about assumptions, it’s important to realize that the word has multiple meanings. The meaning used here is “assumption = axiom” (i.e. something you can not verify). [Table: Assumption | Reasonable? | Which analysis? | Example/notes] Independent and Identically Distributed Data | Sometimes | PAC, ERM, prediction bounds, statistics | The KDD cup 2004 physics dataset is plausibly IID data. There are a number of situations which are “almost IID” in the sense that IID analysis results in correct intuitions. Unreasonable in adversarial situations (stock market, war, etc…). Independently Distributed Data | More than IID, but still only sometimes | online->batch conversion | Losing “identical” can be helpful in situations where you
4 0.17558891 127 hunch net-2005-11-02-Progress in Active Learning
Introduction: Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning. This is an update on the progress I know of. As a refresher, active learning as meant here is: There is a source of unlabeled data. There is an oracle from which labels can be requested for unlabeled data produced by the source. The goal is to perform well with minimal use of the oracle. Here is what I’ve learned: Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier). Nina, Alina, and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here. Nicolo, Claudio, and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here. This was published a year or two ago and I r
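The three conditions above define the generic pool-based active learning protocol. The loop below is a schematic sketch of that protocol only; the uncertainty-sampling query rule and all function names are assumed for illustration and are not the algorithms of Sanjoy, Nina, Alina, Nicolo, Claudio, or Luca.

```python
def active_learning_loop(pool, oracle, train, predict_proba, budget):
    """Pool-based active learning: an unlabeled pool, a label oracle, and a query budget.

    pool: list of unlabeled examples.
    oracle: function mapping an example to its label (the expensive resource).
    train: function mapping a list of (x, y) pairs to a model.
    predict_proba: function (model, x) -> estimated probability of the positive class.
    """
    labeled = []
    x0 = pool.pop(0)                      # seed with one query so a model exists
    labeled.append((x0, oracle(x0)))
    model = train(labeled)
    for _ in range(budget - 1):
        if not pool:
            break
        # uncertainty sampling: query the example the current model is least sure about
        x = min(pool, key=lambda ex: abs(predict_proba(model, ex) - 0.5))
        pool.remove(x)
        labeled.append((x, oracle(x)))
        model = train(labeled)
    return model
```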
5 0.14790069 347 hunch net-2009-03-26-Machine Learning is too easy
Introduction: One of the remarkable things about machine learning is how diverse it is. The viewpoints of Bayesian learning, reinforcement learning, graphical models, supervised learning, unsupervised learning, genetic programming, etc… share little enough overlap that many people can and do make their careers within one without touching, or even necessarily understanding, the others. There are two fundamental reasons why this is possible. For many problems, many approaches work in the sense that they do something useful. This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples. There is nothing like a unifying problem defining the field. In many other areas there
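The theoretical half of the first reason, that minimizing training error over any finite set of predictors is nontrivial given enough examples, is usually stated as an Occam's razor bound. A standard form is below (my addition, stated for IID data; it is not quoted from the post):

```latex
% For a finite predictor set H, m IID examples, and any delta in (0,1),
% with probability at least 1 - delta, simultaneously for every h in H:
\[
  \mathrm{err}_{D}(h) \;\le\; \widehat{\mathrm{err}}_{S}(h)
    + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}} .
\]
```

So picking the training-error minimizer from any RAM-representable (hence finite) set of predictors comes with a nontrivial guarantee once m is large relative to ln|H|.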
6 0.10749428 41 hunch net-2005-03-15-The State of Tight Bounds
7 0.10738312 235 hunch net-2007-03-03-All Models of Learning have Flaws
8 0.098871291 319 hunch net-2008-10-01-NIPS 2008 workshop on ‘Learning over Empirical Hypothesis Spaces’
9 0.095792279 104 hunch net-2005-08-22-Do you believe in induction?
10 0.094887458 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning
11 0.093750879 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms
12 0.092292383 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
13 0.089523777 157 hunch net-2006-02-18-Multiplication of Learned Probabilities is Dangerous
14 0.089174651 34 hunch net-2005-03-02-Prior, “Prior” and Bias
15 0.078283474 388 hunch net-2010-01-24-Specializations of the Master Problem
16 0.077789612 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
17 0.077687144 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
18 0.075848967 23 hunch net-2005-02-19-Loss Functions for Discriminative Training of Energy-Based Models
19 0.072120264 90 hunch net-2005-07-07-The Limits of Learning Theory
20 0.071355842 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
topicId topicWeight
[(0, 0.137), (1, 0.092), (2, 0.009), (3, 0.004), (4, 0.057), (5, -0.044), (6, 0.028), (7, 0.053), (8, 0.106), (9, -0.032), (10, 0.043), (11, -0.012), (12, 0.132), (13, 0.007), (14, 0.01), (15, -0.086), (16, -0.058), (17, -0.041), (18, -0.087), (19, -0.062), (20, 0.006), (21, -0.04), (22, 0.073), (23, -0.107), (24, -0.051), (25, -0.088), (26, 0.042), (27, 0.008), (28, 0.081), (29, 0.027), (30, 0.01), (31, -0.035), (32, -0.171), (33, 0.045), (34, 0.075), (35, -0.055), (36, -0.064), (37, 0.047), (38, 0.104), (39, -0.001), (40, 0.004), (41, -0.075), (42, -0.108), (43, 0.004), (44, 0.035), (45, -0.105), (46, 0.027), (47, 0.056), (48, 0.042), (49, -0.072)]
simIndex simValue blogId blogTitle
same-blog 1 0.97293341 7 hunch net-2005-01-31-Watchword: Assumption
2 0.78776997 57 hunch net-2005-04-16-Which Assumptions are Reasonable?
3 0.71564567 12 hunch net-2005-02-03-Learning Theory, by assumption
4 0.64714301 157 hunch net-2006-02-18-Multiplication of Learned Probabilities is Dangerous
Introduction: This is about a design flaw in several learning algorithms such as the Naive Bayes classifier and Hidden Markov Models. A number of people are aware of it, but it seems that not everyone is. Several learning systems have the property that they estimate some conditional probabilities P(event | other events) either explicitly or implicitly. Then, at prediction time, these learned probabilities are multiplied together according to some formula to produce a final prediction. The Naive Bayes classifier for binary data is the simplest of these, so it seems like a good example. When Naive Bayes is used, a set of probabilities of the form Pr’(feature_i | label) are estimated via counting statistics and some prior. Predictions are made according to the label maximizing: Pr’(label) * ∏_i Pr’(feature_i | label) (The Pr’ notation indicates these are estimated values.) There is nothing wrong with this method as long as (a) the prior for the sample counts is
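The decision rule quoted above is the standard Naive Bayes rule. Below is a minimal sketch of it for binary features, computed in log space; the add-alpha smoothing and all names are my own illustrative choices. The post's warning applies directly: every factor in the product is only an estimate, so estimation errors compound multiplicatively.

```python
import math
from collections import defaultdict

def train_naive_bayes(examples, alpha=1.0):
    """examples: list of (active_features, label), features given as a set of ids.
    Returns smoothed estimates of log Pr'(label) and log Pr'(feature | label)."""
    label_counts = defaultdict(float)
    feature_counts = defaultdict(lambda: defaultdict(float))
    features = set()
    for feats, label in examples:
        label_counts[label] += 1
        for f in feats:
            feature_counts[label][f] += 1
            features.add(f)
    n = len(examples)
    log_prior = {y: math.log(c / n) for y, c in label_counts.items()}
    log_cond = {}
    for y, cy in label_counts.items():
        log_cond[y] = {}
        for f in features:
            p = (feature_counts[y][f] + alpha) / (cy + 2 * alpha)  # smoothed estimate
            log_cond[y][f] = (math.log(p), math.log(1.0 - p))      # (feature on, feature off)
    return log_prior, log_cond, features

def predict(log_prior, log_cond, features, feats):
    """argmax over labels of log Pr'(label) + sum over features of log Pr'(feature | label)."""
    best, best_score = None, -math.inf
    for y in log_prior:
        score = log_prior[y]
        for f in features:
            on, off = log_cond[y][f]
            score += on if f in feats else off
        if score > best_score:
            best, best_score = y, score
    return best
```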
5 0.57651007 104 hunch net-2005-08-22-Do you believe in induction?
Introduction: Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings. The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to thi
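The symmetry argument in the excerpt can be checked by brute force: averaged uniformly over all possible labelings of a fixed set of inputs, any fixed predictor errs on exactly half the examples. A tiny enumeration illustrating this (my own, not from the talk):

```python
from itertools import product

def average_error_over_all_labelings(predictor, xs):
    """Average 0/1 error of a fixed predictor, uniformly over all 2^n labelings of xs."""
    n = len(xs)
    total = 0.0
    for labels in product([0, 1], repeat=n):  # every possible target function on xs
        total += sum(predictor(x) != y for x, y in zip(xs, labels)) / n
    return total / 2 ** n

# Any predictor, however clever, averages exactly 50% error over all labelings.
print(average_error_over_all_labelings(lambda x: x % 2, xs=list(range(6))))  # prints 0.5
```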
6 0.57073289 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms
7 0.56734049 127 hunch net-2005-11-02-Progress in Active Learning
8 0.55249286 133 hunch net-2005-11-28-A question of quantification
9 0.54392934 160 hunch net-2006-03-02-Why do people count for learning?
10 0.52529246 347 hunch net-2009-03-26-Machine Learning is too easy
11 0.51902324 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound
12 0.50589579 43 hunch net-2005-03-18-Binomial Weighting
13 0.49778578 235 hunch net-2007-03-03-All Models of Learning have Flaws
14 0.49300328 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)
15 0.46897346 165 hunch net-2006-03-23-The Approximation Argument
16 0.46538201 34 hunch net-2005-03-02-Prior, “Prior” and Bias
17 0.44350579 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning
18 0.43335867 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?
19 0.41450712 28 hunch net-2005-02-25-Problem: Online Learning
20 0.41060963 202 hunch net-2006-08-10-Precision is not accuracy
topicId topicWeight
[(3, 0.067), (20, 0.353), (27, 0.149), (77, 0.062), (94, 0.099), (95, 0.131)]
simIndex simValue blogId blogTitle
same-blog 1 0.84193379 7 hunch net-2005-01-31-Watchword: Assumption
2 0.72525448 208 hunch net-2006-09-18-What is missing for online collaborative research?
Introduction: The internet has recently made the research process much smoother: papers are easy to obtain, citations are easy to follow, and unpublished “tutorials” are often available. Yet, new research fields can look very complicated to outsiders or newcomers. Every paper is like a small piece of an unfinished jigsaw puzzle: to understand just one publication, a researcher without experience in the field will typically have to follow several layers of citations, and many of the papers he encounters have a great deal of repeated information. Furthermore, from one publication to the next, notation and terminology may not be consistent which can further confuse the reader. But the internet is now proving to be an extremely useful medium for collaboration and knowledge aggregation. Online forums allow users to ask and answer questions and to share ideas. The recent phenomenon of Wikipedia provides a proof-of-concept for the “anyone can edit” system. Can such models be used to facilitate research a
3 0.68396646 190 hunch net-2006-07-06-Branch Prediction Competition
Introduction: Alan Fern points out the second branch prediction challenge (due September 29) which is a follow-up to the first branch prediction competition. Branch prediction is one of the fundamental learning problems of the computer age: without it our computers might run an order of magnitude slower. This is a tough problem since there are sharp constraints on time and space complexity in an online environment. For machine learning, the “idealistic track” may fit well. Essentially, they remove these constraints to gain a weak upper bound on what might be done.
4 0.60096467 464 hunch net-2012-05-03-Microsoft Research, New York City
Introduction: Yahoo! laid off people. Unlike every previous time there have been layoffs, this is serious for Yahoo! Research. We had advanced warning from Prabhakar through the simple act of leaving. Yahoo! Research was a world class organization that Prabhakar recruited much of personally, so it is deeply implausible that he would spontaneously decide to leave. My first thought when I saw the news was “Uhoh, Rob said that he knew it was serious when the head of ATnT Research left.” In this case it was even more significant, because Prabhakar recruited me on the premise that Y!R was an experiment in how research should be done: via a combination of high quality people and high engagement with the company. Prabhakar’s departure is a clear end to that experiment. The result is ambiguous from a business perspective. Y!R clearly was not capable of saving the company from its illnesses. I’m not privy to the internal accounting of impact and this is the kind of subject where there c
5 0.55870503 351 hunch net-2009-05-02-Wielding a New Abstraction
Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo
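The abstraction being advertised is a learning reduction: a base learner for classification, assumed to achieve small regret, is invoked repeatedly to solve a harder problem. One-against-all for multiclass prediction is perhaps the simplest instance; the sketch below is a generic illustration of that idea, not the construction from the tutorial, and all names are assumed.

```python
def one_against_all_train(binary_learner, examples, label_set):
    """Reduce multiclass classification to one binary problem per class.

    binary_learner: maps a list of (x, 0/1) pairs to a scoring function x -> float.
    examples: list of (x, y) multiclass examples with y in label_set.
    Returns a multiclass predictor built only from calls to the binary learner.
    """
    scorers = {}
    for label in label_set:
        binary_data = [(x, 1 if y == label else 0) for x, y in examples]
        scorers[label] = binary_learner(binary_data)  # one subproblem per class
    def predict(x):
        return max(scorers, key=lambda label: scorers[label](x))
    return predict
```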
6 0.52418226 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
7 0.52313089 127 hunch net-2005-11-02-Progress in Active Learning
8 0.522443 344 hunch net-2009-02-22-Effective Research Funding
9 0.49596852 389 hunch net-2010-02-26-Yahoo! ML events
10 0.49438849 456 hunch net-2012-02-24-ICML+50%
11 0.49152198 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
12 0.49029511 30 hunch net-2005-02-25-Why Papers?
13 0.4870429 388 hunch net-2010-01-24-Specializations of the Master Problem
14 0.4858295 116 hunch net-2005-09-30-Research in conferences
15 0.48246604 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
16 0.48172295 360 hunch net-2009-06-15-In Active Learning, the question changes
17 0.48102665 36 hunch net-2005-03-05-Funding Research
18 0.47845152 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
19 0.4735736 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
20 0.47252098 57 hunch net-2005-04-16-Which Assumptions are Reasonable?