hunch_net hunch_net-2009 hunch_net-2009-347 knowledge-graph by maker-knowledge-mining

347 hunch net-2009-03-26-Machine Learning is too easy


meta info for this blog

Source: html

Introduction: One of the remarkable things about machine learning is how diverse it is. The viewpoints of Bayesian learning, reinforcement learning, graphical models, supervised learning, unsupervised learning, genetic programming, etc… share little enough overlap that many people can and do make their careers within one without touching, or even necessarily understanding the others. There are two fundamental reasons why this is possible. For many problems, many approaches work in the sense that they do something useful. This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples. There is nothing like a unifying problem defining the field. In many other areas there
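The introduction's theory claim (that minimizing training error over any finite set of predictors does something nontrivial given enough examples) can be made precise with the standard finite-class bound. A minimal version, stated here as an editorial illustration rather than a formula from the post, assuming i.i.d. examples and 0/1 loss: for a finite predictor set H and m examples, with probability at least 1 - \delta, every h in H satisfies

\[ \mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) \;+\; \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}} \]

so the training-error minimizer is within twice that square-root term of the best predictor in H. Any set of predictors representable in a finite amount of RAM is finite, with \ln|H| at most proportional to the number of bits of memory, which is what makes the claim hold so broadly.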


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 One of the remarkable things about machine learning is how diverse it is. [sent-1, score-0.182]

2 For many problems, many approaches work in the sense that they do something useful. [sent-4, score-0.55]

3 This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. [sent-5, score-0.637]

4 It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples. [sent-6, score-0.853]

5 There is nothing like a unifying problem defining the field. [sent-7, score-0.367]

6 In many other areas there are unifying problems for which finer distinctions in approaches matter. [sent-8, score-0.787]

7 Particular problems are often “solved” by the first technique applied to them. [sent-10, score-0.279]

8 This is particularly exacerbated when some other field first uses machine learning, as people there may not be aware of other approaches. [sent-11, score-0.2]

9 In other fields, the baseline method is a neural network, an SVM, a graphical model, etc… The analysis of new learning algorithms is often subject to assumptions designed for the learning algorithm. [sent-13, score-0.339]

10 This assumption set selection problem is the theoretician’s version of the data set selection problem. [sent-15, score-0.671]

11 A basic problem is: How do you make progress in a field with this (lack of) structure? [sent-16, score-0.34]

12 Some possibilities are: Pick an approach and push it. [sent-18, score-0.345]

13 The fact that ML is easy means there is a real potential for doing great things this way. [sent-21, score-0.19]

14 Although almost anyone can do something nontrivial on most problems, achieving best-possible performance on some problems is not at all easy. [sent-23, score-0.422]

15 Create algorithms that work fast online, in real time, with very few or no labeled examples, but very many examples, very many features, and very many things to predict (a minimal sketch of such an online learner follows this summary list). [sent-25, score-0.586]

16 I am least fond of approach (1), although many people successfully follow approach (1) for their career. [sent-26, score-0.746]

17 What’s frustrating about approach (1) is that there does not seem to be any single simple philosophy capable of solving all the problems we might recognize as machine learning problems. [sent-27, score-0.6]

18 Consequently, people following approach (1) are at risk of being outpersuaded by someone sometime in the future. [sent-28, score-0.365]

19 Approach (3) seems solid, promoting a different kind of progress than approach (2). [sent-30, score-0.594]

20 It is not as specialized as (2) or (3), and it seems many constraints are complementary. [sent-32, score-0.212]
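Sentence 15 above describes the kind of algorithm being asked for: fast, online, with very many examples and features but few labels. Below is a minimal sketch of one such learner, assuming feature hashing plus online logistic regression; the hash size, learning rate, and (name, value) feature format are illustrative choices, not anything specified in the post.

import math

DIM = 2 ** 20                     # fixed memory via the hashing trick, regardless of feature count
w = [0.0] * DIM                   # weight vector
LR = 0.1                          # learning rate (illustrative)

def predict(features):
    """features: list of (name, value) pairs; returns estimated P(y = 1)."""
    # Python's built-in hash() is salted per process; a stable hash would be used in practice.
    score = sum(w[hash(name) % DIM] * value for name, value in features)
    return 1.0 / (1.0 + math.exp(-score))

def update(features, label):
    """One online logistic-regression step on a single labeled example (label in {0, 1})."""
    gradient = predict(features) - label
    for name, value in features:
        w[hash(name) % DIM] -= LR * gradient * value

# Stream examples one at a time: predict on every example, update only when a label arrives.
example = [("bias", 1.0), ("word:learning", 1.0)]
print(predict(example))
update(example, 1)

Per-example cost and memory stay constant no matter how many distinct raw features appear, which is the point of hashing the feature names into a fixed-size weight vector.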


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('approach', 0.266), ('unifying', 0.198), ('graphical', 0.181), ('problems', 0.173), ('assumptions', 0.158), ('nontrivial', 0.158), ('progress', 0.149), ('selection', 0.133), ('many', 0.132), ('etc', 0.123), ('approaches', 0.116), ('predictors', 0.11), ('assumption', 0.107), ('technique', 0.106), ('set', 0.104), ('things', 0.103), ('correct', 0.103), ('field', 0.101), ('exacerbated', 0.099), ('margins', 0.099), ('theoretician', 0.099), ('promoting', 0.099), ('sometime', 0.099), ('justifying', 0.092), ('genetic', 0.092), ('representable', 0.092), ('something', 0.091), ('problem', 0.09), ('real', 0.087), ('finer', 0.086), ('exercise', 0.086), ('examples', 0.086), ('true', 0.084), ('performs', 0.082), ('frustrating', 0.082), ('agrees', 0.082), ('unsurprising', 0.082), ('distinctions', 0.082), ('successfully', 0.082), ('bayesian', 0.082), ('models', 0.081), ('seems', 0.08), ('sense', 0.079), ('viewpoints', 0.079), ('philosophy', 0.079), ('remarkable', 0.079), ('push', 0.079), ('defining', 0.079), ('popular', 0.079), ('solid', 0.079)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 347 hunch net-2009-03-26-Machine Learning is too easy

Introduction: One of the remarkable things about machine learning is how diverse it is. The viewpoints of Bayesian learning, reinforcement learning, graphical models, supervised learning, unsupervised learning, genetic programming, etc… share little enough overlap that many people can and do make their careers within one without touching, or even necessarily understanding the others. There are two fundamental reasons why this is possible. For many problems, many approaches work in the sense that they do something useful. This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples. There is nothing like a unifying problem defining the field. In many other areas there

2 0.19656752 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

3 0.17257659 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

Introduction: Muthu invited me to the workshop on algorithms in the field, with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n^2 dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto

4 0.15745771 332 hunch net-2008-12-23-Use of Learning Theory

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example

5 0.1550283 165 hunch net-2006-03-23-The Approximation Argument

Introduction: An argument is sometimes made that the Bayesian way is the “right” way to do machine learning. This is a serious argument which deserves a serious reply. The approximation argument is a serious reply for which I have not yet seen a reply^2. The idea for the Bayesian approach is quite simple, elegant, and general. Essentially, you first specify a prior P(D) over possible processes D producing the data, observe the data, then condition on the data according to Bayes law to construct a posterior: P(D|x) = P(x|D)P(D)/P(x). After this, hard decisions are made (such as “turn left” or “turn right”) by choosing the one which minimizes the expected (with respect to the posterior) loss. This basic idea is reused thousands of times with various choices of P(D) and loss functions which is unsurprising given the many nice properties: There is an extremely strong associated guarantee: If the actual distribution generating the data is drawn from P(D) there is no better method. (A toy numeric illustration of this Bayes-rule recipe appears after this list.)

6 0.15469791 454 hunch net-2012-01-30-ICML Posters and Scope

7 0.15325503 131 hunch net-2005-11-16-The Everything Ensemble Edge

8 0.15286048 12 hunch net-2005-02-03-Learning Theory, by assumption

9 0.14790069 7 hunch net-2005-01-31-Watchword: Assumption

10 0.14640154 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

11 0.14263184 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

12 0.13927282 95 hunch net-2005-07-14-What Learning Theory might do

13 0.13709667 68 hunch net-2005-05-10-Learning Reductions are Reductionist

14 0.13697627 22 hunch net-2005-02-18-What it means to do research.

15 0.13607034 237 hunch net-2007-04-02-Contextual Scaling

16 0.13573109 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

17 0.13532998 67 hunch net-2005-05-06-Don’t mix the solution into the problem

18 0.13518371 351 hunch net-2009-05-02-Wielding a New Abstraction

19 0.13315187 352 hunch net-2009-05-06-Machine Learning to AI

20 0.13155726 378 hunch net-2009-11-15-The Other Online Learning
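Entry 5 in the list above quotes the Bayes-rule recipe P(D|x) = P(x|D)P(D)/P(x) followed by a hard decision minimizing expected posterior loss. A toy numeric illustration of that recipe; the hypotheses, likelihoods, and losses below are made up purely for concreteness.

# Two candidate processes D1, D2 with a uniform prior and made-up likelihoods for one observation x.
prior      = {"D1": 0.5, "D2": 0.5}
likelihood = {"D1": 0.9, "D2": 0.2}          # P(x | D)

evidence  = sum(prior[d] * likelihood[d] for d in prior)              # P(x)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}   # P(D | x) by Bayes law

# loss[action][D] = cost of taking `action` when D is the true process (illustrative numbers).
loss = {"turn left":  {"D1": 0.0, "D2": 1.0},
        "turn right": {"D1": 1.0, "D2": 0.0}}

# Hard decision: pick the action minimizing expected loss under the posterior.
decision = min(loss, key=lambda a: sum(posterior[d] * loss[a][d] for d in posterior))
print(posterior)   # {'D1': ~0.818, 'D2': ~0.182}
print(decision)    # 'turn left'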


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.348), (1, 0.154), (2, -0.044), (3, 0.045), (4, 0.073), (5, -0.058), (6, -0.018), (7, 0.061), (8, 0.095), (9, 0.001), (10, -0.023), (11, -0.024), (12, 0.078), (13, 0.056), (14, 0.064), (15, -0.016), (16, 0.046), (17, -0.031), (18, -0.023), (19, -0.027), (20, 0.035), (21, 0.019), (22, -0.004), (23, -0.073), (24, 0.054), (25, -0.053), (26, 0.032), (27, 0.009), (28, -0.02), (29, 0.025), (30, -0.013), (31, -0.012), (32, -0.065), (33, 0.027), (34, 0.018), (35, 0.029), (36, 0.036), (37, -0.031), (38, 0.075), (39, -0.008), (40, 0.04), (41, -0.097), (42, -0.026), (43, 0.014), (44, 0.015), (45, -0.049), (46, -0.056), (47, 0.029), (48, 0.015), (49, -0.077)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97761375 347 hunch net-2009-03-26-Machine Learning is too easy

Introduction: One of the remarkable things about machine learning is how diverse it is. The viewpoints of Bayesian learning, reinforcement learning, graphical models, supervised learning, unsupervised learning, genetic programming, etc… share little enough overlap that many people can and do make their careers within one without touching, or even necessarily understanding the others. There are two fundamental reasons why this is possible. For many problems, many approaches work in the sense that they do something useful. This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples. There is nothing like a unifying problem defining the field. In many other areas there

2 0.8360855 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

3 0.79959732 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

Introduction: There are several different flavors of Machine Learning classes. Many classes are of the ‘zoo’ sort: many different learning algorithms are presented. Others avoid the zoo by not covering the full scope of machine learning. This is my view of what makes a good machine learning class, along with why. I’d like to specifically invite comment on whether things are missing, misemphasized, or misplaced. Phase Subject Why? Introduction What is a machine learning problem? A good understanding of the characteristics of machine learning problems seems essential. Characteristics include: a data source, some hope the data is predictive, and a need for generalization. This is probably best taught in a case study manner: lay out the specifics of some problem and then ask “Is this a machine learning problem?” Introduction Machine Learning Problem Identification Identification and recognition of the type of learning problems is (obviously) a very important step i

4 0.79143918 104 hunch net-2005-08-22-Do you believe in induction?

Introduction: Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings. The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to thi

5 0.79079896 148 hunch net-2006-01-13-Benchmarks for RL

Introduction: A couple years ago, Drew Bagnell and I started the RLBench project to setup a suite of reinforcement learning benchmark problems. We haven’t been able to touch it (due to lack of time) for a year so the project is on hold. Luckily, there are several other projects such as CLSquare and RL-Glue with a similar goal, and we strongly endorse their continued development. I would like to explain why, especially in the context of criticism of other learning benchmarks. For example, sometimes the UCI Machine Learning Repository is criticized. There are two criticisms I know of: Learning algorithms have overfit to the problems in the repository. It is easy to imagine a mechanism for this happening unintentionally. Strong evidence of this would be provided by learning algorithms which perform great on the UCI machine learning repository but very badly (relative to other learning algorithms) on non-UCI learning problems. I have seen little evidence of this but it remains a po

6 0.77953649 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

7 0.7718823 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

8 0.76213741 12 hunch net-2005-02-03-Learning Theory, by assumption

9 0.76040018 28 hunch net-2005-02-25-Problem: Online Learning

10 0.75804031 133 hunch net-2005-11-28-A question of quantification

11 0.75550628 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

12 0.75503069 237 hunch net-2007-04-02-Contextual Scaling

13 0.75306606 160 hunch net-2006-03-02-Why do people count for learning?

14 0.74978203 253 hunch net-2007-07-06-Idempotent-capable Predictors

15 0.74312496 332 hunch net-2008-12-23-Use of Learning Theory

16 0.7427417 6 hunch net-2005-01-27-Learning Complete Problems

17 0.74252641 351 hunch net-2009-05-02-Wielding a New Abstraction

18 0.74180609 7 hunch net-2005-01-31-Watchword: Assumption

19 0.73358804 95 hunch net-2005-07-14-What Learning Theory might do

20 0.73045313 68 hunch net-2005-05-10-Learning Reductions are Reductionist


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.023), (9, 0.022), (10, 0.031), (23, 0.161), (27, 0.269), (38, 0.046), (51, 0.014), (53, 0.124), (55, 0.088), (94, 0.1), (95, 0.054)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94997162 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

Introduction: Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. The basic idea of this algorithm is: Collect data from a human controlling a machine. Build a transition model based upon the experience. Build a policy which optimizes the transition model. Evaluate the policy. If it works well, halt, otherwise add the experience into the pool and go to (2). (A structural sketch of this loop appears at the end of this page.) The paper proves that this technique will converge to some policy with expected performance near human expected performance assuming the world fits certain assumptions (MDP or linear dynamics). This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. To see (b), note that we can create intricate rei

same-blog 2 0.93402141 347 hunch net-2009-03-26-Machine Learning is too easy

Introduction: One of the remarkable things about machine learning is how diverse it is. The viewpoints of Bayesian learning, reinforcement learning, graphical models, supervised learning, unsupervised learning, genetic programming, etc… share little enough overlap that many people can and do make their careers within one without touching, or even necessarily understanding the others. There are two fundamental reasons why this is possible. For many problems, many approaches work in the sense that they do something useful. This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples. There is nothing like a unifying problem defining the field. In many other areas there

3 0.93375605 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

Introduction: I learned a number of things at NIPS. The financial people were there in greater force than previously. Two Sigma sponsored NIPS while DRW Trading had a booth. The adversarial machine learning workshop had a number of talks about interesting applications where an adversary really is out to try and mess up your learning algorithm. This is very different from the situation we often think of where the world is oblivious to our learning. This may present new and convincing applications for the learning-against-an-adversary work common at COLT. There were several interesting papers. Sanjoy Dasgupta, Daniel Hsu, and Claire Monteleoni had a paper on General Agnostic Active Learning. The basic idea is that active learning can be done via reduction to a form of supervised learning problem. This is great, because we have many supervised learning algorithms from which the benefits of active learning may be derived. Joseph Bradley and Robert Schapire had a P

4 0.87875307 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

Introduction: Muthu invited me to the workshop on algorithms in the field, with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n^2 dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto

5 0.87846792 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

Introduction: Yann LeCun and I are coteaching a class on Large Scale Machine Learning starting late January at NYU. This class will cover many tricks to get machine learning working well on datasets with many features, examples, and classes, along with several elements of deep learning and support systems enabling the previous. This is not a beginning class—you really need to have taken a basic machine learning class previously to follow along. Students will be able to run and experiment with large scale learning algorithms since Yahoo! has donated servers which are being configured into a small scale Hadoop cluster. We are planning to cover the frontier of research in scalable learning algorithms, so good class projects could easily lead to papers. For me, this is a chance to teach on many topics of past research. In general, it seems like researchers should engage in at least occasional teaching of research, both as a proof of teachability and to see their own research through th

6 0.87784398 370 hunch net-2009-09-18-Necessary and Sufficient Research

7 0.87458378 95 hunch net-2005-07-14-What Learning Theory might do

8 0.87449825 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

9 0.87395704 235 hunch net-2007-03-03-All Models of Learning have Flaws

10 0.87109196 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

11 0.87013632 351 hunch net-2009-05-02-Wielding a New Abstraction

12 0.86964965 360 hunch net-2009-06-15-In Active Learning, the question changes

13 0.86879694 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

14 0.86846864 12 hunch net-2005-02-03-Learning Theory, by assumption

15 0.86766368 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

16 0.8665756 134 hunch net-2005-12-01-The Webscience Future

17 0.86656022 207 hunch net-2006-09-12-Incentive Compatible Reviewing

18 0.86627406 177 hunch net-2006-05-05-An ICML reject

19 0.86613011 358 hunch net-2009-06-01-Multitask Poisoning

20 0.8658939 201 hunch net-2006-08-07-The Call of the Deep
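Entry 1 of the list above (Apprenticeship Reinforcement Learning for Control) spells out a five-step loop: collect human data, fit a transition model, optimize a policy against it, evaluate in the real system, and repeat with the new experience. A structural sketch of that loop; the model-fitting, planning, and evaluation components are passed in as callables because the paper's actual implementations are not reproduced here.

def apprenticeship_loop(demonstrations, fit_model, plan, evaluate, good_enough, max_rounds=50):
    """Skeleton of the loop described in the excerpt; the caller supplies the components.

    demonstrations: initial experience collected from a human controlling the machine.
    fit_model(experience) -> transition model; plan(model) -> policy;
    evaluate(policy) -> (score, new_experience); good_enough: score threshold for halting.
    """
    experience = list(demonstrations)               # (1) start from the human-generated data
    policy = None
    for _ in range(max_rounds):
        model = fit_model(experience)               # (2) build a transition model from experience
        policy = plan(model)                        # (3) build a policy optimizing that model
        score, new_experience = evaluate(policy)    # (4) evaluate the policy in the real system
        if score >= good_enough:                    # (5) halt if it works well,
            break
        experience.extend(new_experience)           #     otherwise add the experience and go to (2)
    return policy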