hunch_net hunch_net-2008 hunch_net-2008-332 knowledge-graph by maker-knowledge-mining

332 hunch net-2008-12-23-Use of Learning Theory


Meta information for this blog

Source: html

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expected) worst-case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. [sent-1, score-0.594]

2 That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. [sent-2, score-0.697]

3 Naturally, I disagree and believe that learning theory has much more substantial applications. [sent-3, score-0.602]

4 Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. [sent-4, score-1.074]

5 In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. [sent-6, score-1.08]

6 An algorithm with learning theory considered in its design can be more automatic. [sent-8, score-0.684]

7 The “when tuned well” caveat is however substantial, because learning algorithms may be applied by nonexperts or by other algorithms which are computationally constrained. [sent-10, score-0.505]

8 In my experience, learning theory is most useful in its crudest forms. [sent-13, score-0.594]

9 I mean this in the broadest sense imaginable: Is it a learning problem or not? [sent-15, score-0.404]

10 Learning theory such as statistical bounds and online learning with experts helps substantially here because it provides guidelines about what is and is not possible to learn. [sent-17, score-0.705]
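The “online learning with experts” setting mentioned here has a canonical algorithm that makes the guarantee concrete. Below is a minimal sketch of the exponentially weighted forecaster (the Hedge-style update); the function name and the learning rate eta=0.5 are illustrative choices, not anything from the post:

```python
import math

def hedge_weights(losses, eta=0.5):
    """Exponentially weighted average forecaster over a set of experts.

    losses: list of per-round loss vectors, one entry per expert, each in [0, 1].
    Returns the weight vector used at the start of each round.
    """
    n = len(losses[0])
    cum = [0.0] * n  # cumulative loss of each expert so far
    history = []
    for round_losses in losses:
        expw = [math.exp(-eta * c) for c in cum]  # downweight lossy experts
        z = sum(expw)
        history.append([e / z for e in expw])     # normalize to a distribution
        cum = [c + loss for c, loss in zip(cum, round_losses)]
    return history
```

With losses in [0, 1] and a tuned eta, the standard analysis bounds regret against the best expert by roughly sqrt((T/2) ln n) over T rounds — exactly the kind of “what is possible to learn” guideline the sentence refers to.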

11 Answering these questions is partly definition checking, and since the answer is often “all of the above”, figuring out which aspect of the problem to address first or next is helpful. [sent-27, score-0.365]

12 Here the relative capacity of a learning algorithm and its computational efficiency are most important. [sent-29, score-0.448]

13 If you have few features and many examples, a nonlinear algorithm with more representational capacity is a good idea. [sent-30, score-0.501]

14 If you have many features and little data, linear representations or even exponentiated gradient style algorithms are important. [sent-31, score-0.499]
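The “exponentiated gradient style algorithms” mentioned here update weights multiplicatively rather than additively, which is why their regret scales only logarithmically with the number of features — the property that matters with many features and little data. A hedged sketch of a single EG step for squared loss, keeping the weights on the probability simplex (the function name and learning rate are illustrative):

```python
import math

def eg_update(w, x, y, eta=0.1):
    """One exponentiated-gradient step for squared loss (Kivinen-Warmuth style).

    w: current weight vector on the probability simplex.
    x: feature vector; y: target value.
    """
    yhat = sum(wi * xi for wi, xi in zip(w, x))   # linear prediction
    grad = 2.0 * (yhat - y)                       # derivative of (yhat - y)^2
    unnorm = [wi * math.exp(-eta * grad * xi) for wi, xi in zip(w, x)]
    z = sum(unnorm)
    return [u / z for u in unnorm]                # renormalize to the simplex
```

Note the contrast with an additive gradient step: multiplying by exp(-eta * grad * x_i) lets a few relevant features dominate quickly even in very high dimension.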

15 Learning theory can help in all of the above by quantifying “many”, “little”, “most”, and “few”. [sent-34, score-0.46]
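One concrete way theory quantifies “many” and “few”: the Occam’s razor bound for a finite hypothesis class says that m >= (1/epsilon)(ln|H| + ln(1/delta)) examples suffice in the realizable setting. A small helper (my naming, a sketch of the standard bound) makes the arithmetic tangible:

```python
import math

def occam_sample_size(hypothesis_count, epsilon, delta):
    """Sample size sufficient under the Occam's razor bound for a finite class:
    m >= (1/epsilon) * (ln|H| + ln(1/delta)), realizable setting."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)
```

For |H| = 1000, epsilon = 0.1, delta = 0.05 this gives 100 examples; because the dependence on |H| is logarithmic, even a million hypotheses only raise the requirement to a few hundred.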

16 One thing I realized recently is that the overfitting problem can be a concern even with very large natural datasets, because some examples are naturally more important than others. [sent-36, score-0.459]

17 As might be clear, I think of learning theory as somewhat broader than might be traditional. [sent-37, score-0.512]

18 Many people have only been exposed to one piece of learning theory, often VC theory or its cousins. [sent-39, score-0.512]

19 VC theory is a good theory, but it is not complete, and other elements of learning theory seem at least as important and useful. [sent-41, score-0.971]
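For concreteness, one common form of the VC generalization bound (constants differ across textbooks, so treat this as a sketch): with probability at least 1 - \delta over m i.i.d. examples, a hypothesis class \mathcal{H} of VC dimension d satisfies

```latex
\operatorname{err}_D(h) \;\le\; \widehat{\operatorname{err}}_S(h)
  + \sqrt{\frac{8}{m}\left(d \ln\frac{2em}{d} + \ln\frac{4}{\delta}\right)}
  \qquad \text{for all } h \in \mathcal{H}.
```

The form of the bound also illustrates its incompleteness: it is distribution-free and uniform over the class, which is precisely why it is often loose in practice and why other elements of learning theory matter.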

20 Simply sampling from the learning theory in existing papers does not necessarily give a good distribution of subjects for teaching, because the goal of impressing reviewers does not necessarily coincide with the clean simple analysis that is teachable. [sent-43, score-0.965]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('theory', 0.385), ('problem', 0.185), ('algorithm', 0.172), ('capacity', 0.149), ('vc', 0.142), ('answering', 0.142), ('tuned', 0.137), ('learning', 0.127), ('overfitting', 0.115), ('bounds', 0.114), ('subtle', 0.113), ('automatic', 0.113), ('data', 0.109), ('features', 0.108), ('aspect', 0.106), ('came', 0.104), ('ve', 0.103), ('analysis', 0.099), ('necessarily', 0.099), ('mean', 0.092), ('substantial', 0.09), ('little', 0.089), ('impressing', 0.085), ('isomap', 0.085), ('gained', 0.085), ('justification', 0.085), ('perceived', 0.085), ('naturally', 0.085), ('computationally', 0.085), ('useful', 0.082), ('practice', 0.08), ('median', 0.079), ('exponentiated', 0.079), ('guidelines', 0.079), ('usefulness', 0.079), ('application', 0.078), ('algorithms', 0.078), ('tuning', 0.075), ('tightness', 0.075), ('quantifying', 0.075), ('finer', 0.075), ('purely', 0.075), ('memorization', 0.075), ('architecting', 0.075), ('questions', 0.074), ('important', 0.074), ('linear', 0.073), ('many', 0.072), ('clean', 0.071), ('loose', 0.071)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 332 hunch net-2008-12-23-Use of Learning Theory


2 0.2366171 454 hunch net-2012-01-30-ICML Posters and Scope

Introduction: Normally, I don’t indulge in posters for ICML , but this year is naturally an exception for me. If you want one, there are a small number left here , if you sign up before February. It also seems worthwhile to give some sense of the scope and reviewing criteria for ICML for authors considering submitting papers. At ICML, the (very large) program committee does the reviewing which informs final decisions by area chairs on most papers. Program chairs set up the process, deal with exceptions or disagreements, and provide advice for the reviewing process. Providing advice is tricky (and easily misleading) because a conference is a community, and in the end the aggregate interests of the community determine the conference. Nevertheless, as a program chair this year it seems worthwhile to state the overall philosophy I have and what I plan to encourage (and occasionally discourage). At the highest level, I believe ICML exists to further research into machine learning, which I gene

3 0.20385425 95 hunch net-2005-07-14-What Learning Theory might do

Introduction: I wanted to expand on this post and some of the previous problems/research directions about where learning theory might make large strides. Why theory? The essential reason for theory is “intuition extension”. A very good applied learning person can master some particular application domain yielding the best computer algorithms for solving that problem. A very good theory can take the intuitions discovered by this and other applied learning people and extend them to new domains in a relatively automatic fashion. To do this, we take these basic intuitions and try to find a mathematical model that: Explains the basic intuitions. Makes new testable predictions about how to learn. Succeeds in so learning. This is “intuition extension”: taking what we have learned somewhere else and applying it in new domains. It is fundamentally useful to everyone because it increases the level of automation in solving problems. Where next for learning theory? I like the a

4 0.18301791 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

5 0.17167114 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

Introduction: Muthu invited me to the workshop on algorithms in the field , with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n^2 dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto

6 0.17160492 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

7 0.16602181 351 hunch net-2009-05-02-Wielding a New Abstraction

8 0.16548215 28 hunch net-2005-02-25-Problem: Online Learning

9 0.16265804 35 hunch net-2005-03-04-The Big O and Constants in Learning

10 0.15745771 347 hunch net-2009-03-26-Machine Learning is too easy

11 0.15707739 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

12 0.15313897 343 hunch net-2009-02-18-Decision by Vetocracy

13 0.15199937 19 hunch net-2005-02-14-Clever Methods of Overfitting

14 0.15081125 41 hunch net-2005-03-15-The State of Tight Bounds

15 0.15072063 345 hunch net-2009-03-08-Prediction Science

16 0.14866036 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

17 0.14491341 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

18 0.14160083 42 hunch net-2005-03-17-Going all the Way, Sometimes

19 0.13931915 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

20 0.13836722 432 hunch net-2011-04-20-The End of the Beginning of Active Learning


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.373), (1, 0.111), (2, 0.001), (3, 0.026), (4, 0.117), (5, -0.053), (6, 0.004), (7, 0.009), (8, 0.02), (9, 0.02), (10, 0.007), (11, 0.0), (12, 0.01), (13, 0.098), (14, 0.053), (15, 0.028), (16, 0.115), (17, -0.026), (18, -0.109), (19, -0.015), (20, 0.061), (21, 0.029), (22, -0.01), (23, -0.036), (24, -0.034), (25, -0.09), (26, 0.052), (27, -0.019), (28, 0.048), (29, -0.061), (30, -0.054), (31, -0.026), (32, -0.059), (33, 0.032), (34, -0.098), (35, -0.069), (36, 0.077), (37, -0.134), (38, 0.02), (39, -0.059), (40, -0.026), (41, 0.127), (42, 0.021), (43, -0.014), (44, 0.065), (45, 0.059), (46, -0.047), (47, -0.035), (48, -0.077), (49, 0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9712944 332 hunch net-2008-12-23-Use of Learning Theory


2 0.78975838 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: If preconditions Then postconditions When we design learning algorithms it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed some of which have fewer preconditions. Simply demanding that these forms of analysi

3 0.7887795 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina , Bianca , and I are planning to do at ICML . Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

4 0.77572298 148 hunch net-2006-01-13-Benchmarks for RL

Introduction: A couple years ago, Drew Bagnell and I started the RLBench project to setup a suite of reinforcement learning benchmark problems. We haven’t been able to touch it (due to lack of time) for a year so the project is on hold. Luckily, there are several other projects such as CLSquare and RL-Glue with a similar goal, and we strongly endorse their continued development. I would like to explain why, especially in the context of criticism of other learning benchmarks. For example, sometimes the UCI Machine Learning Repository is criticized. There are two criticisms I know of: Learning algorithms have overfit to the problems in the repository. It is easy to imagine a mechanism for this happening unintentionally. Strong evidence of this would be provided by learning algorithms which perform great on the UCI machine learning repository but very badly (relative to other learning algorithms) on non-UCI learning problems. I have seen little evidence of this but it remains a po

5 0.75462455 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

Introduction: Many people in Machine Learning don’t fully understand the impact of computation, as demonstrated by a lack of big-O analysis of new learning algorithms. This is important—some current active research programs are fundamentally flawed w.r.t. computation, and other research programs are directly motivated by it. When considering a learning algorithm, I think about the following questions: How does the learning algorithm scale with the number of examples m? Any algorithm using all of the data is at least O(m), but in many cases this is O(m^2) (naive nearest neighbor for self-prediction) or unknown (k-means or many other optimization algorithms). The unknown case is very common, and it can mean (for example) that the algorithm isn’t convergent or simply that the amount of computation isn’t controlled. The above question can also be asked for test cases. In some applications, test-time performance is of great importance. How does the algorithm scale with the number of

6 0.74322027 35 hunch net-2005-03-04-The Big O and Constants in Learning

7 0.74191517 397 hunch net-2010-05-02-What’s the difference between gambling and rewarding good prediction?

8 0.74116361 28 hunch net-2005-02-25-Problem: Online Learning

9 0.73220873 235 hunch net-2007-03-03-All Models of Learning have Flaws

10 0.73215759 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

11 0.73151147 454 hunch net-2012-01-30-ICML Posters and Scope

12 0.72498941 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

13 0.72189569 42 hunch net-2005-03-17-Going all the Way, Sometimes

14 0.71322417 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

15 0.71138918 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

16 0.70349807 347 hunch net-2009-03-26-Machine Learning is too easy

17 0.69912791 95 hunch net-2005-07-14-What Learning Theory might do

18 0.67502236 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

19 0.64915371 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features

20 0.64662355 6 hunch net-2005-01-27-Learning Complete Problems


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.017), (10, 0.284), (27, 0.266), (38, 0.056), (51, 0.014), (53, 0.09), (55, 0.069), (94, 0.109), (95, 0.03)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97260714 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?

Introduction: Steve Smale and I have a debate about goals of learning theory. Steve likes theorems with a dependence on unobservable quantities. For example, if D is a distribution over a space X x [0,1] , you can state a theorem about the error rate dependent on the variance, E_{(x,y)~D}(y - E_{y'~D|x}[y'])^2. I dislike this, because I want to use the theorems to produce code solving learning problems. Since I don’t know (and can’t measure) the variance, a theorem depending on the variance does not help me—I would not know what variance to plug into the learning algorithm. Recast more broadly, this is a debate between “declarative” and “operative” mathematics. A strong example of “declarative” mathematics is “a new kind of science” . Roughly speaking, the goal of this kind of approach seems to be finding a way to explain the observations we make. Examples include “some things are unpredictable”, “a phase transition exists”, etc… “Operative” mathematics helps you make predictions a

2 0.96331352 38 hunch net-2005-03-09-Bad Reviewing

Introduction: This is a difficult subject to talk about for many reasons, but a discussion may be helpful. Bad reviewing is a problem in academia. The first step in understanding this is admitting to the problem, so here is a short list of examples of bad reviewing. Reviewer disbelieves theorem proof (ICML), or disbelieves a theorem with a trivially false counterexample. (COLT) Reviewer internally swaps quantifiers in a theorem, concludes it has been done before and is trivial. (NIPS) Reviewer believes a technique will not work despite experimental validation. (COLT) Reviewers fail to notice flaw in theorem statement (CRYPTO). Reviewer erroneously claims that it has been done before (NIPS, SODA, JMLR)—(complete with references!) Reviewer inverts the message of a paper and concludes it says nothing important. (NIPS*2) Reviewer fails to distinguish between a DAG and a tree (SODA). Reviewer is enthusiastic about paper but clearly does not understand (ICML). Reviewer erroneously

3 0.9453091 199 hunch net-2006-07-26-Two more UAI papers of interest

Introduction: In addition to Ed Snelson’s paper, there were (at least) two other papers that caught my eye at UAI. One was this paper by Sanjoy Dasgupta, Daniel Hsu and Nakul Verma at UCSD which shows in a surprisingly general and strong way that almost all linear projections of any jointly distributed vector random variable with finite first and second moments look spherical and unimodal (in fact look like a scale mixture of Gaussians). Great result, as you’d expect from Sanjoy. The other paper which I found intriguing but which I just haven’t grokked yet is this beast by Manfred and Dima Kuzmin. You can check out the (beautiful) slides if that helps. I feel like there is something deep here, but my brain is too small to understand it. The COLT and last NIPS papers/slides are also on Manfred’s page. Hopefully someone here can illuminate.

same-blog 4 0.90296596 332 hunch net-2008-12-23-Use of Learning Theory


5 0.86003971 474 hunch net-2012-10-18-7th Annual Machine Learning Symposium

Introduction: A reminder that the New York Academy of Sciences will be hosting the  7th Annual Machine Learning Symposium tomorrow from 9:30am. The main program will feature invited talks from Peter Bartlett ,  William Freeman , and Vladimir Vapnik , along with numerous spotlight talks and a poster session. Following the main program, hackNY and Microsoft Research are sponsoring a networking hour with talks from machine learning practitioners at NYC startups (specifically bit.ly , Buzzfeed , Chartbeat , and Sense Networks , Visual Revenue ). This should be of great interest to everyone considering working in machine learning.

6 0.83898038 454 hunch net-2012-01-30-ICML Posters and Scope

7 0.79228383 207 hunch net-2006-09-12-Incentive Compatible Reviewing

8 0.78796542 240 hunch net-2007-04-21-Videolectures.net

9 0.7858656 343 hunch net-2009-02-18-Decision by Vetocracy

10 0.78107202 95 hunch net-2005-07-14-What Learning Theory might do

11 0.77331597 177 hunch net-2006-05-05-An ICML reject

12 0.76781243 253 hunch net-2007-07-06-Idempotent-capable Predictors

13 0.76647091 78 hunch net-2005-06-06-Exact Online Learning for Classification

14 0.7659654 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

15 0.76483375 5 hunch net-2005-01-26-Watchword: Probability

16 0.76156527 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

17 0.76116955 235 hunch net-2007-03-03-All Models of Learning have Flaws

18 0.76112294 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

19 0.76045811 347 hunch net-2009-03-26-Machine Learning is too easy

20 0.75908768 98 hunch net-2005-07-27-Not goal metrics