hunch_net hunch_net-2005 hunch_net-2005-126 knowledge-graph by maker-knowledge-mining

126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms


meta info for this blog

Source: html

Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: if preconditions, then postconditions. When we design learning algorithms it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed, some of which have fewer preconditions. Simply demanding that these forms of analysis …
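To make the “if preconditions, then postconditions” shape concrete, here is one standard example of such a theorem (a textbook uniform-convergence bound, shown for illustration only; it is not taken from the post):

```latex
% Precondition: the sample S of m examples is drawn IID from D, and the
% hypothesis class H is finite.
% Postcondition: with probability at least 1 - \delta, every h in H satisfies
\mathrm{err}_D(h) \;\le\; \widehat{\mathrm{err}}_S(h)
  + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}} .
```

If the IID precondition fails, the theorem says nothing, which is exactly the kind of silent gap the post is worried about.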


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. [sent-1, score-0.83]

2 This ideal often fails, particularly for learning algorithms and theory. [sent-2, score-0.227]

3 All of these example preconditions can be false for real-world problems in ways that are not easily detectable. [sent-4, score-0.377]

4 This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. [sent-5, score-0.87]

5 Several different kinds of learning algorithm analysis have been developed, some of which have fewer preconditions. [sent-7, score-0.359]

6 Simply demanding that these forms of analysis be used may be too strong—there is an unresolved criticism that these algorithms may be “too worst case”. [sent-8, score-0.662]

7 Nevertheless, it is possible to have a learning algorithm that simultaneously provides strong postconditions given strong preconditions, reasonable postconditions given reasonable preconditions, and weak postconditions given weak preconditions. [sent-9, score-2.196]

8 Some examples of this I’ve encountered include: Sham, Matthias, and Dean showing that some Bayesian regression is robust in a minimax online learning analysis. [sent-10, score-0.18] (see the first sketch after this list)

9 The cover tree, which creates an O(n) data structure for nearest neighbor queries while simultaneously guaranteeing O(log(n)) query time when the metric obeys a dimensionality constraint. [sent-11, score-0.403] (see the second sketch after this list)

10 The basic claim is that algorithms with a good fallback analysis are significantly more likely to achieve the theoretical algorithm analysis ideal. [sent-12, score-1.06]

11 Both of the above algorithms have been tested in practice and found capable. [sent-13, score-0.114]

12 Several significant difficulties occur for anyone working on fallback analysis. [sent-14, score-0.353]

13 This is probably the most valid reason—people have limited time to do things. [sent-16, score-0.067]

14 Nevertheless, it is reasonable to hope that the core techniques used by many people have had this effort put into them. [sent-17, score-0.172]

15 For a researcher, it is psychologically difficult to both assume and not assume a precondition. [sent-18, score-0.372]

16 A critical valuable resource here is observing multiple forms of analysis. [sent-19, score-0.301]

17 It is psychologically difficult for a reviewer to appreciate the value of both assuming and not assuming some precondition. [sent-20, score-0.4]

18 In particular, theoretically inclined people 1) get great joy from showing that something new is possible and 2) routinely work on papers of the form “here is a better algorithm to do X given the same assumptions”. [sent-23, score-0.613]

19 A fallback analysis requires a change in assumption invalidating (2) and the new thing that it shows for (1) is subtle: that two existing guarantees can hold for the same algorithm. [sent-24, score-0.595]

20 My hope here is that this subtlety becomes better appreciated in time—making useful algorithms has a fundamental sexiness of its own. [sent-25, score-0.274]
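Two hedged sketches relating to sentences 8 and 9 above. First, a minimal online ridge / Bayesian linear regression forecaster of the general kind analyzed in minimax online learning; this is an illustrative sketch under my own naming, not the algorithm or analysis from the cited work:

```python
import numpy as np

def online_bayesian_regression(xs, ys, reg=1.0):
    """Sketch: predict y_t with the posterior-mean (ridge) weights fit to
    rounds 1..t-1, then fold in (x_t, y_t).  Illustrative only."""
    d = xs.shape[1]
    A = reg * np.eye(d)       # prior precision / regularizer
    b = np.zeros(d)           # running sum of y_t * x_t
    preds = []
    for x, y in zip(xs, ys):
        w = np.linalg.solve(A, b)     # current posterior mean
        preds.append(float(w @ x))    # commit to a prediction before seeing y
        A += np.outer(x, x)           # rank-one precision update
        b += y * x
    return np.array(preds)

# toy usage with synthetic data (numbers are made up)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
yhat = online_bayesian_regression(X, Y)
```

Second, the fallback guarantee underlying nearest neighbor search: an exact linear scan that costs O(n) per query for any metric. This is not the cover tree algorithm itself, only the baseline behavior it can fall back to; the cover tree additionally achieves O(log(n)) queries when the metric satisfies a bounded-expansion (low intrinsic dimension) condition:

```python
import math

def nearest_neighbor(query, points, metric=math.dist):
    """Fallback guarantee: exact nearest neighbor by linear scan,
    O(n) time per query and O(n) space, for any metric."""
    best, best_d = None, float("inf")
    for p in points:
        d = metric(query, p)
        if d < best_d:
            best, best_d = p, d
    return best, best_d

# toy usage
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
print(nearest_neighbor((0.9, 0.9), points))   # -> ((1.0, 1.0), ~0.14)
```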


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('postconditions', 0.377), ('preconditions', 0.377), ('fallback', 0.283), ('analysis', 0.228), ('precondition', 0.189), ('psychologically', 0.168), ('forms', 0.156), ('algorithm', 0.131), ('assuming', 0.116), ('algorithms', 0.114), ('ideal', 0.113), ('simultaneously', 0.111), ('assume', 0.102), ('showing', 0.102), ('strong', 0.099), ('given', 0.093), ('weak', 0.091), ('hope', 0.09), ('assumptions', 0.09), ('invalidating', 0.084), ('justified', 0.084), ('matthias', 0.084), ('obeys', 0.084), ('sexy', 0.084), ('unresolved', 0.084), ('reasonable', 0.082), ('joy', 0.078), ('accompanying', 0.078), ('catastrophic', 0.078), ('inclined', 0.078), ('minimax', 0.078), ('observing', 0.078), ('prone', 0.078), ('theoretical', 0.076), ('guaranteeing', 0.073), ('mis', 0.073), ('optimality', 0.073), ('nevertheless', 0.071), ('occur', 0.07), ('appreciated', 0.07), ('datastructure', 0.07), ('routinely', 0.07), ('dean', 0.067), ('resource', 0.067), ('routine', 0.067), ('valid', 0.067), ('queries', 0.065), ('derived', 0.065), ('criticism', 0.063), ('theoretically', 0.061)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms


2 0.13698098 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x . The agent makes a choice a . The world announces a reward r . The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of
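As a hedged sketch of the interaction protocol described in this excerpt, here is the loop in code; the epsilon-greedy agent and the toy two-action world are my own illustrative additions, not part of the post:

```python
import random

class TwoActionWorld:
    """Toy world: the observation carries no information; action 1 pays off
    with probability 0.7 and action 0 with probability 0.3."""
    def observe(self):
        return 0
    def reward(self, x, a):
        return 1.0 if random.random() < (0.7 if a == 1 else 0.3) else 0.0

class EpsilonGreedyAgent:
    """Toy agent: track empirical means of both actions, explore with prob epsilon."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0, 0]
        self.totals = [0.0, 0.0]
    def act(self, x):
        if random.random() < self.epsilon or 0 in self.counts:
            return random.randrange(2)
        return max((0, 1), key=lambda a: self.totals[a] / self.counts[a])
    def learn(self, x, a, r):
        self.counts[a] += 1
        self.totals[a] += r

def run(agent, world, horizon=1000):
    """The master-problem loop: the goal is to maximize the summed reward."""
    total = 0.0
    for _ in range(horizon):
        x = world.observe()      # the world announces an observation x
        a = agent.act(x)         # the agent makes a choice a
        r = world.reward(x, a)   # the world announces a reward r
        agent.learn(x, a, r)
        total += r
    return total

print(run(EpsilonGreedyAgent(), TwoActionWorld()))
```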

3 0.12582242 332 hunch net-2008-12-23-Use of Learning Theory

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example

4 0.1230391 343 hunch net-2009-02-18-Decision by Vetocracy

Introduction: Few would mistake the process of academic paper review for a fair process, but sometimes the unfairness seems particularly striking. This is most easily seen by comparison: Paper Banditron Offset Tree Notes Problem Scope Multiclass problems where only the loss of one choice can be probed. Strictly greater: Cost sensitive multiclass problems where only the loss of one choice can be probed. Often generalizations don’t matter. That’s not the case here, since every plausible application I’ve thought of involves loss functions substantially different from 0/1. What’s new Analysis and Experiments Algorithm, Analysis, and Experiments As far as I know, the essence of the more general problem was first stated and analyzed with the EXP4 algorithm (page 16) (1998). It’s also the time horizon 1 simplification of the Reinforcement Learning setting for the random trajectory method (page 15) (2002). The Banditron algorithm itself is functionally identi

5 0.10845304 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

6 0.10535058 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

7 0.10430508 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

8 0.10415236 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

9 0.10089675 177 hunch net-2006-05-05-An ICML reject

10 0.098684259 351 hunch net-2009-05-02-Wielding a New Abstraction

11 0.097278044 454 hunch net-2012-01-30-ICML Posters and Scope

12 0.096977614 57 hunch net-2005-04-16-Which Assumptions are Reasonable?

13 0.094600722 183 hunch net-2006-06-14-Explorations of Exploration

14 0.093750879 7 hunch net-2005-01-31-Watchword: Assumption

15 0.092621773 28 hunch net-2005-02-25-Problem: Online Learning

16 0.092184365 35 hunch net-2005-03-04-The Big O and Constants in Learning

17 0.089520417 466 hunch net-2012-06-05-ICML acceptance statistics

18 0.088915251 347 hunch net-2009-03-26-Machine Learning is too easy

19 0.085690573 179 hunch net-2006-05-16-The value of the orthodox view of Boosting

20 0.084839486 12 hunch net-2005-02-03-Learning Theory, by assumption


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.209), (1, 0.083), (2, 0.023), (3, 0.013), (4, 0.066), (5, -0.016), (6, 0.011), (7, 0.011), (8, 0.022), (9, 0.042), (10, 0.037), (11, 0.005), (12, 0.072), (13, -0.013), (14, 0.017), (15, 0.011), (16, -0.025), (17, -0.026), (18, -0.07), (19, -0.012), (20, 0.015), (21, -0.029), (22, -0.008), (23, -0.06), (24, -0.084), (25, -0.068), (26, 0.02), (27, -0.051), (28, 0.014), (29, -0.09), (30, -0.068), (31, -0.042), (32, -0.054), (33, 0.032), (34, -0.015), (35, -0.056), (36, -0.008), (37, -0.002), (38, 0.012), (39, -0.048), (40, -0.03), (41, 0.05), (42, -0.01), (43, 0.005), (44, 0.003), (45, -0.046), (46, 0.06), (47, 0.071), (48, 0.025), (49, -0.001)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9558875 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms


2 0.72080314 57 hunch net-2005-04-16-Which Assumptions are Reasonable?

Introduction: One of the most confusing things about understanding learning theory is the vast array of differing assumptions. Some critical thought about which of these assumptions are reasonable for real-world problems may be useful. Before we even start thinking about assumptions, it’s important to realize that the word has multiple meanings . The meaning used here is “assumption = axiom” (i.e. something you can not verify). Assumption Reasonable? Which analysis? Example/notes Independent and Identically Distributed Data Sometimes PAC,ERM,Prediction bounds,statistics The KDD cup 2004 physics dataset is plausibly IID data. There are a number of situations which are “almost IID” in the sense that IID analysis results in correct intuitions. Unreasonable in adversarial situations (stock market, war, etc…) Independently Distributed Data More than IID, but still only sometimes online->batch conversion Losing “identical” can be helpful in situations where you

3 0.68883419 202 hunch net-2006-08-10-Precision is not accuracy

Introduction: In my experience, there are two different groups of people who believe the same thing: the mathematics encountered in typical machine learning conference papers is often of questionable value. The two groups who agree on this are applied machine learning people who have given up on math, and mature theoreticians who understand the limits of theory. Partly, this is just a statement about where we are with respect to machine learning. In particular, we have no mechanism capable of generating a prescription for how to solve all learning problems. In the absence of such certainty, people try to come up with formalisms that partially describe and motivate how and why they do things. This is natural and healthy—we might hope that it will eventually lead to just such a mechanism. But, part of this is simply an emphasis on complexity over clarity. A very natural and simple theoretical statement is often obscured by complexifications. Common sources of complexification include:

4 0.68641394 133 hunch net-2005-11-28-A question of quantification

Introduction: This is about methods for phrasing and thinking about the scope of some theorems in learning theory. The basic claim is that there are several different ways of quantifying the scope which sound different yet are essentially the same. For all sequences of examples. This is the standard quantification in online learning analysis. Standard theorems would say something like “for all sequences of predictions by experts, the algorithm A will perform almost as well as the best expert.” For all training sets. This is the standard quantification for boosting analysis such as adaboost or multiclass boosting. Standard theorems have the form “for all training sets the error rate inequalities … hold”. For all distributions over examples. This is the one that we have been using for reductions analysis. Standard theorem statements have the form “For all distributions over examples, the error rate inequalities … hold”. It is not quite true that each of these is equivalent. F
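A hedged formalization of the three quantifications in this excerpt (the notation and the generic bound shapes f, g, R_T are mine, intended only to show where the quantifier sits):

```latex
% 1. For all sequences of examples (online analysis):
\forall\, (x_1,y_1),\dots,(x_T,y_T):\quad
  \sum_{t=1}^{T} \ell(\hat y_t, y_t) \;\le\; \min_{e \in E} \sum_{t=1}^{T} \ell(e(x_t), y_t) + R_T
% 2. For all training sets (boosting-style analysis):
\forall\, S \in (X \times Y)^m:\quad \widehat{\mathrm{err}}_S\bigl(A(S)\bigr) \;\le\; f(m, \gamma)
% 3. For all distributions over examples (reductions analysis):
\forall\, D \text{ over } X \times Y:\quad \mathrm{err}_D(A) \;\le\; g\bigl(\mathrm{err}_D(\text{base learner})\bigr)
```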

5 0.68491703 7 hunch net-2005-01-31-Watchword: Assumption

Introduction: “Assumption” is another word to be careful with in machine learning because it is used in several ways. Assumption = Bias There are several ways to see that some form of ‘bias’ (= preferring of one solution over another) is necessary. This is obvious in an adversarial setting. A good bit of work has been expended explaining this in other settings with “ no free lunch ” theorems. This is a usage specialized to learning which is particularly common when talking about priors for Bayesian Learning. Assumption = “if” of a theorem The assumptions are the ‘if’ part of the ‘if-then’ in a theorem. This is a fairly common usage. Assumption = Axiom The assumptions are the things that we assume are true, but which we cannot verify. Examples are “the IID assumption” or “my problem is a DNF on a small number of bits”. This is the usage which I prefer. One difficulty with any use of the word “assumption” is that you often encounter “if assumption then conclusion so if no

6 0.6723696 104 hunch net-2005-08-22-Do you believe in induction?

7 0.66108632 332 hunch net-2008-12-23-Use of Learning Theory

8 0.641303 348 hunch net-2009-04-02-Asymmophobia

9 0.64122045 411 hunch net-2010-09-21-Regretting the dead

10 0.63248158 163 hunch net-2006-03-12-Online learning or online preservation of learning?

11 0.63149512 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

12 0.62782079 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

13 0.62684572 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

14 0.62595201 148 hunch net-2006-01-13-Benchmarks for RL

15 0.62396479 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms

16 0.61977452 35 hunch net-2005-03-04-The Big O and Constants in Learning

17 0.61789328 235 hunch net-2007-03-03-All Models of Learning have Flaws

18 0.61419338 28 hunch net-2005-02-25-Problem: Online Learning

19 0.61246789 177 hunch net-2006-05-05-An ICML reject

20 0.60767967 179 hunch net-2006-05-16-The value of the orthodox view of Boosting


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.012), (10, 0.021), (21, 0.284), (27, 0.233), (38, 0.08), (51, 0.016), (53, 0.047), (55, 0.08), (94, 0.059), (95, 0.063)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95561767 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

Introduction: David McAllester gave a talk about this paper (with Pedro Felzenszwalb). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden Markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t?”, and the larger problem is the same except at timestep t+1. There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming. Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm. queued Dynamic Programming -> A*. Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
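A hedged sketch of the familiar instantiation mentioned above: plain textbook Viterbi decoding, not the generalization in the Felzenszwalb and McAllester paper (the toy probabilities below are made up):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Textbook Viterbi: V[t][s] is the probability of the most probable
    state path ending in state s at timestep t, having emitted obs[0..t]."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            best_p, best_prev = max(
                (V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]], prev)
                for prev in states
            )
            V[t][s] = best_p
            back[t][s] = best_prev
    last = max(V[-1], key=V[-1].get)         # best final state
    path = [last]
    for t in range(len(obs) - 1, 0, -1):     # follow backpointers
        path.append(back[t][path[-1]])
    return list(reversed(path)), V[-1][last]

# toy usage: two hidden weather states, three observed activities
states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3}, "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start, trans, emit))
```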

2 0.9030894 369 hunch net-2009-08-27-New York Area Machine Learning Events

Introduction: Several events are happening in the NY area. Barriers in Computational Learning Theory Workshop, Aug 28. That’s tomorrow near Princeton. I’m looking forward to speaking at this one on “Getting around Barriers in Learning Theory”, but several other talks are of interest, particularly to the CS theory inclined. Claudia Perlich is running the INFORMS Data Mining Contest with a deadline of Sept. 25. This is a contest using real health record data (they partnered with HealthCare Intelligence ) to predict transfers and mortality. In the current US health care reform debate, the case studies of high costs we hear strongly suggest machine learning & statistics can save many billions. The Singularity Summit October 3&4 . This is for the AIists out there. Several of the talks look interesting, although unfortunately I’ll miss it for ALT . Predictive Analytics World, Oct 20-21 . This is stretching the definition of “New York Area” a bit, but the train to DC is reasonable.

same-blog 3 0.89997125 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms


4 0.84001732 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

Introduction: All branches of machine learning seem to be united in the idea of using data to make predictions. However, people disagree to some extent about what this means. One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. Here is my estimate of where various elements of machine learning fall on this spectrum. Human Necessary Human partially necessary Human unnecessary Clustering, data visualization Bayesian Learning, Probabilistic Models, Graphical Models Kernel Learning (SVM’s, etc..) Decision Trees? Reinforcement Learning The exact position of each element is of course debatable. My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. Bayesian/probabilistic models/graphical models generally require a human to sit and think about what

5 0.82707375 267 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
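A hedged sketch of the online-versus-batch contrast being described, using generic gradient descent for squared loss (this is not the picture or the exact example from the class referenced above):

```python
import numpy as np

def online_pass(xs, ys, lr=0.1):
    """Online: update the weights after every single example."""
    w = np.zeros(xs.shape[1])
    for x, y in zip(xs, ys):
        w -= lr * (w @ x - y) * x     # gradient of 0.5*(w.x - y)^2 at one example
    return w

def batch_step(xs, ys, w, lr=0.1):
    """Batch: one update only after a full pass over all examples."""
    return w - lr * xs.T @ (xs @ w - ys) / len(ys)
```

The point of the excerpt is that when only a couple of gradient steps are needed, taking them online, before all examples have been seen, reaches the minimum sooner than waiting for a full batch pass.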

6 0.78860474 378 hunch net-2009-11-15-The Other Online Learning

7 0.71315891 194 hunch net-2006-07-11-New Models

8 0.69300222 309 hunch net-2008-07-10-Interesting papers, ICML 2008

9 0.69191331 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

10 0.68457967 360 hunch net-2009-06-15-In Active Learning, the question changes

11 0.68253726 235 hunch net-2007-03-03-All Models of Learning have Flaws

12 0.67985165 343 hunch net-2009-02-18-Decision by Vetocracy

13 0.67824751 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

14 0.67780441 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

15 0.67717171 406 hunch net-2010-08-22-KDD 2010

16 0.67710811 12 hunch net-2005-02-03-Learning Theory, by assumption

17 0.6761747 379 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)

18 0.6739251 95 hunch net-2005-07-14-What Learning Theory might do

19 0.6725297 36 hunch net-2005-03-05-Funding Research

20 0.67108947 432 hunch net-2011-04-20-The End of the Beginning of Active Learning