hunch_net hunch_net-2006 hunch_net-2006-220 knowledge-graph by maker-knowledge-mining

220 hunch net-2006-11-27-Continuizing Solutions


meta info for this blog

Source: html

Introduction: This post is about a general technique for problem solving which I’ve never seen taught (in full generality), but which I’ve found very useful. Many problems in computer science turn out to be discretely difficult. The best known versions of such problems are NP-hard problems, but I mean ‘discretely difficult’ in a much more general way, which I only know how to capture by examples. ERM In empirical risk minimization, you choose a minimum error rate classifier from a set of classifiers. This is NP-hard for common sets, but it can be much harder, depending on the set. Experts In the online learning with experts setting, you try to predict well so as to compete with a set of (adversarial) experts. Here the alternating quantifiers of you and an adversary playing out a game can yield a dynamic programming problem that grows exponentially. Policy Iteration The problem with policy iteration is that you learn a new policy with respect to an old policy, which implies that simply adopting the new policy can go very wrong.
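
To make the discrete difficulty of ERM concrete, here is a minimal sketch (an illustration over a small, explicitly enumerated classifier set, not anything from the post itself): the minimization is a brute-force search over classifiers, and for realistic structured sets such as linear threshold functions the corresponding search is NP-hard.

```python
# Minimal ERM sketch over a finite, explicitly enumerated classifier set.
# The data and classifiers below are hypothetical toy placeholders.

def empirical_error(classifier, examples):
    """Fraction of (x, y) examples the classifier labels incorrectly."""
    return sum(1 for x, y in examples if classifier(x) != y) / len(examples)

def erm(classifiers, examples):
    """Choose the minimum empirical error classifier by brute-force search."""
    return min(classifiers, key=lambda c: empirical_error(c, examples))

# Toy usage: threshold classifiers on one-dimensional inputs.
examples = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
classifiers = [lambda x, t=t: int(x > t) for t in (0.25, 0.5, 0.75)]
best = erm(classifiers, examples)
print(empirical_error(best, examples))  # 0.0 for the threshold at 0.5
```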


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This post is about a general technique for problem solving which I’ve never seen taught (in full generality), but which I’ve found very useful. [sent-1, score-0.301]

2 Many problems in computer science turn out to be discretely difficult. [sent-2, score-0.279]

3 The best known versions of such problems are NP-hard problems, but I mean ‘discretely difficult’ in a much more general way, which I only know how to capture by examples. [sent-3, score-0.174]

4 ERM In empirical risk minimization, you choose a minimum error rate classifier from a set of classifiers. [sent-4, score-0.188]

5 This is NP-hard for common sets, but it can be much harder, depending on the set. [sent-5, score-0.059]

6 Experts In the online learning with experts setting, you try to predict well so as to compete with a set of (adversarial) experts. [sent-6, score-0.384]

7 Here the alternating quantifiers of you and an adversary playing out a game can yield a dynamic programming problem that grows exponentially. [sent-7, score-0.654]

8 Policy Iteration The problem with policy iteration is that you learn a new policy with respect to an old policy, which implies that simply adopting the new policy can go very wrong. [sent-8, score-2.343]

9 For each of these problems, there are “continuized” solutions which can yield smaller computation, more elegant mathematics, or both. [sent-9, score-0.319]

10 ERM By shifting from choosing a single classifier to choosing a stochastic classifier we can prove a new style of bound which is significantly tighter, easier to state, and easier to understand than traditional bounds in the traditional setting. [sent-10, score-1.155]

11 Experts By giving the adversary slightly more power—the ability to split experts and have them fractionally predict one way vs. [sent-12, score-0.518]

12 another, the optimal policy becomes much easier to compute (quadratic in the horizon, or maybe less). [sent-13, score-0.641]

13 Policy Iteration For policy iteration, by stochastically mixing the old and the new policy, we can find a new policy better than the old policy. [sent-15, score-1.54]

14 The first and second examples both involve a setting shift, which may not be valid—in general your setting should reflect your real problem rather than the thing which is easy to solve. [sent-18, score-0.496]

15 However, even with the setting shift, the solutions appear so compellingly more elegant that it is hard to not hope to use them in a solution to the original setting. [sent-19, score-0.502]

16 I have not seen a good formulation of the general approach of continuizing. [sent-20, score-0.256]
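
Sentence 10 above leaves the stochastic classifier abstract. One plausible instantiation, given here purely as a hedged sketch in the spirit of Gibbs / PAC-Bayes style classifiers rather than as the post's construction, is to randomize over classifiers with weights that decay exponentially in empirical error:

```python
import math
import random

def gibbs_classifier(classifiers, examples, temperature=1.0):
    """Stochastic ('continuized') classifier: sample a hypothesis with
    probability proportional to exp(-training errors / temperature),
    then predict with it.  A sketch, not the post's exact construction."""
    def errors(c):
        return sum(1 for x, y in examples if c(x) != y)
    weights = [math.exp(-errors(c) / temperature) for c in classifiers]

    def predict(x):
        c = random.choices(classifiers, weights=weights, k=1)[0]
        return c(x)
    return predict
```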
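
Sentences 11 and 12 describe giving the adversary the ability to split experts fractionally. The quadratic-in-horizon dynamic program is not spelled out in the excerpt; the sketch below instead shows the standard exponential-weights (Hedge) update, a related algorithm in which the learner's own prediction is a fractional mixture of the experts, offered only as hedged background:

```python
import math

def hedge_round(weights, expert_predictions, outcome, eta=0.5):
    """One round of exponential weights over experts with predictions in [0, 1].
    The learner predicts a fractional mixture of the experts; each expert's
    weight then shrinks multiplicatively in its loss.  Standard background,
    not necessarily the dynamic program the post has in mind."""
    total = sum(weights)
    mixture = sum(w * p for w, p in zip(weights, expert_predictions)) / total
    new_weights = [w * math.exp(-eta * abs(p - outcome))
                   for w, p in zip(weights, expert_predictions)]
    return mixture, new_weights
```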
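
Sentence 13's stochastic mixing of the old and new policies can be written down directly; the sketch below (with a hypothetical mixing rate alpha, in the spirit of conservative policy iteration) follows the new policy with a small probability and the old one otherwise:

```python
import random

def mixed_policy(old_policy, new_policy, alpha):
    """Stochastic mixture of two policies: act by new_policy with probability
    alpha and by old_policy otherwise.  The intuition is that for small alpha
    the induced state distribution barely changes, so an advantage of
    new_policy over old_policy yields an improvement rather than the failure
    mode of wholesale adoption."""
    def act(state):
        policy = new_policy if random.random() < alpha else old_policy
        return policy(state)
    return act
```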


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('policy', 0.511), ('iteration', 0.357), ('experts', 0.256), ('discretely', 0.201), ('erm', 0.143), ('old', 0.141), ('setting', 0.132), ('easier', 0.13), ('elegant', 0.13), ('classifier', 0.123), ('adversary', 0.118), ('traditional', 0.115), ('shift', 0.113), ('choosing', 0.098), ('yield', 0.097), ('general', 0.096), ('solutions', 0.092), ('quantifiers', 0.089), ('continuous', 0.089), ('adopting', 0.089), ('compellingly', 0.089), ('alternating', 0.083), ('formulation', 0.083), ('danger', 0.083), ('bound', 0.081), ('new', 0.079), ('problems', 0.078), ('mixing', 0.078), ('split', 0.078), ('quadratic', 0.078), ('seen', 0.077), ('np', 0.074), ('grows', 0.074), ('horizon', 0.071), ('reflect', 0.071), ('valid', 0.071), ('generality', 0.067), ('conservative', 0.067), ('minimization', 0.067), ('predict', 0.066), ('tighter', 0.065), ('risk', 0.065), ('playing', 0.065), ('problem', 0.065), ('dynamic', 0.063), ('shifting', 0.063), ('taught', 0.063), ('compete', 0.062), ('mathematics', 0.059), ('hard', 0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 220 hunch net-2006-11-27-Continuizing Solutions

2 0.20302184 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x . The agent makes a choice a . The world announces a reward r . The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

3 0.15269674 269 hunch net-2007-10-24-Contextual Bandits

Introduction: One of the fundamental underpinnings of the internet is advertising based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses search engines and all search engines try to make money this way. The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised—in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “ k -Armed Bandits”, often with various relevant embellishments. The k -Armed Bandit setting works on a round-by-round basis. On each round: A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). The world reveals t

4 0.1405932 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

Introduction: Accountability is a social problem. When someone screws up, do you fire them? Or do you accept the error and let them continue? This is a very difficult problem and we all know of stories where the wrong decision was made. Online learning (as meant here), is a subfield of learning theory which analyzes the online learning model. In the online learning model, there are a set of hypotheses or “experts”. On any instantance x , each expert makes a prediction y . A master algorithm A uses these predictions to form it’s own prediction y A and then learns the correct prediction y * . This process repeats. The goal of online learning is to find a master algorithm A which uses the advice of the experts to make good predictions. In particular, we typically want to guarantee that the master algorithm performs almost as well as the best expert. If L(e) is the loss of expert e and L(A) is the loss of the master algorithm, it is often possible to prove: L(A) les

5 0.1360846 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness . Reducing a problem to classification make at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization . The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization . By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now . Typesafe r

6 0.13296539 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

7 0.13228545 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms

8 0.13062106 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

9 0.11454307 183 hunch net-2006-06-14-Explorations of Exploration

10 0.11168518 282 hunch net-2008-01-06-Research Political Issues

11 0.1019682 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

12 0.10108186 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

13 0.094187997 57 hunch net-2005-04-16-Which Assumptions are Reasonable?

14 0.093599558 215 hunch net-2006-10-22-Exemplar programming

15 0.093538247 50 hunch net-2005-04-01-Basic computer science research takes a hit

16 0.09287221 235 hunch net-2007-03-03-All Models of Learning have Flaws

17 0.092073351 107 hunch net-2005-09-05-Site Update

18 0.091470882 192 hunch net-2006-07-08-Some recent papers

19 0.087848216 8 hunch net-2005-02-01-NIPS: Online Bayes

20 0.08568982 347 hunch net-2009-03-26-Machine Learning is too easy
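
Entry 2 above (Specializations of the Master Problem) describes a single observe/act/reward interaction loop. A minimal sketch of that loop, with hypothetical `world` and `agent` interfaces, is:

```python
def run(world, agent, horizon):
    """The master learning problem as an interaction loop: the world announces
    an observation x, the agent makes a choice a, the world announces a reward
    r, and the agent tries to maximize the sum of rewards over time.
    `world` and `agent` are hypothetical interfaces, not a specific library."""
    total_reward = 0.0
    for _ in range(horizon):
        x = world.observe()      # world announces observation x
        a = agent.act(x)         # agent makes a choice a
        r = world.reward(x, a)   # world announces reward r
        agent.learn(x, a, r)     # agent updates from the (x, a, r) feedback
        total_reward += r
    return total_reward
```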
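
Entry 3 (Contextual Bandits) invokes the round-by-round k-Armed Bandit setting. As hedged background, an epsilon-greedy sketch of that setting (not the algorithms that post actually discusses, and with a hypothetical `pull(arm)` reward oracle) looks like:

```python
import random

def epsilon_greedy_bandit(k, pull, rounds, epsilon=0.1):
    """k-armed bandit baseline: each round choose one of k arms and observe
    only that arm's reward.  `pull(arm)` is a hypothetical function returning
    the reward of the chosen arm; epsilon controls exploration."""
    counts = [0] * k
    means = [0.0] * k
    for _ in range(rounds):
        if random.random() < epsilon:
            arm = random.randrange(k)                    # explore
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit best so far
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]     # running mean update
    return means
```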
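
Entry 5 (The State of the Reduction) describes reductions as machines turning solvers for one problem into solvers for another. As one classic, purely illustrative instance (not that post's contribution), here is a one-against-all reduction from multiclass to binary classification with a hypothetical `binary_learner`:

```python
def one_against_all(binary_learner, examples, num_classes):
    """Reduce multiclass classification to binary classification: train one
    binary scorer per class ('is the label this class?') and predict the
    class whose scorer is most confident.  `binary_learner(pairs)` is a
    hypothetical training routine returning a scoring function f(x) -> float."""
    scorers = []
    for c in range(num_classes):
        pairs = [(x, 1 if y == c else 0) for x, y in examples]
        scorers.append(binary_learner(pairs))

    def predict(x):
        return max(range(num_classes), key=lambda c: scorers[c](x))
    return predict
```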


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.201), (1, 0.1), (2, 0.005), (3, 0.014), (4, 0.024), (5, -0.072), (6, 0.082), (7, -0.009), (8, -0.053), (9, 0.043), (10, 0.107), (11, 0.056), (12, 0.038), (13, 0.032), (14, -0.002), (15, -0.002), (16, 0.028), (17, -0.109), (18, 0.039), (19, 0.119), (20, -0.064), (21, -0.029), (22, -0.077), (23, 0.009), (24, -0.006), (25, 0.106), (26, -0.054), (27, 0.123), (28, 0.048), (29, 0.04), (30, -0.067), (31, -0.003), (32, 0.081), (33, 0.047), (34, 0.036), (35, 0.05), (36, 0.072), (37, 0.083), (38, 0.042), (39, -0.034), (40, -0.006), (41, -0.006), (42, 0.041), (43, 0.052), (44, -0.022), (45, -0.032), (46, 0.083), (47, -0.072), (48, 0.011), (49, -0.009)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97469306 220 hunch net-2006-11-27-Continuizing Solutions

2 0.74505579 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x . The agent makes a choice a . The world announces a reward r . The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

3 0.7165038 269 hunch net-2007-10-24-Contextual Bandits

Introduction: One of the fundamental underpinnings of the internet is advertising based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses search engines and all search engines try to make money this way. The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised—in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “ k -Armed Bandits”, often with various relevant embellishments. The k -Armed Bandit setting works on a round-by-round basis. On each round: A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). The world reveals t

4 0.69090003 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

Introduction: This post is about contextual bandit problems where, repeatedly: The world chooses features x and rewards for each action r 1 ,…,r k then announces the features x (but not the rewards). A policy chooses an action a . The world announces the reward r a The goal in these situations is to learn a policy which maximizes r a in expectation efficiently. I’m thinking about all situations which fit the above setting, whether they are drawn IID or adversarially from round to round and whether they involve past logged data or rapidly learning via interaction. One common drawback of all algorithms for solving this setting, is that they have a poor dependence on the number of actions. For example if k is the number of actions, EXP4 (page 66) has a dependence on k 0.5 , epoch-greedy (and the simpler epsilon greedy) have a dependence on k 1/3 , and the offset tree has a dependence on k-1 . These results aren’t directly comparable because different things a

5 0.67899531 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

Introduction: There were two papers at ICML presenting learning algorithms for a contextual bandit -style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require a exploration scavenging viewpoint to understand if the experimental assignment was nonrandom.) I strongly approve of these papers and further work in this setting and its variants, because I expect it to become more important than supervised learning. As a quick review, we are thinking about situations where repeatedly: The world reveals feature values (aka context information). A policy chooses an action. The world provides a reward. Sometimes this is done in an online fashion where the policy can change based on immediate feedback and sometimes it’s done in a batch setting where many samples are collected before the policy can change. If you haven’t spent time thinking about the setting, you might want to because there are many natural applications. I’m g

6 0.62866789 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

7 0.60205555 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

8 0.55541402 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

9 0.53157032 163 hunch net-2006-03-12-Online learning or online preservation of learning?

10 0.5265789 400 hunch net-2010-06-13-The Good News on Exploration and Learning

11 0.52403843 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

12 0.5196104 183 hunch net-2006-06-14-Explorations of Exploration

13 0.50396621 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

14 0.48508129 276 hunch net-2007-12-10-Learning Track of International Planning Competition

15 0.48347697 148 hunch net-2006-01-13-Benchmarks for RL

16 0.47827372 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

17 0.47164208 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

18 0.46884894 78 hunch net-2005-06-06-Exact Online Learning for Classification

19 0.46702385 169 hunch net-2006-04-05-What is state?

20 0.46246567 28 hunch net-2005-02-25-Problem: Online Learning


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.045), (3, 0.025), (27, 0.301), (53, 0.048), (55, 0.111), (75, 0.168), (77, 0.045), (94, 0.073), (95, 0.077)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93808615 220 hunch net-2006-11-27-Continuizing Solutions

2 0.8708868 360 hunch net-2009-06-15-In Active Learning, the question changes

Introduction: A little over 4 years ago, Sanjoy made a post saying roughly “we should study active learning theoretically, because not much is understood”. At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?” The nature of the question has fundamentally changed in my mind. The answer is to the previous question is “yes”, both information theoretically and computationally, most places where supervised learning could be applied. In many situation, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.

3 0.86703724 378 hunch net-2009-11-15-The Other Online Learning

Introduction: If you search for “online learning” with any major search engine , it’s interesting to note that zero of the results are for online machine learning. This may not be a mistake if you are committed to a global ordering. In other words, the number of people specifically interested in the least interesting top-10 online human learning result might exceed the number of people interested in online machine learning, even given the presence of the other 9 results. The essential observation here is that the process of human learning is a big business (around 5% of GDP) effecting virtually everyone. The internet is changing this dramatically, by altering the economics of teaching. Consider two possibilities: The classroom-style teaching environment continues as is, with many teachers for the same subject. All the teachers for one subject get together, along with perhaps a factor of 2 more people who are experts in online delivery. They spend a factor of 4 more time designing

4 0.86103344 225 hunch net-2007-01-02-Retrospective

Introduction: It’s been almost two years since this blog began. In that time, I’ve learned enough to shift my expectations in several ways. Initially, the idea was for a general purpose ML blog where different people could contribute posts. What has actually happened is most posts come from me, with a few guest posts that I greatly value. There are a few reasons I see for this. Overload . A couple years ago, I had not fully appreciated just how busy life gets for a researcher. Making a post is not simply a matter of getting to it, but rather of prioritizing between {writing a grant, finishing an overdue review, writing a paper, teaching a class, writing a program, etc…}. This is a substantial transition away from what life as a graduate student is like. At some point the question is not “when will I get to it?” but rather “will I get to it?” and the answer starts to become “no” most of the time. Feedback failure . This blog currently receives about 3K unique visitors per day from

5 0.86083049 343 hunch net-2009-02-18-Decision by Vetocracy

Introduction: Few would mistake the process of academic paper review for a fair process, but sometimes the unfairness seems particularly striking. This is most easily seen by comparison: Paper Banditron Offset Tree Notes Problem Scope Multiclass problems where only the loss of one choice can be probed. Strictly greater: Cost sensitive multiclass problems where only the loss of one choice can be probed. Often generalizations don’t matter. That’s not the case here, since every plausible application I’ve thought of involves loss functions substantially different from 0/1. What’s new Analysis and Experiments Algorithm, Analysis, and Experiments As far as I know, the essence of the more general problem was first stated and analyzed with the EXP4 algorithm (page 16) (1998). It’s also the time horizon 1 simplification of the Reinforcement Learning setting for the random trajectory method (page 15) (2002). The Banditron algorithm itself is functionally identi

6 0.85931015 183 hunch net-2006-06-14-Explorations of Exploration

7 0.85903382 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

8 0.85894614 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

9 0.85792542 235 hunch net-2007-03-03-All Models of Learning have Flaws

10 0.85766351 194 hunch net-2006-07-11-New Models

11 0.85634577 304 hunch net-2008-06-27-Reviewing Horror Stories

12 0.85580754 351 hunch net-2009-05-02-Wielding a New Abstraction

13 0.85572398 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

14 0.85428065 258 hunch net-2007-08-12-Exponentiated Gradient

15 0.85374582 293 hunch net-2008-03-23-Interactive Machine Learning

16 0.85176563 388 hunch net-2010-01-24-Specializations of the Master Problem

17 0.85064632 51 hunch net-2005-04-01-The Producer-Consumer Model of Research

18 0.84977436 406 hunch net-2010-08-22-KDD 2010

19 0.84842396 371 hunch net-2009-09-21-Netflix finishes (and starts)

20 0.84812236 432 hunch net-2011-04-20-The End of the Beginning of Active Learning