hunch_net hunch_net-2008 hunch_net-2008-311 knowledge-graph by maker-knowledge-mining

311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design


meta info for this blog

Source: html

Introduction: There were two papers at ICML presenting learning algorithms for a contextual bandit-style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require an exploration scavenging viewpoint to understand if the experimental assignment was nonrandom.) I strongly approve of these papers and further work in this setting and its variants, because I expect it to become more important than supervised learning. As a quick review, we are thinking about situations where repeatedly: The world reveals feature values (aka context information). A policy chooses an action. The world provides a reward. Sometimes this is done in an online fashion where the policy can change based on immediate feedback and sometimes it’s done in a batch setting where many samples are collected before the policy can change. If you haven’t spent time thinking about the setting, you might want to because there are many natural applications. I’m g
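To make the review concrete, the loop below is a minimal sketch of the interaction protocol just described. The `world` and `policy` objects and their method names are hypothetical placeholders, not code from either ICML paper.

```python
def run_contextual_bandit(world, policy, num_rounds):
    """Repeat the contextual bandit interaction: the world reveals a context x,
    the policy chooses one of k actions, and the world reveals the reward of
    the chosen action only (rewards of the other actions stay hidden)."""
    total_reward = 0.0
    for t in range(1, num_rounds + 1):
        x = world.reveal_context()      # world reveals feature values (context)
        a = policy.choose_action(x, t)  # policy chooses an action
        r = world.reward(x, a)          # reward observed for action a only
        policy.update(x, a, r, t)       # online setting: immediate feedback
        total_reward += r
    return total_reward
```

In the batch variant mentioned above, the `policy.update` call would simply be deferred until many (x, a, r) samples have been collected.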


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Offset Tree Reduction The offset tree is a newer machine learning reduction from the contextual bandit setting to the standard supervised learning setting. [sent-19, score-1.002]

2 It more robustly transforms a supervised learner’s performance into good policy performance than any other reduction. [sent-20, score-0.394]

3 The offset tree also has good computational properties, since it produces at most log_2(k) binary examples per train or test event, where k is the number of actions. [sent-21, score-0.609] (A structural sketch of this count appears after this list.)

4 In some sense it’s unfair to include the offset tree because it hasn’t yet been formally published. [sent-22, score-0.555]

5 The Epoch-greedy approach shows how to handle the explore/exploit tradeoff for learning in a contextual bandit setting as a function of a sample complexity bound. [sent-25, score-0.397]

6 Occam’s Razor Bound The Occam’s Razor bound limits the regret of an empirical error minimizing perceptron as a function of the number of examples. [sent-27, score-0.398] (The standard form of the bound is written out after this list.)

7 Each component above has been analyzed in isolation and is at least a reasonable approach (some are the best possible). [sent-30, score-0.471]

8 Fitting these pieces together, we get an online learning algorithm (agnostic offset-tree perceptron) that chooses to explore uniformly at random amongst the actions with probability about 1/t^{1/3}. [sent-32, score-0.424] (A simplified sketch of the assembled pieces appears after this list.)

9 There are three results in the left plot: The blue line is a version of the component set where the Occam’s Razor bound and the Offset Tree reduction have been optimized for the realizable case. [sent-37, score-0.764]

10 The two components that we tweaked are: Realizable Case Bound: It’s well known that in the realizable case the regret of a chosen classifier should scale as 1/t rather than 1/t^{0.5}. [sent-40, score-0.719]

11 The offset tree reduction can be altered to take this into account by eliminating the importance weight from all updates, and updating even for exploitation examples which are not drawn uniformly at random. [sent-46, score-0.756]

12 The red line is what you get with exactly the component set stated above. [sent-47, score-0.379]

13 We were curious about the degree to which a general purpose algorithm can perform well on this application, as the realizable case algorithm is definitely broken when the problem is inherently noisy. [sent-48, score-0.53]

14 The green line is from a component set where epoch greedy and the offset tree have been tweaked to keep track of and use the distribution over actions at every timestep. [sent-51, score-1.158]

15 The tweaks used for the component set are: Stochastic Epoch-Greedy: instead of deterministically exploring every 1/(bound_gap) times, choose to explore with probability 1/(bound_gap), and pass this probability to the offset tree reduction. [sent-54, score-1.194]

16 Importance Weighted Perceptron: we dealt with importance weights generated by the offset tree reduction by scaling any update by the importance weight. [sent-58, score-0.926]

17 People may be dissatisfied with the component assembly approach to learning algorithm design, because the pieces are not analyzed together as a whole. [sent-59, score-0.718]

18 We start by assembling a set of components which we know work well from individual component analysis. [sent-69, score-0.454]

19 Then, we try to optimize the performance of the assembly by swapping components or improving individual components to address known properties of the problem or observed deficiencies. [sent-70, score-0.62]

20 In this particular case, we end up with a better-performing algorithm and better components (such as stochastic epoch-greedy) which are directly reusable in other settings. [sent-71, score-0.357]
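Item 3’s count of at most log_2(k) binary examples per event comes from arranging the k actions at the leaves of a balanced binary tree: an observed interaction only touches the internal nodes on the chosen action’s leaf-to-root path. The sketch below illustrates that structure; the labeling and weighting follow the binary “offset trick” in simplified form and are an assumption here, not a faithful copy of the published reduction.

```python
import math

def offset_tree_examples(x, a, r, p, k, offset=0.5):
    """Binary examples induced by one event (context x, chosen action a in
    0..k-1, observed reward r, probability p of choosing a) in a balanced
    binary tree over k actions (heap layout: internal nodes 1..k-1, leaves
    k..2k-1).  Only nodes on a's leaf-to-root path are touched, so at most
    ceil(log2 k) binary examples are produced per event."""
    examples = []
    node = a + k                          # leaf holding action a
    while node > 1:                       # walk up to the root
        parent = node // 2
        came_from_left = (node == 2 * parent)
        # Simplified offset trick: reward above the offset votes for a's
        # subtree at this node, reward below votes against it, and the
        # example is importance weighted by |r - offset| / p.
        label = came_from_left if r >= offset else not came_from_left
        weight = abs(r - offset) / p
        examples.append((x, parent, label, weight))
        node = parent
    assert len(examples) <= math.ceil(math.log2(k))
    return examples
```

For example, offset_tree_examples(x=[1.0], a=2, r=0.9, p=0.25, k=4) produces two weighted binary examples, one per internal node on action 2’s path, matching the log_2(k) = 2 bound.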
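Item 6 refers to the Occam’s Razor bound. The post does not write the bound out, so the standard finite-class form is reproduced here as background: with probability at least 1 − δ over m i.i.d. examples, every hypothesis h in a finite class H satisfies

```latex
% Standard Occam's Razor (finite hypothesis class) bound, stated as background;
% the post itself does not spell it out.
\forall h \in H:\qquad
\mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h)
  \;+\; \sqrt{\frac{\ln |H| + \ln(1/\delta)}{2m}} .
```

Choosing the empirical error minimizer therefore gives regret against the best hypothesis in H of order sqrt((ln|H| + ln(1/δ))/m), which is the dependence on the number of examples that item 6 is about, and the 1/t^{0.5} rate that item 10 contrasts with the realizable case.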
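Items 8, 15, and 16 describe the behavior of the assembled algorithm. The sketch below is a simplified illustration under stated assumptions, not the authors’ implementation: the 1/t^{1/3} exploration schedule stands in for stochastic epoch-greedy’s 1/(bound_gap) probability, and the perceptron is shown only through its importance weighted update rule.

```python
import random

def exploration_probability(t):
    """Item 8: the assembled algorithm explores uniformly at random with
    probability on the order of 1/t^(1/3).  (In the actual stochastic
    epoch-greedy component this probability is 1/(bound_gap).)"""
    return min(1.0, t ** (-1.0 / 3.0))

def choose_action(exploit_action, t, k, rng=random):
    """Item 15, simplified: flip a coin with the exploration probability
    instead of exploring on a deterministic schedule, and return the
    probability with which the chosen action was selected so it can be
    passed on to the offset tree reduction."""
    eps = exploration_probability(t)
    a = rng.randrange(k) if rng.random() < eps else exploit_action
    p = eps / k + (1.0 - eps if a == exploit_action else 0.0)
    return a, p

def importance_weighted_perceptron_update(w, x, y, importance):
    """Item 16: handle an importance weighted binary example (y in {-1, +1})
    by scaling the usual perceptron update by the importance weight."""
    margin = sum(wi * xi for wi, xi in zip(w, x))
    if y * margin <= 0:                   # mistake: apply a scaled update
        for i in range(len(w)):
            w[i] += importance * y * x[i]
    return w
```

The binary examples produced by the offset tree sketch above could be fed directly into importance_weighted_perceptron_update, with the weight coming from the |r − offset| / p term.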


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('banditron', 0.387), ('offset', 0.338), ('tree', 0.217), ('component', 0.215), ('realizable', 0.205), ('components', 0.184), ('perceptron', 0.179), ('modular', 0.13), ('performance', 0.125), ('plot', 0.119), ('importance', 0.117), ('occam', 0.113), ('razor', 0.113), ('line', 0.109), ('approach', 0.105), ('setting', 0.104), ('exploration', 0.102), ('bound', 0.096), ('contextual', 0.095), ('algorithm', 0.094), ('bandit', 0.093), ('technologies', 0.092), ('actions', 0.091), ('probability', 0.091), ('reduction', 0.084), ('pieces', 0.083), ('analyzed', 0.081), ('epoch', 0.079), ('case', 0.075), ('conditions', 0.075), ('policy', 0.073), ('supervised', 0.071), ('dissatisfied', 0.07), ('assembly', 0.07), ('isolation', 0.07), ('regret', 0.069), ('explore', 0.065), ('nonuniform', 0.065), ('action', 0.063), ('perform', 0.062), ('tweaks', 0.061), ('thank', 0.061), ('exploring', 0.061), ('achievable', 0.059), ('known', 0.057), ('set', 0.055), ('tweaked', 0.054), ('error', 0.054), ('binary', 0.054), ('weights', 0.053)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design


2 0.25072443 343 hunch net-2009-02-18-Decision by Vetocracy

Introduction: Few would mistake the process of academic paper review for a fair process, but sometimes the unfairness seems particularly striking. This is most easily seen by comparison (the original post lays this out as a table comparing the Banditron and Offset Tree papers). Problem scope: the Banditron handles multiclass problems where only the loss of one choice can be probed, while the Offset Tree handles the strictly greater class of cost sensitive multiclass problems where only the loss of one choice can be probed (often generalizations don’t matter, but that’s not the case here, since every plausible application I’ve thought of involves loss functions substantially different from 0/1). What’s new: the Banditron paper contributes analysis and experiments, while the Offset Tree paper contributes the algorithm, analysis, and experiments. As far as I know, the essence of the more general problem was first stated and analyzed with the EXP4 algorithm (page 16) (1998). It’s also the time horizon 1 simplification of the Reinforcement Learning setting for the random trajectory method (page 15) (2002). The Banditron algorithm itself is functionally identi

3 0.22767638 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

Introduction: This post is about contextual bandit problems where, repeatedly: The world chooses features x and rewards for each action r_1,…,r_k then announces the features x (but not the rewards). A policy chooses an action a. The world announces the reward r_a. The goal in these situations is to learn a policy which maximizes r_a in expectation efficiently. I’m thinking about all situations which fit the above setting, whether they are drawn IID or adversarially from round to round and whether they involve past logged data or rapidly learning via interaction. One common drawback of all algorithms for solving this setting, is that they have a poor dependence on the number of actions. For example if k is the number of actions, EXP4 (page 66) has a dependence on k^{0.5}, epoch-greedy (and the simpler epsilon greedy) have a dependence on k^{1/3}, and the offset tree has a dependence on k-1. These results aren’t directly comparable because different things a

4 0.18476298 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

5 0.17539757 269 hunch net-2007-10-24-Contextual Bandits

Introduction: One of the fundamental underpinnings of the internet is advertising based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses search engines and all search engines try to make money this way. The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised—in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). The world reveals t

6 0.16163793 41 hunch net-2005-03-15-The State of Tight Bounds

7 0.16042484 14 hunch net-2005-02-07-The State of the Reduction

8 0.15823349 400 hunch net-2010-06-13-The Good News on Exploration and Learning

9 0.13606781 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

10 0.13259858 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

11 0.12988554 19 hunch net-2005-02-14-Clever Methods of Overfitting

12 0.12824498 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

13 0.12814465 183 hunch net-2006-06-14-Explorations of Exploration

14 0.12804525 309 hunch net-2008-07-10-Interesting papers, ICML 2008

15 0.12585507 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

16 0.12458607 235 hunch net-2007-03-03-All Models of Learning have Flaws

17 0.11974552 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

18 0.11787633 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

19 0.11786484 45 hunch net-2005-03-22-Active learning

20 0.11530313 351 hunch net-2009-05-02-Wielding a New Abstraction


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.288), (1, 0.162), (2, 0.077), (3, -0.066), (4, 0.064), (5, -0.027), (6, 0.071), (7, -0.04), (8, -0.072), (9, 0.088), (10, 0.085), (11, 0.06), (12, 0.037), (13, 0.008), (14, -0.073), (15, 0.031), (16, -0.025), (17, -0.017), (18, 0.005), (19, 0.12), (20, -0.047), (21, -0.017), (22, -0.038), (23, 0.015), (24, -0.041), (25, 0.07), (26, -0.141), (27, 0.003), (28, 0.002), (29, -0.047), (30, -0.067), (31, -0.037), (32, 0.109), (33, 0.057), (34, 0.029), (35, 0.009), (36, 0.021), (37, 0.023), (38, -0.01), (39, 0.074), (40, -0.032), (41, -0.022), (42, 0.005), (43, -0.018), (44, 0.037), (45, 0.012), (46, -0.01), (47, 0.011), (48, -0.015), (49, 0.105)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95287091 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design


2 0.85788566 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

Introduction: This post is about contextual bandit problems where, repeatedly: The world chooses features x and rewards for each action r_1,…,r_k then announces the features x (but not the rewards). A policy chooses an action a. The world announces the reward r_a. The goal in these situations is to learn a policy which maximizes r_a in expectation efficiently. I’m thinking about all situations which fit the above setting, whether they are drawn IID or adversarially from round to round and whether they involve past logged data or rapidly learning via interaction. One common drawback of all algorithms for solving this setting, is that they have a poor dependence on the number of actions. For example if k is the number of actions, EXP4 (page 66) has a dependence on k^{0.5}, epoch-greedy (and the simpler epsilon greedy) have a dependence on k^{1/3}, and the offset tree has a dependence on k-1. These results aren’t directly comparable because different things a

3 0.77805114 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

4 0.74455386 269 hunch net-2007-10-24-Contextual Bandits

Introduction: One of the fundamental underpinnings of the internet is advertising based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses search engines and all search engines try to make money this way. The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised—in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). The world reveals t

5 0.70735222 220 hunch net-2006-11-27-Continuizing Solutions

Introduction: This post is about a general technique for problem solving which I’ve never seen taught (in full generality), but which I’ve found very useful. Many problems in computer science turn out to be discretely difficult. The best known versions of such problems are NP-hard problems, but I mean ‘discretely difficult’ in a much more general way, which I only know how to capture by examples. ERM: In empirical risk minimization, you choose a minimum error rate classifier from a set of classifiers. This is NP hard for common sets, but it can be much harder, depending on the set. Experts: In the online learning with experts setting, you try to predict well so as to compete with a set of (adversarial) experts. Here the alternating quantifiers of you and an adversary playing out a game can yield a dynamic programming problem that grows exponentially. Policy Iteration: The problem with policy iteration is that you learn a new policy with respect to an old policy, which implies that sim

6 0.65889406 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

7 0.64446056 351 hunch net-2009-05-02-Wielding a New Abstraction

8 0.62188512 400 hunch net-2010-06-13-The Good News on Exploration and Learning

9 0.62036151 303 hunch net-2008-06-09-The Minimum Sample Complexity of Importance Weighting

10 0.60276359 163 hunch net-2006-03-12-Online learning or online preservation of learning?

11 0.58951545 183 hunch net-2006-06-14-Explorations of Exploration

12 0.58026731 177 hunch net-2006-05-05-An ICML reject

13 0.57683581 392 hunch net-2010-03-26-A Variance only Deviation Bound

14 0.56313497 309 hunch net-2008-07-10-Interesting papers, ICML 2008

15 0.55725706 87 hunch net-2005-06-29-Not EM for clustering at COLT

16 0.54947877 14 hunch net-2005-02-07-The State of the Reduction

17 0.54763013 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

18 0.5446592 343 hunch net-2009-02-18-Decision by Vetocracy

19 0.54406047 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

20 0.54141903 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.012), (3, 0.035), (12, 0.179), (27, 0.314), (38, 0.055), (53, 0.07), (55, 0.061), (64, 0.022), (77, 0.044), (94, 0.087), (95, 0.03)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94834995 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design


2 0.93191183 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Adam Coates, Honglak Lee, Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization, Adam Coates, Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art pe

3 0.9264403 259 hunch net-2007-08-19-Choice of Metrics

Introduction: How do we judge success in Machine Learning? As Aaron notes, the best way is to use the loss imposed on you by the world. This turns out to be infeasible sometimes for various reasons. The ones I’ve seen are: The learned prediction is used in some complicated process that does not give the feedback necessary to understand the prediction’s impact on the loss. The prediction is used by some other system which expects some semantics to the predicted value. This is similar to the previous example, except that the issue is design modularity rather than engineering modularity. The correct loss function is simply unknown (and perhaps unknowable, except by experimentation). In these situations, it’s unclear what metric for evaluation should be chosen. This post has some design advice for this murkier case. I’m using the word “metric” here to distinguish the fact that we are considering methods for evaluating predictive systems rather than a loss imposed by the real wor

4 0.87617517 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina, Bianca, and I are planning to do at ICML. Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

5 0.87502027 258 hunch net-2007-08-12-Exponentiated Gradient

Introduction: The Exponentiated Gradient algorithm by Manfred Warmuth and Jyrki Kivinen came out just as I was starting graduate school, so I missed it both at a conference and in class. It’s a fine algorithm which has a remarkable theoretical statement accompanying it. The essential statement holds in the “online learning with an adversary” setting. Initially, there is a set of n weights, which might have values (1/n,…,1/n) (or any other values from a probability distribution). Everything happens in a round-by-round fashion. On each round, the following happens: The world reveals a set of features x in {0,1}^n. In the online learning with an adversary literature, the features are called “experts” and thought of as subpredictors, but this interpretation isn’t necessary—you can just use feature values as experts (or maybe the feature value and the negation of the feature value as two experts). EG makes a prediction according to y’ = w · x (dot product). The world reve

6 0.87402356 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

7 0.87333637 41 hunch net-2005-03-15-The State of Tight Bounds

8 0.87129605 14 hunch net-2005-02-07-The State of the Reduction

9 0.87128568 235 hunch net-2007-03-03-All Models of Learning have Flaws

10 0.87044132 183 hunch net-2006-06-14-Explorations of Exploration

11 0.86946428 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

12 0.86906481 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

13 0.86897236 421 hunch net-2011-01-03-Herman Goldstine 2011

14 0.86788398 12 hunch net-2005-02-03-Learning Theory, by assumption

15 0.86527157 388 hunch net-2010-01-24-Specializations of the Master Problem

16 0.86481339 347 hunch net-2009-03-26-Machine Learning is too easy

17 0.86404651 360 hunch net-2009-06-15-In Active Learning, the question changes

18 0.86385369 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

19 0.86304122 131 hunch net-2005-11-16-The Everything Ensemble Edge

20 0.86237216 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?