hunch_net hunch_net-2010 hunch_net-2010-400 knowledge-graph by maker-knowledge-mining

400 hunch net-2010-06-13-The Good News on Exploration and Learning


metadata for this blog

Source: html

Introduction: Consider the contextual bandit setting where, repeatedly: (1) a context x is observed; (2) an action a is taken given the context x; (3) a reward r is observed, dependent on x and a. The goal of a learning agent is to find a policy for step 2 achieving a large expected reward. This setting is of obvious importance, because in the real world we typically make decisions based on some set of information and then get feedback only about the single action taken. It also fundamentally differs from supervised learning settings because knowing the value of one action is not equivalent to knowing the value of all actions. A decade ago the best machine learning techniques for this setting were implausibly inefficient. Dean Foster once told me he thought the area was a research sinkhole with little progress to be expected. Now we are on the verge of being able to routinely attack these problems, in almost exactly the same sense that we routinely attack bread and butter supervised learning problems.
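To make the interaction loop concrete, here is a minimal sketch of the setting above with epsilon-greedy exploration and per-action ridge-regression reward estimates. The synthetic linear reward model, the constants, and all variable names are illustrative assumptions for this sketch, not the methods covered in the tutorial.

    # Minimal sketch of the contextual bandit loop: observe x, choose a, see only r(x, a).
    # The hidden linear reward model and constants are toy assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k, epsilon, rounds = 5, 4, 0.1, 2000
    true_weights = rng.normal(size=(k, d))      # hidden reward model, one vector per action
    A = [np.eye(d) for _ in range(k)]           # per-action ridge-regression statistics
    b = [np.zeros(d) for _ in range(k)]

    total_reward = 0.0
    for t in range(rounds):
        x = rng.normal(size=d)                  # 1. a context x is observed
        if rng.random() < epsilon:              # 2. an action a is taken given x
            a = rng.integers(k)                 #    explore uniformly with probability epsilon
        else:
            estimates = [x @ np.linalg.solve(A[i], b[i]) for i in range(k)]
            a = int(np.argmax(estimates))       #    otherwise exploit the current estimates
        r = true_weights[a] @ x + rng.normal(scale=0.1)   # 3. reward for the chosen action only
        A[a] += np.outer(x, x)                  # update the estimate for action a alone
        b[a] += r * x
        total_reward += r

    print(f"average reward: {total_reward / rounds:.3f}")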


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Consider the contextual bandit setting where, repeatedly: A context x is observed. [sent-1, score-0.678]

2 Where the goal of a learning agent is to find a policy for step 2 achieving a large expected reward. [sent-4, score-0.074]

3 This setting is of obvious importance, because in the real world we typically make decisions based on some set of information and then get feedback only about the single action taken. [sent-5, score-0.343]

4 It also fundamentally differs from supervised learning settings because knowing the value of one action is not equivalent to knowing the value of all actions. [sent-6, score-1.009]

5 A decade ago the best machine learning techniques for this setting were implausibly inefficient. [sent-7, score-0.456]

6 Dean Foster once told me he thought the area was a research sinkhole with little progress to be expected. [sent-8, score-0.078]

7 Now we are on the verge of being able to routinely attack these problems, in almost exactly the same sense that we routinely attack bread and butter supervised learning problems. [sent-9, score-1.107]

8 Just as for supervised learning, we know how to create and reuse datasets, how to benchmark algorithms, how to reuse existing supervised learning algorithms in this setting, and how to achieve optimal rates of learning quantitatively similar to supervised learning. [sent-10, score-1.494]

9 This is also one of the times when understanding the basic theory can make a huge difference in your success. [sent-11, score-0.261]

10 There are many wrong ways to attack contextual bandit problems or prepare datasets, and taking a wrong turn can easily mean the difference between failure and success. [sent-12, score-1.318]

11 Understanding how contextual bandit problems differ from basic supervised learning problems is critical to routine success here. [sent-13, score-1.127]

12 All of the above is not meant to claim that everything is done research-wise here, so we’ll try to outline where the current boundary of research lies as best we can. [sent-14, score-0.292]

13 Given the above, Alina and I decided to prepare a tutorial to be given at Yahoo! [sent-16, score-0.348]

14 The subjects we plan to cover are essentially the keys to the kingdom of solving shallow interactive learning problems. [sent-21, score-0.194]
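Sentence 8 above claims we know how to create and reuse datasets and to benchmark algorithms in this setting. One common way to do this offline is inverse propensity scoring over logged exploration data; the sketch below is a hedged illustration under a uniform-random logging policy, and its field layout and helper names are assumptions rather than the specific methodology the tutorial describes.

    # Inverse propensity scoring (IPS): reweight each logged reward by the probability
    # with which the logging policy chose that action. Field names are illustrative.
    import random

    def ips_value(logged_data, policy):
        """Estimate a deterministic policy's expected reward from logged bandit data.

        Unbiased when the logged propensities are correct and every action has
        nonzero probability under the logging policy.
        logged_data: list of (context, chosen_action, reward, prob_of_chosen_action)
        policy: function mapping a context to an action
        """
        total = 0.0
        for x, a, r, p in logged_data:
            if policy(x) == a:      # the evaluated policy agrees with the logged action
                total += r / p      # reweight by the logging probability
        return total / len(logged_data)

    # toy logs collected by a uniform-random logging policy over 3 actions
    random.seed(0)
    logs = []
    for _ in range(10000):
        x = random.random()
        a = random.randrange(3)                    # uniform logging => propensity 1/3
        r = 1.0 if a == (0 if x < 0.5 else 2) else 0.0
        logs.append((x, a, r, 1.0 / 3.0))

    good_policy = lambda x: 0 if x < 0.5 else 2
    print("estimated value of good_policy:", ips_value(logs, good_policy))   # close to 1.0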


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('supervised', 0.315), ('attack', 0.222), ('contextual', 0.211), ('bandit', 0.207), ('prepare', 0.193), ('action', 0.188), ('routinely', 0.174), ('reuse', 0.174), ('setting', 0.155), ('knowing', 0.141), ('techniques', 0.13), ('surely', 0.13), ('problems', 0.114), ('datasets', 0.107), ('difference', 0.106), ('context', 0.105), ('supply', 0.104), ('trip', 0.104), ('outline', 0.104), ('quantitatively', 0.104), ('keys', 0.097), ('implausibly', 0.097), ('lies', 0.097), ('benchmark', 0.097), ('shallow', 0.097), ('wrong', 0.092), ('india', 0.091), ('boundary', 0.091), ('medical', 0.091), ('foster', 0.087), ('given', 0.087), ('reliable', 0.084), ('dean', 0.084), ('routine', 0.084), ('advertising', 0.084), ('basic', 0.082), ('applications', 0.082), ('easily', 0.081), ('methodology', 0.081), ('labs', 0.081), ('join', 0.081), ('told', 0.078), ('differs', 0.076), ('value', 0.074), ('decade', 0.074), ('shifting', 0.074), ('agent', 0.074), ('understanding', 0.073), ('demand', 0.072), ('decided', 0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 400 hunch net-2010-06-13-The Good News on Exploration and Learning

Introduction: Consider the contextual bandit setting where, repeatedly: (1) a context x is observed; (2) an action a is taken given the context x; (3) a reward r is observed, dependent on x and a. The goal of a learning agent is to find a policy for step 2 achieving a large expected reward. This setting is of obvious importance, because in the real world we typically make decisions based on some set of information and then get feedback only about the single action taken. It also fundamentally differs from supervised learning settings because knowing the value of one action is not equivalent to knowing the value of all actions. A decade ago the best machine learning techniques for this setting were implausibly inefficient. Dean Foster once told me he thought the area was a research sinkhole with little progress to be expected. Now we are on the verge of being able to routinely attack these problems, in almost exactly the same sense that we routinely attack bread and butter supervised learning problems.

2 0.25203091 269 hunch net-2007-10-24-Contextual Bandits

Introduction: One of the fundamental underpinnings of the internet is advertising-based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses search engines and all search engines try to make money this way. The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised—in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). The world reveals t
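As a hedged illustration of the round-by-round k-Armed Bandit loop sketched in that introduction, here is a minimal epsilon-greedy simulation; the Bernoulli click model and all constants are toy assumptions.

    # Epsilon-greedy k-armed bandit: each round pick an arm (ad), observe only its reward.
    import random

    random.seed(0)
    k, epsilon, rounds = 10, 0.1, 5000
    true_click_rate = [random.random() * 0.1 for _ in range(k)]   # hidden per-ad click rates
    pulls = [0] * k
    clicks = [0.0] * k

    for t in range(rounds):
        if random.random() < epsilon:
            a = random.randrange(k)                       # explore: pick an arm uniformly
        else:                                             # exploit: pick the best empirical arm
            a = max(range(k), key=lambda i: clicks[i] / pulls[i] if pulls[i] else 0.0)
        r = 1.0 if random.random() < true_click_rate[a] else 0.0   # the world reveals the reward
        pulls[a] += 1
        clicks[a] += r

    best = max(range(k), key=lambda i: true_click_rate[i])
    print("best arm:", best, "times chosen:", pulls[best])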

3 0.23362526 293 hunch net-2008-03-23-Interactive Machine Learning

Introduction: A new direction of research seems to be arising in machine learning: Interactive Machine Learning. This isn’t a familiar term, although it does include some familiar subjects. What is Interactive Machine Learning? The fundamental requirement is (a) learning algorithms which interact with the world and (b) learn. For our purposes, let’s define learning as efficiently competing with a large set of possible predictors. Examples include: Online learning against an adversary (Avrim’s Notes). The interaction is almost trivial: the learning algorithm makes a prediction and then receives feedback. The learning is choosing based upon the advice of many experts. Active Learning. In active learning, the interaction is choosing which examples to label, and the learning is choosing from amongst a large set of hypotheses. Contextual Bandits. The interaction is choosing one of several actions and learning only the value of the chosen action (weaker than active learning
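The first example in that list, online learning from the advice of many experts, can be sketched with the multiplicative-weights (Hedge) update. The experts, losses, and learning rate below are toy assumptions chosen only to show the interaction pattern.

    # Hedge / multiplicative weights: predict with a weighted average of experts,
    # then downweight each expert exponentially in its loss. Toy data throughout.
    import math, random

    random.seed(0)
    num_experts, eta, rounds = 5, 0.5, 200
    weights = [1.0] * num_experts

    total_loss, expert_loss = 0.0, [0.0] * num_experts
    for t in range(rounds):
        advice = [random.random() for _ in range(num_experts)]        # each expert predicts in [0,1]
        z = sum(weights)
        prediction = sum(w * a for w, a in zip(weights, advice)) / z  # weighted prediction
        outcome = advice[0] + random.gauss(0, 0.05)                   # expert 0 is secretly good
        losses = [(a - outcome) ** 2 for a in advice]
        total_loss += (prediction - outcome) ** 2
        for i in range(num_experts):
            expert_loss[i] += losses[i]
            weights[i] *= math.exp(-eta * losses[i])                  # downweight experts that erred

    print("learner loss:", round(total_loss, 2), "best expert loss:", round(min(expert_loss), 2))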

4 0.19247508 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

Introduction: This post is about contextual bandit problems where, repeatedly: The world chooses features x and rewards for each action r_1,…,r_k, then announces the features x (but not the rewards). A policy chooses an action a. The world announces the reward r_a. The goal in these situations is to learn a policy which maximizes r_a in expectation efficiently. I’m thinking about all situations which fit the above setting, whether they are drawn IID or adversarially from round to round and whether they involve past logged data or rapidly learning via interaction. One common drawback of all algorithms for solving this setting is that they have a poor dependence on the number of actions. For example if k is the number of actions, EXP4 (page 66) has a dependence on k^0.5, epoch-greedy (and the simpler epsilon greedy) have a dependence on k^(1/3), and the offset tree has a dependence on k-1. These results aren’t directly comparable because different things a
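Ignoring the constants and the differing notions of regret that make these bounds not directly comparable, the quoted dependences on the number of actions diverge quickly; a quick numeric check (purely illustrative):

    # Purely illustrative comparison of the quoted dependences on the number of
    # actions k, ignoring the constants hidden inside the actual regret bounds.
    for k in (2, 10, 100, 1000):
        print(f"k={k:5d}   EXP4 ~ k^0.5 = {k ** 0.5:7.1f}   "
              f"epoch-greedy ~ k^(1/3) = {k ** (1 / 3):5.1f}   "
              f"offset tree ~ k-1 = {k - 1:4d}")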

5 0.18716422 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of
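The lower-bound observation in that introduction, that an agent cannot hope to succeed when actions are unrelated to rewards, is easy to verify numerically. In the toy sketch below (the coin-flip world and both agents are assumptions for illustration), every policy earns the same expected reward:

    # When rewards ignore both the observation and the action, no agent can beat another.
    import random

    random.seed(0)
    def run(agent, rounds=100000):
        total = 0.0
        for _ in range(rounds):
            x = random.random()            # the world announces an observation x
            a = agent(x)                   # the agent makes a choice a
            r = random.randrange(2)        # the reward ignores both x and a
            total += r
        return total / rounds

    print("clever-looking agent:", run(lambda x: int(x > 0.5)))   # ~0.5
    print("constant agent:      ", run(lambda x: 0))              # ~0.5 as well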

6 0.16223332 309 hunch net-2008-07-10-Interesting papers, ICML 2008

7 0.15823349 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

8 0.133288 299 hunch net-2008-04-27-Watchword: Supervised Learning

9 0.1213557 360 hunch net-2009-06-15-In Active Learning, the question changes

10 0.11953644 237 hunch net-2007-04-02-Contextual Scaling

11 0.11855456 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

12 0.11037483 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

13 0.10977628 148 hunch net-2006-01-13-Benchmarks for RL

14 0.10683425 437 hunch net-2011-07-10-ICML 2011 and the future

15 0.10574142 183 hunch net-2006-06-14-Explorations of Exploration

16 0.10255069 156 hunch net-2006-02-11-Yahoo’s Learning Problems.

17 0.095846951 430 hunch net-2011-04-11-The Heritage Health Prize

18 0.093589328 347 hunch net-2009-03-26-Machine Learning is too easy

19 0.09124691 120 hunch net-2005-10-10-Predictive Search is Coming

20 0.091190197 332 hunch net-2008-12-23-Use of Learning Theory


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.24), (1, 0.044), (2, -0.067), (3, 0.016), (4, 0.062), (5, -0.071), (6, 0.0), (7, 0.015), (8, -0.042), (9, 0.088), (10, 0.149), (11, 0.056), (12, 0.022), (13, 0.164), (14, -0.173), (15, 0.128), (16, -0.008), (17, -0.068), (18, 0.002), (19, 0.079), (20, -0.018), (21, -0.066), (22, 0.058), (23, -0.022), (24, -0.097), (25, 0.097), (26, -0.036), (27, -0.044), (28, -0.057), (29, 0.058), (30, -0.033), (31, -0.06), (32, 0.082), (33, 0.054), (34, 0.058), (35, -0.04), (36, 0.006), (37, -0.027), (38, -0.06), (39, 0.052), (40, -0.037), (41, -0.026), (42, -0.058), (43, 0.051), (44, -0.042), (45, 0.052), (46, 0.033), (47, -0.05), (48, 0.063), (49, 0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9590745 400 hunch net-2010-06-13-The Good News on Exploration and Learning

Introduction: Consider the contextual bandit setting where, repeatedly: (1) a context x is observed; (2) an action a is taken given the context x; (3) a reward r is observed, dependent on x and a. The goal of a learning agent is to find a policy for step 2 achieving a large expected reward. This setting is of obvious importance, because in the real world we typically make decisions based on some set of information and then get feedback only about the single action taken. It also fundamentally differs from supervised learning settings because knowing the value of one action is not equivalent to knowing the value of all actions. A decade ago the best machine learning techniques for this setting were implausibly inefficient. Dean Foster once told me he thought the area was a research sinkhole with little progress to be expected. Now we are on the verge of being able to routinely attack these problems, in almost exactly the same sense that we routinely attack bread and butter supervised learning problems.

2 0.85012186 269 hunch net-2007-10-24-Contextual Bandits

Introduction: One of the fundamental underpinnings of the internet is advertising-based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses search engines and all search engines try to make money this way. The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised—in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). The world reveals t

3 0.82080972 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

Introduction: This post is about contextual bandit problems where, repeatedly: The world chooses features x and rewards for each action r_1,…,r_k, then announces the features x (but not the rewards). A policy chooses an action a. The world announces the reward r_a. The goal in these situations is to learn a policy which maximizes r_a in expectation efficiently. I’m thinking about all situations which fit the above setting, whether they are drawn IID or adversarially from round to round and whether they involve past logged data or rapidly learning via interaction. One common drawback of all algorithms for solving this setting is that they have a poor dependence on the number of actions. For example if k is the number of actions, EXP4 (page 66) has a dependence on k^0.5, epoch-greedy (and the simpler epsilon greedy) have a dependence on k^(1/3), and the offset tree has a dependence on k-1. These results aren’t directly comparable because different things a

4 0.74806368 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

5 0.71652943 293 hunch net-2008-03-23-Interactive Machine Learning

Introduction: A new direction of research seems to be arising in machine learning: Interactive Machine Learning. This isn’t a familiar term, although it does include some familiar subjects. What is Interactive Machine Learning? The fundamental requirement is (a) learning algorithms which interact with the world and (b) learn. For our purposes, let’s define learning as efficiently competing with a large set of possible predictors. Examples include: Online learning against an adversary (Avrim’s Notes). The interaction is almost trivial: the learning algorithm makes a prediction and then receives feedback. The learning is choosing based upon the advice of many experts. Active Learning. In active learning, the interaction is choosing which examples to label, and the learning is choosing from amongst a large set of hypotheses. Contextual Bandits. The interaction is choosing one of several actions and learning only the value of the chosen action (weaker than active learning

6 0.63154513 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

7 0.58136487 183 hunch net-2006-06-14-Explorations of Exploration

8 0.57479692 299 hunch net-2008-04-27-Watchword: Supervised Learning

9 0.56288445 360 hunch net-2009-06-15-In Active Learning, the question changes

10 0.56213897 392 hunch net-2010-03-26-A Variance only Deviation Bound

11 0.55145025 220 hunch net-2006-11-27-Continuizing Solutions

12 0.55083227 309 hunch net-2008-07-10-Interesting papers, ICML 2008

13 0.53679353 148 hunch net-2006-01-13-Benchmarks for RL

14 0.52638465 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

15 0.51609248 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

16 0.47255945 156 hunch net-2006-02-11-Yahoo’s Learning Problems.

17 0.46341872 276 hunch net-2007-12-10-Learning Track of International Planning Competition

18 0.4583343 351 hunch net-2009-05-02-Wielding a New Abstraction

19 0.4551357 345 hunch net-2009-03-08-Prediction Science

20 0.45339265 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.738), (38, 0.016), (53, 0.026), (77, 0.027), (94, 0.058), (95, 0.031)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99951738 400 hunch net-2010-06-13-The Good News on Exploration and Learning

Introduction: Consider the contextual bandit setting where, repeatedly: (1) a context x is observed; (2) an action a is taken given the context x; (3) a reward r is observed, dependent on x and a. The goal of a learning agent is to find a policy for step 2 achieving a large expected reward. This setting is of obvious importance, because in the real world we typically make decisions based on some set of information and then get feedback only about the single action taken. It also fundamentally differs from supervised learning settings because knowing the value of one action is not equivalent to knowing the value of all actions. A decade ago the best machine learning techniques for this setting were implausibly inefficient. Dean Foster once told me he thought the area was a research sinkhole with little progress to be expected. Now we are on the verge of being able to routinely attack these problems, in almost exactly the same sense that we routinely attack bread and butter supervised learning problems.

2 0.99456608 45 hunch net-2005-03-22-Active learning

Introduction: Often, unlabeled data is easy to come by but labels are expensive. For instance, if you’re building a speech recognizer, it’s easy enough to get raw speech samples — just walk around with a microphone — but labeling even one of these samples is a tedious process in which a human must examine the speech signal and carefully segment it into phonemes. In the field of active learning, the goal is as usual to construct an accurate classifier, but the labels of the data points are initially hidden and there is a charge for each label you want revealed. The hope is that by intelligent adaptive querying, you can get away with significantly fewer labels than you would need in a regular supervised learning framework. Here’s an example. Suppose the data lie on the real line, and the classifiers are simple thresholding functions, H = {h_w}: h_w(x) = 1 if x > w, and 0 otherwise. VC theory tells us that if the underlying distribution P can be classified perfectly by some hypothesis in H (
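The thresholding example in that introduction is the classic case where active learning provably helps: when some h_w classifies the data perfectly, a binary search over the sorted unlabeled pool recovers the threshold with roughly log2(n) label queries instead of n. The sketch below is illustrative; the labeling oracle and constants are assumptions.

    # Binary search for the 0 -> 1 boundary over a sorted unlabeled pool on the real line.
    import random

    random.seed(0)
    n, true_w = 10000, 0.37
    pool = sorted(random.random() for _ in range(n))     # unlabeled points on the real line
    label_queries = 0

    def oracle(x):                                       # charged once per revealed label
        global label_queries
        label_queries += 1
        return 1 if x > true_w else 0

    lo, hi = 0, n - 1                                    # find the first point labeled 1
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(pool[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1

    learned_w = pool[lo]
    print(f"labels used: {label_queries} (vs {n} for passive), learned threshold ~ {learned_w:.3f}")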

3 0.99451351 166 hunch net-2006-03-24-NLPers

Introduction: Hal Daume has started the NLPers blog to discuss learning for language problems.

4 0.99451351 246 hunch net-2007-06-13-Not Posting

Introduction: If you have been disappointed by the lack of a post for the last month, consider contributing your own (I’ve been busy+uninspired). Also, keep in mind that there is a community of machine learning blogs (see the sidebar).

5 0.99451351 418 hunch net-2010-12-02-Traffic Prediction Problem

Introduction: Slashdot points out the Traffic Prediction Challenge which looks pretty fun. The temporal aspect seems to be very common in many real-world problems and somewhat understudied.

6 0.99392056 274 hunch net-2007-11-28-Computational Consequences of Classification

7 0.99382383 308 hunch net-2008-07-06-To Dual or Not

8 0.99347365 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

9 0.99301702 172 hunch net-2006-04-14-JMLR is a success

10 0.99213284 245 hunch net-2007-05-12-Loss Function Semantics

11 0.98839885 288 hunch net-2008-02-10-Complexity Illness

12 0.98276132 9 hunch net-2005-02-01-Watchword: Loss

13 0.98224193 352 hunch net-2009-05-06-Machine Learning to AI

14 0.97880256 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

15 0.96903706 304 hunch net-2008-06-27-Reviewing Horror Stories

16 0.96731466 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

17 0.95961326 483 hunch net-2013-06-10-The Large Scale Learning class notes

18 0.95670527 244 hunch net-2007-05-09-The Missing Bound

19 0.95131391 293 hunch net-2008-03-23-Interactive Machine Learning

20 0.95037359 8 hunch net-2005-02-01-NIPS: Online Bayes