hunch_net hunch_net-2006 hunch_net-2006-183 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Exploration is one of the big unsolved problems in machine learning. This isn’t for lack of trying—there are many models of exploration which have been analyzed in many different ways by many different groups of people. At some point, it is worthwhile to sit back and see what has been done across these many models. Reinforcement Learning (1) . Reinforcement learning has traditionally focused on Markov Decision Processes where the next state s’ is given by a conditional distribution P(s’|s,a) given the current state s and action a . The typical result here is that certain specific algorithms controlling an agent can behave within e of optimal for horizon T except for poly(1/e,T,S,A) “wasted” experiences (with high probability). This started with E^3 by Satinder Singh and Michael Kearns . Sham Kakade’s thesis has significant discussion. Extensions have typically been of the form “under extra assumptions, we can prove more”, for example Factored-E^3 and
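To make the transition model P(s’|s,a) above concrete, here is a minimal sketch of sampling an MDP’s dynamics in Python. The states, actions, and probabilities are hypothetical, chosen purely for illustration.

```python
import random

# Toy MDP: transitions[s][a] is the conditional distribution P(s'|s,a),
# written as {next_state: probability}. All values here are made up.
transitions = {
    "s0": {"left":  {"s0": 0.9, "s1": 0.1},
           "right": {"s0": 0.2, "s1": 0.8}},
    "s1": {"left":  {"s0": 0.7, "s1": 0.3},
           "right": {"s0": 0.1, "s1": 0.9}},
}

def step(state, action):
    """Sample the next state s' from P(s'|s,a)."""
    dist = transitions[state][action]
    states, probs = zip(*dist.items())
    return random.choices(states, weights=probs)[0]

print(step("s0", "right"))  # 's1' with probability 0.8
```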
sentIndex sentText sentNum sentScore
1 This isn’t for lack of trying—there are many models of exploration which have been analyzed in many different ways by many different groups of people. [sent-2, score-0.212]
2 Reinforcement learning has traditionally focused on Markov Decision Processes where the next state s’ is given by a conditional distribution P(s’|s,a) given the current state s and action a . [sent-5, score-0.581]
3 The typical result here is that certain specific algorithms controlling an agent can behave within e of optimal for horizon T except for poly(1/e,T,S,A) “wasted” experiences (with high probability). (A generic form of this guarantee is sketched after the sentence list below.) [sent-6, score-0.486]
4 (It turns out that the number of wasted samples can be less than the number of bits required to describe an MDP . [sent-10, score-0.778]
5 ) A weakness of all these results is that they rely upon (a) assumptions which are often false for real applications, (b) state spaces which are too large, and (c) guarantees which are rather weak. [sent-11, score-0.526]
6 Good performance is only guaranteed after suffering the possibly catastrophic consequences of exploration. [sent-12, score-0.209]
7 Several recent papers have been attempting to analyze reinforcement learning via reduction. [sent-14, score-0.248]
8 To date, all results are either nonconstructive or involve the use of various hints (oracle access to an optimal policy, the distribution over states of an optimal policy etc…) which short-circuit the need to explore. [sent-15, score-0.673]
9 Much of the rest of reinforcement learning has something to do with exploration, but it’s difficult to summarize succinctly. [sent-18, score-0.248]
10 The typical result here says that you can compete well with the best constant action after some wasted actions. [sent-22, score-0.576]
11 The exact number of wasted actions varies with the precise setting, but it is typically linear in the number of actions. (A minimal explore-then-exploit sketch appears after the sentence list below.) [sent-23, score-0.538]
12 Active Learning (1) The common current use of this term has to do with “selective sampling”=choosing unlabeled samples to label so as to minimize the number of labels required to learn a good predictor (typically a classifier). [sent-25, score-0.575]
13 A typical result has the form: Given that your classifier comes from a restricted class C and the labeled data distribution obeys some constraint, the number of adaptively labeled samples required is O(log(1/e)), where e is the error rate. (See the threshold sketch after the sentence list below.) [sent-26, score-1.198]
14 ) The constraints on distributions and hypothesis spaces required to achieve these speedups are often severe. [sent-28, score-0.33]
15 The distinguishing difference with respect to selective sampling is that a label can be requested for any unlabeled point (not just those drawn according to some natural distribution). [sent-30, score-0.493]
16 Several relatively strong results hold for membership query learning, but there is a significant drawback: it seems that the ability to query for a label on an arbitrary point is not very natural. [sent-31, score-0.784]
17 For example, imagine querying whether a text document is about sports or politics when the text is generated at random. [sent-32, score-0.397]
18 Often, the data generating distribution is assumed to come from some specific parametric family. [sent-34, score-0.233]
19 This failure is either by design (making assumptions which are simply rarely met), by failure to prove interesting results, or both. [sent-37, score-0.431]
20 Active learning deals with generalization, online learning can deal with adversarial situations, and reinforcement learning deals with the situation where your choices influence what you can later learn. [sent-39, score-0.504]
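The RL guarantee described in sentence 3 can be written out generically. The following is a sketch of the assumed common form only; the exact quantifiers, the dependence on the horizon, and the handling of the failure probability vary from paper to paper, so this is not any particular theorem.

```latex
% Generic PAC-style RL guarantee (assumed form; details vary by paper).
% With probability at least 1 - \delta, the number of timesteps t at
% which the agent's policy \pi_t is more than \epsilon below optimal
% is polynomially bounded:
\left|\left\{\, t : V^{\pi_t}(s_t) < V^{*}(s_t) - \epsilon \,\right\}\right|
  \;\le\; \mathrm{poly}\!\left(\tfrac{1}{\epsilon},\, T,\, |S|,\, |A|,\, \log\tfrac{1}{\delta}\right)
```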
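Sentences 10 and 11 describe the bandit-style result: after some wasted actions, compete with the best constant action. A minimal explore-then-exploit sketch follows; the payoff distributions and trial counts are hypothetical, and practical algorithms (UCB-style or EXP3-style methods) interleave exploration rather than front-loading it. The point is only that the exploration phase wastes a number of pulls linear in the number of actions.

```python
import random

def explore_then_exploit(pull, num_actions, trials_per_action, horizon):
    """Toy bandit strategy: pull each arm a fixed number of times, then
    commit to the empirically best arm. The exploration phase 'wastes'
    at most num_actions * trials_per_action pulls, i.e. a number of
    wasted actions linear in the number of actions."""
    totals = [0.0] * num_actions
    pulls_used = 0
    for a in range(num_actions):              # uniform exploration phase
        for _ in range(trials_per_action):
            totals[a] += pull(a)
            pulls_used += 1
    best = max(range(num_actions), key=lambda a: totals[a])
    reward = sum(totals)
    for _ in range(horizon - pulls_used):     # exploitation phase
        reward += pull(best)
    return reward

# Hypothetical Bernoulli payoffs, for illustration only.
means = [0.3, 0.5, 0.7]
pull = lambda a: 1.0 if random.random() < means[a] else 0.0
print(explore_then_exploit(pull, num_actions=3, trials_per_action=100, horizon=10000))
```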
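The O(log(1/e)) rate in sentence 13 is easiest to see for one-dimensional threshold classifiers under realizability: binary search over the sorted unlabeled points finds the decision boundary with logarithmically many label queries, since n ~ 1/e unlabeled points cost only about log(n) labels. The oracle and data below are hypothetical stand-ins for illustration.

```python
import random

def active_learn_threshold(xs, label):
    """Binary search for a threshold classifier h_w(x) = 1 if x > w.
    Assumes realizability: some threshold labels all of xs correctly.
    Uses O(log n) label queries instead of labeling all n points."""
    xs = sorted(xs)
    lo, hi = 0, len(xs)            # first positive point lies in xs[lo:hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label(xs[mid]) == 1:    # positive: boundary is at mid or left
            hi = mid
        else:                      # negative: boundary is right of mid
            lo = mid + 1
    w = xs[lo - 1] if lo > 0 else float("-inf")
    return w, queries

# Hypothetical data and labeling oracle, for illustration only.
true_w = 0.62
xs = [random.random() for _ in range(10000)]
oracle = lambda x: 1 if x > true_w else 0
w, queries = active_learn_threshold(xs, oracle)
print(f"estimated threshold ~ {w:.3f} using {queries} of {len(xs)} labels")
```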
wordName wordTfidf (topN-words)
[('wasted', 0.255), ('reinforcement', 0.248), ('exploration', 0.212), ('query', 0.201), ('distribution', 0.153), ('membership', 0.146), ('typical', 0.142), ('state', 0.136), ('hints', 0.135), ('required', 0.133), ('labeled', 0.129), ('deals', 0.128), ('active', 0.122), ('selective', 0.122), ('assumptions', 0.117), ('mdp', 0.113), ('samples', 0.109), ('spaces', 0.106), ('optimal', 0.105), ('number', 0.104), ('results', 0.099), ('text', 0.098), ('result', 0.091), ('distributions', 0.091), ('unlabeled', 0.089), ('action', 0.088), ('sampling', 0.088), ('design', 0.087), ('specific', 0.08), ('prove', 0.077), ('policy', 0.076), ('unfortunately', 0.076), ('typically', 0.075), ('failure', 0.075), ('back', 0.074), ('turns', 0.073), ('obeys', 0.073), ('suffering', 0.073), ('label', 0.072), ('current', 0.068), ('payoffs', 0.068), ('satinder', 0.068), ('singh', 0.068), ('adaptively', 0.068), ('experiences', 0.068), ('catastrophic', 0.068), ('guaranteed', 0.068), ('weakness', 0.068), ('classifier', 0.067), ('point', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000005 183 hunch net-2006-06-14-Explorations of Exploration
2 0.23045269 388 hunch net-2010-01-24-Specializations of the Master Problem
Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x . The agent makes a choice a . The world announces a reward r . The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of
3 0.20020176 45 hunch net-2005-03-22-Active learning
Introduction: Often, unlabeled data is easy to come by but labels are expensive. For instance, if you’re building a speech recognizer, it’s easy enough to get raw speech samples — just walk around with a microphone — but labeling even one of these samples is a tedious process in which a human must examine the speech signal and carefully segment it into phonemes. In the field of active learning, the goal is as usual to construct an accurate classifier, but the labels of the data points are initially hidden and there is a charge for each label you want revealed. The hope is that by intelligent adaptive querying, you can get away with significantly fewer labels than you would need in a regular supervised learning framework. Here’s an example. Suppose the data lie on the real line, and the classifiers are simple thresholding functions, H = {h_w}: h_w(x) = 1 if x > w, and 0 otherwise. VC theory tells us that if the underlying distribution P can be classified perfectly by some hypothesis in H (
4 0.18025815 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. Recent work
5 0.17665508 127 hunch net-2005-11-02-Progress in Active Learning
Introduction: Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning . This is an update on the progress I know of. As a refresher, active learning as meant here is: There is a source of unlabeled data. There is an oracle from which labels can be requested for unlabeled data produced by the source. The goal is to perform well with minimal use of the oracle. Here is what I’ve learned: Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier). Nina , Alina , and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here . Nicolo , Claudio , and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here . This was published a year or two ago and I r
6 0.17478926 235 hunch net-2007-03-03-All Models of Learning have Flaws
7 0.17006955 169 hunch net-2006-04-05-What is state?
8 0.14828657 432 hunch net-2011-04-20-The End of the Beginning of Active Learning
9 0.13733596 143 hunch net-2005-12-27-Automated Labeling
10 0.13297714 148 hunch net-2006-01-13-Benchmarks for RL
11 0.13271755 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?
12 0.13123873 251 hunch net-2007-06-24-Interesting Papers at ICML 2007
13 0.13092262 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control
14 0.12814465 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
15 0.12810461 274 hunch net-2007-11-28-Computational Consequences of Classification
16 0.12521029 293 hunch net-2008-03-23-Interactive Machine Learning
17 0.12483842 360 hunch net-2009-06-15-In Active Learning, the question changes
18 0.12363958 100 hunch net-2005-08-04-Why Reinforcement Learning is Important
19 0.12279023 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
20 0.12067258 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
topicId topicWeight
[(0, 0.297), (1, 0.15), (2, -0.005), (3, -0.012), (4, 0.116), (5, -0.066), (6, 0.048), (7, 0.032), (8, 0.061), (9, 0.045), (10, 0.138), (11, 0.019), (12, 0.014), (13, 0.097), (14, -0.097), (15, -0.009), (16, -0.037), (17, -0.025), (18, -0.027), (19, 0.024), (20, -0.023), (21, -0.037), (22, 0.049), (23, 0.01), (24, -0.037), (25, -0.017), (26, -0.115), (27, 0.005), (28, -0.02), (29, -0.027), (30, 0.075), (31, 0.004), (32, -0.0), (33, -0.063), (34, 0.073), (35, 0.028), (36, 0.121), (37, 0.061), (38, 0.019), (39, -0.128), (40, -0.048), (41, -0.023), (42, 0.114), (43, 0.036), (44, -0.007), (45, 0.03), (46, -0.004), (47, 0.021), (48, 0.064), (49, 0.072)]
simIndex simValue blogId blogTitle
same-blog 1 0.95878267 183 hunch net-2006-06-14-Explorations of Exploration
2 0.71807343 388 hunch net-2010-01-24-Specializations of the Master Problem
3 0.69320273 45 hunch net-2005-03-22-Active learning
4 0.68454754 143 hunch net-2005-12-27-Automated Labeling
Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this: Unsupervised Learning . You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way. Active Learning . You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy. It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict
5 0.65815973 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
Introduction: There were two papers at ICML presenting learning algorithms for a contextual bandit -style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require an exploration scavenging viewpoint to understand if the experimental assignment was nonrandom.) I strongly approve of these papers and further work in this setting and its variants, because I expect it to become more important than supervised learning. As a quick review, we are thinking about situations where repeatedly: The world reveals feature values (aka context information). A policy chooses an action. The world provides a reward. Sometimes this is done in an online fashion where the policy can change based on immediate feedback and sometimes it’s done in a batch setting where many samples are collected before the policy can change. If you haven’t spent time thinking about the setting, you might want to because there are many natural applications. I’m g
6 0.64967936 432 hunch net-2011-04-20-The End of the Beginning of Active Learning
7 0.6469689 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control
8 0.63944232 169 hunch net-2006-04-05-What is state?
9 0.6354596 293 hunch net-2008-03-23-Interactive Machine Learning
10 0.63317609 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?
11 0.6284399 127 hunch net-2005-11-02-Progress in Active Learning
12 0.62532818 400 hunch net-2010-06-13-The Good News on Exploration and Learning
13 0.6252656 148 hunch net-2006-01-13-Benchmarks for RL
14 0.6043182 299 hunch net-2008-04-27-Watchword: Supervised Learning
15 0.59753144 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
16 0.58532321 276 hunch net-2007-12-10-Learning Track of International Planning Competition
17 0.58289647 360 hunch net-2009-06-15-In Active Learning, the question changes
18 0.58240235 220 hunch net-2006-11-27-Continuizing Solutions
19 0.57616335 57 hunch net-2005-04-16-Which Assumptions are Reasonable?
20 0.57267547 269 hunch net-2007-10-24-Contextual Bandits
topicId topicWeight
[(3, 0.227), (9, 0.014), (27, 0.309), (38, 0.046), (53, 0.079), (55, 0.079), (77, 0.048), (94, 0.071), (95, 0.053)]
simIndex simValue blogId blogTitle
1 0.9817304 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.
Introduction: Classical confidence intervals satisfy a theorem of the form: For some data sources D , Pr_{S ~ D}(f(D) > g(S)) > 1-d where f is some function of the distribution (such as the mean) and g is some function of the observed sample S . The constraints on D can vary between “Independent and identically distributed (IID) samples from a gaussian with an unknown mean” to “IID samples from an arbitrary distribution D ”. There are even some confidence intervals which do not require IID samples. Classical confidence intervals often confuse people. They do not say “with high probability, for my observed sample, the bound holds”. Instead, they tell you that if you reason according to the confidence interval in the future (and the constraints on D are satisfied), then you are not often wrong. Restated, they tell you something about what a safe procedure is in a stochastic world where d is the safety parameter. There are a number of results in theoretical machine learn
2 0.97113311 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics
Introduction: This, again, is something of a research direction rather than a single problem. There are several metrics people care about which depend upon the relative ranking of examples and there are sometimes good reasons to care about such metrics. Examples include AROC , “F1”, the proportion of the time that the top ranked element is in some class, the proportion of the top 10 examples in some class ( google ‘s problem), the lowest ranked example of some class, and the “sort distance” from a predicted ranking to a correct ranking. See here for an example of some of these. Problem What does the ability to classify well imply about performance under these metrics? Past Work Probabilistic classification under squared error can be solved with a classifier. A counterexample shows this does not imply a good AROC. Sample complexity bounds for AROC (and here ). A paper on “ Learning to Order Things ”. Difficulty Several of these may be easy. Some of them may be h
3 0.97038823 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
Introduction: I’m offering a reward of $1000 for a solution to this problem. This joins the cross validation problem which I’m offering a $500 reward for. I believe both of these problems are hard but plausibly solvable, and plausibly with a solution of substantial practical value. While it’s unlikely these rewards are worth your time on an hourly wage basis, the recognition for solving them definitely should be. The Problem The problem is finding a general, robust, and efficient mechanism for estimating a conditional probability P(y|x) where robustness and efficiency are measured using techniques from learning reductions. In particular, suppose we have access to a binary regression oracle B which has two interfaces—one for specifying training information and one for testing. Training information is specified as B(x’,y’) where x’ is a feature vector and y’ is a scalar in [0,1] with no value returned. Testing is done according to B(x’) with a value in [0,1] returned.
4 0.96789181 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
Introduction: I want to expand on this post which describes one of the core tricks for making Vowpal Wabbit fast and easy to use when learning from text. The central trick is converting a word (or any other parseable quantity) into a number via a hash function. Kishore tells me this is a relatively old trick in NLP land, but it has some added advantages when doing online learning, because you can learn directly from the existing data without preprocessing the data to create features (destroying the online property) or using an expensive hashtable lookup (slowing things down). A central concern for this approach is collisions, which create a loss of information. If you use m features in an index space of size n the birthday paradox suggests a collision if m > n^0.5, essentially because there are m^2 pairs. This is pretty bad, because it says that with a vocabulary of 10^5 features, you might need to have 10^10 entries in your table. It turns out that redundancy is gr
5 0.96621561 289 hunch net-2008-02-17-The Meaning of Confidence
Introduction: In many machine learning papers experiments are done and little confidence bars are reported for the results. This often seems quite clear, until you actually try to figure out what it means. There are several different kinds of ‘confidence’ being used, and it’s easy to become confused. Confidence = Probability . For those who haven’t worried about confidence for a long time, confidence is simply the probability of some event. You are confident about events which have a large probability. This meaning of confidence is inadequate in many applications because we want to reason about how much more information we have, how much more is needed, and where to get it. As an example, a learning algorithm might predict that the probability of an event is 0.5 , but it’s unclear if the probability is 0.5 because no examples have been provided or 0.5 because many examples have been provided and the event is simply fundamentally uncertain. Classical Confidence Intervals . These a
same-blog 6 0.94814432 183 hunch net-2006-06-14-Explorations of Exploration
7 0.93858075 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
8 0.87963742 484 hunch net-2013-06-16-Representative Reviewing
9 0.87340933 34 hunch net-2005-03-02-Prior, “Prior” and Bias
10 0.87089497 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
11 0.87005091 259 hunch net-2007-08-19-Choice of Metrics
12 0.86001289 41 hunch net-2005-03-15-The State of Tight Bounds
13 0.85686839 351 hunch net-2009-05-02-Wielding a New Abstraction
14 0.85625571 67 hunch net-2005-05-06-Don’t mix the solution into the problem
15 0.85359144 194 hunch net-2006-07-11-New Models
16 0.85289323 388 hunch net-2010-01-24-Specializations of the Master Problem
17 0.85187167 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
18 0.84955513 12 hunch net-2005-02-03-Learning Theory, by assumption
19 0.84819114 220 hunch net-2006-11-27-Continuizing Solutions
20 0.84726274 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design