hunch_net hunch_net-2005 hunch_net-2005-100 knowledge-graph by maker-knowledge-mining

100 hunch net-2005-08-04-Why Reinforcement Learning is Important


meta info for this blog

Source: html

Introduction: One prescription for solving a problem well is: (1) State the problem, in the simplest way possible. In particular, this statement should involve no contamination with or anticipation of the solution. (2) Think about solutions to the stated problem. Stating a problem in a succinct and crisp manner tends to invite a simple elegant solution. When a problem cannot be stated succinctly, we wonder if the problem is even understood. (And when a problem is not understood, we wonder if a solution can be meaningful.) Reinforcement learning does step (1) well. It provides a clean simple language to state general AI problems. In reinforcement learning there is a set of actions A, a set of observations O, and a reward r. The reinforcement learning problem, in general, is defined by a conditional measure D(o, r | (o,r,a)*) which produces an observation o and a reward r given a history (o,r,a)*. The goal in reinforcement learning is to find a policy pi: (o,r,a)* -> a mapping histories to actions so as to maximize (or approximately maximize) the expected sum of observed rewards.
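To make the formulation concrete, here is a minimal sketch (in Python) of the interaction loop it describes: the environment plays the role of the conditional measure D(o, r | (o,r,a)*), the policy maps the history of (observation, reward, action) triples to an action, and the objective is the sum of rewards. The env/policy interfaces and the random baseline policy are illustrative assumptions, not part of the original post.

import random

def run_episode(env, policy, horizon):
    # General RL interaction: the environment (the measure D) produces the
    # next observation and reward given the history, and the policy
    # pi: (o,r,a)* -> a maps that history to the next action.
    history = []            # list of (observation, reward, action) triples
    total_reward = 0.0
    for _ in range(horizon):
        o, r = env.step(history)     # sample (o, r) ~ D(o, r | history)
        a = policy(history, o)       # choose an action from the history so far
        history.append((o, r, a))
        total_reward += r
    return total_reward              # goal: maximize E[sum of rewards]

def random_policy(history, observation, actions=(0, 1)):
    # Hypothetical baseline policy that ignores the history entirely.
    return random.choice(actions)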


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 One prescription for solving a problem well is: State the problem, in the simplest way possible. [sent-1, score-0.479]

2 In particular, this statement should involve no contamination with or anticipation of the solution. [sent-2, score-0.08]

3 Stating a problem in a succinct and crisp manner tends to invite a simple elegant solution. [sent-4, score-0.857]

4 When a problem cannot be stated succinctly, we wonder if the problem is even understood. [sent-5, score-0.717]

5 (And when a problem is not understood, we wonder if a solution can be meaningful.) [sent-6, score-0.385]

6 It provides a clean simple language to state general AI problems. [sent-8, score-0.312]

7 In reinforcement learning there is a set of actions A, a set of observations O, and a reward r. [sent-9, score-0.636]

8 The reinforcement learning problem, in general, is defined by a conditional measure D(o, r | (o,r,a)*) which produces an observation o and a reward r given a history (o,r,a)*. [sent-10, score-0.826]

9 The goal in reinforcement learning is to find a policy pi: (o,r,a)* -> a mapping histories to actions so as to maximize (or approximately maximize) the expected sum of observed rewards. [sent-11, score-0.884]

10 This formulation is capable of capturing almost any (all?) AI problem. [sent-12, score-0.477]

11 (Are there any other formulations capable of capturing a similar generality?) [sent-14, score-0.486]

12 ) I don’t believe we yet have good RL solutions from step (2), but that is unsurprising given the generality of the problem. [sent-15, score-0.698]

13 Note that solving RL in this generality is impossible (for example, it can encode classification). [sent-16, score-0.591]

14 It is very common to consider the restricted problem where the history is summarized by the previous observation. [sent-18, score-0.574]

15 Think about relativized solutions (such as reductions). [sent-22, score-0.195]

16 Both methods are under active investigation. [sent-23, score-0.097]
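Sentence 14 above refers to the common restriction in which the previous observation summarizes the history (the Markov assumption). A minimal sketch of what that restriction means for the policy class, with purely illustrative decision rules (these names and thresholds are assumptions, not from the post):

# Full generality: the policy may condition on the entire history of
# (observation, reward, action) triples, e.g. on how each action has
# paid off so far.
def general_policy(history, observation, actions=(0, 1)):
    totals = {a: 0.0 for a in actions}
    for _, r, a in history:
        totals[a] += r
    return max(actions, key=lambda a: totals[a])

# Markov restriction: the previous observation is assumed to summarize
# the history, so the policy is just a map from observations to actions.
def markov_policy(observation, actions=(0, 1)):
    return actions[0] if observation >= 0 else actions[1]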


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('generality', 0.283), ('reinforcement', 0.245), ('capturing', 0.22), ('maximize', 0.21), ('wonder', 0.202), ('solutions', 0.195), ('history', 0.189), ('problem', 0.183), ('rl', 0.163), ('actions', 0.163), ('reward', 0.157), ('ai', 0.152), ('stated', 0.149), ('capable', 0.14), ('encode', 0.126), ('formulations', 0.126), ('prescription', 0.126), ('restrictions', 0.126), ('simplify', 0.126), ('succinctly', 0.126), ('state', 0.117), ('crisp', 0.117), ('formulation', 0.117), ('succinct', 0.117), ('step', 0.115), ('aka', 0.11), ('summarized', 0.11), ('clean', 0.105), ('unsurprising', 0.105), ('mapping', 0.101), ('stating', 0.097), ('options', 0.097), ('solving', 0.093), ('invite', 0.092), ('elegant', 0.092), ('investigation', 0.092), ('produces', 0.092), ('restricted', 0.092), ('simple', 0.09), ('impossible', 0.089), ('approximately', 0.089), ('tends', 0.089), ('involve', 0.08), ('manner', 0.077), ('simplest', 0.077), ('sum', 0.076), ('markov', 0.075), ('defined', 0.072), ('conditional', 0.071), ('observations', 0.071)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 100 hunch net-2005-08-04-Why Reinforcement Learning is Important


2 0.18470865 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. Recent work

3 0.15829334 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of

4 0.15216465 169 hunch net-2006-04-05-What is state?

Introduction: In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?” Mathematical State. State is a sufficient stati

5 0.13296539 220 hunch net-2006-11-27-Continuizing Solutions

Introduction: This post is about a general technique for problem solving which I’ve never seen taught (in full generality), but which I’ve found very useful. Many problems in computer science turn out to be discretely difficult. The best known versions of such problems are NP-hard problems, but I mean ‘discretely difficult’ in a much more general way, which I only know how to capture by examples. ERM In empirical risk minimization, you choose a minimum error rate classifier from a set of classifiers. This is NP hard for common sets, but it can be much harder, depending on the set. Experts In the online learning with experts setting, you try to predict well so as to compete with a set of (adversarial) experts. Here the alternating quantifiers of you and an adversary playing out a game can yield a dynamic programming problem that grows exponentially. Policy Iteration The problem with policy iteration is that you learn a new policy with respect to an old policy, which implies that sim

6 0.13143535 14 hunch net-2005-02-07-The State of the Reduction

7 0.12363958 183 hunch net-2006-06-14-Explorations of Exploration

8 0.12299822 2 hunch net-2005-01-24-Holy grails of machine learning?

9 0.11759627 380 hunch net-2009-11-29-AI Safety

10 0.11600762 82 hunch net-2005-06-17-Reopening RL->Classification

11 0.11482451 353 hunch net-2009-05-08-Computability in Artificial Intelligence

12 0.1064685 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

13 0.10043487 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

14 0.096863501 148 hunch net-2006-01-13-Benchmarks for RL

15 0.096562803 352 hunch net-2009-05-06-Machine Learning to AI

16 0.096351221 67 hunch net-2005-05-06-Don’t mix the solution into the problem

17 0.095313221 351 hunch net-2009-05-02-Wielding a New Abstraction

18 0.093816184 458 hunch net-2012-03-06-COLT-ICML Open Questions and ICML Instructions

19 0.093261749 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

20 0.090867966 235 hunch net-2007-03-03-All Models of Learning have Flaws


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.171), (1, 0.094), (2, 0.008), (3, 0.022), (4, 0.026), (5, -0.068), (6, 0.12), (7, 0.036), (8, 0.008), (9, -0.02), (10, -0.003), (11, -0.065), (12, -0.03), (13, 0.187), (14, -0.028), (15, 0.027), (16, 0.113), (17, -0.094), (18, 0.028), (19, 0.058), (20, -0.116), (21, 0.062), (22, -0.084), (23, -0.022), (24, -0.024), (25, 0.081), (26, -0.108), (27, -0.005), (28, 0.012), (29, 0.097), (30, 0.028), (31, -0.044), (32, -0.034), (33, -0.016), (34, 0.052), (35, 0.082), (36, 0.043), (37, 0.108), (38, 0.099), (39, -0.014), (40, 0.074), (41, -0.047), (42, 0.056), (43, -0.035), (44, -0.034), (45, 0.059), (46, -0.071), (47, 0.05), (48, -0.002), (49, -0.08)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96793115 100 hunch net-2005-08-04-Why Reinforcement Learning is Important


2 0.7045188 169 hunch net-2006-04-05-What is state?


3 0.686387 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification


4 0.64779574 82 hunch net-2005-06-17-Reopening RL->Classification

Introduction: In research, it’s often the case that solving a problem helps you realize that it wasn’t the right problem to solve. This is the case for the “reduce RL to classification” problem with the solution hinted at here and turned into a paper here. The essential difficulty is that the method of stating and analyzing reductions ends up being nonalgorithmic (unlike previous reductions) unless you work with learning from teleoperated robots as Greg Grudic does. The difficulty here is due to the reduction being dependent on the optimal policy (which a human teleoperator might simulate, but which is otherwise unavailable). So, this problem is “open” again with the caveat that this time we want a more algorithmic solution. Whether or not this is feasible at all is still unclear and evidence in either direction would greatly interest me. A positive answer might have many practical implications in the long run.

5 0.58106387 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

Introduction: Suppose we have a set of observations over time x_1, x_2, …, x_t and want to predict some future event y_{t+1}. An inevitable problem arises, because learning a predictor h(x_1,…,x_t) of y_{t+1} is generically intractable due to the size of the input. To make this problem tractable, what’s necessary is a method for summarizing the relevant information in past observations for the purpose of prediction in the future. In other words, state is required. Existing approaches for deriving state have some limitations. Hidden Markov models learned with EM suffer from local minima, use tabular learning approaches which provide dubious generalization ability, and often require substantial a priori specification of the observations. Kalman Filters and Particle Filters are very parametric in the sense that substantial information must be specified up front. Dynamic Bayesian Networks (graphical models through time) require substantial a priori specification and often re
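A minimal sketch of the idea in the introduction above: rather than learning a predictor over the entire growing history x_1,…,x_t, maintain a fixed-size state updated recursively and predict from it. The exponential-moving-average state and the identity readout are illustrative assumptions only, not the models discussed in the post.

def predict_with_state(xs, alpha=0.9):
    # Summarize past observations into a fixed-size state s_t = f(s_{t-1}, x_t)
    # (here a simple exponential moving average) instead of feeding the whole
    # history to a predictor h(x_1, ..., x_t).
    s = 0.0
    for x in xs:
        s = alpha * s + (1.0 - alpha) * x
    return s    # predict y_{t+1} = g(s_t); here g is the identity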

6 0.579611 220 hunch net-2006-11-27-Continuizing Solutions

7 0.56012136 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

8 0.55486554 276 hunch net-2007-12-10-Learning Track of International Planning Competition

9 0.53987849 148 hunch net-2006-01-13-Benchmarks for RL

10 0.53328329 68 hunch net-2005-05-10-Learning Reductions are Reductionist

11 0.53150582 388 hunch net-2010-01-24-Specializations of the Master Problem

12 0.51866251 269 hunch net-2007-10-24-Contextual Bandits

13 0.48747027 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

14 0.48651677 458 hunch net-2012-03-06-COLT-ICML Open Questions and ICML Instructions

15 0.47926846 14 hunch net-2005-02-07-The State of the Reduction

16 0.47921947 2 hunch net-2005-01-24-Holy grails of machine learning?

17 0.47297937 370 hunch net-2009-09-18-Necessary and Sufficient Research

18 0.47288755 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

19 0.47140631 352 hunch net-2009-05-06-Machine Learning to AI

20 0.46323717 347 hunch net-2009-03-26-Machine Learning is too easy


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.245), (28, 0.355), (38, 0.013), (53, 0.105), (55, 0.038), (77, 0.093), (95, 0.039)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90501946 100 hunch net-2005-08-04-Why Reinforcement Learning is Important


2 0.80352581 124 hunch net-2005-10-19-Workshop: Atomic Learning

Introduction: We are planning to have a workshop on atomic learning Jan 7 & 8 at TTI-Chicago. Details are here. The earlier request for interest is here. The primary deadline is abstracts due Nov. 20 to jl@tti-c.org.

3 0.59184611 388 hunch net-2010-01-24-Specializations of the Master Problem


4 0.58756649 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

Introduction: This post is about contextual bandit problems where, repeatedly: The world chooses features x and rewards for each action r_1,…,r_k, then announces the features x (but not the rewards). A policy chooses an action a. The world announces the reward r_a. The goal in these situations is to learn a policy which maximizes r_a in expectation efficiently. I’m thinking about all situations which fit the above setting, whether they are drawn IID or adversarially from round to round and whether they involve past logged data or rapidly learning via interaction. One common drawback of all algorithms for solving this setting is that they have a poor dependence on the number of actions. For example if k is the number of actions, EXP4 (page 66) has a dependence on k^0.5, epoch-greedy (and the simpler epsilon greedy) have a dependence on k^(1/3), and the offset tree has a dependence on k-1. These results aren’t directly comparable because different things a
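To make the protocol above concrete, here is a minimal epsilon-greedy sketch of the contextual bandit loop (observe features, choose one action, see only that action's reward). For brevity the reward estimates ignore the context x, and the names get_context and get_reward are illustrative assumptions rather than any of the algorithms cited above.

import random

def epsilon_greedy_bandit(rounds, actions, get_context, get_reward, epsilon=0.1):
    counts = {a: 0 for a in actions}
    means = {a: 0.0 for a in actions}
    total = 0.0
    for _ in range(rounds):
        x = get_context()                       # world announces features x
        if random.random() < epsilon:
            a = random.choice(actions)          # explore uniformly at random
        else:
            a = max(actions, key=lambda b: means[b])   # exploit best estimate
        r = get_reward(x, a)                    # only r_a is ever revealed
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # running mean per action
        total += r
    return total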

5 0.58604354 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

Introduction: I don’t consider myself a “Bayesian”, but I do try hard to understand why Bayesian learning works. For the purposes of this post, Bayesian learning is a simple process of: Specify a prior over world models. Integrate using Bayes law with respect to all observed information to compute a posterior over world models. Predict according to the posterior. Bayesian learning has many advantages over other learning programs: Interpolation Bayesian learning methods interpolate all the way to pure engineering. When faced with any learning problem, there is a choice of how much time and effort a human vs. a computer puts in. (For example, the mars rover pathfinding algorithms are almost entirely engineered.) When creating an engineered system, you build a model of the world and then find a good controller in that model. Bayesian methods interpolate to this extreme because the Bayesian prior can be a delta function on one model of the world. What this means is that a recipe
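A minimal worked instance of the three-step recipe above (specify a prior, update it with Bayes law on the observations, predict from the posterior), using a Beta prior over a coin's heads-probability as the "world model"; the conjugate Beta-Bernoulli choice and the example data are illustrative assumptions, not taken from the post.

def beta_bernoulli_posterior(flips, prior_a=1.0, prior_b=1.0):
    # Step 1: prior Beta(prior_a, prior_b) over the heads-probability.
    # Step 2: Bayes-law update on observed flips (1 = heads, 0 = tails);
    #         conjugacy keeps the posterior in the Beta family.
    # Step 3: predict the next flip by the posterior mean.
    heads = sum(flips)
    a = prior_a + heads
    b = prior_b + len(flips) - heads
    return (a, b), a / (a + b)     # posterior parameters, P(next flip = heads)

# Example: uniform prior, three heads in four flips -> P(heads) = 4/6.
posterior, p_heads = beta_bernoulli_posterior([1, 1, 0, 1])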

6 0.58234257 483 hunch net-2013-06-10-The Large Scale Learning class notes

7 0.57962674 165 hunch net-2006-03-23-The Approximation Argument

8 0.57823408 12 hunch net-2005-02-03-Learning Theory, by assumption

9 0.5738976 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

10 0.57180279 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

11 0.57053953 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

12 0.57031643 269 hunch net-2007-10-24-Contextual Bandits

13 0.56948924 201 hunch net-2006-08-07-The Call of the Deep

14 0.56824321 131 hunch net-2005-11-16-The Everything Ensemble Edge

15 0.56739753 9 hunch net-2005-02-01-Watchword: Loss

16 0.56724638 220 hunch net-2006-11-27-Continuizing Solutions

17 0.56663811 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

18 0.56583524 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

19 0.56359673 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

20 0.5628652 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms