hunch_net hunch_net-2005 hunch_net-2005-27 knowledge-graph by maker-knowledge-mining

27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification


meta info for this blog

Source: html

Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem: Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model, given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work: There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. Recent work
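To make the problem statement concrete, here is a minimal interface sketch (my own illustration, not from the post): a reinforcement learning routine whose only learning primitive is a supplied classifier learner. All names and signatures are assumptions for exposition.

```python
# Hypothetical interface for the open problem: an RL algorithm whose only
# learning primitive is a classifier learner. Names/signatures are assumed.
from typing import Any, Callable, List, Tuple

Example = Tuple[Any, int]  # (features, label)

class ClassifierLearner:
    """Oracle assumed to achieve a small error rate or regret on any posed problem."""
    def train(self, examples: List[Example]) -> Callable[[Any], int]:
        raise NotImplementedError

class DirectExperienceEnv:
    """Direct experience model: learning happens through live trajectories only."""
    def reset(self) -> Any: ...
    def step(self, action: int) -> Tuple[Any, float, bool]: ...  # obs, reward, done

def rl_via_classification(env: DirectExperienceEnv,
                          learner: ClassifierLearner,
                          horizon: int) -> Callable[[Any], int]:
    """The open problem: return a policy with near-optimal expected sum of
    rewards over `horizon` steps, learning only by posing classification
    problems to `learner`."""
    raise NotImplementedError
```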


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 At an intuitive level, the question here is “Can reinforcement learning be solved with classification? [sent-1, score-0.557]

2 ” Problem: Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model, given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. [sent-2, score-2.219]

3 The definition of “posed” here is slightly murky. [sent-3, score-0.153]

4 I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. [sent-4, score-0.609]

5 Past Work There exists a reduction of reinforcement learning to classification given a generative model. [sent-5, score-1.018]

6 A generative model is an inherently stronger assumption than the direct experience model. [sent-6, score-0.996]

7 Other work on learning reductions may be important. [sent-7, score-0.085]

8 Several algorithms for solving reinforcement learning in the direct experience model exist. [sent-8, score-1.011]

9 Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. [sent-9, score-0.064]

10 This problem is related to predictive state representations, because we are trying to solve reinforcement learning with prediction ability. [sent-11, score-0.799]

11 Difficulty: It is not clear whether this problem is solvable or not. [sent-12, score-0.289]

12 A proof that it is not solvable would be extremely interesting, and even partial success one way or another could be important. [sent-13, score-0.623]

13 Impact: At the theoretical level, it would be very nice to know if the ability to generalize implies the ability to solve reinforcement learning, because (in a general sense) all problems can be cast as reinforcement learning. [sent-14, score-1.461]

14 Even if the solution is exponential in the horizon time, it could still motivate relaxations of the core algorithm, like RLgen. [sent-15, score-0.524]
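For intuition on how classification problems get “posed” in the easier generative-model setting (sentence 5 above), here is a rough sketch in the spirit of rollout-based policy learning; the generative-model interface, rollout counts, and horizon are all assumptions for illustration, not the actual reduction.

```python
def pose_classification_examples(model, policy, states, actions,
                                 horizon=10, n_rollouts=20):
    """Use a generative model (assumed to allow resetting to arbitrary states)
    to label each state with its empirically best first action, yielding
    (features, label) pairs a classifier learner can consume."""
    examples = []
    for s in states:
        def rollout_value(a):
            total = 0.0
            for _ in range(n_rollouts):            # Monte Carlo estimate
                sim = model.clone_from(s)          # assumed reset-to-state ability
                s2, r = sim.step(a)                # try candidate action first
                ret = r
                for _ in range(horizon - 1):
                    s2, r = sim.step(policy(s2))   # then follow the policy
                    ret += r
                total += ret
            return total / n_rollouts
        examples.append((s, max(actions, key=rollout_value)))
    return examples
```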


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('reinforcement', 0.451), ('posed', 0.368), ('direct', 0.252), ('classification', 0.217), ('generative', 0.212), ('solvable', 0.193), ('experience', 0.157), ('model', 0.151), ('rlgen', 0.133), ('pomdps', 0.133), ('algorithm', 0.119), ('relaxations', 0.116), ('level', 0.111), ('extends', 0.11), ('ability', 0.109), ('intuitive', 0.106), ('generalize', 0.106), ('horizon', 0.106), ('motivate', 0.102), ('constructing', 0.099), ('rewards', 0.096), ('problem', 0.096), ('solve', 0.093), ('partial', 0.088), ('slightly', 0.086), ('construct', 0.086), ('work', 0.085), ('representations', 0.084), ('exponential', 0.081), ('stronger', 0.08), ('sum', 0.08), ('recent', 0.078), ('labeled', 0.078), ('nice', 0.076), ('extremely', 0.075), ('predictive', 0.075), ('given', 0.074), ('assumption', 0.072), ('inherently', 0.072), ('proof', 0.072), ('past', 0.071), ('access', 0.07), ('definition', 0.067), ('would', 0.066), ('even', 0.065), ('impact', 0.065), ('regret', 0.065), ('require', 0.064), ('reduction', 0.064), ('success', 0.064)]
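The (wordName, wordTfidf) pairs above are the post’s highest-weight terms; post-to-post similarity is then, presumably, cosine similarity between such weight vectors. A minimal sketch of that computation, assuming plain bag-of-words tokenization:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists -> list of {word: tf*idf} dicts."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With `vecs = tfidf_vectors(docs)`, the simValue numbers in the lists below would correspond to `cosine(vecs[i], vecs[j])`.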

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem: Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model, given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work: There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. Recent work

2 0.24812332 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average-case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision, due to the greater number of problems to validate on. Theoretical Organization. By studying which reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
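As a concrete (and deliberately tiny) instance of a reduction in this sense, here is a one-against-all sketch that turns any binary classifier learner into a multiclass learner; it illustrates the idea, not any of the specific reductions the post surveys.

```python
def one_against_all_train(binary_learner, examples, n_classes):
    """Reduce multiclass to binary: one 'is it class k?' problem per class.
    binary_learner(pairs) is assumed to return a scoring function f(x),
    higher meaning more likely positive."""
    classifiers = []
    for k in range(n_classes):
        pairs = [(x, 1 if y == k else 0) for x, y in examples]
        classifiers.append(binary_learner(pairs))

    def predict(x):
        # Pick the class whose binary classifier is most confident.
        return max(range(n_classes), key=lambda k: classifiers[k](x))
    return predict
```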

3 0.23427191 103 hunch net-2005-08-18-SVM Adaptability

Introduction: Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions. This is a good thing because it is implausible that the loss function imposed by the world cannot be taken into account in the process of solving a prediction problem. Even people used to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated can allow both computational effort and design effort to be focused so as to improve performance. A current laundry list of capabilities includes: 2002 multiclass SVM including arbitrary cost matrices; ICML 2003 Hidden Markov Models; NIPS 2003 Markov Networks (see some discussion); EMNLP 2004 Context free grammars; ICML 2004 Any loss (with much computation); ICML 2005 Any constrained linear prediction model (that’s my own
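For a flavor of how such losses enter an SVM-like optimization, here is a margin-rescaled multiclass hinge loss with an arbitrary cost matrix, in the spirit of the first list entry; this is an illustrative sketch, not code from any of the cited papers.

```python
import numpy as np

def cost_sensitive_hinge(scores, y, cost):
    """Margin-rescaled multiclass hinge loss with an arbitrary cost matrix.
    scores: (K,) model scores f(x, k); y: true class index;
    cost: (K, K) matrix, cost[y, k] = cost of predicting k when truth is y.
    Loss = max_k ( cost[y, k] + f(x, k) - f(x, y) ), clipped at 0."""
    margins = cost[y] + scores - scores[y]  # cost-rescaled margin violations
    margins[y] = 0.0                        # the true class contributes zero
    return max(0.0, margins.max())
```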

4 0.18470865 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

Introduction: One prescription for solving a problem well is: (1) State the problem, in the simplest way possible. In particular, this statement should involve no contamination with or anticipation of the solution. (2) Think about solutions to the stated problem. Stating a problem in a succinct and crisp manner tends to invite a simple elegant solution. When a problem can not be stated succinctly, we wonder if the problem is even understood. (And when a problem is not understood, we wonder if a solution can be meaningful.) Reinforcement learning does step (1) well. It provides a clean simple language to state general AI problems. In reinforcement learning there is a set of actions A, a set of observations O, and a reward r. The reinforcement learning problem, in general, is defined by a conditional measure D(o, r | (o,r,a)*) which produces an observation o and a reward r given a history (o,r,a)*. The goal in reinforcement learning is to find a policy pi: (o,r,a)* -> a
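The definitions in this excerpt translate almost directly into code; a minimal sketch of the interaction loop, where `world` plays the role of the conditional measure D(o, r | (o,r,a)*) (all names are illustrative assumptions):

```python
def run_episode(world, policy, T):
    """world(history) -> (observation, reward), a sample from D(o, r | (o,r,a)*).
    policy(history) -> action. Returns the sum of rewards over T steps."""
    history = []                       # the (o, r, a)* sequence so far
    total = 0.0
    for _ in range(T):
        o, r = world(history)          # environment responds to the history
        a = policy(history + [(o, r)]) # policy acts on everything seen so far
        history.append((o, r, a))
        total += r
    return total
```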

5 0.18025815 183 hunch net-2006-06-14-Explorations of Exploration

Introduction: Exploration is one of the big unsolved problems in machine learning. This isn’t for lack of trying—there are many models of exploration which have been analyzed in many different ways by many different groups of people. At some point, it is worthwhile to sit back and see what has been done across these many models. Reinforcement Learning (1). Reinforcement learning has traditionally focused on Markov Decision Processes where the next state s’ is given by a conditional distribution P(s’|s,a) given the current state s and action a. The typical result here is that certain specific algorithms controlling an agent can behave within e of optimal for horizon T except for poly(1/e,T,S,A) “wasted” experiences (with high probability). This started with E^3 by Satinder Singh and Michael Kearns. Sham Kakade’s thesis has significant discussion. Extensions have typically been of the form “under extra assumptions, we can prove more”, for example Factored-E^3 and
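A tiny sketch of the bookkeeping underlying such guarantees: estimate P(s’|s,a) from counts and declare a state-action pair “known” after m visits, the core device in E^3-style analyses (the threshold and class structure here are illustrative assumptions, not the actual algorithm):

```python
from collections import defaultdict

class EmpiricalModel:
    """Count-based estimate of P(s'|s,a) with a 'known' threshold, in the
    spirit of E^3-style exploration analyses. m is an assumed visit threshold."""
    def __init__(self, m=100):
        self.m = m
        self.counts = defaultdict(lambda: defaultdict(int))  # (s,a) -> {s': n}

    def observe(self, s, a, s_next):
        self.counts[(s, a)][s_next] += 1

    def known(self, s, a):
        # A pair is 'known' once it has been tried m times.
        return sum(self.counts[(s, a)].values()) >= self.m

    def p_hat(self, s, a):
        # Empirical transition distribution from the counts.
        total = sum(self.counts[(s, a)].values())
        return {s2: n / total for s2, n in self.counts[(s, a)].items()}
```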

6 0.16976093 148 hunch net-2006-01-13-Benchmarks for RL

7 0.15438792 2 hunch net-2005-01-24-Holy grails of machine learning?

8 0.12687966 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

9 0.12566233 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive

10 0.12151229 388 hunch net-2010-01-24-Specializations of the Master Problem

11 0.11677687 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

12 0.11072067 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

13 0.10934982 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

14 0.10821897 135 hunch net-2005-12-04-Watchword: model

15 0.10818119 351 hunch net-2009-05-02-Wielding a New Abstraction

16 0.10739172 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

17 0.10531606 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

18 0.10491932 347 hunch net-2009-03-26-Machine Learning is too easy

19 0.1043309 12 hunch net-2005-02-03-Learning Theory, by assumption

20 0.096497364 194 hunch net-2006-07-11-New Models


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.218), (1, 0.149), (2, 0.033), (3, -0.039), (4, 0.016), (5, -0.065), (6, 0.178), (7, -0.004), (8, -0.052), (9, -0.057), (10, -0.043), (11, -0.077), (12, -0.088), (13, 0.101), (14, 0.002), (15, 0.065), (16, 0.1), (17, -0.02), (18, 0.023), (19, -0.086), (20, -0.082), (21, -0.05), (22, -0.071), (23, -0.054), (24, -0.025), (25, 0.038), (26, -0.098), (27, -0.023), (28, -0.066), (29, 0.088), (30, 0.086), (31, 0.016), (32, -0.098), (33, -0.091), (34, 0.011), (35, 0.054), (36, 0.125), (37, -0.002), (38, 0.127), (39, -0.079), (40, 0.039), (41, -0.079), (42, -0.016), (43, 0.083), (44, 0.062), (45, 0.028), (46, -0.011), (47, 0.024), (48, -0.073), (49, -0.003)]
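These (topicId, topicWeight) pairs are presumably the post’s coordinates after projecting its term vector onto LSI factors, i.e. a truncated SVD of the document-term matrix. A minimal numpy sketch, assuming `X` is a documents-by-terms tf-idf matrix:

```python
import numpy as np

def lsi_embed(X, k=50):
    """Latent semantic indexing via truncated SVD.
    X: (n_docs, n_terms) tf-idf matrix -> (n_docs, k) topic coordinates."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # each row: one document's topic weights

# Similarity between two posts is then cosine similarity of their rows.
```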

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96394736 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem: Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model, given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work: There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. Recent work

2 0.74839103 14 hunch net-2005-02-07-The State of the Reduction

Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average-case feasible. By reducing all problems onto one or a few primitives, we can fine-tune these primitives to perform well on real-world problems with greater precision, due to the greater number of problems to validate on. Theoretical Organization. By studying which reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r

3 0.71924371 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

Introduction: One prescription for solving a problem well is: (1) State the problem, in the simplest way possible. In particular, this statement should involve no contamination with or anticipation of the solution. (2) Think about solutions to the stated problem. Stating a problem in a succinct and crisp manner tends to invite a simple elegant solution. When a problem can not be stated succinctly, we wonder if the problem is even understood. (And when a problem is not understood, we wonder if a solution can be meaningful.) Reinforcement learning does step (1) well. It provides a clean simple language to state general AI problems. In reinforcement learning there is a set of actions A, a set of observations O, and a reward r. The reinforcement learning problem, in general, is defined by a conditional measure D(o, r | (o,r,a)*) which produces an observation o and a reward r given a history (o,r,a)*. The goal in reinforcement learning is to find a policy pi: (o,r,a)* -> a

4 0.65849316 103 hunch net-2005-08-18-SVM Adaptability

Introduction: Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions. This is a good thing because it is implausible that the loss function imposed by the world cannot be taken into account in the process of solving a prediction problem. Even people used to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated can allow both computational effort and design effort to be focused so as to improve performance. A current laundry list of capabilities includes: 2002 multiclass SVM including arbitrary cost matrices; ICML 2003 Hidden Markov Models; NIPS 2003 Markov Networks (see some discussion); EMNLP 2004 Context free grammars; ICML 2004 Any loss (with much computation); ICML 2005 Any constrained linear prediction model (that’s my own

5 0.65641004 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control

Introduction: Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. The basic idea of this algorithm is: (1) Collect data from a human controlling a machine. (2) Build a transition model based upon the experience. (3) Build a policy which optimizes the transition model. (4) Evaluate the policy. (5) If it works well, halt; otherwise add the experience into the pool and go to (2). The paper proves that this technique will converge to some policy with expected performance near human expected performance, assuming the world fits certain assumptions (MDP or linear dynamics). This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. To see (b), note that we can create intricate rei
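A schematic of that loop, following the numbered steps above (every argument name is a placeholder for a component the real algorithm instantiates; the paper's guarantees depend on its MDP or linear-dynamics assumptions):

```python
def apprenticeship_learning(human_trajectories, fit_model, plan,
                            evaluate, good_enough):
    """Schematic of the Abbeel-Ng loop described above; all arguments are
    placeholders, not the paper's actual components."""
    data = list(human_trajectories)           # (1) start from human demonstrations
    while True:
        model = fit_model(data)               # (2) fit a transition model
        policy = plan(model)                  # (3) optimize a policy against it
        score, trajectory = evaluate(policy)  # (4) try the policy in the world
        if good_enough(score):                # (5) halt, or grow the data pool
            return policy
        data.append(trajectory)
```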

6 0.63938737 82 hunch net-2005-06-17-Reopening RL->Classification

7 0.6393705 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

8 0.63346732 2 hunch net-2005-01-24-Holy grails of machine learning?

9 0.58818668 169 hunch net-2006-04-05-What is state?

10 0.57570422 148 hunch net-2006-01-13-Benchmarks for RL

11 0.55697483 276 hunch net-2007-12-10-Learning Track of International Planning Competition

12 0.55581331 351 hunch net-2009-05-02-Wielding a New Abstraction

13 0.53092319 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

14 0.51921475 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

15 0.51872796 192 hunch net-2006-07-08-Some recent papers

16 0.51786923 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

17 0.51003408 183 hunch net-2006-06-14-Explorations of Exploration

18 0.50099933 18 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC

19 0.49311122 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

20 0.4919613 68 hunch net-2005-05-10-Learning Reductions are Reductionist


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.283), (38, 0.102), (53, 0.117), (55, 0.048), (79, 0.256), (94, 0.04), (95, 0.043)]
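The LDA weights above have exactly the shape produced by a gensim-style pipeline; a minimal sketch under that assumption (the topic count is a guess, not the pipeline’s actual setting):

```python
# Sketch of how (topicId, topicWeight) pairs like those above are produced,
# assuming a gensim-style LDA pipeline. `texts` is a list of token lists.
from gensim import corpora, models

def lda_topics(texts, num_topics=100):
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    # Each document maps to sparse (topicId, topicWeight) pairs, as listed above.
    return [lda.get_document_topics(bow) for bow in corpus]
```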

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93442374 354 hunch net-2009-05-17-Server Update

Introduction: The hunch.net server has been updated. I’ve taken the opportunity to upgrade the version of wordpress, which caused cascading changes. Old threaded comments are now flattened. The system we used to use (Brian’s threaded comments) appears incompatible with the new threading system built into wordpress. I haven’t yet figured out a workaround. I set up a feedburner account. I added an RSS aggregator for both Machine Learning and other research blogs that I like to follow. This is something that I’ve wanted to do for a while. Many other minor changes in font and format, with some help from Alina. If you have any suggestions for site tweaks, please speak up.

same-blog 2 0.9244141 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

Introduction: At an intuitive level, the question here is “Can reinforcement learning be solved with classification?” Problem: Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model, given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. The definition of “posed” here is slightly murky. I consider a problem “posed” if there is an algorithm for constructing labeled classification examples. Past Work: There exists a reduction of reinforcement learning to classification given a generative model. A generative model is an inherently stronger assumption than the direct experience model. Other work on learning reductions may be important. Several algorithms for solving reinforcement learning in the direct experience model exist. Most, such as E^3, Factored-E^3, metric-E^3, and Rmax, require that the observation be the state. Recent work

3 0.90874493 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?

Introduction: Compressed Sensing (CS) is a new framework developed by Emmanuel Candes, Terry Tao and David Donoho. To summarize: if you acquire a signal in some basis that is incoherent with the basis in which you know the signal to be sparse, it is very likely you will be able to reconstruct the signal from these incoherent projections. Terry Tao, the recent Fields medalist, does a very nice job at explaining the framework here. He goes further into the theory in this post, where he mentions the central issue of the Uniform Uncertainty Principle. It so happens that random projections are on average incoherent, within the UUP meaning, with most known bases (sines, polynomials, splines, wavelets, curvelets …) and are therefore an ideal basis for Compressed Sensing. [For more in-depth information on the subject, the Rice group has done a very good job at providing a central library of papers relevant to the growing subject: http://www.dsp.ece.rice.edu/cs/ ] The Machine
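To make the recovery claim concrete, here is a small orthogonal matching pursuit sketch: recovering a sparse signal from a few random (hence, on average, incoherent) projections. OMP is one simple recovery method; the framework is usually stated via L1 minimization, so this illustrates the regime rather than the canonical algorithm.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: recover a k-sparse x from y = A @ x.
    A greedy alternative to L1 minimization; illustrative only."""
    residual, support = y.astype(float), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))  # most correlated column
        if j not in support:
            support.append(j)
        # Re-fit the signal on the chosen support, update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# Example: 30 random projections recover a 3-sparse signal in R^100.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 100)) / np.sqrt(30)
x_true = np.zeros(100)
x_true[[5, 42, 77]] = [1.0, -2.0, 0.5]
x_hat = omp(A, A @ x_true, k=3)
```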

4 0.88020837 162 hunch net-2006-03-09-Use of Notation

Introduction: For most people, a mathematical notation is like a language: you learn it and stick with it. For people doing mathematical research, however, this is not enough: they must design new notations for new problems. The design of good notation is both hard and worthwhile, since a bad initial notation can retard a line of research greatly. Before we had mathematical notation, equations were all written out in language. Since words have multiple meanings and variable precedences, long equations written out in language can be extraordinarily difficult and sometimes fundamentally ambiguous. A good representative example of this is the legalese in the tax code. Since we want greater precision and clarity, we adopt mathematical notation. One fundamental thing to understand about mathematical notation is that humans, as logic verifiers, are barely capable. This is the fundamental reason why one notation can be much better than another. This observation is easier to miss than you might

5 0.83172888 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

Introduction: Bob Williamson and I are the learning theory PC members at NIPS this year. This is some attempt to state the standards and tests I applied to the papers. I think it is a good idea to talk about this for two reasons: Making community standards a matter of public record seems healthy. It gives us a chance to debate what is and is not the right standard. It might even give us a bit more consistency across the years. It may save us all time. There are a number of papers submitted which just aren’t there yet. Avoiding submitting is the right decision in this case. There are several criteria for judging a paper. All of these were active this year. Some criteria are uncontroversial while others may be less so. The paper must have a theorem establishing something new for which it is possible to derive high confidence in the correctness of the results. A surprising number of papers fail this test. This criterion seems essential to the definition of “theory”. Missing theo

6 0.78273278 423 hunch net-2011-02-02-User preferences for search engines

7 0.77936447 131 hunch net-2005-11-16-The Everything Ensemble Edge

8 0.77679312 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

9 0.77473027 12 hunch net-2005-02-03-Learning Theory, by assumption

10 0.77297646 14 hunch net-2005-02-07-The State of the Reduction

11 0.76014251 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

12 0.75902742 41 hunch net-2005-03-15-The State of Tight Bounds

13 0.7580688 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

14 0.75769496 19 hunch net-2005-02-14-Clever Methods of Overfitting

15 0.75545871 370 hunch net-2009-09-18-Necessary and Sufficient Research

16 0.75307775 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

17 0.74949926 194 hunch net-2006-07-11-New Models

18 0.74909008 347 hunch net-2009-03-26-Machine Learning is too easy

19 0.74716336 235 hunch net-2007-03-03-All Models of Learning have Flaws

20 0.74705535 201 hunch net-2006-08-07-The Call of the Deep