hunch_net hunch_net-2005 hunch_net-2005-101 knowledge-graph by maker-knowledge-mining

101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control


meta info for this blog

Source: html

Introduction: Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. The basic idea of this algorithm is: (1) Collect data from a human controlling a machine. (2) Build a transition model based upon the experience. (3) Build a policy which optimizes the transition model. (4) Evaluate the policy; if it works well, halt, otherwise add the experience into the pool and go to (2). The paper proves that this technique will converge to some policy with expected performance near human expected performance, assuming the world fits certain assumptions (MDP or linear dynamics). This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. To see (b), note that we can create intricate reinforcement learning problems where a particular sequence of actions must be taken to achieve a goal. A novice might be able to memorize this sequence given just one demonstration, even though it would require experience exponential in the length of the sequence to discover the key sequence accidentally. Andrew Ng’s group has exploited this to make this very fun picture. (Yeah, that’s a helicopter flying upside down, under computer control.) As far as this particular paper, one question occurs to me. There is a general principle of learning which says we should avoid “double approximation”, such as occurs in step (3) where we build an approximate policy on an approximate model. Is there a way to fuse steps (2) and (3) to achieve faster or better learning?
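
For concreteness, here is a minimal sketch of that loop in Python. The model fitting, planning, and evaluation routines are left as caller-supplied functions, since the paper's actual choices depend on the system being controlled; this is an illustration of the structure of steps (1)-(4), not the paper's implementation.

def apprenticeship_rl(expert_trajectories, fit_model, plan, evaluate,
                      performance_target, max_iters=50):
    # (1) seed the experience pool with the human demonstrations
    experience = list(expert_trajectories)
    policy = None
    for _ in range(max_iters):
        # (2) build a transition model from all experience collected so far
        model = fit_model(experience)
        # (3) build a policy which optimizes the learned model
        policy = plan(model)
        # (4) evaluate the policy on the real system
        trajectory, performance = evaluate(policy)
        if performance >= performance_target:
            break  # works well enough: halt
        # otherwise add the new experience to the pool and go back to (2)
        experience.append(trajectory)
    return policy

The paper's guarantee corresponds to this loop terminating with a policy whose expected performance is near the expert's, under the stated assumptions (MDP or linear dynamics).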


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. [sent-1, score-0.09]

2 The basic idea of this algorithm is: Collect data from a human controlling a machine. [sent-2, score-0.308]

3 Build a transition model based upon the experience. [sent-3, score-0.202]

4 Build a policy which optimizes the transition model. [sent-4, score-0.51]

5 If it works well, halt, otherwise add the experience into the pool and go to (2). [sent-6, score-0.192]

6 The paper proves that this technique will converge to some policy with expected performance near human expected performance assuming the world fits certain assumptions (MDP or linear dynamics). [sent-7, score-1.229]

7 This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. [sent-8, score-0.339] [sent-10, score-0.174]

9 For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. [sent-11, score-0.253]

10 To see (b), note that we can create intricate reinforcement learning problems where a particular sequence of actions must be taken to achieve a goal. [sent-12, score-0.69]

11 A novice might be able to memorize this sequence given just one demonstration even though it would require experience exponential in the length of the sequence to discover the key sequence accidentally. [sent-13, score-1.452]

12 Andrew Ng’s group has exploited this to make this very fun picture. [sent-14, score-0.205]

13 (Yeah, that’s a helicopter flying upside down, under computer control.) [sent-15, score-0.378]

14 As far as this particular paper, one question occurs to me. [sent-16, score-0.248]

15 There is a general principle of learning which says we should avoid “double approximation”, such as occurs in step (3) where we build an approximate policy on an approximate model. [sent-17, score-0.973]

16 Is there a way to fuse steps (2) and (3) to achieve faster or better learning? [sent-18, score-0.317]
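
A rough way to quantify the cost of this double approximation is a standard simulation-lemma style bound, stated here only as an illustration (constants vary across references, and this is not a claim from the paper): if the learned model has per-step transition error at most e_model and the planner returns a policy that is e_plan-suboptimal within that model, then over horizon H with rewards bounded by R_max the resulting policy pi_hat satisfies roughly

V(pi_hat) >= V(pi*) - e_plan - c * H^2 * R_max * e_model

for a small constant c. Model error gets amplified by a factor on the order of H^2 during planning, which is one reason fusing steps (2) and (3) could plausibly give faster or better learning.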


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sequence', 0.286), ('apprenticeship', 0.253), ('build', 0.205), ('transition', 0.202), ('policy', 0.198), ('ng', 0.184), ('expert', 0.171), ('occurs', 0.167), ('andrew', 0.16), ('approximate', 0.157), ('exploited', 0.126), ('flying', 0.126), ('helicopter', 0.126), ('memorize', 0.126), ('upside', 0.126), ('reinforcement', 0.123), ('human', 0.121), ('achieve', 0.118), ('expected', 0.118), ('abbeel', 0.117), ('demonstration', 0.117), ('fuse', 0.117), ('pieter', 0.117), ('optimizes', 0.11), ('collect', 0.101), ('controlling', 0.101), ('proves', 0.101), ('performance', 0.1), ('experience', 0.1), ('incorporating', 0.098), ('evaluate', 0.098), ('fits', 0.098), ('converge', 0.098), ('dynamics', 0.098), ('mdp', 0.098), ('length', 0.095), ('pool', 0.092), ('paper', 0.09), ('principle', 0.089), ('assuming', 0.087), ('idea', 0.086), ('teaching', 0.082), ('steps', 0.082), ('actions', 0.082), ('particular', 0.081), ('approximation', 0.08), ('discover', 0.079), ('fun', 0.079), ('exponential', 0.077), ('humans', 0.076)]
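
As an aside, a term-weight list like the one above can be reproduced with a few lines of scikit-learn. The snippet below is a hypothetical sketch (the function name and the choice of TfidfVectorizer are assumptions, not a description of the tool that actually generated these numbers):

from sklearn.feature_extraction.text import TfidfVectorizer

def top_tfidf_terms(post_text, other_posts, n=50):
    # Fit tf-idf over the other posts plus this one, then rank this post's terms.
    vectorizer = TfidfVectorizer(stop_words="english")
    weights = vectorizer.fit_transform(other_posts + [post_text])[-1].toarray().ravel()
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, weights), key=lambda tw: -tw[1])
    return [(term, round(w, 3)) for term, w in ranked[:n] if w > 0]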

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control


2 0.16069943 188 hunch net-2006-06-30-ICML papers

Introduction: Here are some ICML papers which interested me. Arindam Banerjee had a paper which notes that PAC-Bayes bounds, a core theorem in online learning, and the optimality of Bayesian learning statements share a core inequality in their proof. Pieter Abbeel, Morgan Quigley and Andrew Y. Ng have a paper discussing RL techniques for learning given a bad (but not too bad) model of the world. Nina Balcan and Avrim Blum have a paper which discusses how to learn given a similarity function rather than a kernel. A similarity function requires less structure than a kernel, implying that a learning algorithm using a similarity function might be applied in situations where no effective kernel is evident. Nathan Ratliff, Drew Bagnell, and Marty Zinkevich have a paper describing an algorithm which attempts to fuse A* path planning with learning of transition costs based on human demonstration. Papers (2), (3), and (4) all seem like an initial pass at solving in

3 0.13092262 183 hunch net-2006-06-14-Explorations of Exploration

Introduction: Exploration is one of the big unsolved problems in machine learning. This isn’t for lack of trying—there are many models of exploration which have been analyzed in many different ways by many different groups of people. At some point, it is worthwhile to sit back and see what has been done across these many models. Reinforcement Learning (1). Reinforcement learning has traditionally focused on Markov Decision Processes where the next state s’ is given by a conditional distribution P(s’|s,a) given the current state s and action a. The typical result here is that certain specific algorithms controlling an agent can behave within e of optimal for horizon T except for poly(1/e,T,S,A) “wasted” experiences (with high probability). This started with E^3 by Satinder Singh and Michael Kearns. Sham Kakade’s thesis has significant discussion. Extensions have typically been of the form “under extra assumptions, we can prove more”, for example Factored-E^3 and

4 0.13062106 220 hunch net-2006-11-27-Continuizing Solutions

Introduction: This post is about a general technique for problem solving which I’ve never seen taught (in full generality), but which I’ve found very useful. Many problems in computer science turn out to be discretely difficult. The best known versions of such problems are NP-hard problems, but I mean ‘discretely difficult’ in a much more general way, which I only know how to capture by examples. ERM In empirical risk minimization, you choose a minimum error rate classifier from a set of classifiers. This is NP-hard for common sets, but it can be much harder, depending on the set. Experts In the online learning with experts setting, you try to predict well so as to compete with a set of (adversarial) experts. Here the alternating quantifiers of you and an adversary playing out a game can yield a dynamic programming problem that grows exponentially. Policy Iteration The problem with policy iteration is that you learn a new policy with respect to an old policy, which implies that sim

5 0.12741691 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Adam Coates, Honglak Lee, Andrew Y. Ng (AISTATS 2011) The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization, Adam Coates, Andrew Y. Ng (ICML 2011) Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al. who show that certain convolutional pooling architectures can obtain close to state-of-the-art pe

6 0.1189553 388 hunch net-2010-01-24-Specializations of the Master Problem

7 0.11677687 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

8 0.10872679 14 hunch net-2005-02-07-The State of the Reduction

9 0.10784483 8 hunch net-2005-02-01-NIPS: Online Bayes

10 0.10612175 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

11 0.099856906 45 hunch net-2005-03-22-Active learning

12 0.097319178 77 hunch net-2005-05-29-Maximum Margin Mismatch?

13 0.093261749 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

14 0.090337485 235 hunch net-2007-03-03-All Models of Learning have Flaws

15 0.089878373 403 hunch net-2010-07-18-ICML & COLT 2010

16 0.08481735 269 hunch net-2007-10-24-Contextual Bandits

17 0.084008545 133 hunch net-2005-11-28-A question of quantification

18 0.082792304 97 hunch net-2005-07-23-Interesting papers at ACL

19 0.081961147 165 hunch net-2006-03-23-The Approximation Argument

20 0.078366086 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.197), (1, 0.059), (2, 0.017), (3, 0.022), (4, 0.074), (5, -0.055), (6, 0.007), (7, 0.017), (8, 0.065), (9, -0.01), (10, 0.024), (11, -0.034), (12, -0.043), (13, -0.025), (14, 0.011), (15, 0.022), (16, 0.037), (17, -0.059), (18, 0.068), (19, -0.02), (20, -0.038), (21, -0.06), (22, -0.089), (23, -0.01), (24, 0.056), (25, 0.11), (26, -0.068), (27, 0.038), (28, 0.015), (29, 0.039), (30, 0.007), (31, -0.039), (32, 0.005), (33, -0.05), (34, 0.074), (35, -0.009), (36, 0.077), (37, 0.108), (38, 0.109), (39, -0.085), (40, 0.023), (41, -0.078), (42, 0.102), (43, 0.039), (44, 0.143), (45, 0.003), (46, 0.057), (47, -0.044), (48, -0.029), (49, 0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95222634 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control


2 0.64751655 192 hunch net-2006-07-08-Some recent papers

Introduction: It was a fine time for learning in Pittsburgh. John and Sam mentioned some of my favorites. Here’s a few more worth checking out: Online Multitask Learning Ofer Dekel, Phil Long, Yoram Singer This is on my reading list. Definitely an area I’m interested in. Maximum Entropy Distribution Estimation with Generalized Regularization Miroslav Dudík, Robert E. Schapire Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path András Antos, Csaba Szepesvári, Rémi Munos Again, on the list to read. I saw Csaba and Remi talk about this and related work at an ICML Workshop on Kernel Reinforcement Learning. The big question in my head is how this compares/contrasts with existing work in reductions to reinforcement learning. Are there advantages/disadvantages? Higher Order Learning On Graphs by Sameer Agarwal, Kristin Branson, and Serge Belongie, looks to be interesting. They seem to poo-poo “tensorization

3 0.6033566 77 hunch net-2005-05-29-Maximum Margin Mismatch?

Introduction: John makes a fascinating point about structured classification (and slightly scooped my post!). Maximum Margin Markov Networks (M3N) are an interesting example of the second class of structured classifiers (where the classification of one label depends on the others), and one of my favorite papers. I’m not alone: the paper won the best student paper award at NIPS in 2003. There are some things I find odd about the paper. For instance, it says of probabilistic models “cannot handle high dimensional feature spaces and lack strong theoretical guarantees.” I’m aware of no such limitations. Also: “Unfortunately, even probabilistic graphical models that are trained discriminatively do not achieve the same level of performance as SVMs, especially when kernel features are used.” This is quite interesting and contradicts my own experience as well as that of a number of people I greatly respect. I wonder what the root cause is: perhaps there is something different abo

4 0.58737153 188 hunch net-2006-06-30-ICML papers

Introduction: Here are some ICML papers which interested me. Arindam Banerjee had a paper which notes that PAC-Bayes bounds, a core theorem in online learning, and the optimality of Bayesian learning statements share a core inequality in their proof. Pieter Abbeel, Morgan Quigley and Andrew Y. Ng have a paper discussing RL techniques for learning given a bad (but not too bad) model of the world. Nina Balcan and Avrim Blum have a paper which discusses how to learn given a similarity function rather than a kernel. A similarity function requires less structure than a kernel, implying that a learning algorithm using a similarity function might be applied in situations where no effective kernel is evident. Nathan Ratliff, Drew Bagnell, and Marty Zinkevich have a paper describing an algorithm which attempts to fuse A* path planning with learning of transition costs based on human demonstration. Papers (2), (3), and (4) all seem like an initial pass at solving in

5 0.58558047 220 hunch net-2006-11-27-Continuizing Solutions

Introduction: This post is about a general technique for problem solving which I’ve never seen taught (in full generality), but which I’ve found very useful. Many problems in computer science turn out to be discretely difficult. The best known versions of such problems are NP-hard problems, but I mean ‘discretely difficult’ in a much more general way, which I only know how to capture by examples. ERM In empirical risk minimization, you choose a minimum error rate classifier from a set of classifiers. This is NP-hard for common sets, but it can be much harder, depending on the set. Experts In the online learning with experts setting, you try to predict well so as to compete with a set of (adversarial) experts. Here the alternating quantifiers of you and an adversary playing out a game can yield a dynamic programming problem that grows exponentially. Policy Iteration The problem with policy iteration is that you learn a new policy with respect to an old policy, which implies that sim

6 0.58339435 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

7 0.53517342 189 hunch net-2006-07-05-more icml papers

8 0.53276932 183 hunch net-2006-06-14-Explorations of Exploration

9 0.53248841 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

10 0.53125298 276 hunch net-2007-12-10-Learning Track of International Planning Competition

11 0.51530409 169 hunch net-2006-04-05-What is state?

12 0.50774485 100 hunch net-2005-08-04-Why Reinforcement Learning is Important

13 0.5032348 388 hunch net-2010-01-24-Specializations of the Master Problem

14 0.50204045 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

15 0.47678858 87 hunch net-2005-06-29-Not EM for clustering at COLT

16 0.47644922 148 hunch net-2006-01-13-Benchmarks for RL

17 0.4670426 8 hunch net-2005-02-01-NIPS: Online Bayes

18 0.45601249 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

19 0.45266959 32 hunch net-2005-02-27-Antilearning: When proximity goes bad

20 0.44708413 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(23, 0.293), (27, 0.271), (53, 0.115), (55, 0.092), (77, 0.048), (94, 0.037), (95, 0.051)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.88555264 101 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control


2 0.84332228 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

Introduction: I learned a number of things at NIPS. The financial people were there in greater force than previously. Two Sigma sponsored NIPS while DRW Trading had a booth. The adversarial machine learning workshop had a number of talks about interesting applications where an adversary really is out to try and mess up your learning algorithm. This is very different from the situation we often think of where the world is oblivious to our learning. This may present new and convincing applications for the learning-against-an-adversary work common at COLT. There were several interesting papers. Sanjoy Dasgupta, Daniel Hsu, and Claire Monteleoni had a paper on General Agnostic Active Learning. The basic idea is that active learning can be done via reduction to a form of supervised learning problem. This is great, because we have many supervised learning algorithms from which the benefits of active learning may be derived. Joseph Bradley and Robert Schapire had a P

3 0.783746 347 hunch net-2009-03-26-Machine Learning is too easy

Introduction: One of the remarkable things about machine learning is how diverse it is. The viewpoints of Bayesian learning, reinforcement learning, graphical models, supervised learning, unsupervised learning, genetic programming, etc… share little enough overlap that many people can and do make their careers within one without touching, or even necessarily understanding the others. There are two fundamental reasons why this is possible. For many problems, many approaches work in the sense that they do something useful. This is true empirically, where for many problems we can observe that many different approaches yield better performance than any constant predictor. It’s also true in theory, where we know that for any set of predictors representable in a finite amount of RAM, minimizing training error over the set of predictors does something nontrivial when there are a sufficient number of examples. There is nothing like a unifying problem defining the field. In many other areas there

4 0.70718199 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

Introduction: Yann LeCun and I are coteaching a class on Large Scale Machine Learning starting late January at NYU . This class will cover many tricks to get machine learning working well on datasets with many features, examples, and classes, along with several elements of deep learning and support systems enabling the previous. This is not a beginning class—you really need to have taken a basic machine learning class previously to follow along. Students will be able to run and experiment with large scale learning algorithms since Yahoo! has donated servers which are being configured into a small scale Hadoop cluster. We are planning to cover the frontier of research in scalable learning algorithms, so good class projects could easily lead to papers. For me, this is a chance to teach on many topics of past research. In general, it seems like researchers should engage in at least occasional teaching of research, both as a proof of teachability and to see their own research through th

5 0.70413953 225 hunch net-2007-01-02-Retrospective

Introduction: It’s been almost two years since this blog began. In that time, I’ve learned enough to shift my expectations in several ways. Initially, the idea was for a general purpose ML blog where different people could contribute posts. What has actually happened is most posts come from me, with a few guest posts that I greatly value. There are a few reasons I see for this. Overload . A couple years ago, I had not fully appreciated just how busy life gets for a researcher. Making a post is not simply a matter of getting to it, but rather of prioritizing between {writing a grant, finishing an overdue review, writing a paper, teaching a class, writing a program, etc…}. This is a substantial transition away from what life as a graduate student is like. At some point the question is not “when will I get to it?” but rather “will I get to it?” and the answer starts to become “no” most of the time. Feedback failure . This blog currently receives about 3K unique visitors per day from

6 0.7033177 293 hunch net-2008-03-23-Interactive Machine Learning

7 0.702591 220 hunch net-2006-11-27-Continuizing Solutions

8 0.70257068 370 hunch net-2009-09-18-Necessary and Sufficient Research

9 0.70100522 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

10 0.69948226 12 hunch net-2005-02-03-Learning Theory, by assumption

11 0.69887596 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

12 0.69798142 201 hunch net-2006-08-07-The Call of the Deep

13 0.69539529 483 hunch net-2013-06-10-The Large Scale Learning class notes

14 0.69431692 194 hunch net-2006-07-11-New Models

15 0.69337839 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

16 0.69267213 134 hunch net-2005-12-01-The Webscience Future

17 0.69191885 235 hunch net-2007-03-03-All Models of Learning have Flaws

18 0.69059193 22 hunch net-2005-02-18-What it means to do research.

19 0.69055837 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

20 0.69038945 258 hunch net-2007-08-12-Exponentiated Gradient