nips nips2012 nips2012-160 nips2012-160-reference knowledge-graph by maker-knowledge-mining

160 nips-2012-Imitation Learning by Coaching


Source: pdf

Author: He He, Jason Eisner, Hal Daume

Abstract: Imitation Learning has been shown to be successful in solving many challenging real-world problems. Some recent approaches give strong performance guarantees by training the policy iteratively. However, it is important to note that these guarantees depend on how well the policy we found can imitate the oracle on the training data. When there is a substantial difference between the oracle’s ability and the learner’s policy space, we may fail to find a policy that has low error on the training set. In such cases, we propose to use a coach that demonstrates easy-to-learn actions for the learner and gradually approaches the oracle. By a reduction of learning by demonstration to online learning, we prove that coaching can yield a lower regret bound than using the oracle. We apply our algorithm to cost-sensitive dynamic feature selection, a hard decision problem that considers a user-specified accuracy-cost trade-off. Experimental results on UCI datasets show that our method outperforms state-of-the-art imitation learning methods in dynamic feature selection and two static feature selection methods. 1


reference text

[1] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In ICML, 2004. 8

[2] M. Veloso B. D. Argall, S. Chernova and B. Browning. A survey of robot learning from demonstration. 2009.

[3] Stéphane. Ross, Geoffrey J. Gordon, and J. Andrew. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, 2011.

[4] D. McAllester, T. Hazan, and J. Keshet. Direct loss minimization for structured prediction. In NIPS, 2010.

[5] D. Klein P. Liang, A. Bouchard-Ct and B. Taskar. An end-to-end discriminative approach to machine translation. In ACL, 2006.

[6] D. Chiang, Y. Marton, and P. Resnik. Online large-margin training of syntactic and structural translation features. In EMNLP, 2008.

[7] R. Busa-Fekete D. Benbouzid and B. Kégl. Fast classification using space decision dags. In ICML, 2012.

[8] P. Preux G. Dulac-Arnold, L. Denoyer and P. Gallinari. Datum-wise classification: a sequential approach to sparsity. In ECML, 2011.

[9] Stéphane Ross and J. Andrew Bagnell. Efficient reductions for imitation learning. In AISTATS, 2010.

[10] Kääriäinen. Lower bounds for reductions. In Atomic Learning Workshop, 2006.

[11] Hal Daumé III, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning Journal (MLJ), 2009.

[12] Elad Hazan, Adam Kalai, Satyen Kale, and Amit Agarwal. Logarithmic regret algorithms for online convex optimization. In COLT, pages 499–513, 2006.

[13] Sham M. Kakade and Shai Shalev-shwartz. Mind the duality gap: Logarithmic regret algorithms for online optimization. In NIPS, 2008.

[14] Q. Le C. B. Do and C.S. Foo. Proximal regularization for online and batch learning. In ICML, 2009.

[15] H Brendan Mcmahan. Follow-the-regularized-leader and mirror descent : Equivalence theorems and l1 regularization. JMLR, 15:525–533, 2011. 9