
NIPS 2006, Paper 47: Boosting Structured Prediction for Imitation Learning


Source: pdf

Author: J. A. Bagnell, Joel Chestnutt, David M. Bradley, Nathan D. Ratliff

Abstract: The Maximum Margin Planning (MMP) algorithm (Ratliff et al., 2006) solves imitation learning problems by learning linear mappings from features to cost functions in a planning domain. The learned policy is the result of minimum-cost planning using these cost functions. These mappings are chosen so that example policies (or trajectories) given by a teacher appear to be lower cost (with a loss-scaled margin) than any other policy for a given planning domain. We provide a novel approach, MMPBoost, based on the functional gradient descent view of boosting (Mason et al., 1999; Friedman, 1999a), that extends MMP by “boosting” in new features. This approach uses simple binary classification or regression to improve the performance of MMP imitation learning, and naturally extends to the class of structured maximum margin prediction problems (Taskar et al., 2005). Our technique is applied to navigation and planning problems for outdoor mobile robots and robotic legged locomotion.
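
As a rough illustration of the functional-gradient view described in the abstract, the sketch below alternates between fitting the linear MMP weights by subgradient descent on the margin objective and fitting a weak regressor to the negative functional gradient of that objective with respect to the per-cell cost, appending the regressor's predictions as a new feature column. It is a minimal sketch only: the planner plan(cost), the per-cell loss_map, the feature array base_features, the teacher mask example_path_mask, and the choice of a depth-3 regression tree are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def mmp_boost(base_features, example_path_mask, plan, loss_map,
              n_rounds=5, step=0.5, reg=1e-2):
    """base_features: (H, W, d) per-cell features; example_path_mask: (H, W) 0/1
    mask of the teacher's path; plan(cost) -> (H, W) 0/1 mask of the min-cost path;
    loss_map: (H, W) per-cell loss for deviating from the example path."""
    H, W, _ = base_features.shape
    feats = base_features.astype(float).copy()
    w = np.zeros(feats.shape[-1])      # linear cost weights
    learners = []                      # boosted feature generators
    for _ in range(n_rounds):
        # 1. Fit linear MMP weights by subgradient descent on the structured hinge
        #    objective: (cost of example path) - (cost minus loss of best competitor).
        for _ in range(50):
            cost = np.maximum(feats @ w, 1e-6)              # keep costs positive for the planner
            aug = plan(np.maximum(cost - loss_map, 1e-6))   # loss-augmented "most offending" path
            grad = ((feats * example_path_mask[..., None]).sum(axis=(0, 1))
                    - (feats * aug[..., None]).sum(axis=(0, 1)) + reg * w)
            w -= step * grad
        # 2. Functional gradient of the objective w.r.t. the cost at each cell:
        #    positive where the example path visits, negative where the current plan visits.
        cost = np.maximum(feats @ w, 1e-6)
        aug = plan(np.maximum(cost - loss_map, 1e-6))
        fgrad = example_path_mask - aug
        # 3. Fit a weak regressor to the negative functional gradient and append its
        #    prediction over the whole map as a new feature for the next round.
        X = feats.reshape(-1, feats.shape[-1])
        h = DecisionTreeRegressor(max_depth=3).fit(X, -fgrad.ravel())
        feats = np.concatenate([feats, h.predict(X).reshape(H, W, 1)], axis=2)
        w = np.append(w, 0.0)           # weight for the new feature, refit next round
        learners.append(h)
    return w, learners

At test time the cost map is the learned linear combination of the base features and each learner's output; the paper applies the same idea with real planners to outdoor mobile-robot navigation and legged locomotion.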


reference text

Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B. (2005). Error limiting reductions between classification tasks. ICML ’05. New York, NY.
Chestnutt, J., Lau, M., Cheng, G., Kuffner, J., Hodgins, J., & Kanade, T. (2005). Footstep planning for the Honda ASIMO humanoid. Proceedings of the IEEE International Conference on Robotics and Automation.
Dietterich, T. G., Ashenfelter, A., & Bulatov, Y. (2004). Training conditional random fields via gradient tree boosting. ICML ’04.
Friedman, J. H. (1999a). Greedy function approximation: A gradient boosting machine. Annals of Statistics.
Hassani, S. (1998). Mathematical physics. Springer.
Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. Advances in Large Margin Classifiers. MIT Press.
Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2006). Maximum margin planning. Twenty-Second International Conference on Machine Learning (ICML ’06).
Taskar, B., Chatalbashev, V., Guestrin, C., & Koller, D. (2005). Learning structured prediction models: A large margin approach. Twenty-Second International Conference on Machine Learning (ICML ’05).
Taskar, B., Guestrin, C., & Koller, D. (2003). Max-margin Markov networks. Advances in Neural Information Processing Systems (NIPS-14).
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Yagi, M., & Lumelsky, V. (1999). Biped robot locomotion in scenes with unknown obstacles. Proceedings of the IEEE International Conference on Robotics and Automation (pp. 375–380). Detroit, MI.

Footnote 4: The best-first quadruped planner under the MMPBoost heuristic is on average approximately 1100 times faster than under the Euclidean heuristic in terms of the number of nodes searched.