
353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning


Source: pdf

Author: Trung Nguyen, Tomi Silander, Tze Y. Leong

Abstract: We study how to automatically select and adapt multiple abstractions or representations of the world to support model-based reinforcement learning. We address the challenges of transfer learning in heterogeneous environments with varying tasks. We present an efficient, online framework that, through a sequence of tasks, learns a set of relevant representations to be used in future tasks. Without predefined mapping strategies, we introduce a general approach to support transfer learning across different state spaces. We demonstrate the potential impact of our system through improved jumpstart and faster convergence to a near-optimal policy in two benchmark domains.
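The sketch below is a minimal, hypothetical illustration of the general idea described in the abstract, not the paper's actual algorithm: several candidate transition models, each built over a different state abstraction, are scored online with a prequential log-loss (in the spirit of Dawid [5]), and the best-scoring representation is selected for use on a new task. All class and function names (`AbstractModel`, `select_model`, the toy domain) are assumptions made for this example.

```python
"""Hypothetical sketch: pick among candidate state abstractions by online prediction score."""
import math
import random
from collections import defaultdict


class AbstractModel:
    """Tabular transition model over one abstraction of the raw state."""

    def __init__(self, name, abstraction, n_outcomes):
        self.name = name
        self.abstraction = abstraction      # raw state -> abstract state
        self.n_outcomes = n_outcomes        # size of the abstract next-state space
        self.counts = defaultdict(lambda: defaultdict(int))
        self.log_loss = 0.0                 # running prequential score (lower is better)

    def prob(self, s, a, s_next):
        key = (self.abstraction(s), a)
        total = sum(self.counts[key].values())
        # Laplace-smoothed predictive probability of the observed abstract successor
        return (self.counts[key][self.abstraction(s_next)] + 1) / (total + self.n_outcomes)

    def observe(self, s, a, s_next):
        self.log_loss -= math.log(self.prob(s, a, s_next))   # score first, then update counts
        self.counts[(self.abstraction(s), a)][self.abstraction(s_next)] += 1


def select_model(models, transitions):
    """Feed the same experience to every candidate and return the best-scoring one."""
    for s, a, s_next in transitions:
        for m in models:
            m.observe(s, a, s_next)
    return min(models, key=lambda m: m.log_loss)


if __name__ == "__main__":
    random.seed(0)

    # Toy domain: state = (x, noise); action "flip" toggles x, the noise bit is irrelevant.
    def step(s, a):
        x, _ = s
        return (1 - x if a == "flip" else x, random.randint(0, 2))

    data, s = [], (0, 0)
    for _ in range(300):
        a = random.choice(["flip", "stay"])
        s_next = step(s, a)
        data.append((s, a, s_next))
        s = s_next

    candidates = [
        AbstractModel("full-state", lambda st: st, n_outcomes=6),
        AbstractModel("x-only", lambda st: st[0], n_outcomes=2),
    ]
    best = select_model(candidates, data)
    print({m.name: round(m.log_loss, 1) for m in candidates}, "->", best.name)
```

In this toy run the coarser "x-only" abstraction accumulates a lower prequential log-loss than the full-state model, illustrating why online scoring can identify which representation is worth transferring; the paper's framework handles this selection and adaptation in a far more general setting.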


reference text

[1] Atkeson, C., Santamaria, J.: A comparison of direct and model-based reinforcement learning. In: ICRA’97. vol. 4, pp. 3557–3564 (1997)

[2] Boutilier, C., Dearden, R., Goldszmidt, M.: Stochastic dynamic programming with factored representations. Artificial Intelligence 121(1–2), 49–107 (2000)

[3] Brafman, R.I., Tennenholtz, M.: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3, 213–231 (2002)

[4] Celiberto, L.A., Matsuura, J.P., de Mántaras, R.L., Bianchi, R.A.C.: Using cases as heuristics in reinforcement learning: A transfer learning application. In: IJCAI’11. pp. 1211–1217 (2011)

[5] Dawid, A.: Statistical theory: The prequential approach. Journal of the Royal Statistical Society A 147, 278–292 (1984)

[6] Doya, K., Samejima, K., Katagiri, K.-i., Kawato, M.: Multiple model-based reinforcement learning. Neural Computation 14(6), 1347–1369 (2002)

[7] Fernández, F., García, J., Veloso, M.: Probabilistic policy reuse for inter-task transfer learning. Robotics and Autonomous Systems 58(7), 866–871 (2010)

[8] Hester, T., Stone, P.: Generalized model learning for reinforcement learning in factored domains. In: AAMAS’09. vol. 2, pp. 717–724 (2009)

[9] Konidaris, G., Barto, A.: Efficient skill learning using abstraction selection. In: IJCAI’09. pp. 1107–1112 (2009)

[10] Leffler, B.R., Littman, M.L., Edmunds, T.: Efficient reinforcement learning with relocatable action models. In: AAAI’07. vol. 1, pp. 572–577 (2007)

[11] McCarthy, J.: Situations, actions, and causal laws. Tech. Rep. Memo 2, Stanford Artificial Intelligence Project, Stanford University (1963)

[12] Savage, L.J.: Elicitation of personal probabilities and expectations. Journal of the American Statistical Association 66(336), 783–801 (1971)

[13] Sharma, M., Holmes, M., Santamaria, J., Irani, A., Isbell, C., Ram, A.: Transfer learning in real-time strategy games using hybrid CBR/RL. In: IJCAI’07. pp. 1041–1046 (2007)

[14] Sherstov, A.A., Stone, P.: Improving action selection in MDP’s via knowledge transfer. In: AAAI’05. vol. 2, pp. 1024–1029 (2005)

[15] da Silva, B.C., Basso, E.W., Bazzan, A.L.C., Engel, P.M.: Dealing with non-stationary environments using context detection. In: ICML’06. pp. 217–224 (2006)

[16] Soni, V., Singh, S.: Using homomorphisms to transfer options across continuous reinforcement learning domains. In: AAAI’06. pp. 494–499 (2006)

[17] Strehl, A.L., Diuk, C., Littman, M.L.: Efficient structure learning in factored-state MDPs. In: AAAI’07. pp. 645–650 (2007)

[18] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)

[19] Taylor, M.E., Jong, N.K., Stone, P.: Transferring instances for model-based reinforcement learning. In: Machine Learning and Knowledge Discovery in Databases. LNAI, vol. 5212 (2008)

[20] Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research 10, 1633–1685 (2009)

[21] Van Seijen, H., Bakker, B., Kester, L.: Switching between different state representations in reinforcement learning. In: Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications. pp. 226–231 (2008)

[22] Walsh, T.J., Li, L., Littman, M.L.: Transferring state abstractions between MDPs. In: ICML Workshop on Structural Knowledge Transfer for Machine Learning (2006)

[23] Wilson, A., Fern, A., Ray, S., Tadepalli, P.: Multi-task reinforcement learning: A hierarchical Bayesian approach. In: ICML’07. pp. 1015–1023 (2007)

[24] Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. In: NIPS’09 (2009)

[25] Yang, H., Xu, Z., King, I., Lyu, M.R.: Online learning for group Lasso. In: ICML’10 (2010)

[26] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(1), 49–67 (2006)

[27] Zhu, X., Ghahramani, Z., Lafferty, J.: Time-sensitive Dirichlet process mixture models. Tech. Rep. CMU-CALD-05-104, School of Computer Science, Carnegie Mellon University (2005)