
145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability



Author: Keith Bush, Joelle Pineau

Abstract: Interesting real-world datasets often exhibit nonlinear, noisy, continuous-valued states that are unexplorable, are poorly described by first principles, and are only partially observable. If partial observability can be overcome, these constraints suggest the use of model-based reinforcement learning. We experiment with manifold embeddings to reconstruct the observable state-space in the context of offline, model-based reinforcement learning. We demonstrate that the embedding of a system can change as a result of learning, and we argue that the best performing embeddings well-represent the dynamics of both the uncontrolled and adaptively controlled system. We apply this approach to learn a neurostimulation policy that suppresses epileptic seizures on animal brain slices.
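
(Illustrative addition, not from the paper: the state-space reconstruction the abstract refers to is a delay-coordinate embedding in the sense of Takens [24] and Sauer et al. [18]. The short Python sketch below shows one minimal way such an embedding can be built from a scalar observation series; the names delay_embed, dim, and lag are hypothetical, and this is not the authors' implementation.)

import numpy as np

def delay_embed(x, dim, lag):
    # Delay-coordinate (Takens) embedding of a scalar time series:
    # row t of the result is [x[t], x[t + lag], ..., x[t + (dim - 1) * lag]].
    # Illustrative sketch only, not the method used in the paper.
    n = len(x) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("series too short for this (dim, lag) choice")
    # Stack the lagged copies of the series column-wise.
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

# Usage: embed a noisy sine wave with embedding dimension 3 and lag 5.
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.05 * np.random.randn(t.size)
states = delay_embed(x, dim=3, lag=5)  # shape (1990, 3)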


reference text

[1] Christopher G. Atkeson, Andrew W. Moore, and Stefan Schaal. Locally weighted learning for control. Artificial Intelligence Review, 11:75–113, 1997.

[2] Christopher G. Atkeson and Jun Morimoto. Nonparametric representation of policies and value functions: A trajectory-based approach. In Advances in Neural Information Processing Systems, 2003.

[3] M. Bowling, A. Ghodsi, and D. Wilkinson. Action respecting embedding. In Proceedings of ICML, 2005.

[4] F. Lopes da Silva, W. Blanes, S. Kalitzin, J. Parra, P. Suffczynski, and D. Velis. Dynamical diseases of brain systems: Different routes to epileptic seizures. IEEE Transactions on Biomedical Engineering, 50(5):540–548, 2003.

[5] G. D’Arcangelo, G. Panuccio, V. Tancredi, and M. Avoli. Repetitive low-frequency stimulation reduces epileptiform synchronization in limbic neuronal networks. Neurobiology of Disease, 19:119–128, 2005.

[6] Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.

[7] A. Galka. Topics in Nonlinear Time Series Analysis: with implications for EEG Analysis. World Scientific, 2000.

[8] J.P. Huke. Embedding nonlinear dynamical systems: A guide to Takens’ Theorem. Technical report, Manchester Institute for Mathematical Sciences, University of Manchester, March 2006.

[9] K. Jerger and S. Schiff. Periodic pacing an in vitro epileptic focus. Journal of Neurophysiology, 73(2):876–879, 1995.

[10] Nicholas K. Jong and Peter Stone. Model-based function approximation in reinforcement learning. In Proceedings of AAMAS, 2007.

[11] P.W. Keller, S. Mannor, and D. Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. In Proceedings of ICML, 2006.

[12] M. Kennel and H. Abarbanel. False neighbors and false strands: A reliable minimum embedding dimension algorithm. Physical Review E, 66:026209, 2002.

[13] S. Mahadevan and M. Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 8:2169–2231, 2007.

[14] A. K. McCallum. Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester, 1996.

[15] R. Munos and A. Moore. Variable resolution discretization in optimal control. Machine Learning, 49:291–323, 2002.

[16] U. Parlitz and C. Merkwirth. Prediction of spatiotemporal time series based on reconstructed local states. Physical Review Letters, 84(9):1890–1893, 2000.

[17] Jan Peters, Sethu Vijayakumar, and Stefan Schaal. Natural actor-critic. In Proceedings of ECML, 2005.

[18] Tim Sauer, James A. Yorke, and Martin Casdagli. Embedology. Journal of Statistical Physics, 65(3/4):579–616, 1991.

[19] S. Singh, M. L. Littman, N. K. Jong, D. Pardoe, and P. Stone. Learning predictive state representations. In Proceedings of ICML, 2003.

[20] W. Smart. Explicit manifold representations for value-functions in reinforcement learning. In Proceedings of ISAIM, 2004.

[21] J. Stark. Delay embeddings for forced systems. I. Deterministic forcing. Journal of Nonlinear Science, 9:255–332, 1999.

[22] J. Stark, D.S. Broomhead, M.E. Davies, and J. Huke. Delay embeddings for forced systems. II. Stochastic forcing. Journal of Nonlinear Science, 13:519–577, 2003.

[23] R. Sutton and A. Barto. Reinforcement learning: An introduction. The MIT Press, Cambridge, MA, 1998.

[24] F. Takens. Detecting strange attractors in turbulence. In D. A. Rand and L. S. Young, editors, Dynamical Systems and Turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 366–381. Springer, 1981.

[25] D. Wingate and S. Singh. On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. In Proceedings of AAMAS, 2007.