171 nips-2006-Sample Complexity of Policy Search with Known Dynamics

Source: pdf

Author: Peter L. Bartlett, Ambuj Tewari

Abstract: We consider methods that try to find a good policy for a Markov decision process by choosing one from a given class. The policy is chosen based on its empirical performance in simulations. We are interested in conditions on the complexity of the policy class that ensure the success of such simulation-based policy search methods. We show that under bounds on the amount of computation involved in computing policies, transition dynamics and rewards, uniform convergence of empirical estimates to true value functions occurs. Previously, such results were derived by assuming boundedness of pseudodimension and Lipschitz continuity. These assumptions and ours are both stronger than the usual combinatorial complexity measures. We show, via minimax inequalities, that this is essential: boundedness of pseudodimension or fat-shattering dimension alone is not sufficient.
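As a rough illustration of the simulation-based policy search the abstract describes, the following minimal sketch estimates each policy's value from simulated trajectories and picks the empirical best. Everything in it is an assumption made for illustration and is not taken from the paper: the toy one-dimensional chain MDP, the reward function, the threshold policy class, and the names `simulate_return`, `make_threshold_policy`, `empirical_value` and `policy_search` are all hypothetical.

```python
import random


def simulate_return(policy, seed, horizon=30, gamma=0.9):
    """Discounted return of one simulated trajectory on a toy chain MDP.

    The dynamics and reward below are illustrative assumptions, not the
    paper's construction.
    """
    rng = random.Random(seed)  # fixed seed makes the trajectory reproducible
    state, total, discount = 0.0, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)  # action is -1 or +1
        # Known dynamics: drift in the chosen direction plus Gaussian noise.
        state = state + 0.1 * action + rng.gauss(0.0, 0.05)
        state = max(-1.0, min(1.0, state))  # keep the state in [-1, 1]
        reward = 1.0 - abs(state - 0.5)  # reward peaks at state 0.5
        total += discount * reward
        discount *= gamma
    return total


def make_threshold_policy(theta):
    """Policy class: move right while below the threshold, else move left."""
    return lambda s: 1 if s < theta else -1


def empirical_value(policy, n_trajectories=200):
    """Average simulated return; every policy is scored on the same seeds."""
    returns = [simulate_return(policy, seed) for seed in range(n_trajectories)]
    return sum(returns) / n_trajectories


def policy_search(thetas, n_trajectories=200):
    """Return the threshold with the highest empirical value, plus all values."""
    values = {
        theta: empirical_value(make_threshold_policy(theta), n_trajectories)
        for theta in thetas
    }
    best = max(values, key=values.get)
    return best, values


if __name__ == "__main__":
    best_theta, estimates = policy_search([-0.5, 0.0, 0.5, 1.0])
    print("selected threshold:", best_theta)
    for theta, value in sorted(estimates.items()):
        print(f"theta={theta:+.1f}  empirical value={value:.3f}")
```

The selected policy is only as good as the empirical estimates are uniformly close to the true values over the whole class, which is the uniform convergence question the abstract raises. With four candidate thresholds this is easy; the sketch says nothing about richer policy classes.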

reference text

[1] Alon, N., Ben-David, S., Cesa-Bianchi, N. & Haussler, D. (1997) Scale-sensitive Dimensions, Uniform Convergence, and Learnability. Journal of the ACM 44(4):615–631.

[2] Anthony, M. & Bartlett, P.L. (1999) Neural Network Learning: Theoretical Foundations. Cambridge University Press.

[3] Blum, L., Cucker, F., Shub, M. & Smale, S. (1998) Complexity and Real Computation. Springer-Verlag.

[4] Goldberg, P.W. & Jerrum, M.R. (1995) Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers. Machine Learning 18(2-3):131–148.

[5] Haussler, D. (1992) Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications. Information and Computation 100:78–150.

[6] Jain, R. & Varaiya, P. (2006) Simulation-based Uniform Value Function Estimates of Discounted and Average-reward MDPs. SIAM Journal on Control and Optimization, to appear.

[7] Ng, A.Y. & Jordan, M.I. (2000) PEGASUS: A Policy Search Method for MDPs and POMDPs. In Proceedings of the 16th Annual Conference on Uncertainty in Artificial Intelligence, pp. 405–415. Morgan Kaufmann Publishers.

[8] Pollard, D. (1990) Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Volume 2.