
280 nips-2013-Robust Data-Driven Dynamic Programming


Source: pdf

Author: Grani Adiwena Hanasusanto, Daniel Kuhn

Abstract: In stochastic optimal control the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly as they determine the evaluation points of the cost-to-go functions. The resulting data-driven DP scheme is asymptotically consistent and admits an efficient computational solution when combined with parametric value function approximations. If training data is sparse, however, the estimated cost-to-go functions display high variability and an optimistic bias, while the corresponding control policies perform poorly in out-of-sample tests. To mitigate these small-sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate. We show that the arising min-max problems in the DP recursions reduce to tractable conic programs. We also demonstrate that the proposed robust DP algorithm dominates various non-robust schemes in out-of-sample tests across several application domains.
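The abstract combines two concrete ingredients: a Nadaraya-Watson kernel estimate of the conditional expectations in the DP recursions [17, 18], and a robust (worst-case) expectation over distributions close to that estimate. The Python sketch below is a minimal illustration, not the authors' implementation: the Gaussian kernel, the chi-square ambiguity set, and all function names are assumptions made here for concreteness. For a chi-square ball small enough that the nonnegativity constraints stay inactive, the worst-case expectation has the closed form mean + sqrt(radius * variance); the paper instead derives exact conic reformulations of the min-max problems.

import numpy as np

def nw_weights(x, X, bandwidth=0.5):
    # Nadaraya-Watson weights of the historical samples X at a query state x,
    # using a Gaussian kernel; the bandwidth is a free smoothing parameter.
    k = np.exp(-0.5 * ((X - x) / bandwidth) ** 2)
    return k / k.sum()

def robust_expectation(costs, p_hat, radius=0.05):
    # Worst-case expectation of `costs` over all weight vectors q with
    # chi-square distance sum((q_i - p_i)^2 / p_i) <= radius around p_hat.
    # Ignoring q >= 0 (valid for small radius), the maximizer shifts mass
    # toward high-cost samples and the optimum is mean + sqrt(radius * var).
    mean = p_hat @ costs
    var = p_hat @ (costs - mean) ** 2
    return mean + np.sqrt(radius * var)

# Toy example: noisy quadratic costs observed along historical sample paths.
rng = np.random.default_rng(0)
X = rng.normal(size=200)                # observed exogenous states
costs = X**2 + rng.normal(size=200)     # cost-to-go evaluated on the paths
p = nw_weights(0.0, X)                  # kernel weights at query state x = 0
print(p @ costs)                        # plain data-driven estimate
print(robust_expectation(costs, p))     # robust worst-case estimate

In the full scheme, such a worst-case value would replace the plain kernel estimate of the cost-to-go inside each DP recursion, which is what tempers the optimistic bias described in the abstract.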


reference text

[1] D.P. Bertsekas. Dynamic Programming and Optimal Control, Vol. II. Athena Scientific, 3rd edition, 2007.

[2] W.B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley-Blackwell, 2007.

[3] L. Devroye. The uniform convergence of the Nadaraya-Watson regression function estimate. Canadian Journal of Statistics, 6(2):179–191, 1978.

[4] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press, 2009.

[5] A. Keshavarz and S. Boyd. Quadratic approximate dynamic programming for input-affine systems. International Journal of Robust and Nonlinear Control, 2012. Forthcoming.

[6] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, 2005.

[7] G. Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30(2):257–280, 2005.

[8] S. Mannor, O. Mebel, and H. Xu. Lightning does not strike twice: Robust MDPs with coupled uncertainty. In Proceedings of the 29th International Conference on Machine Learning, pages 385–392, 2012.

[9] W. Wiesemann, D. Kuhn, and B. Rustem. Robust Markov decision processes. Mathematics of Operations Research, 38(1):153–183, 2013.

[10] M.G. Lagoudakis and R. Parr. Least-squares policy iteration. The Journal of Machine Learning Research, 4:1107–1149, 2003.

[11] D.P. Bertsekas. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3):310–335, 2011.

[12] C.E. Rasmussen and M. Kuss. Gaussian processes in reinforcement learning. In Advances in Neural Information Processing Systems, pages 751–759, 2004.

[13] L. Buşoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuška, and B. De Schutter. Least-squares methods for policy iteration. In Reinforcement Learning, pages 75–109. Springer, 2012.

[14] X. Xu, T. Xie, D. Hu, and X. Lu. Kernel least-squares temporal difference learning. International Journal of Information Technology, 11(9):54–63, 2005.

[15] Y. Engel, S. Mannor, and R. Meir. Reinforcement learning with Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, pages 201–208, 2005.

[16] G. Taylor and R. Parr. Kernelized value function approximation for reinforcement learning. In Proceedings of the 26th International Conference on Machine Learning, pages 1017–1024, 2009.

[17] E.A. Nadaraya. On estimating regression. Theory of Probability & Its Applications, 9(1):141–142, 1964.

[18] G.S. Watson. Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26(4):359–372, 1964.

[19] B. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.

[20] L. Hannah, W.B. Powell, and D. Blei. Nonparametric density estimation for stochastic optimization with an observable state variable. In Advances in Neural Information Processing Systems, pages 820–828, 2010.

[21] A.B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.

[22] L. Pardo. Statistical Inference Based on Divergence Measures, volume 185 of Statistics: A Series of Textbooks and Monographs. Chapman and Hall/CRC, 2005.

[23] Z. Wang, P.W. Glynn, and Y. Ye. Likelihood robust optimization for data-driven newsvendor problems. Technical report, Stanford University, 2009.

[24] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.

[25] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.

[26] J.F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11–12:625–654, 1999.

[27] J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, 2004.

[28] L. Hannah and D. Dunson. Approximate dynamic programming for storage problems. In Proceedings of the 28th International Conference on Machine Learning, pages 337–344, 2011.

[29] J.H. Kim and W.B. Powell. Optimal energy commitments with storage and intermittent supply. Operations Research, 59(6):1347–1360, 2011.

[30] M. Kraning, Y. Wang, E. Akuiyibo, and S. Boyd. Operation and configuration of a storage portfolio via convex optimization. In Proceedings of the IFAC World Congress, pages 10487–10492, 2011.