nips nips2009 nips2009-54 nips2009-54-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Emanuel Todorov
Abstract: We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from certain primitives. The primitives are themselves feedback controllers pursuing their own agendas. They are mixed in proportion to how much progress they are making towards their agendas and how compatible their agendas are with the present task. The resulting composite control law is provably optimal when the problem belongs to a certain class. This class is rather general and yet has a number of unique properties – one of which is that the Bellman equation can be made linear even for non-linear or discrete dynamics. This gives rise to the compositionality developed here. In the special case of linear dynamics and Gaussian noise our framework yields analytical solutions (i.e. non-linear mixtures of LQG controllers) without requiring the final cost to be quadratic. More generally, a natural set of control primitives can be constructed by applying SVD to Green’s function of the Bellman equation. We illustrate the theory in the context of human arm movements. The ideas of optimality and compositionality are both very prominent in the field of motor control, yet they have been difficult to reconcile. Our work makes this possible.
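The linearity and mixing described in the abstract can be illustrated with a minimal sketch. It assumes the discrete-state, first-exit formulation of linearly-solvable MDPs from [15], where the desirability z = exp(-v) satisfies a linear equation with boundary values exp(-g) set by the final cost g. The chain-shaped state space, the costs, the weights, and the helper names `solve_desirability` and `optimal_policy` are all hypothetical, chosen only for illustration.

```python
import numpy as np

def solve_desirability(P, q, is_term, g):
    """Solve the linear Bellman equation z = exp(-q) * (P @ z) on interior
    states, with boundary condition z = exp(-g) on terminal states."""
    interior = ~is_term
    z = np.empty(P.shape[0])
    z[is_term] = np.exp(-g[is_term])
    M = np.diag(np.exp(-q[interior]))
    A = np.eye(interior.sum()) - M @ P[np.ix_(interior, interior)]
    b = M @ P[np.ix_(interior, is_term)] @ z[is_term]
    z[interior] = np.linalg.solve(A, b)
    return z

def optimal_policy(P, z):
    """Optimal transition probabilities: passive dynamics reweighted by z."""
    U = P * z[None, :]
    return U / U.sum(axis=1, keepdims=True)

# Illustrative problem: unbiased random walk on a 7-state chain, absorbing ends.
n = 7
P = np.zeros((n, n))
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[n - 1, n - 1] = 1.0
is_term = np.zeros(n, dtype=bool)
is_term[[0, n - 1]] = True
q = np.full(n, 0.1)                       # state cost (used on interior states)

# Two primitives with different final costs (their "agendas").
g1 = np.zeros(n); g1[n - 1] = 5.0         # primitive 1 prefers the left end
g2 = np.zeros(n); g2[0] = 5.0             # primitive 2 prefers the right end
z1 = solve_desirability(P, q, is_term, g1)
z2 = solve_desirability(P, q, is_term, g2)

# Composite task: exp(-g) is a weighted sum of the primitives' exp(-g_k).
w = np.array([0.3, 0.7])
g_mix = -np.log(w[0] * np.exp(-g1) + w[1] * np.exp(-g2))
z_mix = solve_desirability(P, q, is_term, g_mix)

# Linearity: the composite desirability is the same weighted sum.
z_sum = w[0] * z1 + w[1] * z2

# The composite controller is a state-dependent mixture of the primitives.
U1, U2, U_mix = optimal_policy(P, z1), optimal_policy(P, z2), optimal_policy(P, z_mix)
m1 = w[0] * (P @ z1) / (w[0] * (P @ z1) + w[1] * (P @ z2))
U_blend = m1[:, None] * U1 + (1.0 - m1)[:, None] * U2
```

In this sketch `z_mix` agrees with `z_sum` and `U_mix` agrees with `U_blend`. On interior states the mixing weight `m1` is proportional to `w[0] * z1`, so each primitive contributes according to its task weight and its own desirability at the current state.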
[1] D. Bertsekas, Dynamic Programming and Optimal Control (2nd Ed). Belmont, MA: Athena Scientific, 2001.
[2] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge MA, 1998.
[3] D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming. Belmont, MA: Athena Scientific, 1997.
[4] J. Si, A. Barto, W. Powell, and D. Wunsch, Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE Press, 2004.
[5] S. Mahadevan and M. Maggioni, “Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes,” Journal of Machine Learning Research, vol. 8, pp. 2169–2231, 2007.
[6] M. da Silva, F. Durand, and J. Popovic, “Linear Bellman combination for control of character animation,” To appear in SIGGRAPH, 2009.
[7] E. Todorov, “Optimality principles in sensorimotor control,” Nature Neuroscience, vol. 7, no. 9, pp. 907–915, 2004.
[8] C. Harris and D. Wolpert, “Signal-dependent noise determines motor planning,” Nature, vol. 394, pp. 780–784, 1998.
[9] C. Sherrington, The integrative action of the nervous system. New Haven: Yale University Press, 1906.
[10] N. Bernstein, On the construction of movements. Moscow: Medgiz, 1947.
[11] M. Latash, “On the evolution of the notion of synergy,” in Motor Control, Today and Tomorrow, G. Gantchev, S. Mori, and J. Massion, Eds. Sofia: Academic Publishing House
[12] M. Tresch, P. Saltiel, and E. Bizzi, “The construction of movement by the spinal cord,” Nature Neuroscience, vol. 2, no. 2, pp. 162–167, 1999.
[13] A. D’Avella, P. Saltiel, and E. Bizzi, “Combinations of muscle synergies in the construction of a natural motor behavior,” Nature Neuroscience, vol. 6, no. 3, pp. 300–308, 2003.
[14] M. Santello, M. Flanders, and J. Soechting, “Postural hand synergies for tool use,” J Neurosci, vol. 18, no. 23, pp. 10105–10115, 1998.
[15] E. Todorov, “Linearly-solvable Markov decision problems,” Advances in Neural Information Processing Systems, 2006.
[16] ——, “General duality between optimal control and estimation,” IEEE Conference on Decision and Control, 2008.
[17] ——, “Efficient computation of optimal actions,” PNAS, in press, 2009.
[18] S. Mitter and N. Newton, “A variational approach to nonlinear estimation,” SIAM J Control Opt, vol. 42, pp. 1813–1833, 2003.
[19] H. Kappen, “Linear theory for control of nonlinear stochastic systems,” Physical Review Letters, vol. 95, 2005.
[20] E. Todorov, “Eigen-function approximation methods for linearly-solvable optimal control problems,” IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
[21] R. Stengel, Optimal Control and Estimation. New York: Dover, 1994.
[22] H. Kushner and P. Dupuis, Numerical Methods for Stochastic Optimal Control Problems in Continuous Time. New York: Springer, 2001.