
143 nips-2006-Natural Actor-Critic for Road Traffic Optimisation


Source: pdf

Author: Silvia Richter, Douglas Aberdeen, Jin Yu

Abstract: Current road-traffic optimisation practice around the world is a combination of hand-tuned policies with a small degree of automatic adaptation. Even state-of-the-art research controllers need good models of the road traffic, which cannot be obtained directly from existing sensors. We use a policy-gradient reinforcement learning approach to directly optimise the traffic signals, mapping currently deployed sensor observations to control signals. Our trained controllers are (theoretically) compatible with the traffic system used in Sydney and many other cities around the world. We apply two policy-gradient methods: (1) the recent natural actor-critic algorithm, and (2) a vanilla policy-gradient algorithm for comparison. Along the way we extend natural actor-critic approaches to work for distributed and online infinite-horizon problems.
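The vanilla policy-gradient baseline mentioned in the abstract is of the online, infinite-horizon kind described in [12]. The sketch below is an illustrative Python rendering of such an online update for a single signalised intersection, not the paper's actual implementation: the feature vector, reward signal, and all names (OLPOMDPSignalController, lr, beta) are assumptions chosen for clarity.

```python
import numpy as np


class OLPOMDPSignalController:
    """Illustrative online policy-gradient controller (OLPOMDP-style sketch).

    A softmax policy maps a feature vector of local sensor observations
    (e.g. loop-detector occupancies per approach) to a choice of signal
    phase. The parameters theta are updated online from a scalar reward,
    such as the negative number of waiting vehicles. Hypothetical example,
    not the paper's exact design.
    """

    def __init__(self, n_features, n_phases, lr=0.01, beta=0.95):
        self.theta = np.zeros((n_phases, n_features))  # policy parameters
        self.trace = np.zeros_like(self.theta)         # eligibility trace
        self.lr = lr      # gradient step size
        self.beta = beta  # discount factor for the eligibility trace

    def act(self, obs):
        """Sample a phase from the softmax policy and update the trace."""
        logits = self.theta @ obs
        logits -= logits.max()                         # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        phase = np.random.choice(len(probs), p=probs)
        # grad of log pi(phase | obs) for a softmax policy
        grad = -np.outer(probs, obs)
        grad[phase] += obs
        self.trace = self.beta * self.trace + grad
        return phase

    def learn(self, reward):
        """Online gradient ascent step using the current eligibility trace."""
        self.theta += self.lr * reward * self.trace
```

As a usage assumption, obs could be the current occupancy readings of an intersection's stop-line detectors and reward the negative queue length observed since the last decision; the natural actor-critic variant would additionally maintain a critic and precondition this gradient with the inverse Fisher information, which is omitted here.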


reference text

[1] J. Peters, S. Vijayakumar, and S. Schaal. Natural actor-critic. In Proc. ECML, pages 280–291, 2005.

[2] L. Bottou and Y. Le Cun. Large scale online learning. In Proc. NIPS’2003, volume 16, 2004.

[3] N. H. Gartner, C. J. Messer, and A. K. Rathi. Traffic Flow Theory: A State-of-the-Art Report, Revised Monograph on Traffic Flow Theory. U.S. Department of Transportation, Transportation Research Board, Washington, D.C., 1992.

[4] M. Papageorgiou. Traffic control. In R. W. Hall, editor, Handbook of Transportation Science. Kluwer Academic Publishers, Boston, 1999.

[5] A. G. Sims and K. W. Dobinson. The Sydney coordinated adaptive traffic (SCAT) system philosophy and benefits. IEEE Transactions on Vehicular Technology, VT-29(2):130–137, 1980.

[6] M. Wiering. Multi-agent reinforcement learning for traffic light control. In Proc. ICML 2000, 2000.

[7] S. Richter. Learning traffic control - towards practical traffic control using policy gradients. Diplomarbeit, Albert-Ludwigs-Universität Freiburg, 2006.

[8] J. A. Bagnell and A. Y. Ng. On local rewards and scaling distributed reinforcement learning. In Proc. NIPS’2005, volume 18, 2006.

[9] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proc. NIPS, volume 12. MIT Press, 2000.

[10] S. Kakade. A natural policy gradient. In Proc. NIPS’2001, volume 14, 2002.

[11] J. A. Boyan. Least-squares temporal difference learning. In Proc. ICML 16, pages 49–56, 1999.

[12] J. Baxter, P. Bartlett, and L. Weaver. Experiments with infinite-horizon, policy-gradient estimation. JAIR, 15:351–381, 2001.