
143 nips-2006-Natural Actor-Critic for Road Traffic Optimisation


Source: pdf

Author: Silvia Richter, Douglas Aberdeen, Jin Yu

Abstract: Current road-traffic optimisation practice around the world is a combination of hand-tuned policies with a small degree of automatic adaptation. Even state-of-the-art research controllers need good models of the road traffic, which cannot be obtained directly from existing sensors. We use a policy-gradient reinforcement learning approach to directly optimise the traffic signals, mapping currently deployed sensor observations to control signals. Our trained controllers are (theoretically) compatible with the traffic system used in Sydney and many other cities around the world. We apply two policy-gradient methods: (1) the recent natural actor-critic algorithm, and (2) a vanilla policy-gradient algorithm for comparison. Along the way we extend natural actor-critic approaches to work for distributed and online infinite-horizon problems.
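The vanilla policy-gradient baseline mentioned in the abstract is of the online, infinite-horizon kind described in [12]. The sketch below is an illustrative Python rendering of such an online update for a single signalised intersection, not the paper's actual implementation: the feature vector, reward signal, and all names (OLPOMDPSignalController, lr, beta) are assumptions chosen for clarity.

```python
import numpy as np


class OLPOMDPSignalController:
    """Illustrative online policy-gradient controller (OLPOMDP-style sketch).

    A softmax policy maps a feature vector of local sensor observations
    (e.g. loop-detector occupancies per approach) to a choice of signal
    phase. The parameters theta are updated online from a scalar reward,
    such as the negative number of waiting vehicles. Hypothetical example,
    not the paper's exact design.
    """

    def __init__(self, n_features, n_phases, lr=0.01, beta=0.95):
        self.theta = np.zeros((n_phases, n_features))  # policy parameters
        self.trace = np.zeros_like(self.theta)         # eligibility trace
        self.lr = lr      # gradient step size
        self.beta = beta  # discount factor for the eligibility trace

    def act(self, obs):
        """Sample a phase from the softmax policy and update the trace."""
        logits = self.theta @ obs
        logits -= logits.max()                         # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        phase = np.random.choice(len(probs), p=probs)
        # grad of log pi(phase | obs) for a softmax policy
        grad = -np.outer(probs, obs)
        grad[phase] += obs
        self.trace = self.beta * self.trace + grad
        return phase

    def learn(self, reward):
        """Online gradient ascent step using the current eligibility trace."""
        self.theta += self.lr * reward * self.trace
```

As a usage assumption, obs could be the current occupancy readings of an intersection's stop-line detectors and reward the negative queue length observed since the last decision; the natural actor-critic variant would additionally maintain a critic and precondition this gradient with the inverse Fisher information, which is omitted here.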


reference text

[1] J. Peters, S. Vijayakumar, and S. Schaal. Natural actor-critic. In Proc. ECML, pages 280–291, 2005.

[2] L. Bottou and Y. Le Cun. Large scale online learning. In Proc. NIPS’2003, volume 16, 2004.

[3] N. H. Gartner, C. J. Messer, and A. K. Rathi. Traffic Flow Theory: A State-of-the-Art Report, Revised Monograph on Traffic Flow Theory. U.S. Department of Transportation, Transportation Research Board, Washington, D.C., 1992.

[4] M. Papageorgiou. Traffic control. In R. W. Hall, editor, Handbook of Transportation Science. Kluwer Academic Publishers, Boston, 1999.

[5] A. G. Sims and K. W. Dobinson. The Sydney coordinated adaptive traffic (SCAT) system philosophy and benefits. IEEE Transactions on Vehicular Technology, VT-29(2):130–137, 1980.

[6] M. Wiering. Multi-agent reinforcement learning for traffic light control. In Proc. ICML 2000, 2000.

[7] S. Richter. Learning traffic control - towards practical traffic control using policy gradients. Diplomarbeit, Albert-Ludwigs-Universität Freiburg, 2006.

[8] J. A. Bagnell and A. Y. Ng. On local rewards and scaling distributed reinforcement learning. In Proc. NIPS’2005, volume 18, 2006.

[9] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proc. NIPS, volume 12. MIT Press, 2000.

[10] S. Kakade. A natural policy gradient. In Proc. NIPS’2001, volume 14, 2002.

[11] J. A. Boyan. Least-squares temporal difference learning. In Proc. ICML 16, pages 49–56, 1999.

[12] J. Baxter, P. Bartlett, and L. Weaver. Experiments with infinite-horizon, policy-gradient estimation. JAIR, 15:351–381, 2001.