
134 nips-2002-Learning to Take Concurrent Actions


Source: pdf

Author: Khashayar Rohanimanesh, Sridhar Mahadevan

Abstract: We investigate a general semi-Markov Decision Process (SMDP) framework for modeling concurrent decision making, in which agents learn optimal plans over concurrent temporally extended actions. We introduce three types of parallel termination schemes (all, any, and continue) and compare them theoretically and experimentally.
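The three termination schemes named in the abstract can be sketched concretely. The snippet below is a minimal illustration, not code from the paper: it assumes each concurrent option exposes a terminated(state) predicate, and select_replacement is a hypothetical stand-in for the agent's option-selection step.

# Minimal sketch (assumptions, not code from the paper) of the three
# parallel termination schemes over a set of concurrently running options.

def all_terminated(options, state):
    # "all" scheme: the concurrent action ends only once every option
    # has terminated in the current state.
    return all(opt.terminated(state) for opt in options)

def any_terminated(options, state):
    # "any" scheme: the concurrent action ends as soon as one option
    # terminates, interrupting the rest.
    return any(opt.terminated(state) for opt in options)

def continue_step(options, state, select_replacement):
    # "continue" scheme: terminated options are replaced while the
    # unfinished ones keep running, so the agent re-decides only over
    # the freed slots.
    return [select_replacement(state) if opt.terminated(state) else opt
            for opt in options]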


reference text

[1] C. Boutilier and R. Brafman. Planning with concurrent interacting actions. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI '97), 1997.

[2] P. Cichosz. Learning multidimensional control actions from delayed reinforcements. In Eighth International Symposium on System-Modelling-Control (SMC-8), Zakopane, Poland, 1995.

[3] C. A. Knoblock. Generating parallel execution plans with a partial-order planner. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, Chicago, IL, 1994.

[4] R. Reiter. Natural actions, concurrency and continuous time in the situation calculus. In Principles of Knowledge Representation and Reasoning: Proceedings of the Fifth International Conference (KR '96), Cambridge, MA, 1996.

[5] K. Rohanimanesh and S. Mahadevan. Decision-theoretic planning with concurrent temporally extended actions. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, 2001.

[6] S. Singh and D. Cohn. How to dynamically merge Markov decision processes. In Proceedings of NIPS 11, 1998.

[7] R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.

[8] G. Winskel. Topics in Concurrency. Part II Computer Science lecture notes, University of Cambridge, 2002.