jmlr jmlr2008 jmlr2008-8 jmlr2008-8-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Faustino Gomez, Jürgen Schmidhuber, Risto Miikkulainen
Abstract: Many complex control problems require sophisticated solutions that are not amenable to traditional controller design. Not only is it difficult to model real-world systems, but it is often unclear what kind of behavior is required to solve the task. Reinforcement learning (RL) approaches have made progress by using direct interaction with the task environment, but have so far not scaled well to large state spaces and environments that are not fully observable. In recent years, neuroevolution, the artificial evolution of neural networks, has had remarkable success in tasks that exhibit these two properties. In this paper, we compare a neuroevolution method called Cooperative Synapse Neuroevolution (CoSyNE), which uses cooperative coevolution at the level of individual synaptic weights, to a broad range of reinforcement learning algorithms on very difficult versions of the pole balancing problem that involve large (continuous) state spaces and hidden state. CoSyNE is shown to be significantly more efficient and powerful than the other methods on these tasks.
Keywords: coevolution, recurrent neural networks, non-linear control, genetic algorithms, experimental comparison
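The abstract states only that CoSyNE applies cooperative coevolution at the level of individual synaptic weights and gives no algorithmic detail. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes one subpopulation per weight, networks assembled from same-index members, top-quarter selection with one-point crossover and Gaussian mutation, and a full shuffle of each subpopulation every generation. The function name `coevolve_weights`, all parameters, and the toy fitness function are hypothetical.

```python
import random


def coevolve_weights(fitness, n_weights, pop_size=20, generations=100, seed=0):
    """Toy cooperative coevolution with one subpopulation per weight.

    Illustrative assumption only; this is not the published CoSyNE procedure.
    Expects pop_size >= 4 so that at least two parents and one child exist.
    """
    rng = random.Random(seed)
    # subpops[j][i] is the i-th candidate value for weight j.
    subpops = [[rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
               for _ in range(n_weights)]
    best_net, best_fit = None, float("-inf")

    for _ in range(generations):
        # Assemble network i from the i-th member of every subpopulation.
        nets = [[subpops[j][i] for j in range(n_weights)]
                for i in range(pop_size)]
        ranked = sorted(nets, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:
            best_net, best_fit = list(ranked[0]), top_fit

        # Keep the top quarter as parents; fill the rest with children.
        n_parents = max(2, pop_size // 4)
        parents = ranked[:n_parents]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights) if n_weights > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover (new list)
            k = rng.randrange(n_weights)
            child[k] += rng.gauss(0.0, 0.1)    # mutate a single weight
            children.append(child)
        nets = parents + children

        # Write weights back to their subpopulations, then shuffle each one
        # so every weight is evaluated with new partners next generation.
        subpops = [[net[j] for net in nets] for j in range(n_weights)]
        for sp in subpops:
            rng.shuffle(sp)

    return best_net, best_fit


if __name__ == "__main__":
    # Hypothetical toy task: match a fixed target weight vector.
    target = [0.5, -0.3, 0.8]

    def toy_fitness(w):
        return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

    net, score = coevolve_weights(toy_fitness, n_weights=3)
    print(net, score)
```

Because each subpopulation is shuffled independently, a weight value survives only if it tends to do well across many different combinations of partners, which is the cooperative aspect the abstract refers to. The best network seen so far is tracked separately, since the shuffle does not preserve any complete network.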
References:
J. S. Albus. A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 97(3):220–227, 1975.
C. W. Anderson. Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine, 9:31–37, 1989.
C. W. Anderson. Strategy learning with multilayer connectionist representations. Technical Report TR87-509.3, GTE Labs, Waltham, MA, 1987.
L. C. Baird and Andrew W. Moore. Gradient descent reinforcement learning. In Advances in Neural Information Processing Systems 12, 1999.
R. K. Belew, J. McInerney, and N. N. Schraudolph. Evolving networks: Using the genetic algorithm with connectionist learning. In C. G. Langton, C. Taylor, J. D. Farmer, and S. Rasmussen, editors, Proceedings of the Workshop on Artificial Life (ALIFE ’90). Reading, MA: Addison-Wesley, 1991. ISBN 0-201-52570-4.
J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7, 1995.
B. Bryant and R. Miikkulainen. Neuroevolution of adaptive teams: Learning heterogeneous behavior in homogeneous multi-agent systems. In Congress in Evolutionary Computation, Canberra, Australia, 2003.
P. J. Darwen. Co-Evolutionary Learning by Automatic Modularization with Speciation. PhD thesis, University College, University of New South Wales, November 1996.
R. Eriksson and B. Olsson. Cooperative coevolution in inventory control optimization. In Proceedings of 3rd International Conference on Artificial Neural Networks and Genetic Algorithms, 1997.
F. Gomez and R. Miikkulainen. Incremental evolution of complex general behavior. Adaptive Behavior, 5:317–342, 1997.
F. Gomez, D. Burger, and R. Miikkulainen. A neuroevolution method for dynamic resource allocation on a chip multiprocessor. In Proceedings of the INNS-IEEE International Joint Conference on Neural Networks, pages 2355–2361, Piscataway, NJ, 2001. IEEE.
F. J. Gomez. Robust Nonlinear Control through Neuroevolution. PhD thesis, Department of Computer Sciences, The University of Texas at Austin, August 2003. Technical Report AI-TR-03-303.
F. J. Gomez and R. Miikkulainen. Active guidance for a finless rocket using neuroevolution. In E. Cantú-Paz et al., editor, Proceedings of the Genetic Evolutionary Computation Conference (GECCO-03). Springer-Verlag, Berlin; New York, 2003.
D. Grady. The vision thing: Mainly in the brain. Discover, 14:57–66, June 1993.
U. Grasemann and R. Miikkulainen. Effective image compression using evolved wavelets. In Proceedings of the Genetic Evolutionary Computation Conference (GECCO-05), New York, 2005. ACM. ISBN 1-59593-010-8.
B. Greer, H. Hakonen, R. Lahdelma, and R. Miikkulainen. Numerical optimization with neuroevolution. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC2002), 2002.
F. Gruau, D. Whitley, and L. Pyeatt. A comparison between cellular encoding and direct encoding for genetic neural networks. In J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, editors, Genetic Programming 1996: Proceedings of the First Annual Conference, pages 81–89, Cambridge, MA, 1996a. MIT Press.
F. Gruau, D. Whitley, and L. Pyeatt. A comparison between cellular encoding and direct encoding for genetic neural networks. Technical Report NC-TR-96-048, NeuroCOLT, 1996b.
G. Grudic. Simulation code for policy gradient reinforcement learning. http://www.cis.upenn.edu/~grudic/PGRLSim/, 2000.
N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
S. A. Harp, T. Samad, and A. Guha. Towards the genetic synthesis of neural networks. In Proceedings of the Third International Conference on Genetic Algorithms, pages 360–369, 1989.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.
J. H. Holland and J. S. Reitman. Cognitive systems based on adaptive algorithms. In D. A. Waterman and F. Hayes-Roth, editors, Pattern-Directed Inference Systems. Academic Press, New York, 1978.
J. Horn, D. E. Goldberg, and K. Deb. Implicit niching in a learning classifier system: Nature’s way. Evolutionary Computation, 2(1):37–66, 1994.
P. Husbands and F. Mill. Simulated co-evolution as the mechanism for emergent planning and scheduling. In R. K. Belew and L. B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 264–270. San Francisco, CA: Morgan Kaufmann, 1991. ISBN 1-55860-208-9.
C. Igel. Neuroevolution for reinforcement learning using evolution strategies. In R. Reynolds, H. Abbass, K. C. Tan, B. McKay, D. Essam, and T. Gedeon, editors, Congress on Evolutionary Computation (CEC 2003), volume 4, pages 2588–2595. IEEE, 2003.
J.-S. R. Jang. Self-learning fuzzy controllers based on temporal backpropagation. IEEE Transactions on Neural Networks, 3(5):714–723, September 1992.
T. Jansen and R. P. Wiegand. The cooperative coevolutionary (1+1) EA. Evolutionary Computation, 12(4), 2004.
T. Jansen and R. P. Wiegand. Exploring the explorative advantage of the CC (1+1) EA. In E. Cantú-Paz et al., editor, Proceedings of the Genetic Evolutionary Computation Conference (GECCO-03). Springer-Verlag, Berlin; New York, 2003.
D. Jefferson, R. Collins, C. Cooper, M. Dyer, M. Flowers, R. Korf, C. Taylor, and A. Wang. Evolution as a theme in artificial life: The Genesys/Tracker system. In C. G. Langton, C. Taylor, J. D. Farmer, and S. Rasmussen, editors, Proceedings of the Workshop on Artificial Life (ALIFE ’90). Reading, MA: Addison-Wesley, 1991. ISBN 0-201-52570-4.
H. Kitano. Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4:461–476, 1990.
J. R. Koza. Genetic Programming. MIT Press, Cambridge, MA, 1991.
L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning, and teaching. Machine Learning, 8(3):293–321, 1992.
L.-J. Lin and T. M. Mitchell. Memory approaches to reinforcement learning in non-Markovian domains. Technical Report CMU-CS-92-138, Carnegie Mellon University, School of Computer Science, May 1992.
A. Lubberts and R. Miikkulainen. Co-evolving a go-playing neural network. In Coevolution: Turning Adaptive Algorithms Upon Themselves, Birds-of-a-Feather Workshop, Genetic and Evolutionary Computation Conference (GECCO-2001), 2001.
M. Mandischer. Representation and evolution of neural networks. In R. F. Albrecht, C. R. Reeves, and N. C. Steele, editors, Proceedings of the Conference on Artificial Neural Nets and Genetic Algorithms at Innsbruck, Austria, pages 643–649. Springer-Verlag, 1993.
N. Meuleau, L. Peshkin, K.-E. Kim, and L. P. Kaelbling. Learning finite state controllers for partially observable environments. In 15th International Conference of Uncertainty in AI, 1999.
D. Michie and R. A. Chambers. BOXES: An experiment in adaptive control. In E. Dale and D. Michie, editors, Machine Intelligence. Oliver and Boyd, Edinburgh, UK, 1968.
G. Miller and D. Cliff. Co-evolution of pursuit and evasion I: Biological and game-theoretic foundations. Technical Report CSRP311, School of Cognitive and Computing Sciences, University of Sussex, Brighton, UK, 1994.
D. E. Moriarty. Symbiotic Evolution of Neural Networks in Sequential Decision Tasks. PhD thesis, Department of Computer Sciences, The University of Texas at Austin, 1997. Technical Report UT-AI97-257.
S. Nolfi and D. Parisi. Learning to adapt to changing environments in evolving neural networks. Technical Report 95-15, Institute of Psychology, National Research Council, Rome, Italy, 1995.
L. Panait, S. Luke, and J. F. Harrison. Archive-based cooperative coevolutionary algorithms. In GECCO ’06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 345–352, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-186-4. doi: http://doi.acm.org/10.1145/1143997.1144060.
J. Paredis. Steps towards co-evolutionary classification neural networks. In R. A. Brooks and P. Maes, editors, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems (Artificial Life IV), pages 102–108. Cambridge, MA: MIT Press, 1994. ISBN 0-262-52190-3.
J. Paredis. Coevolutionary computation. Artificial Life, 2:355–375, 1995.
A. S. Perez-Bergquist. Applying ESP and region specialists to neuro-evolution for Go. Technical Report CSTR01-24, Department of Computer Sciences, The University of Texas at Austin, 2001.
J. B. Pollack, A. D. Blair, and M. Land. Coevolution of a backgammon player. In C. G. Langton and K. Shimohara, editors, Proceedings of the 5th International Workshop on Artificial Life: Synthesis and Simulation of Living Systems (ALIFE-96). Cambridge, MA: MIT Press, 1996. ISBN 0-262-62111-8.
M. A. Potter and K. A. De Jong. Evolving neural networks with collaborative species. In Proceedings of the 1995 Summer Computer Simulation Conference, 1995.
C. D. Rosin. Coevolutionary Search Among Adversaries. PhD thesis, University of California, San Diego, San Diego, CA, 1997.
J. C. Santamaria, R. S. Sutton, and A. Ram. Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior, 6(2):163–218, 1998.
N. Saravanan and D. B. Fogel. Evolving neural control systems. IEEE Expert, pages 23–27, June 1995.
K. O. Stanley. Efficient Evolution of Neural Networks Through Complexification. PhD thesis, Department of Computer Sciences, The University of Texas at Austin, August 2004. Technical Report AI-TR-04-314.
K. O. Stanley and R. Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10:99–127, 2002.
R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1038–1044. Cambridge, MA: MIT Press, 1996.
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998. ISBN 0-262-19398-1.
R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, volume 12, pages 1057–1063. MIT Press, 2000.
G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.
H. M. Voigt, J. Born, and I. Santibanez-Koref. Evolutionary structuring of artificial neural networks. Technical report, Technical University Berlin, Bio- and Neuroinformatics Research Group, 1993.
C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.
P. Werbos. Backpropagation through time: what does it do and how to do it. In Proceedings of IEEE, volume 78, pages 1550–1560, 1990.
B. A. Whitehead and T. D. Choate. Cooperative–competitive genetic evolution of radial basis function centers and widths for time series prediction. IEEE Transactions on Neural Networks, 1995.
S. Whiteson, N. Kohl, R. Miikkulainen, and P. Stone. Evolving keepaway soccer players through task decomposition. In E. Cantú-Paz et al., editor, Proceedings of the Genetic Evolutionary Computation Conference (GECCO-03). Springer-Verlag, Berlin; New York, 2003.
D. Whitley, S. Dominic, R. Das, and Charles W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259–284, 1993.
R. P. Wiegand. An Analysis of Cooperative Coevolutionary Algorithms. PhD thesis, George Mason University, Fall 2003.
R. P. Wiegand, W. C. Liles, and K. A. De Jong. An empirical analysis of collaboration methods in cooperative coevolutionary algorithms. In L. Spector et al., editor, Proceedings of the Genetic and Evolutionary Computation Conference, pages 1235–1242. San Francisco, CA: Morgan Kaufmann, 2001. ISBN 1-55860-774-9. URL citeseer.ist.psu.edu/481900.html.
A. Wieland. Evolving neural network controllers for unstable systems. In Proceedings of the International Joint Conference on Neural Networks (Seattle, WA), pages 667–673. Piscataway, NJ: IEEE, 1991.
D. Wierstra, A. Foerster, J. Peters, and J. Schmidhuber. Solving deep memory POMDPs with recurrent policy gradients. In International Conference on Artificial Neural Networks, 2007.
B. Yamauchi and R. D. Beer. Integrating reactive, sequential, and learning behavior using dynamical neural networks. In D. Cliff, P. Husbands, J.-A. Meyer, and S. W. Wilson, editors, From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior, pages 382–391. Cambridge, MA: MIT Press, 1994. ISBN 0-262-53122-4.
X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423–1447, 1999.