jmlr jmlr2008 jmlr2008-8 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Faustino Gomez, Jürgen Schmidhuber, Risto Miikkulainen
Abstract: Many complex control problems require sophisticated solutions that are not amenable to traditional controller design. Not only is it difficult to model real world systems, but often it is unclear what kind of behavior is required to solve the task. Reinforcement learning (RL) approaches have made progress by using direct interaction with the task environment, but have so far not scaled well to large state spaces and environments that are not fully observable. In recent years, neuroevolution, the artificial evolution of neural networks, has had remarkable success in tasks that exhibit these two properties. In this paper, we compare a neuroevolution method called Cooperative Synapse Neuroevolution (CoSyNE), that uses cooperative coevolution at the level of individual synaptic weights, to a broad range of reinforcement learning algorithms on very difficult versions of the pole balancing problem that involve large (continuous) state spaces and hidden state. CoSyNE is shown to be significantly more efficient and powerful than the other methods on these tasks. Keywords: coevolution, recurrent neural networks, non-linear control, genetic algorithms, experimental comparison
Reference: text
sentIndex sentText sentNum sentScore
1 Keywords: coevolution, recurrent neural networks, non-linear control, genetic algorithms, experimental comparison 1. [sent-11, score-0.236]
2 A controller or agent must be built that observes the state of the system, or environment, and outputs a control signal that affects future states of the environment in some desirable way. [sent-13, score-0.275]
3 However, in practice, they have not scaled well to large state spaces or tasks where the state of the environment is not fully observable to the agent. [sent-27, score-0.198]
4 More recently, methods for evolving artificial neural networks or neuroevolution have shown promising results on continuous, partially observable tasks (Gomez, 2003; Nolfi and Parisi, 1995; Yamauchi and Beer, 1994). [sent-31, score-0.425]
5 Our previous method, Enforced SubPopulations (ESP), is a particularly effective neuroevolution algorithm that has been applied successfully to many domains (PerezBergquist, 2001; Lubberts and Miikkulainen, 2001; Greer et al. [sent-32, score-0.193]
6 , 2001; Grasemann and Miikkulainen, 2005), including the real world reinforcement learning task of finless rocket control (Gomez and Miikkulainen, 2003). [sent-35, score-0.207]
7 To this end, we have chosen a set of pole balancing tasks ranging from the trivial to versions that are extremely difficult for some of today's most advanced methods. [sent-37, score-0.509]
8 The paper is organized as follows: in Section 2, we discuss the general neuroevolution paradigm. [sent-38, score-0.193]
9 In Section 3, the underlying approach used by CoSyNE, cooperative coevolution is described. [sent-39, score-0.263]
10 Section 5 presents our experiments comparing CoSyNE with value function, policy search, and other evolutionary methods. [sent-41, score-0.218]
11 Neuroevolution The basic idea of Neuroevolution (NE; Yao, 1999) is to search the space of neural network policies directly using a genetic algorithm. [sent-44, score-0.248]
12 A chromosome can encode any relevant network parameter including synaptic weight values, number of processing units, connectivity (topology), learning rate, etc. [sent-51, score-0.206]
13 Each chromosome is transformed into a neural network phenotype and evaluated on the task. [sent-62, score-0.192]
14 The agent receives input from the environment (observation) and propagates it through its neural network to compute an output signal (action) that affects the environment. [sent-63, score-0.217]
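The interaction loop just described can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code; the environment and network interfaces (reset, step, activate) are hypothetical placeholders.

    def evaluate(net, env, max_steps=100000):
        # Fitness = number of time steps survived before the task fails.
        obs = env.reset()
        for t in range(max_steps):
            action = net.activate(obs)    # propagate the observation through the network
            obs, done = env.step(action)  # the action affects the next environment state
            if done:                      # e.g., a pole fell or the cart left the track
                return t
        return max_steps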
15 NE approaches differ primarily in how they encode neural network specifications into genetic strings. [sent-66, score-0.248]
16 By searching the space of policies directly, NE can be applied to reinforcement learning problems without using a value function—neural network controllers map observations from the environment directly to actions. [sent-77, score-0.302]
17 By evolving these networks instead of training them, NE avoids the problem of vanishing error gradients that affect recurrent network learning algorithms (Hochreiter et al. [sent-82, score-0.312]
18 , Pn }
   repeat
      repeat
         for j = 1 to n do               // construct complete solution
            xij = Select(Pj)
            x ⇐ xij                      // add subgenotype to complete solution
         end
         Evaluate(x)
      until enough solutions evaluated
      for i = 1 to n do                  // each subpopulation reproduces independently
         Recombine(Pi)
      end
   until solution is found
3. [sent-89, score-0.22]
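A minimal Python rendering of this generic cooperative coevolutionary loop is given below. Select, Evaluate and Recombine are stand-ins for whatever operators a concrete algorithm uses, and the per-subgenotype fitness bookkeeping is omitted; the sketch is illustrative only.

    import random

    def cooperative_coevolution(subpops, evaluate, recombine, evals_per_generation, solved):
        # subpops: list of n subpopulations, one per solution component.
        while True:
            for _ in range(evals_per_generation):
                # construct a complete solution from one subgenotype per subpopulation
                x = [random.choice(P) for P in subpops]
                if solved(evaluate(x)):
                    return x
            # each subpopulation reproduces independently
            subpops = [recombine(P) for P in subpops]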
19 Cooperative Coevolution In natural ecosystems, organisms of one species compete and/or cooperate with many other different species in their struggle for resources and survival. [sent-90, score-0.282]
20 Most coevolutionary problem solving systems have concentrated on competition between species (Darwen, 1996; Pollack et al. [sent-94, score-0.245]
21 In cooperative coevolutionary algorithms the species represent solution components. [sent-102, score-0.381]
22 Algorithm 1 outlines the basic operation of a generic cooperative coevolutionary algorithm. [sent-105, score-0.24]
23 Each species has its own subpopulation Pi , i = 1, . . . , n. [sent-107, score-0.221]
24 This approach has been used in learning neural network classifiers, in coevolution of cascade correlation networks, and in coevolution of radial basis functions (Eriksson and Olsson, 1997; Horn et al. [sent-132, score-0.379]
25 To create a network, first the weights at a given index in each subpopulation are collected into a chromosome x, then the weights are mapped to their corresponding synapses in a predefined network architecture with six connections, shown at right. [sent-140, score-0.339]
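As a hedged illustration of this construction step (the array layout and sizes are assumptions, not the paper's code): if the n subpopulations are stored as the rows of an n × m weight matrix, the chromosome for network j is simply column j.

    import numpy as np

    n_weights, m = 6, 4                  # six synapses, as in the example architecture; four candidates each
    P = np.random.randn(n_weights, m)    # row i holds the subpopulation for synapse i

    j = 0
    x_j = P[:, j]                        # chromosome x_j: one weight per predefined synapse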
26 A particularly powerful idea is to combine cooperative coevolution with neuroevolution so that the benefits of evolving neural networks can be enhanced further through improved search efficiency. [sent-141, score-0.647]
27 Much of the motivation for using the cooperative coevolutionary approach is based on the intuition that many problems may be decomposable into weakly coupled low-dimensional subspaces that can be searched semi-independently by separate species (Wiegand et al. [sent-142, score-0.381]
28 Our experience shows that there may be another, complementary, explanation as to why cooperative coevolution in many cases outperforms single-population algorithms. [sent-145, score-0.263]
29 As the number of species increases, the selection of subgenotypes for reproduction becomes less greedy, causing the search points that are evaluated each generation to converge more slowly, providing more paths toward better solutions (not shown). [sent-149, score-0.217]
30 In a normal evolutionary algorithm, a subgenotype suffers the fate of the complete solution to which it is attached. [sent-150, score-0.236]
31 As in neuron-level methods such as ESP, networks are constructed by selecting one member from each subpopulation and plugging them into a predefined network topology. [sent-171, score-0.221]
32 n, is created, where n is the number of synaptic weights in the networks to be evolved, determined by a user-specified network architecture Ψ. [sent-175, score-0.247]
33 Each generation starts by constructing a complete network chromosome x j = (x1 j , x2 j , . . . , xn j ). [sent-181, score-0.22]
34 Recombination produces a pool of offspring O consisting of l new network chromosomes ok , where … Figure 4: Probabilistic permutations (figure axes: m weights, n subpopulations). [sent-189, score-0.34]
35 The subpopulations are then sorted by fitness (line 9), and the weights from the new networks are added to P by replacing the least fit weights in their corresponding subpopulation (i. [sent-200, score-0.329]
36 At this point the algorithm functions as a conventional neuroevolution system that evolves complete network chromosomes. [sent-203, score-0.351]
37 In order to coevolve the synaptic weights, the subpopulations are permuted so that each weight forms part of a potentially different network in the next generation. [sent-204, score-0.227]
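The permutation step can be illustrated with the sketch below. CoSyNE's actual scheme is probabilistic (fitness-biased, Figure 4); the uniform per-row shuffle here is only a simplified stand-in for the idea that each weight should end up in a potentially different network.

    import numpy as np

    def permute_subpopulations(P, rng=None):
        # P: n_weights x m matrix; shuffle each subpopulation (row) independently.
        rng = rng or np.random.default_rng()
        for row in P:
            rng.shuffle(row)             # in-place shuffle of one subpopulation
        return P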
38 , which entry in the chromosome corresponds to which synapse) or which … Figure 5: The double pole balancing system. [sent-214, score-0.535]
39 The system becomes more difficult to control as the poles assume similar lengths and if the velocities are not provided to the controller. [sent-216, score-0.217]
40 For the genetic operators, we use multi-point crossover where 1-point crossover is applied to each neuron segment of the chromosome to generate two offspring, and mutation where each weight in P has a probability of being perturbed by Cauchy distributed noise with zero mean α = 0. [sent-222, score-0.411]
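A hedged sketch of these two operators follows. The 40% mutation probability is the rate quoted later in the text; the Cauchy scale is truncated in the extracted sentence above, so the value used here is an assumed placeholder, as is the segment length.

    import numpy as np

    def crossover_per_neuron(p1, p2, segment, rng):
        # 1-point crossover applied separately to each neuron-sized segment.
        c1, c2 = p1.copy(), p2.copy()
        for s in range(0, len(p1), segment):
            cut = s + int(rng.integers(1, segment))
            c1[cut:s + segment] = p2[cut:s + segment]
            c2[cut:s + segment] = p1[cut:s + segment]
        return c1, c2

    def cauchy_mutation(x, rate=0.4, scale=0.3, rng=None):
        # Perturb each weight with zero-mean Cauchy noise with probability `rate`.
        rng = rng or np.random.default_rng()
        mask = rng.random(x.shape) < rate
        return x + mask * scale * rng.standard_cauchy(x.shape)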
41 Experiments We compared CoSyNE to a broad range of learning algorithms on a sequence of increasingly difficult versions of the pole balancing task. [sent-225, score-0.468]
42 1 The Pole Balancing Problem The basic pole balancing or inverted pendulum system consists of a pole hinged to a wheeled cart on a finite stretch of track. [sent-228, score-0.956]
43 The objective is to apply a force to the cart at regular intervals such that the pole is balanced indefinitely and the cart stays within the track boundaries. [sent-229, score-0.657]
44 This long history notwithstanding, it turns out that the basic pole balancing problem can be solved easily by random search. [sent-232, score-0.468]
45 2 Other Methods CoSyNE was compared to eight ontogenetic methods and seven phylogenetic methods in the pole balancing domain: 5. [sent-240, score-0.591]
46 The policy was implemented using a feed-forward neural network with one hidden layer. [sent-249, score-0.212]
47 In this approach, each chromosome in the population represents a complete neural network. [sent-295, score-0.198]
48 CNE differs from Wieland’s algorithm in that (1) the network weights are encoded with real instead of binary numbers, (2) it uses rank selection, and (3) it uses burst mutation. [sent-296, score-0.2]
49 CNE is like ESP except that it evolves at the network level instead of the neuron level, and therefore provides a way to isolate the performance advantage of cooperative coevolution (ESP) over a single population approach (CNE). [sent-297, score-0.523]
50 A network is constructed using the weights in x, and offspring are produced by applying Gaussian noise to each element x(i) with standard deviation δ(i), i ∈ {1, . . . , n}. [sent-300, score-0.2]
51 CE uses the standard GP crossover and mutation to recombine the programs, allowing evolution to automatically determine an appropriate architecture for the task and relieving the investigator of this often trial-and-error undertaking. [sent-308, score-0.216]
52 Covariance Matrix Adaptation Evolution Strategy (CMA-ES; Hansen and Ostermeier 2001) evolves the covariance matrix of the mutation operator in evolution strategies. [sent-309, score-0.219]
53 Innovation numbers allow NEAT to keep track of the historical origin of every gene in the population so that (1) crossover can be performed between networks with different topologies, and (2) the networks can be grouped into “species” based on topological similarity. [sent-316, score-0.212]
54 Using this distance, the population is divided into species so that individuals compete primarily within their own species instead of with the population at large. [sent-320, score-0.441]
55 This way, topological … Figure 6: Neural network control of the pole balancing system (inputs: cart position x and pole angles θ1 , θ2 ; output: the control signal). [sent-321, score-2.296]
56 For the recurrent networks (b) used in the non-Markov tasks (1b and 2b), the neurons do not receive the velocities (ẋ, θ̇1 , θ̇2 ); instead they must use their feedback connections to determine which direction the poles are moving. [sent-324, score-0.384]
57 For the single pole version the network only has inputs for the cart and long pole. [sent-325, score-0.574]
58 3 Task Setup The pole balancing environment was implemented using a realistic physical model with friction, and fourth-order Runge-Kutta integration with a step size of 0. [sent-337, score-0.523]
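The fourth-order Runge-Kutta step used in such a simulation can be sketched as follows. Here derivs(state) stands for the cart-pole equations of motion with friction, which are not reproduced, and the 0.01 s step size is an assumption because the value is truncated in the extracted sentence above.

    def rk4_step(derivs, state, dt=0.01):
        # One classical fourth-order Runge-Kutta integration step.
        # state: array-like vector of state variables (e.g., a numpy array).
        k1 = derivs(state)
        k2 = derivs(state + 0.5 * dt * k1)
        k3 = derivs(state + 0.5 * dt * k2)
        k4 = derivs(state + dt * k3)
        return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)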
59 Figure 6 shows how the network controllers interact with the pole balancing environment. [sent-343, score-0.624]
60 This cycle is repeated until a pole falls or the cart goes off the end of the track. [sent-350, score-0.488]
61 , 1996b) we restrict the force to be no less than ±1/256 × 10 Newtons so that the controllers cannot maintain the system in unstable equilibrium by outputting a force of zero when the poles are vertical. [sent-354, score-0.271]
62 In 2a, the system now has a second pole next to the first, making the state-space 6-dimensional. [sent-359, score-0.36]
63 Fitness was determined by the number of time steps a network could keep both poles within a specified failure angle from vertical and the cart between the ends of the track. [sent-361, score-0.369]
64 The failure angle was 12° and 36° for the one- and two-pole tasks, respectively. [sent-362, score-0.396]
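A hedged sketch of the corresponding failure test: the failure angles are the ones just quoted, while the half-track length is an assumed placeholder since the value column of the parameter table is truncated in this text.

    import math

    def failed(cart_x, pole_angles, failure_angle_deg, half_track=2.4):
        # failure_angle_deg: 12 for the one-pole tasks, 36 for the two-pole tasks.
        limit = math.radians(failure_angle_deg)
        return abs(cart_x) > half_track or any(abs(a) > limit for a in pole_angles)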
65 For the one-pole tasks, the initial pole angle was set to 4. [sent-363, score-0.396]
66 For the two-pole tasks, the initial angle of the long pole was 4. [sent-365, score-0.396]
67 CoSyNE evolved networks with one hidden unit, 20 weights per subpopulation for the 1-pole tasks, and 30 weights for the 2-pole tasks. [sent-368, score-0.29]
68 4 Results: Balancing One Pole Balancing one pole is a relatively easy problem that gives us a performance baseline before moving on to the much harder two-pole task. [sent-375, score-0.36]
69 1 Complete State Information. Table 1 shows the results for the single pole balancing task with complete state information. [sent-379, score-0.624]
70 Comparison of various learning methods on the basic pole balancing problem with continuous control. [sent-383, score-0.468]
71 In contrast, evolutionary methods do not update any agent parameters during interaction with the environment and only need to evaluate a function approximator once per state transition since the policy is represented explicitly. [sent-389, score-0.398]
72 For this task the speciation process is overkill—the task can be solved more efficiently by devoting resources to searching for weights only. [sent-395, score-0.193]
73 2 Incomplete State Information. This task is identical to the first task except the controller only senses the cart position x and pole angle θ. [sent-406, score-0.75]
74 To allow Q-MLP and the SARSA methods to solve this task, we extended their inputs to include the immediately previous cart position, pole angle, and action (xt−1 , θt−1 , at−1 ) in addition to xt , θt , and at . [sent-410, score-0.536]
75 It is clear, however, that VAPS is the slowest method in this comparison, only being able to balance the pole for around 1 minute of simulated time after several days of computation (Meuleau et al. [sent-418, score-0.36]
76 CoSyNE was able to balance the pole for over 30 minutes of simulated time usually within 2 seconds of learning CPU time, and do so reliably. [sent-429, score-0.36]
77 The results on these first two tasks show that the single pole environment is not very challenging. [sent-430, score-0.456]
78 5 Results: Balancing Two Poles The double pole problem is a better test environment for these methods, representing a significant jump in difficulty. [sent-435, score-0.415]
79 Here the controller must balance two poles of different lengths (1m and 0. [sent-436, score-0.205]
80 The second pole adds two more dimensions to the state-space (θ2 , θ̇2 ) and nonlinear interactions between the poles. [sent-438, score-0.36]
81 At least in the difficult versions of the pole balancing task, the performance of these two approaches is very similar. [sent-452, score-0.468]
82 2 Incomplete State Information. Although the previous task is difficult, the controller has the benefit of complete state information. [sent-456, score-0.242]
83 In this task, as in task 1b, the controller does not have access to the velocities, that is, it does not know how fast or in which direction the poles are moving. [sent-457, score-0.275]
84 Method    Evaluations   CPU time
   RWG       474,329       70
   EP        307,200       —
   CNE       22,100        73
   SANE      12,600        37
   Q-MLP     10,582        153
   RPG       (4,981)       —
   NEAT      3,600         31
   ESP       3,800         22
   CoSyNE    954           4
   CMA-ES    895           —
   Table 3: Two poles with complete state information. [sent-458, score-0.205]
85 The table shows the number of pole balancing attempts (evaluations) and CPU time required by each method to solve the task. [sent-459, score-0.468]
86 This complex fitness is intended to force the network to compute the pole velocities, and avoid solutions that balance the poles by merely swinging them back and forth (i. [sent-474, score-0.606]
87 To determine when the task was solved for the damping fitness function, the best controller from each generation was tested using the standard fitness to see if it could balance the poles for 100K time steps. [sent-478, score-0.35]
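For reference, a Gruau-style damping fitness of the kind alluded to above can be sketched as follows; the constants and the exact normalization are assumptions here, not values taken from the text.

    def damping_fitness(history, t):
        # history: per-step tuples (x, x_dot, theta1, theta1_dot) for the cart and long pole;
        # t: number of time steps survived before failure.
        # Constants follow the Gruau et al. style and are assumed, not quoted above.
        f1 = t / 1000.0
        if t < 100:
            f2 = 0.0
        else:
            f2 = 0.75 / sum(abs(x) + abs(xd) + abs(th) + abs(thd)
                            for (x, xd, th, thd) in history[-100:])
        return 0.1 * f1 + 0.9 * f2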
88 In the single pole tasks, the value-based ontogenetic methods were outperformed by random search. [sent-494, score-0.448]
89 In contrast, all of the neuroevolution methods scaled up to the most difficult tasks, with CMA-ES and CoSyNE leading the pack. [sent-497, score-0.193]
90 Neuroevolution deals with them by evolving recurrent networks; the networks can compactly represent arbitrary temporal, non-linear mappings. [sent-500, score-0.226]
91 This means that it is possible for the networks evaluated in a given generation to not contain any combinations of weights found in the networks of the previous generation. [sent-504, score-0.195]
92 4 Parameters used for the single pole problem: [sent-515, score-0.36]
93 Sym.   Description                    Value
   x      Position of cart on track      [-2.
   θ      Angle of pole from vertical
   F      Force applied to cart
   l      Half length of pole
   M      Mass of cart
   m      Mass of pole [sent-516, score-0.36]
94 Sym.   Description                                   Value
   x      Position of cart on track                     [-2.
   θ      Angle of pole from vertical
   F      Force applied to cart
   li     Half length of ith pole
   M      Mass of cart
   mi     Mass of ith pole
   µc     Coefficient of friction of cart on track
   µp     Coefficient of friction of ith pole's hinge [sent-524, score-1.592]
95 Burst threshold is the number of generations after which burst mutation is activated if the best network found so far is not improved upon. [sent-606, score-0.198]
96                           Task
                          1a     1b     2a     2b
                          FF     FR     FF     FR
   no. of subpops         5      5      5      5
   size of subpopulations 20     20     40     100
   evals per generation   200    200    400    1000
   burst threshold        10     10     10     5
   The mutation rate for all runs was set to 40%. [sent-609, score-0.232]
97 Burst threshold is the number of generations after which burst mutation is activated if the best network found so far is not improved upon. [sent-610, score-0.198]
98 A neuroevolution method for dynamic resource allocation on a chip multiprocessor. [sent-682, score-0.193]
99 Designing neural networks using genetic algorithms with graph generation system. [sent-858, score-0.249]
100 An empirical analysis of collaboration methods in cooperative coevolutionary algorithms. [sent-1098, score-0.24]
wordName wordTfidf (topN-words)
[('pole', 0.36), ('cosyne', 0.316), ('tness', 0.22), ('esp', 0.211), ('neuroevolution', 0.193), ('cne', 0.184), ('neat', 0.158), ('species', 0.141), ('sane', 0.14), ('cooperative', 0.136), ('evolutionary', 0.131), ('cart', 0.128), ('coevolution', 0.127), ('genetic', 0.123), ('chmidhumber', 0.123), ('euroevolution', 0.123), ('iikkulainen', 0.123), ('ynapse', 0.123), ('poles', 0.119), ('balancing', 0.108), ('coevolutionary', 0.104), ('omez', 0.104), ('ooperative', 0.104), ('evolving', 0.097), ('reinforcement', 0.091), ('ontogenetic', 0.088), ('subpopulations', 0.088), ('policy', 0.087), ('network', 0.086), ('controller', 0.086), ('neuron', 0.08), ('subpopulation', 0.08), ('gruau', 0.079), ('miikkulainen', 0.079), ('vaps', 0.079), ('recurrent', 0.074), ('gomez', 0.073), ('task', 0.07), ('controllers', 0.07), ('rwg', 0.07), ('subgenotype', 0.07), ('chromosome', 0.067), ('rpg', 0.067), ('sarsa', 0.067), ('evaluations', 0.064), ('burst', 0.061), ('cmac', 0.061), ('meuleau', 0.061), ('offspring', 0.061), ('population', 0.057), ('networks', 0.055), ('environment', 0.055), ('synaptic', 0.053), ('weights', 0.053), ('santamaria', 0.053), ('texas', 0.053), ('chromosomes', 0.052), ('velocities', 0.052), ('state', 0.051), ('mutation', 0.051), ('evolution', 0.05), ('evolved', 0.049), ('action', 0.048), ('control', 0.046), ('individuals', 0.045), ('crossover', 0.045), ('genotypes', 0.045), ('ahc', 0.044), ('austin', 0.044), ('blueprints', 0.044), ('recombined', 0.044), ('subgenotypes', 0.044), ('isbn', 0.043), ('cpu', 0.043), ('damping', 0.043), ('neurons', 0.043), ('tasks', 0.041), ('force', 0.041), ('neural', 0.039), ('agent', 0.037), ('stanley', 0.037), ('velocity', 0.037), ('approximator', 0.037), ('evolves', 0.037), ('angle', 0.036), ('complete', 0.035), ('coevolved', 0.035), ('fogel', 0.035), ('pgrl', 0.035), ('phylogenetic', 0.035), ('saravanan', 0.035), ('synapse', 0.035), ('tate', 0.035), ('whitley', 0.035), ('wieland', 0.035), ('wierstra', 0.035), ('life', 0.034), ('kg', 0.033), ('memory', 0.033), ('generation', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
2 0.13983364 90 jmlr-2008-Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective
Author: Liviu Panait, Karl Tuyls, Sean Luke
Abstract: This paper presents the dynamics of multiple learning agents from an evolutionary game theoretic perspective. We provide replicator dynamics models for cooperative coevolutionary algorithms and for traditional multiagent Q-learning, and we extend these differential equations to account for lenient learners: agents that forgive possible mismatched teammate actions that resulted in low rewards. We use these extended formal models to study the convergence guarantees for these algorithms, and also to visualize the basins of attraction to optimal and suboptimal solutions in two benchmark coordination problems. The paper demonstrates that lenience provides learners with more accurate information about the benefits of performing their actions, resulting in higher likelihood of convergence to the globally optimal solution. In addition, the analysis indicates that the choice of learning algorithm has an insignificant impact on the overall performance of multiagent learning algorithms; rather, the performance of these algorithms depends primarily on the level of lenience that the agents exhibit to one another. Finally, the research herein supports the strength and generality of evolutionary game theory as a backbone for multiagent learning. Keywords: multiagent learning, reinforcement learning, cooperative coevolution, evolutionary game theory, formal models, visualization, basins of attraction
3 0.073963262 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
Author: Balázs Csanád Csáji, László Monostori
Abstract: The paper investigates the possibility of applying value function based reinforcement learning (RL) methods in cases when the environment may change over time. First, theorems are presented which show that the optimal value function of a discounted Markov decision process (MDP) Lipschitz continuously depends on the immediate-cost function and the transition-probability function. Dependence on the discount factor is also analyzed and shown to be non-Lipschitz. Afterwards, the concept of (ε, δ)-MDPs is introduced, which is a generalization of MDPs and ε-MDPs. In this model the environment may change over time, more precisely, the transition function and the cost function may vary from time to time, but the changes must be bounded in the limit. Then, learning algorithms in changing environments are analyzed. A general relaxed convergence theorem for stochastic iterative algorithms is presented. We also demonstrate the results through three classical RL methods: asynchronous value iteration, Q-learning and temporal difference learning. Finally, some numerical experiments concerning changing environments are presented. Keywords: Markov decision processes, reinforcement learning, changing environments, (ε, δ)MDPs, value function bounds, stochastic iterative algorithms
4 0.067398995 85 jmlr-2008-Shark (Machine Learning Open Source Software Paper)
Author: Christian Igel, Verena Heidrich-Meisner, Tobias Glasmachers
Abstract: SHARK is an object-oriented library for the design of adaptive systems. It comprises methods for single- and multi-objective optimization (e.g., evolutionary and gradient-based algorithms) as well as kernel-based methods, neural networks, and other machine learning techniques. Keywords: machine learning software, neural networks, kernel-methods, evolutionary algorithms, optimization, multi-objective-optimization 1. Overview SHARK is a modular C++ library for the design and optimization of adaptive systems. It serves as a toolbox for real world applications and basic research in computational intelligence and machine learning. The library provides methods for single- and multi-objective optimization, in particular evolutionary and gradient-based algorithms, kernel-based learning methods, neural networks, and many other machine learning techniques. Its main design criteria are flexibility and speed. Here we restrict the description of SHARK to its core components, albeit the library contains plenty of additional functionality. Further information can be obtained from the HTML documentation and tutorials. More than 60 illustrative example programs serve as starting points for using SHARK. 2. Basic Tools—Rng, Array, and LinAlg The library provides general auxiliary functions and data structures for the development of machine learning algorithms. The Rng module generates reproducible and platform independent sequences of pseudo random numbers, which can be drawn from 14 predefined discrete and continuous parametric distributions. The Array class provides dynamical array templates of arbitrary type and dimension as well as basic operations acting on these templates. LinAlg implements linear algebra algorithms such as matrix inversion and singular value decomposition. 3. ReClaM—Regression and Classification Methods The goal of the ReClaM module is to provide machine learning algorithms for supervised classification and regression in a unified, modular framework. It is built like a construction kit, where the main building blocks are adaptive data processing models, error functions, and optimization …
5 0.064632691 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
Author: Sungwook Yoon, Alan Fern, Robert Givan
Abstract: A number of today’s state-of-the-art planners are based on forward state-space search. The impressive performance can be attributed to progress in computing domain independent heuristics that perform well across many domains. However, it is easy to find domains where such heuristics provide poor guidance, leading to planning failure. Motivated by such failures, the focus of this paper is to investigate mechanisms for learning domain-specific knowledge to better control forward search in a given domain. While there has been a large body of work on inductive learning of control knowledge for AI planning, there is a void of work aimed at forward-state-space search. One reason for this may be that it is challenging to specify a knowledge representation for compactly representing important concepts across a wide range of domains. One of the main contributions of this work is to introduce a novel feature space for representing such control knowledge. The key idea is to define features in terms of information computed via relaxed plan extraction, which has been a major source of success for non-learning planners. This gives a new way of leveraging relaxed planning techniques in the context of learning. Using this feature space, we describe three forms of control knowledge—reactive policies (decision list rules and measures of progress) and linear heuristics—and show how to learn them and incorporate them into forward state-space search. Our empirical results show that our approaches are able to surpass state-of-the-art nonlearning planners across a wide range of planning competition domains. Keywords: planning, machine learning, knowledge representation, search
6 0.052355096 65 jmlr-2008-Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study
7 0.039950024 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
8 0.037832242 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
9 0.033288028 9 jmlr-2008-Active Learning by Spherical Subdivision
10 0.032012172 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
11 0.0315619 53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression
12 0.029438512 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
13 0.029355982 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
14 0.028573895 94 jmlr-2008-Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management
15 0.026290137 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
16 0.026173966 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
17 0.026171459 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes
18 0.02443506 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
19 0.023576504 58 jmlr-2008-Max-margin Classification of Data with Absent Features
20 0.021806033 96 jmlr-2008-Visualizing Data using t-SNE
topicId topicWeight
[(0, 0.135), (1, -0.03), (2, -0.059), (3, -0.036), (4, -0.237), (5, -0.18), (6, -0.117), (7, 0.022), (8, 0.033), (9, 0.115), (10, -0.046), (11, 0.045), (12, 0.039), (13, 0.01), (14, 0.148), (15, -0.014), (16, 0.184), (17, 0.167), (18, 0.109), (19, 0.077), (20, -0.021), (21, 0.156), (22, -0.064), (23, -0.03), (24, 0.192), (25, 0.025), (26, -0.105), (27, 0.091), (28, -0.021), (29, -0.106), (30, 0.066), (31, -0.083), (32, -0.087), (33, 0.051), (34, -0.02), (35, 0.056), (36, -0.07), (37, -0.024), (38, 0.262), (39, 0.083), (40, 0.104), (41, 0.086), (42, -0.056), (43, 0.09), (44, -0.089), (45, -0.117), (46, 0.1), (47, -0.073), (48, 0.006), (49, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.95489627 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
2 0.61475486 85 jmlr-2008-Shark (Machine Learning Open Source Software Paper)
3 0.56706071 90 jmlr-2008-Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective
4 0.29643765 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
Author: Avraham Bab, Ronen I. Brafman
Abstract: Multi Agent Reinforcement Learning (MARL) has received continually growing attention in the past decade. Many algorithms that vary in their approaches to the different subtasks of MARL have been developed. However, the theoretical convergence results for these algorithms do not give a clue as to their practical performance nor supply insights to the dynamics of the learning process itself. This work is a comprehensive empirical study conducted on MGS, a simulation system developed for this purpose. It surveys the important algorithms in the field, demonstrates the strengths and weaknesses of the different approaches to MARL through application of FriendQ, OAL, WoLF, FoeQ, Rmax, and other algorithms to a variety of fully cooperative and fully competitive domains in self and heterogeneous play, and supplies an informal analysis of the resulting learning processes. The results can aid in the design of new learning algorithms, in matching existing algorithms to specific tasks, and may guide further research and formal analysis of the learning processes. Keywords: reinforcement learning, multi-agent reinforcement learning, stochastic games
6 0.24546625 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
7 0.23073533 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
8 0.22384681 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
9 0.16519479 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
10 0.16309278 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
11 0.15790874 94 jmlr-2008-Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management
12 0.15707813 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
13 0.15477608 9 jmlr-2008-Active Learning by Spherical Subdivision
14 0.14364828 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
15 0.13849379 60 jmlr-2008-Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis
16 0.12216302 96 jmlr-2008-Visualizing Data using t-SNE
17 0.121117 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
18 0.11842878 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
19 0.11613962 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
20 0.10940287 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
topicId topicWeight
[(0, 0.013), (5, 0.025), (13, 0.382), (24, 0.048), (34, 0.019), (40, 0.031), (54, 0.055), (58, 0.031), (66, 0.031), (72, 0.015), (73, 0.011), (76, 0.026), (78, 0.027), (88, 0.073), (92, 0.035), (94, 0.039), (99, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.7777971 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
2 0.31275269 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
Author: Thomas G. Dietterich, Guohua Hao, Adam Ashenfelter
Abstract: Conditional random fields (CRFs) provide a flexible and powerful model for sequence labeling problems. However, existing learning algorithms are slow, particularly in problems with large numbers of potential input features and feature combinations. This paper describes a new algorithm for training CRFs via gradient tree boosting. In tree boosting, the CRF potential functions are represented as weighted sums of regression trees, which provide compact representations of feature interactions. So the algorithm does not explicitly consider the potentially large parameter space. As a result, gradient tree boosting scales linearly in the order of the Markov model and in the order of the feature interactions, rather than exponentially as in previous algorithms based on iterative scaling and gradient descent. Gradient tree boosting also makes it possible to use instance weighting (as in C4.5) and surrogate splitting (as in CART) to handle missing values. Experimental studies of the effectiveness of these two methods (as well as standard imputation and indicator feature methods) show that instance weighting is the best method in most cases when feature values are missing at random. Keywords: sequential supervised learning, conditional random fields, functional gradient, gradient tree boosting, missing values
3 0.30544496 90 jmlr-2008-Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective
4 0.29302713 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
Author: Hsuan-Tien Lin, Ling Li
Abstract: Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice. Keywords: ensemble learning, boosting, support vector machine, kernel
5 0.28751943 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
Author: Rémi Munos, Csaba Szepesvári
Abstract: In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence rate results obtained allow us to show that both versions of FVI are well behaving in the sense that by using a sufficiently large number of samples for a large class of MDPs, arbitrary good performance can be achieved with high probability. An important feature of our proof technique is that it permits the study of weighted L p -norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds show a dependence on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation power of the function space, the inherent Bellman residual, which reflects how well the function space is “aligned” with the dynamics and rewards of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings. Keywords: fitted value iteration, discounted Markovian decision processes, generative model, reinforcement learning, supervised learning, regression, Pollard’s inequality, statistical learning theory, optimal control
6 0.276012 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
7 0.27586612 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
8 0.27528939 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
9 0.27461696 9 jmlr-2008-Active Learning by Spherical Subdivision
10 0.27392346 83 jmlr-2008-Robust Submodular Observation Selection
11 0.27260703 58 jmlr-2008-Max-margin Classification of Data with Absent Features
12 0.27163807 86 jmlr-2008-SimpleMKL
13 0.27123821 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
14 0.27082953 56 jmlr-2008-Magic Moments for Structured Output Prediction
15 0.26990262 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
16 0.26957139 96 jmlr-2008-Visualizing Data using t-SNE
17 0.26919603 57 jmlr-2008-Manifold Learning: The Price of Normalization
18 0.26898855 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
19 0.26632792 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers