jmlr jmlr2010 jmlr2010-37 knowledge-graph by maker-knowledge-mining

37 jmlr-2010-Evolving Static Representations for Task Transfer


Source: pdf

Author: Phillip Verbancsics, Kenneth O. Stanley

Abstract: An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird’s eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. [sent-8, score-0.312]

2 In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. [sent-9, score-0.312]

3 The idea that the input (i.e., state) representation might remain static during transfer is plausible because the raw inputs to biological organisms, for example, vision, remain the same even when new tasks are confronted. [sent-41, score-0.5]

4 The main idea in this paper is that such a static representation, when possible, facilitates transfer by ensuring that the semantics of the representation are preserved even when the task changes. [sent-43, score-0.499]

5 To demonstrate the critical role of static representation in transfer, a novel state representation is introduced called a bird’s eye view (BEV), which is a two-dimensional depiction of objects on the ground from above. [sent-44, score-0.361]

6 However, more importantly, unlike any method so far, HyperNEAT can transfer from 3 vs. 2 to 4 vs. 3 Keepaway. [sent-70, score-0.263]

7 Additional types of transfer within Keepaway are investigated wherein the representation of the policy (i.e., the substrate) is varied. [sent-74, score-0.443]

8 Finally, cross-domain transfer is demonstrated by training on a distinctly different domain, Knight Joust (Taylor et al.). [sent-77, score-0.263]

9 The main result is that transfer through a static representation is consistently more robust and often provides immediate benefits even without any further training. [sent-80, score-0.448]

10 Thus, while machine learning often focuses on the learning algorithm, the hope is that this paper provokes a fruitful conversation on the role of representation in transfer and learning in general. [sent-82, score-0.312]

11 The next section describes the importance of representation in learning, prior research in transfer learning, and the methods that underlie the BEV representation. [sent-85, score-0.312]

12 In Section 4, the experiments that investigate the performance of the BEV in learning and transfer are described. [sent-87, score-0.263]

13 Background: This section examines the critical role of representation in RL and then explains the geometry-based methods that underlie the static representation investigated in this paper, and their relation to task transfer. [sent-90, score-0.285]

14 For example, in the Keepaway soccer domain, the state space S for the keeper with the ball can be defined as the set of distances and angles to each other player. [sent-97, score-0.317]
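
To make that growth concrete, here is a minimal Python sketch (an illustration with hypothetical names, not the paper's exact feature list) of the distance-and-angle representation; the feature vector lengthens with every added player, which is why this representation cannot stay fixed across 3 vs. 2 and 4 vs. 3:

    import math

    def keepaway_features(ball_holder, others):
        # Distance and angle from the keeper with the ball to each other
        # player; positions are (x, y) tuples. Function and argument
        # names are hypothetical.
        bx, by = ball_holder
        features = []
        for px, py in others:
            features.append(math.hypot(px - bx, py - by))   # distance
            features.append(math.atan2(py - by, px - bx))   # angle (radians)
        return features

    # 3 vs. 2 has 4 other players (8 features); 4 vs. 3 has 6 (12 features),
    # so the input dimensionality changes with the task.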

15 A simple policy π would be to pass to the most open teammate when takers are close and hold the ball otherwise. [sent-102, score-0.296]

16 One popular approach to state representation, for example, in the RoboCup Keepaway soccer domain, is to express the state as distances and angles to the other players relative to the agent with the ball (Metzen et al.). [sent-104, score-0.426]

17 These additional parameters mean that the same representation cannot be applied to both tasks, thereby complicating the transfer of knowledge between tasks. [sent-115, score-0.312]

18 This transfer could be across objects in the domain or across different tasks. [sent-127, score-0.328]

19 Task Transfer: Task transfer means applying knowledge learned in one task to a new, related task (Caruana, 1997; Talvitie and Singh, 2007; Taylor et al.). [sent-146, score-0.393]

20 Thus the capability to transfer is becoming increasingly important as the tasks studied in RL increase in complexity. [sent-150, score-0.291]

21 However, transfer learning faces several challenges: First, transfer is only effective among compatible tasks and the particular knowledge that can transfer from one task to another must be identified. [sent-151, score-0.868]

22 Second, a method must be derived to actually implement the transfer of knowledge. [sent-152, score-0.263]

23 Finally, cases in which transfer hinders performance, or negative transfer, must be avoided (Pan and Yang, 2008). [sent-153, score-0.263]

24 There are several types of transfer learning problems and a variety of methods that exploit their characteristics. [sent-154, score-0.263]

25 These include transforming learned policies between tasks (Taylor et al., 2007a), choosing the best policy for the current task from a set of previously learned policies (Talvitie and Singh, 2007), and extracting advice from previously learned tasks (Torrey et al.). [sent-157, score-0.359]

26 An intuitive approach to transfer learning is to transform the representation of knowledge learned in one task to a suitable form for a new task and then continue learning from that point. [sent-160, score-0.442]

27 A successful method that takes this approach is transfer via inter-task mapping for policy search methods (TVITM-PS; Taylor et al.). [sent-161, score-0.394]

28 TVITM-PS is a leading method for transforming the policy learned in the source task into a policy usable in the target task. [sent-163, score-0.341]

29 In TVITM-PS, a transfer functional ρ is defined to transform the policy π for a source task into the policy for a target task, such that ρ(π_source) = π_target. [sent-164, score-0.576]
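
A hedged sketch of what such a functional might look like, assuming the policy is stored as a simple weight table and the inter-task mappings are built by hand (all names here are illustrative, not TVITM-PS's actual implementation):

    def rho(pi_source, chi_state, chi_action):
        # Initialize each target weight from the source weight of the
        # state/action pair it maps back to. pi_source is a dict of
        # (state_var, action) -> weight; chi_state and chi_action map
        # target variables to their source counterparts.
        pi_target = {}
        for t_state, s_state in chi_state.items():
            for t_action, s_action in chi_action.items():
                pi_target[(t_state, t_action)] = pi_source[(s_state, s_action)]
        return pi_target

Target variables with no source counterpart are exactly the ambiguous cases that motivate further training after transfer.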

30 TVITM-PS is a milestone in task transfer because it introduces a formal approach to moving from one domain to another that defines how ambiguous variables in the target domain should be treated. [sent-173, score-0.372]

31 Alternating trusting Exploration and suspicious exploitation (AtEase; Talvitie and Singh, 2007) is one such transfer method; it aims to recognize when tasks are related and when to exploit knowledge gained from previous tasks. [sent-178, score-0.291]

32 An important consideration in transfer is whether a human can understand the knowledge being transferred among tasks. [sent-191, score-0.306]

33 Through observation, it may be apparent that a learned policy always passes the ball if opponents approach within one meter, which may then be transformed into a rule to transfer to another task. [sent-203, score-0.486]

34 Instead, knowledge may transfer among several tasks that are simultaneously being learned. [sent-206, score-0.291]

35 This paper adds to our understanding of task transfer by focusing on the role of representation. [sent-213, score-0.314]

36 The main idea in HyperNEAT is that it is possible to learn geometric relationships in the domain through an indirect encoding that describes how the connectivity of the ANN can be generated as a function of the domain geometry. [sent-266, score-0.269]

37 That is, HyperNEAT discovers the regularities in the domain geometry and learns a policy based on them. [sent-272, score-0.276]

38 (1) Every connection between layers in the substrate is queried by the CPPN to determine its weight; the line connecting the layers in the substrate represents one such sample connection. [sent-291, score-0.44]
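
A minimal sketch of this querying loop, assuming the CPPN is exposed as a callable of the four endpoint coordinates; thresholding small outputs to prune connections is a common HyperNEAT convention assumed here rather than stated in this excerpt:

    import numpy as np

    def build_substrate_weights(cppn, res, threshold=0.2):
        # Query the CPPN once per candidate connection between the input
        # and output planes; its output becomes the connection weight.
        # Node coordinates span [-1, 1] in each dimension.
        axis = np.linspace(-1.0, 1.0, res)
        coords = [(x, y) for x in axis for y in axis]
        weights = np.zeros((len(coords), len(coords)))
        for i, (x1, y1) in enumerate(coords):
            for j, (x2, y2) in enumerate(coords):
                w = cppn(x1, y1, x2, y2)
                if abs(w) > threshold:      # prune weak connections
                    weights[i, j] = w
        return weights

    # Toy stand-in for an evolved CPPN, expressing one geometric regularity.
    toy_cppn = lambda x1, y1, x2, y2: float(np.exp(-((x1 - x2) ** 2 + (y1 - y2) ** 2)))
    W = build_substrate_weights(toy_cppn, res=5)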

39 As a rule of thumb, nodes are placed on the substrate to reflect the geometry of the domain. [sent-306, score-0.298]

40 This way, the connectivity of the substrate becomes a direct function of the domain geometry, which means that knowledge about the problem can be injected into the search and HyperNEAT can exploit the regularities. [sent-311, score-0.296]

41 Approach: Bird's Eye View. A major challenge for the state representation in RL tasks is that specific state variables are often tied to agents or individual objects, which makes it difficult to add more such objects without expanding the state space (Taylor et al.). [sent-328, score-0.295]

42 Because the domain is two-dimensional (e.g., in soccer), the geometry of the input layer of the substrate is made two-dimensional, as in Figure 5. [sent-374, score-0.303]

43 Each dimension ranges over [−1, 1], and the input and output planes of the substrate are constructed identically to take advantage of geometric regularities between states and actions. [sent-388, score-0.265]

44 That way, task transfer to different numbers of players is made simple through the static representation. [sent-395, score-0.55]

45 For example, if the substrate resolution is 20 × 20, then the number of possible connections in the substrate is 400 × 400 = 160,000. [sent-410, score-0.439]
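
The fourth-power growth is easy to verify, and it is what makes evolving each weight directly impractical while the indirect encoding stays compact:

    # Connection count grows with the fourth power of the substrate's
    # linear resolution.
    for res in (10, 20, 40):
        nodes = res * res
        print(f"{res}x{res} substrate: {nodes * nodes:,} possible connections")
    # 10x10: 10,000; 20x20: 160,000; 40x40: 2,560,000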

46 Of course, some representations are better suited to transfer in a given domain than others. [sent-419, score-0.319]

47 Further, the ability to transfer between tasks is dependent on the similarity of the tasks. [sent-420, score-0.291]

48 However, this paper focuses on the idea that a particularly effective representation for transfer is one that does not need to change from one task to the next. [sent-421, score-0.363]

49 Takers follow static policies, wherein the first two takers go towards the ball and additional takers attempt to block open keepers. [sent-453, score-0.352]

50 These include each player’s distance to the center of the field, the distance from the keeper with the ball to each other player, the distance from each other keeper to the closest taker, and the minimum angle between the other keepers and the takers (Figure 7). [sent-459, score-0.368]

51 To investigate the ability of a static representation, that is, the HyperNEAT BEV, to learn this task, it is compared to both static policies (Stone et al.) and learning methods. [sent-467, score-0.338]

52 The static benchmarks are Always-Hold, Random, and a Hand-Coded policy, which holds the ball if no takers are within 10m (Stone and Sutton, 2001). [sent-473, score-0.276]
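
A sketch of that Hand-Coded benchmark; only the hold condition is specified above, so the pass-target choice below (nearest teammate) is an illustrative placeholder:

    import math

    def hand_coded_policy(ball_holder, takers, teammates, hold_radius=10.0):
        # Hold the ball if no taker is within hold_radius meters (10m, per
        # Stone and Sutton, 2001); otherwise pass. Passing to the nearest
        # teammate is a placeholder, not the benchmark's actual pass rule.
        bx, by = ball_holder
        if all(math.hypot(tx - bx, ty - by) > hold_radius for tx, ty in takers):
            return ("hold", None)
        target = min(teammates, key=lambda p: math.hypot(p[0] - bx, p[1] - by))
        return ("pass", target)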

53 These static benchmarks provide a baseline to validate that the BEV learns a non-trivial policy in the initial task. [sent-474, score-0.267]

54 The keeper with the ball is the small square, other keepers are circles, and the takers are triangles. [sent-508, score-0.297]

55 Unlike the HyperNEAT BEV, TVITM-PS requires further training after transfer because ρ expands the ANN by adding new state variables. [sent-559, score-0.303]

56 The first is transfer to increasing field sizes, which is evaluated by first training individuals on a small (15m×15m) field size and then testing trained individuals on the trained and larger field sizes (each of 15m×15m, 20m×20m, and 25m×25m). [sent-561, score-0.469]

57 That is, if the field size is 15m×15m, the substrate is 15 × 15; if it is 25m×25m, the substrate is 25 × 25. [sent-564, score-0.386]

58 Second, transfer to substrates of different resolutions is evaluated by training individuals on a single field size, then doubling the resolution in each dimension of the substrate. [sent-569, score-0.594]

59 For example, an individual trained on a 20m×20m field with a 20 × 20 substrate is reevaluated on a substrate changed to 40 × 40. [sent-571, score-0.424]
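
Because the CPPN generates weights as a function of coordinates rather than storing one parameter per connection, re-deriving the substrate at the doubled resolution is just a second query pass. A sketch, reusing build_substrate_weights and toy_cppn from the earlier example:

    # Same CPPN, two substrate resolutions -- no retraining needed.
    W_trained = build_substrate_weights(toy_cppn, res=20)  # trained resolution
    W_doubled = build_substrate_weights(toy_cppn, res=40)  # reevaluated at 40 x 40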

60 The higher-resolution BEV is then tested on the same field size to evaluate the ability to transfer knowledge between substrate resolutions. [sent-575, score-0.509]

61 The name Knight Joust reflects that the player is allowed three potential moves: move forward, knight jump left, and knight jump right, where a knight jump is two steps in the direction left or right and then forward (as in chess). [sent-582, score-0.46]
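
These three actions reduce to fixed grid offsets; a sketch that takes +y as the player's forward direction (an orientation assumed for illustration):

    # (dx, dy) grid offsets for the three Knight Joust actions.
    MOVES = {
        "forward":    (0, 1),
        "jump_left":  (-2, 1),   # two steps left, then one forward
        "jump_right": (2, 1),    # two steps right, then one forward
    }

    def apply_move(pos, move):
        x, y = pos
        dx, dy = MOVES[move]
        return (x + dx, y + dy)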

62 This state information can similarly be drawn on the substrate of the BEV by marking the position of the player, opponent, the path between them, and the paths to the corners. [sent-588, score-0.278]
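
A minimal sketch of that drawing step, with illustrative marker values and the paths to the corners omitted for brevity (neither detail is specified in this excerpt):

    import numpy as np

    def draw_knight_joust_bev(player, opponent, res=20):
        # Mark the player, the opponent, and the straight path between
        # them on a res x res input plane; positions are integer grid
        # coordinates in [0, res - 1]. Values are illustrative.
        grid = np.zeros((res, res))
        for t in np.linspace(0.0, 1.0, num=2 * res):
            px = player[0] + t * (opponent[0] - player[0])
            py = player[1] + t * (opponent[1] - player[1])
            grid[int(round(py)), int(round(px))] = 0.5      # path trace
        grid[player[1], player[0]] = 1.0                    # player marker
        grid[opponent[1], opponent[0]] = -1.0               # opponent marker
        return grid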

63 The evaluation of cross-domain transfer is completed by first training for 20 generations in the Knight Joust domain. [sent-603, score-0.332]

64 This experiment is interesting because it can help to show that static transfer is beneficial with the BEV even in cases where the input semantics of the two tasks differ slightly. [sent-609, score-0.427]

65 Results: This section describes the results of training the BEV on the Keepaway benchmark, the transfer performance among variations of the Keepaway task, and finally the performance of the BEV in cross-domain transfer from Knight Joust to Keepaway. [sent-611, score-0.526]

66 In 3 vs. 2 Keepaway on the 20m×20m field, the best keepers from each of the five runs controlled by a BEV substrate trained by HyperNEAT maintain possession of the ball on average for 15. [sent-629, score-0.401]

67 Keepaway Transfer Results: In transfer learning, the main focus of this work, the BEV is evaluated by testing individuals trained for 20 generations only on the 3 vs. 2 task. [sent-642, score-0.435]

68 This contrasts with the approach of Taylor et al. (2007b), in which teams trained on the smaller task are further trained on the larger task after the transfer because new parameters are added. [sent-648, score-0.441]

69 As performance on the 3 vs. 2 task improves, the transfer performance on the 4 vs. 3 task improves as well. [sent-671, score-0.314]

70 In contrast, the previous best approach to transfer learning in this domain required executing a transfer function and additional training for between 50 and 200 hours (depending on the chosen transfer function) beyond the initial bootstrap training in 3 vs. 2. [sent-684, score-0.818]

71 Thus, because the BEV is static, transfer is instantaneous and requires no special adjustments to the representation to achieve the same result as many hours of further training with the TVITM-PS transfer method. [sent-688, score-0.575]

72 Thus transfer learning is also evaluated by testing the best policy trained in 3 vs. 2 Keepaway. [sent-737, score-0.432]

73 The substrate resolutions of the champion individuals from five runs, trained on three field sizes (15m×15m, 20m×20m, and 25m×25m), are doubled in each dimension and then tested again on the same field size. [sent-776, score-0.336]

74 The regularities learned by the indirect encoding are not dependent on the particular substrate resolution and may be extrapolated to higher resolutions. [sent-798, score-0.429]

75 Knight Joust Transfer Results: Cross-domain transfer is evaluated from the non-Keepaway task of Knight Joust on a 20 × 20 grid to 3 vs. 2 Keepaway. [sent-801, score-0.314]

76 After one generation of evolution, the best individuals from transfer exceed the raw performance by 0. [sent-808, score-0.328]

77 Finally, after ten further generations, the best individuals with transfer hold the ball for 1. [sent-810, score-0.392]

78 Discussion and Future Work: Methods that alter representation remain important tools in task transfer for domains in which the representation must change with the task. [sent-819, score-0.412]

79 The deeper lesson is the critical role of representation in transfer and the consequent need for algorithms that can learn from relatively high-dimensional static representations of task geometry. [sent-821, score-0.526]

80 Thus performance on Keepaway, both instantaneous and with further training, benefits from transfer from the Knight Joust domain with significance p < 0. [sent-836, score-0.292]

81 HyperNEAT substrates with hidden layers have been shown to work in the past in domains without transfer (D'Ambrosio and Stanley, 2008; Clune et al.). [sent-839, score-0.31]

82 1762 E VOLVING S TATIC R EPRESENTATIONS FOR TASK T RANSFER The role of representation in transfer is relevant to all approaches to learning because transfer is always an option for extending the scope of learning. [sent-854, score-0.575]

83 Additionally, the static nature of the representation allows the same policy to train on multiple tasks simultaneously. [sent-859, score-0.344]

84 For example, a soccer player does not practice by playing only soccer games. [sent-860, score-0.366]

85 For example, in this paper the policy is encoded by a CPPN that is expressed as a function of the task geometry, which enables the solution to exploit regularities in the geometry and extrapolate to previously unseen areas of the geometry. [sent-863, score-0.298]

86 Prospects for Full RoboCup Soccer: An exciting implication of this work is that the power of static transfer and indirect encoding can potentially bootstrap learning the complete game of soccer. [sent-870, score-0.514]

87 The static BEV state representation enables the learned policy to transfer to variations of the task in which the number of players is changed (e.g., from 3 vs. 2 to 4 vs. 3). [sent-873, score-0.798]

88 Furthermore, indirectly encoding the policy enables the same policy to be applied to variations of the task in which the geometry has been changed (e.g., a different field size or substrate resolution). [sent-878, score-0.457]

89 Interestingly, the Keepaway domain was designed as a stepping stone to scaling machine learning methods to the full RoboCup soccer domain (Stone and Sutton, 2001). [sent-883, score-0.308]

90 The same principles that enable the BEV to transfer among variations of the Keepaway domain also can potentially enable the BEV to scale to full Keepaway soccer. [sent-884, score-0.292]

91 For example, because the representation remains static no matter how many players are on the field, training can begin with a small number of players, such as 3 vs. 2. [sent-885, score-0.285]

92 Furthermore, varying the substrate configuration while the solution encoding remains static makes it possible to train skills relevant to RoboCup on subsets of the full field, for example, half-field offense/defense. [sent-888, score-0.397]

93 In this way, varying the number of players and varying the field size are both required to transfer from the RoboCup Keepaway domain to full RoboCup soccer. [sent-889, score-0.392]

94 Thus an interesting property of the BEV is that the state space can transfer, by accommodating new players or field sizes, and the action space can also transfer in the same way. [sent-900, score-0.431]

95 Ultimately, the promise of such transfer is tied to the idea of static representation, whose potential was highlighted in this paper. [sent-901, score-0.399]

96 Conclusion: This paper introduced the BEV representation, which simplifies task transfer by making the state representation static. [sent-903, score-0.403]

97 In addition to results competitive with leading methods on the Keepaway benchmark, the BEV, which is enabled by an indirect encoding, achieved transfer learning from 3 vs. 2 to 4 vs. 3 Keepaway. [sent-908, score-0.31]

98 The hope is that advanced representations in conjunction with indirect encoding can later contribute to scaling learning techniques to more challenging tasks, such as the complete RoboCup soccer domain. [sent-918, score-0.284]

99 Performance evaluation of EANT in the RoboCup Keepaway benchmark. [sent-1051, score-0.476]

100 Advice taking and transfer learning: Naturally inspired extensions to reinforcement learning. [sent-1198, score-0.314]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bev', 0.492), ('keepaway', 0.324), ('transfer', 0.263), ('cppn', 0.258), ('hyperneat', 0.218), ('neat', 0.193), ('substrate', 0.193), ('robocup', 0.152), ('soccer', 0.142), ('static', 0.136), ('policy', 0.131), ('stanley', 0.126), ('knight', 0.126), ('joust', 0.122), ('stone', 0.108), ('players', 0.1), ('gauci', 0.096), ('cppns', 0.086), ('keepers', 0.086), ('eld', 0.084), ('player', 0.082), ('erbancsics', 0.081), ('ransfer', 0.081), ('tanley', 0.081), ('tatic', 0.081), ('volving', 0.081), ('sarsa', 0.078), ('takers', 0.076), ('geometry', 0.076), ('keeper', 0.071), ('generations', 0.069), ('epresentations', 0.069), ('encoding', 0.068), ('policies', 0.066), ('individuals', 0.065), ('ball', 0.064), ('agents', 0.062), ('evolutionary', 0.06), ('ann', 0.058), ('resolution', 0.053), ('taylor', 0.052), ('actions', 0.051), ('reinforcement', 0.051), ('task', 0.051), ('eye', 0.051), ('neuroevolution', 0.051), ('representation', 0.049), ('indirect', 0.047), ('opponent', 0.047), ('sutton', 0.045), ('transferred', 0.043), ('metzen', 0.041), ('whiteson', 0.041), ('state', 0.04), ('kenneth', 0.04), ('agent', 0.04), ('seconds', 0.04), ('regularities', 0.04), ('trained', 0.038), ('objects', 0.036), ('miikkulainen', 0.035), ('topologies', 0.035), ('layer', 0.034), ('connectivity', 0.034), ('geometric', 0.032), ('evolving', 0.031), ('eant', 0.03), ('multiagent', 0.03), ('anns', 0.03), ('rl', 0.03), ('relationships', 0.03), ('domain', 0.029), ('relational', 0.029), ('tasks', 0.028), ('action', 0.028), ('learned', 0.028), ('bird', 0.027), ('advice', 0.027), ('layers', 0.027), ('representations', 0.027), ('population', 0.026), ('ambrosio', 0.025), ('champion', 0.025), ('clune', 0.025), ('risto', 0.025), ('tadepalli', 0.025), ('taker', 0.025), ('talvitie', 0.025), ('teammate', 0.025), ('teammates', 0.025), ('torrey', 0.025), ('paths', 0.025), ('inputs', 0.024), ('reward', 0.024), ('peter', 0.023), ('champions', 0.02), ('marking', 0.02), ('possession', 0.02), ('shimon', 0.02), ('substrates', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 37 jmlr-2010-Evolving Static Representations for Task Transfer

Author: Phillip Verbancsics, Kenneth O. Stanley

Abstract: An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird’s eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.

2 0.059505332 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning

Author: Thomas Jaksch, Ronald Ortner, Peter Auer

Abstract: For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). √ ˜ We present a reinforcement learning algorithm with total regret O(DS AT ) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of √ Ω( DSAT ) on the total regret of any learning algorithm is given as well. These results are complemented by a sample complexity bound on the number of suboptimal steps taken by our algorithm. This bound can be used to achieve a (gap-dependent) regret bound that is logarithmic in T . Finally, we also consider a setting where the MDP is allowed to change a fixed number of ℓ times. We present a modification of our algorithm that is able to deal with this setting and show a √ ˜ regret bound of O(ℓ1/3 T 2/3 DS A). Keywords: undiscounted reinforcement learning, Markov decision process, regret, online learning, sample complexity

3 0.04491172 93 jmlr-2010-PyBrain

Author: Tom Schaul, Justin Bayer, Daan Wierstra, Yi Sun, Martin Felder, Frank Sehnke, Thomas Rückstieß, Jürgen Schmidhuber

Abstract: PyBrain is a versatile machine learning library for Python. Its goal is to provide flexible, easyto-use yet still powerful algorithms for machine learning tasks, including a variety of predefined environments and benchmarks to test and compare algorithms. Implemented algorithms include Long Short-Term Memory (LSTM), policy gradient methods, (multidimensional) recurrent neural networks and deep belief networks. Keywords: Python, neural networks, reinforcement learning, optimization

4 0.041060634 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?

Author: Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, Samy Bengio

Abstract: Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results obtained in several areas, mostly on vision and language data sets. The best results obtained on supervised learning tasks involve an unsupervised learning component, usually in an unsupervised pre-training phase. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. The main question investigated here is the following: how does unsupervised pre-training work? Answering this questions is important if learning in deep architectures is to be further improved. We propose several explanatory hypotheses and test them through extensive simulations. We empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples. The experiments confirm and clarify the advantage of unsupervised pre-training. The results suggest that unsupervised pretraining guides the learning towards basins of attraction of minima that support better generalization from the training data set; the evidence from these results supports a regularization explanation for the effect of pre-training. Keywords: deep architectures, unsupervised pre-training, deep belief networks, stacked denoising auto-encoders, non-convex optimization

5 0.035392825 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning

Author: Evangelos Theodorou, Jonas Buchli, Stefan Schaal

Abstract: With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests to use the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-JacobiBellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI2 ) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs. Keywords: stochastic optimal control, reinforcement learning, parameterized policies

6 0.035283834 2 jmlr-2010-A Convergent Online Single Time Scale Actor Critic Algorithm

7 0.032255072 109 jmlr-2010-Stochastic Composite Likelihood

8 0.030604335 9 jmlr-2010-An Efficient Explanation of Individual Classifications using Game Theory

9 0.028850719 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

10 0.02878494 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

11 0.028555153 107 jmlr-2010-Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion

12 0.027326886 22 jmlr-2010-Classification Using Geometric Level Sets

13 0.027169712 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

14 0.025578266 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

15 0.022709766 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

16 0.022696631 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

17 0.020722667 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting

18 0.020470774 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring

19 0.019846395 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression

20 0.019250698 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.095), (1, 0.018), (2, -0.039), (3, -0.028), (4, 0.022), (5, 0.031), (6, 0.033), (7, -0.027), (8, 0.1), (9, 0.086), (10, 0.102), (11, -0.003), (12, 0.017), (13, -0.094), (14, 0.075), (15, 0.076), (16, 0.055), (17, 0.036), (18, -0.064), (19, -0.017), (20, -0.033), (21, -0.013), (22, 0.028), (23, -0.009), (24, -0.071), (25, -0.032), (26, -0.195), (27, 0.017), (28, 0.137), (29, -0.004), (30, -0.122), (31, 0.025), (32, -0.079), (33, -0.166), (34, -0.007), (35, -0.167), (36, -0.05), (37, -0.078), (38, -0.05), (39, -0.078), (40, -0.083), (41, -0.028), (42, -0.152), (43, -0.093), (44, -0.039), (45, -0.053), (46, -0.146), (47, 0.032), (48, -0.101), (49, 0.314)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9570806 37 jmlr-2010-Evolving Static Representations for Task Transfer

Author: Phillip Verbancsics, Kenneth O. Stanley

Abstract: An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird’s eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.

2 0.56314111 93 jmlr-2010-PyBrain

Author: Tom Schaul, Justin Bayer, Daan Wierstra, Yi Sun, Martin Felder, Frank Sehnke, Thomas Rückstieß, Jürgen Schmidhuber

Abstract: PyBrain is a versatile machine learning library for Python. Its goal is to provide flexible, easyto-use yet still powerful algorithms for machine learning tasks, including a variety of predefined environments and benchmarks to test and compare algorithms. Implemented algorithms include Long Short-Term Memory (LSTM), policy gradient methods, (multidimensional) recurrent neural networks and deep belief networks. Keywords: Python, neural networks, reinforcement learning, optimization

3 0.30560213 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning

Author: Thomas Jaksch, Ronald Ortner, Peter Auer

Abstract: For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). √ ˜ We present a reinforcement learning algorithm with total regret O(DS AT ) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of √ Ω( DSAT ) on the total regret of any learning algorithm is given as well. These results are complemented by a sample complexity bound on the number of suboptimal steps taken by our algorithm. This bound can be used to achieve a (gap-dependent) regret bound that is logarithmic in T . Finally, we also consider a setting where the MDP is allowed to change a fixed number of ℓ times. We present a modification of our algorithm that is able to deal with this setting and show a √ ˜ regret bound of O(ℓ1/3 T 2/3 DS A). Keywords: undiscounted reinforcement learning, Markov decision process, regret, online learning, sample complexity

4 0.29160666 2 jmlr-2010-A Convergent Online Single Time Scale Actor Critic Algorithm

Author: Dotan Di Castro, Ron Meir

Abstract: Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local maximum of the average reward. Linear function approximation is used by the critic in order estimate the value function, and the temporal difference signal, which is passed from the critic to the actor. The main distinguishing feature of the present convergence proof is that both the actor and the critic operate on a similar time scale, while in most current convergence proofs they are required to have very different time scales in order to converge. Moreover, the same temporal difference signal is used to update the parameters of both the actor and the critic. A limitation of the proposed approach, compared to results available for two time scale convergence, is that convergence is guaranteed only to a neighborhood of an optimal value, rather to an optimal value itself. The single time scale and identical temporal difference signal used by the actor and the critic, may provide a step towards constructing more biologically realistic models of reinforcement learning in the brain. Keywords: actor critic, single time scale convergence, temporal difference

5 0.2833668 66 jmlr-2010-Linear Algorithms for Online Multitask Classification

Author: Giovanni Cavallanti, Nicolò Cesa-Bianchi, Claudio Gentile

Abstract: We introduce new Perceptron-based algorithms for the online multitask binary classification problem. Under suitable regularity conditions, our algorithms are shown to improve on their baselines by a factor proportional to the number of tasks. We achieve these improvements using various types of regularization that bias our algorithms towards specific notions of task relatedness. More specifically, similarity among tasks is either measured in terms of the geometric closeness of the task reference vectors or as a function of the dimension of their spanned subspace. In addition to adapting to the online setting a mix of known techniques, such as the multitask kernels of Evgeniou et al., our analysis also introduces a matrix-based multitask extension of the p-norm Perceptron, which is used to implement spectral co-regularization. Experiments on real-world data sets complement and support our theoretical findings. Keywords: mistake bounds, perceptron algorithm, multitask learning, spectral regularization

6 0.25711542 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning

7 0.25360066 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression

8 0.21149622 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis

9 0.1966365 22 jmlr-2010-Classification Using Geometric Level Sets

10 0.18995099 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting

11 0.1883326 109 jmlr-2010-Stochastic Composite Likelihood

12 0.1784313 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization

13 0.17782918 110 jmlr-2010-The SHOGUN Machine Learning Toolbox

14 0.1750841 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?

15 0.16346473 23 jmlr-2010-Classification with Incomplete Data Using Dirichlet Process Priors

16 0.15883557 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

17 0.15844315 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

18 0.13679224 12 jmlr-2010-Analysis of Multi-stage Convex Relaxation for Sparse Regularization

19 0.13399281 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials

20 0.13101654 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.01), (4, 0.015), (8, 0.01), (21, 0.017), (24, 0.016), (25, 0.01), (32, 0.039), (33, 0.018), (34, 0.474), (36, 0.065), (37, 0.043), (38, 0.013), (75, 0.088), (81, 0.018), (85, 0.043), (96, 0.023), (97, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.70681453 37 jmlr-2010-Evolving Static Representations for Task Transfer

Author: Phillip Verbancsics, Kenneth O. Stanley

Abstract: An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird’s eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.

2 0.63795871 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations

Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo

Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as it is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a mor

3 0.25353831 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman

Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation

4 0.2511062 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

Author: Yevgeny Seldin, Naftali Tishby

Abstract: We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved stateof-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high dimensional tensors. According to the bounds, the generalization abilities of treeshaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering easily

5 0.24584168 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods

6 0.24427983 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks

7 0.24415255 56 jmlr-2010-Introduction to Causal Inference

8 0.24348204 63 jmlr-2010-Learning Instance-Specific Predictive Models

9 0.24331588 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming

10 0.24294548 67 jmlr-2010-Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithms and Empirical Evaluation

11 0.2426876 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing

12 0.242006 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization

13 0.24199706 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

14 0.24147262 107 jmlr-2010-Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion

15 0.24139036 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions

16 0.24135718 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking

17 0.24131562 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

18 0.23928952 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

19 0.23908298 69 jmlr-2010-Lp-Nested Symmetric Distributions

20 0.23893137 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project