nips nips2009 nips2009-54 knowledge-graph by maker-knowledge-mining

54 nips-2009-Compositionality of optimal control laws


Source: pdf

Author: Emanuel Todorov

Abstract: We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from certain primitives. The primitives are themselves feedback controllers pursuing their own agendas. They are mixed in proportion to how much progress they are making towards their agendas and how compatible their agendas are with the present task. The resulting composite control law is provably optimal when the problem belongs to a certain class. This class is rather general and yet has a number of unique properties – one of which is that the Bellman equation can be made linear even for non-linear or discrete dynamics. This gives rise to the compositionality developed here. In the special case of linear dynamics and Gaussian noise our framework yields analytical solutions (i.e. non-linear mixtures of LQG controllers) without requiring the final cost to be quadratic. More generally, a natural set of control primitives can be constructed by applying SVD to Green’s function of the Bellman equation. We illustrate the theory in the context of human arm movements. The ideas of optimality and compositionality are both very prominent in the field of motor control, yet they have been difficult to reconcile. Our work makes this possible.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Compositionality of optimal control laws Emanuel Todorov Applied Mathematics and Computer Science & Engineering University of Washington todorov@cs. [sent-1, score-0.547]

2 edu Abstract We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from certain primitives. [sent-3, score-0.485]

3 The primitives are themselves feedback controllers pursuing their own agendas. [sent-4, score-0.326]

4 The resulting composite control law is provably optimal when the problem belongs to a certain class. [sent-6, score-0.788]

5 In the special case of linear dynamics and Gaussian noise our framework yields analytical solutions (i. [sent-9, score-0.156]

6 More generally, a natural set of control primitives can be constructed by applying SVD to Green’s function of the Bellman equation. [sent-12, score-0.512]

7 The ideas of optimality and compositionality are both very prominent in the field of motor control, yet they have been difficult to reconcile. [sent-14, score-0.445]

8 1 Introduction Stochastic optimal control is of interest in many fields of science and engineering, however it remains hard to solve. [sent-16, score-0.403]

9 Dynamic programming [1] and reinforcement learning [2] work well in discrete state spaces of reasonable size, but cannot handle continuous high-dimensional state spaces characteristic of complex dynamical systems. [sent-17, score-0.197]

10 With few exceptions [5, 6] this good-in-general idea is rarely used in optimal control, because it is unclear what/how can be composed in a way that guarantees optimality of the resulting control law. [sent-21, score-0.444]

11 Since the brain remains pretty much the only system capable of solving truly complex control problems, sensorimotor neuroscience is a natural (albeit under-exploited) source of inspiration. [sent-23, score-0.362]

12 To be sure, a satisfactory understanding of the neural control of movement is nowhere in sight. [sent-24, score-0.445]

13 One such idea is that biological movements are near-optimal [7, 8]. [sent-26, score-0.158]

14 For about a century, researchers have been talking about motor synergies or primitives which somehow simplify control [9–11]. [sent-29, score-0.638]

15 However the structure and origin of the hypothetical primitives, the rules for combining them, and the ways in which they actually simplify the control problem remain unclear. [sent-31, score-0.329]

16 2 Stochastic optimal control problems with linear Bellman equations: We will be able to derive compositionality rules for first-exit and finite-horizon stochastic optimal control problems which belong to a certain class. [sent-32, score-1.222]

17 Most notably the optimal control law is found analytically given the optimal cost-to-go, which in turn is the solution to a linear equation obtained from the Bellman equation by exponentiation. [sent-34, score-0.72]

18 The discrete-time problem is defined by a state cost q(x) ≥ 0 describing how (un)desirable different states are, and passive dynamics x' ∼ p(·|x) characterizing the behavior of the system in the absence of controls. [sent-39, score-0.406]

19 The controller can impose any dynamics x' ∼ u(·|x) it wishes; however, it pays a price (control cost) which is the KL divergence between u and p. [sent-40, score-0.284]

20 Thus the discrete-time problem is: dynamics x' ∼ u(·|x); cost rate ℓ(x, u(·|x)) = q(x) + KL(u(·|x) || p(·|x)). Let I denote the set of interior states and B the set of boundary states, and let f(x) ≥ 0, x ∈ B be a final cost. [sent-42, score-0.271]
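The exponentiated Bellman equation for this problem class is linear in the desirability z = exp(−v), so a small tabular instance can be solved with plain linear algebra. The sketch below is a toy random problem, not an example from the paper: the sizes, the random passive dynamics, and the final costs are all illustrative assumptions. It solves z on the interior states and recovers the optimal control law u*(x'|x) = p(x'|x) z(x') / G[z](x), where G[z](x) = Σ_{x'} p(x'|x) z(x').

```python
import numpy as np

rng = np.random.default_rng(0)

# A small first-exit problem in the linearly-solvable class.
# States 0..3 are interior (I), states 4..5 are boundary (B).
nI, nB = 4, 2
P = rng.random((nI, nI + nB))          # passive dynamics p(x'|x), interior rows
P /= P.sum(axis=1, keepdims=True)
q = rng.random(nI)                      # state costs on interior states
f = np.array([0.0, 2.0])                # final costs on boundary states

# Desirability z = exp(-v).  The exponentiated Bellman equation is linear:
#   z_I = diag(exp(-q)) (P_II z_I + P_IB z_B),  with  z_B = exp(-f)
zB = np.exp(-f)
M = np.diag(np.exp(-q))
zI = np.linalg.solve(np.eye(nI) - M @ P[:, :nI], M @ P[:, nI:] @ zB)

# Optimal control law: u*(x'|x) = p(x'|x) z(x') / G[z](x)
z = np.concatenate([zI, zB])
G = P @ z
u = P * z / G[:, None]
```

Each interior row of `u` is a proper transition distribution, tilted from the passive dynamics toward desirable next states.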

21 The continuous-time problem is a control-affine Ito diffusion with control-quadratic cost: dynamics dx = a(x) dt + B(x)(u dt + σ dω); cost rate ℓ(x, u) = q(x) + (1 / 2σ²) ‖u‖². The control u is now a (more traditional) vector and ω is a Brownian motion process. [sent-46, score-0.503]

22 Note that the control cost scaling by σ⁻², which is needed to make the math work, can be compensated by rescaling q. [sent-47, score-0.406]

23 This is done by defining the passive dynamics p(h) (·|x) as the h-step transition probability density of the uncontrolled diffusion (or an Euler approximation to it), and the state cost as q(h) (x) = hq (x). [sent-49, score-0.39]

24 Then, in the limit h → 0, the integral equation exp(q(h)) z = G(h)[z] reduces to the differential equation q z = L[z]. [sent-50, score-0.147]

25 From the formula for KL divergence between Gaussians, the KL control cost in the discrete-time formulation reduces to the quadratic control cost in the continuous-time formulation. [sent-52, score-0.852]

26 3 Compositionality theory The compositionality developed in this section follows from the linearity of equations (1, 3). [sent-56, score-0.328]

27 Consider a collection of K optimal control problems in our class which all have the same dynamics – p (·|x) in discrete time or a (x) , B (x) , σ in continuous time – the same state cost rate q (x) and the same sets I and B of interior and boundary states. [sent-59, score-0.883]

28 These problems differ only in their final costs fk (x). [sent-60, score-0.216]

29 Let z_k(x) denote the desirability function for problem k, and u*_k(·|x) or u*_k(x) the corresponding optimal control law. [sent-61, score-0.886]

30 The latter will serve as primitives for constructing optimal control laws for new problems in our class. [sent-62, score-0.774]

31 Suppose the final cost for the composite problem is f(x), and there exist weights w_k such that f(x) = −log( Σ_{k=1..K} w_k exp(−f_k(x)) ) (4). Thus the functions f_k(x) define a K-dimensional manifold of composite problems. [sent-64, score-0.968]

32 The above condition ensures that for all boundary/terminal states x ∈ B we have z(x) = Σ_{k=1..K} w_k z_k(x) (5). Since z is the solution to a linear equation, if (5) holds on the boundary then it must hold everywhere. [sent-65, score-0.566]
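This composition rule is easy to verify numerically. The sketch below sets up a toy random problem (all sizes, dynamics, and costs are illustrative assumptions, not from the paper), solves each component problem via the linear exponentiated Bellman equation, then checks that the problem with composite final cost (4) has desirability given exactly by the mixture (5):

```python
import numpy as np

def solve_desirability(P, q, zB, nI):
    """Solve the linear (exponentiated) Bellman equation for z on interior states."""
    M = np.diag(np.exp(-q))
    return np.linalg.solve(np.eye(nI) - M @ P[:, :nI], M @ P[:, nI:] @ zB)

rng = np.random.default_rng(1)
nI, nB, K = 5, 3, 2
P = rng.random((nI, nI + nB)); P /= P.sum(axis=1, keepdims=True)
q = rng.random(nI)

# Component problems differ only in their final costs f_k on the boundary.
fk = rng.random((K, nB))
zk = np.array([solve_desirability(P, q, np.exp(-fk[k]), nI) for k in range(K)])

# Composite final cost f = -log(sum_k w_k exp(-f_k))  (eq. 4)
w = np.array([0.3, 0.7])
f = -np.log(w @ np.exp(-fk))
z = solve_desirability(P, q, np.exp(-f), nI)

# By linearity of the Bellman equation, the composite desirability
# is the same mixture of component desirabilities  (eq. 5).
print(np.allclose(z, w @ zk))
```

The check succeeds because the map from boundary desirability z_B to interior desirability z_I is linear, so mixing boundary conditions mixes solutions.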

33 Thus the desirability function for the composite problem is a linear combination of the desirability functions for the component problems. [sent-66, score-0.69]

34 The weights in this linear combination can be interpreted as compatibilities between the control objectives in the component problems and the control objective in the composite problem. [sent-67, score-0.964]

35 The optimal control law for the composite problem is given by (1, 3). [sent-68, score-0.788]

36 The above construction implies that both z and zk are everywhere positive. [sent-69, score-0.303]

37 Indeed if f(x) = −log( Σ_{k=1..K} w_k z_k(x) ) (6) holds for all x ∈ B, then (5) and z(x) > 0 hold everywhere even if z_k(x) ≤ 0 for some k and x. [sent-72, score-0.69]

38 In this case the zk ’s are no longer desirability functions for well-defined optimal control problems. [sent-73, score-0.886]

39 Nevertheless we can think of them as generalized desirability functions with similar meaning: the larger zk (x) is the more compatible state x is with the agenda of component k. [sent-74, score-0.615]

40 3.1 Compositionality of discrete-time control laws: When z_k(x) > 0 the composite control law u* can be expressed as a state-dependent convex combination of the component control laws u*_k. [sent-76, score-1.969]

41 Combining (5, 1) and using the linearity of G, u*(x'|x) = Σ_k [ w_k G[z_k](x) / Σ_s w_s G[z_s](x) ] · [ p(x'|x) z_k(x') / G[z_k](x) ]. The second term above is u*_k. [sent-77, score-0.535]

42 3.2 Compositionality of continuous-time control laws: Substituting (5) in (3) and assuming z_k(x) > 0, the control law given by (3) can be written as u*(x) = Σ_k [ w_k z_k(x) / Σ_s w_s z_s(x) ] · [ σ² B(x)ᵀ (∂/∂x) z_k(x) / z_k(x) ]. The term in brackets is u*_k(x). [sent-81, score-2.335]

43 This is surprising given that in one case the control law directly specifies the probability distribution over next states, while in the other case the control law shifts the mean of the distribution given by the passive dynamics. [sent-83, score-1.095]

44 (A) An LQG problem with quadratic cost-to-go and linear feedback control law. [sent-86, score-0.399]

45 This composite problem is no longer LQG because it has non-quadratic final cost (i. [sent-89, score-0.299]

46 Applying the results from the previous section, the desirability for the composite problem is z(x, t) = Σ_k w_k exp( −(1/2) xᵀ V_k(t) x − α_k(t) ). The optimal control law can now be obtained directly from (3), or via composition from (9). [sent-93, score-1.192]

47 Note that the constants αk (t) do not affect the component control laws (and indeed are rarely computed in the LQG framework) however they affect the composite control law through the mixing weights. [sent-94, score-1.277]

48 We illustrate the above construction on a scalar example with integrator dynamics dx = u dt + 0. [sent-95, score-0.186]

49 The component final costs are of the form f_k(x) = (d_k / 2)(x − c_k)². In order to center these quadratics at c_k rather than 0 we augment the state: x = [x; 1]. [sent-100, score-0.31]

50 Fig 1 shows the optimal cost-to-go functions v (x, t) = − log (z (x, t)) and the optimal control laws u∗ (x, t) for the following problems: {c = 0; d = 5}, {c = −1, 0, 1; d = 5, 0. [sent-102, score-0.621]

51 As expected the cost-to-go is quadratic and the control law is linear with time-varying gain. [sent-108, score-0.532]

52 The control law is no longer linear but instead has an elaborate shape. [sent-110, score-0.492]

53 The third problem (Fig 1C) resembles robust control in the sense that there is a flat region where all states are equally good. [sent-111, score-0.395]

54 The corresponding control law uses feedback to push the state into this flat region. [sent-112, score-0.575]

55 Inside the region the controller does nothing, so as to save energy. [sent-113, score-0.16]

56 5 Constructing minimal sets of primitives via SVD of Green's function: We showed how composite problems can be solved once the solutions to the component problems are available. [sent-115, score-0.593]

57 The choice of component boundary conditions defines the manifold (6) of problems that can be solved exactly. [sent-116, score-0.188]

58 Recall that the vector of desirability values z (x) at interior states x ∈ I, which we denoted zI , satisfies the linear equation (2). [sent-120, score-0.344]

59 A minimal set of primitives corresponds to the best low-rank approximation to G. [sent-123, score-0.211]

60 If we define "best" in terms of least squares, a minimal set of R primitives is obtained by approximating G using the top R singular values: G ≈ U S Vᵀ, where S is an R-by-R diagonal matrix and U and V are |I|-by-R and |B|-by-R orthonormal matrices. [sent-124, score-0.259]

61 If we now set z_B = V·r, which is the r-th column of V, then z_I = G z_B ≈ U S Vᵀ V·r = S_rr U·r. Thus the right singular vectors (columns of V) are the component boundary conditions, while the left singular vectors (columns of U) are the component solutions. [sent-125, score-0.28]
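The SVD construction can be sketched on a toy tabular problem (sizes, random dynamics, and costs are illustrative assumptions, not the 2D disk example from the paper). The Green's function here is the linear map from boundary desirability z_B to interior desirability z_I, obtained by solving the exponentiated Bellman equation; its truncated SVD gives a minimal set of primitives.

```python
import numpy as np

rng = np.random.default_rng(3)
nI, nB = 50, 10
P = rng.random((nI, nI + nB)); P /= P.sum(axis=1, keepdims=True)
q = rng.random(nI)

# Green's function of the linear Bellman equation:  z_I = G z_B
M = np.diag(np.exp(-q))
G = np.linalg.solve(np.eye(nI) - M @ P[:, :nI], M @ P[:, nI:])

# A minimal set of R primitives from the top R singular values.
R = 4
U, S, Vt = np.linalg.svd(G, full_matrices=False)
G_lowrank = U[:, :R] * S[:R] @ Vt[:R]

# Columns of V are component boundary conditions; the matching columns of U
# (scaled by the singular values) are the component solutions.
zB = rng.random(nB)
zI_exact = G @ zB
zI_approx = G_lowrank @ zB
```

Any composite boundary condition in the span of the kept right singular vectors is then solved exactly by mixing the R primitives; other boundary conditions are solved approximately.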

62 The above construction does not use knowledge of the family of composite problems we aim to solve/approximate. [sent-126, score-0.3]

63 The discretization involves |I| = 24520 interior states and |B| = 4163 boundary states. [sent-141, score-0.242]

64 The parametric family of final costs is f(x, θ) = 13 − 13 exp( 5 cos(atan2(x₂, x₁) − θ) − 5 ). This is an inverted von Mises function specifying the desired location where the state should exit the disk. [sent-142, score-0.158]

65 The desirability function z is well approximated in both cases. [sent-149, score-0.214]

66 The difficulty comes from the fact that the components are not always positive, and as a result the composite solution is not always positive. [sent-152, score-0.251]

67 It is notable that the higher-order components (corresponding to smaller singular values) are only modulated near the boundary – which explains why the approximation errors in Fig 2B are near the boundary. [sent-158, score-0.181]

68 In summary, a small number of components are sufficient to construct composite control laws which are near-optimal in most of the state space. [sent-159, score-0.777]

69 Figure 2: Illustration of primitives obtained via SVD. [sent-162, score-0.183]

70 (B) Speed profiles for the movements shown in (A). [sent-169, score-0.158]

71 Note that the same controller generates movements of different duration. [sent-170, score-0.318]

72 (C) Hand paths generated by a composite controller obtained by mixing the optimal controllers for two targets. [sent-171, score-0.62]

73 This controller "decides" online which target to go to. [sent-172, score-0.16]

74 6 Application to arm movements: We are currently working on an optimal control model of arm movements based on compositionality. [sent-173, score-0.853]

75 The dynamics correspond to a 2-link arm moving in the horizontal plane, and have the form τ = M(θ) θ̈ + n(θ, θ̇), where θ contains the shoulder and elbow joint angles, τ is the applied torque, M is the configuration-dependent inertia, and n is the vector of Coriolis, centripetal and viscous forces. [sent-174, score-0.191]

76 This augmentation is needed in order to express reaching movements as a first-exit problem. [sent-181, score-0.199]

77 Without it the movement would stop whenever the instantaneous speed becomes zero – which can happen at reversal points as well as the starting point. [sent-182, score-0.152]

78 Note that most models of reaching movements have assumed predefined final time. [sent-183, score-0.199]

79 However this is unrealistic because we know that movement duration scales with distance, and furthermore such scaling takes place online (i. [sent-184, score-0.145]

80 The above second-order system is expressed in general first-order form, and then the passive dynamics corresponding to τ = 0 are discretized in space and time. [sent-187, score-0.235]

81 Thus we have around 20 million discrete states, and the matrix P characterizing the passive dynamics is 20-million-by-20-million. [sent-192, score-0.264]

82 The speed profiles for these movements are shown in Fig 3B. [sent-196, score-0.194]

83 In particular, it is known that human reaching movements of different amplitude have similar speed profiles around movement onset, and diverge later. [sent-198, score-0.377]

84 Fig 3C shows results for a composite controller obtained by mixing the optimal control laws for two different targets. [sent-199, score-0.979]

85 In this example the targets are sufficiently far away and the final costs are sufficiently steep, thus the mixing yields a switching controller instead of an interpolating controller. [sent-200, score-0.33]

86 Depending on the starting point, this controller takes the hand to one or the other target, and can also switch online if the hand is perturbed. [sent-201, score-0.16]

87 An interpolating controller can be created by placing the targets closer or making the component final costs less steep. [sent-202, score-0.32]

88 In future work we will explore this model in more detail and also build a more realistic model using 3rd-order dynamics (incorporating muscle time constants). [sent-204, score-0.163]

89 7 Summary and relation to prior work We developed a theory of compositionality applicable to a general class of stochastic optimal control problems. [sent-206, score-0.731]

90 Although in this paper we used simple examples, the potential of such compositionality to tackle complex control problems seems clear. [sent-207, score-0.662]

91 While the motivation is similar, PVFs are based on intuitions (mostly from grid worlds divided into rooms) rather than mathematical results regarding optimality of the composite solution. [sent-211, score-0.263]

92 Another difference is that PVFs do not take into account the cost rate q and the boundary B. [sent-213, score-0.181]

93 A particularly important point made by [6] is that the primitives can be only approximately optimal (in this case obtained via local LQG approximations), and yet their combination still produces good results. [sent-218, score-0.304]

94 Maggioni, “Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes,” Journal of Machine Learning Research, vol. [sent-238, score-0.329]

95 Popovic, “Linear Bellman combination for control of character animation,” to appear in SIGGRAPH, 2009. [sent-244, score-0.454]

96 Bizzi, “The construction of movement by the spinal cord,” Nature Neuroscience, vol. [sent-273, score-0.15]

97 Bizzi, “Combinations of muscle synergies in the construction of a natural motor behavior,” Nat. [sent-280, score-0.199]

98 [16] ——, “General duality between optimal control and estimation,” IEEE Conference on Decision and Control, 2008. [sent-295, score-0.403]

99 Kappen, “Linear theory for control of nonlinear stochastic systems,” Physical Review Letters, vol. [sent-303, score-0.368]

100 Todorov, “Eigen-function approximation methods for linearly-solvable optimal control problems,” IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. [sent-306, score-0.403]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('control', 0.329), ('compositionality', 0.289), ('lqg', 0.289), ('zk', 0.269), ('composite', 0.222), ('desirability', 0.214), ('primitives', 0.183), ('fig', 0.181), ('law', 0.163), ('controller', 0.16), ('movements', 0.158), ('wk', 0.152), ('vk', 0.148), ('laws', 0.144), ('bellman', 0.125), ('dynamics', 0.124), ('movement', 0.116), ('pvfs', 0.111), ('todorov', 0.111), ('passive', 0.111), ('zb', 0.107), ('fk', 0.105), ('boundary', 0.104), ('controllers', 0.083), ('nal', 0.081), ('mk', 0.079), ('cost', 0.077), ('ws', 0.075), ('optimal', 0.074), ('svd', 0.072), ('motor', 0.068), ('zs', 0.067), ('arm', 0.067), ('costs', 0.067), ('udt', 0.067), ('synergies', 0.058), ('state', 0.053), ('zi', 0.051), ('mixing', 0.05), ('pii', 0.05), ('interior', 0.049), ('ck', 0.049), ('discretization', 0.048), ('solver', 0.048), ('singular', 0.048), ('yet', 0.047), ('agendas', 0.044), ('kuk', 0.044), ('odes', 0.044), ('pib', 0.044), ('saltiel', 0.044), ('torque', 0.044), ('problems', 0.044), ('reaching', 0.041), ('states', 0.041), ('optimality', 0.041), ('kl', 0.041), ('equation', 0.04), ('component', 0.04), ('quadratic', 0.04), ('linearity', 0.039), ('stochastic', 0.039), ('agenda', 0.039), ('bizzi', 0.039), ('muscle', 0.039), ('exp', 0.038), ('zx', 0.036), ('speed', 0.036), ('formulations', 0.035), ('les', 0.035), ('qi', 0.035), ('construction', 0.034), ('sensorimotor', 0.033), ('solutions', 0.032), ('cartesian', 0.032), ('bertsekas', 0.032), ('spaces', 0.031), ('paths', 0.031), ('pursuing', 0.03), ('pro', 0.03), ('feedback', 0.03), ('components', 0.029), ('discretize', 0.029), ('duration', 0.029), ('resemble', 0.029), ('barto', 0.029), ('discrete', 0.029), ('differential', 0.029), ('minimal', 0.028), ('dx', 0.028), ('preliminary', 0.027), ('athena', 0.027), ('interpolating', 0.027), ('gaussians', 0.026), ('amplitude', 0.026), ('targets', 0.026), ('sv', 0.025), ('diag', 0.025), ('resembles', 0.025), ('diffusion', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 54 nips-2009-Compositionality of optimal control laws

Author: Emanuel Todorov

Abstract: We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from certain primitives. The primitives are themselves feedback controllers pursuing their own agendas. They are mixed in proportion to how much progress they are making towards their agendas and how compatible their agendas are with the present task. The resulting composite control law is provably optimal when the problem belongs to a certain class. This class is rather general and yet has a number of unique properties – one of which is that the Bellman equation can be made linear even for non-linear or discrete dynamics. This gives rise to the compositionality developed here. In the special case of linear dynamics and Gaussian noise our framework yields analytical solutions (i.e. non-linear mixtures of LQG controllers) without requiring the final cost to be quadratic. More generally, a natural set of control primitives can be constructed by applying SVD to Green’s function of the Bellman equation. We illustrate the theory in the context of human arm movements. The ideas of optimality and compositionality are both very prominent in the field of motor control, yet they have been difficult to reconcile. Our work makes this possible.

2 0.13267504 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

Author: Bernhard Nessler, Michael Pfeiffer, Wolfgang Maass

Abstract: The principles by which spiking neurons contribute to the astounding computational power of generic cortical microcircuits, and how spike-timing-dependent plasticity (STDP) of synaptic weights could generate and maintain this computational function, are unknown. We show here that STDP, in conjunction with a stochastic soft winner-take-all (WTA) circuit, induces spiking neurons to generate through their synaptic weights implicit internal models for subclasses (or “causes”) of the high-dimensional spike patterns of hundreds of pre-synaptic neurons. Hence these neurons will fire after learning whenever the current input best matches their internal model. The resulting computational function of soft WTA circuits, a common network motif of cortical microcircuits, could therefore be a drastic dimensionality reduction of information streams, together with the autonomous creation of internal models for the probability distributions of their input patterns. We show that the autonomous generation and maintenance of this computational function can be explained on the basis of rigorous mathematical principles. In particular, we show that STDP is able to approximate a stochastic online Expectation-Maximization (EM) algorithm for modeling the input data. A corresponding result is shown for Hebbian learning in artificial neural networks. 1

3 0.1318593 60 nips-2009-Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

Author: Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

Abstract: We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a, 2009b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient-descent on this function. These methods can be viewed as natural generalizations to previous TD methods, as they converge to the same limit points when used with linear function approximation methods. We generalize this work to nonlinear function approximation. We present a Bellman error objective function and two gradient-descent TD algorithms that optimize it. We prove the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution. The algorithms are incremental and the computational complexity per time step scales linearly with the number of parameters of the approximator. Empirical results obtained in the game of Go demonstrate the algorithms’ effectiveness. 1

4 0.11574566 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem

Author: Christos Boutsidis, Petros Drineas, Michael W. Mahoney

Abstract: We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ2 ) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1 + (1 + ϵ)γ)-approximate partition with high probability. 1

5 0.08417809 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

Author: Steven Chase, Andrew Schwartz, Wolfgang Maass, Robert A. Legenstein

Abstract: The control of neuroprosthetic devices from the activity of motor cortex neurons benefits from learning effects where the function of these neurons is adapted to the control task. It was recently shown that tuning properties of neurons in monkey motor cortex are adapted selectively in order to compensate for an erroneous interpretation of their activity. In particular, it was shown that the tuning curves of those neurons whose preferred directions had been misinterpreted changed more than those of other neurons. In this article, we show that the experimentally observed self-tuning properties of the system can be explained on the basis of a simple learning rule. This learning rule utilizes neuronal noise for exploration and performs Hebbian weight updates that are modulated by a global reward signal. In contrast to most previously proposed reward-modulated Hebbian learning rules, this rule does not require extraneous knowledge about what is noise and what is signal. The learning rule is able to optimize the performance of the model system within biologically realistic periods of time and under high noise levels. When the neuronal noise is fitted to experimental data, the model produces learning effects similar to those found in monkey experiments.

6 0.072656743 113 nips-2009-Improving Existing Fault Recovery Policies

7 0.070765726 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

8 0.064930104 211 nips-2009-Segmenting Scenes by Matching Image Composites

9 0.060829759 209 nips-2009-Robust Value Function Approximation Using Bilinear Programming

10 0.05939129 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

11 0.059097465 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

12 0.056941144 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

13 0.053273838 240 nips-2009-Sufficient Conditions for Agnostic Active Learnable

14 0.049007781 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

15 0.04890554 137 nips-2009-Learning transport operators for image manifolds

16 0.048703048 221 nips-2009-Solving Stochastic Games

17 0.048643723 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

18 0.045536406 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

19 0.043601625 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

20 0.043584544 147 nips-2009-Matrix Completion from Noisy Entries


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.155), (1, 0.007), (2, 0.09), (3, -0.017), (4, -0.047), (5, -0.009), (6, 0.01), (7, -0.02), (8, 0.019), (9, -0.017), (10, 0.012), (11, 0.032), (12, -0.001), (13, -0.01), (14, -0.04), (15, -0.061), (16, -0.033), (17, -0.123), (18, -0.056), (19, -0.029), (20, 0.056), (21, -0.064), (22, 0.078), (23, -0.078), (24, -0.112), (25, -0.069), (26, -0.169), (27, 0.036), (28, -0.133), (29, 0.074), (30, -0.043), (31, -0.093), (32, 0.007), (33, -0.035), (34, -0.032), (35, -0.151), (36, 0.008), (37, -0.109), (38, 0.104), (39, -0.046), (40, 0.03), (41, 0.028), (42, -0.035), (43, -0.026), (44, 0.006), (45, 0.075), (46, 0.011), (47, 0.203), (48, 0.107), (49, -0.224)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95679891 54 nips-2009-Compositionality of optimal control laws

Author: Emanuel Todorov

Abstract: We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from certain primitives. The primitives are themselves feedback controllers pursuing their own agendas. They are mixed in proportion to how much progress they are making towards their agendas and how compatible their agendas are with the present task. The resulting composite control law is provably optimal when the problem belongs to a certain class. This class is rather general and yet has a number of unique properties – one of which is that the Bellman equation can be made linear even for non-linear or discrete dynamics. This gives rise to the compositionality developed here. In the special case of linear dynamics and Gaussian noise our framework yields analytical solutions (i.e. non-linear mixtures of LQG controllers) without requiring the final cost to be quadratic. More generally, a natural set of control primitives can be constructed by applying SVD to Green’s function of the Bellman equation. We illustrate the theory in the context of human arm movements. The ideas of optimality and compositionality are both very prominent in the field of motor control, yet they have been difficult to reconcile. Our work makes this possible.

2 0.6484319 60 nips-2009-Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

Author: Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

Abstract: We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a, 2009b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient-descent on this function. These methods can be viewed as natural generalizations to previous TD methods, as they converge to the same limit points when used with linear function approximation methods. We generalize this work to nonlinear function approximation. We present a Bellman error objective function and two gradient-descent TD algorithms that optimize it. We prove the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution. The algorithms are incremental and the computational complexity per time step scales linearly with the number of parameters of the approximator. Empirical results obtained in the game of Go demonstrate the algorithms’ effectiveness. 1

3 0.55124724 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem

Author: Christos Boutsidis, Petros Drineas, Michael W. Mahoney

Abstract: We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ²) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1 + (1 + ϵ)γ)-approximate partition with high probability.
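A hedged sketch of leverage-score-style feature sampling in the spirit of this abstract; the paper's exact sampling distribution, rescaling, and number of sampled features may differ, and the data here is synthetic.

```python
import numpy as np

# Sample features with probability proportional to their leverage scores in
# the top-k right singular subspace of the data matrix, then rescale.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 40))   # toy data: 100 points, 40 features
k, t = 3, 20                     # number of clusters, features to keep

_, _, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k].T                   # top-k right singular vectors, shape (40, k)
p = (V_k ** 2).sum(axis=1) / k   # leverage-score distribution; sums to 1
idx = rng.choice(A.shape[1], size=t, replace=True, p=p)
A_sel = A[:, idx] / np.sqrt(t * p[idx])   # sampled and rescaled features
print(A_sel.shape)               # (100, 20)
```

Any off-the-shelf k-means routine can then be run on `A_sel` in place of `A`.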

4 0.5131768 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control

Author: Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári

Abstract: In this paper we introduce a multi-step linear Dyna-style planning algorithm. The key element of the multi-step linear Dyna is a multi-step linear model that enables multi-step projection of a sampled feature and multi-step planning based on the simulated multi-step transition experience. We propose two multi-step linear models. The first iterates the one-step linear model, but is generally computationally complex. The second interpolates between the one-step model and the infinite-step model (which turns out to be the LSTD solution), and can be learned efficiently online. Policy evaluation on Boyan Chain shows that multi-step linear Dyna learns a policy faster than single-step linear Dyna, and generally learns faster as the number of projection steps increases. Results on Mountain-car show that multi-step linear Dyna leads to much better online performance than single-step linear Dyna and model-free algorithms; however, the performance of multi-step linear Dyna does not always improve as the number of projection steps increases. Our results also suggest that previous attempts at extending LSTD for online control were unsuccessful because LSTD looks infinitely many steps into the future, and suffers from model errors in non-stationary (control) environments.
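The first of the two models — iterating the one-step linear model — can be sketched on a toy chain. Tabular features are an illustrative simplification that makes the learned model easy to check against the true chain; all names and sizes below are assumptions.

```python
import numpy as np

# Fit a one-step linear feature model (F for expected next features, f for
# expected reward) from sampled transitions, then project a feature vector
# k steps ahead by repeated application of F.
rng = np.random.default_rng(0)
n = 6
P = rng.dirichlet(np.ones(n), size=n)   # toy Markov chain
r = rng.normal(size=n)                  # deterministic state rewards
Phi = np.eye(n)                         # tabular features (for checkability)

X, Y, R = [], [], []
s = 0
for _ in range(60_000):
    s2 = rng.choice(n, p=P[s])
    X.append(Phi[s]); Y.append(Phi[s2]); R.append(r[s])
    s = s2
X, Y, R = np.array(X), np.array(Y), np.array(R)
F = np.linalg.lstsq(X, Y, rcond=None)[0].T   # E[phi'|phi] ≈ F @ phi
f = np.linalg.lstsq(X, R, rcond=None)[0]     # E[r|phi]   ≈ f @ phi

def project(phi, k):
    """k-step projection of a feature vector through the iterated model."""
    for _ in range(k):
        phi = F @ phi
    return phi

# With tabular features, the 1-step projection ≈ the empirical transition row:
print(np.round(project(Phi[0], 1), 2))
```

Planning then applies TD-style updates to features simulated through `project`, which is the mechanism the abstract's multi-step models build on; the iterated model is the computationally expensive variant the abstract mentions.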

5 0.45944309 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

Author: Bernhard Nessler, Michael Pfeiffer, Wolfgang Maass

Abstract: The principles by which spiking neurons contribute to the astounding computational power of generic cortical microcircuits, and how spike-timing-dependent plasticity (STDP) of synaptic weights could generate and maintain this computational function, are unknown. We show here that STDP, in conjunction with a stochastic soft winner-take-all (WTA) circuit, induces spiking neurons to generate through their synaptic weights implicit internal models for subclasses (or “causes”) of the high-dimensional spike patterns of hundreds of pre-synaptic neurons. Hence these neurons will fire after learning whenever the current input best matches their internal model. The resulting computational function of soft WTA circuits, a common network motif of cortical microcircuits, could therefore be a drastic dimensionality reduction of information streams, together with the autonomous creation of internal models for the probability distributions of their input patterns. We show that the autonomous generation and maintenance of this computational function can be explained on the basis of rigorous mathematical principles. In particular, we show that STDP is able to approximate a stochastic online Expectation-Maximization (EM) algorithm for modeling the input data. A corresponding result is shown for Hebbian learning in artificial neural networks.
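As a loose analogue of the online-EM interpretation above — explicitly not the paper's STDP rule — one can run stochastic online hard-EM for a mixture of Bernoullis, with a soft winner-take-all choosing which unit updates on each input. The prototypes, rates, and sizes below are all illustrative assumptions.

```python
import numpy as np

# Two hidden "causes": inputs are bit-flipped copies of one of two binary
# prototypes. A soft WTA picks a winner; only the winner's weights update,
# moving toward the log-odds of its inputs (a stochastic hard-EM step).
rng = np.random.default_rng(0)
d, K, eta = 20, 2, 0.05
protos = np.stack([np.arange(d) < d // 2, np.arange(d) >= d // 2]).astype(float)
W = rng.normal(0, 0.1, size=(K, d))     # weights play the role of log-odds
b = np.zeros(K)                         # uniform prior, kept fixed for simplicity

def sample_input():
    z = rng.integers(K)                 # hidden cause
    x = np.where(rng.random(d) < 0.1, 1 - protos[z], protos[z])  # 10% bit flips
    return x, z

for _ in range(5000):
    x, _ = sample_input()
    u = W @ x + b
    p = np.exp(u - u.max()); p /= p.sum()   # soft WTA: posterior over causes
    k = rng.choice(K, p=p)                  # stochastic winner "spikes"
    # Gradient of the Bernoulli log-likelihood w.r.t. logits: x - sigmoid(W[k])
    W[k] += eta * (x - 1 / (1 + np.exp(-W[k])))
```

At a stationary point, sigmoid(W[k]) matches the mean of the inputs assigned to unit k, i.e. each unit carries an implicit internal model of one input subclass.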

6 0.43250811 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

7 0.42884615 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

8 0.39852712 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

9 0.3906886 13 nips-2009-A Neural Implementation of the Kalman Filter

10 0.36226693 209 nips-2009-Robust Value Function Approximation Using Bilinear Programming

11 0.35418764 248 nips-2009-Toward Provably Correct Feature Selection in Arbitrary Domains

12 0.32772908 193 nips-2009-Potential-Based Agnostic Boosting

13 0.32680777 105 nips-2009-Grouped Orthogonal Matching Pursuit for Variable Selection and Prediction

14 0.32197401 240 nips-2009-Sufficient Conditions for Agnostic Active Learnable

15 0.32070464 12 nips-2009-A Generalized Natural Actor-Critic Algorithm

16 0.31948975 113 nips-2009-Improving Existing Fault Recovery Policies

17 0.31466302 234 nips-2009-Streaming k-means approximation

18 0.3124837 16 nips-2009-A Smoothed Approximate Linear Program

19 0.30071384 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

20 0.29464048 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.041), (25, 0.048), (35, 0.043), (36, 0.076), (39, 0.505), (58, 0.066), (61, 0.042), (71, 0.035), (81, 0.014), (86, 0.044), (91, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96926969 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

Author: Adam Sanborn, Nick Chater, Katherine A. Heller

Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show.

same-paper 2 0.9388569 54 nips-2009-Compositionality of optimal control laws

Author: Emanuel Todorov


3 0.90449131 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

Author: Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han

Abstract: Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Their potential, however, is limited in applications which have no access to raw data but to the meta-level model output. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic where little work has been done. Although unsupervised models, such as clustering, do not directly generate label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph, as well as penalizing deviations from the initial labeling provided by supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives.
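A hedged sketch of the iterative-propagation step this abstract describes, on a toy bipartite graph of objects and groups (clusters or predicted classes). The graph, initial labels, and penalty weight are illustrative assumptions, and the paper's exact objective may differ.

```python
import numpy as np

# Alternate between (a) setting each group's class estimate to the average of
# its member objects and (b) setting each object's estimate to its neighbor-
# group average, pulled toward the initial supervised labels by weight lam.
rng = np.random.default_rng(0)
n_obj, n_grp, n_cls = 8, 4, 2
B = (rng.random((n_obj, n_grp)) < 0.4).astype(float)  # object-group incidence
B[B.sum(axis=1) == 0, 0] = 1.0    # guard: every object in at least one group
B[0, B.sum(axis=0) == 0] = 1.0    # guard: every group has at least one object
Y0 = np.zeros((n_obj, n_cls))     # initial supervised estimates
Y0[:4, 0] = 1.0; Y0[4:, 1] = 1.0
lam = 1.0                         # penalty for deviating from initial labels

U = np.full((n_obj, n_cls), 1.0 / n_cls)   # object class probabilities
for _ in range(100):
    Q = (B.T @ U) / B.sum(axis=0, keepdims=True).T            # group step
    U = (B @ Q + lam * Y0) / (B.sum(axis=1, keepdims=True) + lam)  # object step
print(np.round(U, 2))   # consolidated class probabilities per object
```

The row-stochastic invariant (each object's probabilities sum to 1) is preserved by both steps, so the final `U` can be read off directly as consensus predictions.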

4 0.87342262 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

Author: Gunhee Kim, Antonio Torralba

Abstract: This paper proposes a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels. The proposed approach discovers highly probable regions of object instances by iteratively repeating the following two functions: (1) choose the exemplar set (i.e. a small number of highly ranked reference ROIs) across the dataset and (2) refine the ROIs of each image with respect to the exemplar set. These two subproblems are formulated as ranking in two different similarity networks of ROI hypotheses by link analysis. The experiments with the PASCAL 06 dataset show that our unsupervised localization performance is better than one of the state-of-the-art techniques and comparable to supervised methods. Also, we test the scalability of our approach with five objects in a Flickr dataset consisting of more than 200K images.

5 0.86665589 110 nips-2009-Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions

Author: Bangpeng Yao, Dirk Walther, Diane Beck, Li Fei-fei

Abstract: The human brain can be described as containing a number of functional regions. These regions, as well as the connections between them, play a key role in information processing in the brain. However, most existing multi-voxel pattern analysis approaches either treat multiple regions as one large uniform region or several independent regions, ignoring the connections between them. In this paper we propose to model such connections in a Hidden Conditional Random Field (HCRF) framework, where the classifier of one region of interest (ROI) makes predictions based on not only its voxels but also the predictions from ROIs that it connects to. Furthermore, we propose a structural learning method in the HCRF framework to automatically uncover the connections between ROIs. We illustrate this approach with fMRI data acquired while human subjects viewed images of different natural scene categories and show that our model can improve the top-level (the classifier combining information from all ROIs) and ROI-level prediction accuracy, as well as uncover some meaningful connections between ROIs.

6 0.63511401 21 nips-2009-Abstraction and Relational learning

7 0.61478657 148 nips-2009-Matrix Completion from Power-Law Distributed Samples

8 0.58708107 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

9 0.58690196 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

10 0.58205253 154 nips-2009-Modeling the spacing effect in sequential category learning

11 0.57746875 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

12 0.56800759 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

13 0.5637452 115 nips-2009-Individuation, Identification and Object Discovery

14 0.56263679 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

15 0.5529716 125 nips-2009-Learning Brain Connectivity of Alzheimer's Disease from Neuroimaging Data

16 0.55256754 133 nips-2009-Learning models of object structure

17 0.54756325 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

18 0.53909123 112 nips-2009-Human Rademacher Complexity

19 0.53771186 86 nips-2009-Exploring Functional Connectivities of the Human Brain using Multivariate Information Analysis

20 0.53216434 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs