nips nips2006 nips2006-124 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest-eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS–0524761.
Reference: text
sentIndex sentText sentNum sentScore
1 They have discrete state spaces and continuous control spaces. [sent-4, score-0.374]
2 The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. [sent-5, score-0.339]
3 A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. [sent-6, score-0.822]
4 An exponential transformation of the optimal value function makes the minimized Bellman equation linear. [sent-7, score-0.1]
5 Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. [sent-8, score-0.059]
6 Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O (n) algorithm. [sent-9, score-0.158]
7 Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. [sent-10, score-0.176]
8 1 Introduction In recent years many hard problems have been transformed into easier problems that can be solved efficiently via linear methods [1] or convex optimization [2]. [sent-13, score-0.073]
9 Indeed the discrete and unstructured nature of traditional MDPs seems incompatible with simplifying features such as linearity and convexity. [sent-15, score-0.139]
10 Here we construct the first MDP family where the minimization over the control space is convex and analytically tractable, and where the Bellman equation can be exactly transformed into a linear equation. [sent-17, score-0.194]
11 We focus on problems where a non-empty subset A ⊆ S of states are absorbing and incur zero cost: p_ij(u) = δ_ij and ℓ(i, u) = 0 whenever i ∈ A. [sent-22, score-0.868]
12 2 A class of more tractable MDPs In our new class of MDPs the control u ∈ R^|S| is a real-valued vector with dimensionality equal to the number of discrete states. [sent-25, score-0.26]
13 The elements uj of u have the effect of directly modifying the transition probabilities of an uncontrolled Markov chain. [sent-26, score-0.822]
14 In particular, given an uncontrolled transition probability matrix P with elements p_ij, we define the controlled transition probabilities as p_ij(u) = p_ij exp(u_j) (2). Note that P(0) = P. [sent-27, score-2.696]
15 In some sense this is the most general notion of "control" one can imagine – we are allowing the controller to rescale the underlying transition probabilities in any way it wishes. [sent-28, score-0.365]
16 When p_ij = 0 the control u_j has no effect, and so we set it to 0 for concreteness. [sent-31, score-0.355]
17 Thus the admissible controls are U(i) = { u ∈ R^|S| : Σ_j p_ij exp(u_j) = 1; p_ij = 0 ⟹ u_j = 0 } (3). Real-valued controls make it possible to define a natural control cost. [sent-33, score-1.987]
18 Since the control vector acts directly on the transition probabilities, it makes sense to measure its magnitude in terms of the difference between the controlled and uncontrolled transition probabilities. [sent-34, score-0.642]
19 Let pi (u) denote the i-th row-vector of the matrix P (u), that is, the vector of transition probabilities from state i to all other states under control u. [sent-36, score-0.61]
20 The control cost is defined as r(i, u) = KL(p_i(u) || p_i(0)) = Σ_{j: p_ij ≠ 0} p_ij(u) log( p_ij(u) / p_ij(0) ) (4). From the properties of KL divergence it follows that r(i, u) ≥ 0, and r(i, u) = 0 iff u = 0. Substituting (2) in (4) and simplifying, the control cost becomes r(i, u) = Σ_j p_ij(u) u_j (5). [sent-37, score-3.351]
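To make equations (2), (4) and (5) concrete, here is a minimal NumPy sketch (the function names and example numbers are illustrative, not from the paper): it builds a controlled row p_i(u), evaluates the KL control cost, and checks that it equals Σ_j p_ij(u) u_j.

```python
import numpy as np

def controlled_row(p_row, u):
    """Equation (2): p_ij(u) = p_ij * exp(u_j); u must satisfy constraint (3)."""
    pu = p_row * np.exp(u)
    assert np.isclose(pu.sum(), 1.0), "u is not an admissible control"
    return pu

def control_cost(p_row, u):
    """Equation (4): r(i,u) = KL(p_i(u) || p_i(0)), summed over p_ij != 0."""
    pu, nz = controlled_row(p_row, u), p_row > 0
    return np.sum(pu[nz] * np.log(pu[nz] / p_row[nz]))

# Illustrative example: pick any target row with the same support as p_row;
# the control that produces it is u_j = log(target_j / p_j).
p_row = np.array([0.5, 0.3, 0.2])
target = np.array([0.7, 0.2, 0.1])
u = np.log(target / p_row)
r = control_cost(p_row, u)
# Equation (5): after substituting (2) into (4), r(i,u) = sum_j p_ij(u) * u_j.
assert np.isclose(r, np.sum(controlled_row(p_row, u) * u))
```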
21 Before each transition the controller specifies the price u_j it is willing to pay (or collect, if u_j < 0) for every possible next state j. [sent-40, score-1.161]
22 When the actual transition occurs, say to state k, the controller pays the price uk it promised. [sent-41, score-0.453]
23 Then r (i; u) is the price the controller expects to pay before observing the transition. [sent-42, score-0.187]
24 Coming back to the MDP construction, we allow an arbitrary state cost q(i) ≥ 0 in addition to the control cost defined above: ℓ(i, u) = q(i) + r(i, u) (6). We require q(i) = 0 for absorbing states i ∈ A so that the process can continue indefinitely without incurring extra costs. [sent-43, score-0.477]
25 Substituting (5, 6) in (1), the Bellman equation for our MDP is v(i) = min_{u ∈ U(i)} { q(i) + Σ_j p_ij exp(u_j) (u_j + v(j)) } (7). We can now exploit the benefits of this unusual construction. [sent-44, score-0.782]
26 The Lagrange multiplier λ_i can be found by applying the constraint (3) to the optimal control (10). [sent-47, score-0.175]
27 The result is λ_i = log Σ_j p_ij exp(−v(j)) − 1 (12), and therefore the optimal control law is u*_j(i) = −v(j) − log Σ_k p_ik exp(−v(k)) (13). Thus we have expressed the optimal control law in closed form given the optimal value function. [sent-48, score-1.745]
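Given the optimal value function v, equations (13)–(14) can be evaluated directly. A hedged NumPy sketch (v is assumed known, P is the uncontrolled transition matrix, and the names are illustrative):

```python
import numpy as np

def optimal_controls(P, v):
    """Equations (13)-(14): closed-form optimal control law given the optimal
    value function v.  U[i, j] = u*_j(i) is only meaningful where P[i, j] > 0
    (elsewhere u_j = 0 by the convention in (3)); P_star holds p_ij(u*)."""
    w = np.exp(-v)                          # w_j = exp(-v(j))
    Z = P @ w                               # Z_i = sum_k p_ik exp(-v(k))
    U = -v[None, :] - np.log(Z)[:, None]    # eq (13)
    P_star = P * w[None, :] / Z[:, None]    # eq (14): p_ij exp(-v(j)) / Z_i
    return U, P_star
```

The rows of P_star are the normalized, Gibbs-like distributions referred to as (14) in the text.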
28 Note that the only influence of the current state i is through the second term, which serves to normalize the transition probability distribution p_i(u*) and is identical for all next states j. [sent-49, score-0.385]
29 Thus the optimal controller is a high-level controller: it tells the Markov chain to go to good states without specifying how to get there. [sent-50, score-0.225]
30 The details of the trajectory emerge from the interaction of this controller and the uncontrolled stochastic dynamics. [sent-51, score-0.285]
31 In the special case p_ij = const_i the transition probabilities (14) correspond to a Gibbs distribution where the optimal value function plays the role of an energy function. [sent-53, score-0.955]
32 2.1 Iterative solution and convergence analysis From (19) it follows that z is an eigenvector of GP with eigenvalue 1. [sent-56, score-0.103]
33 The answer to both questions is affirmative, because the Bellman equation has a unique solution and v is a solution to the Bellman equation iff z = exp(−v) is an admissible solution to (19). [sent-59, score-0.365]
34 The obvious iterative method is z_{k+1} = G P z_k, z_0 = 1 (20). This iteration always converges to the unique solution, for the following reasons. [sent-61, score-0.111]
35 Multiplication by G scales down some of the rows of P , therefore GP has spectral radius at most 1. [sent-63, score-0.062]
36 But we are guaranteed that an eigenvector z with eigenvalue 1 exists, therefore GP has spectral radius 1 and z is a largest eigenvector. [sent-64, score-0.155]
37 Iteration (20) is equivalent to the power method (without the rescaling which is unnecessary here) so it converges to a largest eigenvector. [sent-65, score-0.075]
38 In particular, for i ∈ A the i-th row of GP has elements δ_ij, and so the i-th element of z_k remains equal to 1 for all k. [sent-67, score-0.124]
39 The states can be permuted so that GP is in canonical form: GP = [ T1, T2; 0, I ] (21), where the absorbing states are last, T1 is (n−m) by (n−m), and T2 is (n−m) by m. [sent-70, score-0.284]
40 From (21) we have (GP)^k = [ T1^k, (T1^{k−1} + … + T1 + I) T2; 0, I ] = [ T1^k, (I − T1^k)(I − T1)^{−1} T2; 0, I ] (22). A stochastic matrix P with m absorbing states has m eigenvalues equal to 1, and all other eigenvalues are smaller than 1 in absolute value. [sent-72, score-0.318]
41 Since the diagonal elements of G are no greater than 1, all eigenvalues of T1 are smaller than 1 and so lim_{k→∞} T1^k = 0. [sent-73, score-0.074]
42 Therefore iteration (20) converges exponentially as β^k, where β < 1 is the largest eigenvalue of T1. [sent-75, score-0.133]
43 The factors that can make β small are: (i) large state costs q(i), resulting in small terms exp(−q(i)) along the diagonal of G; (ii) small transition probabilities among non-absorbing states (and large transition probabilities from non-absorbing to absorbing states). [sent-77, score-1.067]
44 Thus the average running time scales linearly with the number of non-zero elements in P . [sent-80, score-0.053]
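A sketch of iteration (20), assuming a dense or sparse transition matrix P and a state-cost vector q with q = 0 on absorbing states; this is an illustrative implementation, not the authors' code:

```python
import numpy as np

def solve_z(P, q, tol=1e-10, max_iter=10_000):
    """Iterate z_{k+1} = G P z_k with z_0 = 1, where G = diag(exp(-q)).
    Converges to the solution of z = G P z; the optimal value function is
    then v = -log(z).  Assumes q[i] = 0 for absorbing states i."""
    g = np.exp(-q)
    z = np.ones(len(q))
    for _ in range(max_iter):
        z_new = g * (P @ z)                 # one (sparse) matrix-vector product
        if np.max(np.abs(z_new - z)) < tol:
            break
        z = z_new
    return z_new
```

Each step costs one matrix–vector product, matching the linear scaling in the number of non-zero elements of P noted above.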
45 2.2 Alternative problem formulations While the focus of this paper is on infinite-horizon total-cost problems with absorbing states, we have obtained similar results for all other problem formulations commonly used in Reinforcement Learning. [sent-82, score-0.225]
46 In finite-horizon problems equation (19) becomes z(t) = G(t) P(t) z(t+1) (23), where z(t_final) is initialized from a given final cost function. [sent-84, score-0.144]
47 In infinite-horizon average-cost-per-stage problems equation (19) becomes λ z = GP z (24), where λ is the largest eigenvalue of GP, z is a differential value function, and the average cost-per-stage turns out to be −log(λ). [sent-85, score-0.239]
48 In infinite-horizon discounted-cost problems equation (19) becomes z = G P z^α (25), where α < 1 is the discount factor and z^α is defined element-wise. [sent-86, score-0.092]
49 Even though the latter equation is nonlinear, we have observed that the analog of iteration (20) still converges rapidly. [sent-87, score-0.088]
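As an illustration, a hedged sketch of the discounted variant (25); the symbol α for the discount factor is a choice made here because this extraction dropped it, and the element-wise power is the only change relative to iteration (20):

```python
import numpy as np

def solve_z_discounted(P, q, alpha, tol=1e-10, max_iter=10_000):
    """Iterate z <- G P z**alpha, the discounted analog of iteration (20);
    alpha < 1 is the discount factor and z**alpha is element-wise.
    Nonlinear, but observed to converge rapidly."""
    g = np.exp(-q)
    z = np.ones(P.shape[0])
    for _ in range(max_iter):
        z_new = g * (P @ (z ** alpha))
        if np.max(np.abs(z_new - z)) < tol:
            break
        z = z_new
    return z_new
```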
50 Our goal is to find the length s(i) of the shortest path from every i ∈ S to some vertex in A. [sent-90, score-0.193]
51 We now show how the shortest path lengths s(i) can be obtained from our MDP. [sent-92, score-0.154]
52 Define the elements of the stochastic matrix P as p_ij = d_ij / Σ_k d_ik (26), corresponding to a random walk on the graph. [sent-93, score-0.821]
53 Next choose ρ > 0 and define the state costs q(i) = ρ when i ∉ A, q(i) = 0 when i ∈ A (27). This cost model means that we pay a price ρ whenever the current state is not in A. [sent-94, score-0.439]
54 If the control costs were 0 then the shortest paths would simply be s(i) = (1/ρ) v(i). [sent-96, score-0.377]
55 Here the control costs are not 0; however, they are bounded. [sent-97, score-0.232]
56 This can be shown using p_ij(u) = p_ij exp(u_j) ≤ 1 (28), which implies that for p_ij ≠ 0 we have u_j ≤ −log p_ij. [sent-98, score-3.027]
57 Since r(i, u) is a convex combination of the elements of u, the following bound holds: r(i, u) ≤ max_j (u_j) ≤ −log( min_{j: p_ij ≠ 0} p_ij ) (29). The control costs are bounded and we are free to choose ρ arbitrarily large, so we can make the state costs dominate the optimal value function. [sent-99, score-1.231]
58 Thus we have reduced the shortest path problem to an eigenvalue problem. [sent-101, score-0.216]
59 In spectral graph theory many problems have previously been related to eigenvalues of the graph Laplacian [4], but the shortest path problem was not among them until now. [sent-102, score-0.28]
60 Of course (30) involves a limit and so we cannot obtain the exact shortest paths by solving a single eigenvalue problem. [sent-106, score-0.207]
61 However we can obtain a good approximation by setting ρ large enough – but not too large, because exp(−ρ) may become numerically indistinguishable from 0. [sent-107, score-0.129]
62 The result in Fig 1B matches the exact shortest paths. [sent-110, score-0.114]
63 Although the solution for ρ = 1 is numerically larger, it is basically a scaled-up version of the correct solution. [sent-111, score-0.053]
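Putting (26)–(30) together, an approximate shortest-path routine looks like the sketch below; this is a hedged illustration in which the default ρ, the convergence test, and the dense-matrix representation are choices made here, not prescriptions from the paper.

```python
import numpy as np

def approx_shortest_paths(D, goal, rho=10.0, tol=1e-9, max_iter=1000):
    """D[i, j] = d_ij > 0 for an edge, 0 otherwise (every row assumed to have an
    edge and every state assumed able to reach the goal set A, indexed by `goal`).
    Returns s(i) ~= v(i) / rho, the approximate shortest path length to A."""
    n = D.shape[0]
    P = D / D.sum(axis=1, keepdims=True)     # eq (26): random walk on the graph
    P[goal, :] = 0.0                         # make the goal states absorbing,
    P[goal, goal] = 1.0                      # i.e. p_ij(u) = delta_ij on A
    q = np.full(n, rho)                      # eq (27): pay rho outside A ...
    q[goal] = 0.0                            # ... and nothing inside A
    g, z, v = np.exp(-q), np.ones(n), np.zeros(n)
    for _ in range(max_iter):                # iteration (20): z <- G P z
        z = g * (P @ z)
        v_new = -np.log(z)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new / rho                       # eq (30) with finite rho
```

Larger ρ sharpens the approximation but drives z toward numerical underflow, which is the trade-off described above.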
64 4 Approximating discrete MDPs via continuous embedding In the previous section we replaced the shortest path problem with a continuous MDP and obtained an excellent approximation. [sent-113, score-0.406]
65 Here we obtain approximations of similar quality in more general settings, using an approach reminiscent of LP-relaxation in integer programming. [sent-114, score-0.081]
66 We construct an embedding which associates the controls in the discrete MDP with specific control vectors of a continuous MDP, making sure that for these control vectors the continuous MDP has the same costs and transition probabilities as the discrete MDP. [sent-116, score-1.017]
67 Consider a discrete MDP with transition probabilities and costs denoted p̃ and ℓ̃. [sent-118, score-0.472]
68 Define the matrix B(i) of all controlled transition probabilities from state i. [sent-119, score-0.417]
69 This matrix has elements b_aj(i) = p̃_ij(a), a ∈ Ũ(i) (31). We need two assumptions to guarantee the existence of an exact embedding: for all i ∈ S the matrix B(i) must have full row-rank, and if any element of B(i) is 0 then the entire column must be 0. [sent-120, score-0.799]
70 If the latter assumption does not hold, we can replace the problematic 0 elements of B(i) with a small ε and renormalize. [sent-121, score-0.053]
71 The possible next states are those j for which p̃_ij(a) > 0 for any/all a ∈ Ũ(i). [sent-124, score-0.711]
72 The first step in the construction is to compute the real-valued control vectors u^a corresponding to the discrete actions a. [sent-126, score-0.382]
73 Since B is a stochastic matrix we have B1 = 1, and so B(x̂ + q1) = B x̂ + q1 = y + q1 (40). [Figure 2: panels 2A and 2B show the optimal value functions of the discrete and continuous MDPs; the scatterplot in panel 2C (R² = 0.986, horizontal axis: value in discrete MDP) compares them.] [sent-133, score-0.064]
74 Therefore x = x̂ + q1 satisfies (38) for all q, and we can adjust q to also satisfy (39), namely q = −log Σ_j exp(x̂_j) (41). This completes the embedding. [sent-134, score-0.227]
75 If the above q turns out to be negative, we can either choose another x̂ by adding an element from the null-space of B, or scale all costs ℓ̃(i, a) by a positive constant. [sent-135, score-0.155]
76 Such scaling does not affect the optimal control law for the discrete MDP, but it makes the elements of x̂ = B⁺y more negative and thus q becomes more positive. [sent-136, score-0.37]
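Equations (38)–(41) are garbled in this extraction, so the sketch below reconstructs the embedding for a single state under one explicit convention (x_j = log p_ij for the passive dynamics); the names embed_state and x_hat, and the pseudoinverse-based particular solution, are illustrative choices that may differ from the paper's exact notation. It returns the passive row p, the state cost q, and controls u^a that reproduce each discrete action's transitions and cost.

```python
import numpy as np

def embed_state(B, l):
    """B[a, j] = p~_ij(a): discrete transition rows at state i (full row rank,
    shared support, all-zero columns removed); l[a] = discrete cost l~(i, a).
    Returns (p, q, U) with p[j] * exp(U[a, j]) == B[a, j] and
    q + KL(B[a] || p) == l[a] for every action a."""
    logB = np.zeros_like(B)
    nz = B > 0
    logB[nz] = np.log(B[nz])
    y = np.sum(B * logB, axis=1) - l        # cost-matching condition: B x = y + q*1
    x_hat = np.linalg.pinv(B) @ y           # particular solution of B x_hat = y
    q = -np.log(np.sum(np.exp(x_hat)))      # chosen so that p below is normalized
    x = x_hat + q                           # x_j = log p_j
    p = np.exp(x)                           # passive dynamics row, sums to 1
    U = np.where(nz, logB - x, 0.0)         # u^a_j = log(B[a, j] / p[j]) on support
    return p, q, U
```

As noted above, if q comes out negative one can add a null-space component to x_hat or scale the discrete costs by a positive constant and re-run the construction.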
77 The grid world has a number of obstacles (black squares) and two absorbing states (white stars). [sent-138, score-0.208]
78 The possible next states are the immediate neighbors including the current state. [sent-139, score-0.098]
79 The discrete MDP has |N(i)| − 1 actions corresponding to stochastic transitions to each of the neighbors. [sent-141, score-0.197]
80 For each action, the transition probability to the "desired" state is 0.8 and the remaining 0.2 is equally distributed among the other states. [sent-142, score-0.264]
81 The costs ℓ̃(i, a) are random numbers between 1 and 10 – which is why the optimal value function shown in grayscale appears irregular. [sent-143, score-0.161]
82 Fig 2A shows the optimal value function for the discrete MDP. [sent-144, score-0.147]
83 Fig 2B shows the optimal value function for the corresponding continuous MDP. [sent-145, score-0.114]
84 The scatterplot in Fig 2C shows the optimal values in the discrete and continuous MDP (each dot is a state). [sent-146, score-0.209]
85 The values in the continuous MDP are numerically smaller – which is to be expected since the control space is larger. [sent-147, score-0.215]
86 Nevertheless, the correlation between the optimal values in the discrete and continuous MDPs is excellent. [sent-148, score-0.209]
87 5 Z-learning So far we assumed that a model of the continuous MDP is available. [sent-150, score-0.062]
88 We now turn to stochastic approximations of the optimal value function which can be used when a model is not available. [sent-151, score-0.129]
89 All we have access to are samples (i_k, j_k, q_k) where i_k is the current state, j_k is the next state, q_k is the state cost incurred at i_k, and k is the sample number. [sent-152, score-0.538]
90 Let us now compare (43) to the Q-learning algorithm applicable to discrete MDPs. [sent-156, score-0.095]
91 The difference is that ℓ_k is now a total cost rather than a state cost, and we have a control u_k generated by some control policy. [sent-158, score-0.435]
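The update rules (42)–(44) are not legible in this extraction; the tabular sketches below are hedged reconstructions, with Z-learning written as the natural stochastic approximation of the linear equation z(i) = exp(−q(i)) Σ_j p_ij z(j) and Q-learning in its standard form for comparison. Variable names and learning-rate handling are illustrative.

```python
import numpy as np

def z_learning_step(z, i, j, q_i, eta):
    """Stochastic approximation of z(i) = exp(-q(i)) * E[z(next state)], with
    the sample (i_k, j_k, q_k) generated by the *uncontrolled* chain.  z is a
    table initialized to 1 (absorbing states keep z = 1, since v = 0 there)."""
    z[i] = (1.0 - eta) * z[i] + eta * np.exp(-q_i) * z[j]
    return z

def q_learning_step(Q, i, a, j, total_cost, eta, terminal):
    """Standard tabular Q-learning on the paired discrete MDP, for comparison;
    total_cost = l(i, a), and the action a is chosen by some behavior policy."""
    target = total_cost + (0.0 if terminal else np.min(Q[j]))
    Q[i, a] = (1.0 - eta) * Q[i, a] + eta * target
    return Q
```

The Z-learning samples come from the uncontrolled chain (off-policy, with no state-action values), whereas Q-learning needs a full table of state-action values and an exploration policy.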
92 For each state we found the optimal transition probabilities (14). [sent-160, score-0.414]
93 We then constructed a discrete MDP which had one action (per state) that caused the same transition probabilities, and the corresponding cost was the same as in the continuous MDP. [sent-161, score-0.379]
94 We then added |N(i)| − 1 other actions by permuting the transition probabilities. [sent-162, score-0.198]
95 Thus the discrete and continuous MDPs were guaranteed to have identical optimal value functions. [sent-163, score-0.227]
96 Note that the goal here is no longer to approximate discrete with continuous MDPs, but to construct pairs of problems with identical solutions allowing fair comparison of Z-learning and Q-learning. [sent-164, score-0.2]
97 When the MDP reaches an absorbing state a new run is started from a random initial state. [sent-167, score-0.226]
98 For small problems (Fig 3A) the two algorithms had identical convergence, however for larger problems (Fig 3B) the new Z-learning algorithm was clearly faster. [sent-169, score-0.068]
99 6 Summary We introduced a new class of MDPs which have a number of remarkable properties, can be solved efficiently, and yield accurate approximations to traditional MDPs. [sent-173, score-0.059]
100 In general, no single approach is likely to be a magic wand which simplifies all optimal control problems. [sent-174, score-0.175]
wordName wordTfidf (topN-words)
[('pij', 0.635), ('uj', 0.355), ('mdp', 0.229), ('fig', 0.183), ('mdps', 0.177), ('transition', 0.17), ('gp', 0.152), ('uncontrolled', 0.146), ('absorbing', 0.132), ('bellman', 0.123), ('control', 0.123), ('ua', 0.116), ('shortest', 0.114), ('costs', 0.109), ('exp', 0.099), ('probabilities', 0.098), ('controller', 0.097), ('discrete', 0.095), ('state', 0.094), ('jk', 0.079), ('states', 0.076), ('ik', 0.073), ('jn', 0.072), ('dij', 0.069), ('continuous', 0.062), ('eigenvalue', 0.062), ('elements', 0.053), ('optimal', 0.052), ('cost', 0.052), ('price', 0.049), ('equation', 0.048), ('controls', 0.047), ('admissible', 0.046), ('zk', 0.046), ('qk', 0.044), ('uk', 0.043), ('stochastic', 0.042), ('baj', 0.042), ('ju', 0.042), ('rjsj', 0.042), ('todorov', 0.042), ('pay', 0.041), ('reinforcement', 0.041), ('path', 0.04), ('vertex', 0.039), ('approximations', 0.035), ('kl', 0.035), ('formulations', 0.034), ('spectral', 0.034), ('log', 0.033), ('embedding', 0.033), ('pik', 0.033), ('controlled', 0.033), ('transitions', 0.032), ('paths', 0.031), ('largest', 0.031), ('numerically', 0.03), ('iff', 0.03), ('radius', 0.028), ('law', 0.028), ('actions', 0.028), ('eij', 0.028), ('pi', 0.027), ('bb', 0.026), ('analytical', 0.025), ('problems', 0.025), ('unique', 0.025), ('element', 0.025), ('behave', 0.024), ('constraints', 0.024), ('integer', 0.024), ('divergence', 0.024), ('traditional', 0.024), ('rescaling', 0.024), ('maxi', 0.024), ('eigenvalues', 0.023), ('solution', 0.023), ('convex', 0.023), ('tractable', 0.023), ('lagrange', 0.023), ('substituting', 0.022), ('reminiscent', 0.022), ('multiplication', 0.022), ('graph', 0.022), ('matrix', 0.022), ('immediate', 0.022), ('turns', 0.021), ('markov', 0.021), ('diagonal', 0.021), ('simplifying', 0.02), ('construction', 0.02), ('yields', 0.02), ('converges', 0.02), ('iteration', 0.02), ('dimensionality', 0.019), ('becomes', 0.019), ('convergence', 0.018), ('identical', 0.018), ('ecs', 0.018), ('dijkstra', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest-eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS–0524761.
2 0.13564143 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning
Author: Peter Auer, Ronald Ortner
Abstract: We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds for the algorithm’s online performance after some finite number of steps. In the spirit of similar methods already successfully applied for the exploration-exploitation tradeoff in multi-armed bandit problems, we use upper confidence bounds to show that our UCRL algorithm achieves logarithmic online regret in the number of steps taken with respect to an optimal policy. 1 1.1
3 0.11139192 192 nips-2006-Theory and Dynamics of Perceptual Bistability
Author: Paul R. Schrater, Rashmi Sundareswara
Abstract: Perceptual Bistability refers to the phenomenon of spontaneously switching between two or more interpretations of an image under continuous viewing. Although switching behavior is increasingly well characterized, the origins remain elusive. We propose that perceptual switching naturally arises from the brain’s search for best interpretations while performing Bayesian inference. In particular, we propose that the brain explores a posterior distribution over image interpretations at a rapid time scale via a sampling-like process and updates its interpretation when a sampled interpretation is better than the discounted value of its current interpretation. We formalize the theory, explicitly derive switching rate distributions and discuss qualitative properties of the theory including the effect of changes in the posterior distribution on switching rates. Finally, predictions of the theory are shown to be consistent with measured changes in human switching dynamics to Necker cube stimuli induced by context.
4 0.10292853 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
5 0.088656314 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes
Author: Huan Xu, Shie Mannor
Abstract: Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worstcase analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto efficient policies in the performancerobustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the “optimal” tradeoff might be non-Markovian and hence is in general not tractable. 1
6 0.085659727 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
7 0.06909667 77 nips-2006-Fast Computation of Graph Kernels
8 0.062181771 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics
9 0.058101796 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
10 0.056528974 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
11 0.052785188 38 nips-2006-Automated Hierarchy Discovery for Planning in Partially Observable Environments
12 0.049317513 57 nips-2006-Conditional mean field
13 0.047014177 98 nips-2006-Inferring Network Structure from Co-Occurrences
14 0.046048414 2 nips-2006-A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
15 0.045873906 35 nips-2006-Approximate inference using planar graph decomposition
16 0.045332365 47 nips-2006-Boosting Structured Prediction for Imitation Learning
17 0.044587534 25 nips-2006-An Application of Reinforcement Learning to Aerobatic Helicopter Flight
18 0.044543523 143 nips-2006-Natural Actor-Critic for Road Traffic Optimisation
19 0.04234292 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
20 0.040823895 154 nips-2006-Optimal Change-Detection and Spiking Neurons
topicId topicWeight
[(0, -0.144), (1, 0.014), (2, -0.1), (3, -0.065), (4, 0.053), (5, -0.03), (6, 0.095), (7, 0.028), (8, 0.07), (9, 0.012), (10, 0.018), (11, -0.052), (12, 0.018), (13, -0.063), (14, 0.003), (15, 0.047), (16, -0.024), (17, 0.004), (18, -0.043), (19, -0.092), (20, 0.067), (21, 0.061), (22, 0.086), (23, -0.013), (24, 0.06), (25, 0.046), (26, -0.116), (27, -0.087), (28, 0.086), (29, 0.086), (30, -0.007), (31, 0.045), (32, 0.027), (33, 0.003), (34, -0.104), (35, 0.08), (36, -0.021), (37, 0.014), (38, 0.118), (39, 0.124), (40, 0.059), (41, 0.07), (42, 0.024), (43, 0.089), (44, -0.027), (45, -0.017), (46, 0.168), (47, -0.055), (48, 0.11), (49, 0.111)]
simIndex simValue paperId paperTitle
same-paper 1 0.9611727 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest-eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS–0524761.
2 0.62300432 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes
Author: Huan Xu, Shie Mannor
Abstract: Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worstcase analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto efficient policies in the performancerobustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the “optimal” tradeoff might be non-Markovian and hence is in general not tractable. 1
3 0.55053574 192 nips-2006-Theory and Dynamics of Perceptual Bistability
Author: Paul R. Schrater, Rashmi Sundareswara
Abstract: Perceptual Bistability refers to the phenomenon of spontaneously switching between two or more interpretations of an image under continuous viewing. Although switching behavior is increasingly well characterized, the origins remain elusive. We propose that perceptual switching naturally arises from the brain’s search for best interpretations while performing Bayesian inference. In particular, we propose that the brain explores a posterior distribution over image interpretations at a rapid time scale via a sampling-like process and updates its interpretation when a sampled interpretation is better than the discounted value of its current interpretation. We formalize the theory, explicitly derive switching rate distributions and discuss qualitative properties of the theory including the effect of changes in the posterior distribution on switching rates. Finally, predictions of the theory are shown to be consistent with measured changes in human switching dynamics to Necker cube stimuli induced by context.
4 0.48840314 25 nips-2006-An Application of Reinforcement Learning to Aerobatic Helicopter Flight
Author: Pieter Abbeel, Adam Coates, Morgan Quigley, Andrew Y. Ng
Abstract: Autonomous helicopter flight is widely regarded to be a highly challenging control problem. This paper presents the first successful autonomous completion on a real RC helicopter of the following four aerobatic maneuvers: forward flip and sideways roll at low speed, tail-in funnel, and nose-in funnel. Our experimental results significantly extend the state of the art in autonomous helicopter flight. We used the following approach: First we had a pilot fly the helicopter to help us find a helicopter dynamics model and a reward (cost) function. Then we used a reinforcement learning (optimal control) algorithm to find a controller that is optimized for the resulting model and reward function. More specifically, we used differential dynamic programming (DDP), an extension of the linear quadratic regulator (LQR). 1
5 0.48483852 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
6 0.4755446 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning
7 0.3883754 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics
8 0.37345806 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
9 0.35571265 169 nips-2006-Relational Learning with Gaussian Processes
10 0.33476365 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
11 0.31442156 154 nips-2006-Optimal Change-Detection and Spiking Neurons
12 0.30668357 57 nips-2006-Conditional mean field
13 0.30471301 37 nips-2006-Attribute-efficient learning of decision lists and linear threshold functions under unconcentrated distributions
14 0.30137852 202 nips-2006-iLSTD: Eligibility Traces and Convergence Analysis
15 0.29841751 138 nips-2006-Multi-Task Feature Learning
16 0.29789516 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
17 0.29439396 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
18 0.28813297 77 nips-2006-Fast Computation of Graph Kernels
19 0.28116363 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
20 0.27939263 2 nips-2006-A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
topicId topicWeight
[(1, 0.104), (3, 0.025), (7, 0.064), (9, 0.037), (21, 0.313), (22, 0.088), (44, 0.08), (57, 0.075), (65, 0.036), (69, 0.026), (71, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.78377819 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest-eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS–0524761.
2 0.75108886 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
3 0.72756547 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
4 0.5220989 34 nips-2006-Approximate Correspondences in High Dimensions
Author: Kristen Grauman, Trevor Darrell
Abstract: Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier, our approach achieves improved object recognition results over a state-of-the-art set kernel. 1
5 0.52107096 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
6 0.51870543 175 nips-2006-Simplifying Mixture Models through Function Approximation
7 0.51580215 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
8 0.51528203 65 nips-2006-Denoising and Dimension Reduction in Feature Space
9 0.51433361 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments
10 0.51161796 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
11 0.50920296 61 nips-2006-Convex Repeated Games and Fenchel Duality
12 0.50895786 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
13 0.50809228 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
14 0.50782388 185 nips-2006-Subordinate class recognition using relational object models
15 0.50780153 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
16 0.50615627 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
17 0.5060308 21 nips-2006-AdaBoost is Consistent
18 0.50564384 194 nips-2006-Towards a general independent subspace analysis
19 0.5054242 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
20 0.5044027 5 nips-2006-A Kernel Method for the Two-Sample-Problem