nips nips2002 nips2002-33 knowledge-graph by maker-knowledge-mining

33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming


Source: pdf

Author: Benjamin V. Roy, Daniela D. Farias

Abstract: This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a two-phase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. [sent-3, score-0.412]

2 In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. [sent-4, score-1.347]

3 We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. [sent-5, score-1.232]

4 For that, the algorithm should aim at producing a good approximation of the differential cost function. [sent-6, score-0.75]

5 We propose a two-phase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. [sent-7, score-1.245]

6 Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights. [sent-8, score-0.249]

7 1 Introduction The curse of dimensionality prevents application of dynamic programming to most problems of practical interest. [sent-9, score-0.305]

8 Approximate linear programming (ALP) aims to alleviate the curse of dimensionality by approximating the dynamic programming solution. [sent-10, score-0.609]

9 In [6], we develop a variant of approximate linear programming for the discounted-cost case which is shown to scale well with problem size. [sent-11, score-0.345]

10 Originally introduced by Schweitzer and Seidmann [11], approximate linear programming combines the linear programming approach to exact dynamic programming [9] with approximation of the differential cost function (the cost-to-go function, in the discounted-cost case) by a linear architecture. [sent-13, score-1.416]

11 Extension of approximate linear programming to the average-cost setting requires a different algorithm and additional analytical ideas. [sent-15, score-0.33]

12 Specifically, our contribution can be summarized as follows: Analysis of the usual formulation of approximate linear programming for average-cost problems. [sent-16, score-0.394]

13 We start with the observation that the most natural formulation of average-cost ALP, which follows immediately from taking limits in the discounted-cost formulation and can be found, for instance, in [1, 2, 4, 10], can be interpreted as an algorithm for approximating the optimal average cost. [sent-17, score-0.292]

14 However, to obtain a good policy, one needs a good approximation to the differential cost function. [sent-18, score-0.732]

15 We demonstrate through a counterexample that approximating the optimal average cost and approximating the differential cost function well enough to yield a good policy are not necessarily aligned objectives. [sent-19, score-1.417]

16 Indeed, the algorithm may lead to arbitrarily bad policies, even if the approximate average cost is very close to optimal and the basis functions have the potential to produce an approximate differential cost function leading to a reasonable policy. [sent-20, score-1.349]

17 A critical limitation of the average-cost ALP algorithm found in the literature is that it does not allow for external control of how the approximation to the differential cost function should be emphasized over different portions of the state space. [sent-22, score-0.927]

18 In situations like the one described in the previous paragraph, when the algorithm produces a bad policy, there is little one can do to improve the approximation other than selecting new basis functions. [sent-23, score-0.216]

19 To address this issue, we propose a two-phase variant of average-cost ALP: the first phase is simply the average-cost ALP algorithm already found in the literature, which is used for generating an approximation for the optimal average cost. [sent-24, score-0.324]

20 This approximation is used in the second phase of the algorithm for generating an approximation to the differential cost function. [sent-25, score-0.857]

21 Development of bounds linking the quality of approximate differential cost functions to the performance of the policy associated with them. [sent-27, score-1.143]

22 The observation that the usual formulation of ALP may lead to arbitrarily bad policies raises the question of how to design an algorithm for directly optimizing performance of the policy being obtained. [sent-28, score-0.572]

23 With this question in mind, we develop bounds that relate the quality of approximate differential cost functions — i.e. [sent-29, score-0.778]

24 their proximity to the true differential cost function — to the expected increase in cost incurred by using a greedy policy associated with them. [sent-31, score-1.327]

25 The bound suggests using a weighted sum of the distance to the true differential cost function for comparing different approximate differential cost functions. [sent-32, score-1.338]

26 Thus the objective of the second phase of our ALP algorithm is compatible with the objective of optimizing performance of the policy being obtained, and we also have some guidance on appropriate choices of state-relevance weights. [sent-33, score-0.598]

27 2 Stochastic Control Problems and the Curse of Dimensionality We consider discrete-time stochastic control problems involving a finite state space. [sent-34, score-0.146]

28 For each state, there is a finite set of available actions. [sent-35, score-0.09]

29 When an action is taken in the current state, a cost is incurred. [sent-36, score-0.375]

30 State transition probabilities specify, for each pair of states and each action, the probability that the next state is the second state of the pair, given that the current state is the first and that the action is taken. [sent-37, score-0.348]

31 A policy is a mapping from states to actions. [sent-38, score-0.385]

32 Given a policy, the dynamics of the system follow a Markov chain with the corresponding transition probabilities. [sent-39, score-0.363]

33 For each policy, we define a transition matrix whose entries are the transition probabilities under that policy, and a cost vector whose entries are the corresponding one-stage costs. [sent-40, score-0.604]

34 We make the following assumption on the transition probabilities. Assumption 1 (Irreducibility): for each pair of states and each policy, the second state is reachable from the first, so that the Markov chain induced by every policy is irreducible. [sent-41, score-0.345]

35 In stochastic control problems, we want to select a policy optimizing a given criterion. [sent-43, score-0.41]

36 In this paper, we will employ the average cost as the optimality criterion. [sent-44, score-0.327]

37 Irreducibility implies that, for each policy, this limit exists and the average cost is independent of the initial state of the system. [sent-45, score-0.733]
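The formula for this criterion was garbled in extraction; a standard way to write it, under assumed notation (\(\lambda_u\) for the average cost of policy \(u\), \(g_u\) for its cost vector, and \(x_t\) for the state at time \(t\)), is

\[
\lambda_u(x) \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathrm{E}\!\left[\, \sum_{t=0}^{T-1} g_u(x_t) \;\middle|\; x_0 = x \,\right],
\]

and, as just noted, under Assumption 1 this limit exists and does not depend on the initial state \(x\), so we may simply write \(\lambda_u\); the minimal average cost over all policies is denoted \(\lambda^*\).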

38 We denote the minimal average cost over all policies by a fixed symbol, and for any policy we define an associated dynamic programming operator; note that this operator acts on vectors corresponding to functions on the state space. [sent-47, score-1.042]

39 We also define the dynamic programming operator by taking the minimum over actions in each state; a policy is called greedy with respect to a given function if it attains this minimum. [sent-48, score-0.628]

40 An optimal policy minimizing the average cost can be derived from the solution of Bellman's equation, in which the optimal average cost multiplies the vector of ones. [sent-49, score-0.749]
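The equation itself did not survive extraction; a standard statement of average-cost Bellman's equation consistent with the surrounding text (assumed notation: \(\lambda^*\) for the optimal average cost, \(h^*\) for the differential cost function, \(e\) for the vector of ones, \(g_a(x)\) for the cost of action \(a\) in state \(x\), \(P_a(x,y)\) for the transition probabilities, and \(\mathcal{A}_x\) for the action set at state \(x\)) is

\[
\lambda^* e + h^* \;=\; T h^*,
\qquad
(T h)(x) \;=\; \min_{a \in \mathcal{A}_x} \Big[\, g_a(x) + \sum_{y} P_a(x,y)\, h(y) \Big],
\]

and a policy that attains the minimum on the right-hand side at every state is greedy with respect to \(h^*\) and therefore optimal.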

41 Bellman's equation is solved by pairs consisting of a scalar and a vector; the scalar is unique and equal to the optimal average cost. [sent-51, score-0.152]

42 The differential cost function is unique up to an additive constant: if one function solves Bellman's equation, then shifting it by any constant yields a solution as well, and all other solutions can be shown to be of this form. [sent-53, score-0.637]

43 Uniqueness is obtained by imposing a normalization; any policy that is greedy with respect to the differential cost function is optimal. [sent-55, score-0.992]

44 Solving Bellman's equation involves computing and storing the differential cost function for all states in the system. [sent-56, score-0.69]

45 This is computationally infeasible in most problems of practical interest due to the explosion in the number of states as the number of state variables grows. [sent-57, score-0.159]

46 We try to combat the curse of dimensionality by settling for the more modest goal of finding an approximation to the differential cost function. [sent-58, score-0.753]

47 The underlying assumption is that, in many problems of practical interest, the differential cost function will exhibit some regularity, or structure, allowing for reasonable approximations to be stored compactly. [sent-59, score-0.679]

48 We consider a linear approximation architecture: given a set of basis functions, we generate approximations that are linear combinations of these functions, with a weight vector to be computed. [sent-60, score-0.203]

49 We define a matrix in which each basis function is stored as a column, and each row corresponds to the vector of basis functions evaluated at a distinct state. [sent-64, score-0.274]

50 For simplicity, we choose an arbitrary state — henceforth called state “0” — at which the approximation is fixed to zero; accordingly, we assume that the basis functions all evaluate to zero at that state. [sent-67, score-0.255]
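Written out, the architecture just described takes the following form (the symbols \(\Phi\), \(r\), and \(\phi_k\) are assumed names for the basis matrix, weight vector, and basis functions, since the original notation was lost):

\[
\tilde h \;=\; \Phi r \;=\; \sum_{k=1}^{K} r_k\, \phi_k ,
\qquad
\Phi = \big[\, \phi_1 \;\cdots\; \phi_K \,\big] \in \mathbb{R}^{|\mathcal{S}| \times K},
\qquad
\phi_k(0) = 0 \ \text{for all } k,
\]

so that every candidate approximation automatically satisfies \(\tilde h(0) = 0\).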

51 3 Approximate Linear Programming Approximate linear programming [11, 6] is inspired by the traditional linear programming approach to dynamic programming, introduced by [9]. [sent-68, score-0.486]

52 In approximate linear programming, we reduce the generally intractable dimensions of the average-cost exact LP (ELP) by constraining the differential cost variable to be a linear combination of the basis functions. [sent-70, score-0.117]
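The exact program referred to here did not survive extraction; a standard statement of the average-cost exact LP consistent with the surrounding text (same assumed notation as above, with a scalar variable \(\lambda\) and a vector variable \(h\)) is

\[
\begin{aligned}
\underset{\lambda,\, h}{\text{maximize}} \quad & \lambda \\
\text{subject to} \quad & \lambda + h(x) \;\le\; g_a(x) + \sum_{y} P_a(x,y)\, h(y)
\qquad \text{for all } x \text{ and } a \in \mathcal{A}_x ,
\end{aligned}
\]

whose optimal value is the optimal average cost; the ALP restricts \(h\) to the form \(\Phi r\).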

53 This yields the first-phase approximate LP (ALP), problem (3), which can be expressed as an LP by the same argument used for the exact LP. [sent-71, score-0.108]
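As a concrete illustration, here is a small numerical sketch of the first-phase ALP just described, under the assumption that it takes the form "maximize lambda over (lambda, r) subject to lambda + (Phi r)(x) <= g_a(x) + sum_y P_a(x, y) (Phi r)(y) for every state x and action a", consistent with the exact LP above; the three-state MDP, its costs, and the basis functions are hypothetical placeholders, not taken from the paper.

import numpy as np
from scipy.optimize import linprog

# Hypothetical 3-state, 2-action MDP (placeholder numbers, not the paper's example).
n_states, n_actions = 3, 2
# P[a, x, y]: probability of moving from state x to state y under action a.
P = np.array([
    [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]],
    [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.1, 0.9]],
])
# g[a, x]: one-stage cost of taking action a in state x.
g = np.array([
    [1.0, 2.0, 4.0],
    [1.5, 1.0, 3.0],
])
# Basis functions stored as columns of Phi, with Phi[0, :] = 0 (state "0" normalization).
Phi = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [2.0, 4.0],
])

# First-phase ALP (assumed form): maximize lambda over (lambda, r)
# subject to lambda + (Phi r)(x) <= g_a(x) + sum_y P_a(x, y) (Phi r)(y) for all x, a.
n_r = Phi.shape[1]
c = np.zeros(1 + n_r)
c[0] = -1.0                        # linprog minimizes, so minimize -lambda
A_ub, b_ub = [], []
for a in range(n_actions):
    for x in range(n_states):
        row = np.zeros(1 + n_r)
        row[0] = 1.0               # coefficient of lambda
        row[1:] = Phi[x] - P[a, x] @ Phi
        A_ub.append(row)
        b_ub.append(g[a, x])
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * (1 + n_r))
lam_tilde, r_tilde = res.x[0], res.x[1:]
print("approximate optimal average cost:", lam_tilde)

# Greedy policy with respect to the approximate differential cost function Phi r_tilde.
h_tilde = Phi @ r_tilde
greedy = np.argmin(g + P @ h_tilde, axis=0)   # minimizing action for each state
print("greedy policy:", greedy)

The last two lines show how a policy would be read off the approximation, which is the step the counterexample in the next section stresses: a good value of lambda does not by itself guarantee a good greedy policy.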

54 Lemma 1 implies that the first-phase ALP can be seen as an algorithm for approximating the optimal average cost. [sent-82, score-0.177]

55 Using this algorithm for generating a policy for the average-cost problem is based on the hope that approximation of the optimal average cost also implicitly implies approximation of the differential cost function. [sent-83, score-1.564]

56 Note that it is not unreasonable to expect that some approximation of the differential cost function should be involved in this optimization. [sent-84, score-0.686]

57 Our first step in the analysis of average-cost ALP is to demonstrate through a counterexample that it can produce arbitrarily bad policies, even if the approximation to the average cost is very accurate. [sent-90, score-0.579]

58 4 Performance of the first-phase ALP: a counterexample We consider a Markov process with states each representing a possible number of jobs in a queue with a buffer of finite size. [sent-91, score-0.257]

59 The system state evolves according to simple queueing dynamics, with arrivals and departures changing the number of jobs in the buffer by at most one per time step. [sent-92, score-0.09]

60 From a given state, transitions to its two neighboring states occur with specified probabilities. [sent-93, score-0.177]

61 From another state, transitions to its two neighboring states likewise occur with specified probabilities. [sent-94, score-0.177]

62 The action to be chosen in each state is the departure probability, or service rate, which takes values in a finite set. [sent-96, score-0.116]

63 The cost incurred at a state when an action is taken is given by a function of the state and the chosen service rate. [sent-97, score-0.41]
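For readers who want to experiment, the following sketch builds a controlled queue of the general type described; the buffer size, arrival probability, action set, and cost function are all hypothetical placeholders, since the paper's specific values did not survive extraction.

import numpy as np

# Hypothetical controlled queue (placeholder parameters, not the paper's):
# states 0..N count jobs in a buffer of size N; the action is the service rate.
N = 100                           # buffer size (placeholder)
p_arrival = 0.5                   # arrival probability (placeholder)
actions = [0.2, 0.4, 0.6, 0.8]    # available service rates (placeholder)

def transition_matrix(a):
    """Birth-death transitions for service rate a: at most one arrival or departure per step."""
    P = np.zeros((N + 1, N + 1))
    for x in range(N + 1):
        up = p_arrival * (1.0 - a) if x < N else 0.0     # a job arrives, none departs
        down = a * (1.0 - p_arrival) if x > 0 else 0.0   # a job departs, none arrives
        P[x, min(x + 1, N)] += up
        P[x, max(x - 1, 0)] += down
        P[x, x] += 1.0 - up - down                       # no net change
    return P

def cost(x, a):
    """Placeholder one-stage cost: holding cost plus an increasing cost of fast service."""
    return x + 60.0 * a ** 3

# Stack transition matrices and costs in the (action, state, next_state) layout
# used by the first-phase ALP sketch above.
P = np.stack([transition_matrix(a) for a in actions])
g = np.array([[cost(x, a) for x in range(N + 1)] for a in actions])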

64 For the instance considered, the first-phase ALP yields an approximation of the optimal average cost that is within 2% of the true value. [sent-102, score-0.201]

65 However, the average cost yielded by the greedy policy with respect to the resulting approximation is 9842.2 for the buffer size considered. [sent-103, score-0.747]

66 This average cost goes to infinity as we increase the buffer size. [sent-104, score-0.109]

67 Note that the approximation is very good over the smaller states and becomes progressively worse as the number of jobs increases. [sent-106, score-0.176]

68 These states account for virtually all of the stationary probability under the optimal policy, hence it is not surprising that the first-phase ALP yields a very accurate approximation of the optimal average cost, as other states contribute very little to it. [sent-107, score-0.688]

69 However, fitting the optimal average cost and the differential cost function over states visited often under the optimal policy is not sufficient for getting a good policy. [sent-108, score-1.435]

70 Indeed, the approximation severely underestimates costs in large states, and the greedy policy drives the system to those states, yielding a very large average cost and ultimately making the system unstable as the buffer size goes to infinity. [sent-109, score-0.835]

71 Hence even though the first-phase ALP is being given a relatively good set of basis functions, it is producing a bad approximate differential cost function, which cannot be improved unless different basis functions are selected. [sent-115, score-0.92]

72 5 Two-phase average-cost ALP A striking difference between the first-phase average-cost ALP and the discounted-cost ALP is the presence in the latter of state-relevance weights. [sent-117, score-0.09]

73 For instance, in the example described in the previous section, in the discounted-cost formulation one might be able to improve the policy yielded by ALP by choosing state-relevance weights that put more emphasis on the poorly approximated states. [sent-119, score-0.484]

74 Inspired by this observation, we propose a two-phase algorithm with the characteristic that state-relevance weights are present and can be used to control the quality of the differential cost function approximation. [sent-120, score-0.754]

75 The first phase is simply the first-phase ALP introduced in Section 3, and is used for generating an approximation to the optimal average cost. [sent-121, score-0.267]

76 We now demonstrate how the state-relevance weights can be used for controlling the quality of the approximation to the differential cost function. [sent-124, score-0.785]

77 We first define, for any given scalar, a function given by the unique solution to equation (5) [3]; if the scalar is our estimate for the optimal average cost, then this function can be seen as an estimate of the differential cost function. [sent-125, score-0.754]

78 Proof: Equation (5), satisfied by this function, corresponds to Bellman's equation for the problem of finding the stochastic shortest path to state 0, when costs are given by the one-stage costs minus the average-cost estimate [3]. [sent-130, score-0.176]

79 Hence it corresponds to the vector of smallest expected lengths of paths until state 0. [sent-131, score-0.107]
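Equation (5) itself was lost in extraction; a form consistent with the stochastic-shortest-path interpretation just described (assumed notation: \(\tilde\lambda\) for the average-cost estimate and \(h_{\tilde\lambda}\) for the associated function, with state 0 as the reference state) is

\[
h_{\tilde\lambda}(x) \;=\; \min_{a \in \mathcal{A}_x} \Big[\, g_a(x) - \tilde\lambda + \sum_{y \neq 0} P_a(x,y)\, h_{\tilde\lambda}(y) \Big]
\quad \text{for } x \neq 0,
\qquad h_{\tilde\lambda}(0) = 0 ,
\]

i.e., Bellman's equation for a stochastic shortest path problem with termination at state 0 and one-stage costs \(g_a(x) - \tilde\lambda\).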

80 Hence the second-phase ALP minimizes an upper bound on a weighted norm of the error in the differential cost function approximation. [sent-155, score-0.695]
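The second-phase program itself did not survive extraction; one plausible form, offered only as an illustration (it mirrors the discounted-cost ALP with the one-stage costs shifted by the first-phase estimate \(\tilde\lambda\), and uses \(c\) for the vector of state-relevance weights), is

\[
\begin{aligned}
\underset{r}{\text{maximize}} \quad & c^\top \Phi r \\
\text{subject to} \quad & (\Phi r)(x) \;\le\; g_a(x) - \tilde\lambda + \sum_{y} P_a(x,y)\, (\Phi r)(y)
\qquad \text{for all } x \text{ and } a \in \mathcal{A}_x .
\end{aligned}
\]

Under such a formulation the constraints keep \(\Phi r\) below the solution of equation (5), so maximizing the \(c\)-weighted sum of \(\Phi r\) amounts to minimizing a \(c\)-weighted measure of the gap, which matches the statement above that the second-phase ALP minimizes an upper bound on a weighted norm of the approximation error.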

81 Note that state-relevance weights determine how errors over different portions of the state space are weighted in the decision of which approximate differential cost function to select, and can be used for balancing accuracy of the approximation over different states. [sent-156, score-1.0]

82 In the next section, we will provide performance bounds that tie a certain norm of the difference between the approximate and true differential cost functions to the expected increase in cost incurred by using the greedy policy with respect to the approximation. [sent-157, score-0.778]

83 This demonstrates that the objective optimized by the second-phase ALP is compatible with the objective of optimizing performance of the policy being obtained, and it also provides some insight about appropriate choices of state-relevance weights. [sent-158, score-0.493]

84 An obvious choice for the average-cost estimate used in the second phase is the one yielded by the first-phase ALP, since it satisfies the condition required for bound (6) to hold. [sent-164, score-0.448]

85 This choice can be further tuned to optimize performance of the ultimate policy being generated. [sent-166, score-0.341]

86 It can also be shown that, under certain conditions on the basis functions, the second-phase ALP possesses multiple feasible solutions regardless of this choice. [sent-168, score-0.167]

87 6 A performance bound In this section, we present a bound on the performance of greedy policies associated with approximate differential cost functions. [sent-170, score-0.932]

88 This bound provides some guidance on appropriate choices for state-relevance weights. [sent-171, score-0.099]

89 For each approximate differential cost function, consider the average cost and stationary state distribution of the greedy policy associated with it. [sent-174, score-0.9]

90 Proof: We express the average cost of the greedy policy in terms of the costs and transition matrix associated with the greedy policy with respect to the approximation, using the invariance of its stationary distribution in the first equality. [sent-176, score-0.485]
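The displayed steps of this proof were lost in extraction; the following is a sketch of an argument of the kind described, not necessarily identical to the paper's (assumed notation: \(\tilde h\) for the approximation, \(\tilde u\) for the greedy policy with respect to \(\tilde h\), and \(g_{\tilde u}\), \(P_{\tilde u}\), \(\pi_{\tilde u}\) for its cost vector, transition matrix, and stationary distribution). Since \(\tilde u\) is greedy with respect to \(\tilde h\), we have \(g_{\tilde u} + P_{\tilde u}\tilde h = T\tilde h\), and using \(\pi_{\tilde u}^\top P_{\tilde u} = \pi_{\tilde u}^\top\),

\[
\lambda_{\tilde u} \;=\; \pi_{\tilde u}^\top g_{\tilde u}
\;=\; \pi_{\tilde u}^\top \big( g_{\tilde u} + P_{\tilde u}\tilde h - \tilde h \big)
\;=\; \pi_{\tilde u}^\top \big( T\tilde h - \tilde h \big).
\]

Combining this with Bellman's equation, \(T h^* - h^* = \lambda^* e\), gives

\[
\lambda_{\tilde u} - \lambda^*
\;=\; \pi_{\tilde u}^\top \big[ (T\tilde h - T h^*) - (\tilde h - h^*) \big]
\;\le\; \pi_{\tilde u}^\top \Big( \max_{a} P_a\, |\tilde h - h^*| + |\tilde h - h^*| \Big),
\]

with the maximum over actions taken componentwise: a weighted sum of the pointwise error \(|\tilde h - h^*|\), which is why weights resembling the stationary distribution of the resulting greedy policy are natural candidates for the state-relevance weights.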

91 Alternatively, in some cases it may suffice to use rough guesses about the stationary state distribution of the MDP as choices for the state-relevance weights. [sent-180, score-0.174]

92 This is similar to what is done in [6] and is motivated by the fact that, if the system runs under a “stabilizing” policy, there are exponential lower and upper bounds to the stationary state distribution [5]. [sent-185, score-0.176]

93 The best policy is obtained for one of these choices of weights, reflecting an improvement in the shape of the approximation, and it incurs roughly the same average cost regardless of the buffer size. [sent-191, score-0.761]

94 This cost is only slightly higher than the optimal average cost. [sent-192, score-0.376]

95 We propose a variant of approximate linear programming — the two-phase approximate linear programming method — that explicitly approximates the differential cost function. [sent-197, score-1.256]

96 The main attraction of the algorithm is the presence of state-relevance weights, which can be used for controlling the relative accuracy of the differential cost function approximation over different portions of the state space. [sent-198, score-0.93]

97 Figure 1: Controlled queue example: differential cost function approximations as a function of the state. [sent-206, score-0.36]

98 From top to bottom: the differential cost function, the approximations obtained with different state-relevance weights, and the first-phase approximation. [sent-207, score-0.746]

99 State-relevance weights are updated in each iteration according to the stationary state distribution obtained with the policy generated by the algorithm in the previous iteration. [sent-209, score-0.519]
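A minimal sketch of this reweighting idea, assuming access to routines like the ones sketched earlier (the names solve_second_phase and induced_chain are placeholders, not defined in the paper), might look as follows; only the stationary-distribution computation is spelled out.

import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of an irreducible transition matrix P (pi P = pi)."""
    n = P.shape[0]
    # Solve pi (P - I) = 0 together with the normalization sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical reweighting loop: start from a rough guess for the weights,
# re-solve the second-phase ALP, and reweight with the stationary distribution
# of the resulting greedy policy.
# c = np.ones(n_states) / n_states
# for _ in range(5):
#     r = solve_second_phase(c)          # placeholder for a second-phase ALP solver
#     P_greedy = induced_chain(r)        # placeholder: transition matrix of the greedy policy
#     c = stationary_distribution(P_greedy)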

100 On constraint sampling in the linear programming approach to approximate dynamic programming. [sent-249, score-0.355]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('alp', 0.696), ('differential', 0.343), ('policy', 0.316), ('cost', 0.259), ('programming', 0.192), ('bh', 0.131), ('buffer', 0.092), ('state', 0.09), ('approximate', 0.089), ('approximation', 0.084), ('bellman', 0.077), ('greedy', 0.074), ('states', 0.069), ('average', 0.068), ('bad', 0.067), ('approximations', 0.06), ('portions', 0.058), ('averagecost', 0.055), ('counterexample', 0.055), ('stationary', 0.053), ('optimal', 0.049), ('feasible', 0.048), ('dynamic', 0.046), ('policies', 0.045), ('curse', 0.045), ('basis', 0.044), ('farias', 0.044), ('guidance', 0.044), ('queue', 0.041), ('phase', 0.04), ('lp', 0.04), ('approximating', 0.039), ('weights', 0.039), ('optimizing', 0.038), ('elp', 0.037), ('irreducibility', 0.037), ('prioritizes', 0.037), ('queueing', 0.037), ('schweitzer', 0.037), ('staterelevance', 0.037), ('variant', 0.036), ('incurred', 0.035), ('bounds', 0.033), ('control', 0.032), ('pucci', 0.032), ('choices', 0.031), ('compatible', 0.031), ('functions', 0.031), ('yielded', 0.03), ('arbitrarily', 0.03), ('formulation', 0.03), ('theorem', 0.03), ('transition', 0.029), ('minimizes', 0.029), ('linear', 0.028), ('athena', 0.027), ('preprint', 0.027), ('costs', 0.026), ('generating', 0.026), ('objective', 0.026), ('action', 0.026), ('regardless', 0.026), ('performance', 0.025), ('stochastic', 0.024), ('bound', 0.024), ('management', 0.024), ('associated', 0.024), ('good', 0.023), ('quality', 0.023), ('literature', 0.023), ('dimensionality', 0.022), ('algorithm', 0.021), ('lemma', 0.021), ('weighted', 0.021), ('controlling', 0.021), ('producing', 0.02), ('solution', 0.019), ('norm', 0.019), ('proof', 0.019), ('exact', 0.019), ('van', 0.019), ('controlled', 0.019), ('equation', 0.019), ('minimizing', 0.019), ('markovian', 0.018), ('program', 0.018), ('probabilities', 0.018), ('choice', 0.018), ('increase', 0.017), ('external', 0.017), ('stored', 0.017), ('corresponds', 0.017), ('stanford', 0.017), ('accuracy', 0.017), ('instance', 0.017), ('denote', 0.016), ('scienti', 0.016), ('demonstrate', 0.016), ('unique', 0.016), ('bertsimas', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming

Author: Benjamin V. Roy, Daniela D. Farias

Abstract: This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a two-phase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.

2 0.28450841 3 nips-2002-A Convergent Form of Approximate Policy Iteration

Author: Theodore J. Perkins, Doina Precup

Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces -soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.

3 0.20285092 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

Author: Christopher G. Atkeson, Jun Morimoto

Abstract: A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.

4 0.1358688 20 nips-2002-Adaptive Caching by Refetching

Author: Robert B. Gramacy, Manfred K. Warmuth, Scott A. Brandt, Ismail Ari

Abstract: We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49–63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem.

5 0.13012727 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

Author: Ralf Schoknecht

Abstract: There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. Moreover, we characterise the different solutions as images of the optimal exact value function under different projection operations. The results presented here will be useful for comparing the algorithms in terms of the error they achieve relative to the error of the optimal approximate solution. 1

6 0.12648216 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

7 0.11011914 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

8 0.10495941 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation

9 0.10322866 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions

10 0.095400259 134 nips-2002-Learning to Take Concurrent Actions

11 0.083042122 205 nips-2002-Value-Directed Compression of POMDPs

12 0.076236777 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement

13 0.073861338 165 nips-2002-Ranking with Large Margin Principle: Two Approaches

14 0.065644473 175 nips-2002-Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

15 0.065334707 144 nips-2002-Minimax Differential Dynamic Programming: An Application to Robust Biped Walking

16 0.061570369 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

17 0.060304031 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

18 0.049314585 137 nips-2002-Location Estimation with a Differential Update Network

19 0.048883572 77 nips-2002-Effective Dimension and Generalization of Kernel Learning

20 0.046171017 121 nips-2002-Knowledge-Based Support Vector Machine Classifiers


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.155), (1, -0.036), (2, -0.327), (3, -0.11), (4, 0.036), (5, -0.1), (6, 0.083), (7, -0.05), (8, 0.062), (9, 0.086), (10, -0.022), (11, -0.059), (12, -0.043), (13, 0.036), (14, 0.003), (15, -0.016), (16, 0.016), (17, -0.075), (18, 0.095), (19, 0.023), (20, -0.042), (21, -0.006), (22, -0.147), (23, -0.094), (24, -0.066), (25, -0.049), (26, 0.037), (27, 0.062), (28, -0.002), (29, -0.02), (30, 0.004), (31, -0.061), (32, -0.008), (33, -0.058), (34, 0.016), (35, -0.003), (36, -0.127), (37, -0.031), (38, 0.087), (39, -0.032), (40, -0.041), (41, 0.025), (42, -0.077), (43, -0.043), (44, 0.027), (45, 0.042), (46, -0.049), (47, 0.035), (48, -0.011), (49, -0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95672697 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming

Author: Benjamin V. Roy, Daniela D. Farias

Abstract: This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a twophase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.

2 0.86807388 3 nips-2002-A Convergent Form of Approximate Policy Iteration

Author: Theodore J. Perkins, Doina Precup

Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces -soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.

3 0.84411627 20 nips-2002-Adaptive Caching by Refetching

Author: Robert B. Gramacy, Manfred K. Warmuth, Scott A. Brandt, Ismail Ari

Abstract: We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49–63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem.

4 0.75911677 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

Author: Christopher G. Atkeson, Jun Morimoto

Abstract: A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.

5 0.65989202 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

Author: Eric Allender, Sanjeev Arora, Michael Kearns, Cristopher Moore, Alexander Russell

Abstract: We establish a new hardness result that shows that the difficulty of planning in factored Markov decision processes is representational rather than just computational. More precisely, we give a fixed family of factored MDPs with linear rewards whose optimal policies and value functions simply cannot be represented succinctly in any standard parametric form. Previous hardness results indicated that computing good policies from the MDP parameters was difficult, but left open the possibility of succinct function approximation for any fixed factored MDP. Our result applies even to policies which yield a polynomially poor approximation to the optimal value, and highlights interesting connections with the complexity class of Arthur-Merlin games.

6 0.65841103 134 nips-2002-Learning to Take Concurrent Actions

7 0.61289984 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions

8 0.45256117 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

9 0.44405863 185 nips-2002-Speeding up the Parti-Game Algorithm

10 0.43516353 205 nips-2002-Value-Directed Compression of POMDPs

11 0.42899442 144 nips-2002-Minimax Differential Dynamic Programming: An Application to Robust Biped Walking

12 0.40854073 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

13 0.40538037 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation

14 0.38990247 47 nips-2002-Branching Law for Axons

15 0.35946256 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

16 0.32494318 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement

17 0.3212564 165 nips-2002-Ranking with Large Margin Principle: Two Approaches

18 0.25777379 175 nips-2002-Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

19 0.24729617 114 nips-2002-Information Regularization with Partially Labeled Data

20 0.24018924 174 nips-2002-Regularized Greedy Importance Sampling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(11, 0.015), (23, 0.332), (42, 0.075), (54, 0.179), (55, 0.054), (67, 0.011), (68, 0.011), (73, 0.011), (74, 0.069), (92, 0.028), (98, 0.095)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96828669 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study

Author: Peter Meinicke, Matthias Kaper, Florian Hoppe, Manfred Heumann, Helge Ritter

Abstract: In this paper we present results of a study on brain computer interfacing. We adopted an approach of Farwell & Donchin [4], which we tried to improve in several aspects. The main objective was to improve the transfer rates based on offline analysis of EEG-data but within a more realistic setup closer to an online realization than in the original studies. The objective was achieved along two different tracks: on the one hand we used state-of-the-art machine learning techniques for signal classification and on the other hand we augmented the data space by using more electrodes for the interface. For the classification task we utilized SVMs and, as motivated by recent findings on the learning of discriminative densities, we accumulated the values of the classification function in order to combine several classifications, which finally lead to significantly improved rates as compared with techniques applied in the original work. In combination with the data space augmentation, we achieved competitive transfer rates at an average of 50.5 bits/min and with a maximum of 84.7 bits/min.

2 0.95157856 153 nips-2002-Neural Decoding of Cursor Motion Using a Kalman Filter

Author: W Wu, M. J. Black, Y. Gao, M. Serruya, A. Shaikhouni, J. P. Donoghue, Elie Bienenstock

Abstract: The direct neural control of external devices such as computer displays or prosthetic limbs requires the accurate decoding of neural activity representing continuous movement. We develop a real-time control system using the spiking activity of approximately 40 neurons recorded with an electrode array implanted in the arm area of primary motor cortex. In contrast to previous work, we develop a control-theoretic approach that explicitly models the motion of the hand and the probabilistic relationship between this motion and the mean firing rates of the cells in 70 bins. We focus on a realistic cursor control task in which the subject must move a cursor to “hit” randomly placed targets on a computer monitor. Encoding and decoding of the neural data is achieved with a Kalman filter which has a number of advantages over previous linear filtering techniques. In particular, the Kalman filter reconstructions of hand trajectories in off-line experiments are more accurate than previously reported results and the model provides insights into the nature of the neural coding of movement. ¨ ©§

3 0.9493618 26 nips-2002-An Estimation-Theoretic Framework for the Presentation of Multiple Stimuli

Author: Christian W. Eurich

Abstract: A framework is introduced for assessing the encoding accuracy and the discriminational ability of a population of neurons upon simultaneous presentation of multiple stimuli. Minimal square estimation errors are obtained from a Fisher information analysis in an abstract compound space comprising the features of all stimuli. Even for the simplest case of linear superposition of responses and Gaussian tuning, the symmetries in the compound space are very different from those in the case of a single stimulus. The analysis allows for a quantitative description of attentional effects and can be extended to include neural nonlinearities such as nonclassical receptive fields. 1

same-paper 4 0.87805784 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming

Author: Benjamin V. Roy, Daniela D. Farias

Abstract: This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a twophase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.

5 0.68847054 55 nips-2002-Combining Features for BCI

Author: Guido Dornhege, Benjamin Blankertz, Gabriel Curio, Klaus-Robert Müller

Abstract: Recently, interest is growing to develop an effective communication interface connecting the human brain to a computer, the ’Brain-Computer Interface’ (BCI). One motivation of BCI research is to provide a new communication channel substituting normal motor output in patients with severe neuromuscular disabilities. In the last decade, various neurophysiological cortical processes, such as slow potential shifts, movement related potentials (MRPs) or event-related desynchronization (ERD) of spontaneous EEG rhythms, were shown to be suitable for BCI, and, consequently, different independent approaches of extracting BCI-relevant EEG-features for single-trial analysis are under investigation. Here, we present and systematically compare several concepts for combining such EEG-features to improve the single-trial classification. Feature combinations are evaluated on movement imagination experiments with 3 subjects where EEG-features are based on either MRPs or ERD, or both. Those combination methods that incorporate the assumption that the single EEG-features are physiologically mutually independent outperform the plain method of ’adding’ evidence where the single-feature vectors are simply concatenated. These results strengthen the hypothesis that MRP and ERD reflect at least partially independent aspects of cortical processes and open a new perspective to boost BCI effectiveness.

6 0.68487018 3 nips-2002-A Convergent Form of Approximate Policy Iteration

7 0.68307608 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

8 0.67880118 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

9 0.66807306 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

10 0.66517061 44 nips-2002-Binary Tuning is Optimal for Neural Rate Coding with High Temporal Resolution

11 0.65835953 169 nips-2002-Real-Time Particle Filters

12 0.65572178 21 nips-2002-Adaptive Classification by Variational Kalman Filtering

13 0.65429622 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex

14 0.64975584 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

15 0.64894819 205 nips-2002-Value-Directed Compression of POMDPs

16 0.64198601 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design

17 0.64166188 137 nips-2002-Location Estimation with a Differential Update Network

18 0.63839126 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

19 0.63293821 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

20 0.6308583 187 nips-2002-Spikernels: Embedding Spiking Neurons in Inner-Product Spaces