Title: An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games (NIPS 2001, paper 32)
Author: Michael L. Littman, Michael J. Kearns, Satinder P. Singh
Abstract: We describe a new algorithm for computing a Nash equilibrium in graphical games, a compact representation for multi-agent systems that we introduced in previous work. The algorithm is the first to compute equilibria both efficiently and exactly for a non-trivial class of graphical games.
1 Introduction

Graphical models offer compact representations of multi-agent systems. In the simplest of these formalisms, each vertex represents a single agent, and the edges represent pairwise interactions between agents.
Classical game theory is typically used to model multi-agent interactions, and the global states of interest are thus the so-called Nash equilibria, in which no agent has a unilateral incentive to deviate.
In earlier work (Kearns et al., 2001), we introduced such a graphical formalism for multi-agent game theory, and provided two algorithms for computing Nash equilibria when the underlying graph is a tree (or is sufficiently sparse). The first of these algorithms computes approximate equilibria efficiently; a second and related algorithm computes all Nash equilibria exactly, but in time exponential in the number of agents. We thus left open the problem of efficiently computing exact equilibria in sparse graphs. Here we present such an algorithm: given as input a graphical game that is a tree, it computes in polynomial time an exact Nash equilibrium for the global multi-agent system.
The new algorithm can also be extended to efficiently accommodate parametric representations of the local game matrices, which are analogous to parametric conditional probability tables (such as noisy-OR and sigmoids) in Bayesian networks. The problem of computing Nash equilibria in a graphical game, however, appears to be considerably more difficult than computing conditional probabilities in Bayesian networks. Nevertheless, the analogy and the work presented here suggest a number of interesting avenues for further work in the intersection of game theory, network models, probabilistic inference, statistical physics, and other fields.
2 Preliminaries

An $n$-player, two-action game is defined by a set of $n$ matrices $M_i$ ($1 \le i \le n$), each with $n$ indices. The entry $M_i(x_1, \ldots, x_n) = M_i(\vec{x})$ specifies the payoff to player $i$ when the joint action of the $n$ players is $\vec{x} \in \{0,1\}^n$. If a game is given by simply listing the $2^n$ entries of each of the $n$ matrices, we will say that it is represented in tabular form. The actions 0 and 1 are the pure strategies of each player, while a mixed strategy for player $i$ is given by the probability $p_i \in [0,1]$ that the player will play 1. For any joint mixed strategy, given by a product distribution $\vec{p}$, we define the expected payoff to player $i$ as $M_i(\vec{p}) = E_{\vec{x} \sim \vec{p}}[M_i(\vec{x})]$, where $\vec{x} \sim \vec{p}$ indicates that each $x_j$ is 1 with probability $p_j$ and 0 with probability $1 - p_j$.
We use $\vec{p}[i : p_i']$ to denote the joint mixed strategy that is identical to $\vec{p}$ except that its $i$th component is changed to $p_i'$. A Nash equilibrium for the game is a mixed strategy $\vec{p}$ such that for any player $i$, and for any value $p_i' \in [0,1]$, $M_i(\vec{p}) \ge M_i(\vec{p}[i : p_i'])$. (We say that $p_i$ is a best response to $\vec{p}$.)
In other words, no player can improve its expected payoff by deviating unilaterally from a Nash equilibrium.
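As a concrete illustration of these definitions (not from the paper), the following sketch encodes a two-player game in tabular form, computes expected payoffs under a product distribution, and checks the best-response condition; the game and all names are our own illustrative choices.

```python
import itertools

# A hypothetical 2-player, two-action game in tabular form (matching
# pennies; our illustrative choice). payoff[i][x] is player i's payoff
# under the joint pure action x = (x_1, x_2).
payoff = [
    {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1},    # player 0
    {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1},    # player 1
]

def expected_payoff(i, p):
    """M_i(p) = E_{x ~ p}[M_i(x)] under the product distribution p."""
    total = 0.0
    for x in itertools.product([0, 1], repeat=len(p)):
        prob = 1.0
        for x_j, p_j in zip(x, p):
            prob *= p_j if x_j == 1 else 1.0 - p_j
        total += prob * payoff[i][x]
    return total

def is_best_response(i, p, tol=1e-9):
    """Check M_i(p) >= M_i(p[i : v]) for all v in [0, 1]. The expected
    payoff is linear in p_i, so checking the endpoints v = 0, 1 suffices."""
    base = expected_payoff(i, p)
    for v in (0.0, 1.0):
        deviation = list(p)
        deviation[i] = v
        if expected_payoff(i, deviation) > base + tol:
            return False
    return True

p = [0.5, 0.5]   # the unique Nash equilibrium of matching pennies
print(all(is_best_response(i, p) for i in range(2)))   # prints True
```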
A graphical game is a pair $(G, M)$, where $G$ is an undirected graph on $n$ vertices and $M$ is a set of $n$ matrices $M_i$ ($1 \le i \le n$), called the local game matrices.
Player $i$ is represented by a vertex labeled $i$ in $G$.
We use $N_G(i) \subseteq \{1, \ldots, n\}$ to denote the set of neighbors of player $i$ in $G$, that is, those vertices $j$ such that the undirected edge $(i, j)$ appears in $G$; by convention, $N_G(i)$ includes $i$ itself.
The interpretation is that each player is in a game with only his neighbors in $G$.
Thus, if $|N_G(i)| = k$, the matrix $M_i$ has $k$ indices, one for each player in $N_G(i)$, and if $\vec{x} \in \{0,1\}^k$, $M_i(\vec{x})$ denotes the payoff to $i$ when his $k$ neighbors (which include himself) play $\vec{x}$. The expected payoff under a mixed strategy $\vec{p} \in [0,1]^k$ is defined analogously.
We thus use $M_V$ to denote the local game matrix for the player identified with vertex $V$. Note that our definitions are entirely representational, and alter nothing about the underlying game theory. Furthermore, every game can be trivially represented as a graphical game by choosing $G$ to be the complete graph and letting the local game matrices be the original tabular-form matrices.
However, exactly as for Bayesian networks and other graphical models for probabilistic inference, in any game in which the local neighborhoods in $G$ can be bounded by $k \ll n$, exponential space savings accrue.
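A minimal sketch of this representation, under our own naming conventions: each vertex stores only a payoff table over its neighborhood, so storage is $O(n 2^{k+1})$ rather than $O(n 2^n)$.

```python
from dataclasses import dataclass, field

@dataclass
class GraphicalGame:
    """Sketch of a graphical game (G, M); names are ours, not the paper's."""
    neighbors: dict                      # vertex -> list of neighbors (excluding self)
    local_matrix: dict = field(default_factory=dict)
    # local_matrix[v] maps joint actions (x_v, x_u1, ..., x_uk) of v's
    # neighborhood to v's payoff; each table has 2^(k+1) entries, so a
    # tree with maximum degree k needs O(n * 2^(k+1)) numbers in total.

# A 4-vertex tree: U1 and U2 are parents of V, and W is V's child.
tree = {'U1': ['V'], 'U2': ['V'], 'W': ['V'], 'V': ['U1', 'U2', 'W']}
game = GraphicalGame(neighbors=tree)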
3 The Algorithm

If $(G, M)$ is a graphical game in which $G$ is a tree, then we can always designate some vertex $Z$ as the root. For any vertex $V$, the single neighbor of $V$ on the path from $V$ to $Z$ shall be called the child of $V$, and the (possibly many) neighbors of $V$ on paths towards the leaves shall be called the parents of $V$. Our algorithm consists of two passes: a downstream pass in which local data structures are passed from the leaves towards the root, and an upstream pass progressing from the root towards the leaves.
Throughout the ensuing discussion, we consider a fixed vertex $V$ with parents $U_1, \ldots, U_k$ and child $W$.
3.1 Downstream Pass

On the downstream pass of our algorithm, vertex $V$ will compute and pass to its child $W$ a breakpoint policy, which we now define.
Definition 1 A breakpoint policy for $V$ consists of an ordered set of $W$-breakpoints $0 = w_0 < w_1 < w_2 < \cdots < w_{t-1} < w_t = 1$ and an associated set of $V$-values $v_1, \ldots, v_t$. The interpretation is that for any $w \in [0,1]$, if $w_{i-1} < w < w_i$ for some index $i$ and $W$ plays $w$, then $V$ shall play $v_i$; and if $w = w_i$ for some index $i$, then $V$ shall play any value between $v_i$ and $v_{i+1}$. We say such a breakpoint policy has $t - 1$ breakpoints.

A breakpoint policy for $V$ can thus be seen as assigning a value (or range of values) to the mixed strategy played by $V$ in response to the play of its child $W$.
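The following sketch represents Definition 1 directly; the class name and interface are our own, and the example policy at the bottom is hypothetical.

```python
import bisect

class BreakpointPolicy:
    """Sketch of Definition 1. breakpoints = [w_0 = 0, ..., w_t = 1];
    values[i] is v_{i+1}, V's play on the open interval (w_i, w_{i+1})."""
    def __init__(self, breakpoints, values):
        assert breakpoints[0] == 0.0 and breakpoints[-1] == 1.0
        assert len(values) == len(breakpoints) - 1
        self.breakpoints, self.values = breakpoints, values

    def play(self, w):
        """Return V's response to W = w: a single value inside an interval,
        or the allowed range (v_i, v_{i+1}) when w hits a breakpoint exactly."""
        i = bisect.bisect_left(self.breakpoints, w)
        if 0 < i < len(self.breakpoints) - 1 and self.breakpoints[i] == w:
            lo, hi = sorted((self.values[i - 1], self.values[i]))
            return (lo, hi)          # any value in this range is allowed
        j = bisect.bisect_right(self.breakpoints, w) - 1
        return self.values[min(j, len(self.values) - 1)]

# Example: one breakpoint at w = 0.4; play 1 below it and 0 above it.
F_v = BreakpointPolicy([0.0, 0.4, 1.0], [1.0, 0.0])
print(F_v.play(0.2), F_v.play(0.4), F_v.play(0.9))   # 1.0 (0.0, 1.0) 0.0
```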
Let $G^V$ denote the subtree of $G$ with root $V$, and let $M^V_{W=w}$ denote the subset of the local game matrices $M$ corresponding to the vertices in $G^V$, except that the matrix $M_V$ is collapsed by one index by setting $W = w$, thus marginalizing $W$ out.
On its downstream pass, our algorithm shall maintain the invariant that if we set the child $W = w$, then there is a Nash equilibrium for the graphical game $(G^V, M^V_{W=w})$ (an upstream Nash) in which $V = F_V(w)$, where $F_V$ denotes the breakpoint policy computed by $V$. If this property is satisfied by $F_V(w)$, we shall say that $F_V(w)$ is a Nash breakpoint policy for $V$.
Since $V$ may have more than one best response to a given play $w$ of its child, the trick is to commit to one of these values (as specified by $F_V(w)$) that can be extended to a Nash equilibrium for the entire tree $G$, before we have even processed the tree below $V$.
The algorithm and analysis are inductive: $V$ computes a Nash breakpoint policy $F_V(w)$ from the Nash breakpoint policies $F_{U_1}(v), \ldots, F_{U_k}(v)$ passed down from its parents (and from the local game matrix $M_V$). The complexity analysis bounds the number of breakpoints for any vertex in the tree.
Define $\Delta_V(\vec{u}, w) = M_V(1, \vec{u}, w) - M_V(0, \vec{u}, w)$, the advantage to $V$ of playing 1 rather than 0 when its parents play $\vec{U} = \vec{u}$ and its child plays $W = w$. The sign of $\Delta_V(\vec{u}, w)$ tells us $V$'s best response to this setting of the local neighborhood: a positive sign means $V = 1$ is the best response, a negative sign means $V = 0$ is the best response, and 0 means $V$ is indifferent and may play any mixed strategy.
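Since payoffs are multilinear in the mixed strategies, $\Delta_V$ extends from pure to mixed neighborhood plays by taking expectations. A sketch, with our own index convention (V's own action first, then the parents in a fixed order, then the child):

```python
import itertools

def make_delta_V(M_V, parent_order):
    """Build Delta_V from V's local game matrix. M_V[(x_v, *x_parents, x_w)]
    is V's payoff under pure joint actions (the index order is our
    convention). The returned Delta_V(u, w) is the expected payoff of
    playing 1 minus that of playing 0; it is multilinear in (u, w)."""
    def delta(u, w):                     # u: dict parent -> Pr[parent plays 1]
        probs = [u[p] for p in parent_order] + [w]
        total = 0.0
        for xs in itertools.product([0, 1], repeat=len(probs)):
            pr = 1.0
            for x, p in zip(xs, probs):
                pr *= p if x == 1 else 1.0 - p
            total += pr * (M_V[(1, *xs)] - M_V[(0, *xs)])
        return total
    return delta
```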
For the base case, suppose $V$ is a leaf with child $W$; we want to describe the Nash breakpoint policy for $V$. If the function $\Delta_V(w)$ is non-negative (non-positive, respectively) for all $w \in [0,1]$, then $V$ can choose 1 (0, respectively) as a best response (which in this base case is an upstream Nash) to all values $W = w$.
Otherwise, since $\Delta_V(w)$ is linear in $w$, it crosses zero at exactly one point, and this crossing point becomes the single breakpoint in $F_V(w)$.
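A sketch of this base case, reusing the BreakpointPolicy class above; for a leaf, Delta depends only on $(x_V, x_W)$, so the zero crossing has a closed form. The construction and names are ours.

```python
def leaf_breakpoint_policy(M_leaf):
    """Base-case sketch. M_leaf[(x_v, x_w)] is the leaf's payoff;
    Delta(w) = M(1, w) - M(0, w) is linear in w, so it has at most one
    zero crossing on [0, 1]."""
    def delta(w):
        m = lambda xv: (1 - w) * M_leaf[(xv, 0)] + w * M_leaf[(xv, 1)]
        return m(1) - m(0)
    d0, d1 = delta(0.0), delta(1.0)
    if d0 >= 0 and d1 >= 0:              # always (weakly) prefers 1
        return BreakpointPolicy([0.0, 1.0], [1.0])
    if d0 <= 0 and d1 <= 0:              # always (weakly) prefers 0
        return BreakpointPolicy([0.0, 1.0], [0.0])
    w_star = d0 / (d0 - d1)              # unique zero of the linear Delta
    before, after = (1.0, 0.0) if d0 > 0 else (0.0, 1.0)
    return BreakpointPolicy([0.0, w_star, 1.0], [before, after])
```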
Theorem 2 Suppose $V$ is an internal vertex with parents $U_1, \ldots, U_k$ and child $W$, and assume $V$ has received Nash breakpoint policies $F_{U_i}(v)$ from each parent $U_i$. Then $V$ can efficiently compute a Nash breakpoint policy $F_V(w)$. The number of breakpoints is no more than two plus the total number of breakpoints in the $F_{U_i}(v)$ policies.
Proof: Recall that for any fixed value of $v$, the breakpoint policy $F_{U_i}(v)$ specifies either a specific value for $U_i$ (if $v$ falls between two breakpoints of $F_{U_i}(v)$), or a range of allowed values for $U_i$ (if $v$ is equal to a breakpoint).
Let $0 = v_0 < v_1 < \cdots < v_s = 1$ be the ordered union of the breakpoints of the $F_{U_i}(v)$; for simplicity, assume these breakpoints are distinct. Thus for any breakpoint $v_\ell$, there is at most one distinguished parent $U_j$ (that we shall call the free parent) for which $F_{U_j}(v_\ell)$ specifies an allowed interval of play for $U_j$. For each breakpoint $v_\ell$, we now define the set $W_\ell$ of values for the child $W$ that, as we let the free parent range across its allowed interval, permit $V$ to play any mixed strategy as a best response.
Fix any breakpoint $v_\ell$, and assume without loss of generality that $U_1$ is the free parent of $V$ for $V = v_\ell$. Then $W_\ell$ consists of the values $w$ for which some choice $u_1$ of $U_1$ in its allowed interval satisfies $\Delta_V(u_1, F_{U_2}(v_\ell), \ldots, F_{U_k}(v_\ell), w) = 0$. In words, $W_\ell$ is the set of values that $W$ can play that allow $V$ to play any mixed strategy, preserving the existence of an upstream Nash from $V$ given $W = w$.
It also follows from the earlier work that $W_\ell$ can be computed in time proportional to the size of $V$'s local game matrix, which is $O(2^k)$ for a vertex with $k$ parents.
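For intuition only, here is a numeric approximation of $W_\ell$ under the delta_V(u, w) interface assumed in the sketch above. It exploits the fact that, for fixed $w$, $\Delta_V$ is linear in the free parent's value, so a zero exists on its interval iff the endpoint evaluations straddle zero; the exact $O(2^k)$ computation in the paper is symbolic rather than grid-based.

```python
import numpy as np

def W_ell_set(delta_V, fixed, free, lo, hi, grid=2001, tol=1e-9):
    """Grid approximation of W_ell: the values w in [0, 1] for which some
    value of the free parent in [lo, hi] makes Delta_V zero (V indifferent)."""
    ws = np.linspace(0.0, 1.0, grid)
    keep = []
    for w in ws:
        d_lo = delta_V({**fixed, free: lo}, w)
        d_hi = delta_V({**fixed, free: hi}, w)
        # Linear in the free parent's value, so a zero exists in [lo, hi]
        # iff the endpoint values do not share a strict sign.
        keep.append(min(d_lo, d_hi) <= tol and max(d_lo, d_hi) >= -tol)
    return ws[np.array(keep)]
```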
Lemma 4 For any breakpoint $v_\ell$, the set $W_\ell$ is either empty, a single interval, or the union of two intervals that are not floating (an interval is floating if it contains neither of the endpoints 0 and 1).
We wish to create the (inductive) Nash breakpoint policy $F_V(w)$ from the sets $W_\ell$ and the $F_{U_i}$ policies. For $b \in \{0, 1\}$, we define $W^b$ as the set of values $w$ such that if $W = w$ and the $U_i$ are set according to their breakpoint policies for $V = b$, then $V = b$ is a best response.
Lemma 5 Let $0 = v_0 < v_1 < \cdots < v_s = 1$ be the ordered union of the breakpoints of the $F_{U_i}(v)$ policies. Then every $w \in [0,1]$ is contained in $W^0$, $W^1$, or some $W_\ell$.
Proof: Consider any fixed value of $w$, and for each open interval $(v_j, v_{j+1})$ determined by adjacent breakpoints, label this interval by $V$'s best response (0 or 1) to $W = w$ and $\vec{U}$ set according to the $F_{U_i}$ policies for this interval. If either the leftmost interval $[0, v_1]$ is labeled with 0 or the rightmost interval $[v_{s-1}, 1]$ is labeled with 1, then $w$ is included in $W^0$ or $W^1$, respectively ($V$ playing 0 or 1 is a best response to what the $U_i$ will play in response to a 0 or 1). Otherwise, since the labeling starts at 1 on the left and ends at 0 on the right, there must be a breakpoint $v_\ell$ such that $V$'s best response changes over this breakpoint.
By continuity, there must be a value of the free parent at $v_\ell$ in its allowed interval for which $V$ is indifferent between playing 0 and 1, so $w \in W_\ell$.
Since every $w$ is contained in some such set (Lemma 5), and since every $W_\ell$ is the union of at most two intervals (Lemma 4), we can uniquely identify the set $W_{\ell_1}$ that covers the largest (leftmost) interval containing $w = 0$; let $[0, a]$ be this interval. We then continue from $w = a$, at each step selecting the set whose interval covers the current point and extends furthest to the right, until $w = 1$ is reached. (A figure in the original paper illustrates this construction: the breakpoints $v_\ell$ are laid out over $[0,1]$, and the solid intervals along these breakpoints are the sets $W_\ell$.) This process results in a Nash breakpoint policy $F_V(w)$.
Finally, we bound the number of breakpoints in the $F_V(w)$ policy.
By construction, each of its breakpoints must be the right endpoint of some interval in $W^0$, $W^1$, or some $W_\ell$.
After the first breakpoint, each of these sets contributes at most one new breakpoint (Lemma 4). The final breakpoint is at $w = 1$ and does not contribute to the count (Definition 1). There is at most one $W_\ell$ for each breakpoint in each $F_{U_i}(v)$ policy, plus $W^0$ and $W^1$, plus the initial leftmost interval and minus the final breakpoint, so the total number of breakpoints in $F_V(w)$ can be no more than two plus the total number of breakpoints in the $F_{U_i}(v)$ policies. Therefore, the root of a size-$n$ tree will have a Nash breakpoint policy with no more than $2n$ breakpoints.
3.2 Upstream Pass

The downstream pass completes when each vertex in the tree has had its Nash breakpoint policy computed. For simplicity of description, imagine that the root of the tree includes a dummy child with constant payoffs and no influence on the root, so the root's breakpoint policy has the same form as the others in the tree.
To produce a Nash equilibrium, our algorithm performs an upstream pass over the tree, starting from the root. Each vertex is told by its child what value to play, as well as the value the child itself will play. The algorithm ensures that all downstream vertices are Nash (each playing a best response to its neighbors). Given this information, each vertex computes a value for each of its parents so that its own assigned action is a best response. This process can be initiated by the dummy vertex picking an arbitrary value for itself, and selecting the root's value according to its Nash breakpoint policy.
Inductively, we have a vertex $V$ connected to parents $U_1, \ldots, U_k$ and child $W$. The child of $V$ has informed $V$ to choose $V = v$, and that it will itself play $W = w$.
To decide on values for $V$'s parents that enforce $V$ playing a best response, we can look at the Nash breakpoint policies $F_{U_i}(v)$, which provide a value (or range of values) for $U_i$ as a function of $v$ that guarantees an upstream Nash. The value $v$ can be a breakpoint for at most one $U_i$. For each $U_i$, if $v$ is not a breakpoint in $F_{U_i}(v)$, then $U_i$ should be told to select $U_i = F_{U_i}(v)$. If $v$ is a breakpoint in $F_{U_i}(v)$, then $U_i$'s value can be computed by solving $\Delta_V(u_1, \ldots, u_i, \ldots, u_k, w) = 0$; this is the value of $U_i$ that makes $V$ indifferent. The equation is linear in $u_i$ and has a solution by the construction of the Nash breakpoint policies on the downstream pass.
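One step of this pass, sketched with the helpers introduced earlier (BreakpointPolicy.play and a delta_V of the assumed dict-based signature; all names are ours). Because $\Delta_V$ is linear in the free parent's value, the indifference equation is solved by interpolating between the values at the interval endpoints.

```python
def upstream_step(v, w, parent_policies, delta_V):
    """Given V's assigned action v and its child's play w, return the value
    each parent U_i is told to play so that V = v is a best response."""
    assignments, free_parent, lo, hi = {}, None, 0.0, 1.0
    for U, F in parent_policies.items():
        response = F.play(v)
        if isinstance(response, tuple):   # v is a breakpoint of F_{U_i};
            free_parent = U               # at most one parent can be free
            lo, hi = response
        else:
            assignments[U] = response
    if free_parent is not None:
        # Solve Delta_V(..., u, ..., w) = 0 for the free parent's value u:
        # Delta is linear in u, so interpolate between the endpoint values.
        d_lo = delta_V({**assignments, free_parent: lo}, w)
        d_hi = delta_V({**assignments, free_parent: hi}, w)
        u = lo if d_lo == d_hi else lo + (hi - lo) * d_lo / (d_lo - d_hi)
        assignments[free_parent] = u
    return assignments
```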
When the upstream pass completes, each vertex has a concrete choice of action such that jointly they form a Nash equilibrium. Each vertex is involved in a computation in the downstream pass and in the upstream pass. Let $t$ be the total number of breakpoints in the breakpoint policy for a vertex $V$ with $k$ parents. Sorting the breakpoints, computing the $W_\ell$ sets, and computing the new breakpoint policy can be completed in $O(t \log t + t 2^k)$. In the upstream pass, only one breakpoint is considered, so $O(\log t + 2^k)$ is sufficient for passing values to the parents. By Theorem 2, $t \le 2n$, so the entire algorithm executes in time $O(n^2 \log n + n^2 2^k)$, where $k$ is the largest number of neighbors of any vertex in the network. The algorithm can also be implemented to take advantage of local game matrices provided in a parameterized form.
4 Conclusion

The algorithm presented in this paper finds a single Nash equilibrium for a game represented by a tree-structured network. In contrast, one of our earlier algorithms (Kearns et al., 2001) was able to select equilibria efficiently according to criteria like maximizing the total expected payoff for all players.
The polynomial-time algorithm described in this paper throws out potential equilibria at many stages, most significantly during the construction of the Nash breakpoint policies. An interesting area for future work is to manipulate this process to produce equilibria with particular properties.