nips nips2004 nips2004-48 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael Bowling
Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent were not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating that the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Learning in a multiagent system is a challenging problem due to two key factors. [sent-3, score-0.194]
2 First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. [sent-4, score-0.228]
3 These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. [sent-7, score-0.452]
4 Algorithms focusing on convergence or regret in isolation are numerous. [sent-8, score-0.469]
5 We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. [sent-10, score-0.189]
6 We prove convergence in a limited setting and give empirical results in a wider variety of situations. [sent-11, score-0.188]
7 These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). [sent-12, score-0.51]
8 1 Introduction Learning to select actions to achieve goals in a multiagent setting requires overcoming a number of key challenges. [sent-13, score-0.238]
9 Both of these challenges distinguish the multiagent learning problem from traditional single-agent learning, and have been gaining recent attention as multiagent applications continue to proliferate. [sent-16, score-0.413]
10 In a multiagent task with other learning agents, the outcomes of an agent’s action will vary with the changing policies of the other agents. [sent-19, score-0.295]
11 Since most of the convergence results in reinforcement learning depend upon the environment being stationary, convergence is often difficult to obtain in multiagent settings. [sent-20, score-0.5]
12 These algorithms learn joint-action values, which are stationary, and in certain circumstances guarantee these values converge to Nash (or correlated) equilibrium values. [sent-24, score-0.198]
13 This convergence of strategies is guaranteed nearly independently of the actions selected by the other agents, including when other agents play suboptimal responses. [sent-26, score-0.426]
14 These approaches usually examine convergence in self-play, and have included both theoretical and experimental results. [sent-29, score-0.187]
15 Since learning strategies dynamically change their action selection over time, it is important to know that the change cannot be exploited by a clever opponent. [sent-31, score-0.339]
16 A deceptive opponent may "lure" a dynamic learner away from a safe choice, only to then switch to a strategy under which the learner receives much lower reward. [sent-32, score-0.422]
17 These two challenges result in two completely different criteria for evaluation: convergence and no-regret. [sent-37, score-0.226]
18 For example, equilibrium learners can have arbitrarily large average regret. [sent-39, score-0.217]
19 On the other hand, no-regret learners’ strategies rarely converge in self-play [12] in even the simplest of games. [sent-40, score-0.207]
20 We also examine key related work in applying gradient ascent algorithms to this learning problem. [sent-43, score-0.197]
21 In Section 3 we introduce GIGA-WoLF, an algorithm with both regret and convergence properties. [sent-44, score-0.492]
22 A normal-form game is a tuple (n, A1...n, R1...n), where n is the number of players in the game, Ai is a set of actions available to player i (A = A1 × · · · × An), and Ri is player i's reward function over joint actions. [sent-52, score-0.254]
23 The problem of learning in a normal-form game is one of repeatedly selecting an action and receiving a reward, with a goal of maximizing average reward against an unknown opponent. [sent-56, score-0.419]
24 If there are two players then it is convenient to write a player’s reward function as a |A1 | × |A2 | matrix. [sent-57, score-0.247]
25 In the context of a particular learning algorithm and a particular opponent, let rt ∈ ℝ^{|A1|} be the vector of actual rewards that player one would receive at time t for each of its actions. [sent-60, score-0.417]
26 Let xt be player one's strategy at time t, a probability distribution over A1. (Footnote 1: This work is not restricted to zero-sum games, and our use of the word "opponent" refers simply to the other players in the game.) [sent-61, score-0.286]
27 So, player one’s expected payoff at time t is (rt · xt ). [sent-66, score-0.481]
28 Lastly, we will assume the reward for any action is bounded by rmax, and therefore ||rt||² ≤ |A1| rmax². [sent-68, score-0.352]
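To make the setup concrete, here is a small numerical sketch of these quantities for a two-player game; the matching-pennies reward matrix and the particular mixed strategies are illustrative assumptions, not values taken from the paper.

    import numpy as np

    # Row player's reward matrix for matching pennies (an illustrative two-player,
    # two-action game); entry R1[i, j] is player one's reward for joint action (i, j).
    R1 = np.array([[ 1.0, -1.0],
                   [-1.0,  1.0]])

    x = np.array([0.7, 0.3])   # player one's mixed strategy x_t
    y = np.array([0.4, 0.6])   # opponent's mixed strategy at time t

    r_t = R1 @ y               # reward vector r_t: expected reward for each of player one's actions
    expected_payoff = r_t @ x  # player one's expected payoff (r_t . x_t)

    r_max = np.abs(R1).max()
    assert np.dot(r_t, r_t) <= len(r_t) * r_max ** 2   # ||r_t||^2 <= |A1| * r_max^2
    print(r_t, expected_payoff)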
29 1 Evaluation Criteria One common evaluation criterion for learning in normal-form games is convergence. [sent-70, score-0.243]
30 These include, roughly increasing in strength: average reward (i.e., (Σ_{t=1}^T rt · xt)/T), empirical distribution of actions, expected payoffs (rt · xt), and the strategies xt themselves. [sent-72, score-0.193] [sent-74, score-0.324]
32 We focus in this paper on convergence of strategies as this implies the other three forms of convergence as well. [sent-81, score-0.39]
33 In particular, we will say an algorithm converges against a particular opponent if and only if limt→∞ xt = x∗ . [sent-82, score-0.408]
34 Total regret is the difference between the maximum total reward of any static strategy, given the past history of play, and the algorithm's total reward. [sent-84, score-0.39]
35 RT ≡ max_{a∈A1} Σ_{t=1}^T ((rt · 1a) − (rt · xt)). Average regret is just the asymptotic average of total regret, lim_{T→∞} RT/T. [sent-85, score-0.659]
36 An algorithm has no-regret if and only if the average regret is less than or equal to zero against all opponent strategies. [sent-86, score-0.483]
37 The no-regret property makes a strong claim about the performance of the algorithm: the algorithm's expected average reward is at least as large as the expected average reward any static strategy could have achieved. [sent-87, score-0.506]
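As a rough illustration of these definitions, the sketch below computes total and average regret from a recorded history of reward vectors and strategies; the random play history is a stand-in assumption, not data from the paper.

    import numpy as np

    def total_regret(rewards, strategies):
        """R_T = max_a sum_t (r_t . 1_a - r_t . x_t) for recorded vectors r_t and strategies x_t."""
        rewards = np.asarray(rewards)        # shape (T, |A1|)
        strategies = np.asarray(strategies)  # shape (T, |A1|)
        earned = np.sum(rewards * strategies, axis=1)   # r_t . x_t at each step
        per_action = rewards.sum(axis=0)                # sum_t r_t . 1_a for each action a
        return np.max(per_action - earned.sum())

    rng = np.random.default_rng(0)
    T, n_actions = 1000, 3
    rewards = rng.uniform(-1, 1, size=(T, n_actions))
    strategies = rng.dirichlet(np.ones(n_actions), size=T)

    R_T = total_regret(rewards, strategies)
    print(R_T, R_T / T)   # average regret; the no-regret property asks that this approach at most zero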
38 We will examine three recent results evaluating gradient ascent learning algorithms in normal-form games. [sent-92, score-0.179]
39 They examined the resulting strategy trajectories and payoffs in self-play, demonstrating that strategies do not always converge to a Nash equilibrium, depending on the game. [sent-96, score-0.524]
40 They proved, instead, that average payoffs converge (a weaker form of convergence). (Footnote 3: Our analysis focuses on expectations of regret (total and average), similar to [10, 11].) [sent-97, score-0.548]
41 Although note that for any self-oblivious behavior, including GIGA-WoLF, average regret of at most zero in expectation implies universal consistency, i.e., at most zero average regret with high probability. [sent-98, score-0.379]
42 Using the WoLF (“Win or Learn Fast”) principle, the algorithm would choose a larger step size when the current strategy had less expected payoff than the equilibrium strategy. [sent-103, score-0.364]
43 This results in strategies converging to the Nash equilibrium in a variety of games including all two-player, two-action games. [sent-104, score-0.442]
44 4 Zinkevich [11] looked at gradient ascent using the evaluation criterion of regret. [sent-105, score-0.188]
45 He proved GIGA's total regret is bounded by RT ≤ √T + |A| rmax² (√T − 1/2). [sent-108, score-0.372]
46 It is also true, though, that GIGA’s strategies do not converge in self-play even in simple games like matching pennies. [sent-110, score-0.378]
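A minimal sketch of a GIGA-style update in this spirit is shown below, assuming a Euclidean projection P onto the probability simplex and the step size ηt = 1/√t; the sort-based projection routine is a standard construction, not code from the paper.

    import numpy as np

    def project_simplex(v):
        """Euclidean projection of v onto the probability simplex."""
        u = np.sort(v)[::-1]                  # sort descending
        css = np.cumsum(u)
        idx = np.arange(1, len(v) + 1)
        rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1.0)
        return np.maximum(v + theta, 0.0)

    def giga_step(x, r, t):
        """One GIGA update: x_{t+1} = P(x_t + eta_t * r_t), with eta_t = 1/sqrt(t), t >= 1."""
        eta = 1.0 / np.sqrt(t)
        return project_simplex(x + eta * r)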
47 3 GIGA-WoLF GIGA-WoLF is a gradient based learning algorithm that internally keeps track of two different gradient-updated strategies, xt and zt . [sent-113, score-0.852]
48 The algorithm chooses actions according to the distribution xt , but updates both xt and zt after each iteration. [sent-114, score-1.117]
49 (1) x̂t+1 = P(xt + ηt rt)
(2) zt+1 = P(zt + ηt rt/3),   δt+1 = min( 1, ||zt+1 − zt|| / ||zt+1 − x̂t+1|| )
(3) xt+1 = x̂t+1 + δt+1 (zt+1 − x̂t+1)
Step (1) updates xt according to GIGA's standard gradient update and stores the result as x̂t+1. [sent-116, score-1.283]
50 Step (2) updates zt in the same manner, but with a smaller step-size. [sent-117, score-0.49]
51 The magnitude of this adjustment is limited by the change in zt that occurred in step (2). [sent-119, score-0.578]
52 A key factor in understanding this algorithm is the observation that a strategy b receives higher reward than a strategy a if and only if the gradient at a is in the direction of b (i.e., rt · (b − a) > 0). [sent-120, score-0.526]
53 Therefore, the step (3) adjustment is in the direction of the gradient if and only if zt received higher reward than xt . [sent-123, score-1.018]
54 Notice also that, as long as xt is not near the boundary, the change due to step (3) is of lower magnitude than the change due to the gradient update in step (1). (Footnote 4: WoLF-IGA may, in fact, be a limited variant of the extragradient method [13] for variational inequality problems.) [sent-124, score-0.457]
55 The extragradient algorithm is guaranteed to converge to a Nash equilibrium in self-play for all zero-sum games. [sent-125, score-0.276]
56 Like WoLF-IGA, though, it does not have any known regret guarantees, but more importantly requires the other players’ payoffs to be known. [sent-126, score-0.446]
57 Second, the magnitude of the change is larger when zt received higher reward than xt . [sent-130, score-0.944]
58 That is, the algorithm is learning faster if and only if its strategy x is losing to strategy z. [sent-133, score-0.304]
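Putting steps (1)–(3) together, one GIGA-WoLF iteration might look like the sketch below; it reuses the project_simplex helper from the GIGA sketch above, and the guard against a zero denominator in δ is an implementation assumption rather than part of the stated update.

    import numpy as np

    def giga_wolf_step(x, z, r, t):
        """One GIGA-WoLF iteration on strategies x_t and z_t given reward vector r_t (t >= 1)."""
        eta = 1.0 / np.sqrt(t)
        x_hat = project_simplex(x + eta * r)          # step (1): GIGA update of x
        z_new = project_simplex(z + eta * r / 3.0)    # step (2): slower update of z
        gap = np.linalg.norm(z_new - x_hat)
        delta = 1.0 if gap == 0.0 else min(1.0, np.linalg.norm(z_new - z) / gap)
        x_new = x_hat + delta * (z_new - x_hat)       # step (3): move x_hat toward z
        return x_new, z_new

Actions are then sampled from x_new, while both x and z are carried forward to the next iteration.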
59 GIGA-WoLF is a major improvement on the original presentation of WoLF-IGA, where winning was determined by comparison with an equilibrium strategy that was assumed to be given. [sent-134, score-0.265]
60 Not only is less knowledge required, but the use of a GIGA-updated strategy to determine winning will allow us to prove guarantees on the algorithm’s regret. [sent-135, score-0.209]
61 In the next section we present a theoretical examination of GIGA-WoLF's regret in n-player, n-action games, along with a limited guarantee of convergence in two-player, two-action games. [sent-136, score-0.547]
62 In Section 5, we give experimental results of learning using GIGA-WoLF, demonstrating that convergence extends beyond the theoretical analysis presented. [sent-137, score-0.209]
63 4 Theoretical Analysis We begin by examining GIGA-WoLF’s regret against an unknown opponent strategy. [sent-138, score-0.425]
64 Theorem 1 If ηt = 1/√t, the regret of GIGA-WoLF is RT ≤ 2√T + |A| rmax² (2√T − 1). [sent-140, score-0.344]
65 We will find a bound on the regret of the strategy xt with respect to the dynamic strategy zt . [sent-144, score-1.383]
66 Since zt is unmodified GIGA, we already have a bound on the regret of zt with respect to any static strategy. [sent-145, score-1.374]
67 Hence, we can bound the regret of xt with respect to any static strategy. [sent-146, score-0.724]
68 We start by examining the regret of xt with respect to zt using a similar analysis as used by Zinkevich [11]. [sent-147, score-1.089]
69 Let ρ^{x→z}_t refer to the difference in expected payoff between zt and xt at time t, and R^{x→z}_T be the sum of these differences, i.e., the total regret of xt with respect to zt:
ρ^{x→z}_t = rt · (zt − xt),    R^{x→z}_T ≡ Σ_{t=1}^T ρ^{x→z}_t. [sent-148, score-0.825] [sent-150, score-1.593]
71 t t=1 We will use the following potential function, Φt ≡ (xt − zt )2 /2ηt . [sent-151, score-0.489]
72 t+1 t+1 If δt+1 < 1, then ||xt+1 − xt+1 || = ||zt+1 − zt ||. [sent-158, score-0.465]
73 So, ||x̂t+1 − zt+1|| = ||x̂t+1 − xt+1|| + ||xt+1 − zt+1|| = ||zt+1 − zt|| + ||xt+1 − zt+1||. We can bound the left-hand side with the triangle inequality, ||x̂t+1 − zt+1|| ≤ ||x̂t+1 − zt|| + ||zt − zt+1||, and combining the two gives ||xt+1 − zt+1|| ≤ ||x̂t+1 − zt||. [sent-160, score-1.427]
74 We will now use this bound on the change in the potential to bound the regret of xt with respect to zt. [sent-163, score-0.938]
75 We know from Zinkevich that (x̂t+1 − zt)² − (xt − zt)² ≤ −2ηt rt · (zt − xt) + ηt² rt². [sent-164, score-1.658]
76 Therefore,
ρ^{x→z}_t ≤ −(1/(2ηt)) ( (x̂t+1 − zt)² − (xt − zt)² ) + (ηt/2) rt²
          ≤ −ΔΦ¹_{t+1} + (ηt/2) rt²
          ≤ −ΔΦ_{t+1} + ΔΦ⁴_{t+1} + (ηt/2) rt². [sent-165, score-1.602]
77 Since we assume rewards are bounded by rmax, we can bound rt² by |A| rmax². [sent-166, score-0.421]
78 Summing up regret and using the fact that ηt = 1/√t, we get the following bound. [sent-167, score-0.344]
79 R^{x→z}_T ≤ Σ_{t=1}^T ( −ΔΦ_t + ΔΦ⁴_t + (ηt/2) |A| rmax² )
           ≤ (Φ1 − ΦT) + (1/ηT − 1) + (|A| rmax²/2) Σ_{t=1}^T ηt
           ≤ √T + |A| rmax² (√T − 1/2).
We know that GIGA's regret with respect to any strategy is bounded by the same value (see Inequality 2). [sent-168, score-0.475]
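The Theorem 1 bound is easy to evaluate numerically; the sketch below simply plugs numbers into the stated inequality to show the average-regret bound R_T/T shrinking like O(1/√T). The game sizes used here are arbitrary choices for illustration.

    import math

    def giga_wolf_regret_bound(T, num_actions, r_max):
        """Theorem 1 bound: R_T <= 2*sqrt(T) + |A| * r_max**2 * (2*sqrt(T) - 1)."""
        return 2 * math.sqrt(T) + num_actions * r_max ** 2 * (2 * math.sqrt(T) - 1)

    for T in (100, 10_000, 1_000_000):
        print(T, giga_wolf_regret_bound(T, num_actions=3, r_max=1.0) / T)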
80 Theorem 2 In a two-player, two-action repeated game, if one player follows the GIGA-WoLF algorithm and the other follows the GIGA algorithm, then their strategies will converge to a Nash equilibrium. [sent-177, score-0.351]
81 5 Experimental Analysis We have presented here two theoretical properties of GIGA-WoLF relating to guarantees on both regret and convergence. [sent-178, score-0.415]
82 There have also been extensive experimental results performed on GIGA-WoLF in a variety of normal-form games [1]. [sent-179, score-0.19]
83 In that vein, we examined the same suite of normal-form games used in experiments with WoLF-PHC, the practical variant of WoLF-IGA [7]. [sent-182, score-0.224]
84 One of the requirements of GIGA-WoLF (and GIGA) is knowledge of the entire reward vector (rt ), which requires knowledge of the game and observation of the opponent’s action. [sent-183, score-0.248]
85 Instead, only the reward of the selected action is likely to be observable. [sent-185, score-0.217]
86 In particular, after selecting action a and receiving reward ra, we update the current estimate of action a's component of the reward vector, r̂t+1 = r̂t + αt (ra − 1a · r̂t) 1a, where αt is the learning rate. [sent-187, score-0.922]
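A sketch of this estimated-reward update is below; maintaining the estimate this way only requires observing the reward of the action actually played. The particular values of the learning rate and action are assumptions for illustration.

    import numpy as np

    def update_reward_estimate(r_est, action, observed_reward, alpha):
        """r_hat_{t+1} = r_hat_t + alpha_t * (r_a - r_hat_t[a]) * 1_a."""
        r_new = np.array(r_est, dtype=float)
        r_new[action] += alpha * (observed_reward - r_new[action])
        return r_new

    r_est = np.zeros(3)
    r_est = update_reward_estimate(r_est, action=1, observed_reward=0.5, alpha=0.1)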
87 For almost all of the games explored, including two-player, two-action games as well as n-action zero-sum games, GIGA-WoLF strategies converged in self-play to equilibrium strategies of the game. [sent-191, score-0.734]
88 A prototypical example of these results is provided in Figure 1(a) and (b), showing strategy trajectories while learning in Rock-Paper-Scissors. [sent-194, score-0.18]
89 GIGA’s strategies do not converge, while GIGA-WoLF’s strategies do converge. [sent-195, score-0.28]
90 Since both are no-regret learners, average payoffs are guaranteed to go to zero, but the short-term payoffs highly favor GIGA-WoLF. [sent-198, score-0.207]
91 Figure 1: Trajectories of joint strategies in Rock-Paper-Scissors when both players use GIGA (a) or GIGA-WoLF (b). [sent-222, score-0.229]
92 Also shown (c) are the expected and average payoffs of the players when GIGA and GIGA-WoLF play against each other. [sent-223, score-0.288]
93 In other words, the algorithms are outperforming any static strategy to which they could converge. [sent-227, score-0.266]
94 This suggests a new desirable property for future multiagent (or online) learning algorithms, negative non-convergence regret (NNR). [sent-228, score-0.56]
95 An algorithm has NNR if it satisfies the no-regret property and either (i) achieves negative regret or (ii) its strategy converges. [sent-229, score-0.519]
96 This property combines the criteria of regret and convergence, and GIGA-WoLF is a natural candidate for achieving this compelling result. [sent-230, score-0.424]
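NNR is an asymptotic criterion, but on a finished run one can at least check it empirically; the sketch below is a heuristic diagnostic with arbitrary tolerances and window size, not a formal test of the property.

    import numpy as np

    def nnr_check(step_regrets, strategies, tol=1e-3, window=1000):
        """Heuristic NNR check: no-regret, and either negative regret or a (nearly) converged strategy."""
        avg_regret = float(np.mean(step_regrets))
        tail = np.asarray(strategies[-window:])
        converged = float(np.max(np.linalg.norm(tail - tail[-1], axis=1))) < tol
        return avg_regret <= tol and (avg_regret < -tol or converged)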
97 We also proved that in a small class of normal-form games, GIGA-WoLF's strategy when played against GIGA will converge to a Nash equilibrium. [sent-233, score-0.246]
98 We expect GIGAWoLF and these results to be the foundation for further understanding of the connections between the regret and convergence criteria. [sent-236, score-0.469]
99 Markov games as a framework for multi-agent reinforcement learning. [sent-242, score-0.207]
100 The dynamics of reinforcement learning in cooperative multiagent systems. [sent-252, score-0.231]
wordName wordTfidf (topN-words)
[('zt', 0.465), ('giga', 0.384), ('regret', 0.344), ('xt', 0.28), ('rt', 0.224), ('multiagent', 0.176), ('games', 0.171), ('reward', 0.158), ('strategies', 0.14), ('rmax', 0.135), ('strategy', 0.131), ('nash', 0.131), ('convergence', 0.125), ('player', 0.121), ('equilibrium', 0.112), ('payoffs', 0.102), ('gigawolf', 0.091), ('game', 0.09), ('players', 0.089), ('opponent', 0.081), ('learners', 0.07), ('ascent', 0.07), ('static', 0.068), ('converge', 0.067), ('gradient', 0.065), ('agents', 0.065), ('iga', 0.064), ('criteria', 0.059), ('action', 0.059), ('extragradient', 0.055), ('nnr', 0.055), ('payoff', 0.051), ('rock', 0.048), ('bowling', 0.048), ('zinkevich', 0.048), ('actions', 0.044), ('explored', 0.044), ('challenges', 0.042), ('change', 0.041), ('alberta', 0.041), ('exploited', 0.039), ('limt', 0.038), ('theoretical', 0.037), ('amy', 0.037), ('normalform', 0.037), ('rx', 0.036), ('reinforcement', 0.036), ('michael', 0.036), ('average', 0.035), ('guarantees', 0.034), ('pr', 0.033), ('play', 0.033), ('adjustment', 0.032), ('playing', 0.032), ('phc', 0.032), ('greenwald', 0.032), ('bound', 0.032), ('evaluation', 0.031), ('correlated', 0.03), ('trajectories', 0.03), ('rewards', 0.03), ('learner', 0.029), ('expected', 0.029), ('demonstrating', 0.028), ('proved', 0.028), ('fifteenth', 0.027), ('nitesimal', 0.027), ('suite', 0.027), ('win', 0.027), ('refers', 0.026), ('examined', 0.026), ('examine', 0.025), ('updates', 0.025), ('potential', 0.024), ('wolf', 0.024), ('converges', 0.024), ('algorithm', 0.023), ('criterion', 0.022), ('winning', 0.022), ('hart', 0.022), ('outcomes', 0.022), ('prove', 0.022), ('stationary', 0.022), ('limited', 0.022), ('receiving', 0.021), ('proceedings', 0.021), ('singh', 0.021), ('property', 0.021), ('played', 0.02), ('chang', 0.02), ('kearns', 0.02), ('variety', 0.019), ('policies', 0.019), ('guarantee', 0.019), ('online', 0.019), ('environment', 0.019), ('guaranteed', 0.019), ('learning', 0.019), ('step', 0.018), ('key', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 48 nips-2004-Convergence and No-Regret in Multiagent Learning
Author: Michael Bowling
Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). 1
2 0.3469075 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
Author: Nicolò Cesa-bianchi, Claudio Gentile, Luca Zaniboni
Abstract: We provide a worst-case analysis of selective sampling algorithms for learning linear threshold functions. The algorithms considered in this paper are Perceptron-like algorithms, i.e., algorithms which can be efficiently run in any reproducing kernel Hilbert space. Our algorithms exploit a simple margin-based randomized rule to decide whether to query the current label. We obtain selective sampling algorithms achieving on average the same bounds as those proven for their deterministic counterparts, but using much fewer labels. We complement our theoretical findings with an empirical comparison on two text categorization tasks. The outcome of these experiments is largely predicted by our theoretical results: Our selective sampling algorithms tend to perform as good as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels. 1
3 0.26482069 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
Author: Rob Powers, Yoav Shoham
Abstract: We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm’s payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms. 1
4 0.25475791 64 nips-2004-Experts in a Markov Decision Process
Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour
Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1
5 0.20734438 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
Author: Robert D. Kleinberg
Abstract: In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of Rd , and the cost functions are continuous. In the d = 1 case, we improve on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem. 1
6 0.1316376 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
7 0.12434722 148 nips-2004-Probabilistic Computation in Spiking Populations
8 0.11880573 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
9 0.11195219 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
10 0.10933526 185 nips-2004-The Convergence of Contrastive Divergences
11 0.10905001 183 nips-2004-Temporal-Difference Networks
12 0.10580726 33 nips-2004-Brain Inspired Reinforcement Learning
13 0.099827461 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
14 0.099185929 88 nips-2004-Intrinsically Motivated Reinforcement Learning
15 0.094443046 175 nips-2004-Stable adaptive control with online learning
16 0.080958232 190 nips-2004-The Rescorla-Wagner Algorithm and Maximum Likelihood Estimation of Causal Parameters
17 0.080795027 24 nips-2004-Approximately Efficient Online Mechanism Design
18 0.079326376 83 nips-2004-Incremental Learning for Visual Tracking
19 0.075329207 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
20 0.07181748 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
topicId topicWeight
[(0, -0.167), (1, -0.072), (2, 0.442), (3, 0.023), (4, 0.045), (5, -0.096), (6, -0.074), (7, -0.032), (8, 0.009), (9, -0.008), (10, -0.115), (11, 0.081), (12, -0.125), (13, -0.173), (14, -0.054), (15, 0.115), (16, 0.105), (17, 0.231), (18, 0.067), (19, -0.056), (20, -0.095), (21, -0.007), (22, 0.146), (23, -0.131), (24, 0.211), (25, 0.001), (26, -0.172), (27, 0.03), (28, -0.108), (29, -0.017), (30, -0.083), (31, 0.069), (32, -0.009), (33, 0.121), (34, -0.052), (35, 0.025), (36, -0.108), (37, 0.025), (38, -0.069), (39, 0.028), (40, -0.009), (41, 0.041), (42, -0.001), (43, -0.014), (44, -0.057), (45, 0.022), (46, -0.043), (47, -0.015), (48, -0.046), (49, -0.12)]
simIndex simValue paperId paperTitle
same-paper 1 0.98342401 48 nips-2004-Convergence and No-Regret in Multiagent Learning
Author: Michael Bowling
Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). 1
2 0.7275095 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
Author: Nicolò Cesa-bianchi, Claudio Gentile, Luca Zaniboni
Abstract: We provide a worst-case analysis of selective sampling algorithms for learning linear threshold functions. The algorithms considered in this paper are Perceptron-like algorithms, i.e., algorithms which can be efficiently run in any reproducing kernel Hilbert space. Our algorithms exploit a simple margin-based randomized rule to decide whether to query the current label. We obtain selective sampling algorithms achieving on average the same bounds as those proven for their deterministic counterparts, but using much fewer labels. We complement our theoretical findings with an empirical comparison on two text categorization tasks. The outcome of these experiments is largely predicted by our theoretical results: Our selective sampling algorithms tend to perform as good as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels. 1
3 0.70457333 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
Author: Rob Powers, Yoav Shoham
Abstract: We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm’s payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms. 1
4 0.53894693 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
Author: Robert D. Kleinberg
Abstract: In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of Rd , and the cost functions are continuous. In the d = 1 case, we improve on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem. 1
5 0.51205468 64 nips-2004-Experts in a Markov Decision Process
Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour
Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1
6 0.45636758 171 nips-2004-Solitaire: Man Versus Machine
7 0.43896568 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
8 0.42869902 185 nips-2004-The Convergence of Contrastive Divergences
9 0.36959648 190 nips-2004-The Rescorla-Wagner Algorithm and Maximum Likelihood Estimation of Causal Parameters
10 0.36619925 33 nips-2004-Brain Inspired Reinforcement Learning
11 0.36521539 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
12 0.3377167 183 nips-2004-Temporal-Difference Networks
13 0.32775706 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
14 0.30739716 148 nips-2004-Probabilistic Computation in Spiking Populations
15 0.30675566 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
16 0.30160862 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
17 0.28108391 122 nips-2004-Modelling Uncertainty in the Game of Go
18 0.24878156 88 nips-2004-Intrinsically Motivated Reinforcement Learning
19 0.24153855 82 nips-2004-Incremental Algorithms for Hierarchical Classification
20 0.22704409 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models
topicId topicWeight
[(13, 0.049), (15, 0.084), (26, 0.109), (31, 0.023), (33, 0.235), (35, 0.015), (39, 0.012), (50, 0.05), (71, 0.015), (76, 0.01), (77, 0.227), (89, 0.087)]
simIndex simValue paperId paperTitle
1 0.90280259 141 nips-2004-Optimal sub-graphical models
Author: Mukund Narasimhan, Jeff A. Bilmes
Abstract: We investigate the problem of reducing the complexity of a graphical model (G, PG ) by finding a subgraph H of G, chosen from a class of subgraphs H, such that H is optimal with respect to KL-divergence. We do this by first defining a decomposition tree representation for G, which is closely related to the junction-tree representation for G. We then give an algorithm which uses this representation to compute the optimal H ∈ H. Gavril [2] and Tarjan [3] have used graph separation properties to solve several combinatorial optimization problems when the size of the minimal separators in the graph is bounded. We present an extension of this technique which applies to some important choices of H even when the size of the minimal separators of G are arbitrarily large. In particular, this applies to problems such as finding an optimal subgraphical model over a (k − 1)-tree of a graphical model over a k-tree (for arbitrary k) and selecting an optimal subgraphical model with (a constant) d fewer edges with respect to KL-divergence can be solved in time polynomial in |V (G)| using this formulation. 1 Introduction and Preliminaries The complexity of inference in graphical models is typically exponential in some parameter of the graph, such as the size of the largest clique. Therefore, it is often required to find a subgraphical model that has lower complexity (smaller clique size) without introducing a large error in inference results. The KL-divergence between the original probability distribution and the probability distribution on the simplified graphical model is often used to measure the impact on inference. Existing techniques for reducing the complexity of graphical models including annihilation and edge-removal [4] are greedy in nature and cannot make any guarantees regarding the optimality of the solution. This problem is NP-complete [9] and so, in general, one cannot expect a polynomial time algorithm to find the optimal solution. However, we show that when we restrict the problem to some sets of subgraphs, the optimal solution can be found quite quickly using a dynamic programming algorithm in time polynomial in the tree-width of the graph. 1.1 Notation and Terminology A graph G = (V, E) is said to be triangulated if every cycle of length greater than 3 has a chord. A clique of G is a non-empty set S ⊆ V such that {a, b} ∈ E for all ∗ This work was supported by NSF grant IIS-0093430 and an Intel Corporation Grant. {b, c, d} d {c, f, g} {b, c} {b, e, c} b c {f, c} {c, e} {e, c, f } g {b, e} a e f {a, b, e} Figure 1: A triangulated graph G and a junction-tree for G a, b ∈ S. A clique S is maximal if S is not properly contained in another clique. If α and β are non-adjacent vertices of G then a set of vertices S ⊆ V \ {α, β} is called an (α, β)-separator if α and β are in distinct components of G[V \ S]. S is a minimal (α, β)-separator if no proper subset of S is an (α, β)-separator. S is said to be a minimal separator if S is a minimal (α, β)-separator for some non adjacent a, b ∈ V . If T = (K, S) is a junction-tree for G (see [7]), then the nodes K of T correspond to the maximalcliques of G, while the links S correspond to minimal separators of G (We reserve the terms vertices/edges for elements of G, and nodes/links for the elements of T ). If G is triangulated, then the number of maximal cliques is at most |V |. For example, in the graph G shown in Figure 1, K = {{b, c, d} , {a, b, e} , {b, e, c} , {e, c, f } , {c, f, g}}. 
The links S of T correspond to minimal-separators of G in the following way. If Vi Vj ∈ S (where Vi , Vj ∈ K and hence are cliques of G), then Vi ∩ Vj = φ. We label each edge Vi Vj ∈ S with the set Vij = Vi ∩ Vj , which is a non-empty complete separator in G. The removal of any link Vi Vj ∈ S disconnects T into two subtrees which we denote T (i) and T (j) (chosen so that T (i) contains Vi ). We will let K(i) be the nodes of T (i) , and V (i) = ∪V ∈K (i) V be the set of vertices corresponding to the subtree T (i) . The junction tree property ensures that V (i) ∩ V (j) = Vi ∩ Vj = Vij . We will let G(i) be the subgraph induced by V (i) . A graphical model is a pair (G, P ) where P is the joint probability distribution for random variables X1 , X2 , . . . , Xn , and G is a graph with vertex set V (G) = {X1 , X2 , . . . , Xn } such that the separators in G imply conditional independencies in P (so P factors according to G). If G is triangulated, then the junction-tree algorithm can be used for exact inference in the probability distribution P . The complexity of this algorithm grows with the treewidth of G (which is one less than the size of the largest clique in G when G is triangulated). The growth is exponential when P is a discrete probability distribution, thus rendering exact inference for graphs with large treewidth impractical. Therefore, we seek another graphical model (H, PH ) which allows tractable inference (so H should have lower treewidth than G has). The general problem of finding a graphical model of tree-width at most k so as to minimize the KL-divergence from a specified probability distribution is NP complete for general k ([9]) However, it is known that this problem is solvable in polynomial time (in |V (G)|) for some special cases cases (such as when G has bounded treewidth or when k = 1 [1]). If (G, PG ) and (H, PH ) are graphical models, then we say that (H, PH ) is a subgraphical model of (G, PG ) if H is a spanning subgraph of G. Note in particular that separators in G are separators in H, and hence (G, PH ) is also a graphical model. 2 Graph Decompositions and Divide-and-Conquer Algorithms For the remainder of the paper, we will be assuming that G = (V, E) is some triangulated graph, with junction tree T = (K, S). As observed above, if Vi Vj ∈ S, then the removal {b, c, d} d {b, c} {b, e, c} b c c {c, f, g} {f, c} {e, c, f } g {b, e} a e e f {a, b, e} Figure 2: The graphs G(i) , G(j) and junction-trees T (i) and T (j) resulting from the removal of the link Vij = {c, e} of Vij = Vi ∩ Vj disconnects G into two (vertex-induced) subgraphs G(i) and G(j) which are both triangulated, with junction-trees T (i) and T (j) respectively. We can recursively decompose each of G(i) and G(j) into smaller and smaller subgraphs till the resulting subgraphs are cliques. When the size of all the minimal separators are bounded, we may use these decompositions to easily solve problems that are hard in general. For example, in [5] it is shown that NP-complete problems like vertex coloring, and finding maximum independent sets can be solved in polynomial time on graphs with bounded tree-width (which are equivalent to spanning graphs with bounded size separators). 
We will be interested in finding (triangulated) subgraphs of G that satisfy some conditions, such as a bound on the number of edges, or a bound on the tree-width and which optimize separable objective functions (described in Section 2) One reason why problems such as this can often be solved easily when the tree-width of G is bounded by some constant is this : If Vij is a separator decomposing G into G(i) and G(j) , then a divide-and-conquer approach would suggest that we try and find optimal subgraphs of G(i) and G(j) and then splice the two together to get an optimal subgraph of G. There are two issues with this approach. First, the optimal subgraphs of G (i) and G(j) need not necessarily match up on Vij , the set of common vertices. Second, even if the two subgraphs agree on the set of common vertices, the graph resulting from splicing the two subgraphs together need not be triangulated (which could happen even if the two subgraphs individually are triangulated). To rectify the situation, we can do the following. We partition the set of subgraphs of G(i) and G(j) into classes, so that any subgraph of G(i) and any subgraph G(j) corresponding to the same class are compatible in the sense that they match up on their intersection namely Vij , and so that by splicing the two subgraphs together, we get a subgraph of G which is acceptable (and in particular is triangulated). Then given optimal subgraphs of both G(i) and G(j) corresponding to each class, we can enumerate over all the classes and pick the best one. Of course, to ensure that we do not repeatedly solve the same problem, we need to work bottom-up (a.k.a dynamic programming) or memoize our solutions. This procedure can be carried out in polynomial (in |V |) time as long as we have only a polynomial number of classes. Now, if we have a polynomial number of classes, these classes need not actually be a partition of all the acceptable subgraphs, though the union of the classes must cover all acceptable subgraphs (so the same subgraph can be contained in more than one class). For our application, every class can be thought of to be the set of subgraphs that satisfy some constraint, and we need to pick a polynomial number of constraints that cover all possibilities. The bound on the tree-width helps us here. If k |Vij | = k, then in any subgraph H of G, H[Vij ] must be one of the 2(2) possible subgraphs k of G[Vij ]. So, if k is sufficiently small (so 2(2) is bounded by some polynomial in |V |), then this procedure results in a polynomial time algorithm. In this paper, we show that in some cases we can characterize the space H so that we still have a polynomial number of constraints even when the tree-width of G is not bounded by a small constant. 2.1 Separable objective functions For cases where exact inference in the graphical model (G, PG ) is intractable, it is natural to try to find a subgraphical model (H, PH ) such that D(PG PH ) is minimized, and inference using H is tractable. We will denote by H the set of subgraphs of G that are tractable for inference. For example, this set could be the set of subgraphs of G with treewidth one less than the treewidth of G, or perhaps the set of subgraphs of G with at d fewer edges. For a specified subgraph H of G, there is a unique probability distribution PH factoring over H that minimizes D(PG PH ). Hence, finding a optimal subgraphical model is equivalent to finding a subgraph H for which D(PG PH ) is minimized. 
If Vij is a separator of G, we will attempt to find optimal subgraphs of G by finding optimal subgraphs of G (i) and G(j) and splicing them together. However, to do this, we need to ensure that the objective criteria also decomposes along the separator Vij . Suppose that H is any triangulated subgraph of G. Let PG(i) and PG(j) be the (marginalized) distributions of PG on V (i) and V (j) respectively, and PH (i) and PH (j) be the (marginalized) distributions of the distribution PH on V (i) and V (j) where H (i) = H[V (i) ] and H (j) = H[V (j) ], The following result assures us that the KL-divergence also factors according to the separator Vij . Lemma 1. Suppose that (G, PG ) is a graphical model, H is a triangulated subgraph of G, and PH factors over H. Then D(PG PH ) = D(PG(i) PH (i) ) + D(PG(j) PH (j) ) − D(PG[Vij ] PH[Vij ] ). Proof. Since H is a subgraph of G, and Vij is a separator of G, Vij must also be a separator of H. Therefore, PH {Xv }v∈V = PH (i) ({Xv }v∈V (i) )·PH (j) ({Xv }v∈V (j) ) . PH[Vij ] ({Xv }v∈V ) The result ij follows immediately. Therefore, there is hope that we can reduce our our original problem of finding an optimal subgraph H ∈ H as one of finding subgraphs of H (i) ⊆ G(i) and H (j) ⊆ G(j) that are compatible, in the sense that they match up on the overlap Vij , and for which D(PG PH ) is minimized. Throughout this paper, for the sake of concreteness, we will assume that the objective criterion is to minimize the KL-divergence. However, all the results can be extended to other objective functions, as long as they “separate” in the sense that for any separator, the objective function is the sum of the objective functions of the two parts, possibly modulo some correction factor which is purely a function of the separator. Another example might be the complexity r(H) of representing the graphical model H. A very natural representation satisfies r(G) = r(G(i) ) + r(G(j) ) if G has a separator G(i) ∩ G(j) . Therefore, the representation cost reduction would satisfy r(G) − r(H) = (r(G (i) ) − r(H (i) )) + (r(G(j) ) − r(H (j) )), and so also factors according to the separators. Finally note that any linear combinations of such separable functions is also separable, and so this technique could also be used to determine tradeoffs (representation cost vs. KL-divergence loss for example). In Section 4 we discuss some issues regarding computing this function. 2.2 Decompositions and decomposition trees For the algorithms considered in this paper, we will be mostly interested in the decompositions that are specified by the junction tree, and we will represent these decompositions by a rooted tree called a decomposition tree. This representation was introduced in [2, 3], and is similar in spirit to Darwiche’s dtrees [6] which specify decompositions of directed acyclic graphs. In this section and the next, we show how a decomposition tree for a graph may be constructed, and show how it is used to solve a number of optimization problems. abd; ce; gf a; be; cd d; bc; e abe dbc ebc e; cf ; g cef cf g Figure 3: The separator tree corresponding to Figure 1 A decomposition tree for G is a rooted tree whose vertices correspond to separators and cliques of G. We describe the construction of the decomposition tree in terms of a junctiontree T = (K, S) for G. The interior nodes of the decomposition tree R(T ) correspond to S (the links of T and hence the minimal separators of G). The leaf or terminal nodes represent the elements of K (the nodes of T and hence the maximal cliques of G). 
R(T ) can be recursively constructed from T as follows : If T consists of just one node K, (and hence no edges), then R consists of just one node, which is given the label K as well. If however, T has more than one node, then T must contain at least one link. To begin, let Vi Vj ∈ S be any link in T . Then removal of the link Vi Vj results in two disjoint junctiontrees T (i) and T (j) . We label the root of R by the decomposition (V (i) ; Vij ; V (j) ). The rest of R is recursively built by successively picking links of T (i) and T (j) (decompositions of G(i) and G(j) ) to form the interior nodes of R. The effect of this procedure on the junction tree of Figure 1 is shown in Figure 3, where the decomposition associated with the interior nodes is shown inside the nodes. Let M be the set of all nodes of R(T ). For any interior node M induced by the the link Vi Vj ∈ S of T , then we will let M (i) and M (j) represent the left and right children of M , and R(i) and R(j) be the left and right trees below M . 3 3.1 Finding optimal subgraphical models Optimal sub (k − 1)-trees of k-trees Suppose that G is a k-tree. A sub (k − 1)-tree of G is a subgraph H of G that is (k − 1)tree. Now, if Vij is any minimal separator of G, then both G(i) and G(j) are k-trees on vertex sets V (i) and V (j) respectively. It is clear that the induced subgraphs H[V (i) ] and H[V (j) ] are subgraphs of G(i) and G(j) and are partial (k − 1)-trees. We will be interested in finding sub (k − 1)-trees of k trees and this problem is trivial by the result of [1] when k = 2. Therefore, we assume that k ≥ 3. The following result characterizes the various possibilities for H[Vij ] in this case. Lemma 2. Suppose that G is a k-tree, and S = Vij is a minimal separator of G corresponding to the link ij of the junction-tree T . In any (k − 1)-tree H ⊆ G either 1. There is a u ∈ S such that u is not connected to vertices in both V (i) \ S and V (j) \ S. Then S \ {u} is a minimal separator in H and hence is complete. 2. Every vertex in S is connected to vertices in both V (i) \S and V (j) \S. Then there are vertices {x, y} ⊆ S such that the edge H[S] is missing only the edge {x, y}. Further either H[V (i) ] or H[V (j) ] does not contain a unchorded x-y path. Proof. We consider two possibilities. In the first, there is some vertex u ∈ S such that u is not connected to vertices in both V (i) \S and V (j) \. Since the removal of S disconnects G, the removal of S must also disconnect H. Therefore, S must contain a minimal separator of H. Since H is a (k − 1)-tree, all minimal separators of H must contain k − 1 vertices which must therefore be S \{u}. This corresponds to case (1) above. Clearly this possiblity can occur. If there is no such u ∈ S, then every vertex in S is connected to vertices in both V (i) \ S and V (j) \ S. If x ∈ S is connected to some yi ∈ V (i) \ S and yj ∈ V (j) \ S, then x is contained in every minimal yi /yj separator (see [5]). Therefore, every vertex in S is part of a minimal separator. Since each minimal separator contains k − 1 vertices, there must be at least two distinct minimum separators contained in S. Let Sx = S \ {x} and Sy = S \ {y} be two distinct minimal separators. We claim that H[S] contains all edges except the edge {x, y}. To see this, note that if z, w ∈ S, with z = w and {z, w} = {x, y} (as sets), then either {z, w} ⊂ Sy or {z, w} ⊂ Sx . Since both Sx and Sy are complete in H, this edge must be present in H. 
The edge {x, y} is not present in H[S] because all minimal separators in H must be of size k − 1. Further, if both V (i) and V (j) contain an unchorded path between x and y, then by joining the two paths at x and y, we get a unchorded cycle in H which contradicts the fact that H is triangulated. Therefore, we may associate k · 2 + 2 · k constraints with each separator Vij of G as 2 follows. There are k possible constraints corresponding to case (1) above (one for each choice of x), and k · 2 choices corresponding to case (2) above. This is because for each 2 pair {x, y} corresponding to the missing edge, we have either V (i) contains no unchorded xy paths or V (j) contains no unchorded xy paths. More explicitly, we can encode the set of constraints CM associated with each separator S corresponding to an interior node M of the decomposition tree as follows: CM = { (x, y, s) : x ∈ S, y ∈ S, s ∈ {i, j}}. If y = x, then this corresponds to case (1) of the above lemma. If s = i, then x is connected only to H (i) and if s = j, then x is connected only to H (j) . If y = x, then this corresponds to case (2) in the above lemma. If s = i, then H (i) does not contain any unchorded path between x and y, and there is no constraint on H (j) . Similarly if s = j, then H (j) does not contain any unchorded path between x and y, and there is no constraint on H (i) . Now suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively, then it is clear that if H (i) and H (j) both satisfy the same constraint they must match up on the common vertices Vij . Therefore to splice together two solutions corresponding to the same constraint, we only need to check that the graph obtained by splicing the graphs is triangulated. Lemma 3. Suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively such that both of them satisfy the same constraint as described above. Then the graph H obtained by splicing H (i) and H (j) together is triangulated. Proof. Suppose that both H (i) and H (j) are both triangulated and both satisfy the same constraint. If both H (i) and H (j) satisfy the same constraint corresponding to case (1) in Lemma 2 and H has an unchorded cycle, then this cycle must involve elements of both H (i) and H (j) . Therefore, there must be two vertices of S \{u} on the cycle, and hence this cycle has a chord as S \ {u} is complete. This contradiction shows that H is triangulated. So assume that both of them satisfy the constraint corresponding to case (2) of Lemma 2. Then if H is not triangulated, there must be a t-cycle (for t ≥ 4) with no chord. Now, since {x, y} is the only missing edge of S in H, and because H (i) and H (j) are individually triangulated, the cycle must contain x, y and vertices of both V (i) \ S and V (j) \ S. We may split this unchorded cycle into two unchorded paths, one contained in V (i) and one in V (j) thus violating our assumption that both H (i) and H (j) satisfy the same constraint. If |S| = k, then there are 2k + 2 · k ∈ O(k 2 ) ∈ O(n2 ). We can use a divide and conquer 2 strategy to find the optimal sub (k − 1) tree once we have taken care of the base case, where G is just a single clique (of k + 1) elements. However, for this case, it is easily checked that any subgraph of G obtained by deleting exactly one edge results in a (k − 1) tree, and every sub (k−1)-tree results from this operation. Therefore, the optimal (k−1)-tree can be found using Algorithm 1, and in this case, the complexity of Algorithm 1 is O(n(k + 1) 2 ). 
This procedure can be generalized to find the optimal sub (k − d)- tree for any fixed d. However, the number of constraints grows exponentially with d (though it is still polynomial in n). Therefore for small, fixed values of d, we can compute the optimal sub (k − d)-tree of G. While we can compute (k − d)-trees of G by first going from a k tree to a (k − 1) tree, then from a (k − 1)-tree to a (k − 2)-tree, and so on in a greedy fashion, this will not be optimal in general. However, this might be a good algorithm to try when d is large. 3.2 Optimal triangulated subgraphs with |E(G)| − d edges Suppose that we are interested in a (triangulated) subgraph of G that contains d fewer edges that G does. That is, we want to find an optimal subgraph H ⊂ G such that |E(H)| = |E(G)| − d. Note that by the result of [4] there is always a triangulated subgraph with d fewer edges (if d < |E(G)|). Two possibilities for finding such an optimal subgraph are 1. Use the procedure described in [4]. This is a greedy procedure which works in d steps by deleting an edge at each step. At each state, the edge is picked from the set of edges whose deletion leaves a triangulated graph. Then the edge which causes the least increase in KL-divergence is picked at each stage. 2. For each possible subset A of E(G) of size d, whose deletion leaves a triangulated graph, compute the KL divergence using the formula above, and then pick the optimal one. Since there are |E(G)| such sets, this can be done in polynomial d time (in |V (G)|) when d is a constant. The first greedy algorithm is not guaranteed to yield the optimal solution. The second takes time that is O(n2d ). Now, let us solve this problem using the framework we’ve described. Let H be the set of subgraphs of G which may be obtained by deletion of d edges. For each M = ij ∈ M corresponding to the separator Vij , let CM = (l, r, c, s, A) : l + r − c = d, s a d bit string, A ∈ E(G[Vij ]) . The constraint reprec sented by (l, r, c, A) is this : A is a set of d edges of G[Vij ] that are missing in H, l edges are missing from the left subgraph, and r edges are missing from the right subgraph. c represents the double count, and so is subtracted from the total. If k is the size of the largest k clique, then the total number of such constraints is bounded by 2d · 2d · (2) ∈ O(k 2d ) d which could be better than O(n2d ) and is polynomial in |V | when d is constant. See [10] for additional details. 4 Conclusions Algorithm 1 will compute the optimal H ∈ H for the two examples discussed above and is polynomial (for fixed constant d) even if k is O(n). In [10] a generalization is presented which will allow finding the optimal solution for other classes of subgraphical models. Now, we assume an oracle model for computing KL-divergences of probability distributions on vertex sets of cliques. 
It is clear that these KL-divergences can be computed R ← separator-tree for G; for each vertex M of R in order of increasing height (bottom up) do for each constraint cM of M do if M is an interior vertex of R corresponding to edge ij of the junction tree then Let Ml and Mr be the left and right children of M ; Pick constraint cl ∈ CMl compatible with cM to minimize table[Ml , cl ]; Pick constraint cr ∈ CMr compatible with cM to minimize table[Mr , cr ]; loss ← D(PG [M ] PH [M ]); table[M, cM ] ← table[Ml , cl ] + table[Mr , cr ] − loss; else table[M, cM ] ← D(PG [M ] PH [M ]); end end end Algorithm 1: Finding optimal set of constraints efficiently for distributions like Gaussians, but for discrete distributions this may not be possible when k is large. However even in this case this algorithm will result in only polynomial calls to the oracle. The standard algorithm [3] which is exponential in the treewidth will make O(2k ) calls to this oracle. Therefore, when the cost of computing the KL-divergence is large, this algorithm becomes even more attractive as it results in exponential speedup over the standard algorithm. Alternatively, if we can compute approximate KL-divergences, or approximately optimal solutions, then we can compute an approximate solution by using the same algorithm. References [1] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees”, IEEE Transactions on Information Theory, v. 14, 1968, Pages 462–467. [2] F. Gavril, “Algorithms on clique separable graphs”, Discrete Mathematics v. 9 (1977), pp. 159–165. [3] R. E. Tarjan. “Decomposition by Clique Separators”, Discrete Mathematics, v. 55 (1985), pp. 221–232. [4] U. Kjaerulff. “Reduction of computational complexity in Bayesian networks through removal of weak dependencies”, Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 374–382, 1994. [5] T. Kloks, “Treewidth: Computations and Approximations”, Springer-Verlag, 1994. [6] A. Darwiche and M. Hopkins. “Using recursive decomposition to construct elimination orders, jointrees and dtrees”, Technical Report D-122, Computer Science Dept., UCLA. [7] S. Lauritzen. “Graphical Models”, Oxford University Press, Oxford, 1996. [8] T. A. McKee and F. R. McMorris. “Topics in Intersection Graph Theory”, SIAM Monographs on Discrete Mathematics and Applications, 1999. [9] D. Karger and N. Srebro. “Learning Markov networks: Maximum bounded tree-width graphs.” In Symposium on Discrete Algorithms, 2001, Pages 391-401. [10] M. Narasimhan and J. Bilmes. “Optimization on separator-clique trees.”, Technical report UWEETR 2004-10, June 2004.
same-paper 2 0.87752908 48 nips-2004-Convergence and No-Regret in Multiagent Learning
Author: Michael Bowling
Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). 1
3 0.85047662 38 nips-2004-Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms
Author: Omid Madani, David M. Pennock, Gary W. Flake
Abstract: In the context of binary classification, we define disagreement as a measure of how often two independently-trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the “variance of prediction error”, or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross validation overestimates error. 1
4 0.8186388 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
Author: Scott J. Gaffney, Padhraic Smyth
Abstract: Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional feature-vector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.
5 0.76762974 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
Author: Rob Powers, Yoav Shoham
Abstract: We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm’s payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms. 1
6 0.73424423 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
7 0.72987264 72 nips-2004-Generalization Error and Algorithmic Convergence of Median Boosting
8 0.72927737 69 nips-2004-Fast Rates to Bayes for Kernel Machines
9 0.72725201 45 nips-2004-Confidence Intervals for the Area Under the ROC Curve
10 0.72166675 161 nips-2004-Self-Tuning Spectral Clustering
11 0.7194469 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
12 0.71765357 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
13 0.7158646 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
14 0.71547776 103 nips-2004-Limits of Spectral Clustering
15 0.71279854 143 nips-2004-PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data
16 0.71271133 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
17 0.71253997 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
18 0.71134377 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
19 0.71054196 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
20 0.70989525 44 nips-2004-Conditional Random Fields for Object Recognition