nips nips2004 nips2004-123 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: K. Wong, S. W. Lim, Z. Gao
Abstract: We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. [sent-8, score-0.789]
2 The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. [sent-9, score-0.87]
3 Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. [sent-10, score-0.579]
4 We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. [sent-11, score-0.521]
5 While a standard analytical approach is to study their steady state behavior described by the Nash equilibria [2], it is interesting to consider the dynamics of how the steady state is approached. [sent-15, score-0.442]
6 In such cases, the cooperation of agents becomes very important. [sent-17, score-0.777]
7 Specifically, we consider the dynamics of a version of large population games which models the collective behavior of agents simultaneously and adaptively competing for limited resources. [sent-18, score-0.866]
8 The game is a variant of the Minority Game, in which the agents strive to make the minority decision, thereby balancing the load distributed between the majority and minority choices [3]. [sent-19, score-1.088]
9 When the policy dimension is too low, many agents share identical policies, and the system suffers from the maladaptive behavior of the agents, meaning that they prematurely rush to adapt to system changes in bursts [4]. [sent-21, score-0.875]
10 This is done by randomly assigning biases to the initial preference of policies of the agents, so that agents sharing common policies may not adopt them at the same time, and maladaptation is reduced. [sent-23, score-1.114]
11 As a result, the population difference between the majority and minority groups decreases. [sent-24, score-0.298]
12 In contrast to the maladaptive regime, in which agents blindly respond to environmental signals, agent cooperation becomes increasingly important in the diverse regime. [sent-26, score-1.103]
13 Namely, there are fewer agents adjusting their policy preferences at each step of the steady state, but there emerges a more coordinated pattern of policy adjustment among them. [sent-27, score-0.976]
14 In this paper, we explain the cooperative mechanisms which appear successively when the diversity of the agents’ preference of policies increases, as recently proposed in [6]. [sent-29, score-0.521]
15 While we focus on the population dynamics of the Minority Game, we expect that the observed cooperative mechanisms are relevant to reinforcement learning in multi-agent systems more generally. [sent-31, score-0.322]
16 2 The Minority Game The Minority Game consists of a population of N agents competing selfishly to maximize their individual utility in an environment of limited resources, N being odd [3]. [sent-32, score-0.653]
17 Each agent makes a decision + or − at each time step, and the minority group wins. [sent-33, score-0.412]
18 For typical control tasks such as resource allocation, the decisions + and − may represent two alternative resources, so that fewer agents utilizing a resource implies more abundance. [sent-34, score-0.666]
19 The decisions of each agent are prescribed by policies, which are binary functions mapping the history of the winning bits of the game in the most recent m steps to decisions + or −. [sent-35, score-0.491]
20 Before the game starts, each agent randomly picks s policies out of the set of 2^D policies with replacement, where D ≡ 2^m is the number of input states. [sent-37, score-0.695]
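As a quick illustration of this counting (simple arithmetic implied by the definitions above, not additional material from the paper), with memory m there are D = 2^m input states and 2^D distinct policies:

    # Policy-space size for small memories: D = 2**m input states, 2**D possible policies.
    for m in (1, 2, 3):
        D = 2 ** m
        print(f"m = {m}: D = {D} input states, {2 ** D} possible policies")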
21 The long-term goal of an agent is to maximize her cumulative payoff, which is the sum of the undiscounted payoffs received during the game history. [sent-38, score-0.534]
22 For the decision ξ_i(t) of agent i at time t (ξ_i(t) = ±1), the payoff is −ξ_i(t)G(A(t)), where A(t) ≡ Σ_i ξ_i(t)/N, and G(A) satisfies the property sign G(A) = sign A. [sent-39, score-0.301]
23 The success of a policy is measured by its cumulative payoff, updated every step irrespective of whether it is adopted or not. [sent-41, score-0.324]
24 Though we only consider random policies instead of organized ones, we expect that the model is sufficient to capture the collective behavior of large population games. [sent-43, score-0.389]
25 Note that an agent gains in payoff when she makes a decision opposite to A(t), and loses otherwise, reflecting the winning of the minority group. [sent-46, score-0.537]
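To make the setup above concrete, the following is a minimal simulation sketch, not the authors' code; it assumes the simplest payoff G(A) = sign A, so each policy's cumulative payoff moves by ±1 per step, and uses illustrative parameter values.

    import numpy as np

    # Minimal Minority Game sketch (assumptions: payoff update of -xi * sign(A),
    # s policies per agent drawn with replacement, N odd).
    rng = np.random.default_rng(0)
    N, m, s, T = 1023, 1, 2, 5000        # agents, memory, policies per agent, time steps
    D = 2 ** m                            # number of input states (histories)

    # A policy maps each of the D histories to a decision +1 or -1.
    policies = rng.choice([-1, 1], size=(N, s, D))
    payoffs = np.zeros((N, s))            # cumulative payoffs of the policies
    history = 0                           # current history, encoded as an integer in [0, D)

    A_trace = []
    for t in range(T):
        best = payoffs.argmax(axis=1)                       # each agent adopts her best policy
        decisions = policies[np.arange(N), best, history]   # her decision at the current history
        A = decisions.sum() / N                             # normalized population decision
        A_trace.append(A)

        # Minority wins: every policy is scored, adopted or not.
        payoffs -= policies[:, :, history] * np.sign(A)

        # The winning (minority) bit extends the history window.
        winning_bit = int(A < 0)          # 1 means decision +1 was in the minority
        history = ((history * 2) + winning_bit) % D

    print("variance of N*A per agent:", N * np.var(A_trace))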
26 It is natural to consider systems with diverse preferences of policies [5]. [sent-47, score-0.379]
27 This means that the initial cumulative payoffs of policies α (α = 1, ..., s − 1) of agent i with respect to her s-th policy have random biases ω_iα. [sent-48, score-0.456] [sent-51, score-0.333]
29 Diversity is important in reducing the maladaptive behavior of the agents, since otherwise the same policy held by all agents accumulates the same payoffs and would be adopted at the same time. [sent-52, score-0.776]
30 For odd R, no two policies have the same cumulative payoffs throughout the process, and the dynamics is deterministic, resulting in highly precise simulation results useful for refined comparison with theories. [sent-56, score-0.57]
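A sketch of one way such biases could be drawn, under the assumption (made here only for illustration) that each bias is a sum of R random ±1 steps, so that for odd R all preferences stay odd, ties never occur, and the dynamics is deterministic:

    import numpy as np

    # Diversity sketch (assumption about the sampling scheme): bias the first s-1
    # policies of each agent relative to her s-th policy by a sum of R coin flips.
    rng = np.random.default_rng(1)
    N, s = 1023, 2
    R = 3 * N                                  # diversity rho = R / N = 3, with R odd
    omega = rng.choice([-1, 1], size=(N, s - 1, R)).sum(axis=2)

    payoffs = np.zeros((N, s))
    payoffs[:, : s - 1] += omega               # initial cumulative payoffs now carry omega_{i,alpha}
    # The game loop from the previous sketch can then be run from these biased payoffs.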
31 The population averages of the decisions oscillate around 0 at the steady state. [sent-57, score-0.282]
32 Several modes of agent cooperation can be identified and explained in the following sections. [sent-66, score-0.404]
33 To interpret this result, we describe the macroscopic dynamics of the system by defining the D-dimensional vector A^µ(t), which is the sum of the decisions of all agents responding to history µ of their policies, normalized by N. [sent-69, score-0.734]
34 While only one of the D components corresponds to the historical state µ∗ (t) of the system, the augmentation to D components is necessary to describe the attractor structure and the transient behavior of the system dynamics. [sent-70, score-0.512]
35 The key to analysing the system dynamics is the observation that the cumulative payoffs of all policies displace by exactly the same amount when the game proceeds. [sent-71, score-0.702]
36 Hence for a given pair of policies, the profile of the relative cumulative payoff distribution remains unchanged, but the peak position shifts with the game dynamics. [sent-72, score-0.357]
37 We let Sαβ (ω) be the number of agents holding policies α and β (with α < β), and the bias of α with respect to β is ω. [sent-74, score-0.801]
38 If the cumulative payoff of policy α at time t is Ωα (t), then the agents holding policies α and β make decisions according to policy α if ω + Ωα (t) − Ωβ (t) > 0, and policy β otherwise. [sent-75, score-1.438]
39 At time t, the cumulative payoff of policy α changes from Ω_α(t) to Ω_α(t) − ξ_α^µ sign A^µ(t), where ξ_α^µ is the decision of policy α at state µ. [sent-77, score-0.564]
40 Only the fickle agents, that is, those agents with preferences on the verge of switching signs, contribute to the change in A^µ(t), namely those with ω + Ω_α(t) − Ω_β(t) = ±1 and ξ_α^µ − ξ_β^µ = ±2 sign A^µ(t). [sent-78, score-0.699]
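Continuing in the same illustrative spirit (for s = 2, with the preference stored directly in the payoff table as in the sketches above), the fickle agents at a given state could be picked out as follows; the direction condition relative to sign A^µ is omitted for brevity:

    import numpy as np

    # Fickle-agent sketch (assumptions: s = 2, integer +/-1 payoff updates, and
    # preference = omega + Omega_alpha - Omega_beta taken from the payoff table).
    rng = np.random.default_rng(2)
    N, D, R = 1023, 2, 3 * 1023
    policies = rng.choice([-1, 1], size=(N, 2, D))
    payoffs = np.zeros((N, 2))
    payoffs[:, 0] += rng.choice([-1, 1], size=(N, R)).sum(axis=1)   # random odd biases

    history = 0
    preference = payoffs[:, 0] - payoffs[:, 1]            # omega + Omega_alpha - Omega_beta
    disagree = policies[:, 0, history] != policies[:, 1, history]
    fickle = (np.abs(preference) == 1) & disagree         # on the verge of switching
    # (the sign condition relative to sign A^mu is left out in this sketch)
    print("fickle agents at this state:", int(fickle.sum()))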
41 [Figure 2 residue: panel (a) arrow probabilities (e.g. 3/8) between the states P, Q, R; panel (b) axes "preference" versus "average number of agents" over roughly −3001 to 3001.] [sent-81, score-0.561]
42 Figure 2: (a) The attractor in the Minority Game with m = 1, following the period-4 sequence P-Q-R-Q in the phase space of A^+ and A^−. [sent-84, score-0.426]
43 There are 4 approaches to the attractor indicated by the arrows, and the respective probabilities are obtained by considering the detailed dynamics from the different initial positions and states. [sent-85, score-0.42]
44 (b) Experimental evidence of the kinetic sampling effect: steady-state preference dependence of the average number of agents holding the identity policy and its complement, immediately before state Q enters state R, at ρ = N = 1, 023 and averaged over 100,000 samples of initial conditions. [sent-86, score-1.19]
45 Equation (3) shows that the dynamics proceeds in the direction which reduces the magnitude of the population vector, each time by a step of size √(2/πR). [sent-90, score-0.238]
46 The fraction of fickle agents at every time step consists of those who have ±1 preferences, which scales as the height of the bias distribution near its center. [sent-97, score-0.615]
47 The scaling relation shows that agent cooperation in this regime is described at the level of statistical distributions of policy preferences, since the number of agents making an adaptive move at each step is sufficiently numerous (∼ √N). [sent-99, score-1.187]
48 Figure 2(b) shows how the average number of agents, who hold the identity policy with ξ_α^µ = µ and its complementary policy with ξ_β^µ = −µ, depends on the preference ω + Ω_α − Ω_β when the system reaches the steady state in games with m = 1. [sent-103, score-0.655]
49 This value of the preference corresponds to that of the attractor step from Q to R when at state −, decision + loses and decision − wins, and ω + Ωα − Ωβ changes from −1 to +1. [sent-108, score-0.652]
50 The peak at the attractor step shows that its average size is self-organized to be larger than those of the transient steps described by the background distribution. [sent-109, score-0.516]
51 This effect, which favors the cooperation of larger clusters of agents, is referred to as the kinetic sampling effect. [sent-110, score-0.992]
52 Equation (2) shows that it is equal to 2/N times the number of fickle agents at time t, which is Poisson distributed with a mean of N/√(2πR) = ∆/2, where ∆ ≡ N√(2/πR) is the average step size. [sent-113, score-0.615]
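A quick numerical check of this estimate (a sketch under the coin-flip bias assumption used above, not a result from the paper): with biases that are sums of R ±1 steps, the density of preferences near zero is about 2/√(2πR) per allowed odd value, which, combined with the fraction of policy pairs that disagree at the current state, gives a mean number of fickle agents of order ∆/2 = N/√(2πR).

    import numpy as np

    # Check the height of the bias distribution near its center (illustrative only).
    rng = np.random.default_rng(3)
    N, R = 1023, 3 * 1023
    omega = rng.choice([-1, 1], size=(N, R)).sum(axis=1)   # one bias per agent

    count_at_one = int(np.sum(omega == 1))                 # agents sitting at preference +1
    prediction = 2 * N / np.sqrt(2 * np.pi * R)            # Gaussian height times spacing 2
    print("agents at preference +1:", count_at_one)
    print("predicted 2N/sqrt(2*pi*R):", round(prediction, 1))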
53 However, since the attractor is formed by steps which reverse the sign of Aµ , the average step size in the attractor is larger than that in the transient state, because a long jump in the vicinity of the attractor is more likely to get trapped. [sent-114, score-1.096]
54 To describe this effect, we consider the probability Patt (∆A) of step sizes ∆A in the attractor (with ∆Aµ > 0 for all µ). [sent-115, score-0.387]
55 Thus, the attractor averages ⟨(∆A^±)²⟩_att are given by ⟨(∆A^±)²⟩_att = ⟨(∆A^±)² ∆A^+ ∆A^−⟩_Poi / ⟨∆A^+ ∆A^−⟩_Poi. (4) [sent-120, score-0.515]
56 There are agents who contribute to both ∆A^+ and ∆A^−, giving rise to their correlations. [sent-121, score-0.585]
57 From Eq. (2), the strategies of the agents contributing to ∆A^+ and ∆A^− satisfy ξ_α^+ − ξ_β^+ = −2r and ξ_α^− − ξ_β^− = 2r respectively. [sent-123, score-0.58]
58 Among the agents contributing to ∆A^+, the extra requirement of ξ_α^− − ξ_β^− = 2r implies that an average of 1/4 of them also contribute to ∆A^−. [sent-124, score-0.656]
59 Hence, the number of agents contributing to both steps is a Poisson variable with mean ∆/8, and those exclusive to the individual steps are Poisson variables with mean 3∆/8. [sent-125, score-0.696]
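The effect of this weighting can be illustrated numerically (a sketch, not the paper's calculation): draw the common and exclusive contributions as Poisson variables with means ∆/8 and 3∆/8, and compare the plain average of the squared step size with the average weighted by ∆A^+∆A^−, which mimics the attractor average of Eq. (4).

    import numpy as np

    # Kinetic-sampling sketch: attractor averages weight each configuration by
    # dA_plus * dA_minus, so larger steps are favored (illustrative numbers).
    rng = np.random.default_rng(4)
    delta = 4.0                                   # average step size Delta (in agent counts)
    n_samples = 500_000

    common = rng.poisson(delta / 8, n_samples)    # fickle agents shared by both steps
    only_p = rng.poisson(3 * delta / 8, n_samples)
    only_m = rng.poisson(3 * delta / 8, n_samples)
    dA_p = common + only_p                        # step sizes in units of 2/N
    dA_m = common + only_m

    sq = (dA_p ** 2 + dA_m ** 2) / 2
    weight = dA_p * dA_m                          # kinetic-sampling weight
    print("plain Poisson average of squared step:", round(sq.mean(), 3))
    print("weighted (attractor) average:", round(float(np.sum(weight * sq) / np.sum(weight)), 3))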
60 The attractor states are given by A^µ = m^µ/N and m^µ/N − ∆A^µ, where m^µ = 1, 3, ... [sent-132, score-0.303]
61 In Fig. 2(a), the attractor consists of both hops along the A^± axes. [sent-146, score-0.348]
62 Analysis shows that only those agents holding the identity policy and its complement can complete both hops after they have adjusted their preferences to ω + Ωα − Ωβ = ±1. [sent-147, score-0.867]
63 Since there are fewer and fewer fickle agents in the limit ρ ≫ N, one would expect that a single agent of this type would dominate the game dynamics, and σ²/N would approach 0. [sent-148, score-0.861]
64 However, attractors having 2 fickle agents are about 10 times more common in the extremely diverse limit. [sent-151, score-0.659]
65 As shown in Fig. 3(a) for a typical case, one of the two agents first arrives at the status of ±1 preference of her policies and stays there waiting. [sent-153, score-0.878]
66 Meanwhile, the preference of the second agent is steadily reduced. [sent-154, score-0.285]
67 Once she has arrived at the status of ±1 preference of her policies, both agents can then cooperate to complete the dynamics of the attractor. [sent-155, score-0.825]
68 Thus, the composition of the group of fickle agents is self-organized through this waiting effect, and consequently the step sizes and variance increase above those predicted by kinetic sampling. [sent-157, score-1.061]
69 Here the agents are so diverse that the average step size is approaching 0. [sent-159, score-0.738]
70 At each state in the phase space, the system remains stationary for many time steps, waiting for some agent to reduce the magnitude of her preference until policy switching can take place. [sent-160, score-0.787]
71 As shown in Fig. 2(a), the attractor may be approached from the arm (P or R) or from the corner (Q). [sent-163, score-0.336]
72 Consider the case of the state approaching from P, waiting up to k times at Q to move to R, and ending the transient dynamics thereafter. [sent-164, score-0.473]
73 Then the cumulative payoffs of a policy α can be written as Ω_α + ξ_α^+ at P, Ω_α, ..., Ω_α − kξ_α^− at Q [sent-165, score-0.369]
74 and, in the attractor of period 4, repeating the sequence of Ω_α − kξ_α^− − ξ_α^− at R, Ω_α − kξ_α^− at Q, Ω_α − kξ_α^− + ξ_α^+ at P, and Ω_α − kξ_α^− at Q. [sent-168, score-0.303]
75 The movement of the cumulative payoffs can be conveniently represented by writing Ω_α = Σ_µ k^µ ξ_α^µ, where k^µ denotes the number of wins minus losses of decision 1 at state µ in the game history. [sent-169, score-0.498]
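As a tiny worked example of this bookkeeping (hypothetical numbers, with m = 1 so µ ∈ {+, −}):

    # Omega_alpha = sum over states mu of k^mu * xi_alpha^mu  (m = 1 sketch).
    k = {"+": 3, "-": -1}                 # hypothetical wins-minus-losses of decision +1
    xi_alpha = {"+": +1, "-": -1}         # the policy's decisions at the two states
    Omega_alpha = sum(k[mu] * xi_alpha[mu] for mu in k)
    print(Omega_alpha)                    # 3*(+1) + (-1)*(-1) = 4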
76 The size of each step is 2/N times the number of fickle agents at that step, which is Poisson distributed with average ∆/2. [sent-172, score-0.615]
77 The average numbers of agents appearing simultaneously in different steps positioned along the directions k^+ ± k^− = constant and k^± = constant are, respectively, ∆/8 and ∆/4, and 0 for other directions. [sent-173, score-0.619]
78 Thus, the average number of agents common in the pairs of steps {PQ, QQ1 }, {QQk , QP}, {QP, QR}, {PQ, QP} are ∆/8, ∆/8, ∆/8 and ∆/4 respectively. [sent-174, score-0.619]
79 The number of agents involved in the steps is described in Table 1. [sent-176, score-0.593]
80 The variance of the step sizes is given by (1/2)⟨(∆A^+)² + (∆A^−)²⟩_att = Σ_j P_j [Σ_{i=0,1} ⟨(1/2)[(∆A^+)² + (∆A^−)²] ∆A^+ ∆A^−⟩_{i,j}] / [Σ_{i=0,1} ⟨∆A^+ ∆A^−⟩_{i,j}], (9) where j = arm or corner. [sent-177, score-0.252]
81 The system converges to the attractor at t = 1, 086. [sent-181, score-0.341]
82 (b) The space of k^+ and k^− describing the movement of the cumulative payoffs in the game with m = 1. [sent-182, score-0.372]
83 We note that the number a_0 accounts for the agents who contribute to both steps in the attractor, and thus can complete the attractor dynamics alone in the extremely diverse limit. [sent-189, score-1.211]
84 Once present, it will appear in the attractor step QP, irrespective of the duration k of the wait at Q. [sent-191, score-0.435]
85 These a_cum agents can wait to complete the attractor dynamics together with the a− agents who contribute independently to the step from Q to R, as well as the a_0 agents who contribute to both attractor steps. [sent-192, score-2.645]
86 As a result, the average step size increases due to this waiting effect. [sent-193, score-0.322]
87 In the former case, cooperation between individual types of agents becomes indispensable in reaching the steady state behavior. [sent-194, score-0.967]
88 As shown in Fig. 1(b), the waiting effect causes the variance to increase beyond the kinetic sampling prediction, agreeing with the trend of the simulation results. [sent-198, score-0.574]
89 Further approximation including multiple waiting steps results in the theoretical curves with excellent agreement with the simulation results, as shown in Fig. [sent-202, score-0.424]
90 6 Conclusion We have studied the dynamical mechanisms of cooperation, which emerges automatically in a multi-agent system with adaptive agents competing selfishly for finite resources. [sent-207, score-0.708]
91 At low diversity, agent cooperation proceeds at the statistical level, resulting in the scaling relation of the variance with diversity. [sent-208, score-0.508]
92 At high diversity, when kinetic sampling becomes significant, we find that the attractor dynamics favors the cooperation of larger clusters of agents. [sent-209, score-0.78] [sent-216, score-0.761]
93 [Table 1 residue: "The number of fickle agents in the steps of one wait", with row labels PQ, QQ1, QQr, QQk, QR, QP and columns Label, No. of agents, Poisson averages; recoverable cells include the payoff movement Ω_α + ξ_α^+ → Ω_α, agent counts a_i + a_turn,1 + a_cum and a_cum + a_turn,2 + a_0, and Poisson averages a_i = ∆/8, a_turn,1 = ∆/8, a_cum = ∆/4.] [sent-211, score-0.809]
95 In extremely diverse systems, we further discover a waiting mechanism, in which agents who are unable to complete the attractor dynamics alone wait for other agents to collaborate with them. [sent-217, score-1.919]
96 When waiting is present, cooperation between individual types of agents becomes indispensable in reaching the steady state behavior. [sent-218, score-1.209]
97 Together, these mechanisms yield theoretical predictions of the population variance in excellent agreement with simulations over nine decades of data. [sent-219, score-0.304]
98 We expect that the observed mechanisms of agent cooperation can be found in reinforcement learning of multi-agent systems in general, due to their generic nature. [sent-220, score-0.52]
99 The mechanisms of statistical cooperation, kinetic sampling and waiting illustrate the importance of dynamical considerations in describing the system behavior, and the capability of multiagent systems to self-organize in their collective dynamics. [sent-221, score-0.612]
100 In particular, it is interesting to note that given enough waiting time, agents with limited abilities can cooperate to achieve dynamics unachievable by individuals. [sent-222, score-0.897]
wordName wordTfidf (topN-words)
[('agents', 0.535), ('attractor', 0.303), ('cooperation', 0.242), ('waiting', 0.242), ('minority', 0.208), ('policies', 0.198), ('agent', 0.162), ('ckle', 0.156), ('kinetic', 0.156), ('payoffs', 0.138), ('game', 0.137), ('policy', 0.134), ('preference', 0.123), ('acum', 0.122), ('diversity', 0.117), ('patt', 0.104), ('payoff', 0.097), ('cumulative', 0.097), ('steady', 0.095), ('diverse', 0.095), ('dynamics', 0.094), ('att', 0.091), ('population', 0.09), ('preferences', 0.086), ('qp', 0.077), ('poisson', 0.07), ('maladaptive', 0.069), ('poi', 0.069), ('pq', 0.069), ('decisions', 0.067), ('mechanisms', 0.061), ('state', 0.06), ('steps', 0.058), ('step', 0.054), ('ppoi', 0.052), ('signa', 0.052), ('wong', 0.052), ('contribute', 0.05), ('transient', 0.049), ('prob', 0.046), ('resources', 0.046), ('kong', 0.045), ('hops', 0.045), ('contributing', 0.045), ('games', 0.045), ('variance', 0.044), ('simulation', 0.043), ('excellent', 0.043), ('holding', 0.042), ('qr', 0.042), ('decision', 0.042), ('hong', 0.041), ('qq', 0.041), ('irrespective', 0.039), ('wait', 0.039), ('system', 0.038), ('scaling', 0.038), ('agreement', 0.038), ('behavior', 0.038), ('biases', 0.037), ('collective', 0.036), ('arrows', 0.036), ('indispensable', 0.035), ('qqk', 0.035), ('shly', 0.035), ('arm', 0.033), ('resource', 0.032), ('sampling', 0.031), ('averages', 0.03), ('sizes', 0.03), ('sel', 0.03), ('agreeing', 0.03), ('extremely', 0.029), ('effect', 0.028), ('competing', 0.028), ('switching', 0.028), ('gao', 0.028), ('decades', 0.028), ('approaching', 0.028), ('loses', 0.028), ('reinforcement', 0.028), ('expect', 0.027), ('lim', 0.027), ('average', 0.026), ('bias', 0.026), ('multiagent', 0.026), ('cooperate', 0.026), ('peak', 0.026), ('complete', 0.025), ('historical', 0.024), ('wins', 0.024), ('emerges', 0.024), ('adapt', 0.023), ('deviates', 0.023), ('initial', 0.023), ('relation', 0.022), ('dynamical', 0.022), ('alone', 0.022), ('status', 0.022), ('cooperative', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
Author: K. Wong, S. W. Lim, Z. Gao
Abstract: We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data. 1
2 0.31069431 24 nips-2004-Approximately Efficient Online Mechanism Design
Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh
Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1
3 0.15297577 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
Author: Rob Powers, Yoav Shoham
Abstract: We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm’s payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms. 1
4 0.13053417 64 nips-2004-Experts in a Markov Decision Process
Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour
Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1
5 0.12804098 39 nips-2004-Coarticulation in Markov Decision Processes
Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen
Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain. 1
6 0.099827461 48 nips-2004-Convergence and No-Regret in Multiagent Learning
7 0.097349316 154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors
8 0.094443254 88 nips-2004-Intrinsically Motivated Reinforcement Learning
9 0.082770698 202 nips-2004-VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
10 0.061253615 100 nips-2004-Learning Preferences for Multiclass Problems
11 0.056602698 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
12 0.053811666 140 nips-2004-Optimal Information Decoding from Neuronal Populations with Specific Stimulus Selectivity
13 0.052085306 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
14 0.050950769 155 nips-2004-Responding to Modalities with Different Latencies
15 0.049486335 102 nips-2004-Learning first-order Markov models for control
16 0.046917576 26 nips-2004-At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks
17 0.044699967 62 nips-2004-Euclidean Embedding of Co-Occurrence Data
18 0.043017603 195 nips-2004-Trait Selection for Assessing Beef Meat Quality Using Non-linear SVM
19 0.039474826 122 nips-2004-Modelling Uncertainty in the Game of Go
20 0.038594902 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
topicId topicWeight
[(0, -0.116), (1, -0.065), (2, 0.251), (3, -0.056), (4, -0.18), (5, 0.205), (6, -0.061), (7, -0.069), (8, -0.042), (9, 0.002), (10, -0.01), (11, 0.003), (12, -0.031), (13, -0.022), (14, -0.064), (15, 0.078), (16, -0.025), (17, 0.026), (18, 0.007), (19, 0.009), (20, 0.083), (21, -0.005), (22, 0.078), (23, 0.045), (24, 0.132), (25, -0.102), (26, 0.188), (27, -0.069), (28, -0.047), (29, -0.145), (30, 0.045), (31, 0.063), (32, 0.076), (33, 0.063), (34, 0.079), (35, 0.19), (36, -0.082), (37, 0.175), (38, -0.083), (39, -0.124), (40, 0.003), (41, -0.078), (42, 0.043), (43, 0.009), (44, 0.042), (45, 0.065), (46, -0.043), (47, 0.127), (48, 0.087), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.9797321 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
Author: K. Wong, S. W. Lim, Z. Gao
Abstract: We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data. 1
2 0.8469125 24 nips-2004-Approximately Efficient Online Mechanism Design
Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh
Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1
3 0.69160724 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
Author: Rob Powers, Yoav Shoham
Abstract: We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm’s payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms. 1
4 0.53995007 39 nips-2004-Coarticulation in Markov Decision Processes
Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen
Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain. 1
5 0.47282359 171 nips-2004-Solitaire: Man Versus Machine
Author: Xiang Yan, Persi Diaconis, Paat Rusmevichientong, Benjamin V. Roy
Abstract: In this paper, we use the rollout method for policy improvement to analyze a version of Klondike solitaire. This version, sometimes called thoughtful solitaire, has all cards revealed to the player, but then follows the usual Klondike rules. A strategy that we establish, using iterated rollouts, wins about twice as many games on average as an expert human player does. 1
6 0.38968876 154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors
7 0.33434898 64 nips-2004-Experts in a Markov Decision Process
8 0.33354479 88 nips-2004-Intrinsically Motivated Reinforcement Learning
9 0.31954446 122 nips-2004-Modelling Uncertainty in the Game of Go
10 0.29000515 57 nips-2004-Economic Properties of Social Networks
11 0.27690178 48 nips-2004-Convergence and No-Regret in Multiagent Learning
12 0.26839137 175 nips-2004-Stable adaptive control with online learning
13 0.23964173 155 nips-2004-Responding to Modalities with Different Latencies
14 0.23944137 195 nips-2004-Trait Selection for Assessing Beef Meat Quality Using Non-linear SVM
15 0.23813258 202 nips-2004-VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
16 0.22080582 75 nips-2004-Heuristics for Ordering Cue Search in Decision Making
17 0.2111212 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
18 0.20097721 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
19 0.18384843 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
20 0.1831193 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models
topicId topicWeight
[(13, 0.063), (15, 0.088), (25, 0.381), (26, 0.042), (31, 0.028), (33, 0.165), (35, 0.042), (39, 0.014), (50, 0.028), (71, 0.014), (89, 0.038)]
simIndex simValue paperId paperTitle
1 0.86036032 109 nips-2004-Mass Meta-analysis in Talairach Space
Author: Finn \. Nielsen
Abstract: We provide a method for mass meta-analysis in a neuroinformatics database containing stereotaxic Talairach coordinates from neuroimaging experiments. Database labels are used to group the individual experiments, e.g., according to cognitive function, and the consistent pattern of the experiments within the groups are determined. The method voxelizes each group of experiments via a kernel density estimation, forming probability density volumes. The values in the probability density volumes are compared to null-hypothesis distributions generated by resamplings from the entire unlabeled set of experiments, and the distances to the nullhypotheses are used to sort the voxels across groups of experiments. This allows for mass meta-analysis, with the construction of a list with the most prominent associations between brain areas and group labels. Furthermore, the method can be used for functional labeling of voxels. 1
same-paper 2 0.83387798 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
Author: K. Wong, S. W. Lim, Z. Gao
Abstract: We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data. 1
3 0.82895148 52 nips-2004-Discrete profile alignment via constrained information bottleneck
Author: Sean O'rourke, Gal Chechik, Robin Friedman, Eleazar Eskin
Abstract: Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also alignments between these sequences, while nearly as accurate as the full profileprofile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied. 1
4 0.63678163 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
Author: Jian Zhang, Zoubin Ghahramani, Yiming Yang
Abstract: In this paper we propose a probabilistic model for online document clustering. We use non-parametric Dirichlet process prior to model the growing number of clusters, and use a prior of general English language model as the base distribution to handle the generation of novel clusters. Furthermore, cluster uncertainty is modeled with a Bayesian Dirichletmultinomial distribution. We use empirical Bayes method to estimate hyperparameters based on a historical dataset. Our probabilistic model is applied to the novelty detection task in Topic Detection and Tracking (TDT) and compared with existing approaches in the literature. 1
5 0.60518235 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
Author: Nicolò Cesa-bianchi, Claudio Gentile, Luca Zaniboni
Abstract: We provide a worst-case analysis of selective sampling algorithms for learning linear threshold functions. The algorithms considered in this paper are Perceptron-like algorithms, i.e., algorithms which can be efficiently run in any reproducing kernel Hilbert space. Our algorithms exploit a simple margin-based randomized rule to decide whether to query the current label. We obtain selective sampling algorithms achieving on average the same bounds as those proven for their deterministic counterparts, but using much fewer labels. We complement our theoretical findings with an empirical comparison on two text categorization tasks. The outcome of these experiments is largely predicted by our theoretical results: Our selective sampling algorithms tend to perform as good as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels. 1
6 0.55827636 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
7 0.49035648 124 nips-2004-Multiple Alignment of Continuous Time Series
8 0.48417291 15 nips-2004-Active Learning for Anomaly and Rare-Category Detection
9 0.47437802 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
10 0.46904233 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
11 0.46798107 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
12 0.46773002 77 nips-2004-Hierarchical Clustering of a Mixture Model
13 0.46750203 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
14 0.46692017 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
15 0.46657327 44 nips-2004-Conditional Random Fields for Object Recognition
16 0.46647897 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
17 0.46637592 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
18 0.4663094 127 nips-2004-Neighbourhood Components Analysis
19 0.46600941 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
20 0.46584606 102 nips-2004-Learning first-order Markov models for control