nips nips2003 nips2003-26 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David C. Parkes, Satinder P. Singh
Abstract: Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. We consider the problem of maximizing the total longterm value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. [sent-5, score-0.791]
2 Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. [sent-6, score-0.054]
3 We consider the problem of maximizing the total longterm value of the system despite the self-interest of agents. [sent-7, score-0.102]
4 The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium. [sent-8, score-0.265]
5 1 Introduction. Mechanism design (MD) is a subfield of economics that seeks to implement particular outcomes in systems of rational agents [1]. [sent-9, score-0.451]
6 Classically, MD considers static worlds in which a one-time decision is made and all agents are assumed to be patient enough to wait for the decision. [sent-10, score-0.535]
7 By contrast, we consider dynamic worlds in which agents may arrive and depart over time and in which a sequence of decisions must be made without the benefit of hindsight about the values of agents yet to arrive. [sent-11, score-1.075]
8 The MD problem for dynamic systems is termed online mechanism design [2]. [sent-12, score-0.347]
9 Online MD supposes the existence of a center that can receive messages from agents, enforce a particular outcome, and collect payments. [sent-13, score-0.298]
10 Sequential decision tasks introduce new subtleties into the MD problem. [sent-14, score-0.109]
11 First, decisions now have expected value instead of certain value because of uncertainty about the future. [sent-15, score-0.273]
12 Second, new temporal strategies are available to an agent, such as waiting to report its presence to try to improve its utility within the mechanism. [sent-16, score-0.105]
13 Online mechanisms must bring truthful and immediate revelation of an agent’s value for sequences of decisions into equilibrium. [sent-17, score-0.555]
14 Without the problem of private information and incentives, the sequential decision problem in online MD could be formulated and solved as a Markov Decision Process (MDP). [sent-18, score-0.361]
15 In fact, we show that an optimal policy and MDP-value function can be used to define an online mechanism in which truthful and immediate revelation of an agent’s valuation for different sequences of decisions is a Bayes-Nash equilibrium. [sent-19, score-0.925]
16 Our approach is very general, applying to any MDP in which the goal is to maximize the total expected sequential value across all agents. [sent-20, score-0.161]
17 To illustrate the flexibility of this model, we consider the following applications: reusable goods. [sent-21, score-0.029]
18 A renewable resource is available in each time period. [sent-22, score-0.093]
19 Agents arrive and submit a bid for a particular quantity of resource for each of a contiguous sequence of periods, and before some deadline. [sent-23, score-0.229]
20 Agents submit bids for a quantity of goods with a deadline, by which time a winner-determination decision must be made for that agent. [sent-26, score-0.285]
21 A central controller determines and enforces the actions that will be performed by a dynamically changing team of agents. [sent-28, score-0.068]
22 Agents are only able to perform actions while present in the system. [sent-29, score-0.023]
23 Our main contribution is to identify this connection between online MD and MDPs, and to define a new family of dynamic mechanisms, which we term the online VCG mechanism. [sent-30, score-0.461]
24 We also clearly identify the role of the ability to stall a decision, as it relates to the value of an agent, in providing for Bayes-Nash truthful mechanisms. [sent-31, score-0.204]
25 1 Related Work The problem of online MD is due to Friedman and Parkes [2], who focused on strategyproof online mechanisms in which immediate and truthful revelation of an agent’s valuation function is a dominant strategy equilibrium. [sent-33, score-0.892]
26 The authors define the mechanism that we term the delayed VCG mechanism, identify problems for which the mechanism is strategyproof, and provide the seeds of our work in Bayes-Nash truthful mechanisms. [sent-34, score-0.351]
27 Work on online auctions [3] is also related, in that it considers a system with dynamic agent arrivals and departures. [sent-35, score-0.644]
28 However, the online auction work considers a much simpler setting (see also [4]), for instance the allocation of a fixed number of identical goods, and places less emphasis on temporal strategies or allocative efficiency. [sent-36, score-0.313]
29 The authors of [5] provide a general method to construct online auctions from online optimization algorithms. [sent-38, score-0.454]
30 In contrast to our approach, their methods consider the special case of single-minded bidders with a value vi for a particular set of resources ri , and are only temporally strategyproof in the special case of online algorithms with a non-decreasing acceptance threshold. [sent-39, score-0.489]
31 2 Preliminaries. In this section, we introduce a general discrete-time and finite-action formulation for a multiagent sequential decision problem. [sent-40, score-0.188]
32 Putting incentives to one side for now, we also define and solve an MDP formalization of the problem. [sent-41, score-0.122]
33 We consider a finite-horizon problem1 with a set T of discrete time points and a sequence of decisions k = {k1 , . . . , kT }, where kt ∈ Kt and Kt is the set of feasible decisions in period t. [sent-42, score-0.224] [sent-45, score-0.635]
35 Agent i ∈ I arrives at time ai ∈ T , departs at time di ∈ T , and has value vi (k) ≥ 0 for the sequence of decisions k. [sent-46, score-0.512]
36 By assumption, an agent has no value for decisions outside of the interval [ai , di ]. (Footnote 1: The model can be trivially extended to consider infinite horizons if all agents share the same discount factor, but will require some care for more general settings.) [sent-47, score-0.563] [sent-48, score-0.275]
38 Agents also face payments, which we allow in general to be collected after an agent's departure. [sent-49, score-0.298]
39 Collectively, information θi = (ai , di , vi ) defines the type of agent i with θi ∈ Θ. [sent-50, score-0.39]
40 Agent types are sampled i.i.d. from a probability distribution f (θ), assumed known to the agents and to the central mechanism. [sent-54, score-0.32]
41 We allow multiple agents to arrive and depart at the same time. [sent-55, score-0.447]
42 Agent i, with type θi , receives utility ui (k, p; θi ) = vi (k; θi ) − p, for decisions k and payment p. [sent-56, score-0.331]
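To make the type and utility definitions above concrete, here is a minimal Python sketch; the names AgentType and utility are illustrative assumptions rather than anything from the paper, and the valuation vi is represented as an arbitrary callable over decision sequences.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

Decision = Any  # a per-period decision k_t drawn from the feasible set K_t

@dataclass
class AgentType:
    """Type theta_i = (a_i, d_i, v_i): arrival time, departure time, and valuation."""
    arrival: int                                  # a_i
    departure: int                                # d_i
    value: Callable[[Sequence[Decision]], float]  # v_i(k); zero for decisions outside [a_i, d_i]

def utility(theta: AgentType, decisions: Sequence[Decision], payment: float) -> float:
    """Quasilinear utility u_i(k, p; theta_i) = v_i(k; theta_i) - p."""
    return theta.value(decisions) - payment
```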
43 We adopt as our goal that of maximizing the expected total sequential value across all agents. [sent-58, score-0.19]
44 If we simply ignore incentive issues, the expected-value-maximizing decision problem induces an MDP. [sent-59, score-0.171]
45 The state2 of the MDP at time t is the history-vector ht = (θ1 , . . . , θt ; k1 , . . . , kt−1 ), which includes the reported types up to and including period t and the decisions made up to and including period t − 1. [sent-60, score-0.585] [sent-66, score-0.294]
47 The set of all possible states at time t is denoted Ht . [sent-67, score-0.047]
48 The set of all possible states across all time is H = H1 ∪ · · · ∪ HT+1 . [sent-68, score-0.078]
49 The set of decisions available in state ht is Kt (ht ). [sent-69, score-0.781]
50 Given a decision kt ∈ Kt (ht ) in state ht , there is some probability distribution Prob(ht+1 |ht , kt ) over possible next states ht+1 determined by the random new agent arrivals, agent departures, and the impact of decision kt . [sent-70, score-2.342]
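The history-vector state and the transition kernel Prob(ht+1 | ht , kt ) can be sketched in the same spirit; HistoryState, step, and sample_arrivals below are hypothetical names, and the randomness in new arrivals is delegated to a caller-supplied sampler.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Type, Decision = Any, Any  # a reported type theta_i and a per-period decision k_t

@dataclass(frozen=True)
class HistoryState:
    """State h_t = (theta_1, ..., theta_t ; k_1, ..., k_{t-1})."""
    t: int
    types: Tuple[Type, ...]          # reports received up to and including period t
    decisions: Tuple[Decision, ...]  # decisions made up to and including period t - 1

def step(h: HistoryState, k: Decision,
         sample_arrivals: Callable[[int], Tuple[Type, ...]]) -> HistoryState:
    """One draw from Prob(h_{t+1} | h_t, k_t): record k_t and any fresh reports in period t+1."""
    return HistoryState(t=h.t + 1,
                        types=h.types + sample_arrivals(h.t + 1),
                        decisions=h.decisions + (k,))
```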
51 This makes explicit the dynamics that were left implicit in the type distribution θi ∼ f (θi ), and includes additional information about the domain. [sent-71, score-0.023]
52 The objective is to make decisions to maximize the expected total value across all agents. [sent-72, score-0.301]
53 We define a payoff function for the induced MDP as follows: there is a payoff Ri (ht , kt ) = vi (k≤t ; θi ) − vi (k≤t−1 ; θi ) that becomes available from agent i upon taking action kt in state ht . [sent-73, score-1.802]
54 With this, we have ∑_{t=1}^{τ} Ri (ht , kt ) = vi (k≤τ ; θi ) for all periods τ . [sent-74, score-0.512]
55 The summed value, ∑_i Ri (ht , kt ), is the payoff obtained from all agents at time t, and is denoted R(ht , kt ). [sent-75, score-1.117]
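Because each per-period payoff Ri (ht , kt ) is the marginal increase in agent i's value from the latest decision, the sum over periods telescopes back to vi (k≤τ ; θi ). The short self-contained Python check below illustrates this identity with a made-up valuation; none of the names or numbers come from the paper.

```python
from typing import Callable, Sequence

def payoff_increment(value: Callable[[Sequence[int]], float],
                     decisions: Sequence[int], t: int) -> float:
    """R_i(h_t, k_t) = v_i(k_{<=t}) - v_i(k_{<=t-1}): marginal value of the period-t decision."""
    return value(decisions[:t]) - value(decisions[:t - 1])

# Illustrative valuation: the agent only cares about decisions in periods 2..4 (its window).
def v(k: Sequence[int]) -> float:
    return float(sum(k[1:4]))

k = [0, 3, 1, 2, 5]  # a sequence of scalar decisions k_1, ..., k_5
total = sum(payoff_increment(v, k, t) for t in range(1, len(k) + 1))
assert abs(total - v(k)) < 1e-9  # telescoping sum recovers v_i(k_{<=T}; theta_i)
```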
56 By assumption, the reward to an agent in this basic online MD problem depends only on decisions, and not on state. [sent-76, score-0.461]
57 The transition probabilities and the reward function defined above, together with the feasible decision space, constitute the induced MDP Mf . [sent-77, score-0.157]
58 Given a policy π = {π1 , . . . , πT }, where πt : Ht → Kt , an MDP defines an MDP-value function V π as follows: V π (ht ) is the expected value of the summed payoff obtained from state ht onwards under policy π. [sent-81, score-0.766]
59 An optimal policy π ∗ is one that maximizes the MDP-value of every state3 in H. [sent-84, score-0.086]
60 The optimal MDP-value function V ∗ can be computed via the following value iteration algorithm, working backwards over periods t = T − 1, T − 2, . . . [sent-85, score-0.025]
61 This algorithm works backwards in time from the horizon and has time complexity polynomial in the size of the MDP and the time horizon T . [sent-89, score-0.149]
62 Given the optimal MDP-value function, the optimal policy is derived as follows: for t < T , and for any policy π, E{ht+1 , . . . [sent-90, score-0.172]
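The value iteration and greedy policy derivation described in the last few sentences can be sketched as tabular backward induction when the state space is small enough to enumerate. The FiniteHorizonMdp container, its field names, and the zero terminal value are illustrative assumptions, and the loop is written for t = T, . . . , 1, which differs slightly in indexing from the statement above.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple

State, Action = Hashable, Hashable  # h_t and k_t

@dataclass
class FiniteHorizonMdp:
    T: int
    states: Dict[int, List[State]]                                # H_t for t = 1..T (and optionally T+1)
    actions: Dict[State, List[Action]]                            # K_t(h_t)
    reward: Dict[Tuple[State, Action], float]                     # R(h_t, k_t)
    trans: Dict[Tuple[State, Action], List[Tuple[State, float]]]  # (h_{t+1}, Prob(h_{t+1} | h_t, k_t))

def backward_induction(m: FiniteHorizonMdp):
    """Compute V* and a greedy optimal policy pi* by working backwards from the horizon."""
    V: Dict[State, float] = {h: 0.0 for h in m.states.get(m.T + 1, [])}  # terminal values
    pi: Dict[State, Action] = {}
    for t in range(m.T, 0, -1):
        for h in m.states[t]:
            best_val, best_act = float("-inf"), None
            for k in m.actions[h]:
                q = m.reward[(h, k)] + sum(p * V.get(h2, 0.0) for h2, p in m.trans[(h, k)])
                if q > best_val:
                    best_val, best_act = q, k
            V[h], pi[h] = best_val, best_act
    return V, pi
```

Given such tables, V[h1] is the optimal MDP-value of the initial state, and pi[ht] returns the decision that attains the maximum of R(ht , kt ) plus the expected next-state value, which is the greedy reading of the optimal policy.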
wordName wordTfidf (topN-words)
[('ht', 0.56), ('kt', 0.376), ('agents', 0.298), ('agent', 0.236), ('md', 0.221), ('decisions', 0.199), ('online', 0.198), ('mdp', 0.158), ('truthful', 0.144), ('payo', 0.132), ('vi', 0.103), ('incentives', 0.099), ('parkes', 0.099), ('revelation', 0.099), ('strategyproof', 0.099), ('mechanism', 0.086), ('depart', 0.086), ('goods', 0.086), ('policy', 0.086), ('decision', 0.08), ('arrivals', 0.066), ('submit', 0.066), ('valuation', 0.066), ('vcg', 0.066), ('arrive', 0.063), ('sequential', 0.059), ('auctions', 0.058), ('considers', 0.056), ('worlds', 0.052), ('di', 0.051), ('multiagent', 0.049), ('immediate', 0.047), ('resource', 0.046), ('ri', 0.042), ('summed', 0.042), ('mechanisms', 0.041), ('horizon', 0.037), ('outcomes', 0.036), ('period', 0.036), ('identify', 0.035), ('implement', 0.034), ('periods', 0.033), ('induces', 0.033), ('design', 0.033), ('across', 0.031), ('strategies', 0.03), ('dynamic', 0.03), ('utility', 0.029), ('rob', 0.029), ('satinder', 0.029), ('incentive', 0.029), ('auction', 0.029), ('classically', 0.029), ('departs', 0.029), ('departure', 0.029), ('harvard', 0.029), ('horizons', 0.029), ('maxk', 0.029), ('onwards', 0.029), ('payments', 0.029), ('reusable', 0.029), ('subtleties', 0.029), ('ai', 0.029), ('maximizing', 0.029), ('quantity', 0.028), ('reward', 0.027), ('erent', 0.026), ('induced', 0.026), ('michigan', 0.026), ('economics', 0.026), ('arrives', 0.026), ('bid', 0.026), ('longterm', 0.026), ('wait', 0.026), ('value', 0.025), ('time', 0.025), ('private', 0.024), ('prob', 0.024), ('rational', 0.024), ('hindsight', 0.024), ('putting', 0.024), ('waiting', 0.024), ('feasible', 0.024), ('expected', 0.024), ('actions', 0.023), ('nes', 0.023), ('patient', 0.023), ('enforces', 0.023), ('formalization', 0.023), ('preliminaries', 0.023), ('includes', 0.023), ('total', 0.022), ('states', 0.022), ('available', 0.022), ('central', 0.022), ('singh', 0.022), ('collectively', 0.022), ('acceptance', 0.022), ('mf', 0.022), ('sub', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 26 nips-2003-An MDP-Based Approach to Online Mechanism Design
Author: David C. Parkes, Satinder P. Singh
Abstract: Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. We consider the problem of maximizing the total longterm value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium. 1
2 0.27291846 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
Author: Yu-han Chang, Tracey Ho, Leslie P. Kaelbling
Abstract: In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy. 1
3 0.25829238 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
Author: Xiaofeng Wang, Tuomas Sandholm
Abstract: We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different preferences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal algorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions). 1
4 0.25060609 41 nips-2003-Boosting versus Covering
Author: Kohei Hatano, Manfred K. Warmuth
Abstract: We investigate improvements of AdaBoost that can exploit the fact that the weak hypotheses are one-sided, i.e. either all its positive (or negative) predictions are correct. In particular, for any set of m labeled examples consistent with a disjunction of k literals (which are one-sided in this case), AdaBoost constructs a consistent hypothesis by using O(k 2 log m) iterations. On the other hand, a greedy set covering algorithm finds a consistent hypothesis of size O(k log m). Our primary question is whether there is a simple boosting algorithm that performs as well as the greedy set covering. We first show that InfoBoost, a modification of AdaBoost proposed by Aslam for a different purpose, does perform as well as the greedy set covering algorithm. We then show that AdaBoost requires Ω(k 2 log m) iterations for learning k-literal disjunctions. We achieve this with an adversary construction and as well as in simple experiments based on artificial data. Further we give a variant called SemiBoost that can handle the degenerate case when the given examples all have the same label. We conclude by showing that SemiBoost can be used to produce small conjunctions as well. 1
5 0.164682 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
Author: Gerald Tesauro
Abstract: Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented. 1
6 0.10042307 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
7 0.091118783 163 nips-2003-Probability Estimates for Multi-Class Classification by Pairwise Coupling
8 0.087608352 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
9 0.086135104 102 nips-2003-Large Scale Online Learning
10 0.085911371 165 nips-2003-Reasoning about Time and Knowledge in Neural Symbolic Learning Systems
11 0.083357058 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
12 0.076361716 78 nips-2003-Gaussian Processes in Reinforcement Learning
13 0.072631292 62 nips-2003-Envelope-based Planning in Relational MDPs
14 0.072572723 68 nips-2003-Eye Movements for Reward Maximization
15 0.071395464 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
16 0.067839228 158 nips-2003-Policy Search by Dynamic Programming
17 0.0616423 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
18 0.060759604 55 nips-2003-Distributed Optimization in Adaptive Networks
19 0.057470541 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
20 0.057387535 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
topicId topicWeight
[(0, -0.139), (1, 0.289), (2, -0.118), (3, -0.123), (4, -0.002), (5, -0.0), (6, -0.08), (7, 0.104), (8, -0.176), (9, -0.136), (10, -0.011), (11, -0.039), (12, -0.029), (13, -0.187), (14, -0.075), (15, -0.046), (16, 0.066), (17, -0.085), (18, -0.092), (19, 0.016), (20, -0.018), (21, -0.071), (22, -0.194), (23, 0.183), (24, 0.069), (25, 0.179), (26, 0.034), (27, 0.133), (28, 0.087), (29, -0.093), (30, 0.104), (31, -0.045), (32, -0.065), (33, 0.095), (34, 0.03), (35, 0.036), (36, -0.008), (37, -0.175), (38, 0.092), (39, -0.149), (40, 0.016), (41, 0.065), (42, 0.005), (43, -0.101), (44, -0.081), (45, -0.026), (46, 0.033), (47, -0.015), (48, 0.085), (49, 0.15)]
simIndex simValue paperId paperTitle
same-paper 1 0.98477334 26 nips-2003-An MDP-Based Approach to Online Mechanism Design
Author: David C. Parkes, Satinder P. Singh
Abstract: Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. We consider the problem of maximizing the total longterm value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium. 1
2 0.77374738 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
Author: Xiaofeng Wang, Tuomas Sandholm
Abstract: We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different preferences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal algorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions). 1
3 0.53006458 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
Author: Yu-han Chang, Tracey Ho, Leslie P. Kaelbling
Abstract: In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy. 1
4 0.5265848 165 nips-2003-Reasoning about Time and Knowledge in Neural Symbolic Learning Systems
Author: Artur Garcez, Luis C. Lamb
Abstract: We show that temporal logic and combinations of temporal logics and modal logics of knowledge can be effectively represented in artificial neural networks. We present a Translation Algorithm from temporal rules to neural networks, and show that the networks compute a fixed-point semantics of the rules. We also apply the translation to the muddy children puzzle, which has been used as a testbed for distributed multi-agent systems. We provide a complete solution to the puzzle with the use of simple neural networks, capable of reasoning about time and of knowledge acquisition through inductive learning. 1
5 0.48368111 41 nips-2003-Boosting versus Covering
Author: Kohei Hatano, Manfred K. Warmuth
Abstract: We investigate improvements of AdaBoost that can exploit the fact that the weak hypotheses are one-sided, i.e. either all its positive (or negative) predictions are correct. In particular, for any set of m labeled examples consistent with a disjunction of k literals (which are one-sided in this case), AdaBoost constructs a consistent hypothesis by using O(k 2 log m) iterations. On the other hand, a greedy set covering algorithm finds a consistent hypothesis of size O(k log m). Our primary question is whether there is a simple boosting algorithm that performs as well as the greedy set covering. We first show that InfoBoost, a modification of AdaBoost proposed by Aslam for a different purpose, does perform as well as the greedy set covering algorithm. We then show that AdaBoost requires Ω(k 2 log m) iterations for learning k-literal disjunctions. We achieve this with an adversary construction and as well as in simple experiments based on artificial data. Further we give a variant called SemiBoost that can handle the degenerate case when the given examples all have the same label. We conclude by showing that SemiBoost can be used to produce small conjunctions as well. 1
6 0.45688584 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
7 0.29453382 163 nips-2003-Probability Estimates for Multi-Class Classification by Pairwise Coupling
8 0.25984052 68 nips-2003-Eye Movements for Reward Maximization
9 0.25344673 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
10 0.23721212 102 nips-2003-Large Scale Online Learning
11 0.22257207 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
12 0.20881185 143 nips-2003-On the Dynamics of Boosting
13 0.20535268 62 nips-2003-Envelope-based Planning in Relational MDPs
14 0.20534128 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
15 0.202856 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
16 0.18532537 158 nips-2003-Policy Search by Dynamic Programming
17 0.18477781 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
18 0.18222359 55 nips-2003-Distributed Optimization in Adaptive Networks
19 0.17940862 78 nips-2003-Gaussian Processes in Reinforcement Learning
20 0.17383192 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
topicId topicWeight
[(0, 0.033), (11, 0.015), (27, 0.371), (30, 0.042), (35, 0.036), (53, 0.042), (69, 0.012), (71, 0.039), (76, 0.025), (85, 0.032), (91, 0.203), (99, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.85073066 26 nips-2003-An MDP-Based Approach to Online Mechanism Design
Author: David C. Parkes, Satinder P. Singh
Abstract: Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. We consider the problem of maximizing the total longterm value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium. 1
2 0.60921043 141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression
Author: Christopher J. Paciorek, Mark J. Schervish
Abstract: We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the regression function is controlled by a parameter, freeing one from fixing the differentiability in advance. In experiments, the nonstationary GP regression model performs well when the input space is two or three dimensions, outperforming a neural network model and Bayesian free-knot spline models, and competitive with a Bayesian neural network, but is outperformed in one dimension by a state-of-the-art Bayesian free-knot spline model. The model readily generalizes to non-Gaussian data. Use of computational methods for speeding GP fitting may allow for implementation of the method on larger datasets. 1
3 0.60832363 100 nips-2003-Laplace Propagation
Author: Eleazar Eskin, Alex J. Smola, S.v.n. Vishwanathan
Abstract: We present a novel method for approximate inference in Bayesian models and regularized risk functionals. It is based on the propagation of mean and variance derived from the Laplace approximation of conditional probabilities in factorizing distributions, much akin to Minka’s Expectation Propagation. In the jointly normal case, it coincides with the latter and belief propagation, whereas in the general case, it provides an optimization strategy containing Support Vector chunking, the Bayes Committee Machine, and Gaussian Process chunking as special cases. 1
4 0.5094353 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
Author: Marc Toussaint
Abstract: We present a connectionist architecture that can learn a model of the relations between perceptions and actions and use this model for behavior planning. State representations are learned with a growing selforganizing layer which is directly coupled to a perception and a motor layer. Knowledge about possible state transitions is encoded in the lateral connectivity. Motor signals modulate this lateral connectivity and a dynamic field on the layer organizes a planning process. All mechanisms are local and adaptation is based on Hebbian ideas. The model is continuous in the action, perception, and time domain.
5 0.50670397 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
Author: Xiaofeng Wang, Tuomas Sandholm
Abstract: We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different preferences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal algorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions). 1
6 0.50145608 154 nips-2003-Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors
7 0.50116509 46 nips-2003-Clustering with the Connectivity Kernel
9 0.49835861 87 nips-2003-Identifying Structure across Pre-partitioned Data
10 0.45488065 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
11 0.44032866 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
12 0.43625969 68 nips-2003-Eye Movements for Reward Maximization
13 0.43156391 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
14 0.42817906 55 nips-2003-Distributed Optimization in Adaptive Networks
15 0.42727894 84 nips-2003-How to Combine Expert (and Novice) Advice when Actions Impact the Environment?
16 0.4238461 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
17 0.42121339 107 nips-2003-Learning Spectral Clustering
18 0.42049187 43 nips-2003-Bounded Invariance and the Formation of Place Fields
19 0.41942668 73 nips-2003-Feature Selection in Clustering Problems
20 0.4177984 81 nips-2003-Geometric Analysis of Constrained Curves