nips nips2004 nips2004-24 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh
Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate Bayesian-Nash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse.
Reference: text
sentIndex sentText sentNum sentScore
1 Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. [sent-9, score-0.301]
2 In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. [sent-12, score-0.22]
3 This raises the possibility that agents may be able to exploit the approximation for selfish gain. [sent-13, score-0.363]
4 1 Introduction Mechanism design (MD) is concerned with the problem of providing incentives to implement desired system-wide outcomes in systems with multiple self-interested agents. [sent-16, score-0.15]
5 Agents are assumed to have private information, for example about their utility for different outcomes and about their ability to implement different outcomes, and act to maximize their own utility. [sent-17, score-0.16]
6 The MD approach to achieving multiagent coordination supposes the existence of a center that can receive messages from agents, implement an outcome, and collect payments from the agents. [sent-18, score-0.591]
7 Classic mechanism design considers static systems in which all agents are present and a one-time decision is made about an outcome. [sent-20, score-0.664]
8 Online mechanism design [1] departs from this and allows agents to arrive and depart dynamically, requiring decisions to be made with uncertainty about the future. [sent-22, score-0.828]
9 Thus, an online mechanism makes a sequence of decisions without the benefit of hindsight about the valuations of the agents yet to arrive. [sent-23, score-0.887]
10 Without the issue of incentives, the online MD problem is a classic sequential decision problem. [sent-24, score-0.221]
11 This online VCG model differs from the line of work on online auctions introduced by Lavi and Nisan [4] in that it assumes that the center has a model, and it handles a general decision space and any decision horizon. [sent-26, score-0.442]
12 Computing the payments and allocations in the online VCG mechanism involves solving the MDP that defines the underlying centralized (ignoring self-interest) decision making problem. [sent-27, score-0.619]
13 The sparse-sampling approach of Kearns et al. [3] is particularly well suited to online MD because it produces the needed local approximations to the optimal value and action efficiently. [sent-32, score-0.331]
14 More challengingly, regardless of our choice of approximation, the agents in the system can exploit their knowledge of the mechanism’s approximation algorithm to try to “cheat” the mechanism and further their own selfish interests. [sent-33, score-0.583]
15 Our main contribution is to demonstrate that our new approximate online VCG mechanism has truth-revelation by the agents as an ε-Bayesian-Nash equilibrium (BNE). [sent-34, score-0.875]
16 This approximate equilibrium supposes that each agent is indifferent to within an expected utility of ε, and will play a truthful strategy in best-response to truthful strategies of other agents if no other strategy can improve its utility by more than ε. [sent-35, score-1.087]
17 For any ε > 0, our online mechanism has a run-time that is independent of the number of states in the underlying MDP, provides an ε-BNE, and implements a policy with expected value within ε of the optimal policy’s value. [sent-36, score-0.734]
18 We demonstrate the speed-up introduced with sparse-sampling (compared with policy calculation via value-iteration), as well as the economic value of adopting an MDP-based approach over a simple fixed-price approach. [sent-38, score-0.289]
19 2 Preliminaries Here we formalize the multiagent sequential decision problem that defines the online mechanism design (OMD) problem. [sent-39, score-0.501]
20 Each agent is asked to report its private information (for instance about its value and its capabilities) to a central planner or mechanism upon arrival. [sent-41, score-0.654]
21 The mechanism implements a policy based on its view of the state of the world (as reported by agents). [sent-42, score-0.521]
22 The policy defines actions in each state, and the assumption is that all agents acquiesce to the decisions of the planner. [sent-43, score-0.701]
23 The mechanism also collects payments from agents, which can themselves depend on the reports of agents. [sent-44, score-0.424]
24 Online Mechanism Design We consider a finite-horizon problem with a set T of time points and a sequence of decisions k = {k1, . . . , kT}, where kt ∈ Kt and Kt is the set of feasible decisions in period t. [sent-45, score-0.154]
26 Agent i ∈ I arrives at time ai ∈ T , departs at time li ∈ T , and has value vi (k) ≥ 0 for a sequence of decisions k. [sent-49, score-0.378]
27 By assumption, an agent has no value for decisions outside of interval [ai , li ]. [sent-50, score-0.511]
28 Collectively, information θi = (ai , li , vi ) defines the type of agent i with θi ∈ Θ. [sent-52, score-0.407]
29 Agent types are sampled i.i.d. from a probability distribution f (θ), assumed known to the agents and to the central mechanism. [sent-56, score-0.363]
30 Multiple agents can arrive and depart at the same time. [sent-57, score-0.421]
31 Agent i, with type θi , receives utility ui (k, p; θi ) = vi (k; θi ) − p, for decisions k and payment p. [sent-58, score-0.334]
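To make the preceding definitions concrete, the following is a minimal Python sketch (not from the paper) of a type θi = (ai, li, vi) and the quasi-linear utility ui(k, p; θi) = vi(k; θi) − p; the per-period value model and all names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentType:
    """Type theta_i = (a_i, l_i, v_i): arrival period, departure period, per-period value."""
    arrival: int
    departure: int
    value_per_period: float  # assumption: value is linear in the number of allocated periods

    def value(self, allocated_periods):
        """v_i(k; theta_i): only periods inside [a_i, l_i] generate value,
        matching the assumption that decisions outside [a_i, l_i] are worthless."""
        usable = [t for t in allocated_periods if self.arrival <= t <= self.departure]
        return self.value_per_period * len(usable)

def utility(agent: AgentType, allocated_periods, payment: float) -> float:
    """Quasi-linear utility u_i(k, p; theta_i) = v_i(k; theta_i) - p."""
    return agent.value(allocated_periods) - payment
```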
32 Definition 1 (Online Mechanism Design) The OMD problem is to implement the sequence of decisions that maximizes the expected summed value across all agents in equilibrium, given self-interested agents with private information about valuations. [sent-60, score-1.086]
33 In economic terms, an optimal (value-maximizing) policy is the allocatively-efficient, or simply efficient, policy. [sent-61, score-0.217]
34 Rather, the agents themselves can have capabilities to perform actions and be asked to perform particular actions by the mechanism. [sent-63, score-0.477]
35 The agents can also have private information about the actions that they are able to perform. [sent-64, score-0.47]
36 In the MDP-based approach to solving the OMD problem, the sequential decision problem is formalized as an MDP, with the state at any time encapsulating both the current agent population and the constraints on current decisions, as reflected by decisions made previously. [sent-66, score-0.739]
37 The reward function in the MDP is simply defined to correspond with the total reported value of all agents for all sequences of decisions. [sent-67, score-0.501]
38 Define the state of the MDP at time t as the history-vector ht = (θ1, . . . , θt, k1, . . . , kt−1), to include the reported types up to and including period t and the decisions made up to and including period t − 1. [sent-69, score-0.456]
40 The set of decisions available in state ht is Kt (ht ). [sent-78, score-0.556]
41 Given a decision kt ∈ Kt (ht ) in state ht , there is some probability distribution Prob(ht+1 |ht , kt ) over possible next states ht+1 . [sent-79, score-0.908]
42 In the setting of OMD, this probability distribution is determined by the uncertainty on new agent arrivals (as represented within f (θ)), together with departures and the impact of decision kt on state. [sent-80, score-0.584]
43 The payoff function for the induced MDP is defined to reflect the goal of maximizing the total expected reward across all agents. [sent-81, score-0.146]
44 In particular, payoff R i (ht , kt ) = vi (k≤t ; θi ) − vi (k≤t−1 ; θi ) becomes available from agent i upon taking action kt in state ht . [sent-82, score-1.436]
45 With this, we have ∑_{t=1}^{τ} R^i(ht, kt) = vi(k≤τ ; θi) for all periods τ, to provide the required correspondence with agent valuations. [sent-83, score-0.652]
46 Let R(ht, kt) = ∑_i R^i(ht, kt) denote the payoff obtained from all agents at time t. [sent-84, score-0.849]
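As a sketch of this payoff decomposition, the functions below credit each agent, in the period an action is taken, with the increment in its value, and sum those increments over agents; the AgentType interface from the earlier sketch and the allocation dictionaries are assumptions for illustration.

```python
def incremental_payoff(agent, alloc_through_t, alloc_through_prev):
    """R^i(h_t, k_t) = v_i(k_{<=t}; theta_i) - v_i(k_{<=t-1}; theta_i).
    `alloc_through_t` maps each agent to the periods allocated to it by the
    decisions k_{<=t}; `alloc_through_prev` does the same for k_{<=t-1}."""
    return (agent.value(alloc_through_t.get(agent, []))
            - agent.value(alloc_through_prev.get(agent, [])))

def period_payoff(agents, alloc_through_t, alloc_through_prev):
    """R(h_t, k_t) = sum over i of R^i(h_t, k_t): the MDP reward in period t."""
    return sum(incremental_payoff(a, alloc_through_t, alloc_through_prev)
               for a in agents)
```

Summing period_payoff over periods 1, . . . , τ reproduces each agent's value vi(k≤τ ; θi), which is exactly the correspondence required above.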
47 Given a policy π = {π1, . . . , πT}, where πt : Ht → Kt, an MDP defines an MDP-value function V π as follows: V π(ht) is the expected value of the summed payoff obtained from state ht onwards under policy π. [sent-88, score-0.784]
48 An optimal policy π ∗ is one that maximizes the MDP-value of every state in H. [sent-91, score-0.266]
49 Given the optimal MDP-value function, the optimal policy is derived as follows: for t < T, policy π∗(h ∈ Ht) chooses action arg max_{k∈Kt(h)} [ R(h, k) + ∑_{h′∈Ht+1} Prob(h′|h, k) V∗(h′) ], and π∗(h ∈ HT) = arg max_{k∈KT(h)} R(h, k). [sent-96, score-0.452]
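The optimal-policy rule above is just one-step lookahead on the optimal value function; here is a hedged sketch, assuming a finite-horizon MDP given as callables for feasible actions, reward, and the transition distribution, with V an indexable collection of per-period value tables.

```python
def greedy_action(h, t, T, actions, reward, transition, V):
    """pi*(h in H_t) = argmax_k [ R(h, k) + sum_{h'} Prob(h'|h, k) * V*(h') ],
    with no continuation term in the final period T.
    `transition(h, k)` is assumed to return a dict {h_next: probability};
    V[t+1] is assumed to map next states to their optimal values."""
    best_action, best_value = None, float("-inf")
    for k in actions(h, t):
        q = reward(h, k)
        if t < T:
            q += sum(prob * V[t + 1].get(h_next, 0.0)
                     for h_next, prob in transition(h, k).items())
        if q > best_value:
            best_action, best_value = k, q
    return best_action
```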
50 Let θ̂≤t denote the reported types up to and including period t, and R^i_{≤t}(θ̂≤t ; π) denote the total reported reward to agent i up to and including period t. [sent-97, score-0.602]
51 The commitment period for agent i is defined as the first period, mi, for which ∀t ≥ mi, R^i_{≤mi}(θ̂≤mi ; π) = R^i_{≤t}(θ̂≤mi ∪ θ>mi ; π), for any types θ>mi still to arrive. [sent-99, score-0.693]
52 This is the earliest period in which agent i’s total value is known with certainty. [sent-100, score-0.5]
53 Let ht(θ̂≤t ; π) denote the state in period t given reports θ̂≤t. [sent-101, score-0.575]
54 Definition 2 (Online VCG mechanism) Given history h ∈ H, mechanism Mvcg = (Θ; π, pvcg) implements policy π and collects payment p^i_vcg(θ̂≤mi ; π) = R^i_{≤mi}(θ̂≤mi ; π) − [ V π(hai(θ̂≤ai ; π)) − V π(hai(θ̂≤ai\i ; π)) ] (1) from agent i in some period t ≥ mi. [sent-103, score-1.244]
55 1 Using histories as state will make the state space very large. [sent-104, score-0.152]
56 Agent i’s payment is equal to its reported value discounted by the expected marginal value that it will contribute to the system as determined by the MDP-value function for the policy in its arrival period. [sent-107, score-0.434]
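A sketch of the payment rule in Equation (1) as just described: agent i is charged its total reported reward up to its commitment period, minus its expected marginal contribution, measured by the policy's MDP value at i's arrival state with and without i's report. The helper callables (reported_reward, arrival_state, mdp_value) are assumed interfaces, not the paper's code.

```python
def online_vcg_payment(agent_id, reports, policy,
                       reported_reward, arrival_state, mdp_value):
    """p^i_vcg = R^i_{<=m_i}(reports; pi)
                 - [ V^pi(h_{a_i}(reports; pi)) - V^pi(h_{a_i}(reports without i; pi)) ].
    `reported_reward(agent_id, reports, policy)` returns agent i's total reported
    reward up to its commitment period; `arrival_state(reports, agent_id)` rebuilds
    the MDP state in i's arrival period from the given reports; `mdp_value` evaluates
    V^pi for that state, for example by sparse sampling."""
    reports_without_i = [r for r in reports if r.agent_id != agent_id]
    value_with_i = mdp_value(arrival_state(reports, agent_id), policy)
    value_without_i = mdp_value(arrival_state(reports_without_i, agent_id), policy)
    marginal_contribution = value_with_i - value_without_i
    return reported_reward(agent_id, reports, policy) - marginal_contribution
```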
57 Clearly, property Kt (ht ) ⊇ Kt (ht \ θi ) in any period t in which θi has just arrived is sufficient for stalling. [sent-109, score-0.139]
58 Theorem 1 (Parkes & Singh [6]) An online VCG mechanism, Mvcg = (Θ; π∗, pvcg), based on an optimal policy π∗ for a correct MDP model that satisfies stalling, is Bayesian-Nash incentive compatible and implements the optimal MDP policy. [sent-111, score-0.705]
59 3 Solving Very Large MDPs Approximately From Equation 1, it can be seen that making outcome and payment decisions at any point in time in an online VCG mechanism does not require a global value function or a global policy. [sent-112, score-0.691]
60 The sparse-sampling methods of Kearns et al. [3] compute approximate values and actions for a single state at a time. [sent-114, score-0.181]
61 Thus, sparse-sampling methods are particularly suited to approximating online VCG and we adopt them here. [sent-116, score-0.177]
62 Given a state s and an approximation parameter ε, it computes an ε-accurate estimate of the optimal value for s as follows. [sent-120, score-0.158]
63 The value of a node is the maximum over the values for all actions in that node. [sent-125, score-0.128]
64 The value of an action in a node is the summed value of the m rewards on the m outgoing edges for that action plus the summed value of the m children of that node. [sent-126, score-0.445]
65 The estimated optimal value of state s is the value of the root node of the tree. [sent-127, score-0.229]
66 The estimated optimal action in state s is the action that leads to the largest value for the root node in the tree. [sent-128, score-0.327]
67 • (Near-Optimality) The value function of the stochastic policy implemented by the sparse-sampling(ε) algorithm, denoted V ss, satisfies |V ∗ (s) − V ss (s)| ≤ ε simultaneously for all states s ∈ S. [sent-131, score-0.351]
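A recursive sketch of the sparse-sampling(ε) lookahead described above, assuming access to a generative model that returns one sampled (reward, next state) pair; averaging the m samples per action and choosing the width m and depth from ε are standard choices for this family of algorithms, shown here as assumptions rather than the paper's exact construction.

```python
def sampled_value(state, depth, m, actions, generative_model):
    """Estimate the optimal value of `state` from a width-m, depth-limited
    lookahead tree: a node's value is the best action value, and an action's
    value is the average over m sampled (reward + child value) terms."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions(state):
        total = 0.0
        for _ in range(m):
            reward, next_state = generative_model(state, a)
            total += reward + sampled_value(next_state, depth - 1, m,
                                            actions, generative_model)
        best = max(best, total / m)
    return best

def sampled_action(state, depth, m, actions, generative_model):
    """Return the action with the largest estimated value at the root."""
    def action_value(a):
        total = 0.0
        for _ in range(m):
            reward, next_state = generative_model(state, a)
            total += reward + sampled_value(next_state, depth - 1, m,
                                            actions, generative_model)
        return total / m
    return max(actions(state), key=action_value)
```

The cost grows with m and the lookahead depth but not with the number of MDP states, which is the property the run-time claim in sentence 17 relies on.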
68 4 Approximately Efficient Online Mechanism Design In this section, we define an approximate online VCG mechanism and consider the effect on incentives of substituting the sparse-sampling(ε) algorithm into the online VCG mechanism. [sent-136, score-0.669]
69 We model agents as indifferent between decisions that differ by at most a utility of ε > 0, and consider an approximate Bayesian-Nash equilibrium. [sent-137, score-0.613]
70 Let vi (θ; π) denote the final value to agent i after reports θ given policy π. [sent-138, score-0.637]
71 In particular, when truth-telling is an ε-BNE we say that the mechanism is ε-BNE incentive compatible and no agent can improve its expected utility by more than ε > 0, for any type, as long as other agents are bidding truthfully. [sent-140, score-1.118]
72 Equivalently, one can interpret an ε-BNE as an exact equilibrium for agents that face a computational cost of at least ε to compute the exact BNE. [sent-141, score-0.43]
73 Our proof of incentive-compatibility first demonstrates that an approximate delayed VCG mechanism [1, 6] is ε-BNE. [sent-143, score-0.306]
74 With this, we demonstrate that the expected value of the payments in the approximate online VCG mechanism is within 3ε of the payments in the approximate delayed VCG mechanism. [sent-144, score-0.89]
75 In an approximate delayed-VCG mechanism, the role of the sparse-sampling algorithm is to implement an approximate policy, as well as counterfactual policies for the worlds θ−i without each agent i in turn. [sent-146, score-0.502]
76 The total reported reward to agents j ≠ i over this counterfactual series of states is used to define the payment to agent i. [sent-147, score-0.949]
77 Lemma 2 Truthful bidding is an ε-Bayesian-Nash equilibrium in the sparse-sampling(ε) based approximate delayed-VCG mechanism. [sent-148, score-0.146]
78 Proof: Let π denote the approximate policy computed by the sparse-sampling algorithm. [sent-149, score-0.202]
79 Now, if agent i bids truthfully its expected utility is E_{θ|θ≤ai} { vi(θ; π) + ∑_{j≠i} R^j_{≤T}(θ; π) − ∑_{j≠i} R^j_{≤T}(θ−i ; π) } (4), where the expectation is over both the types yet to be reported and the randomness introduced by the sparse-sampling(ε) algorithm. [sent-151, score-0.469]
80 For any ε > 0, the sparse-sampling(ε) based approximate online VCG mechanism has ε-efficiency in a 4ε-BNE. [sent-152, score-0.445]
81 The WiFi problem considers a fixed number of channels C with a random number of agents (max A) that can arrive per period. [sent-154, score-0.443]
82 The value model requires that any allocation to agent i must be for contiguous periods, and be made while the agent is present (i.e., during periods [t, ai + di], for arrival ai and duration di). [sent-169, score-0.793]
84 An agent’s value for an allocation of duration x is vi x where vi is its per-unit value. [sent-172, score-0.341]
85 Action space: The policy can postpone an agent allocation, or allocate an agent to a channel for the remaining duration of the agent’s time in the system if a channel is available, and the remaining duration is not greater than dmax . [sent-175, score-1.123]
86 Payoff function: The reward at a time step is the summed value obtained from all agents for which an allocation is made in this time step. [sent-176, score-0.663]
87 This is the total value such an agent will receive before it departs. [sent-177, score-0.384]
88 Transition probabilities: The change in resource schedule, and in the agent vector that relates to agents currently present, is deterministic. [sent-178, score-0.701]
89 The random new additions to the agent vector at each step are unaffected by the actions and also independent of time. [sent-179, score-0.395]
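To tie the WiFi model components together, here is a schematic and heavily simplified transition step under the assumptions above: per-period values, allocations for the agent's remaining duration, deterministic bookkeeping for present agents, and action-independent random arrivals. The state encoding, the arrival distribution, and the release of channels over time are all illustrative simplifications, not the paper's exact model.

```python
import random

def wifi_step(free_channels, waiting_agents, allocate_now, max_arrivals, rng=random):
    """One schematic transition of the WiFi MDP.
    `allocate_now` is the action: a list of (agent, remaining_duration) pairs,
    each assigned to a free channel for the agent's remaining time in the system.
    The reward is the summed value of the agents allocated in this step; arrivals
    are random and independent of the action, as stated in the transition model."""
    assert len(allocate_now) <= free_channels
    reward = sum(agent["value_per_period"] * duration
                 for agent, duration in allocate_now)
    allocated_ids = {id(agent) for agent, _ in allocate_now}
    still_waiting = [a for a in waiting_agents if id(a) not in allocated_ids]
    # Action-independent arrivals (illustrative distribution).
    arrivals = [{"value_per_period": rng.uniform(0.5, 2.0),
                 "remaining_duration": rng.randint(1, 3)}
                for _ in range(rng.randint(0, max_arrivals))]
    next_free_channels = free_channels - len(allocate_now)  # channel release omitted
    return reward, next_free_channels, still_waiting + arrivals
```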
90 In the exact online VCG mechanism we compute an optimal policy, and optimal MDP values, offline using finite-horizon value iteration [7]. [sent-181, score-0.515]
91 In computing payments we use factoring, and only determine VCG payments once for each type of agent to arrive. [sent-185, score-0.62]
92 We compare performance with a simple fixed-price allocation scheme that, given a particular problem, computes off-line a fixed number of periods and a fixed price (agents are queued and offered the price at random as resources become available) that yields the maximum expected total value. [sent-186, score-0.237]
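A sketch of the fixed-price baseline just described: waiting agents are offered a fixed price for a fixed number of periods, in random order, as channels become available. The accept-when-value-exceeds-price rule is an assumption about agent behavior, and the price and number of periods are tuned off-line elsewhere.

```python
import random

def fixed_price_round(waiting_agents, free_channels, price, n_periods, rng=random):
    """Offer the fixed (price, n_periods) contract to randomly ordered waiting
    agents while channels remain; an agent is assumed to accept iff its value
    for n_periods is at least the price. Returns (accepted agents, total value)."""
    queue = list(waiting_agents)
    rng.shuffle(queue)
    accepted, total_value = [], 0.0
    for agent in queue:
        if len(accepted) >= free_channels:
            break
        value = agent["value_per_period"] * n_periods
        if value >= price:
            accepted.append(agent)
            total_value += value
    return accepted, total_value
```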
93 2 Looking first at economic properties, Figure 1(A) shows the effect of varying the number of agents from 2 to 12, comparing the value and revenue between the approximate online VCG mechanism and the fixed price mechanism. [sent-192, score-0.995]
94 Revenue is also generally better from the MDP-based mechanism than in the fixed price scheme. [sent-194, score-0.266]
95 Turning now to computational properties, Figure 1 (C) illustrates the effectiveness of sparse-sampling, and also agent pruning, sampled over 100 instances. [sent-197, score-0.338]
96 [Figure 1 residue: legends value:mdp, rev:mdp, value:fixed, rev:fixed and time:pruning, time:no pruning; axes show percentages against the number of agents, and run time against sampling width and number of channels.]
98 Figure 1 (D) shows that pruning is extremely useful computationally (in comparison with plain sparse-sampling), for the default model parameters and as the number of agents is increased from 2 to 12. [sent-216, score-0.482]
99 Pruning is effective, removing around 50% of agents (summed across all states in the lookahead tree) at 5 agents. [sent-217, score-0.419]
100 Pricing WiFi at Starbucks– Issues in online mechanism design. [sent-223, score-0.397]
wordName wordTfidf (topN-words)
[('agents', 0.363), ('ht', 0.353), ('vcg', 0.344), ('agent', 0.338), ('mechanism', 0.22), ('mdp', 0.219), ('kt', 0.202), ('online', 0.177), ('policy', 0.154), ('omd', 0.141), ('payments', 0.141), ('decisions', 0.127), ('pvcg', 0.125), ('pruning', 0.119), ('period', 0.116), ('mi', 0.094), ('payment', 0.094), ('duration', 0.086), ('parkes', 0.078), ('wifi', 0.078), ('state', 0.076), ('action', 0.072), ('allocation', 0.071), ('summed', 0.069), ('vi', 0.069), ('revenue', 0.068), ('equilibrium', 0.067), ('mdps', 0.064), ('hai', 0.062), ('mvcg', 0.062), ('sparsesampling', 0.062), ('reward', 0.06), ('ss', 0.06), ('actions', 0.057), ('payoff', 0.055), ('incentive', 0.054), ('maxk', 0.054), ('private', 0.05), ('channels', 0.049), ('approximate', 0.048), ('incentives', 0.047), ('stalling', 0.047), ('truthful', 0.047), ('dmax', 0.046), ('price', 0.046), ('value', 0.046), ('utility', 0.044), ('decision', 0.044), ('rev', 0.044), ('periods', 0.043), ('horizon', 0.041), ('implements', 0.039), ('fixed', 0.038), ('delayed', 0.038), ('md', 0.037), ('design', 0.037), ('centralized', 0.037), ('implement', 0.037), ('compatible', 0.037), ('optimal', 0.036), ('kearns', 0.033), ('collects', 0.033), ('satinder', 0.033), ('reported', 0.032), ('states', 0.031), ('auctions', 0.031), ('bayesiannash', 0.031), ('bidding', 0.031), ('bne', 0.031), ('commerce', 0.031), ('counterfactual', 0.031), ('indifferent', 0.031), ('yanovsky', 0.031), ('arrival', 0.031), ('expected', 0.031), ('arrive', 0.031), ('reports', 0.03), ('arrives', 0.03), ('ai', 0.029), ('outcomes', 0.029), ('lavi', 0.027), ('commitment', 0.027), ('depart', 0.027), ('sel', 0.027), ('supposes', 0.027), ('time', 0.027), ('economic', 0.027), ('singh', 0.027), ('nes', 0.026), ('node', 0.025), ('allocated', 0.025), ('harvard', 0.025), ('lookahead', 0.025), ('mf', 0.025), ('sampling', 0.024), ('types', 0.024), ('channel', 0.024), ('departs', 0.023), ('arrived', 0.023), ('multiagent', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 24 nips-2004-Approximately Efficient Online Mechanism Design
Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh
Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1
2 0.31069431 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
Author: K. Wong, S. W. Lim, Z. Gao
Abstract: We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data. 1
3 0.21621557 64 nips-2004-Experts in a Markov Decision Process
Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour
Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1
4 0.18269548 88 nips-2004-Intrinsically Motivated Reinforcement Learning
Author: Nuttapong Chentanez, Andrew G. Barto, Satinder P. Singh
Abstract: Psychologists call behavior intrinsically motivated when it is engaged in for its own sake rather than as a step toward solving a specific problem of clear practical value. But what we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy. 1
5 0.16437405 39 nips-2004-Coarticulation in Markov Decision Processes
Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen
Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain. 1
6 0.14868015 154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors
7 0.12399264 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
8 0.12198999 175 nips-2004-Stable adaptive control with online learning
9 0.10765515 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
10 0.095775507 202 nips-2004-VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
11 0.092144698 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
12 0.091463275 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
13 0.086607553 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
14 0.085201003 102 nips-2004-Learning first-order Markov models for control
15 0.080795027 48 nips-2004-Convergence and No-Regret in Multiagent Learning
16 0.071458422 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
17 0.064112335 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
18 0.062034551 155 nips-2004-Responding to Modalities with Different Latencies
19 0.051299755 57 nips-2004-Economic Properties of Social Networks
20 0.051032599 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
topicId topicWeight
[(0, -0.137), (1, -0.056), (2, 0.365), (3, -0.08), (4, -0.194), (5, 0.223), (6, -0.059), (7, -0.096), (8, -0.046), (9, 0.041), (10, 0.003), (11, 0.009), (12, -0.054), (13, 0.004), (14, -0.046), (15, 0.028), (16, -0.082), (17, -0.051), (18, 0.065), (19, 0.071), (20, 0.114), (21, -0.019), (22, 0.003), (23, 0.159), (24, 0.051), (25, -0.045), (26, 0.179), (27, -0.112), (28, -0.048), (29, -0.132), (30, 0.053), (31, 0.046), (32, 0.018), (33, 0.007), (34, 0.131), (35, 0.154), (36, 0.016), (37, 0.096), (38, -0.077), (39, -0.081), (40, -0.014), (41, -0.084), (42, 0.041), (43, 0.05), (44, 0.001), (45, 0.022), (46, -0.009), (47, 0.044), (48, 0.085), (49, 0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.97866267 24 nips-2004-Approximately Efficient Online Mechanism Design
Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh
Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1
2 0.8958621 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
Author: K. Wong, S. W. Lim, Z. Gao
Abstract: We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data. 1
3 0.72387064 39 nips-2004-Coarticulation in Markov Decision Processes
Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen
Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain. 1
4 0.58605886 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
Author: Rob Powers, Yoav Shoham
Abstract: We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm’s payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms. 1
5 0.56674153 154 nips-2004-Resolving Perceptual Aliasing In The Presence Of Noisy Sensors
Author: Guy Shani, Ronen I. Brafman
Abstract: Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing – i.e., different states that appear similar but require different responses. This problem is exacerbated when the agent's sensors are noisy, i.e., sensors may produce different observations in the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, such as Utile Suffix Memory, finite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile Suffix Memory (NUSM), based on USM, that uses a weighted classification of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.
6 0.489274 88 nips-2004-Intrinsically Motivated Reinforcement Learning
7 0.47615498 64 nips-2004-Experts in a Markov Decision Process
8 0.43064895 175 nips-2004-Stable adaptive control with online learning
9 0.37470341 171 nips-2004-Solitaire: Man Versus Machine
10 0.35692447 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
11 0.35529801 202 nips-2004-VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
12 0.31965762 155 nips-2004-Responding to Modalities with Different Latencies
13 0.31326291 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
14 0.28977823 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models
15 0.26992136 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
16 0.26905233 75 nips-2004-Heuristics for Ordering Cue Search in Decision Making
17 0.2680856 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
18 0.23153555 122 nips-2004-Modelling Uncertainty in the Game of Go
19 0.21702 102 nips-2004-Learning first-order Markov models for control
20 0.21464469 57 nips-2004-Economic Properties of Social Networks
topicId topicWeight
[(13, 0.068), (15, 0.106), (26, 0.051), (31, 0.018), (33, 0.189), (35, 0.014), (50, 0.02), (71, 0.386), (77, 0.02)]
simIndex simValue paperId paperTitle
1 0.8356241 83 nips-2004-Incremental Learning for Visual Tracking
Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang
Abstract: Most existing tracking algorithms construct a representation of a target object prior to the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1
same-paper 2 0.81730759 24 nips-2004-Approximately Efficient Online Mechanism Design
Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh
Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1
3 0.67892826 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
Author: Laurent Zwald, Gilles Blanchard, Pascal Massart, Régis Vert
Abstract: This paper investigates the effect of Kernel Principal Component Analysis (KPCA) within the classification framework, essentially the regularization properties of this dimensionality reduction method. KPCA has been previously used as a pre-processing step before applying an SVM but we point out that this method is somewhat redundant from a regularization point of view and we propose a new algorithm called Kernel Projection Machine to avoid this redundancy, based on an analogy with the statistical framework of regression for a Gaussian white noise model. Preliminary experimental results show that this algorithm reaches the same performances as an SVM. 1
4 0.63660848 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang
Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.
5 0.56991553 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
Author: Tim K. Marks, J. C. Roddey, Javier R. Movellan, John R. Hershey
Abstract: We present a generative model and stochastic filtering algorithm for simultaneous tracking of 3D position and orientation, non-rigid motion, object texture, and background texture using a single camera. We show that the solution to this problem is formally equivalent to stochastic filtering of conditionally Gaussian processes, a problem for which well known approaches exist [3, 8]. We propose an approach based on Monte Carlo sampling of the nonlinear component of the process (object motion) and exact filtering of the object and background textures given the sampled motion. The smoothness of image sequences in time and space is exploited by using Laplace's method to generate proposal distributions for importance sampling [7]. The resulting inference algorithm encompasses both optic flow and template-based tracking as special cases, and elucidates the conditions under which these methods are optimal. We demonstrate an application of the system to 3D non-rigid face tracking. 1
6 0.54338264 73 nips-2004-Generative Affine Localisation and Tracking
7 0.53628135 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
8 0.53433514 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
9 0.53189594 64 nips-2004-Experts in a Markov Decision Process
10 0.5253405 40 nips-2004-Common-Frame Model for Object Recognition
11 0.51786619 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
12 0.51606643 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models
13 0.51413316 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
14 0.51290971 161 nips-2004-Self-Tuning Spectral Clustering
15 0.51161885 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
16 0.51115555 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
17 0.51102191 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
18 0.51094633 44 nips-2004-Conditional Random Fields for Object Recognition
19 0.51079541 77 nips-2004-Hierarchical Clustering of a Mixture Model
20 0.51061738 207 nips-2004-ℓ₀-norm Minimization for Basis Selection