nips nips2011 nips2011-79 knowledge-graph by maker-knowledge-mining

79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs


Source: pdf

Author: João V. Messias, Matthijs Spaan, Pedro U. Lima

Abstract: Factored Decentralized Partially Observable Markov Decision Processes (DecPOMDPs) form a powerful framework for multiagent planning under uncertainty, but optimal solutions require a rigid history-based policy representation. In this paper we allow inter-agent communication which turns the problem in a centralized Multiagent POMDP (MPOMDP). We map belief distributions over state factors to an agent’s local actions by exploiting structure in the joint MPOMDP policy. The key point is that when sparse dependencies between the agents’ decisions exist, often the belief over its local state factors is sufficient for an agent to unequivocally identify the optimal action, and communication can be avoided. We formalize these notions by casting the problem into convex optimization form, and present experimental results illustrating the savings in communication that we can obtain.

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 pt Abstract Factored Decentralized Partially Observable Markov Decision Processes (DecPOMDPs) form a powerful framework for multiagent planning under uncertainty, but optimal solutions require a rigid history-based policy representation. [sent-15, score-0.332]

2 In this paper we allow inter-agent communication which turns the problem in a centralized Multiagent POMDP (MPOMDP). [sent-16, score-0.307]

3 We map belief distributions over state factors to an agent’s local actions by exploiting structure in the joint MPOMDP policy. [sent-17, score-0.684]

4 The key point is that when sparse dependencies between the agents’ decisions exist, often the belief over its local state factors is sufficient for an agent to unequivocally identify the optimal action, and communication can be avoided. [sent-18, score-1.193]

5 We formalize these notions by casting the problem into convex optimization form, and present experimental results illustrating the savings in communication that we can obtain. [sent-19, score-0.309]

6 1 Introduction Intelligent decision making in real-world scenarios requires an agent to take into account its limitations in sensing and actuation. [sent-20, score-0.4]

7 When multiple agents interact and cooperate in the same environment, the optimal decision-making problem is particularly challenging. [sent-22, score-0.209]

8 For an agent in isolation, planning under uncertainty has been studied using decisiontheoretic models like Partially Observable Markov Decision Processes (POMDPs) [4]. [sent-23, score-0.438]

9 Our focus is on multiagent techniques, building on the factored Multiagent POMDP model. [sent-24, score-0.36]

10 The major source of intractability for optimal Dec-POMDP solvers is that they typically reason over all possible histories of observations other agents can receive. [sent-26, score-0.239]

11 In this work, we consider factored Dec-POMDPs in which communication between agents is possible, which has already been explored for non-factored models [10, 11, 15, 13] as well as for factored Dec-MDPs [12]. [sent-27, score-0.772]

12 When agents share their observations at each time step, the decentralized problem reduces to a centralized one, known as a Multiagent POMDP (MPOMDP) [10]. [sent-28, score-0.335]

13 In this work, we develop individual policies which map beliefs over state factors to actions or communication decisions. [sent-29, score-0.62]

14 1 Maintaining an exact, factorized belief state is typically not possible in cooperative problems. [sent-30, score-0.307]

15 Intuitively, even a small difference in belief can lead to a different action being taken. [sent-32, score-0.408]

16 However, when sparse dependencies between the agents’ decisions exist, often the belief over its local state factors is sufficient for an agent to identify the action that it should take, and communication can be avoided. [sent-33, score-1.313]

17 We formalize these notions as convex optimization problems, extracting those situations in which communication is superfluous. [sent-34, score-0.265]

18 We present experimental results showing the savings in communication that we can obtain, and the overall impact on decision quality. [sent-35, score-0.379]

19 Section 3 presents the formalization of our method to associate belief points over state factors to actions. [sent-38, score-0.447]

20 Di will be used to refer to agent i; S = ×i Xi , i = 1, . [sent-45, score-0.309]

21 , nf is the state space, decomposable into nf factors Xi ∈ {1, . [sent-48, score-0.331]

22 At each step, every agent i takes an individual action ai ∈ Ai , resulting in the joint action a = a1 , . [sent-59, score-0.709]

23 Existing methods for factored Dec-POMDPs can partition the decision problem across local subsets of agents, due to the possible independence between their actions and observations [8]. [sent-72, score-0.348]

24 Note that this does not preclude the existence of state factors which are common to multiple agents. [sent-74, score-0.169]

25 The possibility of exchanging information between agents greatly influences the overall complexity of solving a Dec-POMDP. [sent-75, score-0.209]

26 In a fully communicative Dec-POMDP, the decentralized model can be reduced to a centralized one, the so-called Multiagent POMDP (MPOMDP) [10]. [sent-76, score-0.163]

27 In a Dec-POMDP, at each t an agent i knows only ai and oi , while in an MPOMDP, it is assumed to know a and o. [sent-78, score-0.382]

28 In the latter case, inter-agent communication is necessary to share the local observations. [sent-79, score-0.326]

29 It is well-known that, for a given decision step t, the value function V t of a POMDP is a piecewise linear, convex function [4], which can be represented as V t (bt ) = max αT · bt t α∈Γ , (1) where Γt is a set of vectors (traditionally referred to as α-vectors). [sent-82, score-0.17]

30 Every α ∈ Γt has a particular joint action a associated to it, which we will denote as φ(α). [sent-83, score-0.221]

31 A joint belief state is a probability distribution over the set of states S, and encodes all of the information gathered by all agents in the Dec-POMDP up to a given time t: bt (s) = Pr(st |ot−1 , at−1 , ot−2 , at−2 , . [sent-88, score-0.652]

32 A belief point over factors L which are locally available to the agent will be denoted bL . [sent-95, score-0.673]

33 The amount of communication required to make this possible can then become problematically large. [sent-99, score-0.265]

34 Even if agents can communicate with each other freely, they might not need to always do so in order to act independently, or even cooperatively. [sent-101, score-0.319]

35 The problem of when and what to communicate has been studied before for Dec-MDPs [12], where factors can be directly observed with no associated uncertainty, by reasoning over the possible local alternative actions to a particular assignment of observable state features. [sent-102, score-0.41]

36 For MPOMDPs, this had been approximated at runtime, but implied keeping track and reasoning over a rapidly-growing number of possible joint belief points [11]. [sent-103, score-0.315]

37 We will describe a method to map a belief factor (or several factors) directly to a local action, or to a communication decision, when applicable. [sent-104, score-0.626]

38 Our approach is the first to exploit, offline, the structure of the value function itself in order to identify regions of belief space where an agent may act independently. [sent-105, score-0.657]

39 1 Decision-making with factored beliefs Note that, as fully described in [2], the factorization (4) typically results in an approximation of the true joint belief, since it is seldom possible to decouple the dynamics of a MDP into strictly independent subprocesses. [sent-109, score-0.246]

40 The dependencies between factors, induced by the transition and observation model of the joint process, quickly develop correlations when the horizon of the decision problem is increased, even if these dependencies are sparse. [sent-110, score-0.246]

41 Still, it was proven in [2] that, if some of these dependencies are broken, the resulting error (measured as the KL-divergence) of the factored belief state, with respect to the true joint belief, is bounded. [sent-111, score-0.507]

42 Unfortunately, even a small error in the belief state can lead to different actions being selected, which may significantly affect the decision quality of the multiagent team in some settings [5, 9]. [sent-112, score-0.691]

43 Each belief factor’s dynamics can be described using a two-stage Dynamic Bayesian Network (DBN). [sent-116, score-0.251]

44 For an agent to maintain, at each time step, a set of belief factors, it must have access to the state factors contained in a particular time slice of the respective DBNs. [sent-117, score-0.757]

45 In the latter case, it may be necessary to perform additional communication in order to keep belief factors consistent. [sent-119, score-0.629]

46 We will not be here concerned with the problem of obtaining a suitable partition scheme of the joint belief onto its factors. [sent-121, score-0.367]

47 Instead we will focus on the amount of communication which is necessary for the joint decision-making of the multi-agent team. [sent-123, score-0.329]

48 1 Value Bounds Over Local Belief Space Recall that, for a given α-vector, Vα (b) = α · b represents the expected reward for selecting the action associated with α. [sent-132, score-0.183]

49 Ideally, if this quantity could be mapped from a local belief point bL , then it would be possible to select the best action for an agent based only on its local information. [sent-133, score-0.839]

50 However, as we will show, it is possible to obtain bounds on the achievable value of any given vector, in local belief space. [sent-135, score-0.423]

51 Let m be size of the local belief factor which contains bL . [sent-140, score-0.312]

52 The maximum achievable value for a local belief point, bL , according to α, is: Vα (bL ) = β + γ · bL + δ . [sent-152, score-0.377]

53 By obtaining the bounds (9) and (10), we have taken a step towards identifying the correct action for an agent to take, based on the local information contained in bL . [sent-168, score-0.627]

54 That action can be safely selected without needing to communicate. [sent-170, score-0.191]

55 The complexity of obtaining the local value bounds for a given value function is basically that of reducing the system (7) for each vector. [sent-171, score-0.189]

56 Note that the dominant term corresponds to the size of the local belief factor, which is usually exponentially smaller than n. [sent-173, score-0.312]

57 2 Dealing With Locally Ambiguous Actions The definition of the value bounds (9) and (10) only allows an agent to act in atypical situations in which an action is clearly dominant in terms of expected value. [sent-178, score-0.582]

58 However, this is often not the case, particularly when considering a large decision horizon, since the present effects of any given action on the overall expected reward are typically not pronounced enough for these considerations to be practical. [sent-179, score-0.253]

59 Vα (bL ) > Vα′ (bL ) and Vα (bL ) < Vα′ (bL )), an agent is forced to further reason about which of those actions is best. [sent-182, score-0.377]

60 , |Γa | maximize Γa b − s i subject to Ab 1k s X ML b b = bL 0n 1T b n =1 (11) If the solution bopt to each of these LPs is such that maxi (Abopt )i ≥ maxj (A′ bopt )j , then action a can be safely selected based on bL . [sent-199, score-0.332]

61 If this is not the case for any of the solutions, then it is not possible to map the agent’s best action solely through bL . [sent-200, score-0.206]

62 Otherwise, the remaining belief factor should be requested from other agents, in order to reconstruct b through (4), and map that agent’s action through the joint policy. [sent-205, score-0.521]

63 Let us consider the general problem with nF belief factors contained in the set F . [sent-207, score-0.392]

64 In this case there are 2|F |−1 combinations of non-local factors which the agent can request. [sent-208, score-0.422]

65 Central to this process is the ability to quickly determine, for a given set of belief factors G ⊆ F , if there are no points in bG with non-decidable actions. [sent-210, score-0.364]

66 This implies that, in order to select an action unambiguously from the belief over G, no such solution may be possible. [sent-213, score-0.408]

67 Equipped with this result, we can now formulate a general procedure that, for a set of belief points in local space, returns the corresponding belief factors which must be communicated in order for an agent to act unambiguously. [sent-214, score-1.059]

68 We refer to this as obtaining the communication map for the problem. [sent-215, score-0.34]

69 This procedure is as follows (a more detailed version is included in [6]): we begin by computing the value bounds of V over local factors L, and sampling N reachable local belief points bL ; for each of these points, if the value bounds of the best action are not conflicting (see Section 3. [sent-216, score-0.791]

70 During execution, an agent updates its local information bL , finds the nearest neighbor point in the communication map, and requests the corresponding factors from the other agents. [sent-219, score-0.748]

71 The agent then selects the action which exhibits the highest maximum value bound given the resulting information. [sent-220, score-0.494]

72 4 Experiments We now analyze the results of applying the aforementioned offline communication mapping process to three different MPOMDP environments, each with a different degrees of interdependency between agents. [sent-231, score-0.265]

73 In this world each agent is confined to a two-state area. [sent-233, score-0.309]

74 One of the agents possesses a package which it must hand over to the other agent, through the non-traversable opening between the rooms L1 and R1. [sent-234, score-0.29]

75 Each agent can move randomly inside its own room (a Shuffle action), Exchange the package with the other agent, or Sense its environment in order to find the opening. [sent-235, score-0.413]

76 An Exchange is only successful if both agents are in the correct position (L1, R1) and if both agents perform this action at the same time, which makes it the only available cooperative action. [sent-236, score-0.575]

77 The fact that, in this problem, each belief factor is twodimensional (each factor spans one of the rooms) allows us to visualize the results of our method. [sent-237, score-0.251]

78 In Figure 2, we see that some of the agent’s expected behavior is already contained in the value bounds over its local factor: if an agent is certain of being in room R1 (i. [sent-238, score-0.501]

79 (bX1 )1 = 0), then the action with the highest-valued bound is Shuffle. [sent-240, score-0.157]

80 Likewise, an Exchange should only be carried out when the agent is certain of being in L1, but it is an ambiguous action since the agent needs to be sure that its teammate can cooperate. [sent-241, score-0.775]

81 In Figure 1c we represent the communication map which was obtained offline through the proposed algorithm. [sent-242, score-0.314]

82 Since there are only two factors, the agent only needs to make a binary decision of whether or not to communicate for a given local belief point. [sent-243, score-0.759]

83 The belief points considered safe are marked as 0, and those associated with a communication decision are marked as 1. [sent-244, score-0.611]

84 In terms of quantitative results, we see that ∼ 30 − 40% of communication episodes are avoided in this simple example, without a significant loss of collected reward. [sent-245, score-0.265]

85 In this 49-state world, two agents lie inside opposite rooms, akin to the Relay-Small problem, but each agent has the goal of moving to the other room. [sent-247, score-0.518]

86 There is only one common passage between both rooms, where the agents may collide. [sent-248, score-0.209]

87 For shorter-horizon solutions, agents may not be able to reach their goal, and they communicate so as to minimize negative reward (collisions). [sent-249, score-0.303]

88 For the infinitehorizon case, however, typically only one of the agents communicates, while waiting for its partner to clear the passage. [sent-250, score-0.209]

89 Note that this relationship between the problem’s horizon and the amount of communication savings does not hold for all of the problems. [sent-251, score-0.335]

90 The proposed method exploits the invariance of local policies over subsets of the joint belief space, and this may arbitrarily change with the problem’s horizon. [sent-252, score-0.412]

91 This is an adaptation of the Relay-Small problem (aptly named Relay-Large) to a setting in which each room has four different states, and each agent may be carrying a package at a given time. [sent-254, score-0.363]

92 For settings assuming full and reduced communication, we show empirical control quality, online communication usage. [sent-292, score-0.265]

93 Here, since the agents can act independently for a longer time, the communication savings are more pronounced, as shown in Table 1. [sent-325, score-0.56]

94 5 Conclusions and Future Work Traditional multiagent planning on partially observable environments mostly deals with fullycommunicative or non-communicative situations. [sent-328, score-0.371]

95 For a more realistic scenario where communication should be used only when necessary, state-of-the-art methods are only capable of approximating the optimal policy at run-time [11, 15]. [sent-329, score-0.293]

96 Here, we have analyzed the properties of MPOMDP models which can be exploited in order to increase the efficiency of communication between agents. [sent-330, score-0.265]

97 The complexity of decentralized control of Markov decision processes. [sent-343, score-0.154]

98 Approximate planning for factored POMDPs using belief state simplification. [sent-367, score-0.549]

99 The communicative multiagent team decision problem: Analyzing teamwork theories and models. [sent-403, score-0.353]

100 Exploiting factored representations for decentralized execution in multi-agent teams. [sent-413, score-0.233]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bl', 0.568), ('agent', 0.309), ('communication', 0.265), ('belief', 0.251), ('mpomdp', 0.221), ('multiagent', 0.211), ('agents', 0.209), ('action', 0.157), ('factored', 0.149), ('pomdp', 0.121), ('factors', 0.113), ('planning', 0.093), ('decentralized', 0.084), ('nf', 0.081), ('spaan', 0.081), ('pomdps', 0.076), ('matthijs', 0.074), ('bt', 0.072), ('decision', 0.07), ('actions', 0.068), ('communicate', 0.068), ('ml', 0.067), ('icting', 0.066), ('joint', 0.064), ('local', 0.061), ('pr', 0.06), ('bf', 0.059), ('state', 0.056), ('rooms', 0.056), ('frans', 0.055), ('onedoor', 0.055), ('xnf', 0.055), ('ine', 0.053), ('oi', 0.051), ('environment', 0.05), ('map', 0.049), ('lps', 0.048), ('nikos', 0.048), ('oliehoek', 0.048), ('ab', 0.047), ('bounds', 0.046), ('exchange', 0.045), ('savings', 0.044), ('observable', 0.044), ('dependencies', 0.043), ('act', 0.042), ('centralized', 0.042), ('maxj', 0.041), ('lp', 0.039), ('achievable', 0.037), ('bopt', 0.037), ('communicative', 0.037), ('delft', 0.037), ('messias', 0.037), ('unequivocally', 0.037), ('uncertainty', 0.036), ('policies', 0.036), ('team', 0.035), ('bm', 0.035), ('safely', 0.034), ('beliefs', 0.033), ('bg', 0.032), ('shlomo', 0.032), ('lisbon', 0.032), ('communicated', 0.032), ('cnico', 0.032), ('perseus', 0.032), ('relay', 0.032), ('bj', 0.032), ('mf', 0.031), ('decisions', 0.031), ('solvers', 0.03), ('mg', 0.03), ('simmons', 0.03), ('portugal', 0.03), ('room', 0.029), ('value', 0.028), ('contained', 0.028), ('shuf', 0.028), ('instituto', 0.028), ('policy', 0.028), ('ii', 0.028), ('identify', 0.027), ('associate', 0.027), ('obtaining', 0.026), ('onto', 0.026), ('reward', 0.026), ('maxi', 0.026), ('horizon', 0.026), ('scheduling', 0.025), ('safe', 0.025), ('package', 0.025), ('multi', 0.024), ('disambiguate', 0.024), ('partially', 0.023), ('ai', 0.022), ('exploiting', 0.022), ('ot', 0.022), ('scenarios', 0.021), ('executed', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

Author: João V. Messias, Matthijs Spaan, Pedro U. Lima

Abstract: Factored Decentralized Partially Observable Markov Decision Processes (DecPOMDPs) form a powerful framework for multiagent planning under uncertainty, but optimal solutions require a rigid history-based policy representation. In this paper we allow inter-agent communication which turns the problem in a centralized Multiagent POMDP (MPOMDP). We map belief distributions over state factors to an agent’s local actions by exploiting structure in the joint MPOMDP policy. The key point is that when sparse dependencies between the agents’ decisions exist, often the belief over its local state factors is sufficient for an agent to unequivocally identify the optimal action, and communication can be avoided. We formalize these notions by casting the problem into convex optimization form, and present experimental results illustrating the savings in communication that we can obtain.

2 0.19379093 174 nips-2011-Monte Carlo Value Iteration with Macro-Actions

Author: Zhan Lim, Lee Sun, Daniel J. Hsu

Abstract: POMDP planning faces two major computational challenges: large state spaces and long planning horizons. The recently introduced Monte Carlo Value Iteration (MCVI) can tackle POMDPs with very large discrete state spaces or continuous state spaces, but its performance degrades when faced with long planning horizons. This paper presents Macro-MCVI, which extends MCVI by exploiting macro-actions for temporal abstraction. We provide sufficient conditions for Macro-MCVI to inherit the good theoretical properties of MCVI. Macro-MCVI does not require explicit construction of probabilistic models for macro-actions and is thus easy to apply in practice. Experiments show that Macro-MCVI substantially improves the performance of MCVI with suitable macro-actions. 1

3 0.16626558 212 nips-2011-Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning

Author: Joni K. Pajarinen, Jaakko Peltonen

Abstract: Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinitehorizon policy. This policy is optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than with regular FSCs.

4 0.140789 52 nips-2011-Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

Author: Scott Niekum, Andrew G. Barto

Abstract: Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal. We show that these problems can be overcome by clustering subgoal data defined in an agent-space and using the resulting clusters as templates for skill termination conditions. Clustering via a Dirichlet process mixture model is used to discover a minimal, sufficient collection of portable skills. 1

5 0.11989228 48 nips-2011-Blending Autonomous Exploration and Apprenticeship Learning

Author: Thomas J. Walsh, Daniel K. Hewlett, Clayton T. Morrison

Abstract: We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change is replacing previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change is introducing a communication of expected utility from the student to the teacher. The resulting system only uses teacher traces when the agent needs to learn concepts it cannot efficiently learn on its own. 1

6 0.088709377 41 nips-2011-Autonomous Learning of Action Models for Planning

7 0.076968201 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

8 0.075110368 158 nips-2011-Learning unbelievable probabilities

9 0.073424488 215 nips-2011-Policy Gradient Coagent Networks

10 0.071789369 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

11 0.071549989 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems

12 0.068402849 25 nips-2011-Adaptive Hedge

13 0.068168007 18 nips-2011-Action-Gap Phenomenon in Reinforcement Learning

14 0.066958241 88 nips-2011-Environmental statistics and the trade-off between model-based and TD learning in humans

15 0.066577874 72 nips-2011-Distributed Delayed Stochastic Optimization

16 0.062193222 163 nips-2011-MAP Inference for Bayesian Inverse Reinforcement Learning

17 0.056524634 87 nips-2011-Energetically Optimal Action Potentials

18 0.051759087 154 nips-2011-Learning person-object interactions for action recognition in still images

19 0.051751234 16 nips-2011-A reinterpretation of the policy oscillation phenomenon in approximate policy iteration

20 0.051657241 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.14), (1, -0.097), (2, 0.046), (3, 0.122), (4, -0.095), (5, 0.011), (6, -0.019), (7, -0.024), (8, -0.001), (9, -0.036), (10, -0.01), (11, -0.01), (12, -0.087), (13, -0.074), (14, -0.001), (15, -0.132), (16, 0.118), (17, 0.074), (18, 0.042), (19, -0.027), (20, 0.097), (21, -0.027), (22, 0.013), (23, -0.027), (24, 0.177), (25, 0.111), (26, -0.111), (27, 0.078), (28, -0.07), (29, -0.065), (30, -0.096), (31, -0.071), (32, -0.109), (33, -0.013), (34, 0.11), (35, 0.003), (36, 0.109), (37, -0.029), (38, -0.066), (39, -0.147), (40, 0.075), (41, 0.095), (42, 0.055), (43, 0.028), (44, 0.054), (45, -0.023), (46, 0.031), (47, -0.053), (48, 0.118), (49, -0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96665043 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

Author: João V. Messias, Matthijs Spaan, Pedro U. Lima

Abstract: Factored Decentralized Partially Observable Markov Decision Processes (DecPOMDPs) form a powerful framework for multiagent planning under uncertainty, but optimal solutions require a rigid history-based policy representation. In this paper we allow inter-agent communication which turns the problem in a centralized Multiagent POMDP (MPOMDP). We map belief distributions over state factors to an agent’s local actions by exploiting structure in the joint MPOMDP policy. The key point is that when sparse dependencies between the agents’ decisions exist, often the belief over its local state factors is sufficient for an agent to unequivocally identify the optimal action, and communication can be avoided. We formalize these notions by casting the problem into convex optimization form, and present experimental results illustrating the savings in communication that we can obtain.

2 0.69654089 41 nips-2011-Autonomous Learning of Action Models for Planning

Author: Neville Mehta, Prasad Tadepalli, Alan Fern

Abstract: This paper introduces two new frameworks for learning action models for planning. In the mistake-bounded planning framework, the learner has access to a planner for the given model representation, a simulator, and a planning problem generator, and aims to learn a model with at most a polynomial number of faulty plans. In the planned exploration framework, the learner does not have access to a problem generator and must instead design its own problems, plan for them, and converge with at most a polynomial number of planning attempts. The paper reduces learning in these frameworks to concept learning with one-sided error and provides algorithms for successful learning in both frameworks. A specific family of hypothesis spaces is shown to be efficiently learnable in both the frameworks. 1

3 0.69452685 48 nips-2011-Blending Autonomous Exploration and Apprenticeship Learning

Author: Thomas J. Walsh, Daniel K. Hewlett, Clayton T. Morrison

Abstract: We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change is replacing previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change is introducing a communication of expected utility from the student to the teacher. The resulting system only uses teacher traces when the agent needs to learn concepts it cannot efficiently learn on its own. 1

4 0.66121644 174 nips-2011-Monte Carlo Value Iteration with Macro-Actions

Author: Zhan Lim, Lee Sun, Daniel J. Hsu

Abstract: POMDP planning faces two major computational challenges: large state spaces and long planning horizons. The recently introduced Monte Carlo Value Iteration (MCVI) can tackle POMDPs with very large discrete state spaces or continuous state spaces, but its performance degrades when faced with long planning horizons. This paper presents Macro-MCVI, which extends MCVI by exploiting macro-actions for temporal abstraction. We provide sufficient conditions for Macro-MCVI to inherit the good theoretical properties of MCVI. Macro-MCVI does not require explicit construction of probabilistic models for macro-actions and is thus easy to apply in practice. Experiments show that Macro-MCVI substantially improves the performance of MCVI with suitable macro-actions. 1

5 0.65373933 52 nips-2011-Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

Author: Scott Niekum, Andrew G. Barto

Abstract: Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal. We show that these problems can be overcome by clustering subgoal data defined in an agent-space and using the resulting clusters as templates for skill termination conditions. Clustering via a Dirichlet process mixture model is used to discover a minimal, sufficient collection of portable skills. 1

6 0.62491071 212 nips-2011-Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning

7 0.4844273 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

8 0.46422696 237 nips-2011-Reinforcement Learning using Kernel-Based Stochastic Factorization

9 0.39724597 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

10 0.39552215 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems

11 0.38993293 50 nips-2011-Budgeted Optimization with Concurrent Stochastic-Duration Experiments

12 0.38792169 25 nips-2011-Adaptive Hedge

13 0.38419953 256 nips-2011-Solving Decision Problems with Limited Information

14 0.35862789 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

15 0.35692608 3 nips-2011-A Collaborative Mechanism for Crowdsourcing Prediction Problems

16 0.34110606 87 nips-2011-Energetically Optimal Action Potentials

17 0.33681956 158 nips-2011-Learning unbelievable probabilities

18 0.3244555 218 nips-2011-Predicting Dynamic Difficulty

19 0.31938511 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding

20 0.29632434 245 nips-2011-Selecting the State-Representation in Reinforcement Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.029), (4, 0.043), (20, 0.025), (26, 0.028), (31, 0.121), (33, 0.032), (37, 0.031), (43, 0.045), (45, 0.074), (53, 0.287), (57, 0.031), (65, 0.012), (74, 0.07), (83, 0.039), (86, 0.01), (99, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.78369409 256 nips-2011-Solving Decision Problems with Limited Information

Author: Denis D. Maua, Cassio Campos

Abstract: We present a new algorithm for exactly solving decision-making problems represented as an influence diagram. We do not require the usual assumptions of no forgetting and regularity, which allows us to solve problems with limited information. The algorithm, which implements a sophisticated variable elimination procedure, is empirically shown to outperform a state-of-the-art algorithm in randomly generated problems of up to 150 variables and 1064 strategies. 1

2 0.78270382 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies

Author: Jasmina Bogojeska

Abstract: This paper presents an approach that predicts the effectiveness of HIV combination therapies by simultaneously addressing several problems affecting the available HIV clinical data sets: the different treatment backgrounds of the samples, the uneven representation of the levels of therapy experience, the missing treatment history information, the uneven therapy representation and the unbalanced therapy outcome representation. The computational validation on clinical data shows that, compared to the most commonly used approach that does not account for the issues mentioned above, our model has significantly higher predictive power. This is especially true for samples stemming from patients with longer treatment history and samples associated with rare therapies. Furthermore, our approach is at least as powerful for the remaining samples. 1

same-paper 3 0.77823883 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

Author: João V. Messias, Matthijs Spaan, Pedro U. Lima

Abstract: Factored Decentralized Partially Observable Markov Decision Processes (DecPOMDPs) form a powerful framework for multiagent planning under uncertainty, but optimal solutions require a rigid history-based policy representation. In this paper we allow inter-agent communication which turns the problem in a centralized Multiagent POMDP (MPOMDP). We map belief distributions over state factors to an agent’s local actions by exploiting structure in the joint MPOMDP policy. The key point is that when sparse dependencies between the agents’ decisions exist, often the belief over its local state factors is sufficient for an agent to unequivocally identify the optimal action, and communication can be avoided. We formalize these notions by casting the problem into convex optimization form, and present experimental results illustrating the savings in communication that we can obtain.

4 0.69466984 5 nips-2011-A Denoising View of Matrix Completion

Author: Weiran Wang, Zhengdong Lu, Miguel Á. Carreira-Perpiñán

Abstract: In matrix completion, we are given a matrix where the values of only some of the entries are present, and we want to reconstruct the missing ones. Much work has focused on the assumption that the data matrix has low rank. We propose a more general assumption based on denoising, so that we expect that the value of a missing entry can be predicted from the values of neighboring points. We propose a nonparametric version of denoising based on local, iterated averaging with meanshift, possibly constrained to preserve local low-rank manifold structure. The few user parameters required (the denoising scale, number of neighbors and local dimensionality) and the number of iterations can be estimated by cross-validating the reconstruction error. Using our algorithms as a postprocessing step on an initial reconstruction (provided by e.g. a low-rank method), we show consistent improvements with synthetic, image and motion-capture data. Completing a matrix from a few given entries is a fundamental problem with many applications in machine learning, computer vision, network engineering, and data mining. Much interest in matrix completion has been caused by recent theoretical breakthroughs in compressed sensing [1, 2] as well as by the now celebrated Netflix challenge on practical prediction problems [3, 4]. Since completion of arbitrary matrices is not a well-posed problem, it is often assumed that the underlying matrix comes from a restricted class. Matrix completion models almost always assume a low-rank structure of the matrix, which is partially justified through factor models [4] and fast convex relaxation [2], and often works quite well when the observations are sparse and/or noisy. The low-rank structure of the matrix essentially asserts that all the column vectors (or the row vectors) live on a low-dimensional subspace. This assumption is arguably too restrictive for problems with richer structure, e.g. when each column of the matrix represents a snapshot of a seriously corrupted motion capture sequence (see section 3), for which a more flexible model, namely a curved manifold, is more appropriate. In this paper, we present a novel view of matrix completion based on manifold denoising, which conceptually generalizes the low-rank assumption to curved manifolds. Traditional manifold denoising is performed on fully observed data [5, 6], aiming to send the data corrupted by noise back to the correct surface (defined in some way). However, with a large proportion of missing entries, we may not have a good estimate of the manifold. Instead, we start with a poor estimate and improve it iteratively. Therefore the “noise” may be due not just to intrinsic noise, but mostly to inaccurately estimated missing entries. We show that our algorithm can be motivated from an objective purely based on denoising, and prove its convergence under some conditions. We then consider a more general case with a nonlinear low-dimensional manifold and use a stopping criterion that works successfully in practice. Our model reduces to a low-rank model when we require the manifold to be flat, showing a relation with a recent thread of matrix completion models based on alternating projection [7]. In our experiments, we show that our denoising-based matrix completion model can make better use of the latent manifold structure on both artificial and real-world data sets, and yields superior recovery of the missing entries. The paper is organized as follows: section 1 reviews nonparametric denoising methods based on mean-shift updates, section 2 extends this to matrix completion by using denoising with constraints, section 3 gives experimental results, and section 4 discusses related work. 1 1 Denoising with (manifold) blurring mean-shift algorithms (GBMS/MBMS) In Gaussian blurring mean-shift (GBMS), denoising is performed in a nonparametric way by local averaging: each data point moves to the average of its neighbors (to a certain scale), and the process is repeated. We follow the derivation in [8]. Consider a dataset {xn }N ⊂ RD and define a n=1 Gaussian kernel density estimate p(x) = 1 N N Gσ (x, xn ) (1) n=1 1 with bandwidth σ > 0 and kernel Gσ (x, xn ) ∝ exp − 2 ( x − xn /σ)2 (other kernels may be used, such as the Epanechnikov kernel, which results in sparse affinities). The (non-blurring) mean-shift algorithm rearranges the stationary point equation ∇p(x) = 0 into the iterative scheme x(τ +1) = f (x(τ ) ) with N x (τ +1) = f (x (τ ) p(n|x )= (τ ) )xn p(n|x (τ ) n=1 )= exp − 1 (x(τ ) − xn )/σ 2 N n′ =1 2 exp − 1 (x(τ ) − xn′ )/σ 2 2 . (2) This converges to a mode of p from almost every initial x ∈ RD , and can be seen as taking selfadapting step sizes along the gradient (since the mean shift f (x) − x is parallel to ∇p(x)). This iterative scheme was originally proposed by [9] and it or variations of it have found widespread application in clustering [8, 10–12] and denoising of 3D point sets (surface fairing; [13, 14]) and manifolds in general [5, 6]. The blurring mean-shift algorithm applies one step of the previous scheme, initialized from every point, in parallel for all points. That is, given the dataset X = {x1 , . . . , xN }, for each xn ∈ X ˜ we obtain a new point xn = f (xn ) by applying one step of the mean-shift algorithm, and then we ˜ replace X with the new dataset X, which is a blurred (shrunk) version of X. By iterating this process we obtain a sequence of datasets X(0) , X(1) , . . . (and a corresponding sequence of kernel density estimates p(0) (x), p(1) (x), . . .) where X(0) is the original dataset and X(τ ) is obtained by blurring X(τ −1) with one mean-shift step. We can see this process as maximizing the following objective function [10] by taking parallel steps of the form (2) for each point: N p(xn ) = E(X) = n=1 1 N N N 1 e− 2 Gσ (xn , xm ) ∝ xn −xm σ 2 . (3) n,m=1 n,m=1 This process eventually converges to a dataset X(∞) where all points are coincident: a completely denoised dataset where all structure has been erased. As shown by [8], this process can be stopped early to return clusters (= locally denoised subsets of points); the number of clusters obtained is controlled by the bandwidth σ. However, here we are interested in the denoising behavior of GBMS. ˜ The GBMS step can be formulated in a matrix form reminiscent of spectral clustering [8] as X = X P where X = (x1 , . . . , xN ) is a D×N matrix of data points; W is the N ×N matrix of Gaussian N affinities wnm = Gσ (xn , xm ); D = diag ( n=1 wnm ) is the degree matrix; and P = WD−1 is N an N × N stochastic matrix: pnm = p(n|xm ) ∈ (0, 1) and n=1 pnm = 1. P (or rather its transpose) is the stochastic matrix of the random walk in a graph [15], which in GBMS represents the posterior probabilities of each point under the kernel density estimate (1). P is similar to the 1 1 matrix N = D− 2 WD− 2 derived from the normalized graph Laplacian commonly used in spectral clustering, e.g. in the normalized cut [16]. Since, by the Perron-Frobenius theorem [17, ch. 8], all left eigenvalues of P(X) have magnitude less than 1 except for one that equals 1 and is associated with ˜ an eigenvector of constant entries, iterating X = X P(X) converges to the stationary distribution of each P(X), where all points coincide. ˜ From this point of view, the product X = X P(X) can be seen as filtering the dataset X with a datadependent low-pass filter P(X), which makes clear the denoising behavior. This also suggests using ˜ other filters [12] X = X φ(P(X)) as long as φ(1) = 1 and |φ(r)| < 1 for r ∈ [0, 1), such as explicit schemes φ(P) = (1 − η)I + ηP for η ∈ (0, 2], power schemes φ(P) = Pn for n = 1, 2, 3 . . . or implicit schemes φ(P) = ((1 + η)I − ηP)−1 for η > 0. One important problem with GBMS is that it denoises equally in all directions. When the data lies on a low-dimensional manifold, denoising orthogonally to it removes out-of-manifold noise, but 2 denoising tangentially to it perturbs intrinsic degrees of freedom of the data and causes shrinkage of the entire manifold (most strongly near its boundary). To prevent this, the manifold blurring meanshift algorithm (MBMS) [5] first computes a predictor averaging step with GBMS, and then for each point xn a corrector projective step removes the step direction that lies in the local tangent space of xn (obtained from local PCA run on its k nearest neighbors). In practice, both GBMS and MBMS must be stopped early to prevent excessive denoising and manifold distortions. 2 Blurring mean-shift denoising algorithms for matrix completion We consider the natural extension of GBMS to the matrix completion case by adding the constraints given by the present values. We use the subindex notation XM and XP to indicate selection of the missing or present values of the matrix XD×N , where P ⊂ U , M = U \ P and U = {(d, n): d = 1, . . . , D, n = 1, . . . , N }. The indices P and values XP of the present matrix entries are the data of the problem. Then we have the following constrained optimization problem: N Gσ (xn , xm ) max E(X) = X s.t. XP = XP . (4) n,m=1 This is similar to low-rank formulations for matrix completion that have the same constraints but use as objective function the reconstruction error with a low-rank assumption, e.g. X − ABX 2 with AD×L , BL×D and L < D. We initialize XM to the output of some other method for matrix completion, such as singular value projection (SVP; [7]). For simple constraints such as ours, gradient projection algorithms are attractive. The gradient of E wrt X is a matrix of D × N whose nth column is: ∇xn E(X) = 2 σ2 N 1 e− 2 xn −xm σ 2 N (xm − xn ) ∝ m=1 2 p(m|xn )xm p(xn ) −xn + σ2 m=1 (5) and its projection on the constraint space is given by zeroing its entries having indices in P; call ΠP this projection operator. Then, we have the following step of length α ≥ 0 along the projected gradient: (τ +1) X(τ +1) = X(τ ) + αΠP (∇X E(X(τ ) )) ⇐⇒ XM (τ ) = XM + α ΠP (∇X E(X(τ ) )) M (6) which updates only the missing entries XM . Since our search direction is ascent and makes an angle with the gradient that is bounded away from π/2, and E is lower bounded, continuously differentiable and has bounded Hessian (thus a Lipschitz continuous gradient) in RN L , by carrying out a line search that satisfies the Wolfe conditions, we are guaranteed convergence to a local stationary point, typically a maximizer [18, th. 3.2]. However, as reasoned later, we do not perform a line search at all, instead we fix the step size to the GBMS self-adapting step size, which results in a simple and faster algorithm consisting of carrying out a GBMS step on X (i.e., X(τ +1) = X(τ ) P(X(τ ) )) and then refilling XP to the present values. While we describe the algorithm in this way for ease of explanation, in practice we do not actually compute the GBMS step for all xdn values, but only for the missing ones, which is all we need. Thus, our algorithm carries out GBMS denoising steps within the missing-data subspace. We can derive this result in a different way by starting from N the unconstrained optimization problem maxXP E(X) = n,m=1 Gσ (xn , xm ) (equivalent to (4)), computing its gradient wrt XP , equating it to zero and rearranging (in the same way the mean-shift algorithm is derived) to obtain a fixed-point iteration identical to our update above. Fig. 1 shows the pseudocode for our denoising-based matrix completion algorithms (using three nonparametric denoising algorithms: GBMS, MBMS and LTP). Convergence and stopping criterion As noted above, we have guaranteed convergence by simply satisfying standard line search conditions, but a line search is costly. At present we do not have (τ +1) a proof that the GBMS step size satisfies such conditions, or indeed that the new iterate XM increases or leaves unchanged the objective, although we have never encountered a counterexample. In fact, it turns out that none of the work about GBMS that we know about proves that either: [10] proves that ∅(X(τ +1) ) ≤ ∅(X(τ ) ) for 0 < ρ < 1, where ∅(·) is the set diameter, while [8, 12] 3 notes that P(X) has a single eigenvalue of value 1 and all others of magnitued less than 1. While this shows that all points converge to the same location, which indeed is the global maximum of (3), it does not necessarily follow that each step decreases E. GBMS (k, σ) with full or k-nn graph: given XD×N , M repeat for n = 1, . . . , N Nn ← {1, . . . , N } (full graph) or k nearest neighbors of xn (k-nn graph) Gσ (xn ,xm ) mean-shift xm ∂xn ← −xn + m∈Nn step m′ ∈Nn Gσ (xn ,xm′ ) end XM ← XM + (∂X)M move points’ missing entries until validation error increases return X However, the question of convergence as τ → ∞ has no practical interest in a denoising setting, because achieving a total denoising almost never yields a good matrix completion. What we want is to achieve just enough denoising and stop the algorithm, as was the case with GBMS clustering, and as is the case in algorithms for image denoising. We propose to determine the optimal number of iterations, as well as the bandwidth σ and any other parameters, by cross-validation. Specifically, we select a held-out set by picking a random subset of the present entries and considering them as missing; this allows us to evaluate an error between our completion for them and the ground truth. We stop iterating when this error increases. MBMS (L, k, σ) with full or k-nn graph: given XD×N , M repeat for n = 1, . . . , N Nn ← {1, . . . , N } (full graph) or k nearest neighbors of xn (k-nn graph) Gσ (xn ,xm ) mean-shift xm ∂xn ← −xn + m∈Nn step m′ ∈Nn Gσ (xn ,xm′ ) Xn ← k nearest neighbors of xn (µn , Un ) ← PCA(Xn , L) estimate L-dim tangent space at xn subtract parallel motion ∂xn ← (I − Un UT )∂xn n end XM ← XM + (∂X)M move points’ missing entries until validation error increases return X This argument justifies an algorithmic, as opposed to an opLTP (L, k) with k-nn graph: given XD×N , M timization, view of denoisingrepeat based matrix completion: apfor n = 1, . . . , N ply a denoising step, refill the Xn ← k nearest neighbors of xn present values, iterate until the (µn , Un ) ← PCA(Xn , L) estimate L-dim tangent space at xn validation error increases. This project point onto tangent space allows very general definitions ∂xn ← (I − Un UT )(µn − xn ) n end of denoising, and indeed a lowXM ← XM + (∂X)M move points’ missing entries rank projection is a form of deuntil validation error increases noising where points are not alreturn X lowed outside the linear manifold. Our formulation using Figure 1: Our denoising matrix completion algorithms, based on the objective function (4) is still Manifold Blurring Mean Shift (MBMS) and its particular cases useful in that it connects our Local Tangent Projection (LTP, k-nn graph, σ = ∞) and Gauss- denoising assumption with the ian Blurring Mean Shift (GBMS, L = 0); see [5] for details. Nn more usual low-rank assumption contains all N points (full graph) or only xn ’s nearest neighbors that has been used in much ma(k-nn graph). The index M selects the components of its input trix completion work, and juscorresponding to missing values. Parameters: denoising scale σ, tifies the refilling step as renumber of neighbors k, local dimensionality L. sulting from the present-data constraints under a gradientprojection optimization. MBMS denoising for matrix completion Following our algorithmic-based approach to denois˜ ing, we could consider generalized GBMS steps of the form X = X φ(P(X)). For clustering, Carreira-Perpi˜ an [12] found an overrelaxed explicit step φ(P) = (1 − η)I + ηP with η ≈ 1.25 to n´ achieve similar clusterings but faster. Here, we focus instead on the MBMS variant of GBMS that allows only for orthogonal, not tangential, point motions (defined wrt their local tangent space as estimated by local PCA), with the goal of preserving low-dimensional manifold structure. MBMS has 3 user parameters: the bandwidth σ (for denoising), and the latent dimensionality L and the 4 number of neighbors k (for the local tangent space and the neighborhood graph). A special case of MBMS called local tangent projection (LTP) results by using a neighborhood graph and setting σ = ∞ (so only two user parameters are needed: L and k). LTP can be seen as doing a low-rank matrix completion locally. LTP was found in [5] to have nearly as good performance as the best σ in several problems. MBMS also includes as particular cases GBMS (L = 0), PCA (k = N , σ = ∞), and no denoising (σ = 0 or L = D). Note that if we apply MBMS to a dataset that lies on a linear manifold of dimensionality d using L ≥ d then no denoising occurs whatsoever because the GBMS updates lie on the d-dimensional manifold and are removed by the corrector step. In practice, even if the data are assumed noiseless, the reconstruction from a low-rank method will lie close to but not exactly on the d-dimensional manifold. However, this suggests using largish ranks for the low-rank method used to reconstruct X and lower L values in the subsequent MBMS run. In summary, this yields a matrix completion algorithm where we apply an MBMS step, refill the present values, and iterate until the validation error increases. Again, in an actual implementation we compute the MBMS step only for the missing entries of X. The shrinking problem of GBMS is less pronounced in our matrix completion setting, because we constrain some values not to change. Still, in agreement with [5], we find MBMS to be generally superior to GBMS. Computational cost With a full graph, the cost per iteration of GBMS and MBMS is O(N 2 D) and O(N 2 D + N (D + k) min(D, k)2 ), respectively. In practice with high-dimensional data, best denoising results are obtained using a neighborhood graph [5], so that the sums over points in eqs. (3) or (4) extend only to the neighbors. With a k-nearest-neighbor graph and if we do not update the neighbors at each iteration (which affects the result little), the respective cost per iteration is O(N kD) and O(N kD + N (D + k) min(D, k)2 ), thus linear in N . The graph is constructed on the initial X we use, consisting of the present values and an imputation for the missing ones achieved with a standard matrix completion method, and has a one-off cost of O(N 2 D). The cost when we have a fraction µ = |M| ∈ [0, 1] of missing data is simply the above times µ. Hence the run time ND of our mean-shift-based matrix completion algorithms is faster the more present data we have, and thus faster than the usual GBMS or MBMS case, where all data are effectively missing. 3 Experimental results We compare with representative methods of several approaches: a low-rank matrix completion method, singular value projection (SVP [7], whose performance we found similar to that of alternating least squares, ALS [3, 4]); fitting a D-dimensional Gaussian model with EM and imputing the missing values of each xn as the conditional mean E {xn,Mn |xn,Pn } (we use the implementation of [19]); and the nonlinear method of [20] (nlPCA). We initialize GBMS and MBMS from some or all of these algorithms. For methods with user parameters, we set them by cross-validation in the following way: we randomly select 10% of the present entries and pretend they are missing as well, we run the algorithm on the remaining 90% of the present values, and we evaluate the reconstruction at the 10% entries we kept earlier. We repeat this over different parameters’ values and pick the one with lowest reconstruction error. We then run the algorithm with these parameters values on the entire present data and report the (test) error with the ground truth for the missing values. 100D Swissroll We created a 3D swissroll data set with 3 000 points and lifted it to 100D with a random orthonormal mapping, and added a little noise (spherical Gaussian with stdev 0.1). We selected uniformly at random 6.76% of the entries to be present. We use the Gaussian model and SVP (fixed rank = 3) as initialization for our algorithm. We typically find that these initial X are very noisy (fig. 3), with some reconstructed points lying between different branches of the manifold and causing a big reconstruction error. We fixed L = 2 (the known dimensionality) for MBMS and cross-validated the other parameters: σ and k for MBMS and GBMS (both using k-nn graph), and the number of iterations τ to be used. Table 1 gives the performance of MBMS and GBMS for testing, along with their optimal parameters. Fig. 3 shows the results of different methods at a few iterations. MBMS initialized from the Gaussian model gives the most remarkable denoising effect. To show that there is a wide range of σ and number of iterations τ that give good performance with GBMS and MBMS, we fix k = 50 and run the algorithm with varying σ values and plot the reconstruction error for missing entries over iterations in fig. 2. Both GBMS can achieve good 5 Methods Gaussian + GBMS (∞, 10, 0, 1) + MBMS (1, 20, 2, 25) SVP + GBMS (3, 50, 0, 1) + MBMS (3, 50, 2, 2) RSSE 168.1 165.8 157.2 156.8 151.4 151.8 mean 2.63 2.57 2.36 1.94 1.89 1.87 stdev 1.59 1.61 1.63 2.10 2.02 2.05 Methods nlPCA SVP + GBMS (400,140,0,1) + MBMS (500,140,9,5) Table 1: Swissroll data set: reconstruction errors obtained by different algorithms along with their optimal parameters (σ, k, L, no. iterations τ ). The three columns show the root sum of squared errors on missing entries, the mean, and the standard deviation of the pointwise reconstruction error, resp. SVP + GBMS error (RSSE) 180 170 SVP + MBMS Gaussian + GBMS 180 180 170 170 170 160 160 ∞ 160 150 0 1 2 3 4 5 6 7 8 910 12 14 16 18 20 iteration τ stdev 42.6 39.3 37.7 34.9 Gaussian + MBMS 180 8 10 15 25 mean 26.1 21.8 18.8 17.0 Table 2: MNIST-7 data set: errors of the different algorithms and their optimal parameters (σ, k, L, no. iterations τ ). The three columns show the root sum of squared errors on missing entries (×10−4 ), the mean, and the standard deviation of pixel errors, respectively. 160 0.3 0.5 1 2 3 5 RSSE 7.77 6.99 6.54 6.03 150 0 1 2 3 4 5 6 7 8 910 12 14 16 18 20 iteration τ 150 0 1 2 3 4 5 6 7 8 910 12 14 16 18 20 iteration τ 150 0 1 2 3 4 5 6 7 8 910 12 14 16 18 20 iteration τ Figure 2: Reconstruction error of GBMS/MBMS over iterations (each curve is a different σ value). denoising (and reconstruction), but MBMS is more robust, with good results occurring for a wide range of iterations, indicating it is able to preserve the manifold structure better. Mocap data We use the running-motion sequence 09 01 from the CMU mocap database with 148 samples (≈ 1.7 cycles) with 150 sensor readings (3D positions of 50 joints on a human body). The motion is intrinsically 1D, tracing a loop in 150D. We compare nlPCA, SVP, the Gaussian model, and MBMS initialized from the first three algorithms. For nlPCA, we do a grid search for the weight decay coefficient while fixing its structure to be 2 × 10 × 150 units, and use an early stopping criterion. For SVP, we do grid search on {1, 2, 3, 5, 7, 10} for the rank. For MBMS (L = 1) and GBMS (L = 0), we do grid search for σ and k. We report the reconstruction error as a function of the proportion of missing entries from 50% to 95%. For each missing-data proportion, we randomly select 5 different sets of present values and run all algorithms for them. Fig. 4 gives the mean errors of all algorithms. All methods perform well when missing-data proportion is small. nlPCA, being prone to local optima, is less stable than SVP and the Gaussian model, especially when the missing-data proportion is large. The Gaussian model gives the best and most stable initialization. At 95%, all methods fail to give an acceptable reconstruction, but up to 90% missing entries, MBMS and GBMS always beat the other algorithms. Fig. 4 shows selected reconstructions from all algorithms. MNIST digit ‘7’ The MNIST digit ‘7’ data set contains 6 265 greyscale (0–255) images of size 28 × 28. We create missing entries in a way reminiscent of run-length errors in transmission. We generate 16 to 26 rectangular boxes of an area approximately 25 pixels at random locations in each image and use them to black out pixels. In this way, we create a high dimensional data set (784 dimensions) with about 50% entries missing on average. Because of the loss of spatial correlations within the blocks, this missing data pattern is harder than random. The Gaussian model cannot handle such a big data set because it involves inverting large covariance matrices. nlPCA is also very slow and we cannot afford cross-validating its structure or the weight decay coefficient, so we picked a reasonable structure (10 × 30 × 784 units), used the default weight decay parameter in the code (10−3 ), and allowed up to 500 iterations. We only use SVP as initialization for our algorithm. Since the intrinsic dimension of MNIST is suspected to be not very high, 6 SVP τ =0 SVP + GBMS τ =1 SVP + MBMS τ =2 Gaussian τ =0 Gaussian + GBMS τ =1 Gaussian + MBMS τ = 25 20 20 20 20 20 20 15 15 15 15 15 15 10 10 10 10 10 10 5 5 5 5 5 5 0 0 0 0 0 0 −5 −5 −5 −5 −5 −5 −10 −10 −15 −15 −10 −5 0 5 10 15 20 −10 −15 −15 −10 −5 0 5 10 15 20 −15 −15 −10 −10 −5 0 5 10 15 20 −15 −15 −10 −10 −5 0 5 10 15 20 −15 −15 −10 −10 −5 0 5 10 15 20 −15 −15 −10 −5 0 5 10 15 20 Figure 3: Denoising effect of the different algorithms. For visualization, we project the 100D data to 3D with the projection matrix used for creating the data. Present values are refilled for all plots. 7000 6000 error 5000 4000 frame 2 (leg distance) frame 10 (foot pose) frame 147 (leg pose) nlPCA nlPCA + GBMS nlPCA + MBMS SVP SVP + GBMS SVP + MBMS Gaussian Gaussian + GBMS Gaussian + MBMS 3000 2000 1000 0 50 60 70 80 85 90 95 % of missing data Figure 4: Left: mean of errors (RSSE) of 5 runs obtained by different algorithms for varying percentage of missing values. Errorbars shown only for Gaussian + MBMS to avoid clutter. Right: sample reconstructions when 85% percent data is missing. Row 1: initialization. Row 2: init+GBMS. Row 3: init+MBMS. Color indicates different initialization: black, original data; red, nlPCA; blue, SVP; green, Gaussian. we used rank 10 for SVP and L = 9 for MBMS. We also use the same k = 140 as in [5]. So we only had to choose σ and the number of iterations via cross-validation. Table 2 shows the methods and their corresponding error. Fig. 5 shows some representative reconstructions from different algorithms, with present values refilled. The mean-shift averaging among closeby neighbors (a soft form of majority voting) helps to eliminate noise, unusual strokes and other artifacts created by SVP, which by their nature tend to occur in different image locations over the neighborhood of images. 4 Related work Matrix completion is widely studied in theoretical compressed sensing [1, 2] as well as practical recommender systems [3, 4]. Most matrix completion models rely on a low-rank assumption, and cannot fully exploit a more complex structure of the problem, such as curved manifolds. Related work is on multi-task learning in a broad sense, which extracts the common structure shared by multiple related objects and achieves simultaneous learning on them. This includes applications such as alignment of noise-corrupted images [21], recovery of images with occlusion [22], and even learning of multiple related regressors or classifiers [23]. Again, all these works are essentially based on a subspace assumption, and do not generalize to more complex situations. A line of work based on a nonlinear low-rank assumption (with a latent variable z of dimensionN 2 ality L < D) involves setting up a least-squares error function minf ,Z n=1 xn − f (zn ) = N,D 2 n,d=1 (xdn − fd (zn )) where one ignores the terms for which xdn is missing, and estimates the function f and the low-dimensional data projections Z by alternating optimization. Linear functions f have been used in the homogeneity analysis literature [24], where this approach is called “missing data deleted”. Nonlinear functions f have been used recently (neural nets [20]; Gaussian processes for collaborative filtering [25]). Better results are obtained if adding a projection term N 2 and optimizing over the missing data as well [26]. n=1 zn − F(xn ) 7 Orig Missing nlPCA SVP GBMS MBMS Orig Missing nlPCA SVP GBMS MBMS Figure 5: Selected reconstructions of MNIST block-occluded digits ‘7’ with different methods. Prior to our denoising-based work there have been efforts to extend the low-rank models to smooth manifolds, mostly in the context of compressed sensing. Baraniuk and Wakin [27] show that certain random measurements, e.g. random projection to a low-dimensional subspace, can preserve the metric of the manifold fairly well, if the intrinsic dimension and the curvature of the manifold are both small enough. However, these observations are not suitable for matrix completion and no algorithm is given for recovering the signal. Chen et al. [28] explicitly model a pre-determined manifold, and use this to regularize the signal when recovering the missing values. They estimate the manifold given complete data, while no complete data is assumed in our matrix completion setting. Another related work is [29], where the manifold modeled with Isomap is used in estimating the positions of satellite cameras in an iterative manner. Finally, our expectation that the value of a missing entry can be predicted from the values of neighboring points is similar to one category of collaborative filtering methods that essentially use similar users/items to predict missing values [3, 4]. 5 Conclusion We have proposed a new paradigm for matrix completion, denoising, which generalizes the commonly used assumption of low rank. Assuming low-rank implies a restrictive form of denoising where the data is forced to have zero variance away from a linear manifold. More general definitions of denoising can potentially handle data that lives in a low-dimensional manifold that is nonlinear, or whose dimensionality varies (e.g. a set of manifolds), or that does not have low rank at all, and naturally they handle noise in the data. Denoising works because of the fundamental fact that a missing value can be predicted by averaging nearby present values. Although we motivate our framework from a constrained optimization point of view (denoise subject to respecting the present data), we argue for an algorithmic view of denoising-based matrix completion: apply a denoising step, refill the present values, iterate until the validation error increases. In turn, this allows different forms of denoising, such as based on low-rank projection (earlier work) or local averaging with blurring mean-shift (this paper). Our nonparametric choice of mean-shift averaging further relaxes assumptions about the data and results in a simple algorithm with very few user parameters that afford user control (denoising scale, local dimensionality) but can be set automatically by cross-validation. Our algorithms are intended to be used as a postprocessing step over a user-provided initialization of the missing values, and we show they consistently improve upon existing algorithms. The MBMS-based algorithm bridges the gap between pure denoising (GBMS) and local low rank. Other definitions of denoising should be possible, for example using temporal as well as spatial neighborhoods, and even applicable to discrete data if we consider denoising as a majority voting among the neighbours of a vector (with suitable definitions of votes and neighborhood). Acknowledgments Work supported by NSF CAREER award IIS–0754089. 8 References [1] Emmanuel J. Cand` s and Benjamin Recht. Exact matrix completion via convex optimization. Foundations e of Computational Mathematics, 9(6):717–772, December 2009. [2] Emmanuel J. Cand` s and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. e IEEE Trans. Information Theory, 56(5):2053–2080, April 2010. [3] Yehuda Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. SIGKDD 2008, pages 426–434, Las Vegas, NV, August 24–27 2008. [4] Robert Bell and Yehuda Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. ICDM 2007, pages 43–52, October 28–31 2007. ´ [5] Weiran Wang and Miguel A. Carreira-Perpi˜ an. Manifold blurring mean shift algorithms for manifold n´ denoising. CVPR 2010, pages 1759–1766, San Francisco, CA, June 13–18 2010. [6] Matthias Hein and Markus Maier. Manifold denoising. NIPS 2006, 19:561–568. MIT Press, 2007. [7] Prateek Jain, Raghu Meka, and Inderjit S. Dhillon. Guaranteed rank minimization via singular value projection. NIPS 2010, 23:937–945. MIT Press, 2011. ´ [8] Miguel A. Carreira-Perpi˜ an. Fast nonparametric clustering with Gaussian blurring mean-shift. ICML n´ 2006, pages 153–160. Pittsburgh, PA, June 25–29 2006. [9] Keinosuke Fukunaga and Larry D. Hostetler. The estimation of the gradient of a density function, with application in pattern recognition. IEEE Trans. Information Theory, 21(1):32–40, January 1975. [10] Yizong Cheng. Mean shift, mode seeking, and clustering. IEEE Trans. PAMI, 17(8):790–799, 1995. [11] Dorin Comaniciu and Peter Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. PAMI, 24(5):603–619, May 2002. ´ [12] Miguel A. Carreira-Perpi˜ an. Generalised blurring mean-shift algorithms for nonparametric clustering. n´ CVPR 2008, Anchorage, AK, June 23–28 2008. [13] Gabriel Taubin. A signal processing approach to fair surface design. SIGGRAPH 1995, pages 351–358. [14] Mathieu Desbrun, Mark Meyer, Peter Schr¨ der, and Alan H. Barr. Implicit fairing of irregular meshes o using diffusion and curvature flow. SIGGRAPH 1999, pages 317–324. [15] Fan R. K. Chung. Spectral Graph Theory. American Mathematical Society, Providence, RI, 1997. [16] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Trans. PAMI, 22(8):888– 905, August 2000. [17] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1986. [18] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer-Verlag, New York, second edition, 2006. [19] Tapio Schneider. Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate, 14(5):853–871, March 2001. [20] Matthias Scholz, Fatma Kaplan, Charles L. Guy, Joachim Kopka, and Joachim Selbig. Non-linear PCA: A missing data approach. Bioinformatics, 21(20):3887–3895, October 15 2005. [21] Yigang Peng, Arvind Ganesh, John Wright, Wenli Xu, and Yi Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. CVPR 2010, pages 763–770, 2010. [22] A. M. Buchanan and A. W. Fitzgibbon. Damped Newton algorithms for matrix factorization with missing data. CVPR 2005, pages 316–322, San Diego, CA, June 20–25 2005. [23] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Multi-task feature learning. NIPS 2006, 19:41–48. MIT Press, 2007. [24] Albert Gifi. Nonlinear Multivariate Analysis. John Wiley & Sons, 1990. [25] Neil D. Lawrence and Raquel Urtasun. Non-linear matrix factorization with Gaussian processes. ICML 2009, Montreal, Canada, June 14–18 2009. ´ [26] Miguel A. Carreira-Perpi˜ an and Zhengdong Lu. Manifold learning and missing data recovery through n´ unsupervised regression. ICDM 2011, December 11–14 2011. [27] Richard G. Baraniuk and Michael B. Wakin. Random projections of smooth manifolds. Foundations of Computational Mathematics, 9(1):51–77, February 2009. [28] Minhua Chen, Jorge Silva, John Paisley, Chunping Wang, David Dunson, and Lawrence Carin. Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds. IEEE Trans. Signal Processing, 58(12):6140–6155, December 2010. [29] Michael B. Wakin. A manifold lifting algorithm for multi-view compressive imaging. In Proc. 27th Conference on Picture Coding Symposium (PCS’09), pages 381–384, 2009. 9

5 0.53211647 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Author: Armen Allahverdyan, Aram Galstyan

Abstract: We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam’s razor for HMM learning. For more general scenario VT can be worse compared to ML but still capable of correctly recovering most of the parameters. 1

6 0.53056514 174 nips-2011-Monte Carlo Value Iteration with Macro-Actions

7 0.52509451 229 nips-2011-Query-Aware MCMC

8 0.52288657 75 nips-2011-Dynamical segmentation of single trials from population neural data

9 0.51861757 249 nips-2011-Sequence learning with hidden units in spiking neural networks

10 0.51819998 66 nips-2011-Crowdclustering

11 0.51795864 241 nips-2011-Scalable Training of Mixture Models via Coresets

12 0.51700592 158 nips-2011-Learning unbelievable probabilities

13 0.51551175 180 nips-2011-Multiple Instance Filtering

14 0.51397884 221 nips-2011-Priors over Recurrent Continuous Time Processes

15 0.51390517 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems

16 0.51308125 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

17 0.51222575 301 nips-2011-Variational Gaussian Process Dynamical Systems

18 0.51192242 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

19 0.51165789 102 nips-2011-Generalised Coupled Tensor Factorisation

20 0.51095283 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes