nips nips2004 nips2004-126 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Robert D. Kleinberg
Abstract: In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of Rd , and the cost functions are continuous. In the d = 1 case, we improve on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Nearly Tight Bounds for the Continuum-Armed Bandit Problem Robert Kleinberg∗ Abstract In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. [sent-1, score-1.204]
2 While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. [sent-2, score-0.578]
3 Here we consider the case when the set of strategies is a subset of Rd , and the cost functions are continuous. [sent-3, score-0.393]
4 We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem. [sent-5, score-1.159]
5 1 Introduction In an online decision problem, an algorithm must choose from among a set of strategies in each of n consecutive trials so as to minimize the total cost of the chosen strategies. [sent-6, score-0.584]
6 The costs of strategies are specified by a real-valued function which is defined on the entire strategy set and which varies over time in a manner initially unknown to the algorithm. [sent-7, score-0.361]
7 The archetypical online decision problems are the best expert problem, in which the entire cost function is revealed to the algorithm as feedback at the end of each trial, and the multiarmed bandit problem, in which the feedback reveals only the cost of the chosen strategy. [sent-8, score-1.341]
8 The names of the two problems are derived from the metaphors of combining expert advice (in the case of the best expert problem) and learning to play the best slot machine in a casino (in the case of the multi-armed bandit problem). [sent-9, score-0.793]
9 In addition to occupying a central position in online learning theory, algorithms for such problems have been applied in numerous other areas of computer science, such as paging and caching [6, 14], data structures [7], routing [4, 5], wireless networks [19], and online auction mechanisms [8, 15]. [sent-11, score-0.421]
10 Multi-armed bandit problems have been studied quite thoroughly in the case of a finite strategy set, and the performance of the optimal algorithm (as a function of n) is known ∗ M. [sent-13, score-0.842]
11 In contrast, much less is known in the case of an infinite strategy set. [sent-22, score-0.189]
12 In this paper, we consider multi-armed bandit problems with a continuum of strategies, parameterized by one or more real numbers. [sent-23, score-0.604]
13 In other words, we are studying online learning problems in which the learner designates a strategy in each time step by specifying a d-tuple of real numbers (x1 , . [sent-24, score-0.341]
14 , xd ); the cost function is then evaluated at (x1 , . [sent-27, score-0.22]
15 Two such applications are online auction mechanism design [8, 15], in which the strategy space is an interval of feasible prices, and online oblivious routing [5], in which the strategy space is a flow polytope. [sent-34, score-0.898]
16 Algorithms for online decisions problems are often evaluated in terms of their regret, defined as the difference in expected cost between the sequence of strategies chosen by the algorithm and the best fixed (i. [sent-35, score-0.593]
17 While tight upper and lower bounds on the regret of algorithms for the K-armed bandit problem have been known for many years [3, 18], our knowledge of such bounds for continuum-armed bandit problems is much less satisfactory. [sent-38, score-1.774]
18 For a one-dimensional strategy space, the first algorithm with sublinear regret appeared in [1], while the first polynomial lower bound on regret appeared in [15]. [sent-39, score-1.127]
19 For Lipschitz-continuous cost functions (the case introduced in [1]), the best known upper and lower bounds for this problem are currently O(n3/4 ) and Ω(n1/2 ), respectively [1, 15], leaving as an open question the problem of determining tight bounds for the regret as a function of n. [sent-40, score-0.857]
20 Here, we solve this open problem by sharpening the upper and lower bounds to O(n2/3 log1/3 (n)) and Ω(n2/3 ), respectively, closing the gap to a sublogarithmic factor. [sent-41, score-0.213]
21 Note that this requires improving the best known algorithm as well as the lower bound technique. [sent-42, score-0.214]
22 Cope requires that each cost function C achieves its optimum at a unique point θ, and that there exist constants K0 > 0 and p ≥ 1 such that for all x, |C(x) − C(θ)| ≥ K0 x − θ p . [sent-44, score-0.209]
23 For this class of cost functions — which is probably broad enough to capture most cases of practical interest — he proves that the regret of the optimal continuum-armed bandit algorithm is O(n−1/2 ), and that this bound is tight. [sent-45, score-1.329]
24 For a d-dimensional strategy space, any multi-armed bandit algorithm must suffer regret depending exponentially on d unless the cost functions are further constrained. [sent-46, score-1.427]
25 (This is demonstrated by a simple counterexample in which the cost function is identically zero in all but one orthant of Rd , takes a negative value somewhere in that orthant, and does not vary over time. [sent-47, score-0.213]
26 ) For the best-expert problem, algorithms whose regret is polynomial in d and sublinear in n are known for the case of cost functions which are constrained to be linear [16] or convex [21]. [sent-48, score-0.69]
27 In the case of linear cost functions, the relevant algorithm has been adapted to the multi-armed bandit setting in [4, 9]. [sent-49, score-0.831]
28 Here we adapt the online convex programming algorithm of [21] to the continuum-armed bandit setting, obtaining the first known algorithm for this problem to achieve regret depending polynomially on d and sublinearly on n. [sent-50, score-1.237]
29 Their algorithm and analysis are superior to ours, requiring fewer smoothness assumptions on the cost functions and producing a tighter upper bound on regret. [sent-52, score-0.437]
30 2 Terminology and Conventions We will assume that a strategy set S ⊆ Rd is given, and that it is a compact subset of Rd . [sent-53, score-0.189]
31 These cost functions must satisfy a continuity property based on the following definition. [sent-62, score-0.255]
32 The first of these is identical with the continuum-armed bandit problem considered in [1], except that [1] formulates the problem in terms of maximizing reward rather than minimizing cost. [sent-67, score-0.604]
33 The second model concerns a sequence of cost functions chosen by an oblivious adversary. [sent-68, score-0.359]
34 The expected cost ¯ ¯ function C : S → R is defined by C(u) = E(C(u)) where C is a random sample ¯ from this distribution. [sent-73, score-0.208]
35 In addition, we assume there exist positive constants ζ, s 0 such that if C is a random sample from the given distribution on cost functions, then 1 2 2 E(esC(u) ) ≤ e 2 ζ s ∀|s| ≤ s0 , u ∈ S. [sent-75, score-0.239]
36 ) A multi-armed bandit algorithm is a rule for deciding which strategy to play at time t, given the outcomes of the first t − 1 trials. [sent-84, score-0.871]
37 More formally, a deterministic multi-armed bandit algorithm U is a sequence of functions U1 , U2 , . [sent-85, score-0.77]
38 , ut−1 , xt−1 ) defines the strategy to be chosen at time t if the algorithm’s first t − 1 choices were u1 , . [sent-92, score-0.189]
39 A randomized multi-armed bandit algorithm is a probability distribution over deterministic multi-armed bandit algorithms. [sent-99, score-1.31]
40 (If the cost functions are random, we will assume their randomness is independent of the algorithm’s random choices. [sent-100, score-0.315]
41 ) For a randomized multi-armed bandit algorithm, the n-step regret R n is the expected difference in total cost between the algorithm’s chosen strategies u 1 , u2 , . [sent-101, score-1.303]
42 Here, the expectation is over the algorithm’s random choices and (in the random-costs model) the randomness of the cost functions. [sent-107, score-0.238]
43 When the algorithm is not sampling a design point, it chooses a strategy which minimizes ˆ ˆ ¯ expected cost according to the current estimate C. [sent-110, score-0.494]
44 The algorithm in [1] thus uses its sampling steps inefficiently, focusing too much attention on portions of the strategy ¯ interval where an accurate estimate of C is unnecessary. [sent-115, score-0.305]
45 First we discretize the strategy space by constraining the algorithm to choose strategies only from a fixed, finite set of K equally spaced design points {1/K, 2/K, . [sent-117, score-0.419]
46 ) This reduces the continuum-armed bandit problem to a finite-armed bandit problem, and we may apply one of the standard algorithms for such problems. [sent-122, score-1.208]
47 Our continuum-armed bandit algorithm is shown in Figure 1. [sent-123, score-0.653]
48 The inner loop requires a subroutine MAB which should implement a finite-armed bandit algorithm appropriate for the cost model under consideration. [sent-125, score-0.908]
49 For example, MAB could be the algorithm UCB1 of [2] in the random case, or the algorithm Exp3 of [3] in the adversarial case. [sent-126, score-0.242]
50 The semantics of MAB are as follows: it is initialized with a finite set of strategies; subsequently it recommends strategies in this set, waits to learn the feedback score for its recommendation, and updates its recommendation when the feedback is received. [sent-127, score-0.258]
51 The analysis of this algorithm will ensure that its choices have low regret relative to the best ¯ design point. [sent-128, score-0.458]
52 The Lipschitz regularity of C guarantees that the best design point performs nearly as well, on average, as the best strategy in S. [sent-129, score-0.329]
53 A LGORITHM CAB1 T ←1 while T ≤ n K← T log T 1 2α+1 Initialize MAB with strategy set {1/K, 2/K, . [sent-130, score-0.189]
54 end T ← 2T end Figure 1: Algorithm for the one-parameter continuum-armed bandit problem Theorem 3. [sent-140, score-0.604]
55 In both the random and adversarial models, the regret of algorithm CAB1 α+1 α is O(n 2α+1 log 2α+1 (n)). [sent-142, score-0.523]
56 Let q = 2α+1 , so that the regret bound is O(n1−q logq (n)). [sent-144, score-0.561]
57 It suffices to prove that the regret in the inner loop is O(T 1−q logq (T )); if so, then we may sum this bound over all iterations of the inner loop to get a geometric progression with constant ratio, whose largest term is O(n1−q logq (n)). [sent-145, score-0.879]
58 Let u∗ be the best strategy in S, and let u be the element of {1/K, 2/K, . [sent-147, score-0.225]
59 Then |u − u∗ | < 1/K, so T 1 ¯ using the fact that C ∈ ulL(α, L, δ) (or that T t=1 Ct ∈ ulL(α, L, δ) in the adversarial case) we obtain T E t=1 Ct (u ) − Ct (u∗ ) ≤ T = O T 1−q logq (T ) . [sent-151, score-0.254]
60 2 in [3], which asserts that the regret √ of Exp3 is O T K log K . [sent-154, score-0.33]
61 (The upper bound for the adversarial model doesn’t directly imply an upper bound for the random model, since the cost functions are required to take values in [0, 1] in the adversarial model but not in the random model. [sent-156, score-0.809]
62 The time steps in which the algorithm chooses strategies in A contribute at most O(T ∆) = O(T 1−q logq (T )) to the regret. [sent-165, score-0.402]
63 For each strategy u ∈ B, one may prove that, with high probability, u is played only O(log(T )/∆(u)2 ) times. [sent-166, score-0.213]
64 ) This implies that the time steps in which the algorithm chooses strategies in B contribute at most O(K log(T )/∆) = O(T 1−q logq (T )) to the regret, which completes the proof. [sent-169, score-0.402]
65 4 Lower bounds for the one-parameter case There are many reasons to expect that Algorithm CAB1 is an inefficient algorithm for the continuum-armed bandit problem. [sent-170, score-0.714]
66 , 1} as an unordered set, ignoring the fact that experiments which sample the cost of one strategy j/K are (at least weakly) predictive of the costs of nearby strategies. [sent-174, score-0.401]
67 1, the exponent of 2α+1 α+1 β is the best possible: for any β < 2α+1 , no algorithm can achieve regret O(n ). [sent-177, score-0.415]
68 This lower bound applies to both the randomized and adversarial models. [sent-178, score-0.296]
69 as follows: for k > 0, the middle third of [ak−1 , bk−1 ] is subdivided into intervals of width wk = 3−k! [sent-187, score-0.197]
70 , and [ak , bk ] is one of these subintervals chosen uniformly at random. [sent-188, score-0.255]
71 For any randomized multi-armed bandit algorithm, there exists a probability α+1 distribution on cost functions such that for all β < 2α+1 , the algorithm’s regret {Rn }∞ n=1 in the random model satisfies Rn lim sup β = ∞. [sent-200, score-1.272]
72 n→∞ n The same lower bound applies in the adversarial model. [sent-201, score-0.243]
73 , such that if we use these intervals to define a probability distribution on cost functions C(x) as above, then Rn /nβ diverges −2α 1 as n runs through the sequence n1 , n2 , n3 , . [sent-206, score-0.339]
74 Subdivide [ak−1 , bk−1 ] into subintervals of width wk , and suppose [ak , bk ] is chosen uniformly at random from this set of subintervals. [sent-214, score-0.438]
75 For any u, u ∈ [ak−1 , bk−1 ], the Kullback-Leibler 2α distance KL(C(u) C(u )) between the cost distributions at u and u is O(wk ), and it is equal to zero unless at least one of u, u lies in [ak , bk ]. [sent-215, score-0.325]
76 This means, roughly speaking, −2α that the algorithm must sample strategies in [ak , bk ] at least wk times before being able to identify [ak , bk ] with constant probability. [sent-216, score-0.634]
77 But [ak , bk ] could be any one of wk−1 /wk −2α possible subintervals, and we don’t have enough time to play wk trials in even a constant fraction of these subintervals before reaching time nk . [sent-217, score-0.513]
78 Therefore, with constant probability, a constant fraction of the strategies chosen up to time nk are not located in [ak , bk ], and α each of them contributes Ω(wk ) to the regret. [sent-218, score-0.346]
79 This means the expected regret at time nk is α Ω(nk wk ). [sent-219, score-0.544]
80 From this, we obtain the stated lower bound using the fact that α+1 2α+1 α n k wk = n k −o(1) . [sent-220, score-0.282]
81 Although this proof sketch rests on a much more complicated construction than the lower bound proof for the finite-armed bandit problem given by Auer et al in [3], one may follow essentially the same series of steps as in their proof to make the sketch given above into a rigorous proof. [sent-221, score-0.887]
82 5 An online convex optimization algorithm We turn now to continuum-armed bandit problems with a strategy space of dimension d > 1. [sent-223, score-1.047]
83 As mentioned in the introduction, for any randomized multi-armed bandit algorithm there is a cost function C (with any desired degree of smoothness and boundedness) such that the algorithm’s regret is Ω(2d ) when faced with the input sequence C1 = C2 = . [sent-224, score-1.254]
84 As a counterpoint to this negative result, we seek interesting classes of cost functions which admit a continuum-armed bandit algorithm whose regret is polynomial in d (and, as always, sublinear in n). [sent-228, score-1.318]
85 A natural candidate is the class of convex, smooth functions on a closed, bounded, convex strategy set S ⊂ Rd , since this is the most 1 Defined by the formula KL(P Q) = P, Q with density functions p, q. [sent-229, score-0.396]
86 R log (p(x)/q(x)) dp(x), for probability distributions general class of functions for which the corresponding best-expert problem is known to admit an efficient algorithm, namely Zinkevich’s greedy projection algorithm [21]. [sent-230, score-0.258]
87 It selects an arbitrary initial strategy u1 ∈ S and updates its strategy in each subsequent time step t according to the rule ut+1 = P (ut − ηt Ct (ut )), where Ct (ut ) is the gradient of Ct at ut and P : Rd → S is the projection operator which maps each point of Rd to the nearest point of S. [sent-235, score-0.667]
88 ) Note that greedy projection is nearly a multi-armed bandit algorithm: if the algorithm’s feedback when sampling strategy ut were the vector Ct (ut ) rather than the number Ct (ut ), it would have all the information required to run greedy projection. [sent-237, score-1.247]
89 To adapt this algorithm to the multi-armed bandit setting, we use the following idea: group the timeline into phases of d + 1 consecutive steps, with a cost function Cφ for each phase φ defined by averaging the cost functions at each time step of φ. [sent-238, score-1.186]
90 In each phase use trials at d + 1 affinely independent points of S, located at or near ut , to estimate the gradient Cφ (ut ). [sent-239, score-0.347]
91 This does not increase the regret by a factor of more than d2 . [sent-242, score-0.33]
92 At the beginning of a phase φ, we assume the algorithm has determined a basepoint strategy uφ . [sent-251, score-0.338]
93 , d}, and in step t we sample the strategy yt = uφ + νφ (xσ(t) − uφ ). [sent-261, score-0.189]
94 Assume that S is in isotropic position and that the cost functions satisfy Ct (x) ≤ 1 for all x ∈ S, 1 ≤ t ≤ n, and that in addition the Hessian matrix of Ct (x) at each point x ∈ S has Frobenius norm bounded above by a constant. [sent-266, score-0.293]
95 If ηk = k −3/4 and νk = k −1/4 , then the regret of the simulated greedy projection algorithm is O(d3 n3/4 ). [sent-267, score-0.483]
96 The algorithm is simply running greedy projection with respect to the simulated cost functions Bφ , and it consequently satisfies a low-regret bound with respect to those functions. [sent-273, score-0.499]
97 The random bijection σ — which is unnecessary in the Kiefer-Wolfowitz algorithm — is used here to defend against the oblivious adversary. [sent-278, score-0.183]
98 In each case, the desired upper bound can be inferred from properties of barycentric spanners, or from the convexity of Cφ and the bounds on its first and second derivatives. [sent-282, score-0.234]
99 Gambling in a rigged casino: The adversarial multi-armed bandit problem. [sent-299, score-0.718]
100 Online geometric optimization in the bandit setting against an adaptive adversary. [sent-336, score-0.631]
wordName wordTfidf (topN-words)
[('bandit', 0.604), ('regret', 0.33), ('ut', 0.22), ('strategy', 0.189), ('cost', 0.178), ('ct', 0.165), ('wk', 0.153), ('online', 0.152), ('bk', 0.147), ('logq', 0.14), ('strategies', 0.138), ('ak', 0.125), ('adversarial', 0.114), ('ull', 0.111), ('lum', 0.1), ('bound', 0.091), ('alai', 0.08), ('mab', 0.08), ('subintervals', 0.08), ('functions', 0.077), ('oblivious', 0.064), ('nk', 0.061), ('bounds', 0.061), ('phase', 0.06), ('greedy', 0.059), ('af', 0.054), ('randomized', 0.053), ('convex', 0.053), ('zinkevich', 0.052), ('sublinear', 0.052), ('rd', 0.049), ('algorithm', 0.049), ('feedback', 0.046), ('projection', 0.045), ('auer', 0.044), ('intervals', 0.044), ('loop', 0.044), ('trials', 0.043), ('design', 0.043), ('upper', 0.042), ('xd', 0.042), ('bump', 0.042), ('routing', 0.042), ('nested', 0.042), ('sequence', 0.04), ('ahan', 0.04), ('auction', 0.04), ('barycentric', 0.04), ('basepoint', 0.04), ('bijection', 0.04), ('chapire', 0.04), ('esa', 0.04), ('flaxman', 0.04), ('hawla', 0.04), ('ianchi', 0.04), ('leinberg', 0.04), ('reund', 0.04), ('soda', 0.04), ('sublogarithmic', 0.04), ('timeline', 0.04), ('steps', 0.04), ('focs', 0.038), ('isotropic', 0.038), ('proof', 0.038), ('lower', 0.038), ('lipschitz', 0.037), ('best', 0.036), ('chooses', 0.035), ('minu', 0.035), ('orthant', 0.035), ('kalai', 0.035), ('mcmahan', 0.035), ('caching', 0.035), ('costs', 0.034), ('tight', 0.034), ('inef', 0.033), ('inner', 0.033), ('proceedings', 0.033), ('closing', 0.032), ('nely', 0.032), ('clinical', 0.032), ('casino', 0.032), ('constants', 0.031), ('rn', 0.03), ('random', 0.03), ('randomness', 0.03), ('game', 0.03), ('play', 0.029), ('admit', 0.028), ('recommendation', 0.028), ('uniformly', 0.028), ('expert', 0.028), ('adaptive', 0.027), ('interval', 0.027), ('nearly', 0.025), ('cn', 0.025), ('appeared', 0.024), ('prove', 0.024), ('decision', 0.024), ('gradient', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
Author: Robert D. Kleinberg
Abstract: In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of Rd , and the cost functions are continuous. In the d = 1 case, we improve on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem. 1
2 0.20734438 48 nips-2004-Convergence and No-Regret in Multiagent Learning
Author: Michael Bowling
Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). 1
3 0.17862977 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
Author: Tim K. Marks, J. C. Roddey, Javier R. Movellan, John R. Hershey
Abstract: We present a generative model and stochastic filtering algorithm for simultaneous tracking of 3D position and orientation, non-rigid motion, object texture, and background texture using a single camera. We show that the solution to this problem is formally equivalent to stochastic filtering of conditionally Gaussian processes, a problem for which well known approaches exist [3, 8]. We propose an approach based on Monte Carlo sampling of the nonlinear component of the process (object motion) and exact filtering of the object and background textures given the sampled motion. The smoothness of image sequences in time and space is exploited by using Laplace’s method to generate proposal distributions for importance sampling [7]. The resulting inference algorithm encompasses both optic flow and template-based tracking as special cases, and elucidates the conditions under which these methods are optimal. We demonstrate an application of the system to 3D non-rigid face tracking. 1 Background Recent algorithms track morphable objects by solving optic flow equations, subject to the constraint that the tracked points belong to an object whose non-rigid deformations are linear combinations of a set of basic shapes [10, 2, 11]. These algorithms require precise initialization of the object pose and tend to drift out of alignment on long video sequences. We present G-flow, a generative model and stochastic filtering formulation of tracking that address the problems of initialization and error recovery in a principled manner. We define a non-rigid object by the 3D locations of n vertices. The object is a linear combination of k fixed morph bases, with coefficients c = [c1 , c2 , · · · , ck ]T . The fixed 3 × k matrix hi contains the position of the ith vertex in all k morph bases. The transformation from object-centered to image coordinates consists of a rotation, weak perspective projection, and translation. Thus xi , the 2D location of the ith vertex on the image plane, is xi = grhi c + l, (1) where r is the 3 × 3 rotation matrix, l is the 2 × 1 translation vector, and g = 1 0 0 is the 010 projection matrix. The object pose, ut , comprises both the rigid motion parameters and the morph parameters at time t: ut = {r(t), l(t), c(t)}. (2) 1.1 Optic flow Let yt represent the current image, and let xi (ut ) index the image pixel that is rendered by the ith object vertex when the object assumes pose ut . Suppose that we know ut−1 , the pose at time t − 1, and we want to find ut , the pose at time t. This problem can be solved by minimizing the following form with respect to ut : ut = argmin ˆ ut 1 2 n 2 [yt (xi (ut )) − yt−1 (xi (ut−1 ))] . (3) i=1 In the special case in which the xi (ut ) are neighboring points that move with the same 2D displacement, this reduces to the standard Lucas-Kanade optic flow algorithm [9, 1]. Recent work [10, 2, 11] has shown that in the general case, this optimization problem can be solved efficiently using the Gauss-Newton method. We will take advantage of this fact to develop an efficient stochastic inference algorithm within the framework of G-flow. Notational conventions Unless otherwise stated, capital letters are used for random variables, small letters for specific values taken by random variables, and Greek letters for fixed model parameters. Subscripted colons indicate sequences: e.g., X1:t = X1 · · · Xt . The term In stands for the n × n identity matrix, E for expected value, V ar for the covariance matrix, and V ar−1 for the inverse of the covariance matrix (precision matrix). 2 The Generative Model for G-Flow Figure 1: Left: a(Ut ) determines which texel (color at a vertex of the object model or a pixel of the background model) is responsible for rendering each image pixel. Right: G-flow video generation model: At time t, the object’s 3D pose, Ut , is used to project the object texture, Vt , into 2D. This projection is combined with the background texture, Bt , to generate the observed image, Yt . We model the image sequence Y as a stochastic process generated by three hidden causes, U , V , and B, as shown in the graphical model (Figure 1, right). The m × 1 random vector Yt represents the m-pixel image at time t. The n × 1 random vector Vt and the m × 1 random vector Bt represent the n-texel object texture and the m-texel background texture, respectively. As illustrated in Figure 1, left, the object pose, Ut , determines onto which image pixels the object and background texels project at time t. This is formulated using the projection function a(Ut ). For a given pose, ut , the projection a(ut ) is a block matrix, def a(ut ) = av (ut ) ab (ut ) . Here av (ut ), the object projection function, is an m × n matrix of 0s and 1s that tells onto which image pixel each object vertex projects; e.g., a 1 at row j, column i it means that the ith object point projects onto image pixel j. Matrix ab plays the same role for background pixels. Assuming the foreground mapping is one-toone, we let ab = Im −av (ut )av (ut )T , expressing the simple occlusion constraint that every image pixel is rendered by object or background, but not both. In the G-flow generative model: Vt Yt = a(Ut ) + Wt Wt ∼ N (0, σw Im ), σw > 0 Bt (4) Ut ∼ p(ut | ut−1 ) v v Vt = Vt−1 + Zt−1 Zt−1 ∼ N (0, Ψv ), Ψv is diagonal b b Bt = Bt−1 + Zt−1 Zt−1 ∼ N (0, Ψb ), Ψb is diagonal where p(ut | ut−1 ) is the pose transition distribution, and Z v , Z b , W are independent of each other, of the initial conditions, and over time. The form of the pose distribution is left unspecified since the algorithm proposed here does not require the pose distribution or the pose dynamics to be Gaussian. For the initial conditions, we require that the variance of V1 and the variance of B1 are both diagonal. Non-rigid 3D tracking is a difficult nonlinear filtering problem because changing the pose has a nonlinear effect on the image pixels. Fortunately, the problem has a rich structure that we can exploit: under the G-flow model, video generation is a conditionally Gaussian process [3, 6, 4, 5]. If the specific values taken by the pose sequence, u1:t , were known, then the texture processes, V and B, and the image process, Y , would be jointly Gaussian. This suggests the following scheme: we could use particle filtering to obtain a distribution of pose experts (each expert corresponds to a highly probable sample of pose, u1:t ). For each expert we could then use Kalman filtering equations to infer the posterior distribution of texture given the observed images. This method is known in the statistics community as a Monte Carlo filtering solution for conditionally Gaussian processes [3, 4], and in the machine learning community as Rao-Blackwellized particle filtering [6, 5]. We found that in addition to Rao-Blackwellization, it was also critical to use Laplace’s method to generate the proposal distributions for importance sampling [7]. In the context of G-flow, we accomplished this by performing an optic flow-like optimization, using an efficient algorithm similar to those in [10, 2]. 3 Inference Our goal is to find an expression for the filtering distribution, p(ut , vt , bt | y1:t ). Using the law of total probability, we have the following equation for the filtering distribution: p(ut , vt , bt | y1:t ) = p(ut , vt , bt | u1:t−1 , y1:t ) p(u1:t−1 | y1:t ) du1:t−1 Opinion of expert (5) Credibility of expert We can think of the integral in (5) as a sum over a distribution of experts, where each expert corresponds to a single pose history, u1:t−1 . Based on its hypothesis about pose history, each expert has an opinion about the current pose of the object, Ut , and the texture maps of the object and background, Vt and Bt . Each expert also has a credibility, a scalar that measures how well the expert’s opinion matches the observed image yt . Thus, (5) can be interpreted as follows: The filtering distribution at time t is obtained by integrating over the entire ensemble of experts the opinion of each expert weighted by that expert’s credibility. The opinion distribution of expert u1:t−1 can be factorized into the expert’s opinion about the pose Ut times the conditional distribution of texture Vt , Bt given pose: p(ut , vt , bt | u1:t−1 , y1:t ) = p(ut | u1:t−1 , y1:t ) p(vt , bt | u1:t , y1:t ) (6) Opinion of expert Pose Opinion Texture Opinion given pose The rest of this section explains how we evaluate each term in (5) and (6). We cover the distribution of texture given pose in 3.1, pose opinion in 3.2, and credibility in 3.3. 3.1 Texture opinion given pose The distribution of Vt and Bt given the pose history u1:t is Gaussian with mean and covariance that can be obtained using the Kalman filter estimation equations: −1 V ar−1 (Vt , Bt | u1:t , y1:t ) = V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw a(ut ) E(Vt , Bt | u1:t , y1:t ) = V ar(Vt , Bt | u1:t , y1:t ) −1 × V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 )E(Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw yt (7) (8) This requires p(Vt , Bt |u1:t−1 , y1:t−1 ), which we get from the Kalman prediction equations: E(Vt , Bt | u1:t−1 , y1:t−1 ) = E(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) V ar(Vt , Bt | u1:t−1 , y1:t−1 ) = V ar(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) + (9) Ψv 0 0 Ψb (10) In (9), the expected value E(Vt , Bt | u1:t−1 , y1:t−1 ) consists of texture maps (templates) for the object and background. In (10), V ar(Vt , Bt | u1:t−1 , y1:t−1 ) represents the degree of uncertainty about each texel in these texture maps. Since this is a diagonal matrix, we can refer to the mean and variance of each texel individually. For the ith texel in the object texture map, we use the following notation: µv (i) t v σt (i) def = ith element of E(Vt | u1:t−1 , y1:t−1 ) def = (i, i)th element of V ar(Vt | u1:t−1 , y1:t−1 ) b Similarly, define µb (j) and σt (j) as the mean and variance of the jth texel in the backt ground texture map. (This notation leaves the dependency on u1:t−1 and y1:t−1 implicit.) 3.2 Pose opinion Based on its current texture template (derived from the history of poses and images up to time t−1) and the new image yt , each expert u1:t−1 has a pose opinion, p(ut |u1:t−1 , y1:t ), a probability distribution representing that expert’s beliefs about the pose at time t. Since the effect of ut on the likelihood function is nonlinear, we will not attempt to find an analytical solution for the pose opinion distribution. However, due to the spatio-temporal smoothness of video signals, it is possible to estimate the peak and variance of an expert’s pose opinion. 3.2.1 Estimating the peak of an expert’s pose opinion We want to estimate ut (u1:t−1 ), the value of ut that maximizes the pose opinion. Since ˆ p(ut | u1:t−1 , y1:t ) = p(y1:t−1 | u1:t−1 ) p(ut | ut−1 ) p(yt | u1:t , y1:t−1 ), p(y1:t | u1:t−1 ) (11) def ut (u1:t−1 ) = argmax p(ut | u1:t−1 , y1:t ) = argmax p(ut | ut−1 ) p(yt | u1:t , y1:t−1 ). ˆ ut ut (12) We now need an expression for the final term in (12), the predictive distribution p(yt | u1:t , y1:t−1 ). By integrating out the hidden texture variables from p(yt , vt , bt | u1:t , y1:t−1 ), and using the conditional independence relationships defined by the graphical model (Figure 1, right), we can derive: 1 m log p(yt | u1:t , y1:t−1 ) = − log 2π − log |V ar(Yt | u1:t , y1:t−1 )| 2 2 n v 2 1 (yt (xi (ut )) − µt (i)) 1 (yt (j) − µb (j))2 t − − , (13) v (i) + σ b 2 i=1 σt 2 σt (j) + σw w j∈X (ut ) where xi (ut ) is the image pixel rendered by the ith object vertex when the object assumes pose ut , and X (ut ) is the set of all image pixels rendered by the object under pose ut . Combining (12) and (13), we can derive ut (u1:t−1 ) = argmin − log p(ut | ut−1 ) ˆ (14) ut + 1 2 n i=1 [yt (xi (ut )) − µv (i)]2 [yt (xi (ut )) − µb (xi (ut ))]2 t t b − − log[σt (xi (ut )) + σw ] v b σt (i) + σw σt (xi (ut )) + σw Foreground term Background terms Note the similarity between (14) and constrained optic flow (3). For example, focus on the foreground term in (14) and ignore the weights in the denominator. The previous image yt−1 from (3) has been replaced by µv (·), the estimated object texture based on the images t and poses up to time t − 1. As in optic flow, we can find the pose estimate ut (u1:t−1 ) ˆ efficiently using the Gauss-Newton method. 3.2.2 Estimating the distribution of an expert’s pose opinion We estimate the distribution of an expert’s pose opinion using a combination of Laplace’s method and importance sampling. Suppose at time t − 1 we are given a sample of experts (d) (d) indexed by d, each endowed with a pose sequence u1:t−1 , a weight wt−1 , and the means and variances of Gaussian distributions for object and background texture. For each expert (d) (d) u1:t−1 , we use (14) to compute ut , the peak of the pose distribution at time t according ˆ (d) to that expert. Define σt as the inverse Hessian matrix of (14) at this peak, the Laplace ˆ estimate of the covariance matrix of the expert’s opinion. We then generate a set of s (d,e) (d) independent samples {ut : e = 1, · · · , s} from a Gaussian distribution with mean ut ˆ (d) (d) (d) and variance proportional to σt , g(·|ˆt , αˆt ), where the parameter α > 0 determines ˆ u σ the sharpness of the sampling distribution. (Note that letting α → 0 would be equivalent to (d,e) (d) simply setting the new pose equal to the peak of the pose opinion, ut = ut .) To find ˆ the parameters of this Gaussian proposal distribution, we use the Gauss-Newton method, ignoring the second of the two background terms in (14). (This term is not ignored in the importance sampling step.) To refine our estimate of the pose opinion we use importance sampling. We assign each sample from the proposal distribution an importance weight wt (d, e) that is proportional to the ratio between the posterior distribution and the proposal distribution: s (d) p(ut | u1:t−1 , y1:t ) = ˆ (d,e) δ(ut − ut ) wt (d, e) s f =1 wt (d, f ) (15) e=1 (d,e) (d) (d) (d,e) p(ut | ut−1 )p(yt | u1:t−1 , ut , y1:t−1 ) wt (d, e) = (16) (d,e) (d) (d) g(ut | ut , αˆt ) ˆ σ (d,e) (d) The numerator of (16) is proportional to p(ut |u1:t−1 , y1:t ) by (12), and the denominator of (16) is the sampling distribution. 3.3 Estimating an expert’s credibility (d) The credibility of the dth expert, p(u1:t−1 | y1:t ), is proportional to the product of a prior term and a likelihood term: (d) (d) p(u1:t−1 | y1:t−1 )p(yt | u1:t−1 , y1:t−1 ) (d) p(u1:t−1 | y1:t ) = . (17) p(yt | y1:t−1 ) Regarding the likelihood, p(yt |u1:t−1 , y1:t−1 ) = p(yt , ut |u1:t−1 , y1:t−1 )dut = p(yt |u1:t , y1:t−1 )p(ut |ut−1 )dut (18) (d,e) We already generated a set of samples {ut : e = 1, · · · , s} that estimate the pose opin(d) ion of the dth expert, p(ut | u1:t−1 , y1:t ). We can now use these samples to estimate the likelihood for the dth expert: (d) p(yt | u1:t−1 , y1:t−1 ) = (d) (d) p(yt | u1:t−1 , ut , y1:t−1 )p(ut | ut−1 )dut (19) (d) (d) (d) (d) = p(yt | u1:t−1 , ut , y1:t−1 )g(ut | ut , αˆt ) ˆ σ 3.4 p(ut | ut−1 ) s e=1 dut ≈ wt (d, e) s Updating the filtering distribution g(ut | (d) (d) ut , αˆt ) ˆ σ Once we have calculated the opinion and credibility of each expert u1:t−1 , we evaluate the integral in (5) as a weighted sum over experts. The credibilities of all of the experts are normalized to sum to 1. New experts u1:t (children) are created from the old experts u1:t−1 (parents) by appending a pose ut to the parent’s history of poses u1:t−1 . Every expert in the new generation is created as follows: One parent is chosen to sire the child. The probability of being chosen is proportional to the parent’s credibility. The child’s value of ut is chosen at random from its parent’s pose opinion (the weighted samples described in Section 3.2.2). 4 Relation to Optic Flow and Template Matching In basic template-matching, the same time-invariant texture map is used to track every frame in the video sequence. Optic flow can be thought of as template-matching with a template that is completely reset at each frame for use in the subsequent frame. In most cases, optimal inference under G-flow involves a combination of optic flow-based and template-based tracking, in which the texture template gradually evolves as new images are presented. Pure optic flow and template-matching emerge as special cases. Optic Flow as a Special Case Suppose that the pose transition probability p(ut | ut−1 ) is uninformative, that the background is uninformative, that every texel in the initial object texture map has equal variance, V ar(V1 ) = κIn , and that the texture transition uncertainty is very high, Ψv → diag(∞). Using (7), (8), and (10), it follows that: µv (i) = [av (ut−1 )]T yt−1 = yt−1 (xi (ut−1 )) , t (20) i.e., the object texture map at time t is determined by the pixels from image yt−1 that according to pose ut−1 were rendered by the object. As a result, (14) reduces to: ut (u1:t−1 ) = argmin ˆ ut 1 2 n yt (xi (ut )) − yt−1 (xi (ut−1 )) 2 (21) i=1 which is identical to (3). Thus constrained optic flow [10, 2, 11] is simply a special case of optimal inference under G-flow, with a single expert and with sampling parameter α → 0. The key assumption that Ψv → diag(∞) means that the object’s texture is very different in adjacent frames. However, optic flow is typically applied in situations in which the object’s texture in adjacent frames is similar. The optimal solution in such situations calls not for optic flow, but for a texture map that integrates information across multiple frames. Template Matching as a Special Case Suppose the initial texture map is known precisely, V ar(V1 ) = 0, and the texture transition uncertainty is very low, Ψv → 0. By (7), (8), and (10), it follows that µv (i) = µv (i) = µv (i), i.e., the texture map does not change t t−1 1 over time, but remains fixed at its initial value (it is a texture template). Then (14) becomes: n yt (xi (ut )) − µv (i) 1 ut (u1:t−1 ) = argmin ˆ ut 2 (22) i=1 where µv (i) is the ith texel of the fixed texture template. This is the error function mini1 mized by standard template-matching algorithms. The key assumption that Ψv → 0 means the object’s texture is constant from each frame to the next, which is rarely true in real data. G-flow provides a principled way to relax this unrealistic assumption of template methods. General Case In general, if the background is uninformative, then minimizing (14) results in a weighted combination of optic flow and template matching, with the weight of each approach depending on the current level of certainty about the object template. In addition, when there is useful information in the background, G-flow infers a model of the background which is used to improve tracking. Figure 2: G-flow tracking an outdoor video. Results are shown for frames 1, 81, and 620. 5 Simulations We collected a video (30 frames/sec) of a subject in an outdoor setting who made a variety of facial expressions while moving her head. A later motion-capture session was used to create a 3D morphable model of her face, consisting of a set of 5 morph bases (k = 5). Twenty experts were initialized randomly near the correct pose on frame 1 of the video and propagated using G-flow inference (assuming an uninformative background). See http://mplab.ucsd.edu for video. Figure 2 shows the distribution of experts for three frames. In each frame, every expert has a hypothesis about the pose (translation, rotation, scale, and morph coefficients). The 38 points in the model are projected into the image according to each expert’s pose, yielding 760 red dots in each frame. In each frame, the mean of the experts gives a single hypothesis about the 3D non-rigid deformation of the face (lower right) as well as the rigid pose of the face (rotated 3D axes, lower left). Notice G-flow’s ability to recover from error: bad initial hypotheses are weeded out, leaving only good hypotheses. To compare G-flow’s performance versus deterministic constrained optic flow algorithms such as [10, 2, 11] , we used both G-flow and the method from [2] to track the same video sequence. We ran each tracker several times, introducing small errors in the starting pose. Figure 3: Average error over time for G-flow (green) and for deterministic optic flow [2] (blue). Results were averaged over 16 runs (deterministic algorithm) or 4 runs (G-flow) and smoothed. As ground truth, the 2D locations of 6 points were hand-labeled in every 20th frame. The error at every 20th frame was calculated as the distance from these labeled locations to the inferred (tracked) locations, averaged across several runs. Figure 3 compares this tracking error as a function of time for the deterministic constrained optic flow algorithm and for a 20-expert version of the G-flow tracking algorithm. Notice that the deterministic system has a tendency to drift (increase in error) over time, whereas G-flow can recover from drift. Acknowledgments Tim K. Marks was supported by NSF grant IIS-0223052 and NSF grant DGE-0333451 to GWC. John Hershey was supported by the UCDIMI grant D00-10084. J. Cooper Roddey was supported by the Swartz Foundation. Javier R. Movellan was supported by NSF grants IIS-0086107, IIS-0220141, and IIS-0223052, and by the UCDIMI grant D00-10084. References [1] Simon Baker and Iain Matthews. Lucas-kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2002. [2] M. Brand. Flexible flow for 3D nonrigid tracking and shape recovery. In CVPR, volume 1, pages 315–322, 2001. [3] H. Chen, P. Kumar, and J. van Schuppen. On Kalman filtering for conditionally gaussian systems with random matrices. Syst. Contr. Lett., 13:397–404, 1989. [4] R. Chen and J. Liu. Mixture Kalman filters. J. R. Statist. Soc. B, 62:493–508, 2000. [5] A. Doucet and C. Andrieu. Particle filtering for partially observed gaussian state space models. J. R. Statist. Soc. B, 64:827–838, 2002. [6] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-blackwellised particle filtering for dynamic bayesian networks. In 16th Conference on Uncertainty in AI, pages 176–183, 2000. [7] A. Doucet, S. J. Godsill, and C. Andrieu. On sequential monte carlo sampling methods for bayesian filtering. Statistics and Computing, 10:197–208, 2000. [8] Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000. [9] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981. [10] L. Torresani, D. Yang, G. Alexander, and C. Bregler. Tracking and modeling non-rigid objects with rank constraints. In CVPR, pages 493–500, 2001. [11] Lorenzo Torresani, Aaron Hertzmann, and Christoph Bregler. Learning non-rigid 3d shape from 2d motion. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
4 0.1224085 64 nips-2004-Experts in a Markov Decision Process
Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour
Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1
5 0.10034269 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
Author: Nicolò Cesa-bianchi, Claudio Gentile, Luca Zaniboni
Abstract: We provide a worst-case analysis of selective sampling algorithms for learning linear threshold functions. The algorithms considered in this paper are Perceptron-like algorithms, i.e., algorithms which can be efficiently run in any reproducing kernel Hilbert space. Our algorithms exploit a simple margin-based randomized rule to decide whether to query the current label. We obtain selective sampling algorithms achieving on average the same bounds as those proven for their deterministic counterparts, but using much fewer labels. We complement our theoretical findings with an empirical comparison on two text categorization tasks. The outcome of these experiments is largely predicted by our theoretical results: Our selective sampling algorithms tend to perform as good as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels. 1
6 0.097662807 36 nips-2004-Class-size Independent Generalization Analsysis of Some Discriminative Multi-Category Classification
7 0.093582958 26 nips-2004-At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks
8 0.080852702 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
9 0.07761877 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
10 0.075770877 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
11 0.073376685 138 nips-2004-Online Bounds for Bayesian Algorithms
12 0.070573874 200 nips-2004-Using Random Forests in the Structured Language Model
13 0.069144107 116 nips-2004-Message Errors in Belief Propagation
14 0.067418896 175 nips-2004-Stable adaptive control with online learning
15 0.066604108 70 nips-2004-Following Curved Regularized Optimization Solution Paths
16 0.064573318 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
17 0.064248644 23 nips-2004-Analysis of a greedy active learning strategy
18 0.057899363 71 nips-2004-Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices
19 0.057675995 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
20 0.055370741 100 nips-2004-Learning Preferences for Multiclass Problems
topicId topicWeight
[(0, -0.169), (1, 0.001), (2, 0.223), (3, 0.035), (4, 0.029), (5, -0.045), (6, -0.021), (7, 0.046), (8, 0.028), (9, -0.037), (10, -0.062), (11, -0.04), (12, 0.02), (13, -0.058), (14, -0.157), (15, 0.023), (16, 0.065), (17, 0.12), (18, -0.113), (19, -0.143), (20, -0.022), (21, -0.179), (22, 0.004), (23, -0.048), (24, 0.21), (25, 0.005), (26, -0.001), (27, 0.15), (28, -0.09), (29, 0.082), (30, -0.118), (31, 0.024), (32, -0.068), (33, -0.067), (34, -0.028), (35, 0.075), (36, 0.109), (37, 0.071), (38, 0.123), (39, 0.003), (40, -0.013), (41, 0.17), (42, -0.004), (43, 0.106), (44, -0.099), (45, -0.05), (46, 0.045), (47, 0.017), (48, -0.034), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.95259118 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
Author: Robert D. Kleinberg
Abstract: In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of Rd , and the cost functions are continuous. In the d = 1 case, we improve on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem. 1
2 0.56587666 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
Author: Tim K. Marks, J. C. Roddey, Javier R. Movellan, John R. Hershey
Abstract: We present a generative model and stochastic filtering algorithm for simultaneous tracking of 3D position and orientation, non-rigid motion, object texture, and background texture using a single camera. We show that the solution to this problem is formally equivalent to stochastic filtering of conditionally Gaussian processes, a problem for which well known approaches exist [3, 8]. We propose an approach based on Monte Carlo sampling of the nonlinear component of the process (object motion) and exact filtering of the object and background textures given the sampled motion. The smoothness of image sequences in time and space is exploited by using Laplace’s method to generate proposal distributions for importance sampling [7]. The resulting inference algorithm encompasses both optic flow and template-based tracking as special cases, and elucidates the conditions under which these methods are optimal. We demonstrate an application of the system to 3D non-rigid face tracking. 1 Background Recent algorithms track morphable objects by solving optic flow equations, subject to the constraint that the tracked points belong to an object whose non-rigid deformations are linear combinations of a set of basic shapes [10, 2, 11]. These algorithms require precise initialization of the object pose and tend to drift out of alignment on long video sequences. We present G-flow, a generative model and stochastic filtering formulation of tracking that address the problems of initialization and error recovery in a principled manner. We define a non-rigid object by the 3D locations of n vertices. The object is a linear combination of k fixed morph bases, with coefficients c = [c1 , c2 , · · · , ck ]T . The fixed 3 × k matrix hi contains the position of the ith vertex in all k morph bases. The transformation from object-centered to image coordinates consists of a rotation, weak perspective projection, and translation. Thus xi , the 2D location of the ith vertex on the image plane, is xi = grhi c + l, (1) where r is the 3 × 3 rotation matrix, l is the 2 × 1 translation vector, and g = 1 0 0 is the 010 projection matrix. The object pose, ut , comprises both the rigid motion parameters and the morph parameters at time t: ut = {r(t), l(t), c(t)}. (2) 1.1 Optic flow Let yt represent the current image, and let xi (ut ) index the image pixel that is rendered by the ith object vertex when the object assumes pose ut . Suppose that we know ut−1 , the pose at time t − 1, and we want to find ut , the pose at time t. This problem can be solved by minimizing the following form with respect to ut : ut = argmin ˆ ut 1 2 n 2 [yt (xi (ut )) − yt−1 (xi (ut−1 ))] . (3) i=1 In the special case in which the xi (ut ) are neighboring points that move with the same 2D displacement, this reduces to the standard Lucas-Kanade optic flow algorithm [9, 1]. Recent work [10, 2, 11] has shown that in the general case, this optimization problem can be solved efficiently using the Gauss-Newton method. We will take advantage of this fact to develop an efficient stochastic inference algorithm within the framework of G-flow. Notational conventions Unless otherwise stated, capital letters are used for random variables, small letters for specific values taken by random variables, and Greek letters for fixed model parameters. Subscripted colons indicate sequences: e.g., X1:t = X1 · · · Xt . The term In stands for the n × n identity matrix, E for expected value, V ar for the covariance matrix, and V ar−1 for the inverse of the covariance matrix (precision matrix). 2 The Generative Model for G-Flow Figure 1: Left: a(Ut ) determines which texel (color at a vertex of the object model or a pixel of the background model) is responsible for rendering each image pixel. Right: G-flow video generation model: At time t, the object’s 3D pose, Ut , is used to project the object texture, Vt , into 2D. This projection is combined with the background texture, Bt , to generate the observed image, Yt . We model the image sequence Y as a stochastic process generated by three hidden causes, U , V , and B, as shown in the graphical model (Figure 1, right). The m × 1 random vector Yt represents the m-pixel image at time t. The n × 1 random vector Vt and the m × 1 random vector Bt represent the n-texel object texture and the m-texel background texture, respectively. As illustrated in Figure 1, left, the object pose, Ut , determines onto which image pixels the object and background texels project at time t. This is formulated using the projection function a(Ut ). For a given pose, ut , the projection a(ut ) is a block matrix, def a(ut ) = av (ut ) ab (ut ) . Here av (ut ), the object projection function, is an m × n matrix of 0s and 1s that tells onto which image pixel each object vertex projects; e.g., a 1 at row j, column i it means that the ith object point projects onto image pixel j. Matrix ab plays the same role for background pixels. Assuming the foreground mapping is one-toone, we let ab = Im −av (ut )av (ut )T , expressing the simple occlusion constraint that every image pixel is rendered by object or background, but not both. In the G-flow generative model: Vt Yt = a(Ut ) + Wt Wt ∼ N (0, σw Im ), σw > 0 Bt (4) Ut ∼ p(ut | ut−1 ) v v Vt = Vt−1 + Zt−1 Zt−1 ∼ N (0, Ψv ), Ψv is diagonal b b Bt = Bt−1 + Zt−1 Zt−1 ∼ N (0, Ψb ), Ψb is diagonal where p(ut | ut−1 ) is the pose transition distribution, and Z v , Z b , W are independent of each other, of the initial conditions, and over time. The form of the pose distribution is left unspecified since the algorithm proposed here does not require the pose distribution or the pose dynamics to be Gaussian. For the initial conditions, we require that the variance of V1 and the variance of B1 are both diagonal. Non-rigid 3D tracking is a difficult nonlinear filtering problem because changing the pose has a nonlinear effect on the image pixels. Fortunately, the problem has a rich structure that we can exploit: under the G-flow model, video generation is a conditionally Gaussian process [3, 6, 4, 5]. If the specific values taken by the pose sequence, u1:t , were known, then the texture processes, V and B, and the image process, Y , would be jointly Gaussian. This suggests the following scheme: we could use particle filtering to obtain a distribution of pose experts (each expert corresponds to a highly probable sample of pose, u1:t ). For each expert we could then use Kalman filtering equations to infer the posterior distribution of texture given the observed images. This method is known in the statistics community as a Monte Carlo filtering solution for conditionally Gaussian processes [3, 4], and in the machine learning community as Rao-Blackwellized particle filtering [6, 5]. We found that in addition to Rao-Blackwellization, it was also critical to use Laplace’s method to generate the proposal distributions for importance sampling [7]. In the context of G-flow, we accomplished this by performing an optic flow-like optimization, using an efficient algorithm similar to those in [10, 2]. 3 Inference Our goal is to find an expression for the filtering distribution, p(ut , vt , bt | y1:t ). Using the law of total probability, we have the following equation for the filtering distribution: p(ut , vt , bt | y1:t ) = p(ut , vt , bt | u1:t−1 , y1:t ) p(u1:t−1 | y1:t ) du1:t−1 Opinion of expert (5) Credibility of expert We can think of the integral in (5) as a sum over a distribution of experts, where each expert corresponds to a single pose history, u1:t−1 . Based on its hypothesis about pose history, each expert has an opinion about the current pose of the object, Ut , and the texture maps of the object and background, Vt and Bt . Each expert also has a credibility, a scalar that measures how well the expert’s opinion matches the observed image yt . Thus, (5) can be interpreted as follows: The filtering distribution at time t is obtained by integrating over the entire ensemble of experts the opinion of each expert weighted by that expert’s credibility. The opinion distribution of expert u1:t−1 can be factorized into the expert’s opinion about the pose Ut times the conditional distribution of texture Vt , Bt given pose: p(ut , vt , bt | u1:t−1 , y1:t ) = p(ut | u1:t−1 , y1:t ) p(vt , bt | u1:t , y1:t ) (6) Opinion of expert Pose Opinion Texture Opinion given pose The rest of this section explains how we evaluate each term in (5) and (6). We cover the distribution of texture given pose in 3.1, pose opinion in 3.2, and credibility in 3.3. 3.1 Texture opinion given pose The distribution of Vt and Bt given the pose history u1:t is Gaussian with mean and covariance that can be obtained using the Kalman filter estimation equations: −1 V ar−1 (Vt , Bt | u1:t , y1:t ) = V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw a(ut ) E(Vt , Bt | u1:t , y1:t ) = V ar(Vt , Bt | u1:t , y1:t ) −1 × V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 )E(Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw yt (7) (8) This requires p(Vt , Bt |u1:t−1 , y1:t−1 ), which we get from the Kalman prediction equations: E(Vt , Bt | u1:t−1 , y1:t−1 ) = E(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) V ar(Vt , Bt | u1:t−1 , y1:t−1 ) = V ar(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) + (9) Ψv 0 0 Ψb (10) In (9), the expected value E(Vt , Bt | u1:t−1 , y1:t−1 ) consists of texture maps (templates) for the object and background. In (10), V ar(Vt , Bt | u1:t−1 , y1:t−1 ) represents the degree of uncertainty about each texel in these texture maps. Since this is a diagonal matrix, we can refer to the mean and variance of each texel individually. For the ith texel in the object texture map, we use the following notation: µv (i) t v σt (i) def = ith element of E(Vt | u1:t−1 , y1:t−1 ) def = (i, i)th element of V ar(Vt | u1:t−1 , y1:t−1 ) b Similarly, define µb (j) and σt (j) as the mean and variance of the jth texel in the backt ground texture map. (This notation leaves the dependency on u1:t−1 and y1:t−1 implicit.) 3.2 Pose opinion Based on its current texture template (derived from the history of poses and images up to time t−1) and the new image yt , each expert u1:t−1 has a pose opinion, p(ut |u1:t−1 , y1:t ), a probability distribution representing that expert’s beliefs about the pose at time t. Since the effect of ut on the likelihood function is nonlinear, we will not attempt to find an analytical solution for the pose opinion distribution. However, due to the spatio-temporal smoothness of video signals, it is possible to estimate the peak and variance of an expert’s pose opinion. 3.2.1 Estimating the peak of an expert’s pose opinion We want to estimate ut (u1:t−1 ), the value of ut that maximizes the pose opinion. Since ˆ p(ut | u1:t−1 , y1:t ) = p(y1:t−1 | u1:t−1 ) p(ut | ut−1 ) p(yt | u1:t , y1:t−1 ), p(y1:t | u1:t−1 ) (11) def ut (u1:t−1 ) = argmax p(ut | u1:t−1 , y1:t ) = argmax p(ut | ut−1 ) p(yt | u1:t , y1:t−1 ). ˆ ut ut (12) We now need an expression for the final term in (12), the predictive distribution p(yt | u1:t , y1:t−1 ). By integrating out the hidden texture variables from p(yt , vt , bt | u1:t , y1:t−1 ), and using the conditional independence relationships defined by the graphical model (Figure 1, right), we can derive: 1 m log p(yt | u1:t , y1:t−1 ) = − log 2π − log |V ar(Yt | u1:t , y1:t−1 )| 2 2 n v 2 1 (yt (xi (ut )) − µt (i)) 1 (yt (j) − µb (j))2 t − − , (13) v (i) + σ b 2 i=1 σt 2 σt (j) + σw w j∈X (ut ) where xi (ut ) is the image pixel rendered by the ith object vertex when the object assumes pose ut , and X (ut ) is the set of all image pixels rendered by the object under pose ut . Combining (12) and (13), we can derive ut (u1:t−1 ) = argmin − log p(ut | ut−1 ) ˆ (14) ut + 1 2 n i=1 [yt (xi (ut )) − µv (i)]2 [yt (xi (ut )) − µb (xi (ut ))]2 t t b − − log[σt (xi (ut )) + σw ] v b σt (i) + σw σt (xi (ut )) + σw Foreground term Background terms Note the similarity between (14) and constrained optic flow (3). For example, focus on the foreground term in (14) and ignore the weights in the denominator. The previous image yt−1 from (3) has been replaced by µv (·), the estimated object texture based on the images t and poses up to time t − 1. As in optic flow, we can find the pose estimate ut (u1:t−1 ) ˆ efficiently using the Gauss-Newton method. 3.2.2 Estimating the distribution of an expert’s pose opinion We estimate the distribution of an expert’s pose opinion using a combination of Laplace’s method and importance sampling. Suppose at time t − 1 we are given a sample of experts (d) (d) indexed by d, each endowed with a pose sequence u1:t−1 , a weight wt−1 , and the means and variances of Gaussian distributions for object and background texture. For each expert (d) (d) u1:t−1 , we use (14) to compute ut , the peak of the pose distribution at time t according ˆ (d) to that expert. Define σt as the inverse Hessian matrix of (14) at this peak, the Laplace ˆ estimate of the covariance matrix of the expert’s opinion. We then generate a set of s (d,e) (d) independent samples {ut : e = 1, · · · , s} from a Gaussian distribution with mean ut ˆ (d) (d) (d) and variance proportional to σt , g(·|ˆt , αˆt ), where the parameter α > 0 determines ˆ u σ the sharpness of the sampling distribution. (Note that letting α → 0 would be equivalent to (d,e) (d) simply setting the new pose equal to the peak of the pose opinion, ut = ut .) To find ˆ the parameters of this Gaussian proposal distribution, we use the Gauss-Newton method, ignoring the second of the two background terms in (14). (This term is not ignored in the importance sampling step.) To refine our estimate of the pose opinion we use importance sampling. We assign each sample from the proposal distribution an importance weight wt (d, e) that is proportional to the ratio between the posterior distribution and the proposal distribution: s (d) p(ut | u1:t−1 , y1:t ) = ˆ (d,e) δ(ut − ut ) wt (d, e) s f =1 wt (d, f ) (15) e=1 (d,e) (d) (d) (d,e) p(ut | ut−1 )p(yt | u1:t−1 , ut , y1:t−1 ) wt (d, e) = (16) (d,e) (d) (d) g(ut | ut , αˆt ) ˆ σ (d,e) (d) The numerator of (16) is proportional to p(ut |u1:t−1 , y1:t ) by (12), and the denominator of (16) is the sampling distribution. 3.3 Estimating an expert’s credibility (d) The credibility of the dth expert, p(u1:t−1 | y1:t ), is proportional to the product of a prior term and a likelihood term: (d) (d) p(u1:t−1 | y1:t−1 )p(yt | u1:t−1 , y1:t−1 ) (d) p(u1:t−1 | y1:t ) = . (17) p(yt | y1:t−1 ) Regarding the likelihood, p(yt |u1:t−1 , y1:t−1 ) = p(yt , ut |u1:t−1 , y1:t−1 )dut = p(yt |u1:t , y1:t−1 )p(ut |ut−1 )dut (18) (d,e) We already generated a set of samples {ut : e = 1, · · · , s} that estimate the pose opin(d) ion of the dth expert, p(ut | u1:t−1 , y1:t ). We can now use these samples to estimate the likelihood for the dth expert: (d) p(yt | u1:t−1 , y1:t−1 ) = (d) (d) p(yt | u1:t−1 , ut , y1:t−1 )p(ut | ut−1 )dut (19) (d) (d) (d) (d) = p(yt | u1:t−1 , ut , y1:t−1 )g(ut | ut , αˆt ) ˆ σ 3.4 p(ut | ut−1 ) s e=1 dut ≈ wt (d, e) s Updating the filtering distribution g(ut | (d) (d) ut , αˆt ) ˆ σ Once we have calculated the opinion and credibility of each expert u1:t−1 , we evaluate the integral in (5) as a weighted sum over experts. The credibilities of all of the experts are normalized to sum to 1. New experts u1:t (children) are created from the old experts u1:t−1 (parents) by appending a pose ut to the parent’s history of poses u1:t−1 . Every expert in the new generation is created as follows: One parent is chosen to sire the child. The probability of being chosen is proportional to the parent’s credibility. The child’s value of ut is chosen at random from its parent’s pose opinion (the weighted samples described in Section 3.2.2). 4 Relation to Optic Flow and Template Matching In basic template-matching, the same time-invariant texture map is used to track every frame in the video sequence. Optic flow can be thought of as template-matching with a template that is completely reset at each frame for use in the subsequent frame. In most cases, optimal inference under G-flow involves a combination of optic flow-based and template-based tracking, in which the texture template gradually evolves as new images are presented. Pure optic flow and template-matching emerge as special cases. Optic Flow as a Special Case Suppose that the pose transition probability p(ut | ut−1 ) is uninformative, that the background is uninformative, that every texel in the initial object texture map has equal variance, V ar(V1 ) = κIn , and that the texture transition uncertainty is very high, Ψv → diag(∞). Using (7), (8), and (10), it follows that: µv (i) = [av (ut−1 )]T yt−1 = yt−1 (xi (ut−1 )) , t (20) i.e., the object texture map at time t is determined by the pixels from image yt−1 that according to pose ut−1 were rendered by the object. As a result, (14) reduces to: ut (u1:t−1 ) = argmin ˆ ut 1 2 n yt (xi (ut )) − yt−1 (xi (ut−1 )) 2 (21) i=1 which is identical to (3). Thus constrained optic flow [10, 2, 11] is simply a special case of optimal inference under G-flow, with a single expert and with sampling parameter α → 0. The key assumption that Ψv → diag(∞) means that the object’s texture is very different in adjacent frames. However, optic flow is typically applied in situations in which the object’s texture in adjacent frames is similar. The optimal solution in such situations calls not for optic flow, but for a texture map that integrates information across multiple frames. Template Matching as a Special Case Suppose the initial texture map is known precisely, V ar(V1 ) = 0, and the texture transition uncertainty is very low, Ψv → 0. By (7), (8), and (10), it follows that µv (i) = µv (i) = µv (i), i.e., the texture map does not change t t−1 1 over time, but remains fixed at its initial value (it is a texture template). Then (14) becomes: n yt (xi (ut )) − µv (i) 1 ut (u1:t−1 ) = argmin ˆ ut 2 (22) i=1 where µv (i) is the ith texel of the fixed texture template. This is the error function mini1 mized by standard template-matching algorithms. The key assumption that Ψv → 0 means the object’s texture is constant from each frame to the next, which is rarely true in real data. G-flow provides a principled way to relax this unrealistic assumption of template methods. General Case In general, if the background is uninformative, then minimizing (14) results in a weighted combination of optic flow and template matching, with the weight of each approach depending on the current level of certainty about the object template. In addition, when there is useful information in the background, G-flow infers a model of the background which is used to improve tracking. Figure 2: G-flow tracking an outdoor video. Results are shown for frames 1, 81, and 620. 5 Simulations We collected a video (30 frames/sec) of a subject in an outdoor setting who made a variety of facial expressions while moving her head. A later motion-capture session was used to create a 3D morphable model of her face, consisting of a set of 5 morph bases (k = 5). Twenty experts were initialized randomly near the correct pose on frame 1 of the video and propagated using G-flow inference (assuming an uninformative background). See http://mplab.ucsd.edu for video. Figure 2 shows the distribution of experts for three frames. In each frame, every expert has a hypothesis about the pose (translation, rotation, scale, and morph coefficients). The 38 points in the model are projected into the image according to each expert’s pose, yielding 760 red dots in each frame. In each frame, the mean of the experts gives a single hypothesis about the 3D non-rigid deformation of the face (lower right) as well as the rigid pose of the face (rotated 3D axes, lower left). Notice G-flow’s ability to recover from error: bad initial hypotheses are weeded out, leaving only good hypotheses. To compare G-flow’s performance versus deterministic constrained optic flow algorithms such as [10, 2, 11] , we used both G-flow and the method from [2] to track the same video sequence. We ran each tracker several times, introducing small errors in the starting pose. Figure 3: Average error over time for G-flow (green) and for deterministic optic flow [2] (blue). Results were averaged over 16 runs (deterministic algorithm) or 4 runs (G-flow) and smoothed. As ground truth, the 2D locations of 6 points were hand-labeled in every 20th frame. The error at every 20th frame was calculated as the distance from these labeled locations to the inferred (tracked) locations, averaged across several runs. Figure 3 compares this tracking error as a function of time for the deterministic constrained optic flow algorithm and for a 20-expert version of the G-flow tracking algorithm. Notice that the deterministic system has a tendency to drift (increase in error) over time, whereas G-flow can recover from drift. Acknowledgments Tim K. Marks was supported by NSF grant IIS-0223052 and NSF grant DGE-0333451 to GWC. John Hershey was supported by the UCDIMI grant D00-10084. J. Cooper Roddey was supported by the Swartz Foundation. Javier R. Movellan was supported by NSF grants IIS-0086107, IIS-0220141, and IIS-0223052, and by the UCDIMI grant D00-10084. References [1] Simon Baker and Iain Matthews. Lucas-kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2002. [2] M. Brand. Flexible flow for 3D nonrigid tracking and shape recovery. In CVPR, volume 1, pages 315–322, 2001. [3] H. Chen, P. Kumar, and J. van Schuppen. On Kalman filtering for conditionally gaussian systems with random matrices. Syst. Contr. Lett., 13:397–404, 1989. [4] R. Chen and J. Liu. Mixture Kalman filters. J. R. Statist. Soc. B, 62:493–508, 2000. [5] A. Doucet and C. Andrieu. Particle filtering for partially observed gaussian state space models. J. R. Statist. Soc. B, 64:827–838, 2002. [6] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-blackwellised particle filtering for dynamic bayesian networks. In 16th Conference on Uncertainty in AI, pages 176–183, 2000. [7] A. Doucet, S. J. Godsill, and C. Andrieu. On sequential monte carlo sampling methods for bayesian filtering. Statistics and Computing, 10:197–208, 2000. [8] Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000. [9] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981. [10] L. Torresani, D. Yang, G. Alexander, and C. Bregler. Tracking and modeling non-rigid objects with rank constraints. In CVPR, pages 493–500, 2001. [11] Lorenzo Torresani, Aaron Hertzmann, and Christoph Bregler. Learning non-rigid 3d shape from 2d motion. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
3 0.55000931 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
Author: Daniela D. Farias, Nimrod Megiddo
Abstract: A reactive environment is one that responds to the actions of an agent rather than evolving obliviously. In reactive environments, experts algorithms must balance exploration and exploitation of experts more carefully than in oblivious ones. In addition, a more subtle definition of a learnable value of an expert is required. A general exploration-exploitation experts method is presented along with a proper definition of value. The method is shown to asymptotically perform as well as the best available expert. Several variants are analyzed from the viewpoint of the exploration-exploitation tradeoff, including explore-then-exploit, polynomially vanishing exploration, constant-frequency exploration, and constant-size exploration phases. Complexity and performance bounds are proven. 1
4 0.53939903 48 nips-2004-Convergence and No-Regret in Multiagent Learning
Author: Michael Bowling
Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). 1
5 0.45907554 36 nips-2004-Class-size Independent Generalization Analsysis of Some Discriminative Multi-Category Classification
Author: Tong Zhang
Abstract: We consider the problem of deriving class-size independent generalization bounds for some regularized discriminative multi-category classification methods. In particular, we obtain an expected generalization bound for a standard formulation of multi-category support vector machines. Based on the theoretical result, we argue that the formulation over-penalizes misclassification error, which in theory may lead to poor generalization performance. A remedy, based on a generalization of multi-category logistic regression (conditional maximum entropy), is then proposed, and its theoretical properties are examined. 1
6 0.44411993 129 nips-2004-New Criteria and a New Algorithm for Learning in Multi-Agent Systems
7 0.44057614 64 nips-2004-Experts in a Markov Decision Process
8 0.43577248 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
9 0.34502298 57 nips-2004-Economic Properties of Social Networks
10 0.33188266 171 nips-2004-Solitaire: Man Versus Machine
11 0.32738709 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
12 0.31747836 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
13 0.31580484 26 nips-2004-At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks
14 0.31288603 138 nips-2004-Online Bounds for Bayesian Algorithms
15 0.30812898 23 nips-2004-Analysis of a greedy active learning strategy
16 0.28368244 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem
17 0.28183219 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
18 0.27977973 100 nips-2004-Learning Preferences for Multiclass Problems
19 0.26785901 185 nips-2004-The Convergence of Contrastive Divergences
20 0.26561669 71 nips-2004-Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices
topicId topicWeight
[(13, 0.091), (15, 0.065), (26, 0.044), (31, 0.017), (33, 0.633), (35, 0.02), (39, 0.016), (50, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.9971416 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
Author: Robert D. Kleinberg
Abstract: In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of Rd , and the cost functions are continuous. In the d = 1 case, we improve on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem. 1
2 0.99625957 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging
Author: Vladimir Koltchinskii, Manel Martínez-ramón, Stefan Posse
Abstract: We study a method of optimal data-driven aggregation of classifiers in a convex combination and establish tight upper bounds on its excess risk with respect to a convex loss function under the assumption that the solution of optimal aggregation problem is sparse. We use a boosting type algorithm of optimal aggregation to develop aggregate classifiers of activation patterns in fMRI based on locally trained SVM classifiers. The aggregation coefficients are then used to design a ”boosting map” of the brain needed to identify the regions with most significant impact on classification. 1
3 0.99502051 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process
Author: Tanzeem Choudhury, Sumit Basu
Abstract: In this work, we quantitatively investigate the ways in which a given person influences the joint turn-taking behavior in a conversation. After collecting an auditory database of social interactions among a group of twenty-three people via wearable sensors (66 hours of data each over two weeks), we apply speech and conversation detection methods to the auditory streams. These methods automatically locate the conversations, determine their participants, and mark which participant was speaking when. We then model the joint turn-taking behavior as a Mixed-Memory Markov Model [1] that combines the statistics of the individual subjects' self-transitions and the partners ' cross-transitions. The mixture parameters in this model describe how much each person 's individual behavior contributes to the joint turn-taking behavior of the pair. By estimating these parameters, we thus estimate how much influence each participant has in determining the joint turntaking behavior. We show how this measure correlates significantly with betweenness centrality [2], an independent measure of an individual's importance in a social network. This result suggests that our estimate of conversational influence is predictive of social influence. 1
4 0.99490404 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
Author: Ofer Shai, Brendan J. Frey, Quaid D. Morris, Qun Pan, Christine Misquitta, Benjamin J. Blencowe
Abstract: Alternative splicing (AS) is an important and frequent step in mammalian gene expression that allows a single gene to specify multiple products, and is crucial for the regulation of fundamental biological processes. The extent of AS regulation, and the mechanisms involved, are not well understood. We have developed a custom DNA microarray platform for surveying AS levels on a large scale. We present here a generative model for the AS Array Platform (GenASAP) and demonstrate its utility for quantifying AS levels in different mouse tissues. Learning is performed using a variational expectation maximization algorithm, and the parameters are shown to correctly capture expected AS trends. A comparison of the results obtained with a well-established but low through-put experimental method demonstrate that AS levels obtained from GenASAP are highly predictive of AS levels in mammalian tissues. 1 Biological diversity through alternative splicing Current estimates place the number of genes in the human genome at approximately 30,000, which is a surprisingly small number when one considers that the genome of yeast, a singlecelled organism, has 6,000 genes. The number of genes alone cannot account for the complexity and cell specialization exhibited by higher eukaryotes (i.e. mammals, plants, etc.). Some of that added complexity can be achieved through the use of alternative splicing, whereby a single gene can be used to code for a multitude of products. Genes are segments of the double stranded DNA that contain the information required by the cell for protein synthesis. That information is coded using an alphabet of 4 (A, C, G, and T), corresponding to the four nucleotides that make up the DNA. In what is known as the central dogma of molecular biology, DNA is transcribed to RNA, which in turn is translated into proteins. Messenger RNA (mRNA) is synthesized in the nucleus of the cell and carries the genomic information to the ribosome. In eukaryotes, genes are generally comprised of both exons, which contain the information needed by the cell to synthesize proteins, and introns, sometimes referred to as spacer DNA, which are spliced out of the pre-mRNA to create mature mRNA. An estimated 35%-75% of human genes [1] can be C1 (a) C1 A C1 C2 C1 A 3’ C2 (b) C2 C1 A 3’ C1 A 5’ C2 A 5’ C2 C1 C1 C2 C1 (c) A2 A1 C1 A1 C1 A2 C1 C2 C2 C1 (d) C2 C2 A C2 C2 C2 C1 C2 Figure 1: Four types of AS. Boxes represent exons and lines represent introns, with the possible splicing alternatives indicated by the connectors. (a) Single cassette exon inclusion/exclusion. C1 and C2 are constitutive exons (exons that are included in all isoforms) and flank a single alternative exon (A). The alternative exon is included in one isoform and excluded in the other. (b) Alternative 3’ (or donor) and alternative 5’ (acceptor) splicing sites. Both exons are constitutive, but may contain alternative donor and/or acceptor splicing sites. (c) Mutually exclusive exons. One of the two alternative exons (A1 and A2 ) may be included in the isoform, but not both. (d) Intron inclusion. An intron may be included in the mature mRNA strand. spliced to yield different combinations of exons (called isoforms), a phenomenon referred to as alternative splicing (AS). There are four major types of AS as shown in Figure 1. Many multi-exon genes may undergo more than one alternative splicing event, resulting in many possible isoforms from a single gene. [2] In addition to adding to the genetic repertoire of an organism by enabling a single gene to code for more than one protein, AS has been shown to be critical for gene regulation, contributing to tissue specificity, and facilitating evolutionary processes. Despite the evident importance of AS, its regulation and impact on specific genes remains poorly understood. The work presented here is concerned with the inference of single cassette exon AS levels (Figure 1a) based on data obtained from RNA expression arrays, also known as microarrays. 1.1 An exon microarray data set that probes alternative splicing events Although it is possible to directly analyze the proteins synthesized by a cell, it is easier, and often more informative, to instead measure the abundance of mRNA present. Traditionally, gene expression (abundance of mRNA) has been studied using low throughput techniques (such as RT-PCR or Northern blots), limited to studying a few sequences at a time and making large scale analysis nearly impossible. In the early 1990s, microarray technology emerged as a method capable of measuring the expression of thousands of DNA sequences simultaneously. Sequences of interest are deposited on a substrate the size of a small microscope slide, to form probes. The mRNA is extracted from the cell and reverse-transcribed back into DNA, which is labelled with red and green fluorescent dye molecules (cy3 and cy5 respectively). When the sample of tagged DNA is washed over the slide, complementary strands of DNA from the sample hybridize to the probes on the array forming A-T and C-G pairings. The slide is then scanned and the fluorescent intensity is measured at each probe. It is generally assumed that the intensity measure at the probe is linearly related to the abundance of mRNA in the cell over a wide dynamic range. Despite significant improvements in microarray technologies in recent years, microarray data still presents some difficulties in analysis. Low measurements tend to have extremely low signal to noise ratio (SNR) [7] and probes often bind to sequences that are very similar, but not identical, to the one for which they were designed (a process referred to as cross- C1 A C1 A C1:A C1 C2 C2 3 Body probes A:C2 A C2 2 Inclusion junction probes C1:C2 C1 C2 1 Exclusion junction probe Figure 2: Each alternative splicing event is studied using six probes. Probes were chosen to measure the expression levels of each of the three exons involved in the event. Additionally, 3 probes are used that target the junctions that are formed by each of the two isoforms. The inclusion isoform would express the junctions formed by C1 and A, and A and C2 , while the exclusion isoform would express the junction formed by C1 and C2 hybridization). Additionally, probes exhibit somewhat varying hybridization efficiency, and sequences exhibit varying labelling efficiency. To design our data sets, we mined public sequence databases and identified exons that were strong candidates for exhibiting AS (the details of that analysis are provided elsewhere [4, 3]). Of the candidates, 3,126 potential AS events in 2,647 unique mouse genes were selected for the design of Agilent Custom Oligonucleotide microarray. The arrays were hybridized with unamplified mRNA samples extracted from 10 wild-type mouse tissues (brain, heart, intestine, kidney, liver, lung, salivary gland, skeletal muscle, spleen, and testis). Each AS event has six target probes on the arrays, chosen from regions of the C1 exon, C2 exon, A exon, C1 :A splice junction, A:C2 splice junction, and C1 :C2 splice junction, as shown in Figure 2. 2 Unsupervised discovery of alternative splicing With the exception of the probe measuring the alternative exon, A (Figure 2), all probes measure sequences that occur in both isoforms. For example, while the sequence of the probe measuring the junction A:C1 is designed to measure the inclusion isoform, half of it corresponds to a sequence that is found in the exclusion isoform. We can therefore safely assume that the measured intensity at each probe is a result of a certain amount of both isoforms binding to the probe. Due to the generally assumed linear relationship between the abundance of mRNA hybridized at a probe and the fluorescent intensity measured, we model the measured intensity as a weighted sum of the overall abundance of the two isoforms. A stronger assumption is that of a single, consistent hybridization profile for both isoforms across all probes and all slides. Ideally, one would prefer to estimate an individual hybridization profile for each AS event studied across all slides. However, in our current setup, the number of tissues is small (10), resulting in two difficulties. First, the number of parameters is very large when compared to the number of data point using this model, and second, a portion of the events do not exhibit tissue specific alternative splicing within our small set of tissues. While the first hurdle could be accounted for using Baysian parameter estimation, the second cannot. 2.1 GenASAP - a generative model for alternative splicing array platform Using the setup described above, the expression vector x, containing the six microarray measurements as real numbers, can be decomposed as a linear combination of the abundance of the two splice isoforms, represented by the real vector s, with some added noise: x = Λs + noise, where Λ is a 6 × 2 weight matrix containing the hybridization profiles for s1 ^ xC ^ xC 1 2 s2 ^ xC :A ^ xA ^ xA:C 2 ^ xC1:C2 2 xC1:C2 1 r xC xC 2 xA xC :A xA:C oC1 oC2 oA oC :A oA:C 1 1 1 2 oC :C 1 2 Σn 2 Figure 3: Graphical model for alternative splicing. Each measurement in the observed expression profile, x, is generated by either using a scale factor, r, on a linear combination of the isoforms, s, or drawing randomly from an outlier model. For a detailed description of the model, see text. the two isoforms across the six probes. Note that we may not have a negative amount of a given isoform, nor can the presence of an isoform deduct from the measured expression, and so both s and Λ are constrained to be positive. Expression levels measured by microarrays have previously been modelled as having expression-dependent noise [7]. To address this, we rewrite the above formulation as x = r(Λs + ε), (1) where r is a scale factor and ε is a zero-mean normally distributed random variable with a diagonal covariance matrix, Ψ, denoted as p(ε) = N (ε; 0, Ψ). The prior distribution for the abundance of the splice isoforms is given by a truncated normal distribution, denoted as p(s) ∝ N (s, 0, I)[s ≥ 0], where [·] is an indicator function such that [s ≥ 0] = 1 if ∀i, si ≥ 0, and [s ≥ 0] = 0 otherwise. Lastly, there is a need to account for aberrant observations (e.g. due to faulty probes, flakes of dust, etc.) with an outlier model. The complete GenASAP model (shown in Figure 3) accounts for the observations as the outcome of either applying equation (1) or an outlier model. To avoid degenerate cases and ensure meaningful and interpretable results, the number of faulty probes considered for each AS event may not exceed two, as indicated by the filled-in square constraint node in Figure 3. The distribution of x conditional on the latent variables, s, r, and o, is: N (xi ; rΛi s, r2 Ψi )[oi =0] N (xi ; Ei , Vi )[oi =1] , p(x|s, r, o) = (2) i where oi ∈ {0, 1} is a bernoulli random variable indicating if the measurement at probe xi is the result of the AS model or the outlier model parameterized by p(oi = 1) = γi . The parameters of the outlier model, E and V, are not optimized and are set to the mean and variance of the data. 2.2 Variational learning in the GenASAP model To infer the posterior distribution over the splice isoform abundances while at the same time learning the model parameters we use a variational expectation-maximization algorithm (EM). EM maximizes the log likelihood of the data by iteratively estimating the posterior distribution of the model given the data in the expectation (E) step, and maximizing the log likelihood with respect to the parameters, while keeping the posterior fixed, in the maximization (M) step. Variational EM is used when, as in the case of GenASAP, the exact posterior is intractable. Variational EM minimizes the free energy of the model, defined as the KL-divergence between the joint distribution of the latent and observed variables and the approximation to the posterior under the model parameters [5, 6]. We approximate the true posterior using the Q distribution given by T Q({s(t) }, {o(t) }, {r(t) }) = (t) Q(r(t) )Q(o(t) |r(t) ) t=1 (t) Q(si |oi , r(t) ) i −1 T =Z (t) (3) d d ρ(t) ω (t) N (s(t) ; µ(t) , Σ(t) )[s(t) ≥ 0], ro ro t=1 where Z is a normalization constant, the superscript d indicates that Σ is constrained to be diagonal, and there are T iid AS events. For computational efficiency, r is selected from a finite set, r ∈ {r1 , r2 , . . . , rC } with uniform probability. The variational free energy is given by Q({s(t) }, {o(t) }, {r(t) }) . P ({s(t) }, {o(t) }, {r(t) }, {x(t) }) s r o (4) Variational EM minimizes the free energy by iteratively updating the Q distribution’s vari(t)d (t)d ational parameters (ρ(t) , ω (t) , µro , and Σro ) in the E-step, and the model parameters (Λ, Ψ, {r1 , r2 , . . . , rC }, and γ) in the M-step. The resulting updates are too long to be shown in the context of this paper and are discussed in detail elsewhere [3]. A few particular points regarding the E-step are worth covering in detail here. Q({s(t) }, {o(t) }, {r(t) }) log F(Q, P ) = If the prior on s was a full normal distribution, there would be no need for a variational approach, and exact EM is possible. For a truncated normal distribution, however, the mixing proportions, Q(r)Q(o|r) cannot be calculated analytically except for the case where s is scalar, necessitating the diagonality constraint. Note that if Σ was allowed to be a full covariance matrix, equation (3) would be the true posterior, and we could find the sufficient statistics of Q(s(t) |o(t) , r(t) ): −1 µ(t) = (I + ΛT (I − O(t) )T Ψ−1 (I − O(t) )Λ)−1 ΛT (I − O(t) )T Ψ−1 x(t) r(t) ro −1 Σ(t) ro = (I + ΛT (I − O(t) )T Ψ−1 (I − O(t) )Λ) (5) (6) where O is a diagonal matrix with elements Oi,i = oi . Furthermore, it can be easily shown that the optimal settings for µd and Σd approximating a normal distribution with full covariance Σ and mean µ is µd optimal = µ −1 Σd optimal = diag(Σ (7) −1 ) (8) In the truncated case, equation (8) is still true. Equation (7) does not hold, though, and µd optimal cannot be found analytically. In our experiments, we found that using equation (7) still decreases the free energy every E-step, and it is significantly more efficient than using, for example, a gradient decent method to compute the optimal µd . Intuitive Weigh Matrix Optimal Weight Matrix 50 50 40 40 30 30 20 20 10 0 10 Inclusion Isoform 0 Exclusion Isoform Inclusion Isoform (a) Exclusion Isoform (b) Figure 4: (a) An intuitive set of weights. Based on the biological background, one would expect to see the inclusion isoform hybridize to the probes measuring C1 , C2 , A, C1 :A, and A:C2 , while the exclusion isoform hybridizes to C1 , C2 , and C1 :C2 . (b) The learned set of weights closely agrees with the intuition, and captures cross hybridization between the probes Contribution of exclusion isoform Contribution of inclusion isoform AS model Original Data (a) RT−PCR AS model measurement prediction (% exclusion) (% exclusion) 14% 72% (b) 27% 70% 8% 22% outliers (c) Figure 5: Three examples of data cases and their predictions. (a) The data does not follow our notion of single cassette exon AS, but the AS level is predicted accurately by the model.(b) The probe C1 :A is marked as outlier, allowing the model to predict the other probes accurately. (c) Two probes are marked as outliers, and the model is still successful in predicting the AS levels. 3 Making biological predictions about alternative splicing The results presented in this paper were obtained using two stages of learning. In the first step, the weight matrix, Λ, is learned on a subset of the data that is selected for quality. Two selection criteria were used: (a) sequencing data was used to select those cases for which, with high confidence, no other AS event is present (Figure 1) and (b) probe sets were selected for high expression, as determined by a set of negative controls. The second selection criterion is motivated by the common assumption that low intensity measurements are of lesser quality (see Section 1.1). In the second step, Λ is kept fixed, and we introduce the additional constraint that the noise is isotropic (Ψ = ψI) and learn on the entire data set. The constraint on the noise is introduced to prevent the model from using only a subset of the six probes for making the final set of predictions. We show a typical learned set of weights in Figure 4. The weights fit well with our intuition of what they should be to capture the presence of the two isoforms. Moreover, the learned weights account for the specific trends in the data. Examples of model prediction based on the microarray data are shown in Figure 5. Due to the nature of the microarray data, we do not expect all the inferred abundances to be equally good, and we devised a scoring criterion that ranks each AS event based on its fit to the model. Intuitively, given two input vectors that are equivalent up to a scale factor, with inferred MAP estimations that are equal up to the same scale factor, we would like their scores to be identical. The scoring criterion used, therefore is k (xk − rΛk s)2 /(xk + Rank 500 1000 2000 5000 10000 15000 20000 30000 Pearson’s correlation coefficient 0.94 0.95 0.95 0.79 0.79 0.78 0.75 0.65 False positive rate 0.11 0.08 0.05 0.2 0.25 0.29 0.32 0.42 Table 1: Model performance evaluated at various ranks. Using 180 RT-PCR measurements, we are able to predict the model’s performance at various ranks. Two evaluation criteria are used: Pearson’s correlation coefficient between the model’s predictions and the RT-PCR measurements and false positive rate, where a prediction is considered to be false positive if it is more than 15% away from the RT-PCR measurement. rΛk s)2 , where the MAP estimations for r and s are used. This scoring criterion can be viewed as proportional to the sum of noise to signal ratios, as estimated using the two values given by the observation and the model’s best prediction of that observation. Since it is the relative amount of the isoforms that is of most interest, we need to use the inferred distribution of the isoform abundances to obtain an estimate for the relative levels of AS. It is not immediately clear how this should be done. We do, however, have RTPCR measurements for 180 AS events to guide us (see figure 6 for details). Using the top 50 ranked RT-PCR measurement, we fit three parameters, {a1 , a2 , a3 }, such that the s2 proportion of excluded isoform present, p, is given by p = a1 s1 +a2 s2 + a3 , where s1 is the MAP estimation of the abundance of the inclusion isoform, s2 is the MAP estimation of the abundance of the exclusion isoform, and the RT-PCR measurement are used for target p. The parameters are fitted using gradient descent on a least squared error (LSE) evaluation criterion. We used two criteria to evaluate the quality of the AS model predictions. Pearson’s correlation coefficient (PCC) is used to evaluate the overall ability of the model to correctly estimate trends in the data. PCC is invariant to affine transformation and so is independent of the transformation parameters a1 and a3 discussed above, while the parameter a2 was found to effect PCC very little. The PCC stays above 0.75 for the top two thirds ranked predictions. The second evaluation criterion used is the false positive rate, where a prediction is considered to be false positive if it is more than 15% away from the RT-PCR measurement. This allows us to say, for example, that if a prediction is within the top 10000, we are 75% confident that it is within 15% of the actual levels of AS. 4 Summary We designed a novel AS model for the inference of the relative abundance of two alternatively spliced isoforms from six measurements. Unsupervised learning in the model is performed using a structured variational EM algorithm, which correctly captures the underlying structure of the data, as suggested by its biological nature. The AS model, though presented here for a cassette exon AS events, can be used to learn any type of AS, and with a simple adjustment, multiple types. The predictions obtained from the AS model are currently being used to verify various claims about the role of AS in evolution and functional genomics, and to help identify sequences that affect the regulation of AS. % Exclusion isoform RT−PCR measurement Vs. AS model predictions RT−PCR measurements: 90 80 AS model prediction Int es Te tine sti Kid s n Sa ey liva Br ry ain Sp le Liv en er Mu sc Lu le ng 100 70 60 50 40 30 14 22 27 32 47 46 66 78 63 20 AS model prediction: 10 27 24 26 26 51 75 60 85 100 (a) 0 0 20 40 60 80 RT−PCR measurement 100 (b) Figure 6: (a) Sample RT-PCR. RNA extracted from the cell is reverse-transcribed to DNA, amplified and labelled with radioactive or fluorescent molecules. The sample is pulled through a viscous gel in an electric field (DNA, being an acid, is positively charged). Shorter strands travel further through the gel than longer ones, resulting in two distinct bands, corresponding to the two isoforms, when exposed to a photosensitive or x-ray film. (b) A scatter plot showing the RT-PCR measurements as compared to the AS model predictions. The plot shows all available RT-PCR measurements with a rank of 8000 or better. The AS model presented assumes a single weight matrix for all data cases. This is an oversimplified view of the data, and current work is being carried out in identifying probe specific expression profiles. However, due to the low dimensionality of the problem (10 tissues, six probes per event), care must be taken to avoid overfitting and to ensure meaningful interpretations. Acknowledgments We would like to thank Wen Zhang, Naveed Mohammad, and Timothy Hughes for their contributions in generating the data set. This work was funded in part by an operating and infrastructure grants from the CIHR and CFI, and a operating grants from NSERC and a Premier’s Research Excellence Award. References [1] J. M. Johnson et al. Genome-wide survey of human alternative pre-mrna splicing with exon junction microarrays. Science, 302:2141–44, 2003. [2] L. Cartegni et al. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Gen. Rev., 3:285–98, 2002. [3] Q. Pan et al. Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform. Molecular Cell, 16(6):929–41, 2004. [4] Q. Pan et al. Alternative splicing of conserved exons is frequently species specific in human and mouse. Trends Gen., In Press, 2005. [5] M. I. Jordan, Z. Ghahramani, T. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183– 233, 1999. [6] R. M. Neal and G. E. Hinton. A view of the em algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models. Cambridge, MIT Press, 1998. [7] D. M. Rocke and B. Durbin. A model for measurement error for gene expression arrays. Journal of Computational Biology, 8(6):557–69, 2001.
5 0.99237806 39 nips-2004-Coarticulation in Markov Decision Processes
Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen
Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain. 1
6 0.9915722 63 nips-2004-Expectation Consistent Free Energies for Approximate Inference
7 0.98997647 115 nips-2004-Maximum Margin Clustering
8 0.98748529 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
9 0.91659915 56 nips-2004-Dynamic Bayesian Networks for Brain-Computer Interfaces
10 0.9157849 143 nips-2004-PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data
11 0.91172135 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
12 0.90950376 72 nips-2004-Generalization Error and Algorithmic Convergence of Median Boosting
13 0.90724057 32 nips-2004-Boosting on Manifolds: Adaptive Regularization of Base Classifiers
14 0.90616286 44 nips-2004-Conditional Random Fields for Object Recognition
15 0.90298814 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
16 0.90240246 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
17 0.90100497 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
18 0.90096974 64 nips-2004-Experts in a Markov Decision Process
19 0.89285326 166 nips-2004-Semi-supervised Learning via Gaussian Processes
20 0.89276016 38 nips-2004-Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms