NIPS 2004, Paper 65
Title: Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
Authors: Daniela Pucci de Farias, Nimrod Megiddo

Abstract: A reactive environment is one that responds to the actions of an agent rather than evolving obliviously. In reactive environments, experts algorithms must balance exploration and exploitation of experts more carefully than in oblivious ones. In addition, a more subtle definition of a learnable value of an expert is required. A general exploration-exploitation experts method is presented along with a proper definition of value. The method is shown to asymptotically perform as well as the best available expert. Several variants are analyzed from the viewpoint of the exploration-exploitation tradeoff, including explore-then-exploit, polynomially vanishing exploration, constant-frequency exploration, and constant-size exploration phases. Complexity and performance bounds are proven.
1 Introduction

Real-world environments require agents to choose actions sequentially. For example, a driver has to choose every day a route from one point to another, based on past experience and perhaps some current information. In another example, an airline company has to set prices dynamically, also based on past experience and current information. One important difference between these two examples is that the effect of the driver's decision on future traffic patterns is negligible, whereas prices set by one airline can affect future market prices significantly. In this sense the decisions of the airline are made in a reactive environment, whereas the driver operates in a non-reactive one.
The state of the environment may depend both on the agent's past choices and on choices made by the environment independently of the agent's current choice. In this paper we focus on the so-called experts-algorithm approach. An "expert" (or "oracle") is simply a particular strategy recommending actions based on the past history of the process. An experts algorithm is a method that combines the recommendations of several given "experts" (or "oracles") into another strategy for choosing actions. Roughly speaking, such algorithms blend exploration, aimed at acquiring knowledge, with exploitation, which capitalizes on the gained knowledge to accumulate rewards. In particular, some experts algorithms can be interpreted as blending the testing of all experts with following those experts that are observed to be more rewarding. Our previous paper [2] presented a specific exploration-exploitation experts algorithm.
The difference between our algorithm and previous experts algorithms is that our algorithm tests each expert for multiple consecutive stages of the decision process, in order to acquire knowledge about how the environment reacts to that expert. We pointed out that the "Minimum Regret" criterion often used for evaluating experts algorithms was not suitable for reactive environments, since it ignored the possibility that different experts may induce different states of the environment.
In this paper, we present a more general exploration-exploitation experts method and provide results about the convergence of several of its variants. We develop performance guarantees showing that the method achieves average payoff comparable to that achieved by the best expert. We also introduce a definition for the long-term value of an expert, which captures the reactions of the environment to the expert's actions, as well as the fact that any learning algorithm commits mistakes. Finally, we characterize how fast the method learns the value of each expert. An important aspect of our results is that they provide an explicit characterization of the tradeoff between exploration and exploitation. Convergence rates based on actual expert performance are presented in Section 3.
2 The Exploration-Exploitation Experts (EEE) Method

At each stage t of the decision process, the agent chooses an action a_t ∈ A. At the same time, the environment also "chooses" b_t ∈ B, and then the agent receives the reward R(a_t, b_t). The choices of the environment may depend on various factors, including the past choices of the agent. As in the particular algorithm of [2], the general method follows chosen experts for multiple stages rather than picking a different expert each time. A maximal set of consecutive stages during which the same expert is followed is called a phase. Phase numbers are denoted by i. The number of phases during which expert e has been followed is denoted by N_e, the total number of stages during which expert e has been followed is denoted by S_e, and the average payoff from phases in which expert e has been followed is denoted by M_e.
An exploration phase consists of picking an expert e at random (i.e., uniformly from {1, . . . , r}) and following e's recommendations for a certain number of stages, depending on the variant of the method. An exploitation phase consists of picking an expert e with maximum M_e, breaking ties at random, and following e's recommendations for a certain number of stages, depending on the variant of the method.
With probability p_i, perform an exploration phase, and with probability 1 − p_i, perform an exploitation phase; denote by e the expert chosen to be followed and by n the number of stages chosen for the current phase. Follow expert e's instructions for the next n stages. Denote by R̃ the average payoff accumulated during the current phase of n stages, and update

    N_e ← N_e + 1,   S_e ← S_e + n,   M_e ← M_e + (n/S_e)(R̃ − M_e) .
We denote stage numbers by s and phase numbers by i. In Sections 3 and 5, we present performance bounds for the EEE method for the case in which the phase length is n = N_e.
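The phase structure described above can be sketched as follows. This is an illustrative reconstruction of the method for the variant with phase length n = N_e, not the authors' code; the expert and environment interfaces (`experts`, `env_step`) are assumptions made for the example.

```python
import random

def eee(experts, env_step, num_phases, explore_prob, seed=0):
    """Illustrative sketch of the EEE method with phase length n = N_e.

    experts: list of policies; each maps the play history to an action.
    env_step: function (action, history) -> bounded reward.
    explore_prob: function i -> p_i, the exploration probability at phase i.
    """
    rng = random.Random(seed)
    r = len(experts)
    N = [0] * r    # N_e: number of phases during which expert e was followed
    S = [0] * r    # S_e: total number of stages expert e was followed
    M = [0.0] * r  # M_e: average payoff over e's phases
    history = []
    for i in range(1, num_phases + 1):
        if rng.random() < explore_prob(i):
            e = rng.randrange(r)  # exploration: pick an expert uniformly
        else:
            best = max(M)         # exploitation: expert with maximal M_e,
            e = rng.choice([k for k in range(r) if M[k] == best])  # ties at random
        N[e] += 1
        n = N[e]                  # phase length grows with N_e
        total = 0.0
        for _ in range(n):        # follow e's recommendations for n stages
            a = experts[e](history)
            reward = env_step(a, history)
            history.append((a, reward))
            total += reward
        R = total / n             # average payoff of the current phase
        S[e] += n
        M[e] += (n / S[e]) * (R - M[e])  # update the running average M_e
    return M
```

For instance, with two constant experts in an oblivious environment paying 1.0 for one action and 0.2 for the other, the averages M_e converge to the experts' long-run payoffs.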
3 Bounds Based on Actual Expert Performance

The following was proven:

    Pr( liminf_{s→∞} M(s) ≥ max_e liminf_{i→∞} M_e(i) ) = 1 .    (1)

In words, the algorithm asymptotically achieves an average reward that is as large as that of the best expert.
Third, they quantify the relationship between the amount of exploration, represented by the exploration probabilities p_i, and the loss of performance.
Together with the analysis of Section 5, which characterizes how fast the EEE method learns the value of each expert, the bounds derived here describe explicitly the tradeoff between exploration and exploitation.
We denote by Z_{e,j} the event "phase j performs exploration with expert e," and let Z_j = ∪_e Z_{e,j} and

    Z̄_{i0,i} = Σ_{j=i0+1}^{i} E[1(Z_j)] = Σ_{j=i0+1}^{i} p_j .

Note that Z̄_{i0,i} denotes the expected number of exploration phases between phases i0 + 1 and i.
The first theorem establishes that, with high probability, after a finite number of iterations the EEE method performs comparably to the best expert. The performance of each expert is defined as the smallest average reward achieved by that expert in the interval between an (arbitrary) phase i0 and the current phase i. It can be shown via a counterexample that this bound cannot be extended to a (somewhat more natural) comparison between the average reward of the EEE method and the average reward of each expert at iteration i.

Theorem. For all i0, i, and ε > 0 such that Z̄_{i0,i} ≤ iε²/(4√r u²) − i0/(4u),

    Pr( M(i) ≤ max_e min_{i0+1≤j≤i} M_e(j) − 2ε ) ≤ exp( − (1/(2i)) [ iε²/(4√r u²) − i0/(4u) − Z̄_{i0,i} ]² ) .
The following theorem characterizes the expected difference between the average reward of the EEE method and that of the best expert. It follows that, under certain assumptions on the exploration probabilities, the EEE method performs asymptotically at least as well as the expert that did best.
Corollary. If lim_{i→∞} Z̄_{0,i}/i = 0, then

    Pr( liminf_{s→∞} M(s) ≥ max_e liminf_{i→∞} M_e(i) ) = 1 .    (2)

Note that here the average reward obtained by the EEE method is compared with the reward actually achieved by each expert during the same run of the method.
4 The Value of an Expert

In this section we analyze the behavior of the average reward M_e(i) that is computed by the EEE method for each expert e. This average reward is used by the method, intuitively, as an estimate of the value of expert e.
This concept is not trivial, especially when the environment is reactive. The obvious definition of the value as the expected average reward the expert could achieve, if followed exclusively, does not work. That example shows that an algorithm that attempts to learn what an expert would achieve, if played exclusively, cannot avoid committing fatal "mistakes." A more realistic concept of value, relative to a certain environment policy π, is defined as follows, using a real parameter τ.
The τ-value µ_e^τ of expert e with respect to π is the largest achievable τ-value of e:

    µ_e^τ = sup{ µ : µ is an achievable τ-value of e } .    (3)

In words, a value µ is achievable by expert e if the expert can secure an expected average reward during the s stages between stage s0 and stage s0 + s that is asymptotically at least µ, regardless of the history of the play prior to stage s0.
In [2], we introduced the notion of flexibility as a way of reasoning about the value of an expert and when it can be learned. We note, however, that flexibility does hold when the environment reacts with bounded memory or as a finite automaton.
5 Bounds Based on Expected Expert Performance

In this section we characterize how fast the EEE method learns the τ-value of each expert. We can derive the rate at which the average reward achieved by the EEE method approaches the τ-value of the best expert.
Theorem. For all ε > 0 and i, if ( 4c̄ r³ / (ε τ̄ (2 − τ̄)) )^{1/τ̄} ≤ Z̄_{0,i}, then

    Pr( inf_{j≥i} M_e(j) < µ_e^τ − ε ) ≤ (3³ u²/ε²) · exp( − ε² Z̄_{0,i} / (4³ u² r²) ) .
Note from the definition of τ-values that we can only expect the average reward of expert e to be close to µ_e^τ if the phase lengths when the expert is chosen are sufficiently large. The condition on Z̄_{0,i} ensures that each expert is chosen for sufficiently many phases; since phase lengths grow proportionally to the number of phases in which an expert is chosen, this implies that phase lengths are large enough. This bound can be combined with the bounds of Section 3 to provide an overall bound on the difference between the average reward achieved by the EEE method and the τ-value of the best expert.
Corollary. For all ε > 0, i0 and i, if (i) ( 4c̄ r³ / (ε τ̄ (2 − τ̄)) )^{1/τ̄} ≤ Z̄_{0,i0}, and (ii) Z̄_{i0,i} ≤ iε²/(4√r u²) − i0/(4u), then

    Pr( M(i) ≤ max_e µ_e^τ − 3ε )
      ≤ (3³ u²/ε²) · exp( − ε² Z̄_{0,i0} / (4³ u² r²) ) + exp( − (1/(2i)) [ iε²/(4√r u²) − i0/(4u) − Z̄_{i0,i} ]² ) .    (4)
Condition (i) makes the first term in the bound small, and condition (ii) makes the second term small. In Section 6, we analyze several exploration schemes and their effect on the convergence rate of the EEE method.
Corollary. If lim_{i→∞} Z̄_{0,i} = ∞, then Pr( liminf_{i→∞} M_e(i) ≥ µ_e^τ ) = 1.

Corollary. If lim_{i→∞} Z̄_{0,i} = ∞ and lim_{i→∞} Z̄_{0,i}/i = 0, then

    Pr( liminf_{i→∞} M(i) ≥ max_e µ_e^τ ) = 1 .

6 Exploration Schemes

The results of the previous sections hold under generic choices of the probabilities p_i. Here, we discuss how various particular choices affect the speed of exploiting accumulated information, gathering new information, and adapting to changes in the environment.
6.1 Explore-then-Exploit

One approach to determining exploration schemes is to minimize the upper bound provided in Corollary 5.1. This gives rise to a scheme where all exploration takes place before any exploitation. It can be shown that the smallest number of phases i such that U ≤ β is bounded between two polynomials in 1/ε, u, and r.
Moreover, its dependence on the total number of experts r is asymptotically O(r^1.5).
The main drawback of explore-then-exploit is its inability to adapt to changes in the policy of the environment: since all exploration occurs first, any change that occurs after exploration has ended cannot be learned. Moreover, the choice of the last exploration phase i0 depends on parameters of the problem that may not be observable.
6.2 Polynomially Vanishing Exploration

Here the exploration probabilities vanish polynomially, with p_j proportional to j^{−α}. This choice of exploration probabilities satisfies

    lim_{i→∞} Z̄_{0,i} = ∞  and  lim_{i→∞} Z̄_{0,i}/i = 0 ,

so the corollaries apply. For α = 1, it follows that the total number of phases required for U ≤ β to hold grows exponentially in 1/ε, u, and r. For α < 1, the smallest number of phases that guarantees that U ≤ β is polynomial in these quantities, with degree governed by the exponents 1/(1 − α) and 3 − α.
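The two conditions on Z̄_{0,i} are easy to check numerically. The sketch below (illustrative parameter values, not from the paper) computes Z̄_{0,i} = Σ_{j=1}^{i} p_j for a polynomially vanishing schedule and for a constant-rate one, showing that the former has Z̄_{0,i} → ∞ while Z̄_{0,i}/i → 0, whereas the latter keeps Z̄_{0,i}/i fixed at η.

```python
def z_bar(p, i):
    """Expected number of exploration phases among phases 1..i: sum of p_j."""
    return sum(p(j) for j in range(1, i + 1))

poly = lambda j, alpha=0.5: j ** -alpha  # polynomially vanishing: p_j = j^(-alpha)
const = lambda j, eta=0.1: eta           # constant rate: p_j = eta

for i in (100, 10_000):
    zp, zc = z_bar(poly, i), z_bar(const, i)
    # poly: Z̄ grows without bound while Z̄/i shrinks; const: Z̄/i stays at eta
    print(i, round(zp, 1), round(zp / i, 4), round(zc / i, 4))
```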
6.3 Constant-Rate Exploration

The previous exploration schemes have the property that the frequency of exploration vanishes as the number of phases grows. Constant-rate exploration, with p_j = η for all j, does not satisfy the conditions of the corollaries. However, for any given tolerance level ε, the value of η can be chosen so that

    Pr( liminf_{i→∞} M(i) ≥ max_e µ_e^τ − ε ) = 1 .

Moreover, constant-rate exploration yields complexity results similar to those of the explore-then-exploit scheme.
For example, if p_j = ε²/(8√r u²) for all j, then it follows that U ≤ β if the number of phases i is on the order of

    i = O( (r² u⁵ / ε⁴) · log(u²/ε²) ) .
6.4 Constant Phase Lengths

In all the variants of the EEE method considered so far, the number of stages per phase increases linearly as a function of the number of phases during which the same expert has been followed previously. This growth is used to ensure that, as long as the policy of the environment exhibits some regularity, that regularity is captured by the algorithm. For instance, if that policy is cyclic, then the EEE method correctly learns the long-term value of each expert, regardless of the lengths of the cycles.
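The role of growing phase lengths can be seen in a toy cyclic example (illustrative, not from the paper): averaging over a long phase recovers the cycle's long-run value, while a short fixed phase can be badly biased by where in the cycle it starts.

```python
# A cyclic environment policy: rewards repeat with period 4, long-run value 0.25.
cycle = [1.0, 0.0, 0.0, 0.0]

def phase_average(start, n):
    """Average reward over a phase of n stages beginning at stage `start`."""
    return sum(cycle[(start + s) % len(cycle)] for s in range(n)) / n

print(phase_average(0, 400))                     # long phase: the long-run value 0.25
print(phase_average(0, 2), phase_average(1, 2))  # fixed L = 2: 0.5 or 0.0, by offset
```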
For practical purposes, it may be necessary to slow down the growth of phase lengths in order to obtain meaningful results in reasonable time. In this section, we consider the possibility of a constant number L of stages in each phase.
Theorem. If the EEE method is implemented with phases of fixed length L, then for all i0, i, and ε such that Z̄_{i0,i} ≤ iε²/(2u²) − i0/(2u), the following bound holds:

    Pr( M(i) ≤ max_e min_{i0+1≤j≤i} M_e(j) − 2ε ) ≤ exp( − (1/(2i)) [ iε²/(2u²) − i0/(2u) − Z̄_{i0,i} ]² ) .
We can also characterize the expected difference between the average reward of the EEE method and that of the best expert. If the EEE method is implemented with phases of fixed length L, then for all i0 ≤ i and ε > 0, this expected difference is bounded in terms of the quantities 2u² Z̄_{i0,i} and i0.
Theorem. If the EEE method is implemented with phases of fixed length L ≥ 2, then for all ε > 0,

    Pr( inf_{j≥i} M_e(j) < µ_e^τ − 2c̄/L^τ̄ − ε ) ≤ (2L² u²/ε²) · exp( − ε² Z̄_{0,i} / (4L² u²) ) .
99 e 2 j≥i L 4L u r An important qualitative difference between fixed-length phases and increasing-length ones is the absence of the number of experts r in the bound given in Theorem 6. [sent-252, score-0.449]
100 This implies that, in the explore-then-exploit or constant-rate exploration schemes, the algorithm requires a number of phases which grows only linearly with r to ensure that τ Pr(M (i) ≤ max Me − c/Lτ − ) ≤ β . [sent-254, score-0.617]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 65 nips-2004-Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments
Author: Daniela D. Farias, Nimrod Megiddo
Abstract: A reactive environment is one that responds to the actions of an agent rather than evolving obliviously. In reactive environments, experts algorithms must balance exploration and exploitation of experts more carefully than in oblivious ones. In addition, a more subtle definition of a learnable value of an expert is required. A general exploration-exploitation experts method is presented along with a proper definition of value. The method is shown to asymptotically perform as well as the best available expert. Several variants are analyzed from the viewpoint of the exploration-exploitation tradeoff, including explore-then-exploit, polynomially vanishing exploration, constant-frequency exploration, and constant-size exploration phases. Complexity and performance bounds are proven. 1
2 0.2688095 64 nips-2004-Experts in a Markov Decision Process
Author: Eyal Even-dar, Sham M. Kakade, Yishay Mansour
Abstract: We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions. We also show that in the case that the dynamics change over time, the problem becomes computationally hard. 1
3 0.11871778 88 nips-2004-Intrinsically Motivated Reinforcement Learning
Author: Nuttapong Chentanez, Andrew G. Barto, Satinder P. Singh
Abstract: Psychologists call behavior intrinsically motivated when it is engaged in for its own sake rather than as a step toward solving a specific problem of clear practical value. But what we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy. 1
4 0.10503884 37 nips-2004-Co-Training and Expansion: Towards Bridging Theory and Practice
Author: Maria-florina Balcan, Avrim Blum, Ke Yang
Abstract: Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker “expansion” assumption on the underlying data distribution, that we prove is sufficient for iterative cotraining to succeed given appropriately strong PAC-learning algorithms on each feature set, and that to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-training algorithm, unlike stronger assumptions (such as independence given the label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect on performance of noise in the data. Predicted behavior is qualitatively matched in synthetic experiments on expander graphs. 1
5 0.089287907 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
Author: Tim K. Marks, J. C. Roddey, Javier R. Movellan, John R. Hershey
Abstract: We present a generative model and stochastic filtering algorithm for simultaneous tracking of 3D position and orientation, non-rigid motion, object texture, and background texture using a single camera. We show that the solution to this problem is formally equivalent to stochastic filtering of conditionally Gaussian processes, a problem for which well known approaches exist [3, 8]. We propose an approach based on Monte Carlo sampling of the nonlinear component of the process (object motion) and exact filtering of the object and background textures given the sampled motion. The smoothness of image sequences in time and space is exploited by using Laplace’s method to generate proposal distributions for importance sampling [7]. The resulting inference algorithm encompasses both optic flow and template-based tracking as special cases, and elucidates the conditions under which these methods are optimal. We demonstrate an application of the system to 3D non-rigid face tracking. 1 Background Recent algorithms track morphable objects by solving optic flow equations, subject to the constraint that the tracked points belong to an object whose non-rigid deformations are linear combinations of a set of basic shapes [10, 2, 11]. These algorithms require precise initialization of the object pose and tend to drift out of alignment on long video sequences. We present G-flow, a generative model and stochastic filtering formulation of tracking that address the problems of initialization and error recovery in a principled manner. We define a non-rigid object by the 3D locations of n vertices. The object is a linear combination of k fixed morph bases, with coefficients c = [c1 , c2 , · · · , ck ]T . The fixed 3 × k matrix hi contains the position of the ith vertex in all k morph bases. The transformation from object-centered to image coordinates consists of a rotation, weak perspective projection, and translation. 
Thus xi , the 2D location of the ith vertex on the image plane, is xi = grhi c + l, (1) where r is the 3 × 3 rotation matrix, l is the 2 × 1 translation vector, and g = 1 0 0 is the 010 projection matrix. The object pose, ut , comprises both the rigid motion parameters and the morph parameters at time t: ut = {r(t), l(t), c(t)}. (2) 1.1 Optic flow Let yt represent the current image, and let xi (ut ) index the image pixel that is rendered by the ith object vertex when the object assumes pose ut . Suppose that we know ut−1 , the pose at time t − 1, and we want to find ut , the pose at time t. This problem can be solved by minimizing the following form with respect to ut : ut = argmin ˆ ut 1 2 n 2 [yt (xi (ut )) − yt−1 (xi (ut−1 ))] . (3) i=1 In the special case in which the xi (ut ) are neighboring points that move with the same 2D displacement, this reduces to the standard Lucas-Kanade optic flow algorithm [9, 1]. Recent work [10, 2, 11] has shown that in the general case, this optimization problem can be solved efficiently using the Gauss-Newton method. We will take advantage of this fact to develop an efficient stochastic inference algorithm within the framework of G-flow. Notational conventions Unless otherwise stated, capital letters are used for random variables, small letters for specific values taken by random variables, and Greek letters for fixed model parameters. Subscripted colons indicate sequences: e.g., X1:t = X1 · · · Xt . The term In stands for the n × n identity matrix, E for expected value, V ar for the covariance matrix, and V ar−1 for the inverse of the covariance matrix (precision matrix). 2 The Generative Model for G-Flow Figure 1: Left: a(Ut ) determines which texel (color at a vertex of the object model or a pixel of the background model) is responsible for rendering each image pixel. Right: G-flow video generation model: At time t, the object’s 3D pose, Ut , is used to project the object texture, Vt , into 2D. 
This projection is combined with the background texture, Bt , to generate the observed image, Yt . We model the image sequence Y as a stochastic process generated by three hidden causes, U , V , and B, as shown in the graphical model (Figure 1, right). The m × 1 random vector Yt represents the m-pixel image at time t. The n × 1 random vector Vt and the m × 1 random vector Bt represent the n-texel object texture and the m-texel background texture, respectively. As illustrated in Figure 1, left, the object pose, Ut , determines onto which image pixels the object and background texels project at time t. This is formulated using the projection function a(Ut ). For a given pose, ut , the projection a(ut ) is a block matrix, def a(ut ) = av (ut ) ab (ut ) . Here av (ut ), the object projection function, is an m × n matrix of 0s and 1s that tells onto which image pixel each object vertex projects; e.g., a 1 at row j, column i it means that the ith object point projects onto image pixel j. Matrix ab plays the same role for background pixels. Assuming the foreground mapping is one-toone, we let ab = Im −av (ut )av (ut )T , expressing the simple occlusion constraint that every image pixel is rendered by object or background, but not both. In the G-flow generative model: Vt Yt = a(Ut ) + Wt Wt ∼ N (0, σw Im ), σw > 0 Bt (4) Ut ∼ p(ut | ut−1 ) v v Vt = Vt−1 + Zt−1 Zt−1 ∼ N (0, Ψv ), Ψv is diagonal b b Bt = Bt−1 + Zt−1 Zt−1 ∼ N (0, Ψb ), Ψb is diagonal where p(ut | ut−1 ) is the pose transition distribution, and Z v , Z b , W are independent of each other, of the initial conditions, and over time. The form of the pose distribution is left unspecified since the algorithm proposed here does not require the pose distribution or the pose dynamics to be Gaussian. For the initial conditions, we require that the variance of V1 and the variance of B1 are both diagonal. 
Non-rigid 3D tracking is a difficult nonlinear filtering problem because changing the pose has a nonlinear effect on the image pixels. Fortunately, the problem has a rich structure that we can exploit: under the G-flow model, video generation is a conditionally Gaussian process [3, 6, 4, 5]. If the specific values taken by the pose sequence, u1:t , were known, then the texture processes, V and B, and the image process, Y , would be jointly Gaussian. This suggests the following scheme: we could use particle filtering to obtain a distribution of pose experts (each expert corresponds to a highly probable sample of pose, u1:t ). For each expert we could then use Kalman filtering equations to infer the posterior distribution of texture given the observed images. This method is known in the statistics community as a Monte Carlo filtering solution for conditionally Gaussian processes [3, 4], and in the machine learning community as Rao-Blackwellized particle filtering [6, 5]. We found that in addition to Rao-Blackwellization, it was also critical to use Laplace’s method to generate the proposal distributions for importance sampling [7]. In the context of G-flow, we accomplished this by performing an optic flow-like optimization, using an efficient algorithm similar to those in [10, 2]. 3 Inference Our goal is to find an expression for the filtering distribution, p(ut , vt , bt | y1:t ). Using the law of total probability, we have the following equation for the filtering distribution: p(ut , vt , bt | y1:t ) = p(ut , vt , bt | u1:t−1 , y1:t ) p(u1:t−1 | y1:t ) du1:t−1 Opinion of expert (5) Credibility of expert We can think of the integral in (5) as a sum over a distribution of experts, where each expert corresponds to a single pose history, u1:t−1 . Based on its hypothesis about pose history, each expert has an opinion about the current pose of the object, Ut , and the texture maps of the object and background, Vt and Bt . 
Each expert also has a credibility, a scalar that measures how well the expert’s opinion matches the observed image yt . Thus, (5) can be interpreted as follows: The filtering distribution at time t is obtained by integrating over the entire ensemble of experts the opinion of each expert weighted by that expert’s credibility. The opinion distribution of expert u1:t−1 can be factorized into the expert’s opinion about the pose Ut times the conditional distribution of texture Vt , Bt given pose: p(ut , vt , bt | u1:t−1 , y1:t ) = p(ut | u1:t−1 , y1:t ) p(vt , bt | u1:t , y1:t ) (6) Opinion of expert Pose Opinion Texture Opinion given pose The rest of this section explains how we evaluate each term in (5) and (6). We cover the distribution of texture given pose in 3.1, pose opinion in 3.2, and credibility in 3.3. 3.1 Texture opinion given pose The distribution of Vt and Bt given the pose history u1:t is Gaussian with mean and covariance that can be obtained using the Kalman filter estimation equations: −1 V ar−1 (Vt , Bt | u1:t , y1:t ) = V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw a(ut ) E(Vt , Bt | u1:t , y1:t ) = V ar(Vt , Bt | u1:t , y1:t ) −1 × V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 )E(Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw yt (7) (8) This requires p(Vt , Bt |u1:t−1 , y1:t−1 ), which we get from the Kalman prediction equations: E(Vt , Bt | u1:t−1 , y1:t−1 ) = E(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) V ar(Vt , Bt | u1:t−1 , y1:t−1 ) = V ar(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) + (9) Ψv 0 0 Ψb (10) In (9), the expected value E(Vt , Bt | u1:t−1 , y1:t−1 ) consists of texture maps (templates) for the object and background. In (10), V ar(Vt , Bt | u1:t−1 , y1:t−1 ) represents the degree of uncertainty about each texel in these texture maps. Since this is a diagonal matrix, we can refer to the mean and variance of each texel individually. 
For the ith texel in the object texture map, we use the following notation:

    \mu_t^v(i) := ith element of E(V_t \mid u_{1:t-1}, y_{1:t-1})
    \sigma_t^v(i) := (i, i)th element of \mathrm{Var}(V_t \mid u_{1:t-1}, y_{1:t-1})

Similarly, define \mu_t^b(j) and \sigma_t^b(j) as the mean and variance of the jth texel in the background texture map. (This notation leaves the dependency on u_{1:t-1} and y_{1:t-1} implicit.)

3.2 Pose opinion

Based on its current texture template (derived from the history of poses and images up to time t-1) and the new image y_t, each expert u_{1:t-1} has a pose opinion, p(u_t | u_{1:t-1}, y_{1:t}), a probability distribution representing that expert's beliefs about the pose at time t. Since the effect of u_t on the likelihood function is nonlinear, we will not attempt to find an analytical solution for the pose opinion distribution. However, due to the spatio-temporal smoothness of video signals, it is possible to estimate the peak and variance of an expert's pose opinion.

3.2.1 Estimating the peak of an expert's pose opinion

We want to estimate \hat{u}_t(u_{1:t-1}), the value of u_t that maximizes the pose opinion. Since

    p(u_t \mid u_{1:t-1}, y_{1:t}) = \frac{p(y_{1:t-1} \mid u_{1:t-1})}{p(y_{1:t} \mid u_{1:t-1})} \, p(u_t \mid u_{t-1}) \, p(y_t \mid u_{1:t}, y_{1:t-1}),    (11)

    \hat{u}_t(u_{1:t-1}) := \mathrm{argmax}_{u_t} \, p(u_t \mid u_{1:t-1}, y_{1:t}) = \mathrm{argmax}_{u_t} \, p(u_t \mid u_{t-1}) \, p(y_t \mid u_{1:t}, y_{1:t-1}).    (12)

We now need an expression for the final term in (12), the predictive distribution p(y_t | u_{1:t}, y_{1:t-1}).
By integrating out the hidden texture variables from p(y_t, v_t, b_t | u_{1:t}, y_{1:t-1}), and using the conditional independence relationships defined by the graphical model (Figure 1, right), we can derive:

    \log p(y_t \mid u_{1:t}, y_{1:t-1}) = -\frac{m}{2} \log 2\pi - \frac{1}{2} \log \left| \mathrm{Var}(Y_t \mid u_{1:t}, y_{1:t-1}) \right|
        - \frac{1}{2} \sum_{i=1}^{n} \frac{\left( y_t(x_i(u_t)) - \mu_t^v(i) \right)^2}{\sigma_t^v(i) + \sigma_w} - \frac{1}{2} \sum_{j \notin \mathcal{X}(u_t)} \frac{\left( y_t(j) - \mu_t^b(j) \right)^2}{\sigma_t^b(j) + \sigma_w},    (13)

where x_i(u_t) is the image pixel rendered by the ith object vertex when the object assumes pose u_t, and \mathcal{X}(u_t) is the set of all image pixels rendered by the object under pose u_t (so the second sum runs over background pixels). Combining (12) and (13), we can derive:

    \hat{u}_t(u_{1:t-1}) = \mathrm{argmin}_{u_t} \Bigg\{ -\log p(u_t \mid u_{t-1})
        + \frac{1}{2} \sum_{i=1}^{n} \Bigg[ \underbrace{\frac{\left[ y_t(x_i(u_t)) - \mu_t^v(i) \right]^2}{\sigma_t^v(i) + \sigma_w}}_{\text{foreground term}} - \underbrace{\frac{\left[ y_t(x_i(u_t)) - \mu_t^b(x_i(u_t)) \right]^2}{\sigma_t^b(x_i(u_t)) + \sigma_w} - \log\left[ \sigma_t^b(x_i(u_t)) + \sigma_w \right]}_{\text{background terms}} \Bigg] \Bigg\}    (14)

Note the similarity between (14) and constrained optic flow (3). For example, focus on the foreground term in (14) and ignore the weights in the denominator. The previous image y_{t-1} from (3) has been replaced by \mu_t^v(\cdot), the estimated object texture based on the images and poses up to time t-1. As in optic flow, we can find the pose estimate \hat{u}_t(u_{1:t-1}) efficiently using the Gauss-Newton method.

3.2.2 Estimating the distribution of an expert's pose opinion

We estimate the distribution of an expert's pose opinion using a combination of Laplace's method and importance sampling. Suppose at time t-1 we are given a sample of experts indexed by d, each endowed with a pose sequence u_{1:t-1}^{(d)}, a weight w_{t-1}^{(d)}, and the means and variances of Gaussian distributions for object and background texture. For each expert u_{1:t-1}^{(d)}, we use (14) to compute \hat{u}_t^{(d)}, the peak of the pose distribution at time t according to that expert. Define \hat{\sigma}_t^{(d)} as the inverse Hessian matrix of (14) at this peak, the Laplace estimate of the covariance matrix of the expert's opinion.
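For a concrete reading of (14), the sketch below evaluates the bracketed objective for one candidate pose, given the current texel means and variances and the pixel indices x_i(u_t) rendered by the object. This is our own illustration under hypothetical names, not the paper's implementation; the pose-to-pixel mapping `obj_pixels` is assumed to be precomputed by the renderer.

```python
import numpy as np

def pose_cost(y_t, obj_pixels, mu_v, var_v, mu_b, var_b, sigma_w, neg_log_prior):
    """Evaluate the objective of eq. (14) for one candidate pose u_t.

    y_t:         observed image, flattened
    obj_pixels:  indices x_i(u_t) of pixels rendered by the n object vertices
    mu_v, var_v: object texel means/variances (one per vertex)
    mu_b, var_b: background texel means/variances (one per image pixel)
    """
    y_obj = y_t[obj_pixels]                                   # y_t(x_i(u_t))
    fg = (y_obj - mu_v) ** 2 / (var_v + sigma_w)              # foreground term
    bg = ((y_obj - mu_b[obj_pixels]) ** 2 / (var_b[obj_pixels] + sigma_w)
          + np.log(var_b[obj_pixels] + sigma_w))              # background terms
    return neg_log_prior + 0.5 * np.sum(fg - bg)
```

A pose that explains a pixel well as foreground (small `fg`) while the background model explains it poorly (large `bg`) yields a low cost, which is what the minimization in (14) rewards.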
We then generate a set of s independent samples \{u_t^{(d,e)} : e = 1, \ldots, s\} from a Gaussian distribution with mean \hat{u}_t^{(d)} and variance proportional to \hat{\sigma}_t^{(d)}, namely g(\cdot \mid \hat{u}_t^{(d)}, \alpha \hat{\sigma}_t^{(d)}), where the parameter \alpha > 0 determines the sharpness of the sampling distribution. (Note that letting \alpha \to 0 would be equivalent to simply setting the new pose equal to the peak of the pose opinion, u_t^{(d,e)} = \hat{u}_t^{(d)}.) To find the parameters of this Gaussian proposal distribution, we use the Gauss-Newton method, ignoring the second of the two background terms in (14). (This term is not ignored in the importance sampling step.)

To refine our estimate of the pose opinion we use importance sampling. We assign each sample from the proposal distribution an importance weight w_t(d, e) that is proportional to the ratio between the posterior distribution and the proposal distribution:

    p(u_t \mid u_{1:t-1}^{(d)}, y_{1:t}) \approx \sum_{e=1}^{s} \frac{w_t(d, e)}{\sum_{f=1}^{s} w_t(d, f)} \, \delta(u_t - u_t^{(d,e)})    (15)

    w_t(d, e) = \frac{p(u_t^{(d,e)} \mid u_{t-1}^{(d)}) \, p(y_t \mid u_{1:t-1}^{(d)}, u_t^{(d,e)}, y_{1:t-1})}{g(u_t^{(d,e)} \mid \hat{u}_t^{(d)}, \alpha \hat{\sigma}_t^{(d)})}    (16)

The numerator of (16) is proportional to p(u_t^{(d,e)} \mid u_{1:t-1}^{(d)}, y_{1:t}) by (12), and the denominator of (16) is the sampling distribution.

3.3 Estimating an expert's credibility

The credibility of the dth expert, p(u_{1:t-1}^{(d)} \mid y_{1:t}), is proportional to the product of a prior term and a likelihood term:

    p(u_{1:t-1}^{(d)} \mid y_{1:t}) = \frac{p(u_{1:t-1}^{(d)} \mid y_{1:t-1}) \, p(y_t \mid u_{1:t-1}^{(d)}, y_{1:t-1})}{p(y_t \mid y_{1:t-1})}.    (17)

Regarding the likelihood,

    p(y_t \mid u_{1:t-1}, y_{1:t-1}) = \int p(y_t, u_t \mid u_{1:t-1}, y_{1:t-1}) \, du_t = \int p(y_t \mid u_{1:t}, y_{1:t-1}) \, p(u_t \mid u_{t-1}) \, du_t    (18)

We already generated a set of samples \{u_t^{(d,e)} : e = 1, \ldots, s\} that estimate the pose opinion of the dth expert, p(u_t \mid u_{1:t-1}^{(d)}, y_{1:t}).
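Equations (15)-(16) are most safely implemented in log space, since the prior, likelihood, and proposal densities can differ by many orders of magnitude. A minimal sketch (names are ours; the paper does not specify an implementation):

```python
import numpy as np

def importance_weights(log_prior, log_lik, log_proposal):
    """Normalized importance weights for eqs. (15)-(16).

    Each argument is an array over one expert's s samples:
    log p(u_t^(d,e) | u_{t-1}), log p(y_t | ...), and
    log g(u_t^(d,e) | u_hat, alpha * sigma_hat).
    """
    log_w = log_prior + log_lik - log_proposal   # log of eq. (16)
    log_w -= np.max(log_w)                       # guard against overflow
    w = np.exp(log_w)
    return w / np.sum(w)                         # normalization in eq. (15)
```

Subtracting the maximum before exponentiating leaves the normalized weights unchanged but avoids underflow and overflow when the unnormalized log-weights are large in magnitude.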
We can now use these samples to estimate the likelihood for the dth expert:

    p(y_t \mid u_{1:t-1}^{(d)}, y_{1:t-1}) = \int p(y_t \mid u_{1:t-1}^{(d)}, u_t, y_{1:t-1}) \, p(u_t \mid u_{t-1}^{(d)}) \, du_t    (19)
        = \int p(y_t \mid u_{1:t-1}^{(d)}, u_t, y_{1:t-1}) \, g(u_t \mid \hat{u}_t^{(d)}, \alpha \hat{\sigma}_t^{(d)}) \, \frac{p(u_t \mid u_{t-1}^{(d)})}{g(u_t \mid \hat{u}_t^{(d)}, \alpha \hat{\sigma}_t^{(d)})} \, du_t \approx \frac{1}{s} \sum_{e=1}^{s} w_t(d, e)

3.4 Updating the filtering distribution

Once we have calculated the opinion and credibility of each expert u_{1:t-1}, we evaluate the integral in (5) as a weighted sum over experts. The credibilities of all of the experts are normalized to sum to 1. New experts u_{1:t} (children) are created from the old experts u_{1:t-1} (parents) by appending a pose u_t to the parent's history of poses u_{1:t-1}. Every expert in the new generation is created as follows: one parent is chosen to sire the child, with probability proportional to the parent's credibility. The child's value of u_t is chosen at random from its parent's pose opinion (the weighted samples described in Section 3.2.2).

4 Relation to Optic Flow and Template Matching

In basic template matching, the same time-invariant texture map is used to track every frame in the video sequence. Optic flow can be thought of as template matching with a template that is completely reset at each frame for use in the subsequent frame. In most cases, optimal inference under G-flow involves a combination of optic flow-based and template-based tracking, in which the texture template gradually evolves as new images are presented. Pure optic flow and template matching emerge as special cases.

Optic Flow as a Special Case
Suppose that the pose transition probability p(u_t | u_{t-1}) is uninformative, that the background is uninformative, that every texel in the initial object texture map has equal variance, \mathrm{Var}(V_1) = \kappa I_n, and that the texture transition uncertainty is very high, \Psi_v \to \mathrm{diag}(\infty).
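The generation step of Section 3.4 can be sketched as follows, assuming the parents' credibilities are already normalized and each parent carries the weighted pose samples of Section 3.2.2. Names and data layout are ours, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def spawn_children(parent_credibility, pose_samples, pose_weights, n_children):
    """One generation step of the expert ensemble (Section 3.4).

    parent_credibility: normalized credibilities of the d parents
    pose_samples[d]:    the s pose samples of parent d
    pose_weights[d]:    the corresponding normalized importance weights
    Returns, for each child, (parent index, sampled pose u_t).
    """
    # Each child's sire is drawn with probability = parent credibility.
    parents = rng.choice(len(parent_credibility), size=n_children,
                         p=parent_credibility)
    children = []
    for d in parents:
        # The child's u_t is drawn from the parent's pose opinion (eq. 15).
        e = rng.choice(len(pose_weights[d]), p=pose_weights[d])
        children.append((d, pose_samples[d][e]))
    return children
```

Because children are drawn with replacement, credible parents sire many children while implausible pose histories die out, which is the resampling behavior described above.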
Using (7), (8), and (10), it follows that:

    \mu_t^v(i) = [a^v(u_{t-1})]^T y_{t-1} = y_{t-1}(x_i(u_{t-1})),    (20)

i.e., the object texture map at time t is determined by the pixels from image y_{t-1} that, according to pose u_{t-1}, were rendered by the object. As a result, (14) reduces to:

    \hat{u}_t(u_{1:t-1}) = \mathrm{argmin}_{u_t} \frac{1}{2} \sum_{i=1}^{n} \left[ y_t(x_i(u_t)) - y_{t-1}(x_i(u_{t-1})) \right]^2    (21)

which is identical to (3). Thus constrained optic flow [10, 2, 11] is simply a special case of optimal inference under G-flow, with a single expert and with sampling parameter \alpha \to 0. The key assumption that \Psi_v \to \mathrm{diag}(\infty) means that the object's texture is very different in adjacent frames. However, optic flow is typically applied in situations in which the object's texture in adjacent frames is similar. The optimal solution in such situations calls not for optic flow, but for a texture map that integrates information across multiple frames.

Template Matching as a Special Case
Suppose the initial texture map is known precisely, \mathrm{Var}(V_1) = 0, and the texture transition uncertainty is very low, \Psi_v \to 0. By (7), (8), and (10), it follows that \mu_t^v(i) = \mu_{t-1}^v(i) = \mu_1^v(i), i.e., the texture map does not change over time, but remains fixed at its initial value (it is a texture template). Then (14) becomes:

    \hat{u}_t(u_{1:t-1}) = \mathrm{argmin}_{u_t} \frac{1}{2} \sum_{i=1}^{n} \left[ y_t(x_i(u_t)) - \mu_1^v(i) \right]^2    (22)

where \mu_1^v(i) is the ith texel of the fixed texture template. This is the error function minimized by standard template-matching algorithms. The key assumption that \Psi_v \to 0 means the object's texture is constant from each frame to the next, which is rarely true in real data. G-flow provides a principled way to relax this unrealistic assumption of template methods.

General Case
In general, if the background is uninformative, then minimizing (14) results in a weighted combination of optic flow and template matching, with the weight of each approach depending on the current level of certainty about the object template.
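Both limits can be checked numerically with a single scalar texel update from eqs. (7)-(10). This toy check is ours, not from the paper; tiny positive variances stand in for the exact zero of the template-matching limit to keep the arithmetic well-defined.

```python
# One scalar texel Kalman step (eqs. (7)-(10)), used to verify the two
# special cases discussed above.
def texel_step(mu, var, psi, y, sigma_w):
    var_pred = var + psi                                  # prediction, eq. (10)
    var_post = 1.0 / (1.0 / var_pred + 1.0 / sigma_w)     # correction, eq. (7)
    mu_post = var_post * (mu / var_pred + y / sigma_w)    # correction, eq. (8)
    return mu_post, var_post

# Optic-flow limit: huge transition noise (Psi_v -> infinity) makes the
# texture map forget the past and copy the newest observation.
mu_of, _ = texel_step(mu=0.0, var=1.0, psi=1e12, y=5.0, sigma_w=1.0)
print(round(mu_of, 6))   # close to 5.0: the template tracks the latest image

# Template-matching limit: (near-)zero initial variance and Psi_v -> 0
# pin the texture map at its initial value.
mu_tm, _ = texel_step(mu=0.0, var=1e-12, psi=1e-12, y=5.0, sigma_w=1.0)
print(round(mu_tm, 6))   # close to 0.0: the template never changes
```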
In addition, when there is useful information in the background, G-flow infers a model of the background, which is used to improve tracking.

Figure 2: G-flow tracking an outdoor video. Results are shown for frames 1, 81, and 620.

5 Simulations

We collected a video (30 frames/sec) of a subject in an outdoor setting who made a variety of facial expressions while moving her head. A later motion-capture session was used to create a 3D morphable model of her face, consisting of a set of 5 morph bases (k = 5). Twenty experts were initialized randomly near the correct pose on frame 1 of the video and propagated using G-flow inference (assuming an uninformative background). See http://mplab.ucsd.edu for video. Figure 2 shows the distribution of experts for three frames. In each frame, every expert has a hypothesis about the pose (translation, rotation, scale, and morph coefficients). The 38 points in the model are projected into the image according to each expert's pose, yielding 760 red dots in each frame. In each frame, the mean of the experts gives a single hypothesis about the 3D non-rigid deformation of the face (lower right) as well as the rigid pose of the face (rotated 3D axes, lower left). Notice G-flow's ability to recover from error: bad initial hypotheses are weeded out, leaving only good hypotheses. To compare G-flow's performance against deterministic constrained optic flow algorithms such as [10, 2, 11], we used both G-flow and the method from [2] to track the same video sequence. We ran each tracker several times, introducing small errors in the starting pose.

Figure 3: Average error over time for G-flow (green) and for deterministic optic flow [2] (blue). Results were averaged over 16 runs (deterministic algorithm) or 4 runs (G-flow) and smoothed.

As ground truth, the 2D locations of 6 points were hand-labeled in every 20th frame.
The error at every 20th frame was calculated as the distance from these labeled locations to the inferred (tracked) locations, averaged across several runs. Figure 3 compares this tracking error as a function of time for the deterministic constrained optic flow algorithm and for a 20-expert version of the G-flow tracking algorithm. Notice that the deterministic system has a tendency to drift (increase in error) over time, whereas G-flow can recover from drift.

Acknowledgments

Tim K. Marks was supported by NSF grant IIS-0223052 and NSF grant DGE-0333451 to GWC. John Hershey was supported by the UCDIMI grant D00-10084. J. Cooper Roddey was supported by the Swartz Foundation. Javier R. Movellan was supported by NSF grants IIS-0086107, IIS-0220141, and IIS-0223052, and by the UCDIMI grant D00-10084.

References

[1] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2002.
[2] M. Brand. Flexible flow for 3D nonrigid tracking and shape recovery. In CVPR, volume 1, pages 315–322, 2001.
[3] H. Chen, P. Kumar, and J. van Schuppen. On Kalman filtering for conditionally Gaussian systems with random matrices. Syst. Contr. Lett., 13:397–404, 1989.
[4] R. Chen and J. Liu. Mixture Kalman filters. J. R. Statist. Soc. B, 62:493–508, 2000.
[5] A. Doucet and C. Andrieu. Particle filtering for partially observed Gaussian state space models. J. R. Statist. Soc. B, 64:827–838, 2002.
[6] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In 16th Conference on Uncertainty in AI, pages 176–183, 2000.
[7] A. Doucet, S. J. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–208, 2000.
[8] Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000.
[9] B. Lucas and T. Kanade.
An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981.
[10] L. Torresani, D. Yang, G. Alexander, and C. Bregler. Tracking and modeling non-rigid objects with rank constraints. In CVPR, pages 493–500, 2001.
[11] Lorenzo Torresani, Aaron Hertzmann, and Christoph Bregler. Learning non-rigid 3D shape from 2D motion. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.