nips nips2012 nips2012-250 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Doina Precup, Joelle Pineau, Andre S. Barreto
Abstract: Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. What sets KBSF apart from other kernel-based approaches is the fact that the size of its MDP is independent of the number of transitions, which makes it possible to control the trade-off between the quality of the resulting approximation and the associated computational cost. However, KBSF’s memory usage grows linearly with the number of transitions, precluding its application in scenarios where a large amount of data must be processed. In this paper we show that it is possible to construct KBSF’s MDP in a fully incremental way, thus freeing the space complexity of this algorithm from its dependence on the number of sample transitions. The incremental version of KBSF is able to process an arbitrary amount of data, which results in a model-based reinforcement learning algorithm that can be used to solve continuous MDPs in both off-line and on-line regimes. We present theoretical results showing that KBSF can approximate the value function that would be computed by conventional kernel-based learning with arbitrary precision. We empirically demonstrate the effectiveness of the proposed algorithm in the challenging threepole balancing task, in which the ability to process a large number of transitions is crucial for success. 1
Reference: text
sentIndex sentText sentNum sentScore
1 ca Abstract Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. [sent-9, score-0.25]
2 The incremental version of KBSF is able to process an arbitrary amount of data, which results in a model-based reinforcement learning algorithm that can be used to solve continuous MDPs in both off-line and on-line regimes. [sent-13, score-0.22]
3 We empirically demonstrate the effectiveness of the proposed algorithm in the challenging threepole balancing task, in which the ability to process a large number of transitions is crucial for success. [sent-15, score-0.256]
4 1 Introduction The task of learning a policy for a sequential decision problem with continuous state space is a long-standing challenge that has attracted the attention of the reinforcement learning community for years. [sent-16, score-0.216]
5 KBRL solves a continuous state-space Markov decision process (MDP) using a finite model constructed based on sample transitions only. [sent-18, score-0.32]
6 The main limitation of the algorithm is the fact that, in order to construct its model, it uses an amount of memory that grows linearly with the number of sample transitions. [sent-28, score-0.127]
7 In order to distinguish it from its original, batch counterpart, we call this new version of our algorithm incremental KBSF, or iKBSF for short. [sent-31, score-0.131]
8 In our previous experiments with KBSF, we defined the model used by this algorithm by clustering the sample transitions and then using the clusters’s centers as the representative states in the reduced MDP [4]. [sent-35, score-0.456]
9 In this paper we fill this gap by showing that we can approximate KBRL’s value function at any desired level of accuracy by minimizing the distance from a sampled state to the nearest representative state. [sent-37, score-0.151]
10 Besides its theoretical interest, the bound is also relevant from a practical point of view, since it can be used in iKBSF to guide the on-line selection of representative states. [sent-38, score-0.14]
11 Here, iKBSF’s ability to process a large number of transitions is crucial for achieving a high success rate, which cannot be easily replicated with batch methods. [sent-40, score-0.297]
12 2 Background In reinforcement learning, an agent interacts with an environment in order to find a policy that maximizes the discounted sum of rewards [6]. [sent-41, score-0.139]
13 An MDP is a tuple M ≡ (S, A, Pa , ra , γ), where S is the state space and A is the (finite) action set. [sent-43, score-0.179]
14 In a finite MDP the matrix Pa ∈ R|S|×|S| gives the transition probabilities associated with action a ∈ A and the vector ra ∈ R|S| stores the corresponding expected rewards. [sent-45, score-0.183]
15 Kernel-based reinforcement learning (KBRL) uses sample transitions to derive a finite MDP that approximates the continuous model [1, a ˆ 2]. [sent-48, score-0.363]
16 , na } be sample transitions associated with action a ∈ A, where k k a , sa ∈ S and r a ∈ R. [sent-52, score-0.908]
17 Finally, a define the normalized kernel function associated with action a as κτ (s, sa ) = kτ (s, sa )/∑na kτ (s, sa ). [sent-54, score-1.679]
18 i i j j=1 The model constructed by KBRL has the following transition and reward functions: ˆ Pa (s′ |s) = a κτ (s, sa ), if s′ = sa , ˆi i 0, otherwise and ˆ Ra (s, s′ ) = ria , if s′ = sa , ˆi 0, otherwise. [sent-55, score-1.712]
19 (1) Since only transitions ending in the states sa have a non-zero probability of occurrence, one can ˆi ˆ ˆ define a finite MDP M composed solely of these n = ∑a na states [2, 3]. [sent-56, score-1.007]
20 After V ∗ , the optimal ˆ value function of M, has been computed, the value of any state-action pair can be determined as: a ˆ ˆi Q(s, a) = ∑na κτ (s, sa ) ria + γ V ∗ (sa ) , where s ∈ S and a ∈ A. [sent-57, score-0.597]
21 Therefore, the use of KBRL leads to a dilemma: on the one hand, one wants to use as many transitions as possible to capture the ˆ dynamics of M, but on the other hand one wants to have an MDP M of manageable size. [sent-60, score-0.254]
22 Our algorithm compresses the information contained in KBRL’s model M ¯ in an MDP M whose size is independent of the number of transitions n. [sent-62, score-0.222]
23 , sm } ¯ ¯ ¯ ˙ a ∈ Rna ×m and Ka ∈ Rm×na with el˙ be a set of representative states in S. [sent-71, score-0.218]
24 KBSF computes matrices D a ¯ ˙ ¯ ¯ ¯ ements d˙iaj = κτ (sa , s j ) and kiaj = κτ (si , sa ), where κτ is defined as κτ (s, si ) = kτ (s, si )/∑m kτ (s, s j ). [sent-72, score-0.671]
25 ¯ ¯ ¯ ¯ ˆi ¯ j=1 ¯ j ¯ ¯ ˙ ˙ ¯ ¯ ˆ ¯ The basic idea of the algorithm is to replace the MDP M with M ≡ (S, A, Pa , ra , γ), where Pa = Ka Da ˙ ¯ and ra = Ka ra (ra ∈ Rna is the vector composed of sample rewards ria ). [sent-73, score-0.503]
26 3 Incremental kernel-based stochastic factorization ¯ ¯ In the batch version of KBSF, described in Section 2, the matrices Pa and vectors ra are determined a simultaneously. [sent-84, score-0.245]
27 This has two undesirable conseusing all the transitions in the corresponding sets S ¯ quences. [sent-85, score-0.222]
28 First, the construction of the MDP M requires an amount of memory of O(nmax m), where nmax = maxa na . [sent-86, score-0.228]
29 max ¯ Second, with batch KBSF the only way to incorporate new data into the model M is to recompute ¯ ˙ ˙ the multiplication Pa = Ka Da for all actions a for which there are new sample transitions available. [sent-88, score-0.345]
30 Suppose we split the set of sample transitions Sa in two subsets S1 and S2 such that S1 ∩ S2 = 0 / and S1 ∪ S2 = Sa . [sent-91, score-0.257]
31 Without loss of generality, suppose that the sample transitions are indexed so that a ˆ a ˆ ¯ S1 ≡ {(sa , rk , sa )|k = 1, 2, . [sent-92, score-0.819]
32 , n1 } and S2 ≡ {(sa , rk , sa )|k = n1 + 1, n1 + 2, . [sent-95, score-0.562]
33 Let PS1 k k k k S1 ¯ a and vector ra computed by KBSF using only the n1 transitions in S1 (if n1 = 0, ¯ ¯ and r be matrix P ¯ ¯ ¯ ¯ we define PS1 = 0 ∈ Rm×m and rS1 = 0 ∈ Rm for all a ∈ A). [sent-99, score-0.355]
34 We know that S n1 pi 1 = ¯j ˙a ∑ kit d˙taj = t=1 n1 n1 kτ (sta , s j ) kτ (si , sta )kτ (sta , s j ) ¯ kτ (si , sta ) 1 ¯ ¯ ˆ ¯ ¯ ˆ ¯ = n1 ∑ ∑n1 kτ (s¯i , sa ) ∑m kτ¯ (sˆta , s¯l ) ∑ kτ (s¯i , sa ) ∑ ∑m kτ¯ (sˆta , s¯l ) . [sent-102, score-1.404]
35 l=1 l=1 l=1 l l t=1 t=1 l=1 2 To simplify the notation, define wi 1 = ∑n1 kτ (si , sa ), wi 2 = ∑n1 +n+1 kτ (si , sa ), and cti j = ¯ l ¯ l l=1 l=n1 S kτ (si ,st )kτ (st ,s j ) ¯ a ¯ ˆa ¯ a ∑m kτ (st ,sl ) l=1 ¯ ˆ ¯ , with t ∈ {1, 2, . [sent-103, score-1.31]
36 Then, S ∪S2 pi 1 ¯j S = 1 S S wi 1 + wi 2 n1 +n2 n1 ∑t=1 cti j + ∑t=n1 +1 cti j = 3 1 S n1 +n2 pi 1 wi 1 + ∑t=n1 +1 cti j . [sent-107, score-0.476]
37 ¯j S S wi 1 + wi 2 S n1 +n2 Now, defining bi 2 = ∑t=n1 +1 cti j , we have the simple update rule: j S S ∪S2 pi 1 ¯j = 1 S1 S S2 wi + wi S S bi 2 + pi 1 wi 1 ¯j j . [sent-108, score-0.532]
38 We know that ¯ n1 1 n1 1 ¯ ¯ kτ (si , sta )rta = S1 ∑ kτ (si , sta )rta . [sent-110, score-0.272]
39 n1 ¯ l ∑ ∑l=1 kτ (si , sa ) t=1 wi t=1 ¯ Let hti = kτ (si , sta )rta , with t ∈ {1, 2, . [sent-111, score-0.812]
40 Then, 1 1 S S S ∪S n1 +n2 n1 n1 +n2 wi 1 ri 1 + ∑t=n1 +1 hti ¯ ri 1 2 = S1 ¯ ∑t=1 hti + ∑t=n1 +1 hti = S1 S2 S wi + wi wi + wi 2 S ri 1 = ¯ . [sent-115, score-0.65]
41 n1 +n2 Defining ei 2 = ∑t=n1 +1 hti , we have the following update rule: S S ∪S2 ri 1 ¯ S S = 1 S1 S S2 wi + wi S S ei 2 + ri 1 wi 1 ¯ . [sent-116, score-0.404]
42 (4) S Since bi 2 , ei 2 , and wi 2 can be computed based on S2 only, we can discard the sample transitions in S1 j S ¯ ¯ after computing PS1 and rS1 . [sent-117, score-0.349]
43 Since the amount of memory used by KBSF is now independent of n, it can process an arbitrary number of sample transitions. [sent-124, score-0.123]
44 Algorithm 1 Update KBSF’s MDP ¯ ¯ Pa , ra , wa for all a ∈ A Input: a S for all a ∈ A ¯ Output: Updated M and wa for a ∈ A do for t = 1, . [sent-125, score-0.263]
45 , na do zt ← ∑m kτ (sta , sl ) l=1 ¯ ˆ ¯ na ← |Sa | for i = 1, 2, . [sent-128, score-0.168]
46 , m do na w′ ← ∑t=1 kτ (si , sta ) ¯ for j = 1, 2, . [sent-131, score-0.211]
47 , m do na b ← ∑t=1 kτ (si , sta )kτ (sta , s j )/zt ¯ ¯ ˆ ¯ 1 pi j ← wa +w′ (b + pi j wa ) ¯ ¯ i i na ¯ e ← ∑t=1 kτ (si , sta )rta 1 ri ← wa +w′ (e + ri wa ) ¯ ¯ i i a ← wa + w′ wi i Algorithm 2 Incremental KBSF (iKBSF) si Representative states, i = 1, 2, . [sent-134, score-1.012]
48 , m ¯ tm Interval to update model Input: tv Interval to update value function n Total number of sample transitions ˜ Output: Approximate value function Q(s, a) ¯ ← arbitrary matrix in Rm×|A| Q ¯ ¯ Pa ← 0 ∈ Rm×m , ra ← 0 ∈ Rm , wa ← 0 ∈ Rm , ∀a ∈ A for t = 1, 2, . [sent-137, score-0.658]
49 iKBSF updates the model M and the ¯ value function Q at fixed intervals tm and tv , respectively. [sent-143, score-0.155]
50 When tm = tv = n, we recover the batch version of KBSF; when tm = tv = 1, we have an on-line method which stores no sample transitions. [sent-144, score-0.433]
51 ¯ Note that Algorithm 2 also allows for the inclusion of new representative states to the model M. [sent-145, score-0.199]
52 Using Algorithm 1 this is easy to do: given a new representative state sm+1 , it suffices to set wa = ¯ m+1 ¯ ¯ 0, rm+1 = 0, and pm+1, j = p j,m+1 = 0 for j = 1, 2, . [sent-146, score-0.216]
53 The results below are particularly useful for iKBSF because they provide practical guidance towards where and when to add new representative states. [sent-153, score-0.138]
54 Suppose we have a fixed set of sample transitions Sa . [sent-154, score-0.257]
55 We will show that, if we are free to define the representative states, then we can use KBSF to approximate KBRL’s solution to any desired level of accuracy. [sent-155, score-0.123]
56 To be more precise, let d∗ ≡ maxa,i min j sa − s j , that is, d∗ is the maximum distance ˆi ¯ from a sampled state sa to the closest representative state. [sent-156, score-1.269]
57 ˆ∗ ˆ∗ ¯ ¯∗ ¯ ˆi ¯ Let sa ≡ sa with k = argmaxi min j sa − s j and sa ≡ sh where h = argmin j sa − s j , that is, sa ˆ∗ ˆk is the sampled state in Sa whose distance to the closest representative state is maximal, and sa is the ¯∗ ˆ∗ ¯∗ representative state that is closest to sa . [sent-159, score-4.777]
58 Using these definitions, we can select the pair (sa , sa ) that ˆ∗ ˆ ¯ ˆ∗ ¯∗ ¯ ¯∗ ˆ∗ maximizes sa − sa : s∗ ≡ sb and s∗ ≡ sb where b = argmaxa sa − sa . [sent-160, score-2.777]
59 ˆ∗ ¯∗ ˆ ¯∗ We make the following simple assumptions: (i) sa and sa are unique for all a ∈ A, (ii) 0∞ φ (x)dx ≤ ˆ∗ Lφ < ∞, (iii) φ (x) ≥ φ (y) if x < y, (iv) ∃ Aφ , λφ > 0, ∃ Bφ ≥ 0 such that Aφ exp(−x) ≤ φ (x) ≤ λφ Aφ exp(−x) if x ≥ Bφ . [sent-162, score-1.098]
60 We now introduce a notion of dissimilarity between two states s, s′ ∈ S which is induced by a specific set of sample transitions Sa and the choice of kernel function: a Definition 2. [sent-177, score-0.347]
61 Given β > 0, theβ -dissimilarity between s and s′ with respect to κτ is defined as a ψ(κτ , s, s′ , β ) = na a a ∑k=1 |κτ (s, sa ) − κτ (s′ , sa )|, if k k 0, otherwise. [sent-178, score-1.182]
62 For any α ∈ (0, 1] and any t ≥ m − 1, let ρ a = ρ(kτ , sa , sa , α/t), let ψρ = ¯ ˆ∗ ¯∗ a , sa , s , ρ a ), and let ψ a = max ψ(κ a , sa , s , ∞). [sent-185, score-2.224]
63 , (r|A| )⊺ ]⊺ ∞ ˆ ˇ ˙ = Pa r − DKa ra ∞ ∞ ¯ < ε if d∗ < δ1 and τ < δ2 . [sent-197, score-0.133]
64 From Eqn (1) and the definition of ra , we can write ˆ ˇ ˇ = Pa r − DKa r ∞ ˆ = (Pa − DKa )ˇ r ∞ ˆ ≤ Pa − DKa ∞ ˇ r ∞. [sent-199, score-0.133]
65 We have to show that, if we define the representative states in such a way ¯ that d∗ is small enough, and set τ accordingly, then we can make ψρ < (1 − α)η − αψMAX ≡ η ′ . [sent-205, score-0.199]
66 Recalling that d˙iaj = kτ (sa , s j )/∑m kτ (sa , sk ), ¯ ˆi ¯ k=1 ¯ ˆi ¯ let h = argmax j kτ (sa , s j ), and let ya = kτ (sa , sh ) and ya = max j=h kτ (sa , s j ). [sent-211, score-0.199]
67 Then, for any i, ˇi ˆi ¯ ˆi ¯ ˆi ¯ ¯ ¯ ¯ i max j d˙iaj = ya / ya + ∑ j=h kτ (sa , s j ) ≥ ya /(ya + (m − 1)ya ). [sent-212, score-0.262]
68 How small d∗ and τ should be depends on the particular choice of kernel kτ and on the characteristics of the sets of transitions Sa . [sent-219, score-0.236]
69 Of course, a fixed number m of representative states ¯ imposes a minimum possible value for d∗ , and if this value is not small enough decreasing τ may ¯ actually hurt the approximation. [sent-220, score-0.199]
70 Our result supports the use of a local approximation based on representative states spread over the state space S. [sent-222, score-0.227]
71 This is in line with the quantization strategies used in batch-mode kernel-based reinforcement learning to define the states s j [4, 5]. [sent-223, score-0.161]
72 In the case of on-line learning, we have to ¯ adaptively define the representative states s j as the sample transitions come in. [sent-224, score-0.456]
73 In the next section we show a simple strategy for adding representative states which is based on the theoretical results presented in this section. [sent-226, score-0.216]
74 Figure 1a shows the result of such an experiment when we vary the parameters tm and tv . [sent-233, score-0.155]
75 Note that the case in which tm = tv = 8000 corresponds to the batch version of KBSF. [sent-234, score-0.228]
76 More important, the performance of the decision policies after all sample transitions have been processed is essentially the same for all values of tm and tv , which shows that iKBSF can be used as a tool to circumvent KBSF’s memory demand (which is linear in n). [sent-236, score-0.523]
77 Thus, if one has a batch of sample transitions that does not fit in the available memory, it is possible to split the data in chunks of smaller sizes and still get the same value-function approximation that would 6 3 be computed if the entire data set were processed at once. [sent-237, score-0.341]
78 As shown in Figure 1b, there is only a small computational overhead associated with such a strategy (this results from unnormalizing and ¯ ¯ normalizing the elements of Pa and ra several times through update rules (3) and (4)). [sent-238, score-0.157]
79 0 1 2 ι = 1000 ι = 2000 ι = 4000 ι = 8000 0 2000 4000 6000 Number of sample transitions 8000 0 (a) Performance 2000 4000 6000 Number of sample transitions 8000 (b) Run times Figure 1: Results on the puddle-world task averaged over 50 runs. [sent-243, score-0.536]
80 iKBSF used 100 representative states evenly distributed over the state space and tm = tv = ι (see legends). [sent-244, score-0.382]
81 We performed our simulations using the parameters usually adopted with the double pole task, except that we added a third pole with the same length and mass as the longer pole [15]. [sent-261, score-0.179]
82 In our experiments with the double-pole task, we used 200 representative states and 106 sample transitions collected by a random policy [4]. [sent-263, score-0.489]
83 Here we start our experiment with triple pole-balancing ¯ using exactly the same configuration, and then we let KBSF refine its model M by incorporating more sample transitions through update rules (3) and (4). [sent-264, score-0.32]
84 Representative states were added to the model on-line every time the ¯ ˆi agent encountered a sample state sa for which kτ (sa , s j ) < 0. [sent-269, score-0.688]
85 , m (this corresponds ˆi ¯ ˆi ¯ to setting the maximum allowed distance d∗ from a sampled state to the closest representative state). [sent-273, score-0.171]
86 1 Since this is a batch-mode learning method, we used its result on the initial set of 106 sample transitions as a baseline for our empirical evaluation. [sent-277, score-0.257]
87 7 ● ● ● ● ● ● ● ● ● ● ● ● ● ● Batch KBSF 2e+06 4e+06 6e+06 8e+06 Number of sample transitions (a) Performance 50 0. [sent-285, score-0.257]
88 3 ● ● 1e+07 ● Batch KBSF iKBSF TREE−1000 TREE−100 2e+06 4e+06 6e+06 8e+06 Number of sample transitions (b) Run times 1e+07 ● Number of representative states 1000 2000 3000 4000 iKBSF TREE−1000 TREE−100 Seconds (log) 200 500 2000 10000 ● 0. [sent-287, score-0.456]
89 Of course, we could give more sample transitions to fitted Q-iteration and batch KBSF. [sent-290, score-0.317]
90 In contrast, the amount of memory required by iKBSF is independent of the number of sample transitions n. [sent-292, score-0.33]
91 This can be observed in Figure 2b, which shows that iKBSF can build an approximation using 107 transitions in under 20 minutes. [sent-294, score-0.222]
92 ● ● ● ● ● ● ● ● 2e+06 6e+06 1e+07 Number of sample transitions (c) Size of KBSF’s MDP Figure 2: Results on the triple pole-balancing task averaged over 50 runs. [sent-296, score-0.318]
93 The values correspond to the fraction of episodes initiated from the test states in which the 3 poles could be balanced for 3000 steps (one minute of simulated time). [sent-297, score-0.127]
94 As shown in Figure 2a, the ability of iKBSF to process a large number of sample transitions allows our algorithm to achieve a success rate of approximately 80%. [sent-302, score-0.272]
95 The good performance of iKBSF on the triple pole-balancing task is especially impressive when we recall that the decision policies were evaluated on a set of test states representing all possible directions of inclination of the three poles. [sent-304, score-0.179]
96 To conclude, observe in Figure 2c how the number of representative states m grows as a function of the number of sample transitions processed by KBSF. [sent-307, score-0.499]
97 As more and more data come in, the number of representative states starts to stabilize. [sent-309, score-0.199]
98 As for the theoretical contribution, we showed that KBSF can approximate KBRL’s value function at any level of accuracy by minimizing the distance between sampled states and the closest representative state. [sent-315, score-0.236]
99 This supports the quantization strategies usually adopted in kernel-based RL, and also offers guidance towards where and when to add new representative states in on-line learning. [sent-316, score-0.229]
100 Kernel-based models for reinforcement learning in continuous state spaces. [sent-332, score-0.134]
wordName wordTfidf (topN-words)
[('sa', 0.549), ('kbsf', 0.515), ('ikbsf', 0.324), ('kbrl', 0.24), ('transitions', 0.222), ('ra', 0.133), ('sta', 0.127), ('representative', 0.123), ('mdp', 0.12), ('dka', 0.108), ('tm', 0.087), ('pa', 0.087), ('reinforcement', 0.085), ('na', 0.084), ('ya', 0.078), ('wi', 0.076), ('states', 0.076), ('tv', 0.068), ('wa', 0.065), ('si', 0.061), ('batch', 0.06), ('cti', 0.06), ('hti', 0.06), ('incremental', 0.058), ('ka', 0.05), ('pole', 0.049), ('eqn', 0.049), ('ria', 0.048), ('rta', 0.048), ('memory', 0.045), ('rm', 0.042), ('maxa', 0.039), ('triple', 0.039), ('poles', 0.037), ('iaj', 0.036), ('sample', 0.035), ('pi', 0.034), ('ia', 0.033), ('policy', 0.033), ('nmax', 0.032), ('barreto', 0.032), ('ri', 0.03), ('st', 0.03), ('ormoneit', 0.029), ('mcgill', 0.029), ('amount', 0.028), ('state', 0.028), ('tted', 0.028), ('max', 0.028), ('decision', 0.027), ('montreal', 0.025), ('processed', 0.024), ('ernst', 0.024), ('usage', 0.024), ('factorization', 0.024), ('update', 0.024), ('control', 0.022), ('task', 0.022), ('freeing', 0.021), ('rewards', 0.021), ('rl', 0.021), ('continuous', 0.021), ('closest', 0.02), ('grows', 0.019), ('balancing', 0.019), ('sm', 0.019), ('know', 0.018), ('rna', 0.018), ('action', 0.018), ('mdps', 0.018), ('cart', 0.017), ('precup', 0.017), ('mod', 0.017), ('double', 0.017), ('theoretical', 0.017), ('widths', 0.017), ('dq', 0.017), ('transition', 0.017), ('canada', 0.017), ('ei', 0.016), ('wants', 0.016), ('ta', 0.016), ('di', 0.016), ('sb', 0.016), ('stochastic', 0.015), ('guidance', 0.015), ('sh', 0.015), ('adopted', 0.015), ('property', 0.015), ('policies', 0.015), ('process', 0.015), ('stores', 0.015), ('gb', 0.015), ('episodes', 0.014), ('retains', 0.014), ('kernel', 0.014), ('unstable', 0.013), ('rk', 0.013), ('tree', 0.013), ('version', 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization
Author: Doina Precup, Joelle Pineau, Andre S. Barreto
Abstract: Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. What sets KBSF apart from other kernel-based approaches is the fact that the size of its MDP is independent of the number of transitions, which makes it possible to control the trade-off between the quality of the resulting approximation and the associated computational cost. However, KBSF’s memory usage grows linearly with the number of transitions, precluding its application in scenarios where a large amount of data must be processed. In this paper we show that it is possible to construct KBSF’s MDP in a fully incremental way, thus freeing the space complexity of this algorithm from its dependence on the number of sample transitions. The incremental version of KBSF is able to process an arbitrary amount of data, which results in a model-based reinforcement learning algorithm that can be used to solve continuous MDPs in both off-line and on-line regimes. We present theoretical results showing that KBSF can approximate the value function that would be computed by conventional kernel-based learning with arbitrary precision. We empirically demonstrate the effectiveness of the proposed algorithm in the challenging threepole balancing task, in which the ability to process a large number of transitions is crucial for success. 1
2 0.098806739 297 nips-2012-Robustness and risk-sensitivity in Markov decision processes
Author: Takayuki Osogami
Abstract: We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters have uncertainties. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, which is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP of minimizing an iterated risk measure that is composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function. 1
3 0.089007281 345 nips-2012-Topic-Partitioned Multinetwork Embeddings
Author: Peter Krafft, Juston Moore, Bruce Desmarais, Hanna M. Wallach
Abstract: We introduce a new Bayesian admixture model intended for exploratory analysis of communication networks—specifically, the discovery and visualization of topic-specific subnetworks in email data sets. Our model produces principled visualizations of email networks, i.e., visualizations that have precise mathematical interpretations in terms of our model and its relationship to the observed data. We validate our modeling assumptions by demonstrating that our model achieves better link prediction performance than three state-of-the-art network models and exhibits topic coherence comparable to that of latent Dirichlet allocation. We showcase our model’s ability to discover and visualize topic-specific communication patterns using a new email data set: the New Hanover County email network. We provide an extensive analysis of these communication patterns, leading us to recommend our model for any exploratory analysis of email networks or other similarly-structured communication data. Finally, we advocate for principled visualization as a primary objective in the development of new network models. 1
4 0.079431057 348 nips-2012-Tractable Objectives for Robust Policy Optimization
Author: Katherine Chen, Michael Bowling
Abstract: Robust policy optimization acknowledges that risk-aversion plays a vital role in real-world decision-making. When faced with uncertainty about the effects of actions, the policy that maximizes expected utility over the unknown parameters of the system may also carry with it a risk of intolerably poor performance. One might prefer to accept lower utility in expectation in order to avoid, or reduce the likelihood of, unacceptable levels of utility under harmful parameter realizations. In this paper, we take a Bayesian approach to parameter uncertainty, but unlike other methods avoid making any distributional assumptions about the form of this uncertainty. Instead we focus on identifying optimization objectives for which solutions can be efficiently approximated. We introduce percentile measures: a very general class of objectives for robust policy optimization, which encompasses most existing approaches, including ones known to be intractable. We then introduce a broad subclass of this family for which robust policies can be approximated efficiently. Finally, we frame these objectives in the context of a two-player, zero-sum, extensive-form game and employ a no-regret algorithm to approximate an optimal policy, with computation only polynomial in the number of states and actions of the MDP. 1
5 0.072813086 162 nips-2012-Inverse Reinforcement Learning through Structured Classification
Author: Edouard Klein, Matthieu Geist, Bilal Piot, Olivier Pietquin
Abstract: This paper adresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most of existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator. 1
6 0.068191558 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
7 0.064820565 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
8 0.064292699 38 nips-2012-Algorithms for Learning Markov Field Policies
9 0.063917585 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
10 0.050622579 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning
11 0.049691394 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds
12 0.047924556 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning
13 0.047621235 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries
14 0.046854954 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
15 0.044661816 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed
16 0.044629626 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
17 0.041670904 344 nips-2012-Timely Object Recognition
18 0.04114338 243 nips-2012-Non-parametric Approximate Dynamic Programming via the Kernel Method
19 0.039950896 313 nips-2012-Sketch-Based Linear Value Function Approximation
20 0.037957117 45 nips-2012-Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand
topicId topicWeight
[(0, 0.101), (1, -0.115), (2, -0.012), (3, -0.034), (4, -0.02), (5, 0.007), (6, -0.012), (7, -0.007), (8, 0.001), (9, 0.004), (10, 0.008), (11, -0.013), (12, 0.004), (13, 0.01), (14, -0.03), (15, -0.009), (16, -0.013), (17, -0.023), (18, 0.02), (19, -0.028), (20, -0.018), (21, 0.007), (22, -0.002), (23, -0.044), (24, -0.037), (25, 0.016), (26, 0.029), (27, 0.021), (28, 0.013), (29, 0.051), (30, 0.021), (31, -0.019), (32, -0.052), (33, 0.026), (34, 0.034), (35, -0.0), (36, -0.081), (37, 0.004), (38, 0.027), (39, 0.052), (40, -0.008), (41, 0.02), (42, -0.0), (43, -0.091), (44, -0.071), (45, -0.022), (46, 0.003), (47, -0.012), (48, -0.063), (49, 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.90109044 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization
Author: Doina Precup, Joelle Pineau, Andre S. Barreto
Abstract: Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. What sets KBSF apart from other kernel-based approaches is the fact that the size of its MDP is independent of the number of transitions, which makes it possible to control the trade-off between the quality of the resulting approximation and the associated computational cost. However, KBSF’s memory usage grows linearly with the number of transitions, precluding its application in scenarios where a large amount of data must be processed. In this paper we show that it is possible to construct KBSF’s MDP in a fully incremental way, thus freeing the space complexity of this algorithm from its dependence on the number of sample transitions. The incremental version of KBSF is able to process an arbitrary amount of data, which results in a model-based reinforcement learning algorithm that can be used to solve continuous MDPs in both off-line and on-line regimes. We present theoretical results showing that KBSF can approximate the value function that would be computed by conventional kernel-based learning with arbitrary precision. We empirically demonstrate the effectiveness of the proposed algorithm in the challenging threepole balancing task, in which the ability to process a large number of transitions is crucial for success. 1
2 0.73721564 45 nips-2012-Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand
Author: Amy Greenwald, Jiacui Li, Eric Sodomka
Abstract: In many large economic markets, goods are sold through sequential auctions. Examples include eBay, online ad auctions, wireless spectrum auctions, and the Dutch flower auctions. In this paper, we combine methods from game theory and decision theory to search for approximate equilibria in sequential auction domains, in which bidders do not know their opponents’ values for goods, bidders only partially observe the actions of their opponents’, and bidders demand multiple goods. We restrict attention to two-phased strategies: first predict (i.e., learn); second, optimize. We use best-reply dynamics [4] for prediction (i.e., to predict other bidders’ strategies), and then assuming fixed other-bidder strategies, we estimate and solve the ensuing Markov decision processes (MDP) [18] for optimization. We exploit auction properties to represent the MDP in a more compact state space, and we use Monte Carlo simulation to make estimating the MDP tractable. We show how equilibria found using our search procedure compare to known equilibria for simpler auction domains, and we approximate an equilibrium for a more complex auction domain where analytical solutions are unknown. 1
3 0.71969682 297 nips-2012-Robustness and risk-sensitivity in Markov decision processes
Author: Takayuki Osogami
Abstract: We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters have uncertainties. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, which is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP of minimizing an iterated risk measure that is composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function. 1
4 0.69073683 348 nips-2012-Tractable Objectives for Robust Policy Optimization
Author: Katherine Chen, Michael Bowling
Abstract: Robust policy optimization acknowledges that risk-aversion plays a vital role in real-world decision-making. When faced with uncertainty about the effects of actions, the policy that maximizes expected utility over the unknown parameters of the system may also carry with it a risk of intolerably poor performance. One might prefer to accept lower utility in expectation in order to avoid, or reduce the likelihood of, unacceptable levels of utility under harmful parameter realizations. In this paper, we take a Bayesian approach to parameter uncertainty, but unlike other methods avoid making any distributional assumptions about the form of this uncertainty. Instead we focus on identifying optimization objectives for which solutions can be efficiently approximated. We introduce percentile measures: a very general class of objectives for robust policy optimization, which encompasses most existing approaches, including ones known to be intractable. We then introduce a broad subclass of this family for which robust policies can be approximated efficiently. Finally, we frame these objectives in the context of a two-player, zero-sum, extensive-form game and employ a no-regret algorithm to approximate an optimal policy, with computation only polynomial in the number of states and actions of the MDP. 1
5 0.68700862 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning
Author: Dongho Kim, Kee-eung Kim, Pascal Poupart
Abstract: In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected longterm total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in which we can naturally encode exploration requirements using the cost function. We extend BEETLE, a model-based BRL method, for learning in the environment with cost constraints. We demonstrate the cost-sensitive exploration behaviour in a number of simulated problems. 1
6 0.65497428 350 nips-2012-Trajectory-Based Short-Sighted Probabilistic Planning
7 0.63529015 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds
8 0.62619805 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
9 0.62338465 243 nips-2012-Non-parametric Approximate Dynamic Programming via the Kernel Method
10 0.59755003 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
11 0.56311923 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
12 0.54764849 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
13 0.5372088 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
14 0.52550226 38 nips-2012-Algorithms for Learning Markov Field Policies
15 0.51426709 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
16 0.51127255 358 nips-2012-Value Pursuit Iteration
17 0.49638626 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models
18 0.4942179 51 nips-2012-Bayesian Hierarchical Reinforcement Learning
19 0.48732325 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model
20 0.47616562 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning
topicId topicWeight
[(0, 0.017), (21, 0.031), (38, 0.107), (42, 0.028), (54, 0.04), (55, 0.021), (72, 0.328), (74, 0.06), (76, 0.122), (80, 0.085), (92, 0.046)]
simIndex simValue paperId paperTitle
1 0.71769643 9 nips-2012-A Geometric take on Metric Learning
Author: Søren Hauberg, Oren Freifeld, Michael J. Black
Abstract: Multi-metric learning techniques learn local metric tensors in different parts of a feature space. With such an approach, even simple classifiers can be competitive with the state-of-the-art because the distance measure locally adapts to the structure of the data. The learned distance measure is, however, non-metric, which has prevented multi-metric learning from generalizing to tasks such as dimensionality reduction and regression in a principled way. We prove that, with appropriate changes, multi-metric learning corresponds to learning the structure of a Riemannian manifold. We then show that this structure gives us a principled way to perform dimensionality reduction and regression according to the learned metrics. Algorithmically, we provide the first practical algorithm for computing geodesics according to the learned metrics, as well as algorithms for computing exponential and logarithmic maps on the Riemannian manifold. Together, these tools let many Euclidean algorithms take advantage of multi-metric learning. We illustrate the approach on regression and dimensionality reduction tasks that involve predicting measurements of the human body from shape data. 1 Learning and Computing Distances Statistics relies on measuring distances. When the Euclidean metric is insufficient, as is the case in many real problems, standard methods break down. This is a key motivation behind metric learning, which strives to learn good distance measures from data. In the most simple scenarios a single metric tensor is learned, but in recent years, several methods have proposed learning multiple metric tensors, such that different distance measures are applied in different parts of the feature space. This has proven to be a very powerful approach for classification tasks [1, 2], but the approach has not generalized to other tasks. Here we consider the generalization of Principal Component Analysis (PCA) and linear regression; see Fig. 1 for an illustration of our approach. The main problem with generalizing multi-metric learning is that it is based on assumptions that make the feature space both non-smooth and non-metric. Specifically, it is often assumed that straight lines form geodesic curves and that the metric tensor stays constant along these lines. These assumptions are made because it is believed that computing the actual geodesics is intractable, requiring a discretization of the entire feature space [3]. We solve these problems by smoothing the transitions between different metric tensors, which ensures a metric space where geodesics can be computed. In this paper, we consider the scenario where the metric tensor at a given point in feature space is defined as the weighted average of a set of learned metric tensors. In this model, we prove that the feature space becomes a chart for a Riemannian manifold. This ensures a metric feature space, i.e. dist(x, y) = 0 ⇔ x = y , dist(x, y) = dist(y, x) (symmetry), (1) dist(x, z) ≤ dist(x, y) + dist(y, z) (triangle inequality). To compute statistics according to the learned metric, we need to be able to compute distances, which implies that we need to compute geodesics. Based on the observation that geodesics are 1 (a) Local Metrics & Geodesics (b) Tangent Space Representation (c) First Principal Geodesic Figure 1: Illustration of Principal Geodesic Analysis. (a) Geodesics are computed between the mean and each data point. (b) Data is mapped to the Euclidean tangent space and the first principal component is computed. (c) The principal component is mapped back to the feature space. smooth curves in Riemannian spaces, we derive an algorithm for computing geodesics that only requires a discretization of the geodesic rather than the entire feature space. Furthermore, we show how to compute the exponential and logarithmic maps of the manifold. With this we can map any point back and forth between a Euclidean tangent space and the manifold. This gives us a general strategy for incorporating the learned metric tensors in many Euclidean algorithms: map the data to the tangent of the manifold, perform the Euclidean analysis and map the results back to the manifold. Before deriving the algorithms (Sec. 3) we set the scene by an analysis of the shortcomings of current state-of-the-art methods (Sec. 2), which motivate our final model. The model is general and can be used for many problems. Here we illustrate it with several challenging problems in 3D body shape modeling and analysis (Sec. 4). All proofs can be found in the supplementary material along with algorithmic details and further experimental results. 2 Background and Related Work Single-metric learning learns a metric tensor, M, such that distances are measured as dist2 (xi , xj ) = xi − xj 2 M ≡ (xi − xj )T M(xi − xj ) , (2) where M is a symmetric and positive definite D × D matrix. Classic approaches for finding such a metric tensor include PCA, where the metric is given by the inverse covariance matrix of the training data; and linear discriminant analysis (LDA), where the metric tensor is M = S−1 SB S−1 , with Sw W W and SB being the within class scatter and the between class scatter respectively [9]. A more recent approach tries to learn a metric tensor from triplets of data points (xi , xj , xk ), where the metric should obey the constraint that dist(xi , xj ) < dist(xi , xk ). Here the constraints are often chosen such that xi and xj belong to the same class, while xi and xk do not. Various relaxed versions of this idea have been suggested such that the metric can be learned by solving a semi-definite or a quadratic program [1, 2, 4–8]. Among the most popular approaches is the Large Margin Nearest Neighbor (LMNN) classifier [5], which finds a linear transformation that satisfies local distance constraints, making the approach suitable for multi-modal classes. For many problems, a single global metric tensor is not enough, which motivates learning several local metric tensors. The classic work by Hastie and Tibshirani [9] advocates locally learning metric tensors according to LDA and using these as part of a kNN classifier. In a somewhat similar fashion, Weinberger and Saul [5] cluster the training data and learn a separate metric tensor for each cluster using LMNN. A more extreme point of view was taken by Frome et al. [1, 2], who learn a diagonal metric tensor for every point in the training set, such that distance rankings are preserved. Similarly, Malisiewicz and Efros [6] find a diagonal metric tensor for each training point such that the distance to a subset of the training data from the same class is kept small. Once a set of metric tensors {M1 , . . . , MR } has been learned, the distance dist(a, b) is measured according to (2) where “the nearest” metric tensor is used, i.e. R M(x) = r=1 wr (x) ˜ Mr , where wr (x) = ˜ ˜ j wj (x) 1 0 x − xr 2 r ≤ x − xj M otherwise 2 Mj , ∀j , (3) where x is either a or b depending on the algorithm. Note that this gives a non-metric distance function as it is not symmetric. To derive this equation, it is necessary to assume that 1) geodesics 2 −8 −8 Assumed Geodesics Location of Metric Tensors Test Points −6 −8 Actual Geodesics Location of Metric Tensors Test Points −6 Riemannian Geodesics Location of Metric Tensors Test Points −6 −4 −4 −4 −2 −2 −2 0 0 0 2 2 2 4 4 4 6 −8 6 −8 −6 −4 −2 0 (a) 2 4 6 −6 −4 −2 0 2 4 6 6 −8 −6 (b) −4 −2 (c) 0 2 4 6 (d) Figure 2: (a)–(b) An illustrative example where straight lines do not form geodesics and where the metric tensor does not stay constant along lines; see text for details. The background color is proportional to the trace of the metric tensor, such that light grey corresponds to regions where paths are short (M1 ), and dark grey corresponds to regions they are long (M2 ). (c) The suggested geometric model along with the geodesics. Again, background colour is proportional to the trace of the metric tensor; the colour scale is the same is used in (a) and (b). (d) An illustration of the exponential and logarithmic maps. form straight lines, and 2) the metric tensor stays constant along these lines [3]. Both assumptions are problematic, which we illustrate with a simple example in Fig. 2a–c. Assume we are given two metric tensors M1 = 2I and M2 = I positioned at x1 = (2, 2)T and x2 = (4, 4)T respectively. This gives rise to two regions in feature space in which x1 is nearest in the first and x2 is nearest in the second, according to (3). This is illustrated in Fig. 2a. In the same figure, we also show the assumed straight-line geodesics between selected points in space. As can be seen, two of the lines goes through both regions, such that the assumption of constant metric tensors along the line is violated. Hence, it would seem natural to measure the length of the line, by adding the length of the line segments which pass through the different regions of feature space. This was suggested by Ramanan and Baker [3] who also proposed a polynomial time algorithm for measuring these line lengths. This gives a symmetric distance function. Properly computing line lengths according to the local metrics is, however, not enough to ensure that the distance function is metric. As can be seen in Fig. 2a the straight line does not form a geodesic as a shorter path can be found by circumventing the region with the “expensive” metric tensor M1 as illustrated in Fig. 2b. This issue makes it trivial to construct cases where the triangle inequality is violated, which again makes the line length measure non-metric. In summary, if we want a metric feature space, we can neither assume that geodesics are straight lines nor that the metric tensor stays constant along such lines. In practice, good results have been reported using (3) [1,3,5], so it seems obvious to ask: is metricity required? For kNN classifiers this does not appear to be the case, with many successes based on dissimilarities rather than distances [10]. We, however, want to generalize PCA and linear regression, which both seek to minimize the reconstruction error of points projected onto a subspace. As the notion of projection is hard to define sensibly in non-metric spaces, we consider metricity essential. In order to build a model with a metric feature space, we change the weights in (3) to be smooth functions. This impose a well-behaved geometric structure on the feature space, which we take advantage of in order to perform statistical analysis according to the learned metrics. However, first we review the basics of Riemannian geometry as this provides the theoretical foundation of our work. 2.1 Geodesics and Riemannian Geometry We start by defining Riemannian manifolds, which intuitively are smoothly curved spaces equipped with an inner product. Formally, they are smooth manifolds endowed with a Riemannian metric [11]: Definition A Riemannian metric M on a manifold M is a smoothly varying inner product < a, b >x = aT M(x)b in the tangent space Tx M of each point x ∈ M . 3 Often Riemannian manifolds are represented by a chart; i.e. a parameter space for the curved surface. An example chart is the spherical coordinate system often used to represent spheres. While such charts are often flat spaces, the curvature of the manifold arises from the smooth changes in the metric. On a Riemannian manifold M, the length of a smooth curve c : [0, 1] → M is defined as the integral of the norm of the tangent vector (interpreted as speed) along the curve: 1 Length(c) = 1 c (λ) M(c(λ)) dλ c (λ)T M(c(λ))c (λ)dλ , = (4) 0 0 where c denotes the derivative of c and M(c(λ)) is the metric tensor at c(λ). A geodesic curve is then a length-minimizing curve connecting two given points x and y, i.e. (5) cgeo = arg min Length(c) with c(0) = x and c(1) = y . c The distance between x and y is defined as the length of the geodesic. Given a tangent vector v ∈ Tx M, there exists a unique geodesic cv (t) with initial velocity v at x. The Riemannian exponential map, Expx , maps v to a point on the manifold along the geodesic cv at t = 1. This mapping preserves distances such that dist(cv (0), cv (1)) = v . The inverse of the exponential map is the Riemannian logarithmic map denoted Logx . Informally, the exponential and logarithmic maps move points back and forth between the manifold and the tangent space while preserving distances (see Fig. 2d for an illustration). This provides a general strategy for generalizing many Euclidean techniques to Riemannian domains: data points are mapped to the tangent space, where ordinary Euclidean techniques are applied and the results are mapped back to the manifold. 3 A Metric Feature Space With the preliminaries settled we define the new model. Let C = RD denote the feature space. We endow C with a metric tensor in every point x, which we define akin to (3), R M(x) = wr (x)Mr , where wr (x) = r=1 wr (x) ˜ R ˜ j=1 wj (x) , (6) with wr > 0. The only difference from (3) is that we shall not restrict ourselves to binary weight ˜ functions wr . We assume the metric tensors Mr have already been learned; Sec. 4 contain examples ˜ where they have been learned using LMNN [5] and LDA [9]. From the definition of a Riemannian metric, we trivially have the following result: Lemma 1 The space C = RD endowed with the metric tensor from (6) is a chart of a Riemannian manifold, iff the weights wr (x) change smoothly with x. Hence, by only considering smooth weight functions wr we get a well-studied geometric structure ˜ on the feature space, which ensures us that it is metric. To illustrate the implications we return to the example in Fig. 2. We change the weight functions from binary to squared exponentials, which gives the feature space shown in Fig. 2c. As can be seen, the metric tensor now changes smoothly, which also makes the geodesics smooth curves (a property we will use when computing the geodesics). It is worth noting that Ramanan and Baker [3] also consider the idea of smoothly averaging the metric tensor. They, however, only evaluate the metric tensor at the test point of their classifier and then assume straight line geodesics with a constant metric tensor. Such assumptions violate the premise of a smoothly changing metric tensor and, again, the distance measure becomes non-metric. Lemma 1 shows that metric learning can be viewed as manifold learning. The main difference between our approach and techniques such as Isomap [12] is that, while Isomap learns an embedding of the data points, we learn the actual manifold structure. This gives us the benefit that we can compute geodesics as well as the exponential and logarithmic maps. These provide us with mappings back and forth between the manifold and Euclidean representation of the data, which preserve distances as well as possible. The availability of such mappings is in stark contrast to e.g. Isomap. In the next section we will derive a system of ordinary differential equations (ODE’s) that geodesics in C have to satisfy, which provides us with algorithms for computing geodesics as well as exponential and logarithmic maps. With these we can generalize many Euclidean techniques. 4 3.1 Computing Geodesics, Maps and Statistics At minima of (4) we know that the Euler-Lagrange equation must hold [11], i.e. ∂L d ∂L , where L(λ, c, c ) = c (λ)T M(c(λ))c (λ) . = ∂c dλ ∂c As we have an explicit expression for the metric tensor we can compute (7) in closed form: (7) Theorem 2 Geodesic curves in C satisfy the following system of 2nd order ODE’s M(c(λ))c (λ) = − 1 ∂vec [M(c(λ))] 2 ∂c(λ) T (c (λ) ⊗ c (λ)) , (8) where ⊗ denotes the Kronecker product and vec [·] stacks the columns of a matrix into a vector [13]. Proof See supplementary material. This result holds for any smooth weight functions wr . We, however, still need to compute ∂vec[M] , ˜ ∂c which depends on the specific choice of wr . Any smooth weighting scheme is applicable, but we ˜ restrict ourselves to the obvious smooth generalization of (3) and use squared exponentials. From this assumption, we get the following result Theorem 3 For wr (x) = exp − ρ x − xr ˜ 2 ∂vec [M(c)] = ∂c the derivative of the metric tensor from (6) is R ρ R j=1 2 Mr R 2 wj ˜ T r=1 T wj (c − xj ) Mj − (c − xr ) Mr ˜ wr vec [Mr ] ˜ . (9) j=1 Proof See supplementary material. Computing Geodesics. Any geodesic curve must be a solution to (8). Hence, to compute a geodesic between x and y, we can solve (8) subject to the constraints c(0) = x and c(1) = y . (10) This is a boundary value problem, which has a smooth solution. This allows us to solve the problem numerically using a standard three-stage Lobatto IIIa formula, which provides a fourth-order accurate C 1 –continuous solution [14]. Ramanan and Baker [3] discuss the possibility of computing geodesics, but arrive at the conclusion that this is intractable based on the assumption that it requires discretizing the entire feature space. Our solution avoids discretizing the feature space by discretizing the geodesic curve instead. As this is always one-dimensional the approach remains tractable in high-dimensional feature spaces. Computing Logarithmic Maps. Once a geodesic c is found, it follows from the definition of the logarithmic map, Logx (y), that it can be computed as v = Logx (y) = c (0) Length(c) . c (0) (11) In practice, we solve (8) by rewriting it as a system of first order ODE’s, such that we compute both c and c simultaneously (see supplementary material for details). Computing Exponential Maps. Given a starting point x on the manifold and a vector v in the tangent space, the exponential map, Expx (v), finds the unique geodesic starting at x with initial velocity v. As the geodesic must fulfill (8), we can compute the exponential map by solving this system of ODE’s with the initial conditions c(0) = x and c (0) = v . (12) This initial value problem has a unique solution, which we find numerically using a standard RungeKutta scheme [15]. 5 3.1.1 Generalizing PCA and Regression At this stage, we know that the feature space is Riemannian and we know how to compute geodesics and exponential and logarithmic maps. We now seek to generalize PCA and linear regression, which becomes straightforward since solutions are available in Riemannian spaces [16, 17]. These generalizations can be summarized as mapping the data to the tangent space at the mean, performing standard Euclidean analysis in the tangent and mapping the results back. The first step is to compute the mean value on the manifold, which is defined as the point that minimizes the sum-of-squares distances to the data points. Pennec [18] provides an efficient gradient descent approach for computing this point, which we also summarize in the supplementary material. The empirical covariance of a set of points is defined as the ordinary Euclidean covariance in the tangent space at the mean value [18]. With this in mind, it is not surprising that the principal components of a dataset have been generalized as the geodesics starting at the mean with initial velocity corresponding to the eigenvectors of the covariance [16], γvd (t) = Expµ (tvd ) , (13) th where vd denotes the d eigenvector of the covariance. This approach is called Principal Geodesic Analysis (PGA), and the geodesic curve γvd is called the principal geodesic. An illustration of the approach can be seen in Fig. 1 and more algorithmic details are in the supplementary material. Linear regression has been generalized in a similar way [17] by performing regression in the tangent of the mean and mapping the resulting line back to the manifold using the exponential map. The idea of working in the tangent space is both efficient and convenient, but comes with an element of approximation as the logarithmic map is only guarantied to preserve distances to the origin of the tangent and not between all pairs of data points. Practical experience, however, indicates that this is a good tradeoff; see [19] for a more in-depth discussion of when the approximation is suitable. 4 Experiments To illustrate the framework1 we consider an example in human body analysis, and then we analyze the scalability of the approach. But first, to build intuition, Fig. 3a show synthetically generated data samples from two classes. We sample random points xr and learn a local LDA metric [9] by considering all data points within a radius; this locally pushes the two classes apart. We combine the local metrics using (6) and Fig. 3b show the data in the tangent space of the resulting manifold. As can be seen the two classes are now globally further apart, which shows the effect of local metrics. 4.1 Human Body Shape We consider a regression example concerning human body shape analysis. We study 986 female body laser scans from the CAESAR [20] data set; each shape is represented using the leading 35 principal components of the data learned using a SCAPE-like model [21, 22]. Each shape is associated with anthropometric measurements such as body height, shoe size, etc. We show results for shoulder to wrist distance and shoulder breadth, but results for more measurements are in the supplementary material. To predict the measurements from shape coefficients, we learn local metrics and perform linear regression according to these. As a further experiment, we use PGA to reduce the dimensionality of the shape coefficients according to the local metrics, and measure the quality of the reduction by performing linear regression to predict the measurements. As a baseline we use the corresponding Euclidean techniques. To learn the local metric we do the following. First we whiten the data such that the variance captured by PGA will only be due to the change of metric; this allows easy visualization of the impact of the learned metrics. We then cluster the body shapes into equal-sized clusters according to the measurement and learn a LMNN metric for each cluster [5], which we associate with the mean of each class. These push the clusters apart, which introduces variance along the directions where the measurement changes. From this we construct a Riemannian manifold according to (6), 1 Our software implementation for computing geodesics and performing manifold statistics is available at http://ps.is.tue.mpg.de/project/Smooth Metric Learning 6 30 Euclidean Model Riemannian Model 24 20 18 16 20 15 10 5 14 12 0 (a) 25 22 Running Time (sec.) Average Prediction Error 26 10 (b) 20 Dimensionality 0 0 30 50 (c) 100 Dimensionality 150 (d) 4 3 3 2 2 1 1 0 −1 −2 −3 −4 −4 −3 −2 −1 0 1 2 3 4 Shoulder breadth 20 −2 −3 Euclidean Model Riemannian Model 0 −1 25 Prediction Error 4 15 10 0 −4 −5 0 4 10 15 20 Dimensionality 16 25 30 35 17 3 3 5 5 Euclidean Model Riemannian Model 2 15 2 1 1 Prediction Error Shoulder to wrist distance Figure 3: Left panels: Synthetic data. (a) Samples from two classes along with illustratively sampled metric tensors from (6). (b) The data represented in the tangent of a manifold constructed from local LDA metrics learned at random positions. Right panels: Real data. (c) Average error of linearly predicted body measurements (mm). (d) Running time (sec) of the geodesic computation as a function of dimensionality. 0 0 −1 −2 −1 −3 14 13 12 11 −2 −4 −3 −4 −4 10 −5 −3 −2 −1 0 1 Euclidean PCA 2 3 −6 −4 9 0 −2 0 2 4 Tangent Space PCA (PGA) 6 5 10 15 20 Dimensionality 25 30 35 Regression Error Figure 4: Left: body shape data in the first two principal components according to the Euclidean metric. Point color indicates cluster membership. Center: As on the left, but according to the Riemannian model. Right: regression error as a function of the dimensionality of the shape space; again the Euclidean metric and the Riemannian metric are compared. compute the mean value on the manifold, map the data to the tangent space at the mean and perform linear regression in the tangent space. As a first visualization we plot the data expressed in the leading two dimensions of PGA in Fig. 4; as can be seen the learned metrics provide principal geodesics, which are more strongly related with the measurements than the Euclidean model. In order to predict the measurements from the body shape, we perform linear regression, both directly in the shape space according to the Euclidean metric and in the tangent space of the manifold corresponding to the learned metrics (using the logarithmic map from (11)). We measure the prediction error using leave-one-out cross-validation. To further illustrate the power of the PGA model, we repeat this experiment for different dimensionalities of the data. The results are plotted in Fig. 4, showing that regression according to the learned metrics outperforms the Euclidean model. To verify that the learned metrics improve accuracy, we average the prediction errors over all millimeter measurements. The result in Fig. 3c shows that much can be gained in lower dimensions by using the local metrics. To provide visual insights into the behavior of the learned metrics, we uniformly sample body shape along the first principal geodesic (in the range ±7 times the standard deviation) according to the different metrics. The results are available as a movie in the supplementary material, but are also shown in Fig. 5. As can be seen, the learned metrics pick up intuitive relationships between body shape and the measurements, e.g. shoulder to wrist distance is related to overall body size, while shoulder breadth is related to body weight. 7 Shoulder to wrist distance Shoulder breadth Figure 5: Shapes corresponding to the mean (center) and ±7 times the standard deviations along the principal geodesics (left and right). Movies are available in the supplementary material. 4.2 Scalability The human body data set is small enough (986 samples in 35 dimensions) that computing a geodesic only takes a few seconds. To show that the current unoptimized Matlab implementation can handle somewhat larger datasets, we briefly consider a dimensionality reduction task on the classic MNIST handwritten digit data set. We use the preprocessed data available with [3] where the original 28×28 gray scale images were deskewed and projected onto their leading 164 Euclidean principal components (which captures 95% of the variance in the original data). We learn one diagonal LMNN metric per class, which we associate with the mean of the class. From this we construct a Riemannian manifold from (6), compute the mean value on the manifold and compute geodesics between the mean and each data point; this is the computationally expensive part of performing PGA. Fig. 3d plots the average running time (sec) for the computation of geodesics as a function of the dimensionality of the training data. A geodesic can be computed in 100 dimensions in approximately 5 sec., whereas in 150 dimensions it takes about 30 sec. In this experiment, we train a PGA model on 60,000 data points, and test a nearest neighbor classifier in the tangent space as we decrease the dimensionality of the model. Compared to a Euclidean model, this gives a modest improvement in classification accuracy of 2.3 percent, when averaged across different dimensionalities. Plots of the results can be found in the supplementary material. 5 Discussion This work shows that multi-metric learning techniques are indeed applicable outside the realm of kNN classifiers. The idea of defining the metric tensor at any given point as the weighted average of a finite set of learned metrics is quite natural from a modeling point of view, which is also validated by the Riemannian structure of the resulting space. This opens both a theoretical and a practical toolbox for analyzing and developing algorithms that use local metric tensors. Specifically, we show how to use local metric tensors for both regression and dimensionality reduction tasks. Others have attempted to solve non-classification problems using local metrics, but we feel that our approach is the first to have a solid theoretical backing. For example, Hastie and Tibshirani [9] use local LDA metrics for dimensionality reduction by averaging the local metrics and using the resulting metric as part of a Euclidean PCA, which essentially is a linear approach. Another approach was suggested by Hong et al. [23] who simply compute the principal components according to each metric separately, such that one low dimensional model is learned per metric. The suggested approach is, however, not difficulty-free in its current implementation. Currently, we are using off-the-shelf numerical solvers for computing geodesics, which can be computationally demanding. While we managed to analyze medium-sized datasets, we believe that the run-time can be drastically improved by developing specialized numerical solvers. In the experiments, we learned local metrics using techniques specialized for classification tasks as this is all the current literature provides. We expect improvements by learning the metrics specifically for regression and dimensionality reduction, but doing so is currently an open problem. Acknowledgments: Søren Hauberg is supported in part by the Villum Foundation, and Oren Freifeld is supported in part by NIH-NINDS EUREKA (R01-NS066311). 8 References [1] Andrea Frome, Yoram Singer, and Jitendra Malik. Image retrieval and classification using local distance functions. In B. Sch¨ lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing o Systems 19 (NIPS), pages 417–424, Cambridge, MA, 2007. MIT Press. [2] Andrea Frome, Fei Sha, Yoram Singer, and Jitendra Malik. Learning globally-consistent local distance functions for shape-based image retrieval and classification. In International Conference on Computer Vision (ICCV), pages 1–8, 2007. [3] Deva Ramanan and Simon Baker. Local distance functions: A taxonomy, new algorithms, and an evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4):794–806, 2011. [4] Shai Shalev-Shwartz, Yoram Singer, and Andrew Y. Ng. Online and batch learning of pseudo-metrics. In Proceedings of the twenty-first international conference on Machine learning, ICML ’04, pages 94–101. ACM, 2004. [5] Kilian Q. Weinberger and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10:207–244, 2009. [6] Tomasz Malisiewicz and Alexei A. Efros. Recognition by association via learning per-exemplar distances. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2008. [7] Yiming Ying and Peng Li. Distance metric learning with eigenvalue optimization. The Journal of Machine Learning Research, 13:1–26, 2012. [8] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems 16 (NIPS), 2004. [9] Trevor Hastie and Robert Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):607–616, June 1996. [10] Elzbieta Pekalska, Pavel Paclik, and Robert P. W. Duin. A generalized kernel approach to dissimilaritybased classification. Journal of Machine Learning Research, 2:175–211, 2002. [11] Manfredo Perdigao do Carmo. Riemannian Geometry. Birkh¨ user Boston, January 1992. a [12] Joshua B. Tenenbaum, Vin De Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000. [13] Jan R. Magnus and Heinz Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons, 2007. [14] Jacek Kierzenka and Lawrence F. Shampine. A BVP solver based on residual control and the Matlab PSE. ACM Transactions on Mathematical Software, 27(3):299–316, 2001. [15] John R. Dormand and P. J. Prince. A family of embedded Runge-Kutta formulae. Journal of Computational and Applied Mathematics, 6:19–26, 1980. [16] P. Thomas Fletcher, Conglin Lu, Stephen M. Pizer, and Sarang Joshi. Principal Geodesic Analysis for the study of Nonlinear Statistics of Shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, 2004. [17] Peter E. Jupp and John T. Kent. Fitting smooth paths to spherical data. Applied Statistics, 36(1):34–46, 1987. [18] Xavier Pennec. Probabilities and statistics on Riemannian manifolds: Basic tools for geometric measurements. In Proceedings of Nonlinear Signal and Image Processing, pages 194–198, 1999. [19] Stefan Sommer, Francois Lauze, Søren Hauberg, and Mads Nielsen. Manifold valued statistics, exact ¸ principal geodesic analysis and the effect of linear approximations. In European Conference on Computer Vision (ECCV), pages 43–56, 2010. [20] Kathleen M. Robinette, Hein Daanen, and Eric Paquet. The CAESAR project: a 3-D surface anthropometry survey. In 3-D Digital Imaging and Modeling, pages 380–386, 1999. [21] Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. Scape: shape completion and animation of people. ACM Transactions on Graphics, 24(3):408–416, 2005. [22] Oren Freifeld and Michael J. Black. Lie bodies: A manifold representation of 3D human shape. In A. Fitzgibbon et al. (Eds.), editor, European Conference on Computer Vision (ECCV), Part I, LNCS 7572, pages 1–14. Springer-Verlag, oct 2012. [23] Yi Hong, Quannan Li, Jiayan Jiang, and Zhuowen Tu. Learning a mixture of sparse distance metrics for classification and dimensionality reduction. In International Conference on Computer Vision (ICCV), pages 906–913, 2011. 9
2 0.71085906 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
Author: Michael C. Hughes, Erik B. Sudderth, Emily B. Fox
Abstract: Applications of Bayesian nonparametric methods require learning and inference algorithms which efficiently explore models of unbounded complexity. We develop new Markov chain Monte Carlo methods for the beta process hidden Markov model (BP-HMM), enabling discovery of shared activity patterns in large video and motion capture databases. By introducing split-merge moves based on sequential allocation, we allow large global changes in the shared feature structure. We also develop data-driven reversible jump moves which more reliably discover rare or unique behaviors. Our proposals apply to any choice of conjugate likelihood for observed data, and we show success with multinomial, Gaussian, and autoregressive emission models. Together, these innovations allow tractable analysis of hundreds of time series, where previous inference required clever initialization and lengthy burn-in periods for just six sequences. 1
same-paper 3 0.69774705 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization
Author: Doina Precup, Joelle Pineau, Andre S. Barreto
Abstract: Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. What sets KBSF apart from other kernel-based approaches is the fact that the size of its MDP is independent of the number of transitions, which makes it possible to control the trade-off between the quality of the resulting approximation and the associated computational cost. However, KBSF’s memory usage grows linearly with the number of transitions, precluding its application in scenarios where a large amount of data must be processed. In this paper we show that it is possible to construct KBSF’s MDP in a fully incremental way, thus freeing the space complexity of this algorithm from its dependence on the number of sample transitions. The incremental version of KBSF is able to process an arbitrary amount of data, which results in a model-based reinforcement learning algorithm that can be used to solve continuous MDPs in both off-line and on-line regimes. We present theoretical results showing that KBSF can approximate the value function that would be computed by conventional kernel-based learning with arbitrary precision. We empirically demonstrate the effectiveness of the proposed algorithm in the challenging threepole balancing task, in which the ability to process a large number of transitions is crucial for success. 1
4 0.69701684 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
Author: Ren Xiaofeng, Liefeng Bo
Abstract: Finding contours in natural images is a fundamental problem that serves as the basis of many tasks such as image segmentation and object recognition. At the core of contour detection technologies are a set of hand-designed gradient features, used by most approaches including the state-of-the-art Global Pb (gPb) operator. In this work, we show that contour detection accuracy can be significantly improved by computing Sparse Code Gradients (SCG), which measure contrast using patch representations automatically learned through sparse coding. We use K-SVD for dictionary learning and Orthogonal Matching Pursuit for computing sparse codes on oriented local neighborhoods, and apply multi-scale pooling and power transforms before classifying them with linear SVMs. By extracting rich representations from pixels and avoiding collapsing them prematurely, Sparse Code Gradients effectively learn how to measure local contrasts and find contours. We improve the F-measure metric on the BSDS500 benchmark to 0.74 (up from 0.71 of gPb contours). Moreover, our learning approach can easily adapt to novel sensor data such as Kinect-style RGB-D cameras: Sparse Code Gradients on depth maps and surface normals lead to promising contour detection using depth and depth+color, as verified on the NYU Depth Dataset. 1
5 0.63397461 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection
Author: Shiva P. Kasiviswanathan, Huahua Wang, Arindam Banerjee, Prem Melville
Abstract: Given their pervasive use, social media, such as Twitter, have become a leading source of breaking news. A key task in the automated identification of such news is the detection of novel documents from a voluminous stream of text documents in a scalable manner. Motivated by this challenge, we introduce the problem of online 1 -dictionary learning where unlike traditional dictionary learning, which uses squared loss, the 1 -penalty is used for measuring the reconstruction error. We present an efficient online algorithm for this problem based on alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data, shows that this online 1 -dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results. 1
6 0.58971429 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
7 0.57081115 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
8 0.54602653 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
9 0.52519792 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
10 0.52214605 38 nips-2012-Algorithms for Learning Markov Field Policies
11 0.52031881 162 nips-2012-Inverse Reinforcement Learning through Structured Classification
12 0.51828873 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model
13 0.51823902 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning
14 0.51821554 168 nips-2012-Kernel Latent SVM for Visual Recognition
15 0.51755506 197 nips-2012-Learning with Recursive Perceptual Representations
16 0.51715815 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
17 0.51688492 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
18 0.51680785 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
19 0.51550555 292 nips-2012-Regularized Off-Policy TD-Learning
20 0.51545829 111 nips-2012-Efficient Sampling for Bipartite Matching Problems