nips nips2011 nips2011-291 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and introduce novel algorithms that adapt the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. [sent-5, score-0.493]
2 A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. [sent-6, score-1.328]
3 In this paper, we investigate the theoretical properties of this transfer method and introduce novel algorithms that adapt the transfer process on the basis of the similarity between source and target tasks. [sent-7, score-1.471]
4 1 Introduction The objective of transfer in reinforcement learning (RL) [10] is to speed up RL algorithms by reusing knowledge (e.g., …). [sent-9, score-0.511]
5 The underlying assumption of transfer methods is that the source tasks (or a suitable combination of these) are somehow similar to the target task, so that the transferred knowledge can be useful in learning its solution. [sent-12, score-1.225]
6 A wide range of scenarios and methods for transfer in RL have been studied in the last decade (see [12, 6] for a thorough survey). [sent-13, score-0.488]
7 In this paper, we focus on the simple transfer approach where trajectory samples are transferred from source MDPs to increase the size of the training set used to solve the target MDP. [sent-14, score-1.255]
8 (…, e.g., simulators in the case of robotic applications), the solution of the target task can benefit from a larger training set that also includes some source samples. [sent-20, score-0.581]
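As an illustration of this pooling step, here is a minimal Python sketch; the function name and the (x, a, r, y) transition layout are illustrative assumptions, not the paper's notation.

```python
def build_transfer_dataset(target_samples, source_samples_by_task):
    """Enlarge the target training set with every available source transition.

    Each sample is assumed to be a tuple (x, a, r, y): state, action, reward,
    next state. This is the simplest form of sample transfer, in which all
    source samples are pooled with the target ones.
    """
    dataset = list(target_samples)
    for task_samples in source_samples_by_task.values():
        dataset.extend(task_samples)
    return dataset
```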
9 This approach has already been investigated in the case of transfer between tasks with different state-action spaces in [11], where the source samples are used to build a model of the target task whenever the number of target samples is not large enough. [sent-21, score-1.727]
10 The authors introduce an algorithm which estimates the similarity between source and target tasks and selectively transfers from the source tasks which are more likely to provide samples similar to those generated by the target MDP. [sent-23, score-1.494]
11 On the other hand, in supervised learning a number of theoretical works have investigated the effectiveness of transfer in reducing the sample complexity of the learning process. [sent-25, score-0.488]
12 In domain adaptation, a solution learned on a source task is transferred to a target task and its performance depends on how similar the two tasks are. [sent-26, score-0.808]
13 The case of transfer of samples from multiple source tasks is studied in [2]. [sent-28, score-1.106]
14 The most interesting finding is that the transfer performance benefits from using a larger training set at the cost of an additional error due to the average distance between source and target tasks. [sent-29, score-1.067]
15 This implies the existence of a transfer tradeoff between transferring as many samples as possible and limiting the transfer to sources which are similar to the target task. [sent-30, score-1.439]
16 As a result, the transfer of samples is expected to outperform single-task learning whenever negative transfer (i.e., transfer from source tasks far from the target task) is limited w.r.t. the benefit of the larger training set. [sent-31, score-1.144] [sent-33, score-1.096]
18 This also raises the question of whether it is possible to design methods able to automatically detect the similarity between tasks and adapt the transfer process accordingly. [sent-37, score-0.641]
19 In this paper, we investigate the transfer of samples in RL from a more theoretical perspective w. [sent-38, score-0.672]
20 … (Sec. 4 and 5), whose objective is to solve the transfer tradeoff by identifying the best combination of source tasks. [sent-47, score-0.937]
21 We formalize the setting of transfer of samples and we derive a finite-sample analysis of AST which highlights the importance of the average MDP obtained by the combination of the source tasks. [sent-49, score-1.071]
22 We also report the analysis for BAT which shows both the advantage of identifying the best combination of source tasks and the additional cost in terms of auxiliary samples needed to compute the similarity between tasks. [sent-50, score-0.82]
23 … (Sec. 6) on a simple chain problem, which confirm the main theoretical findings and support the idea that sample transfer can significantly speed up the learning process and that adaptive methods are able to solve the transfer tradeoff and avoid negative transfer effects. [sent-53, score-1.536]
24 2 Preliminaries In this section we introduce the notation and the transfer problem considered in the rest of the paper. [sent-55, score-0.47]
25 We consider the transfer problem in which $M$ tasks $\{\mathcal{M}_m\}_{m=1}^{M}$ are available and the objective is to learn the solution for the target task $\mathcal{M}_1$ by transferring samples from the source tasks $\{\mathcal{M}_m\}_{m=2}^{M}$. [sent-72, score-1.527]
26 This is an assumption on how the samples are generated, but in practice a single realization of samples and task indexes $M_l$ is available. [sent-89, score-0.466]
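A sketch of the random tasks design just described, assuming a hypothetical per-task generator `sample_transition(m)`; only the multinomial draw of the task indexes follows the definition above.

```python
import numpy as np

def random_tasks_design(sample_transition, lam, L, seed=0):
    """Draw L (task index, transition) pairs: each index M_l follows a
    multinomial with proportions lam (lam[0] being the target task), and
    sample_transition(m) returns one (x, a, r, y) tuple from task m."""
    rng = np.random.default_rng(seed)
    task_idx = rng.choice(len(lam), size=L, p=lam)
    return [(m, sample_transition(m)) for m in task_idx]
```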
27 This condition implies that (on average) the number of target samples is much less than the source [sent-94, score-0.679]
28 samples, and it is usually not enough to learn an accurate solution for the target task. [sent-101, score-0.358]
[Algorithm box spilled into the text above — FQI. Input: linear space $\mathcal{F} = \mathrm{span}\{\varphi_i, 1 \le i \le d\}$, initial function $Q_0 \in \mathcal{F}$; loop for $k = 1, 2, \dots$]
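For concreteness, a minimal sketch of one FQI step on the linear space $\mathcal{F} = \mathrm{span}\{\varphi_i\}$; the plain least-squares fit stands in for the paper's exact estimator (truncation to Vmax and other details are omitted, and the helper names are assumptions).

```python
import numpy as np

def fqi_iteration(alpha, dataset, features, actions, gamma):
    """One fitted Q-iteration step: regress the empirical Bellman targets
    r + gamma * max_a' Q_{k-1}(y, a') onto the features phi(x, a), where
    Q_{k-1}(x, a) = phi(x, a) . alpha.

    dataset: iterable of (x, a, r, y) transitions; features(x, a) returns a
    d-dimensional numpy vector. Returns the new weight vector alpha_k.
    """
    Phi = np.array([features(x, a) for (x, a, r, y) in dataset])
    z = np.array([r + gamma * max(features(y, b) @ alpha for b in actions)
                  for (x, a, r, y) in dataset])
    alpha_k, *_ = np.linalg.lstsq(Phi, z, rcond=None)
    return alpha_k
```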
29 We will also consider the pure transfer case in which λ1 = 0 (i.e., no target samples are used in the training set). [sent-102, score-0.47]
30 3 All-Sample Transfer Algorithm We first consider the case when the source samples are generated according to Def. [sent-107, score-0.52]
31 1 and the designer has no access to the source tasks. [sent-108, score-0.381]
32 We define its reward function $R^{\lambda}$ and its transition kernel $P^{\lambda}$ as the weighted averages of the reward functions and transition kernels of the basic MDPs, with weights determined by the proportions λ of the multinomial distribution in the definition of the random tasks design (i.e., the convex combinations written out below). [sent-118, score-0.45]
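Written out (a reconstruction from the surrounding definition, with the sum ranging over all M tasks of the multinomial):

```latex
R^{\lambda}(x,a) = \sum_{m=1}^{M} \lambda_m \, R_m(x,a),
\qquad
P^{\lambda}(\cdot \mid x,a) = \sum_{m=1}^{M} \lambda_m \, P_m(\cdot \mid x,a).
```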
33 The bound is composed of three main terms: (i) approximation error, (ii) estimation error, and (iii) transfer error. [sent-140, score-0.508]
34 The approximation error $\|f_{\alpha_k^{*}} - \mathcal{T}_1 Q_{k-1}\|_{\mu}$ is the smallest error achievable by functions in $\mathcal{F}$ in approximating the target function $\mathcal{T}_1 Q_{k-1}$, and it does not depend on the transfer algorithm. [sent-141, score-0.72]
35 The estimation error (third and fourth terms in the bound) is due to the finite random samples used to learn $Q_k$; it depends on the dimensionality $d$ of the function space and decreases with the total number of samples $L$ at the fast rate of linear spaces ($O(d/L)$ instead of $O(\sqrt{d/L})$). [sent-142, score-0.425]
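Schematically, and with constants and confidence terms omitted (the exact statement of the bound is not recoverable from this extraction), the per-iteration error decomposes as:

```latex
\|\hat{Q}_k - \mathcal{T}_1 Q_{k-1}\|_{\mu} \;\lesssim\;
\underbrace{\inf_{f \in \mathcal{F}} \| f - \mathcal{T}_1 Q_{k-1} \|_{\mu}}_{\text{approximation error}}
\;+\;
\underbrace{O\!\left(\frac{d}{L}\right)}_{\text{estimation error}}
\;+\;
\underbrace{\mathcal{E}_{\lambda}}_{\text{transfer error}} .
```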
36 Finally, the transfer error Eλ accounts for the difference between source and target tasks. [sent-143, score-1.015]
37 In fact, samples from source tasks different from the target might bias Qk towards a wrong solution, thus resulting in a poor approximation of the target function T1 Qk−1 . [sent-144, score-0.984]
38 It is interesting to notice that the transfer error depends on the difference between the target task and the average MDP Mλ obtained by taking a linear combination of the source tasks weighted by the parameters λ. [sent-145, score-1.284]
39 This means that even when each of the source tasks is very different from the target, if there exists a suitable combination which is similar to the target task, then the transfer process is still likely to be effective. [sent-146, score-1.154]
40 The main difference is that the single-task estimate $Q^{s}_k$ uses only the $N_1$ target samples and, as a result, has a much bigger estimation error than $Q_k$, which takes advantage of all the $L$ samples transferred from the source tasks. [sent-156, score-0.819]
41 At the same time, Qk suffers from an additional transfer error. [sent-157, score-0.483]
42 Thus, we can conclude that AST is expected to perform better than single-task learning whenever the advantage of using more samples is greater than the bias due to samples coming from tasks different from the target task. [sent-158, score-0.708]
43 This introduces a transfer tradeoff between including many source samples, so as to reduce the estimation error, and finding source tasks whose combination leads to a small transfer error. [sent-159, score-1.86]
44 In Sec. 4 we define an adaptive transfer algorithm which selects the proportions λ so as to keep the transfer error $\mathcal{E}_{\lambda}$ as small as possible. [sent-161, score-1.054]
45 In Sec. 5 we consider a different setting where the number of samples in each source is limited. [sent-163, score-0.505]
46 The transfer error $\sup_{\alpha} \|(\mathcal{T}_1 - \mathcal{T}^{\lambda})(f_{\alpha})\|_{\mu}$ characterizes the difference between the target and average Bellman operators through the space $\mathcal{F}$. [sent-190, score-0.721]
47 As a result, even MDPs with significantly different rewards and transitions might have a small transfer error because of the functions in F . [sent-191, score-0.508]
48 4 Best Average Transfer Algorithm As discussed in the previous section, the transfer error Eλ plays a crucial role in the comparison with single-task learning. [sent-210, score-0.508]
49 We now consider the case where the designer has direct access to the source tasks (i.e., samples can be generated from any source task). [sent-212, score-0.512]
50 In particular, we propose a method that adapts λ at each iteration so as to minimize the transfer error Eλ . [sent-215, score-0.547]
51 … (i.e., no target samples are used in the learning training set). [sent-218, score-0.388]
52 Thus, for any function Q we define the estimated transfer error as $\hat{\mathcal{E}}_{\lambda}(Q) = \frac{1}{S}\sum_{s=1}^{S}\big[\sum_{m=2}^{M}\lambda_m R_{s,m} - R_{s,1} + \frac{\gamma}{T}\sum_{t=1}^{T}\big(\sum_{m=2}^{M}\lambda_m \max_{a'} Q(Y^{t}_{s,m}, a') - \max_{a'} Q(Y^{t}_{s,1}, a')\big)\big]^{2}$. [sent-229, score-0.508]
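A Python sketch of this estimator and of the BAT weight selection, under assumed data layouts: aux[s] = (R_s, Y_s), where R_s[m] is the reward observed in task m at the s-th auxiliary state-action pair and Y_s[m] is its list of T sampled next states, and Q(y) returns max_a' Q(y, a'). The Dirichlet random search over the simplex is a stand-in for whatever simplex optimizer one prefers.

```python
import numpy as np

def estimated_transfer_error(lam, aux, Q, gamma, T):
    """Empirical transfer error of the equation above. lam[0] is the target
    weight (0 in the pure transfer case); lam[1:] are the source proportions."""
    err = 0.0
    for R, Y in aux:
        M = len(R)
        dr = np.dot(lam[1:], R[1:]) - R[0]
        dq = sum(np.dot(lam[1:], [Q(Y[m][t]) for m in range(1, M)])
                 - Q(Y[0][t]) for t in range(T)) * gamma / T
        err += (dr + dq) ** 2
    return err / len(aux)

def bat_weights(aux, Q, gamma, T, M, n_draws=2000, seed=0):
    """Minimize the estimated transfer error over the simplex of source
    proportions by simple Dirichlet random search (a sketch)."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(n_draws):
        lam = np.zeros(M)
        lam[1:] = rng.dirichlet(np.ones(M - 1))  # pure transfer: lam[0] = 0
        e = estimated_transfer_error(lam, aux, Q, gamma, T)
        if e < best_err:
            best, best_err = lam, e
    return best
```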
53 The bound shows that BAT outperforms AST whenever the advantage of achieving the smallest possible transfer error $\mathcal{E}_{\lambda_k^{*}}$ is larger than the additional estimation error due to the auxiliary training set. [sent-239, score-0.727]
54 When compared to single-task learning, BAT has a better performance whenever the best combination of source tasks has a small transfer error and the additional auxiliary estimation error is smaller than the estimation error in single-task learning. [sent-240, score-1.23]
55 The number of calls to the generative … [sent-242, score-0.524]
[Tables spilled into the text here. Table 1: Parameters for the first set of tasks; Table 2: Parameters for the second set of tasks. Columns in both: task, p, l, η, Reward; the numeric entries (M1, …) are not recoverable from the extraction.]
56 We remark that the dependency of the auxiliary estimation error on $M$ is due to the fact that the λ vectors (over which the transfer error is optimized) belong to the simplex Λ of dimensionality $M-2$. [sent-264, score-0.673]
57 Hence, the previous condition suggests that, in general, adaptive transfer methods may significantly improve the transfer performance (i.e., in this case achieve a smaller transfer error) at the cost of additional sources of error which depend on the dimensionality of the search space used to adapt the transfer process (in this case Λ). [sent-265, score-0.96] [sent-267, score-0.969]
59 5 Best Transfer Trade-off Algorithm The previous algorithm is proved to successfully estimate the combination of source tasks which best approximates the Bellman operator of the target task. [sent-268, score-0.7]
60 Nonetheless, BAT relies on the implicit assumption that L samples can always be generated from any source task (see footnote 2 below) and it cannot be applied to the case where the number of source samples is limited. [sent-269, score-1.094]
61 Here we consider the more challenging case where the designer has still access to the source tasks but only a limited number of samples is available in each of them. [sent-270, score-0.717]
62 In this case, an adaptive transfer algorithm should solve a tradeoff between selecting as many samples as possible, so as to reduce the estimation error, and choosing the proportion of source samples properly, so as to control the transfer error. [sent-271, score-1.79]
63 The solution of this tradeoff may return non-trivial results, where source tasks similar to the target task but with few samples are removed in favor of a pool of tasks whose average only roughly approximates the target task but can provide a larger number of samples. [sent-272, score-1.318]
64 We denote by Nm the maximum number of samples available for source task m. [sent-275, score-0.582]
65 Let $\beta \in [0, 1]^{M}$ be a weight vector, where $\beta_m$ is the fraction of samples from task $m$ used in the transfer process. [sent-276, score-0.71]
66 We denote by $\mathcal{E}_{\beta}$ ($\hat{\mathcal{E}}_{\beta}$) the transfer error (the estimated transfer error) with proportions λ, where $\lambda_m = \beta_m N_m / \sum_{m'} \beta_{m'} N_{m'}$. [sent-277, score-1.034]
67 At each iteration k, BTT returns the vector β which optimizes the tradeoff between estimation and transfer errors, that is $\hat{\beta}^{k} = \arg\min_{\beta \in [0,1]^{M}} \big[\hat{\mathcal{E}}_{\beta}(Q_{k-1}) + \tau\, d / \sum_{m=1}^{M} \beta_m N_m\big]$ (3), where τ is a parameter. [sent-278, score-0.635]
68 While the first term accounts for the transfer error induced by β, the second term is the estimation error due to the total amount of samples used by the algorithm. [sent-279, score-0.761]
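A sketch of the BTT selection under the same assumed data layout as the BAT sketch above; the random search over $[0,1]^M$ is again a stand-in, and the penalty τ d / Σ_m β_m N_m mirrors the second term of Eq. 3 as reconstructed here. For simplicity the target component lam[0] is treated as in the BAT sketch.

```python
import numpy as np

def btt_weights(aux, Q, gamma, T, N, d, tau, n_draws=2000, seed=0):
    """Optimize the trade-off of Eq. 3: beta[m] is the fraction of the N[m]
    samples of task m to transfer; the induced proportions are
    lam[m] = beta[m] * N[m] / sum_m' beta[m'] * N[m']."""
    rng = np.random.default_rng(seed)
    N = np.asarray(N, dtype=float)
    best, best_score = None, np.inf
    for _ in range(n_draws):
        beta = rng.uniform(size=len(N))
        used = beta * N
        lam = used / used.sum()
        score = (estimated_transfer_error(lam, aux, Q, gamma, T)
                 + tau * d / used.sum())
        if score < best_score:
            best, best_score = beta, score
    return best
```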
69 … (Def. 1), since the number of source samples is constrained by $N_m$. [sent-283, score-0.505]
70 6 Experiments In this section, we report preliminary experimental results of the transfer algorithms. [sent-286, score-0.485]
71 With probability p each action makes a step of length l, affected by a noise η. [sent-290, score-0.326]
Footnote 2: If λm = 1 for task m, then the algorithm would generate all the L training samples from task m.
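A sketch of the continuous chain dynamics as described in the sentence above; what happens with probability 1 − p is not recoverable from this extraction, so the opposite-direction convention below is an assumption, as is the Gaussian form of the noise.

```python
import numpy as np

class ContinuousChain:
    """Continuous chain task: with probability p the chosen action moves the
    state by a step of length l, perturbed by a noise of magnitude eta;
    otherwise (assumption) the step is taken in the opposite direction.
    Rewards are task-specific and omitted here."""

    def __init__(self, p, l, eta, seed=0):
        self.p, self.l, self.eta = p, l, eta
        self.rng = np.random.default_rng(seed)

    def step(self, x, a):
        # a is a direction in {-1, +1}
        direction = a if self.rng.random() < self.p else -a
        return x + direction * self.l + self.rng.normal(0.0, self.eta)
```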
[Plot residue, Figure 3: Transfer from M2, M3, M4, M5. Legend: without transfer; BAT with 1000, 5000, and 10000 samples; AST with 10000 samples. Axes: number of target samples; number of iterations (1–13).]
74 Furthermore, to evaluate the performance of the transfer algorithms previously described, we considered eight source tasks $\{\mathcal{M}_2, \dots, \mathcal{M}_9\}$. [sent-315, score-0.922]
75 We first consider the pure transfer problem where no target samples are actually used in the learning training set (i.e., λ1 = 0). [sent-325, score-0.858]
76 The objective is to study the impact of the transfer error due to the use of source samples and the effectiveness of BAT in finding a suitable combination of source tasks. [sent-328, score-1.392]
77 Fig. 3 compares the performance of FQI with and without the transfer of samples from the first four tasks listed in Tab. 1. [sent-330, score-0.785]
78 In the case of single-task learning, the number of target samples refers to the samples used at learning time, while for BAT it represents the size S of the auxiliary training set used to estimate the transfer error. [sent-332, score-1.107]
79 The number of source samples added to the auxiliary set for each target sample was empirically fixed to one (T = 1). [sent-334, score-0.744]
80 This result is due to the existence of linear combinations of source tasks which closely approximate the target task $\mathcal{M}_1$ at each iteration of FQI. [sent-340, score-0.737]
81 Given the first four source tasks, BAT finds a combination (λ ≃ (0. [sent-344, score-0.379]
82 As a result, the coefficient λ2 drops to zero, while a new combination among the other source tasks is found. [sent-351, score-0.51]
83 Note that BAT significantly improves single-task learning, in particular when very few target samples are available. [sent-352, score-0.358]
84 In the general case, the target task cannot be obtained as any combination of the source tasks, as happens with the second set of source tasks (M6, M7, M8, M9). [sent-353, score-1.061]
85 Note that, when only a few target samples are available, the transfer of samples from a combination of the source tasks using the BAT algorithm is still beneficial. [sent-356, score-1.522]
86 On the other hand, the performance attainable by BAT is bounded by the transfer error corresponding to the best source task combination (which in this case is large). [sent-357, score-0.943]
[Plot residue (legends and axis ticks), Figure 4: Transfer from M6, M7, M8, M9. Legends: without transfer; BAT with 1000, 5000, and 10000 source samples; BAT with 1000 and 10000 source samples plus target samples; BTT with at most 5000 and 10000 samples for each source. Axis: number of target samples.]
90 …75) with 5000 and 10000 samples for each source task. [sent-376, score-0.505]
91 Results presented so far for the BAT transfer algorithm assume that FQI is trained only with the samples obtained through combinations of source tasks. [sent-378, score-0.975]
92 Since a number of target samples are already available in the auxiliary training set, a trivial improvement is to include them in the training set together with the source samples (selected according to the proportions computed by BAT). [sent-379, score-1.065]
93 This experiment highlights the tradeoff between the need for samples to reduce the estimation error and the resulting transfer error when the target task cannot be expressed as a combination of source tasks (see Sec. 5). [sent-385, score-1.576]
94 …4, BTT exploits the advantage of transferring source samples when only a few target samples are available, and it reduces the weight of the source tasks (so as to avoid large transfer errors) when samples from the target task are enough. [sent-388, score-2.253]
95 It is interesting to notice that increasing the number of samples available for each source task from 5000 to 10000 improves the performance in the first part of the graph, while leaving the final performance unchanged. [sent-389, score-0.597]
96 This is due to the capability of the BTT algorithm to skip the transfer of source samples when there is no need for them, thus avoiding negative transfer effects. [sent-390, score-1.445]
97 We first derived a finite-sample analysis of the performance of a simple transfer algorithm which includes all the source samples in the training set used to solve a given target task. [sent-392, score-1.197]
98 To the best of our knowledge, this is the first theoretical result for a transfer algorithm in RL showing the potential benefit of transfer over single-task learning. [sent-393, score-0.958]
99 When the designer has direct access to the source tasks, we introduced an adaptive algorithm which selects the proportion of source tasks so as to minimize the bias due to the use of source samples. [sent-394, score-1.208]
100 Finally, we considered a more challenging setting where the number of samples available in each source task is limited and a tradeoff between the amount of transferred samples and the similarity between source and target tasks must be solved. [sent-395, score-1.538]
wordName wordTfidf (topN-words)
[('bat', 0.505), ('transfer', 0.47), ('qk', 0.361), ('source', 0.321), ('samples', 0.184), ('ast', 0.181), ('target', 0.174), ('vmax', 0.154), ('fqi', 0.136), ('tasks', 0.131), ('btt', 0.126), ('rl', 0.103), ('xl', 0.085), ('yl', 0.071), ('tradeoff', 0.07), ('reward', 0.07), ('auxiliary', 0.065), ('al', 0.06), ('transferred', 0.058), ('combination', 0.058), ('task', 0.056), ('proportions', 0.056), ('bellman', 0.049), ('mdps', 0.046), ('designer', 0.042), ('restelli', 0.042), ('reinforcement', 0.041), ('iteration', 0.039), ('transferring', 0.039), ('error', 0.038), ('xs', 0.035), ('maxa', 0.035), ('proportion', 0.034), ('mdp', 0.033), ('nm', 0.033), ('transition', 0.031), ('training', 0.03), ('lazaric', 0.03), ('remark', 0.03), ('marcello', 0.028), ('alessandro', 0.028), ('mm', 0.027), ('dy', 0.025), ('transfers', 0.025), ('milano', 0.025), ('koby', 0.025), ('arg', 0.023), ('design', 0.022), ('average', 0.021), ('available', 0.021), ('whenever', 0.02), ('adaptive', 0.02), ('estimation', 0.019), ('plot', 0.019), ('sup', 0.019), ('bound', 0.019), ('jennifer', 0.019), ('multinomial', 0.018), ('operators', 0.018), ('thorough', 0.018), ('theoretical', 0.018), ('solve', 0.018), ('access', 0.018), ('similarity', 0.018), ('crammer', 0.017), ('rmax', 0.017), ('ml', 0.017), ('highlights', 0.017), ('sources', 0.016), ('operator', 0.016), ('existence', 0.016), ('advantage', 0.015), ('pm', 0.015), ('report', 0.015), ('generated', 0.015), ('matthew', 0.015), ('notice', 0.015), ('inria', 0.014), ('displays', 0.014), ('indexes', 0.014), ('span', 0.014), ('returns', 0.014), ('european', 0.014), ('measurable', 0.013), ('build', 0.013), ('simplex', 0.013), ('tted', 0.013), ('returned', 0.013), ('principled', 0.013), ('additional', 0.013), ('assumption', 0.013), ('rm', 0.013), ('accounts', 0.012), ('concentrability', 0.012), ('damien', 0.012), ('calais', 0.012), ('contrat', 0.012), ('etat', 0.012), ('projets', 0.012), ('domain', 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and introduce novel algorithms that adapt the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
2 0.26549459 268 nips-2011-Speedy Q-Learning
Author: Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos
Abstract: We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and discount factor γ only $T = O\big(\log(n)/(\epsilon^{2}(1-\gamma)^{4})\big)$ steps are required for the SQL algorithm to converge to an ε-optimal action-value function with high probability. This bound has a better dependency on 1/ε and 1/(1 − γ), and thus is tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration that are considered to be more efficient than incremental methods like Q-learning. 1
3 0.24212405 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach. 1
4 0.17136082 18 nips-2011-Action-Gap Phenomenon in Reinforcement Learning
Author: Amir-massoud Farahmand
Abstract: Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an agent following the greedy policy $\hat{\pi}$ with respect to an action-value function $\hat{Q}$, the performance loss $E\big[V^{*}(X) - V^{\hat{\pi}}(X)\big]$ is upper bounded by $O\big(\|\hat{Q} - Q^{*}\|_{\infty}^{1+\zeta}\big)$, in which ζ ≥ 0 is the parameter quantifying the action-gap regularity. For ζ > 0, our results indicate smaller performance loss compared to what previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms. 1
5 0.12230819 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of cotraining, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly out-performs the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance. 1
6 0.091288164 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
7 0.080639206 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
8 0.076971889 88 nips-2011-Environmental statistics and the trade-off between model-based and TD learning in humans
9 0.074818492 190 nips-2011-Nonlinear Inverse Reinforcement Learning with Gaussian Processes
10 0.073024243 219 nips-2011-Predicting response time and error rates in visual search
11 0.070112161 109 nips-2011-Greedy Model Averaging
12 0.061992444 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
13 0.061562695 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
14 0.061280854 49 nips-2011-Boosting with Maximum Adaptive Sampling
15 0.059135504 16 nips-2011-A reinterpretation of the policy oscillation phenomenon in approximate policy iteration
16 0.056284033 226 nips-2011-Projection onto A Nonnegative Max-Heap
17 0.05467736 163 nips-2011-MAP Inference for Bayesian Inverse Reinforcement Learning
18 0.054332212 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
19 0.054259215 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies
20 0.051729687 258 nips-2011-Sparse Bayesian Multi-Task Learning
topicId topicWeight
[(0, 0.135), (1, -0.051), (2, 0.008), (3, 0.058), (4, -0.106), (5, 0.044), (6, 0.027), (7, -0.078), (8, -0.063), (9, -0.018), (10, -0.134), (11, 0.04), (12, 0.026), (13, 0.097), (14, -0.029), (15, -0.131), (16, -0.094), (17, -0.077), (18, 0.201), (19, 0.029), (20, -0.261), (21, 0.281), (22, -0.094), (23, -0.093), (24, -0.17), (25, -0.141), (26, 0.067), (27, 0.067), (28, -0.144), (29, -0.049), (30, -0.136), (31, 0.066), (32, -0.036), (33, 0.006), (34, 0.016), (35, -0.046), (36, 0.009), (37, -0.038), (38, 0.05), (39, 0.032), (40, -0.022), (41, -0.03), (42, 0.077), (43, 0.003), (44, -0.028), (45, -0.057), (46, 0.025), (47, -0.004), (48, -0.007), (49, -0.119)]
simIndex simValue paperId paperTitle
same-paper 1 0.96935505 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and introduce novel algorithms that adapt the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
2 0.76914668 268 nips-2011-Speedy Q-Learning
Author: Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos
Abstract: We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and discount factor γ only $T = O\big(\log(n)/(\epsilon^{2}(1-\gamma)^{4})\big)$ steps are required for the SQL algorithm to converge to an ε-optimal action-value function with high probability. This bound has a better dependency on 1/ε and 1/(1 − γ), and thus is tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration that are considered to be more efficient than incremental methods like Q-learning. 1
3 0.64705265 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach. 1
4 0.5720939 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of cotraining, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly out-performs the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance. 1
5 0.51726496 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies
Author: Jasmina Bogojeska
Abstract: This paper presents an approach that predicts the effectiveness of HIV combination therapies by simultaneously addressing several problems affecting the available HIV clinical data sets: the different treatment backgrounds of the samples, the uneven representation of the levels of therapy experience, the missing treatment history information, the uneven therapy representation and the unbalanced therapy outcome representation. The computational validation on clinical data shows that, compared to the most commonly used approach that does not account for the issues mentioned above, our model has significantly higher predictive power. This is especially true for samples stemming from patients with longer treatment history and samples associated with rare therapies. Furthermore, our approach is at least as powerful for the remaining samples. 1
6 0.51460087 18 nips-2011-Action-Gap Phenomenon in Reinforcement Learning
7 0.46944436 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
8 0.4106819 49 nips-2011-Boosting with Maximum Adaptive Sampling
9 0.35289913 219 nips-2011-Predicting response time and error rates in visual search
10 0.32974333 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
11 0.32943401 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
12 0.32588068 109 nips-2011-Greedy Model Averaging
13 0.29303691 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
14 0.28584576 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
15 0.2774421 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
16 0.26882878 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
17 0.26674184 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
18 0.25648734 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
19 0.25464627 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
20 0.25345457 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
topicId topicWeight
[(0, 0.083), (4, 0.034), (11, 0.029), (20, 0.02), (26, 0.031), (31, 0.098), (33, 0.024), (43, 0.067), (45, 0.163), (57, 0.033), (71, 0.204), (74, 0.042), (83, 0.033), (99, 0.026)]
simIndex simValue paperId paperTitle
1 0.85995078 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search
Author: Joel Veness, Marc Lanctot, Michael Bowling
Abstract: Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates. We demonstrate how these techniques can be applied to MCTS and explore their efficacy on three different stochastic, single-agent settings: Pig, Can’t Stop and Dominion. 1
same-paper 2 0.8400079 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and introduce novel algorithms that adapt the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
3 0.80095059 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
Author: Christoph H. Lampert
Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds. 1
4 0.75526768 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model of the energy-correlations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones, as in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization, which effectively reduces energy correlation. Our new two-stage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with a synthetic dataset, natural images, and brain signals. 1
5 0.72993958 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
6 0.72865528 106 nips-2011-Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
7 0.72519571 53 nips-2011-Co-Training for Domain Adaptation
8 0.72101611 73 nips-2011-Divide-and-Conquer Matrix Factorization
9 0.72033101 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
10 0.72023326 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
11 0.72015589 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
12 0.71946704 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
13 0.71921492 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models
14 0.71727884 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
15 0.71423852 231 nips-2011-Randomized Algorithms for Comparison-based Search
16 0.71412742 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
17 0.71354079 265 nips-2011-Sparse recovery by thresholded non-negative least squares
18 0.71286875 4 nips-2011-A Convergence Analysis of Log-Linear Training
19 0.71241897 220 nips-2011-Prediction strategies without loss
20 0.71197748 67 nips-2011-Data Skeletonization via Reeb Graphs