nips nips2010 nips2010-38 knowledge-graph by maker-knowledge-mining

38 nips-2010-Batch Bayesian Optimization via Simulation Matching


Source: pdf

Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern

Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. [sent-5, score-0.188]

2 In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. [sent-6, score-0.244]

3 In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. [sent-7, score-1.031]

4 The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. [sent-8, score-0.5]

5 Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. [sent-9, score-0.353]

6 There are a number of well established policies for selecting the next input to evaluate given the current posterior. [sent-16, score-0.207]

7 We will refer to such policies as sequential policies to stress the fact that they select one input at a time. [sent-17, score-0.581]

8 In such cases, existing sequential policies are not sufficient. [sent-20, score-0.371]

9 Rather, batch mode BO is more appropriate, where policies select a batch of multiple inputs to be evaluated at once. [sent-21, score-1.399]

10 To the best of our knowledge and as noted in [4], there is no established work on BO that considers the batch selection problem, except for a brief treatment in [21]. [sent-22, score-0.616]

11 The main contribution of this work is to propose an approach to batch BO and to demonstrate its effectiveness. [sent-23, score-0.587]

12 The key motivation behind our approach comes from the fact that the sequential mode of BO has a fundamental advantage over BO in batch mode. [sent-24, score-0.855]

13 This is because in sequential mode, each function evaluation is immediately used to obtain a more accurate posterior of f (x), which in turn will allow a selection policy to make more informed choices about the next input. [sent-25, score-0.582]

14 Given an effective sequential selection policy, our goal is then to design a batch policy that approximates its behavior. [sent-26, score-1.082]

15 In particular, our batch policy attempts to select a batch that “matches” the expected behavior of a sequential policy as closely as possible. [sent-27, score-1.935]

16 The approach generates Monte-Carlo simulations of a sequential policy given the current posterior, and then derives an optimization problem over possible batches aimed at minimizing the loss between the sequential policy and the batch. [sent-28, score-1.11]

17 We solve the k-means variant via k-means clustering and show that the k-medoid variant corresponds to minimizing a non-increasing supermodular function, for which there is an efficient approximation algorithm [9]. [sent-30, score-0.21]

18 We evaluate our approach on a collection of six functions and compare it to random and another baseline batch policy based on submodular maximization. [sent-31, score-0.918]

19 The results show that our approach significantly outperforms these baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. [sent-32, score-0.276]

20 Our objective is to find an experiment x ∈ X that approximately maximizes f by requesting a limited number of experiments and observing their outcomes. [sent-37, score-0.2]

21 This motivates the problem of selecting a sequence of batches, each containing k experiments, where the choice of a batch can depend on the results observed from all previous experiments. [sent-41, score-0.644]

22 We will refer to the rule for selecting a batch based on previous experiments as the batch policy. [sent-42, score-1.298]

23 The main goal of this paper is to develop a batch policy that optimizes the unknown function as efficiently as possible. [sent-43, score-0.864]

24 This posterior is used to select the next experiment to be run in a way that attempts to trade off exploring new parts of the experimental space against exploiting parts that look promising. [sent-49, score-0.216]

25 While the BO literature has provided a number of effective policies, they are all sequential policies, where only a single experiment is selected and run at a time. [sent-50, score-0.345]

26 Thus, the main novelty of our work is in defining a batch policy in the context of BO, which is described in the next section. [sent-51, score-0.812]

27 3 Simulation Matching for Batch Selection Given a data set D of previously observed experiments, which induces a posterior distribution over the unknown function, we now consider how to select the next batch of k experiments. [sent-52, score-0.748]

28 The policy must attempt to explore by requesting experiments from unexplored parts of the input space, while at the same time attempting to optimize the unknown function via experiments that look promising given the current data. [sent-54, score-0.479]

29 While, under most measures, optimizing this trade-off is computationally intractable, there are a number of heuristic sequential policies from the BO literature that are computationally efficient and perform very well in practice. [sent-55, score-0.392]

30 For example, one such policy selects the next experiment to be the one that has the “maximum expected improvement” according to the current posterior [14, 10]. [sent-56, score-0.383]

31 The main idea behind our approach is to leverage such sequential policies by selecting a batch of k > 1 experiments that “closely matches” the sequential policy’s expected behavior. [sent-57, score-1.333]

32 Given this density we can define a density over the outcomes of executing policy π for k steps, each outcome consisting of a set of k selected experiments. [sent-61, score-0.32]

33 Our batch policy is based on generating a number of samples of S^k_π, which are used to define an objective for optimizing a batch of k experiments. [sent-64, score-1.48]
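As a rough illustration (not the authors' code), the sampling of S^k_π can be sketched as a simple rollout loop in Python; the `policy` and `sample_outcome` callables are hypothetical stand-ins for the base sequential policy and for drawing a simulated result from the current posterior P(f | D).

```python
import numpy as np

def simulate_rollout(policy, sample_outcome, X_obs, y_obs, k):
    """One Monte-Carlo rollout of the base sequential policy (illustrative sketch).

    policy(X, y)            -> next input chosen by the sequential policy (e.g. MEI)
    sample_outcome(x, X, y) -> simulated result y ~ P(f(x) | data observed so far)
    Returns the k inputs selected during the rollout, i.e. one sample of S_pi^k.
    """
    X, y = list(X_obs), list(y_obs)
    selected = []
    for _ in range(k):
        x_next = policy(np.asarray(X), np.asarray(y))
        y_next = sample_outcome(x_next, np.asarray(X), np.asarray(y))  # simulated outcome
        selected.append(x_next)
        X.append(x_next)   # condition the next selection on the simulated result
        y.append(y_next)
    return selected
```

Repeating this rollout several times yields the set of simulations {S_i} referred to below.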

34 Batch Objective Function: Our goal is to select a batch B of k experiments that best “matches the expected behavior” of a base sequential policy π conditioned on the observed data D. [sent-67, score-1.2]

35 More precisely, we consider a batch B to be a good match for a policy execution if B contains an experiment that is close to the best of the k experiments selected by the policy. [sent-68, score-0.942]

36 Our objective can now be written as selecting a batch B that minimizes $\mathrm{OBJ}(B) = \mathbb{E}_{S^k_\pi \mid D}\big[\, \mathbb{E}_{f \mid S^k_\pi, D}\big[\, \| x^*(f, S^k_\pi) - \mathrm{nn}(x^*(f, S^k_\pi), B) \|^2 \,\big] \big]$. [sent-72, score-0.704]
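Given a finite set of rollouts, this objective can be estimated by Monte Carlo; the sketch below assumes the rollouts and the per-point "best" probabilities (the α weights introduced below) have already been computed, and is illustrative rather than the paper's implementation.

```python
import numpy as np

def estimate_obj(B, rollouts, best_probs):
    """Monte-Carlo estimate of the matching objective OBJ(B).

    rollouts[i]   : (k, d) array of points from the i-th simulated rollout S_i
    best_probs[i] : length-k probabilities that each point of S_i is its best point
    """
    B = np.asarray(B, dtype=float)
    total = 0.0
    for S, probs in zip(rollouts, best_probs):
        for x, w in zip(np.asarray(S, dtype=float), probs):
            nearest_sq = np.min(np.sum((B - x) ** 2, axis=1))  # squared distance to nn(x, B)
            total += w * nearest_sq
    return total / len(rollouts)
```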

37 If we assume that the unknown function f(x) is Lipschitz continuous, then minimizing this objective can be viewed as minimizing an upper bound on the expected performance difference between the sequential policy and the selected batch. [sent-74, score-0.664]
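The argument behind this claim can be sketched as follows (our reconstruction, assuming f is L-Lipschitz; it is not quoted from the paper):

```latex
% Let $x^* = x^*(f, S^k_\pi)$ be the best of the $k$ points chosen by the sequential
% policy, and $\hat{x} = \mathrm{nn}(x^*, B)$ its nearest neighbour in the batch $B$.
\[
  f(x^*) - \max_{x \in B} f(x) \;\le\; f(x^*) - f(\hat{x}) \;\le\; L\,\|x^* - \hat{x}\|.
\]
% Taking expectations and applying Jensen's inequality,
% $\mathbb{E}\|x^* - \hat{x}\| \le \big(\mathbb{E}\|x^* - \hat{x}\|^2\big)^{1/2}$,
% bounds the expected performance gap of the batch in terms of the matching objective.
```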

38 Here the performance of a policy or a batch is equal to the output value of the best selected experiment. [sent-75, score-0.833]

39 We now define our objective as minimizing (1) over batch B. [sent-80, score-0.665]

40 The objective corresponds to a weighted k-means clustering problem, where we must select B to minimize the weighted distortion between the simulated points and their closest points in B. [sent-81, score-0.22]

41 The weight α_{i,x} on each simulated experiment corresponds to the probability that the experiment x ∈ Si achieves the maximum value of the unknown f among the experiments in Si, conditioned on D and the fact that S^k_π = Si. [sent-82, score-0.239]
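For the k-means variant, an off-the-shelf weighted k-means routine suffices; the sketch below pools all simulated points with their α weights and uses scikit-learn's sample_weight support, which is one reasonable implementation choice rather than necessarily the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_batch(simulated_points, weights, k, seed=0):
    """k-means variant (sketch): cluster the weighted simulated experiments and
    return the k cluster centers as the batch B."""
    X = np.asarray(simulated_points, dtype=float)   # all points pooled across rollouts
    w = np.asarray(weights, dtype=float)            # alpha_{i,x} weights, same order as X
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    km.fit(X, sample_weight=w)                      # minimizes the weighted distortion
    return km.cluster_centers_
```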

42 This objective corresponds to the weighted k-medoid clustering problem, which is often considered to improve robustness to outliers in clustering. [sent-89, score-0.144]

43 Accordingly we will refer to this objective as the k-medoid objective and note that given a fixed set of simulations this corresponds to a discrete optimization problem. [sent-90, score-0.185]

44 In general these weights will be difficult to compute exactly, particularly due to the conditioning on the set Si. (Footnote 1: For example, this can be done by starting with D and selecting the first experiment x1 using π and then using P(f | D) to simulate the result y1 of experiment x1.) [sent-93, score-0.179]

45 Algorithm (greedy batch selection for the k-medoid objective): B ← {x1, …, xm} // initialize batch to all data points; while |B| > k do: x ← arg min_{x∈B} Σ_{j=1}^{m} wj · ‖xj − nn(xj, B \ x)‖ // point that influences the objective the least; B ← B \ x; end while; return B. [sent-101, score-0.647]
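A direct, naive translation of this greedy removal procedure might look as follows; it recomputes the objective from scratch at each step and is offered only as a sketch of the algorithm described above.

```python
import numpy as np

def greedy_kmedoid_batch(points, weights, k):
    """Greedy minimization for the k-medoid variant (sketch): start with B equal to
    all simulated points and repeatedly drop the point whose removal increases the
    weighted nearest-neighbour objective the least, until only k points remain."""
    X = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    B = list(range(len(X)))  # indices currently in the batch

    def objective(members):
        # weighted distance from every simulated point to its nearest point in the batch
        return float(np.sum(w * dist[:, members].min(axis=1)))

    while len(B) > k:
        # element whose absence hurts the objective the least
        drop = min(B, key=lambda r: objective([j for j in B if j != r]))
        B.remove(drop)
    return X[B]
```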

46 The k cluster centers are returned as our batch B. [sent-129, score-0.613]

47 The k-medoid objective is well known [22] and the weighted k-medoid clustering algorithm [11] has been shown to perform well and be robust to outliers in the data. [sent-130, score-0.144]

48 The input to the algorithm is the set of weighted experiments and the batch size k. [sent-133, score-0.691]

49 The algorithm initializes the batch B to include all of the input experiments, which achieves the minimum objective value of zero. [sent-134, score-0.694]

50 This greedy algorithm is motivated by theoretical results on the minimization of non-increasing, supermodular set functions. [sent-136, score-0.18]

51 Thus, a set function is supermodular if adding an element to a smaller set provides no less improvement than adding the element to a larger set. [sent-139, score-0.151]

52 It can be shown that our k-medoid objective function of (1) is both a non-increasing and supermodular function of B and achieves a minimum value of zero for B = ∪i Si. [sent-141, score-0.184]

53 Also note that the greedy algorithm we use is qualitatively different from the one used for submodular maximization, since it greedily removes elements from B rather than greedily adding elements to B. [sent-151, score-0.182]

54 Our batch selection approach described above requires that we maintain a posterior over the unknown function f . [sent-153, score-0.736]

55 Our batch selection approach also requires a base sequential policy π to be used for simulation matching. [sent-163, score-1.166]

56 This policy must be able to select the next experiment given any set of prior experimental observations D. [sent-164, score-0.351]

57 In our experiments, we use a policy based on the Maximum Expected Improvement (MEI) heuristic [14, 10] which is a very successful sequential policy for BO and has been shown to converge in the limit to the global optimum. [sent-165, score-0.691]

58 Given data D the MEI policy simply selects the next experiment to be the one that maximizes the expected improvement over the current set of experiments with respect to maximizing the unknown function. [sent-166, score-0.464]
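For reference, the standard closed-form expected-improvement score under a Gaussian posterior (e.g. from a GP) is easy to write down; this is the usual EI expression and is given here as a sketch of the MEI heuristic, not as the authors' exact code.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_max):
    """Closed-form EI over the current best observed value y_max, assuming the
    posterior at each candidate is Gaussian with mean mu and std sigma."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = (mu - y_max) / np.maximum(sigma, 1e-12)
    ei = (mu - y_max) * norm.cdf(z) + sigma * norm.pdf(z)
    # with zero predictive variance, EI degenerates to max(mu - y_max, 0)
    return np.where(sigma > 0, np.maximum(ei, 0.0), np.maximum(mu - y_max, 0.0))
```

The MEI policy would then return the candidate input that maximizes this score over X, e.g. over a candidate grid or via a continuous optimizer.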

59 Note that we have also evaluated our simulation-matching approach with an alternative sequential policy known as Maximum Probability of Improvement [16, 10]. [sent-171, score-0.466]

60 The computation of the MEI policy requires maximizing MEI(x) over the input space X . [sent-173, score-0.267]

61 To the best of our knowledge there is no well-known batch policy for Bayesian optimization. [sent-177, score-0.812]

62 The first baseline is random selection, where a batch of k random experiments is returned at each step. [sent-179, score-0.685]

63 Interestingly, in the case of batch active learning for classification, the random batch selection strategy … [Table 1: Benchmark Functions — includes Cosines and Rosenbrock.] [sent-180, score-1.222]

64 Our second, more sophisticated, baseline is based on selecting a batch of experiments whose expected maximum output is the largest. [sent-191, score-0.745]

65 More formally, we consider selecting a size-k batch B that maximizes the objective $\mathbb{E}_f[\max_{x \in B} f(x) \mid D]$, which we will refer to as the EMAX objective. [sent-192, score-0.723]

66 The algorithm starts with an empty B and greedily adds experiments to B, each time selecting the one that improves the EMAX objective the most. [sent-203, score-0.193]

67 Therefore, to implement the greedy algorithm, which requires many evaluations of the EMAX objective, we use Monte Carlo sampling, where for a given set B we sample the corresponding normally distributed vector and average the maximum values across the samples. [sent-205, score-0.144]
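A possible realization of this Monte-Carlo greedy procedure is sketched below; it assumes the candidate pool's outputs are jointly Gaussian under the GP posterior with a given mean vector and covariance matrix (illustrative argument names, not an interface from the paper).

```python
import numpy as np

def greedy_emax_batch(candidates, mean, cov, k, n_samples=1000, seed=0):
    """EMAX baseline (sketch): greedily add the candidate that most increases the
    Monte-Carlo estimate of E[max_{x in B} f(x) | D]."""
    rng = np.random.default_rng(seed)
    # joint samples of f over the candidate pool: shape (n_samples, n_candidates)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    chosen, remaining = [], list(range(len(candidates)))
    for _ in range(k):
        def emax_with(j):
            return samples[:, chosen + [j]].max(axis=1).mean()
        best = max(remaining, key=emax_with)   # largest estimated EMAX after adding
        chosen.append(best)
        remaining.remove(best)
    return [candidates[j] for j in chosen]
```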

68 5 Experimental Results In this section we evaluate our proposed batch BO approach and the baseline approaches on six different benchmarks. [sent-206, score-0.662]

69 The Hydrogen data set is the result of data collected as part of a study on biosolar hydrogen production [6], where the goal was to maximize the hydrogen production of a particular bacterium by optimizing the pH and nitrogen levels of the growth medium. [sent-212, score-0.435]

70 2 Results Figures 2 and 3 show the performance of our methods on all six benchmark functions for batch sizes 5 and 10 respectively. [sent-221, score-0.673]

71 Hence the regret is always positive and smaller values are preferred. [sent-225, score-0.145]

72 Each run of a policy initializes the data set to contain 5 randomly selected experiments for the 2-d functions and 20 random initial experiments for the higher dimensional functions. [sent-226, score-0.391]

73 [Figure panels: Rosenbrock, Cart-Pole, Michalewicz; x-axis: # of Experiments.] Figure 2: Performance evaluation with batch size 5. [sent-323, score-0.606]

74 [Figure panel: Cart-Pole.] Figure 3: Performance evaluation with batch size 10. [sent-366, score-0.606]

75 In addition, for reference we plot the performance of the base Sequential MEI BO policy (k = 1) on each graph. [sent-368, score-0.254]

76 Note that since the batch approaches request either 5 or 10 experiments at a time, their curves only contain data points at those intervals. [sent-369, score-0.673]

77 For example, for the batch size 5 results the first point on a batch curve corresponds to 10 experiments, including the initial 5 experiments and the first requested batch. [sent-370, score-1.289]

78 The next point on the batch curve is for 15 experiments, which includes the next requested batch, and so on. [sent-371, score-1.289]

79 Rather, the Sequential policy has a point at every step since it requests experiments one at a time. [sent-372, score-0.273]

80 It is important to realize that we generally expect a good sequential policy to do better, or no worse, than a batch policy with respect to performance per number of experiments. [sent-373, score-1.278]

81 Thus, the Sequential curve can typically be viewed as an upper performance bound, and it indicates how much efficiency per experiment is lost when moving to a batch setting. [sent-374, score-0.614]

82 The major observation from our results is that for all benchmarks and for both batch sizes the proposed k-means and k-medoid approaches significantly outperform the baselines. [sent-376, score-0.629]

83 This provides strong validation for our proposed simulation-matching approach to batch selection. [sent-377, score-0.587]

84 However, for both batch sizes k-medoid often shows a small improvement over k-means and appears to have a significant advantage on FuelCell. [sent-381, score-0.614]

85 The advantage of Sequential over our batch approaches varies with the benchmark. [sent-386, score-0.607]

86 However, in most cases, our proposed batch approaches catch up to Sequential in a relatively small number of experiments and in some cases, the batch policies are similar to Sequential from the start. [sent-387, score-1.372]

87 The main exception is Cart-Pole for batch size 10, where the batch policies appear to be significantly less efficient in terms of performance versus number of experiments. [sent-388, score-1.304]

88 Generally, we see that the difference between our batch policies and Sequential is larger for batch size 10 than batch size 5, which is expected, since larger batch sizes imply that less information per experiment is used in making decisions. [sent-389, score-2.539]

89 It is clear, however, that if we evaluate the performance of our batch policies in terms of experimental time, then there is a very significant advantage over Sequential. [sent-390, score-0.741]

90 In particular, the amount of experimental time for a policy is approximately equal to the number of requested batches, assuming that the batch size is selected to allow for all selected experiments to be run in parallel. [sent-391, score-0.988]

91 This means, for example, that for the batch size 5 results, 5 time steps for the batch approaches correspond to 30 total experiments (5 initial + 5 batches). [sent-392, score-1.242]

92 In all cases, the batch policies yield a very large improvement in regret reduction per unit time, which is the primary motivation for batch selection. [sent-394, score-1.476]

93 6 Summary and Future Work In this paper we introduced a novel approach to batch BO based on the idea of simulation matching. [sent-395, score-0.642]

94 The key idea of our approach is to design batches of experiments that approximately match the expected performance of high-quality sequential policies for BO. [sent-396, score-0.562]

95 We considered two variants of the matching problem and showed that both approaches significantly outperformed two baselines including random batch selection on six benchmark functions. [sent-397, score-0.774]

96 For future work we plan to consider the general idea of simulation matching for other problems, such as active learning, where there are also good sequential policies and batch selection is often warranted. [sent-398, score-1.078]

97 In addition, we plan to consider less myopic approaches for selecting each batch and the problem of batch size selection, where there is a choice about batch size that must take into account the current data and experimental budget. [sent-399, score-1.862]

98 Optimization of pH and nitrogen for enhanced hydrogen production by Synechocystis sp. [sent-445, score-0.264]

99 An approximation guarantee of the greedy descent algorithm for minimizing a supermodular set function. [sent-461, score-0.198]

100 Improved fuel cell and electrode designs for producing electricity from microbial degradation. [sent-505, score-0.164]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('batch', 0.587), ('emax', 0.319), ('bo', 0.242), ('sequential', 0.241), ('policy', 0.225), ('experimets', 0.212), ('medoid', 0.212), ('hydrogen', 0.159), ('mei', 0.157), ('regret', 0.145), ('policies', 0.13), ('supermodular', 0.124), ('si', 0.119), ('batches', 0.114), ('gp', 0.1), ('cosines', 0.089), ('rosenbrock', 0.089), ('fuel', 0.078), ('michalewicz', 0.071), ('posterior', 0.068), ('experiment', 0.061), ('objective', 0.06), ('selecting', 0.057), ('greedy', 0.056), ('nn', 0.055), ('benchmark', 0.055), ('simulation', 0.055), ('unknown', 0.052), ('submodular', 0.051), ('production', 0.048), ('experiments', 0.048), ('fern', 0.047), ('evaluations', 0.046), ('normally', 0.042), ('select', 0.041), ('requested', 0.04), ('costly', 0.04), ('weighted', 0.036), ('baselines', 0.035), ('azimi', 0.035), ('fuelcell', 0.035), ('microbial', 0.035), ('nitrogen', 0.035), ('ymax', 0.035), ('means', 0.033), ('six', 0.031), ('steepness', 0.031), ('obj', 0.031), ('requesting', 0.031), ('bayesian', 0.031), ('clustering', 0.03), ('closed', 0.03), ('selection', 0.029), ('base', 0.029), ('expected', 0.029), ('greedily', 0.028), ('mode', 0.027), ('improvement', 0.027), ('curve', 0.027), ('initializes', 0.027), ('nemhauser', 0.027), ('inputs', 0.027), ('cell', 0.026), ('objectives', 0.026), ('returned', 0.026), ('electricity', 0.025), ('baseline', 0.024), ('optimization', 0.024), ('experimental', 0.024), ('outcome', 0.023), ('benchmarks', 0.022), ('maxx', 0.022), ('run', 0.022), ('simulations', 0.022), ('controller', 0.022), ('ph', 0.022), ('minx', 0.022), ('maximizing', 0.022), ('contour', 0.021), ('selected', 0.021), ('optimizing', 0.021), ('input', 0.02), ('approaches', 0.02), ('variant', 0.019), ('optimize', 0.019), ('active', 0.019), ('removes', 0.019), ('evaluation', 0.019), ('refer', 0.019), ('matches', 0.018), ('minimizing', 0.018), ('sin', 0.018), ('outliers', 0.018), ('curves', 0.018), ('attempt', 0.018), ('simulated', 0.017), ('density', 0.017), ('matching', 0.017), ('outcomes', 0.017), ('yk', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 38 nips-2010-Batch Bayesian Optimization via Simulation Matching

Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern

Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1

2 0.18870954 189 nips-2010-On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient

Author: Tang Jie, Pieter Abbeel

Abstract: Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (i) using the past experience to estimate only the gradient of the expected return U (θ) at the current policy parameterization θ, rather than to obtain a more complete estimate of U (θ), and (ii) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines—a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds. 1

3 0.17239675 152 nips-2010-Learning from Logged Implicit Exploration Data

Author: Alex Strehl, John Langford, Lihong Li, Sham M. Kakade

Abstract: We provide a sound and consistent foundation for the use of nonrandom exploration data in “contextual bandit” or “partially labeled” settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which “offline” data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.

4 0.16068709 194 nips-2010-Online Learning for Latent Dirichlet Allocation

Author: Matthew Hoffman, Francis R. Bach, David M. Blei

Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1

5 0.1503675 184 nips-2010-Nonparametric Bayesian Policy Priors for Reinforcement Learning

Author: Finale Doshi-velez, David Wingate, Nicholas Roy, Joshua B. Tenenbaum

Abstract: We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning. 1

6 0.1488024 208 nips-2010-Policy gradients in linearly-solvable MDPs

7 0.14203064 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement

8 0.1372159 14 nips-2010-A Reduction from Apprenticeship Learning to Classification

9 0.12891723 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

10 0.12390757 23 nips-2010-Active Instance Sampling via Matrix Partition

11 0.12032703 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration

12 0.1189056 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

13 0.11803778 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning

14 0.09117227 43 nips-2010-Bootstrapping Apprenticeship Learning

15 0.090929821 134 nips-2010-LSTD with Random Projections

16 0.09052138 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

17 0.084021039 60 nips-2010-Deterministic Single-Pass Algorithm for LDA

18 0.0752462 183 nips-2010-Non-Stochastic Bandit Slate Problems

19 0.074805655 4 nips-2010-A Computational Decision Theory for Interactive Assistants

20 0.074578367 269 nips-2010-Throttling Poisson Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.183), (1, -0.226), (2, 0.011), (3, -0.015), (4, -0.049), (5, 0.012), (6, 0.077), (7, 0.004), (8, 0.031), (9, -0.081), (10, 0.075), (11, 0.019), (12, -0.006), (13, -0.024), (14, -0.057), (15, 0.045), (16, -0.006), (17, 0.019), (18, -0.015), (19, 0.035), (20, 0.021), (21, -0.015), (22, -0.039), (23, 0.02), (24, -0.085), (25, -0.034), (26, 0.061), (27, 0.0), (28, -0.08), (29, 0.017), (30, 0.029), (31, -0.115), (32, 0.044), (33, -0.045), (34, -0.022), (35, -0.019), (36, -0.087), (37, -0.086), (38, 0.08), (39, 0.01), (40, -0.0), (41, 0.003), (42, -0.026), (43, -0.068), (44, 0.037), (45, -0.053), (46, 0.037), (47, 0.065), (48, 0.053), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94570106 38 nips-2010-Batch Bayesian Optimization via Simulation Matching

Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern

Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1

2 0.70024669 208 nips-2010-Policy gradients in linearly-solvable MDPs

Author: Emanuel Todorov

Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.

3 0.69289196 152 nips-2010-Learning from Logged Implicit Exploration Data

Author: Alex Strehl, John Langford, Lihong Li, Sham M. Kakade

Abstract: We provide a sound and consistent foundation for the use of nonrandom exploration data in “contextual bandit” or “partially labeled” settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which “offline” data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.

4 0.69058055 189 nips-2010-On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient

Author: Tang Jie, Pieter Abbeel

Abstract: Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (i) using the past experience to estimate only the gradient of the expected return U (θ) at the current policy parameterization θ, rather than to obtain a more complete estimate of U (θ), and (ii) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines—a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds. 1

5 0.67963934 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement

Author: Jeffrey Johns, Christopher Painter-wakefield, Ronald Parr

Abstract: Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-theshelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization. 1

6 0.64749104 14 nips-2010-A Reduction from Apprenticeship Learning to Classification

7 0.63533539 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration

8 0.63462174 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

9 0.62393337 184 nips-2010-Nonparametric Bayesian Policy Priors for Reinforcement Learning

10 0.58230025 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning

11 0.51225334 43 nips-2010-Bootstrapping Apprenticeship Learning

12 0.48645878 194 nips-2010-Online Learning for Latent Dirichlet Allocation

13 0.48292607 183 nips-2010-Non-Stochastic Bandit Slate Problems

14 0.47516054 134 nips-2010-LSTD with Random Projections

15 0.46699083 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

16 0.46006364 269 nips-2010-Throttling Poisson Processes

17 0.45814303 50 nips-2010-Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories

18 0.43685895 226 nips-2010-Repeated Games against Budgeted Adversaries

19 0.43505296 202 nips-2010-Parallelized Stochastic Gradient Descent

20 0.42057666 23 nips-2010-Active Instance Sampling via Matrix Partition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.293), (13, 0.047), (27, 0.087), (30, 0.06), (35, 0.015), (45, 0.205), (50, 0.046), (52, 0.022), (60, 0.019), (77, 0.064), (78, 0.012), (90, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73859626 38 nips-2010-Batch Bayesian Optimization via Simulation Matching

Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern

Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1

2 0.73434597 114 nips-2010-Humans Learn Using Manifolds, Reluctantly

Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson

Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semisupervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1

3 0.64205545 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

Author: Surya Ganguli, Haim Sompolinsky

Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1

4 0.63976115 98 nips-2010-Functional form of motion priors in human motion perception

Author: Hongjing Lu, Tungyou Lin, Alan Lee, Luminita Vese, Alan L. Yuille

Abstract: It has been speculated that the human motion system combines noisy measurements with prior expectations in an optimal, or rational, manner. The basic goal of our work is to discover experimentally which prior distribution is used. More specifically, we seek to infer the functional form of the motion prior from the performance of human subjects on motion estimation tasks. We restricted ourselves to priors which combine three terms for motion slowness, first-order smoothness, and second-order smoothness. We focused on two functional forms for prior distributions: L2-norm and L1-norm regularization corresponding to the Gaussian and Laplace distributions respectively. In our first experimental session we estimate the weights of the three terms for each functional form to maximize the fit to human performance. We then measured human performance for motion tasks and found that we obtained better fit for the L1-norm (Laplace) than for the L2-norm (Gaussian). We note that the L1-norm is also a better fit to the statistics of motion in natural environments. In addition, we found large weights for the second-order smoothness term, indicating the importance of high-order smoothness compared to slowness and lower-order smoothness. To validate our results further, we used the best fit models using the L1-norm to predict human performance in a second session with different experimental setups. Our results showed excellent agreement between human performance and model prediction – ranging from 3% to 8% for five human subjects over ten experimental conditions – and give further support that the human visual system uses an L1-norm (Laplace) prior.

5 0.63837308 155 nips-2010-Learning the context of a category

Author: Dan Navarro

Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.

6 0.63613969 268 nips-2010-The Neural Costs of Optimal Control

7 0.63589168 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

8 0.63587797 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

9 0.63572311 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

10 0.63531965 158 nips-2010-Learning via Gaussian Herding

11 0.63510817 17 nips-2010-A biologically plausible network for the computation of orientation dominance

12 0.63473225 63 nips-2010-Distributed Dual Averaging In Networks

13 0.63384944 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models

14 0.63357937 117 nips-2010-Identifying graph-structured activation patterns in networks

15 0.6323238 194 nips-2010-Online Learning for Latent Dirichlet Allocation

16 0.63209426 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs

17 0.63109004 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior

18 0.63098663 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

19 0.63075626 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

20 0.62982345 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models