nips nips2010 nips2010-38 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern
Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. [sent-5, score-0.188]
2 In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. [sent-6, score-0.244]
3 In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. [sent-7, score-1.031]
4 The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. [sent-8, score-0.5]
5 Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. [sent-9, score-0.353]
6 There are a number of well established policies for selecting the next input to evaluate given the current posterior. [sent-16, score-0.207]
7 We will refer to such policies as sequential policies to stress the fact that they select one input at a time. [sent-17, score-0.581]
8 In such cases, existing sequential policies are not sufficient. [sent-20, score-0.371]
9 Rather, batch mode BO is more appropriate, where policies select a batch of multiple inputs to be evaluated at once. [sent-21, score-1.399]
10 To the best of our knowledge and as noted in [4], there is no established work on BO that considers the batch selection problem, except for a brief treatment in [21]. [sent-22, score-0.616]
11 The main contribution of this work is to propose an approach to batch BO and to demonstrate its effectiveness. [sent-23, score-0.587]
12 The key motivation behind our approach comes from the fact that the sequential mode of BO has a fundamental advantage over BO in batch mode. [sent-24, score-0.855]
13 This is because in sequential mode, each function evaluation is immediately used to obtain a more accurate posterior of f (x), which in turn will allow a selection policy to make more informed choices about the next input. [sent-25, score-0.582]
14 Given an effective sequential selection policy, our goal is then to design a batch policy that approximates its behavior. [sent-26, score-1.082]
15 In particular, our batch policy attempts to select a batch that “matches” the expected behavior of a sequential policy as closely as possible. [sent-27, score-1.935]
16 The approach generates Monte-Carlo simulations of a sequential policy given the current posterior, and then derives an optimization problem over possible batches aimed at minimizing the loss between the sequential policy and the batch. [sent-28, score-1.11]
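To make the Monte-Carlo simulation step concrete, the following Python sketch rolls out a sequential policy for k steps while "fantasizing" each outcome from the current posterior. This is an illustration under assumptions, not the authors' code: the posterior is taken to be a Gaussian process (a standard BO choice), and greedy_mean_policy is a trivial placeholder for the MEI policy used in the paper; all function names are mine.

```python
# Minimal sketch of simulating a k-step execution of a sequential policy, assuming a
# Gaussian-process posterior. Each fantasized outcome is sampled from the posterior and
# then conditioned on, so later selections reflect the simulated earlier results.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulate_rollout(X_obs, y_obs, candidates, policy, k, rng):
    """Return one simulated set S_pi^k of k inputs selected by `policy`."""
    X_sim, y_sim, S = list(X_obs), list(y_obs), []
    for _ in range(k):
        gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X_sim), np.array(y_sim))
        x_next = policy(gp, candidates)                      # sequential policy picks one input
        y_next = gp.sample_y(x_next.reshape(1, -1),
                             random_state=int(rng.integers(1 << 31)))[0, 0]
        X_sim.append(x_next); y_sim.append(y_next); S.append(x_next)
    return np.array(S)

def greedy_mean_policy(gp, candidates):
    """Placeholder policy (the paper uses MEI): pick the candidate with the highest posterior mean."""
    return candidates[int(np.argmax(gp.predict(candidates)))]

rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 1, size=(5, 2))
y_obs = np.sin(3 * X_obs).sum(axis=1)                        # toy stand-in for the unknown f
candidates = rng.uniform(0, 1, size=(200, 2))
rollouts = [simulate_rollout(X_obs, y_obs, candidates, greedy_mean_policy, k=5, rng=rng)
            for _ in range(10)]                              # samples of S_pi^k for the matching step
```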
17 We solve the k-means variant via k-means clustering and show that the k-medoid variant corresponds to minimizing a non-increasing supermodular function, for which there is an efficient approximation algorithm [9]. [sent-30, score-0.21]
18 We evaluate our approach on a collection of six functions and compare it to a random baseline and a second baseline batch policy based on submodular maximization. [sent-31, score-0.918]
19 The results show that our approach significantly outperforms these baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. [sent-32, score-0.276]
20 Our objective is to find an experiment x ∈ X that approximately maximizes f by requesting a limited number of experiments and observing their outcomes. [sent-37, score-0.2]
21 This motivates the problem of selecting a sequence of batches, each containing k experiments, where the choice of a batch can depend on the results observed from all previous experiments. [sent-41, score-0.644]
22 We will refer to the rule for selecting a batch based on previous experiments as the batch policy. [sent-42, score-1.298]
23 The main goal of this paper is to develop a batch policy that optimizes the unknown function as efficiently as possible. [sent-43, score-0.864]
24 This posterior is used to select the next experiment to be run in a way that attempts to trade off exploring new parts of the experimental space and exploiting parts that look promising. [sent-49, score-0.216]
25 While the BO literature has provided a number of effective policies, they are all sequential policies, where only a single experiment is selected and run at a time. [sent-50, score-0.345]
26 Thus, the main novelty of our work is in defining a batch policy in the context of BO, which is described in the next section. [sent-51, score-0.812]
27 3 Simulation Matching for Batch Selection Given a data set D of previously observed experiments, which induces a posterior distribution over the unknown function, we now consider how to select the next batch of k experiments. [sent-52, score-0.748]
28 The policy must attempt to explore by requesting experiments from unexplored parts of the input space, while at the same time attempting to optimize the unknown function via experiments that look promising given the current data. [sent-54, score-0.479]
29 While, under most measures, optimizing this trade-off is computationally intractable, there are a number of heuristic sequential policies from the BO literature that are computationally efficient and perform very well in practice. [sent-55, score-0.392]
30 For example, one such policy selects the next experiment to be the one that has the “maximum expected improvement” according to the current posterior [14, 10]. [sent-56, score-0.383]
31 The main idea behind our approach is to leverage such sequential policies by selecting a batch of k > 1 experiments that “closely matches” the sequential policy’s expected behavior. [sent-57, score-1.333]
32 Given this density we can define a density over the outcomes of executing policy π for k steps, each outcome consisting of a set of k selected experiments. [sent-61, score-0.32]
33 1 Our batch policy is based on generating a number of samples of Sπ , which are used to define an objective for optimizing a batch of k experiments. [sent-64, score-1.48]
34 3.1 Batch Objective Function Our goal is to select a batch B of k experiments that best “matches the expected behavior” of a base sequential policy π conditioned on the observed data D. [sent-67, score-1.2]
35 More precisely, we consider a batch B to be a good match for a policy execution if B contains an experiment that is close to the best of the k experiments selected by the policy. [sent-68, score-0.942]
36 Our objective can now be written as selecting a batch B that minimizes $\mathrm{OBJ}(B) = \mathbb{E}_{S_\pi^k \mid D}\,\mathbb{E}_{f \mid S_\pi^k, D}\!\left[\,\bigl\| x^*(f, S_\pi^k) - \mathrm{nn}\bigl(x^*(f, S_\pi^k), B\bigr) \bigr\|^2 \right]$. [sent-72, score-0.704]
37 If we assume that the unknown function f (x) is Lipschitz continuous then minimizing this objective can be viewed as minimizing an upper bound on the expected performance difference between the sequential policy and the selected batch. [sent-74, score-0.664]
38 Here the performance of a policy or a batch is equal to the output value of the best selected experiment. [sent-75, score-0.833]
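One way to spell this bound out (L is an assumed Lipschitz constant; the derivation below is a sketch consistent with the sentences above, not quoted from the paper): for any simulated execution with best point $x^* = x^*(f, S_\pi^k)$,

```latex
\[
  f(x^*) - \max_{x \in B} f(x)
  \;\le\; f(x^*) - f\bigl(\mathrm{nn}(x^*, B)\bigr)
  \;\le\; L \,\bigl\| x^* - \mathrm{nn}(x^*, B) \bigr\|,
\]
so taking expectations over $S_\pi^k$ and $f$ given $D$, and applying Jensen's inequality,
\[
  \mathbb{E}\!\left[ f(x^*) - \max_{x \in B} f(x) \,\middle|\, D \right]
  \;\le\; L\,\mathbb{E}\!\left[ \bigl\| x^* - \mathrm{nn}(x^*, B) \bigr\| \,\middle|\, D \right]
  \;\le\; L \sqrt{\mathrm{OBJ}(B)}.
\]
```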
39 We now define our objective as minimizing (1) over batch B. [sent-80, score-0.665]
40 The objective corresponds to a weighted k-means clustering problem, where we must select B to minimize the weighted distortion between the simulated points and their closest points in B. [sent-81, score-0.22]
41 The weight on each simulated experiment αi,x corresponds to the probability that the experiment x ∈ Si achieves the maximum value of the unknown f among the experiments in Si, conditioned on D and the fact that Sπ^k = Si. [sent-82, score-0.239]
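A sketch of the k-means variant described in the two sentences above: pool the simulated experiments, weight each point by its α, and run weighted k-means. The interface and the randomly drawn placeholder weights are mine; in practice the α weights would be estimated from the posterior, as discussed further below.

```python
# Weighted k-means matching: the k cluster centres of the weighted simulated points
# form the batch B (a sketch, not the authors' code).
import numpy as np
from sklearn.cluster import KMeans

def kmeans_batch(sim_points, weights, k, seed=0):
    """sim_points: (n, d) pooled simulated experiments; weights: (n,) alpha weights."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    km.fit(sim_points, sample_weight=weights)        # minimizes the weighted distortion
    return km.cluster_centers_                       # the k centres form the batch B

rng = np.random.default_rng(0)
sim_points = rng.uniform(0, 1, size=(50, 2))         # e.g. 10 rollouts of length k = 5
weights = rng.dirichlet(np.ones(5), size=10).ravel() # placeholder alpha weights, one simplex per rollout
B = kmeans_batch(sim_points, weights, k=5)
```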
42 This objective corresponds to the weighted k-medoid clustering problem, which is often considered to improve robustness to outliers in clustering. [sent-89, score-0.144]
43 Accordingly we will refer to this objective as the k-medoid objective and note that given a fixed set of simulations this corresponds to a discrete optimization problem. [sent-90, score-0.185]
44 In general these weights will be difficult to compute exactly, particularly due to the conditioning on the set Si. (Footnote 1: for example, the simulation can be done by starting with D and selecting the first experiment x1 using π and then using P (f | D) to simulate the result y1 of experiment x1.) [sent-93, score-0.179]
45 Greedy k-medoid algorithm: B ← {x1, . . . , xm} // initialize batch to all data points; while |B| > k do: x ← arg min_{x∈B} Σ_{j=1}^{m} wj · ||xj − nn(xj, B \ x)|| // remove the point that influences the objective the least; B ← B \ x; end while; return B. [sent-101, score-0.647]
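A runnable reading of the greedy removal loop sketched above (variable names are mine; distances are plain Euclidean): start from B containing every simulated point and repeatedly drop the point whose removal increases the weighted k-medoid objective the least, until |B| = k.

```python
# Greedy descent for the weighted k-medoid objective: a sketch of the removal loop above.
import numpy as np

def kmedoid_objective(points, weights, B_idx):
    """Weighted sum of distances from every simulated point to its nearest point in B."""
    d = np.linalg.norm(points[:, None, :] - points[B_idx][None, :, :], axis=2)
    return float(np.sum(weights * d.min(axis=1)))

def greedy_kmedoid_batch(points, weights, k):
    B = list(range(len(points)))                     # initialize batch to all data points
    while len(B) > k:
        # objective after removing each candidate point from B
        costs = [kmedoid_objective(points, weights, [j for j in B if j != i]) for i in B]
        del B[int(np.argmin(costs))]                 # drop the point that influences the objective the least
    return points[B]

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(50, 2))
B = greedy_kmedoid_batch(pts, np.full(50, 1.0 / 50), k=5)
```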
46 The k cluster centers are returned as our batch B. [sent-129, score-0.613]
47 The k-medoid objective is well known [22] and the weighted k-medoid clustering algorithm [11] has been shown to perform well and be robust to outliers in the data. [sent-130, score-0.144]
48 The input to the algorithm is the set of weighted experiments and the batch size k. [sent-133, score-0.691]
49 The algorithm initializes the batch B to include all of the input experiments, which achieves the minimum objective value of zero. [sent-134, score-0.694]
50 This greedy algorithm is motivated by theoretical results on the minimization of non-increasing, supermodular set functions. [sent-136, score-0.18]
51 Thus, a set function is supermodular if adding an element to a smaller set provides no less improvement than adding the element to a larger set. [sent-139, score-0.151]
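Stating the verbal definition above as a formula (B' is used here for the larger set to avoid clashing with the batch B):

```latex
\[
  g(A) - g(A \cup \{x\}) \;\ge\; g(B') - g(B' \cup \{x\})
  \qquad \text{for all } A \subseteq B' \text{ and } x \notin B',
\]
which, rearranged, is the usual supermodularity condition
$g(A \cup \{x\}) - g(A) \le g(B' \cup \{x\}) - g(B')$; non-increasing means
$g(A) \ge g(A \cup \{x\})$ for every $A$ and $x$.
```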
52 It can be shown that our k-medoid objective function of (1) is both a non-increasing and supermodular function of B and achieves a minimum value of zero for B = ∪i Si. [sent-141, score-0.184]
53 Also note that the greedy algorithm we use is qualitatively different from the one used for submodular maximization, since it greedily removes elements from B rather than greedily adding elements to B. [sent-151, score-0.182]
54 Our batch selection approach described above requires that we maintain a posterior over the unknown function f . [sent-153, score-0.736]
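The extracted sentences do not name the posterior model at this point; a Gaussian process is the standard choice in BO and a reasonable reading here (the word list below also contains 'gp'). A minimal sketch, assuming a GP, of refitting the posterior after each evaluated batch:

```python
# Maintaining a posterior over f, assuming a Gaussian process: refit on all observed data D
# after every batch of evaluations (a sketch, not the authors' implementation).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def fit_posterior(X, y):
    kernel = ConstantKernel(1.0) * RBF(length_scale=0.2)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 2))
y = np.sin(3 * X).sum(axis=1)                    # toy stand-in for the unknown f
gp = fit_posterior(X, y)                         # posterior P(f | D)
mu, sigma = gp.predict(X[:5], return_std=True)   # posterior mean and std at some inputs
```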
55 Our batch selection approach also requires a base sequential policy π to be used for simulation matching. [sent-163, score-1.166]
56 This policy must be able to select the next experiment given any set of prior experimental observations D. [sent-164, score-0.351]
57 In our experiments, we use a policy based on the Maximum Expected Improvement (MEI) heuristic [14, 10] which is a very successful sequential policy for BO and has been shown to converge in the limit to the global optimum. [sent-165, score-0.691]
58 Given data D the MEI policy simply selects the next experiment to be the one that maximizes the expected improvement over the current set of experiments with respect to maximizing the unknown function. [sent-166, score-0.464]
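Under a Gaussian (e.g. GP) posterior, expected improvement has a standard closed form; the sketch below implements that common form together with a grid-based stand-in for maximizing MEI(x), and is not claimed to be the authors' exact implementation.

```python
# Maximum Expected Improvement (MEI) policy: pick the input maximizing the usual
# closed-form expected improvement under a Gaussian posterior (a sketch).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_max):
    sigma = np.maximum(sigma, 1e-12)              # guard against zero predictive variance
    z = (mu - y_max) / sigma
    return (mu - y_max) * norm.cdf(z) + sigma * norm.pdf(z)

def mei_policy(gp, candidates, y_max):
    """Grid stand-in for maximizing MEI(x) over the continuous input space X."""
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[int(np.argmax(expected_improvement(mu, sigma, y_max)))]
```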
59 Note that we have also evaluated our simulation-matching approach with an alternative sequential policy known as Maximum Probability of Improvement [16, 10]. [sent-171, score-0.466]
60 The computation of the MEI policy requires maximizing MEI(x) over the input space X . [sent-173, score-0.267]
61 To the best of our knowledge there is no well-known batch policy for Bayesian optimization. [sent-177, score-0.812]
62 The first baseline is random selection, where a batch of k random experiments is returned at each step. [sent-179, score-0.685]
63 Interestingly, in the case of batch active learning for classification, the random batch selection strategy … [Table 1: Benchmark Functions, including Cosines and Rosenbrock.] [sent-180, score-0.616]
64 Our second, more sophisticated, baseline is based on selecting a batch of experiments whose expected maximum output is the largest. [sent-191, score-0.745]
65 More formally, we consider selecting a size k batch B that maximizes the objective Ef [maxx∈B f (x) | D], which we will refer to as the EMAX objective. [sent-192, score-0.723]
66 The algorithm starts with an empty B and greedily adds experiments to B, each time selecting the one that improves the EMAX objective the most. [sent-203, score-0.193]
67 Therefore, to implement the greedy algorithm, which requires many evaluations of the EMAX objective, we use Monte Carlo sampling, where for a given set B we sample the corresponding normally distributed vector and average the maximum values across the samples. [sent-205, score-0.144]
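A sketch of the EMAX baseline just described (function names are mine): B is grown greedily, and each candidate addition is scored by a Monte-Carlo estimate of E[max_{x∈B} f(x) | D] obtained by jointly sampling the posterior, here assumed to be a fitted scikit-learn GaussianProcessRegressor over a small candidate pool.

```python
# Greedy EMAX baseline with Monte-Carlo scoring (a sketch, not the authors' code).
import numpy as np

def emax_mc(gp, B, n_samples=200, seed=0):
    """Monte-Carlo estimate of E[max_{x in B} f(x) | D] under the joint GP posterior."""
    samples = gp.sample_y(np.atleast_2d(B), n_samples=n_samples, random_state=seed)
    return float(samples.max(axis=0).mean())      # max over points in B, mean over draws

def emax_greedy_batch(gp, candidates, k):
    B = []
    for _ in range(k):
        scores = [emax_mc(gp, B + [c]) for c in candidates]   # candidates: rows of a small pool
        B.append(candidates[int(np.argmax(scores))])
    return np.array(B)
```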
68 5 Experimental Results In this section we evaluate our proposed batch BO approach and the baseline approaches on six different benchmarks. [sent-206, score-0.662]
69 The Hydrogen data set is the result of data collected as part of a study on biosolar hydrogen production [6], where the goal was to maximize the hydrogen production of a particular bacterium by optimizing the pH and nitrogen levels of the growth medium. [sent-212, score-0.435]
70 5.2 Results Figures 2 and 3 show the performance of our methods on all six benchmark functions for batch sizes 5 and 10 respectively. [sent-221, score-0.673]
71 Hence the regret is always positive and smaller values are preferred. [sent-225, score-0.145]
72 Each run of a policy initializes the data set to contain 5 randomly selected experiments for the 2-d functions and 20 random initial experiments for the higher dimensional functions. [sent-226, score-0.391]
73 [Figure 2: Performance evaluation with batch size 5; panels include Rosenbrock, Cart-Pole, and Michalewicz, plotted against the number of experiments.] [sent-323, score-0.606]
74 [Figure 3: Performance evaluation with batch size 10; panels include Cart-Pole.] [sent-366, score-0.606]
75 In addition, for reference we plot the performance of the base Sequential MEI BO policy (k = 1) on each graph. [sent-368, score-0.254]
76 Note that since the batch approaches request either 5 or 10 experiments at a time, their curves only contain data points at those intervals. [sent-369, score-0.673]
77 For example, for the batch size 5 results the first point on a batch curve corresponds to 10 experiments, including the initial 5 experiments and the first requested batch. [sent-370, score-1.289]
78 The next point on the batch curve is for 15 experiments which includes the next requested batch and so on. [sent-371, score-1.289]
79 In contrast, the Sequential policy has a point at every step since it requests experiments one at a time. [sent-372, score-0.273]
80 It is important to realize that we generally expect a good sequential policy to do better, or no worse, than a batch policy with respect to performance per number of experiments. [sent-373, score-1.278]
81 Thus, the Sequential curve can be typically viewed as an upper performance bound and provides an indication of how much loss is incurred when moving to a batch setting in terms of efficiency per experiment. [sent-374, score-0.614]
82 The major observation from our results is that for all benchmarks and for both batch sizes the proposed k-means and k-medoid approaches significantly outperform the baselines. [sent-376, score-0.629]
83 This provides strong validation for our proposed simulation-matching approach to batch selection. [sent-377, score-0.587]
84 However, for both batch sizes k-medoid often does show a small improvement over k-means and appears to have a significant advantage on FuelCell. [sent-381, score-0.614]
85 The advantage of Sequential over our batch approaches varies with the benchmark. [sent-386, score-0.607]
86 However, in most cases, our proposed batch approaches catch up to Sequential in a relatively small number of experiments and in some cases, the batch policies are similar to Sequential from the start. [sent-387, score-1.372]
87 The main exception is Cart-Pole for batch size 10, where the batch policies appear to be significantly less efficient in terms of performance versus number of experiments. [sent-388, score-1.304]
88 Generally, we see that the difference between our batch policies and Sequential is larger for batch size 10 than batch size 5, which is expected, since larger batch sizes imply that less information per experiment is used in making decisions. [sent-389, score-2.539]
89 It is clear, however, that if we evaluate the performance of our batch policies in terms of experimental time, then there is a very significant advantage over Sequential. [sent-390, score-0.741]
90 In particular, the amount of experimental time for a policy is approximately equal to the number of requested batches, assuming that the batch size is selected to allow for all selected experiments to be run in parallel. [sent-391, score-0.988]
91 This means, for example, that for the batch size 5 results, 5 time steps for the batch approaches correspond to 30 total experiments (5 initial + 5 batches). [sent-392, score-1.242]
92 In all cases, the batch policies yield a very large improvement in regret reduction per unit time, which is the primary motivation for batch selection. [sent-394, score-1.476]
93 6 Summary and Future Work In this paper we introduced a novel approach to batch BO based on the idea of simulation matching. [sent-395, score-0.642]
94 The key idea of our approach is to design batches of experiments that approximately match the expected performance of high-quality sequential policies for BO. [sent-396, score-0.562]
95 We considered two variants of the matching problem and showed that both approaches significantly outperformed two baselines including random batch selection on six benchmark functions. [sent-397, score-0.774]
96 For future work we plan to consider the general idea of simulation matching for other problems, such as active learning, where there are also good sequential policies and batch selection is often warranted. [sent-398, score-1.078]
97 In addition, we plan to consider less myopic approaches for selecting each batch, as well as the problem of batch size selection, where the choice of batch size must take into account the current data and the experimental budget. [sent-399, score-1.862]
98 Optimization of pH and nitrogen for enhanced hydrogen production by Synechocystis sp. [sent-445, score-0.264]
99 An approximation guarantee of the greedy descent algorithm for minimizing a supermodular set function. [sent-461, score-0.198]
100 Improved fuel cell and electrode designs for producing electricity from microbial degradation. [sent-505, score-0.164]
wordName wordTfidf (topN-words)
[('batch', 0.587), ('emax', 0.319), ('bo', 0.242), ('sequential', 0.241), ('policy', 0.225), ('experimets', 0.212), ('medoid', 0.212), ('hydrogen', 0.159), ('mei', 0.157), ('regret', 0.145), ('policies', 0.13), ('supermodular', 0.124), ('si', 0.119), ('batches', 0.114), ('gp', 0.1), ('cosines', 0.089), ('rosenbrock', 0.089), ('fuel', 0.078), ('michalewicz', 0.071), ('posterior', 0.068), ('experiment', 0.061), ('objective', 0.06), ('selecting', 0.057), ('greedy', 0.056), ('nn', 0.055), ('benchmark', 0.055), ('simulation', 0.055), ('unknown', 0.052), ('submodular', 0.051), ('production', 0.048), ('experiments', 0.048), ('fern', 0.047), ('evaluations', 0.046), ('normally', 0.042), ('select', 0.041), ('requested', 0.04), ('costly', 0.04), ('weighted', 0.036), ('baselines', 0.035), ('azimi', 0.035), ('fuelcell', 0.035), ('microbial', 0.035), ('nitrogen', 0.035), ('ymax', 0.035), ('means', 0.033), ('six', 0.031), ('steepness', 0.031), ('obj', 0.031), ('requesting', 0.031), ('bayesian', 0.031), ('clustering', 0.03), ('closed', 0.03), ('selection', 0.029), ('base', 0.029), ('expected', 0.029), ('greedily', 0.028), ('mode', 0.027), ('improvement', 0.027), ('curve', 0.027), ('initializes', 0.027), ('nemhauser', 0.027), ('inputs', 0.027), ('cell', 0.026), ('objectives', 0.026), ('returned', 0.026), ('electricity', 0.025), ('baseline', 0.024), ('optimization', 0.024), ('experimental', 0.024), ('outcome', 0.023), ('benchmarks', 0.022), ('maxx', 0.022), ('run', 0.022), ('simulations', 0.022), ('controller', 0.022), ('ph', 0.022), ('minx', 0.022), ('maximizing', 0.022), ('contour', 0.021), ('selected', 0.021), ('optimizing', 0.021), ('input', 0.02), ('approaches', 0.02), ('variant', 0.019), ('optimize', 0.019), ('active', 0.019), ('removes', 0.019), ('evaluation', 0.019), ('refer', 0.019), ('matches', 0.018), ('minimizing', 0.018), ('sin', 0.018), ('outliers', 0.018), ('curves', 0.018), ('attempt', 0.018), ('simulated', 0.017), ('density', 0.017), ('matching', 0.017), ('outcomes', 0.017), ('yk', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern
Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1
2 0.18870954 189 nips-2010-On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient
Author: Tang Jie, Pieter Abbeel
Abstract: Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (i) using the past experience to estimate only the gradient of the expected return U (θ) at the current policy parameterization θ, rather than to obtain a more complete estimate of U (θ), and (ii) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines—a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds. 1
3 0.17239675 152 nips-2010-Learning from Logged Implicit Exploration Data
Author: Alex Strehl, John Langford, Lihong Li, Sham M. Kakade
Abstract: We provide a sound and consistent foundation for the use of nonrandom exploration data in “contextual bandit” or “partially labeled” settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which “offline” data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.
4 0.16068709 194 nips-2010-Online Learning for Latent Dirichlet Allocation
Author: Matthew Hoffman, Francis R. Bach, David M. Blei
Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
5 0.1503675 184 nips-2010-Nonparametric Bayesian Policy Priors for Reinforcement Learning
Author: Finale Doshi-velez, David Wingate, Nicholas Roy, Joshua B. Tenenbaum
Abstract: We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning. 1
6 0.1488024 208 nips-2010-Policy gradients in linearly-solvable MDPs
7 0.14203064 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement
8 0.1372159 14 nips-2010-A Reduction from Apprenticeship Learning to Classification
9 0.12891723 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks
10 0.12390757 23 nips-2010-Active Instance Sampling via Matrix Partition
11 0.12032703 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration
12 0.1189056 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
13 0.11803778 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning
14 0.09117227 43 nips-2010-Bootstrapping Apprenticeship Learning
15 0.090929821 134 nips-2010-LSTD with Random Projections
16 0.09052138 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
17 0.084021039 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
18 0.0752462 183 nips-2010-Non-Stochastic Bandit Slate Problems
19 0.074805655 4 nips-2010-A Computational Decision Theory for Interactive Assistants
20 0.074578367 269 nips-2010-Throttling Poisson Processes
topicId topicWeight
[(0, 0.183), (1, -0.226), (2, 0.011), (3, -0.015), (4, -0.049), (5, 0.012), (6, 0.077), (7, 0.004), (8, 0.031), (9, -0.081), (10, 0.075), (11, 0.019), (12, -0.006), (13, -0.024), (14, -0.057), (15, 0.045), (16, -0.006), (17, 0.019), (18, -0.015), (19, 0.035), (20, 0.021), (21, -0.015), (22, -0.039), (23, 0.02), (24, -0.085), (25, -0.034), (26, 0.061), (27, 0.0), (28, -0.08), (29, 0.017), (30, 0.029), (31, -0.115), (32, 0.044), (33, -0.045), (34, -0.022), (35, -0.019), (36, -0.087), (37, -0.086), (38, 0.08), (39, 0.01), (40, -0.0), (41, 0.003), (42, -0.026), (43, -0.068), (44, 0.037), (45, -0.053), (46, 0.037), (47, 0.065), (48, 0.053), (49, -0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.94570106 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern
Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1
2 0.70024669 208 nips-2010-Policy gradients in linearly-solvable MDPs
Author: Emanuel Todorov
Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.
3 0.69289196 152 nips-2010-Learning from Logged Implicit Exploration Data
Author: Alex Strehl, John Langford, Lihong Li, Sham M. Kakade
Abstract: We provide a sound and consistent foundation for the use of nonrandom exploration data in “contextual bandit” or “partially labeled” settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which “offline” data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.
4 0.69058055 189 nips-2010-On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient
Author: Tang Jie, Pieter Abbeel
Abstract: Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (i) using the past experience to estimate only the gradient of the expected return U (θ) at the current policy parameterization θ, rather than to obtain a more complete estimate of U (θ), and (ii) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines—a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds. 1
5 0.67963934 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement
Author: Jeffrey Johns, Christopher Painter-wakefield, Ronald Parr
Abstract: Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-theshelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization. 1
6 0.64749104 14 nips-2010-A Reduction from Apprenticeship Learning to Classification
7 0.63533539 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration
8 0.63462174 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks
9 0.62393337 184 nips-2010-Nonparametric Bayesian Policy Priors for Reinforcement Learning
10 0.58230025 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning
11 0.51225334 43 nips-2010-Bootstrapping Apprenticeship Learning
12 0.48645878 194 nips-2010-Online Learning for Latent Dirichlet Allocation
13 0.48292607 183 nips-2010-Non-Stochastic Bandit Slate Problems
14 0.47516054 134 nips-2010-LSTD with Random Projections
15 0.46699083 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
16 0.46006364 269 nips-2010-Throttling Poisson Processes
17 0.45814303 50 nips-2010-Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories
18 0.43685895 226 nips-2010-Repeated Games against Budgeted Adversaries
19 0.43505296 202 nips-2010-Parallelized Stochastic Gradient Descent
20 0.42057666 23 nips-2010-Active Instance Sampling via Matrix Partition
topicId topicWeight
[(6, 0.293), (13, 0.047), (27, 0.087), (30, 0.06), (35, 0.015), (45, 0.205), (50, 0.046), (52, 0.022), (60, 0.019), (77, 0.064), (78, 0.012), (90, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.73859626 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern
Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1
2 0.73434597 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semisupervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
3 0.64205545 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1
4 0.63976115 98 nips-2010-Functional form of motion priors in human motion perception
Author: Hongjing Lu, Tungyou Lin, Alan Lee, Luminita Vese, Alan L. Yuille
Abstract: It has been speculated that the human motion system combines noisy measurements with prior expectations in an optimal, or rational, manner. The basic goal of our work is to discover experimentally which prior distribution is used. More specifically, we seek to infer the functional form of the motion prior from the performance of human subjects on motion estimation tasks. We restricted ourselves to priors which combine three terms for motion slowness, first-order smoothness, and second-order smoothness. We focused on two functional forms for prior distributions: L2-norm and L1-norm regularization corresponding to the Gaussian and Laplace distributions respectively. In our first experimental session we estimate the weights of the three terms for each functional form to maximize the fit to human performance. We then measured human performance for motion tasks and found that we obtained better fit for the L1-norm (Laplace) than for the L2-norm (Gaussian). We note that the L1-norm is also a better fit to the statistics of motion in natural environments. In addition, we found large weights for the second-order smoothness term, indicating the importance of high-order smoothness compared to slowness and lower-order smoothness. To validate our results further, we used the best fit models using the L1-norm to predict human performance in a second session with different experimental setups. Our results showed excellent agreement between human performance and model prediction – ranging from 3% to 8% for five human subjects over ten experimental conditions – and give further support that the human visual system uses an L1-norm (Laplace) prior.
5 0.63837308 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
6 0.63613969 268 nips-2010-The Neural Costs of Optimal Control
7 0.63589168 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
8 0.63587797 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
9 0.63572311 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
10 0.63531965 158 nips-2010-Learning via Gaussian Herding
11 0.63510817 17 nips-2010-A biologically plausible network for the computation of orientation dominance
12 0.63473225 63 nips-2010-Distributed Dual Averaging In Networks
13 0.63384944 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
14 0.63357937 117 nips-2010-Identifying graph-structured activation patterns in networks
15 0.6323238 194 nips-2010-Online Learning for Latent Dirichlet Allocation
16 0.63209426 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
17 0.63109004 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
18 0.63098663 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA
19 0.63075626 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
20 0.62982345 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models