nips nips2012 nips2012-61 knowledge-graph by maker-knowledge-mining

61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence


Source: pdf

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. [sent-2, score-0.146]

2 We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. [sent-5, score-0.363]

3 Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. [sent-6, score-0.146]

4 In this problem, a forecaster repeatedly selects an arm and observes a sample drawn from its reward distribution during an exploration phase, and then is asked to return the best arm(s). [sent-8, score-0.906]

5 Unlike the standard multi-armed bandit problem, where the goal is to maximize the cumulative sum of rewards obtained by the forecaster (see e.g., [15, 2]). [sent-9, score-0.219]

6 In this problem, the forecaster is evaluated on the quality of the arm(s) returned at the end of the exploration phase. [sent-11, score-0.244]

7 That is, the forecaster is evaluated on the best arm identification itself, and it is not interested in the scores collected during the test phase. [sent-18, score-0.672]

8 In the fixed budget setting (e.g., [3, 1]), the number of rounds of the exploration phase is fixed and is known by the forecaster, and the objective is to maximize the probability of returning the best arm(s). [sent-25, score-0.227]

9 They also introduced an elimination algorithm, called Successive Rejects, which divides the budget n into phases and discards one arm per phase. [sent-32, score-0.791]

10 [8] considered the extension of the best arm identification problem to the multi-bandit setting, where the objective is to return the best arm for each bandit. [sent-36, score-1.378]

11 [4] extended the previous results to the problem of m-best arm identification and introduced a new version of the Successive Rejects algorithm (with accept and reject) that is able to return the set of the m-best arms with high probability. [sent-38, score-0.969]

12 In the fixed confidence setting (e.g., [12, 6]), the forecaster tries to minimize the number of rounds needed to achieve a fixed confidence about the quality of the returned arm(s). [sent-42, score-0.248]

13 They designed an elimination algorithm, called Hoeffding Races, based on progressively discarding the arms that are suboptimal with enough confidence. [sent-47, score-0.377]

14 [6] studied the fixed confidence setting without any budget constraint and designed an elimination algorithm able to return an arm with a required accuracy. [sent-51, score-0.868]

15 Kalyanakrishnan & Stone [10] further extended this approach to the case where the m-best arms must be returned with a given confidence. [sent-54, score-0.368]

16 [11] recently introduced an algorithm for the case of m-best arm identification along with a thorough theoretical analysis showing the number of rounds needed to achieve the desired confidence. [sent-56, score-0.736]

17 Although the fixed budget and fixed confidence problems have been studied separately, they display several similarities. [sent-57, score-0.146]

18 In this paper, we propose a unified approach to these two settings in the general case of m-best arm identification with accuracy ε. [sent-58, score-0.672]

19 In Section 3, we propose a novel meta-algorithm, called unified gap-based exploration (UGapE), which uses the same arm selection and (arm) return strategies for the two settings. [sent-60, score-0.706]

20 For example, the case of ε = 0 has not been studied in the fixed budget setting. [sent-63, score-0.146]

21 We also provide a thorough empirical evaluation of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms in Appendix C of [7]. [sent-68, score-0.161]

22 In particular, we show that the probability of success in the fixed budget setting as well as the sample complexity in the fixed confidence setting strictly depend on the inverse of the gaps of the arms and the desired accuracy ε. [sent-75, score-0.608]

23 The analysis also extends to other confidence bounds (e.g., Bernstein-based bounds) and to more complex settings, such as the multi-bandit best arm identification problem introduced in [8]. [sent-79, score-0.632]

24 Let A = {1, . . . , K} be the set of arms such that each arm k ∈ A is characterized by a distribution νk bounded in [0, b] with mean µk and variance σk². [sent-84, score-0.935]

25 We define the m-max and m-argmax operators as µ(m) = m-max_{k∈A} µk and (m) = m-argmax_{k∈A} µk, where (m) denotes the index of the m-th best arm in A and µ(m) is its corresponding mean, so that µ(1) ≥ µ(2) ≥ . . . ≥ µ(K). [sent-85, score-0.732]

26 We denote by S^m a subset of m arms (i.e., |S^m| = m < K) and by S^{m,*} the subset of the m best arms. [sent-91, score-0.347]

27 Note that when ε = 0 and m = 1, this reduces to the standard best arm identification problem. [sent-94, score-0.632]

28 For each arm k ∈ A, we define the gap ∆k as ∆k = µk − µ(m+1) if k ∈ S*, and ∆k = µ(m) − µk if k ∉ S*. [sent-106, score-0.634]

29 This definition of the gap indicates that if k ∈ S*, ∆k represents the "advantage" of arm k over the suboptimal arms, and if k ∉ S*, ∆k denotes how suboptimal arm k is. [sent-107, score-1.278]

30 Given an accuracy ε and a number of arms m, we say that an arm k is (ε,m)-optimal if µk ≥ µ(m) − ε. [sent-109, score-0.96]

31 Thus, we define the (ε,m)-best arm identification problem as the problem of finding a set S of m (ε,m)-optimal arms. [sent-110, score-0.672]
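
To make these definitions concrete, here is a minimal Python sketch of the gap ∆k and the (ε,m)-optimality test; the arm means below are hypothetical, purely for illustration.

```python
import numpy as np

mu = np.array([0.9, 0.8, 0.75, 0.5, 0.4])    # hypothetical arm means
m, eps = 2, 0.05

order = np.argsort(mu)[::-1]                  # arms sorted by decreasing mean
S_star = set(order[:m].tolist())              # S*, the m best arms
mu_m = mu[order[m - 1]]                       # mu_(m)
mu_m1 = mu[order[m]]                          # mu_(m+1)

# Gap of sentence 28: advantage over the suboptimal arms if k is in S*,
# degree of sub-optimality otherwise.
gap = np.array([mu[k] - mu_m1 if k in S_star else mu_m - mu[k]
                for k in range(len(mu))])

# Sentence 30: arm k is (eps, m)-optimal iff mu_k >= mu_(m) - eps.
eps_m_optimal = mu >= mu_m - eps
print(gap, eps_m_optimal)
```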

32 The (ε,m)-best arm identification problem can be formalized as a game between a stochastic bandit environment and a forecaster. [sent-111, score-0.69]

33 At each round t, the forecaster pulls an arm I(t) ∈ A and observes an independent sample drawn from the distribution νI(t) . [sent-113, score-0.841]

34 The forecaster estimates the expected value of each arm by computing the average of the samples observed over time. [sent-114, score-0.749]

35 Let Tk(t) be the number of times that arm k has been pulled by the end of round t; then the mean of this arm is estimated as µk(t) = (1/Tk(t)) Σ_{s=1}^{Tk(t)} Xk(s), where Xk(s) is the s-th sample observed from νk. [sent-115, score-1.289]
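
Sentence 35's estimator is usually maintained incrementally rather than by re-averaging all samples; a minimal sketch:

```python
# Incremental form of mu_k(t) = (1/T_k(t)) * sum_s X_k(s):
# after observing a new sample x from arm k, the running mean is updated
# in O(1) instead of re-averaging the whole sample history.
def update_mean(mu_hat_k: float, T_k: int, x: float) -> tuple[float, int]:
    T_k += 1
    mu_hat_k += (x - mu_hat_k) / T_k
    return mu_hat_k, T_k
```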

36 For any arm k ∈ A, we define the notion of arm simple regret as rk = µ(m) − µk (1), and for any set S ⊂ A of m arms, we define the simple regret as rS = max_{k∈S} rk = µ(m) − min_{k∈S} µk (2). [sent-116, score-1.492]

37 We denote by Ω(t) ⊂ A the set of m arms returned by the forecaster at the end of the exploration phase (when the algorithm stops). [sent-117, score-0.609]

38 Returning m (ε,m)-optimal arms is then equivalent to having rΩ(t) smaller than ε. [sent-119, score-0.347]
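
A short sketch of the simple regret of Eq. 1-2 and of the equivalence stated in sentence 38 (values hypothetical):

```python
import numpy as np

mu = np.array([0.9, 0.8, 0.75, 0.5, 0.4])   # hypothetical arm means
m, eps = 2, 0.05
mu_m = np.sort(mu)[::-1][m - 1]             # mu_(m), the m-th best mean

r = mu_m - mu                               # Eq. 1: r_k = mu_(m) - mu_k

def set_regret(S):                          # Eq. 2: r_S = max_{k in S} r_k
    return mu_m - min(mu[k] for k in S)

Omega = [0, 2]                              # a candidate set of m returned arms
# Sentence 38: returning m (eps, m)-optimal arms  <=>  r_Omega <= eps.
print(set_regret(Omega), set_regret(Omega) <= eps)
```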

39 Given an accuracy ε and a number of arms m to return, we now formalize the two settings of fixed budget and fixed confidence. [sent-120, score-0.533]

40 The objective is to design a forecaster capable of returning a set of m (ε,m)-optimal arms with the largest possible confidence using a fixed budget of n rounds. [sent-122, score-0.647]

41 More formally, given a budget n, the performance of the forecaster is measured by the probability δ of not meeting the (ε,m) requirement, i.e., δ = P(rΩ(n) ≥ ε). [sent-123, score-0.285]

42 The goal is to design a forecaster that stops as soon as possible and returns a set of m (ε,m)-optimal arms with a fixed confidence. [sent-127, score-0.522]

43 Given a confidence level δ, the forecaster has to guarantee that P(rΩ(n) ≥ ε) ≤ δ. [sent-129, score-0.139]

44 The performance of the forecaster is then measured by the number of rounds n either in expectation or high probability. [sent-130, score-0.205]

45 Although these settings have been considered as two distinct problems, in Section 3 we introduce a unified arm selection strategy that can be used in both cases by simply changing the stopping criteria. [sent-131, score-0.713]

46 As shown in Figure 1, both fixed-budget (UGapEb) and fixed-confidence (UGapEc) instances of UGapE use the same arm-selection strategy, SELECT-ARM (described in Figure 2), and upon stopping, return the m-best arms in the same manner (using Ω). [sent-134, score-0.359]

47 More precisely, both algorithms receive as input the definition of the problem ( , m), a constraint (the 3 budget n in UGapEb and the confidence level δ in UGapEc), and a parameter (a or c). [sent-136, score-0.146]

48 While UGapEb runs for n rounds and then returns the set of arms Ω(n), UGapEc runs until it achieves the desired accuracy with the requested confidence level δ. [sent-137, score-0.457]

49 This difference is due to the two different objectives targeted by the algorithms; while UGapEc optimizes its budget for a given confidence level, UGapEb’s goal is to optimize the quality of its recommendation for a fixed budget. [sent-138, score-0.146]

50 Regardless of the final objective, how to select an arm at each round (the arm-selection strategy) is the key component of any multi-armed bandit algorithm. [Figure 2, SELECT-ARM(t): compute Bk(t) for each arm k ∈ A; identify the set of m arms J(t) with minimal Bk(t); pull the arm I(t) = arg max_{k∈{lt,ut}} βk(t−1).] [sent-143, score-1.318]

51 One of the most important features of UGapE is having a unique arm-selection strategy for the fixed-budget and fixed-confidence settings. [sent-146, score-1.015]

52 UGapE computes Uk(t) and Lk(t) for each arm k ∈ A, where, ∀t, ∀k ∈ A, Uk(t) = µk(t−1) + βk(t−1) and Lk(t) = µk(t−1) − βk(t−1). [sent-150, score-0.61]

53 Here, βk(t−1) is a confidence interval, and Uk(t) and Lk(t) are high-probability upper and lower bounds on the mean of arm k, µk, after t − 1 rounds. [sent-152, score-0.628]
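
The paper keeps βk(t−1) generic so that other concentration inequalities (e.g., Bernstein, sentence 54) can be plugged in. Below is a minimal sketch with a Hoeffding-style radius; the exact constants used by UGapEb/UGapEc differ, so treat this form as an assumption rather than the paper's definition.

```python
import math

def beta(T_k: int, b: float, a: float) -> float:
    # Hoeffding-style confidence radius after T_k pulls of an arm with
    # rewards bounded in [0, b]; `a` plays the role of UGapE's
    # exploration parameter (exact form assumed, not quoted from [7]).
    return b * math.sqrt(a / T_k)

def conf_bounds(mu_hat_k: float, T_k: int, b: float, a: float):
    r = beta(T_k, b, a)
    return mu_hat_k + r, mu_hat_k - r       # U_k(t), L_k(t)
```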

54 Defining the confidence interval in a general form βk (t − 1) allows us to easily extend the algorithm by taking into account different (higher) moments of the arms (see Appendix B of [7] for the case of variance, where βk (t − 1) is obtained from the Bernstein inequality). [sent-157, score-0.374]

55 From Eq. 3, we may see that the index Bk(t) is an upper bound on the simple regret rk of the k-th arm (see Eq. 1). [sent-159, score-0.754]

56 Similar to the arm index, BS is also defined in order to upper-bound the simple regret rS with high probability (see Lemma 1). [sent-162, score-0.718]

57 After computing the arm indices, UGapE finds a set of m arms J(t) with minimum upper bounds. [sent-163, score-0.935]

58 From J(t), it computes two arm indices ut = arg max_{j∉J(t)} Uj(t) and lt = arg min_{i∈J(t)} Li(t). (To be more precise, βk(t−1) is the width of a confidence interval or a confidence radius.) [sent-168, score-0.975]

59 In both cases the tie is broken in favor of the arm with the largest uncertainty β(t−1). [sent-169, score-0.61]

60 Arms lt and ut are the worst possible arm among those in J(t) and the best possible arm left outside J(t), respectively, and together they represent how bad the choice of J(t) could be. [sent-170, score-1.501]

61 Finally, the algorithm selects and pulls the arm I(t) as the arm with the larger β(t−1) among ut and lt, observes a sample X_{I(t)}(T_{I(t)}(t−1) + 1) from the distribution ν_{I(t)}, and updates the empirical mean µ_{I(t)}(t) and the number of pulls T_{I(t)}(t) of the selected arm I(t). [sent-171, score-2.214]
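
Sentences 50-61 assemble into one round of SELECT-ARM. Below is a hedged Python sketch: the B-index is taken as B_k(t) = (m-th largest U_i(t) over i ≠ k) − L_k(t), which is reconstructed from the text rather than quoted from the paper, and the Hoeffding-style radius from the previous sketch is reused.

```python
import numpy as np

def select_arm(mu_hat, T, b, a, m):
    """One SELECT-ARM round of UGapE (a sketch; B-index form assumed)."""
    beta = b * np.sqrt(a / T)                 # confidence radius per arm
    U, L = mu_hat + beta, mu_hat - beta       # U_k(t), L_k(t)

    K = len(mu_hat)
    B = np.empty(K)
    for k in range(K):
        others = np.sort(np.delete(U, k))[::-1]
        B[k] = others[m - 1] - L[k]           # m-th largest U among i != k, minus L_k

    J = np.argsort(B)[:m]                     # m arms with minimal index B_k(t)
    outside = np.setdiff1d(np.arange(K), J)
    u_t = outside[np.argmax(U[outside])]      # best possible arm left outside J(t)
    l_t = J[np.argmin(L[J])]                  # worst possible arm inside J(t)

    I_t = u_t if beta[u_t] >= beta[l_t] else l_t   # pull the more uncertain one
    return I_t, J, B

# Hypothetical state after an initialization phase plus a few pulls:
mu_hat = np.array([0.7, 0.6, 0.55, 0.2])
T = np.array([5.0, 5.0, 4.0, 6.0])
print(select_arm(mu_hat, T, b=1.0, a=2.0, m=2))
```

The tie-breaking in favor of the largest β(t−1) (sentence 59) is omitted here for brevity.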

62 1) While UGapEc defines the set of returned arms as Ω(t) = J(t), UGapEb returns the set of arms J(t) with the smallest index B_{J(t)}(t) over all rounds t ≤ n. [sent-173, score-0.71]

63 2) UGapEc stops (we refer to the number of rounds before stopping as n) when B_{J(n+1)}(n+1) is less than the given accuracy ε. [sent-179, score-0.177]

64 That is, it stops when even the m-th worst upper bound on the arm simple regret among all the arms in the selected set J(n+1) is smaller than ε. [sent-181, score-1.065]
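
Sentences 62-64 then wrap this routine with the two stopping/return rules. A sketch under the same assumptions, reusing select_arm from the sketch above; the UGapEb return rule is reconstructed from sentence 62, and pull(k) stands for any callback returning one sample of νk (e.g. pull = lambda k: rng.binomial(1, p[k]) for Bernoulli arms).

```python
import numpy as np

def init(pull, K):
    # Initialization phase: pull each arm once (cf. sentences 83 and 90).
    mu_hat = np.array([float(pull(k)) for k in range(K)])
    return mu_hat, np.ones(K)

def ugape_b(pull, K, n, b, a, m):
    """Fixed budget: run for n rounds, return the J(t) whose set-index
    B_J(t) (the m-th smallest B-value) was smallest over all rounds."""
    mu_hat, T = init(pull, K)
    best_val, best_J = np.inf, None
    for _ in range(K, n):
        I, J, B = select_arm(mu_hat, T, b, a, m)
        B_J = np.sort(B)[m - 1]
        if B_J < best_val:
            best_val, best_J = B_J, J
        x = pull(I)
        T[I] += 1
        mu_hat[I] += (x - mu_hat[I]) / T[I]    # incremental mean update
    return best_J                              # Omega(n)

def ugape_c(pull, K, eps, b, c, m):
    """Fixed confidence: stop as soon as B_J(t) < eps, return Omega = J(t)."""
    mu_hat, T = init(pull, K)
    while True:
        I, J, B = select_arm(mu_hat, T, b, c, m)
        if np.sort(B)[m - 1] < eps:            # stopping condition of sentence 63
            return J
        x = pull(I)
        T[I] += 1
        mu_hat[I] += (x - mu_hat[I]) / T[I]
```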

65 Note that event E plays an important role in the sequel, since it allows us to first derive a series of results that are directly implied by the event E and to postpone the study of the stochastic nature of the problem (i.e., the probability of E itself). [sent-195, score-0.19]

66 In particular, when E holds, we have that for any arm k ∈ A and at any time t, Lk (t) ≤ µk ≤ Uk (t). [sent-198, score-0.61]

67 The complexity is defined as Hε = Σ_{i∈A} b² / max(((∆i + ε)/2), ε)² (6). Note that although the complexity has an explicit dependence on ε, it also depends on the number of arms m through the definition of the gaps ∆i, thus making it a complexity measure of the (ε, m)-best arm identification problem. [sent-200, score-1.043]

68 Analysis of the Arm-Selection Strategy: here we report lower (Lemma 1) and upper (Lemma 2) bounds for the indices BS on the event E, which show their connection with the regret and gaps. [sent-204, score-0.221]

69 For any set S and each time t ∈ {1, . . . , T}, the index BS(t) is an upper bound on the simple regret rS of this set. [sent-209, score-0.129]

70 On event E, for any arm i ∈ S* and each time t ∈ {1, . . . , T}, . . . [sent-216, score-0.705]

71 On event E, if arm k ∈ {lt, ut} is pulled at time t ∈ {1, . . . , T}, . . . [sent-223, score-0.889]

72 Case ut ∈ S*: since by definition ut ∉ J(t), there exists an arm j ∉ S* such that j ∈ J(t). [sent-234, score-0.876]

73 Now we may write µ(m+1) ≥ µj ≥(a) Lj(t) ≥(b) L_{lt}(t) ≥(c) L_{ut}(t) = µk(t−1) − βk(t−1) ≥(d) µk − 2βk(t−1) (10), where (a) and (d) hold because of event E, (b) follows from the fact that j ∈ J(t) and from the definition of lt, and (c) is the result of Lemma 4. [sent-235, score-0.239]

74 Case lt ∉ S*: since lt ∉ S* and, by definition, lt ∈ J(t), there exists an arm j ∈ S* such that j ∉ J(t). [sent-250, score-0.988]

75 Now we may write µ_{ut} + 2β_{ut}(t−1) ≥(a) U_{ut}(t) ≥(b) Uj(t) ≥(c) µj ≥(d) µ(m) (12), where (a) and (c) hold because of event E, (b) follows from the definition of ut and the fact that j ∉ J(t), and (d) holds because j ∈ S*. [sent-251, score-0.246]

76 Lemma 1 confirms the intuition that the B-values upper-bound the regret of the corresponding set of arms (with high probability). [sent-260, score-0.433]

77 Unfortunately, this is not enough to claim that selecting J(t) as the set of arms with the smallest B-values actually corresponds to arms with small regret, since B_{J(t)} could be an arbitrarily loose bound on the regret. [sent-261, score-0.671]

78 Lemma 2 provides an upper bound (Eq. 8) that gets smaller and smaller, thus implying that selecting the arms with the smaller B-values, i.e., the set J(t), is a sound strategy. [sent-267, score-0.369]

79 We now have all the tools needed to prove the performance of UGapEb for the (ε,m)-best arm identification problem. [sent-278, score-0.633]

80 The extension to a confidence interval that takes into account the variance of the arms is discussed in Appendix B of [7]. [sent-279, score-0.374]

81 We assume that rΩ(n) > ε on event E and consider the following two steps. Step 1: Here we show that on event E, we have the following upper bound on the number of pulls of any arm i ∈ A: Ti(n) < 4ab² / max(((∆i + ε)/2), ε)² + 1 (13). [sent-284, score-0.851]

82 Let ti be the last time that arm i is pulled. [sent-285, score-0.836]

83 If arm i has been pulled only during the initialization phase, then Ti(n) = 1 and Eq. 13 trivially holds. [sent-286, score-0.661]

84 By the definition of ti , we know Ti (n) = Ti (ti − 1) + 1. [sent-296, score-0.2]

85 We first prove the bound on the simple regret of UGapEc. [sent-320, score-0.152]

86 Using Lemma 1, we have that on event E, the simple regret of UGapEc upon stopping satisfies BJ(n+1) (n + 1) = BΩ(n+1) (n + 1) ≥ rΩ(n+1) . [sent-321, score-0.248]

87 As a result, on event E, the regret of UGapEc cannot be bigger than ε, because this would contradict the stopping condition of the algorithm, i.e., B_{J(n+1)}(n+1) < ε. [sent-322, score-0.248]

88 Similar to the proof of Theorem 1, we consider the following two steps. Step 1: Here we show that on event E, we have the following upper bound on the number of pulls of any arm i ∈ A: Ti(n) ≤ 2b² log(4K(n−1)³/δ) / max(((∆i + ε)/2), ε)² + 1 (15). [sent-327, score-0.756]

89 Let ti be the last time that arm i is pulled. [sent-328, score-0.836]

90 If arm i has been pulled only during the initialization phase, then Ti(n) = 1 and Eq. 15 trivially holds. [sent-329, score-0.661]
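
Summing the per-arm bound of Eq. 15 over the K arms yields the overall sample complexity that the fixed-confidence analysis targets; a hedged reconstruction of this step (the exact constants should be checked against [7]):

```latex
n \;=\; \sum_{i \in A} T_i(n)
  \;\le\; K \;+\; \sum_{i \in A}
      \frac{2\,b^{2}\,\log\!\bigl(4K(n-1)^{3}/\delta\bigr)}
           {\max\!\bigl(\tfrac{\Delta_i+\epsilon}{2},\,\epsilon\bigr)^{2}}
  \;=\; K \;+\; 2\,H_{\epsilon}\,\log\!\bigl(4K(n-1)^{3}/\delta\bigr),
\qquad \text{with } H_{\epsilon}=\sum_{i\in A} H_{\epsilon,i}.
```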

91 For example, for ε = 0 and m = 1 we recover the complexity used in the definition of UCB-E [1] for the fixed budget setting and the one defined in [6] for the fixed accuracy problem. [sent-346, score-0.223]

92 We define the complexity of a single arm i ∈ A as H_{ε,i} = b² / max(((∆i + ε)/2), ε)². [sent-348, score-0.644]
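
Sentence 92's per-arm complexity, written out as a small sketch (gap values hypothetical, reused from the earlier example):

```python
import numpy as np

def complexity(gaps, eps, b=1.0):
    # H_{eps,i} = b^2 / max((Delta_i + eps)/2, eps)^2  (sentence 92),
    # and H_eps = sum_i H_{eps,i}  (Eq. 6).
    H_i = b ** 2 / np.maximum((gaps + eps) / 2.0, eps) ** 2
    return H_i, H_i.sum()

gaps = np.array([0.15, 0.05, 0.05, 0.3, 0.4])   # hypothetical gaps
print(complexity(gaps, eps=0.05))
```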

93 In fact, the algorithm can stop as soon as the desired accuracy is achieved, which means that there is no need to exactly discriminate between arm i and the best arm. [sent-352, score-0.681]

94 As shown in Sect. 4 of [9], if the complexity H is known, an algorithm like UGapEc can be adapted to run in the fixed budget setting by inverting the bound on its sample complexity. [sent-358, score-0.219]

95 In particular, it would be important to derive a distribution-dependent lower bound in the form of the one reported in [1] for the general case of ε ≥ 0 and m ≥ 1 for both the fixed budget and fixed confidence settings. [sent-361, score-0.167]

96 5 Summary and Discussion: We proposed a meta-algorithm, called unified gap-based exploration (UGapE), that unifies the two settings of the best arm(s) identification problem in the stochastic multi-armed bandit: fixed budget and fixed confidence. [sent-362, score-0.267]

97 We also showed how UGapE and its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. [sent-367, score-0.363]

98 Finally, we evaluated the performance of UGapE and compared it with a number of existing fixed budget and fixed confidence algorithms. [sent-368, score-0.146]

99 Despite their similarities, fixed budget and fixed confidence settings have been treated differently in the literature. [sent-370, score-0.183]

100 Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. [sent-414, score-0.16]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('arm', 0.61), ('ugape', 0.364), ('arms', 0.325), ('ugapec', 0.224), ('ugapeb', 0.21), ('ti', 0.2), ('dence', 0.15), ('budget', 0.146), ('forecaster', 0.139), ('ut', 0.133), ('lt', 0.126), ('regret', 0.108), ('bj', 0.098), ('event', 0.095), ('uni', 0.083), ('bandit', 0.08), ('con', 0.077), ('identi', 0.069), ('rounds', 0.066), ('fixed', 0.063), ('exploration', 0.062), ('gabillon', 0.062), ('kalyanakrishnan', 0.056), ('samp', 0.056), ('pulled', 0.051), ('pulls', 0.051), ('bs', 0.048), ('tk', 0.045), ('stopping', 0.045), ('returned', 0.043), ('llt', 0.042), ('company', 0.041), ('stops', 0.041), ('phase', 0.04), ('bubeck', 0.039), ('returning', 0.037), ('rs', 0.037), ('settings', 0.037), ('xed', 0.036), ('elimination', 0.035), ('bernstein', 0.035), ('uut', 0.034), ('mnih', 0.034), ('ghavamzadeh', 0.034), ('return', 0.034), ('complexity', 0.034), ('lemma', 0.033), ('interval', 0.032), ('lemmas', 0.031), ('bk', 0.03), ('nition', 0.029), ('appendix', 0.028), ('arg', 0.027), ('max', 0.026), ('lazaric', 0.026), ('accuracy', 0.025), ('lk', 0.024), ('gap', 0.024), ('desired', 0.024), ('prove', 0.023), ('austin', 0.023), ('maron', 0.023), ('races', 0.023), ('rejects', 0.023), ('observes', 0.023), ('smaller', 0.022), ('best', 0.022), ('theoretical', 0.021), ('index', 0.021), ('cation', 0.021), ('strategy', 0.021), ('bound', 0.021), ('pull', 0.02), ('nal', 0.02), ('uj', 0.02), ('victor', 0.02), ('audibert', 0.02), ('auer', 0.02), ('broken', 0.02), ('twentieth', 0.019), ('bi', 0.018), ('bounds', 0.018), ('gaps', 0.018), ('setting', 0.018), ('write', 0.018), ('ri', 0.018), ('round', 0.018), ('suboptimal', 0.017), ('returns', 0.017), ('mohammad', 0.017), ('account', 0.017), ('intrinsic', 0.017), ('corollary', 0.017), ('alessandro', 0.017), ('deduce', 0.017), ('customers', 0.016), ('reward', 0.016), ('hoeffding', 0.016), ('thorough', 0.015), ('rk', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

2 0.55925465 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results. 1

3 0.16232616 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

Author: Nicolò Cesa-bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters. 1

4 0.11533494 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

Author: Odalric-ambrym Maillard

Abstract: This paper aims to take a step forwards making the term “intrinsic motivation” from reinforcement learning theoretically well founded, focusing on curiositydriven learning. To that end, we consider the setting where, a fixed partition P of a continuous space X being given, and a process ν defined on X being unknown, we are asked to sequentially decide which cell of the partition to select as well as where to sample ν in that cell, in order to minimize a loss function that is inspired from previous work on curiosity-driven learning. The loss on each cell consists of one term measuring a simple worst case quadratic sampling error, and a penalty term proportional to the range of the variance in that cell. The corresponding problem formulation extends the setting known as active learning for multi-armed bandits to the case when each arm is a continuous region, and we show how an adaptation of recent algorithms for that problem and of hierarchical optimistic sampling algorithms for optimization can be used in order to solve this problem. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C) is provided together with a finite-time regret analysis. 1

5 0.093669057 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

Author: Shinsuke Koyama

Abstract: Statistical features of neuronal spike trains are known to be non-Poisson. Here, we investigate the extent to which the non-Poissonian feature affects the efficiency of transmitting information on fluctuating firing rates. For this purpose, we introduce the Kullback-Leibler (KL) divergence as a measure of the efficiency of information encoding, and assume that spike trains are generated by time-rescaled renewal processes. We show that the KL divergence determines the lower bound of the degree of rate fluctuations below which the temporal variation of the firing rates is undetectable from sparse data. We also show that the KL divergence, as well as the lower bound, depends not only on the variability of spikes in terms of the coefficient of variation, but also significantly on the higher-order moments of interspike interval (ISI) distributions. We examine three specific models that are commonly used for describing the stochastic nature of spikes (the gamma, inverse Gaussian (IG) and lognormal ISI distributions), and find that the time-rescaled renewal process with the IG distribution achieves the largest KL divergence, followed by the lognormal and gamma distributions.

6 0.087218359 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

7 0.079339407 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

8 0.077977724 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

9 0.076088995 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

10 0.072309747 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

11 0.068366818 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

12 0.064057827 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

13 0.060051609 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

14 0.059972446 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

15 0.052182708 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

16 0.05168448 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

17 0.050219614 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

18 0.049295127 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

19 0.048129205 102 nips-2012-Distributed Non-Stochastic Experts

20 0.044466704 293 nips-2012-Relax and Randomize : From Value to Algorithms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.113), (1, -0.081), (2, 0.062), (3, 0.034), (4, 0.058), (5, 0.036), (6, -0.009), (7, 0.062), (8, -0.092), (9, 0.048), (10, 0.137), (11, 0.14), (12, -0.155), (13, -0.24), (14, -0.218), (15, 0.109), (16, -0.136), (17, 0.007), (18, -0.193), (19, -0.128), (20, 0.171), (21, -0.129), (22, -0.08), (23, 0.041), (24, 0.139), (25, -0.078), (26, -0.051), (27, -0.18), (28, -0.003), (29, 0.051), (30, -0.088), (31, -0.08), (32, 0.0), (33, 0.081), (34, 0.046), (35, -0.094), (36, -0.066), (37, -0.005), (38, 0.078), (39, -0.006), (40, 0.047), (41, 0.016), (42, 0.138), (43, -0.033), (44, 0.116), (45, 0.013), (46, -0.013), (47, 0.122), (48, -0.011), (49, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96471566 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

2 0.90989554 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results. 1

3 0.50245517 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

Author: Nicolò Cesa-bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters. 1

4 0.49703184 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Author: Ronald Ortner, Daniil Ryabko

Abstract: We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Beside the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities. 1

5 0.45888373 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Author: Morteza Ibrahimi, Adel Javanmard, Benjamin V. Roy

Abstract: We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of O(√T) was shown, apart from logarithmic factors. However, this bound scales exponentially with p, the dimension of the state space. In this work we consider the case where the matrices describing the dynamics of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of O(p√T), apart from logarithmic factors. In particular, our algorithm has an average cost of (1 + ε) times the optimum cost after T = polylog(p)·O(1/ε²). This is in comparison to previous work on the dense dynamics, where the algorithm requires time that scales exponentially with dimension in order to achieve a regret of ε times the optimal cost. We believe that our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks. 1

6 0.42906371 102 nips-2012-Distributed Non-Stochastic Experts

7 0.42057022 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

8 0.40934062 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

9 0.33814323 109 nips-2012-Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

10 0.32611966 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

11 0.29382089 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

12 0.28292906 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

13 0.27585417 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

14 0.27000231 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

15 0.2232561 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

16 0.21172911 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

17 0.20624347 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

18 0.19758475 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

19 0.18620457 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

20 0.1839792 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (1, 0.2), (21, 0.039), (27, 0.011), (36, 0.141), (38, 0.106), (42, 0.021), (54, 0.033), (55, 0.017), (74, 0.055), (76, 0.105), (80, 0.081), (92, 0.068)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.75147337 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria are not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria. 1

same-paper 2 0.74934697 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

3 0.74521852 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

Abstract: A patient’s risk for adverse events is affected by temporal processes including the nature and timing of diagnostic and therapeutic activities, and the overall evolution of the patient’s pathophysiology over time. Yet many investigators ignore this temporal aspect when modeling patient outcomes, considering only the patient’s current or aggregate state. In this paper, we represent patient risk as a time series. In doing so, patient risk stratification becomes a time-series classification task. The task differs from most applications of time-series analysis, like speech processing, since the time series itself must first be extracted. Thus, we begin by defining and extracting approximate risk processes, the evolving approximate daily risk of a patient. Once obtained, we use these signals to explore different approaches to time-series classification with the goal of identifying high-risk patterns. We apply the classification to the specific task of identifying patients at risk of testing positive for hospital acquired Clostridium difficile. We achieve an area under the receiver operating characteristic curve of 0.79 on a held-out set of several hundred patients. Our two-stage approach to risk stratification outperforms classifiers that consider only a patient’s current state (p<0.05). 1

4 0.7395066 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1

5 0.70590603 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results. 1

6 0.70307422 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

7 0.69708818 288 nips-2012-Rational inference of relative preferences

8 0.66731209 188 nips-2012-Learning from Distributions via Support Measure Machines

9 0.6521883 34 nips-2012-Active Learning of Multi-Index Function Models

10 0.64459425 69 nips-2012-Clustering Sparse Graphs

11 0.64024216 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

12 0.63387358 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

13 0.6224677 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

14 0.61703348 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

15 0.61378098 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

16 0.60914922 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

17 0.6083706 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

18 0.60786468 65 nips-2012-Cardinality Restricted Boltzmann Machines

19 0.60661572 260 nips-2012-Online Sum-Product Computation Over Trees

20 0.60613388 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models