nips nips2011 nips2011-97 knowledge-graph by maker-knowledge-mining

97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo


Source: pdf

Author: Alexandra Carpentier, Rémi Munos

Abstract: We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution� dependent bound O(n−3/2 ) that depends on a measure of the disparity of � the strata, and a distribution-free bound O(n−4/3 ) that does not. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 fr Abstract We consider the problem of stratified sampling for Monte-Carlo integration. [sent-5, score-0.059]

2 We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. [sent-6, score-0.295]

3 We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. [sent-7, score-0.908]

4 We provide two regret analyses: a distribution� dependent bound O(n−3/2 ) that depends on a measure of the disparity of � the strata, and a distribution-free bound O(n−4/3 ) that does not. [sent-8, score-0.431]

5 1 Introduction Consider a polling institute that has to estimate as accurately as possible the average income of a country, given a finite budget for polls. [sent-9, score-0.282]

6 The institute has call centers in every region in the country, and gives a part of the total sampling budget to each center so that they can call random people in the area and ask about their income. [sent-10, score-0.206]

7 A naive method would allocate a budget proportionally to the number of people in each area. [sent-11, score-0.302]

8 However some regions show a high variability in the income of their inhabitants whereas others are very homogeneous. [sent-12, score-0.139]

9 Now if the polling institute knew the level of variability within each region, it could adjust the budget allocated to each region in a more clever way (allocating more polls to regions with high variability) in order to reduce the final estimation error. [sent-13, score-0.323]

10 This example is just one of many for which an efficient method of sampling a function with natural strata (i. [sent-14, score-0.421]

11 Note that even in the case that there are no natural strata, it is always a good strategy to design arbitrary strata and allocate a budget to each stratum that is proportional to the size of the stratum, compared to a crude Monte-Carlo. [sent-17, score-0.874]

12 There are many good surveys on the topic of stratified sampling for Monte-Carlo, such as (Rubinstein and Kroese, 2008)[Subsection 5. [sent-18, score-0.059]

13 The main problem for performing an efficient sampling is that the variances within the strata (in the previous example, the income variability per region) are usually unknown. [sent-20, score-0.605]

14 One possibility is to estimate the variances online while sampling the strata. [sent-21, score-0.136]

15 The work of Etor´ and Jourdain (2010) matches e e exactly our problem of designing an efficient adaptive sampling strategy. [sent-23, score-0.087]

16 In this article they propose to sample according to an empirical estimate of the variance of the strata, whereas Kawai (2010) addresses a computational complexity problem which is slightly different from ours. [sent-24, score-0.093]

17 (2011) describes a strategy that enables to sample e asymptotically according to the (unknown) standard deviations of the strata and at the same time adapts the shape (and number) of the strata online. [sent-26, score-0.998]

18 1 These works provide asymptotic convergence of the variance of the estimate to the targeted stratified variance1 divided by the sample size. [sent-28, score-0.093]

19 They also prove that the number of pulls within each stratum converges to the desired number of pulls i. [sent-29, score-0.466]

20 the optimal allocation if the variances per stratum were known. [sent-31, score-0.449]

21 Our contribution is to design a sampling strategy for which we can derive a finite-time analysis (where ’time’ refers to the number of samples). [sent-33, score-0.162]

22 This enables us to predict the quality of our estimate for any given budget n. [sent-34, score-0.152]

23 We model this problem using the setting of multi-armed bandits where our goal is to estimate a weighted average of the mean values of the arms. [sent-35, score-0.066]

24 Although our goal is different from a usual bandit problem where the objective is to play the best arm as often as possible, this problem also exhibits an exploration-exploitation trade-off. [sent-36, score-0.43]

25 The arms have to be pulled both in order to estimate the initially unknown variability of the arms (exploration) and to allocate correctly the budget according to our current knowledge of the variability (exploitation). [sent-37, score-0.798]

26 The authors present an algorithm, called GAFS-MAX, that allocates samples proportionally to the empirical variance of the arms, √ while imposing that each arm is pulled at least n times to guarantee a sufficiently good estimation of the true variances. [sent-40, score-0.601]

27 it targets an allocation which is proportional to the standard deviation (and not to the variance) of the strata times their size2 . [sent-44, score-0.677]

28 Some questions remain open in this work, notably that no distribution independent regret bound is provided for GAFS-WL. [sent-45, score-0.215]

29 The algorithm, called MC-UCB, samples the arms proportionally to an UCB3 on the standard deviation times the size of the stratum. [sent-51, score-0.359]

30 Our contributions are the following: • We derive a finite-time analysis for the stratified sampling for Monte-Carlo setting by using an algorithm based on upper confidence bounds. [sent-54, score-0.091]

31 � • We provide two regret analysis: (i) a distribution-dependent bound O(n−3/2 )4 that depends on the disparity of the stratas (a measure of the problem complexity), and which corresponds to a stationary regime where the budget n is large compared to � this complexity. [sent-56, score-0.606]

32 (ii) A distribution-free bound O(n−4/3 ) that does not depend on the the disparity of the stratas, and corresponds to a transitory regime where n is small compared to the complexity. [sent-57, score-0.345]

33 3 Note that we consider a sampling strategy based on UCBs on the standard deviations of the arms whereas the so-called UCB algorithm of Auer et al. [sent-66, score-0.519]

34 (2002), in the usual multi-armed bandit setting, computes UCBs on the mean rewards of the arms. [sent-67, score-0.1]

35 2 illustrate our method on the problem of pricing Asian options as introduced in (Glasserman et al. [sent-69, score-0.101]

36 2 Preliminaries The allocation problem mentioned in the previous section is formalized as a K-armed bandit problem where each arm (stratum) k = 1, . [sent-72, score-0.64]

37 At each round t ≥ 1, an allocation strategy (or algorithm) A selects an arm kt and receives a sample drawn from νkt independently of the past samples. [sent-76, score-0.796]

38 , the arm selected at round t may depend on past observed samples. [sent-79, score-0.387]

39 The goal is to define a strategy that estimates as precisely �K as possible µ = k=1 wk µk using a total budget of n samples. [sent-85, score-0.397]

40 �t Let us write Tk,t = s=1 I {ks = k} the number of times arm k has been pulled up to time Tk,t 1 � Xk,s the empirical estimate of the mean µk at time t, where Xk,s t, and µk,t = ˆ Tk,t s=1 denotes the sample received when pulling arm k for the s-th time. [sent-86, score-0.99]

41 Note ˆ that in the case of a deterministic strategy, the expected quadratic estimation error of the �K ˆ weighted mean µ as estimated by the weighted average µn = k=1 wk µk,n satisfies: ˆ �� � �� �� �2 � �2 � �K �2 � K 2 =E = k=1 wk Eνk µk,n − µk . [sent-88, score-0.416]

42 ˆ wk (ˆk,n − µk ) µ E µn − µ ˆ k=1 We thus use the following measure for the performance of any algorithm A: � � �K 2 2 Ln (A) = k=1 wk E (µk − µk,n ) . [sent-89, score-0.348]

43 ˆ (1) The goal is to define an allocation strategy that minimizes the global loss defined in Equation 1. [sent-90, score-0.339]

44 If the variance of the arms were known in advance, one could design an optimal static5 allocation strategy A∗ by pulling each arm k proportionally to the quantity wk σk . [sent-91, score-1.267]

45 ∗ Indeed, if arm k is pulled a deterministic number of times Tk,n , then 2 �K 2 σ (2) Ln (A∗ ) = k=1 wk T ∗k . [sent-92, score-0.639]

46 In the following, we write λk = k,n = wΣwk the optimal allocation n proportion for arm k and λmin = min1≤k≤K λk . [sent-94, score-0.59]

47 Note that a small λmin means a large disparity of the wk σk and, as explained later, provides for the algorithm we build in Section 3 a characterization of the hardness of a problem. [sent-95, score-0.37]

48 However, in the setting considered here, the σk are unknown, and thus the optimal allocation u is out of reach. [sent-96, score-0.236]

49 A possible allocation is the uniform strategy Au , i. [sent-97, score-0.339]

50 Its performance is i=1 wi �K w σ 2 �K Σ Ln (Au ) = k=1 wk k=1 k k = w,2 , n n 5 Static means that the number of pulls allocated to each arm does not depend on the received samples. [sent-100, score-0.819]

51 In addition, since i wi = 1, we have Σ2 − Σw,2 = − k wk (σk − Σw )2 . [sent-104, score-0.233]

52 The difference w between those two quantities is the weighted quadratic variation of the σk around their weighted mean Σw . [sent-105, score-0.068]

53 As a result the gain of A∗ compared to Au grow with the disparity of the σk . [sent-107, score-0.148]

54 We would like to do better than the uniform strategy by considering an adaptive strategy A that would estimate the σk at the same time as it tries to implement an allocation strategy as close as possible to the optimal allocation algorithm A∗ . [sent-108, score-0.841]

55 This introduces a natural trade-off between the exploration needed to improve the estimates of the variances and the exploitation of the current estimates to allocate the pulls nearly-optimally. [sent-109, score-0.339]

56 In order to assess how well A solves this trade-off and manages to sample according to the true standard deviations without knowing them in advance, we compare its performance to that of the optimal allocation strategy A∗ . [sent-110, score-0.533]

57 For this purpose we define the notion of regret of an adaptive algorithm A as the difference between the performance loss incurred by the algorithm and the optimal algorithm: Rn (A) = Ln (A) − Ln (A∗ ). [sent-111, score-0.175]

58 (5) The regret indicates how much we loose in terms of expected quadratic estimation error Σ2 w by not knowing in advance the standard deviations (σk ). [sent-112, score-0.348]

59 Note that since Ln (A∗ ) = n , a consistent strategy i. [sent-113, score-0.103]

60 , asymptotically equivalent to the optimal strategy, is obtained whenever its regret is neglectable compared to 1/n. [sent-115, score-0.147]

61 1 The algorithm In this section, we introduce our adaptive algorithm for the allocation problem, called Monte Carlo Upper Confidence Bound (MC-UCB). [sent-117, score-0.264]

62 The algorithm computes a high-probability bound on the standard deviation of each arm and samples the arms proportionally to their bounds times the corresponding weights. [sent-118, score-0.805]

63 , n do � � � wk 1 ˆ Compute Bk,t = Tk,t−1 σk,t−1 + b Tk,t−1 for each arm 1 ≤ k ≤ K Pull an arm kt ∈ arg max1≤k≤K Bk,t end for Output: µk,t for each arm 1 ≤ k ≤ K ˆ Figure 1: The pseudo-code of the MC-UCB algorithm. [sent-130, score-1.28]

64 The empirical standard deviations σk,t−1 are computed using Equation 6. [sent-131, score-0.145]

65 ˆ The algorithm starts by pulling each arm twice in rounds t = 1 to 2K. [sent-132, score-0.478]

66 From round t = 2K+1 on, it computes an upper confidence bound Bk,t on the standard deviation σk , for each arm k, and then pulls the one with largest Bk,t . [sent-133, score-0.705]

67 The upper bounds on the standard deviations are built by using Theorem 10 in (Maurer and Pontil, 2009)6 and based on the empirical standard deviation σk,t−1 : ˆ σk,t−1 = ˆ2 6 1 Tk,t−1 − 1 Tk,t−1 � i=1 (Xk,i − µk,t−1 )2 , ˆ We could also have used the variant reported in (Audibert et al. [sent-134, score-0.255]

68 4 (6) where Xk,i is the i-th sample received when pulling arm k, and Tk,t−1 is the number of pulls allocated to arm k up to time t − 1. [sent-136, score-1.051]

69 After n rounds, MC-UCB returns the empirical mean µk,n for each arm 1 ≤ k ≤ K. [sent-137, score-0.354]

70 A distribution-dependent result: We now report the first bound on the regret of MCUCB algorithm. [sent-148, score-0.215]

71 , the difference in the number of pulls of each arm compared to the optimal allocation (see Lemma 3). [sent-152, score-0.739]

72 Theorem 1 Under Assumption 1 and if we choose c2 such that c2 ≥ 2Kn−5/2 , the regret of MC-UCB run with parameter δ = n−7/2 with n ≥ 4K is bounded as � � log(n)c1 (c2 + 2) � 19 � 112Σw + 6K + 3 KΣ2 + 720c1 (c2 + 1) log(n)2 . [sent-153, score-0.147]

73 Rn (AM C−U CB ) ≤ w 3/2 λmin n2 n3/2 λmin Note that this result crucially depends on the smallest proportion λmin which is a measure of the disparity of the standard deviations times their weight. [sent-154, score-0.327]

74 A distribution-free result: Now we report our second regret bound that does not depend on λmin but whose rate is poorer. [sent-156, score-0.215]

75 Theorem 2 Under Assumption 1 and if we choose c2 such that c2 ≥ 2Kn−5/2 , the regret of MC-UCB run with parameter δ = n−7/2 with n ≥ 4K is bounded as √ � 200 c1 (c2 + 2)Σw K 365 � Rn (AM C−U CB ) ≤ log(n) + 3/2 129c1 (c2 + 2)2 K 2 log(n)2 + KΣ2 . [sent-158, score-0.147]

76 Note that the bound is not entirely distribution free since Σw appears. [sent-160, score-0.068]

77 1 Discussion on the results Distribution-free versus distribution-dependent � −5/2 Theorem 1 provides a regret bound of order O(λmin n−3/2 ), whereas Theorem 2 provides a � bound in O(n−4/3 ) independently of λmin . [sent-164, score-0.283]

78 , a given λmin , the distribution-free result of Theorem 2 is more informative than the distribution-dependent result of Theorem 1 in the transitory regime, that is to say when n is small compared to λ−1 . [sent-167, score-0.084]

79 The distribution-dependent result of Theorem 1 is better in the stationary regime i. [sent-168, score-0.074]

80 7 The distribution dependent bound is in O(K log n/Δ), where Δ is the difference between the √ mean value of the two best arms, and the distribution-free bound is in O( nK log n) as explained in (Auer et al. [sent-172, score-0.285]

81 5 Although we do not have a lower bound on the regret yet, we believe that the rate n−3/2 cannot be improved for general distributions. [sent-174, score-0.215]

82 As explained in the proof in Appendix B of (Carpentier and Munos, 2011), this rate is a direct consequence of the high probability √ bounds on the estimates of the standard deviations of the arms which are in O(1/ n), and those bounds are tight. [sent-175, score-0.372]

83 A natural question is whether there exists an algorithm with a regret � of order O(n−3/2 ) without any dependence in λ−1 . [sent-176, score-0.147]

84 The problem dependent upper bound is similar to the one provided for GAFS-WL in (Grover, 2009). [sent-181, score-0.1]

85 Note however that when there is an arm with 0 standard deviation, GAFS-WL is likely to perform better than MC-UCB, as it will only sample this √ � arm O( n) times while MC-UCB samples it O(n2/3 ) times. [sent-184, score-0.768]

86 By the choice of the value assigned to δ in the two theorems, b should be chosen of order c log(n), where c can be interpreted as a high probability bound on the range of the samples. [sent-188, score-0.068]

87 Note that in the case of bounded distributions, b can be chosen as � � b = 4 5 c log(n) where c is a true bound on the variables. [sent-190, score-0.068]

88 5 Numerical experiment: Pricing of an Asian option We consider the pricing problem of an Asian option introduced in (Glasserman et al. [sent-192, score-0.177]

89 The discounted payoff of the Asian option is defined as a function of W , by: � � � � � √ d 1 (8) F (W ) = exp(−rT ) max d i=1 S0 exp (r − 1 s2 ) iT + s0 T Wi − C, 0 , 0 d 2 where S0 , r, and s0 are constants, and the price is defined by the expectation p = EW F (W ). [sent-196, score-0.09]

90 We want to estimate the price p by Monte-Carlo simulations (by sampling on W = (Wi )1≤i≤d ). [sent-197, score-0.143]

91 In order to reduce the variance of the estimated price, we can stratify the space of W . [sent-198, score-0.109]

92 (1999) suggest to stratify according to a one dimensional projection of W , i. [sent-200, score-0.074]

93 , by choosing a projection vector u ∈ Rd and define the strata as the set of W such that u · W lies in intervals of R. [sent-202, score-0.362]

94 , to stratify according to the last component Wd of W . [sent-205, score-0.074]

95 e Like in (Kawai, 2010) we consider 5 strata of equal weight. [sent-212, score-0.362]

96 Since Wd follows a N (0, 1), the strata correspond to the 20-percentile of a normal distribution. [sent-213, score-0.362]

97 The left plot of Figure 2 represents the cumulative distribution function of Wd and shows the strata in terms of 6 percentiles of Wd . [sent-214, score-0.387]

98 The right plot represents, in dot line, the curve E[F (W )|Wd = x] versus P(Wd < x) parameterized by x, and the box plot represents the expectation and standard deviations of F (W ) conditioned on each stratum. [sent-215, score-0.145]

99 We observe that this stratification produces an important heterogeneity of the standard deviations per stratum, which indicates that a stratified sampling would be profitable compared to a crude Monte-Carlo sampling. [sent-216, score-0.244]

100 Expectation of the payoff in every strata for W with C=90 d 1000 900 E[F(W)|W =x] 800 E[F(W)|W ∈ strata] d d 700 E[F(W)|Wd=x] 600 500 400 300 200 100 0 −100 0 0. [sent-217, score-0.362]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('strata', 0.362), ('arm', 0.354), ('strati', 0.25), ('allocation', 0.236), ('carpentier', 0.197), ('arms', 0.179), ('wk', 0.174), ('etor', 0.168), ('kawai', 0.168), ('stratum', 0.168), ('wd', 0.167), ('pulls', 0.149), ('disparity', 0.148), ('regret', 0.147), ('deviations', 0.145), ('jourdain', 0.14), ('budget', 0.12), ('glasserman', 0.112), ('strategy', 0.103), ('munos', 0.101), ('proportionally', 0.101), ('erence', 0.096), ('di', 0.092), ('asian', 0.085), ('pulling', 0.085), ('kroese', 0.084), ('transitory', 0.084), ('au', 0.081), ('allocate', 0.081), ('pulled', 0.077), ('stratify', 0.074), ('income', 0.074), ('rubinstein', 0.074), ('bound', 0.068), ('ln', 0.068), ('pricing', 0.068), ('variability', 0.065), ('pull', 0.06), ('sampling', 0.059), ('wi', 0.059), ('polling', 0.056), ('ucbs', 0.056), ('allocated', 0.055), ('cb', 0.055), ('price', 0.052), ('bandit', 0.05), ('grover', 0.049), ('payo', 0.049), ('stratas', 0.049), ('ucb', 0.049), ('explained', 0.048), ('dence', 0.046), ('variances', 0.045), ('regime', 0.045), ('deviation', 0.045), ('kt', 0.044), ('regimes', 0.041), ('appendix', 0.04), ('crude', 0.04), ('country', 0.04), ('brownian', 0.04), ('nord', 0.04), ('subsection', 0.04), ('min', 0.039), ('rounds', 0.039), ('option', 0.038), ('lille', 0.037), ('europe', 0.036), ('variance', 0.035), ('log', 0.034), ('weighted', 0.034), ('times', 0.034), ('erent', 0.033), ('et', 0.033), ('exploration', 0.033), ('advance', 0.033), ('round', 0.033), ('upper', 0.032), ('estimate', 0.032), ('exploitation', 0.031), ('audibert', 0.031), ('inria', 0.029), ('stationary', 0.029), ('carlo', 0.028), ('adaptive', 0.028), ('monte', 0.028), ('received', 0.028), ('region', 0.027), ('auer', 0.027), ('usual', 0.026), ('sample', 0.026), ('rn', 0.025), ('theorem', 0.025), ('percentiles', 0.025), ('strike', 0.025), ('computes', 0.024), ('analyses', 0.024), ('direction', 0.024), ('knowing', 0.023), ('static', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo

Author: Alexandra Carpentier, Rémi Munos

Abstract: We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution� dependent bound O(n−3/2 ) that depends on a measure of the disparity of � the strata, and a distribution-free bound O(n−4/3 ) that does not. 1

2 0.27304944 56 nips-2011-Committing Bandits

Author: Loc X. Bui, Ramesh Johari, Shie Mannor

Abstract: We consider a multi-armed bandit problem where there are two phases. The first phase is an experimentation phase where the decision maker is free to explore multiple options. In the second phase the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases with a higher cost during the experimentation phase. We analyze the regret in this setup, and both propose algorithms and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that if given the choice, it is optimal to experiment Θ(ln T ) steps and then commit, where T is the time horizon.

3 0.24923582 175 nips-2011-Multi-Bandit Best Arm Identification

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck

Abstract: We study the problem of identifying the best arm in each of the bandits in a multibandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper-bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.

4 0.18684818 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

Author: Yasin Abbasi-yadkori, Csaba Szepesvári, David Tax

Abstract: We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer’s UCB algorithm (Auer, 2002) achieves with high probability constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm for the for linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), Rusmevichientong and Tsitsiklis (2010), Li et al. (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. In both cases, the improvement stems from the construction of smaller confidence sets. For their construction we use a novel tail inequality for vector-valued martingales. 1

5 0.1571801 245 nips-2011-Selecting the State-Representation in Reinforcement Learning

Author: Odalric-ambrym Maillard, Daniil Ryabko, Rémi Munos

Abstract: The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Without knowing neither which of the models is the correct one, nor what are the probabilistic characteristics of the resulting MDP, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves that, with a regret of order T 2/3 where T is the horizon time. 1

6 0.13536413 32 nips-2011-An Empirical Evaluation of Thompson Sampling

7 0.13308056 177 nips-2011-Multi-armed bandits on implicit metric spaces

8 0.12698492 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

9 0.12279089 220 nips-2011-Prediction strategies without loss

10 0.092110224 61 nips-2011-Contextual Gaussian Process Bandit Optimization

11 0.090956718 25 nips-2011-Adaptive Hedge

12 0.081518196 260 nips-2011-Sparse Features for PCA-Like Linear Regression

13 0.078471698 109 nips-2011-Greedy Model Averaging

14 0.074967176 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

15 0.073100559 80 nips-2011-Efficient Online Learning via Randomized Rounding

16 0.072673209 283 nips-2011-The Fixed Points of Off-Policy TD

17 0.071656667 272 nips-2011-Stochastic convex optimization with bandit feedback

18 0.069431871 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

19 0.061711811 217 nips-2011-Practical Variational Inference for Neural Networks

20 0.060635138 145 nips-2011-Learning Eigenvectors for Free


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.161), (1, -0.227), (2, 0.014), (3, 0.028), (4, 0.126), (5, -0.05), (6, 0.043), (7, 0.116), (8, 0.051), (9, 0.092), (10, -0.151), (11, 0.058), (12, -0.137), (13, -0.052), (14, 0.069), (15, 0.005), (16, -0.146), (17, -0.178), (18, -0.02), (19, 0.078), (20, 0.04), (21, 0.016), (22, 0.05), (23, -0.018), (24, -0.068), (25, -0.019), (26, -0.06), (27, -0.043), (28, 0.045), (29, -0.007), (30, -0.03), (31, -0.019), (32, 0.093), (33, 0.094), (34, -0.087), (35, 0.112), (36, -0.039), (37, 0.162), (38, -0.075), (39, -0.062), (40, -0.007), (41, 0.011), (42, 0.001), (43, 0.057), (44, 0.003), (45, -0.072), (46, -0.009), (47, 0.042), (48, -0.17), (49, 0.062)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94853002 175 nips-2011-Multi-Bandit Best Arm Identification

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck

Abstract: We study the problem of identifying the best arm in each of the bandits in a multibandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper-bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.

same-paper 2 0.93549722 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo

Author: Alexandra Carpentier, Rémi Munos

Abstract: We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution� dependent bound O(n−3/2 ) that depends on a measure of the disparity of � the strata, and a distribution-free bound O(n−4/3 ) that does not. 1

3 0.87707168 56 nips-2011-Committing Bandits

Author: Loc X. Bui, Ramesh Johari, Shie Mannor

Abstract: We consider a multi-armed bandit problem where there are two phases. The first phase is an experimentation phase where the decision maker is free to explore multiple options. In the second phase the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases with a higher cost during the experimentation phase. We analyze the regret in this setup, and both propose algorithms and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that if given the choice, it is optimal to experiment Θ(ln T ) steps and then commit, where T is the time horizon.

4 0.6446566 32 nips-2011-An Empirical Evaluation of Thompson Sampling

Author: Olivier Chapelle, Lihong Li

Abstract: Thompson sampling is one of oldest heuristic to address the exploration / exploitation trade-off, but it is surprisingly unpopular in the literature. We present here some empirical results using Thompson sampling on simulated and real data, and show that it is highly competitive. And since this heuristic is very easy to implement, we argue that it should be part of the standard baselines to compare against. 1

5 0.56475538 177 nips-2011-Multi-armed bandits on implicit metric spaces

Author: Aleksandrs Slivkins

Abstract: The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit – it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem. 1

6 0.55213428 245 nips-2011-Selecting the State-Representation in Reinforcement Learning

7 0.5431 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

8 0.44545397 109 nips-2011-Greedy Model Averaging

9 0.44481057 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

10 0.39800799 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness

11 0.38326073 272 nips-2011-Stochastic convex optimization with bandit feedback

12 0.36470735 61 nips-2011-Contextual Gaussian Process Bandit Optimization

13 0.33126587 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

14 0.32476482 25 nips-2011-Adaptive Hedge

15 0.30596995 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval

16 0.30221704 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

17 0.29407331 220 nips-2011-Prediction strategies without loss

18 0.2879158 241 nips-2011-Scalable Training of Mixture Models via Coresets

19 0.28593257 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks

20 0.28486529 95 nips-2011-Fast and Accurate k-means For Large Datasets


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.029), (4, 0.022), (20, 0.046), (26, 0.02), (31, 0.05), (43, 0.08), (45, 0.12), (57, 0.054), (61, 0.277), (74, 0.031), (79, 0.107), (83, 0.029), (99, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74769467 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo

Author: Alexandra Carpentier, Rémi Munos

Abstract: We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution� dependent bound O(n−3/2 ) that depends on a measure of the disparity of � the strata, and a distribution-free bound O(n−4/3 ) that does not. 1

2 0.62342542 127 nips-2011-Image Parsing with Stochastic Scene Grammar

Author: Yibiao Zhao, Song-chun Zhu

Abstract: This paper proposes a parsing algorithm for scene understanding which includes four aspects: computing 3D scene layout, detecting 3D objects (e.g. furniture), detecting 2D faces (windows, doors etc.), and segmenting background. In contrast to previous scene labeling work that applied discriminative classifiers to pixels (or super-pixels), we use a generative Stochastic Scene Grammar (SSG). This grammar represents the compositional structures of visual entities from scene categories, 3D foreground/background, 2D faces, to 1D lines. The grammar includes three types of production rules and two types of contextual relations. Production rules: (i) AND rules represent the decomposition of an entity into sub-parts; (ii) OR rules represent the switching among sub-types of an entity; (iii) SET rules represent an ensemble of visual entities. Contextual relations: (i) Cooperative “+” relations represent positive links between binding entities, such as hinged faces of a object or aligned boxes; (ii) Competitive “-” relations represents negative links between competing entities, such as mutually exclusive boxes. We design an efficient MCMC inference algorithm, namely Hierarchical cluster sampling, to search in the large solution space of scene configurations. The algorithm has two stages: (i) Clustering: It forms all possible higher-level structures (clusters) from lower-level entities by production rules and contextual relations. (ii) Sampling: It jumps between alternative structures (clusters) in each layer of the hierarchy to find the most probable configuration (represented by a parse tree). In our experiment, we demonstrate the superiority of our algorithm over existing methods on public dataset. In addition, our approach achieves richer structures in the parse tree. 1

3 0.58203244 175 nips-2011-Multi-Bandit Best Arm Identification

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck

Abstract: We study the problem of identifying the best arm in each of the bandits in a multibandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper-bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.

4 0.56516826 85 nips-2011-Emergence of Multiplication in a Biophysical Model of a Wide-Field Visual Neuron for Computing Object Approaches: Dynamics, Peaks, & Fits

Author: Matthias S. Keil

Abstract: Many species show avoidance reactions in response to looming object approaches. In locusts, the corresponding escape behavior correlates with the activity of the lobula giant movement detector (LGMD) neuron. During an object approach, its firing rate was reported to gradually increase until a peak is reached, and then it declines quickly. The η-function predicts that the LGMD activity is a product ˙ between an exponential function of angular size exp(−Θ) and angular velocity Θ, and that peak activity is reached before time-to-contact (ttc). The η-function has become the prevailing LGMD model because it reproduces many experimental observations, and even experimental evidence for the multiplicative operation was reported. Several inconsistencies remain unresolved, though. Here we address ˙ these issues with a new model (ψ-model), which explicitly connects Θ and Θ to biophysical quantities. The ψ-model avoids biophysical problems associated with implementing exp(·), implements the multiplicative operation of η via divisive inhibition, and explains why activity peaks could occur after ttc. It consistently predicts response features of the LGMD, and provides excellent fits to published experimental data, with goodness of fit measures comparable to corresponding fits with the η-function. 1 Introduction: τ and η Collision sensitive neurons were reported in species such different as monkeys [5, 4], pigeons [36, 34], frogs [16, 20], and insects [33, 26, 27, 10, 38]. This indicates a high ecological relevance, and raises the question about how neurons compute a signal that eventually triggers corresponding movement patterns (e.g. escape behavior or interceptive actions). Here, we will focus on visual stimulation. Consider, for simplicity, a circular object (diameter 2l), which approaches the eye at a collision course with constant velocity v. If we do not have any a priori knowledge about the object in question (e.g. its typical size or speed), then we will be able to access only two information sources. These information sources can be measured at the retina and are called optical variables (OVs). The first is the visual angle Θ, which can be derived from the number of stimulated photore˙ ˙ ceptors (spatial contrast). The second is its rate of change dΘ(t)/dt ≡ Θ(t). Angular velocity Θ is related to temporal contrast. ˙ How should we combine Θ and Θ in order to track an imminent collision? The perhaps simplest ˙ combination is τ (t) ≡ Θ(t)/Θ(t) [13, 18]. If the object hit us at time tc , then τ (t) ≈ tc − t will ∗ Also: www.ir3c.ub.edu, Research Institute for Brain, Cognition, and Behaviour (IR3C) Edifici de Ponent, Campus Mundet, Universitat de Barcelona, Passeig Vall d’Hebron, 171. E-08035 Barcelona 1 give us a running estimation of the time that is left until contact1 . Moreover, we do not need to know anything about the approaching object: The ttc estimation computed by τ is practically independent of object size and velocity. Neurons with τ -like responses were indeed identified in the nucleus retundus of the pigeon brain [34]. In humans, only fast interceptive actions seem to rely exclusively on τ [37, 35]. Accurate ttc estimation, however, seems to involve further mechanisms (rate of disparity change [31]). ˙ Another function of OVs with biological relevance is η ≡ Θ exp(−αΘ), with α = const. [10]. While η-type neurons were found again in pigeons [34] and bullfrogs [20], most data were gathered from the LGMD2 in locusts (e.g. [10, 9, 7, 23]). The η-function is a phenomenological model for the LGMD, and implies three principal hypothesis: (i) An implementation of an exponential function exp(·). Exponentation is thought to take place in the LGMD axon, via active membrane conductances [8]. Experimental data, though, seem to favor a third-power law rather than exp(·). (ii) The LGMD carries out biophysical computations for implementing the multiplicative operation. It has been suggested that multiplication is done within the LGMD itself, by subtracting the loga˙ rithmically encoded variables log Θ − αΘ [10, 8]. (iii) The peak of the η-function occurs before ˆ ttc, at visual angle Θ(t) = 2 arctan(1/α) [9]. It follows ttc for certain stimulus configurations (e.g. ˆ l/|v| 5ms). In principle, t > tc can be accounted for by η(t + δ) with a fixed delay δ < 0 (e.g. −27ms). But other researchers observed that LGMD activity continuous to rise after ttc even for l/|v| 5ms [28]. These discrepancies remain unexplained so far [29], but stimulation dynamics perhaps plays a role. We we will address these three issues by comparing the novel function “ψ” with the η-function. LGMD computations with the ψ-function: No multiplication, no exponentiation 2 A circular object which starts its approach at distance x0 and with speed v projects a visual angle Θ(t) = 2 arctan[l/(x0 − vt)] on the retina [34, 9]. The kinematics is hence entirely specified by the ˙ half-size-to-velocity ratio l/|v|, and x0 . Furthermore, Θ(t) = 2lv/((x0 − vt)2 + l2 ). In order to define ψ, we consider at first the LGMD neuron as an RC-circuit with membrane potential3 V [17] dV Cm = β (Vrest − V ) + gexc (Vexc − V ) + ginh (Vinh − V ) (1) dt 4 Cm = membrane capacity ; β ≡ 1/Rm denotes leakage conductance across the cell membrane (Rm : membrane resistance); gexc and ginh are excitatory and inhibitory inputs. Each conductance gi (i = exc, inh ) can drive the membrane potential to its associated reversal potential Vi (usually Vinh ≤ Vexc ). Shunting inhibition means Vinh = Vrest . Shunting inhibition lurks “silently” because it gets effective only if the neuron is driven away from its resting potential. With synaptic input, the neuron decays into its equilibrium state Vrest β + Vexc gexc + Vinh ginh V∞ ≡ (2) β + gexc + ginh according to V (t) = V∞ (1 − exp(−t/τm )). Without external input, V (t 1) → Vrest . The time scale is set by τm . Without synaptic input τm ≡ Cm /β. Slowly varying inputs gexc , ginh > 0 modify the time scale to approximately τm /(1 + (gexc + ginh )/β). For highly dynamic inputs, such as in late phase of the object approach, the time scale gets dynamical as well. The ψ-model assigns synaptic inputs5 ˙ ˙ ˙ ˙ gexc (t) = ϑ(t), ϑ(t) = ζ1 ϑ(t − ∆tstim ) + (1 − ζ1 )Θ(t) (3a) e ginh (t) = [γϑ(t)] , ϑ(t) = ζ0 ϑ(t − ∆tstim ) + (1 − ζ0 )Θ(t) 1 (3b) This linear approximation gets worse with increasing Θ, but turns out to work well until short before ttc (τ adopts a minimum at tc − 0.428978 · l/|v|). 2 LGMD activity is usually monitored via its postsynaptic neuron, the Descending Contralateral Movement Detector (DCMD) neuron. This represents no problem as LGMD spikes follow DCMD spikes 1:1 under visual stimulation [22] from 300Hz [21] to at least 400Hz [24]. 3 Here we assume that the membrane potential serves as a predictor for the LGMD’s mean firing rate. 4 Set to unity for all simulations. 5 LGMD receives also inhibition from a laterally acting network [21]. The η-function considers only direct feedforward inhibition [22, 6], and so do we. 2 Θ ∈ [7.63°, 180.00°[ temporal resolution ∆ tstim=1.0ms l/|v|=20.00ms, β=1.00, γ=7.50, e=3.00, ζ0=0.90, ζ1=0.99, nrelax=25 0.04 scaled dΘ/dt continuous discretized 0.035 0.03 Θ(t) (input) ϑ(t) (filtered) voltage V(t) (output) t = 56ms max t =300ms c 0.025 0 10 2 η(t): α=3.29, R =1.00 n =10 → t =37ms log Θ(t) amplitude relax max 0.02 0.015 0.01 0.005 0 −0.005 0 50 100 150 200 250 300 −0.01 0 350 time [ms] 50 100 150 200 250 300 350 time [ms] (b) ψ versus η (a) discretized optical variables Figure 1: (a) The continuous visual angle of an approaching object is shown along with its discretized version. Discretization transforms angular velocity from a continuous variable into a series of “spikes” (rescaled). (b) The ψ function with the inputs shown in a, with nrelax = 25 relaxation time steps. Its peak occurs tmax = 56ms before ttc (tc = 300ms). An η function (α = 3.29) that was fitted to ψ shows good agreement. For continuous optical variables, the peak would occur 4ms earlier, and η would have α = 4.44 with R2 = 1. For nrelax = 10, ψ is farther away from its equilibrium at V∞ , and its peak moves 19ms closer to ttc. t =500ms, dia=12.0cm, ∆t c =1.00ms, dt=10.00µs, discrete=1 stim 250 n relax = 50 2 200 α=4.66, R =0.99 [normal] n = 25 relax 2 α=3.91, R =1.00 [normal] n =0 relax tmax [ms] 150 2 α=1.15, R =0.99 [normal] 100 50 0 β=1.00, γ=7.50, e=3.00, V =−0.001, ζ =0.90, ζ =0.99 inh −50 5 10 15 20 25 30 0 35 1 40 45 50 l/|v| [ms] (a) different nrelax (b) different ∆tstim ˆ ˆ Figure 2: The figures plot the relative time tmax ≡ tc − t of the response peak of ψ, V (t), as a function of half-size-to-velocity ratio (points). Line fits with slope α and intercept δ were added (lines). The predicted linear relationship in all cases is consistent with experimental evidence [9]. (a) The stimulus time scale is held constant at ∆tstim = 1ms, and several LGMD time scales are defined by nrelax (= number of intercalated relaxation steps for each integration time step). Bigger values of nrelax move V (t) closer to its equilibrium V∞ (t), implying higher slopes α in turn. (b) LGMD time scale is fixed at nrelax = 25, and ∆tstim is manipulated. Because of the discretization of optical variables (OVs) in our simulation, increasing ∆tstim translates to an overall smaller number of jumps in OVs, but each with higher amplitude. Thus, we say ψ(t) ≡ V (t) if and only if gexc and ginh are defined with the last equation. The time ˙ scale of stimulation is defined by ∆tstim (by default 1ms). The variables ϑ and ϑ are lowpass filtered angular size and rate of expansion, respectively. The amount of filtering is defined by memory constants ζ0 and ζ1 (no filtering if zero). The idea is to continue with generating synaptic input ˙ after ttc, where Θ(t > tc ) = const and thus Θ(t > tc ) = 0. Inhibition is first weighted by γ, and then potentiated by the exponent e. Hodgkin-Huxley potentiates gating variables n, m ∈ [0, 1] instead (potassium ∝ n4 , sodium ∝ m3 , [12]) and multiplies them with conductances. Gabbiani and co-workers found that the function which transforms membrane potential to firing rate is better described by a power function with e = 3 than by exp(·) (Figure 4d in [8]). 3 Dynamics of the ψ-function 3 Discretization. In a typical experiment, a monitor is placed a short distance away from the insect’s eye, and an approaching object is displayed. Computer screens have a fixed spatial resolution, and as a consequence size increments of the displayed object proceed in discrete jumps. The locust retina is furthermore composed of a discrete array of ommatidia units. We therefore can expect a corresponding step-wise increment of Θ with time, although optical and neuronal filtering may ˙ smooth Θ to some extent again, resulting in ϑ (figure 1). Discretization renders Θ discontinuous, ˙ For simulating the dynamics of ψ, we discretized angular size what again will be alleviated in ϑ. ˙ with floor(Θ), and Θ(t) ≈ [Θ(t + ∆tstim ) − Θ(t)]/∆tstim . Discretized optical variables (OVs) were re-normalized to match the range of original (i.e. continuous) OVs. To peak, or not to peak? Rind & Simmons reject the hypothesis that the activity peak signals impending collision on grounds of two arguments [28]: (i) If Θ(t + ∆tstim ) − Θ(t) 3o in consecutively displayed stimulus frames, the illusion of an object approach would be lost. Such stimulation would rather be perceived as a sequence of rapidly appearing (but static) objects, causing reduced responses. (ii) After the last stimulation frame has been displayed (that is Θ = const), LGMD responses keep on building up beyond ttc. This behavior clearly depends on l/|v|, also according to their own data (e.g. Figure 4 in [26]): Response build up after ttc is typically observed for suffi˙ ciently small values of l/|v|. Input into ψ in situations where Θ = const and Θ = 0, respectively, ˙ is accommodated by ϑ and ϑ, respectively. We simulated (i) by setting ∆tstim = 5ms, thus producing larger and more infrequent jumps in discrete OVs than with ∆tstim = 1ms (default). As a consequence, ϑ(t) grows more slowly (deˆ layed build up of inhibition), and the peak occurs later (tmax ≡ tc − t = 10ms with everything else ˆ ˆ identical with figure 1b). The peak amplitude V = V (t) decreases nearly sixfold with respect to default. Our model thus predicts the reduced responses observed by Rind & Simmons [28]. Linearity. Time of peak firing rate is linearly related to l/|v| [10, 9]. The η-function is consistent ˆ with this experimental evidence: t = tc − αl/|v| + δ (e.g. α = 4.7, δ = −27ms). The ψ-function reproduces this relationship as well (figure 2), where α depends critically on the time scale of biophysical processes in the LGMD. We studied the impact of this time scale by choosing 10µs for the numerical integration of equation 1 (algorithm: 4th order Runge-Kutta). Apart from improving the numerical stability of the integration algorithm, ψ is far from its equilibrium V∞ (t) in every moment ˙ t, given the stimulation time scale ∆tstim = 1ms 6 . Now, at each value of Θ(t) and Θ(t), respectively, we intercalated nrelax iterations for integrating ψ. Each iteration takes V (t) asymptotically closer to V∞ (t), and limnrelax 1 V (t) = V∞ (t). If the internal processes in the LGMD cannot keep up with stimulation (nrelax = 0), we obtain slopes values that underestimate experimentally found values (figure 2a). In contrast, for nrelax 25 we get an excellent agreement with the experimentally determined α. This means that – under the reported experimental stimulation conditions (e.g. [9]) – the LGMD would operate relatively close to its steady state7 . Now we fix nrelax at 25 and manipulate ∆tstim instead (figure 2b). The default value ∆tstim = 1ms corresponds to α = 3.91. Slightly bigger values of ∆tstim (2.5ms and 5ms) underestimate the experimental α. In addition, the line fits also return smaller intercept values then. We see tmax < 0 up to l/|v| ≈ 13.5ms – LGMD activity peaks after ttc! Or, in other words, LGMD activity continues to increase after ttc. In the limit, where stimulus dynamics is extremely fast, and LGMD processes are kept far from equilibrium at each instant of the approach, α gets very small. As a consequence, tmax gets largely independent of l/|v|: The activity peak would cling to tmax although we varied l/|v|. 4 Freeze! Experimental data versus steady state of “psi” In the previous section, experimentally plausible values for α were obtained if ψ is close to equilibrium at each instant of time during stimulation. In this section we will thus introduce a steady-state 6 Assuming one ∆tstim for each integration time step. This means that by default stimulation and biophysical dynamics will proceed at identical time scales. 7 Notice that in this moment we can only make relative statements - we do not have data at hand for defining absolute time scales 4 tc=500ms, v=2.00m/s ψ∞ → (β varies), γ=3.50, e=3.00, Vinh=−0.001 tc=500ms, v=2.00m/s ψ∞ → β=2.50, γ=3.50, (e varies), Vinh=−0.001 300 tc=500ms, v=2.00m/s ψ∞ → β=2.50, (γ varies), e=3.00, Vinh=−0.001 350 300 β=10.00 β=5.00 norm. rmse = 0.058...0.153 correlation (β,α)=−0.90 (n=4) ∞ β=1.00 e=4.00 norm. |η−ψ | = 0.009...0.114 e=3.00 300 norm. rmse = 0.014...0.160 correlation (e,α)=0.98 (n=4) ∞ e=2.50 250 250 norm. |η−ψ | = 0.043...0.241 ∞ norm. rmse = 0.085...0.315 correlation (γ,α)=1.00 (n=5) 150 tmax [ms] 200 tmax [ms] 200 tmax [ms] γ=5.00 γ=2.50 γ=1.00 γ=0.50 γ=0.25 e=5.00 norm. |η−ψ | = 0.020...0.128 β=2.50 250 200 150 100 150 100 100 50 50 50 0 5 10 15 20 25 30 35 40 45 0 5 50 10 15 20 l/|v| [ms] 25 30 35 40 45 0 5 50 10 15 20 l/|v| [ms] (a) β varies 25 30 35 40 45 50 l/|v| [ms] (b) e varies (c) γ varies ˆ ˆ Figure 3: Each curve shows how the peak ψ∞ ≡ ψ∞ (t) depends on the half-size-to-velocity ratio. In each display, one parameter of ψ∞ is varied (legend), while the others are held constant (figure title). Line slopes vary according to parameter values. Symbol sizes are scaled according to rmse (see also figure 4). Rmse was calculated between normalized ψ∞ (t) & normalized η(t) (i.e. both functions ∈ [0, 1] with original minimum and maximum indicated by the textbox). To this end, the ˆ peak of the η-function was placed at tc , by choosing, at each parameter value, α = |v| · (tc − t)/l (for determining correlation, the mean value of α was taken across l/|v|). tc=500ms, v=2.00m/s ψ∞ → (β varies), γ=3.50, e=3.00, Vinh=−0.001 tc=500ms, v=2.00m/s ψ∞ → β=2.50, γ=3.50, (e varies), Vinh=−0.001 tc=500ms, v=2.00m/s ψ∞ → β=2.50, (γ varies), e=3.00, Vinh=−0.001 0.25 β=5.00 0.12 β=2.50 β=1.00 0.1 0.08 (normalized η, ψ∞) 0.12 β=10.00 (normalized η, ψ∞) (normalized η, ψ∞) 0.14 0.1 0.08 γ=5.00 γ=2.50 0.2 γ=1.00 γ=0.50 γ=0.25 0.15 0.06 0.04 0.02 0 5 10 15 20 25 30 35 40 45 50 meant |η(t)−ψ∞(t)| meant |η(t)−ψ∞(t)| meant |η(t)−ψ∞(t)| 0.06 0.04 e=5.00 e=4.00 e=3.00 0.02 e=2.50 10 l/|v| [ms] 15 20 25 30 35 40 45 50 l/|v| [ms] (a) β varies (b) e varies 0.1 0.05 0 5 10 15 20 25 30 35 40 45 50 l/|v| [ms] (c) γ varies Figure 4: This figure complements figure 3. It visualizes the time averaged absolute difference between normalized ψ∞ (t) & normalized η(t). For η, its value of α was chosen such that the maxima of both functions coincide. Although not being a fit, it gives a rough estimate on how the shape of both curves deviate from each other. The maximum possible difference would be one. version of ψ (i.e. equation 2 with Vrest = 0, Vexc = 1, and equations 3 plugged in), ψ∞ (t) ≡ e ˙ Θ(t) + Vinh [γΘ(t)] e ˙ β + Θ(t) + [γΘ(t)] (4) (Here we use continuous versions of angular size and rate of expansion). The ψ∞ -function makes life easier when it comes to fitting experimental data. However, it has its limitations, because we brushed the whole dynamic of ψ under the carpet. Figure 3 illustrates how the linˆ ear relationship (=“linearity”) between tmax ≡ tc − t and l/|v| is influenced by changes in parameter values. Changing any of the values of e, β, γ predominantly causes variation in line slopes. The smallest slope changes are obtained by varying Vinh (data not shown; we checked Vinh = 0, −0.001, −0.01, −0.1). For Vinh −0.01, linearity is getting slightly compromised, as slope increases with l/|v| (e.g. Vinh = −1 α ∈ [4.2, 4.7]). In order to get a notion about how well the shape of ψ∞ (t) matches η(t), we computed timeaveraged difference measures between normalized versions of both functions (details: figure 3 & 4). Bigger values of β match η better at smaller, but worse at bigger values of l/|v| (figure 4a). Smaller β cause less variation across l/|v|. As to variation of e, overall curve shapes seem to be best aligned with e = 3 to e = 4 (figure 4b). Furthermore, better matches between ψ∞ (t) and η(t) correspond to bigger values of γ (figure 4c). And finally, Vinh marches again to a different tune (data not shown). Vinh = −0.1 leads to the best agreement (≈ 0.04 across l/|v|) of all Vinh , quite different from the other considered values. For the rest, ψ∞ (t) and η(t) align the same (all have maximum 0.094), 5 ˙ (a) Θ = 126o /s ˙ (b) Θ = 63o /s Figure 5: The original data (legend label “HaGaLa95”) were resampled from ref. [10] and show ˙ DCMD responses to an object approach with Θ = const. Thus, Θ increases linearly with time. The η-function (fitting function: Aη(t+δ)+o) and ψ∞ (fitting function: Aψ∞ (t)+o) were fitted to these data: (a) (Figure 3 Di in [10]) Good fits for ψ∞ are obtained with e = 5 or higher (e = 3 R2 = 0.35 and rmse = 0.644; e = 4 R2 = 0.45 and rmse = 0.592). “Psi” adopts a sigmoid-like curve form which (subjectively) appears to fit the original data better than η. (b) (Figure 3 Dii in [10]) “Psi” yields an excellent fit for e = 3. RoHaTo10 gregarious locust LV=0.03s Θ(t), lv=30ms e011pos014 sgolay with 100 t =107ms max ttc=5.00s ψ adj.R2 0.95 (LM:3) ∞ η(t) adj.R2 1 (TR::1) 2 ψ : R =0.95, rmse=0.004, 3 coefficients ∞ → β=2.22, γ=0.70, e=3.00, V =−0.001, A=0.07, o=0.02, δ=0.00ms inh η: R2=1.00, rmse=0.001 → α=3.30, A=0.08, o=0.0, δ=−10.5ms 3.4 3.6 3.8 4 4.2 4.4 4.6 4.8 5 5.2 time [s] (b) α versus β (a) spike trace Figure 6: (a) DCMD activity in response to a black square (l/|v| = 30ms, legend label “e011pos14”, ref. [30]) approaching to the eye center of a gregarious locust (final visual angle 50o ). Data show the first stimulation so habituation is minimal. The spike trace (sampled at 104 Hz) was full wave rectified, lowpass filtered, and sub-sampled to 1ms resolution. Firing rate was estimated with Savitzky-Golay filtering (“sgolay”). The fits of the η-function (Aη(t + δ) + o; 4 coefficients) and ψ∞ -function (Aψ∞ (t) with fixed e, o, δ, Vinh ; 3 coefficients) provide both excellent fits to firing rate. (b) Fitting coefficient α (→ η-function) inversely correlates with β (→ ψ∞ ) when fitting firing rates of another 5 trials as just described (continuous line = line fit to the data points). Similar correlation values would be obtained if e is fixed at values e = 2.5, 4, 5 c = −0.95, −0.96, −0.91. If o was determined by the fitting algorithm, then c = −0.70. No clear correlations with α were obtained for γ. despite of covering different orders of magnitude with Vinh = 0, −0.001, −0.01. Decelerating approach. Hatsopoulos et al. [10] recorded DCMD activity in response to an ap˙ proaching object which projected image edges on the retina moving at constant velocity: Θ = const. ˙ This “linear approach” is perceived as if the object is getting increasingly implies Θ(t) = Θ0 + Θt. slower. But what appears a relatively unnatural movement pattern serves as a test for the functions η & ψ∞ . Figure 5 illustrates that ψ∞ passes the test, and consistently predicts that activity sharply rises in the initial approach phase, and subsequently declines (η passed this test already in the year 1995). 6 Spike traces. We re-sampled about 30 curves obtained from LGMD recordings from a variety of publications, and fitted η & ψ∞ -functions. We cannot show the results here, but in terms of goodness of fit measures, both functions are in the same ballbark. Rather, figure 6a shows a representative example [30]. When α and β are plotted against each other for five trials, we see a strong inverse correlation (figure 6b). Although five data points are by no means a firm statistical sample, the strong correlation could indicate that β and α play similar roles in both functions. Biophysically, β is the leakage conductance, which determines the (passive) membrane time constant τm ∝ 1/β of the neuron. Voltage drops within τm to exp(−1) times its initial value. Bigger values of β mean shorter τm (i.e., “faster neurons”). Getting back to η, this would suggest α ∝ τm , such that higher (absolute) values for α would possibly indicate a slower dynamic of the underlying processes. 5 Discussion (“The Good, the Bad, and the Ugly”) Up to now, mainly two classes of LGMD models existed: The phenomenological η-function on the one hand, and computational models with neuronal layers presynaptic to the LGMD on the other (e.g. [25, 15]; real-world video sequences & robotics: e.g. [3, 14, 32, 2]). Computational models predict that LGMD response features originate from excitatory and inhibitory interactions in – and between – presynaptic neuronal layers. Put differently, non-linear operations are generated in the presynaptic network, and can be a function of many (model) parameters (e.g. synaptic weights, time constants, etc.). In contrast, the η-function assigns concrete nonlinear operations to the LGMD [7]. The η-function is accessible to mathematical analysis, whereas computational models have to be probed with videos or artificial stimulus sequences. The η-function is vague about biophysical parameters, whereas (good) computational models need to be precise at each (model) parameter value. The η-function establishes a clear link between physical stimulus attributes and LGMD activity: It postulates what is to be computed from the optical variables (OVs). But in computational models, such a clear understanding of LGMD inputs cannot always be expected: Presynaptic processing may strongly transform OVs. The ψ function thus represents an intermediate model class: It takes OVs as input, and connects them with biophysical parameters of the LGMD. For the neurophysiologist, the situation could hardly be any better. Psi implements the multiplicative operation of the η-function by shunting inhibition (equation 1: Vexc ≈ Vrest and Vinh ≈ Vrest ). The η-function fits ψ very well according to our dynamical simulations (figure 1), and satisfactory by the approximate criterion of figure 4. We can conclude that ψ implements the η-function in a biophysically plausible way. However, ψ does neither explicitly specify η’s multiplicative operation, nor its exponential function exp(·). Instead we have an interaction between shunting inhibition and a power law (·)e , with e ≈ 3. So what about power laws in neurons? Because of e > 1, we have an expansive nonlinearity. Expansive power-law nonlinearities are well established in phenomenological models of simple cells of the primate visual cortex [1, 11]. Such models approximate a simple cell’s instantaneous firing rate r from linear filtering of a stimulus (say Y ) by r ∝ ([Y ]+ )e , where [·]+ sets all negative values to zero and lets all positive pass. Although experimental evidence favors linear thresholding operations like r ∝ [Y − Ythres ]+ , neuronal responses can behave according to power law functions if Y includes stimulus-independent noise [19]. Given this evidence, the power-law function of the inhibitory input into ψ could possibly be interpreted as a phenomenological description of presynaptic processes. The power law would also be the critical feature by means of which the neurophysiologist could distinguish between the η function and ψ. A study of Gabbiani et al. aimed to provide direct evidence for a neuronal implementation of the η-function [8]. Consequently, the study would be an evidence ˙ for a biophysical implementation of “direct” multiplication via log Θ − αΘ. Their experimental evidence fell somewhat short in the last part, where “exponentation through active membrane conductances” should invert logarithmic encoding. Specifically, the authors observed that “In 7 out of 10 neurons, a third-order power law best described the data” (sixth-order in one animal). Alea iacta est. Acknowledgments MSK likes to thank Stephen M. Rogers for kindly providing the recording data for compiling figure 6. MSK furthermore acknowledges support from the Spanish Government, by the Ramon and Cajal program and the research grant DPI2010-21513. 7 References [1] D.G. Albrecht and D.B. Hamilton, Striate cortex of monkey and cat: contrast response function, Journal of Neurophysiology 48 (1982), 217–237. [2] S. Bermudez i Badia, U. Bernardet, and P.F.M.J. Verschure, Non-linear neuronal responses as an emergent property of afferent networks: A case study of the locust lobula giant movemement detector, PLoS Computational Biology 6 (2010), no. 3, e1000701. [3] M. Blanchard, F.C. Rind, and F.M.J. Verschure, Collision avoidance using a model of locust LGMD neuron, Robotics and Autonomous Systems 30 (2000), 17–38. [4] D.F. Cooke and M.S.A. Graziano, Super-flinchers and nerves of steel: Defensive movements altered by chemical manipulation of a cortical motor area, Neuron 43 (2004), no. 4, 585–593. [5] L. Fogassi, V. Gallese, L. Fadiga, G. Luppino, M. Matelli, and G. Rizzolatti, Coding of peripersonal space in inferior premotor cortex (area f4), Journal of Neurophysiology 76 (1996), 141–157. [6] F. Gabbiani, I. Cohen, and G. Laurent, Time-dependent activation of feed-forward inhibition in a looming sensitive neuron, Journal of Neurophysiology 94 (2005), 2150–2161. [7] F. Gabbiani, H.G. Krapp, N. Hatsopolous, C.H. Mo, C. Koch, and G. Laurent, Multiplication and stimulus invariance in a looming-sensitive neuron, Journal of Physiology - Paris 98 (2004), 19–34. [8] F. Gabbiani, H.G. Krapp, C. Koch, and G. Laurent, Multiplicative computation in a visual neuron sensitive to looming, Nature 420 (2002), 320–324. [9] F. Gabbiani, H.G. Krapp, and G. Laurent, Computation of object approach by a wide-field, motionsensitive neuron, Journal of Neuroscience 19 (1999), no. 3, 1122–1141. [10] N. Hatsopoulos, F. Gabbiani, and G. Laurent, Elementary computation of object approach by a wide-field visual neuron, Science 270 (1995), 1000–1003. [11] D.J. Heeger, Modeling simple-cell direction selectivity with normalized, half-squared, linear operators, Journal of Neurophysiology 70 (1993), 1885–1898. [12] A.L. Hodkin and A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, Journal of Physiology 117 (1952), 500–544. [13] F. Hoyle, The black cloud, Pinguin Books, London, 1957. [14] M.S. Keil, E. Roca-Morena, and A. Rodr´guez-V´ zquez, A neural model of the locust visual system for ı a detection of object approaches with real-world scenes, Proceedings of the Fourth IASTED International Conference (Marbella, Spain), vol. 5119, 6-8 September 2004, pp. 340–345. [15] M.S. Keil and A. Rodr´guez-V´ zquez, Towards a computational approach for collision avoidance with ı a real-world scenes, Proceedings of SPIE: Bioengineered and Bioinspired Systems (Maspalomas, Gran Canaria, Canary Islands, Spain) (A. Rodr´guez-V´ zquez, D. Abbot, and R. Carmona, eds.), vol. 5119, ı a SPIE - The International Society for Optical Engineering, 19-21 May 2003, pp. 285–296. [16] J.G. King, J.Y. Lettvin, and E.R. Gruberg, Selective, unilateral, reversible loss of behavioral responses to looming stimuli after injection of tetrodotoxin or cadmium chloride into the frog optic nerve, Brain Research 841 (1999), no. 1-2, 20–26. [17] C. Koch, Biophysics of computation: information processing in single neurons, Oxford University Press, New York, 1999. [18] D.N. Lee, A theory of visual control of braking based on information about time-to-collision, Perception 5 (1976), 437–459. [19] K.D. Miller and T.W. Troyer, Neural noise can explain expansive, power-law nonlinearities in neuronal response functions, Journal of Neurophysiology 87 (2002), 653–659. [20] Hideki Nakagawa and Kang Hongjian, Collision-sensitive neurons in the optic tectum of the bullfrog, rana catesbeiana, Journal of Neurophysiology 104 (2010), no. 5, 2487–2499. [21] M. O’Shea and C.H.F. Rowell, Projection from habituation by lateral inhibition, Nature 254 (1975), 53– 55. [22] M. O’Shea and J.L.D. Williams, The anatomy and output connection of a locust visual interneurone: the lobula giant movement detector (lgmd) neurone, Journal of Comparative Physiology 91 (1974), 257–266. [23] S. Peron and F. Gabbiani, Spike frequency adaptation mediates looming stimulus selectivity, Nature Neuroscience 12 (2009), no. 3, 318–326. [24] F.C. Rind, A chemical synapse between two motion detecting neurones in the locust brain, Journal of Experimental Biology 110 (1984), 143–167. [25] F.C. Rind and D.I. Bramwell, Neural network based on the input organization of an identified neuron signaling implending collision, Journal of Neurophysiology 75 (1996), no. 3, 967–985. 8 [26] F.C. Rind and P.J. Simmons, Orthopteran DCMD neuron: a reevaluation of responses to moving objects. I. Selective responses to approaching objects, Journal of Neurophysiology 68 (1992), no. 5, 1654–1666. [27] , Orthopteran DCMD neuron: a reevaluation of responses to moving objects. II. Critical cues for detecting approaching objects, Journal of Neurophysiology 68 (1992), no. 5, 1667–1682. [28] , Signaling of object approach by the dcmd neuron of the locust, Journal of Neurophysiology 77 (1997), 1029–1033. [29] , Reply, Trends in Neuroscience 22 (1999), no. 5, 438. [30] S.M. Roger, G.W.J. Harston, F. Kilburn-Toppin, T. Matheson, M. Burrows, F. Gabbiani, and H.G. Krapp, Spatiotemporal receptive field properties of a looming-sensitive neuron in solitarious and gregarious phases of desert locust, Journal of Neurophysiology 103 (2010), 779–792. [31] S.K. Rushton and J.P. Wann, Weighted combination of size and disparity: a computational model for timing ball catch, Nature Neuroscience 2 (1999), no. 2, 186–190. [32] Yue. S., Rind. F.C., M.S. Keil, J. Cuadri, and R. Stafford, A bio-inspired visual collision detection mechanism for cars: Optimisation of a model of a locust neuron to a novel environment, Neurocomputing 69 (2006), 1591–1598. [33] G.R. Schlotterer, Response of the locust descending movement detector neuron to rapidly approaching and withdrawing visual stimuli, Canadian Journal of Zoology 55 (1977), 1372–1376. [34] H. Sun and B.J. Frost, Computation of different optical variables of looming objects in pigeon nucleus rotundus neurons, Nature Neuroscience 1 (1998), no. 4, 296–303. [35] J.R. Tresilian, Visually timed action: time-out for ’tau’?, Trends in Cognitive Sciences 3 (1999), no. 8, 1999. [36] Y. Wang and B.J. Frost, Time to collision is signalled by neurons in the nucleus rotundus of pigeons, Nature 356 (1992), 236–238. [37] J.P. Wann, Anticipating arrival: is the tau-margin a specious theory?, Journal of Experimental Psychology and Human Perceptual Performance 22 (1979), 1031–1048. [38] M. Wicklein and N.J. Strausfeld, Organization and significance of neurons that detect change of visual depth in the hawk moth manduca sexta, The Journal of Comparative Neurology 424 (2000), no. 2, 356– 376. 9

5 0.55997407 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

Author: Yasin Abbasi-yadkori, Csaba Szepesvári, David Tax

Abstract: We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer’s UCB algorithm (Auer, 2002) achieves with high probability constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm for the for linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), Rusmevichientong and Tsitsiklis (2010), Li et al. (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. In both cases, the improvement stems from the construction of smaller confidence sets. For their construction we use a novel tail inequality for vector-valued martingales. 1

6 0.53709006 56 nips-2011-Committing Bandits

7 0.52269399 24 nips-2011-Active learning of neural response functions with Gaussian processes

8 0.51851696 177 nips-2011-Multi-armed bandits on implicit metric spaces

9 0.51799142 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness

10 0.51711136 32 nips-2011-An Empirical Evaluation of Thompson Sampling

11 0.5109998 61 nips-2011-Contextual Gaussian Process Bandit Optimization

12 0.50963163 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

13 0.50824296 109 nips-2011-Greedy Model Averaging

14 0.50737786 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

15 0.5022918 29 nips-2011-Algorithms and hardness results for parallel large margin learning

16 0.50200129 265 nips-2011-Sparse recovery by thresholded non-negative least squares

17 0.50048566 49 nips-2011-Boosting with Maximum Adaptive Sampling

18 0.50042242 186 nips-2011-Noise Thresholds for Spectral Clustering

19 0.50034374 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

20 0.49943316 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction