nips nips2011 nips2011-272 knowledge-graph by maker-knowledge-mining

272 nips-2011-Stochastic convex optimization with bandit feedback

Source: pdf

Author: Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f (x) at any query point x ∈ X . We demonstrate √ a generalization of the ellipsoid algorithm that √ incurs O(poly(d) T ) regret. Since any algorithm has regret at least Ω( T ) on this problem, our algorithm is optimal in terms of the scaling with T . 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 Stochastic convex optimization with bandit feedback Alekh Agarwal Department of EECS UC Berkeley alekh@cs. [sent-1, score-0.224]

2 We demonstrate √ a generalization of the ellipsoid algorithm that √ incurs O(poly(d) T ) regret. [sent-13, score-0.194]

3 Since any algorithm has regret at least Ω( T ) on this problem, our algorithm is optimal in terms of the scaling with T . [sent-14, score-0.311]

4 1 Introduction This paper considers the problem of stochastic convex optimization under bandit feedback which is a generalization of the classical multi-armed bandit problem, formulated by Robbins in 1952. [sent-15, score-0.369]

5 The algorithm repeatedly queries f at points x ∈ X and observes noisy realizations of f (x). [sent-17, score-0.175]

6 For these “non-parametric” bandit problems, the rates of regret (after T queries) are of the form T α , with exponent α approaching 1 for large dimension d. [sent-23, score-0.387]

7 Our main contribution is an algorithm which √ ˜ achieves, with high probability, an O(poly(d) T ) regret after T requests. [sent-25, score-0.288]

8 [4] √ ˜ show a regret of O( T ) for a one-dimensional non-convex problem with ﬁnite number of maximizers. [sent-31, score-0.265]

9 [7] show T regret for a subset of Lipschitz functions with certain metric properties. [sent-34, score-0.265]

10 Convex, Lipschitz cost functions have also been looked at in the adversarial model [10, 12], but the best-known regret bounds for these algorithms are O(T 3/4 ). [sent-35, score-0.265]

11 The problem addressed in this paper is closely related to noisy zeroth order convex optimization, whereby the algorithm queries a point of the domain X and receives a noisy evaluation of the function. [sent-38, score-0.308]

12 A low regret guarantees that the net cost over the entire process is bounded unlike an optimization error bound. [sent-44, score-0.297]

13 In the next section, we give the formal problem setup and highlight diﬀerences between the regret and optimization error settings. [sent-46, score-0.297]

14 Section 4 describes our generalization of the ellipsoid algorithm for d dimensions along with its regret guarantee. [sent-48, score-0.434]

15 2 Setup In this section we will give the basic setup and the performance criterion, and explain the diﬀerences between the metrics of regret and optimization error. [sent-50, score-0.297]

16 , xT ∈ X , the regret of the algorithm compared to any x∗ ∈ arg minx∈X f (x) is RT = T t=1 f (xt ) − f (x∗ ) . [sent-59, score-0.288]

17 optimization error T 1 Since f is convex, the average xT = T t=1 xt satisﬁes f (¯T ) − f (x∗ ) ≤ RT /T so that low ¯ x regret (1) also gives a small optimization error. [sent-64, score-0.364]

18 Guaranteeing a small regret typically involves a more careful balancing of exploration and exploitation. [sent-67, score-0.265]

19 If f is linear indeed, we only incur O(T 1/3 ) regret on these rounds. [sent-71, score-0.265]

20 For the purposes of regret minimization, however, an algorithm has to detect that the function curves between the two sampled points. [sent-74, score-0.307]

21 Finally, if f at all three points appears to be similar at a given scale, we have a certiﬁcate (due to convexity) that the algorithm is not paying regret per query larger than this scale. [sent-78, score-0.37]

22 This center-point device that allows to quickly detect that the optimization method might be paying high regret and to act on this information is the main novel tool of our paper. [sent-79, score-0.349]

23 Their method does not guarantee vanishing regret by itself, and a careful fusion of this algorithm with our center-point device is required. [sent-83, score-0.318]

24 1 Algorithm description Algorithm 1 One-dimensional stochastic convex bandit algorithm input noisy black-box access to f : [0, 1] → R, total number of queries allowed T . [sent-87, score-0.35]

25 4: Let xl := lτ + wτ /4, xc := lτ + wτ /2, and xr := lτ + 3wτ /4. [sent-93, score-0.396]

26 7: For each x ∈ {xl , xc , xr }, query f (x) 2σ log T times. [sent-98, score-0.31]

27 2 γi 8: if max{LBγi (xl ), LBγi (xr )} ≥ min{UBγi (xl ), UBγi (xr )} + γi then 9: {Case 1: CI’s at xl and xr are γi separated} 10: if LBγi (xl ) ≥ LBγi (xr ) then let lτ +1 := xl and rτ +1 := rτ . [sent-99, score-0.465]

28 11: if LBγi (xl ) < LBγi (xr ) then let lτ +1 := lτ and rτ +1 := xr . [sent-100, score-0.173]

29 13: else if max{LBγi (xl ), LBγi (xr )} ≥ UBγi (xc ) + γi then 14: {Case 2: CI’s at xc and xl or xr are γi separated} 15: if LBγi (xl ) ≥ LBγi (xr ) then let lτ +1 := xl and rτ +1 := rτ . [sent-102, score-0.566]

30 16: if LBγi (xl ) < LBγi (xr ) then let lτ +1 := lτ and rτ +1 := xr . [sent-103, score-0.173]

31 18: end if 19: end for 20: end for Algorithm 1 proceeds in a series of epochs demarcated by a working feasible region (the interval Xτ = [lτ , rτ ] in epoch τ ). [sent-105, score-0.66]

32 Each epoch is further subdivided 2 into rounds, where we query the function (2σ log T )/γi times in round i at each of the points. [sent-108, score-0.582]

33 At the end of an epoch τ , Xτ is reduced to a subset Xτ +1 = [lτ +1 , rτ +1 ] ⊂ [lτ , rτ ] of the current region for the next epoch τ + 1, and this reduction is such that the new region is smaller in size by a constant fraction. [sent-111, score-0.927]

34 For the algorithm to identify a sizable portion of Xτ to discard, the queries in each epoch should be suitably chosen, and the convexity of f must be exploited. [sent-113, score-0.578]

35 To this end, the algorithm makes its queries at three equally-spaced points xl < xc < xr in Xτ (see Section 4. [sent-114, score-0.512]

36 Case 1: If the CIs around f (xl ) and f (xr ) are suﬃciently separated, the algorithm discards a fourth of [lτ , rτ ] (to the left of xl or right of xr ) which does not contain x∗ . [sent-116, score-0.366]

37 Case 3: Finally, if none of the earlier cases is true, then the algorithm is assured (by convexity) that the function is suﬃciently ﬂat on Xτ and hence it has not incurred much regret so far . [sent-119, score-0.363]

38 To avoid having probabilities throughout our analysis, we deﬁne an event E where at each epoch τ , and each round i, f (x) ∈ [LBγi (x), UBγi (x)] for x ∈ {xl , xc , xr }. [sent-123, score-0.772]

39 The following theorem bounds the regret incurred by Algorithm 1. [sent-125, score-0.34]

40 We note that the regret would be maintained in terms of the points xt queried by the algorithm at time t. [sent-126, score-0.347]

41 We remark that O( T ) is the smallest possible regret for any algorithm even with noisy gradient information. [sent-136, score-0.327]

42 The key idea is to show that the regret on any epoch is small and the total number of epochs is bounded. [sent-139, score-0.735]

43 To bound the per-epoch regret, we will show that the total number of queries made on any epoch depends on how ﬂat the function is on Xτ . [sent-140, score-0.496]

44 We start by showing that the reduction in Xτ after each epoch always preserves near-optimal points. [sent-142, score-0.427]

45 If epoch τ ends in round i, then [lτ +1 , rτ +1 ] contains every x ∈ [lτ , rτ ] such that f (x) ≤ f (x∗ ) + γi . [sent-145, score-0.545]

46 4 The next two lemmas bound the regret incurred in any single epoch. [sent-147, score-0.34]

47 To show this, we ﬁrst establish that an algorithm incurs low regret in a round as long as it does not end an epoch. [sent-148, score-0.459]

48 Then, as a consequence of the doubling trick, we show that the regret incurred in an epoch is on the same order as that incurred in the last round of the epoch. [sent-149, score-0.957]

49 If epoch τ continues from round i to round i + 1, then −1 the regret incurred in round i is at most 72γi σ log T. [sent-151, score-1.1]

50 If epoch τ ends in round i, then the regret incurred in the −1 entire epoch is at most 216γi σ log T. [sent-153, score-1.288]

51 The ﬁnal regret bound is simply the product of the number of epochs and the regret incurred in any single epoch. [sent-155, score-0.672]

52 Such an approach yields regret and running time that scales exponentially with the dimension d. [sent-161, score-0.265]

53 We deﬁne a pyramid to be a d-dimensional polyhedron deﬁned by d + 1 points; d points form a d-dimensional regular polygon that is the base of the pyramid, and the apex lies above the hyperplane containing the base (see Figure 1 for a graphic illustration in 3 dimensions). [sent-163, score-0.702]

54 rτ APEX x0 ϕ xd+1 h Figure 1: Pyramid in 3-dimensions Rτ x1 Xτ x2 Figure 2: The regular simplex constructed at round i of epoch τ with radius rτ , center x0 and vertices x1 , . [sent-167, score-0.704]

55 At the beginning of epoch τ , we have a current feasible set Xτ which contains at least one approximate optimum of the convex function. [sent-173, score-0.494]

56 The epoch ends with discarding some portion of the set Xτ in such a way that we still retain at least one approximate optimum in the remaining set Xτ +1 . [sent-174, score-0.448]

57 At the start of the epoch τ , we apply an aﬃne transformation to Xτ so that the smallest volume ellipsoid containing it is a Euclidian ball of radius Rτ (denoted as B(Rτ )). [sent-175, score-0.661]

58 Within each epoch, the algorithm proceeds in several rounds, each round maintaining a value γi which is successively halved. [sent-181, score-0.182]

59 5 Algorithm 2 Stochastic convex bandit algorithm input feasible region X ⊂ Rd ; noisy black-box access to f : X → R, constants c1 and c2 , functions ∆τ (γ), ∆τ (γ) and number of queries T allowed. [sent-182, score-0.415]

60 do Construct pyramid Πk with apex yk ; let z1 , . [sent-205, score-0.615]

61 , zd be the vertices of the base of Πk and z0 be the center of Πk . [sent-208, score-0.247]

62 Let center := z0 , apex := yk , top be the vertex v of Πk maximizing LBγ (v), bottom be the vertex v of Πk minimizing LBγ (v). [sent-214, score-0.646]

63 if LBγ (top) ≥ UBγ (bottom) + ∆τ (γ) and LBγ (top) ≥ UBγ (apex) + γ then {Case (1a)} Let yk+1 := top, and immediately continue to pyramid k + 1. [sent-215, score-0.215]

64 else if LBγ (top) ≥ UBγ (bottom) + ∆τ (γ) and LBγ (top) < UBγ (apex) + γ then {Case (1b)} Set (Xτ +1 , Bτ +1 ) = Cone-cutting(Πk , Xτ , Bτ ), and proceed to epoch τ + 1. [sent-216, score-0.447]

65 else if LBγ (top) < UBγ (bottom) + ∆τ (γ) and UBγ (center) < LBγ (bottom) − ∆τ (γ) then {Case (2b)} Set (Xτ +1 , Bτ +1 )= Hat-raising(Πk , Xτ , Bτ ), and proceed to epoch τ + 1. [sent-219, score-0.447]

66 end if end loop end for end for end for Algorithm 3 Cone-cutting input pyramid Π with apex y, (rounded) feasible region Xτ for epoch τ , enclosing ball Bτ 1: Let z1 , . [sent-220, score-1.329]

67 , zd be the vertices of the base of Π, and ϕ the angle at its apex. [sent-223, score-0.239]

68 output new feasible region Xτ +1 and enclosing ellipsoid Bτ +1 . [sent-230, score-0.315]

69 Algorithm 4 Hat-raising input pyramid Π with apex y, (rounded) feasible region Xτ for epoch τ , enclosing ball Bτ . [sent-231, score-1.194]

70 3: Set Π to be the pyramid with apex y and same base as Π. [sent-234, score-0.619]

71 output new feasible region Xτ +1 and enclosing ellipsoid Bτ +1 . [sent-236, score-0.315]

72 6 y1 y1 y1 ϕ z1 z2 x0 y2 x0 y2 x0 y3 Figure 3: Sequence of pyramids constructed by Algorithm 2 Let x0 be the center of the ball B(Rτ ) containing Xτ . [sent-237, score-0.197]

73 At the start of a round i, we construct a regular simplex centered at x0 and contained in B(rτ ). [sent-238, score-0.227]

74 The algorithm queries the function f at all the vertices of the simplex, denoted by x1 . [sent-239, score-0.173]

75 The base of the pyramid consists of vertices z1 , . [sent-256, score-0.273]

76 We note that the construction of such a pyramid is always possible— we take a sphere with y1 − x0 as the diameter, and arrange z1 , . [sent-260, score-0.242]

77 , zd on the boundary of the sphere such that the angle between y1 − x0 and y1 − zi is ϕ. [sent-263, score-0.205]

78 The construction of the pyramid is depicted in Figure 3. [sent-264, score-0.217]

79 , zd as well as the center of the pyramid until the CI’s all shrink to γ. [sent-268, score-0.334]

80 Let top and bottom denote the vertices of the pyramid (including y1 ) with the largest and the smallest function value estimates resp. [sent-269, score-0.339]

81 For consistency, we will also use apex to denote the apex y1 . [sent-270, score-0.806]

82 We then check for one of the following conditions (see Section 5 of the full-length version [2] for graphical illustrations of these cases): (1) If LBγ (top) ≥ UBγ (bottom) + ∆τ (γ), we proceed based on the separation between top and apex CI’s. [sent-271, score-0.493]

83 (2) In this case, we set top to be the apex of the next pyramid, reset γ = 1 and continue the sampling procedure on the next pyramid. [sent-273, score-0.486]

84 In this case, we declare the epoch over and pass the current apex to the cone-cutting step. [sent-275, score-0.806]

85 (2) If LBγ (top) ≤ UBγ (bottom) + ∆τ (γ), then one of the following happens: (a) If UBγ (center) ≥ LBγ (bottom) − ∆τ (γ), then all of the vertices and the center of the pyramid have their function values within a 2∆τ (γ) + 3γ interval. [sent-276, score-0.3]

86 Otherwise, we continue sampling the current pyramid with the new value of γ. [sent-279, score-0.215]

87 (b) If UBγ (center) ≤ LBγ (bottom) − ∆τ (γ), then we terminate the epoch and pass the center and the current apex to the hat-raising step. [sent-280, score-0.869]

88 In this case, we will show that if we move the apex of the pyramid a little from yi to yi , then yi ’s CI is above the top CI while the angle of the new pyramid at yi is not much smaller than ϕ. [sent-282, score-1.002]

89 Letting centeri denote the center of the pyramid, we set yi = yi + (yi − centeri ) and denote the angle at the apex yi by 2ϕ. [sent-283, score-0.687]

90 ¯ 7 B τ +1 Kτ yi ϕ yi ¯ Bτ ϕ z1 z2 Figure 4: Transformation of the pyramid Π in the hat-raising step. [sent-285, score-0.248]

91 The dotted ellipsoid is the new enclosing ellipsoid Bτ +1 . [sent-289, score-0.373]

92 In either case, we have a pyramid with an apex y, base z1 , . [sent-292, score-0.619]

93 , αd > 0, i=1 d αi = 1 : x = y − λ i=1 αi (zi − y)} (3) which is centered at y and a reﬂection of the pyramid around the apex. [sent-299, score-0.18]

94 We set Bτ +1 to be the ellipsoid of minimum volume ¯ containing Bτ \ Kτ and deﬁne Xτ +1 = Xτ ∩ Bτ +1 . [sent-301, score-0.168]

95 The following theorem states our regret guarantee on the performance of Algorithm 2. [sent-311, score-0.265]

96 c2 2 Then with probability at least 1 − 1/T , the regret incurred by the algorithm is bounded by √ 768d3 σ T log2 T 2d2 log d +1 c2 2 4d7 c1 d(d + 1) + c3 c2 2 12c1 d4 + 11 c2 2 √ ˜ = O(d16 T ). [sent-314, score-0.363]

97 Optimal algorithms for online convex optimization with multi-point bandit feedback. [sent-326, score-0.204]

98 Online convex optimization in the bandit setting: gradient descent without a gradient. [sent-385, score-0.204]

99 Modiﬁcations and implementation of the ellipsoid algorithm for linear programming. [sent-389, score-0.169]

100 Gaussian process optimization in the bandit setting: No regret and experimental design. [sent-417, score-0.419]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lb', 0.455), ('epoch', 0.403), ('apex', 0.403), ('ub', 0.286), ('regret', 0.265), ('pyramid', 0.18), ('xr', 0.173), ('ellipsoid', 0.146), ('xl', 0.146), ('bandit', 0.122), ('round', 0.119), ('yudin', 0.097), ('queries', 0.093), ('zd', 0.091), ('enclosing', 0.081), ('xc', 0.077), ('incurred', 0.075), ('nemirovski', 0.074), ('pyramids', 0.073), ('epochs', 0.067), ('zeroth', 0.064), ('center', 0.063), ('ci', 0.061), ('query', 0.06), ('vertices', 0.057), ('angle', 0.055), ('bottom', 0.054), ('convex', 0.05), ('pennysylvania', 0.048), ('top', 0.048), ('region', 0.047), ('feasible', 0.041), ('noisy', 0.039), ('ball', 0.039), ('xd', 0.039), ('su', 0.038), ('lipschitz', 0.038), ('simplex', 0.037), ('construction', 0.037), ('convexity', 0.037), ('base', 0.036), ('continue', 0.035), ('xt', 0.035), ('di', 0.035), ('yi', 0.034), ('zi', 0.034), ('unimodal', 0.033), ('centeri', 0.032), ('yk', 0.032), ('optimization', 0.032), ('device', 0.03), ('poly', 0.03), ('agarwal', 0.03), ('erences', 0.028), ('end', 0.027), ('transformation', 0.027), ('foster', 0.026), ('rounded', 0.026), ('cate', 0.026), ('incurs', 0.025), ('regular', 0.025), ('sphere', 0.025), ('cone', 0.025), ('isotropic', 0.025), ('euclidian', 0.024), ('queried', 0.024), ('discards', 0.024), ('cis', 0.024), ('start', 0.024), ('else', 0.024), ('stochastic', 0.023), ('vertex', 0.023), ('algorithm', 0.023), ('alekh', 0.023), ('certi', 0.023), ('ends', 0.023), ('containing', 0.022), ('rounds', 0.022), ('kakade', 0.022), ('erence', 0.022), ('srinivas', 0.022), ('illustrations', 0.022), ('paying', 0.022), ('rakhlin', 0.022), ('portion', 0.022), ('construct', 0.022), ('proceeds', 0.021), ('aa', 0.021), ('england', 0.021), ('feedback', 0.02), ('proceed', 0.02), ('recognized', 0.02), ('doubling', 0.02), ('realizations', 0.02), ('bubeck', 0.02), ('separated', 0.02), ('discarded', 0.019), ('erent', 0.019), ('successively', 0.019), ('purposes', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 272 nips-2011-Stochastic convex optimization with bandit feedback

Author: Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

2 0.19534433 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

Author: Yasin Abbasi-yadkori, Csaba Szepesvári, David Tax

Abstract: We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modiﬁcation of Auer’s UCB algorithm (Auer, 2002) achieves with high probability constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm for the for linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), Rusmevichientong and Tsitsiklis (2010), Li et al. (2010). Our modiﬁcation improves the regret bound by a logarithmic factor, though experiments show a vast improvement. In both cases, the improvement stems from the construction of smaller conﬁdence sets. For their construction we use a novel tail inequality for vector-valued martingales. 1

3 0.16011849 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14 T and loss √ 2 2 T e− T in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O( n(log N + log T )), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. 1

4 0.14084999 245 nips-2011-Selecting the State-Representation in Reinforcement Learning

Author: Odalric-ambrym Maillard, Daniil Ryabko, Rémi Munos

Abstract: The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a ﬁnite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Without knowing neither which of the models is the correct one, nor what are the probabilistic characteristics of the resulting MDP, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves that, with a regret of order T 2/3 where T is the horizon time. 1

5 0.1286989 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

Author: Yevgeny Seldin, Peter Auer, John S. Shawe-taylor, Ronald Ortner, François Laviolette

Abstract: We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The p scaling of our regret bound with the number of states (contexts) N goes as N I⇢t (S; A), where I⇢t (S; A) is the mutual information between states and actions (the side information) used by the algorithm at round t. If the algorithm p uses all the side information, the regret bound scales as N ln K, where K is the number of actions (arms). However, if the side information I⇢t (S; A) is not fully used, the regret bound is signiﬁcantly tighter. In the extreme case, when I⇢t (S; A) = 0, the dependence on the number of states reduces from linear to logarithmic. Our analysis allows to provide the algorithm large amount of side information, let the algorithm to decide which side information is relevant for the task, and penalize the algorithm only for the side information that it is using de facto. We also present an algorithm for multiarmed bandits with side information with O(K) computational complexity per game round. 1

6 0.10913555 61 nips-2011-Contextual Gaussian Process Bandit Optimization

7 0.10605519 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

8 0.10407932 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

9 0.099238336 80 nips-2011-Efficient Online Learning via Randomized Rounding

10 0.098483019 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

11 0.093419813 229 nips-2011-Query-Aware MCMC

12 0.090279356 145 nips-2011-Learning Eigenvectors for Free

13 0.086339183 32 nips-2011-An Empirical Evaluation of Thompson Sampling

14 0.085127898 25 nips-2011-Adaptive Hedge

15 0.080408819 56 nips-2011-Committing Bandits

16 0.076800719 202 nips-2011-On the Universality of Online Mirror Descent

17 0.07566072 177 nips-2011-Multi-armed bandits on implicit metric spaces

18 0.075424053 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

19 0.071704999 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning

20 0.071656667 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.142), (1, -0.201), (2, -0.031), (3, 0.006), (4, 0.176), (5, -0.032), (6, 0.008), (7, 0.057), (8, 0.07), (9, -0.007), (10, -0.035), (11, 0.009), (12, -0.082), (13, 0.001), (14, 0.054), (15, 0.026), (16, -0.015), (17, -0.015), (18, -0.077), (19, 0.025), (20, -0.031), (21, 0.02), (22, 0.029), (23, -0.055), (24, -0.084), (25, -0.032), (26, 0.057), (27, 0.07), (28, -0.034), (29, -0.012), (30, 0.027), (31, -0.084), (32, -0.067), (33, -0.046), (34, -0.088), (35, -0.014), (36, -0.002), (37, -0.072), (38, 0.033), (39, 0.055), (40, 0.036), (41, -0.041), (42, 0.002), (43, -0.019), (44, -0.06), (45, 0.058), (46, 0.022), (47, -0.04), (48, 0.126), (49, -0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94095588 272 nips-2011-Stochastic convex optimization with bandit feedback

Author: Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

2 0.73727316 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

Author: Yasin Abbasi-yadkori, Csaba Szepesvári, David Tax

3 0.67658526 245 nips-2011-Selecting the State-Representation in Reinforcement Learning

Author: Odalric-ambrym Maillard, Daniil Ryabko, Rémi Munos

4 0.66581404 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

Author: Shie Mannor, Ohad Shamir

Abstract: We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he would have obtained had he chosen some of the other actions. The observation structure is encoded as a graph, where node i is linked to node j if sampling i provides information on the reward of j. This setting naturally interpolates between the well-known “experts” setting, where the decision maker can view all rewards, and the multi-armed bandits setting, where the decision maker can only view the reward of the chosen action. We develop practical algorithms with provable regret guarantees, which depend on non-trivial graph-theoretic properties of the information feedback structure. We also provide partially-matching lower bounds. 1

5 0.65692925 32 nips-2011-An Empirical Evaluation of Thompson Sampling

Author: Olivier Chapelle, Lihong Li

Abstract: Thompson sampling is one of oldest heuristic to address the exploration / exploitation trade-off, but it is surprisingly unpopular in the literature. We present here some empirical results using Thompson sampling on simulated and real data, and show that it is highly competitive. And since this heuristic is very easy to implement, we argue that it should be part of the standard baselines to compare against. 1

6 0.63712949 220 nips-2011-Prediction strategies without loss

7 0.61174202 61 nips-2011-Contextual Gaussian Process Bandit Optimization

8 0.60368317 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

9 0.55988616 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

10 0.50706124 25 nips-2011-Adaptive Hedge

11 0.48649487 56 nips-2011-Committing Bandits

12 0.48226738 177 nips-2011-Multi-armed bandits on implicit metric spaces

13 0.4805834 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

14 0.47014606 229 nips-2011-Query-Aware MCMC

15 0.44396421 80 nips-2011-Efficient Online Learning via Randomized Rounding

16 0.44038665 145 nips-2011-Learning Eigenvectors for Free

17 0.42040765 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning

18 0.41554606 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo

19 0.40412223 231 nips-2011-Randomized Algorithms for Comparison-based Search

20 0.39351273 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.027), (4, 0.02), (20, 0.02), (26, 0.025), (31, 0.05), (43, 0.046), (45, 0.582), (57, 0.016), (74, 0.031), (79, 0.014), (83, 0.025), (99, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99678922 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data

Author: Youwei Zhang, Laurent E. Ghaoui

Abstract: Sparse PCA provides a linear combination of small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This comes from a rigorous feature elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows for many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than the existing ﬁrst-order ones. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how Sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative approach to topic models. 1

2 0.99526906 63 nips-2011-Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

Author: Mark Schmidt, Nicolas L. Roux, Francis R. Bach

Abstract: We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen ﬁxed error level on a set of structured sparsity problems. 1

3 0.99192852 28 nips-2011-Agnostic Selective Classification

Author: Yair Wiener, Ran El-Yaniv

Abstract: For a learning problem whose associated excess loss class is (β, B)-Bernstein, we show that it is theoretically possible to track the same classiﬁcation performance of the best (unknown) hypothesis in our class, provided that we are free to abstain from prediction in some region of our choice. The (probabilistic) volume of this √ rejected region of the domain is shown to be diminishing at rate O(Bθ( 1/m)β ), where θ is Hanneke’s disagreement coefﬁcient. The strategy achieving this performance has computational barriers because it requires empirical error minimization in an agnostic setting. Nevertheless, we heuristically approximate this strategy and develop a novel selective classiﬁcation algorithm using constrained SVMs. We show empirically that the resulting algorithm consistently outperforms the traditional rejection mechanism based on distance from decision boundary. 1

same-paper 4 0.99174666 272 nips-2011-Stochastic convex optimization with bandit feedback

Author: Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

5 0.98664308 143 nips-2011-Learning Anchor Planes for Classification

Author: Ziming Zhang, Lubor Ladicky, Philip Torr, Amir Saffari

Abstract: Local Coordinate Coding (LCC) [18] is a method for modeling functions of data lying on non-linear manifolds. It provides a set of anchor points which form a local coordinate system, such that each data point on the manifold can be approximated by a linear combination of its anchor points, and the linear weights become the local coordinate coding. In this paper we propose encoding data using orthogonal anchor planes, rather than anchor points. Our method needs only a few orthogonal anchor planes for coding, and it can linearize any (α, β, p)-Lipschitz smooth nonlinear function with a ﬁxed expected value of the upper-bound approximation error on any high dimensional data. In practice, the orthogonal coordinate system can be easily learned by minimizing this upper bound using singular value decomposition (SVD). We apply our method to model the coordinates locally in linear SVMs for classiﬁcation tasks, and our experiment on MNIST shows that using only 50 anchor planes our method achieves 1.72% error rate, while LCC achieves 1.90% error rate using 4096 anchor points. 1

6 0.98208606 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison

7 0.98137784 293 nips-2011-Understanding the Intrinsic Memorability of Images

8 0.9461149 190 nips-2011-Nonlinear Inverse Reinforcement Learning with Gaussian Processes

9 0.91063136 19 nips-2011-Active Classification based on Value of Classifier

10 0.89816487 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers

11 0.89198846 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition

12 0.88788706 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

13 0.88777643 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing

14 0.88750088 220 nips-2011-Prediction strategies without loss

15 0.88393813 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time

16 0.88330972 78 nips-2011-Efficient Methods for Overlapping Group Lasso

17 0.88273036 182 nips-2011-Nearest Neighbor based Greedy Coordinate Descent

18 0.87311035 161 nips-2011-Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation

19 0.87045264 187 nips-2011-Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning

20 0.86893177 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model