nips nips2013 nips2013-240 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sasha Rakhlin, Karthik Sridharan
Abstract: We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). This addresses a question of Daskalakis et al [6]. Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem.
Reference: text
sentIndex sentText sentNum sentScore
1 Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. [sent-1, score-0.192]
2 Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). [sent-3, score-0.516]
3 We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem. [sent-6, score-0.108]
4 1 Introduction Recently, no-regret algorithms have received increasing attention in a variety of communities, including theoretical computer science, optimization, and game theory [3, 1]. [sent-7, score-0.078]
5 The wide applicability of these algorithms is arguably due to the black-box regret guarantees that hold for arbitrary sequences. [sent-8, score-0.135]
6 However, such regret guarantees can be loose if the sequence being encountered is not “worst-case”. [sent-9, score-0.166]
7 For instance, in some applications of online methods, the sequence comes from an additional computation done by the learner, thus being far from arbitrary. [sent-11, score-0.06]
8 One way to formally capture the partially benign nature of data is through a notion of predictable sequences [11]. [sent-12, score-0.229]
9 First, we show that the Mirror Prox method [9], designed for optimizing non-smooth structured saddle-point problems, can be viewed as an instance of the predictable sequence approach. [sent-14, score-0.215]
10 Predictability in this case is due precisely to smoothness of the inner optimization part and the saddle-point structure of the problem. [sent-15, score-0.058]
11 Second, we address the question raised in [6] about existence of “simple” algorithms that converge at the rate of Õ(T^{-1}) when employed in an uncoupled manner by players in a zero-sum finite matrix game, yet maintain the usual O(T^{-1/2}) rate against arbitrary sequences. [sent-17, score-0.413]
12 We give a positive answer and exhibit a fully adaptive algorithm that does not require the prior knowledge of whether the other player is collaborating. [sent-18, score-0.341]
13 Here, the additional predictability comes from the fact that both players attempt to converge to the minimax value. [sent-19, score-0.411]
14 We also tackle a partial information version of the problem where the player has only access to the real-valued payoff of the mixed actions played by the two players on each round rather than the entire vector. [sent-20, score-0.785]
15 Our third application is to convex programming: optimization of a linear function subject to convex constraints. [sent-21, score-0.164]
16 This problem often arises in theoretical computer science, and we show that the idea of predictable sequences can be used here too. [sent-22, score-0.212]
17 2 Online Learning with Predictable Gradient Sequences Let us describe the online convex optimization (OCO) problem and the basic algorithm studied in [4, 11]. [sent-24, score-0.126]
18 On round t = 1, . . . , T, the learner makes a prediction ft ∈ F and observes a convex function Gt on F. [sent-29, score-0.687]
19 The objective is to keep the regret (1/T) ∑_{t=1}^T [Gt(ft) − Gt(f∗)] small for any f∗ ∈ F. [sent-30, score-0.116]
20 Suppose that at the beginning of every round t, the learner has access to Mt , a vector computable based on the past observations or side information. [sent-35, score-0.079]
21 The method adheres to the OCO protocol since Mt is available at the beginning of round t, and ∇Gt (ft ) becomes available after the prediction ft is made. [sent-37, score-0.635]
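To make the protocol concrete, the following is a minimal sketch of the two-step Optimistic Mirror Descent update in the Euclidean case, where DR is the squared Euclidean distance and both argmin steps reduce to projected gradient steps; the box constraint, the quadratic loss, and all names below are illustrative choices of ours, not part of the paper.

```python
import numpy as np

def project_box(v, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^d (our illustrative choice of F)
    return np.clip(v, lo, hi)

def optimistic_mirror_descent(grad_fn, predict_fn, d, T, eta=0.1):
    """Optimistic Mirror Descent with R(f) = 1/2 ||f||^2, so Bregman steps are projections.

    grad_fn(f)   -> the gradient of G_t at f_t, revealed after f_t is played
    predict_fn() -> the predictable element M_t, available at the start of round t
    """
    g = np.zeros(d)                              # secondary sequence g_{t-1}
    iterates = []
    for _ in range(T):
        M = predict_fn()                         # M_t is known before playing
        f = project_box(g - eta * M)             # f_t = argmin eta<f, M_t> + D_R(f, g_{t-1})
        grad = grad_fn(f)                        # gradient observed after the prediction
        g = project_box(g - eta * grad)          # g_t = argmin eta<g, grad> + D_R(g, g_{t-1})
        iterates.append(f)
    return np.array(iterates)

# toy usage: quadratic losses, with the previously observed gradient used as M_t
target = np.linspace(-0.5, 0.5, 5)
state = {"last_grad": np.zeros(5)}

def grad_fn(f):
    state["last_grad"] = f - target              # gradient of 1/2 ||f - target||^2
    return state["last_grad"]

iterates = optimistic_mirror_descent(grad_fn, lambda: state["last_grad"], d=5, T=200)
print("final iterate:", np.round(iterates[-1], 3))
```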
22 Let R ∶ B → R be a 1-strongly convex function on F with respect to some norm ‖·‖, and let ‖·‖∗ denote the dual norm. [sent-41, score-0.083]
23 When applying the lemma, we will often use the simple fact that ‖∇t − Mt‖∗ ‖gt − ft‖ = inf_{ρ>0} { (ρ/2)‖∇t − Mt‖∗² + (1/(2ρ))‖gt − ft‖² }. (3) [sent-43, score-1.23]
24 In particular, by setting ρ = η, we obtain the (unnormalized) regret bound of η⁻¹R² + (η/2) ∑_{t=1}^T ‖∇t − Mt‖∗², which is R √(2 ∑_{t=1}^T ‖∇t − Mt‖∗²) by choosing η optimally. [sent-44, score-0.116]
25 Then the regret of the Optimistic Mirror Descent algorithm is upper bounded by 3.5 R (∑_{t=1}^T ‖∇t − Mt‖∗²)^{1/2}. [sent-47, score-0.116]
26 These results indicate that tighter regret bounds are possible if one can guess the next gradient ∇t by computing Mt. [sent-49, score-0.139]
27 One such case arises in offline optimization of a smooth function, whereby the previous gradient turns out to be a good proxy for the next one. [sent-50, score-0.135]
28 In this optimization setting, no guessing of Mt is needed: we may simply query the oracle for the gradient and set Mt = ∇G(gt−1 ). [sent-52, score-0.055]
29 The Optimistic Mirror Descent then becomes ft = argmin_{f∈F} { ηt ⟨f, ∇G(gt−1)⟩ + DR(f, gt−1) }, gt = argmin_{g∈F} { ηt ⟨g, ∇G(ft)⟩ + DR(g, gt−1) }, which can be recognized as the Mirror Prox method, due to Nemirovski [9]. [sent-53, score-1.05]
30 Combining (3) with ρ = η = 1/H immediately yields a bound ∑_{t=1}^T [G(ft) − G(f∗)] ≤ HR², which implies that the average f̄T = (1/T) ∑_{t=1}^T ft satisfies G(f̄T) − G(f∗) ≤ HR²/T, a known bound for Mirror Prox. [sent-56, score-0.577]
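As a concrete instance, here is a sketch of Mirror Prox with the (negative) entropy mirror map over the probability simplex, where both argmin steps become multiplicative updates; the objective, step size, and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def entropy_md_step(g_prev, grad, eta):
    # argmin over the simplex of  eta*<f, grad> + KL(f || g_prev)  =>  multiplicative update
    w = g_prev * np.exp(-eta * grad)
    return w / w.sum()

def mirror_prox(grad_G, d, T, eta):
    g = np.full(d, 1.0 / d)                          # secondary sequence, started at uniform
    running_sum = np.zeros(d)
    for _ in range(T):
        f = entropy_md_step(g, grad_G(g), eta)       # extra-gradient step: M_t = grad G(g_{t-1})
        g = entropy_md_step(g, grad_G(f), eta)       # secondary update uses grad G(f_t)
        running_sum += f
    return running_sum / T                           # averaged trajectory, as in the bound above

# toy smooth objective on the simplex: G(f) = 1/2 ||f - c||^2 with c inside the simplex
c = np.array([0.7, 0.2, 0.1, 0.0])
f_bar = mirror_prox(lambda f: f - c, d=4, T=500, eta=0.5)
print("averaged iterate:", np.round(f_bar, 3))      # should be close to c
```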
31 We now extend this result to arbitrary α-Hölder smooth functions, that is, convex functions G such that ‖∇G(f) − ∇G(g)‖∗ ≤ H‖f − g‖^α for all f, g ∈ F. [sent-57, score-0.165]
32 Let F be a convex set in a Banach space B and let R ∶ B → R be a 1-strongly convex function on F with respect to some norm ‖·‖. [sent-59, score-0.149]
33 Let G be a convex α-Hölder smooth function with constant H > 0 and α ∈ [0, 1]. [sent-60, score-0.742]
34 Then the average f̄T = (1/T) ∑_{t=1}^T ft of the trajectory given by the Optimistic Mirror Descent algorithm enjoys G(f̄T) − inf_{f∈F} G(f) ≤ 8HR^{1+α} / T^{(1+α)/2}, where R ≥ 0 is such that sup_{f∈F} DR(f, g0) ≤ R². [sent-61, score-0.076]
35 This result provides a smooth interpolation between the T^{-1/2} rate at α = 0 (that is, no predictability of the gradient is possible) and the T^{-1} rate when the smoothness structure allows for a dramatic speedup with a very simple modification of the original Mirror Descent. [sent-62, score-0.212]
36 3 Structured Optimization In this section we consider the structured optimization problem argmin_{f∈F} G(f), where G(f) is of the form G(f) = sup_{x∈X} φ(f, x) with φ(·, x) convex for every x ∈ X and φ(f, ·) concave for every f ∈ F. [sent-63, score-0.172]
37 While G itself need not be smooth, it has been recognized that the structure can be exploited to improve rates of optimization if the function φ is smooth [10]. [sent-65, score-0.136]
38 From the point of view of online learning, we will see that the optimization problem of the saddle point type can be solved by playing two online convex optimization algorithms against each other (henceforth called Players I and II). [sent-66, score-0.267]
39 Player I produces a sequence f1, . . . , fT by using a regret-minimization algorithm, such that (1/T) ∑_{t=1}^T φ(ft, xt) − inf_{f∈F} (1/T) ∑_{t=1}^T φ(f, xt) ≤ Rate¹(x1, . . . , xT). [sent-70, score-0.41]
40 Similarly, Player II produces a sequence x1, . . . , xT with (1/T) ∑_{t=1}^T (−φ(ft, xt)) − inf_{x∈X} (1/T) ∑_{t=1}^T (−φ(ft, x)) ≤ Rate²(f1, . . . , fT). [sent-76, score-0.243]
41 Then (see e.g. [7]) inf_f (1/T) ∑_{t=1}^T φ(f, xt) ≤ inf_f φ(f, x̄T) ≤ sup_x inf_f φ(f, x) ≤ inf_f sup_x φ(f, x) ≤ sup_x φ(f̄T, x) ≤ sup_x (1/T) ∑_{t=1}^T φ(ft, x), where f̄T = (1/T) ∑_{t=1}^T ft and x̄T = (1/T) ∑_{t=1}^T xt. [sent-82, score-1.52]
42 By adding (4) and (5), we have sup_{x∈X} (1/T) ∑_{t=1}^T φ(ft, x) − inf_{f∈F} (1/T) ∑_{t=1}^T φ(f, xt) ≤ Rate¹(x1, . . . , xT) + Rate²(f1, . . . , fT), (6) [sent-83, score-0.243]
43 which sandwiches the previous sequence of inequalities up to the sum of regret rates and implies near-optimality of f̄T and x̄T. [sent-89, score-0.17]
44 Suppose both players employ the Optimistic Mirror Descent algorithm with, respectively, predictable sequences Mt1 and Mt2, and 1-strongly convex functions R1 on F (w.r.t. a norm ‖·‖F) and R2 on X (w.r.t. a norm ‖·‖X). [sent-91, score-0.583]
45 Let {ft} and {xt} denote the primary sequences of the players, and let {gt}, {yt} denote the secondary ones. [sent-98, score-0.353]
46 We obtain the following corollary (Corollary 5), assuming φ ∶ F × X → R is Hölder smooth in the appropriate sense. [sent-101, score-0.099]
47 Suppose both players employ Optimistic Mirror Descent with Mt1 = ∇f φ(gt−1, yt−1) and Mt2 = ∇x φ(gt−1, yt−1), where {gt} and {yt} are the secondary sequences updated by the two algorithms, and with step sizes η = η′ = (R1² + R2²)^{(1−α)/2} (2H)^{-1} T^{(α−1)/2}. [sent-104, score-0.353]
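A minimal sketch of this coupled scheme in the Euclidean case: each player's prediction is the gradient evaluated at the pair of secondary iterates, exactly as in Mt1 = ∇f φ(gt−1, yt−1) and Mt2 = ∇x φ(gt−1, yt−1). The particular φ, the ball constraints, and the step size below are assumptions made for illustration.

```python
import numpy as np

def proj_ball(v, radius=1.0):
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def saddle_point_omd(grad_f, grad_x, d1, d2, T, eta):
    """Two Optimistic MD players: I minimizes phi over f, II maximizes phi over x."""
    g, y = np.zeros(d1), np.zeros(d2)                # secondary sequences
    f_sum, x_sum = np.zeros(d1), np.zeros(d2)
    for _ in range(T):
        f = proj_ball(g - eta * grad_f(g, y))        # uses M_t^1 = grad_f phi(g_{t-1}, y_{t-1})
        x = proj_ball(y + eta * grad_x(g, y))        # uses M_t^2 = grad_x phi(g_{t-1}, y_{t-1})
        g = proj_ball(g - eta * grad_f(f, x))        # secondary updates use the played pair
        y = proj_ball(y + eta * grad_x(f, x))
        f_sum += f; x_sum += x
    return f_sum / T, x_sum / T                      # averaged strategies (near the saddle point)

# illustrative smooth convex-concave phi(f, x) = f.Ax + 0.5||f||^2 - 0.5||x||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) / 3
f_bar, x_bar = saddle_point_omd(
    grad_f=lambda f, x: A @ x + f,
    grad_x=lambda f, x: A.T @ f - x,
    d1=3, d2=3, T=2000, eta=0.2)
print("f_bar:", np.round(f_bar, 3), " x_bar:", np.round(x_bar, 3))
```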
48 Such a coupling of the upper bounds on regret of the two players can be seen as leading to faster rates under the appropriate assumptions, and this idea will be exploited to a great extent in the proofs of the next section. [sent-106, score-0.443]
49 4 Zero-sum Game and Uncoupled Dynamics The notions of a zero-sum matrix game and a minimax equilibrium are arguably the most basic and important notions of game theory. [sent-107, score-0.277]
50 The tight connection between linear programming and minimax equilibrium suggests that there might be simple dynamics that can lead the two players of the game to eventually converge to the equilibrium value. [sent-108, score-0.585]
51 Existence of such simple or natural dynamics is of interest in behavioral economics, where one asks whether agents can discover static solution concepts of the game iteratively and without extensive communication. [sent-109, score-0.1]
52 The two players aim to find a pair of near-optimal mixed strategies (f̄, x̄) ∈ Δn × Δm such that f̄ᵀA x̄ is close to the minimax value min_{f∈Δn} max_{x∈Δm} fᵀA x, where Δn is the probability simplex over n actions. [sent-111, score-0.446]
53 It is well-known (and follows immediately from (6)) that the players can compute near-optimal strategies by simply playing no-regret algorithms [7]. [sent-113, score-0.375]
54 More precisely, on round t, the players I and II “predict” the mixed strategies ft and xt and observe Axt and ftᵀA, respectively. [sent-114, score-1.201]
55 While black-box regret minimization algorithms, such as Exponential Weights, immediately yield O(T^{-1/2}) convergence rates, Daskalakis et al [6] asked whether faster methods exist. [sent-115, score-0.156]
56 The authors of [6] exhibited a near-optimal algorithm that, if used by both players, yields a pair of mixed strategies that constitutes an O( log(m+n)(log T + (log(m+n))^{3/2}) / T )-approximate minimax equilibrium. [sent-117, score-0.141]
57 Furthermore, the method has a regret bound of the same order as Exponential Weights when faced with an arbitrary sequence. [sent-118, score-0.135]
58 Furthermore, by choosing the step size adaptively, the same method guarantees the typical O(T^{-1/2}) regret if not faced with a compliant player, thus ensuring robustness. [sent-125, score-0.135]
59 In Section 4.1, we analyze the “first-order information” version of the problem, as described above: upon playing the respective mixed strategies ft and xt on round t, Player I observes Axt and Player II observes ftᵀA. [sent-127, score-0.961]
60 In Section 4.2, we consider an interesting extension to partial information, whereby the players submit their moves ft, xt but only observe the real value ftᵀAxt. [sent-129, score-1.108]
61 Other than the “mixing in” of the uniform distribution, the algorithm for both players is simply the Optimistic Mirror Descent with the (negative) entropy function. [sent-134, score-0.305]
62 In fact, the step of mixing in the uniform distribution is only needed when some coordinate of gt (resp. yt) becomes too small. [sent-135, score-0.359]
63 Furthermore, this step is also not needed if neither player deviates from the prescribed method. [sent-137, score-0.326]
64 In such a case, the resulting algorithm is simply the constant step-size Exponential Weights ft(i) ∝ exp{ −η ∑_{s=1}^{t−2} [Axs]_i − 2η[Axt−1]_i }, but with a factor 2 in front of the latest loss vector! [sent-138, score-0.577]
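When both players follow the method, the dynamics above amount to the following self-play sketch (uniform mixing omitted, fixed step size); the random payoff matrix, the step size, and the duality-gap printout are our own illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def optimistic_exp_weights_selfplay(A, T, eta):
    n, m = A.shape
    Lf, Lx = np.zeros(n), np.zeros(m)               # cumulative loss (I) / gain (II) vectors
    last_col, last_row = np.zeros(n), np.zeros(m)   # Ax_{t-1} and A^T f_{t-1}
    f_sum, x_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # the latest observed vector is counted twice: the "factor 2" optimism described above
        zf = Lf + last_col
        f = np.exp(-eta * (zf - zf.min())); f /= f.sum()   # shift only for numerical stability
        zx = Lx + last_row
        x = np.exp(eta * (zx - zx.max())); x /= x.sum()
        last_col, last_row = A @ x, A.T @ f          # Player I sees Ax_t, Player II sees f_t^T A
        Lf += last_col; Lx += last_row
        f_sum += f; x_sum += x
    return f_sum / T, x_sum / T

rng = np.random.default_rng(1)
A = rng.random((4, 5))
f_bar, x_bar = optimistic_exp_weights_selfplay(A, T=5000, eta=0.1)
duality_gap = np.max(A.T @ f_bar) - np.min(A @ x_bar)   # gap of the averaged strategies
print("duality gap:", round(float(duality_gap), 4))
```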
65 Furthermore, if only one player (say, Player I) follows the above algorithm, her regret against any sequence x1, . . . , xT of moves by Player II is bounded as in (9). [sent-142, score-0.46]
66 In particular, this implies the worst-case regret of O(√(log(nT)/T)) in the general setting of online linear optimization. [sent-146, score-0.144]
67 We remark that (9) can give intermediate rates for regret in the case that the second player deviates from the prescribed strategy but produces “stable” moves. [sent-147, score-0.494]
68 For instance, if the second player employs a mirror descent algorithm (or Follow the Regularized Leader / Exponential Weights method) with step size η, one can typically show stability ‖xt − xt−1‖ = O(η). [sent-148, score-0.539]
69 In this case, (9) yields the rate O(η log T / √T) for the first player. [sent-149, score-0.072]
70 A typical setting of η ∝ T^{-1/2} for the second player still ensures the O((log T)/T) regret for the first player. [sent-150, score-0.428]
71 The reason for the extra step of “mixing in” the uniform distribution stems from the goal of having an adaptive and robust method that still attains O(T^{-1/2}) regret if the other player deviates from using the algorithm. [sent-152, score-0.502]
72 If one is only interested in the dynamics when both players cooperate, this step is not necessary, and in this case the extraneous log T factor disappears from the above bound, leading to the O((log n + log m)/T) convergence. [sent-153, score-0.434]
73 It is possible that the doubling trick or the analysis of Auer et al [2] (who encountered the same problem for the Exponential Weights algorithm) can remove the extra log T factor while still preserving the regret minimization property. [sent-156, score-0.243]
74 We also remark that Rmax is small when R1 is instead the p-norm; hence, the use of this regularizer avoids the extraneous logarithmic in T factor while still preserving the logarithmic dependence on n and m. [sent-157, score-0.059]
75 Recall that the matrix A is not known to the players, yet we are interested in finding ε-optimal minimax strategies. [sent-161, score-0.066]
76 On each round, the two players choose mixed strategies ft ∈ Δn and xt ∈ Δm, respectively, and observe ftᵀAxt. [sent-162, score-1.143]
77 Now the question is, how many such observations do we need to get to an ε-optimal minimax strategy? [sent-163, score-0.066]
78 The specific setting we consider below requires that on each round t, the two players play four times, and that these four plays are close to each other in ℓ1 (that is, ‖ft^i − ft^j‖1 ≤ β for i, j ∈ {1, . . . , 4}, for a small parameter β). [sent-165, score-0.42]
79 Interestingly, up to logarithmic factors, the fast rate of the previous section is possible even in this scenario, but we do require the knowledge of the number of actions of the opposing player (or, an upper bound on this number). [sent-169, score-0.383]
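To illustrate why a handful of nearby plays per round can substitute for first-order feedback, here is a simple randomized estimator of Axt built from two scalar payoff queries at ℓ1-close strategies; this is an illustration of ours, not the estimator used in the paper (which uses four plays and a different construction).

```python
import numpy as np

def estimate_Ax(payoff, f, n, delta, rng):
    """Unbiased estimate of the vector Ax from two scalar payoff queries (illustration only)."""
    i = rng.integers(n)
    e_i = np.zeros(n); e_i[i] = 1.0
    p0 = payoff(f)                                 # f^T A x
    p1 = payoff((1 - delta) * f + delta * e_i)     # still a valid mixed strategy, 2*delta-close in l1
    coord = (p1 - (1 - delta) * p0) / delta        # equals [Ax]_i exactly (no noise in this toy)
    est = np.zeros(n); est[i] = n * coord          # importance weighting over the random coordinate
    return est

rng = np.random.default_rng(3)
A = rng.random((4, 5)); x = np.full(5, 0.2); f = np.full(4, 0.25)
payoff = lambda g: float(g @ A @ x)
est = np.mean([estimate_Ax(payoff, f, 4, 0.1, rng) for _ in range(20000)], axis=0)
print("true Ax:", np.round(A @ x, 3), " mean estimate:", np.round(est, 3))
```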
80 We leave open the question of whether one can attain the 1/T-type rate with only one play per round. [sent-170, score-0.068]
81 Furthermore, if only one player (say, Player I) follows the above algorithm, her regret against any sequence x1, . . . , xT of moves by Player II is controlled analogously. [sent-182, score-0.46]
82 5 Approximate Smooth Convex Programming In this section we show how one can use the structured optimization results from Section 3 for approximately solving convex programming problems. [sent-187, score-0.159]
83 The problem is to maximize a linear objective cᵀf over f ∈ G subject to ∀i ∈ [d], Gi(f) ≤ 1, (10) where G is a convex set and each Gi is an H-smooth convex function. [sent-190, score-0.132]
84 The convex programming problem in (10) can now be reformulated as argmin_{f∈F} max_{i∈[d]} Gi(f) = argmin_{f∈F} sup_{x∈Δd} ∑_{i=1}^d x(i) Gi(f). (11) [sent-193, score-0.279]
85 We may think of the first player as aiming to minimize the above expression over F, while the second player maximizes over a mixture of constraints with the aim of violating at least one of them. [sent-195, score-0.624]
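A hedged sketch of this two-player view: the first player takes projected gradient steps on f against the current mixture of constraints, while the second runs Exponential Weights over the d constraints. The constraint functions, the set G, the step sizes, and the omission of the predictable terms Mt1, Mt2 from Lemma 8 are all simplifications for illustration, not the paper's exact procedure.

```python
import numpy as np

def min_max_constraints(constraint_grads, constraint_vals, proj, f0, d, T, eta_f, eta_x):
    """Approximately solve  min_f max_i G_i(f)  via a gradient player vs. an exp-weights player."""
    f = f0.copy()
    x = np.full(d, 1.0 / d)                           # mixture over the d constraints
    f_sum = np.zeros_like(f0)
    for _ in range(T):
        mixed_grad = sum(x[i] * constraint_grads[i](f) for i in range(d))
        f = proj(f - eta_f * mixed_grad)              # Player I: descend on sum_i x(i) G_i(f)
        vals = constraint_vals(f)                     # (G_1(f), ..., G_d(f))
        w = np.exp(eta_x * (vals - vals.max()))       # Player II: up-weight violated constraints
        x = w / w.sum()
        f_sum += f
    return f_sum / T

# toy instance: G_i(f) = (a_i . f)^2 and G = the unit Euclidean ball
rng = np.random.default_rng(2)
a = rng.standard_normal((3, 4))
constraint_grads = [lambda f, ai=ai: 2 * (ai @ f) * ai for ai in a]
constraint_vals = lambda f: (a @ f) ** 2
proj = lambda f: f / max(1.0, float(np.linalg.norm(f)))
f_hat = min_max_constraints(constraint_grads, constraint_vals, proj,
                            f0=np.ones(4), d=3, T=1000, eta_f=0.1, eta_x=0.5)
print("max constraint value at f_hat:", round(float(np.max(constraint_vals(f_hat))), 4))
```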
86 Consider the solution f̂T = (1 − α)f̄T + αf0, where α ∈ (0, 1) depends on ε and f̄T = (1/T) ∑_{t=1}^T ft ∈ F is the average of the trajectory of the procedure in Lemma 4 for the optimization problem (11). [sent-200, score-0.609]
87 Set the step size as η = argmin_{η ≤ H⁻¹} [ B²/η + (η log d)/(1 − ηH) ], with η′ chosen accordingly, and take Mt1 = ∑_{i=1}^d yt−1(i)∇Gi(gt−1) and Mt2 = (G1(gt−1), . . . , Gd(gt−1)). [sent-203, score-0.096]
88 Let the number of iterations T satisfy T > (1/ε) inf_{η ≤ H⁻¹} [ B²/η + (η log d)/(1 − ηH) ]; we then have that f̂T ∈ G satisfies all d constraints and is ε-approximate, that is, cᵀf̂T ≥ (1 − ε)F∗. [sent-207, score-0.117]
89 Lemma 8 tells us that using the predictable sequences approach for the two players, one can obtain an ε-approximate solution to the smooth convex programming problem in a number of iterations at most of order 1/ε. [sent-208, score-0.381]
90 Here T1 (resp. T2) is the time complexity for a single update of the predictable sequence algorithm of Player I (resp. Player II). [sent-210, score-0.213]
91 The Max Flow problem can be seen as an instance of a convex (linear) programming problem, and we apply the proposed algorithm for structured optimization to obtain an approximate solution. [sent-215, score-0.159]
92 Applying the procedure for smooth convex programming from Lemma 8 to the Max Flow problem with f0 = 0 ∈ G (the zero flow), the time complexity to compute an ε-approximate Max Flow is bounded by O( d^{3/2} √(log d) / ε ). [sent-223, score-0.21]
93 This time complexity matches the known result from [8], but with a much simpler procedure (gradient descent for the flow player and Exponential Weights for the constraints). [sent-224, score-0.365]
94 As we showed, the notion of using extra information about the sequence is a powerful tool with applications in optimization, convex programming, and game theory, to name a few. [sent-228, score-0.215]
95 All the applications considered in this paper, however, used some notion of smoothness for constructing the predictable process Mt . [sent-229, score-0.207]
96 An interesting direction of further research is to isolate more general conditions under which the next gradient is predictable, perhaps even when the functions are not smooth in any sense. [sent-230, score-0.084]
97 This could then be used to choose the right predictable sequence so as to optimize the bounds. [sent-232, score-0.196]
98 Using this notion of selecting predictable sequences, one can hope to derive adaptive optimization procedures that in practice provide rapid convergence. [sent-233, score-0.29]
99 The multiplicative weights update method: A meta-algorithm and applications. [sent-240, score-0.047]
100 Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. [sent-296, score-0.135]
wordName wordTfidf (topN-words)
[('ft', 0.577), ('gt', 0.342), ('player', 0.312), ('players', 0.305), ('mirror', 0.174), ('axt', 0.17), ('xt', 0.167), ('predictable', 0.164), ('mt', 0.158), ('yt', 0.121), ('regret', 0.116), ('ftt', 0.105), ('flow', 0.094), ('optimistic', 0.093), ('vjt', 0.088), ('uit', 0.08), ('game', 0.078), ('dr', 0.077), ('inf', 0.076), ('nt', 0.072), ('minimax', 0.066), ('convex', 0.066), ('axi', 0.062), ('smooth', 0.061), ('sup', 0.061), ('round', 0.058), ('argmin', 0.055), ('descent', 0.053), ('daskalakis', 0.053), ('gi', 0.052), ('sequences', 0.048), ('prox', 0.047), ('unif', 0.047), ('rmax', 0.047), ('uncoupled', 0.046), ('ow', 0.044), ('mixed', 0.043), ('saddle', 0.043), ('rt', 0.042), ('programming', 0.042), ('log', 0.041), ('predictability', 0.04), ('lemma', 0.04), ('lder', 0.038), ('playing', 0.038), ('play', 0.037), ('equilibrium', 0.036), ('sequence', 0.032), ('optimization', 0.032), ('strategies', 0.032), ('rate', 0.031), ('oco', 0.031), ('weights', 0.03), ('adaptive', 0.029), ('exponential', 0.028), ('online', 0.028), ('corollary', 0.027), ('banach', 0.027), ('smoothness', 0.026), ('extraneous', 0.025), ('fit', 0.024), ('doubling', 0.024), ('actions', 0.023), ('adaptively', 0.023), ('observes', 0.023), ('capacity', 0.023), ('payoff', 0.023), ('deviates', 0.023), ('gradient', 0.023), ('rates', 0.022), ('dynamics', 0.022), ('extra', 0.022), ('al', 0.022), ('ii', 0.022), ('rakhlin', 0.021), ('partial', 0.021), ('learner', 0.021), ('exp', 0.021), ('recognized', 0.021), ('prescribed', 0.021), ('plays', 0.02), ('arguably', 0.019), ('draw', 0.019), ('structured', 0.019), ('observe', 0.019), ('whereby', 0.019), ('faced', 0.019), ('pennsylvania', 0.018), ('asked', 0.018), ('st', 0.018), ('suppose', 0.018), ('auer', 0.018), ('games', 0.018), ('encountered', 0.018), ('mixing', 0.017), ('norm', 0.017), ('ine', 0.017), ('notion', 0.017), ('update', 0.017), ('logarithmic', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences
Author: Sasha Rakhlin, Karthik Sridharan
Abstract: We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). This addresses a question of Daskalakis et al [6]. Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem.
2 0.44421226 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization
Author: Brendan McMahan, Jacob Abernethy
Abstract: We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark functions. The problem is cast as a sequential multi-stage zero-sum game, and we give a thorough analysis of the minimax behavior of the game, providing characterizations for the value of the game, as well as both the player’s and the adversary’s optimal strategy. We show how these objects can be computed efficiently under certain circumstances, and by selecting an appropriate benchmark, we construct a novel hedging strategy for an unconstrained betting game. 1
3 0.37526459 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries
Author: Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir
Abstract: We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is Θ(T^{2/3}). Interestingly, this rate is significantly worse than the Θ(√T) rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force Θ(T^{2/3}) regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
4 0.30460721 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
Author: Abhradeep Guha Thakurta, Adam Smith
Abstract: We give differentially private algorithms for a large class of online learning algorithms, in both the full information and bandit settings. Our algorithms aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. To design our algorithms, we modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique leads to the first nonprivate algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010) and Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both settings) match the dependence on the input length, T , of the optimal nonprivate regret bounds up to logarithmic factors in T . Our algorithms require logarithmic space and update time. 1
5 0.30268183 89 nips-2013-Dimension-Free Exponentiated Gradient
Author: Francesco Orabona
Abstract: I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U, achieving a regret bound of the order of O(U log(U T + 1) √T), instead of the standard O((U² + 1) √T), achievable without knowing U. For this analysis, I introduce novel tools for algorithms with time-varying regularizers, through the use of local smoothness. Through a lower bound, I also show that the algorithm is optimal up to a log(U T) term for linear and Lipschitz losses.
6 0.26273695 95 nips-2013-Distributed Exploration in Multi-Armed Bandits
7 0.25422397 325 nips-2013-The Pareto Regret Frontier
8 0.21983594 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
9 0.20534599 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration
10 0.16408999 230 nips-2013-Online Learning with Costly Features and Labels
11 0.14728083 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
13 0.13880499 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
14 0.11857431 211 nips-2013-Non-Linear Domain Adaptation with Boosting
15 0.10579733 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
16 0.099043541 233 nips-2013-Online Robust PCA via Stochastic Optimization
17 0.093224742 269 nips-2013-Regression-tree Tuning in a Streaming Setting
18 0.083943136 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
19 0.083900221 7 nips-2013-A Gang of Bandits
20 0.077097073 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
topicId topicWeight
[(0, 0.213), (1, -0.17), (2, 0.327), (3, -0.296), (4, -0.036), (5, -0.126), (6, -0.019), (7, 0.027), (8, 0.074), (9, -0.1), (10, -0.049), (11, -0.157), (12, 0.025), (13, 0.018), (14, -0.051), (15, -0.065), (16, 0.032), (17, -0.099), (18, 0.002), (19, 0.052), (20, -0.022), (21, 0.03), (22, -0.015), (23, 0.021), (24, -0.064), (25, -0.1), (26, 0.155), (27, 0.029), (28, 0.129), (29, -0.011), (30, 0.043), (31, -0.103), (32, 0.032), (33, -0.119), (34, 0.225), (35, 0.012), (36, 0.054), (37, 0.12), (38, 0.058), (39, 0.08), (40, 0.056), (41, -0.08), (42, 0.037), (43, 0.087), (44, -0.077), (45, -0.055), (46, -0.017), (47, 0.066), (48, -0.039), (49, -0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.96430129 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences
Author: Sasha Rakhlin, Karthik Sridharan
Abstract: We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). This addresses a question of Daskalakis et al [6]. Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem.
2 0.87538624 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization
Author: Brendan McMahan, Jacob Abernethy
Abstract: We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark functions. The problem is cast as a sequential multi-stage zero-sum game, and we give a thorough analysis of the minimax behavior of the game, providing characterizations for the value of the game, as well as both the player’s and the adversary’s optimal strategy. We show how these objects can be computed efficiently under certain circumstances, and by selecting an appropriate benchmark, we construct a novel hedging strategy for an unconstrained betting game. 1
3 0.81939185 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries
Author: Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir
Abstract: We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is Θ(T^{2/3}). Interestingly, this rate is significantly worse than the Θ(√T) rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force Θ(T^{2/3}) regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
4 0.65985811 325 nips-2013-The Pareto Regret Frontier
Author: Wouter M. Koolen
Abstract: Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically. 1
5 0.65248543 89 nips-2013-Dimension-Free Exponentiated Gradient
Author: Francesco Orabona
Abstract: I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U, achieving a regret bound of the order of O(U log(U T + 1) √T), instead of the standard O((U² + 1) √T), achievable without knowing U. For this analysis, I introduce novel tools for algorithms with time-varying regularizers, through the use of local smoothness. Through a lower bound, I also show that the algorithm is optimal up to a log(U T) term for linear and Lipschitz losses.
6 0.5812676 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
7 0.51878399 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
8 0.47806671 230 nips-2013-Online Learning with Costly Features and Labels
9 0.42026782 95 nips-2013-Distributed Exploration in Multi-Armed Bandits
10 0.41778266 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
11 0.39836368 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
12 0.39282459 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
13 0.36107573 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration
14 0.33270541 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
15 0.3200745 159 nips-2013-Learning Prices for Repeated Auctions with Strategic Buyers
16 0.31425342 269 nips-2013-Regression-tree Tuning in a Streaming Setting
17 0.30663517 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
18 0.30417162 198 nips-2013-More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
19 0.30068949 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
20 0.27730164 338 nips-2013-Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards
topicId topicWeight
[(2, 0.098), (16, 0.014), (23, 0.123), (33, 0.14), (34, 0.075), (41, 0.062), (49, 0.051), (56, 0.193), (70, 0.026), (85, 0.054), (89, 0.021), (93, 0.022), (95, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.92004216 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences
Author: Sasha Rakhlin, Karthik Sridharan
Abstract: We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). This addresses a question of Daskalakis et al [6]. Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem.
2 0.88044059 66 nips-2013-Computing the Stationary Distribution Locally
Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah
Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1
3 0.87705374 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization
Author: Brendan McMahan, Jacob Abernethy
Abstract: We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark functions. The problem is cast as a sequential multi-stage zero-sum game, and we give a thorough analysis of the minimax behavior of the game, providing characterizations for the value of the game, as well as both the player’s and the adversary’s optimal strategy. We show how these objects can be computed efficiently under certain circumstances, and by selecting an appropriate benchmark, we construct a novel hedging strategy for an unconstrained betting game. 1
4 0.86609191 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
Author: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour
Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [14]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner. 1
5 0.86331844 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries
Author: Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir
Abstract: We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is Θ(T^{2/3}). Interestingly, this rate is significantly worse than the Θ(√T) rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force Θ(T^{2/3}) regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
6 0.8530516 171 nips-2013-Learning with Noisy Labels
7 0.85241759 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
8 0.85233498 230 nips-2013-Online Learning with Costly Features and Labels
10 0.84814596 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
11 0.84786105 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
12 0.84684068 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
13 0.84253097 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
14 0.84224057 325 nips-2013-The Pareto Regret Frontier
15 0.84147692 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
16 0.83649111 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
17 0.83580446 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search
18 0.83565813 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion
19 0.83440083 95 nips-2013-Distributed Exploration in Multi-Armed Bandits
20 0.83319145 224 nips-2013-On the Sample Complexity of Subspace Learning