nips nips2013 nips2013-89 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francesco Orabona
Abstract: I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U , achieving √ a regret bound of the order of O(U log(U T + 1)) T ), instead of the standard √ O((U 2 + 1) T ), achievable without knowing U . For this analysis, I introduce novel tools for algorithms with time-varying regularizers, through the use of local smoothness. Through a lower bound, I also show that the algorithm is optimal up to log(U T ) term for linear and Lipschitz losses. 1
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. [sent-2, score-0.375]
2 My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U , achieving √ a regret bound of the order of O(U log(U T + 1)) T ), instead of the standard √ O((U 2 + 1) T ), achievable without knowing U . [sent-3, score-0.487]
3 Popular online algorithms for classification include the standard Perceptron and its many variants, such as kernel Perceptron [6], and p-norm Perceptron [7]. [sent-10, score-0.18]
4 Recently, Online Mirror Descent (OMD)1 and has been proposed as a general meta-algorithm for online learning, parametrized by a regularizer [16]. [sent-12, score-0.215]
5 By appropriately choosing the regularizer, most online learning algorithms are recovered as special cases of OMD. [sent-13, score-0.125]
6 Moreover, performance guarantees can also be derived simply by instantiating the general OMD bounds to the specific regularizer being used. [sent-14, score-0.146]
7 So, for all the first-order online learning algorithms, it is possible to prove regret bounds √ of the order of O(f (u) T ), where T is the number of rounds and f (u) is the regularizer used in OMD, evaluated on the competitor vector u. [sent-15, score-0.714]
8 Hence, different choices of the regularizer will give rise to different algorithms and guarantees. [sent-16, score-0.126]
9 In particular √ √ for the Euclidean regularizer η T u 2 , we have a regret bound of O( T ( u 2 /η + η)). [sent-18, score-0.476]
10 On the other hand, EG η has a regret bound of O( T log d), where d is the dimension of the space. [sent-20, score-0.395]
11 I prove a regret bound of O( u log( u T + 1) T ). [sent-25, score-0.397]
12 Up to logarithmic terms, the bound of DFEG is equal to the optimal bound obtained through the knowledge of u , but it does not require the tuning of any parameter. [sent-26, score-0.213]
13 In order to analyze DFEG, I also introduce new and general techniques to cope with timevarying regularizers for OMD, using the local smoothness of the dual of the regularization function, that might be of independent interest. [sent-29, score-0.122]
14 I also extend and improve the lower bound in [19], to match √ the upper bound of DFEG up to a log T term, and to show an implicit trade-off on the regret versus different competitors. [sent-30, score-0.522]
15 The algorithms have multiplicative updates and regret bounds that depend logarithmically on the dimension of the input space. [sent-33, score-0.309]
16 Their regret bound depends on the dimension of the space and can neither be used with infinite dimensional spaces nor with kernels. [sent-38, score-0.421]
17 √ Vovk proposed two algorithms for square loss, with regret bounds of O(( u + Y ) T ) and √ O( u T ) respectively, where Y is an upper bound on the range of the target values [20]. [sent-39, score-0.436]
18 Indeed, the lower bound I prove shows that for linear and Lipschitz losses a log( u T ) term is unavoidable. [sent-42, score-0.171]
19 The time-varying regularization for OMD has been explored in [4, 12, 15], but in none of these works does the negative terms in the bound due to the time-varying regularizer play a decisive role. [sent-48, score-0.205]
20 2 Problem setting and definitions In the online learning scenario the learning algorithms work in rounds [3]. [sent-51, score-0.17]
21 Let X a Euclidean vector space2 , at each round t, an instance xt ∈ X, is presented to the algorithm, which then predicts a label yt ∈ R. [sent-52, score-0.4]
22 Then, the correct label yt is revealed, and the algorithm pays a loss (ˆt , yt ), for having ˆ y predicted yt , instead of yt . [sent-53, score-0.381]
23 The aim of the online learning algorithm is to minimize the cumulative ˆ sum of the losses, on any sequence of data/labels {(xt , yt )}T . [sent-54, score-0.252]
24 Typical examples of loss functions t=1 are, for example, the absolute loss, |ˆt − yt |, and the hinge loss, max(1 − yt yt , 0). [sent-55, score-0.283]
25 In this paper I focus on linear prediction of the form yt = wt , xt , where wt ∈ X represents the hypothesis of the online algorithm at time t. [sent-57, score-0.925]
26 Such analysis bounds the regret, that is the difference between the cumulative loss of the T T algorithm, t=1 t ( wt , xt ), and the one of an arbitrary and fixed competitor u, t=1 t ( u, xt ). [sent-67, score-1.02]
27 Strong convexity and strong smoothness are key properties in the design of online learning algorithms, they are defined as follows. [sent-77, score-0.212]
28 It shares some similarities with the exponentiated gradient with unnormalized weights algorithm [9], to the selftuning variant of exponentiated gradient in [15], and to the epoch-free algorithm in [19]. [sent-83, score-0.535]
29 However, note that it does not access to single coordinates of wt and xt , but only their inner products. [sent-84, score-0.518]
30 For the DFEG algorithm we have the following regret bound, that will be proved in Section 4. [sent-87, score-0.289]
31 3 3 2 HT ln HT u −1 , The bound has a logarithmic part, typical of the family of exponentiated gradient algorithms, but instead of depending on the dimension, it depends on the norm of the competitor, u . [sent-92, score-0.417]
32 Hence, the regret bound of DFEG holds for infinite dimensional spaces as well, that is, it is dimension-free. [sent-93, score-0.421]
33 It is interesting to compare this bound with the usual bound for online learning using an L2 regular√ izer. [sent-94, score-0.301]
34 Using a time-varying regularizer ft (w) = ηt w 2 it is easy to see, e. [sent-95, score-0.398]
35 If an upper bound U on u is known, we can use it to tune η to √ obtain an upper bound of √ order of O(U T ). [sent-98, score-0.279]
36 On the other hand, we obtain for DFEG a bound the of O( u log( u T + 1) T ), that is optimal bound, up to logarithmic terms, without knowing U . [sent-99, score-0.142]
37 So my bound goes to constant if the norm of the competitor goes to zero. [sent-100, score-0.344]
38 However, note that, for any fixed competitor, the gradient descent bound is asymptotically better. [sent-101, score-0.177]
39 The parameter a is directly linked to the leading constant of the regret bound; therefore, it is intuitive that the range of acceptable values must have a lower bound different from zero. [sent-103, score-0.368]
40 Notice that the bound is data-dependent because it depends on the sequence of observed input vectors xt . [sent-105, score-0.436]
41 A data-independent bound can be easily obtained from the upper bound on the norm of the input vectors. [sent-106, score-0.299]
42 The use of the function max( xt , xt 2 ) is necessary to have such a data-dependent bound and it seems that it cannot be avoided in order to prove the regret bound. [sent-107, score-0.993]
43 It is natural to ask if the log term in the bound can be avoided. [sent-108, score-0.124]
44 In particular, the following theorem shows that the regret of any online learning algorithm has a satisfy to a trade-off between the guarantees against the competitor with norm equal to zero and the ones against other competitors. [sent-110, score-0.626]
45 Fix a non-trivial vector space X , a specific online learning algorithm, and let the sequence of losses be composed by linear losses. [sent-113, score-0.174]
46 If the algorithm guarantees a zero regret against the competitor with zero L2 norm, then there exists a sequence of T vectors in X , such that the regret against any other competitor is Ω(T ). [sent-114, score-0.869]
47 √ It is also interesting to note √ an L2 regularizer suffers a loss of O( T ) against a competitor with that zero norm, that cancels the log T term. [sent-119, score-0.33]
48 1 Online mirror descent and local smoothness Algorithm 2 is a generic meta-algorithm for online learning. [sent-123, score-0.26]
49 Most of the online learning algorithms can be derived from it, choosing the functions ft and the vectors z t . [sent-124, score-0.434]
50 The following lemma, that is a generalization of Corollary 4 in [8], Corollary 3 in [4], and Lemma 1 in [12], is the main tool to prove the regret bound for the DFEG algorithm. [sent-125, score-0.397]
51 √ 3 Despite what claimed in Section 1 of [19], the use of the time-varying regularizer ft (w) = guarantees a sublinear regret for unconstrained online convex optimization, for any η > 0. [sent-127, score-0.814]
52 do Choose wt = ft∗ (θ t ) Observe z t ∈ X Update θ t+1 = θ t + z t end for Lemma 1. [sent-135, score-0.202]
53 Then for any wt , u ∈ S we have T T ∗ ft∗ (θ t+1 ) − ft−1 (θ t ) − wt , z t z t , u − wt ≤ fT (u) + t=1 , t=1 ∗ ∗ ∗ where we set f0 (w1 ) = 0. [sent-140, score-0.606]
54 are twice differentiable, and 2 ∗ max0≤τ ≤1 ft (θ t + τ z t ) ≤ λt , then we have ∗ ∗ ft∗ (θ t+1 ) − ft−1 (θ t ) − wt , z t ≤ ft∗ (θ t ) − ft−1 (θ t ) + λt zt 2 2 . [sent-144, score-0.646]
55 Note that the above Lemma is usually stated assuming the strong convexity of ft , that is equivalent to the strong smoothness of ft∗ , that in turns for twice differentiable functions is equivalent to a global bound on the norm of the Hessian of ft∗ (see Theorem 2. [sent-145, score-0.665]
56 Hence, for twice differentiable conjugate functions, this bound is always tighter than the ones in [4, 8, 12]. [sent-149, score-0.215]
57 Indeed, in our case, the global strong smoothness cannot be used to prove any meaningful regret bound. [sent-150, score-0.382]
58 Set in Algorithm 2 ft (w) = αt w (log(βt w ) − 1), where αt and βt are defined in Algorithm 1, and z t = −∂ t ( wt , xt )xt . [sent-152, score-0.79]
59 First, assume that we The usual apare on a round where we have a local upper bound on the norm of the Hessian ft∗ . [sent-154, score-0.246]
60 √ proach in these kind of proof is to have a regularizer that is growing over time as t, so that the ∗ terms ft∗ (θ t ) − ft−1 (θ t ) are negative and can be safely discarded. [sent-155, score-0.153]
61 At the same time the sum of the √ √ squared norms of the gradients will typically be of the order of O( T ), giving us a O( T ) regret bound (see for example the proofs in [4]). [sent-156, score-0.419]
62 In the following, I will show the surprising result that with my choice of the regularizers ft , the ∗ terms ft∗ (θ t ) − ft−1 (θ t ) and the squared norm of the gradient cancel out. [sent-160, score-0.458]
63 This is in agreement with Theorem 9 in [19] that rules out algorithms with a fixed regularizer to obtain regret bounds like Theorem 1. [sent-163, score-0.417]
64 It remains to bound the regret in the rounds where we do not have an upper bound on the norm of the Hessian. [sent-164, score-0.615]
65 In these rounds I show that the norm of wt (and θ t ) is small enough so that the regret is still bounded, thanks to the choice of βt . [sent-165, score-0.618]
66 Note the similarities with EGU, where the regularizer is d d i=1 wi (log(wi ) − 1), w ∈ R , wi ≥ 0 [9]. [sent-168, score-0.169]
67 The following properties hold • f ∗ (θ) = α β exp θ α . [sent-171, score-0.145]
68 5 f ∗ (θ) = • 2 ∗ • f (θ) θ β θ 2 ≤ exp θ α . [sent-172, score-0.145]
69 Equipped with a local upper bound on the Hessian of f ∗ , we can now use Lemma 1. [sent-174, score-0.149]
70 In fact if we want the regret to be √ √ ˜ ˜ O( T ), αt must be O( T ) too. [sent-176, score-0.271]
71 The first two are used to upper bound the exponential function with quadratic functions. [sent-178, score-0.127]
72 We will use Lemma 1 to upper bound the regret of DFEG. [sent-187, score-0.398]
73 Hence, using the notation in Algorithm 1, set z t = −∂ t ( wt , xt )xt , and ft (w) = αt w (log(βt w ) − 1). [sent-188, score-0.79]
74 Observe that, by the hypothesis on t , we have z t ≤ L xt . [sent-189, score-0.316]
75 With this assumption, and using the third property of Lemma 2, we have 2 ∗ ft (θ t max 0≤τ ≤1 exp + τ z t ) ≤ max 0≤τ ≤1 θ t +τ z t αt exp βt min( θ t + τ z t , αt ) ≤ θt + zt αt βt α t . [sent-192, score-0.705]
76 We have that λt 2 t + ft∗ (θ t ) − ft−1 (θ t ) can be upper bounded by zt 2 θt + z t αt θt αt−1 θt exp + exp − exp 2αt βt αt βt αt βt−1 αt−1 zt 2 θt + z t αt θt αt−1 θt ≤ exp + exp − exp 2αt βt αt βt αt βt−1 αt θt zt zt 2 a a = exp + − . [sent-194, score-1.545]
77 (1) 2 exp αt 2aHt αt Ht Ht−1 We will now prove that the term in the parenthesis of (1) is negative. [sent-195, score-0.192]
78 It can be rewritten as zt 2 2 exp 2aHt zt αt z t 2 Ht−1 exp a a + − = Ht Ht−1 and from the expression of αt we have that M = 1/a. [sent-196, score-0.54]
79 These are valid settings because zt αt − 2a2 Ht−1 L2 n(xt ) − 2a2 L4 (n(xt ))2 , 2 2aHt Ht−1 zt 1 2 αt ≤ a , so we now use Lemma 3 with p = 2a and 1 exp( a ) 1 ≤ 2a2 ≤ exp( a ), ∀ 0. [sent-197, score-0.25]
80 We use the first statement of Lemma 1, setting wt = wt if θ = 0, and wt = 0 otherwise. [sent-203, score-0.606]
81 In this θ 1 way, from the second property of Lemma 2, we have that wt ≤ βt exp( αt ). [sent-204, score-0.202]
82 Note that any other t choice of wt satisfying the the previous relation on the norm of wt would have worked as well. [sent-205, score-0.498]
83 ∗ ft∗ (θ t+1 ) − ft−1 (θ t ) = ≤ exp θt αt αt exp βt αt exp βt zt αt Remembering that θ t+1 αt zt αt − αt−1 βt−1 αt−1 exp βt−1 θt αt−1 = a exp − θt αt exp z √t a Ht Ht−1 − Ht . [sent-206, score-1.12]
84 t( The stated bound can be obtained observing that the convexity of t and the definition of z t . [sent-211, score-0.12]
85 5 w t , xt ) − t( u, xt ) ≤ u − wt , z t , from Experiments A full empirical evaluation of DFEG is beyond the scope of this paper. [sent-212, score-0.798]
86 95 cpusmall dataset 14000 DFEG Kernel GD, various eta DFEG Kernel GD, various eta 12000 1. [sent-218, score-0.329]
87 5 −2 10 −1 10 0 10 eta 1 10 2 10 2000 −1 10 0 1 10 10 2 10 eta Figure 1: Left: regret versus number of input vectors on synthetic dataset. [sent-227, score-0.57]
88 Center and Right: total loss for DFEG and Kernel GD on the cadata and cpusmall dataset respectively. [sent-228, score-0.145]
89 First, I generated synthetic data as in the proof of Theorem 2, that is the input vectors are all the same and the yt is equal to 1 for the t even and −1 for the others. [sent-231, score-0.126]
90 Figure 1(left) I have plotted the regret as a function of the number of input vectors. [sent-233, score-0.271]
91 As √ predicted by the theory, DFEG has a constant regret, while Kernel GD has a regret of the form O(η T ). [sent-234, score-0.271]
92 Hence, it can have a constant regret only when η is set to zero, and this can be done only with prior knowledge of u , that is impossible in practical applications. [sent-235, score-0.271]
93 6 Discussion I have presented a new algorithm for online learning, the first one in the family of exponentiated gradient to be dimension-free. [sent-243, score-0.351]
94 Thanks to new analysis tools, I have proved that DFEG attains a √ regret bound of O(U log(U T + 1)) T ), without any parameter to tune. [sent-244, score-0.368]
95 I also proved a lower √ bound that shows that the algorithm is optimal up to log T term for linear and Lipschitz losses. [sent-245, score-0.142]
96 The problem of deriving a regret bound that depends on the sequence of the gradients, rather than T on the xt , remains open. [sent-246, score-0.688]
97 Resolving this issue would result in the tighter O( t=1 t ( w t , xt )) regret bounds in the case that the t are smooth [18]. [sent-247, score-0.613]
98 A stochastic view of optimal regret through minimax duality. [sent-261, score-0.271]
99 Adaptive subgradient methods for online learning and stochastic optimization. [sent-283, score-0.125]
100 A generalized online mirror descent with applications to classification and regression, 2013. [sent-346, score-0.179]
wordName wordTfidf (topN-words)
[('dfeg', 0.479), ('ht', 0.384), ('xt', 0.298), ('ft', 0.29), ('regret', 0.271), ('omd', 0.24), ('wt', 0.202), ('exponentiated', 0.174), ('exp', 0.145), ('eta', 0.14), ('competitor', 0.134), ('zt', 0.125), ('gd', 0.115), ('regularizer', 0.108), ('online', 0.107), ('bound', 0.097), ('yt', 0.08), ('eg', 0.075), ('norm', 0.075), ('lemma', 0.066), ('lht', 0.06), ('smoothness', 0.059), ('kernel', 0.055), ('cadata', 0.053), ('gradient', 0.052), ('cpusmall', 0.049), ('perceptron', 0.048), ('differentiable', 0.046), ('rounds', 0.045), ('losses', 0.045), ('mirror', 0.044), ('orabona', 0.043), ('loss', 0.043), ('regularizers', 0.041), ('convex', 0.038), ('hessian', 0.037), ('editors', 0.033), ('norms', 0.032), ('fenchel', 0.031), ('upper', 0.03), ('twice', 0.029), ('prove', 0.029), ('spaces', 0.029), ('descent', 0.028), ('proof', 0.027), ('log', 0.027), ('instantiation', 0.026), ('knowing', 0.026), ('tune', 0.025), ('cumulative', 0.025), ('similarities', 0.025), ('thanks', 0.025), ('lipschitz', 0.025), ('route', 0.024), ('francesco', 0.024), ('tighter', 0.024), ('dimensional', 0.024), ('inequality', 0.024), ('veri', 0.023), ('strong', 0.023), ('convexity', 0.023), ('aggregating', 0.023), ('culotta', 0.023), ('sequence', 0.022), ('round', 0.022), ('local', 0.022), ('shares', 0.022), ('colt', 0.021), ('theorem', 0.021), ('writing', 0.021), ('chicago', 0.021), ('bounds', 0.02), ('dent', 0.02), ('princeton', 0.02), ('logarithmic', 0.019), ('seem', 0.019), ('conjugate', 0.019), ('relation', 0.019), ('zemel', 0.019), ('goes', 0.019), ('gradients', 0.019), ('putting', 0.019), ('vectors', 0.019), ('subgradient', 0.018), ('lecture', 0.018), ('comments', 0.018), ('hence', 0.018), ('algorithms', 0.018), ('suffers', 0.018), ('lafferty', 0.018), ('access', 0.018), ('classi', 0.018), ('wi', 0.018), ('hypothesis', 0.018), ('algorithm', 0.018), ('instantiating', 0.018), ('parenthesis', 0.018), ('proach', 0.018), ('reply', 0.018), ('thankful', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 89 nips-2013-Dimension-Free Exponentiated Gradient
Author: Francesco Orabona
Abstract: I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U , achieving √ a regret bound of the order of O(U log(U T + 1)) T ), instead of the standard √ O((U 2 + 1) T ), achievable without knowing U . For this analysis, I introduce novel tools for algorithms with time-varying regularizers, through the use of local smoothness. Through a lower bound, I also show that the algorithm is optimal up to log(U T ) term for linear and Lipschitz losses. 1
2 0.30346572 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
Author: Abhradeep Guha Thakurta, Adam Smith
Abstract: We give differentially private algorithms for a large class of online learning algorithms, in both the full information and bandit settings. Our algorithms aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. To design our algorithms, we modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique leads to the first nonprivate algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010) and Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both settings) match the dependence on the input length, T , of the optimal nonprivate regret bounds up to logarithmic factors in T . Our algorithms require logarithmic space and update time. 1
3 0.30268183 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences
Author: Sasha Rakhlin, Karthik Sridharan
Abstract: We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to H¨ lder-smooth o functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T )T ). This addresses a question of Daskalakis et al [6]. Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem. 1
4 0.23325515 230 nips-2013-Online Learning with Costly Features and Labels
Author: Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari
Abstract: This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. We consider two variations of this problem, depending on whether the learner can observe the label for free or not. We provide algorithms and upper and lower bounds on the regret for both variants. We show that a positive cost for observing the label significantly increases the regret of the problem. 1
5 0.22319154 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration
Author: Dan Russo, Benjamin Van Roy
Abstract: This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper confidence bound (UCB) algorithms, but recent work has shown that a simple posterior sampling algorithm, sometimes called Thompson sampling, can be analyzed in the same manner as optimistic approaches. In this paper, we develop a regret bound that holds for both classes of algorithms. This bound applies broadly and can be specialized to many model classes. It depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models. 1
6 0.21528403 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries
7 0.2077072 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
8 0.19168872 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
9 0.18034297 26 nips-2013-Adaptive Market Making via Online Learning
10 0.17114083 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
11 0.16860309 325 nips-2013-The Pareto Regret Frontier
12 0.15651464 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
13 0.14700234 137 nips-2013-High-Dimensional Gaussian Process Bandits
14 0.13339944 269 nips-2013-Regression-tree Tuning in a Streaming Setting
15 0.13078447 211 nips-2013-Non-Linear Domain Adaptation with Boosting
16 0.12877227 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization
17 0.12266271 253 nips-2013-Prior-free and prior-dependent regret bounds for Thompson Sampling
18 0.11505996 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
19 0.11289948 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
20 0.11220211 193 nips-2013-Mixed Optimization for Smooth Functions
topicId topicWeight
[(0, 0.237), (1, -0.138), (2, 0.316), (3, -0.241), (4, -0.032), (5, -0.068), (6, -0.048), (7, 0.143), (8, 0.114), (9, -0.05), (10, -0.08), (11, -0.106), (12, -0.025), (13, -0.066), (14, -0.01), (15, 0.026), (16, 0.014), (17, 0.041), (18, 0.047), (19, 0.017), (20, 0.023), (21, 0.058), (22, 0.044), (23, -0.003), (24, -0.034), (25, 0.001), (26, 0.035), (27, -0.115), (28, 0.044), (29, -0.002), (30, 0.079), (31, -0.105), (32, -0.023), (33, 0.042), (34, -0.039), (35, -0.039), (36, 0.112), (37, -0.024), (38, 0.114), (39, 0.009), (40, 0.076), (41, 0.022), (42, -0.073), (43, 0.027), (44, 0.029), (45, -0.062), (46, -0.019), (47, 0.071), (48, 0.036), (49, -0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.96905255 89 nips-2013-Dimension-Free Exponentiated Gradient
Author: Francesco Orabona
Abstract: I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U , achieving √ a regret bound of the order of O(U log(U T + 1)) T ), instead of the standard √ O((U 2 + 1) T ), achievable without knowing U . For this analysis, I introduce novel tools for algorithms with time-varying regularizers, through the use of local smoothness. Through a lower bound, I also show that the algorithm is optimal up to log(U T ) term for linear and Lipschitz losses. 1
2 0.77624708 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences
Author: Sasha Rakhlin, Karthik Sridharan
Abstract: We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to H¨ lder-smooth o functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T )T ). This addresses a question of Daskalakis et al [6]. Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem. 1
3 0.7182405 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
Author: Abhradeep Guha Thakurta, Adam Smith
Abstract: We give differentially private algorithms for a large class of online learning algorithms, in both the full information and bandit settings. Our algorithms aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. To design our algorithms, we modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique leads to the first nonprivate algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010) and Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both settings) match the dependence on the input length, T , of the optimal nonprivate regret bounds up to logarithmic factors in T . Our algorithms require logarithmic space and update time. 1
4 0.68056142 230 nips-2013-Online Learning with Costly Features and Labels
Author: Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari
Abstract: This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. We consider two variations of this problem, depending on whether the learner can observe the label for free or not. We provide algorithms and upper and lower bounds on the regret for both variants. We show that a positive cost for observing the label significantly increases the regret of the problem. 1
5 0.67246747 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries
Author: Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir
Abstract: We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we characterize —in a nearly complete manner— the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switch� ing costs, the attainable rate with bandit feedback is Θ(T 2/3 ). Interestingly, this √ rate is significantly worse than the Θ( T ) rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also � show that a bounded memory adversary can force Θ(T 2/3 ) regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies. 1
6 0.59439361 325 nips-2013-The Pareto Regret Frontier
7 0.58420902 191 nips-2013-Minimax Optimal Algorithms for Unconstrained Linear Optimization
8 0.57434326 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
9 0.57099843 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration
10 0.53875881 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
11 0.53243619 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
12 0.51567709 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
13 0.49817437 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
14 0.48646975 26 nips-2013-Adaptive Market Making via Online Learning
15 0.47463745 137 nips-2013-High-Dimensional Gaussian Process Bandits
16 0.4737829 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
17 0.45784464 41 nips-2013-Approximate inference in latent Gaussian-Markov models from continuous time observations
18 0.45766696 269 nips-2013-Regression-tree Tuning in a Streaming Setting
20 0.44462121 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
topicId topicWeight
[(2, 0.02), (16, 0.014), (33, 0.152), (34, 0.052), (41, 0.035), (49, 0.031), (56, 0.192), (85, 0.34), (89, 0.024), (93, 0.024), (95, 0.017)]
simIndex simValue paperId paperTitle
1 0.96173 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
Author: Tai Qin, Karl Rohe
Abstract: Spectral clustering is a fast and popular algorithm for finding clusters in networks. Recently, Chaudhuri et al. [1] and Amini et al. [2] proposed inspired variations on the algorithm that artificially inflate the node degrees for improved statistical performance. The current paper extends the previous statistical estimation results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of the tuning parameter. Moreover, our results show how the “star shape” in the eigenvectors–a common feature of empirical networks–can be explained by the Degree-Corrected Stochastic Blockmodel and the Extended Planted Partition model, two statistical models that allow for highly heterogeneous degrees. Throughout, the paper characterizes and justifies several of the variations of the spectral clustering algorithm in terms of these models. 1
2 0.94832951 7 nips-2013-A Gang of Bandits
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella
Abstract: Multi-armed bandit problems formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, content may be served to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global recommendation strategy which allocates a bandit algorithm to each network node (user) and allows it to “share” signals (contexts and payoffs) with the neghboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a consistent increase in prediction performance obtained by exploiting the network structure. 1
3 0.91452461 332 nips-2013-Tracking Time-varying Graphical Structure
Author: Erich Kummerfeld, David Danks
Abstract: Structure learning algorithms for graphical models have focused almost exclusively on stable environments in which the underlying generative process does not change; that is, they assume that the generating model is globally stationary. In real-world environments, however, such changes often occur without warning or signal. Real-world data often come from generating models that are only locally stationary. In this paper, we present LoSST, a novel, heuristic structure learning algorithm that tracks changes in graphical model structure or parameters in a dynamic, real-time manner. We show by simulation that the algorithm performs comparably to batch-mode learning when the generating graphical structure is globally stationary, and significantly better when it is only locally stationary. 1
4 0.91426039 140 nips-2013-Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Author: Bruno Scherrer
Abstract: Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal “-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal Ï advantage. We show that Howard’s PI terminates 1 2Ì 1 1 22 1 1 nm 1 after at most n(m ≠ 1) 1≠“ log 1≠“ = O 1≠“ log 1≠“ iterations, improving by a factor O(log 1 a result by [3], while Simplex-PI terminates n) 1 22 1 2 1 22 2 1 1 2 after at most n (m ≠ 1) 1 + 1≠“ log 1≠“ = O n m log 1≠“ 1≠“ iterations, improving by a factor O(log n) a result by [11]. Under some structural assumptions of the MDP, we then consider bounds that are independent of the discount factor “: given a measure of the maximal transient time ·t and the maximal time ·r to revisit states in recurrent classes under all policies, we show that Simplex-PI terminates after at most n2 (m≠ # $ 1) (Á·r log(n·r )Ë + Á·r log(n·t )Ë) (m ≠ 1)Án·t log(n·t )Ë + Án·t log(n2 ·t )Ë = !
5 0.8715378 168 nips-2013-Learning to Pass Expectation Propagation Messages
Author: Nicolas Heess, Daniel Tarlow, John Winn
Abstract: Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods. However, while the EP framework in theory allows for complex nonGaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. We address the practical concerns that arise in the process, and we provide empirical analysis on several challenging and diverse factors, indicating that there is a space of factors where this approach appears promising. 1
same-paper 6 0.85442138 89 nips-2013-Dimension-Free Exponentiated Gradient
7 0.82027096 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
8 0.77620757 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
9 0.77574575 232 nips-2013-Online PCA for Contaminated Data
10 0.75804925 196 nips-2013-Modeling Overlapping Communities with Node Popularities
11 0.74939668 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
12 0.74219137 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
13 0.74061912 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
14 0.73945147 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
15 0.73056591 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
16 0.73040557 289 nips-2013-Scalable kernels for graphs with continuous attributes
17 0.72388566 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
18 0.71288747 25 nips-2013-Adaptive Anonymity via $b$-Matching
19 0.71172082 66 nips-2013-Computing the Stationary Distribution Locally
20 0.71005571 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints