nips nips2011 nips2011-220 knowledge-graph by maker-knowledge-mining

220 nips-2011-Prediction strategies without loss


Source: pdf

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14εT and loss 2√2·√T·e^(−ε²T) in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and an optimal loss/regret tradeoff, improving upon the results of Even-Dar et al. and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter-free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O(√(n(log N + log T))), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Consider a sequence of bits where we are trying to predict the next bit from the previous bits. [sent-3, score-0.254]

2 Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. [sent-4, score-0.418]

3 In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. [sent-6, score-0.869]

4 For a sequence of length T our algorithm has regret 14εT and loss 2√2·√T·e^(−ε²T) in expectation for all strings. [sent-7, score-0.672]

5 We show that the tradeoff between loss and regret is optimal up to constant factors. [sent-8, score-0.681]

6 Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. [sent-9, score-1.356]

7 We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al. and settling the main question left open in their paper. [sent-11, score-0.451]

8 First, we obtain a parameter-free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. [sent-13, score-0.769]

9 Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O(√(n(log N + log T))), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. [sent-16, score-1.228]

10 Consider a gambler who is trying to predict the next bit in a sequence of bits. [sent-17, score-0.323]

11 One could think of the bits as indications of whether a stock price goes up or down on a given day, where we assume that the stock always goes up or down by 1 (this is, of course, a very simplified model of the stock market). [sent-18, score-0.451]

12 If her prediction is 1 (i.e. that the stock will go up), she buys one stock to sell it the next day, and short sells one stock if her prediction is 0. [sent-21, score-0.405]

13 If the prediction is right the gambler gets a payoff of c otherwise −c. [sent-23, score-0.556]

14 Clearly, the strategy of never predicting (by setting confidence 0 all the time) never has a loss, but it also never has a positive payoff. [sent-26, score-0.408]

15 However, if the sequence is very imbalanced and has many more 0’s than 1’s, then this never-predict strategy has a high regret with respect to the strategy that predicts the majority bit. [sent-27, score-0.963]

16 Thus, one is interested in a strategy that has a small regret with respect to predicting the majority bit and incurs no loss at the same time. [sent-28, score-1.005]

17 More precisely, we show that for any ε > 1/√T there exists an algorithm that achieves regret at most 14εT and loss at most 2e^(−ε²T)·√T, where T is the time horizon. [sent-30, score-0.625]

18 The bit prediction problem can be cast as the experts problem with two experts: S+ , that always predicts 1 and S− that always predicts 0. [sent-32, score-0.544]

19 The weighted majority algorithm of [12] is known to give optimal regret guarantees. [sent-34, score-0.644]

20 However, it can be seen that weighted majority may result in a loss of Ω(√T). [sent-35, score-0.249]
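
To make this Ω(√T) loss concrete, here is a small self-contained sketch (our own illustration, not code from the paper): a two-expert exponential-weights predictor with the usual η ≈ √(log N / T) tuning, run on an alternating ±1 sequence, ends up with cumulative payoff of order −√T.

```python
import math

def exp_weights_payoff(bits, eta):
    """Cumulative payoff of a two-expert exponential-weights predictor
    (experts: 'always +1' and 'always -1') on a +/-1 sequence."""
    w_plus = w_minus = 1.0
    payoff = 0.0
    for b in bits:
        f = (w_plus - w_minus) / (w_plus + w_minus)  # confidence in [-1, 1]
        payoff += f * b
        w_plus *= math.exp(eta * b)    # reward whichever expert was right
        w_minus *= math.exp(-eta * b)
    return payoff

T = 10_000
eta = math.sqrt(math.log(2) / T)                     # standard two-expert tuning
alternating = [1 if t % 2 == 0 else -1 for t in range(T)]
print(exp_weights_payoff(alternating, eta))          # roughly -(eta/2)*T, i.e. -Theta(sqrt(T))
```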

21 Even-Dar et al. [7] studied the problem of trading off regret to the best expert for regret to the average expert, which is equivalent to our problem. [sent-37, score-1.168]

22 Stated as a result on bounding loss, they were able to obtain a constant loss and regret O(√T·log T). [sent-38, score-0.663]

23 Their work left the question open as to whether it is possible to even get a regret of O(√(T log T)) and constant loss. [sent-39, score-0.537]

24 In this paper we give an optimal regret/loss tradeoff, in particular showing that this regret can be achieved even with subconstant loss. [sent-40, score-0.489]

25 Our results extend to the general setting of prediction with expert advice when there are multiple experts. [sent-41, score-0.263]

26 In this problem the decision maker iteratively chooses among N available alternatives without knowledge of their payoffs, and gets payoff based on the chosen alternative. [sent-42, score-0.422]

27 This process is repeated over T rounds, and the goal of the decision maker is to maximize her cumulative payoff over all time steps t = 1, . . . , T. [sent-44, score-0.398]

28 The most widely used measure of performance of an online decision making algorithm is regret, which is defined as the difference between the payoff of the best fixed alternative and the payoff of the algorithm. [sent-51, score-0.764]

29 The well-known weighted majority algorithm of [12] obtains regret O(√(T log N)) even when no assumptions are made on the process generating the payoff. [sent-52, score-0.692]

30 Regret to the best fixed alternative in hindsight is a very natural notion when the payoffs are sampled from an unknown distribution, and in fact such scenarios show that the bound of O(√(T log N)) on regret achieved by the weighted majority algorithm is optimal. [sent-53, score-0.812]

31 [7] gave an algorithm that has constant regret to any fixed distribution on the experts at the expense of regret O(√(T log N)·(log T + log log N)) with respect to all other experts. [sent-55, score-1.37]

32 We obtain an optimal tradeoff between the two, getting an algorithm with regret O(√(T(log N + log T))) to the best and O((NT)^(−Ω(1))) to the average as a special case. [sent-56, score-0.729]

33 The strong loss bounds of our algorithm allow us to achieve lossless boosting, i.e. [sent-59, score-0.272]

34 we use the available experts to continuously improve upon the performance of the base expert whenever possible while essentially never hurting its performance. [sent-61, score-0.534]

35 This property allows us to easily obtain optimal adaptive regret bounds, i.e. [sent-66, score-0.535]

36 For the bit prediction problem one would set N = 2 and use the uniform distribution over the ‘predict 0’ and ‘predict 1’ strategies as the special distribution. [sent-69, score-0.311]

37 worse than the payoff of the strategy that is best in that window (see Theorem 11). [sent-72, score-0.53]

38 In the full version of the paper ([11]) we also obtain bounds against the class of strategies that are allowed to change experts multiple times while maintaining the essentially zero loss property. [sent-73, score-0.523]

39 In the full version of the paper, we also show how our algorithm yields regret bounds that depend on the lp norm of the costs, regret bounds dependent on Kolmogorov complexity as well as applications of our framework to multi-armed bandits with partial information and online convex optimization. [sent-75, score-1.074]

40 Tradeoffs between regret and loss were also examined in [13], where the author studied the set of values of a, b for which an algorithm can have payoff a·OPT + b·log N, where OPT is the payoff of the best arm and a, b are constants. [sent-78, score-1.426]

41 The problem of bit prediction was also considered in [8], where several loss functions are considered. [sent-79, score-0.32]

42 In recent work on the NormalHedge algorithm [4] the authors use a potential function which is very similar to our function g(x) (see (2) below), getting strong regret guarantees to the ε-quantile of best experts. [sent-81, score-0.54]

43 It will be convenient to adopt the convention that bt ∈ {−1, +1} instead of bt ∈ {0, 1} since it simplifies the formula for the payoff. [sent-89, score-0.696]

44 In fact, in what follows we will only assume that −1 ≤ bt ≤ 1, allowing bt to be real numbers. [sent-90, score-0.696]

45 At each time step t = 1, . . . , T the algorithm is required to output a confidence level ft ∈ [−1, 1], and then the value of bt is revealed to it. [sent-94, score-0.446]

46 The payoff of the algorithm by time t is At = Σ_{j=1}^{t} fj bj. (1) [sent-95, score-0.789]

47 For example, if bt ∈ {−1, +1}, then this setup is analogous to a prediction process in which a player observes a sequence of bits and at each point in time predicts that the value of the next bit will be sign(ft) with confidence |ft|. [sent-96, score-0.701]

48 We define the loss of the algorithm on a string b as loss = −min_t min{At, 0}, i.e. [sent-98, score-0.336]

49 the absolute value of the smallest negative payoff over all time steps. [sent-100, score-0.343]
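
Read literally, definition (1) and the loss above amount to a few lines of bookkeeping; the sketch below is our own illustration (loss taken as the absolute value of the smallest cumulative payoff reached), not code from the paper.

```python
def payoff_and_loss(bits, confidences):
    """Track A_t = sum_{j<=t} f_j * b_j and return (final payoff, loss),
    where loss is the largest cumulative deficit ever reached."""
    A, worst = 0.0, 0.0
    for b, f in zip(bits, confidences):
        A += f * b
        worst = min(worst, A)
    return A, -worst

# 'Never predict' (confidence 0) has zero loss but also zero payoff:
print(payoff_and_loss([1, -1, 1, 1], [0, 0, 0, 0]))   # (0.0, 0.0)
# Always predicting +1 with full confidence can go 2 into the red:
print(payoff_and_loss([-1, -1, 1, 1], [1, 1, 1, 1]))  # (0.0, 2.0)
```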

50 It is easy to see that any algorithm that has a positive expected payoff on some sequence necessarily loses on another sequence. [sent-101, score-0.422]

51 Thus, we are concerned with finding a prediction strategy that has exponentially small loss bounds but also has low regret against a number of given prediction strategies. [sent-102, score-0.929]

52 In the simplest setting we would like to design an algorithm that has low regret against two basic strategies: S+ , which always predicts +1 and S− , which always predicts −1. [sent-103, score-0.685]

53 Note that the maximum of the payoffs of S+ and S− is always equal to |Σ_{t=1}^{T} bt|. [sent-104, score-0.472]

54 In what follows we will use the notation AT for the cumulative payoff of the algorithm by time T as defined above. [sent-106, score-0.375]

55 As we will show in section 3, our techniques extend easily to give an algorithm that has low regret with respect to the best of any N bit prediction strategies and exponentially small loss. [sent-107, score-0.911]

56 Our techniques work for the general experts problem, where loss corresponds to regret with respect to the ’special’ expert S0 , and hence we give the proof in this setting. [sent-108, score-1.021]

57 In section 2 we give an algorithm for the case of two prediction strategies S+ and S− , and in section 3 we extend it to the general experts problem, additionally giving the claimed adaptive regret bounds. [sent-110, score-0.887]

58 the algorithm has at most 14εT regret against S+ and S− as well as an exponentially small loss. [sent-113, score-0.543]

59 By setting ε so that the loss bound is 2Z√T, we get a regret bound of √(T log(1/Z)). [sent-114, score-0.593]
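
For instance (our own arithmetic, constants suppressed), choosing Z = T^(−2) in this tradeoff gives:

```latex
\[
Z = T^{-2}
\;\Longrightarrow\;
\text{loss} \;\le\; 2Z\sqrt{T} \;=\; \frac{2}{T^{3/2}},
\qquad
\text{regret} \;\le\; \sqrt{T\log(1/Z)} \;=\; \sqrt{2\,T\log T},
\]
```

so the regret stays within a logarithmic factor of the weighted-majority bound while the loss is polynomially small.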

60 We note that the algorithm is a strict generalization of weighted majority, which can be seen by letting Z = Θ(1) (this property will also hold for the generalization to N experts in section 3). [sent-115, score-0.273]

61 For a chosen discount factor ρ = 1 − 1/n, 0 ≤ ρ ≤ 1, the algorithm maintains a discounted deviation xt = Σ_{j=1}^{t−1} ρ^(t−1−j) bj at each time t = 1, . . . , T. [sent-117, score-0.596]

62 The value of the prediction at time t is then given by g(xt) for a function g(·) to be defined (note that xt depends only on bt′ for t′ < t, so this is an online algorithm). [sent-121, score-0.882]
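
The update itself is a one-line recursion, x_{t+1} = ρ·x_t + b_t. The sketch below is a minimal illustration of this loop; the confidence function is a placeholder of our own, not the g the paper constructs, so only the interface is shown, not the guarantee.

```python
import math

def run_predictor(bits, g, rho):
    """Maintain the discounted deviation x_t = sum_{j<t} rho^(t-1-j) b_j
    and play confidence g(x_t) before each bit is revealed."""
    x, payoff = 0.0, 0.0
    for b in bits:
        f = max(-1.0, min(1.0, g(x)))  # clamp confidence to [-1, 1]
        payoff += f * b                # b_t is revealed after the prediction
        x = rho * x + b                # discounted deviation update
    return payoff

T = 1000
rho = 1.0 - 1.0 / T
g_placeholder = lambda x: math.tanh(x / math.sqrt(T))   # stand-in, NOT the paper's g
print(run_predictor([1] * T, g_placeholder, rho))        # all-ones string: payoff grows with T
```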

63 The function g as well as the discount factor ρ depend on the desired bound on expected loss and regret against S+ and S− . [sent-122, score-0.614]

64 In particular, we will set ρ = 1 − 1/T for our main result on the regret/loss tradeoff, and will use the freedom to choose different values of ρ to obtain adaptive regret guarantees in section 3. [sent-123, score-0.524]

65 In particular, we will choose the confidence function g(x) so that Φt = ∫_0^{xt} g(s) ds. [sent-128, score-0.459]

66 In particular, we will choose g(x) so that the change of Φt lower bounds the payoff of the algorithm. [sent-130, score-0.397]

67 If we let Φt = G(xt) (assuming for sake of clarity that xt > 0), where G(x) = ∫_0^x g(s) ds, we have Φt+1 − Φt = G(xt+1) − G(xt) ≈ G′(x)∆x + G″(x)∆x²/2 ≈ g(x)[(ρ − 1)x + bt] + g′(x)/2. [sent-131, score-0.807]

68 Since the payoff of the algorithm at time step t is g(xt)bt, we have ∆Φt − g(xt)bt = −g(xt)(1 − ρ)xt + g′(xt)/2, so the condition becomes −g(xt)(1 − ρ)xt + g′(xt)/2 ≤ Z, where Z is the desired bound on the per-step loss of the algorithm. [sent-132, score-0.501]
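
Solving this condition with equality (a derivation of our own via the standard integrating factor; the paper's actual g is additionally capped at ±1, as in Figure 1, and may differ in constants) shows why an erf-type confidence function appears. Writing ρ̄ = 1 − ρ:

```latex
\[
g'(x) = 2\bar{\rho}\,x\,g(x) + 2Z
\;\Longrightarrow\;
\frac{d}{dx}\Bigl(g(x)\,e^{-\bar{\rho}x^{2}}\Bigr) = 2Z\,e^{-\bar{\rho}x^{2}}
\;\Longrightarrow\;
g(x) = e^{\bar{\rho}x^{2}}\Bigl(g(0) + Z\sqrt{\pi/\bar{\rho}}\,\operatorname{erf}\bigl(\sqrt{\bar{\rho}}\,x\bigr)\Bigr).
\]
```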

69 We will later choose n = T to prove Theorem 1, but we will use different values of n for the adaptive regret guarantees in section 3. [sent-144, score-0.524]

70 Then the payoff of the algorithm is at least Σ_{t=1}^{T} ρ̄·xt·g(xt)(1 − h(xt)) + Φ_{T+1} − ZT, as long as |bt| ≤ 1 for all t. [sent-148, score-0.375]

71 Proof: We will show that at each t, Φt+1 − Φt ≤ bt g(xt) + Z − ρ̄·xt·g(xt)(1 − h(xt)), i.e. [sent-149, score-0.348]

72 Σ_{t=1}^{T} bt g(xt) ≥ −ZT + Σ_{t=1}^{T} ρ̄·xt·g(xt)(1 − h(xt)) + Φ_{T+1} − Φ1, thus implying the claim of the lemma since Φ1 = 0. [sent-151, score-0.424]

73 Case 0 ≤ bt ≤ 1: We have xt+1 = ρxt + bt = xt − ρ̄xt + bt, and the expected payoff of the algorithm is g(xt)bt. [sent-155, score-1.878]

74 Then Φt+1 − Φt = ∫_{xt}^{xt−ρ̄xt+bt} g(s) ds ≤ g(xt)(bt − ρ̄xt) + (1/2)(ρ̄xt + bt)² · max_{s∈[xt, xt−ρ̄xt+bt]} |g′(s)| ≤ g(xt)bt − g(xt)ρ̄xt + (1/2)(ρ̄xt + 1)² · max_{s∈[xt, xt−ρ̄xt+bt]} |g′(s)| ≤ g(xt)bt + (−1 + h(xt))ρ̄xt g(xt) + Z. [sent-156, score-1.304]

75 The following lemma shows that the function g(x) satisfies the properties stated in Lemma 2. Figure 1: The shape of the confidence function g(x) (solid line) and the tanh(x) function used by weighted majority (dotted line). [sent-165, score-0.289]

76 We can now lower bound the payoff of Algorithm 1. [sent-172, score-0.343]

77 Proof: By Lemma 3 we have that the function g(x) satisfies the conditions of Lemma 2, and so from the bounds stated in Lemma 2 the payoff of the algorithm is at least Σ_{t=1}^{T} ρ̄|xt|₊ + Φ_{T+1} − 2ZT/√n. [sent-175, score-0.451]

78 Proof: In light of Theorem 4 it remains to bound Σ_{t=1}^{T−1} ρ̄·xt + x_{T+1} = ρ̄ Σ_{t=1}^{T−1} Σ_{j=1}^{t} ρ^(t−j) bj + x_{T+1} = Σ_{t=1}^{T} bt (1 − ρ^(T−t)) + x_{T+1}. [sent-178, score-0.86]

79 Our loss/regret tradeoff is optimal up to constant factors (proof deferred to the full version): Theorem 6 Any algorithm that has regret O(√(T log(1/Z))) incurs loss Ω(Z√T) on at least one sequence of bits bt, t = 1, . . . , T. [sent-182, score-1.189]

80 Note that if Z = o(1/T), then the payoff of the algorithm is positive whenever the absolute value of the deviation xt is larger than, say, 8√(n log T) in at least one window of size n. [sent-186, score-1.007]

81 3 Combining strategies (lossless boosting) In the previous section we derived an algorithm for the bit prediction problem with low regret to the S+ and S− strategies and exponentially small loss. [sent-187, score-0.981]

82 We now show how our techniques yield an algorithm that has low regret to the best of N bit prediction strategies S1, . . . , SN. [sent-188, score-0.837]

83 However, since the proof works for the general experts problem, where loss corresponds to regret to a ’special’ expert S0 , we state it in the general experts setting. [sent-192, score-1.155]

84 In what follows we will refer to regret to S0 as loss. [sent-193, score-0.467]

85 We will also prove optimal bounds on regret that hold in every window of length n at the end of the section. [sent-194, score-0.62]

86 We start by proving Theorem 7 For any Z < 1/e there exists an algorithm for combining N strategies that has regret O(√(T log(N/Z))) against the best of N strategies and loss at most O(ZN√T) with respect to any strategy S0 fixed a priori. [sent-195, score-1.029]

87 A prediction strategy S, given a bit string bt, produces a sequence of weights wjt on the set of experts j = 1, . . . , N, [sent-198, score-0.973]

88 such that wjt depends only on bt′ for t′ < t, and Σ_{j=1}^{N} wjt = 1, wjt ≥ 0 for all t. [sent-201, score-0.588]

89 Thus, using strategy S amounts to using expert j with probability wj,t at time t, for all t = 1, . . . , T. [sent-202, score-0.307]

90 For a strategy S we denote its payoff at time t by st . [sent-207, score-0.451]

91 Our algorithm will consider S1 as the base strategy (corresponding to the null strategy S0 in the previous section) and will use S2 to improve on S1 whenever possible, without introducing significant loss over S1 in the process. [sent-209, score-0.413]

92 The intuition behind the algorithm is that since the difference in payoff obtained by using S2 instead of S1 is given by (s2,t − s1,t), it is sufficient to emulate Algorithm 1 on this sequence. [sent-215, score-0.375]

93 In particular, we set xt = Σ_{j=1}^{t−1} ρ^(t−1−j) (s2,j − s1,j) and predict ḡ(xt) (note that since |s1,t − s2,t| ≤ 2, we need to use g(x/2) in the definition of ḡ to scale the payoffs). [sent-216, score-0.502]
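
A minimal sketch of this reduction (our own rendering of the idea, with the confidence simply clipped to a mixing weight in [0, 1]; the exact weighting and scaling in Algorithm 2 may differ):

```python
def combine_two(payoffs1, payoffs2, g_bar, rho):
    """Run S1 as the base and emulate the two-outcome predictor on the
    payoff differences s_{2,t} - s_{1,t} to decide how much to lean on S2."""
    x, total = 0.0, 0.0
    for s1, s2 in zip(payoffs1, payoffs2):
        q = max(0.0, min(1.0, g_bar(x)))  # weight placed on S2 this round
        total += (1 - q) * s1 + q * s2    # expected payoff of the mixture
        x = rho * x + (s2 - s1)           # discounted deviation of differences
    return total
```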

94 Lemma 8 There exists an algorithm that, given two strategies S1 and S2, gets payoff at least Σ_{t=1}^{T} s1,t + max{ Σ_{t=1}^{T} (s2,t − s1,t) − O(√(T log(1/Z))), 0 } − O(Z√T). [sent-221, score-0.54]

95 We emphasize the property that Algorithm 2 combines two strategies S1 and S2 , improving on the performance of S1 using S2 whenever possible, essentially without introducing any loss with respect to S1 . [sent-224, score-0.379]

96 The regret and loss guarantees follow by Lemma 8. [sent-233, score-0.623]
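
The N-strategy algorithm can then be pictured as chaining this combiner: treat the combination built so far as the base and fold in the next expert. The sketch below is our own illustration of that chaining (a hypothetical helper, not the paper's Algorithm 3 verbatim), reusing the two-strategy idea above.

```python
def combine_many(payoff_streams, g_bar, rho):
    """payoff_streams[k][t]: payoff of expert k at time t.
    Expert 0 is the base; every further expert keeps its own deviation
    against the combination built so far and is folded in each round."""
    N, T = len(payoff_streams), len(payoff_streams[0])
    xs = [0.0] * N
    total = 0.0
    for t in range(T):
        base = payoff_streams[0][t]            # payoff of the running combination
        for k in range(1, N):
            q = max(0.0, min(1.0, g_bar(xs[k])))
            improved = (1 - q) * base + q * payoff_streams[k][t]
            xs[k] = rho * xs[k] + (payoff_streams[k][t] - base)
            base = improved
        total += base
    return total
```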

97 Corollary 9 Setting Z = (NT)^(−1−γ) for γ > 0, we get regret O(√(γT(log N + log T))) to the best of N strategies and loss at most O((NT)^(−γ)) with respect to strategy S0 fixed a priori. [sent-234, score-0.895]

98 Finally, we show another adaptive regret property of Algorithm 3. [sent-243, score-0.513]

99 We now prove that the difference between the payoff of our algorithm and the payoff of any expert is Z-uniform, i.e. [sent-250, score-0.906]

100 Moreover, the loss of the algorithm with respect to the base strategy is at most 2ZN√T. [sent-254, score-0.305]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('regret', 0.467), ('xt', 0.459), ('bt', 0.348), ('payoff', 0.343), ('expert', 0.188), ('experts', 0.164), ('loss', 0.126), ('strategies', 0.122), ('bit', 0.119), ('gambler', 0.114), ('stock', 0.11), ('payoffs', 0.098), ('strategy', 0.088), ('majority', 0.086), ('wjt', 0.08), ('erf', 0.08), ('window', 0.077), ('lemma', 0.076), ('prediction', 0.075), ('log', 0.07), ('kapralov', 0.068), ('xg', 0.068), ('predicts', 0.067), ('tradeoff', 0.066), ('ft', 0.066), ('lossless', 0.06), ('bounds', 0.054), ('predicting', 0.053), ('bj', 0.053), ('string', 0.052), ('dence', 0.05), ('imbalance', 0.049), ('never', 0.047), ('sequence', 0.047), ('boosting', 0.047), ('proof', 0.046), ('bits', 0.045), ('exponentially', 0.044), ('tanh', 0.043), ('predict', 0.043), ('weighted', 0.037), ('incurs', 0.036), ('combine', 0.036), ('theorem', 0.036), ('essentially', 0.036), ('ds', 0.033), ('rina', 0.033), ('algorithm', 0.032), ('maker', 0.031), ('amounts', 0.031), ('discounted', 0.031), ('guarantees', 0.03), ('respect', 0.03), ('freund', 0.03), ('base', 0.029), ('special', 0.029), ('odd', 0.029), ('geometrically', 0.028), ('day', 0.027), ('adaptive', 0.027), ('always', 0.026), ('arms', 0.026), ('whenever', 0.026), ('goes', 0.025), ('shape', 0.025), ('trading', 0.024), ('null', 0.024), ('decision', 0.024), ('gets', 0.024), ('con', 0.024), ('predictions', 0.024), ('wt', 0.024), ('colt', 0.023), ('arm', 0.023), ('yielding', 0.023), ('best', 0.022), ('optimal', 0.022), ('adversarial', 0.022), ('stated', 0.022), ('ex', 0.021), ('nj', 0.021), ('satis', 0.021), ('getting', 0.021), ('letting', 0.021), ('discount', 0.021), ('sign', 0.021), ('maintaining', 0.021), ('st', 0.02), ('game', 0.02), ('bandit', 0.02), ('sn', 0.02), ('panigrahy', 0.02), ('hurting', 0.02), ('tempted', 0.02), ('improving', 0.02), ('start', 0.02), ('property', 0.019), ('extensively', 0.019), ('predictors', 0.019), ('max', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14εT and loss 2√2·√T·e^(−ε²T) in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and an optimal loss/regret tradeoff, improving upon the results of Even-Dar et al. and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter-free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O(√(n(log N + log T))), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property.

2 0.35031316 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

Author: Yasin Abbasi-yadkori, Csaba Szepesvári, David Tax

Abstract: We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer’s UCB algorithm (Auer, 2002) achieves with high probability constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm for the for linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), Rusmevichientong and Tsitsiklis (2010), Li et al. (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. In both cases, the improvement stems from the construction of smaller confidence sets. For their construction we use a novel tail inequality for vector-valued martingales. 1

3 0.31816661 145 nips-2011-Learning Eigenvectors for Free

Author: Wouter M. Koolen, Wojciech Kotlowski, Manfred K. Warmuth

Abstract: We extend the classical problem of predicting a sequence of outcomes from a finite alphabet to the matrix domain. In this extension, the alphabet of n outcomes is replaced by the set of all dyads, i.e. outer products uu where u is a vector in Rn of unit length. Whereas in the classical case the goal is to learn (i.e. sequentially predict as well as) the best multinomial distribution, in the matrix case we desire to learn the density matrix that best explains the observed sequence of dyads. We show how popular online algorithms for learning a multinomial distribution can be extended to learn density matrices. Intuitively, learning the n2 parameters of a density matrix is much harder than learning the n parameters of a multinomial distribution. Completely surprisingly, we prove that the worst-case regrets of certain classical algorithms and their matrix generalizations are identical. The reason is that the worst-case sequence of dyads share a common eigensystem, i.e. the worst case regret is achieved in the classical case. So these matrix algorithms learn the eigenvectors without any regret. 1

4 0.26385716 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable. 1

5 0.25271094 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

Author: Elad Hazan, Satyen Kale

Abstract: We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α. We prove that the regret of N EWTRON is O(log T ) when α is a constant that does not vary with horizon T , and at most O(T 2/3 ) if α is allowed to increase to infinity √ with T . For α = O(log T ), the regret is bounded by O( T ), thus solving the open problem of [KSST08, AR09]. Our algorithm is based on a novel application of the online Newton method [HAK07]. We test our algorithm and show it to perform well in experiments, even when α is a small constant. 1

6 0.24388166 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time

7 0.24193618 61 nips-2011-Contextual Gaussian Process Bandit Optimization

8 0.22850771 245 nips-2011-Selecting the State-Representation in Reinforcement Learning

9 0.20205085 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

10 0.19756469 80 nips-2011-Efficient Online Learning via Randomized Rounding

11 0.18415262 25 nips-2011-Adaptive Hedge

12 0.17833924 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

13 0.16011849 272 nips-2011-Stochastic convex optimization with bandit feedback

14 0.15703684 262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

15 0.15035516 32 nips-2011-An Empirical Evaluation of Thompson Sampling

16 0.14775227 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

17 0.14099026 218 nips-2011-Predicting Dynamic Difficulty

18 0.12929882 56 nips-2011-Committing Bandits

19 0.1271264 177 nips-2011-Multi-armed bandits on implicit metric spaces

20 0.1229978 303 nips-2011-Video Annotation and Tracking with Active Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.227), (1, -0.431), (2, -0.01), (3, -0.029), (4, 0.362), (5, -0.016), (6, 0.097), (7, 0.028), (8, 0.094), (9, -0.021), (10, 0.093), (11, -0.027), (12, 0.057), (13, 0.012), (14, -0.025), (15, 0.01), (16, 0.028), (17, 0.055), (18, 0.035), (19, -0.053), (20, 0.022), (21, 0.028), (22, 0.004), (23, -0.041), (24, -0.019), (25, -0.122), (26, -0.035), (27, -0.046), (28, -0.01), (29, 0.032), (30, 0.039), (31, -0.048), (32, 0.025), (33, 0.012), (34, -0.005), (35, -0.023), (36, 0.037), (37, -0.114), (38, 0.023), (39, 0.032), (40, -0.065), (41, 0.046), (42, -0.048), (43, -0.007), (44, -0.032), (45, -0.021), (46, -0.005), (47, 0.048), (48, 0.124), (49, 0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9746213 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14εT and loss 2√2·√T·e^(−ε²T) in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and an optimal loss/regret tradeoff, improving upon the results of Even-Dar et al. and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter-free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O(√(n(log N + log T))), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property.

2 0.83951598 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

Author: Elad Hazan, Satyen Kale

Abstract: We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α. We prove that the regret of N EWTRON is O(log T ) when α is a constant that does not vary with horizon T , and at most O(T 2/3 ) if α is allowed to increase to infinity √ with T . For α = O(log T ), the regret is bounded by O( T ), thus solving the open problem of [KSST08, AR09]. Our algorithm is based on a novel application of the online Newton method [HAK07]. We test our algorithm and show it to perform well in experiments, even when α is a small constant. 1

3 0.81755638 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits

Author: Yasin Abbasi-yadkori, Csaba Szepesvári, David Tax

Abstract: We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer’s UCB algorithm (Auer, 2002) achieves with high probability constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm for the for linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), Rusmevichientong and Tsitsiklis (2010), Li et al. (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. In both cases, the improvement stems from the construction of smaller confidence sets. For their construction we use a novel tail inequality for vector-valued martingales. 1

4 0.76976746 145 nips-2011-Learning Eigenvectors for Free

Author: Wouter M. Koolen, Wojciech Kotlowski, Manfred K. Warmuth

Abstract: We extend the classical problem of predicting a sequence of outcomes from a finite alphabet to the matrix domain. In this extension, the alphabet of n outcomes is replaced by the set of all dyads, i.e. outer products uu where u is a vector in Rn of unit length. Whereas in the classical case the goal is to learn (i.e. sequentially predict as well as) the best multinomial distribution, in the matrix case we desire to learn the density matrix that best explains the observed sequence of dyads. We show how popular online algorithms for learning a multinomial distribution can be extended to learn density matrices. Intuitively, learning the n2 parameters of a density matrix is much harder than learning the n parameters of a multinomial distribution. Completely surprisingly, we prove that the worst-case regrets of certain classical algorithms and their matrix generalizations are identical. The reason is that the worst-case sequence of dyads share a common eigensystem, i.e. the worst case regret is achieved in the classical case. So these matrix algorithms learn the eigenvectors without any regret. 1

5 0.71396232 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable. 1

6 0.70871627 272 nips-2011-Stochastic convex optimization with bandit feedback

7 0.6989029 80 nips-2011-Efficient Online Learning via Randomized Rounding

8 0.61425036 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time

9 0.59335387 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

10 0.591555 61 nips-2011-Contextual Gaussian Process Bandit Optimization

11 0.59085196 245 nips-2011-Selecting the State-Representation in Reinforcement Learning

12 0.58904439 32 nips-2011-An Empirical Evaluation of Thompson Sampling

13 0.5838185 218 nips-2011-Predicting Dynamic Difficulty

14 0.57547015 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

15 0.57104528 25 nips-2011-Adaptive Hedge

16 0.48102751 21 nips-2011-Active Learning with a Drifting Distribution

17 0.47307289 56 nips-2011-Committing Bandits

18 0.45402274 262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

19 0.4385519 162 nips-2011-Lower Bounds for Passive and Active Learning

20 0.41621894 177 nips-2011-Multi-armed bandits on implicit metric spaces


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.026), (4, 0.073), (9, 0.143), (20, 0.028), (26, 0.022), (31, 0.073), (33, 0.012), (43, 0.07), (45, 0.188), (57, 0.049), (74, 0.097), (79, 0.015), (83, 0.053), (99, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94276291 181 nips-2011-Multiple Instance Learning on Structured Data

Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence

Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm has been proposed to solve the original non-convex optimization problem by using a combination of ConcaveConvex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has the nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods. 1

2 0.93790859 69 nips-2011-Differentially Private M-Estimators

Author: Jing Lei

Abstract: This paper studies privacy preserving M-estimators using perturbed histograms. The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. The performance of the proposed method is demonstrated through a careful study of the convergence rates. A practical algorithm is given and applied on a real world data set containing both continuous and categorical variables. 1

same-paper 3 0.89847767 220 nips-2011-Prediction strategies without loss

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14εT and loss 2√2·√T·e^(−ε²T) in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and an optimal loss/regret tradeoff, improving upon the results of Even-Dar et al. and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter-free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O(√(n(log N + log T))), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property.

4 0.8830868 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

Author: Hai-son P. Le, Ziv Bar-joseph

Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1

5 0.84966058 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

Author: Zhen J. Xiang, Hao Xu, Peter J. Ramadge

Abstract: Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second, we study the properties of random projections in the context of learning sparse representations. Finally, we develop a hierarchical framework that uses incremental random projections and screening to learn, in small stages, a hierarchically structured dictionary for sparse representations. Empirical results show that our framework can learn informative hierarchical sparse representations more efficiently. 1

6 0.8444466 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

7 0.84073281 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

8 0.84060085 169 nips-2011-Maximum Margin Multi-Label Structured Prediction

9 0.8391217 80 nips-2011-Efficient Online Learning via Randomized Rounding

10 0.83834851 271 nips-2011-Statistical Tests for Optimization Efficiency

11 0.83834279 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

12 0.8377282 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices

13 0.83596051 251 nips-2011-Shaping Level Sets with Submodular Functions

14 0.83593959 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

15 0.83500397 150 nips-2011-Learning a Distance Metric from a Network

16 0.83487314 265 nips-2011-Sparse recovery by thresholded non-negative least squares

17 0.8345241 186 nips-2011-Noise Thresholds for Spectral Clustering

18 0.83387929 283 nips-2011-The Fixed Points of Off-Policy TD

19 0.83327651 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time

20 0.83121812 263 nips-2011-Sparse Manifold Clustering and Embedding